Summary
This is the third in a series of posts exploring fundamental malware analysis techniques. Please check out Part 1 and Part 2 for some additional background.
The following techniques are presented as an alternative to automated sandboxes, which are effective and powerful tools. However, as we showed in Part 1, sandboxes may fail to capture all indicators in certain configurations with particular malware variants. In Part 1 and Part 2, we compared and contrasted analysis results from automated sandboxes and a variety of other tools such as FakeNet, CMD Watcher/CyberChef, and oletools/ViperMonkey.
The particular method covered here in Part 3 is laborious and should not be used in an active incident response (IR) engagement. It is presented purely as a learning exercise for practitioners just starting out with malware analysis. Ultimately, analysts will use automated tools such as those covered in parts 1 and 2 in order to save time. That being said, it is critical to have a solid grasp of the fundamentals in order to understand the underlying processes used in automated analysis.
Here, we will continue to focus on the command and control (C&C) links for stage 1 downloads in documents that have been weaponized with encoded VBA macros. We will use the same sample examined in the previous entries. At the time of this writing the encoding scheme of this malware variant remains the same, so it is still timely and relevant.
Emotet is used as the sample. SHA-256: 6f0bf1f1302c4d3bab6b0a34c4374e84c78581bd2bee054a322908d897416cd3
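Before starting, it can be worth confirming that the file in the analysis VM is the sample referenced above. The following is a minimal sketch in Python (not part of the original workflow) that hashes a file and compares the digest to the SHA-256 from this post. The function names are hypothetical, chosen only for illustration.

```python
import hashlib
from pathlib import Path

# SHA-256 published in this post for the Emotet sample.
EXPECTED_SHA256 = "6f0bf1f1302c4d3bab6b0a34c4374e84c78581bd2bee054a322908d897416cd3"


def sha256_of_file(path, chunk_size=65536):
    """Return the lowercase hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True when the file's digest equals the expected hash (case-insensitive)."""
    return sha256_of_file(path) == expected.strip().lower()
```

Calling matches_expected() with the path to the document returns True only when the file on disk is byte-for-byte identical to the sample named in this post.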
Static Analysis Techniques with a Text Editor
In this specific example, we are examining Emotet, but the general process would be the same for any malware family that leverages VBA macros in a Word document. The Emotet document loader contains 5 hardcoded URLs in the macro. Our goal will be to extract all 5 of these indicators.
Before beginning, a cautious analyst might disconnect their VM’s NIC or run an application such as CMD Watcher to stop any malicious processes from running in case the macro is accidentally executed and infects the system. If it is necessary to enable the macros, hold the SHIFT key to keep the autoopen() function from running. At the very least, make sure there is a clean snapshot that can be restored.
To begin, open the malicious document in Microsoft Word:
The Developer tab in recent versions of Word is disabled by default. To get it to show up in the ribbon, you may need to enable this setting before proceeding. Select the File tab > Options > Customize Ribbon > Developer.
Search through the VBA project for a procedure declared as “Sub autoopen()”. Word runs this procedure automatically when the document is opened with macros enabled, or when a user enables them after being prompted.
Depending on the encoding scheme, you may need to search multiple modules. This step will often include some trial and error as most schemes are different and the analyst will likely encounter obfuscation of some kind. There could be hundreds of lines of garbage code and functions that do not perform any actual actions.
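When there are many modules to search, the hunt for the entry point can be scripted once the module text has been copied out of the VBA editor. The sketch below is one possible approach in Python, assuming each module has been saved as plain text. The list of auto-execution names covers common Office entry points and is not exhaustive, and the function name find_entry_points is hypothetical.

```python
import re

# Common auto-execution entry points in Office VBA (VBA names are case-insensitive).
AUTO_EXEC_NAMES = (
    "AutoOpen", "AutoExec", "AutoClose",
    "Document_Open", "Document_Close", "Workbook_Open", "Auto_Open",
)

ENTRY_POINT = re.compile(
    r"^\s*(?:(?:Public|Private)\s+)?Sub\s+(" + "|".join(AUTO_EXEC_NAMES) + r")\s*\(",
    re.IGNORECASE | re.MULTILINE,
)


def find_entry_points(modules):
    """Map module name -> list of (line_number, sub_name) for auto-exec subs.

    `modules` is a dict of module name -> VBA source text.
    """
    hits = {}
    for name, source in modules.items():
        found = []
        for match in ENTRY_POINT.finditer(source):
            # Count newlines before the sub name to get a 1-based line number.
            line_no = source.count("\n", 0, match.start(1)) + 1
            found.append((line_no, match.group(1)))
        if found:
            hits[name] = found
    return hits
```

Modules with no match are left out of the result, which narrows down where to start reading. Obfuscated samples may still route execution through several helper functions, so this only locates the starting point.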
At this point some analysts prefer to debug the code inside the VBA editor, which is a viable option. The editor has several nice features, such as setting breakpoints and stepping through the code, that can be very helpful. (This particular sample really does not require this, so it is outside of our current scope. I will cover these techniques in a future post.) If you choose to debug the code inside the project, you will need to “enable content”. Just make sure to hold down the SHIFT key to keep the autoopen() function from running and infecting the system.
Your mileage may vary, but once the correct function is identified, it is time to copy it into your text editor of choice. I prefer Notepad++ mainly for familiarity, but it also has several useful features for search/replace, decoding/encoding, and more.
Here we have the VBA code nicely copied into our editor. With Notepad++, you can highlight matching strings in your code, and this is the initial step to start removing junk code. Any function or variable that does not refer to or call anything else can usually be removed for clarity.
Looking at the VBA code above, it appears needlessly complex and confusing, and that is the point. The anti-analysis techniques in the obfuscated code are purely intended to make static analysis more difficult and frustrating. To begin getting a handle on what the code is actually doing, we must start by removing the no-operations (nops). As we can see, the code is filled with uninitialized and unused variables that simply set a string to the value of another string. If these aren’t referenced again, they can be removed. There are also several data-conversion functions that look complicated but do nothing, and these can also be removed. Ultimately we are left with very little functional code to analyze.
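The "assigned but never referenced again" rule described above is mechanical enough to sketch in code. The following Python is a rough heuristic and not a VBA parser: it flags assignment lines whose target name appears nowhere else in the macro. It is deliberately conservative, so a junk variable that is declared with Dim or assigned twice will not be flagged. All function names here are hypothetical.

```python
import re
from collections import Counter

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
ASSIGNMENT = re.compile(r"^\s*(?:Set\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=", re.IGNORECASE)


def _code_only(line):
    """Blank out string literals, then drop any trailing ' comment."""
    without_strings = re.sub(r'"[^"]*"', '""', line)
    return without_strings.split("'", 1)[0]


def find_dead_assignments(vba_source):
    """Return 1-based line numbers of assignments whose target is used nowhere else."""
    lines = [_code_only(line) for line in vba_source.splitlines()]
    # VBA identifiers are case-insensitive, so count them in lowercase.
    counts = Counter(
        name.lower() for line in lines for name in IDENTIFIER.findall(line)
    )
    dead = []
    for number, line in enumerate(lines, start=1):
        match = ASSIGNMENT.match(line)
        # A count of 1 means the only occurrence is this assignment itself.
        if match and counts[match.group(1).lower()] == 1:
            dead.append(number)
    return dead
```

The flagged lines are candidates for deletion, not certainties. It is still worth reading each one before removing it, which is the same judgment applied when highlighting matches by hand in Notepad++.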
So here we have peeled away most of the obfuscation and are zeroing in on the functional portion of the macro. Two problems remain: some of the remaining strings still appear to be obfuscated, and the code itself seems far too short to store a malicious script. Confusion sets in at this point. Our best bet is to jump back into the VBA project and see if anything was overlooked.
Back in the VBA project, we find a form with the name of the curious UkB44A_ string that was popping up in our macro code. Digging a little deeper, we can see that there are actually five hidden text boxes in the form named UkB44A_ and they are all stacked on top of each other.
Stashing the script in hidden text boxes is an interesting technique. It is not completely out of character for this malware variant, as up until recently the authors were hiding the scripts in very tiny text boxes in the upper left corner of the document itself. The technique here is essentially the same idea, just a different execution. So in this case, the VBA code is not holding the script, but simply serves as a process to call and execute the script from the hidden text boxes. The other unidentified strings in the VBA code were the names of the individual text boxes. Here is what it looks like all put together.
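Conceptually, putting it all together means reading the text from each box and joining the pieces. The sketch below illustrates that idea in Python under two assumptions that should be checked against the macro itself: that the text of each box has been copied out by hand, and that the pieces are joined in the order the macro references the box names. The control names and contents used with it are placeholders, not values from the sample.

```python
def reassemble_script(control_text, reference_order):
    """Concatenate text box contents in the order the macro references them.

    control_text:    dict of text box name -> text copied from that box
    reference_order: list of text box names, in the order seen in the macro
    """
    missing = [name for name in reference_order if name not in control_text]
    if missing:
        # A missing box usually means a control on the form was overlooked.
        raise KeyError(f"controls not found on the form: {missing}")
    return "".join(control_text[name] for name in reference_order)
```

For example, reassemble_script({"boxA": "powershell -e ", "boxB": "SQBFAFgA"}, ["boxA", "boxB"]) joins the two placeholder fragments into a single command line.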
At this point, the macro is de-obfuscated, but there remains the base64 encoded PowerShell script to deal with (i.e., we’re just getting started).
This is typically the starting point if an analyst has used olevba.py, oledump.py, or another tool to extract the script. There are likely several more layers of obfuscation, but Notepad++ is still well equipped to help us finish the job.
Notepad++ has a built-in Base64 decoder. To begin, select the Base64 string and right-click to pull up the context menu. Navigate to Plugin commands > Base64 Decode.
When decoded in this case, the output includes a NUL character after each of the other characters. This pattern is what ASCII text looks like when it is stored as UTF-16LE, the encoding PowerShell expects for Base64-encoded commands, so it is more likely an encoding artifact than a mistake or intentional anti-analysis. Either way, we can clean this up with a regex or a simple find and replace for \x00.
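The decode and NUL cleanup can also be done in a few lines of Python, which is handy for checking the Notepad++ result. This is a minimal sketch that assumes the NUL bytes come from UTF-16LE encoding (the encoding PowerShell uses for Base64-encoded commands). If every second byte is NUL, it decodes the data as UTF-16LE, otherwise it falls back to UTF-8. The function name is hypothetical.

```python
import base64


def decode_encoded_command(b64_text):
    """Base64-decode a PowerShell payload and return it as readable text."""
    raw = base64.b64decode(b64_text)
    # ASCII text stored as UTF-16LE has a NUL as every second byte.
    looks_utf16le = len(raw) >= 2 and len(raw) % 2 == 0 and set(raw[1::2]) == {0}
    if looks_utf16le:
        return raw.decode("utf-16-le")
    # Otherwise treat the bytes as UTF-8 and keep going on bad sequences.
    return raw.decode("utf-8", errors="replace")


def strip_nuls(raw):
    """Byte-level equivalent of the find and replace for \\x00."""
    return raw.replace(b"\x00", b"")
```

Decoding as UTF-16LE and deleting the NUL bytes give the same visible result for plain ASCII scripts. The UTF-16LE route is the safer choice if the script contains non-ASCII characters, since stripping bytes would corrupt them.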
In this scenario, we are actually pretty lucky. Very often with this malware variant and others like it, there are many, many more layers of obfuscation techniques. We will frequently see character substitutions, multitudes of environment variables, arrays, string reversals, inflate/deflate, DOSfuscation, for loops, and more. For whatever reason, the obfuscation is relatively simple in this sample. Here is the cleaned up code after the removal of the NUL characters.
To make this easier to see, I have split the line at each semicolon. Things are looking pretty good after decoding the Base64 and removing those weird NULs. The only other layer of obfuscation remaining is string concatenation with ‘+’. This is a simple find and replace operation:
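The same two cleanup steps, splitting at semicolons and collapsing the concatenation, can be expressed as a short Python sketch. It assumes the pieces are single-quoted PowerShell literals joined with +, and it is intentionally naive: a semicolon or a quote-plus-quote sequence inside a string would be handled incorrectly. The function names and the strings used with them are illustrative only and are not taken from the sample.

```python
import re

# Matches the seam between two adjacent single-quoted literals: ' + '
CONCAT = re.compile(r"'\s*\+\s*'")


def join_concatenated_literals(script):
    """Remove '+' joins so 'ht'+'tp' becomes 'http'."""
    return CONCAT.sub("", script)


def split_statements(script):
    """Put each semicolon-separated statement on its own line."""
    parts = (part.strip() for part in script.split(";"))
    return "\n".join(part for part in parts if part)
```

Running join_concatenated_literals() first and split_statements() second mirrors the order of the find and replace steps in the editor.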
Replacing the ‘+’ characters is the final operation needed in this example. To make the URL strings easier to read, we can split the code again at the ‘@’ character so that each URL sits on its own line.
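Once the concatenation is gone, pulling out the URLs is a matter of finding the quoted string that holds them and splitting it at ‘@’. The Python sketch below shows one way to do that and also defangs each URL so it can be pasted into a report safely. It assumes the URLs live in a single-quoted literal that starts with http or https. The function names are hypothetical, and the domains used with it are reserved example names, not indicators from the sample.

```python
import re

# A single-quoted literal that begins with http:// or https://
URL_LIST = re.compile(r"'(https?://[^']*)'")


def extract_urls(script):
    """Return the '@'-separated URLs found inside quoted URL-list literals."""
    urls = []
    for literal in URL_LIST.findall(script):
        urls.extend(part for part in literal.split("@") if part)
    return urls


def defang(url):
    """Make a URL non-clickable: http -> hxxp and . -> [.]"""
    return url.replace("http", "hxxp", 1).replace(".", "[.]")
```

For the sample in this post, the expected result would be a list of five URLs, matching the five hardcoded indicators mentioned earlier.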
Conclusion
This is the final entry in a tutorial series presenting a wide variety of techniques to extract indicators of compromise (IOCs) from malicious document macros. The series has examined multiple use cases and the most appropriate tools for a given situation. In this entry, we looked at the manual extraction of IOCs, a labor-intensive process that should typically be reserved for training or educational research scenarios. The built-in VBA editor is a useful tool for many tasks, but we opted for Notepad++ to decode the obfuscation layers and perform the find and replace operations. The static analysis techniques presented here were fruitful beyond their educational value, as they also revealed an interesting TTP: this malware variant executes its script from five hidden text boxes.