This blog post is the result of a research project on Managed vs Unmanaged execution inside Windows processes. The main difference between unmanaged and managed code is that the former runs machine code directly on the processor, while the latter runs at a higher level and needs a runtime translation from an intermediate representation to machine code.
The purpose of the research was to find ways to run assembly code (i.e. shellcode) and direct system calls in C# programs for offensive purposes.
The research started from the analysis of the DInvoke project, which is a great source for offensive C# development. The project includes helpers to call unmanaged code and allows Windows APIs to be called dynamically at runtime. Moreover, it implements the function GetSyscallStub to retrieve system call stubs by reading NTDLL from disk.
"GetSyscallStub: Maps a fresh copy of ntdll.dll and copies the bytes of a syscall wrapper from the fresh copy. This can be used to directly execute syscalls" (Reference)
This approach, however, might trigger detection since the action of reading NTDLL from disk is considered suspicious by many security products.
This blog post will analyse a different way of running shellcode and direct system calls in C# implementing stealth techniques that may help in decreasing the likelihood of detection. In addition, we are releasing two projects (SharpASM and SharpWhispers) that are going to make use of the techniques described in this blog post.
The following section is meant to describe some basic concepts of .NET internals. If you are familiar with the .NET framework and JIT execution principles, feel free to skip this section.
".NET provides a run-time environment, called the common language runtime, that runs the code and provides services that make the development process easier. Compilers and tools expose the common language runtime's functionality and enable you to write code that benefits from this managed execution environment. Code that you develop with a language compiler that targets the runtime is called managed code. Managed code benefits from features such as cross-language integration, cross-language exception handling, enhanced security, versioning and deployment support, a simplified model for component interaction, and debugging and profiling services." (Reference)
A .NET program is compiled "two times":
- from high level code (e.g. C#, VB, etc.) to intermediate representation (Microsoft Intermediate Language - MSIL)
- from MSIL to machine code
The actual execution of .NET programs, regardless of the language, is carried out by the Common Language Runtime (CLR). This component implements a Just In Time (JIT) compiler that converts the intermediate representation expressed in MSIL into machine code.
(Image borrowed here)
"JIT compilation takes into account the possibility that some code might never be called during execution. Instead of using time and memory to convert all the MSIL in a PE file to native code, it converts the MSIL as needed during execution and stores the resulting native code in memory so that it is accessible for subsequent calls in the context of that process. The loader creates and attaches a stub to each method in a type when the type is loaded and initialized. When a method is called for the first time, the stub passes control to the JIT compiler, which converts the MSIL for that method into native code and modifies the stub to point directly to the generated native code. Therefore, subsequent calls to the JIT-compiled method go directly to the native code." (source)
The implementation of JIT compilation takes advantage of RWX memory areas that store the machine instructions to execute. The JIT compiler generates x86 instructions and stores them in the JIT Code Heap (Reference).
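To make the on-demand compilation described above a bit more concrete, the following minimal sketch asks the runtime to compile a method ahead of its first call and prints the address of its native entry point. The class and method names (JitDemo, Add, GetNativeEntryPoint) are ours and purely illustrative; depending on the runtime version, the returned pointer may refer to a stub rather than to the final JIT-compiled body.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

static class JitDemo
{
    // Trivial method whose MSIL is compiled to native code on demand.
    public static int Add(int a, int b) { return a + b; }

    // Asks the runtime to prepare (compile) Add and returns its entry point address.
    public static IntPtr GetNativeEntryPoint()
    {
        MethodInfo method = typeof(JitDemo).GetMethod("Add");
        RuntimeHelpers.PrepareMethod(method.MethodHandle);  // trigger compilation now
        return method.MethodHandle.GetFunctionPointer();    // address inside runtime-managed memory
    }

    static void Main()
    {
        Console.WriteLine("Add entry point: 0x{0:X}", GetNativeEntryPoint().ToInt64());
    }
}
```

The printed address changes between runs, so the sketch is only useful to see that managed methods end up as native code somewhere in the process address space.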
Let's feed x64dbg with a .NET program to analyse what happens in the process address space.
First of all, we see that there are some memory areas marked as Private with RWX protection.
Since the program did not allocate those pages, we can safely guess that the memory has been allocated by a Windows library. Let's set a memory breakpoint on write accesses on these pages.
We can also analyse the Call Stack to understand the execution path that led to the write operation.
Indeed, mscorwks.dll writes there and, if we add a memory breakpoint to catch execution memory access, we can see that the instructions written in those pages are executed.
Analysing the Call Stack, we can see that the execution of this memory area originated from Windows DLLs.
Please note that this is just a rough analysis needed to understand the intuition: the CLR allocates RWX memory to host dynamically generated instructions at runtime and those pages often contain free space that may be used as a code cave. This observation will be the foundation of the next sections. We will not dig into the details of CLR internals since they are out of scope for this blog post.
The idea behind SharpASM is to exploit the unused space in JIT Code Heap to temporarily store the instructions we want to execute. Since the memory is marked as RWX, we can use the identified code cave to store and execute our code.
Since the CLR includes a JIT compiler, it is designed to generate and execute code dynamically at runtime (hence the "Just In Time" phrasing). This means that the CLR can be abused to dynamically execute code at runtime, which is exactly what a piece of malware would like to do.
In particular, we can find a memory page allocated by the CLR marked as RWX, search for an empty area (i.e. zero'd portion of the page) and write our stub there. Once the stub is stored in the RWX region, we just need to call the pointer to that memory area by exploiting .NET Delegates execution, switching to unmanaged code execution exactly like DInvoke does.
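As a harmless illustration of the delegate mechanism mentioned above, the sketch below turns a delegate into a raw function pointer and back again using the standard Marshal interop helpers, then calls through the result. It only ever points at an ordinary managed method; the names DelegateDemo, BinaryOp and CallThroughPointer are hypothetical and not part of SharpASM or DInvoke.

```csharp
using System;
using System.Runtime.InteropServices;

static class DelegateDemo
{
    // Delegate type describing the signature of the function behind the pointer.
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    public delegate int BinaryOp(int a, int b);

    public static int Add(int a, int b) { return a + b; }

    // Converts a delegate to a function pointer and back, then invokes it.
    public static int CallThroughPointer(int a, int b)
    {
        BinaryOp original = Add;
        IntPtr fnPtr = Marshal.GetFunctionPointerForDelegate(original);
        BinaryOp fromPtr = (BinaryOp)Marshal.GetDelegateForFunctionPointer(fnPtr, typeof(BinaryOp));
        int result = fromPtr(a, b);
        GC.KeepAlive(original); // keep the delegate alive while its pointer is in use
        return result;
    }

    static void Main()
    {
        Console.WriteLine(CallThroughPointer(2, 3));
    }
}
```

The delegate type is what tells the runtime how to marshal arguments and which calling convention to use, which is why the signature has to match the code behind the pointer.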
"A delegate is a type that represents references to methods with a particular parameter list and return type. When you instantiate a delegate, you can associate its instance with any method with a compatible signature and return type. You can invoke (or call) the method through the delegate instance." (Reference)
This approach has the advantage of not using any memory allocation or memory protection functions, which helps the attacker stay under the radar: writing to and executing from memory that is already mapped as RWX requires no call into the operating system. Additionally, detection mechanisms based on image loading or memory allocation monitoring are not effective, since we are using memory allocated by the CLR that is supposed to host "mutable" executable code. Moreover, the instructions are deleted right after they are executed, so no trace of the code is left behind, making the detection window smaller.
It should be noted, however, that this approach is not 100% safe in terms of concurrent code execution because it involves the usage of memory areas managed by the CLR. This means that there is a time window in which the CLR might overwrite the bytes containing the instructions we wrote (in the worst case, the page may potentially be freed) and this will throw an exception when we try to execute our code by calling the pointer.
The following screenshots show a 32 bit process that crashes because the stub gets overwritten. This example was made by searching the code cave (i.e. sequence of 0s) starting from the top of the page.
One easy solution to the problem just described would be to suspend all the threads except the thread executing our code. However, this solution is not easy to implement without relying on Windows API (e.g. PInvoke / DInvoke) since access to unmanaged threads is not supported by the .NET Framework and thus it's not possible to directly suspend another thread in native C# (Reference).
The solution adopted in SharpASM consists of writing at the bottom of the page (where it is less likely that the CLR will overwrite that portion), starting to search backwards from the end of the page looking for an unused memory area (i.e. memory area filled with 0s).
Moreover, to guarantee that even complex ASM instructions will be executed without issues (e.g. in 64 bit systems, SSE2 instructions must be aligned to 128 bit - Reference), the search process ensures that the returned pointer is correctly aligned according to the architecture (i.e. 64-bit alignment in 32-bit processes, 128-bit alignment in 64-bit processes).
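The alignment rule above boils down to simple address arithmetic. The following sketch, which is our own illustration and not code taken from SharpASM, rounds an address down to the required boundary; rounding down is an assumption that fits a search proceeding backwards from the end of a page. The names AlignmentDemo, AlignDown and RequiredAlignment are hypothetical.

```csharp
using System;

static class AlignmentDemo
{
    // Rounds 'address' down to a multiple of 'alignment' (must be a power of two).
    public static long AlignDown(long address, int alignment)
    {
        return address & ~((long)alignment - 1);
    }

    // Alignment in bytes following the rule in the text:
    // 8 bytes (64 bit) for 32-bit processes, 16 bytes (128 bit) for 64-bit processes.
    public static int RequiredAlignment(bool is64Bit)
    {
        return is64Bit ? 16 : 8;
    }

    static void Main()
    {
        int alignment = RequiredAlignment(Environment.Is64BitProcess);
        // The address is made up for the example.
        Console.WriteLine("0x{0:X}", AlignDown(0x7FF612345FF7, alignment));
    }
}
```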
Despite this approach working in practice, it is not a perfect solution in terms of reliability since the likelihood that the identified code cave gets overwritten by the CLR is low (but not zero). To render the process more reliable, if an exception is thrown, the code catches the exception and the whole algorithm is executed again until the code is successfully executed. To avoid infinite recursion issues caused by excessive failed attempts, a "fail" counter has been added to count how many times an exception is thrown due to failed code execution; in case the number of failed attempts is greater than a threshold, the program will give up and exit.
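The retry logic with a failure counter can be sketched generically as follows. This is a minimal illustration under our own assumptions: it uses a loop instead of recursion, rethrows the last exception instead of exiting the program, and the names RetryDemo and RunWithRetry are hypothetical.

```csharp
using System;

static class RetryDemo
{
    // Runs 'attempt' until it succeeds or more than 'maxFailures' exceptions were caught.
    public static T RunWithRetry<T>(Func<T> attempt, int maxFailures)
    {
        int failures = 0;
        while (true)
        {
            try
            {
                return attempt();
            }
            catch (Exception)
            {
                failures++;
                if (failures > maxFailures) throw; // give up once the threshold is exceeded
            }
        }
    }

    static void Main()
    {
        int calls = 0;
        int value = RunWithRetry(() =>
        {
            calls++;
            if (calls < 3) throw new InvalidOperationException("simulated failure");
            return 42;
        }, 5);
        Console.WriteLine("{0} after {1} calls", value, calls);
    }
}
```

Using a loop keeps the stack depth constant regardless of how many attempts fail, which is one way of avoiding the recursion issue mentioned above.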
Finally, it should be noted that there is the chance of a race condition in which we overwrite CLR code that has been written between the time we find the code cave and the time we actually write in it. However, in practice this is a very rare event and has never happened during our tests.
To summarise, the following are the high-level steps of the process:
- Enumerate the process address space using VirtualQueryEx
- Search for an allocated memory area marked as MEM_COMMIT and RWX
- Start from the bottom of the identified area and search for a sequence of 0 bytes large enough to store the ASM stub
- Fix pointer alignment
- Write the ASM stub by dereferencing the pointer (a-la
- Execute the ASM stub using Delegates
- If the execution fails (i.e. exception thrown), try again from scratch
- Delete the ASM stub by zeroing back the area
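The search and alignment steps in the list above can be sketched on a plain managed byte array that stands in for a memory page. This is only an illustration of the scanning logic under our own assumptions (it does not touch process memory, and the names CaveSearchDemo and FindZeroRunFromEnd are hypothetical); it starts from the highest aligned offset where the requested size still fits and walks backwards.

```csharp
using System;

static class CaveSearchDemo
{
    // Returns the offset of an aligned, all-zero run of 'size' bytes,
    // searching backwards from the end of 'page'; returns -1 if none is found.
    // 'alignment' must be a power of two.
    public static int FindZeroRunFromEnd(byte[] page, int size, int alignment)
    {
        // Highest aligned offset at which 'size' bytes still fit in the buffer.
        int start = (page.Length - size) & ~(alignment - 1);
        for (int offset = start; offset >= 0; offset -= alignment)
        {
            bool allZero = true;
            for (int i = 0; i < size; i++)
            {
                if (page[offset + i] != 0) { allZero = false; break; }
            }
            if (allZero) return offset;
        }
        return -1;
    }

    static void Main()
    {
        byte[] page = new byte[4096];
        page[4090] = 0xCC; // pretend the last bytes of the buffer are in use
        Console.WriteLine(FindZeroRunFromEnd(page, 11, 16)); // prints 4064
    }
}
```

In the example the candidate at offset 4080 is rejected because it overlaps the non-zero byte at 4090, so the next aligned candidate (4064) is returned.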
It should be noted that VirtualQueryEx is called using PInvoke, allowing SharpASM code to be minimal and "standalone" (i.e. it does not depend on DInvoke) while not losing OPSEC, since importing the VirtualQueryEx API is not considered suspicious.
The following screenshot shows the RWX memory area hosting the system call stub needed to call NtAllocateVirtualMemory. The stub is written in an area characterised by the following properties:
- it is located at the bottom of the page
- it is the first area filled with 0s large enough to store the stub
- it is aligned to 128 bits
The library usage is very easy:
static byte[] shellcode = {
    0x90, 0x90, 0x90, 0x90
};
To quickly generate the shellcode from ASM code we can use shell-storm's assembler-disassembler as shown below:
or we can just assemble the ASM code locally and extract the bytes :)
Once we had identified a way to execute arbitrary ASM code from C#, it was time to apply those concepts to something actually useful.
Executing direct system calls in C# can be achieved by self-injecting shellcode (i.e. the system call stub) into the currently running program. To do so, we need to call a number of Windows API functions (e.g. kernel32!VirtualProtect, etc.) that may trigger detection, in particular if the libraries are hooked by AV/EDRs.
Using SharpASM we can execute shellcode (i.e. ASM stubs) stealthily, obtaining the capability to interact with the OS services (i.e. system calls) without actually touching "risky" Windows library functions. Consequently, we can exploit the full power of direct system calls (namely interacting directly with the OS without passing through Windows wrappers) and the only interaction with the OS will be through the VirtualQueryEx API, needed to enumerate the process address space.
SharpWhispers aims to ease direct system call usage, porting SysWhispers2's capabilities to C#.
Similarly to SysWhispers2, to retrieve system call numbers we use ElephantSe4l's technique, which can be summarised in the following steps:
- Find all the Zw* functions exported by NTDLL
- Sort the functions by address in ascending order
- The index (i.e. position) of a function in the list is its corresponding system call number
This technique has two main advantages:
- It does not require reading NTDLL from disk, thus increasing OPSEC since this operation is often considered suspicious by security products.
- It works even if NTDLL is hooked in the process address space, thus always obtaining correct results even in monitored environments.
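The sorting steps listed above can be sketched on mock data as follows. The addresses are made up for illustration and the names SyscallNumberDemo and BuildTable are hypothetical; a real implementation would obtain names and addresses from the NTDLL export table instead of a hard-coded dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SyscallNumberDemo
{
    // Maps each Zw* export name to its position once the exports are sorted by address.
    public static Dictionary<string, int> BuildTable(IDictionary<string, long> exports)
    {
        return exports
            .Where(e => e.Key.StartsWith("Zw", StringComparison.Ordinal))
            .OrderBy(e => e.Value)
            .Select((e, index) => new { e.Key, index })
            .ToDictionary(x => x.Key, x => x.index);
    }

    static void Main()
    {
        // Made-up addresses, for illustration only.
        var exports = new Dictionary<string, long>
        {
            { "ZwClose", 0x1040 },
            { "RtlInitUnicodeString", 0x0800 },
            { "ZwAccessCheck", 0x1000 },
            { "ZwWorkerFactoryWorkerReady", 0x1020 },
        };
        foreach (var entry in BuildTable(exports))
            Console.WriteLine("{0} -> {1}", entry.Key, entry.Value);
    }
}
```

Because only the relative order of the addresses matters, the result does not depend on the bytes at those addresses, which is why inline hooks do not affect it.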
The first problem to solve is: how to get the base address of NTDLL so that we can enumerate its export table?
DInvoke provides a function called GetPebLdrModuleEntry that uses a call to NtQueryInformationProcessBasicInformation. "GetPebLdrModuleEntry: Searches for the base address of a currently loaded module by searching for a reference to it in the PEB." (Reference)
However, since we have the capability to execute arbitrary ASM stubs, why not get the PEB using the unmanaged way?
In SharpWhispers we have implemented PEB retrieval using ASM without calling NtQueryInformationProcessBasicInformation by running the following shellcode:
static byte[] bReadgsqword = {
    0x65, 0x48, 0x8B, 0x04, 0x25, 0x60, // mov rax, qword ptr gs:[0x60]
    0x00, 0x00, 0x00,
    0xC3 // ret
};
static byte[] bReadfsdword = {
    0x64, 0xA1, 0x30, 0x00, 0x00, 0x00, // mov eax, dword ptr fs:[0x30]
    0xC3 // ret
};
To retrieve the PEB on a 64-bit system, we can just execute
IntPtr peb = SharpASM.callASM(bReadgsqword);
On the other hand, to retrieve the PEB on a 32-bit system, we can execute
IntPtr peb = SharpASM.callASM(bReadfsdword);
Once the base address of NTDLL has been obtained, we have to create the list of Zw* functions and sort them.
To do so, we have to walk the export table of NTDLL and add all the functions starting with Zw to a list. To implement this function we borrowed some code from DInvoke's GetExportAddress because it contains the logic we need to parse the export table.
Once the Zw* functions are sorted by address in ascending order, we can get the system call number by checking the index of the corresponding function. At this point, we can use the same technique used in SharpASM to write and execute the system call stub.
The following is the stub used in 64-bit processes. This is the standard x86-64 stub we are used to seeing, and the 5th byte contains the system call number.
static byte[] bSyscallStub = {
    0x4C, 0x8B, 0xD1, // mov r10, rcx
    0xB8, 0x18, 0x00, 0x00, 0x00, // mov eax, 0x18 (NtAllocateVirtualMemory Syscall)
    0x0F, 0x05, // syscall
    0xC3 // ret
};
With regard to x86 (32 bit) running under WOW64, we have a slightly more complex stub. The parameters should be passed on the stack and we have to call the function KiFastSystemCall instead of executing the syscall instruction. The system call number is written at the 27th byte.
The address of KiFastSystemCall is taken from the TIB and we use the "ret-pop" technique to get the address at which we are executing (i.e. EIP) so as to push the correct return address before calling KiFastSystemCall.
static byte[] bSyscallStub = {
    0x55, // push ebp
    0x8B, 0xEC, // mov ebp,esp
    0xB9, 0xAB, 0x00, 0x00, 0x00, // mov ecx,AB ; number of parameters
    0x49, // dec ecx
    0xFF, 0x74, 0x8D, 0x08, // push dword ptr ss:[ebp+ecx*4+8] ; parameter
    0x75, 0xF9, // jne <x86syscallasm.push_argument>
    // ; push ret_address_epilog
    0xE8, 0x00, 0x00, 0x00, 0x00, // call <x86syscallasm.get_eip> ; get eip with ret-pop
    0x58, // pop eax
    0x83, 0xC0, 0x15, // add eax,15 ; Push return address
    0x50, // push eax
    0xB8, 0xCD, 0x00, 0x00, 0x00, // mov eax,CD ; Syscall number
    // ; Get Address from TIB
    0x64, 0xFF, 0x15, 0xC0, 0x00, 0x00, 0x00, // call dword ptr fs:[C0] ; call KiFastSystemCall
    0x8D, 0x64, 0x24, 0x04, // lea esp,dword ptr ss:[esp+4]
    0x8B, 0xE5, // mov esp,ebp
    0x5D, // pop ebp
    0xC3 // ret
};
After the system call has executed, the memory area hosting the stub is cleaned up by zeroing back the memory. This is done to remove the artifacts of our execution from memory (namely, the system call stub instructions) increasing OPSEC. During our experiments, in some cases, an Access Violation exception was thrown because the stub got deleted before ending its execution. The root cause of the issue is not easy to understand because the bug disappears when running the program in a debugger. Understanding why this happens remains an open question and I would be happy to discuss it if you have any ideas!
The issue has been handled by picking different code caves in subsequent executions of the stubs (i.e. we never pick the same memory area twice in a row). Moreover, in case an exception is thrown, it is enough to catch the exception and call the function again.
The project is designed to be 100% compatible with DInvoke. This means that it is possible to use SharpWhispers in projects using DInvoke with no additional effort.
The idea is that SharpWhispers should be a tool used in addition to DInvoke to implement stealth offensive tools that make use of both direct system calls and dynamic API calls. This gives developers the flexibility to implement complex logic while using modern evasion techniques in an easy way.
SharpWhispers is supposed to be as minimal as possible: the Python script generates C# files containing only the needed type definitions and the generated code can be included inside a project as-is.
A number of data types are taken from the DInvoke project and we decided to use different namespaces to avoid name collisions.
Moreover, all the helpers for Nt functions implemented in DInvoke have been included in SharpWhispers with the same signature. This means that in SharpWhispers it is possible to call those functions using the very same code used for calling DInvoke's
Static Analysis OPSEC consideration
SharpWhispers' helpers do not use the function name to identify the system call but use a hash instead. The hash is computed by the Python script when the code is generated (following the same idea from SysWhispers2).
This is done to hide the strings including the system call names from the binary, even though both the helper function and the Delegate names leak the system call name in the binary.
This may be mitigated in a second stage (e.g. obfuscating the code after using the library in a user-friendly way), though analysing this aspect is not part of this article.
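The idea of resolving an entry by hash rather than by name can be sketched as follows. Note that we swap in 32-bit FNV-1a purely as a stand-in: it is not the hash function used by SysWhispers2 or SharpWhispers, and the names NameHashDemo, Fnv1a and TryResolve are hypothetical. In a generated project the wanted hash would be a precomputed constant, so the plain-text name would not need to appear in the lookup code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class NameHashDemo
{
    // 32-bit FNV-1a, used here only as a stand-in hash function.
    public static uint Fnv1a(string name)
    {
        uint hash = 2166136261;
        foreach (byte b in Encoding.ASCII.GetBytes(name))
        {
            hash ^= b;
            unchecked { hash *= 16777619; }
        }
        return hash;
    }

    // Finds the position of the name whose hash equals 'wantedHash'.
    public static bool TryResolve(IEnumerable<string> names, uint wantedHash, out int index)
    {
        index = 0;
        foreach (string name in names)
        {
            if (Fnv1a(name) == wantedHash) return true;
            index++;
        }
        index = -1;
        return false;
    }

    static void Main()
    {
        string[] sortedNames = { "ZwAccessCheck", "ZwWorkerFactoryWorkerReady", "ZwClose" };
        uint wanted = Fnv1a("ZwClose"); // would be a precomputed constant in generated code
        int index;
        if (TryResolve(sortedNames, wanted, out index))
            Console.WriteLine("resolved to index {0}", index);
    }
}
```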
SharpASM and SharpWhispers can be downloaded from SECFORCE's Github.
The Python script will generate the C# source code needed and, after importing the generated files in a .NET Framework Visual Studio project, it is required to import the SharpWhispers.Data namespaces, for example by doing the following:
using Data = DInvoke.Data;
using Syscall = Syscalls.Syscalls;
The system calls can be used using the
For example, to call NtAllocateVirtualMemory if the namespace has been imported like above, the following code can be used:
var retValue = (Data.Native.NTSTATUS)Syscall.NtAllocateVirtualMemory(
(IntPtr)(-1), ref BaseAddress, IntPtr.Zero, ref AllocationSize,
Data.Win32.Kernel32.MEM_COMMIT | Data.Win32.Kernel32.MEM_RESERVE,
In this blog post we analysed a different way of executing shellcode and direct system calls in C# programs. The objective of the research was to find stealthier ways of running shellcode and issuing direct system calls in C# offensive programs to increase OPSEC in Red Team operations. This blog post analysed a way to abuse CLR internal behaviour to store and execute arbitrary unmanaged code and described how to apply those concepts for direct system call execution.
In particular, we introduced a different way of storing code in a .NET process address space, by finding RWX code caves allocated by the CLR. This allows unmanaged code to be executed without allocating memory, by storing it in memory areas managed by Windows components. Temporarily storing malicious code in the CLR's memory areas increases OPSEC by using legitimate memory areas to store dynamic code while also minimising the interaction with the OS.
With regards to the actual unmanaged execution, the well-known Interop interface is used, following the same idea implemented in DInvoke.
Moreover, SharpWhispers brings ElephantSe4l's technique to C#, dynamically retrieving system call numbers at runtime by porting SysWhispers2 code. This allows system call numbers to be identified without reading NTDLL from disk, increasing OPSEC once more.
The two released projects are PoCs of the techniques described in this article and contain a number of "hacks" to handle possible data corruption caused by race conditions. Future research work may improve this implementation by better handling race condition cases and by potentially finding a better way of enumerating the process address space.
The following list summarises the concepts presented in this blog post.
- Execute shellcode in C# projects
  - No Allocation: re-usage of RWX CLR memory as temporary code cave
  - Clean-up: the shellcode is deleted after execution
- Execute direct system calls in C# projects using ElephantSe4l's technique
  - Retrieves NTDLL base address without calling Windows APIs by reading the address of PEB from the processor using SharpASM.
  - Uses SharpASM's technique to store and run the system call stubs into CLR code caves
  - The code is minimal and 100% compatible with DInvoke
  - Need obfuscation before delivery (this applies to any C# project though, not just to SharpWhispers)
Finally, we left an open question regarding the race condition between the system call execution and the stub deletion that sometimes happens. Please reach out on Twitter (@d_glenx) if you have any idea about that :)