This blog post is the first part of a series focused on malware detection evasion techniques on Windows. In particular, we look at userland API hooking techniques employed by various security products and ways to identify and bypass them. The research was carried out by Giorgio Bernardinetti (gbyolo) from CNIT and myself, Dimitri Di Cristofaro (GlenX).
User-space API hooking is a well-known technique used by AntiVirus (AV) and Endpoint Detection and Response (EDR) products to monitor the execution of a process at run-time in order to detect malicious patterns.
The security product hijacks a number of system functions by overwriting the first assembly instructions of each function with a jump (JMP) instruction, which redirects the execution flow to a piece of code controlled by the security software before returning to the original API code. Exactly which functions are hooked depends on the security product in use, but functions that are commonly used for malicious purposes are often hooked.
API hooking is a very effective detection technique, as it allows real-time events to be monitored and can therefore trigger the detection of malicious software after it has been executed. For example, AVG Internet Security was able to detect Cobalt Strike’s raw stageless beacon shellcode packed with PEzoNG (a packer we developed, not yet public at the time of writing) when the unhooking feature of our packer was not enabled.
The malware was not detected when the file was placed on disk (static analysis), nor when the beacon was loaded in memory (dynamic analysis) nor when it connected to the Command-and-Control server but rather when a certain command was executed on the system through the beacon. The reason behind this is that once the beacon was run, the packer couldn’t protect it anymore because the AV software employed run-time detection techniques, namely by hooking Windows APIs.
From an attacker’s perspective, one way to bypass these security products is to attempt to remove the hooks. There are many documented techniques to remove user-space hooking (e.g. here, here, here, here), but all of them require either reading the original library (DLL) from disk or reading its contents from a remote process’ memory space before the library has been hooked by the security product.
Detection of those techniques is usually implemented with the support of the Windows kernel, by using minifilter drivers. Windows allows anti-malware software to register callbacks for a number of system events, including file operations and process creation. This means that the AV will be notified when such events happen in the system and could trigger a deeper analysis that would potentially lead to detection. For example, reading the contents of the NTDLL.dll file, which should only be loaded during process creation, can be considered suspicious and could lead to detection.
In this post we present Whisper2Shout, a different unhooking technique which does not require knowledge of the original (unhooked) library.
As a proof of concept we have packed a Cobalt Strike beacon shellcode (and several other payloads detected as malicious) using the Whisper2Shout technique. We have successfully bypassed several security products which rely on API hooking. This technique is fully implemented in PEzoNG, but it can be deployed in custom standalone executables.
Our technique relies on a number of observations to restore the prologue of hooked functions with the original bytes, without needing to read the contents of the original library.
When we started working on this technique, the original idea was to divide the hooked functions into two subsets - system calls and higher-level APIs.
In the system call case, we could exploit the same property used in SysWhispers2:
- The system call number could be obtained by ordering by address the Zw* functions in NTDLL.dll, even if the user-space system call stub has been hooked.
- Once the correct system call number is obtained, if a system call stub is hooked, restoring the original bytes is trivial as the stub used to call a system call is well-known; thus, it is possible to unhook any system call stub hooked in NTDLL by overwriting the “hooking” instructions with the system call stub using the right system call number.
The following shows what a system call stub looks like. We omitted some instructions between the number of the system call and the syscall instruction as they are not important for the purpose of the example.
The following figure shows the system call stub of the NtClose call which, in this case, is hooked by BitDefender Total Security.
This figure shows the reconstructed stub after the unhooking process.
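As a reference for the stub layout discussed above, the following is a minimal sketch that checks whether the first bytes of an x64 system call stub are intact and, if so, extracts the system call number. It assumes the usual x64 prologue `mov r10, rcx; mov eax, <number>`; the function name and the sample bytes are hypothetical and are not taken from the original figures.

```python
import struct

# First bytes of an unhooked x64 system call stub (assumed layout):
#   4c 8b d1          mov r10, rcx
#   b8 NN NN NN NN    mov eax, <system call number>
# The instructions between this prologue and `syscall` are ignored here.
STUB_PREFIX = bytes.fromhex("4c8bd1b8")


def syscall_number(stub: bytes):
    """Return the system call number if the stub prologue is intact, else None."""
    if len(stub) < 8 or stub[:4] != STUB_PREFIX:
        return None  # prologue was overwritten, e.g. by a hook
    (number,) = struct.unpack_from("<I", stub, 4)
    return number
```

A stub whose first byte is `0xE9` returns `None`, which is one simple way to tell that a system call stub has been modified.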
The API case is less trivial because there are a number of different techniques that could be used by AV/EDRs to hook a Windows API.
With the help of previous research on the topic as well as our own analysis of various security software products we have compiled a list of techniques that are used for userland API hooking and have developed a universal unhooking technique which works against all of the analysed products.
To hook a function, it is necessary to divert the execution by jumping somewhere else without altering the content of the registers involved in the function call. To do this, a hook has to overwrite the first few bytes (the first instructions) with instructions that jump to a controlled area. Moreover, the CPU context should be preserved in order not to alter the correct behaviour of the original function. For these reasons, security products overwrite as few bytes as possible when hooking a function, and there are only limited ways of doing so in assembly.
Jumping to the trampoline stub (which is allocated and written by the AV DLL at runtime) could be done in the following ways:
- Using a relative jump with a 32-bit displacement (opcode 0xe9, often loosely called a short jump)
- Using a mov and a jmp to jump to an absolute address
It should be noted that these are not the only possible ways to hook a function; however, during our research we found that in practice only these two techniques are used.
After jumping to the AV controlled area, there must be a way to jump back to the original function. Since execution is now in the AV controlled area, there is no restriction on the number of instructions that can be used to restore the execution flow.
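The two forms listed above can be recognised from the bytes at the symbol address. The sketch below classifies a prologue and computes the destination of the jump; it is illustrative only. It assumes the mov/jmp variant is encoded as `mov rax, imm64; jmp rax` (the choice of register is our assumption, products may use a different one), and `hook_target` is a hypothetical name.

```python
import struct


def hook_target(prologue: bytes, address: int):
    """Return (kind, destination) if the bytes look like a hook, else None."""
    # e9 <rel32>: the displacement is signed and relative to the address of
    # the next instruction, i.e. address + 5.
    if len(prologue) >= 5 and prologue[0] == 0xE9:
        (disp,) = struct.unpack_from("<i", prologue, 1)
        return ("jmp rel32", (address + 5 + disp) & 0xFFFFFFFFFFFFFFFF)
    # 48 b8 <imm64> ; ff e0  ->  mov rax, imm64 ; jmp rax (assumed encoding)
    if (len(prologue) >= 12 and prologue[:2] == b"\x48\xb8"
            and prologue[10:12] == b"\xff\xe0"):
        (dest,) = struct.unpack_from("<Q", prologue, 2)
        return ("mov/jmp", dest)
    return None
```

For the relative form, the destination depends on where the function is loaded, which is why the symbol address has to be passed in alongside the bytes.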
We identified that the techniques used to jump back to the original function were the following:
- Jump back to the original function with a jmp instruction (e.g. implemented by Detours hooking library)
- Double-Push technique (Nikolay Igotti) (e.g. implemented by BitDefender Total Security)
During our research we identified a common pattern with regards to the memory allocated to storing pointers and trampolines needed for hooking. We found that the memory type of all the regions containing useful information regarding the hooks was marked as Private (MEMORY_BASIC_INFORMATION.Type == MEM_PRIVATE).
This observation is the fundamental building block of our universal unhooking technique, because that private memory region contains all the information necessary for the unhooking process.
As an example, the next figures show the blocks used by AVG Internet Security to hook the function NTDLL.LdrLoadDll.
- Trampoline for jumping to the AV dll (located in the Private memory area)
- Stub for coming back to the original function (located in the Private memory area as well)
So, when a function is hooked, the pointer to the symbol in the Export Directory of the dll points to a jump instruction or to a set of well-known instructions that divert the execution (e.g. a mov followed by a jmp).
The target address of the jump is located inside a Private memory region as illustrated in the following images.
This (private) memory region contains trampolines to the hooking dll (which will be used to hijack the execution of the function towards the anti-malware software) as well as trampolines to the hooked (original) dll (which will be used if the call has been identified as legitimate by the anti-malware software and thus the execution should continue as normal).
Even when there are multiple Private memory regions, both trampolines will reside in the same memory area. This means that by using the destination address of the jump located at the symbol address, we can call VirtualQuery (or the corresponding system call NtQueryVirtualMemory) to get the memory region where the prologue of the hooked function is stored.
Once this memory region is identified, it is necessary to parse it, searching for the trampolines used to jump back to the original function. Each of those trampolines will contain the original prologue of a hooked function as well as a pointer to an address near the position of the hooked function - a few bytes after its first instruction.
Layout before hooking
Layout after hooking
At this point, the actions that have to be taken differ between hooking techniques as there are different ways to understand if the trampoline is pointing to the function we are trying to unhook.
When a jump is used to execute the (original) hooked function (first hooking technique), it is necessary to find all the jumps in that region so that we can analyse the destination address of each jump searching for the memory region where the original function is located.
In particular, we scan the Private memory region searching for:
- Indirect jumps through a memory operand (Detours-like, referred to here as long jumps), opcode 0xff25
- Relative jumps with a 32-bit displacement (Malwarebytes-like, referred to here as short jumps), opcode 0xe9
When the second technique (double-push) is used, it is necessary to find all the sequences of push rax; push rax; mov rax, addr (opcodes 0x50; 0x50; 0x48b8), so that the destination address can be extracted and checked to see whether it points to the hooked function.
Once we have identified the correct trampoline, the bytes preceding the aforementioned stub are the original bytes that were overwritten by the initial hook. To unhook the function, we have to copy those bytes back to the original address of the symbol.
At first, we implemented this idea by cycling over each hooked DLL and performing the following steps:
- Use a direct system call to NtProtectVirtualMemory to set the memory permissions of the .text section to RW
- Unhook the functions by writing each original stub at the corresponding symbol address
- Call NtProtectVirtualMemory to restore the original memory permissions (RX)
The final result is shown in the following figure for CreateRemoteThreadEx.
The following image shows LdrLoadDll unhooked.
However, at some point we faced a problem: some security products use a more advanced method, monitoring the integrity of their hooks and undoing our modifications.
To solve this problem, we changed our approach and decided that, rather than overwriting the hooks at the symbol address, we could overwrite the AV hooking trampoline with a jump to the original function prologue (located in the same Private memory area).
In this way, when the AV checks its hooks, it will find all the initial jumps unmodified. Those jumps will still point to the very same addresses where the AV placed the hooking trampolines; however, the instructions there will no longer divert the execution to the AV DLL. We are basically bypassing the hooking trampoline so that, when the function is called, the execution behaves exactly as if no hooks were in place, even though a jump is still present at the symbol address.
Layout after unhooking:
It should be noted that all the previous observations are still valid and they allow us to retrieve all the original stubs by walking the process address space in a clever way.
We have all the information that is necessary to restore the original execution path:
- We know the destination address of each jump located at the symbol address
- We know where the original function stub is located
After collecting all this information, we can start the unhooking process:
- Use a direct system call to NtProtectVirtualMemory to set the protection of the memory area that stores the stub to RW
- Overwrite the start of the hooking stub with a relative jump instruction (opcode 0xe9) that jumps to the original prologue
- Set back the memory to RX using another direct system call to NtProtectVirtualMemory
Finally, it is worth mentioning that, using this technique, it is no longer necessary to differentiate between system calls and APIs - although the knowledge of the system call stub can be used as a verification to understand if a function has been hooked/unhooked correctly - and therefore the unhooking process will be exactly the same, namely:
- Check if the function has been hooked
- Get the pointers to “hooking” and “original” stubs
- Overwrite the “hooking” stub with a jump to the “original” stub.
The technique presented in this post allows us to evade detection from anti-malware products by removing the anti-malware alterations to the execution flow that allow the detection process to take place.
We retrieve all the information that is necessary in order to remove the detection and restore the functions to their original functionality, by reading our own process memory. The interaction with the system APIs is done by using direct system calls; therefore, even if those services are hooked in user space, the security product will not detect those calls.
The core of this technique is based on the observation that the memory allocated by the security products is flagged as Private, which was the case in all the products we analysed. This makes identifying the stub trivial, as it cannot be confused with the memory of the loaded libraries, and it allows us to identify the memory area allocated by the AV DLL to host the stubs. All the steps taken to get the information that we need are stealthy, as they only require us to read memory and follow pointers in our own process address space. Moreover, calls to NtProtectVirtualMemory are reduced to a minimum, as we only use the system call twice per Private memory area (once to set the region to RW, and once to set it back to RX).
From a defender’s perspective, user-space hooking is a very important mechanism. Even though bypasses are possible, it is important to have it in place as part of a defense in depth approach. Moreover, security products that monitor the integrity of the hooks should be preferred, as they make an attacker’s life harder and increase the likelihood of detection.
Finally, it is important to note that these products are not bullet-proof security solutions that will protect systems from every possible threat; they are tools that defenders can use to identify anomalies in the monitored systems. Configuring and tuning security software are fundamental steps when a new AV is deployed in the network: receiving meaningful alerts helps defenders to detect and react to stealthy attacks that are not automatically detected as malicious but look suspicious.
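To illustrate what monitoring the integrity of the hooks could look like, here is a minimal sketch that hashes a baseline copy of each hook-related region and reports which regions have changed. It is not the method of any specific product; the region names and bytes are hypothetical. It also shows why a check that covers only the jump at the symbol address would miss a change made to the trampoline, as described earlier in this post.

```python
import hashlib


def snapshot(regions: dict) -> dict:
    """Record a SHA-256 digest for each named region (name -> bytes)."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in regions.items()}


def modified_regions(baseline: dict, current: dict) -> list:
    """Return the sorted names of regions whose bytes differ from the baseline."""
    return sorted(
        name for name, data in current.items()
        if hashlib.sha256(data).hexdigest() != baseline.get(name)
    )
```

Taking the snapshot over both the patched prologue and the trampoline means that a modification to either one appears in the result.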