Uninformed - vol all article 11

Bypassing Windows Hardware-enforced Data Execution Prevention

Oct 2, 2005

	skape	Skywing
	mmiller@hick.org	Skywing@valhallalegends.com

One of the big changes that Microsoft introduced in Windows XP Service Pack 2 and Windows 2003 Server Service Pack 1 was support for a new feature called Data Execution Prevention[2] (DEP). This feature was added with the intention of doing exactly what its name implies: preventing the execution of code in non-executable memory regions. This is particulary important when it comes to preventing the exploitation of most software vulnerabilities because most exploits tend to rely on storing arbitrary code in what end up being non-executable memory regions, such as a thread stack or a process heap¹.

DEP itself is capable of functioning in two modes. The first mode is referred to as Software-enforced DEP. It provides fairly limited support for preventing the execution of code through exploits that take advantage of SEH overwrites. Software-enforced DEP is used on machines that are not capable of supporting true non-executable pages due to inadequate hardware support. Software-enforced DEP is also a compile-time only change, and as such is typically limited to system libraries and select third-party applications that have been recompiled to take advantage of it. Bypassing this mode of DEP has been discussed before and is not the focus of this document.

The second mode in which DEP can operate is referred to as Hardware-enforced DEP. This mode is a superset of software-enforced DEP and is used on hardware that supports marking pages as non-executable. While most existing intel-based hardware does not have this feature (due to legacy support for only marking pages as readable or writable), newer chipsets are beginning to have true hardware support through things like Page Address Extensions (PAE). Hardware-enforced DEP is the most interesting of the two modes since it can be seen as a truly mitigating factor to most common exploitation vectors. The bypass technique described in this document is designed to be used against this mode.

Before describing the technique, it is prudent to understand the parameters under which it will operate. In this case, the technique is meant to provide a way of executing code from regions of memory that would not typically be executable when hardware-enforced DEP is in use, such as a thread stack or a process heap. This technique can be seen as a means of eliminating DEP from the equation when it comes to writing exploits because the commonly used approach of executing custom code from a writable memory address can still be used. Furthermore, this technique is meant to be as generic as possible such that it can be used in both existing and new exploits without major modifications. With the parameters set, the next requirement is to understand some of the new features that compose hardware-enforced DEP.

When implementing support for DEP, Microsoft rightly realized that many existing third-party applications might run into major compatibility issues due to assumptions about whether or not a region of allocated memory is executable. In order to handle this situation, Microsoft designed DEP so that it could be configured in a few different manners. At the most general level, DEP is designed to have a default parameter that indicates whether or not non-executable protection is enabled only for system processes and custom defined applications (OptIn), or whether it's enabled for everything except for applications that are specifically exempted (OptOut). These two flags are passed to the kernel during boot through the /NoExecute option in boot.ini. Furthermore, two other flags can be passed as part of the NoExecute option to indicate that DEP should be AlwaysOn or AlwaysOff. These two settings force a flag to be set for each process that permanently enables or disables DEP. The default setting on Windows XP SP2 is OptIn, while the default setting on Windows 2003 Server SP1 is OptOut.

Aside from the global system parameter, DEP can also be enabled or disabled on a per-process basis. The disabling of non-executable (NX) support for a process is determined at execution time. To support this, a new internal routine was added to ntdll.dll called LdrpCheckNXCompatibility. This routine checks a few different things to determine whether or not NX support should be enabled for the process. The routine itself is called whenever a DLL is loaded in the context of a process through LdrpRunInitializationRoutines. The first check it performs is to see if a SafeDisc DLL is being loaded. If it is, NX support is flagged as needing to be disabled for the process. The second check it performs is to look in the application database for the process to see if NX support should be disabled or enabled. Lastly, it checks to see if the DLL that is being loaded is flagged as having an NX incompatible section (such as .aspack, .pcle, and .sforce).

As a result of these checks, NX support is either enabled or disabled through a new PROCESSINFOCLASS named ProcessExecuteFlags (0x22). When a call to NtSetInformationProcess is issued with this information class, a four byte bitmask is supplied as the buffer parameter. This bitmask is passed to nt!MmSetExecuteOptions which performs the appropriate operation. Optionally, a flag (MEM_EXECUTE_OPTION_PERMANENT, or 0x8) can also be specified as part of the bitmask that indicates that future calls to the function should fail such that the execute flags cannot be changed again. To enable NX support, the MEM_EXECUTE_OPTION_DISABLE flag (0x1) is specified. To disable NX support, the MEM_EXECUTE_OPTION_ENABLE flag (0x2) is specified. Depending on the state of these per-process flags, execution of code from non-executable memory regions will either be permitted (MEM_EXECUTE_OPTION_ENABLE) or denied (MEM_EXECUTE_OPTION_DISABLE).

If it were in some way possible for an attacker to change the execution flags of a process that is being exploited, then it follows that the attacker would be able to execute code from previously non-executable memory regions. In order to do this, though, the attacker would have to run code from regions of memory that are already executable. As chance would have it, there happen to be useful executable memory regions, and they exist at the same address in every process².

To take advantage of this feature, an attacker must somehow cause NtSetInformationProcess to be called with the ProcessExecuteFlags information class. Furthermore, the ProcessInformation parameter must be set to a bitmask that has the MEM_EXECUTE_OPTION_ENABLE bit set, but not the MEM_EXECUTE_OPTION_DISABLE bit set. The following code illustrates a call to this function that would disable NX support for the calling process:

One method of accomplishing this would be to use a ret2libc derived attack whereby control flow is transferred into the NtSetInformationProcess function with an attacker-controlled frame set up on the stack. In this case, the arguments described to the right in the above code snippet would have to be set up on the stack so that they would be interpreted correctly when NtSetInformationProcess begins executing. The biggest drawback to this approach is that it would require NULL bytes to be usable as part of the buffer that is used for the overflow. Generally speaking, this will not be possible, especially with any overflow that is caused through the use of a string function. However, when possible, this approach can certainly be useful.

Though a direct return into NtSetInformationProcess may not be universally feasible, another technique can be used that lends itself to being more generally applicable. Under this approach, the attacker can take advantage of code that already exists within ntdll for disabling NX support for a process. By returning into a specific chunk of code, it is possible to disable NX support just as ntdll would while still being able to transfer control back into a user-controlled buffer. The one limitation, however, is that the attacker be able to control the stack in a way similar to most ret2libc style attacks, but without the need to control arguments.

The first step in this process is to cause control to be transferred to a location in memory that performs an operation that is equivalent to a mov al, 0x1 / ret combination. Many instances of similar instructions exist (xor eax, eax/inc eax/ret; mov eax, 1/ret; etc). One such instance can be found in the ntdll!NtdllOkayToLockRoutine function.

This will cause the low byte of eax to be set to one for reasons that will become apparent in the next step. Once control is transferred to the mov instruction, and then subsequently the ret instruction, the attacker must have set up the stack in such a way that the ret instruction actually returns into another segment of code inside ntdll. Specifically, it should return part of the way into the ntdll!LdrpCheckNXCompatibility routine.

In this block, a check is made to see if the low byte of eax is set to one. Regardless of whether or not it is, esi is initialized to hold the value 2. After that, a check is made to see if the zero flag is set (as would be the case if the low byte of eax is 1). Since this code will be executed after the first mov al, 0x1 / ret set of instructions, the ZF flag will always be set, thus transferring control to 0x7c93feba.

This block sets a local variable to the contents of esi, which in this case is 2. Afterwards, it transfers to control to 0x7c91d403.

This block, in turn, compares the local variable that was just initialized to 2 with 0. If it's not zero (which it won't be), control is transferred to 0x7c935d6d.

It's at this point that things begin to get interesting. In this block, a call is issued to NtSetInformationProcess with the ProcessExecuteFlags information class. The ProcessInformation parameter pointer is passed which was previously initialized to 2³. This results in NX support being disabled for the process. After the call completes, it transfers control to 0x7c91d441.

Finally, this block simply restores saved registers, issues a leave instruction, and returns to the caller. In this case, the attacker will have set up the frame in such a way that the ret instruction actually returns into a general purpose instruction that transfers control into a controllable buffer that contains the arbitrary code to be executed now that NX support has been disabled.

This approach requires the knowledge of three addresses. First, the address of the mov al, 0x1 / ret equivalent must be known. Fortunately, there are many occurrences of this type of block, though they may not be as simplistic as the one described in this document. Second, the address of the start of the cmp al, 0x1 block inside ntdll!LdrpCheckNXCompatibility must be known. By depending on two addresses within ntdll, it stands to reason that an exploit can be more portable than if one were to depend on addresses from two different DLLs. Finally, the third address is the one that would be the one that is typically used on targets that didn't have hardware-enforced DEP, such as a jmp esp or equivalent instruction depending on the vulnerability in question.

Aside from specific address limitations, this approach also relies on the fact that ebp is pointed to a valid, writable address such that the value that indicates that NX support should be disabled can be temporarily stored. This can be accomplished a few different ways, depending on the vulnerability, so it is not seen as a largely limiting factor.

To test this approach, the authors modified the warftpd_165_user exploit from the Metasploit Framework that was written by Fairuzan Roslan[1]. This vulnerability is a simple stack overflow. Prior to our modifications, the exploit was implemented in the following manner:

This code built a NOP sled of 1024 bytes. At byte index 485, the return address was stored after which point the shellcode was appended⁴. When run against a target that supports hardware-enforced DEP, the exploit fails when it tries to execute the first instruction of the NOP sled because the region of memory (the thread stack) is marked as non-executable.

Applying the technique described above, the authors changed the exploit to send a buffer structured as follows:

In this case, a buffer was built that contained 485 int3 instructions. From there, the buffer was set to overwrite the return address with a pointer to ntdll!NtdllOkayToLockRoutine. Since this routine does a retn 0x4, the next four bytes are padding as a fake argument that is popped off the stack. Once NtdllOkayToLockRoutine returns, the stack would point 493 bytes into the evil buffer that is being built (immediately after the 0x7c952080 return address overwrite and the fake argument). This means that NtdllOkayToLockRoutine would return into 0x7c91d3f8. This block of code is what evaluates the low byte of eax and eventually leads to the disabling of NX support for the process. Once completed, the block pops saved registers off the stack and issues a leave instruction, moving the stack pointer to where ebp currently points. In this case, ebp was 0x54 bytes away from esp, so we inserted 0x54 bytes of padding. Once the block does this, the stack pointer will point 577 bytes into the evil buffer (immediately after the 0x54 bytes of padding). This means that it will return into whatever address is stored at this location. In this case, the buffer is populated such that it simply returns into the target-specified return address (which is a jmp esp equivalent instruction). From there, the jmp esp instruction is executed which transfers control into the shellcode that immediately follows it. Once executed, the exploit works as if nothing had changed:

As can be seen, the technique described in this document outlines a feasible method that can be used to circumvent the security enhancements provided by hardware-enforced DEP in the default installations of Windows XP Service Pack 2 and Windows 2003 Server Service Pack 1. The flaw itself is not related to any specific inefficiency or mistake made during the actual implementation of hardware-enforced DEP support, but instead is a side effect of a design decision by Microsoft to provide a mechanism for disabling NX support for a process from within a user-mode process. Had it been the case that there was no mechanism by which NX support could be disabled at runtime from within a process, the approaches outlined in this document would not be feasible.

In the interest of not presenting a problem without also describing a solution, the authors have identified a few different ways in which Microsoft might be able to solve this. To prevent this approach, it is first necessary to identify the things that it depends on. First and foremost, the technique depends on knowing the location of three separate addresses. Second, it depends on the feature being exposed that allows a user-mode process to disable NX support for itself. Finally, it depends on the ability to control the stack in a manner that allows it perform a ret2libc style attack⁵.

The first dependency could be broken by instituting some form of Address Space Layout Randomization that would thereby make the location of the dependent code blocks unknown to an attacker. The second dependency could be broken by moving the logic that controls the enabling and disabling of a process' NX support to kernel-mode such that it cannot be influenced in such a direct manner. This approach is slightly challenging considering the model that it is currently implemented under requires the ability to disable NX support when certain events (such as the loading of an incompatible DLL) occur. Although it may be more challenging, the authors see this as being the most feasible approach in terms of compatibility. Lastly, the final dependency is not really something that Microsoft can control. Aside from these potential solutions, it might also be possible to come up with a way to make it so the permanent flag is set sooner in the process' initialization, though the authors are not sure of a way in which this could be made possible without breaking support for disabling when certain DLLs are loaded.

In closing, the authors would like to make a special point to indicate that Microsoft has done an excellent job in raising the bar with their security improvements in XP Serivce Pack 2. The technique outlined in this document should not be seen as a case of Microsoft failing to implement something securely, as the provisions are certainly there to deploy hardware-enforced DEP in a secure fashion, but instead might be better viewed as a concession that was made to ensure that application compatibility was retained for the general case. There is almost always a trade-off when it comes to providing new security features in the face of potential compatibility problems, and it can be said that perhaps no company other than Microsoft is more well known for retaining backward compatibility.