Implementation

The implementation of the solution described in the previous chapter relies on intercepting exceptions before the native exception dispatcher handles them, so that the exception handler chain can be validated. First and foremost, it is important to identify a way of layering prior to the point at which the exception dispatcher transfers control to the registered exception handlers. There are a few places where this layering could occur, but the one best suited to catching the majority of user-mode exceptions is the point at which ntdll!KiUserExceptionDispatcher gains control. However, by hooking ntdll!KiUserExceptionDispatcher, this implementation may not intercept every exception that is raised, which makes it potentially feasible to bypass the exception handler chain validation.

The best location to layer at would be ntdll!RtlDispatchException. The reason for this is that exceptions raised through ntdll!RtlRaiseException, such as software exceptions, may be passed directly to ntdll!RtlDispatchException rather than going through ntdll!KiUserExceptionDispatcher first. The condition that controls this is whether or not a debugger is attached to the user-mode process when ntdll!RtlRaiseException is called. ntdll!RtlDispatchException is not hooked in this implementation because it is not directly exported. There are, however, fairly reliable techniques that could be used to determine its address. As far as the author is aware, hooking ntdll!KiUserExceptionDispatcher means that the only exceptions that can be missed are software exceptions, which are much harder, and in most cases impossible, for an attacker to generate.

In order to layer at ntdll!KiUserExceptionDispatcher, the first few instructions of its prologue can be overwritten with an indirect jump to a function that is responsible for performing any necessary sanity checks. Once the function has completed its sanity checks, it can transfer control back to the original exception dispatcher by executing the overwritten instructions and then jumping back into ntdll!KiUserExceptionDispatcher at the offset of the next instruction to be executed. This is a "clean" way of layering, and the performance overhead is minuscule^4.1.

In order to hook ntdll!KiUserExceptionDispatcher, the first n instructions, where n is the number of instructions that it takes to cover at least 6 bytes, must be copied to a location that will be used by the hook to execute the actual ntdll!KiUserExceptionDispatcher. Following that, the first n instructions of ntdll!KiUserExceptionDispatcher can then be overwritten with an indirect jump. This indirect jump will be used to transfer control to the function that will validate the exception handler chain prior to allowing the original exception dispatcher to handle the exception.
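The byte-level bookkeeping behind this hook can be sketched in portable C. The sketch is illustrative rather than the author's code. It assumes 32-bit x86, uses the `FF 25` encoding for the 6-byte indirect jump, and assumes that the copied instructions are followed by an `E9` relative jump back into the original function. The helper names (`bytes_to_relocate`, `encode_indirect_jmp`, `build_trampoline`) are hypothetical, and the instruction lengths are assumed to come from a disassembler. The code only fills buffers; patching the real function would also require making its pages writable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INDIRECT_JMP_SIZE 6u /* FF 25 <abs32>: jmp dword ptr [abs32] */
#define RELATIVE_JMP_SIZE 5u /* E9 <rel32>:    jmp rel32             */

/* Store a 32-bit value in little-endian byte order, as x86 expects. */
static void put_le32(uint8_t *dst, uint32_t value)
{
    dst[0] = (uint8_t)(value & 0xff);
    dst[1] = (uint8_t)((value >> 8) & 0xff);
    dst[2] = (uint8_t)((value >> 16) & 0xff);
    dst[3] = (uint8_t)((value >> 24) & 0xff);
}

/*
 * Given the lengths of the leading instructions, return how many bytes of
 * whole instructions must be relocated to make room for the indirect jump.
 * Returns 0 if the supplied instructions do not cover 6 bytes.
 */
size_t bytes_to_relocate(const uint8_t *insn_lengths, size_t insn_count)
{
    size_t covered = 0;

    for (size_t i = 0; i < insn_count; i++) {
        covered += insn_lengths[i];
        if (covered >= INDIRECT_JMP_SIZE)
            return covered;
    }
    return 0;
}

/* Encode "jmp dword ptr [slot_addr]"; slot_addr holds the hook's address. */
void encode_indirect_jmp(uint8_t out[INDIRECT_JMP_SIZE], uint32_t slot_addr)
{
    out[0] = 0xff;
    out[1] = 0x25;
    put_le32(&out[2], slot_addr);
}

/*
 * Build the trampoline: the relocated prologue bytes followed by a relative
 * jump back to the first instruction that was not overwritten.
 * Returns the trampoline size in bytes.
 */
size_t build_trampoline(uint8_t *tramp, uint32_t tramp_addr,
                        const uint8_t *prologue, size_t relocated,
                        uint32_t func_addr)
{
    assert(relocated >= INDIRECT_JMP_SIZE);
    memcpy(tramp, prologue, relocated);

    /* rel32 is measured from the end of the 5-byte jmp instruction. */
    uint32_t resume = func_addr + (uint32_t)relocated;
    uint32_t next   = tramp_addr + (uint32_t)relocated + RELATIVE_JMP_SIZE;

    tramp[relocated] = 0xe9;
    put_le32(&tramp[relocated + 1], resume - next);
    return relocated + RELATIVE_JMP_SIZE;
}
```

Copying the prologue verbatim is only safe when the relocated instructions contain no position-relative operands; otherwise they would have to be fixed up for their new address.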

With the hook installed, the next step is to implement the function that will actually validate the exception handler chain. The basic steps involved in this are to first extract the head of the list from fs:[0] and then iterate over each entry in the list. For each entry, the function should validate that the Next attribute points to a valid memory location. If it does not, then the chain can be assumed to be corrupt. However, if it does point to valid memory, then the routine should check to see if the Next pointer is equal to the address of the validation frame that was previously stored at the end of the exception handler chain for this thread. If it is equal to the validation frame, then the integrity of the chain is confirmed and the exception can be passed to the actual exception dispatcher.

However, if the function reaches an invalid Next pointer, or it reaches 0xffffffff without encountering the validation frame, then it can assume that the exception handler chain is corrupt. It's at this point that the function can take whatever steps are necessary to discard the exception, log that a potential exploitation attempt occurred, and so on. The end result should be the termination of either the thread or the process, depending on circumstances. This algorithm is captured by the pseudo-code below:

CurrentRecord = fs:[0];
ChainCorrupt  = TRUE;
while (CurrentRecord != 0xffffffff) {
    if (IsInvalidAddress(CurrentRecord->Next))
        break;
    if (CurrentRecord->Next == ValidationFrame) {
        ChainCorrupt = FALSE;
        break;
    }
    CurrentRecord = CurrentRecord->Next;
}
if (ChainCorrupt == TRUE)
    ReportExploitationAttempt();
else
    CallOriginalKiUserExceptionDispatcher();
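The walk can also be rendered as a small, testable C function. This is a sketch under stated assumptions, not the author's implementation. The chain head is passed in rather than read from fs:[0]. The `address_range` check is a hypothetical stand-in for IsInvalidAddress and assumes that valid records lie within known bounds, such as the thread's stack. `MAX_RECORDS` is a hypothetical cap that stops the walk if the chain has been turned into a loop. Unlike the pseudo-code, the function tests each record itself rather than its Next field, so the head of the chain is checked as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86 SEH registration record layout: next pointer, then handler. */
typedef struct registration_record {
    struct registration_record *next;
    void *handler;
} registration_record;

/* End-of-chain marker (0xffffffff on 32-bit x86); always fails the range check. */
#define CHAIN_END ((registration_record *)(uintptr_t)-1)

/* Hypothetical upper bound on chain length, to defeat looped chains. */
#define MAX_RECORDS 1024u

/* Hypothetical bounds within which every record is assumed to live. */
typedef struct {
    uintptr_t low;  /* lowest valid address            */
    uintptr_t high; /* one past the last valid address */
} address_range;

/* Nonzero if the whole record lies inside the range (overflow-safe). */
static int record_in_range(const address_range *range,
                           const registration_record *record)
{
    uintptr_t addr = (uintptr_t)record;
    size_t size = sizeof(*record);

    if (range->high < range->low || range->high - range->low < size)
        return 0;
    return addr >= range->low && addr <= range->high - size;
}

/* Returns 1 if the validation frame is reachable from head, 0 otherwise. */
int chain_is_valid(const registration_record *head,
                   const registration_record *validation_frame,
                   const address_range *range)
{
    const registration_record *current = head;

    assert(range != NULL);
    for (size_t i = 0; i < MAX_RECORDS; i++) {
        if (current == validation_frame)
            return 1;                 /* chain is intact */
        if (!record_in_range(range, current))
            return 0;                 /* bad pointer or end of chain */
        current = current->next;
    }
    return 0;                         /* too long or looped */
}
```

A return value of 0 corresponds to the ReportExploitationAttempt() branch, and a return value of 1 to calling the original dispatcher.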

The above algorithm describes how the exception dispatching path should be handled. However, there is one important part remaining in order to implement this solution. Specifically, there must be some way of registering the validation frame with a thread prior to any exceptions being dispatched on that thread. There are a few ways that this can be accomplished. In terms of a proof of concept, the easiest way of doing this is to implement a DLL that, when loaded into a process' address space, catches the creation notification of new threads through a mechanism like DllMain or through the use of a TLS callback in the case of a statically linked library. Both of these approaches provide a location for the solution to establish the validation frame with the thread early on in its execution. However, if there were ever a case where the thread were to raise an exception prior to one of these routines being called, then the solution would improperly detect that the exception handler chain was corrupt.
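Registering the validation frame amounts to linking it in as the last record of the thread's chain. The following sketch illustrates that step in portable C under stated assumptions. The function name `append_validation_frame` and the `MAX_RECORDS` cap are hypothetical, and the chain head is passed explicitly, whereas a real implementation would read it from fs:[0]. The frame's handler field and the storage for the frame are left to the caller. On Windows, a routine like this would presumably be called when DllMain receives DLL_THREAD_ATTACH, or from a TLS callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86 SEH registration record layout: next pointer, then handler. */
typedef struct registration_record {
    struct registration_record *next;
    void *handler;
} registration_record;

/* End-of-chain marker (0xffffffff on 32-bit x86). */
#define CHAIN_END ((registration_record *)(uintptr_t)-1)

/* Hypothetical upper bound on chain length. */
#define MAX_RECORDS 1024u

/*
 * Link frame in as the last record of the chain that starts at *head.
 * The chain is trusted here because this runs at thread start, before any
 * attacker-influenced code. Returns 1 on success (including when the frame
 * is already present) and 0 if no end of chain was found.
 */
int append_validation_frame(registration_record **head,
                            registration_record *frame)
{
    assert(head != NULL && frame != NULL);

    registration_record *current = *head;

    if (current == CHAIN_END) {       /* empty chain: frame becomes head */
        frame->next = CHAIN_END;
        *head = frame;
        return 1;
    }
    for (size_t i = 0; i < MAX_RECORDS; i++) {
        if (current == frame)
            return 1;                 /* already registered */
        if (current->next == CHAIN_END) {
            frame->next = CHAIN_END;
            current->next = frame;    /* frame is now the tail */
            return 1;
        }
        current = current->next;
    }
    return 0;
}
```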

One solution to this potential problem is to store per-thread state that tracks whether or not the validation frame has been registered. Doing this has certain implications, however. First, it could introduce a security problem in that an attacker might be able to bypass the protection by somehow clearing the flag that tracks whether or not the validation frame has been registered. If this flag were cleared and an exception were generated in the thread, then the solution would have to assume that it cannot validate the chain because no validation frame has been installed. Second, it requires some location in which to store this state on a per-thread basis. TLS is a good candidate, but again, it has the security implications described above.
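The trade-off can be made concrete with a small sketch. It assumes a C11 compiler and uses `_Thread_local` as a stand-in for Windows TLS. The names `thread_state`, `current_thread_state`, and `decide_action` are hypothetical, and the chain check is represented only by its result.

```c
#include <assert.h>
#include <stddef.h>

/* What the hook does with an intercepted exception. */
typedef enum {
    DISPATCH_UNVALIDATED, /* no frame registered yet: chain cannot be checked */
    DISPATCH_VALIDATED,   /* frame registered and chain intact                */
    REPORT_CORRUPTION     /* frame registered but not reachable               */
} hook_action;

/* Hypothetical per-thread bookkeeping. */
typedef struct {
    int frame_registered;
} thread_state;

/* C11 thread-local storage stands in for a Windows TLS slot here. */
static _Thread_local thread_state tls_state;

thread_state *current_thread_state(void)
{
    return &tls_state;
}

/* chain_valid is the result of the chain walk (nonzero means intact). */
hook_action decide_action(const thread_state *state, int chain_valid)
{
    assert(state != NULL);
    if (!state->frame_registered)
        return DISPATCH_UNVALIDATED;
    return chain_valid ? DISPATCH_VALIDATED : REPORT_CORRUPTION;
}
```

The first branch is the weakness described above: if an attacker can clear `frame_registered`, a corrupted chain is dispatched without being checked.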

A more invasive solution to the problem of registering the validation frame would be to somehow layer very early on in the thread's execution, perhaps even before it begins executing from its entry point. The author is aware of a good way to accomplish this, but what it might be is left as an exercise for the reader. This more invasive solution would be an easy and elegant way for Microsoft to include support for this, should they ever choose to do so.

The final matter of how to go about implementing this solution centers around how it could be deployed and used with existing applications without requiring a recompile. The easiest way to do this in a proof of concept setting would be to implement these protection mechanisms in the form of a DLL that can be dynamically loaded into the address space of a process that is to be protected. Once loaded, the DLL's DllMain can take care of getting everything set up. A simple way to cause the DLL to be loaded is through the use of AppInit_DLLs[4], although this has some limitations. Alternatively, there are more invasive options that can be considered that will accomplish the goal of loading and initializing the DLL early on in process creation.

One interesting thing about this approach is that while it is targeted at being used as a runtime solution, it can also be used as a compile-time solution. This means that applications can use this solution at compile-time to protect themselves from SEH overwrites. Unlike Microsoft's solution, this will even protect them in the presence of third-party images that have not been compiled with the support. This can be accomplished through the use of a static library that uses TLS callbacks to receive notifications when threads are created, much like DllMain is used for DLL implementations of this solution.

All things considered, the author believes that the implementation described above is a fairly simple way of providing runtime protection against SEH overwrites with minimal overhead. While the implementation described in this document is considered more suitable as a proof of concept or an application-specific solution, there are real-world examples of more robust implementations, such as Wehnus's WehnTrust product[9], a commercial side-project of the author's^4.2.
