Implementation
The implementation of the solution described in the previous chapter
relies on intercepting exceptions before the native exception
dispatcher handles them, so that the exception handler chain can be
validated first. The first requirement is therefore a way of layering
in before the point at which the exception dispatcher transfers
control to the registered exception handlers.
This layering could occur in a few different places, but the one
best suited to catching the majority of user-mode exceptions is the
point at which ntdll!KiUserExceptionDispatcher gains control.
However, hooking ntdll!KiUserExceptionDispatcher may not intercept
every case in which an exception is raised, which could make it
feasible to bypass the exception handler chain validation.
The best place to layer would be ntdll!RtlDispatchException.
This is because exceptions raised through ntdll!RtlRaiseException,
such as software exceptions, may be passed directly to
ntdll!RtlDispatchException rather than going through
ntdll!KiUserExceptionDispatcher first. Which path is taken depends
on whether a debugger is attached to the user-mode process when
ntdll!RtlRaiseException is called.
ntdll!RtlDispatchException is not hooked in this implementation
because it is not directly exported, although there are fairly
reliable techniques that could be used to determine its address. As
far as the author is aware, hooking ntdll!KiUserExceptionDispatcher
means that only software exceptions can be missed, and these are much
harder, and in most cases impossible, for an attacker to generate.
To layer at ntdll!KiUserExceptionDispatcher, the first few
instructions of its prologue can be overwritten with an indirect jump
to a function that performs the necessary sanity checks. Once those
checks have completed, the function transfers control back to the
original exception dispatcher by executing the overwritten
instructions and then jumping back into
ntdll!KiUserExceptionDispatcher at the offset of the next
instruction to be executed. This is a nice and "clean" way of
accomplishing the hook, and the performance overhead is minuscule
(footnote 4.1).
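On 32-bit x86, the indirect jump described above can be encoded in six bytes as `jmp dword ptr [slot]` (opcode `FF /4` with a 32-bit absolute memory operand). The sketch below only produces those bytes; the function name `encode_indirect_jmp` and the separate pointer-sized "slot" holding the address of the validation routine are assumptions made for illustration. On a real system the bytes would be written over the start of ntdll!KiUserExceptionDispatcher only after making the page writable (for example with VirtualProtect) and then flushing the instruction cache.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of "jmp dword ptr [disp32]" on 32-bit x86. */
#define INDIRECT_JMP_SIZE 6

/*
 * Writes the encoding of "jmp dword ptr [slot_address]" into out.
 * slot_address is the (hypothetical) 32-bit address of a DWORD that holds
 * the address of the validation routine; the CPU loads the jump target
 * from that slot at run time.
 */
size_t encode_indirect_jmp(uint8_t out[INDIRECT_JMP_SIZE], uint32_t slot_address)
{
    out[0] = 0xFF;                          /* opcode group 5 */
    out[1] = 0x25;                          /* ModR/M: mod=00, reg=4 (JMP near), r/m=101 (disp32) */
    out[2] = (uint8_t)(slot_address);       /* disp32, little-endian */
    out[3] = (uint8_t)(slot_address >> 8);
    out[4] = (uint8_t)(slot_address >> 16);
    out[5] = (uint8_t)(slot_address >> 24);
    return INDIRECT_JMP_SIZE;
}
```

Because the target is read from memory, the hook's destination can be changed by updating the slot rather than rewriting the patched code.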
To hook ntdll!KiUserExceptionDispatcher, the first n
instructions, where n is the number of whole instructions needed to
cover at least 6 bytes (the size of the indirect jump), must be
copied to a location that the hook will use to execute the actual
ntdll!KiUserExceptionDispatcher. The first n instructions of
ntdll!KiUserExceptionDispatcher can then be overwritten with the
indirect jump, which transfers control to the function that validates
the exception handler chain before the original exception dispatcher
is allowed to handle the exception.
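The copy step can be sketched as two small routines: one that picks n from the lengths of the leading instructions, and one that builds the trampoline from the copied bytes plus a jump back to the first untouched instruction. This is a minimal sketch under stated assumptions: the instruction lengths are hypothetical input (in practice they would come from a disassembler), the names `count_relocated_instructions` and `build_trampoline` are illustrative, and `resume_slot` is assumed to be the 32-bit address of a DWORD holding the original function's address plus the number of bytes copied.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PATCH_SIZE 6  /* size of the "jmp dword ptr [disp32]" patch */

/*
 * Returns n, the number of whole instructions that cover at least
 * PATCH_SIZE bytes, and stores their combined length in *covered.
 * Returns 0 (and *covered = 0) if the supplied lengths are insufficient.
 */
size_t count_relocated_instructions(const size_t *insn_lengths, size_t count,
                                    size_t *covered)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += insn_lengths[i];
        if (total >= PATCH_SIZE) {   /* never split an instruction */
            *covered = total;
            return i + 1;
        }
    }
    *covered = 0;
    return 0;
}

/*
 * Builds the trampoline: the copied prologue bytes followed by
 * "jmp dword ptr [resume_slot]" back into the original function.
 * Returns the number of bytes written, or 0 if tramp is too small.
 */
size_t build_trampoline(uint8_t *tramp, size_t tramp_size,
                        const uint8_t *prologue, size_t covered,
                        uint32_t resume_slot)
{
    if (tramp_size < covered + PATCH_SIZE)
        return 0;
    memcpy(tramp, prologue, covered);        /* relocated instructions */
    uint8_t *jmp = tramp + covered;
    jmp[0] = 0xFF;                           /* jmp dword ptr [resume_slot] */
    jmp[1] = 0x25;
    jmp[2] = (uint8_t)(resume_slot);
    jmp[3] = (uint8_t)(resume_slot >> 8);
    jmp[4] = (uint8_t)(resume_slot >> 16);
    jmp[5] = (uint8_t)(resume_slot >> 24);
    return covered + PATCH_SIZE;
}
```

Copying the prologue verbatim is only correct for position-independent instructions; an instruction with a relative operand (such as `call rel32`) would need its displacement adjusted when it is moved into the trampoline.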
With the hook installed, the next step is to implement the function
that actually validates the exception handler chain. The basic steps
are to extract the head of the list from fs:[0] and then iterate over
each entry in the list. For each entry, the function should validate
that the Next field points to a valid memory location. If it does
not, the chain can be assumed to be corrupt. If it does point to
valid memory, the routine should check whether the Next pointer is
equal to the address of the validation frame that was previously
stored at the end of the exception handler chain for this thread. If
it is, the integrity of the chain is confirmed and the exception can
be passed to the actual exception dispatcher.
However, if the function reaches an invalid Next pointer, or reaches
0xffffffff (the end-of-chain marker) without encountering the
validation frame, it can assume that the exception handler chain is
corrupt. At this point the function can take whatever steps are
necessary to discard the exception, log that a potential exploitation
attempt occurred, and so on. The end result should be the termination
of either the thread or the process, depending on circumstances. This
algorithm is captured by the pseudo-code below:
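    CurrentRecord = fs:[0];             // head of this thread's SEH chain

    // The validation frame may itself be the head if no other frames exist
    ChainCorrupt = (CurrentRecord != ValidationFrame);

    while (ChainCorrupt && CurrentRecord != 0xffffffff) {
        // A Next pointer into invalid memory means the chain is corrupt
        if (IsInvalidAddress(CurrentRecord->Next))
            break;

        // Reaching the validation frame confirms the chain is intact
        if (CurrentRecord->Next == ValidationFrame) {
            ChainCorrupt = FALSE;
            break;
        }

        CurrentRecord = CurrentRecord->Next;
    }

    if (ChainCorrupt)
        ReportExploitationAttempt();              // discard, log, terminate
    else
        CallOriginalKiUserExceptionDispatcher();  // resume normal dispatch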
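The validation walk can be exercised outside of Windows by modelling the chain as ordinary linked records. In the sketch below, `seh_record` is a simplified stand-in for the x86 exception registration record (a Next pointer and a handler pointer), `SEH_CHAIN_END` models the 0xffffffff terminator, and the hypothetical IsInvalidAddress check is replaced by a caller-supplied predicate. `address_table_contains` is a mock predicate that treats only a fixed set of records as valid memory. The walk validates each record before dereferencing it, which is equivalent to checking each Next pointer one step earlier.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the x86 exception registration record. */
typedef struct seh_record {
    struct seh_record *next;
    void *handler;
} seh_record;

/* Models the 0xffffffff value that terminates the chain. */
#define SEH_CHAIN_END ((seh_record *)(uintptr_t)0xffffffffu)

/* Inverse of the hypothetical IsInvalidAddress: true if rec is readable. */
typedef bool (*record_is_valid_fn)(const seh_record *rec, void *ctx);

/* Returns true if validation_frame is reachable from head without
 * passing through an invalid record. */
bool seh_chain_is_intact(const seh_record *head,
                         const seh_record *validation_frame,
                         record_is_valid_fn is_valid, void *ctx)
{
    const seh_record *cur = head;
    while (cur != SEH_CHAIN_END) {
        if (!is_valid(cur, ctx))
            return false;            /* corrupt: never dereference it */
        if (cur == validation_frame)
            return true;             /* chain integrity confirmed */
        cur = cur->next;
    }
    return false;                    /* terminator reached first */
}

/* Mock predicate: only records listed in the table count as valid memory. */
typedef struct {
    const seh_record *const *entries;
    size_t count;
} address_table;

bool address_table_contains(const seh_record *rec, void *ctx)
{
    const address_table *table = ctx;
    for (size_t i = 0; i < table->count; i++)
        if (table->entries[i] == rec)
            return true;
    return false;
}
```

Passing the validation frame itself as `head` returns true, which covers a thread whose only registered frame is the validation frame; overwriting a Next pointer with an arbitrary value (such as 0x41414141) makes the walk fail without dereferencing it.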
The above algorithm describes how the exception dispatching path
should be handled. However, one important part of the solution
remains: there must be some way of registering the validation frame
with a thread before any exceptions are dispatched on that thread.
There are a few ways to accomplish this. For a proof of concept, the
easiest approach is a DLL that, when loaded into a process' address
space, receives notification of new thread creation through a
mechanism like DllMain or, in the case of a statically linked
library, through a TLS callback. Both approaches give the solution a
place to establish the validation frame early in the thread's
execution. However, if a thread were to raise an exception before one
of these routines is called, the solution would incorrectly conclude
that the exception handler chain was corrupt.
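A minimal sketch of the registration step, assuming the validation frame is appended as the last record of the chain (the position the validation walk expects to find it in). The `head` parameter models the thread's fs:[0] value; `seh_record`, `SEH_CHAIN_END`, and `install_validation_frame` are illustrative names, not Windows APIs. Because DllMain returns after registering the frame, the record would have to live in memory that persists for the thread's lifetime (for example, a per-thread heap allocation) rather than on DllMain's stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the x86 exception registration record. */
typedef struct seh_record {
    struct seh_record *next;
    void *handler;
} seh_record;

/* Models the 0xffffffff value that terminates the chain. */
#define SEH_CHAIN_END ((seh_record *)(uintptr_t)0xffffffffu)

/*
 * Appends vf as the final record of the chain rooted at *head
 * (a stand-in for fs:[0]). If the chain is empty, vf becomes the head.
 */
void install_validation_frame(seh_record **head, seh_record *vf)
{
    vf->next = SEH_CHAIN_END;        /* validation frame terminates the chain */
    if (*head == SEH_CHAIN_END) {
        *head = vf;
        return;
    }
    seh_record *tail = *head;
    while (tail->next != SEH_CHAIN_END)
        tail = tail->next;
    tail->next = vf;
}
```

Frames registered later by the thread are pushed onto the head of the chain, so the validation frame stays at the end, where the dispatcher hook looks for it.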
One solution to this potential problem is to keep per-thread state
that tracks whether the validation frame has been registered. This
has certain implications, however. First, it could introduce a
security problem: an attacker might be able to bypass the protection
by somehow toggling the flag that tracks whether the validation frame
has been registered. If the flag were cleared and an exception were
then generated in the thread, the solution would have to assume that
it cannot validate the chain because no validation frame has been
installed. Second, the state has to be stored somewhere on a
per-thread basis. TLS is a good candidate, but it is still subject to
the security implications described above.
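The per-thread state, and the weakness it introduces, can be sketched with a C11 `_Thread_local` flag standing in for a TLS slot. The names below (`validation_frame_registered`, `classify_chain`, and the verdict values) are hypothetical. The third verdict is the problematic one: an attacker who manages to clear the flag forces every subsequent exception on that thread down the unverifiable path.

```c
#include <stdbool.h>

typedef enum {
    CHAIN_VALID,         /* validation frame found */
    CHAIN_CORRUPT,       /* frame registered but not reachable */
    CHAIN_UNVERIFIABLE   /* no frame registered yet: cannot decide */
} chain_verdict;

/* Per-thread flag; in the described design this would live in TLS,
 * which is ordinary writable memory an attacker might be able to reach. */
static _Thread_local bool validation_frame_registered = false;

void mark_validation_frame_registered(void)
{
    validation_frame_registered = true;
}

/* frame_found is the result of walking the chain for the validation frame. */
chain_verdict classify_chain(bool frame_found)
{
    if (!validation_frame_registered)
        return CHAIN_UNVERIFIABLE;
    return frame_found ? CHAIN_VALID : CHAIN_CORRUPT;
}
```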
A more invasive solution to the problem of registering the
validation frame would be to layer in very early in the thread's
execution, perhaps even before it begins executing from its entry
point. The author is aware of a good way to accomplish this, but what
it might be is left as an exercise for the reader. This more invasive
approach would be an easy and elegant way for Microsoft to include
support for this protection, should they ever choose to do so.
The final implementation matter is how the solution could be
deployed and used with existing applications without requiring them
to be recompiled. In a proof-of-concept setting, the easiest way to
do this is to implement the protection mechanisms as a DLL that can
be dynamically loaded into the address space of a process that is to
be protected. Once the DLL is loaded, its DllMain can take care of
setting everything up. A simple way to cause the DLL to be loaded is
through AppInit_DLLs[4], although this has limitations; for example,
the DLLs listed there are only loaded into processes that load
user32.dll. Alternatively, more invasive options can be used to load
and initialize the DLL early in process creation.
One interesting aspect of this approach is that, while it is
targeted at use as a runtime solution, it can also be used as a
compile-time solution, allowing applications to protect themselves
from SEH overwrites when they are built. Unlike Microsoft's solution,
this will protect them even in the presence of third-party images
that have not been compiled with the support. This can be
accomplished through a static library that uses TLS callbacks to
receive notifications when threads are created, much as DllMain is
used in DLL implementations of this solution.
All things considered, the author believes that the implementation
described above is a fairly simple way of providing runtime
protection against SEH overwrites with minimal overhead. While the
implementation described in this document is more suitable for a
proof-of-concept or application-specific solution, there are
real-world examples of more robust implementations, such as Wehnus's
WehnTrust product[9], a commercial side-project of the author's
(footnote 4.2).