|Informative Information for the Uninformed|
With the introduction of Windows for x64, significant changes were made to how exceptions are processed relative to x86 versions of Windows. On x86 versions of Windows, exception handlers were essentially demand-registered at runtime by the routines that used them (a code-based exception registration mechanism). On x64 versions of Windows, exception registration is accomplished using a data-driven model. Specifically, exception handling (and especially unwind handling) is now driven by metadata attached to each PE image (known as the ``exception directory''), which describes the relationship between routines and their exception handlers, the exception handler function pointer(s) for each region of a routine, and how to unwind each routine's machine state.
While there are significant advantages to having exception and unwind dispatching accomplished using a data-driven model, there is a potential performance penalty over the x86 method (which consisted of a linked list of exception and unwind handlers registered at a known location, on a per-thread basis). Specifically, all of the information needed for the operating system to locate and call an exception handler for exception or unwind processing was in one location on Windows for x86 (the linked list in the NT_TIB), whereas on Windows for x64 it is scattered across all loaded modules. In order to locate an exception handler for a particular routine, it is necessary to search the loaded module list for the module that contains the instruction pointer corresponding to the function in question. After the module is located, it is then necessary to process the PE header of the module to locate the module's exception directory. Finally, it is necessary to search the exception directory of that module for the metadata corresponding to a location encompassing the requested instruction pointer. This process must be repeated for every function through which an exception may propagate.
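The final step of that lookup, searching one module's exception directory for the entry that covers an instruction pointer, can be sketched as a binary search. This is a minimal user-mode illustration, not the operating system's implementation: the RuntimeFunction type mirrors the three 32-bit RVA fields of the x64 RUNTIME_FUNCTION structure, lookup_function_entry is a hypothetical helper name, and the directory is assumed to be sorted by BeginAddress.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the x64 RUNTIME_FUNCTION layout: three 32-bit RVAs. */
typedef struct {
    uint32_t BeginAddress;  /* RVA of the first byte of the function */
    uint32_t EndAddress;    /* RVA one past the last byte of the function */
    uint32_t UnwindData;    /* RVA of the unwind metadata for the function */
} RuntimeFunction;

/*
 * Hypothetical helper: return the entry whose [BeginAddress, EndAddress)
 * range contains `ip`, or NULL if no entry covers it.
 * Assumes `dir` is sorted by BeginAddress with non-overlapping ranges.
 */
static const RuntimeFunction *
lookup_function_entry(const RuntimeFunction *dir, size_t count,
                      uint64_t image_base, uint64_t ip)
{
    assert(dir != NULL || count == 0);

    /* The metadata is RVA based, so `ip` must lie at or above the base
     * and within a 32-bit offset of it. */
    if (ip < image_base || ip - image_base > UINT32_MAX)
        return NULL;
    uint32_t rva = (uint32_t)(ip - image_base);

    size_t lo = 0, hi = count;          /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < dir[mid].BeginAddress)
            hi = mid;                   /* target lies before this entry */
        else if (rva >= dir[mid].EndAddress)
            lo = mid + 1;               /* target lies after this entry */
        else
            return &dir[mid];           /* rva falls inside this entry */
    }
    return NULL;                        /* no metadata covers this address */
}
```

A NULL result here only means that no entry in this one directory covers the address. Locating the right module and its directory in the first place are separate steps that this sketch does not model.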
In an effort to improve the performance of exception dispatching on Windows for x64, Microsoft developed a multi-tier cache system that speeds the resolution of exception dispatching information that is used by the routine responsible for looking up metadata associated with a function. The routine responsible for this is named RtlLookupFunctionTable. When searching for unwind information (a pointer to a RUNTIME_FUNCTION entry structure), depending on the reason for the search request, an internal first-level cache (RtlpUnwindHistoryTable) of unwind information for commonly occurring functions may be searched. At the time of this writing, this table consists of RtlUnwindEx, __C_specific_handler, RtlpExecuteHandlerForException, RtlDispatchException, RtlRaiseStatus, KiDispatchException, and KiExceptionDispatch. Due to how exception dispatching operates on x64, many of these functions will commonly appear in any exception call stack. A quick first-level reference for them is therefore beneficial to performance.
After RtlpUnwindHistoryTable is searched, a second cache, known as PsInvertedFunctionTable (in kernel-mode) or LdrpInvertedFunctionTable (in user-mode) is scanned. This second-level cache contains a list of the first 0x200 (Windows Server 2008, Windows Vista) or 0xA0 (Windows Server 2003) loaded modules. The loaded module list contained within PsInvertedFunctionTable / LdrpInvertedFunctionTable is presented as a quickly searchable, unsorted linear array that maps the memory occupied by an entire loaded image to a given module's exception directory. The lookup through the inverted function table thus eliminates the costly linked list (loaded module list) and executable header parsing steps necessary to locate the exception directory for a module. For modules which are referenced by PsInvertedFunctionTable / LdrpInvertedFunctionTable, the exception directory pointer and size information in the PE header of the module in question are unused after the module is loaded and the inverted function table is populated. Because the inverted function table has a fixed size, if enough modules are loaded simultaneously, it is possible that after a point some modules may need to be scanned via loaded module list lookup if all entries in the inverted function table are in use when that module is loaded. However, this is a rare occurrence, and most of the interesting system modules (such as HAL and the kernel memory image itself) are at a fixed-at-boot position within PsInvertedFunctionTable.
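The second-level lookup can be pictured as a linear scan over a fixed-size array of image ranges. The sketch below is an assumption-laden model rather than the real kernel structure: the field names, the InvertedTable and InvertedEntry types, and the inverted_lookup helper are all hypothetical, and only the 0x200 capacity comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Capacity quoted in the text for Windows Vista / Windows Server 2008. */
#define MAX_CACHED_MODULES 0x200

/* Hypothetical cache entry: maps an image's address range to its
 * exception directory. Not the actual kernel layout. */
typedef struct {
    uint64_t    image_base;          /* load address of the module */
    uint32_t    image_size;          /* size of the mapped image in bytes */
    const void *exception_dir;       /* cached exception directory pointer */
    uint32_t    exception_dir_size;  /* cached exception directory size */
} InvertedEntry;

typedef struct {
    size_t        count;                        /* entries currently in use */
    InvertedEntry entries[MAX_CACHED_MODULES];
} InvertedTable;

/*
 * Hypothetical helper: return the cache entry whose image range contains
 * `ip`, or NULL on a miss. On a miss a caller would fall back to the slow
 * path (loaded module list walk plus PE header parsing).
 */
static const InvertedEntry *
inverted_lookup(const InvertedTable *t, uint64_t ip)
{
    assert(t != NULL && t->count <= MAX_CACHED_MODULES);

    for (size_t i = 0; i < t->count; i++) {
        const InvertedEntry *e = &t->entries[i];
        if (ip >= e->image_base && ip - e->image_base < e->image_size)
            return e;
    }
    return NULL;
}
```

On a hit, the caller already holds the exception directory pointer and size and never consults the PE header of the module.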
By redirecting the exception directory pointer in PsInvertedFunctionTable to refer to a ``shadow'' exception directory in caller-supplied memory (outside of the PE header of the actual module), it is possible to change the exception (or unwind) handling behavior of all code points within a module. For instance, it is possible to create an exception handler spanning every code byte within a module through manipulation of the exception directory information. Changing the inverted function table cache entry for a module, rather than the module itself, offers two benefits with respect to this goal. First, an arbitrarily large amount of space may be devoted to unwind metadata, as the patched unwind metadata need not fit within the confines of a particular image's exception directory (this is particularly important if one wishes to ``gift'' all functions within a module with an exception handler). Second, the memory image of the module in question need not be modified, improving the resiliency of the technique against naive detection systems.
Category: Type IIa, varies. Although the entries for always-loaded modules such as the HAL and the kernel in-memory image itself are essentially considered write-once, the array as a whole may be modified as the system is running when kernel modules are either loaded or unloaded. As a result, while the first few entries of PsInvertedFunctionTable are comparatively easy to verify, the ``dynamic'' entries corresponding to demand-loaded (and possibly demand-unloaded) kernel modules may frequently change during the legitimate operation of the system, and as such interception of the exception directory pointers of individual drivers may be much less simple to detect than the interception of the kernel's exception directory.
Origin: At the time of this writing, the authors are not aware of existing malware using PsInvertedFunctionTable. Hijacking of PsInvertedFunctionTable was proposed as a possible bypass avenue for PatchGuard version 2 by Skywing. Its applicability as a possible attack vector with respect to hiding kernel mode code was also briefly described in the same article.
Capabilities: The principal capability afforded by this technique is to establish an exception handler at arbitrary locations within a target module (even every code byte within a module if so desired). By virtue of creating such exception handlers, it is possible to gain control at any location within a module that may be traversed by an exception, even if the exception would normally be handled in a safe fashion by the module or a caller of the module.
Considerations: As PsInvertedFunctionTable is not exported, one must first locate it in order to patch it (this is considered possible as many exported routines, such as RtlLookupFunctionEntry, reference it in an obvious, patterned way). Also, although the structure is guarded by a non-exported synchronization mechanism (PsLoadedModuleSpinLock in Windows Server 2008), the first few entries corresponding to the HAL and the kernel in-memory image itself should be static and safely accessible without synchronization (after all, neither the HAL nor the kernel in-memory image may be unloaded after the system has booted). It should be possible to perform an interlocked exchange to swap the exception directory pointer, provided that the exception directory is not modified after the exchange in a fashion that would require synchronization (e.g. it is only appended to). The size of the exception directory is supplied as a separate value in the inverted function table entry array and would need to be modified separately, which may pose a synchronization problem if alterations to the exception directory are not carefully planned to be safe under concurrent access while they are being made. Additionally, due to the 32-bit RVA based format of the unwind metadata, all exception handlers for a module must be within 4GB of that module's loaded base address. This means that custom exception handlers need to be located within a ``window'' of memory that is relatively near to a module. Allocating memory for them involves additional work, as the memory cannot be at an arbitrary point in the address space but must be within 4GB of the target. If a caller can query the address space and request allocations based at a particular region, however, this is not seen as a particularly insurmountable problem.
Covertness: The principal advantage of this approach is that it allows a caller to gain control at any point within a module's execution where an exception is generated without modifying any code or data within the module in question (provided the module is cached within PsInvertedFunctionTable). Because the exception directory information for a module is unused after the cache is populated, integrity checks against the PE header are useless for detecting the alteration of exception handling behavior for a cached module. Additionally, PsInvertedFunctionTable is a non-exported, writable kernel-mode global which affords it some intrinsic protection against simple detection techniques. A scan of the loaded module list and comparison of exception directory pointers to those contained within PsInvertedFunctionTable could reveal most attacks of this nature, however, provided that the loaded module list retains integrity. Additionally, PatchGuard version 3 appears to guard key portions of PsInvertedFunctionTable (e.g. to block redirection of the kernel's exception directory), resulting in a need to bypass PatchGuard for long-term exploitation on Windows x64 based systems. This is considered a relatively minor difficulty by the authors.
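The comparison-based check mentioned above can be sketched as follows. This is a simplified model that operates on plain records rather than live kernel structures: the DirRecord type and find_mismatches are hypothetical names, and the sketch assumes the caller has already collected one record per module from the PE headers (via the loaded module list) and one per cached entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: where a module's exception directory is believed
 * to live, according to one source (PE header or cache). */
typedef struct {
    uint64_t image_base;          /* load address identifying the module */
    uint64_t exception_dir;       /* absolute address of the directory */
    uint32_t exception_dir_size;  /* directory size in bytes */
} DirRecord;

/*
 * Compare cached records against header-derived records. Returns the
 * number of cached entries that disagree with the headers (or that have
 * no matching module at all) and writes up to `out_cap` of their indices
 * into `out`.
 */
static size_t
find_mismatches(const DirRecord *cached, size_t n_cached,
                const DirRecord *from_headers, size_t n_headers,
                size_t *out, size_t out_cap)
{
    assert(out != NULL || out_cap == 0);

    size_t found = 0;
    for (size_t i = 0; i < n_cached; i++) {
        const DirRecord *h = NULL;
        for (size_t j = 0; j < n_headers; j++) {
            if (from_headers[j].image_base == cached[i].image_base) {
                h = &from_headers[j];
                break;
            }
        }
        /* Flag entries with no backing module or with differing values. */
        int suspicious = (h == NULL)
            || h->exception_dir != cached[i].exception_dir
            || h->exception_dir_size != cached[i].exception_dir_size;
        if (suspicious) {
            if (found < out_cap)
                out[found] = i;
            found++;
        }
    }
    return found;
}
```

As the text notes, such a check is only as trustworthy as the loaded module list it draws from, and the dynamic entries can change legitimately while the scan is in progress.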