Uninformed - vol 6 article 2

The payload architecture that the authors decided to integrate is based heavily on previous research [1]. As was alluded to in the introduction, a number of complicated considerations must be taken into account when dealing with kernel-mode exploitation. The majority of these considerations relate directly to the methods that should be used when executing arbitrary code in the kernel. For example, if a device driver was holding a lock at the time that an exploit was triggered, what is the best way to release that lock so that the system recovers and remains usable? Other considerations include IRQL restrictions, cleaning up corrupted structures, and so on. As a result, the best way to implement a payload can differ from one vulnerability to the next. This is quite different from the user-mode environment, where it's almost always possible to use the exact same payload regardless of the application.

Though these situational complications do exist, it is possible to design and implement a payload system that can be applied in almost any circumstance. By separating kernel-mode payloads into variable components, it becomes possible to combine components together in different ways to form functional variations that are best suited for particular situations. In Windows Kernel-mode Payload Fundamentals [1], kernel-mode payloads are broken down into four different components: migration, stagers, recovery, and stages.

When describing kernel-mode payloads in terms of components, the migration component is the one used to migrate from an unsafe execution environment to a safe execution environment. For example, if the IRQL is at DISPATCH when a vulnerability is triggered, it may be necessary to migrate to a safer IRQL such as PASSIVE. It is not always necessary to have a migration component. The purpose of a stager component is to move some portion of the payload so that it executes in the context of another thread. This may be necessary if the current thread is of critical importance or if certain operations performed in it could deadlock the system. The use of a stager may obviate the need for a migration component. A recovery component is used to restore the system to a clean state and then continue execution. This component generally requires customization for a given vulnerability, as it may not always be possible to describe the steps needed to recover the system in a generic way. For example, if locks were held at the time that the vulnerability was triggered, it may be necessary to find a way to release those locks and then continue execution from a safe point. Finally, the stage component is a catch-all for whatever arbitrary code may be executed once the payload is running in a safe environment.
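
The component model can be illustrated with a short Ruby sketch. The `KernelPayload` struct and its method names are hypothetical and exist only for illustration, and the ordering of the kernel-mode components is an assumption. The sketch shows that a component such as migration can be omitted and that the stage is deferred until a safe environment is reached.

```ruby
# Hypothetical model of the four payload components; none of these names
# come from the original research or from any real library.
KernelPayload = Struct.new(:migration, :stager, :recovery, :stage,
                           keyword_init: true) do
  # Components that execute in kernel-mode, in the order assumed here.
  # Components that were not supplied (nil) are skipped.
  def kernel_sequence
    [migration, stager, recovery].compact
  end

  # The stage is the arbitrary code that runs once it is safe to do so.
  def deferred
    stage
  end
end

# A payload built from three of the four components (no migration).
payload = KernelPayload.new(
  stager:   :sud_syscall_hook,
  recovery: :idlethread_restart,
  stage:    :user_mode_code
)

puts payload.kernel_sequence.inspect
```

Swapping a single field, for example the recovery component, leaves the rest of the description untouched, which is the property the component model is meant to provide.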

This model for describing kernel-mode payloads is what the authors decided to adopt. To better understand how this model works, it seems best to describe how it was applied for all three real-world vulnerabilities that are shown in chapter 5. These three vulnerabilities actually make use of the same basic underlying payload, which will henceforth be referred to as "the payload" for brevity. The payload itself is composed of three of the four components. Each of the payload components will be discussed individually and then as a whole to provide an idea of how the payload operates.

The first component that exists in the payload is a stager component. The stager that the authors chose to use is based on the SharedUserData SystemCall Hook stager described in [1]. Before understanding how the stager works, it's important to understand a few things. As the name implies, the stager accomplishes its goal by hooking the SystemCall attribute found within SharedUserData. As a point of reference, SharedUserData is a global page that is shared between user-mode and kernel-mode. It acts as a sort of global structure that contains things like tick count and time information, version information, and quite a few other things. It's extremely useful for a few different reasons, not the least of which being that it's located at a fixed address in user-mode and in kernel-mode on all NT derivatives. This means that the stager is instantly portable and doesn't need to perform any symbol resolution to locate the address, thus helping to keep the overall size of the payload small.

The SystemCall attribute that is hooked is part of an enhancement that was added in Windows XP. This enhancement was designed to make it possible to use optimized system call instructions depending on what hardware support is present on a given machine. Prior to Windows XP, system calls were dispatched from user-mode through the hardcoded use of the int 0x2e soft interrupt. Over time, hardware enhancements were made to decrease the overhead involved in performing a system call, such as through the introduction of the sysenter instruction. Since Microsoft isn't in the business of providing different versions of Windows for different makes and models of hardware, they decided to determine at runtime which system call interface to use. SharedUserData was the perfect candidate for storing the results of this runtime determination as it was already a shared page that existed in every user-mode process. After making these modifications, ntdll.dll was updated to dispatch system calls through SharedUserData rather than through the hardcoded use of int 0x2e. The initial implementation of this new system call dispatching interface placed executable code within the SystemCall attribute of SharedUserData. Subsequent versions of Windows, such as XP SP2, turned the SystemCall attribute into a function pointer.

One important implication of the introduction of the SystemCall attribute to SharedUserData is that it represents a pivot point through which all system call dispatching occurs in user-mode. In previous versions of Windows, each user-mode system call stub routine invoked int 0x2e directly. In the latest versions, these stub routines make indirect calls through the SystemCall function pointer. By default, this function pointer is initialized to point to one of a few exported symbols within ntdll.dll. However, if this function pointer is changed to point elsewhere, it becomes possible to intercept all system calls within all processes. This possibility forms the very foundation for the stager that is used by the payload.
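
The pivot-point idea can be simulated in plain Ruby. In this sketch a Ruby object stands in for SharedUserData and lambdas stand in for native code; every class and method name is hypothetical, and nothing here touches a real system call interface. It only shows why redirecting one shared pointer is enough to observe every call made through it.

```ruby
# Simulation only: stands in for the shared page and its SystemCall pointer.
class SimulatedSharedUserData
  attr_accessor :system_call

  def initialize
    # Default dispatcher, standing in for the routine exported by ntdll.dll.
    @system_call = ->(number) { "dispatched #{number}" }
  end
end

# Every stub makes an indirect call through the shared pointer.
def syscall_stub(sud, number)
  sud.system_call.call(number)
end

# Redirect the pointer while keeping the original dispatcher reachable.
def install_hook(sud, log)
  original = sud.system_call
  sud.system_call = lambda do |number|
    log << number          # the hook sees every call first
    original.call(number)  # then hands off so the call still completes
  end
end

sud = SimulatedSharedUserData.new
seen = []
install_hook(sud, seen)
puts syscall_stub(sud, 0x25)
puts seen.inspect
```

Because the hook forwards to the saved dispatcher, callers get the same result as before while the hook observes each call.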

When the stager begins executing, it's running in kernel-mode in the context of the thread that triggered the vulnerability. The first action it takes is to copy a chunk of code (the stage) into an unused portion of SharedUserData using the predictable address of 0xffdf037c. After the copy operation completes, the stager proceeds by hooking the SystemCall attribute. This hook must be handled differently depending on whether the target operating system predates XP SP2. More details on how this can be handled are described in [1]. Regardless of the approach, the SystemCall attribute is redirected to point to 0x7ffe037c. This predictable location is the user-mode accessible address of the unused portion of SharedUserData into which the stage was copied. After the hooking operation completes, all system calls invoked by user-mode processes will first go through the stage placed at 0x7ffe037c. The stager portion of the payload looks something like this (see footnote 4.1):
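
Independently of the stager's assembly, the relationship between the two addresses quoted above can be checked with a few lines of Ruby. This is a minimal sketch that assumes a 4 KB page size; the constant and method names are hypothetical. It shows that both addresses refer to the same offset within the shared page, seen once through the kernel-mode mapping and once through the user-mode mapping.

```ruby
PAGE_MASK = 0xfff  # assumes SharedUserData occupies a single 4 KB page

KERNEL_STAGE_ADDR = 0xffdf037c  # kernel-mode address the stager copies to
USER_STAGE_ADDR   = 0x7ffe037c  # user-mode address SystemCall is pointed at

# Split a 32-bit address into its page base and the offset within the page.
def split_address(addr)
  [addr & ~PAGE_MASK & 0xffffffff, addr & PAGE_MASK]
end

kernel_base, kernel_off = split_address(KERNEL_STAGE_ADDR)
user_base,   user_off   = split_address(USER_STAGE_ADDR)

printf("kernel view: base=0x%08x offset=0x%03x\n", kernel_base, kernel_off)
printf("user view:   base=0x%08x offset=0x%03x\n", user_base, user_off)
puts "same offset in both views: #{kernel_off == user_off}"
```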

With the hook in place, the stager has completed its primary task which was to copy a stage into a location where it could be executed in the future. Before the stage can execute, the stager must allow the recovery component of the payload to execute. As mentioned previously, the recovery component represents one of the most vulnerability-specific portions of any kernel-mode payload. For the purpose of the exploits described in chapter 5, a special purpose recovery component was necessary.

This particular recovery component was required due to the fact that the example vulnerabilities are triggered in the context of the Idle thread. On Windows, the Idle thread is a special kernel thread that executes whenever a processor is idle. Due to the nature of the way the Idle thread operates, it's dangerous to perform operations like spinning the thread or any of the other recovery methods described in [1]. It may also be possible to apply the technique for delaying execution within the Idle thread as discussed in [2]. The recovery method that was finally selected involves two basic steps. First, the IRQL for the current processor is restored to DISPATCH level just in case it was executing at a higher IRQL. Second, execution control is transferred to the first instruction of nt!KiIdleLoop after initializing registers appropriately. The end effect is that the Idle thread begins executing all over again and, if all goes well, the system continues operating as if nothing had happened. In practice, this recovery method has proven reliable. However, its one drawback is that it requires knowledge of the address at which nt!KiIdleLoop resides. This dependence represents an area that is ripe for future improvement. Regardless of limitations, the recovery component for the payload looks like the code below:
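
Separately from the kernel-mode listing, the two recovery steps can be modelled at a high level in Ruby. This is a simulation under stated assumptions, not the recovery code itself: `CpuState` and `idle_thread_restart` are hypothetical names, register initialization is omitted, and the model only lowers the IRQL when it is above DISPATCH, whereas real code may set it unconditionally.

```ruby
DISPATCH_LEVEL = 2  # IRQL value for DISPATCH_LEVEL on x86 Windows

# Hypothetical snapshot of the processor state the recovery step changes.
CpuState = Struct.new(:irql, :next_instruction)

# Model of the two recovery steps. ki_idle_loop is a caller-supplied
# address for nt!KiIdleLoop; the model does not resolve it on its own.
def idle_thread_restart(state, ki_idle_loop)
  # Step 1: drop back to DISPATCH if the exploit fired at a higher IRQL.
  state.irql = DISPATCH_LEVEL if state.irql > DISPATCH_LEVEL
  # Step 2: resume execution at the first instruction of the idle loop.
  state.next_instruction = ki_idle_loop
  state
end

# 0xdeadbeef is a placeholder, not a real nt!KiIdleLoop address.
state = idle_thread_restart(CpuState.new(5, nil), 0xdeadbeef)
printf("irql=%d next=0x%08x\n", state.irql, state.next_instruction)
```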

After the recovery component has completed its execution, all of the payload code that was originally executing in kernel-mode is complete. The final portion of the payload that remains to be executed is the stage that was copied by the stager. The stage itself runs in user-mode within all process contexts, and it executes every time a system call is dispatched. The implications of this should be obvious. Having a stage that executes within every process every time a system call occurs is just asking for trouble. For that reason, it makes sense to design a generic user-mode stage that can be used to limit the times that it executes to one particular context.

The approach that the authors took to meet this requirement is as follows. First, the stage performs a check that is designed to see if it is running in the context of a specific process. This check is there in order to help ensure that the stage itself only executes in a known-good environment. As an example, it would be a shame to take advantage of a kernel-mode vulnerability only to finally execute code with the privileges of Guest. By default, this check is designed to see if the stage is running within lsass.exe, a process that runs with SYSTEM level privileges. If the stage is running within lsass, it performs a check to see if the SpareBool attribute of the Process Environment Block has been set to one. By default, this value is initialized to zero in all processes. If the SpareBool attribute is set to zero, then the stage proceeds to set the SpareBool attribute to one and then finishes by executing whatever code is remaining within the stage. If the SpareBool attribute is set to one, which means the stage has already run, or it's not running within lsass, it transfers control back to the original system call dispatching routine. This is necessary because it is still a requirement that system calls from user-mode processes be dispatched appropriately, otherwise the system itself would grind to a halt. An example of what this stage might look like is shown below:
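
The decision logic described above can also be expressed as a small Ruby model. It is a sketch only: `ProcessContext` and `stage_decision` are hypothetical names, and identifying the process by comparing an image name is an assumption made for the simulation, since the text does not say how the real stage recognizes lsass.exe.

```ruby
TARGET_PROCESS = 'lsass.exe'

# Hypothetical stand-in for the per-process state the stage inspects.
# spare_bool models the SpareBool attribute of the PEB (0 or 1).
ProcessContext = Struct.new(:image_name, :spare_bool)

# Returns :run_payload exactly once for the target process and
# :pass_through (continue to the original dispatcher) in every other case.
def stage_decision(ctx)
  return :pass_through unless ctx.image_name == TARGET_PROCESS
  return :pass_through unless ctx.spare_bool.zero?

  ctx.spare_bool = 1  # mark the process so later system calls pass through
  :run_payload
end

lsass = ProcessContext.new('lsass.exe', 0)
puts stage_decision(lsass)  # first system call made inside lsass.exe
puts stage_decision(lsass)  # every later system call
puts stage_decision(ProcessContext.new('notepad.exe', 0))
```

The flag check is what keeps the embedded code from running again on each subsequent system call made by the same process.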

The culmination of these three payload components is a functional payload that can be used in any situation where an exploit is triggered within the Idle thread. If the exploit is triggered outside of the context of the Idle thread, the recovery component can be swapped out with an alternative method and the rest of the payload can remain unchanged. This is one of the benefits of breaking kernel-mode payloads down into different components. To recap, the payload works by using a stager to copy a stage into an unused portion of SharedUserData. The stager then points the SystemCall attribute to that unused portion, effectively causing all user-mode processes to bounce through the stage when they attempt to make a system call. Once the stager has completed, the recovery component restores the IRQL to DISPATCH and then restarts the Idle thread. The kernel-mode portion of the payload is then complete. Shortly after that, the stage that was copied to SharedUserData is executed in the context of a specific user-mode process, such as lsass.exe. Once this occurs, the stage sets a flag that indicates that it's been executed and completes. All told, the payload itself is only 115 bytes, excluding any additional code in the stage.

Given all of this infrastructure work, it's trivial to plug almost any user-mode payload into the stage. The additional code must simply be placed at the point where the stage has verified that it's running in a particular process and that it hasn't been executed before. This simplicity was quite intentional. One of the major goals in implementing this payload system was to make it possible to use the existing set of payloads in the Metasploit framework in conjunction with any kernel-mode exploit. This includes even some of the more powerful payloads such as Meterpreter and VNC injection.

There were two key elements involved in integrating kernel-mode payloads into the 3.0 version of the Metasploit Framework. The first had to do with defining the interface that exploit developers would need to use when writing kernel-mode exploits. The second dealt with defining the interface that end-users would have to be aware of when using kernel-mode exploits. In terms of precedence, defining the programming-level interfaces first is the ideal approach. To that end, the programming interface that was decided upon is one that should be easy to use. The majority of the complexity involved in selecting a kernel-mode payload is hidden from the developer. There are only a few basic things that the developer needs to be aware of.

When implementing a kernel-mode exploit in Metasploit 3.0, it is necessary to include the Msf::Exploit::KernelMode mixin. This mixin provides hints to the framework that make it aware of the fact that any payloads used with this exploit will need to be appropriately encapsulated within a kernel-mode stager. With this simple action, the majority of the work associated with the kernel-mode payload is abstracted away from the developer. The only other thing that a developer may need to deal with is defining extended parameters that further control the selection of different aspects of the kernel-mode payload. These controllable parameters are exposed to developers through the ExtendedOptions hash element in an exploit's global or target-specific Payload options. An example of what this might look like within an exploit can be seen here:
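
A minimal sketch of such an options hash, using only the element names documented in this section, is given here. In a real module the hash would be part of the information supplied by an exploit class that includes Msf::Exploit::KernelMode; it is written as a standalone constant so that it runs without the framework. The constant names are hypothetical, and the address is a placeholder rather than a real nt!KiIdleLoop address.

```ruby
# Placeholder only: the real address of nt!KiIdleLoop depends on the
# target's kernel build and must be supplied by the exploit author.
PLACEHOLDER_KI_IDLE_LOOP = 0x00000000

EXPLOIT_INFO = {
  'Payload' => {
    'ExtendedOptions' => {
      'Stager'            => 'sud_syscall_hook',    # stager component
      'Recovery'          => 'idlethread_restart',  # recovery component
      'KiIdleLoopAddress' => PLACEHOLDER_KI_IDLE_LOOP
    }
  }
}.freeze

puts EXPLOIT_INFO['Payload']['ExtendedOptions'].keys.inspect
```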

In the above example, the exploit has explicitly selected the underlying stager component that should be used by specifying the Stager hash element. The sud_syscall_hook stager is a symbolic name for the stager that was described in section 4.1. The example above also has the exploit explicitly selecting the recovery component that should be used. In this case, the recovery component that is selected is idlethread_restart which is a symbolic name for the recovery component described previously. Additionally, the nt!KiIdleLoop address is specified for use with this particular recovery component. Under the hood, the use of the KernelMode mixin and the additional extended options results in the framework encapsulating whatever user-mode payload the end-user specified inside of a kernel-mode stager. In the end, this process is entirely transparent to both the developer and the end-user.

While the set of options that can be specified in the extended options hash will surely grow in the future, it makes sense to at least document the set of defined elements at the time of this writing. These options are described in the following table:

Hash Element	Description
`Recovery`	Defines the recovery component that should be used when generating the kernel-mode payload. The current set of valid values for this option includes `spin`, which will spin the current thread, `idlethread_restart`, which will restart the `Idle` thread, and `default`, which is equivalent to `spin`. Over time, more recovery methods may be added. These can be found in `recovery.rb`.
`RecoveryStub`	Defines a custom recovery component.
`Stager`	Defines the stager component that should be used when generating the kernel-mode payload. The only valid value for this option at present is `sud_syscall_hook`. Over time, more stager methods may be added. These can be found in `stager.rb`.
`UserModeStub`	Defines the user-mode custom code that should be executed as part of the stage.
`RunInWin32Process`	Currently only applicable to the `sud_syscall_hook` stager. This element specifies the name of the system process, such as `lsass.exe`, that should be injected into.
`KiIdleLoopAddress`	Currently only applicable to the `idlethread_restart` recovery component. This element specifies the address of `nt!KiIdleLoop`.
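
The constraints in the table can be captured in a small validation helper. The sketch below is hypothetical and is not part of the framework: the method name, the fallback values used when an element is absent, and the decision to reject `idlethread_restart` without an address are all assumptions made for illustration.

```ruby
VALID_RECOVERY = %w[spin idlethread_restart default].freeze
VALID_STAGERS  = %w[sud_syscall_hook].freeze

# Hypothetical helper: checks an ExtendedOptions hash against the table
# above and resolves 'default' to 'spin'.
def normalize_extended_options(opts)
  recovery = opts.fetch('Recovery', 'default')        # assumed fallback
  stager   = opts.fetch('Stager', 'sud_syscall_hook') # assumed fallback

  unless VALID_RECOVERY.include?(recovery)
    raise ArgumentError, "unknown Recovery: #{recovery}"
  end
  unless VALID_STAGERS.include?(stager)
    raise ArgumentError, "unknown Stager: #{stager}"
  end

  recovery = 'spin' if recovery == 'default'
  if recovery == 'idlethread_restart' && !opts.key?('KiIdleLoopAddress')
    raise ArgumentError, 'idlethread_restart needs KiIdleLoopAddress'
  end

  opts.merge('Recovery' => recovery, 'Stager' => stager)
end

puts normalize_extended_options({ 'Recovery' => 'default' }).inspect
```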

While not particularly important to developers or end-users, it may be interesting for some to understand how this abstraction works internally. To start, the KernelMode mixin overrides a base class method called encode_begin. This method is called when a payload that is used by an exploit is being encoded. When this happens, the mixin declares a procedure that the payload encoder calls in order to encapsulate the pre-encoded payload. The procedure itself is passed the original raw user-mode payload and the payload options hash (which contains the extended options, if any, that were specified in the exploit). It uses this information to construct the kernel-mode stager that is used to encapsulate the user-mode payload. If the procedure completes successfully, it returns a non-nil buffer that contains the original user-mode payload encapsulated within a kernel-mode stager. The kernel-mode stager and other components are actually contained within the payloads subsystem of the Rex library under lib/rex/payloads/win32/kernel.
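
The override-and-callback pattern described above can be sketched with stand-in classes. Everything in this example is hypothetical: `BaseExploit`, `KernelModeSketch`, the `EncapsulationRoutine` key, and the string-based wrapper are illustrative substitutes and should not be read as the framework's actual classes or the Rex library's interface. The sketch shows only the control flow, in which the mixin registers a procedure from encode_begin and the encoder later calls it with the options hash and the raw payload.

```ruby
# Stand-in for the framework's exploit base class (hypothetical).
class BaseExploit
  def encode_begin(_raw_payload, reqs)
    reqs
  end

  # Stand-in for the payload encoder: run the encapsulation procedure,
  # if one was registered, before any encoding would take place.
  def encode(raw_payload, reqs)
    encode_begin(raw_payload, reqs)
    routine = reqs['EncapsulationRoutine']
    routine ? routine.call(reqs, raw_payload) : raw_payload
  end
end

# Stand-in for the KernelMode mixin.
module KernelModeSketch
  def encode_begin(raw_payload, reqs)
    super
    # Declare the procedure that the encoder will call later.
    reqs['EncapsulationRoutine'] = proc do |r, raw|
      wrap_in_kernel_stager(r['ExtendedOptions'] || {}, raw)
    end
  end

  # Placeholder wrapper: real code would emit machine code, not a string.
  def wrap_in_kernel_stager(ext_opts, raw)
    "[#{ext_opts['Stager']}|#{ext_opts['Recovery']}]" + raw
  end
end

class DemoExploit < BaseExploit
  include KernelModeSketch
end

reqs = { 'ExtendedOptions' => { 'Stager' => 'sud_syscall_hook',
                                'Recovery' => 'spin' } }
puts DemoExploit.new.encode('USERMODE', reqs)
```

An exploit class that does not include the mixin never registers the procedure, so its payload passes through the stand-in encoder unchanged.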

Payload Architecture