|Informative Information for the Uninformed|
Dynamic Binary Instrumentation (DBI) is a method of analyzing the behavior of a binary application at runtime through the injection of instrumentation code. Once injected, this instrumentation code executes as part of the normal instruction stream. In most cases, the instrumentation code will be entirely transparent to the application that it has been injected into. Analyzing an application at runtime makes it possible to gain insight into the behavior and state of an application at various points in execution. This highlights one of the key differences between static binary analysis and dynamic binary analysis. Rather than considering what may occur, dynamic binary analysis has the benefit of operating on what actually does occur. Dynamic analysis is by no means exhaustive in terms of exercising all code paths in the application, but it makes up for this by providing detailed insight into an application's concrete execution state.
The benefits of DBI have made it possible to develop some incredibly advanced tools. Examples where DBI might be used include runtime profiling, visualization, and optimization tools. DBI implementations generally fall into two categories: light-weight or heavy-weight. A light-weight DBI operates on the architecture-specific instruction stream and state when performing analysis. A heavy-weight DBI operates on an abstract form of the instruction stream and state. An example of a heavy-weight DBI is Valgrind, which performs analysis on an intermediate representation of the machine state[12,8]. An example of a light-weight DBI is DynamoRIO, which performs analysis using the architecture-specific state. The benefit of a heavy-weight DBI over a light-weight DBI is that analysis code written against the intermediate representation is immediately portable to other architectures, whereas light-weight DBI analysis implementations must be fine-tuned to work with individual architectures. While Valgrind is a novel and interesting implementation, it is currently not supported on Windows. For this reason, attention will be given to DynamoRIO for the remainder of this paper (see footnote 1).
DynamoRIO is an example of a DBI framework that allows custom instrumentation code to be integrated in the form of dynamic libraries. The tool itself is a combination of Dynamo, a dynamic optimization engine developed by researchers at HP, and RIO, a runtime introspection and optimization engine developed by MIT. The fine-grained details of DynamoRIO's implementation are outside the scope of this paper, but it is important to understand the basic concepts.
At a high level, Figure 1 from Transparent Binary Optimization provides a great visualization of the process employed by Dynamo. In concrete terms, Dynamo works by processing an instruction stream as it executes. To accomplish this, Dynamo assumes responsibility for the execution of the instruction stream. It uses a disassembler to identify the point of the next branch instruction in the code that is about to be executed. The set of instructions disassembled is referred to as a fragment (more commonly known as a basic block). If the target of the branch instruction is in Dynamo's fragment cache, it executes the (potentially optimized) code in the fragment cache. When this code completes, it returns control to Dynamo to disassemble the next fragment. If at some point Dynamo encounters a branch target that is not in its fragment cache, it will add it to the fragment cache and potentially optimize it. This is the perfect opportunity for instrumentation code to be injected into the optimized fragment that is generated for a branch target. Injecting instrumentation code at this level is entirely transparent to the application. While this is an oversimplification of the process used by DynamoRIO, it should at least give some insight into how it functions.
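The dispatch loop described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the three-opcode instruction set, the `PROGRAM` listing, and the names `build_fragment`, `run_fragment`, and `dispatch` are all hypothetical and bear no relation to Dynamo's or DynamoRIO's actual code. It shows only the control flow: look up a branch target in the cache, build the fragment on a miss, execute it, and return to the dispatcher with the next target.

```python
# Toy model of a fragment-cache dispatch loop (hypothetical, not DynamoRIO code).
# Each instruction is (opcode, operand); "jmp", "jnz", and "halt" end a fragment.

PROGRAM = [
    ("set", 3),      # 0: counter = 3
    ("dec", None),   # 1: counter -= 1
    ("jnz", 1),      # 2: if counter != 0, branch back to address 1
    ("halt", None),  # 3: stop
]

BRANCHES = {"jmp", "jnz", "halt"}


def build_fragment(program, start):
    """Scan forward from start up to and including the next branch."""
    end = start
    while program[end][0] not in BRANCHES:
        end += 1
    # Return the fragment and the fall-through address after the branch.
    return program[start:end + 1], end + 1


def run_fragment(fragment, fallthrough, state):
    """Execute one fragment; return the next branch target (None means halt)."""
    for opcode, operand in fragment:
        if opcode == "set":
            state["counter"] = operand
        elif opcode == "dec":
            state["counter"] -= 1
        elif opcode == "jmp":
            return operand
        elif opcode == "jnz":
            return operand if state["counter"] != 0 else fallthrough
        elif opcode == "halt":
            return None
    return fallthrough


def dispatch(program, entry=0):
    """Run the program one fragment at a time, caching fragments by target."""
    cache = {}   # branch target -> (fragment, fall-through address)
    builds = []  # targets in the order their fragments were created
    state = {"counter": 0}
    target = entry
    while target is not None:
        if target not in cache:          # cache miss: build and store
            cache[target] = build_fragment(program, target)
            builds.append(target)
        fragment, fallthrough = cache[target]
        target = run_fragment(fragment, fallthrough, state)
    return state, builds
```

In this model, the point at which `build_fragment` runs corresponds to the cache-miss path in the text, which is where instrumentation would be added. Note that the loop body at address 1 is built once and then reused from the cache on later iterations.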
One of the best features of DynamoRIO from an analysis standpoint is that it provides a framework for inserting instrumentation code while a fragment is being inserted into the fragment cache. This is especially useful for the purposes of intercepting memory accesses within an application. When a fragment is being created, DynamoRIO provides analysis libraries with the instructions that are to be included in the fragment that is generated. To optimize for performance, DynamoRIO provides multiple levels of disassembly information. At the most optimized level, only very basic information about the instructions is provided. At the least optimized level, very detailed information about the instructions and their operands can be obtained. Analysis libraries are free to control the level of information that they retrieve. Using this knowledge of DynamoRIO, it is now possible to consider how one might design an analysis library that is able to intercept memory reads and writes while an application is executing.
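As a rough illustration of instrumenting a fragment at creation time, the following toy model walks a fragment's instruction list and places a hook before every instruction that reads or writes memory. It is a sketch only: the `load`/`add`/`store` instruction set and the names `touches_memory`, `instrument_fragment`, and `execute` are hypothetical, and none of this reflects DynamoRIO's real interface.

```python
# Toy model: instrument a fragment as it is built (hypothetical names throughout).
# Instructions are (opcode, operand); "load" reads and "store" writes an address.


def touches_memory(instr):
    """In this toy instruction set, only loads and stores access memory."""
    return instr[0] in ("load", "store")


def instrument_fragment(fragment):
    """Return a copy of the fragment with a hook before each memory access."""
    out = []
    for instr in fragment:
        if touches_memory(instr):
            kind = "read" if instr[0] == "load" else "write"
            out.append(("hook", kind, instr[1]))  # injected instrumentation
        out.append(instr)                         # original instruction
    return out


def execute(fragment, memory, log):
    """Run a fragment; hooks record (kind, address) without changing state."""
    acc = 0
    for instr in fragment:
        op = instr[0]
        if op == "hook":
            log.append((instr[1], instr[2]))
        elif op == "load":
            acc = memory[instr[1]]
        elif op == "add":
            acc += instr[1]
        elif op == "store":
            memory[instr[1]] = acc
    return acc


fragment = [("load", 0x10), ("add", 5), ("store", 0x14)]
memory = {0x10: 7, 0x14: 0}
log = []
execute(instrument_fragment(fragment), memory, log)
```

Because the hooks only append to `log`, the instrumented fragment computes the same result as the original, which mirrors the transparency property described earlier. The original `fragment` list is left unmodified; only the copy placed in the cache would carry the hooks.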