Uninformed: Informative Information for the Uninformed

Vol 7» 2007.May


If software analysis had a holy grail, it would most likely be the ability to accurately model the data flow behavior of an application. After all, applications are little more than sophisticated data processors that operate on varying sets of input to produce varying sets of output. Describing how an application behaves when it encounters these inputs makes it possible to predict its future behavior. It can also provide insight into how the input could be altered to cause the application to behave differently. Given these benefits, it is only natural that a discipline exists that is devoted to the study of data flow analysis.

There are two general approaches that can be taken to perform data flow analysis. The first approach, static analysis, involves analyzing an application's source code or compiled binaries without actually executing the application. The second approach, dynamic analysis, involves analyzing the data flow of an application as it executes. The two approaches have both common and unique benefits, and no argument will be made in this paper as to which may be better or worse. Instead, this paper will focus on describing three strategies that may be used to assist in the process of dynamic data flow analysis.

The first strategy involves using Dynamic Binary Instrumentation (DBI) to rewrite the instruction stream of the executing application in a manner that makes it possible to intercept instructions that read from or write to memory. Two well-known DBI implementations that the author is familiar with are DynamoRIO and Valgrind[4,12]. The second strategy involves using the hardware paging features of the x86 and x64 architectures to trap and handle accesses to specific pages in memory. Finally, the third strategy uses the segmentation features of the x86 architecture to trap memory accesses by means of the null selector. Though these three strategies vary greatly, they all accomplish the same goal: intercepting memory accesses within an application as it executes.
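
Whatever mechanism performs the interception, the analysis ends up seeing the same kind of event: some access of a given kind touched a given address. As a rough illustration of the paging-based strategy only, the following Python sketch simulates page-granular trapping purely in software. It does not touch real page tables, and it is not the implementation described later in this paper. The names `WatchedMemory`, `PAGE_SIZE`, and the handler signature are hypothetical, and the 4 KiB page size is an assumption that matches the common small-page size on x86 and x64.

```python
PAGE_SIZE = 4096  # assumed small-page size on x86/x64


class WatchedMemory:
    """Toy flat memory that reports accesses to watched pages (simulation only)."""

    def __init__(self, size, handler):
        self.data = bytearray(size)
        self.handler = handler          # called as handler(kind, address)
        self.watched_pages = set()      # page numbers with trapping enabled

    def watch(self, address):
        """Enable trapping for the whole page that contains address."""
        self.watched_pages.add(address // PAGE_SIZE)

    def _trap(self, kind, address):
        # Stands in for a page fault: fires for ANY address on a watched page.
        if address // PAGE_SIZE in self.watched_pages:
            self.handler(kind, address)

    def read(self, address):
        self._trap("read", address)
        return self.data[address]

    def write(self, address, value):
        self._trap("write", address)
        self.data[address] = value


events = []
mem = WatchedMemory(4 * PAGE_SIZE, lambda kind, addr: events.append((kind, addr)))
mem.watch(0x1010)          # interested in a single address...
mem.write(0x1010, 0x41)    # ...so this write is trapped
mem.read(0x1FF0)           # same page, so this read is trapped as well
mem.read(0x2000)           # next page, not trapped
print(events)
```

Running the sketch records the write to 0x1010 and the read from 0x1FF0, but not the read from 0x2000. This illustrates one trade-off of a page-based approach: interception happens at page granularity, so accesses to unrelated data that shares a watched page are reported too and must be filtered by the handler.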

The ability to intercept memory reads and writes at runtime can support research in additional areas relating to dynamic data flow analysis. For example, tracking which areas of code read from and write to memory could make it possible to build a model of the data propagation behavior of an application. It might also be possible to show the degree of code-level isolation with which different areas of memory are accessed. It may even be possible to validate the data consistency model of a threaded application by investigating how regions of memory that are referenced by multiple threads are accessed. These are but a few of the many potential candidates for dynamic data flow analysis.
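
The data propagation idea mentioned above can be sketched in a few lines, assuming that one of the interception strategies has already produced an ordered log of accesses. The record layout `(code_region, kind, address)`, the region names, and the function `propagation_edges` are all hypothetical; the sketch only shows how writer-to-reader relationships might be derived from such a log.

```python
from collections import defaultdict


def propagation_edges(access_log):
    """Return {(writer, reader): count} derived from an ordered access log.

    Each log entry is (code_region, kind, address), where kind is "read" or
    "write". An edge is counted when a region reads an address whose most
    recent write came from a different region.
    """
    last_writer = {}                 # address -> region that last wrote it
    edges = defaultdict(int)
    for region, kind, address in access_log:
        if kind == "write":
            last_writer[address] = region
        elif kind == "read":
            writer = last_writer.get(address)
            if writer is not None and writer != region:
                edges[(writer, region)] += 1
    return dict(edges)


# Hypothetical log: a parser fills a buffer that a renderer later consumes.
log = [
    ("parse_input", "write", 0x5000),
    ("parse_input", "write", 0x5004),
    ("parse_input", "read", 0x5000),   # read of its own data, no edge
    ("render", "read", 0x5000),
    ("render", "read", 0x5004),
    ("render", "write", 0x6000),
    ("flush", "read", 0x6000),
]
edges = propagation_edges(log)
print(edges)
```

The resulting edges form a small directed graph of how data moves between areas of code. The same log could be grouped by address instead of by region to see how many distinct areas of code touch a given piece of memory.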

The remainder of this paper is organized into three sections. Section 2 introduces three different strategies for facilitating dynamic data flow analysis. Section 3 enumerates some of the potential scenarios in which these strategies could be applied to yield useful information about the data flow behavior of an application. Finally, Section 4 describes some of the previous work whose concepts have been used as the basis for the research described herein.