Uninformed: Informative Information for the Uninformed

Vol 5» 2006.Sep


What is code coverage?

Code coverage, as represented by a Control Flow Graph (CFG), is a measure of how much of a program's code is exercised while it undergoes software testing. For the purpose of vulnerability research, the goal is to use code coverage analysis to exercise exhaustively every path through the code and data flow that may be relevant for revealing failures. It serves as a metric for how thoroughly a specific set of tests exercises the program, and therefore how likely that set is to uncover faults. The techniques of code coverage analysis presented in this paper rely on basic properties of graph theory, including elements such as vertices, links, and edges. Graph theory lay somewhat dormant until computer scientists began to apply it, and they have since defined their own vocabulary for the subject. For the sake of continuity, and to link the mathematical definitions to their computer science counterparts, this paper equates vertices with code blocks, links (branches) with decisions, and edges with code paths.

To support our hypothesis, the aforementioned graph theory elements are assembled into CFGs. Informally, a Control Flow Graph is a directed graph composed of a finite set of vertices connected by edges that indicate all possible routes a driver or application may take during execution. In other words, a CFG is a set of code blocks whose connecting flow paths are determined by decisions. A block is a sequence of instructions that is free of branches or other control transfers except for its last instruction. That last instruction may be a branch, or decision, which evaluates a Boolean expression in a control structure. A path is a sequence of nodes traversed through a series of uninterrupted links. Paths enable the flow of information or data through code. In our case, a path is an execution flow and is therefore essential to measuring code coverage. For this reason, this investigation focuses on determining which paths have been traversed, which blocks and associated data have been executed, and which links have been followed, and finally on applying the results to fuzzing techniques.
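As a minimal sketch of these definitions, a CFG can be written down as a mapping from each block to the blocks that may follow it. The block names, the `CFG` dictionary, and the helper functions below are hypothetical and serve only as illustration; they are not taken from any tool discussed in this paper.

```python
# Hypothetical CFG for a small function with a single if/else decision.
# Each key is a block; each value lists the blocks that the block's last
# instruction can transfer control to.
CFG = {
    "entry": ["check"],
    "check": ["then", "else"],  # decision: two outgoing edges
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}


def edges(cfg):
    """Return every directed edge (source, destination) in the graph."""
    return [(src, dst) for src, dsts in cfg.items() for dst in dsts]


def is_path(cfg, blocks):
    """True if each consecutive pair of blocks is joined by an edge."""
    return all(dst in cfg.get(src, []) for src, dst in zip(blocks, blocks[1:]))


def all_paths(cfg, start, end, prefix=()):
    """Enumerate paths from start to end that visit no block twice."""
    prefix = prefix + (start,)
    if start == end:
        return [list(prefix)]
    found = []
    for nxt in cfg[start]:
        if nxt not in prefix:  # skip blocks already on this path
            found.extend(all_paths(cfg, nxt, end, prefix))
    return found


if __name__ == "__main__":
    for path in all_paths(CFG, "entry", "exit"):
        print(" -> ".join(path))
```

In this example the single decision in `check` yields two distinct paths from `entry` to `exit`. Real programs contain loops, so the number of paths grows quickly; the sketch sidesteps that by never revisiting a block on the same path.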

Ultimately, the purpose of code coverage analysis is to ensure that every control decision is exercised. In other words, the application must be executed with enough inputs that every edge in the graph is traversed at least once. These graphs will be represented as diagrams in which blocks are squares, edges are lines, and paths are colored.
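The edge-traversal requirement can be sketched as a simple calculation. The example assumes that each test input produces a trace, meaning the ordered list of blocks that were executed. The graph, the traces, and the function names are hypothetical and are not output from a real tracer.

```python
# Hypothetical CFG containing one loop.
CFG = {
    "entry": ["loop"],
    "loop": ["body", "exit"],  # decision: run the body again or leave
    "body": ["loop"],
    "exit": [],
}


def trace_edges(trace):
    """Edges followed by one execution, given its ordered list of blocks."""
    return set(zip(trace, trace[1:]))


def edge_coverage(cfg, traces):
    """Return (fraction of edges traversed, sorted list of missed edges)."""
    all_edges = {(src, dst) for src, dsts in cfg.items() for dst in dsts}
    hit = set()
    for trace in traces:
        # Ignore any pair in the trace that is not an edge of the graph.
        hit |= trace_edges(trace) & all_edges
    missed = all_edges - hit
    ratio = len(hit) / len(all_edges) if all_edges else 1.0
    return ratio, sorted(missed)


if __name__ == "__main__":
    skip_loop = ["entry", "loop", "exit"]
    one_iteration = ["entry", "loop", "body", "loop", "exit"]
    print(edge_coverage(CFG, [skip_loop]))
    print(edge_coverage(CFG, [skip_loop, one_iteration]))
```

With only the input that skips the loop, half of the edges are traversed and the two edges through `body` are reported as missed. Adding an input that enters the loop once brings the set to full edge coverage.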