Uninformed: Informative Information for the Uninformed

Vol 1» 2005.May

A Different Approach to Loop Detection

The reader has seen how to detect dominators within a CFG and how to use that as a component to find natural loops. The previous chapter described why natural loop detection was flawed when trying to detect irreducible loops. For binary auditing, the tool will need to be able to pick up all loops and then let the user deduce whether or not the loops are interesting. This chapter will introduce the loop algorithm used in the IDA plug-in to detect loops.

To come up with an algorithm that was robust enough to detect both loops in the irreducible and reducible loop categories, the author decided to modify the previous definition of a natural loop. The new definition reads "a loop can have multiple entry points and at least one link that creates a cycle." This definition avoids the use of dominators to detect loops in the CFG.

The way this alternative algorithm works is by first making a call to the is_reference_to(ea_t to, ea_t ref) function. The function is_reference_to will determine if there is a reference from the ea_t specified by ref to the parameter to. This check within the loop detection algorithm determines if there is a backedge or link that would complete a loop. The reason this check is done first is for speed. If there is no reference that would complete a loop then there is no reason to call is_path_to, thus preventing unnecessary calculations. However, if there is a link or backedge, a call to the overloaded function is_path_to(ea_t from, ea_t to) is used to determine if the nodes that are being examined can even reach each other. The is_path_to function simulates all possible code execution conditions by following all possible edges to determine if the flow of execution could ever reach parameter to when starting at parameter from. The function is_path_to(ea_t from, ea_t to) returns one (true) if there is indeed a path going from from to to. With both of these functions returning one, it can be deduced that these nodes are involved in the loop.