In the previous entry, I covered the details and motivation behind using Property Simulation for the CDT’s static analysis. Now we will move onto the implementation details. The first thing we needed was a control flow graph. Luckily the Parallel Language Development Tools sub-project of the Parallel Tools Platform already provides a control flow graph implementation for the CDT. I modified this to encapsulate control flow edges since they are crucial to the Property Simulation algorithm. Thus the modified control flow graph properly contains both vertices (blocks) and directed edges. The version of the Property Simulation algorithm I implemented is the simple intra-procedural case. This means the control flow of each function in the C/C++ project is processed separately.

The algorithm traverses the control flow graph in a breadth-first manner, populating state information in a dictionary that maps control flow edges to sets of Symbolic States (each is a set of Property States and an Execution State). Each block in the control flow graph represents a node (either an expression or statement) in the AST. To model Execution States I have created a structure that operates on Boolean formulas. The trick is to then break if conditionals into atoms that are either true or false. I chose `IVariable`

objects as the basis for these atoms. These objects are obtainable from the AST and have the advantage of maintaining a binding to their declarations. Therefore all references to a variable have the same `IVariable`

instance associated with them. So an Execution State is conjunction of `IVariable`

atoms and negations of atoms.

The performance benefit of Property Simulation comes from the way it groups Symbolic States and discards branches that cannot occur. The grouping function partitions all Symbolic States by their Property State, then the Execution States for all Symbolic States with a given Property State are joined. For example, suppose at a merge point we have two Symbolic States `[opened, xy]`

and `[opened, x!y]`

. Since they have the same Property State, we will group them into one Symbolic State. The Execution State is then a disjunction of `xy`

and `x!y`

. However, Execution States are conjunctions of variables and as such, disjunctions must be distinct Execution States. The goal is to find a minimal set of terms that imply the original Execution States. For our example, `xy + x!y`

simplifies to just `x`

since `x`

implies `xy + x!y`

. This leaves us with one Symbolic State `[opened, x]`

. Every Symbolic State must be processed for each block in the control flow graph. Thus the fewer Symbolic States we have, the faster the algorithm performs. To minimize a set of Execution States we use the Quine-McCluskey algorithm for Boolean minimization, which works much like I just demonstrated. We further simplify this by substituting in any truth assignments we have learned from assignment statements.

Property States are the extender’s customizable component. A checker tracks some temporal property that is encoded as a Finite State Machine of Property States. Property States are implemented as an abstract class where the subclass handles decisions about state transitions by examining a statement or expression. A Finite State Machine of Property States is defined via an interface with methods to get all Property States in the FSM, along with the special initial and error states. To define your own checker, use the org.eclipse.cdt.codan.core.checkers Extension Point and have your checker class extend `AbstractPropSimChecker`

. Provide your checker with the finite state machine encoding of your temporal property and the framework handles the rest.

We report errors via a callback by the algorithm with the AST node that triggered the problem, and the Execution State that causes the error condition to occur. The Execution State’s `toString`

method displays the Boolean form in a formatted String. Here is an example checking for cases where `fclose`

is called on an unopened file:

The test program is a modified excerpt from the md5sum program. Notice in one branch the file is not opened if

`binary`

is false.
Of course, this implementation is not complete. The analysis needs to be extended to consider the global control flow of the program, rather than just individual functions. We can do this by generating summaries for each function, which contain mappings from input Symbolic States to where the function takes them upon its return. We consult the summaries at function call sites to simulate executing the function.

Unstructured code is another hurdle to consider. `break/continue`

, `goto`

, `switch/case`

statements and multiple `return`

statements all cause problems in analyzing the control flow. We have an assumption that a branch block has two children and a merge block has two parents, which the mentioned constructs break. Conditional and assignment parsing also need to be improved. Currently, we can only recognize conditionals that take the form: `(`

, *var* == *literal*)`(`

, *var* != *literal*)`(`

or *var*)`(!`

. Similarly, pointer operations (dereferencing, arithmetic) cause parsing issues. Lastly, we need to be able to analyze more than one file handle at a time. This requires an implementation of a value flow graph to track file handles.*var*)

I have hosted my code in a Git repository on Fedora People. My Property Simulation implementation along with the open/close example checker are in the org.eclipse.cdt.codan.extension plug-in. My modified PLDT plug-ins are also in the repository. The stock static analysis framework is also included.

To check out the code:

git clone git://fedorapeople.org/~ebaron/codan.git

…or browse the Gitweb interface.

I have submitted a talk for EclipseCon that covers Property Simulation and the CDT. If you are interested, please comment. Thank you.