Program Dependence Graphs (PDG) for Change Impact Analysis (CIA)
When working on large software systems, even a tiny code change can have unexpected consequences. A single modified line may influence other parts of the program through hidden control or data dependencies. Understanding what might break after a change is one of the hardest parts of software maintenance, and that’s exactly the problem our project set out to solve.
For our CPSC 499 Software Analysis project, our team built a PDG-based change impact analysis tool for Java 1.4 programs. Given a line number in the source code, the tool automatically determines which other lines may be affected by a change to that line. Instead of relying on intuition or manual inspection, we let static analysis do the heavy lifting.
The core idea behind our approach is simple: if one statement controls whether another statement executes, or if it defines a variable that another statement uses, then the second statement depends on the first. We capture these relationships using a Program Dependence Graph that combines control dependencies and data dependencies. Once the PDG is built, change impact analysis becomes a graph reachability problem: starting from the changed line, we follow dependency edges to find everything that could be affected.
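To make the two kinds of dependency concrete, here is a small illustrative method written in Java 1.4-compatible syntax. It is not taken from our tool or our test programs; the class name, method name, and statement labels S1 to S6 are made up for this example, and the comments list only the main dependencies.

```java
/** Illustrative only: a tiny method annotated with its main dependencies. */
final class DependenceExample {
    static int sumBelow(int limit) {
        int total = 0;          // S1: defines total
        int i = 0;              // S2: defines i
        while (i < limit) {     // S3: uses i  -> data-dependent on S2 and S5
            total = total + i;  // S4: control-dependent on S3;
                                //     data-dependent on S1, S4 (total) and S2, S5 (i)
            i = i + 1;          // S5: control-dependent on S3;
                                //     data-dependent on S2 and S5 (loop-carried)
        }
        return total;           // S6: uses total -> data-dependent on S1 and S4
    }
}
```

Changing S2, for example, can affect S3, S4, and S5 directly, and S6 indirectly through S4, which is the kind of transitive effect a reachability query over the PDG is meant to surface.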
We implemented our tool using JavaParser configured for Java 1.4 to generate Abstract Syntax Trees (ASTs). Each statement was mapped back to its original source line so that results could be reported in a form developers actually care about: line numbers. On top of the AST, we constructed a control flow graph (CFG), then layered control and data dependency analysis to form the PDG.
Control dependency analysis determines which statements depend on branching decisions, such as conditions in if statements or loops. Because loops introduce backward control flow, we used a fixed-point iteration approach, repeatedly propagating dependencies until the results stabilized. Data dependency analysis was implemented using classic reaching definitions analysis with GEN and KILL sets, allowing us to track how variable definitions flow through the program even across branches and loops.
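The reaching definitions step can be sketched roughly as follows. This is a minimal textbook-style version, not our actual implementation: it assumes each CFG node defines at most one variable, identifies a definition by the index of the node that makes it, and takes the CFG as plain arrays. The class name `ReachingDefinitions` and the `solve` signature are hypothetical.

```java
import java.util.BitSet;

/** Sketch of reaching definitions with GEN/KILL sets and fixed-point iteration. */
final class ReachingDefinitions {
    /**
     * defs[i] is the variable defined at node i (null if none);
     * succ[i] lists the CFG successors of node i.
     * Returns IN[i]: the definitions (node indices) that reach the entry of node i.
     */
    static BitSet[] solve(String[] defs, int[][] succ) {
        int n = defs.length;
        BitSet[] gen = new BitSet[n], kill = new BitSet[n];
        BitSet[] in = new BitSet[n], out = new BitSet[n];
        for (int i = 0; i < n; i++) {
            gen[i] = new BitSet(n);
            kill[i] = new BitSet(n);
            in[i] = new BitSet(n);
            out[i] = new BitSet(n);
            if (defs[i] == null) continue;
            gen[i].set(i); // node i generates its own definition
            for (int j = 0; j < n; j++) {
                // ...and kills every other definition of the same variable
                if (j != i && defs[i].equals(defs[j])) kill[i].set(j);
            }
        }
        boolean changed = true;
        while (changed) { // repeat until no OUT set changes
            changed = false;
            for (int i = 0; i < n; i++) {
                // OUT[i] = GEN[i] union (IN[i] minus KILL[i])
                BitSet newOut = (BitSet) in[i].clone();
                newOut.andNot(kill[i]);
                newOut.or(gen[i]);
                if (!newOut.equals(out[i])) {
                    out[i] = newOut;
                    changed = true;
                }
                // IN[s] is the union of OUT over all predecessors of s
                for (int s : succ[i]) in[s].or(out[i]);
            }
        }
        return in;
    }
}
```

For a four-node loop where node 0 is `x = 1`, node 1 is the loop condition, node 2 is `x = x + 1`, and node 3 follows the loop (`defs = {"x", null, "x", null}`, `succ = {{1}, {2, 3}, {1}, {}}`), the IN set of node 3 comes out as `{0, 2}`: both definitions of `x` can reach the statement after the loop. A data dependency edge is then drawn from each reaching definition of a variable to every node that uses that variable.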
Once the PDG was complete, impact analysis was straightforward. Given a changed line, we traversed the graph to find all reachable nodes. These nodes represent the estimated set of impacted lines. The quality of this result depends entirely on the accuracy of the dependency graph, so much of our effort went into handling edge cases like compound assignments, variable declarations with initialization, and loop-carried dependencies.
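The traversal itself might look something like the sketch below. It assumes the PDG is stored as adjacency lists keyed by source line number, with an edge from a line to every line that depends on it; the class `ImpactAnalysis` and its methods are hypothetical names for illustration. It also assumes the changed line is reported only if it depends on itself through a cycle, which is a design choice rather than something the analysis dictates.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of change impact analysis as forward reachability over a PDG. */
final class ImpactAnalysis {
    // line -> lines that depend on it (control or data dependency)
    private final Map<Integer, List<Integer>> dependents = new HashMap<>();

    void addDependence(int fromLine, int toLine) {
        dependents.computeIfAbsent(fromLine, k -> new ArrayList<>()).add(toLine);
    }

    /** Returns every line reachable from changedLine, in ascending order. */
    Set<Integer> impacted(int changedLine) {
        Set<Integer> seen = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(changedLine);
        while (!work.isEmpty()) {
            int line = work.pop();
            for (int next : dependents.getOrDefault(line, Collections.<Integer>emptyList())) {
                // the seen set stops the walk from looping on cyclic dependencies
                if (seen.add(next)) work.push(next);
            }
        }
        return seen;
    }
}
```

With edges 1→2, 2→3, 2→4, and 3→4 (say, line 2 uses a value defined on line 1, line 3 is an `if` on that value, and line 4 sits inside the `if`), `impacted(1)` returns `[2, 3, 4]`, while a line with no outgoing edges returns an empty set.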
To evaluate the tool, we compared its output against manually analyzed ground truth across a variety of Java programs. The results were encouraging: 95% average precision, 91% average recall, and 85% perfect accuracy. The tool performed especially well in data-flow-heavy scenarios, capturing transitive dependencies and loop effects with very few false positives. Most errors occurred in complex control-flow situations, which is consistent with the known limits of static analysis.
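For readers unfamiliar with the metrics, precision and recall for a single test case can be computed from the predicted and ground-truth line sets as sketched below, using the standard definitions. The class name `ImpactMetrics` is hypothetical, and returning 1.0 when a set is empty is an assumption made here to avoid dividing by zero, not necessarily the convention used in our evaluation.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch: per-test-case precision and recall over sets of impacted line numbers. */
final class ImpactMetrics {
    /** Fraction of predicted lines that are truly impacted. */
    static double precision(Set<Integer> predicted, Set<Integer> truth) {
        if (predicted.isEmpty()) return 1.0; // assumption: no predictions, no false positives
        return (double) overlap(predicted, truth) / predicted.size();
    }

    /** Fraction of truly impacted lines that the tool found. */
    static double recall(Set<Integer> predicted, Set<Integer> truth) {
        if (truth.isEmpty()) return 1.0; // assumption: nothing to find, nothing missed
        return (double) overlap(predicted, truth) / truth.size();
    }

    private static int overlap(Set<Integer> a, Set<Integer> b) {
        Set<Integer> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }
}
```

If the tool reports lines {2, 3, 5} and the ground truth is {2, 3, 5, 6}, precision is 3/3 = 1.0 and recall is 3/4 = 0.75: nothing reported was wrong, but one impacted line was missed.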
Beyond the metrics, the project taught us valuable lessons. Fixed-point algorithms are powerful but subtle, especially in the presence of loops. Mapping between ASTs, CFGs, and source code is far more difficult than it initially appears. And perhaps most importantly, static analysis is always an approximation, so being explicit about its trade-offs is just as important as implementing the analysis itself.
While our tool is limited to Java 1.4 and intraprocedural analysis, the results demonstrate that Program Dependence Graphs are a strong foundation for automated change impact analysis. With further work, this approach could scale to larger codebases, support modern language features, and incorporate interprocedural dependencies, bringing developers one step closer to confidently answering the question: “If I change this line, what else might break?”