The goal of my project was to research, design and implement an algorithm for differencing two programs. The main reason for this is that the results could be very useful for determining the effects of changes to a program (impact analysis).
Initially, I read a lot of papers in the field. Then after some discussions with Mary Jean and Alex, I started focusing on the algorithm described in . The definition of cluster in the paper left out some cases I thought should be a cluster. After reading some more papers, I found that the definition of cluster I thought was appropriate because it didn't exclude cases. The definition was not for a cluster, but a hammock , and stated that a hammock was a group of nodes with all incoming edges going to one node in the hammock and all the outgoing edges going to one node not in the hammock. However, I didn't want every hammock, only ones that had at least two nodes and were minimal for a given decision node in the hammock.
After researching the problem, I started working on the design of the algorithm. I modified the general hammock algorithm in  to get an algorithm to find the minimal hammock for any decision node in a single entry, single exit graph. Then I wrote up the differencing algorithm from  in pseudo code and made sure it made sense because in  the description of some important parts of the algorithm were very short and vague or completely omitted. Next, we discussed what the structure of the graph fed to the hammock algorithm should be. Since the hammock algorithm needs a single entry single exit graph, several modifications to the CFG must be made. Mainly, anywhere there are method calls, the call site must be expanded. This means an entry node for every possible method that could be called must be inlined and any uncaught exceptions must be propagated. We called this new graph the DIG.
Once the algorithms were designed, I worked on implementation. To implement, I needed to use and work with JABA. I spent some time familiarizing myself with JABA and DOTTY, a graph drawing program. First I implemented the hammock algorithm, this includes a method that creates a dotty file that has each hammock is in a subgraph box. Then I implemented the method level differencing. The visualization I implemented for that is a dotty file with dotted edges representing the matchings between nodes. I also added labels to the matching edges describing the differences, if any. I had only been concentrating on the method level differences up to this point. I hadn't really considered or thought about whole program differencing, but after Alex and I discussed program level differencing, I quickly threw together class and program level differencing. Both still have some work that needs done, for example currently, both only categorize components as added, removed, or perfectly matched. Then when the analysis is run on the perfectly matched components, the component could be changed to modified. Then I very quickly partially implemented the DIG, it still needs exception propagation information.
There are several areas of future work: continued implementation, visualization,
optimization, and impact analysis. The dig needs to have exception
propagation implemented. The program and class level differencing
need better matchings, and difference descriptions. Class level differencing
needs to create the method differences in the order specified by the call
graph. The dotty output might need to be modified because sometimes
the subgraphs don't work right, even though I can't find an error in the
file. For visualization, it would be nice to have some method level
output for tarantella where lines were color coded based on perfect match,
matched mod, added or deleted. It would also be nice to be able to
view everything at an upper level and click to view more internal details.
The dotty visualization output could be improved by having the matching
edges be different colors, perhaps green for perfect match and red for
matched mod, and by having the added and deleted nodes distinguishable,
perhaps outlining added and deleted nodes in red. Some ideas for
optimization are hammocks on demand, a cache for globally qualified class
name look ups, and making dejavoo only search between two nodes (we could
give it the actual nonplace holder hammock header and exit). For
impact analysis, back propagation needs fixed for recursion (other than
that it is done), and it would be nice to have a mapping from a type difference
back to the declaration line number.
 R. Komondoor and S. Horwitz. Semantic-Preserving Procedure Extraction. Symposium on Principles of Programming Languages. (University of Wisconsin-Madison, 2000). 155-169.
 J. Laski and W. Szermer. Identification of Program Modifications and its Applications in Software Maintenance. Proceedings of the Conference on Software Maintenance - 1992. (November 1992). 282-290.