Home | DMP | Project | Journal | Contact Relevant Java Classes | Delete Call Site | Template Call Graph | Research | Final Report * Code best viewed with Internet Explorer. *
OVERVIEW The weeks I have spent as a DMP participant are among the most challenging and rewarding of my life. The experience was invaluable in many ways. Even if I had worked at Bucknell over the summer, I would never have had the opportunity to be involved with meaningful research that will impact the future of computer science, to sample graduate life, and to form relationships with other women who share my interests and want to see me succeed. I learned a great deal about programming, people, research, and myself. Never having even considered graduate school before, I am now almost certain that I want to pursue a master's degree in computer science. And I feel that I have increased in both patience and skill in addition to overcoming the strangeness of a completely new atmosphere. The research environment is very different from my undergraduate computer science courses. In both language design and programming classes, we were given very specific problems to solve - problems with at least one "correct" answer - and were provided with guidance and direction every step of the way. This summer, however, my goals were much more abstract; I knew what I had to do, but no one told me how to do it. Professor Souter didn't have all the answers, and neither did anyone else. For the first time, my problem solving skills were truly tested. As difficult as it was to initially adjust to the lack of structure, I definitely benefited from the experience. I am more comfortable now with asking questions other than, "Where's the error in my program?" and with approaching a scenario from different angles. I have also learned to adapt existing code and algorithms to my purpose and to know when some segment of code - no matter how elegant or effective - must simply be scrapped. In the real world, computer scientists face more complicated and varied problems than those found an introductory college course; the most important skill a student can develop is this sort of critical thinking, accompanied by patience, perseverance, and creativity. While formulas may fail and compilers may crash, these traits will remain with me throughout my career, applying not only to a few specific situations but to every challenge I will face.
The overall goal of my research was to implement and conduct tests to experimentally evaluate the effectiveness of Amie's delete_call_site algorithm, an incremental call graph reanalysis algorithm for object-oriented languages. Even at the beginning of the summer, I was warned that I might not be able to complete the many tasks set before me. I was overwhelmed at first, having only a few programming and language design courses and several months of industrial Visual Basic application creation under my belt. The list of tasks that greeted me upon arrival was as follows:
The project was a daunting one, but I threw myself into research and development. Many discoveries, frustrations, and lessons lay ahead...
In terms of concrete results, I did not accomplish nearly as much as I had hoped. The lasting problem of computing accurate add types - the set of types that, after the deletion of a call site, reach beyond that call site - caused my work to grind to a near standstill for weeks. No solution presented itself, despite hours of additional research and numerous trial runs. I have to admit that at times I was extremely frustrated and rather discouraged. Luckily, Amie was very supportive and encouraged me to focus on what I had accomplished and to continue to approach the problem from new angles. What have I done this summer? And, more importantly, what have I learned? I began by learning Java, with the help of Thinking in Java (2nd Edition), by Bruce Eckel, and the Sun Java Tutorial. I had never seen the language before, but it was not difficult, especially since I have a background in C++. I was lucky; object-oriented programming is my favorite paradigm, as it most closely parallels the way I think. Java seems to be closer to the object-oriented ideal than C++ and is more consistent in syntax. Other advantages of Java are its portability and automatic memory management and garbage collection. (For more information, see Perry Smith Enterprises: C++ vs. Java.) Overall, I was very satisfied with my new language, but I had many encounters with one of the pitfalls of using Java - its slow execution speed. Especially with a large, complicated program like the template call graph implementation, the tradeoffs between speed and ease of expression are evident. The next step was to download and explore the FLEX compiler. According to the FLEX website, the original aim of the project was to translate "Java bytecode files into a class-oriented intermediate representation which is intended to be easier to analyze and manipulate than bytecode assembly language. The intermediate representation is control-flow-graph structured, with all control flow explicit." Since its conception, FLEX has grown to include several different intermediate representations (IR). The most relevant to my work is the Quad IR, a high-level, typed, quadruple-based IR. Amie and I were not using FLEX as a compiler; instead, we took advantage of its rich array of classes and methods in writing the Delete_Call_Site implementation. FLEX provided basic implementations of both the traditional program call graph and the template call graph, which is known as a meta call graph. Basing my work on PAMain, a class designed for top-level program analysis and testing, I produced an executable that constructed a meta call graph and output various statistics - depending on user-entered command line options - to a file. In the following weeks, I added calls to my evolving delete_call_site method and further analysis statistics to this main background program. It took some time to understand and adapt the many aspects of PAMain; documentation was scarce, and I was an inexperienced programmer who had never before seen such complex call chains. I learned most effectively by rewriting the relevant portions of the class in my own coding style, a time consuming but helpful process that I used frequently in developing my data structures and methods. Working with someone else's code was one of the primary obstacles - in terms of both syntax and semantics - I faced. During my two years at Bucknell, I had developed a personal coding style, and the organization of the FLEX classes, by contrast, was rather hard to follow. The compiler had several authors, and, although they followed standard naming conventions, each had his or her own way of approaching problems. The quality of documentation varied significantly from class to class, and information was often passed through instance variables that changed throughout execution rather than through parameters. Local variables and instance variables tended to have the same names, which spawned a great deal of confusion at first glance. Luckily, clearing this hurdle required only time and patience; I adapted my coding style to Java and studied the meaning of each method as I reconstructed it. To ensure that my main program produced valid information, I compared its output to that of the original PAMain. Finding no discrepancies, I moved on to the construction of the template call graph. FLEX provided a MetaCallGraph interface, an abstract class, and my focus - a full power implementation, MetaCallGraphImpl. As with PAMain, I began by searching for documentation, tracing the execution, and rewriting each method. MetaCallGraphImpl was a complex class, with several inherited accessor methods and a constructor with more than seven levels of nested calls. (For extended information on the operation of the constructor, see my page on the implementation.) While gaining an understanding of the code, I came face-to-face with many auxiliary classes that were indispensable to my project. Among them were classes representing calls sites, general statements within a method, reaching definitions, and class types. After discovering that MetaCallGraphImpl lacked the necessary methods to simplify the implementation of the delete_call_site algorithm, I decided to write my own class, similar to MetaCallGraphImpl but with some modified functionality and additional methods. Because of the changes I made, I was unable to simply inherit from MetaCallGraphImpl and instead chose to extend the abstract template call graph class, MetaCallGraphAbstr. This proved to be an effective choice in terms of abstraction and readability. The most important of the MetaCallGraphImpl methods that I overrode, added, or significantly altered are as follows:
Throughout my first few weeks - filled with exploring Java and FLEX and ensuring that the appropriate data structures were in place prior to the invocation of delete_call_site - I was also studying Amie's short paper ("Incremental Call Graph Reanalysis for Object-Oriented Software Maintenance") and dissertation ("Context-Driven Testing of Object-Oriented Systems") on the template call graph, incremental reanalysis, and the delete_call_site algorithm. I came to understand how the Cartesian product algorithm (CPA) creates various templates - or meta-methods - for each method in the target program. A template is created for each element of the Cartesian product of the concrete type sets of the receiver and arguments, yielding a monomorphic, exact type for each input to the specified method. Call graph construction using templates and CPA eliminates many unrealizable call chains and increases precision. One reason is that templates are constructed only for those type combinations (for receiver and arguments) that actually occur in call sites in the target program. In addition, the type information provided by templates is used to curb undesired type merging and to avoid redundant analysis through the sharing of cases. A better and more complete explanation of templates, CPA, and their current and potential uses in incremental analysis is available at my research subsite; be sure to read through these pages. delete_call_site responds to the fact that call graphs for object-oriented languages can be affected by any program edit that directly changes the calling relationships between methods or impacts type information reaching call sites. Deleting (or inserting) a call site in the target program is one edit that may directly alter the calling relationships. Because the change to the code is so small, complete reanalysis of the program and reconstruction of the call graph would be a waste of time and system resources. Amie's algorithm aims to avoid unnecessary analysis, performing only an amount of work proportional to the impact of the deletion of a call site while maintaining soundness and precision in the modified call graph. I wrote a quick walkthrough of the algorithm and discussed it with Amie to ensure that, minus the implementation details, I understood what the algorithm did. I determined the necessary Java data structures and developed a framework for the implementation. The final version as of the end of July is included at the end of this document (with the aspects I have not yet successfully implemented shown as comments). Both delete_call_site and propagate_update are non-static methods of my meta call graph implementation class. The bulk of my outside research - as opposed to coding - occurred somewhat late in the game, but it was rewarding nonetheless. I began by seeking clarification on the relationship between dependency detection and type inference and ended in reading and reporting on Ole Agesen's PhD thesis, Concrete Type Inference: Delivering Object-Oriented Applications. It would take many words to elaborate on my findings and their relevance to the current project, but it will suffice to say that for the first time, I truly understood the workings of the template call graph constructor; I learned things I hadn't even realized I didn't know. I was able to see computations in the light of CPA and better understand the motivation behind the call sequences and organization of analysis passes. My research summary should be helpful to anyone working on this project in the future.
The testing program analyzes each call site in the target program, calling the currently implemented portions of delete_call_site. Progress is stuck on the computation of add_types_after in delete_call_site. I have tried various approaches, including taking the union of the definitions that reach each predecessor of the deleted call site. The problem may not be in my approach but rather in the data structures resulting from meta call graph construction. When I began, the only entities I had to work with were the split relation (template repository), a map from each meta-method to its callees, and a Map<MetaMethod, Map<CALL, MetaMethod[]>> that traces the callees of a meta-method at each of its call sites. Type information was not maintained after exit from the constructor, which caused many problems. I tried keeping track of the MCGExactTemps for each method, but types were erased between analysis of one specialized meta-method and the next; that road was a dead end. If I had more time, I would attempt to add the set of exact temps, along with their type sets, in each statement (Quad or HCodeElement, that is) to the information stored for a meta-method. To reduce the amount of data stored, we could maintain only the exact temps for the retval temp of each call site, as found in the HCodeElements providing reaching definitions for the call site's predecessor. There are many possibilities, and I may simply have missed something. Storing the aforementioned relationships would probably take place in analyze_meta_method, between compute_types and analyze_calls or after analyze_calls. Types reaching the return of the method input to the algorithm can be similarly stored, using return extraction, reaching definitions, and exact temps. If add_types_after and the worklist containing all call sites in T influenced by the deleted reaching types at CS (in propagate_update) can be computed accurately, there should be no major obstacles to a functional implementation. Once the algorithm is successfully implemented, tests must be run to compare its accuracy and speed with the results of exhaustive reanalysis. It will be simple to add statistical output at any point in MetaCallGraphImpl or DelCallSite if necessary. If delete_call_site proves to be feasible and presents a clear advantage over exhaustive reconstruction for the same program edit, research can move to developing an algorithm and implementation for updating the program call graph upon insertion of a call site. Edges from each meta-method associated with the modified method to its new callees must be added to the template call graph. Callee templates can be generated by taking the Cartesian product of the sets of reaching types for each argument of the new call site; if a template already exists, it can be reused. The return type of the added call site must be analyzed to determine how it affects subsequent call sites. An algorithm for the computation of added and deleted reaching types must be developed, and any differences in propagation must be investigated. Following implementation of the updates following both deletion and insertion of a call site, other edits that can directly change method calling relationships can be handled as subsets or combinations of these cases. Replacing a call site is simply a deletion followed by an insertion, and changes to actual parameters or receivers or method names can be handled in the context of call site replacement. Another direction for future research regarding the incremental approach is extending the algorithm to handle inconsistent call graphs like those found during change propagation. The current work assumes that the change has been fully completed - that the target program is consistent and can be compiled without errors both before and after, say, the deletion of a call site. During change propagation, however, one class is updated at a time; the program may thus become inconsistent and not compile until the change has been propagated through all involved classes. The call graph will also be inconsistent during this intermediate stage, but the programmer may want to review it, with inconsistent dependencies indicated by some sort of flag. Such an extension would be very helpful to programmers making changes to large applications, especially when each class is compiled separately.
The DMP experience itself, outside the opportunity to improve my computing knowledge and abilities, was unparalleled. It was wonderful to be part of a lab community in which men and women worked side by side. Each person added unique background and expertise to the mix, and no one conformed to the "computer geek" stereotype. Amie and Lori were superb mentors, always ready to answer questions, help Emily and me to find additional resources, and take us out for lunch or dinner. Our conversations didn't always focus on work! Guys and girls alike enjoyed meals at Peace a Pizza and Dairy Queen. All in all, our lab group embarked upon many memorable adventures. From observing the graduate students and professors, I am certain that computer science will remain a dynamic and interesting field for years to come. I would like to thank the CRA-W for sponsoring the DMP and providing young women with such exciting opportunities. I hope the program will only continue to grow. I have learned a new programming language (or two), discovered more about compiler infrastructure than I ever thought I would want to know, and made significant progress on implementing an algorithm for complicated data structures. I have become more patient and am anticipating future computer science internships and projects, eager to look at graduate schools during my next two years at Bucknell. Special thanks to Amie, Emily, and Lori for making this summer the best of my life.
©2002, Katie Heise |