Week 1   (May 28-May 31)
- Orientation
The first day included a Smith and Campus Tour, lunch with the rest of
the research group, the setup of my EECIS account, activation of my lab
key, and an introduction to the project(s) I would be working on for the
summer.  
- My First Project
My first project revolves around the creation of a Frame Document Tree,
or FDT. The FDT is a hierarchical structure that represents the
relationships between frames and documents (or pages) in a Web
application. Sreedevi Sampath has already come up with the algorithm,
and my job is to implement the algorithm and find any holes. Parsing of
the data and the generation of nodes and edges will be handled by Perl,
and creation of the data structure and printing of the FDT will be
handled by either C++ or Java. A full description of my project can be
found at project.html.
 
- The Basics
It would be impossible to embark on a project without the proper
tools...  
- The paper
My first item of background reading
consisted of the article Sreedevi had drafted detailing
the idea behind the algorithm and defining its elements.
  - Web Applications
During orientation, Sreedevi
had given me a number of articles concerning the differences
between Web sites and Web applications and the importance of
testing these applications. In addition, she included another
piece on software testing in general. Through these articles I
was able to understand the ideas behind the Web application
environment, and re-read the paper with greater comprehension.
  - JavaScript
Although I have worked with HTML for
years, I have very little experience with JavaScript (by "little
experience" I mean taking other people's code a few times and
modifying it to fit my needs). At Sreedevi's suggestion, I
started my learning with the book JavaScript, by Don Gosselin.
I skimmed the book as a whole, but paid particular attention to
the chapters on windows and frames. However, I needed a more
specific resource to fully understand the JavaScript shopping
cart application that I planned to use in generating my first
FDT.
  - Perl
I have limited experience with Perl from
writing CGIs a couple of years ago. However, I feel that to
take on a project of this magnitude I would be better off
getting a firm footing in Perl before proceeding. Otherwise I
risk implementing the FDT in Perl as if I were writing
Java. And, as anyone who has programmed in Perl knows, the Perl
way and the Java way are completely different!
- Understanding the FDT Algorithm
With Sreedevi's and Lori's help I gained an understanding of the
algorithm and its basic phases. As a result, by Friday afternoon I was
able to step through the algorithm and produce an FDT of my very own
(with only a couple of extraneous nodes). At this point I felt I still
wasn't clear enough on the algorithm to implement it yet, and planned to
analyze the algorithm in depth the following Monday.  
Week 2   (June 3-June 7)
- Stepping through the Algorithm
I encountered some issues while studying the FDT algorithm. For
example, the algorithm doesn't handle URLs to external sites or URLs
without targets. Further, the algorithm doesn't yet handle
JavaScript, just HTML. At Lori's suggestion, I created a detailed
open issues file, and have been appending
to it any time I encounter something the algorithm doesn't handle or
that will require special care in implementation. For example, JavaScript
has a number of complex ways to reference frames and change the current
document. I searched the internet and found a number of scripts
containing JavaScript code that may not be resolvable under the
current system, if ever, or that will require further study (possibly beyond
the scope of my summer here) before those features can be implemented fully.
In addition, I began amending the FDT algorithm with my own suggestions,
initialed in the margins. As a consequence, I was able to create my
very own LaTeX file for the first time.
 
- Tim Toady
In reading through the Camel book (O'Reilly's definitive guide,
Programming Perl), the most important message I've gleaned is TMTOWTDI,
pronounced "Tim Toady," meaning There's More Than One Way To Do It.
Aside from that, I have gone through most of the basic material on
operators, statements, declarations, and scope.
Next week I'll start on pattern matching.
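As a taste of TMTOWTDI, here are three equivalent ways to print the
names in a hash of frames, one per line -- a throwaway sketch with a
hypothetical %frames hash, not project code:

    my %frames = (banner => 1, menu => 1, body => 1);

    print "$_\n" for sort keys %frames;                       # statement modifier
    foreach my $name (sort keys %frames) { print "$name\n" }  # explicit loop
    print map { "$_\n" } sort keys %frames;                   # map over the list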
 
- JavaScript, HTML's more evil friend
It's what nobody tells you about JavaScript that can really throw a
wrench into things! I went to Netscape's web site for the rest of the gory
details on JavaScript so I could begin to appreciate the intricacies
that make my implementation of the FDT so difficult. At this
point I plan on implementing the FDT based on HTML alone, without a thought
to JavaScript. Later I can integrate a JavaScript module that
will at least be capable of handling the basic JavaScript issues.
 
- Trees and Leaves
I found a number of helpful frame examples on the internet that I will
use to further my understanding of LeafLists. In addition, I generated a
complex sample web site to test the robustness of the FDT algorithm,
and I encountered a few issues. Foremost was the introduction of the
tree's "elbow" as a result of nested framesets. I plan on fully exploring
the elbow with more contrived frame examples next week.
 
- To Perl or not to Perl...
My project outline suggests that I extract the tokens from the documents and
generate the nodes and edges using Perl, and then use a language like
Java to build a data structure and print out the data. However, I was
having trouble discerning the fuzzy line where the use of Perl
should stop and Java begin. Because of the difficulty of
finding the nodes and edges without building at least a semblance of a
tree, I have decided to complete the project in one language. Which
language, I'm not sure yet. Perl is an elegant language filled with
little programming nuances and idioms that I've never encountered
before. Although I have programmed some simple CGIs in Perl previously, I
don't have a lot of experience with the language. I have more
experience with Java; however, Java can be a bit bulky. With the release
of Java 1.4 and its new regular-expression package, Java and Perl
are both feasible languages in which to implement the FDT; the
difference is that Java may take a few hundred more lines of code to
do so. I know what Java has to offer, but I feel that Perl may be the
language best suited for the job. Either way, I will continue to
analyze the algorithm and keep reading up on Perl until I get an idea of
the best way to implement the FDT.
 
Week 3   (June 10-June 14)
- The FrameSet Frontier
In studying the FDT algorithm and attempting to follow it for other
examples, I realized that nested framesets are a problem. Should
framesets nested within the same page be flattened to the same level, or
should the parent frame point to an "elbow" that splits into the nested
framesets as the algorithm suggests? A consequence of the elbow is the
creation of an additional frame and document node that do not actually
exist (there is no HTML document that the frame/document nodes
represent). Further, resolution of the elbow issue is critical, so that
the frame document tree parallels the internal frame tree that
JavaScript uses within the browser. Without an accurate FDT, we will
not be able to resolve JavaScript frame references such as the
following:
- top.frames[0].frames[2].location = "HelloWorld.html"
As a result, I explored how the JavaScript frame numbering scheme
changes when frames are nested within the same document versus nested
across several documents. I generated examples of my results so that
others working on the project can see that JavaScript treats these
scenarios very differently.
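To illustrate the difference, consider a sketch with hypothetical file
and frame names. When the nesting occurs within a single document, the
browser flattens the frames into one collection:

    <frameset rows="20%,80%">
      <frame src="banner.html" name="banner">
      <frameset cols="30%,70%">
        <frame src="menu.html" name="menu">
        <frame src="body.html" name="body">
      </frameset>
    </frameset>

Here top.frames[0], top.frames[1], and top.frames[2] refer to banner,
menu, and body; the inner frameset adds no level to the numbering. But
if the inner frameset instead lives in its own document loaded into the
second frame, that frame's window introduces a level of indirection,
and the same pages are reached as top.frames[1].frames[0] and
top.frames[1].frames[1].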
 
- A Question of Trunks
While going through my contrived frame example, I found
that it can be difficult to determine the current loading frame of a document.
This is an issue when URLs don't contain target
frames. In such cases the URL is loaded into the current frame; however,
the disconnected sub trees of a budding FDT may not yet contain the
information necessary to discover which frame a document is
loaded into. The obscurity of the loading frame is due to the fact that
the FDT sub trees are headed by document nodes, not frame nodes. Using document nodes
as the basic building block for all the sub trees is a necessity,
but the issue of finding the current load frame can be remedied. I have
added a TargetList data field to the FDT algorithm for this purpose. For a
given document node d, the TargetList contains a list of pages, each of which
contains a URL pointing to d without a target frame, where the
current frame was not known during phase 2 and the subsequent grafting of
sub trees using the LeafLists. The lack of a load edge here or there
does not greatly affect the FDT, and resolution of the target lists
is both suitable and necessary immediately following phase 2. For each
element of the TargetList, the current frame node is resolved.
A load edge is then drawn from the current frame
node to d (the node whose TargetList is currently being evaluated).
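In Perl, the resolution step might look something like this -- a rough
sketch with hypothetical, simplified structures (the real TargetList,
described in Week 4, carries more information per entry):

    # %TargetList:    document node d => pages whose URLs point to d
    #                 without a target frame
    # %current_frame: page => the frame node that page is loaded into,
    #                 known once the sub trees have been grafted
    my (%TargetList, %current_frame, @load_edges);

    foreach my $d (keys %TargetList) {
        foreach my $page (@{ $TargetList{$d} }) {
            my $frame = $current_frame{$page};
            next unless defined $frame;        # a missing edge is tolerable
            push @load_edges, [ $frame, $d ];  # load edge: current frame -> d
        }
    }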
 
- Effective Affectability
Sreedevi's paper states that "each frame node is distinct for a given
FDT". In terms of implementation, this is correct. However, in a valid
Web application more than one frame node with the same name may exist
(see my leaflist example). The two "events"
frames in my example are never within the same window, and therefore do
not cause trouble for pages that link to them. To formalize this idea, I
came up with the simple Affectable Frames
Algorithm, which I integrated into the FDT algorithm as a function.
To summarize the algorithm (a small Perl sketch follows this section):
- For a given document node, traverse the tree from the node to the root,
marking each node along the way.
- For each document on the path to root, explore the sub frames,
marking each frame node. Do not explore any document sub trees of
documents that are children of a frame node on the path to root.
- Return the list of marked frame nodes; these are the affectable
frames for a given document node.
As a consequence, the algorithm also determines frames that could be
concurrently displayed in a given browser window. This fact makes it
possible for a web application testing suite to also test the validity
of target frames within document anchor tags.
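The traversal, as a minimal Perl sketch under simplifying assumptions:
the structures %parent_frame, %parent_doc, %child_frames, and
%frame_docs are hypothetical stand-ins for the real %ND/%NF structures
described in Week 4, and the sketch assumes one parent frame per
document and an acyclic tree.

    our (%parent_frame, %parent_doc, %child_frames, %frame_docs);

    sub affectable_frames {
        my ($doc) = @_;
        my (%marked, %on_path, @path);

        # Step 1: walk from $doc up to the root, marking each frame passed.
        my $d = $doc;
        while (defined $d) {
            push @path, $d;
            $on_path{$d} = 1;
            my $f = $parent_frame{$d} or last;   # no parent: reached the root
            $marked{$f} = 1;
            $d = $parent_doc{$f};
        }

        # Step 2: explore the sub frames of every document on the path,
        # skipping sub trees of documents that lie on the path themselves.
        mark_subframes($_, \%on_path, \%marked) for @path;

        return keys %marked;    # the affectable frames for $doc
    }

    sub mark_subframes {
        my ($doc, $on_path, $marked) = @_;
        foreach my $f (@{ $child_frames{$doc} || [] }) {
            $marked->{$f} = 1;
            foreach my $d (@{ $frame_docs{$f} || [] }) {
                mark_subframes($d, $on_path, $marked)
                    unless $on_path->{$d};
            }
        }
    }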
 
- Jumping through the Algorithm
I have made a number of changes to the FDT algorithm. In summary:
- Elbow Problem Solved
To correct the elbow issue
(framesets that are nested within the same document) only a
small portion of the algorithm had to be removed: the portion of
the create_subgraph_frameset function that dealt with framesets.
  - TargetList Resolved
I modified phase 2 of the
algorithm to include the creation of a TargetList, and the
subsequent resolution of this list in phase 3.
  - Affectable Frames Installed
After developing the
Affectable Frames function, I integrated it with the FDT
algorithm by creating a new portion of phase 3. At this point
the algorithm creates affectable frame lists for all documents
in question (as decided by a revised section of phase 2) and
then uses the lists to resolve links that point to different
frames with the same name.
In addition, I have previously made the following major changes to phase
2:
- Ability to handle external URLs.
- Ability to handle URLs without explicit targets (hence the
creation of the TargetList).
 
Week 4   (June 17-June 21)
- The Divine Data Structure
Well, perhaps 'divine' is a little excessive. But after three
weeks of tiptoeing around the issue I have finally settled on a language
and a data structure with which to implement the FDT. I will implement
the algorithm in Perl, using hashes as the building block for all my
data structures.
 
- Hashes of Hashes of Hashes: Data structures in Perl
I cooked up some exciting structures especially for the FDT. The
data will be stored in the following manner (a small hand-built example
follows this list):
- %ND
All pertinent information on the document nodes is stored in one
large hash, indexed by document name. Each document name points
to an array of parent frames, an array of child frames, an array
of arrays holding all the node's paths to root, and a hash of
affectable frames whose keys are frame names and whose values
are simply 1.
 
- %NF
The frame nodes are less involved than the document nodes, and
are stored in a hash keyed by frame name and pointing to a
scalar parent document and an array of child documents.
 
- %TargetList
The target list is stored in a hash keyed by the page that
needs to be linked. Each page points to scalars naming the document
whose HTML contained the link and that document's parent.
 
- %dupes
A list of frames that share the same name is kept. It is stored
in a hash indexed by the duplicated frame name, pointing to an
array of the renamed frame nodes (when duplicates are found, the
frames are renamed to ParentDocName_DuplicateFrameName).
 
- %R
The R list, used in conjunction with the affectable frames
function, keeps track of links to pages that are on the dupes
list. It is stored as a hash of arrays of hashes: a hash
keyed by documents containing links that are unresolvable during
phase 2, whose values are arrays of the links in each document,
with each link stored as a hash of two elements, the reference
and the target frame.
 
- %NE
When implemented, the external document node information will be
stored in a hash similar to %ND.
To avoid duplicate parent and child frames and documents, I plan
on changing these arrays to hashes, similar to how the affectable frames
hashes were implemented.
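To make the list above concrete, here is a hand-built sketch of %NF and
%ND for a hypothetical two-frame site; the field names are illustrative
stand-ins, not the tool's actual keys:

    my %NF = (
        menu => { parent_doc => 'index.html', child_docs => ['menu.html'] },
        body => { parent_doc => 'index.html', child_docs => ['body.html'] },
    );

    my %ND = (
        'index.html' => {
            parent_frames     => [],               # loaded into the top frame
            child_frames      => ['menu', 'body'],
            ptrs              => [ [] ],           # index.html is the root
            affectable_frames => { menu => 1, body => 1 },
        },
        'menu.html' => {
            parent_frames     => ['menu'],
            child_frames      => [],
            ptrs              => [ ['index.html'] ],
            affectable_frames => { menu => 1, body => 1 },
        },
    );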
 
- Phase 1: Implemented!
Currently phase 1 of the algorithm is working. The program reads in
HTML files, removes comments, and creates the frame nodes accordingly
(a sketch of this step follows the list below).
Aside from phases 2 & 3, there are a number of things that still need to
be implemented, including:
- The program needs to be broken into separate modules, as opposed to
the one huge driver program it is now.
- The contents of the HTML documents are currently stored in
memory; files should be used instead to make the program scalable.
- External nodes are not yet recognized; they need to be treated
differently from internal document nodes.
- The complete path for file names should be used to allow for
sub folders within a web site.
- Duplicate nodes in the child and parent arrays should be deleted.
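The flavor of the phase 1 scan, as a much-simplified sketch (real
parsing has to cope with far messier HTML; the file and structure
names here are hypothetical):

    our %NF;    # frame nodes, keyed by frame name

    sub phase1_scan {
        my ($file) = @_;
        open my $fh, '<', $file or die "cannot open $file: $!";
        my $html = do { local $/; <$fh> };   # slurp the whole document
        close $fh;

        $html =~ s/<!--.*?-->//gs;           # strip HTML comments first

        # Record a frame node for every <frame> tag with a name attribute.
        while ($html =~ /<frame\b([^>]*)>/gi) {
            my $attrs  = $1;
            my ($name) = $attrs =~ /\bname\s*=\s*["']?([^"'\s>]+)/i;
            my ($src)  = $attrs =~ /\bsrc\s*=\s*["']?([^"'\s>]+)/i;
            next unless defined $name;
            $NF{$name} = { parent_doc => $file, src => $src };
        }
    }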
 
Week 5   (June 24-June 28)
- Phase 2: Complete
To start off my fifth week, I completed most of phase 2 by linking all
URLs and creating the Target Lists and R Lists as necessary. For the
time being I am ignoring the external document nodes and treating them as
regular document nodes.
 
- Root of the Problem
The next step was to implement the PTR, or path to root, for each
document node. Because certain document nodes have more than
one parent, the PTR was difficult to implement: the
program has to keep iterating through each parent, creating a new list
for each new PTR that results.
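A sketch of the branching idea, with hypothetical, simplified
structures (%parent_frames maps a document to its parent frames,
%parent_doc maps a frame to the document declaring it):

    our (%parent_frames, %parent_doc);

    sub find_ptrs {
        my ($doc) = @_;
        my (@done, @work);
        push @work, [ $doc ];                    # partial paths, node first
        while (my $path = shift @work) {
            my @frames = @{ $parent_frames{ $path->[-1] } || [] };
            if (!@frames) {                      # no parents: reached the root
                push @done, $path;
                next;
            }
            # Branch once per parent frame; each branch grows its own PTR.
            # (Cycle detection is added later -- see Week 7.)
            push @work, [ @$path, $parent_doc{$_} ] for @frames;
        }
        return @done;
    }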
 
- Take the A Frame
Not quite as fun as the A Train, but close. Once the necessary PTRs
had been constructed, I implemented a function that creates an
affectable frames list for as many nodes as necessary. Next I
programmed the bulk of Phase 3: resolving the target and R lists.
In theory, affectable frames lists can be generated from the frame
sub trees of child nodes; however, the program currently cycles
through each PTR and repeats the same tree traversal numerous times.
Time permitting, I hope to make this portion of the program more
efficient.
 
- FDT: First Draft of the Tree
Before the end of the week I had a working draft of all phases of the
FDT algorithm. For simple sites based on HTML alone, this draft will
function properly. However, there are a number of touch-ups and
improvements that need to be considered when I return for Week 6:
- Thoroughly test the last phase and ensure robustness.
- Delete duplicate entries in the parent and child "arrays" (the simple
fix is to recast the arrays as hashes with insignificant values).
- Currently the program is contained in one enormous file. This file
needs to be broken into smaller module files and one driver program.
- The contents of the documents that are analyzed are currently stored
in memory; these should be stored in temporary data files to ensure
expandability.
- As of right now the program only uses document names to reference
document nodes. To account for more complex sites that store documents
in a directory hierarchy as well as for links that use full URLs even
for local pages, the full path name should be used.
- The concept of external nodes still needs to be implemented.
- Cycles that happen to occur need to be broken and reported.
- Basic JavaScript commands should be implemented.
 
Week 6   (July 8-July 12)
- Referees and Referents: The Process of Modularization
As one enormous file, my implementation of the FDT algorithm
contained a number of very large data structures. When breaking that
gargantuan file down into a driver program and several modules, it would
have been inefficient and resource-expensive to pass all these
data structures between the functions that needed to use, and sometimes
modify, them. As a result, I resorted to references. By passing
references to my data structures around, I only need to send small
scalar variables holding the addresses of my data. So when a
function requires seven different shared data structures, I don't break
the memory bank.
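The pattern, sketched with hypothetical names: the caller passes
references, and the callee dereferences them to read or update the
shared structures in place.

    my (%ND, %NF, %TargetList);          # the large shared structures

    sub resolve_targets {
        my ($nd, $nf, $targets) = @_;    # three small scalars arrive here
        foreach my $page (keys %$targets) {
            # Updates through the reference are visible to the caller.
            $nd->{$page}{resolved} = 1 if exists $nd->{$page};
        }
    }

    resolve_targets(\%ND, \%NF, \%TargetList);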
 
- Hashes: Perl's Built-in Data Type for Sets
In Perl, sets are essentially hashes with the elements stored as keys and
insignificant values. This is precisely what I needed for my sets of
child documents and parent frames. Originally I had these two sets
implemented as arrays; unfortunately, duplicates would occasionally get
added, and arrays do not automatically check for repeats.
After implementing the affectable frames list I found it was
simpler to use a hash than to manually check the array for
duplicate entries. This change created difficulties in how some
phases were implemented, specifically Phase 2 and the PTR creation,
and I was forced to essentially rewrite both functions. As a result, I made
both subroutines much more precise and, hopefully, easier to understand. In
addition, the fact that these functions had not behaved as I had hoped
suggested minor logical flaws, which the rewrite subsequently fixed.
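The idiom in miniature (hypothetical element names):

    my %child_docs;                               # a set of document names
    $child_docs{'menu.html'} = 1;
    $child_docs{'menu.html'} = 1;                 # same key: no duplicate
    print "member\n" if exists $child_docs{'menu.html'};
    my @members = sort keys %child_docs;          # back to a list when needed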
 
- An Equation for Iteration
When implementing the Target List, I realized that if one of the pages
being linked in the target list is a frames page, with its own sub tree
that needs an R list and resolution with a PTR, I may need to
repeat Phase 3 a number of times. However, I feel confident that with an
example to guide me I will be able to determine an equation for the
number of times Phase 3 should be run.
 
- Tying up loose ends
This week I concentrated my efforts on adding functionality and
reliability to my program. I was able to implement a few items on my
list, including:
- Thoroughly tested the last phase and ensured robustness.
- Deleted duplicate entries in parent and child "arrays" by changing
the arrays into keys of hashes with insignificant values.
- Broke the program down into modular pieces and one driver program.
- The contents of the documents that are analyzed are now stored
in temporary data files to ensure expandability.
 
Week 7   (July 15-July 19)
- Knotting up tied ends
This week I was able to knock a couple of items off my to-do list (and
add a couple too). My accomplishments this week include:
- Constructed a basic README for the tool that included information
about the modules and how to run the program.
- Full path names can be used if necessary.
- External nodes implemented.
- Found two limitations thus far:
- The tool can only analyze one directory at a time.
- No frame may be named "top".
- Iterative minds loop alike
My hypothesis for the number of iterations required for Phase 3:
if there is a document on the R list that is also a document with a link
in the target list, one more iteration is needed. If there is
more than one such document, but they reside at the same level of the FDT,
no further iterations should be needed. However, to be expedient, I
am having the program iterate through Phase 3 one additional time for
each document on the R list that is also a document with a link in the
Target List.
 
- Further Machinations with the PTR (Path to Root)
The bulk of my time this week was devoted to testing and improving Phase
3, specifically the find_ptr function. Most importantly, I added
cycle detection and removal to the function. While adding nodes to
the PTR, the program checks to make sure there are no duplicate document
nodes. If there are, the cycle is indicated to the user and the offending
edge that caused the cycle is removed.
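The core of the check, sketched as a hypothetical helper (in the real
find_ptr this happens as each partial path is extended):

    sub extend_path {
        my ($path, $next_doc) = @_;
        my %seen = map { $_ => 1 } @$path;
        if ($seen{$next_doc}) {
            # Duplicate document node: a cycle. Report it and drop the
            # offending edge rather than extending the path.
            warn "cycle detected: @$path -> $next_doc\n";
            return undef;
        }
        return [ @$path, $next_doc ];
    }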
 
- On the docket for next week
- Test the cycle detection scheme with a bigger example (a cycle that
is more than just two nodes long).
- Revise the article with the changes I made to the algorithm.
- Implement basic JavaScript.
- Test the system on some real sites.
- Finish the README file for the tool.
 
Week 8   (July 22-July 26)
- Cycles Turn up the Heat in a Battle for Accurate Representation
-- Researcher Baffled
Why I was banging my head against the wall last week.
When I constructed the LeafList2 Example, I was
inadvertently forced to consider cycles prematurely (I had hoped to
leave cycle detection until last). In so doing, I added cycle detection
and deletion to the algorithm, but my test case was based on the
simplest cycle possible: DocumentOne.html-FrameOne-DocumentOne.html.
Next I endeavored to create a more complex cycle, with the original
title of Cycle
Example. Unfortunately, detecting this cycle turned out to be
far more difficult than my previous implementation could handle. Working
on the problem all week, I was unable to find a solution. Finally, I
had an epiphany. The cycle I had created in the example involved
index.html, a child document of the top frame. The cycle was created
when index.html gained an additional parent frame, called control_frame.
The find_ptr function of my program was based on the principle that
children of the top node have only one parent frame: top. The
combination of my function and the example resulted in an unsolvable
situation that exhausted me after a week of trial and error. Finally I
realized that the problem was my example, in concert with my programming
scheme.
 
- Figuratively Tweaking
After modifying the algorithm, the first FDT figure in the paper needed
to be changed, as it no longer represented a scenario that would ever
occur (see the elbow problem). In addition, the
HTML code declaring the frames and framesets was incorrect, and did not
create the situation represented by the graph. I redesigned the example
to be as complex as possible, while hopefully
maintaining an understandable structure. After changing the code, I
learned how to use the xfig tool and created an accompanying figure.
 
- JavaScript: Best Left for Part II?
After reading more of the paper, I feel JavaScript handling is best left
for the second part. According to the article, the FDT is based on HTML
framesets and is used to resolve frame calls made with JavaScript. The FDT
created by my program accomplishes this. To the best of my knowledge,
JavaScript and other web technologies cannot create frames; they can only
link them, and therefore cannot change the structure of the FDT. Adding some
extra documents and links to the FDT does not change how Part II would
resolve JavaScript links; however, adverse situations could produce the
following implications:
- Any documents containing frames that are linked in with JavaScript
would affect the structure of the FDT and compromise JavaScript link
resolution.
- It is possible for a document being linked to create a cycle.
Without the find_ptr function being run on the set of document nodes,
the cycle would not be detected, and could potentially ruin the program
results or loop infinitely, both of which would be most unfavorable.
Furthermore, adding JavaScript links to the FDT would require all script
elements to be analyzed by a scaled-down version of a JavaScript parser
(Rhino is such a parser, available for download on Mozilla's web site).
Since this analysis is already a requisite of Part II, performing the
time-consuming process twice within the system would be a waste of
man hours and execution time.
 
- Accomplishments
- Re-wrote relevant portions of the article.
- Decided not to load the FDT into a Java data structure for Part II,
since I don't yet know how Part II will be implemented.
- Updated the README with detailed information about the test cases,
current limitations of the program, and the issues addressed in the
current version.
- Tried to tighten up the algorithm so it would be more in keeping
with other published algorithms; however, I found it difficult to omit
any details of the algorithm and still preserve its ideas, especially
since the absence of some details had already drawn reviewer criticism.
 
Week 9   (July 29-August 2)
- Project Complete!
Finally, after five weeks of programming, I believe my project is complete.
I was finally able to finish cycle detection and correction, and
successfully ran a number of test cases. For the cycle cases, I
tested the program using both major files (the files that are loaded into
the top frame in their respective web sites, before the sites were integrated
to form the cycle examples). Further testing will include a run on
a real e-commerce frame site and on the original shopping cart example.
 
- Cycles Unraveled
Through my research I found that there are three types of cycles, each
of which is detected at a different point in my find_ptr function
during Phase 3. There are small, or local, cycles, which occur when a
document is an element of both the parent_doc and child_docs sets for
the same frame. There is a special cycle case in which the principal
cycle document is a child of the top frame node. Lastly, there are
large cycles spanning multiple document-frame node pairs. My
classification of these three types of cycles grew out of the previous
week's headache over solving my cycle dilemma.
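For instance, the local cycle check can be expressed in a few lines;
this sketch reuses the hypothetical %NF layout from the Week 4 example,
not the tool's actual keys:

    our %NF;

    foreach my $frame (keys %NF) {
        my $parent = $NF{$frame}{parent_doc};
        my %kids   = map { $_ => 1 } @{ $NF{$frame}{child_docs} || [] };
        # A document that is both parent and child of the same frame
        # forms the smallest possible cycle.
        print "local cycle at frame '$frame' via $parent\n"
            if defined $parent && $kids{$parent};
    }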
 
- Submission for Reprobation
Although the initial article describing the work I have been studying this
summer was not accepted for publication, the reviewers provided helpful
comments to improve the article's chances for future publication. After
completing most of the project, I was able to view these comments. The
reviewers' ideas were very reassuring and confirmed some objections I
had had when initially approaching the problem. I was encouraged to
discover that I had already rectified some of their complaints:
- Resolving links without explicitly defined targets.
- Fixing the unnamed document node in Figure 1 that does not in fact
exist.
- Rewriting the incorrect code of the frameset tags in Figure 1.
In addition, I found that one of the reviewers had also been troubled
by the idea of name resolution when more than one frame with the same
name exists at the same level, but with different parent documents.
However, only one such frame may exist at a given time, since only one
of those parent documents will be loaded into the browser at any
particular moment.
 
- Target Lists and Affectable Frames: Phase 3 Explained
My most significant contributions to Part I can be found in Phase 3. I
felt it would be expedient to provide people already familiar with the
project a summary of the changes I made to Phase 3 and the ideas that
triggered these modifications.
 
- Target Lists
While going through my contrived frame example, I discovered that it may be
difficult to determine the current loading frame of a document. This
is an issue when URLs don't contain target
frames. In such cases the URL is loaded into the current frame;
however, the disconnected sub trees of a budding FDT may not yet contain
the information necessary to discover which frame a document
is loaded into. The obscurity of the loading frame is due to the fact
that the FDT sub trees are headed by document nodes, not frame nodes.
Having document nodes as the basic building block of all the sub trees is a
necessity, but the issue of finding the current load frame can be
remedied. I have added a TargetList to the FDT algorithm for this
purpose. For a given document node d, the TargetList contains a list of
pages, each of which contains a URL pointing to d without a target frame,
where the current frame was not known during phase 2 and the
subsequent grafting of the sub trees using the LeafLists. The lack of a
load edge here or there does not greatly affect the FDT, and
resolution of the target lists is both suitable and necessary immediately
following phase 2. For each element of the TargetList, the current
frame node of the element in question is resolved. A load edge is then
drawn from the current frame node to d (the node whose TargetList is
currently being evaluated).
 
- Affectable Frames
Affectable frames are used to resolve elements on the dupes list (frames
that are distinct but have the same name). The algorithm works as
follows:
- For a given document node, traverse the tree from the node to the root,
marking each node along the way.
- For each document on the path to root, explore the sub frames,
marking each frame node. Do not explore any document sub trees of
documents that are children of a frame node on the path to root.
- Return the list of marked frame nodes; these are the affectable
frames for a given document node.
- PTRs
A consequence of finding the affectable frames is finding the Path to
Root (PTR) for each document node. The find_ptr function also detects
and deletes cycles; therefore, it is run on all document nodes, not
just those on the R list (the list of unresolvable links whose targets
point to a frame page that is a duplicate). The affectable_frames
function, however, is run only on the R list.
- Part II: Trials and Tribulations
The details of Part II of the paper were a rough sketch of a model that
included HTML and JavaScript. However, most web applications
incorporate technologies to access back-end components, such as
databases, and even intermediate layers of business logic. Some
web technologies currently in use to build such large-scale systems
include CGI scripts, JSP, ASP, PHP, and XML. To make our model more
accurate and applicable, I feel we should include at least JSP in the
model. As the research progresses, perhaps we will be able to
incorporate additional technologies.
The difficulty in modeling web applications stems from their dynamic
nature. Web applications typically pull from a variety of sources to
create their dynamic content, and the path the data takes between a
client browser request and the returned page may be long and complex.
In many cases there is no index.html file to analyze, but rather a
collection of scripts that, together with the server and other back-end
components, creates on the fly the page a client browser sees.
Part of the difficulty in modeling a new technology is finding sample
applications to work with. Perhaps realizing this, Sun has created two
stand-alone example applications that can be deployed on a single
machine. These samples make it possible to begin more detailed research
on Part II with client, server, and database components all running on
one local computer.
 
Week 10   (August 5-August 9)