Research Journal

Dong-Hui Xu
DMP Experience 2003


Week 1:

I arrived at the University of Oregon on Tuesday. Professor Janice Cuny, my mentor, picked me up from the airport, and I stayed at her house before moving into my dorm the next morning.

On Wednesday I met Jan and Debby, the other DMP student, and we discussed the various projects that Debby and I might work on. One project is to enhance the functionality of VRV-ET, a computer-based educational tool that supports networked collaboration in a learning environment. Both Professor Cuny and Megon Chinburg, an undergraduate student at the University of Oregon, have observed areas of VRV-ET that need improvement: controlling access to the data so that one user's work cannot be edited or deleted by another user, developing an easier way to retrieve data, and building an interface that makes the system easier for first-time users to learn. Another project involves usability testing of a model coupling module.

I spent the rest of the week reading everything available to me, including the research papers about the project and previous work (mainly articles related to the Google search engine), and searching the Internet for relevant papers and articles. It was challenging in that we had to learn a lot of information and make many difficult decisions in order to get the project started. We needed to choose our research topic, decide on its feasibility, and sketch an outline of how to carry it out. Since the whole project would last only ten weeks, we had to get all of this done in one or two weeks. We asked for advice from Jan and other people who were involved in these projects, and evaluated the goals we could attain if we worked on these projects for ten weeks. We also evaluated the knowledge, skills, and techniques needed to accomplish our goals.

New Friends

My roommate, MiJin, moved into Sheldom a week after I arrived at the University of Oregon. She came here from Korea for a six-month ESL program, and wants to become an English teacher after she returns to her home country.

Both MiJin and I are fond of cooking. One of the places we visited most often was a nearby grocery store, where we bought groceries and cooking pans. Every Sunday MiJin would prepare one of her favorite Korean dishes, cooked rice and seafood wrapped in lettuce, and I would prepare miso soup, chicken with vegetables, or fish with vegetables. Sometimes we invited Debby, my partner, and Magi, my next-door neighbor, to join us. After dinner, I would make some British-style black tea for us, and we would chat late into the night about topics ranging from Asian culture and newly released movies to graduate school applications.

Me, MiJin (my roommate), Marki (my next-door neighbor), and Debby (my partner) at the country fair.

Week 2:

After much reading and decision making, we decided to work on a project that allows geologists to quickly and effectively search for specific content. You may find a brief description of our project here. Once I knew which project we were going to work on, I felt much relieved.

I applied what I had learned in my senior-year project class and wrote a draft of my research plan, which included:

Goal

Motivation

Background

Design (& techniques)

Implementation

Testing

Documentation

I like the idea of "backward planning": knowing what your goals are and making a detailed plan according to those goals and your time constraints. It has proved to be an effective way for me to get work done on time. Some things still seem vague to me, so I may change or redirect my plan as I gain more knowledge and understanding of the project, but at least I have a solid starting point.

We installed the software tools we needed for our project, such as jEdit, Eclipse, and Ant. We also wrote a simple "Hello World" servlet to test Tomcat, and learned to work with build files and Ant.

Week 3:

We started this week by installing Lucene and Digester. Both are open-source software under the Apache Jakarta umbrella. We use Digester to parse XML documents, because it offers a simple, high-level interface, and Lucene to do the indexing and searching.

We decided that Debby would work on the searching part of the project, and I would be responsible for parsing XML files into Java objects and building searchable indexes for the searcher. After much reading, we set off to write our first demo, which would index all text files in a directory and its subdirectories and search for content that matches the query words. We worked all weekend so that we could have a running demo by Monday.
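The idea behind that first demo can be illustrated with a small sketch. This is not the project's code and it does not use Lucene: it is a plain-Java stand-in, and the class name TinyTextIndex, the ".txt" filter, and the "every query word must appear" matching rule are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the demo: a tiny in-memory inverted index.
// The real demo used Lucene; nothing here is Lucene's API.
class TinyTextIndex {
    // Maps each lowercased word to the sorted set of files that contain it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Walks root recursively and indexes every regular file ending in ".txt".
    void indexDirectory(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile)
                        .filter(p -> p.toString().endsWith(".txt"))
                        .collect(Collectors.toList());
        }
        for (Path file : files) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            for (String word : tokenize(text)) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(file.toString());
            }
        }
    }

    // Returns the files that contain every word of the query.
    Set<String> search(String query) {
        Set<String> result = null;
        for (String word : tokenize(query)) {
            Set<String> hits = postings.getOrDefault(word, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // Lowercases the text and splits it on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }
}
```

Calling indexDirectory once on a folder and then search with a few query words returns the paths of the matching files. A real search library adds ranking, stemming, and an on-disk index, none of which this sketch attempts.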

However, we kept getting an error: "container standard contest [/demo] has not been started yet". I checked all the files that might cause such an error, but I could not solve the problem.

Eugene, Oregon

Eugene, the second-largest city in Oregon, is a nice place to spend a summer. It is an hour and a half's drive from the Cascade Mountains and one hour from the breathtakingly beautiful Oregon coast. The Willamette River runs through the heart of the city, and the McKenzie River joins the Willamette just north of town.

One day Jan took Debby and me to visit the Oregon coast. First we visited Devil's Churn where, for the first time, I saw a tide pool with purple sea urchins, giant mussels, hermit crabs, and various other sea creatures. Then we stopped at Old Town Florence and enjoyed the famous clam chowder and fried oysters at Mo's restaurant. What I enjoyed the most, however, was walking on the seashore in the setting sun, feeling the waves gently touching my feet.

As I was enjoying my walk, I was surprised to discover a stream flowing quietly down a hill. The stream ended by merging into the sea without any visible break. At that moment, time stood still; I felt as if I were enjoying the best that the world could offer: peace and harmony.

Devil's Elbow, Oregon Coast

Week 4:

On Monday we tried to fix the error, and finally found out that, for some reason, Tomcat was not automatically unjarring the files as it was supposed to. We modified build.xml so that it would unjar the files every time the program executes. By Tuesday the program was able to index and search. We were so excited that we recorded our little accomplishment on the whiteboard.

After the first demo, we had a meeting with Jan and planned our next steps. Our plan included:

to connect indexing and searching

to parse XML using Digester

to index Java objects created by Digester

to research file structures (database structures)

to do more research on searching (ranking) techniques

to design the interface

Week 5:

For the last couple of days, I have been working on a program that parses the XML files into Java objects using Digester. My program compiled without errors, but it failed to run. I found out that there were some problems in the installation of Tomcat and other software. Analyzing configuration files and checking for the proper class files in Tomcat seemed like an impossible mission to me, and I did not enjoy it.

Unfortunately, Josh, the research assistant, who is very good at dealing with systems, was away on a two-week vacation. That left me with two options: wait until Josh returned, or solve the problem myself. I chose the latter. I carefully read the documentation for setting up the software, analyzed the configuration files, and checked the application directories and the directories under Tomcat to see whether the proper class files had been created.

After hours of work, I finally caught the error and modified the build file. It was exciting to see the program finally run after hours of hard work; it was even more exciting to realize that I am capable beyond my expectations, and I have gained confidence in my own ability to solve problems.

The rest of the work involved indexing Java objects using Lucene. I studied the Lucene API, quickly found the methods I would need, and wrote a program to perform parsing and indexing.

Week 6:

This week, I spent much of my time reading and designing a generic parsing and indexing system. One of our goals for this project is to design and implement a system that is easily extensible, so that functions that do not exist during project development can be added later with little or no modification to the generic part of the project. One means of achieving extensibility is to use polymorphism.

Our data retrieval system will implement parsers that can parse many varieties of files, including XML, HTML, and the like, into Java objects. Each of these parsers implements a common interface, such as GenericParser, that contains an abstract method called "parse". This method is defined by each concrete class that implements GenericParser. A ParserDriver program would maintain references to these various parsers and send a message to the appropriate parser whenever a file of the corresponding type needs to be parsed.

Such a polymorphic design makes it easy to add new types of parsers to a system with minimal modification. I will apply this method to the IndexBuilder as well, so that it is easy to plug various new types of data files into the IndexBuilder with minimal impact.
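The polymorphic parser design described above might look roughly like the following sketch. Only the names GenericParser, ParserDriver, and parse come from the journal; the method signature, the two toy parsers (SimpleXmlParser, SimpleHtmlParser), the register method, and the choice to dispatch on the file extension are assumptions made for illustration, and the regular expressions are far too simple for real XML or HTML.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Common interface named in the journal; the signature is an assumption.
interface GenericParser {
    Map<String, String> parse(String content);
}

// Hypothetical toy parser: collects <tag>text</tag> pairs from simple XML.
class SimpleXmlParser implements GenericParser {
    private static final Pattern ELEMENT = Pattern.compile("<(\\w+)>([^<]*)</\\1>");

    public Map<String, String> parse(String content) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = ELEMENT.matcher(content);
        while (m.find()) {
            fields.put(m.group(1), m.group(2).trim());
        }
        return fields;
    }
}

// Hypothetical toy parser: extracts only the page title from HTML.
class SimpleHtmlParser implements GenericParser {
    private static final Pattern TITLE =
            Pattern.compile("<title>([^<]*)</title>", Pattern.CASE_INSENSITIVE);

    public Map<String, String> parse(String content) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = TITLE.matcher(content);
        if (m.find()) {
            fields.put("title", m.group(1).trim());
        }
        return fields;
    }
}

// Keeps one parser per file extension and forwards each file to the right one.
class ParserDriver {
    private final Map<String, GenericParser> parsers = new HashMap<>();

    void register(String extension, GenericParser parser) {
        parsers.put(extension, parser);
    }

    Map<String, String> parse(String fileName, String content) {
        String extension = fileName.substring(fileName.lastIndexOf('.') + 1);
        GenericParser parser = parsers.get(extension);
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for ." + extension);
        }
        return parser.parse(content);
    }
}
```

Supporting a new file type then means writing one more class that implements GenericParser and registering it; ParserDriver itself does not change.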

With the generic programming concept in mind, I decomposed the seemingly overwhelming problem into several two-day and three-day subtasks, which are my main focus for weeks six and seven:

separate parsing process from indexing process

build more concrete parse engines

build a generic XML parse engine interface

For the last couple of weeks, Debby and I have worked separately on searching and on parsing-indexing. This week, we combined these functions into one system. I passed a new searchable index to Debby so that she could test her searcher on it. However, the searcher did not produce correct results. We checked the code carefully and found out that we were using different types of analyzers in our indexer and searcher. This meant that the indexer and searcher were tokenizing words in different ways. After we changed the analyzers to the same type, the searcher produced satisfactory results.
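Why mismatched analyzers break a search can be seen in a small plain-Java sketch. It does not use Lucene's analyzer classes: the two tokenizers below and the matches helper are hypothetical stand-ins, chosen only to show what happens when the indexing side and the query side split and normalize text differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration; these tokenizers are not Lucene analyzers.
class AnalyzerMismatchDemo {
    // Splits on whitespace only, keeping case and punctuation ("Basalt," stays as is).
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Lowercases and splits on anything that is not a letter or digit ("Basalt," becomes "basalt").
    static List<String> lowercaseTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Tokenizes the document with one tokenizer and the query with another,
    // then reports whether every query term was found among the indexed terms.
    static boolean matches(String document, String query,
                           Function<String, List<String>> indexTokenizer,
                           Function<String, List<String>> queryTokenizer) {
        Set<String> indexedTerms = new HashSet<>(indexTokenizer.apply(document));
        List<String> queryTerms = queryTokenizer.apply(query);
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }
}
```

For the document "Basalt, Oregon coast." and the query "basalt", matches returns false when the document is tokenized with whitespaceTokens and the query with lowercaseTokens, because the index holds "Basalt," while the query asks for "basalt". Using lowercaseTokens on both sides makes the same query succeed.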

At the end of this week we worked on connecting the indexer and the searcher. In order to do this, we had to write an init servlet that gets the configuration information and passes it to the indexer and searcher. We also had to modify the build files and web.xml.

I also worked on building more concrete parse engines. I encountered some problems parsing nested XML files, but I think I will solve them soon.

Week 7:

I had tried several ways to parse nested XML files, but none of the solutions was satisfactory. Feeling frustrated, I searched the Internet for help. One article showed up in my search results and, to my surprise, its author was working on a similar project. I immediately sent him an email asking for his opinion on parsing nested XML files. I did not really expect him to reply, since he did not know me. The next day, I received an email from the author, Peter Carlson. He answered my questions in detail and offered to help me if I encountered other difficulties.

Since the very beginning of this project, I have received help and valuable advice from many people. Professor Cuny showed me the big picture of this project, and gave me a starting point to work from. She patiently guided me through each step of this project while allowing me to develop my own thoughts. This helped me to improve my ability to work independently. Josh, a research assistant, helped me out on many aspects of this project, such as setting up computer systems and designing generic modules of the indexing-parsing system. Debby, my partner, shared her thoughts with me so I was able to make quick progress. They all have inspired, taught, and helped me along the way.

Know Thyself

Since last week, I have been creating a set of concrete XML parseDrivers. This week my parseDriver was able to read from a directory and parse each XML file in that directory and its subdirectories. I was pretty satisfied with my generic parseDriver until I learned that our data source is not limited to a file system. Our data might come from a file system, a relational database, memory, or even the Internet. Therefore, I have to modify my design and make it open to any form of data source.

I thought I had built a generic parseDriver, only to find out that it covered only a fraction of the possible data sources; I thought I knew enough, but learned that there is much more I need to learn.

Week 8:

I redesigned the parsing-indexing system so that it conforms to the open-closed principle, which states:

"Software entities (classes, modules, functions, etc.)
should be open for extension, but closed for modification."

I created an interface ParseDriver, a class AddressBookDriver that implements ParseDriver, and a class IndexBuilder. An object of the IndexBuilder class uses an object of AddressBookDriver through the ParseDriver interface. If we want IndexBuilder to use a different parse driver, we can create a new class that implements ParseDriver, and IndexBuilder will remain unchanged. Thus, we extend the behavior of our system without modifying the existing code.

The following are the main classes, interfaces and their functions:

InitServlet: Passes configuration information to an IndexBuilder, including source data, path of index folder, and type of concrete parseDriver.

IndexBuilder: Gets configuration information, creates index, and populates index.

ParseDriver (interface): Has doParse and getPropertiesList methods

AddressbookParseDriver: Implements ParseDriver. The doParse method processes the input XML document based on patterns and rules. It also creates Java objects that contain the fieldName-fieldValue pairs needed for creating a Lucene Document.
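The relationship between these classes can be sketched as follows. The class and method names ParseDriver, doParse, getPropertiesList, AddressbookParseDriver, and IndexBuilder come from the list above, but everything else is an assumption for illustration: the method signatures, the size and find helpers, the in-memory list standing in for a Lucene index, and the toy "name;email" line format used in place of the Digester-based XML parsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Interface named in the journal; both signatures are assumptions.
interface ParseDriver {
    // Returns one fieldName -> fieldValue map per record found in the source.
    List<Map<String, String>> doParse(String source);

    // Names of the fields this driver produces.
    List<String> getPropertiesList();
}

// Toy driver: assumes one "name;email" record per line instead of XML.
class AddressbookParseDriver implements ParseDriver {
    public List<Map<String, String>> doParse(String source) {
        List<Map<String, String>> records = new ArrayList<>();
        for (String line : source.split("\n")) {
            String[] parts = line.trim().split(";");
            if (parts.length != 2) {
                continue; // skip blank or malformed lines
            }
            Map<String, String> record = new LinkedHashMap<>();
            record.put("name", parts[0].trim());
            record.put("email", parts[1].trim());
            records.add(record);
        }
        return records;
    }

    public List<String> getPropertiesList() {
        return Arrays.asList("name", "email");
    }
}

// Depends only on the ParseDriver interface, so new drivers need no change here.
class IndexBuilder {
    private final ParseDriver driver;
    // In-memory stand-in for the Lucene index used in the project.
    private final List<Map<String, String>> index = new ArrayList<>();

    IndexBuilder(ParseDriver driver) {
        this.driver = driver;
    }

    void populate(String source) {
        index.addAll(driver.doParse(source));
    }

    int size() {
        return index.size();
    }

    // Returns the records whose given field equals the value, ignoring case.
    List<Map<String, String>> find(String field, String value) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> record : index) {
            if (value.equalsIgnoreCase(record.get(field))) {
                hits.add(record);
            }
        }
        return hits;
    }
}
```

Because IndexBuilder receives its driver through the constructor, indexing a different kind of data only requires another class that implements ParseDriver, which is the open-closed idea in miniature.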

After a couple of days of work, I finally implemented the new indexing-parsing system. On Saturday, Josh, Debby, and I worked all afternoon to connect our system to VRV-ET.

The first step was to connect these two systems. We set our indexing-searching system to index and search one project from the existing data in VRV-ET. Since we planned to apply a keyword-based indexing scheme, and the current data source has no keyword element, we manually added keywords so that we could test our system.

It was exciting to see that, after hours of hard work, our system could index and search VRV-ET files.

Week 9:

This week we spent most of our time modifying our code. First we used Perl to create an XML file that contains all the data (including keywords associated with images) we need for indexing and searching a project in VRV-ET. Our indexer and searcher are now able to index and search a project in VRV-ET, rather than only the sample data, such as address books, that we created for testing purposes. I also modified the code so that the indexer indexes an image even when it has no keyword associated with it.

This is not the final version of the project. Other students will continue working on the project and add more functions to it. Examples include implementing a Data Access Object in the indexer and searcher, using the system to find specified information, and creating a presentation or slide show from it.

Week 10:

This is the last week of DMP. Debby and I spent most of our time writing our final report and preparing for a presentation.

It is said that the last thing most people want to do is to make a speech in public. It was even harder for me, because I had to do it in my second language. Jan was very helpful and gave us some tips on how to give a presentation. I was nervous, but also amazed that I had given a presentation in front of my fellow students and professors.

It has been such a wonderful summer for me. I explored my interests in software engineering, gained confidence in conducting research, and became sure that I want to pursue graduate study and have a career in research and teaching.

I am grateful to have a mentor like Professor Cuny, who not only guided me through the research project but was also so caring and thoughtful. I also had a good time with Debby, and we decided to keep in touch after this program.