Weekly Research Journal
Here you will find a journal of my research experience: what I did each week, the problems I encountered, and the progress I made.
This week was mostly spent learning background information. I didn't really know anything about machine translation! Some of the concepts were familiar, but the way the different models used them was new. The most helpful paper was the Statistical MT Tutorial Workbook by Kevin Knight. Rebecca and I worked through it, and I'm starting to become comfortable with a lot of the material.
I'm now starting to plan the user interface I'm going to build. I've programmed in Java before, so that part isn't new, but I've never written a functional specification document. I found a very helpful tutorial that walks you from a general idea of what the program should do down to the specific details of its inner workings, and that is what I spent most of this week doing. My specifications are online on the project page. I also went to the first summer meeting of the Natural Language Processing group, which Rebecca and a few other professors and grad students are part of. We introduced ourselves and talked about the speakers we're going to have. I'll be giving a talk to the group about my project at the end of my research!
This week my goal was to begin the main part of the program, where users enter the word alignments. I now have an applet that displays two sentences (both in English for now) and lets you align the words with a sure link or an unsure link, or mark a word as not translated. It only handles one sentence pair for now, and there are a few bugs that show up if you do something unexpected, but I'm sure I'll work those out. I'm worried about getting Chinese characters to display in my applet: I've been trying to find out exactly how to do it, and no one seems to know the best way. I may also have to make this a Windows-only applet, since on Linux (what I'm using now) there seems to be no way to do it at all. We shall see. On Wednesday, we (the machine translation group) read a paper together about maximum entropy models. I understand some of it, but I need to go over it again this weekend! I feel kind of lost sometimes in the meetings and when reading the papers, but I've heard this is normal since I haven't taken any classes in this area, or any graduate courses for that matter.
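The sure link / unsure link / not translated scheme can be pictured as a small data structure. Below is a minimal Java sketch of one sentence pair's alignment. The class name, the -1 sentinel for "not translated", and the "source target S|U" output line are illustrative assumptions, not the applet's actual code; for brevity only source words can be marked not translated.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of one sentence pair's word alignment. Each link is SURE
 * or UNSURE; a source word can also be linked to NOT_TRANSLATED.
 */
public class AlignmentSketch {
    enum LinkType { SURE, UNSURE }

    /** Sentinel target index meaning "this source word was not translated" (assumption). */
    static final int NOT_TRANSLATED = -1;

    /** A link between source word s and target word t. */
    record Pair(int s, int t) {}

    final String[] source;
    final String[] target;
    // LinkedHashMap keeps links in the order they were first made.
    private final Map<Pair, LinkType> links = new LinkedHashMap<>();

    AlignmentSketch(String sourceSentence, String targetSentence) {
        source = sourceSentence.trim().split("\\s+");
        target = targetSentence.trim().split("\\s+");
    }

    /** Links s to t (or NOT_TRANSLATED); linking the same pair again just changes its type. */
    void align(int s, int t, LinkType type) {
        if (s < 0 || s >= source.length) {
            throw new IllegalArgumentException("bad source index " + s);
        }
        if (t != NOT_TRANSLATED && (t < 0 || t >= target.length)) {
            throw new IllegalArgumentException("bad target index " + t);
        }
        links.put(new Pair(s, t), type);
    }

    /** Removes a link if present. */
    void unalign(int s, int t) {
        links.remove(new Pair(s, t));
    }

    /** One line per link: "source target S|U", with -1 as the target for not translated. */
    List<String> toLines() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Pair, LinkType> e : links.entrySet()) {
            String mark = e.getValue() == LinkType.SURE ? "S" : "U";
            lines.add(e.getKey().s() + " " + e.getKey().t() + " " + mark);
        }
        return lines;
    }

    public static void main(String[] args) {
        AlignmentSketch a = new AlignmentSketch("he went home", "he has gone home");
        a.align(0, 0, LinkType.SURE);
        a.align(1, 1, LinkType.UNSURE);
        a.align(1, 2, LinkType.SURE);
        a.align(2, 3, LinkType.SURE);
        System.out.println(a.toLines()); // [0 0 S, 1 1 U, 1 2 S, 2 3 S]
    }
}
```

Keying links by (source, target) means linking the same pair a second time only changes its type, which is one simple way to let an annotator switch a link between sure and unsure.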
(Photo: Karina and me at Phipps Conservatory)
Week 4: 6/1/04-6/4/04
It works! I spent this week tweaking my program, and almost everything is working the way I want it to. I had to redo a few parts completely, which wasn't fun, but the end result is better. I still have a few major issues. Applets can't write files on either the client's computer or the computer the applet was loaded from, so I may have to get around this by writing a server program that the applets send their data to. For now, I've quickly turned the applet into a standalone application that writes the data out directly. In the worst case, the application could be distributed and the output file sent back by email, but we were hoping to avoid this. Chinese character display is still up in the air, as is exactly what I'm going to add next. There are a few different features I could add, depending on exactly what we want to do with this applet: whether it involves part of speech or not, whether it involves active learning through another client-server connection to a machine translation system that has specific sentences it would like to learn, or whether it involves block studies of how similarly people align sentences. We shall see!
This week was short because I went home for my sister's high school graduation. However, I made significant progress on the client-server problem: I now have an applet that sends information to a server program, which writes the information to a file! I'm not running a web server from my computer, though, so it still doesn't work on the web. Also, a user can exit their session, come back later with the same user name, and have their previous work retained. This weekend I'm going to fix some of the aesthetic issues and maybe get Chinese characters to display.
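Because an unsigned applet cannot write files and may only open network connections back to the host it was loaded from, the save path has to go through a server program on that same host. The minimal Java sketch below shows one way this could work over a plain TCP socket. The line-based protocol (user name first, then data lines), the per-user file name, and the class and method names are assumptions for illustration, not the journal's actual protocol.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical sketch: an applet-style client sends lines, a server appends them to a per-user file. */
public class SaveServerSketch {

    /** Server side: accepts one connection and appends the received lines to dataDir/user.txt. */
    static void handleOne(ServerSocket server, Path dataDir) throws IOException {
        try (Socket socket = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            String user = in.readLine();
            if (user == null || !user.matches("[A-Za-z0-9_]+")) {
                return; // reject missing or unsafe user names, since they become file names
            }
            Path file = dataDir.resolve(user + ".txt");
            try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                String line;
                while ((line = in.readLine()) != null) { // null once the client closes
                    out.write(line);
                    out.newLine();
                }
            }
        }
    }

    /** Client side: what the applet would do, connecting back to the host it came from. */
    static void send(String host, int port, String user, List<String> lines) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(new OutputStreamWriter(
                     socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println(user);
            for (String line : lines) {
                out.println(line);
            }
        } // closing the socket tells the server the data is complete
    }

    public static void main(String[] args) throws Exception {
        Path dataDir = Files.createTempDirectory("alignments");
        try (ServerSocket server = new ServerSocket(0)) { // port 0: any free port
            Thread t = new Thread(() -> {
                try {
                    handleOne(server, dataDir);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            t.start();
            send("localhost", server.getLocalPort(), "alice", List.of("0 0 S", "1 2 U"));
            t.join();
        }
        System.out.println(Files.readAllLines(dataDir.resolve("alice.txt"))); // [0 0 S, 1 2 U]
    }
}
```

Appending to a per-user file is also one simple way to let a returning user pick up where they left off: the server can read that file back when the same user name shows up again.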
This week I added parts of speech to my applet. The English part of speech is now given to the user, and the user has to choose parts of speech for the Chinese words. When an English word is aligned to a Chinese word, the Chinese word automatically receives the English word's part of speech (but the user can still correct it). I also made a utility for the researcher to convert the output files from my program into different formats: ones used by word alignment programs and ones used by part of speech taggers. Since Karina is working on a part of speech tagger, hopefully we will integrate our projects: her program will give the human annotators using my program the sentences whose parts of speech it needs to learn. I still can't display Chinese characters! One non-project thing I enjoyed this week was the talk at our Natural Language Processing group meeting: instead of a project talk, Rebecca gave a talk on how to give a talk. I haven't given anything more than a class presentation before, but a lot of the information was good general public speaking advice, and I'm sure the research-specific tips will be useful when I go to grad school! It was fun hearing about good talks and bad talks and how to deal with things that can happen when you give one.
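The automatic part-of-speech default is a small projection step across the alignment. Here is a minimal Java sketch. The tie-breaking rule (the first aligned English word wins), the "UNK" placeholder for untagged words, and the word/TAG output line (one common plain-text convention for tagger input) are assumptions chosen for illustration, not details from the journal.

```java
/**
 * Hypothetical sketch of part-of-speech projection across an alignment: each
 * Chinese word gets the tag of an English word aligned to it, unless the
 * annotator has chosen a tag by hand.
 */
public class PosProjectionSketch {

    /**
     * @param englishTags tag of each English word (given to the user)
     * @param links       pairs {englishIndex, chineseIndex}
     * @param userTags    tags the annotator chose or corrected; null where unset
     * @return the tag for each Chinese word (null if unaligned and unset)
     */
    static String[] projectTags(String[] englishTags, int[][] links, String[] userTags) {
        String[] result = new String[userTags.length];
        for (int[] link : links) {
            int e = link[0];
            int c = link[1];
            // Assumption: if several English words align to one Chinese word, the first link wins.
            if (result[c] == null) {
                result[c] = englishTags[e];
            }
        }
        for (int c = 0; c < userTags.length; c++) {
            if (userTags[c] != null) {
                result[c] = userTags[c]; // the annotator's correction overrides the default
            }
        }
        return result;
    }

    /** Writes space-separated "word/TAG" tokens; untagged words get the placeholder "UNK". */
    static String toSlashFormat(String[] words, String[] tags) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                line.append(' ');
            }
            line.append(words[i]).append('/').append(tags[i] == null ? "UNK" : tags[i]);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String[] englishTags = {"PRP", "VBP", "NN"};  // "I like tea"
        String[] chinese = {"wo", "xihuan", "cha"};    // romanized to keep the example ASCII
        int[][] links = {{0, 0}, {1, 1}, {2, 2}};
        String[] userTags = {null, "VB", null};        // annotator corrected word 1
        String[] tags = projectTags(englishTags, links, userTags);
        System.out.println(toSlashFormat(chinese, tags)); // wo/PRP xihuan/VB cha/NN
    }
}
```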
This week was BIG!! Rebecca helped me fix both of my major problems, and we now have an applet that displays Chinese characters AND communicates with a server to save the files!!! The Chinese character display problem was solved by putting a new font in the Java fonts directory to compile with. For the server problem, we talked to James, a very helpful Pitt CS tech support person, who set up one of our machines as a web server so that I could host the applet and run my server program on the same machine and the two could communicate. Having these two problems solved is a HUGE weight off, and everything looks downhill from here: what's left is changing how the user files are created and managed, to speed up communication, and working out how the experimenters will interact with the program. A fun thing happened last Friday: Rebecca, Karina, Teresa (a PhD student), and I went to the Three Rivers Arts Festival downtown! It was nice. The weather was gorgeous, I got to know everybody better, and we got Dave and Andy's ice cream afterwards!
This week I reorganized the file structure and made other changes to get my program ready to be integrated with Karina's project. Each user now has a directory instead of a single file, and once they finish the small set of sentences they are given, they can get more, either in sequential order or chosen randomly from a larger file. I also moved a lot of hard-coded values into a configuration file. When users close their browser or navigate away from the applet, aggregate data files covering all the work they have done so far are collected. These include a file in the format that Karina's part of speech tagger can read and a file with the alignments in the format an alignment model expects. I also made an interactive tutorial to train new users, so my program is just about ready for people to use! I'm sure there are a few more bugs in strange cases that I still need to find, but the programming part of my project is wrapping up!
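Moving hard-coded values into a configuration file and handing out sentences in batches might look like the minimal Java sketch below, which uses the standard java.util.Properties class. The property names batch.size and batch.order, their defaults, and the nextBatch helper are hypothetical; sentences are identified by their text here, so duplicate sentences in the pool would be treated as one.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.Random;
import java.util.Set;

/**
 * Hypothetical sketch: settings read from a Properties file, and each user
 * handed the next batch of sentences in sequential or random order.
 */
public class BatchConfigSketch {

    /** Picks up to batch.size sentences from the pool that the user has not done yet. */
    static List<String> nextBatch(Properties config, List<String> pool, Set<String> done, Random rng) {
        // Property names and defaults are illustrative assumptions.
        int size = Integer.parseInt(config.getProperty("batch.size", "5"));
        boolean random = "random".equals(config.getProperty("batch.order", "sequential"));

        List<String> remaining = new ArrayList<>();
        for (String sentence : pool) {
            if (!done.contains(sentence)) {
                remaining.add(sentence); // keeps the file's order for sequential mode
            }
        }
        if (random) {
            Collections.shuffle(remaining, rng);
        }
        return new ArrayList<>(remaining.subList(0, Math.min(size, remaining.size())));
    }

    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        // A deployment would load this from a file (e.g. via a FileReader); a string keeps it self-contained.
        config.load(new StringReader("batch.size=2\nbatch.order=sequential\n"));
        List<String> pool = List.of("s1", "s2", "s3", "s4", "s5");
        System.out.println(nextBatch(config, pool, Set.of("s1", "s2"), new Random(42))); // [s3, s4]
    }
}
```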
(Photo: Lunch with the CMU DMP participants)
Week 9: 7/6/04-7/9/04
So much happened this week! Sunday was the 4th of July, and Karina and I went to see the fireworks down at the Point. They were really neat, and we got to hear the symphony too! It was SO crowded, though; we kept getting stepped on! Then we had Monday off, which I spent preparing the presentation I gave to the Natural Language Processing group on Wednesday! Karina and I were both really nervous about presenting our work to the other professors and graduate students in the group, but both our presentations went well. I'm glad that's over with! I spent the rest of this week making my program more robust and improving the tutorial. Rebecca is having a few people try it out, which is exciting but scary, because they've found weird errors that I didn't, and even though I've fixed them, I can't help wondering how many more there are. We had lunch today with DMP people from CMU: mentor (and DMP co-coordinator!) Jessica Hodgins and her students Shylah Thurman and Tabitha Peck. It was really nice to meet them and hear about their experiences! Oh, and Karina and I are probably going to the National Aviary tomorrow. What a week!
Wow, I can't believe the ten weeks are over already! They went really fast, but looking back, I've done a lot of work. From starting with practically no knowledge of machine translation to making this program and reading all the papers... I've learned a lot! This week I added a test to the applet so that we can put it on the web but only allow people who know Chinese well enough to align a sample sentence correctly to do actual alignments for us. I also wrote a lot of documentation for my code and worked on my final report. Rebecca, Theresa, Karina, and I went to see The Music Man downtown on Wednesday; it was a nice break from work for everyone. I'm really grateful to the DMP committee and the CRA-W for giving me the opportunity to participate in this valuable learning experience!!
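The journal doesn't say how a sample alignment is judged "correct". One plausible criterion, swapped in here as an assumption and sketched in Java, is Och and Ney's alignment error rate (AER), which fits the sure/unsure distinction: gold links are split into sure (S) and possible (P) sets, and for an answer A, AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|). The pass threshold, the "e-c" string encoding of links, and the method names are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of a qualification check: compare a new user's alignment
 * of a sample sentence with a gold alignment using alignment error rate (AER).
 */
public class QualificationSketch {

    /** AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), with S treated as a subset of P. Links are "e-c" strings. */
    static double aer(Set<String> answer, Set<String> sure, Set<String> possible) {
        Set<String> p = new HashSet<>(possible);
        p.addAll(sure); // AER assumes every sure link is also a possible link
        Set<String> aS = new HashSet<>(answer);
        aS.retainAll(sure);
        Set<String> aP = new HashSet<>(answer);
        aP.retainAll(p);
        int denominator = answer.size() + sure.size();
        if (denominator == 0) {
            return 0.0; // nothing to align and nothing aligned
        }
        return 1.0 - (double) (aS.size() + aP.size()) / denominator;
    }

    /** Assumed pass rule: AER at or below a threshold chosen by the experimenter. */
    static boolean passes(Set<String> answer, Set<String> sure, Set<String> possible, double maxAer) {
        return aer(answer, sure, possible) <= maxAer;
    }

    public static void main(String[] args) {
        Set<String> sure = Set.of("0-0", "1-1");
        Set<String> possible = Set.of("2-1");
        System.out.println(aer(Set.of("0-0", "1-1"), sure, possible));         // 0.0
        System.out.println(aer(Set.of("0-0", "2-1"), sure, possible));         // 0.25
        System.out.println(passes(Set.of("0-0", "2-1"), sure, possible, 0.1)); // false
    }
}
```

Because unsure (possible) links count toward the numerator but only sure links enter the denominator, a new user is not penalized for skipping links the gold annotator was unsure about.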