Week 10

August 3-7, 2009

This week I finished my poster and wrote my research paper. Tuesday we had a practice presentation in the Zachry building - it was helpful in calming my nerves about the upcoming presentation on Friday. Tonight is the farewell dinner - we're going to C and J Barbeque. Tasty!

Week 9

July 27-31, 2009

This week I implemented MIT's SIMILE Timeline widget. However, it seems a little finicky, and I'm not sure if I'm doing something wrong or if it's a problem with the widget itself. Anyway, the timeline only works for certain terms. For example, "shipwreck" and "titanic" don't work, but "nautical archaeology" and "obama" do. In addition to the timeline, I have also added JSTOR to the databases to be searched. So now the search looks through the New York Times and JSTOR for results (I omitted CNN because the Google API didn't provide enough information - publication dates, article authors, etc.). Results returned by JSTOR don't come with a URL, though, so I can't link to the actual articles. The titles and abstracts are now included in the text cloud, but not the timeline, since the dates varied too widely. JSTOR was a little tricky to add because I had been working with JSON, and JSTOR returns results in XML using the Dublin Core Record Schema, which I had never heard of. After some poking around the internet I was able to figure it out, though. I have also been working on my poster, which is due Monday... yikes!
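The JSTOR wrinkle was that its results came back as XML with Dublin Core elements rather than JSON. The actual project code was PHP/JavaScript, but here's a minimal Python sketch of the same idea - pulling Dublin Core fields out of a record using the standard library. The sample record and field choices are hypothetical; only the Dublin Core namespace URI is the real one.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record -- real JSTOR responses are shaped differently,
# but Dublin Core elements live in the namespace below.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Nautical Archaeology of the Titanic</dc:title>
  <dc:creator>A. Researcher</dc:creator>
  <dc:date>1998</dc:date>
  <dc:description>An abstract about the wreck.</dc:description>
</record>"""

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace

def parse_dc_record(xml_text):
    """Pull a few Dublin Core fields out of a record into a plain dict."""
    root = ET.fromstring(xml_text)
    fields = {}
    for name in ("title", "creator", "date", "description"):
        el = root.find(DC + name)
        if el is not None and el.text:
            fields[name] = el.text.strip()
    return fields

record = parse_dc_record(SAMPLE)
print(record["title"], "-", record["date"])
```

Once each record is reduced to a dict like this, it can be merged with the JSON-based New York Times results downstream.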

Thursday was exciting. The El Dorado Chemical Company warehouse up in Bryan caught fire or exploded or something, and a large area north of University Drive was evacuated because of the chemicals in the air. Apparently the fire department had to let the fire burn itself out because it was too dangerous to get close to. The university closed a little after 4, so I went home then; most of the evacuees were allowed to return a few hours later.

Week 8

July 20-24, 2009

This week I met with Dr. Furuta, Dr. Caverlee, and Dr. Castro to discuss the next step in my project. We began discussing methods for visualizing such large amounts of data. I looked at a site called Wordle, which generates text clouds, and found an API for a similar application called Wordics. I added the Wordics script to my search site, so now after searching you can view a text cloud displaying the 50 most common words across all titles and blurbs of the New York Times results. It is rather slow, though, so I think I may assume later results are less important and only incorporate the first 50 or so pages. Another suggestion was to use a timeline like the one found in MIT's SIMILE project. It might be interesting to, for example, plot the publication dates of articles about the Titanic and see where clusters appear - probably around the date of the wreck, maybe the 50th anniversary, and then again around the time the movie was released.
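The heart of a text cloud is just a word-frequency count over the titles and blurbs, minus common filler words. The project itself used the Wordics script, but the underlying step can be sketched in a few lines of Python (the sample texts and stopword list here are made up for illustration):

```python
import re
from collections import Counter

# A few hypothetical title/blurb strings standing in for NYT results.
texts = [
    "Titanic wreck explored by archaeologists",
    "New Titanic exhibit opens; wreck artifacts on display",
    "Archaeologists study shipwreck artifacts",
]

# Common words to exclude so the cloud isn't dominated by "the", "of", etc.
STOPWORDS = {"the", "of", "a", "an", "on", "by", "and", "in", "to"}

def top_words(texts, n=50):
    """Count case-folded words across all texts; keep the n most common."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)

print(top_words(texts, 5))
```

A cloud renderer then just maps each word's count to a font size.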

I intended to get dinner with a friend Monday night, but we had some ridiculous rain. We almost had a tornado, but apparently the funnel cloud didn't touch down. The Eagle reported 5-7 inches of rain and golf-ball-sized hail. It doesn't rain much in California, so I've really been looking forward to a rainstorm ever since I got home. This is the first storm we've had all summer! So strange...

Week 7

July 13-17, 2009

I spent this week trying to generalize my code. I originally wrote my search to use the New York Times API specifically; now I'm rewriting it so that it's easier to swap in a different API, or to use more than one API at once. I've almost got it working, but the pagination isn't quite right yet. Hopefully that won't be too hard to fix. Next week I'll meet with Dr. Furuta, Dr. Caverlee, and Dr. Castro to talk about the next step. Now that I have a search which can access different databases, what should I do with it? Hopefully they will be able to offer some advice. I'm still facing the problem of which databases to use, since our ideal databases are too difficult to get access to. Currently, my search uses the New York Times API and CNN via the Google Web Search API. The problem with using both of these is that they are both sources of relatively recent news. JSTOR, on the other hand, has articles from various archaeological journals detailing shipwrecks from thousands of years ago. However, JSTOR only grants access to its API after a signed form has been mailed to them and approved.
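The generalization described above amounts to giving every data source the same small interface and having the search loop over whatever sources are plugged in. The project code was PHP/JavaScript; this Python sketch uses hypothetical class names and canned results just to show the shape of the idea:

```python
# Each source implements the same search(query, page) method; the search
# itself doesn't care which back end it's talking to. Real code would make
# HTTP requests here -- these stubs return canned results for illustration.

class NYTimesSource:
    name = "New York Times"

    def search(self, query, page=0):
        # Stub standing in for a call to the NYT Article Search API.
        return [{"title": f"NYT result for {query}", "date": "2009-07-17"}]

class JSTORSource:
    name = "JSTOR"

    def search(self, query, page=0):
        # Stub standing in for a JSTOR query plus XML parsing.
        return [{"title": f"JSTOR result for {query}", "date": "1998"}]

def federated_search(sources, query, page=0):
    """Run the same query against every source and merge the results."""
    results = []
    for source in sources:
        for item in source.search(query, page):
            item["source"] = source.name
            results.append(item)
    return results

hits = federated_search([NYTimesSource(), JSTORSource()], "shipwreck")
for hit in hits:
    print(hit["source"], "-", hit["title"])
```

Adding a new database then means writing one new class rather than rewriting the search, which is exactly why the pagination has to live in the shared layer - and why it's the fiddly part.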

Tonight (Friday, July 17) I will be going to a concert in the Woodlands - about an hour and a half drive from here, near Houston. I get to see ZZ Top and Aerosmith - it will be amazing!

Week 6

July 6-10, 2009

This week I finally got the pagination on my New York Times Article Search to work. I also cleaned up and commented the code. Now I'm trying to find a second API which I could incorporate into the current search. So far the only possibility I've really found is Google, which isn't really the type of database we want to be accessing. However, I don't have access to the APIs for the databases which would be useful to us. JSTOR seems to give access only to a limited number of articles, and only to teams participating in the Digging Into Data challenge. Chronicling America, the database we are most interested in, was supposed to have an API coming out in Spring 2009, but that has not yet happened, so we can't access their data. The Digging Into Data site has added a few more sites since the last time I checked it, so I am looking at those and at APIs listed on ProgrammableWeb to try to find some more useful APIs.

Week 5

June 29-July 3, 2009

This week I continued playing with my simple NY Times search site. I've gotten the results to display in an easily readable format, but now I've been struggling with getting the pagination to work. I want to display all results in groups of 10 over multiple pages. I have been unable to get this to work, so I'm trying to first display all results on one page. However, I am getting some strange errors. If I try to display multiple sets of 10, the first 30 results show up fine, but every result after that is "November 30, 1999 ...". Also, the results returned change depending on how many sets I'm displaying and on which offset I start displaying at (that is, if I start with offset=0, I get a set of 1-10, 11-20, 21-30, etc, but if I start with offset=2, which should be the same as starting with results 21-30, I get different articles). I've been reading up as much as I can about pagination, but I haven't found much that's particularly helpful...
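The confusing part of the offset behavior described above is that the API's offset counts whole sets of ten, not individual results, so offset=2 means "the third page" (results 21-30), not "skip two results". That bookkeeping can be pinned down in a few lines; this is a Python sketch of the arithmetic only, with hypothetical helper names, not the actual site code:

```python
PAGE_SIZE = 10  # the NYT Article Search API returns results in sets of 10

def page_to_offset(page):
    """Convert a 1-based page number to the API's offset parameter.

    offset counts sets of 10: offset=0 is results 1-10,
    offset=2 is results 21-30, and so on.
    """
    return page - 1

def result_range(page):
    """The 1-based result numbers that a given page should display."""
    first = (page - 1) * PAGE_SIZE + 1
    return first, first + PAGE_SIZE - 1

for page in (1, 2, 3):
    print(page, "->", page_to_offset(page), result_range(page))
```

With the mapping written down like this, "offset=2 should equal results 21-30" becomes a checkable invariant - if the API returns different articles for the same offset across requests, the bug is on the server side (or in how the request is built), not in the arithmetic.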

I spent the 4th of July in California visiting friends. However, there were many celebrations around College Station - the George Bush Library had fireworks, and the Star of the Republic Museum in Washington-on-the-Brazos holds a 4th of July celebration every year with food, live music, and fireworks.

Week 4

June 22-26, 2009

At the beginning of the week, I researched a specific shipwreck for Dr. Caverlee using the Chronicling America database. He's writing a proposal, and wanted to give an example of a story that might be difficult to put together with a typical search interface. I researched the wreck of the Valencia, a ship which wrecked on Vancouver Island on January 22-23, 1906, on its way from San Francisco to Puget Sound. Most articles concerned the actual wreck of the ship, but I was able to dig up a few from the ship's past and a few from several years after the wreck which mentioned it. These past and future articles were quite difficult to find for several reasons: multiple ships were named Valencia, and "Valencia" could also refer to a street in San Francisco or to a person's name, all of which produced many false matches. It would be convenient if our interface could determine which articles concerned a particular ship, because these few past/future articles contained very interesting information which might otherwise have been overlooked.

Then I signed up for a New York Times API key, and have been playing with it ever since. Search data is returned in JSON, which I had never heard of, so I've been doing a lot of reading about JSON, JavaScript, PHP, etc. Following a very useful tutorial site, I made a simple page with a NY Times search bar.
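For anyone else meeting JSON for the first time: it's just nested dictionaries and lists serialized as text, and every language has a one-call parser for it. The sample below is a hypothetical response shaped loosely like the 2009-era Article Search output (the real field names and structure differ); the actual site was PHP/JavaScript, but the idea is identical in Python:

```python
import json

# Hypothetical response text -- real NYT Article Search responses use
# different field names, but the parsing step is the same.
RAW = """{
  "results": [
    {"title": "Titanic Sinks Four Hours After Hitting Iceberg",
     "date": "19120416",
     "url": "http://example.com/titanic"},
    {"title": "Divers Reach Wreck of the Titanic",
     "date": "19850902",
     "url": "http://example.com/divers"}
  ],
  "total": 2
}"""

data = json.loads(RAW)  # JSON text -> plain Python dicts and lists
for article in data["results"]:
    print(article["date"], "-", article["title"])
```

Once parsed, building the results page is just a loop over `data["results"]`.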

Week 3

June 15-19, 2009

This week I did more sketching and more research about what exactly nautical archaeologists are interested in. I checked out a book Dr. Castro recommended - Archaeology Under Water by Keith Muckelroy - and looked through all of the nautical archaeology websites he sent me. Now it seems to me that the articles provided by the Chronicling America and New York Times databases may not be very useful to nautical archaeologists. Newspaper articles tend to be more narrative - they tell the story of a ship wrecking and give some information, like the ship's name, captain, passengers who died, etc. However, I think nautical archaeologists are most interested in the cargo of the ship, the ship's construction, and its origin - things which may or may not be included in a newspaper article. Also, nautical archaeologists are more interested in ships dated to before records were kept, because they use the cargo and construction of the ship to try to determine the way of life of people from that period.

A few days this week I got lunch with some friends. Northgate is a mere ten minute walk from my building and has lots of tasty restaurants. I particularly enjoy Potbelly's sandwiches and milkshakes, and Antonio's pizza has always been a favorite. I also went out a few nights to play pool.

Week 2

June 8-12, 2009

On Monday this week I met with Dr. Furuta and Dr. Castro from the Nautical Archaeology Department. We discussed what type of information a nautical archaeologist might be interested in while researching - for example: whether an account of a shipwreck is factual or fictional, whether the wreck has been found, what the ship's origin and destination were, or what the captain's name was. Dr. Castro e-mailed me a more complete list of questions. I skimmed several chapters in some of the user interface design textbooks Dr. Furuta gave me and wrote down tips I thought might be helpful when designing the search interface. I began sketching out ideas for how the interface might be laid out, based on the several online databases I've looked at so far. I also wrote out several problems to think about, such as how to make it easy to divide a search into many subcategories without making the interface cluttered.

While home I've made a point to see as many movies as possible with my friends. The ticket prices here are so cheap compared with California. At school, seeing movies is a treat since tickets can cost >$10. Here, since tickets are only $4 for students, seeing a movie in the theater costs the same as renting one. I went with some friends to see The Hangover, and it was hilarious. It's definitely one of the funniest movies I've seen in a while.

Week 1

June 1-5, 2009

The first week I met with Dr. Furuta and Dr. Caverlee to discuss beginning work on a search interface for the Nautical Archaeology Department. They pointed me to the website for a competition called Digging Into Data, which had a list of several different online databases. My main task for this week was to skim over the databases and get an idea of which ones might have some information pertaining to nautical archaeology, and of those, which had the most information. I kept track of statistics such as how many total resources each database had and how many results were returned when I searched for terms like "shipwreck". I found three databases which will be useful: Chronicling America (the Library of Congress's National Digital Newspaper Program), the New York Times Article Search API, and JSTOR.

Dr. Furuta also gave me a few textbooks about user interface design to look through, as well as one with a few sections on information retrieval.

Unlike many of the other REU students, I am from College Station, though I attend Harvey Mudd College. This has made it easier, I imagine, because I don't have to get to know a new city. Last summer I stayed at Mudd and did research, so I was only home for a few weeks. It's nice to be back and to see all my friends from high school (most of whom are attending Texas A&M). I'm working in the Digital Libraries lab, in an office with one other girl, Bethany. Bethany is also working with Dr. Caverlee, but her project has to do with the geographical locations of Twitter users.