August 8, 2006

Week Ten - Tuesday

Wow. It's hard to believe it's my last week already. I put in my last appearance at the monthly division lunch yesterday, and talked a little with Kristina about some things I'd like to get done before the meeting on Thursday. I still need to get the actual list of people who favorited each photo, a task I had deferred because there is no API method that will do this for me. I wrote a script to scrape the favorites page for each photo, but a couple of things made this a tricky endeavor. First, users are listed by their user name and not their user ID. This makes the page very easy for a human to read, but it would be better if we could translate names into user IDs so that we can match them up with users already in our database. Second, the dates are written in the abbreviated format DD Mon YY (e.g., 08 Aug 06). I'll need to translate that to a different format if we want to work with it easily later. On the plus side, the list is very regular and orderly, and it was fairly easy to write the necessary regular expressions. The presence of multiple whitespace characters in some users' names posed a bit of a challenge, but I think I've allowed for enough spaces to catch all the user names on a page.
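
For anyone curious, here is a rough Python sketch of the kind of parsing described above. It is not the actual script. The line format it expects (a user name followed by a DD Mon YY date) is an assumption, since the real favorites page markup isn't reproduced here, and the names ENTRY_RE, parse_entry, to_user_id, and known_users are all made up for illustration. It also assumes English month abbreviations, which is what %b matches in the default locale.

```python
import re
from datetime import datetime

# Hypothetical entry format: "<user name> <DD Mon YY>", e.g. "Jane  Doe 08 Aug 06".
# The name is matched lazily and may contain any amount of internal whitespace.
ENTRY_RE = re.compile(
    r"^\s*(?P<name>\S.*?)\s+(?P<date>\d{2} [A-Z][a-z]{2} \d{2})\s*$"
)


def parse_entry(line):
    """Return (user_name, iso_date) for one entry, or None if it doesn't match."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    # Keep the name exactly as written so it can be matched against the database.
    name = match.group("name")
    # Convert "08 Aug 06" into "2006-08-08", which sorts and compares easily.
    iso_date = datetime.strptime(match.group("date"), "%d %b %y").date().isoformat()
    return name, iso_date


def to_user_id(name, known_users):
    """Look up a user ID by name; known_users is a hypothetical name-to-ID dict."""
    return known_users.get(name)
```

Something like parse_entry("Jane   Q  Doe   08 Aug 06") would give back the name with its internal spaces intact plus an ISO date, and to_user_id returns None for any name that isn't in the database yet.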

August 11, 2006

Week Ten - Friday

My last day. There's still a lot of work to be done, but I feel pretty good about what we were able to accomplish so far. Kristina and I will be collaborating on a paper based on this research, specifically on how users find new pictures to view. I'm also going to be using the Flickr data for my MA thesis; we didn't get a chance to explore the structure of the social networks, and the data we have collected is well-suited to the task. Kristina presented me with a CD containing the entire SQL database today. It's funny to think that a summer's worth of work is on this one small disc. It was an interesting summer's worth of work, though, and I'm glad to have been a part of it.