Week Three: June 1 - June 4
This week I had to work with two new datasets from the Stanford Network Analysis Platform (SNAP), both of which are citation networks. I downloaded the datasets and imported them into our database. The problem I found was that the data were not clean: a number of articles had the wrong publication date. We suspected that the wrong dates were caused by an unreliable method of parsing the dataset, so I wrote another Java program to extract the date from each article's ID, and it finally worked!
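For illustration, here is a minimal sketch of what extracting a date from an article ID can look like. It is not the program I actually wrote. It assumes the datasets are arXiv-based and that the IDs follow the old-style arXiv scheme YYMMNNN (two-digit year, two-digit month, then a sequence number). For example, 9203201 would be March 1992. It also assumes leading zeros may have been lost when IDs were stored as numbers, so 0001001 can show up as 1001. The class name `ArxivIdDate` and method `toYearMonth` are hypothetical.

```java
import java.time.YearMonth;

/**
 * Hypothetical helper: recovers a paper's submission month from an
 * old-style arXiv ID of the form YYMMNNN (assumption for this sketch).
 */
public class ArxivIdDate {

    public static YearMonth toYearMonth(long id) {
        // Restore the 7-digit YYMMNNN form in case leading zeros were dropped.
        String padded = String.format("%07d", id);
        if (id < 0 || padded.length() != 7) {
            throw new IllegalArgumentException("Not a YYMMNNN ID: " + id);
        }
        int yy = Integer.parseInt(padded.substring(0, 2));
        int mm = Integer.parseInt(padded.substring(2, 4));
        // Old-style arXiv IDs begin in 1991, so 91-99 map to the 1900s
        // and smaller values to the 2000s (assumption for this sketch).
        int year = (yy >= 91) ? 1900 + yy : 2000 + yy;
        // YearMonth.of throws DateTimeException if mm is not 1-12,
        // which flags IDs that do not fit the assumed scheme.
        return YearMonth.of(year, mm);
    }

    public static void main(String[] args) {
        System.out.println(toYearMonth(9203201L)); // 1992-03
        System.out.println(toYearMonth(1001L));    // 2000-01
    }
}
```

Deriving the date from the ID rather than from a separately parsed date field means every paper gets a date by the same rule. An ID that does not fit the pattern fails loudly instead of silently producing a wrong date.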
We also wanted to know which time interval is best for analyzing the dynamic networks. We considered using either 6 months or 12 months for each snapshot. I did some data analysis on this and will present my results to Professor Lerman and Rumi next week.
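One simple way to compare the two options is to assign each paper to a fixed-width time window once it has a publication month. The sketch below is only an illustration and is not the analysis I actually ran. It assumes consecutive, non-overlapping windows that begin at a chosen start month. The class `SnapshotBucket`, the method `snapshotIndex`, and its parameters are hypothetical.

```java
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper: maps a publication month to a snapshot window. */
public class SnapshotBucket {

    /**
     * Returns the 0-based index of the window containing {@code month},
     * where windows are {@code intervalMonths} long and start at {@code start}.
     */
    public static int snapshotIndex(YearMonth start, YearMonth month, int intervalMonths) {
        if (intervalMonths <= 0) {
            throw new IllegalArgumentException("intervalMonths must be positive");
        }
        // Whole months elapsed since the start of the first window.
        long elapsed = ChronoUnit.MONTHS.between(start, month);
        if (elapsed < 0) {
            throw new IllegalArgumentException("month precedes start");
        }
        return (int) (elapsed / intervalMonths);
    }

    public static void main(String[] args) {
        YearMonth start = YearMonth.of(1992, 1);
        YearMonth paper = YearMonth.of(1993, 1);
        System.out.println(snapshotIndex(start, paper, 6));  // 2
        System.out.println(snapshotIndex(start, paper, 12)); // 1
    }
}
```

With 6-month windows the same span of data yields twice as many snapshots as with 12-month windows, but each snapshot contains fewer papers and citations. That trade-off is what the comparison is meant to weigh.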
Lessons Learned:
1) Every dataset has flaws.
2) There is no dead end. Don't stop. Keep trying new methods and thinking of new ways to solve problems.
Weekend:
I didn't feel well, so I stayed home over the weekend; I had some kind of allergy. The best thing I remember is the super delicious Thai noodles I had at Thai Town for lunch on Saturday. That's all.