Week Five: June 14 - June 18
I started off this week by spending a tremendous amount of time debugging my Java program. It was my first time using SAXParser to parse XML files into a MySQL database, and I got several warnings and error messages that took a long time to examine. I finally found out that the XML files are not well-formed: they are missing closing tags, which causes the parsing to stop partway through. I wrote an error report and submitted it to Professor Lerman so that she can forward it to the publisher of the APS dataset.
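The failure described above can be reproduced with the JDK's built-in SAX parser. This is a minimal sketch, not the program I actually wrote: it only checks well-formedness (there is no MySQL step), and the class and method names are hypothetical. A missing closing tag is reported as a SAXParseException, which carries the line and column numbers, the kind of detail that makes an error report useful to a publisher.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports where a file stops being well-formed.
public class WellFormednessCheck {

    /** Counts start tags so we can see how far parsing got before an error. */
    static class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    /** Returns null if the XML is well-formed, otherwise a description of the first fatal error. */
    static String check(String xml) {
        CountingHandler handler = new CountingHandler();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler.fatalError rethrows, so a malformed document ends up in the catch below.
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + " after " + handler.elements + " elements: " + e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems are not well-formedness errors.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<article><title>Paper</title></article>";
        String bad = "<article><title>Paper</article>"; // missing </title>
        System.out.println(check(good)); // prints "null"
        System.out.println(check(bad));
    }
}
```

Counting elements before the failure gives a rough idea of how much of a file was read before the missing tag stopped the parser.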
This week I mostly worked on the report for the Arxiv HEP-PH and APS Physical Review datasets. The report consists of an overview of each dataset, some important statistics of the citation network, the publication-year distribution, and the estimated gamma and alpha values. I ran into another problem with parameter estimation. The APS Physical Review dataset has about 450,000 papers, which form 450,000 by 450,000 sparse matrices. Even though UJMP can handle very large matrices, the computation takes far too long on a dataset this large. I left the program running for the whole weekend just to see how long it would take to perform ten matrix multiplications and calculate the sum of each adjacency matrix.
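If the only thing needed from each matrix power is the sum of its entries, there may be a much cheaper route than repeated matrix multiplication. The sum of all entries of A^k equals 1^T A^k 1, so it can be computed with ten sparse matrix-vector products, each costing time proportional to the number of citation links, instead of ten matrix-matrix products whose results get denser at every step. Below is a minimal sketch of that idea in plain Java with a hand-rolled CSR (compressed sparse row) matrix. It does not use UJMP's API, it assumes "summation" means the sum of all entries of A^k, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch. Assumption: "summation" = sum of all entries of A^k; UJMP is not used here.
public class PathCountSums {

    /** Minimal CSR adjacency matrix; every stored entry is treated as 1 (unweighted citation). */
    static final class Csr {
        final int n;
        final int[] rowPtr; // length n+1; row i's columns are colIdx[rowPtr[i] .. rowPtr[i+1]-1]
        final int[] colIdx;

        Csr(int n, int[] rowPtr, int[] colIdx) {
            this.n = n;
            this.rowPtr = rowPtr;
            this.colIdx = colIdx;
        }

        /** Builds CSR from an edge list (src[e] -> dst[e]) using a counting sort on src. */
        static Csr fromEdges(int n, int[] src, int[] dst) {
            int[] rowPtr = new int[n + 1];
            for (int s : src) rowPtr[s + 1]++;
            for (int i = 0; i < n; i++) rowPtr[i + 1] += rowPtr[i];
            int[] next = Arrays.copyOf(rowPtr, n);
            int[] colIdx = new int[src.length];
            for (int e = 0; e < src.length; e++) colIdx[next[src[e]]++] = dst[e];
            return new Csr(n, rowPtr, colIdx);
        }

        /** y = A * x in O(nnz) time. */
        double[] multiply(double[] x) {
            double[] y = new double[n];
            for (int i = 0; i < n; i++) {
                double acc = 0.0;
                for (int p = rowPtr[i]; p < rowPtr[i + 1]; p++) acc += x[colIdx[p]];
                y[i] = acc;
            }
            return y;
        }
    }

    /** Returns sums[k-1] = sum of all entries of A^k, for k = 1..maxPower. */
    static double[] powerSums(Csr a, int maxPower) {
        double[] v = new double[a.n];
        Arrays.fill(v, 1.0);           // v_0 = all-ones vector
        double[] sums = new double[maxPower];
        for (int k = 1; k <= maxPower; k++) {
            v = a.multiply(v);         // v_k = A v_{k-1} = A^k 1
            double s = 0.0;
            for (double x : v) s += x; // 1^T A^k 1
            sums[k - 1] = s;
        }
        return sums;
    }

    public static void main(String[] args) {
        // Toy citation graph: 0->1, 0->2, 1->2, 2->3
        Csr a = Csr.fromEdges(4, new int[] {0, 0, 1, 2}, new int[] {1, 2, 2, 3});
        System.out.println(Arrays.toString(powerSums(a, 4))); // prints [4.0, 3.0, 1.0, 0.0]
    }
}
```

On the toy graph, the sums are the numbers of citation chains (walks) of length 1 to 4. On the real dataset these counts can grow very quickly, so double precision gives only approximate values for large powers.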
Lessons Learned:
1) Java is great for many things, but not for scientific computing.
Weekend:
This weekend was amazing. Friends from San Francisco, Providence, and Los Angeles came to visit me. We went to the Boiling Crab for seafood (of course!). I am well known as a crab lover, so this is the best restaurant I could ever visit. We also went to Santa Monica Beach, which has been my favorite place ever since I first saw the Ferris wheel light up at night on the pier. We had a great time hanging out, eating ice cream, and taking tons of pictures. It was the best weekend so far! =)