Internship Week 8

Week 8

Tests, tests, and more tests. I ran the new geoHash (which is significantly faster) on the L set. I then ran the results through my programs and eventually produced a file that Excel could read. Then I used Excel to graph the results in several different ways to determine the best thresholds. I also automated the entire process so that one command does everything up to outputting the Excel file.

Next I ran another set of tests, intended to measure the effects of changing the spatial threshold. The problem with these tests was a change in the file format: the names of the files went from source-target to source-target@threshold. The files had already been run through geoHash, but Brian wanted me to add the geometric match to the output. This meant that the entire automation had to be changed and the geometric match had to be calculated.

Calculating geometric match was relatively simple. Geometric match is the number of matches found over the number searched for, not taking into account whether the found matches were correct. To find this number, I simply added the number of true positives and false positives and divided the sum by the total number of matches searched for. Brian also wanted me to calculate geometric match for the L set tests, which meant that I had to rerun that set as well. Eventually, when all the tests were run and the new graphs were made, calculating the geometric match helped to confirm that the output was correct. The sensitivity followed the same pattern as the geometric match, indicating that the results were at least following a predictable pattern.
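As a rough illustration of the two pieces described above, here is a minimal Python sketch. It is not the code I actually used: the function names (`geometric_match`, `parse_result_name`), the parameter names, and the assumption that the source name contains no hyphen and that the threshold is a plain number are all hypothetical, chosen only to show the idea.

```python
def geometric_match(true_positives, false_positives, total_searched):
    """Fraction of searched-for matches that were found at all.

    Correctness is ignored on purpose: true and false positives
    both count as "found".
    """
    if total_searched <= 0:
        raise ValueError("total_searched must be positive")
    return (true_positives + false_positives) / total_searched


def parse_result_name(name):
    """Split a result name into (source, target, threshold).

    Handles both the old 'source-target' form and the new
    'source-target@threshold' form. The threshold is None for the
    old form. Assumes (hypothetically) that the source name has no
    hyphen and that the threshold parses as a number.
    """
    pair, sep, threshold = name.partition("@")
    source, _, target = pair.partition("-")
    return source, target, (float(threshold) if sep else None)
```

For example, 40 true positives and 10 false positives out of 100 matches searched for give a geometric match of 0.5, whether or not the 10 false positives should have been found.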

My Summer Internship at Rice University

Contact me at aec@rice.edu.
