My Summer Internship at Rice University



Contact me at aec@rice.edu.

Week 6

This week I ran into an unforeseen problem while testing the thresholds. I started by running all of the experiments at the lowest thresholds, then slowly worked my way up to higher ones. What I had not realized was that the running times grow along with the thresholds: while a low threshold takes about thirty minutes to run, the highest thresholds were taking upwards of four hours. Even with the cluster, this set the rest of my work back, since I waited significantly longer than I had expected for the tests to finish. While I was waiting, I wrote a file parser that converts the test output into a text document that Excel can interpret. I then turned all of the test results into line graphs showing how the results change as the thresholds increase. I created one set of graphs in which I held the first threshold steady and varied the second, and a second set in which I held the second threshold steady and varied the first. This way, we can first choose the best first threshold, then, using that value, select the best second threshold.
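A minimal sketch of the parsing and threshold-selection steps, in Python. The result layout (a dictionary keyed by (query threshold, match threshold) pairs holding a single accuracy score), the CSV column names, and the rule for picking the best first threshold (highest mean score across all second thresholds) are assumptions for illustration, not the actual format or criterion used in this project.

```python
import csv
import io
from statistics import mean

# Hypothetical layout: (query_threshold, match_threshold) -> accuracy score.
Results = dict[tuple[float, float], float]


def write_csv(results: Results, out) -> None:
    """Write one row per threshold pair so Excel can open and chart it."""
    writer = csv.writer(out)
    writer.writerow(["query_threshold", "match_threshold", "score"])
    for (q, m), score in sorted(results.items()):
        writer.writerow([q, m, score])


def series_fixed_query(results: Results, q: float) -> list[tuple[float, float]]:
    """(match_threshold, score) points with the first threshold held steady."""
    return sorted((m, s) for (qq, m), s in results.items() if qq == q)


def series_fixed_match(results: Results, m: float) -> list[tuple[float, float]]:
    """(query_threshold, score) points with the second threshold held steady."""
    return sorted((q, s) for (q, mm), s in results.items() if mm == m)


def pick_thresholds(results: Results) -> tuple[float, float]:
    """Two-stage choice: best first threshold (assumed: highest mean score over
    all second thresholds), then the best second threshold at that value."""
    queries = sorted({q for q, _ in results})
    best_q = max(
        queries,
        key=lambda q: mean(s for (qq, _), s in results.items() if qq == q),
    )
    best_m = max(series_fixed_query(results, best_q), key=lambda p: p[1])[0]
    return best_q, best_m


if __name__ == "__main__":
    # Made-up scores, purely to show the call sequence.
    toy = {(0.5, 1.5): 0.6, (0.5, 2.0): 0.7, (0.75, 1.5): 0.8, (0.75, 2.0): 0.65}
    buf = io.StringIO()
    write_csv(toy, buf)
    print(buf.getvalue())
    print(pick_thresholds(toy))
```

With the made-up scores above, the first threshold 0.75 has the higher mean, and at that value the second threshold 1.5 scores best, so pick_thresholds returns (0.75, 1.5). Choosing the thresholds one at a time like this is cheaper to reason about than searching every pair jointly, but it can miss a better pair if the two thresholds interact strongly.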

I spent the second half of the week creating a set of negative tests. Each protein comparison consists of a source protein and a target protein. Previously, all of these proteins were chosen because of some similarity, and we expected to find matches between them. This week, I took each source protein from each set and paired it with one target from each of the other sets, which should give us a set of non-matching protein pairs. I also began running these new sets on every combination of thresholds: from 0.5 to 5 in steps of 0.25 for the query (first) threshold, and from 1.5 to 5 in steps of 0.5 for the match (second) threshold. Once these runs finish, we should be able to determine which thresholds give the most accurate overall results, not only by finding matches in the matching sets but also by correctly rejecting the non-matching ones. I have also written a program that will run every combination of proteins at every threshold. This will probably take at least two days to complete, mostly because of the slow runs at the higher thresholds.
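The negative-pair construction and the threshold sweep can be sketched in Python as follows. The input layout (a list of (sources, targets) lists, one per matching set), the choice of the first target from each other set as "one target", and the job tuple format are hypothetical; the threshold ranges are the ones given above. The two ranges give 19 query values and 8 match values, or 152 threshold combinations per protein pair.

```python
from itertools import product


def threshold_range(start: float, stop: float, step: float) -> list[float]:
    """Inclusive range built from integer step counts to avoid float drift."""
    n = round((stop - start) / step) + 1
    return [start + i * step for i in range(n)]


QUERY_THRESHOLDS = threshold_range(0.5, 5.0, 0.25)  # 0.5, 0.75, ..., 5.0
MATCH_THRESHOLDS = threshold_range(1.5, 5.0, 0.5)   # 1.5, 2.0, ..., 5.0


def negative_pairs(sets: list[tuple[list[str], list[str]]]) -> list[tuple[str, str]]:
    """Pair every source with one target from each *other* set.
    Hypothetical rule: 'one target' is taken to be the first target listed."""
    pairs = []
    for i, (sources, _) in enumerate(sets):
        for source in sources:
            for j, (_, targets) in enumerate(sets):
                if j != i and targets:
                    pairs.append((source, targets[0]))
    return pairs


def all_jobs(pairs: list[tuple[str, str]]) -> list[tuple[str, str, float, float]]:
    """Every (source, target, query_threshold, match_threshold) run to submit."""
    return [
        (s, t, q, m)
        for (s, t), q, m in product(pairs, QUERY_THRESHOLDS, MATCH_THRESHOLDS)
    ]


if __name__ == "__main__":
    # Made-up protein names, purely to show the shape of the output.
    sets = [(["A1", "A2"], ["a1"]), (["B1"], ["b1"]), (["C1"], ["c1"])]
    negatives = negative_pairs(sets)
    print(negatives)
    print(len(all_jobs(negatives)))  # 8 pairs x 152 threshold combinations
```

With three toy sets, every source is paired only with targets from the other two sets, giving 8 non-matching pairs and 8 × 152 = 1216 runs. Generating the full job list up front like this makes it easy to hand the runs to a cluster scheduler and to see in advance how many of them fall at the slow, high-threshold end.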


