Today, we decided that our somewhat hacky internal transition matrix change wasn't up to par. Sure, it worked as a proof of concept (we weren't even remotely sure if it would work), but we really need the transition matrix to be passed in externally for maximum flexibility. As it turns out, Tony and Inbar are also attempting to use the transition matrix. Initially, they were going to use our code, but they decided they'd rather have this external feature as well. So, we'll borrow from them when it's done. It's great that the two of them are working on similar things to Thomas and me, as a lot of conferring goes on behind the scenes in both directions. We've shared a lot of our code with them, and vice versa, so we don't duplicate work and can focus on different things.
Anyways, we'll be getting the code from them as soon as it's available. They did make some changes to the input format (now a list instead of a matrix), so we spent the rest of the day planning how to code up our own weights for the transition matrix.
Today we finished planning the confidence matrix -> transition matrix -> transition list script. It's actually pretty handy and should be easy to change if we want to play around with the weights a little bit. We also took some time to run the original version of majority vote as a baseline for our final results, whatever they are.
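A minimal sketch of what such a confidence matrix -> transition matrix -> transition list pipeline could look like. This is not our actual script: the row normalization, the function names, and the `(source, target, probability)` list format are all assumptions made for illustration.

```python
def confidence_to_transition(conf):
    """Row-normalize a square confidence matrix into a transition matrix.

    Assumption: each row of confidence scores is scaled so it sums to 1.
    """
    trans = []
    for row in conf:
        total = sum(row)
        # Rows with no confidence at all (isolated nodes) stay all-zero
        # instead of triggering a division by zero.
        trans.append([w / total if total > 0 else 0.0 for w in row])
    return trans


def transition_to_list(trans):
    """Flatten a transition matrix into (source, target, probability) triples.

    Assumption: zero entries are simply left out of the list.
    """
    return [
        (i, j, p)
        for i, row in enumerate(trans)
        for j, p in enumerate(row)
        if p > 0
    ]


# Hypothetical 3-node example; node 2 has no confident edges.
confidence = [
    [0, 2, 2],
    [1, 0, 0],
    [0, 0, 0],
]
transition = confidence_to_transition(confidence)
edges = transition_to_list(transition)
```

Keeping the weighting step separate from the flattening step is what would make it easy to play around with the weights: only `confidence_to_transition` has to change.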
Today we ran the external transition matrix code for all of our various versions of confidence scores. The happy news is that we got the same exact results after all these changes. Yay!
We also took care of some of the more mundane tasks, namely filling holes in our results spreadsheet. We've been keeping track of every DSD+MV method and every confidence method, filling our results in for each combination as we go. It took a long time to organize all the missing runs, but while it was running, we kept going on the poster planning, including coming up with the final table of results to include in the poster.
We formally met with Lenore again today to discuss our results, how we plan to represent them in the table, and such. Lenore also came up with the idea of an iterative version of majority vote, which would be pretty cool in practice. The idea is that we use a small amount of training data from the best-performing subset of edges. Then we'd predict functions as normal for the rest of the proteins in this subset. In the second iteration, we'd use the training data plus the predicted proteins as a new set of training data for more interactions/proteins. On its own, this should actually perform worse than before, but if we could take this information and try to guess which predictions are more likely to be accurate, we might be able to come up with a scenario in which only a small amount of training data is needed for the function prediction to perform exceedingly well. Note that this no longer uses cross-validation.
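Here is a rough sketch of the iterative idea, under some simplifying assumptions: each protein has a single function label, the network is an unweighted adjacency dictionary, and every prediction is accepted into the next round's training data. The function name and data layout are made up for illustration, and this version does not yet try to guess which predictions are more likely to be accurate.

```python
from collections import Counter


def iterative_majority_vote(neighbors, seed_labels, max_rounds=10):
    """Repeatedly label proteins by majority vote over labeled neighbors.

    neighbors:   dict mapping each protein to a list of adjacent proteins
    seed_labels: dict mapping the training proteins to their known function
    """
    labels = dict(seed_labels)
    for _ in range(max_rounds):
        predicted = {}
        for protein, nbrs in neighbors.items():
            if protein in labels:
                continue
            # Only neighbors labeled before this round get a vote.
            votes = Counter(labels[n] for n in nbrs if n in labels)
            if votes:
                # Ties go to the label that was counted first.
                predicted[protein] = votes.most_common(1)[0][0]
        if not predicted:
            break  # nothing new was reachable, so stop early
        # Predictions join the training data for the next round.
        labels.update(predicted)
    return labels


# Hypothetical chain a - b - c - d with one labeled protein.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result = iterative_majority_vote(chain, {"a": "kinase"})
```

In this toy chain the single label spreads one step per round, which also shows the risk: any early mistake would get propagated to everything downstream.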
We spent most of today implementing the iterative version of majority vote, after deciding to overhaul the internal functions to improve readability and, honestly, to improve our understanding of the inner workings of the code. We spent LOTS of time writing pseudocode and staring at the whiteboard, just organizing our thoughts and wading through the tricky parts. Sometimes it just helps to get away from the computer and talk it over with someone to get on the same page.
Also, Danny came to visit today! We spent the late afternoon walking around the MIT and Harvard areas, stumbling upon an abundance of cool buildings and bookstores. I'm sure that looking like nerdy college kids walking around with backpacks helped, since we didn't get any weird looks while wandering around in random buildings. :)
Danny and I went exploring today. It was gorgeous outside, so we thought we'd go to the beach. We changed plans a lot today, or more like upgraded plans as we stumbled upon different things to do. The first idea was heading to Revere Beach to balance the hot sun with the cold Atlantic. Then on our way down there, we saw a billboard for the aquarium and thought we'd go there in the morning, since it hadn't quite gotten hot out. When we got off at the stop for the aquarium, we changed plans and decided to just wander around the wharf. Once we got to the wharf, we decided to take a ferry to an island in Boston Harbor! We ended up going to Spectacle Island, which was really, really cool. Despite being a reclaimed landfill (we had no idea until we were about to leave), it was absolutely gorgeous. Because of its shady past, the shoreline was FILLED with seaglass. So cool!
Today we were a little "walked out," so we went back to a really cool bookstore at MIT and got some books. I chose An Introduction to Bioinformatics Algorithms by Jones and Pevzner. It's the most readable algorithms book I've come across so far, and I actually really, really like it. It's not too often that I can read the first ~100 pages of a textbook-style book straight through without being extremely (extremely) bored. (Algorithms are really cool and all, but reading about them without interacting with them in some way is usually very, very dry.) We got some lunch and spent some time reading in the (air-conditioned) Tufts library. What a great, low-key way to pass the day!