Home‎ > ‎

Week 4

For the biology lecture on Monday, Lenore talked about high-throughput experiments which can be on single genes, pairs of genes, or higher orders of genes in yeast (with ~6,000 genes) and humans (with ~50,000 genes). The proteins that are associated to each other may or may not be physically bound to one another, but can be modeled in a PPI (protein-protein interaction) network. Lenore mentioned some of her previous work on non-essential yeast genes. She compared growth of double knockout to single knockout. She later on talked about Tufts part in a competition known as the DREAM Challenge which focused on unsupervised community detection.

I spent the rest of the week trying to do two things -- continue working on the programming project and trying to prove what I found last week.

The programming project kept progressing slowly. I was finally able to make predictions for the unknown set based on functionalities from the known set. However, my program still requires a method to check predictions with actual values. It also needs to be able to make the predictions based on the confidences of all surrounding nodes (not just one adjacent one).

With regards to the math project, I started off by reading about triangular numbers. I recalled that the nth triangular number is (n+1)C(2); hence, the (n-1)th triangular number is (n)C(2). Since, a complete graph of n nodes has (n)C(2) edges, I thought that  I could use these facts as a starting point of my proof. I also thought that the terms of the equation k = 1 + 1/3 + 1/6 + ... + 1/r depended on the number of nodes in the network. Lenore advised that perhaps the relation depended on the length of the random walks. Therefore, I began to look at random walks. I started by looking at a complete graph of three nodes and seeing the k values of the arranged DSD matrices for various lengths of random walks before convergence. The values of k until convergence were: 1, 1.5, 1.25, 1.375, 1.3125, 1.3438, 1.3281, 1.3359, 1.3320, 1.3340, 1.3330, 1.3335, 1.3333, 1.3334, 1.3333. Of course these values weren't telling me anything, so I decided to see if adding their reciprocals led me anywhere. The sum ended up being around 9.202 which gave me no further insight. Later that week, Lenore and I sat together and talked about the DSD model. She reiterated the idea behind the DSD algorithm. Together, we calculated the expected output of DSD on a complete graph of five nodes with random walk of length 1. We then discussed how we would calculate it with random walk of length 2.  I had a vague sense of what she described, but decided to solidify my understanding by doing some further calculations, by hand, of higher order random walks. I compared my results with the computer program to make sure I was doing them correctly. I found that a random walk of length 0 had k value of 2, length 1 had value of 1.5, 2 had value of 1.625, 3 had value of 1.5938. I found a new pattern: a random walk of length 0 had k value of 2, length 1 had k value 2 - 1/2, length 2 had k value 2 - 1/2 + 1/8, length 3 had k value 2 - 1/2 + 1/8 - 1/32, and so on and so forth. Basically, the k values were determined, for a complete graph of 5 nodes, based on a geometric series with initial term 2 and common ration (-1/4). That means that for a random walk of length x, k = (2((-1/4)^x - 1))/(-5/4).