Kathryn Doroschak

DREU Participant

Tufts University

B.S. Computer Science Student

University of Minnesota

Adventures of Week 1

Monday, June 3, 2013

Wow, first day! After spending the weekend traveling, moving in, and sightseeing with family, I am so ready to welcome this next adventure. It was a damp and rainy day, but that didn't dampen my excitement for even one moment. Actually, after the searing heat and humidity from the weekend, this was a nice little respite.

Lenore and I arranged to meet at the Danish Pastry House at 9:00 am for breakfast, so at about 8:50, I waltzed into this quaint and friendly little cafe, crossing my fingers that she would already be there. I found her already sitting at a little table and introduced myself. We chatted and ordered breakfast, and then sat down to talk more about the project. (More to come on that in a little bit.) Overall, I think Lenore is very friendly and easy to talk to.

After breakfast, we drove to the building where we'll be working for the summer, 196 Boston, and received a nice warm welcome. I was given a desk, a computer, and some papers to read in the "Undergrad Dungeon," which is actually a small but pretty room in the middle of the level. (We now rarely turn the lights on in favor of the gaping skylights above.)

Then I met the other students I'll be working with this summer: Thomas, the other undergrad, Michelle and Mike, two of Lenore's grad students, and Maxim, another undergrad. Lenore led us through one of the computational biology research area's main focuses: the diffusion state distance algorithm for calculating distances between nodes in networks. This algorithm, also known as DSD, is based on a graph's topology (the nodes and edges themselves), meaning that it can really be used for any network. In this case, the advantages shine brightest in biology, applied to improve protein function prediction for protein-protein interaction networks. The algorithm works by essentially redistributing "distances" between proteins, generally giving more weight to nodes that are connected in local neighborhoods rather than across the network or through a hub. Thomas's and my part in this will be to try to improve the algorithm by adding in external confidence scoring information.
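To make the idea concrete, here is a minimal sketch of a DSD-style distance on a toy graph. It is written from my understanding of the published definition, not taken from the lab's code: each node gets a vector of expected visit counts from a short random walk, and two nodes are compared by the L1 distance between their vectors. The function names, the choice to count the starting node as a visit at step 0, the default walk length `k`, and the toy graph are all assumptions made for illustration.

```python
def expected_visits(adj, start, k):
    """Expected number of visits to each node during a k-step random walk
    that begins at `start` (assumption: step 0 counts as a visit)."""
    prob = {n: 0.0 for n in adj}
    prob[start] = 1.0
    visits = dict(prob)
    for _ in range(k):
        nxt = {n: 0.0 for n in adj}
        for node, p in prob.items():
            if p == 0.0:
                continue
            # Unweighted walk: move to each neighbor with equal probability.
            share = p / len(adj[node])
            for neighbor in adj[node]:
                nxt[neighbor] += share
        prob = nxt
        for n in adj:
            visits[n] += prob[n]
    return visits


def dsd(adj, u, v, k=5):
    """L1 distance between the expected-visit vectors of u and v."""
    hu = expected_visits(adj, u, k)
    hv = expected_visits(adj, v, k)
    return sum(abs(hu[n] - hv[n]) for n in adj)


# Hypothetical toy network: a simple path A - B - C.
# Every node must have at least one neighbor for the walk to be defined.
path = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(dsd(path, "A", "C", k=1))
```

Because the distance depends on where random walks tend to spend their time, two nodes that share a tight local neighborhood end up with similar visit vectors, while two nodes that are only linked through a busy hub do not. Confidence scores could plausibly enter as edge weights that bias the walk, but that is a guess at this stage rather than a description of the actual plan.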

After talking about the project and the algorithm as a group, we went out to lunch at a lovely place I can't quite remember the name of. They serve breakfast all day, and are evidently quite popular in the Medford area. I'll have to find out, because I'd like to bring family there.

After lunch, we went over to the official (but under-construction) computer science building, Halligan Hall. We took care of more housekeeping business (or as one of the organizers of Grace Hopper humorously called it, "white-collar housekeeping"). Because the department is so split up due to construction, it feels tiny. After seeing Halligan, though, I'm realizing it's actually a little bit bigger than I expected.

This whole process took most of the day, but we went back to 196 Boston afterwards and started learning more about the algorithm and gaining insight into what exactly we'd be doing. This concludes a very long but fun day!

Tuesday, June 4, 2013

On Tuesday we actually started to get down to business. Most of the day was spent looking for sources of confidence information to integrate into DSD. I think Thomas and I also started to get a better picture of what we needed to do... at least somewhat. We at least knew that the first step was finding these confidence scores.

Standing up against a learning curve in a brand new realm is both exhilarating and frustrating for me. I love it. It's a whole new chance to learn how to learn all over again. I know it's nerdy, but I get a mini adrenaline rush when on a mission to learn something quickly. I just want to know about this new thing five minutes ago already, so I can get on with my work and be constructive. Every single step of the way I'm always thinking, "hm, how could I have explained that better, in a way that I would understand it faster and better?" It's become a subconscious part of me now, a section of the constant internal dialogue. There's always a better way. Thankfully and disappointingly, this learning curve hasn't been all that steep, at least not compared to learning about HLA for my internship last summer with NMDP. (There it took about 2-3 weeks to do anything but smile and nod while trying to comprehend the vast amount of jargon being tossed around.) Still, I'm on the hunt for more and more knowledge about protein networks.

Wednesday, June 5, 2013

I think we know what we're doing now...

Phew. Not so bad, but having a list helps immensely.

Thomas and I met with Lenore this morning and decided generally which confidence scoring methods to use. We're going to shoot for at least one method from each of the following categories: literature, semantic similarity, and experimental method-based. Yet another common method of calculating confidence scores uses the graph topology itself, but we want to steer away from that since DSD is founded on graph topology. Beyond that, we started reasoning out which methods would be best within each category. The MINT database is the lowest-hanging fruit, since it already contains confidence scores within it. Thomas is going to extract those and see whether they're distributed properly. Other methods we'll check out include MIScore, GOSemSim, a publication counting approach, CAPPIC, and a few others.

I took the afternoon to go find a yarn shop, and ended up trekking down to the Cambridge area to a little place called Gather Here. It's an adorable combined yarn and fabric shop (my other weakness) with comfy couches and sewing machines/workspace available for rent. I took my time picking out some gorgeous handspun yarn, purchased it, and then wound it up using the hand winder in the back. I figured I'd hang out in Cambridge for a while, so I found a comfy bench near some blooming peonies and spent about two hours knitting, people watching, and dreaming up the best way to write a Python script to generate 100 networks based on edge probabilities. :) On the way back, I stumbled upon a bookstore near Central Square and did some public transit exploration.

Thursday, June 6, 2013

I spent another day trying to coerce these confidence scores out of the woodwork. It turns out that the MINT database, while great on its own, has only a fairly small subset of the proteins and protein interactions found in BioGRID (our main source of protein interaction data). This makes it tough to compare our results to the original dataset used for the DSD paper, although this should theoretically be alleviated by the comparison to the original, unprocessed MINT dataset. At the very least, it will allow us to play around with the confidences and see if adding them actually changes anything. We'll keep finding other measures (for example, the literature-based methods should be decent given that most of the information we need is guaranteed to be within the files we have). Ideally, we will observe an improvement in the algorithm's performance... but we'll see!

Friday, June 7, 2013

We're making good progress towards the first completed dataset. The random networks were generated based on the MINT scores, and these networks were run overnight through DSD. I wrote a few more Python scripts to process all this data, including matching up the edges and taking the average across all 100 networks. This is probably the fifth (or fourth... or seventh?) little Python script I've written this week, and I can tell I'm getting better at working with the language. Fun stuff!
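For the curious, the sampling-and-averaging step looks roughly like the sketch below. This is a simplified reconstruction, not the actual scripts: it assumes each edge is kept independently with probability equal to its confidence score, that a pair's average is taken over the networks in which the pair appears, and that (A, B) and (B, A) refer to the same edge. The protein names, scores, and function names are made up for illustration, and the real DSD run is replaced by a stand-in value.

```python
import random
from collections import defaultdict


def sample_network(edge_conf, rng):
    """Keep each edge independently with probability equal to its confidence."""
    return [edge for edge, p in edge_conf.items() if rng.random() < p]


def average_over_networks(per_network_values):
    """Average each pair's value over the networks in which it appears."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for values in per_network_values:
        for (a, b), x in values.items():
            key = tuple(sorted((a, b)))  # match (a, b) with (b, a)
            totals[key] += x
            counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical confidence scores for three interactions.
edge_conf = {("P1", "P2"): 0.9, ("P2", "P3"): 0.4, ("P1", "P3"): 0.7}

rng = random.Random(0)  # fixed seed so the sampled networks are repeatable
networks = [sample_network(edge_conf, rng) for _ in range(100)]

# Stand-in for the DSD output: give every sampled edge the value 1.0.
results = [{edge: 1.0 for edge in net} for net in networks]
averages = average_over_networks(results)
```

Sorting each pair before using it as a dictionary key is what takes care of "matching up the edges" when different output files list the same interaction in different orders.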

Saturday, June 8, 2013

Saturday, I went to the science museum with Mike and Michelle! We met in Davis Square at the College Ave. entrance to the red line... and were very thankful Lenore suggested specifying an entrance (there are four). We had a great time - it's so much more fun to go to the science museum with people who are deeply comfortable with their love of science. The music-playing Tesla coils during the "lightning show" were particularly fun to see. It was super busy in the museum overall, but still a good time. Some other favorites included the knitted brain (of course), the Mathematica exhibit (rife with math jokes and puzzles), the natural science exhibit (there was a petrified log that had turned into a geode with bright blue crystals inside), and the Pixar animation exhibit.

After our science museum adventures, we walked through Charles St. and through the Commons to eat lunch at a pizza place by Park Station. It was a gorgeous sunny day, perfect for eating pizza in the park after a long day of walking around. Afterwards, I took advantage of being downtown already and ran some errands. I planned to go check out student rates and advance purchase rates at some sightseeing places for when my family comes to visit, but by then I was pretty tired so I only made it to Macy's and Marshalls before heading home.

Sunday, June 9, 2013

On Sunday, I spent the day relaxing, taking care of stuff from home, and playing computer games. Everybody needs a day like that once in a while! I also started up yoga again after at least a year. (Probably chose a routine that was a little much after not doing it for so long.) It felt great but I could tell I was going to be sore the next day...