Concluding Comments and Final Report
Posted by Bryn Reinstadler on August 20, 2012 at 4:40pm
Well, I cannot believe this summer; it's been crazy. But finally, here's my final report on the project. It was a super great time.
Week Ten - The End
Posted by Bryn Reinstadler on August 2, 2012 at 3:35pm
Oh, the final (but one) blog post.
It's been a good week. I've spent a lot of it frantically finishing up experiments and reports, but the bulk of the work that we have been doing is finished. Basically we have three things left:
- Make a histogram of Smotif frequencies in the general population of proteins
- Break apart one full protein and make pretty pictures of the Smotifs, as well as find matching Smotifs amongst the library of Smotifs we have on hand.
- Break apart 3 files (of varying difficulty) and discover how meaningful a scoring function based on Smotif frequency and distribution would be.
So far we're pretty close to finished with all of those. The histogram is done, the pretty-picture thing is in progress (four more smotifs to extract and color all pretty-like using JMol!) and the scoring function, well...it's almost done (the data collection, the proof of concept part) but it isn't showing happy results. I don't think it'll be a super accurate way to grade protein design, basically, which is too bad. Maybe we'll try taking some other samples from the population of CASP structures.
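For the curious, the histogram step boils down to counting how often each Smotif type shows up. Here is a minimal sketch in Java, not our actual code: the class name `SmotifHistogram` and the two-letter type labels (like "HE" for helix-then-strand) are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: tallies Smotif type labels and draws a text histogram.
class SmotifHistogram {

    // Counts occurrences of each label; TreeMap keeps labels in sorted order.
    static Map<String, Integer> count(List<String> smotifTypes) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String type : smotifTypes) {
            counts.merge(type, 1, Integer::sum);
        }
        return counts;
    }

    // Renders one line per label, with one '#' per occurrence.
    static String render(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            sb.append(entry.getKey()).append(' ');
            for (int i = 0; i < entry.getValue(); i++) {
                sb.append('#');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Feeding it the labels from a whole library of Smotifs would give one bar per type, which is all a frequency histogram really is.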
Outside of work, I've been hanging out with my roommates and Katie a lot. It's been fun - I still need to pack though, as I'm moving out in 2 days! Eesh!
The final results will be in my final paper which will be posted in the ACTUAL final blog post, probably titled "Concluding Comments".
All the best, and I hope you've enjoyed the ride as much as I have!
Week Nine - Applications
Posted by Bryn Reinstadler on July 29, 2012 at 8:05pm
Last week we had our first successful run on a single dssp/pdb file pairing. This week we modified the parser so that it could take multiple files and parse their information into a single file. This might sound like a simple loop job, but it ended up being a bit more complicated than that: we needed to pull more information out of both the dssp and pdb files and thread it through the whole program (which at this point is about 9 discrete classes, not including what I call 'data structures classes', which are custom objects for things like Secondary Structures, Residues, and Atoms). By the end, though, we had a beautiful library of Smotifs from our selection of pdb/dssp file pairings. It was wonderful.
Jennifer and I met with our advisor, Amarda Shehu (her information is on the About Us page if you'd like to learn more about her or her fabulous work in Computational Biology). We talked about some exciting applications, but that's what I talk about in next week's blog...the final blog...Oh dear, so sad. We'll basically be using some files from CASP (a structure prediction competition) in order to provide proof of concept for a few ideas for using this library of Smotifs, such as scoring structures on difficulty of prediction and things like that.
In life-outside-of-research (wait, there's a life outside of research? Ooooh yeeeah...) I watched the Olympics with Katie Friday night and that was a lot of fun. We got ice cream and soda and made a party out of it! It was a great time to see all the countries' outfits, the swimmers, Lord Voldemort, and the Queen of England jumping out of a helicopter. Weirdly only one of those didn't happen, and it was the swimmers bit. Apparently swimmers and gymnasts don't walk because it's so strenuous to do the day before competition. Hm.
Anyway, so it's been a great 9 weeks and I can't believe I only have a few days left. I'll be sad to leave, but I'm also excited to see my family again! Not to mention the month that I get off of school and work after this is finished. I'm very ready to leave but there's a lot to do until then...Arrivederci!
Week Eight - One Successful Test Run
Posted by Bryn Reinstadler on July 22, 2012 at 1:35pm
Unfortunately, due to an unforeseen error, we did not finish on Friday. Right now we're working on fixing that.
So, the energy function project is/has been done for a while now. Basically the information that I garnered was all I needed to do, as I learned, and the woman with whom I was working has temporarily moved on to another project as well, as she just got one of her papers published (Congrats, Irina!). So, that project is defunct and I won't keep mentioning it in this blog.
The other project though! I've been putting in 10 - 12 hour days for a while now. (of my own free will, I promise). It's been a lot of fun but also a lot of work trying to finish this up. I finally got to the point where there were no more off-by-one errors in the syncing of residues to their secondary structure (and thank heavens for that because that was the most annoying error to fix), everything in fact was synced up, and the file printed. However, I noticed there was an error in the geometries and I asked Jennifer about it. It took a few days for Jennifer to be able to fix it because it was a complicated mathematical problem and we only today (on the first day of week 9) got it almost fully figured out.
I was really proud of myself earlier, because though the math was all checking out fine the program was chucking out numbers that were far too small...in fact, they were between 1 and pi, which I thought was odd. I soon realized that what we thought we had been printing in degrees we were actually printing in radians. It felt beyond successful to figure that out and fix it. Such an easy, elegant fix and so much trouble was immediately alleviated. Check your ranges and domains when working with trigonometry, kids!
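Here's roughly what that bug looks like in Java. This is a small sketch rather than our real geometry code, and the class and method names are invented: `Math.acos` returns radians, so the value has to go through `Math.toDegrees` before it gets printed as degrees.

```java
// Hypothetical helper illustrating the radians-versus-degrees pitfall.
class AngleUnits {

    // Angle between two 3D vectors, returned in DEGREES.
    static double angleDegrees(double[] u, double[] v) {
        double dot = u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
        double normU = Math.sqrt(u[0] * u[0] + u[1] * u[1] + u[2] * u[2]);
        double normV = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
        // Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
        double cos = Math.max(-1.0, Math.min(1.0, dot / (normU * normV)));
        // Math.acos returns radians in [0, pi]; forgetting this conversion
        // is exactly the kind of slip that yields suspiciously small numbers.
        return Math.toDegrees(Math.acos(cos));
    }
}
```

A quick sanity check is to feed in two perpendicular vectors: the answer should be about 90, not about 1.57.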
So, today, then, finally and officially, the parser prints out the parsing of a single pdb file. Mission. Accomplished. (Well, one of them)
In not-work-related news, my friend Katie (whom I know from high school) and I have been hanging out lately. We went out to see the big pond near the Mall (not the Reflecting Pool, that's down), and we've gone to see a couple of monuments this week. It was a lot of fun!
Week Seven - Debugging
Posted by Bryn Reinstadler on July 14, 2012 at 2:30pm
Week Seven's Progress was fantastic! We got so much done, at least on the main project.
The Side Project (Energy Functions): I have not been able to meet with Irina yet. I expect to do so next Wednesday, however.
The Main Project (SSMs): This project has been a lot of fun this week. I finally got to a place where the structure of the entire program fit together very well (after a couple of failed infrastructure tries, sigh). However, the infrastructure really came together this week and I think the whole program makes a lot of sense. It acts as a series of sieves that gets rid of extraneous information or uses existing information to inform new information (such as the x y z coordinates of atoms being used to calculate the principal moment of inertia, and from there only using the pmoi instead of the xyz coordinates). I have spent a fair chunk of this week also beginning to debug as we come closer and closer to the point where this program will make sense.
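To give a flavor of one of those sieves, here is a sketch of the first half of the moment-of-inertia step: building the inertia tensor from x y z coordinates. It is an illustration under assumptions, not our actual class. It treats every atom as having unit mass, and the name `InertiaTensor` is made up. The principal moments would then be the eigenvalues of this 3x3 matrix, which is the mathy part this sketch leaves out.

```java
// Hypothetical helper: inertia tensor of a set of atoms about their centroid.
class InertiaTensor {

    // coords[i] = {x, y, z} for atom i. Every atom is assumed to have unit mass.
    static double[][] of(double[][] coords) {
        if (coords.length == 0) {
            throw new IllegalArgumentException("need at least one atom");
        }
        // Centroid of the atoms.
        double cx = 0, cy = 0, cz = 0;
        for (double[] p : coords) {
            cx += p[0];
            cy += p[1];
            cz += p[2];
        }
        cx /= coords.length;
        cy /= coords.length;
        cz /= coords.length;

        double[][] t = new double[3][3];
        for (double[] p : coords) {
            double x = p[0] - cx, y = p[1] - cy, z = p[2] - cz;
            t[0][0] += y * y + z * z;
            t[1][1] += x * x + z * z;
            t[2][2] += x * x + y * y;
            t[0][1] -= x * y;
            t[0][2] -= x * z;
            t[1][2] -= y * z;
        }
        // The tensor is symmetric, so mirror the upper triangle.
        t[1][0] = t[0][1];
        t[2][0] = t[0][2];
        t[2][1] = t[1][2];
        return t;
    }
}
```

Once the tensor (and its eigenvalues) exist, the raw coordinates are no longer needed downstream, which is the "sieve" idea in a nutshell.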
Doing a run of the program right now doesn't yield anything that we can check, so realistically we're going to have to do a lot of mini-runs with specific printlns to make sure that each step of the long process is done correctly. This should take up a lot of next week, I'm expecting. This is in addition to the fact that while my part is finished, my partner's more mathy coding (which I could not hope to do!) has not quite caught up with the rest of it and so the extensive testing on that will have to hold on until it is completely finished.
I feel very satisfied with the amount of work that was done this week and feel confident in our ability to get the parser finished by Friday of next week, in good running shape and able to parse any single pdb/dssp file that comes its way. Great week, lots of coding (which is what I like!)
This week I again hung out a lot with Katie and we went out to eat. It was fun. I've also hung out a bit with my new roommates, who all seem pretty cool.
Week Six - No More Energy Functions
Posted by Bryn Reinstadler on July 8, 2012 at 6:35pm
Week Six went surprisingly well.
Without the dragging force of the energy functions project, which has been put on an indeterminate hold, I now have a lot more time to work on the project with SSMs which has been steadily moving ahead.
I have been able to code a lot more this week and we finally got a good concept down for how the coding infrastructure would work. It took some reworking of the current ideas but all in all it sped us up a lot and will do us well in the future. Right now we are working on refining an array of residues into an array of secondary structures (which will become an array of Smotifs...the finished product!). We've already gone through the process of harvesting all of the information that we need from the dssp and pdb files and put those into an array of atoms (which informed the array of residues). Once we figured out a secure infrastructure the project really started going along. Below is a picture of our oh-so-fleshed out idea:
It has all of the data that we need and has some good ideas for where the various filters will need to go. The progress would be faster than it is but my partner, unfortunately, works full time and therefore cannot dedicate quite as much time as I can and sometimes I can't do the things that she can with her marvelous math-brain.
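The residues-to-secondary-structures refinement can be sketched like this. It's a simplified illustration, not our real classes: it assumes each residue has already been boiled down to a one-character DSSP-style code ('H' for helix, 'E' for strand, and '-' standing in here for everything else), and the names `SecondaryStructureGrouper` and `Segment` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collapses per-residue codes into runs of the same code.
class SecondaryStructureGrouper {

    // One run of consecutive residues sharing a code; indices are inclusive.
    static final class Segment {
        final char code;
        final int start;
        final int end;

        Segment(char code, int start, int end) {
            this.code = code;
            this.start = start;
            this.end = end;
        }
    }

    static List<Segment> group(char[] codes) {
        List<Segment> segments = new ArrayList<>();
        if (codes.length == 0) {
            return segments;
        }
        int start = 0;
        // Walk one past the end so the final run is closed too.
        for (int i = 1; i <= codes.length; i++) {
            if (i == codes.length || codes[i] != codes[start]) {
                segments.add(new Segment(codes[start], start, i - 1));
                start = i;
            }
        }
        return segments;
    }
}
```

For example, the codes "HHH--EE" would come out as three segments: a helix over residues 0 to 2, a connecting stretch over 3 to 4, and a strand over 5 to 6. Getting those inclusive start and end indices right is where off-by-one errors like to hide.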
But, all in all, I'm very happy with how this week went and I can't wait for the next 4. I can't believe I only have one month left here.
In a somewhat related topic, I expect next week to be a little slower of a workweek because I'm moving mid-week and that will probably take some energy. Hopefully I'll catch right back up though.
Happily, I've been hanging out a lot with Katie, going to see shows and museums. I've also been seeing a friend that I met at school named Steve. Overall it's been a good, productive, and friendly week. Until next I write!
Week Five - Progress
Posted by Bryn Reinstadler on June 29, 2012 at 5:20pm
Week Five was a good week as well.
Energy Functions Project: This project wrapped up for me around midweek; I finally finished compiling all of the information on the different energy functions that were used by the best docking strategies in the world. I'm not sure how many of them we will be able to use, however, because a lot of them use all-atom calculations where we just want backbone atom calculations. We will see. Though I was planning on meeting with my graduate student colleague this week, she had some unexpected business to take care of and we will be meeting next week sometime. Until then, I will be working diligently on...
Secondary Structure Motifs Project: This project we are still coding. We struggled a bit with github at the beginning of the week, but the payoff has been great as my partner and I have been able to work together without worrying about making changes simultaneously and having to merge documents by hand. We've been working out coding kinks all week. A large portion of the problem was figuring out exactly what type of data structures we would end up using, but we figured out a good overall structure midway through this week (Wednesday). From then we've been working fairly separately as we code and figure out what we need to do to make our separate parts work. She's been putting the code that she gets online (she's the math powerhouse of the equation) and I've been merging it with my code and figuring out more efficient algorithms which will be important later when we're ultimately making a library.
The coding was a bit more frustrating this week as progress began to slow down because the initial stage of making documents and basic state/accessor/mutator methods ended and the real work of coding and debugging began. However, it was ultimately a pretty good week and I feel that I got some solid work done.
Week Four - Coding!
Posted by Bryn Reinstadler on June 23, 2012 at 9:45pm
Week Four was really great (I just finished it I suppose, because it's Saturday). Because my mom and brothers weren't here like they were during Week Three, I got a whole ton of stuff done and I'm just very excited about the progress that we're making.
Energy Functions: In this particular project, I'm still drudging along as normal. I've started a large document on Google Docs called "Energy Function table" that basically lays out what the different energy function parameters are that are used in any of the papers, talks about which papers each is used in, how it's implemented, and the types/numbers of atoms it uses (all-atom versus backbone atoms only). So this is going a little bit slowly but I'm about halfway done, which is uber-exciting because I've been spending most of my time on...
Super Secondary Structure Motif Coding! Finally, coding! The fun part! So we started coding in earnest this week, making data structures, and designing software infrastructure. It's been a process because when you begin to code math you realize whether or not you understand math (haha) and when you begin to code anything you begin to realize that you know nothing. So we've been learning more and more but are still able to log some time at the keyboard, which is the best. We're compiling all of our research from the last three weeks into this cumulative effort, which will take a PDB file and a DSSP file and output information about the protein's ssms. This is a great deal of information to coordinate between the two files, unfortunately, which we're still working out the kinks of, but I have a lot of faith in where we're going right now. Our project can be found on GitHub under "BrynMarie" and "Bioinformatics-Project". Yeah!
Weeks like this must be why people do research!
Week Three - Hitting a Stall
Posted by Bryn Reinstadler on June 15, 2012 at 7:10pm
Week Three was a particularly non-productive week, I felt. A lot of the work was still left from Week 2, and it was still drudge work, but I felt less motivated to do it because I had been involved in it for so long.
Happily, there were some advances. For example, on project number one with secondary structure motifs, we moved from the preliminary stages of getting down the geometry of Smotifs and reading papers, onto the next stage of writing pseudocode. Unfortunately we hit a few blockages which meant we needed to do more research, but it was really cool to see the project moving ahead. We had some difficulties ascertaining whether or not the authors of the main paper we are working on reproducing (Secondary Structural Characteristics of Novel Protein Folds, Narcis Fernandez-Fuentes) used amino acids or not in their calculation of the geometry of Smotifs.
On project number 2, we also made some progress. While I was done hand-mining information on who the top ten were from the CAPRI site, I still had to find all of their important papers and compare them. That was a long, arduous task. While finding all of the papers only took a morning, and paring them down to the important papers based on their abstracts was the work of a day, the rest of the week was spent reading and analyzing the papers, further paring them down. Even papers that I thought would be useful based on their abstracts turned out to be non-useful for our purposes (they used all-atom energy structures instead of just utilizing backbone atoms, for example). At the same time I was working on the other project, which is why it seems both of these projects are going glacially.
On a more personal note, I've finally settled in to living in the DC area. Though I thought at first that I would not particularly enjoy being in a big city, and I would be scared to be out at night, it's actually just fine. DC is a fairly friendly city. Also kind of cool, one of my good friends from high school called me up this week and told me that her sister was in town, and maybe we could hang out together. Her sister's name is Katie.
Happier still, my mom and my little brothers came out this week (which OF COURSE did not contribute AT ALL to how difficult I found it to get things done this week!). It was nice to see them again, for the first time since Christmas, and to go sight-seeing with them as well. Until next week!
Week Two - Getting Going
Posted by Bryn Reinstadler on June 7, 2012 at 11:25am
So on Week 1, we left off with me and my partner about to meet with my professor. We did meet with the professor, and it was really good. We talked about more efficient ways to mine data from the pdb website. That weekend, my partner and I had downloaded all of the pdb files using rsync through the pdb website. It was a long process but through downloading them we were able to find patterns in how pdb files were set up (also using the pdb documentation).
Also that week, I began to learn about my role in the second of my two projects, refining our team's current energy function using information from others. This week was a long week in regards to this project as well, as it was mainly hand-mining data off of the CAPRI website. The CAPRI website lists those people, corporations, and projects that do the best in predicting native structure of given proteins. Therefore, the people/projects/corporations who do the best are generally considered to have the best technology and/or code at their disposal. So, we're looking through the code and implementation of the Top 10 from the CAPRI experiments in order to look for accurate energy functions.
Kind of a lonely week. I telecommute most of the time because I live fairly far away (about an hour and a half commute by public transit) and my mentor can only meet with me about once a week or so because she is so busy.
Week One - SSMs
Posted by Bryn Reinstadler on June 1, 2012 at 1:15pm
So far, I've been working most on the SSMs project because the other project is waiting on my colleague Irina's deadline to turn in a paper. So, here's how the SSMs project has been going.
We started off by working on a literature review of the pertinent information. I worked my way through the course modules of GMU's Computational Biology class in order to ready myself for the technical terms that were about to hit in the papers that were to come. I reviewed 3 technical papers and 2 class lecture slides. Of the technical papers, I had to write a full summary of one of them and turn it in to my professor. I was also concurrently working on reading a paper for my other project on energy functions. I also wrote a full summary of that paper and turned it in to my mentor.
After turning in the summary of SSMs, my partner and I went looking for trouble. Or rather, went looking for the 324 SSMs (Smotifs, as per paper terminology) that had been used in the paper that we researched. We emailed the authors and the authors were very warm towards us and offered us not only the information that they had on each type of Smotif, but also a sheet of the necessary mathematical equations to model them geometrically, as well as some of the important topics that they had not had time to cover. Those include adding an energy function to make sure that the two fit, and exploring the idea that the Smotifs follow what is called a Boltzmann distribution.
So now we're going to start jotting down implementation notes. We're meeting with Professor Shehu (my mentor) on Wednesday to further discuss implementation details and how we're going to jerry-rig -- er, delicately insert -- our code into their mainframe.
On perhaps a less work-oriented note, I've moved successfully into the place that I will be staying for 6 of the next 10 weeks. I'll be living with my uncle in Arlington, Virginia. Should be fun! The neighborhood I'm living in is super cute and has a lot of conveniences pretty nearby. I went grocery shopping for myself for the first time ever this week. I feel like an adult...almost...
The Research Project - The Beginning
Posted by Bryn Reinstadler on May 30, 2012 at 10:20pm
I officially started work on May 28th, 2012, Memorial Day. We had our orientation and were given literature to review in preparation for the next stage of our respective research projects.
I may as well introduce you to the crew:
*** Professor Shehu -- She's my mentor for the summer, in charge of my research projects.
*** Jennifer Van -- I'm working with her on one of my two projects. She's an undergraduate at George Mason University (GMU) but she already has one undergraduate degree in math. She's coming back a second time to get her degree in Computer Science.
*** Irina Hashmi -- I'm working with Irina on the second of my two projects. She's a Computer Science PhD student at GMU.
And now, for my projects.
My first project involves computationally creating realistic protein structures. The PhD lab here already has a working model of a program that produces realistic protein structures; however, my research collaborator Jennifer Van and I are adding another piece to help make the simulation more accurate. Specifically, this summer we will be coding (in Java) the simulator to recognize and deal appropriately with Super-secondary Structure Motifs, or SSMs. SSMs are combinations of alpha-helices and beta-sheets that are smaller than subunits or protein domains, yet can have a profound impact on the structure of the protein and can be used to help estimate the native structure of a protein.
My second project deals with protein-ligand docking. I will be helping the PhD student Irina work on an energy function to accurately model the structure of protein docking with various ligands. Energy functions are functions that take into account several chemical and physical elements of the surroundings of a protein and its ligand and then use those elements to predict how much free energy would be in that system. The lower the number, the better the relative fit of the ligand and the protein. Because of the computational difficulties involved in estimating the value of forces such as van der Waals, hydrogen bonds, ionic interactions, polar interactions, and so forth, the energy function needs to be as precise as possible in order to yield accurate results. I will be helping with coming up with an energy function appropriate for the simulations that we will be running.
All in all, I'm pretty excited, but a little nervous. Three days ago I couldn't've told you what a super-secondary structure motif was, and by today I have read so much about them I feel ready to give a lecture on what they are and why they're important. As for energy functions, I have a fairly firm basis on where I'm supposed to start, but I also have this feeling I'm going to have to work through some serious amounts of chemistry before I'm fully ready. Either way, it's been an exciting first three days and I'm more excited than an electron being hit by a photon to continue onwards. (Learn to expect the chemistry jokes. This is how I learn.)