Weekly Research Journal

Weeks: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Week 1: May 14-18

I spent my first week in College Station at Texas A&M. I started reading several journal papers about bioinformatics (mainly on protein folding and ligand docking). Some of the papers were from Dr. Amato's group at A&M and some were from Dr. Kavraki's group at Rice. From these, I got a good feel for the motivation behind the biology applications as well as their potential impact in the field of medicine and pharmaceutical drug design.

We also madly worked on our presentations for ICRA. (ICRA is the International Conference on Robotics and Automation. It was held in Korea this year!) After countless hours of practice, capturing movie clips, and reinstalling windows/office (don't ask...) we were presentation ready! It was really exciting, but I was also really nervous.


Week 2: June 4-8

After a glamorous 10 days in foreign lands (my first time out of the country), I began my first week at Rice. Luckily, I had met my mentor, Dr. Kavraki, and one of her students at the conference, so I had a pretty good chance of seeing a familiar face.

I trudged through Houston rush-hour traffic Monday morning (not a pretty sight) and arrived at Rice. The campus is beautiful! I love all the trees. Of course I'm partial to Houston -- I was born and raised here. I had been to Rice before in high school for a cross-country meet, but I didn't get a chance to see the campus. Let me just say I love it. It's a little smaller than I'm used to (coming from Texas A&M) but it's nice for a change.

I made it to Duncan Hall pretty easily, but once inside (I must confess) I was lost. First of all, I couldn't remember her office number and I couldn't find any signs for the Computer Science Department. I vaguely remembered something about being on the third floor, so I thought I would give it a try. I wasn't wandering the halls too long before Dr. Kavraki found me. She showed me where everything was and helped me set up a computer account, student id, etc.

This week was mainly spent getting familiar with the campus, getting all the bookkeeping squared away, and reading, reading, reading. (I have very little background in bioinformatics, so I had a lot of ground to cover.) I read a book on rational drug design (drug design with computer assistance) and several different journal papers. It was all very interesting, and I learned a lot.

We had our (I really should say my) first group meeting this week. We had cookies -- if that won't entice you, I don't know what will. At the meeting I introduced myself and met everyone in the Physical Computing group. It was really neat to hear a little about what everyone is doing. Everyone is really nice and helpful. Every week we will meet, and someone will present a paper or their research.

All in all, it looks to be a great summer!


Week 3: June 11-15

I finally got a computer account!!! That was probably the highlight of my week! I can't tell you how excited I was about it.

With my new computer account I began to get familiar with different software tools for displaying molecules (mainly proteins in my case). I started with RasMol, a free visualization tool written by a guy in the UK. It was pretty easy to install and use. (To try it out yourself, go to their website. If you have the right version of windows/netscape, you can try it without installing anything through the web-based version, Protein Explorer.)

After experimenting with RasMol, I tried Sybyl, another visualization tool but of the more expensive variety. It is pretty powerful, but I think it has more bells and whistles than I'm gonna need. I spent a lot of time going through the manual and the tutorials. They were very helpful, and I at least know enough about it to use it this summer.

We had our weekly group meeting on Wednesday afternoon. Andrew talked about his research -- Motion Planning and Simulation of Deformable Objects. It was very cool, but some of the math was a little over my head. I can't wait to see a simulation. I should ask him about that...

The second half of the week was spent understanding how Ming's code works. Ming wrote a program that, given a molecule and some torsional angles, computes the new xyz coordinates for each atom in the molecule. He has four ways to calculate the new coordinates, each one faster than the one before. I started with the first way (Simple_rotations) because it is the simplest. Paul and I changed the code to read .mol2 files instead of .pdb files. Not only are .mol2 files easier to read/create and viewable with RasMol, they explicitly list the bond information. This is great because now we can read in the bonds instead of calculating them. We also tweaked the code to generate a random conformation (i.e. random torsional angles) instead of inputting them. We did this for two reasons. One, we don't have to worry about an input file, what the bond ids are, etc. Two, Paul needs to generate random conformations for his research on ligand docking.
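To give a rough idea of what a program like Simple_rotations does (this is just my own toy sketch, not Ming's code, and the function names are made up): changing a torsional angle amounts to spinning all the atoms on one side of a bond about the bond's axis, which can be done with Rodrigues' rotation formula. A random conformation then just means picking a random angle for every rotatable bond.

```python
import numpy as np

def rotate_about_bond(coords, axis_a, axis_b, moving_atoms, angle):
    """Rotate the atoms in `moving_atoms` by `angle` (radians) about the
    bond from atom index `axis_a` to atom index `axis_b`.
    `coords` is an (n, 3) array of xyz positions; a new array is returned."""
    new_coords = coords.copy()
    origin = coords[axis_a]
    axis = coords[axis_b] - origin
    axis = axis / np.linalg.norm(axis)            # unit vector along the bond
    cos_t, sin_t = np.cos(angle), np.sin(angle)
    for i in moving_atoms:
        v = coords[i] - origin                    # position relative to the bond origin
        # Rodrigues' rotation formula
        rotated = (v * cos_t
                   + np.cross(axis, v) * sin_t
                   + axis * np.dot(axis, v) * (1.0 - cos_t))
        new_coords[i] = origin + rotated
    return new_coords

def random_conformation(coords, rotatable_bonds):
    """Apply a random torsion to each rotatable bond.
    `rotatable_bonds` lists (a, b, atoms_downstream_of_b) triples."""
    for a, b, downstream in rotatable_bonds:
        coords = rotate_about_bond(coords, a, b, downstream,
                                   np.random.uniform(0.0, 2.0 * np.pi))
    return coords
```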


Week 4: June 18-22

This was a pretty productive week. I finished fiddling with Simple_rotations. I updated Group_frames (the most complicated way -- but the fastest) so it can input/output .mol2 files, derive atom groups from the molecule, produce a random conformation, and calculate the CORRECT xyz coordinates. It works!
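Since the code now reads .mol2 files, here is roughly what the parsing looks like (a simplified sketch of my own, not the actual code): atoms and bonds each sit in their own clearly marked section, which is what makes the format so convenient.

```python
def read_mol2(path):
    """Read atom coordinates and bonds from a Tripos .mol2 file.
    Returns (atoms, bonds) where atoms is a list of (name, x, y, z, type)
    and bonds is a list of (atom_id_1, atom_id_2) pairs (1-based ids)."""
    atoms, bonds, section = [], [], None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith("@<TRIPOS>"):
                section = line              # e.g. @<TRIPOS>ATOM, @<TRIPOS>BOND
                continue
            if not line:
                continue
            if section == "@<TRIPOS>ATOM":
                # id name x y z atom_type [subst_id subst_name charge]
                fields = line.split()
                atoms.append((fields[1],
                              float(fields[2]), float(fields[3]), float(fields[4]),
                              fields[5]))
            elif section == "@<TRIPOS>BOND":
                # id origin_atom_id target_atom_id bond_type
                fields = line.split()
                bonds.append((int(fields[1]), int(fields[2])))
    return atoms, bonds
```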

This time Paul presented a paper at the group meeting. We gave him a hard time because he almost forgot the cookies. (The rule is the presenter has to provide the cookies.) The paper was on Quasi-Random PRM (QPRM). PRM is an algorithm to solve the motion planning problem: given a robot and an environment, compute a collision-free path between the start and the goal. The basic idea behind PRM is to build a roadmap for the robot so it can navigate safely around its environment. The roadmap is simply a graph where the nodes are collision-free points and the edges are collision-free paths between them.

PRM uses random samples, but QPRM uses quasi-random samples. The idea is that quasi-random samples are more uniformly distributed than random samples, so the algorithm will perform better. (This paper was presented at the ICRA conference this year.) The paper didn't discuss a few points, though. First, how expensive are these quasi-random points to generate? How do they compare with random points? Second, some variations of PRM (like OBPRM, MAPRM, RRT) have different heuristics to enhance the roadmap after it is built. How does QPRM compare with these?
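To make the PRM idea concrete, here is a bare-bones sketch of roadmap construction in a 2-D configuration space (my own toy version with made-up names, not the paper's code). Swapping the sampler from pseudorandom points to a Halton sequence is, roughly, the only change QPRM makes to this picture.

```python
import math
import random

def halton(index, base):
    """The index-th element of the 1-D Halton (quasi-random) sequence."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def build_roadmap(is_free, n_samples, radius, quasi_random=False):
    """Build a PRM roadmap in the unit square.
    `is_free(p)` says whether a point is collision-free (assumed given).
    Returns (nodes, edges); edges connect nearby nodes whose straight-line
    path is collision-free (checked at a few interpolated points)."""
    nodes, i = [], 1
    while len(nodes) < n_samples:
        if quasi_random:
            p = (halton(i, 2), halton(i, 3))    # quasi-random: Halton bases 2 and 3
        else:
            p = (random.random(), random.random())
        i += 1
        if is_free(p):
            nodes.append(p)
    edges = []
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            close = math.dist(nodes[a], nodes[b]) < radius
            if close and all(
                is_free(tuple(nodes[a][k] + t * (nodes[b][k] - nodes[a][k])
                              for k in range(2)))
                for t in (0.25, 0.5, 0.75)):
                edges.append((a, b))
    return nodes, edges
```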


Week 5: June 25-29

Last week Paul was working on the energy calculations. He got his stuff working, and we merged the two versions at the beginning of this week. Now the Group_frames method can generate a random conformation, compute the xyz coordinates, and calculate the van der Waals energy of the molecule. "One small step for man, one giant leap for mankind!"
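For anyone wondering what a van der Waals energy actually looks like: it is usually modeled with a Lennard-Jones 12-6 potential summed over the non-bonded atom pairs. Here is a stripped-down sketch (my own illustration, not Paul's code; in practice the epsilon and sigma parameters come from a force field and depend on the atom types).

```python
import math

def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones potential for two atoms at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

def vdw_energy(coords, pairs, epsilon=0.2, sigma=3.4):
    """Sum the Lennard-Jones energy over the given non-bonded atom pairs.
    `coords` maps atom index -> (x, y, z); a single epsilon/sigma keeps it simple."""
    total = 0.0
    for i, j in pairs:
        r = math.dist(coords[i], coords[j])
        total += lennard_jones(r, epsilon, sigma)
    return total
```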

Brian talked about his new research project on Wednesday. His project is on 3D database searching -- using geometric hashing to enhance results from the evolutionary trace. Pretty self-explanatory, don't you think? Well, for those of you (like me) who have no clue what half the words mean, let me (try to) explain. The goal of Brian's research is to classify proteins by function, something that hasn't been done but is desperately needed. A protein's function is determined by the protein's shape, not directly by its sequence (the order of its amino acids).

So where does geometric hashing come in, and what in the world is the "evolutionary trace" anyway? For starters, geometric hashing is an algorithm that identifies objects in a scene based on object features (points of interest like vertices, etc.). Geometric hashing can identify objects regardless of how they are translated, rotated, or scaled. Pretty neat, huh? Oh, about that trace thing. Well, scientists have discovered that some proteins in different animals have very similar characteristics (or similar parts of the protein sequence). They believe that since these same characteristics appear across different animals and the proteins perform the same function, these characteristics must determine the protein's function. Brian uses the evolutionary trace to identify "interesting" parts of the protein. He then uses geometric hashing to search a 3D database of proteins looking for ones that have the same "interesting" parts.
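Here is a much-simplified, 2-D sketch of the geometric hashing idea (my own toy version with made-up names, not Brian's code, and it glosses over a lot): describe every point relative to a pair of "basis" points, so the description stays the same when the whole object is translated, rotated, or scaled, and store those descriptions in a hash table you can vote against later.

```python
from collections import defaultdict

def frame_coords(p1, p2, q):
    """Coordinates of q in the frame where p1 is the origin and p2 is (1, 0).
    Invariant under translation, rotation, and uniform scaling."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    qx, qy = q[0] - p1[0], q[1] - p1[1]
    d2 = dx * dx + dy * dy
    return ((qx * dx + qy * dy) / d2, (-qx * dy + qy * dx) / d2)

def build_hash_table(models, grid=0.1):
    """models: {name: [(x, y), ...]}.  For every ordered basis pair in every
    model, store the quantized frame coordinates of the remaining points."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, p1 in enumerate(pts):
            for j, p2 in enumerate(pts):
                if i == j:
                    continue
                for k, q in enumerate(pts):
                    if k in (i, j):
                        continue
                    u, v = frame_coords(p1, p2, q)
                    table[(round(u / grid), round(v / grid))].append((name, i, j))
    return table

def vote(table, scene_pts, basis, grid=0.1):
    """Pick one basis pair from the scene and vote for matching (model, basis) entries."""
    votes = defaultdict(int)
    p1, p2 = scene_pts[basis[0]], scene_pts[basis[1]]
    for k, q in enumerate(scene_pts):
        if k in basis:
            continue
        u, v = frame_coords(p1, p2, q)
        for entry in table.get((round(u / grid), round(v / grid)), []):
            votes[entry] += 1
    return votes
```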

The last half of the week was spent investigating protein-to-protein interactions. After searching Medline (a medical database) and google, I found a few examples of protein pairs. Some of the proteins were not in the PDB and are of no use to our research. (PDB is a database of various protein structures. We would like to study proteins that have known structures.) All in all, it was pretty frustrating, and I don't have a lot to show for my efforts. The main problem is that I don't have a good enough biology background to search effectively. (I couldn't understand half the words in most of the articles I read!)


Week 6: July 2-6

Monday, some guys from Baylor College of Medicine came to visit. Dr. Kavraki is beginning a new research project with them. Brian gave a presentation on this project last week. We will have meetings for the next three weeks to discuss the details of the project -- Monday was more of an overview.

I attended my first pizza talk on Monday. From what I understand, every couple of weeks someone talks about their research while everyone eats pizza. (Free pizza -- I'm there!) This time the talk was about some research in protein-protein interactions. The speaker used machine learning algorithms to help divide pairs of proteins into two groups: those that interact with each other and those that don't. I liked the idea of using machine learning algorithms, but I am not sure about his implementation. He uses the sequences of the proteins as input to the machine learning algorithms instead of their 3-D structures. Most biologists (I think) believe that the 3-D structure is what determines how proteins interact, not the sequence. Yes, the sequence helps determine the structure, but when studying interactions, it's the structure that counts. I think he targeted sequence instead of structure because we know more sequences than structures, and structures may be difficult to use as inputs to learning algorithms. I'm not exactly sure how you would do it...

Tuesday we had an unusually long group meeting. Brian presented a paper that is related to the new research project. It presents an algorithm to find matches among different constellations, or groups of points. For some reason the algorithm was confusing, so the meeting lasted quite a while. Honestly, most of it was over my head. I think I got the general idea, but some of the details went in one ear and out the other!

When I wasn't attending meetings and pizza talks, I was working on the protein-protein investigation I began last week. Still with little success. I did find a few examples, but they weren't what Dr. Kavraki was looking for. She suggested that I read some chapters out of a biochemistry textbook. I think it will help me search more effectively. She also gave me a few pointers on the kinds of interactions she is looking for. I think I can find some good examples after I read the textbook.


Week 7: July 9-13

So far, I'm still reading the biochemistry textbook. I've read several chapters on proteins.

It's going pretty slow because I have to sift through a lot of chemistry terminology, but I think I'm learning a lot, which is the point.

The second half of the week went much better than the first. I am finally making some progress in this "investigation." I have targeted several different examples of protein-protein interactions. You can check them out here.

I discussed my findings with Dr. Kavraki on Friday. We decided to focus on calmodulin. It is a well-studied protein that binds calcium and triggers other reactions. The next step is to design an algorithm to compute a realistic path between calmodulin's bound and unbound conformations.


Week 8: July 16-20

I spent the beginning of the week working on an algorithm to efficiently compute paths between start and goal conformations of a protein. (We are focusing on calmodulin specifically.) It wasn't easy, and it isn't clear how well it will work, but we are going to implement a variation of RRT (Rapidly-Exploring Random Trees) and a similar method developed by David Hsu, Jean-Claude Latombe, and Rajeev Motwani. You can check out the paper here under "Expansive Spaces".

I met with Dr. Kavraki and we discussed the proposed algorithm. With a few minor adjustments, we are go for launch. Since there are only a few weeks left in the summer, we decided to focus on implementing the pieces of the algorithm. Then, at a later date, I can put the pieces together in a framework that I have from the robotics group at A&M and test the code. The pieces are: the van der Waals (VDW) energy calculation, an energy minimization step for a conformation, and a way to generate a random conformation in the neighborhood of a known one. There is a rough sketch below of how they might fit together.
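Just to picture how the pieces might fit together, here is a toy sketch of a tree-based planner in conformation space, in the spirit of RRT and the expansive-spaces method (my own simplification with made-up names, not the final implementation): repeatedly pick a node, perturb it into a nearby conformation, and keep it if its energy is acceptable, until the tree gets close to the goal.

```python
import random

def grow_tree(start, goal, perturb, energy, dist, max_energy, goal_dist, n_iters=10000):
    """Grow a tree of conformations from `start` toward `goal`.
    Conformations are tuples (e.g. of torsion angles) so they can be dict keys.
    `perturb(c)` returns a random conformation near c, `energy(c)` scores it,
    and `dist(a, b)` measures how far apart two conformations are.
    Returns the tree as a child -> parent map, or None if the goal isn't reached."""
    parent = {start: None}
    for _ in range(n_iters):
        node = random.choice(list(parent))   # pick an existing node to expand
        new = perturb(node)                  # nearby random conformation
        if energy(new) > max_energy:         # reject high-energy (unrealistic) states
            continue
        parent[new] = node
        if dist(new, goal) < goal_dist:      # close enough: hook up the goal
            parent[goal] = new
            return parent
    return None
```

Following the parent pointers back from the goal would then give the path of conformations.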

I spent the rest of the week working on the VDW energy calculations. Paul already has a working implementation, so I am tweaking his for my purposes. I have even tailored it to make the energy calculations more efficient when parts of the protein are known to behave rigidly.


Week 9: July 23-27

This week, I was able to finish up the VDW energy calculations. I also implemented a cutoff distance. If two atoms are far enough apart, the energy resulting from their interaction can be neglected. I set the cutoff distance to 8 angstroms. It speeds up the calculations a bit, but they are still slow.
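The cutoff itself is just a distance check before the pair energy is computed. A small sketch (made-up names, continuing the Lennard-Jones illustration from week 5): comparing squared distances avoids taking a square root for every pair.

```python
CUTOFF = 8.0                       # angstroms
CUTOFF_SQ = CUTOFF * CUTOFF

def pair_energy_with_cutoff(a, b, epsilon, sigma):
    """Lennard-Jones energy of one atom pair, neglected beyond the cutoff.
    a and b are (x, y, z) tuples; squared distances avoid a sqrt per pair."""
    r_sq = sum((a[k] - b[k]) ** 2 for k in range(3))
    if r_sq > CUTOFF_SQ:
        return 0.0                 # far enough apart to neglect
    sr6 = (sigma * sigma / r_sq) ** 3   # (sigma/r)^6 computed from squared terms
    return 4.0 * epsilon * (sr6 * sr6 - sr6)
```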

I have also implemented the negative gradient descent to minimize a conformation's energy. It is implemented as outlined in the previous week with one minor change. Instead of computing a bunch of random conformations at the same time (which would be very useful on a parallel machine), I compute them one at a time. When I find one with lower energy than the original, I replace the original conformation with it. This requires less storage overhead and allows a deeper minimization.

I have also implemented the last task: generating a random conformation in the neighborhood of a known conformation. To do this, several bonds (not all of them) are selected at random. Then a slight random angle displacement is applied to the selected bonds. This generates a new conformation in a random way that is still very similar to the original one. I have also incorporated this into the energy minimization procedure.
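Putting those last two pieces together, this is roughly how the perturb-and-accept loop behaves (my own simplified sketch with made-up names, where a conformation is just the list of torsion angles): nudge a few bonds, keep the result if the energy drops, and repeat.

```python
import random

def perturb(torsions, n_bonds=3, max_delta=0.1):
    """Randomly pick a few bonds and nudge their torsion angles slightly,
    giving a new conformation close to the original."""
    new = list(torsions)
    for i in random.sample(range(len(torsions)), n_bonds):
        new[i] += random.uniform(-max_delta, max_delta)
    return new

def minimize(torsions, energy, n_tries=1000):
    """Greedy minimization: generate neighbors one at a time and keep any
    that lower the energy (no need to store a whole batch of candidates)."""
    best, best_e = torsions, energy(torsions)
    for _ in range(n_tries):
        cand = perturb(best)
        cand_e = energy(cand)
        if cand_e < best_e:
            best, best_e = cand, cand_e
    return best
```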


Week 10: July 30-August 3

This last week was mostly spent tying up loose ends. Besides cleaning up the code, etc., I mainly worked on the final report. It has been good to put all the ideas/work down on paper. The summer has been great, and I've really learned a lot about bioinformatics and research in general. For more info on what I did this summer, where the research stands, etc., check out the final report.

I would like to give a big THANK YOU to all the people that helped me out this summer. Thanks to: Dr. Kavraki, everyone in the Physical Computing group at Rice, and all the nice people that showed me where to go when I looked lost.



Last updated: 8/03/01