The main drive of this project, headed by Dr. Amato, is to use established OBPRM (Obstacle Based Probabilistic RoadMap) techniques to study protein folding.
OBPRM is an extention of Probabilistic RoadMap planning as developed in the early 90s by a number of robotics researchers. The method is simple and fairly easy to implement.
1. Learning Phase: During the learning phase, a roadmap is constructed from the free C-space (the configuration space not lying in an obstacle). To ensure a "good" distribution of nodes, the learning phase consists of two distinct parts. First, a predetermined number of nodes representing free C-space are randomly generated. These nodes are then joined by collision-free edges. If it is determined that there are more "difficult" areas of the C-space (such as folding around obstacles or narrow passages), a second step is initiated, which creates more nodes within a certain radius of the difficult spots (this is where the "probabilistic" bit comes in). Similarly, these new nodes are connected to the graph.
2. Query Phase: During the query phase, the origin and goal are connected to the roadmap (usually by the shortest obstacle-free course). Then a simple query is done on the roadmap to determine the best path of motion. If the query fails, either the origin and goal are not connectable, or more learning needs to be done.
Applictations to Biochemsitry
Thus far, most applications of OBPRM (and PRM in general) remain in the realm of robotics. However, Dr. Amato's work has shown that it may also be very effective in studying protein folding.
Traditionally, most studies in protein folding involve predicting the native fold (final configuration) of the amino acid chain. Using rules of chemistry, such as hydrogen bonding, Van Der Wahls interaction, and hydrophobicity, a general idea of how a chain of amino acids interacts with itself may be formed. However, this process is extremely complex, and is often indefinite. For instance, many studies involving snapshots of a protein undergoing secondary and tertiary structure changes indicate that a protein may take several different paths to its native fold. The hope is that by using OBPRM methods, we may be able to better understand how specific proteins fold, and then generalize our results.
Our current method is to model a protein as a many dof (degree of freedom -- two per joint) robot, with the C-space representing the configurations the protein might have. Because we are not interested in movement within a space, but rather contortion around the joints, we do not need to consider foreign obstacles. Rather, we need only look at self-collisions.
In addition, we calculate energies of possible configurations and are thus able to determine the most energetically favored path the protein may follow. In fact, the hope is to set a maximum energy level and therefore discard many of the configurations generated by our algorithm.
Much work on this project has already been done, by Dr. Amato and her student Guang Song. They have written a program that generates and analyzes roadmaps. However, even for short amino acid strands, many computations must be made, so the program is very computationally intense. Thus far, the average learning time is 15 hours, and query takes a half hour. One aspect then of the current problem is to parallelize the algorithm, so that bigger proteins may be considered.
As of June 25th, we have parallelized the code (i.e. it compiles and runs in parallel). However, we are still testing it to see if it actually generates the expected results. Also, there are some interesting problems, with regard to the time it takes to run. For instance, with running the program on two processors will speed the program up almost two fold. For four processors, it is almost four fold better. However, for eight processors, the amount of time it takes to execute is longer than for four, though when we go for twelve, the time drops again. We well need to investigate this further.
Another aspect of the project is to automate interpretation of the program's results. So far, it is only able to produce graphs, which are then analyzed by humans. Our hope is to be able to return not only graphs indicating energetically favored paths, but also to generate answers to questions often posed by biochemists. One question we are looking at specifically is whether secondary structures form before tertiary structures. Usually, they are considered to have a definite order, due to energy potentials contained in each configuration. But much work in biochemistry suggests that it is not necessary for secondary structures to be formed before tertiary structures. Some of our current results are consistent with this theory. We would like therefore to be able to use our results to calculate the probability of different orders of formation.
We are currently working on a number of functions that will enable us to study secondary structure formation order better. We believe this will help us verify our results with those provided from the biochem community.
Documents and Presentations