Liz's DREU 2018
Week 10
August 2, 2018
This is my final week working with the Kavraki Lab at Rice! I did a lot to wrap up my work this week. I finished my last steric clash diagnostic for the DINC program. We started a final experiment on a new supercomputer, Stampede2. Stampede2 allows us to use up to 68 docking tasks as a parameter. Our goal was to push the docking task curve out far enough to reach a plateau for both the LEADS and the Renard datasets. This week will finish off with a poster session with CPRIT! The poster session includes students from around Rice and the medical complex who have been working in summer research internships similar to mine. Yesterday I gave a practice presentation during the group lab meeting. My last remaining task is to make sure the data I have gathered from the experiments I ran on the supercomputer is organized and clear. I will be leaving Houston and heading home this weekend. It has been a wonderful experience!
Week 9
July 28, 2018
This week I ran experiments on the LEADS dataset to evaluate the fragment size parameter in DINC. This was a frustrating experience: the dataset contains many complexes, and because our allotted computation hours on the supercomputer are running out, I had to run the dataset in small pieces. This doesn’t affect the results or the methodology of the experiment. It just makes for a whole lot more work managing the resulting data. The analysis of this data revealed trends similar to those in the previous dataset: primarily, that there is no single best fragment size that works for all or even most protein-ligand complexes. In future experiments, the goal will be to explore whether setting the fragment size as some ratio of the overall ligand size yields good results. Another approach may be to determine a battery of different parameter settings which, when applied across a dataset, will be able to produce a good result for every complex in the dataset. In addition, this week I have been working on another diagnostic for the DINC program for steric clashes. This one determines whether there are any lone heavy atoms in the molecular file, that is, heavy atoms that are not connected to any other heavy atoms.
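To give a flavor of what a lone-heavy-atom check involves, here is a minimal sketch. It is not the DINC code itself: the input format (a list of element symbols plus a list of bonded index pairs) and the function name find_lone_heavy_atoms are assumptions made purely for illustration, and a real diagnostic would first have to parse this information out of the molecular file.

```python
def find_lone_heavy_atoms(elements, bonds):
    """Return indices of heavy atoms with no bond to another heavy atom.

    elements: list of element symbols, one per atom (hypothetical input format)
    bonds: list of (i, j) index pairs into elements
    """
    # Treat every atom that is not hydrogen as a heavy atom.
    heavy = {i for i, el in enumerate(elements) if el.upper() != "H"}

    # Collect heavy atoms that have at least one heavy neighbor.
    connected = set()
    for i, j in bonds:
        if i != j and i in heavy and j in heavy:
            connected.add(i)
            connected.add(j)

    # Whatever is left is a heavy atom bonded only to hydrogens or to nothing.
    return sorted(heavy - connected)


# Two bonded carbons plus an oxygen that is attached only to a hydrogen.
elements = ["C", "C", "O", "H", "H"]
bonds = [(0, 1), (2, 3), (0, 4)]
print(find_lone_heavy_atoms(elements, bonds))  # [2]
```

In this toy input the oxygen (index 2) is flagged because its only bond is to a hydrogen.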
Week 8
July 21, 2018
I ran the third experiment on one of my datasets of protein-ligand complexes this week. There were a few takeaways regarding the fragment size parameter from these initial results. The most important insight is that there does not seem to be one number that is the optimal fragment size for all ligands. This is expected. Taking a fragment with 6 degrees of freedom from a ligand with 12 degrees of freedom is a significantly different scenario than taking one from a ligand with 54 degrees of freedom. In general, the results seem to point towards an ideal fragment size slightly smaller than the size of the entire molecule, which would allow for just a few rounds of incremental docking. Further experiments will be designed to explore the effects of changing the fragment size relative to the size of the particular ligand being docked, but first we are going to run the same experiment on the other dataset. This next dataset contains significantly larger ligands than those in the first, so the results may be more illuminating.
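A little arithmetic makes the 12-versus-54 comparison concrete. The sketch below is only an illustration under stated assumptions: it supposes that each incremental round adds a fixed number of new degrees of freedom to the fragment (the new_dof_per_round parameter and its value of 3 are my own placeholders, not DINC settings taken from these experiments), and the function name incremental_rounds is hypothetical.

```python
import math


def incremental_rounds(ligand_dof, fragment_dof, new_dof_per_round):
    """Count docking rounds needed to grow a fragment to the full ligand."""
    if fragment_dof <= 0 or new_dof_per_round <= 0:
        raise ValueError("sizes must be positive")
    if fragment_dof >= ligand_dof:
        return 1  # the first fragment already covers the whole ligand
    remaining = ligand_dof - fragment_dof
    # One round for the initial fragment, then one per added chunk.
    return 1 + math.ceil(remaining / new_dof_per_round)


# Same fragment size (6), two very different ligands.
for ligand_dof in (12, 54):
    ratio = 6 / ligand_dof
    rounds = incremental_rounds(ligand_dof, 6, new_dof_per_round=3)
    print(ligand_dof, round(ratio, 2), rounds)
# 12 0.5 3
# 54 0.11 17
```

Under these assumptions the same fragment size covers half of the small ligand but only about a ninth of the large one, and the number of rounds grows accordingly.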
Week 7
July 14, 2018
This week I worked on finishing up and testing my restart script for DINC! This was a good task to be able to check off my to-do list. Developing this functionality forced me to really familiarize myself with the structure of the DINC program. In addition, this week we started to set up the next experiment that we will be running on all of the datasets that we have been working with. So far we have examined the computation time given different configurations of computing resources on the supercomputer, looked at the effects of increasing the Exhaustiveness parameter in Vina, and experimented with the effect of increasing the number of docking tasks used. There are two main methods that the DINC software incorporates to improve upon the results obtained using an existing docking tool like Vina or Autodock4 on its own: combining the results of multiple instances of the docking tool run in parallel, and docking the ligand incrementally in overlapping fragments. Our experiments so far have not involved the incremental aspect of DINC, only the multi-threading aspect. The results from the next experiment we run on all of the datasets will help us to determine the best fragment size to use for re-docking.
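The first of those two methods, running several docking tasks side by side and combining what they return, can be sketched in a few lines. This is a toy under loud assumptions: run_docking_task is a stand-in that returns a random score instead of calling Vina or any other real tool, and the function names are hypothetical rather than taken from DINC.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_docking_task(seed):
    """Stand-in for one docking run; returns (score, pose_label)."""
    rng = random.Random(seed)
    # A real task would invoke a docking tool; here the score is just random.
    return rng.uniform(-10.0, 0.0), f"pose_from_seed_{seed}"


def dock_in_parallel(num_tasks):
    """Run num_tasks independent tasks and keep the lowest-scoring result."""
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        results = list(pool.map(run_docking_task, range(num_tasks)))
    # Tuples compare by score first, so min() picks the best-scoring pose.
    return min(results)


best_score, best_pose = dock_in_parallel(4)
print(best_pose, best_score)
```

Because each extra task only adds candidates to the pool, the best score in this sketch can never get worse as the number of tasks grows; the open question in the experiments is how quickly the improvement levels off.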
Week 6
July 7, 2018
I spent a lot of time this week working on the development of the restart job feature for the DINC program. My main challenge was understanding the DINC code that I was adding to. I had to do a close read of the code to understand the program flow. Part of enabling the restart is understanding which temporary variables need to be regenerated in the event of a restart. Another part is selecting a restart point based on the docking files that have already been generated in the directory. I don’t anticipate that the final code that I write for this feature will be very long at all. The challenge is understanding the code that already exists. In addition to the programming, I have continued benchmarking a new dataset. The experiments that we have been running evaluate how many threads of the docking software (Vina) are needed to generate the best results. Vina is the software that is incorporated into DINC to perform the actual docking. This phase of experimentation is almost over.
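Choosing a restart point from the files already on disk might look something like the sketch below. It is not the feature as implemented in DINC: the round_<k>.pdbqt naming scheme, the one-file-per-round layout, and the function name find_restart_round are all assumptions invented for the example.

```python
import re
from pathlib import Path

# Hypothetical naming scheme for per-round output files;
# the file names DINC actually writes may differ.
ROUND_FILE = re.compile(r"^round_(\d+)\.pdbqt$")


def find_restart_round(job_dir):
    """Return the index of the first round that still needs to run."""
    done = set()
    for path in Path(job_dir).iterdir():
        match = ROUND_FILE.match(path.name)
        # Skip empty files: a job killed mid-write may leave one behind.
        if match and path.is_file() and path.stat().st_size > 0:
            done.add(int(match.group(1)))

    # Restart at the first gap so a missing earlier round is never skipped.
    next_round = 0
    while next_round in done:
        next_round += 1
    return next_round
```

Restarting at the first gap, rather than after the highest-numbered file, is a deliberately cautious choice: if an earlier round is missing or empty, everything after it is treated as untrustworthy.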
Week 5
June 30, 2018
In addition to running datasets with different combinations of parameters with the DINC tool on the supercomputer, this week I started working on a development task for DINC. When we are testing DINC, we generally run it over a substantial dataset containing different protein-ligand complexes. Even on a supercomputer, DINC can take a long time to run over an entire dataset. Since there is a maximum length of time that a job on the supercomputer (Comet) can run, the lab has had jobs terminate prematurely when the time limit is reached. Not wanting to waste results from computational resources that have already been spent, lab researchers will go in and restart the jobs by hand from a midpoint. The functionality that I will be adding to the DINC program will allow partially completed jobs to be restarted automatically from the point of interruption.
Week 4
June 23, 2018
This week has been all about running DINC with different combinations of parameters on the supercomputer. I have generated a lot of data and have been doing a lot of organizing in Excel. It is interesting to see how the changing parameters are affecting the results that I see. The two most important aspects of the docking process are 1) generation of possible docking conformations and 2) selection of the lowest energy conformation from all those that were generated. Scoring functions are used to complete step 2, and they take into account the interactions between the specific atoms in the protein and the ligand. Since there are so many interactions happening, only the most significant ones are considered. Once again I am seeing the balance in this work between level of detail/accuracy and what is practical computationally. A very detailed scoring function that took into account even minute interactions would likely be able to consistently choose the lowest energy conformer, but if it takes too long to do so, it is not practically useful for the task.
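The selection step, and the trade-off between detail and cost, can be illustrated with a deliberately crude scoring function. Everything here is a placeholder: the -1/d pair term is not a real force field, the 8.0 cutoff is an arbitrary number, one way of reading "only the most significant interactions" is as a distance cutoff (my assumption), and the names toy_score and select_best are hypothetical.

```python
import math


def toy_score(ligand_atoms, protein_atoms, cutoff=8.0):
    """Sum a simple pairwise term over atom pairs closer than cutoff."""
    total = 0.0
    for a in ligand_atoms:
        for b in protein_atoms:
            d = math.dist(a, b)
            # Distant pairs are ignored, trading some accuracy for speed.
            if 0.0 < d < cutoff:
                total += -1.0 / d  # placeholder term, not a real force field
    return total


def select_best(conformations, protein_atoms):
    """Return the conformation with the lowest (most favorable) toy score."""
    return min(conformations, key=lambda c: toy_score(c, protein_atoms))


# One protein atom and three single-atom candidate conformations.
protein = [(0.0, 0.0, 0.0)]
candidates = [[(5.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(20.0, 0.0, 0.0)]]
print(select_best(candidates, protein))  # [(1.0, 0.0, 0.0)]
```

Raising the cutoff or adding more terms makes each score more detailed but also more expensive, and that cost is paid for every candidate conformation.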
Week 3
June 16, 2018
I got to explore the DINC program in depth this week. I read through the code on GitHub, then installed DINC and ran several jobs on my own computer. DINC is scripted in Python and uses several modules that require a Linux environment. Since my personal laptop is a Windows 10 machine, I created a virtual machine and installed DINC on that. Didier is interested in having me try to install DINC on my native machine as well, so I will give that a shot next week. While they are not distributing DINC in any way right now, he thinks knowing how it works on Windows will be helpful when they do.
The docking tasks take a significant amount of time on personal machines, even when run on relatively small molecular complexes. Next week we will get access to a supercomputer, on which we will run DINC to benchmark several sets of different protein-ligand complexes.
Week 2
June 9, 2018
This week I had the opportunity to visit several laboratories in the medical complex on Galveston Island (on the Gulf Coast). We saw NMR and crystallography equipment. I had flashbacks to organic chemistry. The tour was a part of the CCBTP (Computational Cancer Biology Training Program) of which Didier and Stephen are a part. Although I am working in Kavraki Lab as part of the DREU program, I will be participating in this program as well. I think that this will be a great opportunity, both to have exposure to more of the wet-lab/biology/chemistry aspects of this research and to be able to meet other undergraduates doing research this summer in the area.
We have been having daily discussions with Didier about the material we are reading this week and have begun exploring the online version of DINC.
Week 1
June 2, 2018
It was a long drive from Minneapolis to Houston. Arriving here, I was surprised by the size of Houston. I didn’t realize how big a city it is, or how sprawling. On my second day in the city I visited Rice University for the first time and met the post-doc who is serving as my day-to-day mentor, Didier. Didier explained a bit about the Kavraki Lab and the work I would be doing this summer.
I quickly met my mentor, Dr. Lydia Kavraki, and the rest of the lab in an all-lab meeting and learned about some of the work the Robotics side of the lab is doing. Dr. Kavraki was incredibly welcoming and made sure that my housing and transportation were in order for the summer. I also met Stephen, another undergraduate with whom I will be collaborating on much of the work this summer. For this first week, Didier provided us with papers the lab had written regarding DINC and docking tools in general to help us familiarize ourselves with the work of the lab.
Introduction
June 1, 2018
Hello! My name is Liz Palmi and I am an undergraduate student studying computer science at the University of Minnesota, Twin Cities. As a part of the DREU program this summer I was matched with my mentor, Dr. Lydia Kavraki, who is a professor of computer science at Rice University in Houston, TX. The Kavraki Lab does research in the domain of physical algorithms. Researchers in the lab focus on applications in two main areas: Robotics and Biomedical Computing. My work this summer will be with researchers on the biomedical side. The lab has been involved in work on computational tools that can be used to model protein structures and molecular interactions. These tools may aid in the development and analysis of new drugs, such as the highly individualized drugs that are used in immunotherapy treatments for some cancers.