Week I

Jun 5, 2010

I arrived in College Station the Saturday before the program started. The weather is hot and humid just like in Alabama. I met a couple of students that Saturday. We went out to eat at PotBelly's here in town. The program started on Tuesday. I met some of the professors and other students participating in undergraduate research this summer, and there was a tour of the gigantic campus.

I met with my faculty advisors, Dr. Taylor and Dr. Wu. We discussed what I would be doing this summer, and they gave me lots of things to read so that I could familiarize myself with parallel programming concepts. I also met Charles, one of the graduate students who works with Dr. Taylor. He will also be helping me this summer.

The rest of the week I spent reading material about parallel programming and OpenMP and MPI. I also took a look at the MrBayes application.

Week II

Jun 12, 2010

Earlier in the week, I was not able to run the MPI programs because we couldn't get them to compile on Dori. Dori is the computer that we will be using this summer to do analysis. It is a machine at Virginia Tech. It has 8 nodes with two dual-core processors per node, so there are 32 cores in all. While I couldn't run on Dori, I was able to run the OpenMP programs using Visual Studio. I just set the number of threads, and I was able to see what was going on. Around Tuesday, I was able to start compiling and running both OpenMP and MPI programs on Dori, so now I can see how the MPI programs work.

The OpenMP programs to me are easier to understand than the MPI programs, but I hope after running and writing some more programs, the MPI programs will be easier to understand as well.

I read a paper and a PowerPoint presentation about the MrBayes computational biology program. I also took a look at the application's source code. One of the files (out of many) has tens of thousands of lines!

One of the students here had a birthday this week, so we had ice cream cake to celebrate. We also went out to Antonio's pizza by the slice. Delizioso!

Week III

Jun 19, 2010

This week I read more about MPI and ran some more programs. I also looked at the MrBayes application to see how it is parallelized. MPI is used in several of the source files. I also looked for OpenMP directives, and OpenMP is used in only one file, the mcmc file. The mcmc file is the largest, with many lines of code. Since the program uses Markov chain Monte Carlo (MCMC) sampling to approximate the posterior probability of the phylogenetic trees, I assume that is why it is so large. MCMC is used because the posterior probabilities cannot be determined analytically. I read all of this online. Truthfully, I do not understand some of the material about MCMC.
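
As a rough illustration of the MCMC idea (this is not MrBayes's actual code), a minimal random-walk Metropolis sampler in Python might look like the sketch below. The function names, the step size, and the one-dimensional standard normal target are all assumptions chosen for illustration; MrBayes samples phylogenetic trees and model parameters, which is far more involved. The point it shows is that the sampler only needs ratios of the target density, so the normalizing constant never has to be computed analytically.

```python
import math
import random


def metropolis(log_target, start, steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_target returns the log of an UNNORMALIZED density; only
    differences of log densities are used, so the normalizing
    constant is never needed.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    x = start
    log_p = log_target(x)
    samples = []
    for _ in range(steps):
        # Propose a nearby point with a symmetric Gaussian step.
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p_new / p_old).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        # The current state is recorded whether or not we moved.
        samples.append(x)
    return samples


def log_standard_normal(x):
    """Log of an unnormalized standard normal density (toy target)."""
    return -0.5 * x * x


samples = metropolis(log_standard_normal, start=0.0, steps=20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"sample mean = {mean:.3f}, sample variance = {variance:.3f}")
```

With enough steps, the sample mean and variance should land close to 0 and 1, the values for the toy target.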

Also this week, I wrote the Hello World program in C++ and Fortran to see how they were similar and different to the same program in C. There were many similarities, but also some differences.

Another student in the CSE REU program had a birthday this week. We again celebrated with ice cream cake.

Week IV

Jun 26, 2010

This week I continued to read about MrBayes. I read the manual for MrBayes and I also read some more about using Bayesian inference in phylogenetics. Some of the stuff I still do not understand, but as I read it more and more, I begin to understand things that I didn't before.

I also began running the program this week on different datasets. I compiled both versions of the program, the MPI-only version and the hybrid. For the hybrid version, Charles told me how to change the number of OpenMP threads. However, the program ignored this setting for some reason and just kept running with one OpenMP thread. So Charles told me that I would have to change it manually in the code. I ran the program using the different datasets to see how the program worked. It was nice to see how increasing the number of processes and threads made the program run faster. Some of the datasets are larger than others and take longer to run.

Week V

July 2, 2010

This week I ran the datasets and collected timing data. I ran the program on different datasets, but later in the week, I discovered that I was doing it wrong and had to start over. There are 8 datasets that I've been running. I usually run each one three times and use the best time. Each dataset is run on several different numbers of cores. It's very interesting to see how much faster the programs execute with more cores. When I learned about parallel programming, it made sense that the programs would have reduced execution time when more processes were used to run the program, but it is very nice to see the theory in practice with my own eyes.
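
The best-of-three approach and the resulting speedup can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the timing numbers are made-up placeholders, not measurements from Dori. Speedup here is the baseline time divided by the time on more cores, and efficiency is the speedup divided by the increase in core count.

```python
def best_time(runs):
    """Return the fastest of several repeated timings (in seconds)."""
    return min(runs)


def speedup_table(timings):
    """Compute best time, speedup, and efficiency per core count.

    timings maps a core count to a list of wall-clock times from
    repeated runs. The smallest core count is used as the baseline.
    """
    baseline_cores = min(timings)
    baseline = best_time(timings[baseline_cores])
    table = {}
    for cores in sorted(timings):
        t = best_time(timings[cores])
        speedup = baseline / t
        # Efficiency relative to the baseline core count.
        efficiency = speedup * baseline_cores / cores
        table[cores] = (t, speedup, efficiency)
    return table


# Placeholder timings (NOT real data): three runs per core count.
example_timings = {
    1: [100.0, 98.0, 101.0],
    2: [52.0, 50.0, 51.0],
    4: [28.0, 27.0, 30.0],
}

for cores, (t, s, e) in speedup_table(example_timings).items():
    print(f"{cores} cores: best {t:.1f} s, speedup {s:.2f}, efficiency {e:.2f}")
```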

This week I also had to give a presentation on my project, what I've learned so far, and my current progress. I was really nervous about it, but we had been doing weekly presentations with Dr. Wu, so that made me feel somewhat better.

Also over the weekend, there was a Fourth of July celebration at the George Bush Presidential Library. We were able to check out the museum and get some food. There was an Elvis impersonator and fireworks later.

Week VI

July 9, 2010

This week we got accounts to access the TAMU Hydra computer. Hydra is one of the university's computers. It has 52 nodes and 832 cores total. So it is much bigger than Dori which has only 32 cores.

Running the programs on Hydra is very different from running on Dori. On Dori we had to type in mpirun …….. On Hydra we have to submit jobs. Dr. Wu tells us that this is the normal way of running programs on supercomputers. So he gave us two scripts, one for the MPI-only version and one for the hybrid version. I had some problems compiling the program on Hydra, but Dr. Wu helped and now I'm able to run the program on Hydra. I am to run the program using up to 512 cores.

Week VII

July 16, 2010

This week I continued running the datasets on Hydra. I have been running both versions, but when I started on the hybrid version, there was a memory issue with all of the datasets. The amount of memory needed is proportional to the number of taxa in the dataset and also to the number of chains. On Hydra, I was using more chains since there were more available cores, so this is where the problem came from. I started off running on at least 3 nodes, and this seems to give the program enough memory. On Hydra, the hybrid version is significantly slower than the MPI-only version, but the hybrid begins to improve over time.

I also started writing some things on the research paper. I wrote about the application, some concepts about OpenMP and MPI, and the two systems I’ve been using.

Week VIII

July 23, 2010

This week, I continued working with Hydra. After running the datasets using the hybrid version, I discovered that the method I was using last week would not work with the larger datasets. I was able to successfully run only the two smallest datasets. So now, Dr. Wu has advised me to do weak scaling, where the number of chains per node is constant. That way, as the number of chains increases, the number of nodes and therefore the amount of memory available also increases.
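
A weak-scaling run plan of this kind can be sketched in Python as follows. This is a hypothetical illustration, not the scripts used on Hydra: the function name, the doubling of node counts, and the value of 4 chains per node are assumptions. The 16 cores per node comes from Hydra's 832 cores spread over 52 nodes, and 512 cores is the largest run size mentioned earlier.

```python
def weak_scaling_plan(max_cores, cores_per_node, chains_per_node):
    """List run configurations with a fixed number of chains per node.

    Node counts double at each step (an assumption for illustration).
    Because chains_per_node stays constant, the memory demand per
    node stays roughly constant as the total chain count grows.
    """
    plan = []
    nodes = 1
    while nodes * cores_per_node <= max_cores:
        plan.append({
            "nodes": nodes,
            "cores": nodes * cores_per_node,
            "chains": nodes * chains_per_node,
        })
        nodes *= 2
    return plan


# Hydra: 832 cores / 52 nodes = 16 cores per node.
# chains_per_node=4 is a placeholder value, not taken from the project.
plan = weak_scaling_plan(max_cores=512, cores_per_node=832 // 52, chains_per_node=4)

for run in plan:
    print(f"{run['nodes']:2d} nodes, {run['cores']:3d} cores, {run['chains']:3d} chains")
```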

Also this week, I was supposed to collect power profiling data on Dori, but there was a problem logging in to the machine. So, I have to wait until that is fixed and start next week.

Week IX

July 30, 2010

This week I worked on my poster. I worked on it over the weekend, but we did a lot more throughout the week. I was not able to get any data from Hydra on 512 cores. I submitted all my jobs ahead of time and hoped that they would be run in time, but they have been sitting in the queue for days. Since I couldn’t get data for the datasets on 512 cores, I excluded this problem size from the charts on my poster. Hopefully I will have this data to include in my paper. I continued to work on my paper. I had already done many of the different sections, but I think some of the material I added to my poster will also be good for my paper. So I’ve been going through my old weekly presentations and the poster to see if I mentioned something important in them that I left out of the paper. Also, we did a run-through of our posters to practice for next week.

Week X

August 6, 2010

This week was a busy week. We had student poster presentations on Monday. I was dreading doing this. I guess I have stage fright, because I was nervous about speaking in front of everyone. It turned out okay, but I was very nervous. Tuesday was my birthday, and we also had another poster presentation. It was better because it was one-on-one, but there weren't that many people stopping by. I think it had to do with the posters being so close together and the limited amount of space. After the presentations we went to eat at Cafe Eccell to celebrate having one poster presentation down and only one more to go. We were also celebrating my 21st birthday and the end of our time in Texas.

Our final papers were due that Wednesday. I had already written most of it, but I had a lot of polishing left to do on it. I finished it and submitted it Wednesday afternoon. That took a load off my shoulders. After that I only had one more poster session on Friday.

At the poster presentation on Friday, there were two different sessions. I had to peer judge in the first session and present in the second. There were about fifty students presenting in both sessions. I judged two people, one guy from biomedical engineering and another from aerospace. I presented in the second session. So many people came by. I was nervous for the first presentation, but I think I improved for the others. There were a couple of people who came by who were really interested in my experiment because of the MrBayes application.

We had a farewell dinner that afternoon at C&J Barbecue. That was my first and last encounter with Texas barbecue for the summer. Also this weekend, a couple of students in the program and I made a trip to Houston. It was not that far away and we had fun. We went to Chinatown and ate Thai food :-) We also got tea. I got strawberry milk tea with tapioca balls. I was not a fan of the tapioca balls, but the tea was good, and I usually don't like tea. Afterwards we went to The Galleria. That mall was gigantic. It had three levels for shopping and other levels for offices and such. There was an ice skating rink inside as well. After the Galleria we went to eat at Mark's American Cuisine. This week was Restaurant Week in Houston. For Restaurant Week, you can eat at these really expensive four-star restaurants at a discount, and part of the money paid goes to the Houston Food Bank. Oh my goodness!! The food was amazing. It was a bit expensive, but totally worth it.