Cadran's summer at Texas A&M with the Distributed Mentor Program (DMP)




Summer Progress Log:

Week 1 (May 27th - June 2nd)

        I arrived at the Texas A&M campus this week. I was pleasantly surprised by the dorm accommodations. Our coordinator arranged for the DMP students (there are four of us at TAMU) to be housed with the computer science REU at a pretty luxurious off campus dorm. The place has a pool, fitness center and kitchen, along with recreational facilities, among other things. Students from other REU programs are also living in the dorm with us. My roommate and the other residents all seem very nice and everyone appears excited about their respective programs.
        I've been quite anxious about this summer because I want to do a good, thorough job with my research and I want to leave Texas with some sort of finished product that I'm proud of. It does seem like a tall order though, and I am nervous about whether I have the proper skills. Having a good experience during my time here is definitely important for my CS self-esteem.
        There were a few introductory events this week, such as a welcome breakfast with staff and students from the various engineering summer programs, an introductory luncheon, and a get-to-know-you pizza party. These activities turned out to be really good ways to meet some of the other students in the various programs. It's interesting to hear about the projects other students are working on.
        I met my mentor and her Ph.D. and Master's students toward the end of the week. They were all very friendly (and very busy). Prof. Williams and I sat down for a few hours and talked about her research and what she and I hope I'll get out of this experience. She gave me papers to read and some topics to think about. Thus far I've been doing a lot of background reading and I've also been reading the papers that she and her students have recently published. I'm anxious for next week when we really start to plan my project and I actually begin working on it. Next Tuesday I'm also planning to attend Hyun Jung Park's thesis defense. He's one of Prof. Williams' Master's students. Besides learning more about the topic, I think hearing his defense will give me a better idea about what it means to be a grad student.


Week 2 (June 3rd - June 9th)

        This was a difficult but good week. I'm still reading papers and getting comfortable with concepts. The information is starting to settle in though. On Monday I met with my professor again and continued to lay out and discuss my project. We determined that this week I would continue to read and that I would get comfortable using the tools I'll need (SLS and Hash-RF). They are both written in C++, which unfortunately I'm not too familiar with, so SLS took a bit of figuring out (with the help of one of the grad students). I've also been learning how to use the R language and environment. R is for statistical analysis, so once I get my tree data sets, I'll be using R to actually analyze them. Even though I'm working a typical work week, I end up thinking about the material all the time, and so I feel pretty mentally exhausted at the end of each day. My brain is getting quite a workout and I'm happy to be learning so much.
        I checked out an R textbook from the library to help me learn the language and discovered that the library here is huge. Coming from a small school, I'm not used to having so many resources. The library annex includes multiple towering buildings. I also checked out some other books, one of which is the first book that we are reading in our dorm book club. We decided to form a book club this weekend so we would all get to know each other better.
        The REU program here includes a GRE prep class that I participate in, so this week I went to the first class and took a practice GRE. It's the first time I've ever actually looked at one of the tests, and I was a bit unpleasantly surprised when I realized that my vocabulary isn't what I thought it was.
        I also went to that thesis defense I mentioned earlier. It was interesting to see what a graduate student does and what's expected of him. Plus, it was nice to hear about the work first hand, rather than just reading papers.
        Coming out of the week, I feel like I've learned a lot and I'm anxious to start working with real data.


Week 3 (June 10th - June 16th)

        On Monday I met with my mentor again. We talked more about overall goals for the summer. We also determined that I should write an R script to produce a heat map to characterize distance values produced by Hash-RF. I started out by learning more about R and its capabilities. There are a lot of packages out there to extend R's features, so I spent some time looking through those and determining which ones might be useful to me. Two of them, ape and apTreeshape, are written specifically for dealing with tree data, so I've been playing with them a bit. It's amazing to see all the different applications for the language.
        The picture on the right is an example of a heat map. It's a 2-D graphical representation of our data. When Hash-RF outputs a matrix of distance values, the heat map is a helpful way to visualize the information. In our case, the same set of trees is represented on the x-axis and y-axis, and the color values in each cell of the grid represent RF-distances between tree pairs. That means that if a tree on the x-axis and a tree on the y-axis do not have a lot of branches in common, the RF value associated with the comparison will be quite high (and that is represented with a lighter color on this map). Also, since the same set of trees is represented on the x- and y-axes, the color values along the diagonal, where tree 1 is compared to tree 1 and tree 2 is compared to tree 2, etc., are bright red. This is because a tree compared to itself will not have any branching differences, so the RF-value will be 0.
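        As a rough illustration of the idea (a Python sketch, not the R script from this week and not how Hash-RF works internally), the matrix behind a heat map like this can be built by describing each tree as a set of splits and counting the splits that two trees do not share. The three five-taxon trees, the function names, and the character "shades" used in place of colors are all made up for the example.

```python
from itertools import combinations

def rf_distance(splits_a, splits_b):
    """Count the splits that appear in exactly one of the two trees."""
    return len(splits_a ^ splits_b)

def rf_matrix(trees):
    """Build the symmetric matrix of pairwise RF distances."""
    n = len(trees)
    matrix = [[0] * n for _ in range(n)]  # diagonal stays 0: a tree matches itself
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = rf_distance(trees[i], trees[j])
    return matrix

def text_heat_map(matrix, shades=" .:*#"):
    """Render each distance as a character; denser characters mean larger distances."""
    largest = max(max(row) for row in matrix) or 1
    top = len(shades) - 1
    return ["".join(shades[d * top // largest] for d in row) for row in matrix]

# Hypothetical trees on taxa A-E, each given by its two non-trivial splits
# (every split is written as the side that does not contain E).
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ACD")},
]

matrix = rf_matrix(trees)
for row in matrix:
    print(row)
for line in text_heat_map(matrix):
    print(repr(line))
```

        The first two trees share one split, so their distance is small, while the third tree shares nothing with either of them. The zeros on the diagonal correspond to the uniform diagonal described above.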
        I enjoyed making the heat map script because it required that I learn R. I enjoy figuring things like that out, so that task was pretty fun. I also did some more reading about quartet distance measurements. The quartet distance is another method for measuring the differences between two trees, and it would also be interesting to look at. I downloaded another tool, QDist, that calculates the quartet distance between trees.
        This week ended perfectly on Sunday night because I went out for a walk around campus and coincidentally ran into my mentor. She and I began to talk and ended up having a really great long conversation about graduate school, feeling prepared for it, having confidence, and overall strategies for success. It was really nice talking to her outside of work and I really appreciated her advice. Hopefully I'll get more chances to talk with her like that in the future.


Week 4 (June 17th - June 23rd)

        This turned out to be my hardest week yet. On Monday my mentor and I met as usual to discuss my work plans. I was going to take tree data produced by 5 runs through SLS using different search methods and compute the RF-distances with Hash-RF. Each run had a different initial tree, which meant the search started from a different place in tree space each time and investigated different neighboring trees on the way to finding an optimal tree structure. I computed the RF-scores for all the trees, so the matrix included trees on all 5 search paths. The next step was to cluster the trees in order to find relationships between the 5 sets, and among trees within each search.
        Clustering the trees became tricky for me because I realized once I began that there are a lot of algorithms and methods to cluster trees. Basically the goal of clustering trees is to group similar trees together in a meaningful way, but the problem is that it's hard to figure out what's meaningful and what features of the data are important. Clustering is used in the exploratory phase of research when you don't have a specific hypothesis and are searching for patterns in data. Some clustering methods are hierarchical clustering, k-means clustering, etc.
        Then there are different implementations for each method. For example, hierarchical clustering measures the distance between trees you're clustering, so then you have to choose what distance measurement to use (Squared-Euclidean, City-Block, etc.). Then the next step is to determine which of those groups are more related, so you need another distance metric (Single Linkage, Complete Linkage, etc.). K-means clustering has an array of parameter choices too, and so do the others. This was very hard for me because it's difficult to know which method is best, and then which parameters and algorithms to use once you've chosen a method. I felt sort of like picking a method of analysis was an NP-hard problem in itself.
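        To make the linkage choice concrete, here is a small Python sketch of bottom-up (agglomerative) hierarchical clustering on a precomputed distance matrix. It is a toy illustration, not one of the R packages mentioned in this log: the function name, the five "items", and their distances (positions on a line standing in for tree-to-tree distances) are assumptions for the example. Switching between single and complete linkage changes which groups get merged.

```python
def agglomerate(dist, num_clusters, linkage="single"):
    """Merge the two closest clusters until num_clusters remain.

    single linkage   = distance between the closest pair of members
    complete linkage = distance between the farthest pair of members
    """
    pick = min if linkage == "single" else max
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > num_clusters:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = pick(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters

# Toy data: five items placed on a line; the distance is the gap between them.
positions = [0, 4, 10, 18, 27]
dist = [[abs(p - q) for q in positions] for p in positions]

single = agglomerate(dist, 2, linkage="single")
complete = agglomerate(dist, 2, linkage="complete")
print(single)
print(complete)
```

        On this data single linkage chains the first four items together one neighbor at a time, while complete linkage keeps the groups compact and splits them differently, which is exactly why the choice is hard to make without knowing what structure you are looking for.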
        The next frustration I encountered was with the "wealth" of poorly documented software that implements clustering algorithms. I found myself downloading program after program that was useless, or took data in odd formats, or was too confusing to use. I did however eventually find a solution. I discovered that R has a variety of cluster analysis packages that are well documented and easy to use. After struggling for so long with so much junk it was actually refreshing to go back to R, write a few scripts and have some tangible data. That of course happened at the end of my long week.
        Admittedly, I got extremely frustrated with the whole process and felt very overwhelmed. I think my biggest mistake was that I didn't talk to my mentor about my issues during the week, and instead waited until a scheduled Friday meeting to speak with her. She explained to me then that I needed to be careful not to get lost in a black hole of information. She also said that in the future I should talk to her right when I run into problems because we're supposed to be a team. She made an analogy that I liked. She said that our meetings should be like check-ups with a doctor because you're supposed to go in for check-ups when everything is working right, and you aren't supposed to wait until something miserable happens. So I guess I learned a valuable lesson. By the end of our Friday meeting I felt much better and re-motivated, but getting there was hard.
        I was very thankful for the weekend because I definitely needed to clear my mind. The physics REU also had a BBQ that I went to.


Week 5 (June 24th - June 30th)

        This week I continued to work on clustering the data. I was looking at something called the k-means algorithm that I mentioned last week. It clusters your data around k central values (centroids). During each iteration, the algorithm tries to improve its centroid values, and replaces them based on whether the clusters produced are tighter (the data points are closer to the specified centroid). The trouble is figuring out the optimal number of centroids and determining which centroids to start with. You can pick starting centroids, or you can let them be chosen arbitrarily. I was also looking at graphical ways to interpret the k-means results, so I've been trying different methods to graph the clustering results for the data sets. By the end of the week I also stumbled onto a new implementation of k-means that I think will work better. We also found an algorithm to evaluate the clustering done for a specified number of centroids and some nice new visualization techniques. The plot pictured on the right is a rough idea of one way we've decided to plot the clusters. It's difficult to figure out how to plot a cluster because the points don't have x-y coordinates; they only have distance values from each other and distance measurements from centroids.
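        For readers who haven't seen k-means before, the loop can be sketched in a few lines of Python. This is the textbook version (often called Lloyd's algorithm) run on made-up 2-D points with hand-picked starting centroids; it is an assumption-laden toy, not the R implementation used for the tree data, where the points have no coordinates at all.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, centroids, max_iter=100):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        updated = [mean_point(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:  # nothing moved, so the clusters are stable
            break
        centroids = updated
    return centroids, clusters

# Hypothetical data: two obvious groups, with starting centroids chosen by hand.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

        With k = 2 and these starting values the loop settles after one update. Picking a different k or different starting centroids can give different clusters, which is the difficulty described above.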
        My mentor was out of town at a conference this week, so we'll meet again to talk about the project on Monday. I've begun to realize how busy professors at large research institutions are because they have to balance their time between research, students and conferences.
        I've been feeling much less frustrated because I have a working knowledge of the tools and I have all the needed software. I'm much less overwhelmed by the amount of information as well.
        This weekend I also went to Austin with two girls in my program. One of them lives in Austin, so she showed us around and we stayed at her house. It was a lot of fun to get away and see a new city. Austin seems to have a unique culture that I've heard is different from anywhere else in Texas. They also have this bridge that huge numbers of bats live underneath.
        All in all this was a fun week. I can't believe I'm halfway through the program. Time has been flying!


Week 6 (July 1st - July 7th)

        This was quite an interesting week. I've continued to apply the analysis techniques that I mentioned before and I'm starting to see some interesting patterns in the data. When I met with my advisor this week she was interested in the results I had gotten and we discussed ways to flesh them out further and build on them. I'm excited to actually be making progress, but I'm realizing that it's hard to interpret the results. It's also difficult to figure out how to verify what I've found. How do I know that the techniques I'm using are an accurate measurement of anything?
        We also talked about the paper I'll be writing to summarize what I've done. I bet writing everything up is going to be a bit difficult, but I've been taking notes and keeping track of my daily progress so that I don't forget the details. I've also written scripts for all the work that I've done in R, which will save me a lot of time and agony. Hopefully my note keeping will actually pay off.
        My extracurricular activities of the week included more time at the rock wall and fireworks at the George Bush Library. The Library was pretty interesting. They had a lot of exhibits set up for the 4th. Inside the library there's a large, intricate model of the White House, as pictured above.


Week 7 (July 8th - July 14th)

        This week was pretty hectic. I've been doing more tree analyses with clustering. I've been clustering different data sets and looking at the results. I'm trying to find some good ways to graphically interpret the topological relationships, but it's difficult to take the clustering groups and represent them in a meaningful way. I've been using bar graphs and charts, but we've been talking about a grid representation. My mentor got the idea for the grid at a totally unrelated talk. It's interesting how techniques have such far-reaching applications, and how a tool developed for one field might be functional for something entirely different.
        There was an REU seminar on Tuesday that was interesting. A professor doing interdisciplinary work dealing with MRI imaging gave a talk. He showed us this really neat video from a Harvard lab that animated a lot of cellular processes. Tuesday night was the midnight showing of the new Harry Potter movie. A lot of us from the dorm went and some of them are very obsessed with the books, so it was fun. The only downside was that the movie was something like 3 hours long. It made Wednesday a little tough to get through.
        On Wednesday afternoon my mentor told me she wanted me to prepare a presentation for a few grad students and her for Friday. It was a bit intimidating, especially because I only had a day or two to prepare, but it was a good experience. I had to put a cohesive summary together fast, which was good practice. It was hard to figure out who my target audience was since I wasn't really sure how much information my listeners would have about phylogeny. Plus, my mentor gave me a 15-minute time cap. I also get very nervous about public speaking, so I appreciated the opportunity to practice (though I was quite nervous about it and mostly appreciated the opportunity after the fact). It did end up going okay and then the group did some brainstorming about possible applications for what I've been working on.


Week 8 (July 15th - July 21st)

        This week I worked on making a function to visualize the tree clusters and I also did some more work with my tree datasets. I really like the R function that I've made. It takes either a distance matrix or data observations and does the necessary steps, including clustering. Then it displays a grid that we're calling a cluster grid. The user can decide the characteristic to cluster with, the distance metric to use, the number of cluster centers to cluster around, the size of the matrix, etc.
        I was very pleased with the cluster grid because it makes interpreting our data a lot easier. The picture to the left is an example of a collection of clustered trees. Each cell in the m x n grid represents a tree. The tree in row i and column j is tree number (i-1)*n + j (for example, with n = 2 columns, T2,2 = T4). In this case, the color of a cell describes its run assignment (red = run 1, yellow = run 2, etc.), and the alphanumeric label beginning with C identifies the tree's cluster membership. The color can also describe a tree's cluster membership; in those cases, a symbol beginning with R represents the run classification. My mentor and I have been meeting pretty frequently to talk about the project and she has a lot of ideas about how to improve the grid and examine the data.
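        The cell numbering is easy to check with a few lines of Python. This is only a sketch of the indexing rule (i-1)*n + j with 1-based rows and columns; the function names and the four-tree example are hypothetical and it is not the R cluster grid function itself.

```python
def tree_number(row, col, n_cols):
    """Tree shown in a cell (1-based row and column), counting row by row."""
    return (row - 1) * n_cols + col

def grid_position(tree, n_cols):
    """Inverse mapping: the (row, column) of a given tree number."""
    return ((tree - 1) // n_cols + 1, (tree - 1) % n_cols + 1)

def cluster_grid(cluster_of_tree, n_cols):
    """Lay out one 'C<k>' label per tree, n_cols trees to a row."""
    labels = [f"C{c}" for c in cluster_of_tree]
    return [labels[start:start + n_cols]
            for start in range(0, len(labels), n_cols)]

# Hypothetical example: four trees in a 2 x 2 grid, assigned to three clusters.
print(tree_number(2, 2, n_cols=2))
print(grid_position(4, n_cols=2))
for row in cluster_grid([1, 1, 2, 3], n_cols=2):
    print(row)
```

        In a real plot the label would be paired with a cell color for the run (or the other way around), as described above.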
        I feel a lot of ownership for our project at this point. I like that I've gotten to see the research process beginning at the exploratory phase. I think it's been important to get a taste of that part of the process.
        I also began writing my paper and thinking about my poster this week. It was a relief to actually get started because it was starting to become a bit daunting. It is hard to decide what to actually include in the paper and even harder to figure out what should go on the poster. I only have two weeks left though, and a lot still needs to get done.


Week 9 (July 22nd - July 28th)

        This was a very productive week for me. My mentor is done traveling for the summer, so she's had more time to spend on my project. I've really enjoyed working more closely with her and having more opportunities to talk to her. Early in the week we went out to lunch and discussed my project. Things are getting pretty interesting because of the plotting techniques we've been developing. We named the two types of graphs Grid Plots and Tessellation Plots. At first we were pleased with the grid plot because it was turning out to be a useful way to represent our tree clusters, but then when we designed the two-toned tessellation plots we got really excited because the tessellation plots provide even more information.
        I've included two tessellation plots. One is a plot of all the trees and one is a summary of the cluster information. I've also included a key that explains how to interpret each cell of the tessellation plot with all the trees. The summary plot includes information about how many trees from each run are in each cluster. I enjoyed designing the plots in R, and I've had a lot of fun thinking about their application and ways to improve them. My mentor says I have the research bug now because I've been spending a lot of time on the project recently. It's also fun because she's very enthusiastic about it as well and so we've been getting things done very fast.
        I've also been writing my paper and designing my poster. The poster has gone through a lot of drafts, but it looks significantly better after each edit so I don't mind. The paper has also taken a long time to write and edit because there are so many graphs and datasets to include and discuss. Making the poster and the paper have been really great ways to think back about the whole summer, the research process, what I've learned and what I've accomplished.
        My mentor and I have also had quite a few meetings lately that have ended in discussions about computer science, grad school and life in general. I really value the advice she has given me and it's definitely been nice to talk to a woman computer scientist.
        It's hard to believe next week is going to be the final week, but luckily I'm just about done with the paper and the poster so hopefully it will be fun and low stress.


Week 10 (July 29th - August 4th)

        This last week I finished up my research, completed my paper, put the final touches on the poster, etc. The research has been moving so fast lately. Once we thought of how to visualize the data using the tessellation plot things started to get really fun. I can't believe how much progress we made in the last few weeks. The colors and patterns in the tessellation plots have made analyzing the data much more interesting.
        I'm glad I started writing the paper and preparing my poster pretty far in advance so that I had lots of time to edit them both. My mentor has been so helpful in the process because she's been very involved in all the steps. I remember the first draft of my poster had tons of text and was quite unfocused. Since then though she's shown me how to improve it and make it much more clear. The process has given me a lot of confidence about preparing posters in the future. On Thursday, after many drafts, we printed it and I did a practice presentation for her. I was pretty nervous about explaining my work to other people, but she gave me tips about how to make a good presentation which made it less scary.
        Writing my paper has also been quite a task. This is the first time I've used LaTeX, so it took some getting used to. Preparing and explaining all my figures took a while because there are so many datasets. The paper ended up being quite long.
        I feel very proud of both of these products because we spent so much time perfecting them. They make me feel like I've accomplished a lot this summer. Plus, the paper is in-depth, so it's a nice way to document the summer and remember what I've done.
        On Thursday and Friday we had two poster sessions. The one on Thursday was a half-day one and the one on Friday was a full day. The one on Friday was also more formal and the posters were judged. Even though I was nervous leading up to them, once they got started I kind of enjoyed telling people about my research. Plus, as my mentor pointed out, I really did completely understand all the elements of my project, so I wasn't scared of being asked questions. I also liked hearing about everyone else's research. Even though we all spent the summer living together on campus, I didn't really know what most people were working on. It was cool to see how different all the projects were and how enthusiastic everyone was about their research.
        After the Friday poster session my mentor and I went out for coffee. We ended up having another really nice conversation. We've discussed how to face our fears on a number of occasions and talked more about it on Friday. I've also realized how much more confident I feel now about my ability to do research and to succeed in computer science. I'm so grateful to her for giving me so much advice and making the summer such a worthwhile experience. I thought that the DMP program would be good for me, but I didn't realize I would get as much out of it as I have. This summer has been a huge success and I'm so glad I had the opportunity to work with Dr. Williams.
        I'm glad to be going home, but I'm also sad to be leaving. I've really enjoyed working with my mentor and getting to know her. She and I will definitely keep in touch in the future. I'll also miss the other students in the program. We all had a lot of fun working together this summer. All in all the program has been a great success.


© Cadran Cowansage, 2007