Journal
For various reasons, I decided to host my journal with Blogger. I have been impressed with Blogger in the past, and think the interface would be ideal for my journal. The DMP, however, did ask me to port all the posts I made to this website. Due to space restrictions, I could not port absolutely everything. If you would still like to view my original journal, please click here.
May 2006
June 2006
July 2006
August 2006
May 2006
Tuesday, May 30, 2006 - Introductions
Hi! My name is Suzanne, and I'm a participant in the Distributed Mentoring Program for the Summer of 2006. While it isn't my main journal, I will be keeping it for the rest of the summer. I can't promise you that I will post every day (it does get pretty hectic!), but I will try to make it as near daily as possible. In here, I hope to share with you some of my researching accomplishments and frustrations, my little triumphs and tears. Whatever the case, I won't lay a cry fest on you :-)
Now then! On to introductions! So here's a picture of me. This is how I looked at home. Now that I'm in the hot, blazing heat that is a Texas summer however, my hair is usually tied back in a ponytail. Alas, the day will come when my hair will be allowed to fall freely, without fear of certain frizzy doom!
My mentor is Tiffani Williams. She's an assistant professor of Computer Science at Texas A&M. Her main research area is phylogeny. I don't have a picture of her yet, but when I get one, I will put it up here. She's incredibly nice and very professional, and I like her a lot. I'm really excited to work with her this summer.
Last, but certainly not least, is my research accomplice and much talked about inspiration for various famous papers I have yet to write: Dan Bear (no link), the lovable and adorably sweet teddy bear I got for Christmas last year. I brought him to Texas this summer so I'd have some company, since his namesake will not be able to keep me company this summer.
Although no pictures are provided, I should also mention Lana and Phoenix, my two trusty computers, who, I should add, are utterly indispensable to this research. I am very lucky to have them both.
Anyway, by now you are probably bored with the introductions, and raring to find out about the wonderful research I have to do! I shall place that information in the next couple of posts. After today, this journal will actually serve its purpose: it will be a real journal, talking about my research as it progresses on a near daily basis.
That's all for now! It was good to meet you too!
June 2006
Saturday, June 03, 2006 - The First Week
Before I continue, I should mention the two weeks that transpired before my first week as a DMP Summer mentee. Approximately three weeks ago, I graduated from Rensselaer Polytechnic Institute with a Bachelor's in Computer Science. I had specialized in Computational Biology, with an interest in going on to graduate school for Computational Biology or Bioinformatics. When I applied to DMP this past February, I did so with the desperate knowledge that I might not even get into the graduate schools I had applied to, and therefore I craved the experience that DMP would give me. I had heard about the program back in my freshman year, but when my GPA plummeted after an especially rough semester, I lacked the confidence to apply until this year, my senior year. To my absolute joy, I was accepted. To my even greater joy, I also got into graduate school (at Rensselaer), where I would be working with a professor I knew well, and I was being offered funding.
But even though it seemed that the plans for the summer were now moot (the summer was meant as preparation for grad school, after all), I was more excited and determined to do well than ever. This, this was my chance to prove myself. To put all the knowledge I had acquired about Computational Biology to the test and produce something meaningful, wonderful, and above all, publishable. I had 10 weeks, which, I knew, was not that much. So I asked for a head start. After crashing for a week after graduation, I e-mailed my mentor, Tiffani Williams, and asked her for suggested reading. She sent me three research papers, which I read, and I began to get an idea of what the research was about. Dr. Williams was into phylogeny, a research area that I have long been curious about, and have even seriously considered making my research niche of choice in Computational Biology. By the time I met her a little more than a week later, I had read all the research papers.
Coming into Texas was interesting. I had never been to Texas in my entire life, and, in fact, had never even been farther south in the continental U.S. than Virginia (save for Nebraska). Being in Texas is a very big change, especially for a girl brought up in the Northeast like myself. It was an even bigger change considering that I went to school where snow lies on the ground for almost five months out of the year. A nice day in Albany, NY was around 65 to 70 degrees Fahrenheit. A hot day was around 80. In Texas, a hot day can be anywhere from 95 to 110 degrees Fahrenheit, and with the humidity, it feels still more unbearably warm. Needless to say, I spend most of my days indoors, silently praising God for the man who invented air conditioning.
The first week in College Station, TX went very quickly, but was fairly good. We got microfridges in our room and our own microwave oven, so we could buy some food. I also met with Tiffani for the first time last Tuesday. I told her how excited I was from reading the research papers, and showed her some of the ideas I had about potential research projects for the summer. Dr. Williams was very motivating, and very encouraging. She seemed to like the ideas I had, and told me to run with them. We set up a weekly meeting every Thursday, with additional meetings as necessary.
By the time we met on Thursday, I had an idea for an algorithm. While it did not exactly match the maximum parsimony approaches she was working on, she suggested that I change my focus to the construction of starting trees, which was absolutely delightful, since I had recently discovered that it was possible to beat the neighbor-joining approach to phylogenetic tree construction. This would tie into her research, since I will be studying how starting tree construction affects maximum parsimony searches. I left the meeting absolutely elated, and on Friday, set to work on what I hoped would be an improved distance metric.
My problem with current distance metrics designed to measure the evolutionary distance between two taxa had two parts. First, I didn't like the fact that all bases, and therefore all mutations, were treated with equal weight. Since it is more likely for one type of nitrogenous base to mutate into a particular other one (say A to G) than into a different one (say A to T), I felt that in cases where the same number of mutations were present, it was important to treat the more likely mutations as denoting a closer evolutionary relation than the less likely ones. Second, the only evolutionary changes considered are mutations. The effects of deletions and insertions were not considered (at least to my knowledge). For this reason, I decided to create a new metric based on the pairwise alignment scores of two taxa. I hoped this would yield more accurate distances, and therefore make a better starting tree than, say, one produced by neighbor joining. By Friday, I had a basic algorithm for this new method down on my computer. I planned to flesh it out on Monday. Last week ended on a very happy note.
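As a rough illustration of the weighting idea (not the metric from this project), the sketch below scores two sequences that are assumed to be already aligned to the same length, using only the uppercase letters A, C, G, T and '-' for a gap. The three cost constants and the function names are hypothetical placeholders; the only thing taken from the post is that likelier changes (such as A to G) should count as smaller steps than less likely ones, and that insertions and deletions should count too.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical costs: a likelier change counts as a smaller evolutionary step.
const double TRANSITION_COST   = 1.0;  // A<->G or C<->T
const double TRANSVERSION_COST = 2.0;  // any other substitution
const double GAP_COST          = 3.0;  // insertion or deletion, written '-'

bool is_purine(char base) { return base == 'A' || base == 'G'; }

// Cost of one aligned column.
double column_cost(char x, char y) {
    if (x == y) return 0.0;
    if (x == '-' || y == '-') return GAP_COST;
    if (is_purine(x) == is_purine(y)) return TRANSITION_COST;
    return TRANSVERSION_COST;
}

// Sum of column costs over two sequences already aligned to the same length.
double weighted_distance(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    double total = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        total += column_cost(a[i], b[i]);
    return total;
}
```

With these placeholder costs, two sequences that differ by one A/G change end up closer than two that differ by one A/T change, which is the behavior the paragraph above asks for.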
Monday, June 05, 2006 - Disaster Strikes (well, nearly)
I had one heck of a scare today. I was typing very fast in bash, when, absentmindedly, I typed in the following:
g++ -o matrix matrix.cpp
./matrix > matrix.cpp
These commands would have overwritten (and in fact, did overwrite) the source with the output of the program. Thankfully, I was able to recover the original code from the buffer, which was a very good thing. Otherwise, I would have lost over two days' worth of coding (a good 200 lines of code).
I had to get up and leave my computer for a short while, realizing the magnitude of what would have happened had I not been able to recover the code. I was close to tears; it had been a very close call.
Coding went pretty well besides that today. I'm completely on schedule. The distance matrix implementation is nearly complete; with that out of the way, I will be able to start the second phase of my algorithm before the end of the week. Who knows? Maybe I'll even have it done.
I think I need a hug from Dan Bear.
Wednesday, June 07, 2006 - Hitting the Brick Wall
"Yesterday.. all my troubles seemed so far away
Now it looks like they're here to stay
Oh I believe in yesterday...
Suddenly.. I'm not half the coder I used to be
There's a shadow hanging over me
Oh yesterday, came suddenly
Why the program won't work I don't know, it wouldn't say
I did something wrong, now I long for yesterday..."
----
My apologies to the Beatles. But the above lyrics were stuck in my head a good part of today. Here's why: I took yesterday (Tuesday) off to work on my DMP website (which you can view here). My thinking was, "Hey, I'm making really good time on this project, I know what I need to do next, and it's best to get this website thing out of the way at the beginning of the week rather than at the end". Good thinking, right? Right? Right.
Well. Today was absolutely unproductive. I fixed all of one bug, only to have two more pop up in its place. In addition, I realized I had not completely thought out how to continue what I wanted to do, and I genuinely feel stuck. I badly need to talk to my advisor, but I will be meeting with her tomorrow, so hopefully by then, I'll have things smoothed out.
That's all for today. Let's hope tomorrow is like yesterday and that I'll be back on track :-\
Thursday, June 08, 2006 - Hooray!
Back on track.
I met with Dr. Williams this afternoon and I'm back on track. She gave me a few ideas that I think will help me a lot during the next phase. Plus, the first half of the program (the distance matrix stuff) is completely done. Since my code is in a transitionary phase, I get to have some fun running some statistics. Also, I have a bit more reading to do and some serious LaTeXing. I will probably be spending most of tomorrow working on LaTeX. This probably means I will be doing a day of serious work in the dorms. Hmm. Let's see how that works out.
I was beginning to get worried; since I have only eight more weeks to work on this thing, I don't have time to hit a slow spell. Here's hoping things continue to stay smooth.
Monday, June 12, 2006 - Mondays....
Well, it's Monday, and I guess I'm back to work. So far so good. It turned out that the one portion of my code that takes up a lot of memory and space is actually unnecessary (I'm allowed to assume that step has already been performed), so I can modify my code to reflect this. In addition, I've been working on the paper. So far, ok. Dr. Williams gave me a dataset to play with, and I took a look at it tonight... I'm going to have to think about what the best way to parse this would be, but I'm guessing a script of some kind (perhaps Perl?). We'll have to see. I think I'm done with most of the writing I wanted to do tonight, but I am going to want to think this parsing through carefully.
My current strategy is thinking things through carefully before implementing anything. That way, I hope to save as much time as possible (efficiency is gold!). I also need to make sure I get a decent amount of sleep tonight. I really need to get some rest. I'm very tempted to work on some code tonight, which, knowing me, I will probably break down and do. At the same time, I am very tired, so I may not break down and do it. Bah. Hopefully tomorrow will be better.
Thursday, June 15, 2006 - Refinement!
I am very happy to report that the algorithm that I thought up is now refined to work even better than it worked before! Suzanne = very happy. I thought up the improvement during benchmarking, and now I can start coding.
In related happy news, I finally got access to the computer that Dr. Williams is letting me use. I spent the better part of the afternoon installing software (Cygwin, MiKTeX, TortoiseSVN, PAUP, the usual). I should be ready by tomorrow to port my code from my laptop to this computer. Then, I can probably do everything a lot more easily, since I will have internet access (when I was programming on my laptop, I didn't have access to the internet... perhaps that's why I was so productive). Hopefully, my productivity streak will continue.
Plans for tomorrow: Port current code, generate graphs for benchmarking, write some more code, learn how to use PAUP. As soon as I generate the graphs, I will post them. And explain, pre-improvement, what I want to do to them :-)
That's all for now. I'm going to finish installing software and then get out of here (probably get some food. mmm... food).
Friday, June 16, 2006 - Some Benchmarking Results
So I graphed what I did from benchmarking, and I got some very interesting graphs. One of the things I need to determine before I finish designing the improvement to my algorithm is something I call the "q-decay" factor. In the spirit of divide and conquer, the algorithm takes considerably large chunks and splits them into smaller chunks. The size of these chunks is determined by the q value. So what would be a good rate of decay? Using the dataset I was given, I tried out different rates of decay: a factor of 10, a factor of 5 and a factor of 2. Below is the graph for the decay according to a factor of 2. Pretty, isn't it? I really liked how gently the slopes seemed to decay. This is very favorable. In order to give you an idea of what is NOT favorable, I think I should show you what something unfavorable is (click here).
See how steep that slope is? That's really unfavorable. And that's a factor of five! We don't want the q value to decay too rapidly; I think this would lead to unfavorable results. While I'm not going to post it, I'm sure you can imagine how horrible the "factor of 10" decay would look... in short, very steep!
So why do I care about q-decay, you might ask? Well, currently (graphically speaking) there is a vertical bound on the size of clusters. I am going to implement a horizontal bound as well. My decay will help me enforce that bound. I want the decay to be gentle, so that cluster sizes are not too disparate. In a few weeks, you'll see what I mean, but I'm being a bit mum on this :-) (And I'm having a lot of fun!)
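To make the comparison between decay factors concrete, here is a minimal sketch. It assumes that "decaying by a factor of f" means dividing q by f at each level until it falls below some floor; the post does not spell out the exact rule, so the function name `decay_schedule`, the starting value, and the floor `min_q` are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Successive q values, starting at q0 and dividing by `factor`
// until q drops below min_q.
std::vector<int> decay_schedule(int q0, int factor, int min_q) {
    assert(factor >= 2);           // a factor of 1 would never shrink q
    min_q = std::max(min_q, 1);    // integer division bottoms out at 0
    std::vector<int> schedule;
    for (int q = q0; q >= min_q; q /= factor)
        schedule.push_back(q);
    return schedule;
}
```

Starting from 100 with a floor of 1, a factor of 2 passes through seven values (100, 50, 25, 12, 6, 3, 1), while a factor of 10 gets only three (100, 10, 1). Under this assumption, a smaller factor means more, and more gradual, steps, which fits the gentler slope described above.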
That's all for now. I might upload some diagrams to the webspace sometime soon. I will let you know then so you can go and see my progress ^-^
Tuesday, June 20, 2006 - Wrestling with Angels...
Or at least code.
So I spent all of today wrestling with one portion of my code. 6 hours. Phew! Am I beat. Usually, 6 hours of writing code and debugging would make me want to cry. But I'm having such an awesome time. I'm a girl with a mission, and that mission is to beat the pants off of neighbor-joining. I guess that even makes 6 hours of frustration enjoyable ^-^.
So I got the recognition bit-vector portion of my code to work and it is totally integrated (yes, that took 6 hours!). I've also been designing the main back-end data structure and rebenchmarked the decay function using the recognition integration. And I was very pleased. My results corroborate my claim that the algorithm is robust.
I'm really jubilant, but I feel strangely tired. I'm thinking I'll stop here today with the code, and probably work on the paper a bit. Then tomorrow, more bemoaning the back-end of this program and probably some cries of frustration.
Here is a pretty graph to keep you occupied.
Notice we still have a nice gentle tapering of the slope as the q value is decayed. This is very good news, and exactly what I expected to see.
Anyway, off to finish business around here and work on the paper. Toodles!
Wednesday, June 21, 2006 - Success!
So I think I figured out (with the guidance of my graduate mentor) the last portion of the algorithm that I need for building the tree. I'm in high spirits, though I'm pretty sure that the coding for this part will be very difficult. I created some data structures that will hopefully behave in the way that I want them to, but I won't fully know until I test them out in the code.
I'm a bit nervous, since this portion of the algorithm requires a high usage of recursion; I've never gotten myself completely comfortable with recursion, though I do understand its merits. While I'm pretty sure that the recursion I've created is correct, I won't find out until I fully write it out.
*sigh* I really want to work on my paper. Maybe I will for the rest of the day. There are too many ideas I need to write down, and I'm keeping them in my research notebook. Poor notebook. A lot of the pages that I have written on are dog-eared, since I've been looking over my notes constantly, and making refinements as I go.
-----------
In other news, I'm starting to feel a little stressed. I just found out that I need to start studying for my qualifiers that are in the fall. This means I have a book to buy and start reading. Perhaps this will render me a lot busier. I wish I knew the date of my exam, so I could prepare efficiently. I have practically a month before school starts and after this internship ends, but I don't know when the exam will be, and I think it might be in my best interest to study before the end of this internship, as an evening activity. Of course the DMP stuff will take precedence, but I have a feeling this will just make time fly. There are 27 questions that I need to study for. I think I will study one per night, in order to pace myself, and perhaps read a chapter. This way, I won't overload myself, and I will not shirk my DMP responsibilities.
Friday, June 23, 2006 - Blast!
Not the Protein DB Search either. Ugh. What. an. unproductive. day. I spent the entire morning coming to the conclusion that while the algorithm is finally completely designed (save for the supertree method, but I'm pretty sure I'm borrowing that code), I'm having serious problems with data structure design and implementation. This is the last part of my coding. The last part. And I had to get stuck here now! *sigh* so after lunch, I spent my time writing the algorithm design and reasoning out on paper, for what I will be submitting toward the end of July. I got four pages cranked out, complete with diagrams. And I feel exhausted. If this translates to 4 pages in LaTeX, this should be a total of 5 pages for the first half of the paper. I should easily be able to add another five pages for the benchmarking and discussion section, which should yield a nicely sized final paper.
So here is my revised plan. My code (this part), though 90% done, is due on Friday. I don't know if I'll make that deadline (yes, I'm that stuck!), and I'm going to see if I can get a long meeting with my professor on Monday, when she gets back from her conference. In the meantime, however, I will start on the benchmarking phase a week early (so next week). This will allow me to benchmark all the other starting tree algorithms while I continue to refine my code. Since each MP simulation is going to run for 50 hrs, this will be a good use of time. I'll see what Tiffani says when she gets back.
Ok. I am tired. And hungry (I've been more and more hungry lately). And it's Friday. This is where Suzanne calls it quits for the day (except going and typing up what she just wrote out) and relaxes for a bit. Have a good weekend all!
Monday, June 26, 2006 - Why Advisors are Awesome
So today started out frustratingly bad. I couldn't code this morning (since I was still stuck), so I started looking up stuff for the introduction that I need to write. I knew I was going to meet with my advisor at 3, so the plan was to be occupied until then. I had requested a late meeting for today (one that takes longer than usual), and I had also asked about starting benchmarking early. Of course, something happened to the firewall (or so I suspect), so I couldn't actually connect to the internet and work on my intro. So I spent a good hour twiddling my thumbs and trying to do other things on my project that did not involve the internet, like looking over my algorithm again. This didn't make me feel any better. I even started to worry that perhaps my implementation was the only way to do the tree structure and that I'm just going to have to scream it out. What if I'm really, really stuck?
So I walk into the meeting, expecting to be there for three hours at least, and explain how I've been struggling with this underlying tree data structure.
"Why don't you use a 'left-child, right-sibling' tree structure?" she asked. She quickly went over how it worked. I just stared at her. In roughly five minutes, she gave me a flexible tree structure that I'm pretty sure will solve all the problems I had with the tree structure I've been wrestling with. Just wow. Can you say awesomeness? :-)
As for benchmarking, I have full permission to go forward, and to start early. She agreed that it would be wise to start benchmarking a week early, since neither of us can tell how long the simulations will run. So I got a bunch of information on how I can continue. I also got a series of deliverables for the end of the week (yay!)
- Read about the Parsimony/Ratchet algorithm: learn it, reimplement tree structure with it. Put aside code.
- Draft of paper: methodology section must be done, intro should be mostly fleshed out
- Install IPE/DIA/R on the Windows Machine at work
- Read the Tree of Life Brochure
- Read and learn all the necessary PAUP commands for benchmarking.
Tomorrow will largely be a play day. In other words, I'll be playing around with PAUP, getting familiar with it, and trying to see how many iterations would be good for the simulations. Right now, I think between 150 and 200 would be a safe number (as per Dr. Williams' suggestion).
This should be a welcome break. I'm looking forward to having some non-coding fun. For now, I need to do some reading. Oh, and if you're curious, the meeting lasted only an hour and forty-five minutes :-) My advisor is truly awesome. I feel reassured. But I must go read!
Take care!
Wednesday, June 28, 2006 - Stroustrup and other tales
So today, I got to meet Bjarne Stroustrup, arguably the father of C++. The details of that experience are/will be published on my main blog, since I'd rather spare you the details of how I acted like a 12-year-old girl meeting a famous movie star.
Research is going ok. I checked over the left-child, right-sibling tree structure, and, sure enough, I can implement my decomposition strategy with it, and use a simple post-order traversal on that tree structure for the merging stage. This is all good news.
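For readers who have not met it, here is a minimal sketch of a left-child, right-sibling tree with a post-order traversal. It is a generic textbook version, not the code from this project; the names `Node`, `add_child`, and `post_order` are made up for the example. Each node keeps only two pointers, one to its first child and one to its next sibling, which is what lets a node with any number of children fit into a binary shape.

```cpp
#include <cassert>
#include <vector>

// One node of an M-ary tree stored in left-child, right-sibling form.
struct Node {
    int id;
    Node* first_child;   // leftmost child, or nullptr for a leaf
    Node* next_sibling;  // next child of the same parent, or nullptr
    explicit Node(int id_) : id(id_), first_child(nullptr), next_sibling(nullptr) {}
};

// Attach child as the last child of parent.
void add_child(Node* parent, Node* child) {
    assert(parent != nullptr && child != nullptr);
    if (parent->first_child == nullptr) { parent->first_child = child; return; }
    Node* last = parent->first_child;
    while (last->next_sibling != nullptr) last = last->next_sibling;
    last->next_sibling = child;
}

// Post-order: visit every child subtree, left to right, then the node itself.
void post_order(const Node* node, std::vector<int>& out) {
    if (node == nullptr) return;
    for (const Node* c = node->first_child; c != nullptr; c = c->next_sibling)
        post_order(c, out);
    out.push_back(node->id);
}
```

Because post-order reaches a parent only after all of its children, it is a natural fit for any step that combines results from the bottom of the tree upward.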
The bad news is, I'm supposed to be benchmarking right now, and I haven't gotten any done, which is quite sad indeed. The reason is that I'm not sure how to use PAUP or the parsimony-ratchet algorithm to my advantage, since the documentation is somewhat confusing. So I'm torn between continuing to bang my head against the wall, or trying to piece together an Introduction and some diagrams for Dr. Williams by Friday. I mean, that's important, but I really need to start benchmarking. But, then again, I really can't since I don't know how. I got a surprise call from My Guy this afternoon, and I bawled to him about how I'm not sure what I'm doing and how this isn't getting done, and OMG, I met Stroustrup, and made a total fool out of myself in front of him! He thought the situation was very cute (normal response) and reassured me that I'll figure it out. *sigh* As much as I love working independently, I'm so nervous that I'm going to screw up something at this stage, and I really don't want to, and so I'm looking for guidance. I need to sit down with someone and go over this PAUP stuff with them. I don't know if I'll get it otherwise.
Friday, June 30, 2006 - Round One.. Done!
So I finished round one of benchmarking. You could say that I got a little carried away, since I decided to pull a 12-hour day yesterday and do all the benchmarking in one shot. I now have the experimental set ready for the analysis. What remains for me to finish now is the algorithm, which I can go back to working on full-time starting Monday. This has bought me a good deal of time, especially since I can continue to run benchmarking on the main machine while I'm coding. This will help progress immensely.
I'm pretty much done for the day. I think I deserve a break! ^-^
Catcha later.
July 2006
Monday, July 03, 2006 - Back to coding...
So I finished more benchmarking over the weekend (it is honestly my favorite part of any project, so I tend to finish it faster), and so it is now back to coding.
I cleaned up the binary tree implementation of the M-ary tree decomposition a bit. It is almost ready. I need to write the traversal code, but it is honestly pretty straightforward. I also have a function that will give me all the leaves in the tree, which is a big deal. Which leads me to the supertree method that I'm going to use. Here is a pretty picture showing how the M-ary tree can be represented in binary form. You should now have no doubt of my utter dedication to you, reader. I used MS Paint, for Pete's sake!
I'm planning to grab all the leaf nodes of my guide tree and then put them linearly into a list. Each of the nodes will then be converted into a subtree (a linear time operation). The new problem is merging these subtrees together in a manner that will still respect the structure of the guide tree. I call this new problem the numbering problem:
"Give an enumeration, e, to a set of subtrees, S, such that it will be intuitive to deduce the structure of the whole tree (with respect to the guide tree)."
So I created an enumeration; while it is intuitive, I don't think it's simple. So I've been staring at this thing for a while now, seeing if I can simplify it.
The appeal of spending ten hours in the office a day to tackle this project has started to grow on me significantly. Not only do I feel like I'm getting more done (when you subtract the hours spent staring blankly at the screen), but I also know that early evening is when I'm more productive. And, now I'm bored less ^-^
I feel some of my old fire coming back for this project. The problem with working on a problem constantly for a long period of time is that you start noting chips in what you had perceived to be a flawless design. And soon these chips become cracks, and the cracks grow wider until they look as large as canyons... but then you blink and shake your head once and look again. And there it is: a flawless design. I really thought that I was going the wrong way, because this and this and that was not perfect. But then I went back and read some of the research papers I'm using as references and read a Paul Graham essay for inspirational purposes. You're being ridiculous, I told myself, and got back to work with greater zeal.
*sigh* I still haven't started studying for my quals (in other news). I probably should buy that book eventually, so I can start preparing. That's all the news for today. I'm probably going to get back to work now.
Wednesday, July 05, 2006 - Time to organize
Well, we are entering the final stretch of three weeks or so. Now is the time to reorganize, reprioritize and start cranking.
So earlier today, I created a list of the things I still need to do.
Coding:
- implement numbering thing in code (figured out!) -- done
- implement tree creation stuff.. may need to implement new graph class
- implement merge function (biggie)
- output tree in Newick form (should be near trivial)
Benchmarking (for me):
- feed tree into paup, run parsimony-ratchet analysis
Benchmarking (general):
- fix Perl script
- graph results (in R?)
Misc:
- write up benchmarking section of paper
- write up conclusion, results, and abstract of paper
- cry
- make poster (translation: lots of pretty pictures!)
Well, I definitely have my work cut out for me. Today I spent most of the day (ok, all of the day) trying to integrate the tree structure that I've been working on into my main code. I have an unterminating recursion somewhere, and I really need to figure out how to fix that. Right now, it's segfaulting, which is an improvement over the infinite loop I had on top of that before. Fun, fun, fun. I also still have that error in my Perl script.
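On the Newick item in the list above: the format writes a leaf as its name, an internal node as its children in parentheses separated by commas, and ends the whole tree with a semicolon. The sketch below is only an illustration under the assumption of a simple tree of named nodes; the `TreeNode` type and both function names are hypothetical and are not the classes used in this project.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tree node: leaves carry a taxon name, internal nodes carry children.
struct TreeNode {
    std::string name;
    std::vector<TreeNode> children;
};

// Newick text for the subtree rooted at node, without the trailing ';'.
std::string newick_subtree(const TreeNode& node) {
    assert(!node.name.empty() || !node.children.empty());  // a leaf needs a name
    if (node.children.empty()) return node.name;
    std::string text = "(";
    for (std::size_t i = 0; i < node.children.size(); ++i) {
        if (i > 0) text += ",";
        text += newick_subtree(node.children[i]);
    }
    return text + ")" + node.name;  // an internal node's name may be empty
}

// A complete Newick string ends with a semicolon.
std::string to_newick(const TreeNode& root) {
    return newick_subtree(root) + ";";
}
```

A tree in which A and B are siblings and C joins them at the root comes out as ((A,B),C); which is the kind of plain text that tree programs typically accept as input.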
I was planning on staying till very late tonight. But honestly? I'm not going to make it. I'm too tired, and too frustrated. I don't think I'll be able to accomplish much more by staying here. So this is just me touching base and saying good night. It's almost time for me to leave the office anyway. Some of the other guys were talking about going running. Maybe that will help me out. A good run always helps out the ol' brain. If I do end up running tonight, we'll see just how much my work improves tomorrow.
Wednesday, July 12, 2006 - Decomposition Phase: Done!
Ladies and Gentlemen,
I am proud to announce that the decomposition phase of the algorithm is completely done! I fixed the last of the few remaining bugs yesterday and today. Now it's on to the merge phase!
This is terribly exciting. The merge phase is the last phase of the algorithm. Once I'm done with this, I'll be done! Of course, I'll have to benchmark the performance of this tree on the taxa, etc. etc., but I'm just one step away from having that tree!
It's 4 pm. I'm taking a nice break. Then, I think I'll come back here and work a few more hours on the merge step of this algorithm. See if I can at least get a good chunk of the code down.
This is SO AWESOME!
Thursday, July 13, 2006 - A new problem
So now that the decomposition stage of everything is done, I thought I would state here, formally, the next problem that I'm tackling:
Given a set of sets, S:
Find an efficient method for finding the intersection of these sets and merging them together, until no overlap remains. Use the following criterion:
Let is_deeper(X,Y) denote "X is deeper and more left in the tree than Y".
"If one element a of set A can be merged with another element b in set B, and is_deeper(A,B), then sets A and B should be merged as follows: ( ( (a,b), A ), B ). "
Interesting isn't it? I'll let you know if I find any solutions.
Monday, July 17, 2006 - Blessed Progress
Alternate title: Reasons Why I am Awesome.
So the merging subproblem I talked about last time is completely solved, and implemented in code. I'm sure you're all dying to know the solution, so here it is: the trick is to use a hash table (thanks Bill!). Take each set of taxa in the leaves and add them to the hash table (which in this case I implemented using a map: an int associated with a list of ints). Let's say you have x.y, where x and y are two taxa. You add y to the list associated with x, and x to the list associated with y. Then, this beauty of a recursive function that I wrote takes care of the rest:
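#include <algorithm>
#include <list>
#include <map>
#include <vector>
using namespace std;

// Walks the lists in my_map, collecting every taxon reachable from starter.
void merge(int starter, map<int, list<int> > & my_map, vector<int> & product) {
    //check to see if starter is already in product.. if not, add it to product
    if ( find(product.begin(), product.end(), starter) == product.end() )
        product.push_back(starter);
    // each neighbor is popped before recursing, so the recursion terminates
    while ( !my_map[starter].empty() ) {
        int new_starter = my_map[starter].front();
        my_map[starter].pop_front();
        merge(new_starter, my_map, product);
    }
}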
Isn't it pretty? And I set myself completely up for the next stage of the merge step: take these clusters and use something akin to the neighbor-joining approach to build up the entire tree. Then, after the entire tree is built, all I have to do is output the tree and I am DONE WITH CODING!!
Then.. I get to TEST my tree. And if it's better... wow. This is so cool. I am almost done! A big thanks to my friend Bill, who suggested the hash table idea. You are also totally awesome!
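To show the whole idea end to end, here is a small self-contained sketch of filling the table from pairs of taxa and reading a cluster back out. It is a variation rather than the code above: it swaps the recursion for an explicit stack and leaves the table unmodified. The names `AdjacencyMap`, `add_pair`, and `cluster_from` are hypothetical. One side note: `std::map` is an ordered tree rather than a true hash table, though it serves the same lookup role here.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <set>
#include <vector>

typedef std::map<int, std::list<int> > AdjacencyMap;

// Record that taxa x and y appear together: each goes on the other's list.
void add_pair(AdjacencyMap& table, int x, int y) {
    assert(x != y);  // assumption: a taxon is never paired with itself
    table[x].push_back(y);
    table[y].push_back(x);
}

// Collect every taxon reachable from starter, using an explicit stack
// in place of recursion. Taxa are reported in the order they are first visited.
std::vector<int> cluster_from(int starter, const AdjacencyMap& table) {
    std::vector<int> product;
    std::set<int> seen;
    std::vector<int> pending(1, starter);
    while (!pending.empty()) {
        int taxon = pending.back();
        pending.pop_back();
        if (!seen.insert(taxon).second) continue;  // already in product
        product.push_back(taxon);
        AdjacencyMap::const_iterator it = table.find(taxon);
        if (it == table.end()) continue;           // taxon has no recorded pairs
        for (std::list<int>::const_iterator n = it->second.begin();
             n != it->second.end(); ++n)
            pending.push_back(*n);
    }
    return product;
}
```

Given the pairs 1.2, 2.3 and 4.5, starting from taxon 1 collects 1, 2 and 3, while starting from taxon 4 collects 4 and 5. Overlapping sets end up merged and disjoint ones stay apart.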
That's all for tonight. It's been a pretty late night at the lab, and I still have stuff due for tomorrow. Namely:
- an abstract
- the benchmarking graph for the 500 taxa
I also have these things to do by the end of the week:
- Finished code - Thursday
- Presentation - Thursday
- Benchmarking results for 854 taxa set - before Thursday
All in all, a very busy week! Wish me luck!
Tuesday, July 25, 2006 - I am so tired...
Forgive me for not posting as of late; the last week has been very hectic. I pulled 12-hour days almost all of last week, and today I'll probably be pulling close to 14 (yikes!)
But progress is being made. The code is done. I have generated trees for the 51, 500, and 921 taxa sets. I have also done so for 1127, but before I go further, I have to test whether the RAq approach is even superior to the other methods. So that's where I am currently.
I am also:
- commenting/touching up code for Open Source release (I love you, Doxygen!)
- Working on a poster
- Benchmarking
- Trying to forget about the paper that I need to write (I don't think I'll make the August 1st deadline)
- crying profusely (actually no; I'm really exultant right now. This is so awesome!)
- not sleeping (this one is true)
So if I've learned anything, it's that how much you like your project is essential to its success. I don't think I would work for 14 hours unless I was in love with what I was doing. Thankfully, this summer that seems to be the case.
A fuller report later, and more details laterish too. I gotta get back to work!
Thursday, July 27, 2006 - Code Base: DONE!
Hahaha.. I'm so happy about this one :-) The code is completely done. I've created the command scripts necessary for the final stages of benchmarking, and created documentation for all my code (I love you, Doxygen!). Furthermore, the code is now distributed under the GPL, and the documentation associated with it is distributed under the FDL. Links will show up on the main website sometime in the future.
Thursday, July 27, 2006 - I Created Something Beautiful!
"I think that I shall never see
A program lovely as a tree
Code is made by fools like me
But only God can make a tree"
~ Ethan, with apologies to Joyce Kilmer
So I just created something breathtakingly beautiful, and I just had to share:
Isn't she gorgeous? This is an unrooted tree composed of 51 taxa. This tree was output by The RAq Approach algorithm. It's really nice to put a face to your hard work. I think this baby is going on my poster.
I also think it's time for me to go to bed. Clocking in today at 15 hours.
Friday, July 28, 2006 - The Guessing Game!
Hello boys and girls,
Welcome back to "The Guessing Game!". Today's topic is, "Why is Suzanne so happy?"
As your first clue, take a look at the graph here.
As a second clue, look at the positions of RAq v. Neighbor Joining (NJ).
As a third, note that lower scores = awesomeness.
So, why is Suzanne so happy? Let's take a look and see.
It appears I may have beaten Neighbor Joining, and, at least for this data set, RAq does as well as the best method :-) I'm still waiting on results from the 921 taxa set (the RAq starting tree has been running on it, and it still has another batch and a half to go, 300 iterations overall). If RAq performs well on that data set too... well, I may have something very exciting to publish! :-)
Suzanne...is... happy!
August 2006
Tuesday, August 01, 2006 - An update (in a hurry)
So here is what I've gotten done since Friday (yes, I've been working during the weekends):
- Poster: 90+% done
- Paper: 50+% done
- RF graphs: not really even started (cry)
- Benchmarking: on last set.. but several hours to go!
So much work so much work so much work.. I still can't believe I have less than a week to go! I'll be working on this project more when I get home, but sheesh...
Ok, Suzanne go bye bye now. Need to finish pretty picture for paper and go to bed. Clocking in today at... 14 hours.
Wednesday, August 02, 2006 - Things accomplished...
And those that are not:
Things accomplished:
Poster - done.
Paper - rough draft will be done by 5 pm tonight.
Things not done:
Benchmarking for 1127 taxa set (most of it still needs to run!)
RF scripts + graphs (this is a huge deal)
I'm not too happy about not having the RF scripts done, but the Perl code is really tricky, and I'm still not sure how to tackle it correctly; hopefully I'll be able to meet up with my grad student mentor (who wrote a program that I'll be feeding stuff in and out of) soon.
Ok, back to work. I still have the discussion to write, and I need to put in the entire results section for the tree scoring. I also have one or two little blurbs to write in the introduction. Then the rough draft is done!
Friday, August 04, 2006 - Poster Session I
Well, today was poster session one.. and personally I think it went great! I was a bit nervous and scattered at first, but I think today's poster session was a very good warmup for the big one tomorrow. I learned two very important lessons very quickly:
1.) Bring a bottle of water. After talking for two minutes you get incredibly thirsty.
2.) No matter how cute and professional your favorite pair of stilettos is, DO NOT WEAR THEM TO A POSTER SESSION. You are on your feet for too long to endure that much pain. I practically had to hobble all the way back to the dorm, and one of the guys in my program (Nick) took pity on me and carried me part of the way back. My solution? Flip flops for tomorrow!
A lot of people seemed interested in my research too... I talked to I think 6 or 7 different people about the project, which I think was pretty good for two hours. I am also very happy that my advisor suggested all the changes she did to my poster. One of the diagrams that she suggested I make was essential to the success of the poster; I found myself pointing to that one quite a bit, and it made explaining everything else much, much easier.
Also.. the poster looks really, really pretty! Even though the design was simple, it was very clean, and I think (with some bias ;-) ) that I had one of the better looking posters at the session (I even got it laminated!). Oh man... sooo pretty! <3 I explained to the boys that I think that the poster is so pretty because my advisor is a female, and us women have an eye for prettiness ^-^
Well, that's all the news about the poster. I am pretty giddy and excited about tomorrow (can you tell?). I think I'll be properly prepared for the presentation. I am also almost done with the second rough draft of my paper; I'll be sending that to my advisor later tonight.
So that's all the good news. The bad news is, I still haven't been able to get a hold of my grad student advisor. This is frustrating, because I really need to talk to him about the program he wrote, since I have to write scripts that need to interact with it. I'm going to send him one last e-mail tonight, hopefully so I can catch him tomorrow.
Well, I better get back to work; I just have to alter the images somewhat, and then I can send in the second rough draft and go to bed! I'm probably going to ask my friends to help read the technical report for me (or should I save them for the actual paper? hmm...) Have a good night!
Saturday, August 05, 2006 - Last Post
So this is my last post for the summer :-)
I think I got quite a bit accomplished, and I also had so much fun working on my research at TAMU this summer. The second poster session went really well; no one in CSPC won an award (aww...) but it was still a very good experience!
I'll be working on finishing this project a bit more once I get home; I have a few graphs that I want to draw up, and the paper that Dr. Williams and I are submitting for publication to write up ^-^
All in all, a very fun time! I enjoyed sharing my thoughts with you :-)
Have a good summer. I'm out.
-Suzanne
© 2006 Suzanne J. Matthews | Design by Andreas Viklund