Week 1: May 28th - June 3rd

I finished up reading the STL Tutorial and Reference Guide by Musser, Derge, and Saini as my review of the STL for C++. I highly recommend it to anyone who uses the STL often, as it highlights some lesser-known features that I will be sure to use in the future.

I also finished up reading a few papers on STAPL, to understand the general concepts of how this superset of the STL works:

  1. Standard Templates Adaptive Parallel Library (STAPL) by Rauchwerger, Arzu, and Ouchi. This gave a good general overview of exactly what STAPL is. If you are unfamiliar with STAPL, I suggest you start here.
  2. STAPL: A Standard Template Adaptive Parallel C++ Library by An, Jula, Rus, Saunders, Smith, Tanase, Thomas, Amato, and Rauchwerger. This gave a little more detail, including an explanation of a pTree and analysis of the performance of various sorts in STAPL.
  3. STAPL: An Adaptive, Generic Parallel C++ Library by An, Jula, Rus, Saunders, Smith, Tanase, Thomas, Amato, and Rauchwerger. The discussion on how STAPL was used with a physics problem in here was interesting.
  4. A Framework for Adaptive Algorithm Selection in STAPL by Thomas, Tanase, Tkachyshyn, Perdue, Amato, and Rauchwerger. This was an in-depth discussion of STAPL that gave good descriptions of the column, radix, and sample sorts and discussed, among other things, the speedup of each in STAPL.
  5. STAPL Parallel Container Framework. This is the potential dissertation of my graduate student, Gabriel Tanase, and goes into detail about the recent major addition of views, which changes the role of the pRanges. It was very clear and I found it easier to read, though that might have been because I had already read four papers.

After getting our accounts set up, the other two undergraduates, Michael Souza and Jorry Palm, and I set to work playing around with STAPL, mostly concentrating on the pContainers and pAlgorithms. Michael's graduate student, Nathan Thomas, and his wife, Shauna, were also nice enough to take us around to a few stores to help us get groceries and other necessary items.

Dr. Amato and Dr. Mauro Bianco, another of Dr. Amato's graduate students, talked with us about potential projects and decided that we would work on the p_sort algorithm first, splitting among us the revision of the code for the radix, column, and sample sorts to take into account the major changes detailed in Gabi's paper. They estimated that this might take at most a month, and then we would move on to working with the pContainers. We discussed among ourselves and decided that Jorry would take the radix sort, Michael the column sort, and I the sample sort.

Over the weekend we all continued to play around with STAPL, specifically with views and getting test files to work, and started looking at the old code for our sorts. To tell the truth, I feel more than a little in over my head looking at the old code and trying to get things to compile with STAPL, especially since I had never even heard of sample sort before I got here and only know how it works theoretically, never having implemented it or understood someone else's implementation.

Week 2: June 4th - June 10th

This week felt like it was Sample Sort 24/7. Although I only work from 8:00 AM to 5:00 or 6:00 PM, I was thinking about it almost all day, every day. Having to re-implement the same thing multiple times as I talked to different grad students and they suggested other approaches got on my nerves a bit at first, but then I settled in and started really enjoying the coding of it all.

For those of you who, like me before I started this problem, have no idea what Sample Sort is, here is a quick sketch of the algorithm (a minimal C++ version follows the list):

  1. Place the data (numbers, strings, etc) in some type of container.
  2. Choose, ideally at random, one fewer values (or indices) from that container of data than the number of desired buckets (which, for parallel processing, can correspond to processors). Put these values, which we will call splitters, into another container.
  3. Sort the container of splitters.
  4. Use the splitters to create buckets. Take the elements from the container of data and place them in the correct buckets. For example, if your data consisted of 3 8 9 2 0 1 7 5 4, and you had chosen your splitters to be 2, 5, and 8, you would place 3 in the "bucket" created between the splitters 2 and 5 and place 7 in the "bucket" created between the splitters 5 and 8.
  5. Sort each bucket.
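
Here is a minimal sequential sketch of those five steps in C++ (using C++17's std::sample). In the real parallel version each bucket would correspond to a processor; this is only meant to make the splitter and bucket logic concrete, and all the names are mine, not STAPL's.

    // Sequential sketch of sample sort; illustrative only, not STAPL code.
    #include <algorithm>
    #include <iterator>
    #include <random>
    #include <vector>

    std::vector<int> sample_sort(const std::vector<int>& data,
                                 std::size_t num_buckets) {
        // Step 2: randomly choose (num_buckets - 1) splitters from the data.
        std::vector<int> splitters;
        std::sample(data.begin(), data.end(), std::back_inserter(splitters),
                    num_buckets - 1, std::mt19937{std::random_device{}()});

        // Step 3: sort the splitters.
        std::sort(splitters.begin(), splitters.end());

        // Step 4: each element goes into the bucket its splitters define;
        // e.g. with splitters 2, 5, and 8, the element 3 lands in bucket 1.
        std::vector<std::vector<int>> buckets(num_buckets);
        for (int x : data) {
            std::size_t b = std::upper_bound(splitters.begin(),
                                             splitters.end(), x)
                            - splitters.begin();
            buckets[b].push_back(x);
        }

        // Step 5: sort each bucket, then concatenate them in order.
        std::vector<int> result;
        for (auto& bucket : buckets) {
            std::sort(bucket.begin(), bucket.end());
            result.insert(result.end(), bucket.begin(), bucket.end());
        }
        return result;
    }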

Implementing this with STAPL has been really fun, even though I have had to implement it a few times over and now will be returning to rewriting the first approach I had tried. I figure it is a learning experience, and the grad students have been very helpful in explaining the different components of STAPL and how they work.

I keep thinking how cool it would be if this actually passed the standards board and people really started using it. It takes care of so many of the difficulties of parallel programming for the user. But even if it doesn't, I don't think I will regret anything, because I am learning how to accustom myself to an already-built framework and work within it.

As to my personal life, the people here are really nice and I have gotten close with some of the other REU/DMP students. My roommate is awesome, and the only difficulty is remembering that I need to go to the store to get food and finding someone to get me there, since all the stores are at least a few miles away from my dorm, the Tradition at Northgate.

Something that has made me really happy has been the Hillel down here. They have regular Friday night services, and they have been very welcoming to me. The Prime Minister, Ashley, has given me rides to and from services and offered to help me out with getting to the store. I really enjoy the discussions they have about Judaism and the world.

All in all, I have been enjoying myself thoroughly, and am very glad I chose to take this opportunity.

Week 3: June 11th - June 17th

I got an implementation of Sample Sort working!

This week has been a lot of fun with coding. After Dr. Amato and Mauro suggested I just go back to writing Sample Sort the simpler way and worry about optimization and efficiency later, writing the code was fairly straightforward. I know there are definitely things that need to be optimized, such as how I am choosing my over-sampling ratio (a number that defines how many samples I will pull from each thread to choose the splitters from) and how I am randomly finding those samples. I have sent a copy of my code to all the graduate students, Dr. Amato, and Mauro so that they can give me their feedback.
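
To make the over-sampling idea concrete: each of the p threads contributes some number of random samples (the over-sampling ratio), the pooled samples are sorted, and p - 1 evenly spaced elements of the pool become the splitters. A larger ratio tends to give splitters that track the data distribution better, and so more even buckets, at the cost of sampling and sorting more. Here is a rough sketch under those assumptions; the names and structure are mine, not the actual STAPL code.

    #include <algorithm>
    #include <iterator>
    #include <random>
    #include <vector>

    // Pool `ratio` random samples from each thread's data, then pick
    // p - 1 evenly spaced splitters from the sorted pool.
    std::vector<int> choose_splitters(
            const std::vector<std::vector<int>>& per_thread_data,
            std::size_t ratio) {
        std::mt19937 gen{std::random_device{}()};
        std::vector<int> pool;
        for (const auto& local : per_thread_data)
            std::sample(local.begin(), local.end(),
                        std::back_inserter(pool), ratio, gen);
        std::sort(pool.begin(), pool.end());

        std::size_t p = per_thread_data.size();
        std::vector<int> splitters;
        for (std::size_t i = 1; i < p; ++i)
            splitters.push_back(pool[i * pool.size() / p]);
        return splitters;
    }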

I also got to work on p_npartition this week with Gabi, which was really fun because I felt like I was writing a piece of STAPL that others, such as my fellow undergrad researchers, might use. p_npartition basically does all of step 4 as described in my entry from last week. I did not write it, but simply worked on optimizing it. Although we have run up against a few bugs, I am sure we will figure out how to deal with them next week.
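
In other words, p_npartition factors the bucketing step out into a reusable primitive. I can't reproduce the real STAPL interface here, so this is only a hypothetical sequential stand-in for what such an n-way partition computes: given the data and sorted splitters, hand back the buckets.

    #include <algorithm>
    #include <vector>

    // Hypothetical sequential stand-in for an n-way partition:
    // n splitters define n + 1 buckets, and each element lands in the
    // bucket whose splitters bracket it.
    std::vector<std::vector<int>> n_partition(
            const std::vector<int>& data,
            const std::vector<int>& splitters) {
        std::vector<std::vector<int>> buckets(splitters.size() + 1);
        for (int x : data) {
            std::size_t b = std::upper_bound(splitters.begin(),
                                             splitters.end(), x)
                            - splitters.begin();
            buckets[b].push_back(x);
        }
        return buckets;
    }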

Next week should consist of researching Sample Sort and learning how to document my code using something called LaTeX.

This week I also attended another GRE session that was supposed to prepare me for the Verbal section. They say they put more emphasis on the Verbal section than the Math section because the majority of us are Engineers. They suggested that we read more, because, of course, Engineers don't have a good grasp of the English language. That would be unheard of, and possibly dangerous, as I remarked to Dr. Amato.

More useful was the graduate student panel discussion that we had during a brown bag lunch this week. They talked about the reasons for going to graduate school and their experiences with master's and PhD programs. I am now leaning towards going to grad school to get an MS, but I don't think I want a PhD, to tell the truth. The stuff I really enjoy is the programming, the coding. I would rather be someone working on the code of STAPL than the person who had to come up with the idea in the first place and tell everyone else what to code.

Week 4: June 18th - June 24th

This week was more polishing of Sample Sort. I didn't realize how long it would take to write this simple algorithm. However, I am having fun coding it, so I don't particularly care how long it takes, as long as I finish before the end of the summer.

I also read a few papers. I have learned that I should not try to read a paper in its entirety, but rather skip to the important sections and focus on those. I found it a little disturbing at first when Dr. Amato told me that very few people who read a paper actually understand all of it, except the writer and a few others working on a similar topic, and that the same goes for lectures, but I guess I will learn to accept that after a while. The problem is that I am very used to trying to understand everything I read.

I think I will try to make the technical report, which I have just started outlining in a LaTeX document (LaTeX is a popular typesetting system for writing technical reports), more readable to someone who, like me when I began reading up on STAPL, has very little background in parallel computing.

At some point I think I will spruce this thing up a bit. There are so many things I want to do this summer. I borrowed Jorry's PHP and MySQL book and I am trying to read that, as well as read as many other books as I can get my hands on this summer. The Evans Library here is huge, and it makes me happy.

Week 5: June 25th - July 1st

Sample Sort is a monster: it is taking on a life of its own. As the author of this fearsome implementation, I am watching as it grows more heads and arms and legs and wondering when it will grow too big to support its own weight.

The above was not nonsense; sample sort really is growing more heads, in the sense that I am letting the user choose the sampling method (the brains of the algorithm) and, theoretically, whether they want to do overpartitioning. They can already choose the over-sampling ratio. I am wondering when the overhead of all this will start hindering its performance. Not that I have started serious testing yet, but the day will come when I finally finish finding new potential optimizations.
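
One natural way to let the user choose the sampling method is to pass it in as a functor, so the sort is parameterized on its brains. This is just a sketch of that idea; both samplers below are hypothetical, and the real STAPL interface may look quite different.

    #include <algorithm>
    #include <iterator>
    #include <random>
    #include <vector>

    // Two interchangeable sampling strategies.
    struct random_sampler {
        template <typename It>
        std::vector<int> operator()(It first, It last, std::size_t n) const {
            std::vector<int> out;
            std::sample(first, last, std::back_inserter(out), n,
                        std::mt19937{std::random_device{}()});
            return out;
        }
    };

    struct evenly_spaced_sampler {
        template <typename It>
        std::vector<int> operator()(It first, It last, std::size_t n) const {
            std::vector<int> out;
            std::size_t len = std::distance(first, last);
            for (std::size_t i = 0; i < n; ++i)
                out.push_back(*std::next(first, i * len / n));
            return out;
        }
    };

    // The caller picks the strategy, e.g.:
    //   auto s = choose_splitters(data, 8, 32, random_sampler{});
    template <typename Sampler>
    std::vector<int> choose_splitters(const std::vector<int>& data,
                                      std::size_t num_buckets,
                                      std::size_t ratio, Sampler sample) {
        std::vector<int> samples =
            sample(data.begin(), data.end(), ratio * (num_buckets - 1));
        std::sort(samples.begin(), samples.end());
        std::vector<int> splitters;
        for (std::size_t i = 1; i < num_buckets; ++i)
            splitters.push_back(samples[i * samples.size() / num_buckets]);
        return splitters;
    }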

What is great about how long Sample Sort is taking is that it leads me into other, more obscure parts of STAPL, and I am getting to fiddle around a bit with seemingly unrelated pAlgorithms and parts of pContainers. So far I have also contributed in small ways to p_npartition and p_min_max_element, and I am working on a mapping strategy in partition_base.

I am getting a little more used to reading papers, although most of them are still very difficult for me to read, and I find myself having to skip sections. Part of it is that there is a great deal I do not know in Computer Science. However, I think part of it is also the fact that these papers are a little pretentious in writing style. For example, given the option, they will use "enumerate" instead of "list".

I understand this is formal writing, but I don't think the authors have to write as if they were trying to impress someone with their language. Technical writing should try to express ideas in the most basic language possible, because the topics are often confusing enough to begin with and there is no need to make reading them more difficult.

PHP seems like a pretty cool language so far. I like how you can switch between types at will, and how there are so many different ways to do one thing. Printing, for example, can be done with print or echo, and with single quotes or double quotes. Slowly but surely I am reading through the PHP and MySQL book. It was really nice of Jorry to lend it to me.

I need to spend more time with Drupal. The CMS that Mike helped us all set up is pretty cool, and if I want to actually learn something I need to play around a little more than I am currently. It is just so hard to pull myself back to the computer when I have a few good books to read, especially when one of those books is Ender's Shadow. Orson Scott Card is amazing.

Hope everyone who reads this (maybe 1 other person if I am lucky) has a Happy Fourth of July.

Week 6: July 2nd - July 8th

First off, it was great to see my family again. I cannot believe the weekend went by so quickly. Houston was nice, but seeing Mom, Dad, Ari, and Josh was better.

Second, I have realized I will probably not get around to prettying up this site until the end of the program, if then. I have so much I want to do, and not enough time. As it is, I am behind on the CMS training. Work (research) is as draining as ever, and I generally want the chance to live, breathe, eat, sleep, exercise, and even read a little bit.

Speaking of work, I still haven't gotten to testing. All last week was preparation for testing, and there is still more to be done tomorrow before I can try again. The supercomputer I was supposed to be testing on is giving me problems. Basically, it cannot compile my code because of my testers, which is a problem. Hopefully the situation will be fixed next week.

I helped edit Gabi's paper for English and grammar errors earlier in the week, and may end up doing a bit of that next week as well. Hopefully it will be done soon. It has been more than a bit frustrating that he does not have time to help me with my work because all his time is taken up with his paper, since I am at a point now where I need to discuss testing and certain bugs with him.

I really cannot think of anything else to say. If I think of anything, I will add it next week.

Week 7: July 9th - July 15th

This week I battled Hydra. Not the ancient Greek mythological figure with hundreds of snakes for heads, but the IBM supercomputer. It has 128 processors and a distaste for the gcc compiler.

First it wouldn't compile anything I threw at it, sitting there for an hour and then telling me, sorry, but you don't have enough tmp quota. I tried breaking my tester down into many smaller testers, with the same results. I was finally given an increase in quota, from 10 MB to 50 MB and then finally to 1 GB, but it still wouldn't compile. Then we realized that it was the -g flag. The -g flag embeds debugging information so that a debugger can show a human-readable picture of what crashed where, but apparently generating it takes a ridiculous amount of space and time.

We got rid of the -g flag once we were sure there were no errors we would need to debug, and then were able to compile in two minutes or less. I was all carefree and happy, until I saw that the 30 hours I was allotted to run on Hydra was rapidly becoming 24 hours, then 18 hours, then 15 hours, and then 11 hours. Now it is -109 hours. I don't know how this is possible, but it ran the necessary tests, so I guess I will just hope it will be nice to me.

The highlight of this weekend was finding a huge bug in my program that tripled the runtime, and discovering I could simply discard the offending step. It still takes 140 seconds to do everything but sort the buckets (the last step) on 64 million elements, but at least I know the bucket-sorting part should not push it over 300 seconds. Theoretically.

I keep telling myself that I am lucky in how well my research is going. My friend back home is doing research in Physics and he just found out a few days ago that the experiment he had been painstakingly setting up for 2 or 3 months now is not possible. I just hope this next experiment that he is doing goes well.

I am starting to feel the crunch. I have all this testing to be done, a paper to be written, and a poster to make, and I only have two weeks left. I am close to pulling my hair out at some points, although sometimes I feel strangely calm about it.

I wish the little things would make it easier. For instance, if gnuplot worked on my computer, I could get on to the next phase of testing, but it doesn't, so I have to wait until Monday to see what the next step is. Xdvi also doesn't work on my computer, so I can't work much on my paper.

Oh well, I guess such is life. I will get through, and soon I will be home in my beloved MA. Texas is nice, but I miss MA.

Week 8: July 16th - July 22nd

I know I posted this a few days late, but it is only because I have been working hard and playing hard. Let me explain what I mean by that:

1: Working Hard. I have been working 11 hour days on a regular basis for the past week. And it isn't like I have been checking my mail or playing games online for breaks. The only break I have been taking is for lunch. There is just that much work to be done. It would be helpful if I didn't keep making little mistakes because I am tired. For instance, I ran a few jobs on the queue with the wrong number of elements for data input, so those were just a total waste.

2: Playing Hard. This weekend, when I got back from work Saturday evening, I went to a steakhouse and a rodeo with some of the Physics REU students and my roommate (also DMP and CS). I couldn't have steak at the steakhouse because I keep Kosher, but the fish was good and the rodeo was hilarious. The only thing slightly disconcerting was the treatment of animals. The things they do to get these bulls and horses to buck are inhumane...

Sunday, after working in the morning, we made snickerdoodles and watched the Firefly movie, Serenity. They were both amazing (the snickerdoodles and the movie).

All in all, I am exhausted. It has helped some that I have taken to exercising in the morning before I go to work (wake up at 6:15 AM, exercise until 7:00 AM, go to work at 8:00 AM), since that means I don't have to worry about it at night.

Week 9: July 23rd - July 29th

One thing to note about research: it never finishes cleanly and quietly. And it definitely never finishes calmly. This past week has been draining. Every time I thought I was done, I wasn't. There was always some little detail that required further investigating.

Oh, and there was the fact that the air conditioning in the building that houses Hydra, the supercomputer we are running our tests on, blew out just before I wanted to run a few tests that I thought would be the final ones. This meant that Hydra was down all Friday. It didn't help that on Thursday, the supercomputing people quietly, trying to sneak it by us, turned off pre-emption. The computer science department pays a great deal for sole use of 128 of Hydra's processors, and while they let other departments run on it as a courtesy when the computer science group isn't using it, they are supposed to pre-empt, also known as bump off, anyone who isn't computer science when the computer science people need it. Basically, they prevented us from running tests.

I think I have finished testing both the redistributing version and the non-redistributing version of my algorithm for the over-sampling ratio, the overpartitioning ratio, and general scalability. Now I think all that is left is the sampling method. I will be able to finish off most of my paper in the next few days, and I am hoping that we will finish the poster today.

I am excited to go home and be with my family and friends again, but I must say that I will miss the other students here a great deal. Not that the grad students, Dr. Amato, and Kourtney haven't been amazing, but I think I have had more fun this summer than any past thanks to the other REU students. I hope I keep in contact with them after we go back home.

Week 10: July 30th - August 4th

Well, it is over. I am writing this as I wait for my poster session. My paper is done, our poster is done, and we have to present this afternoon. I leave the dorm tomorrow around 5:45, and arrive home around 3:30 (EST).

This has been a great experience for me, and I have a much better idea of what research entails after these past 10 weeks of STAPL-ing. I learned how to get used to a huge, totally new framework that is constantly evolving, and how to incorporate the ideas of others into my work through research and by asking the people around me for suggestions. I have become more familiar with the C++ language and learned a bit about parallel programming. I suspect I am going to miss STAPL dearly when I eventually have to learn parallel programming in terms of MPI, threads, and locking.

As to whether I am now going to go to grad school and get a PhD, I am not sure. I think at this point, after hearing the graduate students discuss what you can do with the various degrees, I am leaning towards grad school for a master's. However, I am not sure. I can only hope that I get an internship in industry next summer, so that I can try that route out a bit as well before making my decision.

I can't say I will miss the inability to get to the grocery store or any other store without a car, but I will miss the people here. Like I said in my previous entry, the other REU/DMP kids are amazing. I wish I had more time to spend with them. Especially my roommate, who is fun to just talk to about pretty much anything. I guess I am lucky with roommates, because my Tufts roommate last year, picked semi-randomly, was also one of the coolest people I know. Anyways, I hope I can remember to keep in touch with them.

That said, I cannot wait to get home. I miss my family and friends, and I miss Massachusetts. It really is almost a different country in Texas. And, of course, I can't wait for my mother's cooking ;)