Even though there was LOTS of work and very, very many hours spent in the lab, there's always the weekend. And this is where these come in.
This week I met all of the people I will be working with: Natassa, Lisa, Stavros, and Vlad. We had two meetings during which we discussed the various projects that Lisa and I could work on. We decided that my project will deal with the staged database system and workload characterization for OLTP. The main goal for the summer is to get the staged database system ready for TPC-C and TPC-H. I spent all of Wednesday and Thursday reading about TPC-C and TPC-H and the transactions they're made of. On Thursday I met with Natassa and Vlad and we discussed the project in a bit more detail, as well as a list of things that need to be done. Vlad also gave me a copy of a TPC-C toolkit that another graduate student at Carnegie Mellon University wrote. Although it is built on top of SHORE (the storage manager that the staged system also uses), it is not staged. For the rest of the week I read more about TPC-C, got the toolkit to run some transactions, and studied those.
This week I started creating a document that details the transactions of the toolkit as well as TPC-C. This is a very important step because it will provide us with an understanding of which operators and functions still need to be implemented in the staged system in order to be able to work with the benchmarks. I also started dissecting the toolkit and understanding the operations that are involved with SHORE. The next steps in the process will be to implement both insert and update in the staged system.
This week I started working on the hash join. I spent about 2 days writing a hash_table that supports the following operations: insert, lookup, remove, and init. I also did some testing, but I'm sure more will be needed as I implement more functionality for the hash_table. I then incorporated what I had written (the hash table etc.) into the existing TPC-C toolkit so that I could use its other functionality, such as loading tables. Since the hash join will later be part of a bigger puzzle that already has tables to work with, I didn't see the point in wasting time implementing that myself (which is where the toolkit comes in). So far I have inserted a table into the toolkit, and the next step will be to see if I can join the two tables in question. Currently the table selection and the fields they're joined on are hard-coded, but once this works, I will start making everything much more "stand alone." A few things that slowed me down this week included unfamiliarity with C++ as well as working with SHORE.
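To make the idea concrete, here is a minimal sketch of a hash table with the four operations listed above and a hash join built on top of it. It is written in Python for brevity and is not the code from the project, which lived inside the TPC-C toolkit on top of SHORE. The class name, the separate-chaining layout, and the toy rows and column names are all assumptions made for illustration.

```python
class HashTable:
    """Separate-chaining hash table with init, insert, lookup, and remove."""

    def __init__(self, num_buckets=16):
        # "init": allocate a fixed number of empty buckets.
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Pick the bucket that this key hashes to.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, row):
        # Duplicate keys are allowed, since a join key need not be unique.
        self._bucket(key).append((key, row))

    def lookup(self, key):
        # Return every row stored under this key (empty list if none).
        return [row for k, row in self._bucket(key) if k == key]

    def remove(self, key):
        # Drop all rows stored under this key; return how many were removed.
        bucket = self._bucket(key)
        kept = [(k, row) for k, row in bucket if k != key]
        removed = len(bucket) - len(kept)
        bucket[:] = kept
        return removed


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of dict rows: build on one input, probe with the other."""
    table = HashTable()
    for row in build_rows:
        table.insert(row[build_key], row)
    result = []
    for row in probe_rows:
        for match in table.lookup(row[probe_key]):
            # Merge the matching build row with the probe row.
            result.append({**match, **row})
    return result


# Hypothetical toy tables; names and values are made up for illustration.
warehouses = [{"w_id": 1, "w_name": "East"}, {"w_id": 2, "w_name": "West"}]
districts = [
    {"d_id": 10, "d_w_id": 1},
    {"d_id": 11, "d_w_id": 1},
    {"d_id": 20, "d_w_id": 2},
    {"d_id": 30, "d_w_id": 3},  # no matching warehouse, so it is dropped
]
joined = hash_join(warehouses, districts, "w_id", "d_w_id")
```

The smaller input is usually the one to build the table on, since the table has to fit in memory while the other input streams past it. Here the join keys are passed as arguments rather than hard-coded.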
This week I worked for about 65 hours on my hash join! I was really into it, and I just couldn't stop working. Another reason I spent so much time on it was that I kept getting C errors that were difficult to figure out. Lisa helped me a little bit with those, but besides that, I was on my own. Near the end of the week, I realized that I was implementing a part of the hash join incorrectly, but I will meet with my advisor on Monday and figure things out.
One really exciting thing that happened this week was the Aladdin conference. Lisa and I attended the two-day event, which was about the Aladdin center here at CMU and the various research projects and classes associated with it. We met a lot of people and had a great time! I've never been served wine by a Turing Award winner, so that was also pretty neat :)
This week was somewhat more laid back than the previous one. I met with my advisor first thing on Monday morning. We went over the hash-join implementation that I was working on. She gave me some things to read and gave me some pointers. I also met with Vlad and he also answered some of my questions. Since my birthday was on Friday, I took the day off and spent time with some friends who were visiting me.
This week I spent lots of time coding and dealing with bugs. I found that the toolkit had some problems that I had to work around. I had to write a few more functions, which took up some time. I also ran into major C errors, but I did learn a great deal (although it did take me a while).
This week I finished the hash join!! Well, I can't take all of the credit since Vlad did help me. My advisor wanted me to do some other parts of the project and since the hash join was weighing me down (specifically my unfamiliarity with C), Vlad spent some time with me and we finished it on Friday afternoon. It was lots of work, but the finished product was great :)
On Monday I cleaned up my hash join code. I took care of all memory leaks, commented, documented, optimized the code etc. Starting Tuesday, I began my work with the TPC-C workloads (using DB2, not QPIPE). I was to write a Python script that would initialize, run, and stop the TPC-C toolkit for DB2. I had never used Python, so Lisa pointed me to a few sites and helped me out with some compiling errors. We worked very many hours that week, since our advisor gave us a deadline of Friday.
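A driver script of this kind can be sketched roughly as follows. This is not the script from the project: the journal does not record the toolkit's actual commands, so the three entries in COMMANDS are placeholders that just print a message, and the function names are made up for illustration.

```python
import subprocess
import sys

# Placeholder commands: the real toolkit's initialize/run/stop commands are not
# given in this journal, so each step just invokes Python to print a message.
COMMANDS = {
    "init": [sys.executable, "-c", "print('initializing')"],
    "run": [sys.executable, "-c", "print('running')"],
    "stop": [sys.executable, "-c", "print('stopping')"],
}


def run_step(name):
    """Run one step, raise if it exits with an error, and return its output."""
    completed = subprocess.run(
        COMMANDS[name], capture_output=True, text=True, check=True
    )
    return completed.stdout.strip()


def run_benchmark():
    """Initialize, run, and always attempt to stop, even if the run fails."""
    outputs = [run_step("init")]
    try:
        outputs.append(run_step("run"))
    finally:
        # Stop in a finally block so a failed run does not leave things running.
        outputs.append(run_step("stop"))
    return outputs
```

Calling run_benchmark() walks through the three steps in order. Swapping the placeholder lists for real command lines would be the only change needed to point it at an actual toolkit.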
This week we continued our work on running the Wisconsin and TPC-C benchmarks. On Tuesday we met with our advisor and she gave us new assignments. There are a few things that she wanted me to change from the work I did in the previous week. Also, she wanted me to start working with the TPC-H benchmark. Right now I'm having serious issues with TPC-H, since it takes about 8 hours to initialize and it seems to crash about 10 seconds into the initialization.
This week was very busy since it was the last week of our research. Lisa and I finished up TPC-C. We met with our advisor and she liked the progress we were making. We hooked up my Python scripts to Lisa's applet, and the TPC-C toolkit can now be initialized and run from the applet! I had issues with the output file, though. Because there are multiple users running the toolkit and writing to one file, they were overwriting each other's data! Locking the files when writing didn't seem to help. After many, many hours spent on this, I figured it out on Friday night. Unfortunately, since Lisa had already left work, I never got to see the working TPC-C (with the output file), but hopefully I'll be able to run it sometime or see a screenshot. With the help of Minglong, we got the TPC-H toolkit to create the table spaces and populate the tables. I also wrote the Python scripts to get the TPC-H toolkit working for DB2.
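I did not write down what the eventual fix was, so the following is only a sketch of one common way to let several writers share an output file, not the solution used in the project. It assumes a Unix-like system (the fcntl module is not available on Windows), and the function names, record format, and use of threads to stand in for users are all made up for illustration. The two ingredients are opening the file in append mode, so no writer starts from the beginning of the file, and holding an exclusive advisory lock for the duration of each write.

```python
import fcntl
import os
import tempfile
import threading


def append_record(path, record):
    """Append one line to a shared file while holding an exclusive lock."""
    # Mode "a" makes every write land at the current end of file, so a writer
    # does not clobber data that another writer added earlier.
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other writer holds it
        try:
            f.write(record + "\n")
            f.flush()  # push the line out before releasing the lock
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def simulate_users(path, num_users=4, records_each=25):
    """Stand-in for several users reporting results at the same time."""
    def worker(user_id):
        for i in range(records_each):
            append_record(path, f"user{user_id} txn{i}")

    threads = [threading.Thread(target=worker, args=(u,)) for u in range(num_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Write to a scratch file in a temporary directory, then read it back.
out_path = os.path.join(tempfile.mkdtemp(), "results.txt")
simulate_users(out_path)
with open(out_path) as f:
    lines = f.read().splitlines()
```

A lock of this kind only helps if every writer takes it, and if nobody opens the file in a mode that truncates it. A writer that opens the file with mode "w" wipes out the other users' data before any lock is even requested.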