natasha @ wpi:
the summer research experience
DMP Program
CAPE Project


Final Report

Here is the final presentation for my summer project:

   
Final Queue Manager Presentation (ppt) (ps)

And here are the screen shots of the GUI:

   Initial GUI screen
   Main GUI with additions on the bottom panel
   A menu showing the new icons
   Plot From File selection screen

Any additional details and results can be found on the Projects page.



Summary

I have not been very good at keeping this journal throughout my summer experience. However, I do have a good reason for this - I swear! The last few weeks were really intense for me. I spent most of my free time coding, even on the weekends. It got a bit too hectic for me to keep up with this website, so I will now try to catch up on all that has happened.

The last time I updated this journal, I was just starting to think about the CAPE GUI. Before I actually dove into the GUI project, I was able to finish the Queue Manager. It took a long time and required a lot of help from Nishant, one of the graduate students. It was a very happy day when I finally tested it and it worked!

There were a couple of compatibility issues to work out with the Queue Manager. We wanted to make it optional to use the Queue Manager, so that if someone did not want any data stored on disk, they could turn it off. This wasn't a very difficult task, though, so it didn't take much time.

After this, I started working on the GUI. It was at this point that I realized I had bitten off more than I could chew with this project. The Queue Manager alone, which I initially expected to take no more than 3 weeks, took the full 10 weeks. Now I had a whole other project still to begin and finish, and my funding had basically run out already. But I had said I would work on the GUI, and I wasn't about to back away from that. It was also important to finish the second project because several of the grad students were getting ready to go to a big databases conference (VLDB) at the end of August, and they planned to demo their work with the GUI.

I was excited to actually start fiddling with the existing GUI and making minor changes to it at first. The first thing I did, as I was learning how everything functioned, was create new icons for the buttons in the GUI. The old ones were not very visually appealing, in my opinion, so I made cooler looking ones. Then, I started changing actual functionality of the GUI.

The first major change was that I added a "plot from file" option for plotting statistics for the query engine. This functionality allows easy comparison of one configuration with another one. This is especially useful for optimizations so that you can see the "standard" performance compared with "optimized" performance all in one plot.

The next, and perhaps the biggest, addition I implemented was to have the GUI start first, before the system runs. Previously, you'd have to start running CAPE, and the query processor would start the GUI. I added an option to have an initial GUI come up first, which can then start the system running as well as launch the original GUI. This initial GUI allows for easy setup of the system by selecting the configuration files and even some adaptability variables. More info on this GUI can be found in the Projects section.

In the end, I was happy with the experience I got from working with Professor Rundensteiner and the CAPE crew. The only thing I regret is not having any time off before classes started. I guess that was partly my fault, since I didn't manage my time well enough to finish within the allotted 10 weeks. I was just very tired from all the hectic last-minute work getting things ready for the conference, which overlapped with the first two days of classes. Still, I feel like I am a much stronger programmer now than I was at the start of the summer. I gained a lot from this experience.



July 2, 2004

The last two weeks have been filled with frustrating debugging. Just when I thought I had everything working, I got a million exceptions. I've been trying to work through them all, but I really do not see where they are coming from. I hope that if I step back a little and maybe move on to my next project, then I'll be able to come back with a clear head and fix the problems more effectively.

I have written up documentation for the QueueManager and as soon as I can get rid of all the errors, I'll be all done.

In the meantime, I will start looking at the GUI and see how the Natasha before me had done it. She created the GUI as her MQP project and it's pretty impressive. There are two packages she used that I will have to learn, but I'm really excited about finally doing graphics-related work. I think this will be challenging because I have spent too much time on the QueueManager and may be pressed for time on this project. But I hope that I can finish it up by the time the grad students need to present their demo.



June 18, 2004

This has been a really good week for my project. I was able to finish two major goals: I got the file I/O working, and I implemented the queue so that multiple operators can access it without any problems. There was a lot to clean up in the old implementation, and that took a bit of time. I also wanted to make sure that nothing I changed had any effect on other classes that might be calling methods within the queue, so I spent some time looking through the old files and comparing the changes I made. I think it should be fine, but only time will really tell.

I have also started reading about how to find out if there is enough memory to store tuples or if they need to go to disk. This seems pretty difficult to find. There are functions you can call in Java to see how much memory the VM is going to use when it runs... but is that the same as main memory? I doubt it because the numbers are very small. But maybe that really is all that's available for processes to run. I think I need to read more about how memory works. It's too bad I haven't taken any OS classes yet because I think that might be helpful.
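For what it's worth, Java's Runtime class does expose memory figures, but they describe the JVM heap (bounded by the -Xmx setting), not the machine's physical memory, which would explain why the numbers look so small. Here is a minimal sketch of querying them; the class name and the spill heuristic are just illustrative, not anything from CAPE:

```java
public class MemoryCheck {
    // Bytes currently free within the JVM heap (not physical RAM).
    public static long freeHeap() {
        return Runtime.getRuntime().freeMemory();
    }

    // Upper bound the JVM will try to use (controlled by -Xmx); this can be
    // far smaller than the machine's actual main memory.
    public static long maxHeap() {
        return Runtime.getRuntime().maxMemory();
    }

    // A hypothetical spill heuristic: send tuples to disk once heap usage
    // crosses some fraction of the maximum heap.
    public static boolean shouldSpill(double threshold) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return used > threshold * rt.maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("free heap = " + freeHeap() + " bytes, max heap = " + maxHeap() + " bytes");
    }
}
```

So a queue manager probably can't see "main memory" directly from pure Java; watching heap usage against the -Xmx limit is the usual stand-in.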

While I do the research above, I'll also start looking through the execution code of CAPE to see where I can find out when an operator is all done with a queue. So, I still have a bit of work ahead of me before I can wrap up this project...


June 15, 2004

I did not get a chance to write up my progress from last week yet, so once again here it is, late.

Personal Reflections

I get frustrated with this project sometimes. I know what needs to be done, and I work at it, but somehow a million problems always come up. It makes me think of a programmer's drinking song that I saw once:

99 programming bugs in the code,
99 programming bugs.
You fix one bug,
Compile it again,
Now there's 100 bugs in the code.
(Repeat until bugs == 0)

Funny, but sooo true. Still, I do feel like I'm making a lot of progress finally. I hope to have this all wrapped up within a week or two.


Technical Details:

This past week the most important thing I accomplished was getting the file I/O working right. I only had it working with one customer accessing the queue, and there's still no method that decides when tuples should go to disk, but I did get it to work by manually plugging in which tuples go to disk and which stay in main memory. I think this was the hardest thing I had to do for this specific project, so I'm glad I finally accomplished it.

My next goal is to get everything working with multiple customers accessing the queue. This shouldn't be too bad, but I need to make sure to keep track of cursors. It looks like someone had started doing something with that before me because there are some methods that keep track of cursors, but it doesn't look like they ever fully implemented this.
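To make the cursor idea concrete, here is a minimal sketch of a queue with one read cursor per customer. None of these class or method names come from CAPE's code; it just illustrates the bookkeeping: a tuple can be dropped only once every registered consumer's cursor has passed it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: a shared tuple queue with a per-consumer cursor.
public class MultiCursorQueue<T> {
    private final List<T> buffer = new ArrayList<>();
    private final Map<String, Integer> cursors = new HashMap<>();
    private int base = 0; // logical index of buffer.get(0)

    public void register(String consumer) { cursors.put(consumer, base); }

    public void enqueue(T tuple) { buffer.add(tuple); }

    // Next unread tuple for this consumer, or null if it is caught up.
    public T dequeue(String consumer) {
        int pos = cursors.get(consumer);
        if (pos - base >= buffer.size()) return null;
        T t = buffer.get(pos - base);
        cursors.put(consumer, pos + 1);
        garbageCollect();
        return t;
    }

    // Drop tuples that every consumer has already read past.
    private void garbageCollect() {
        int min = cursors.values().stream().mapToInt(Integer::intValue).min().orElse(base);
        while (base < min && !buffer.isEmpty()) {
            buffer.remove(0);
            base++;
        }
    }

    public int size() { return buffer.size(); }
}
```

The key invariant is that the slowest cursor decides what may be discarded; everything else is just index arithmetic.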

After that, I'm going to see how we can decide when tuples need to go to disk. This might be a bit tricky because I don't know how to see how much memory is available for processing the tuples, but I'll do some research into it and see what I can do. I'm hoping that this will not take too long.

I feel pretty good about the project now because I got the most important part of it working. I did use Nishant's code, but there were quite a few things that needed to be changed in order for it to work properly. Also, right now I am only focusing on the enqueue/dequeue functions and any others that are called by them. But eventually I will have to go through all the methods in the queue class to make sure they all work with the new changes in place. This is more of a clean-up thing, so I think I'll leave that till the end. For now I'll focus on getting the major tasks accomplished.




June 8, 2004

This update is a bit late, but I have been busy and haven't had a chance to write up everything that's been going on this past week. Once again, here's the breakdown:

Personal Reflections

I got a chance to see some of my friends recently who took an internship that I was also offered but turned down for the research experience. I wasn't completely positive that I had made the right choice, but after speaking with them I know that I did. They say that all they do at work is sit at their computers, trying to look busy. They get some menial tasks to do, which their bosses expect will take several days but in reality only take a couple hours at most. They basically get paid for doing nothing.

While some people think that this is great, I think I'd just go crazy. One of the things I enjoy the most about my research experience thus far is that I am always challenged by the tasks I perform and I feel like a lot is expected of me. Sometimes I get stressed out because I feel like I can't live up to those expectations, but I'd much rather have people expect more from me than I think I can handle than be in the situation of my friends where nothing is expected from them at all.

It was truly an eye-opening experience for me as we all sat and talked about our jobs. I mean, sure they are all getting paid a LOT of money. But where's the personal satisfaction in their jobs? I'm personally very happy that I got this chance to work with Professor Rundensteiner and her grad students because I feel good about the work I do.

And now on to other business...


Technical Details:

I think the biggest breakthrough of this week came on Thursday, when Professor Rundensteiner introduced me to Nishant, whom she had been telling me to speak with for a little while now. It turns out that last summer he did almost exactly what I need to do now, but for a slightly different application!

This was all very exciting to me because after the first two days' work I was getting really frustrated because I knew what I needed to do and I also knew that it would take forever for me to do it. I had basically figured out that in order to store files efficiently, I'd need to write some sort of a parser that would somehow store the tuple information and be able to build it up again when the tuples are read off.

I had also started off thinking that there would be multiple cursors into the file, depending on which operator needed to retrieve tuples. But Professor Rundensteiner pointed out that once some tuples are loaded into memory by an operator, they can just stay in memory, so if the next operator needs more from the file, it can just read from the beginning. In other words, each tuple stored in the file will be read off only once and can safely be deleted after it is read. This simplified my task a lot, and I don't know why I didn't see it before. I guess when you get too caught up in the little details, you sometimes fail to see the big picture.

Once Nishant gave me his code at the end of Thursday, I was a happy camper. It did take me a while to figure out just how his storage manager works, but I did figure it out. The hardest part, which took the rest of the week, was getting his code to compile with our Raindrop code. His storage manager was written for a special type of tuple, so I had to trace all the differences and add the necessary parts to our existing tuple code to make it compatible with the storage manager.

By the end of the week, I had his code compiling just fine and his test case working. However, now I need to get it to communicate with my queue manager, which is proving rather difficult. I hope to have that working soon.




June 2, 2004

A detailed to-do list has been put up under the
Project section. This list will be updated regularly to reflect my progress.



May 28, 2004

As with last week, here are the two sections for week 2:

Personal Reflections

This was a pretty short week for me because I got sick in the middle of it. I didn't come in to the lab on Wednesday or Thursday, so I got a little behind in my work. I hope that I will get a chance to catch up in the upcoming days.

We had a meeting today with everyone who is working on CAPE. It was pretty long, but I think it helped me get some things figured out. I was looking too far into my current task, and for now I just need to take a step back and get the simpler stuff figured out before I move on to changes that might affect everything in the system.

So overall, this week was alright. I did figure out a lot of what I need to do, and I'm ready to work full-speed next week.


Technical Details:

A detailed list of tasks is coming soon. I will post an update when it is done.



May 21, 2004

Since people might be reading this journal for different reasons, I have separated it into two categories for easier reading. Enjoy!

Personal Reflections

The first week is almost over now and it feels like the time just flew by. I got to meet a lot of people this week that I'm working with. Everyone here is really nice and my mentor, Professor Rundensteiner, is extremely helpful. Every time I talk to her I leave feeling energized and really motivated. I'm not sure why that is exactly, but I've definitely noticed that any time I meet with her, I sort a lot of my thoughts out and feel much more organized.

After being here for a week, I see that this is going to be a challenging summer. There is a lot I need to learn. But I really am looking forward to everything because I enjoy challenges. They make life interesting.


Technical Details:

It was a slow start this week because the first couple days were just spent getting my desk set up and installing everything I need and figuring out all the little computer issues that somehow always come up. I spent some of my time updating the CAPE website and making my own DMP website. I love doing web design, so this was a lot of fun for me.

We also spent some time trying to decide exactly what my tasks will be this summer. There are two new graduate students who also started at the same time as me, so everyone was trying to find relatively simple tasks for us to do so we can get used to the CAPE system. I guess it's easier for me because I just took the Databases II class with Prof. Rundensteiner where we did three projects that dealt with a smaller version of CAPE. So I do know a little of how the system is structured. Still, the projects dealt with just small components, so I do have a lot to learn.

We decided that my first task will be to actually take one of the projects we did in the class and incorporate it into the system. The purpose of this project was to come up with some sort of a persistent storage system when there is not enough main memory available for the queues. The idea is that the tuples in the queues will be stored in main memory as much as possible to optimize speed, but since memory is not infinite, there should be a backup method that takes tuples and puts them in persistent storage if necessary. Right now the system assumes that main memory is unlimited, which is not all that far-fetched. The processes are distributed on a network of computers, so there is quite a bit of memory available. However, it's not safe to assume that this will always be enough. So that's what I need to do now. I have to figure out when and how the data should be stored so that it doesn't noticeably affect performance.

The way the system works is shown in the picture below. Each operator has one or more queues coming into it and one or more coming out. The resulting queue becomes an input queue for the next operator.

[Figure: operators connected by input and output queues]


Since queues usually enqueue/dequeue several tuples at a time, it would make sense to keep the most recent ones in memory and only have the tuples in the "middle" of the queue in persistent storage. In other words, there should be a mechanism that decides when an operator is done enqueueing and when that happens, those tuples can be written to file. There would be a similar method for dequeuing. A certain number of tuples at the beginning of a queue would always be in main memory. Once an operator is done dequeueing, the next tuples that will be needed from the queue should be loaded into main memory. The queues would basically look like this:

[Figure: queues with the middle tuples in file storage]
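The scheme above could be sketched roughly as follows. This is purely illustrative (none of it is CAPE code, and a deque stands in for the actual file): the oldest tuples sit in an in-memory head buffer for fast dequeues, the most recent enqueues sit in an in-memory tail buffer, and the "middle" overflows to disk. Once the head drains, the next batch is loaded back in.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of a queue that spills its "middle" to disk.
public class SpillQueue<T> {
    private final Deque<T> head = new ArrayDeque<>(); // oldest tuples, next to dequeue
    private final Deque<T> tail = new ArrayDeque<>(); // most recent enqueues
    private final Deque<T> disk = new ArrayDeque<>(); // stand-in for file storage
    private final int capacity; // max tuples per in-memory buffer

    public SpillQueue(int capacity) { this.capacity = capacity; }

    public void enqueue(T tuple) {
        tail.addLast(tuple);
        if (tail.size() > capacity) {
            disk.addLast(tail.pollFirst()); // overflow the middle to "disk"
        }
    }

    public T dequeue() {
        if (head.isEmpty()) refill();
        return head.pollFirst(); // null when the whole queue is empty
    }

    // Reload the head: disk tuples are always older than tail tuples,
    // so draining disk first preserves FIFO order.
    private void refill() {
        while (head.size() < capacity) {
            if (!disk.isEmpty()) head.addLast(disk.pollFirst());
            else if (!tail.isEmpty()) head.addLast(tail.pollFirst());
            else break;
        }
    }

    public int onDisk() { return disk.size(); }
}
```

The real system would also need the trigger logic (when memory is short, when an operator finishes an enqueue batch), which this sketch leaves out.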


To do all this, there would have to be a lot of communication between different components of the system. For example, the Queue Manager would need to know when there is not enough memory available, how many tuples will be needed from a certain queue, and when an operator is finished enqueueing. This communication will be the most challenging part to implement.

The first thing I'm working on now is the mechanism for storing tuples to, and reading them from, files. This is what we actually did for the DBII project; however, we did not implement the most efficient method. I have been reading a lot of Java docs and a journal article to figure out the best approach. I am thinking of either keeping a RandomAccessFile with some sort of header that says how large each tuple is in the file, so that searching is faster, or using Java's ObjectInput/OutputStream to store an ArrayList of the tuples in a file. I am not yet sure which way is best, and I am still exploring other possible options. I hope to decide soon so that I can start actually implementing this.
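As a sketch of the first option, here is roughly what a length-prefixed RandomAccessFile layout looks like. The tuples are simplified to plain strings and the class name is hypothetical, not CAPE code; the point is just that each record carries a size header, so a reader can skip over records without deserializing them.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: tuples stored as [int length][bytes] records.
public class TupleFile {
    public static void write(String path, List<String> tuples) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "rw")) {
            f.setLength(0); // truncate any previous contents
            for (String t : tuples) {
                byte[] bytes = t.getBytes(StandardCharsets.UTF_8);
                f.writeInt(bytes.length); // header: size of the tuple that follows
                f.write(bytes);
            }
        }
    }

    public static List<String> readAll(String path) throws IOException {
        List<String> out = new ArrayList<>();
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            while (f.getFilePointer() < f.length()) {
                byte[] bytes = new byte[f.readInt()];
                f.readFully(bytes);
                out.add(new String(bytes, StandardCharsets.UTF_8));
            }
        }
        return out;
    }
}
```

The ObjectOutputStream alternative is less code (serialize the whole ArrayList in one call), but it reads everything back at once, whereas the header scheme lets you seek past tuples you don't need yet.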