YY invited us to her house to meet her current graduate students. I re-read the Flashback paper to understand more of the project's technical details. The paper can be found here: flashback. I started researching related papers on replaying processes and threads. We will be presenting papers from this list every other week, and I will present my first paper next Wed. I also spent some time setting up my working environment on Puccini, the computer that I will be using. YY gave us a couple of places to look for papers. One of the sites I found very useful is from U Penn; it has a very broad database of technical papers.
Here are a couple of interesting papers related to rollback and re-execution.
Execution replay and
I did my presentation on Deterministic Replay by Choi. My presentation slides are available here: Presentation 1. I think I was pretty nervous. I spent about two days preparing, and I was still quite nervous. I don't think I had ever given a presentation that was 30 minutes long. The comments were that I had too much content on my slides and that I was reading off of them, so I will need to watch out for that next time. It was also suggested that I keep my slides simpler, because they could be distracting. It was a good experience, because I also got to hear how other people presented their papers. Hats off to Joe, another grad student in YY's research group. He gives his talks like a pro.
I also spent some time talking with Sudarshan, the graduate student who was previously working on the Flashback project. We came up with several things that I could add to the project. He is very knowledgeable about the project and gave me a high-level description. I began to familiarize myself with the kernel by reading kernel module documentation. One of the best sources I found was this online book, which you can find at http://www.tldp.org/HOWTO/Module-HOWTO/x102.html. I would suggest it to anyone who wants to do some kernel hacking. I wrote my own tiny Hello World kernel module. I also did some reading up on Syscalltrack, a crucial tool that was used in Flashback. Documentation for Syscalltrack can be found at http://syscalltrack.sourceforge.net/examples.html
This week consisted primarily of code reading. I obtained the code for Flashback. At first it was quite difficult, because I didn't know where all the files were or which ones were important. It seemed that every file I opened contained thousands of lines of code. So I printed out the important parts of the files I needed and went through them the old-fashioned way, highlighting and commenting by hand, which to my surprise worked quite well. I'm beginning to link the function calls with the functions and get an understanding of how it all works. The idea is quite simple, but the code is quite complex.
I also figured out how to compile the kernel with the Flashback files added in. It is a tedious task for a rookie like me. Plus, it takes 10-20 minutes to make the modules and compile the kernel. I think it took a couple of hours just to get the config files right. However, I've got to admit it was a very fruitful experience.
One thing I forgot to mention is that we meet with YY every Wed to discuss the progress of our project. I'm not sure if every adviser does this, but I think every adviser should. Not only does she give us helpful comments, I think she also does a good job of encouraging us.
I think I will be adding multiple checkpoints to the project and then to gdb, and perhaps automated checkpoints if I have extra time.
I gave my second presentation this week. This time the paper was on IGOR, a paper on re-execution of code. The presentation slides can be found here: Presentation 2. I think I did a better job this time, but lots of improvements are still needed. One thing I need to improve is my presentation skill; I need to be louder. I also think I need to understand the material more thoroughly so I can be better prepared for questions.
Now that I've gone through pages of code, I'm beginning to implement the user side of the project. I'm implementing the functionality to selectively read the correct log entry for the correct checkpoint. It is very hard to debug, because every time I want to run the program I need to unload and reload the modules, and my only way of debugging is to use kernel printk. As if that's not hard enough, every little mistake I make ends up crashing Puccini and forcing a restart. It can be very frustrating!
I've made some progress with the logging part of the kernel. I figured out how the log files are written to and read from for the playback, and I've added some functions that can differentiate the parts of the log that belong to each checkpoint. The idea is so simple, yet it's taking hours and hours to implement. One thing I'm getting used to is being a lot more careful when writing code; it's either that or reboot time.
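To illustrate the idea of telling apart the log entries of each checkpoint, here is a minimal user-space sketch. It is not Flashback's actual log format: the `log_entry` layout, its field names, and the `next_entry_for` helper are all hypothetical, and the real code runs in the kernel rather than on a plain array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical log record: one logged system call result, tagged with
 * the checkpoint it was recorded under. Not Flashback's real format. */
struct log_entry {
    int  checkpoint_id;  /* which checkpoint this record belongs to */
    int  syscall_nr;     /* system call number that was logged */
    long retval;         /* value to hand back during replay */
};

/* Return the index of the first entry at or after `start` that belongs
 * to `checkpoint_id`, or -1 if no such entry exists. */
int next_entry_for(const struct log_entry *log, size_t n,
                   size_t start, int checkpoint_id)
{
    assert(log != NULL);
    for (size_t i = start; i < n; i++) {
        if (log[i].checkpoint_id == checkpoint_id)
            return (int)i;
    }
    return -1;
}
```

During replay, a caller would repeatedly ask for the next entry of the checkpoint being replayed and skip everything recorded under other checkpoints.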
Based on the output of my kernel printk's, I'm fairly convinced that my log writing and reading utilities are working correctly. I'm now looking at the Syscalltrack code to figure out how to pass the correct information to the kernel handlers. I think that will be a hard task.
One of the biggest problems I ran into is needing to pass an extra parameter to the kernel handlers through Syscalltrack, and I cannot find a way. I tried to trace the function calls in Syscalltrack, but it seems that all the functions have the same number of parameters, and the function calls are nested at least a dozen levels deep. I cannot seem to find a way around it. I've been trying this and that for some time. One of the original Flashback developers came to my rescue and mentioned that they had the same problem when they initially tried to implement Flashback. He mentioned that pretty much all Syscalltrack functions take the same parameters, so they hit the same wall when they tried to pass information to the kernel handlers. They used a workaround: overloading some of the parameters that aren't being used by Syscalltrack to pass the information needed by Flashback. He suggested that I try to find a parameter that wasn't being used and overload it.
This is one of those things that is easy in concept but hard to do. It took me almost two days to figure out which parameter I could use "safely". I'm very grateful, because I think that was the only parameter left unused, or I should say the only one that didn't crash the program. Otherwise I would have needed to implement another structure and change the initial parameter to a pointer to that struct, in which case I would have had to modify every instance of that parameter and add casts, which would have led to many days of testing and debugging.
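The overloading trick can be sketched in user-space C as follows. This is only an illustration under stated assumptions: the `handler_fn` signature, the `spare` parameter, and every function name here are hypothetical and are not taken from Syscalltrack or Flashback. The sketch assumes the framework forwards a pointer-sized argument it never looks at, so a small integer such as a checkpoint id can ride along in it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed handler signature: every handler receives the same
 * three arguments, and `spare` is assumed to be unused by the framework. */
typedef long (*handler_fn)(int syscall_nr, long arg, void *spare);

/* Encode a small non-negative integer in the pointer-sized parameter. */
void *pack_checkpoint_id(int id)
{
    assert(id >= 0);
    return (void *)(intptr_t)id;
}

/* Recover the integer on the receiving side. */
int unpack_checkpoint_id(void *spare)
{
    return (int)(intptr_t)spare;
}

/* Example handler: reads the checkpoint id out of `spare`. */
long replay_handler(int syscall_nr, long arg, void *spare)
{
    (void)syscall_nr;  /* unused in this sketch */
    (void)arg;         /* unused in this sketch */
    return unpack_checkpoint_id(spare);
}

/* Stand-in for the framework: forwards its arguments unchanged through
 * the fixed signature, without interpreting `spare`. */
long dispatch(handler_fn h, int syscall_nr, long arg, void *spare)
{
    return h(syscall_nr, arg, spare);
}
```

The trade-off is the one described above: this only works while the chosen parameter really is ignored by everything in between, whereas switching to a pointer to a new struct is cleaner but touches every use of that parameter.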
After I got the parameter to be correctly passed to the kernel, I just needed to change the kernel task_structure to index the shadow process that will be doing the replay.
I added an array of shadow processes to the task_structure and made a couple of modifications in the main kernel files: fork.c, exit.c, etc.
I've gotten the shadow process to index correctly. After I got the checkpoints to work correctly, I went ahead and made changes to the rollback and replay system call mechanisms. The rest of this week was spent testing it out and correcting bugs. One of the problems was a mismatch between the system calls and the log. I also had to make sure that the shadow task_structures were being cleaned up correctly.
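As a rough picture of indexing shadow processes by checkpoint and cleaning them up afterwards, here is a simplified user-space sketch. It is not the kernel code: `struct task`, `MAX_CHECKPOINTS`, and the three functions are hypothetical stand-ins, and a real shadow would hold far more state than a pid.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CHECKPOINTS 8  /* hypothetical limit chosen for this sketch */

/* Simplified stand-in for a kernel task structure. */
struct task {
    int pid;
    /* shadow[i] is the snapshot taken for checkpoint i, or NULL. */
    struct task *shadow[MAX_CHECKPOINTS];
};

/* Take a checkpoint: store a copy of the task in the first free slot.
 * Returns the checkpoint id, or -1 if all slots are used or allocation
 * fails. */
int take_checkpoint(struct task *t)
{
    assert(t != NULL);
    for (int i = 0; i < MAX_CHECKPOINTS; i++) {
        if (t->shadow[i] == NULL) {
            struct task *s = calloc(1, sizeof(*s));
            if (s == NULL)
                return -1;
            s->pid = t->pid;
            t->shadow[i] = s;
            return i;
        }
    }
    return -1;
}

/* Look up the shadow for a checkpoint id; NULL if the id is out of
 * range or no checkpoint was taken in that slot. */
struct task *shadow_for(const struct task *t, int id)
{
    if (id < 0 || id >= MAX_CHECKPOINTS)
        return NULL;
    return t->shadow[id];
}

/* Release every shadow, as an exit path would need to do. */
void release_shadows(struct task *t)
{
    for (int i = 0; i < MAX_CHECKPOINTS; i++) {
        free(t->shadow[i]);
        t->shadow[i] = NULL;
    }
}
```

Resetting each slot to NULL after freeing it keeps a later lookup from returning a stale pointer, which is the kind of cleanup bug that is painful to find in the kernel.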
I also spent some time looking into gdb. I need to eventually add the functionality into gdb. I'm learning where the files are, finding out how to compile gdb, etc. Somehow there were two copies of some of the Flashback code, and I'm a little confused about why.
I did another presentation this week on “Optimal Tracing and Replay for Debugging Message-Passing Parallel Programs.” The slides can be found here: Presentation 3. I think this paper was a little shorter than the other papers I presented. I think I'm starting to get the hang of the paper presentations, and I'm a little less nervous. The comments were that I need to speak a little louder; it seemed that my voice trails off towards the end. It seems I need to learn more about the strategies of presenting a paper. It used to take me about a full day and a half to prepare, and now I think it's taking me about a day. I noticed that the graduate students only need a couple of hours to do an exceptional job. I think reading more papers would definitely help.
I added some functions in gdb to allow me to roll back and replay to a specific checkpoint in gdb. I think I still need to do more digging with gdb. Gdb is a complicated piece of software, with lots of files and functions. I think I'll start looking into the automated checkpoints.