June 1st - Week 1
So I arrived in Amherst a little earlier than expected. Because my advisor, Eliot Moss, was going to be out of town for the first official week of the program, it would be nice to meet him before he leaves. On the Friday before the first official week, I met the other students I will be working with in the lab: Addison and Chujiao. Eliot got us set up with the servers, keys to the lab, and other clerical stuff before he left for the week. Once everything was running smoothly, we took a look at the project and some of the tools we will be working with for the summer.
The overall project is called CoGenT, which provides a mechanism for describing a machine architecture at a high level. Using this high level description, CoGenT can generate plug-in components for existing simulator and compiler frameworks. At this point, I don't think I have a grasp of the entire project yet, but I'm sure I'll have a deeper understanding the more I work with it.
One aspect of the CoGenT project is a high-level language called CISL that is used to describe the Instruction Set Architecture of a machine. To get better acquainted with the project, we'll be writing CISL code to describe the ISA of the PowerPC. This won't be the project itself, but it'll help us understand how to work with the available tools and what exactly the goal is for CoGenT.
Most of the week was spent writing CISL code and documenting errors we found in the tools. Specifically, I implemented the majority of the fixed-point operations described in the PowerPC ISA into CISL code. At this point, however, there is no way to tell what I have written is semantically correct. Yet I was assured that as long at is syntactically correct and I have a firm grasp of the tools, then we are ahead of schedule.
June 8th - Week 2
A lot of this week was spent implementing POWER ISA functionality into CISL code. At this point, I have written a great deal of code that replicates the fixed-point facility, while Addison and Chujiao are working on the floating-point stuff.
One of my advisors, Tim, gave a pretty broad overview of the CoGenT project to the three of us and helped us fully understand the scale and scope of what we could possibly be doing this summer. Tim is working on something he calls Gist. What Gist does is it takes as input a description of a compiler IR architecture in CISL, a description of a target machine ISA in CISL, a mapping of the storage between the two and generates instruction selection patterns. Basically, given an IR instruction (say, a Java bytecode instruction), Gist will attempt to find a sequence of instructions on a target machine (say PowerPC) that would provide the same functionality as the bytecode instruction.
It turns out that the whole backend of any compiler relies on an instruction selector, so Gist can be used to generate a major componenet of the backend of a compiler. What I have been doing is I have been taking the instruction selection patterns, and have been trying to figure out a way to plug them into an already existing compiler framework like lcc or JikesRVM.
So far, I have been really busy.
June 15th - Week 3
The majority of this week has been spent developing an adapter for the JikesRVM framework. The adapter takes the Gist output (the instruction selection patterns) which is currently formatted in XML and attempts to write the Java source code needed to implement the Baseline compiler in Jikes for the PowerPC.
The idea of the CoGenT project is simple: it should be easier to write the CISL description of the IR and the target ISA, the store mapping and the compiler-specific adapter than it is to hand-write an entire instruction selector for the backend of a compiler. Right now, since I'm not an expert in compiler writing, it seems to be that this is the case.
I have written a large chunk of the adapter already in Ruby. The majority of the instruction selection for the baseline compiler in the JikesRVM is contained in a single class, so all I have to do is try to automatically generate this class from the instruction selection patterns provided by Gist. What I'm doing currently is outputting lots of methods that are specific to Java bytecode. For example, the when the compiler encounters the bytecode iadd, it calls a method named emit_iadd() which just drops down to target assembly. I'm automatically generating the method emit_iadd() and it appears that the generated one is actually more efficient than the provided one in that it has less extraneous operations.
We plugged in some of these generated methods into the compiler framework and all appeared to function correctly. Tim, my graduate advisor, was pleased to know that Gist actually produced correct results and that it could possibly be automatically plugged-in to existing compiler frameworks. All in all, a good week.
June 22nd - Week 4
I wrote the adapter from last week in Ruby, which is a really convenient language for XML processing and string manipulation, which is essentially most of what the adapter does. Unfortunately, since the rest of the CoGenT project is written in Java, it was decided that Java would be a more appropriate language to write the adapters in. I kind of dislike Java for many reasons, but porting what I had written was actually not all that bad.
Once I got everything up in running in Java and integrated into the same project in Eclipse, I expanded what the adapter could do. Surprisingly, the adapter I wrote can now generate the entire Java class for the PowerPC Baseline compiler in Jikes. Meaning Gist accurately maps instructions from the IR to the target machine and that the CISL descriptions are mature enough to actually find enough sequences to make Eclipse not cry when building JikesRVM.
Just for fun, I decided to create an adapter for the Gist output of instruction selector patterns to a more readable format. Basically, I took the patterns and made an HTML output which you can find here. To see what this means, click on the aload instruction in the list of IR instructions. A box will appear describing exactly what the instruction does in English, and then it will provide the appropriate sequence of instructions that must be emitted on the PowerPC machine to replicate this instruction. I thought this tool was pretty nifty.
For next week, we're planning on improving the CISL description of the x86 architecture (i.e., the only architecture that matters) to allow me to start writing an adapter for JikesRVM's Baseline compiler. That should be very interesting, only because the x86 architecture is the most complicated mess of goop that I've ever seen.
June 29th - Week 5
This week was fairly short and relatively slow. Because the 4th of July fell on a Saturday, the Friday before was declared an official holiday, which I enjoyed thoroughly. This left very little time for actual work to be done (and I'm a little guilty of using some of the office time unproductively).
What was very productive and insightful, however, is we had a mini meeting where one of our advisors, Chip Weems, laid down what we'd be doing for the summer and what our plans from here on out are. Essentially, the group here is rushing to write a paper for ASPLOS, which is a major architecture conference. Because the CoGenT project is at the point where it's pretty mature (i.e., it's actually producing output the can be used), this is a very exciting time to be researching and working with it.
Addison, Chujiao and I will be feverishly working on getting the Gist output to play nice with the backends of both the Jikes baseline compiler and lcc. The goal is to get working adapters for the compilers for the PowerPC and x86 architectures. Since this is being done for a paper, whose deadline is beginning of August, we're going to have to get this cranking. It will be fun.
July 6th - Week 6
It's getting closer to crunch time for the grad students writing the ASPLOS paper. As they're trying to get down what exactly they need for the paper, we're attempting to throw everything we have at the machine descriptions. Chujiao and I were tasked with getting everything down for the PowerPC baseline compiler on the Jikes RVM. This meant getting our hands dirty with researching how the Java Virtual Machine operates in theory and what actually goes on in the Jikes implementation.
The JVM is notorious for having almost little to no standardization with its specification, leaving most of the implementation issues up to the people who actually write their own virtual machines. This means it has been really fun trying to figure out what's supposed to happen, and why the RVM implementation is going about it in a different way.
For the most part, we are just testing if the instruction selection process is functioning correctly both syntactically and semantically. Meaning, testing whether the instruction selection patterns and the adapters produce both valid Java code that conforms to the RVM and valid sequences that perform the correct semantical operation. Right now, we're kind of just eyeballing the output and getting a feel as to if the adapters we wrote are actually producing sensible results.
Completely unrelated to instruction selection, I went to a concert on Friday and saw some pretty amazing bands perform. The headline was Mission of Burma, which is an old school punk band. Unfortunately, there was no moshing, but it was still really fun anyway.
July 13th - Week 7
Continuing from last week's progress, Tim set up Jikes RVM on a PowerPC system and gave us sort of a testing methodology to actually verify if what is being produced is what we want. At this point, progress can actually be expressed in a quantitative way.
The way the Java Virtual Machine Instruction Set specification works is that each op code has to fit in one byte, meaning there can only be up to 2^8 = 256 bytecodes defined. However, a lot of op codes are not in use so this limits the total number of bytecodes that need to be implemented. Basically what we've been doing is going through the list of bytecodes and checking them off one by one. If they are currently implemented (i.e. described in CISL), we can actually plug them into Jikes now and test them. If they aren't implemented, we try to add some machine descriptions and fix up the adapters so that we can match as many IR instructions as possible.
At the moment, we can say we have implemented 40 bytecodes and verified that they actually work and on the RVM. It's pretty exciting to see automatically generated instruction selectors being plugged into the compiler and compiling/running actual code. To see that the code runs and the entire process does what it's supposed to is a nice feeling.
This week has been a lot of just trying things working for the paper. When it doesn't work, we try our darndest to fix it, and we're getting some pretty compelling results this way.
Not related to ALI stuff, a lot of the other REU kids went skydiving over the weekend. Afterwards, they went to a pretty amazing barbeque place (well, amazing for New England at least). I passed on the skydiving but was excited for the barbeque. I think that decision very concisely summarizes my personality, honestly.
July 20th - Week 8
If I were to use one word to describe this week's work, it would be successful. We successfully brought the count of bytecodes we are matching up to 62. We've done this by implementing several different categories of operations: long operations, double-precision floating-point operations and a few type conversion operations. All three of these categories are breakthroughs for the current project as a previously submitted paper was criticized for not having a broad range of bytecodes represented.
Another major criticism for last year's submission was the fact that it was too preliminary -- they didn't actually have any results to actually implement. Just the fact that we can take the instruction selection patterns and generate the entire instruction selector for JikesRVM is a huge step forward for the project. I feel that this time, the group is much more hopeful about their chances of publishing a paper in ASPLOS.
In other news, an REU student had their birthday and we all went out to rollerskate at some roller rink. Normally, this would sound pretty lame but I haven't been to a roller rink in maybe eleven years so I was pretty excited to go. We also had an REU foosball tournament "sponsored" by Yahoo! (the prizes were mostly Yahoo! swag and the table itself was a gift from Yahoo!). I came in last place and I'm pretty proud of that fact.
July 27th - Week 9
Our graduate mentor, Tim, went on a vacation to Maine for the week, so we were left mostly to ourselves. At this moment, we're up to generating 85 bytecodes. However, I think we have reached the limit of what we can currently express using CISL with regard to the JVM specification. The reasoning for this is many of the remaining bytecodes rely on information from the runtime virtual machine, like type checking where specific objects are stored in memory, which is not necessarily static.
Despite the previous limitations, I feel that we've made tremendous progress this summer and the work that I've done will contribute to a paper that will hopefully (hopefully being a key word here) get published in a major architecture conference. Chujiao and I have spent most of the week plugging our version of the baseline compiler into a live version of JikesRVM and performing runtime benchmarks. The point of the benchmarks is to prove that building an instruction selector using our technique does not at all hinder the performance of the compiler. Preliminary results indicate that this is the case.
We also put a lot of time and effort this week creating a poster that is to be presented at a poster session next week -- our final week. There's not much else to say on that front other than it's a little frustrating to create a poster or a paper while we're still receiving results.
One of the major things that happened this week is that most of the REU program took a trip to Boston to visit Microsoft, IBM and Google. I personally really enjoyed the trip and I'm sure that everyone that went had a good time. It was really exciting to see what kind of research that these companies were producing. It was also equally exciting to eat free lunch at Google.
August 3rd - Week 10
Well, this is our final week here in Amherst. The first half of the week was dedicated to completing the poster and then actually presenting it. You can look at the final version of our poster on the right. The poster session itself wasn't as big as I thought it would be, but it was very exciting to see what everyone else has been up to this summer. There is something about making and presenting a poster that legitimizes people's work and says "I did some real research."
After the poster session, it felt like the summer was basically over. Chujiao and I felt like we've pushed the JVM specification as far as we could get it and we've already had a sufficient amount of results from the runtime benchmarks. For the rest of the week, we tweaked a few things with our descriptions and adapters, but we felt we had finished what we had set out to finish, for the most part.
The REU group tried to spend what little time we had left together doing something fun. For our last night, we all went out to Amherst Brewing Company for our farewell meal. The overall experience was amazing and it was a pretty somber affair when I realized the summer was coming to a close.
I really enjoyed my time in Amherst this summer. I don't think I can properly convey with words just how I feel about the experience, but it was one of the best times of my life.