Week 10

This is it: the Last Week. Scott was absolutely right-- ten weeks zipped by like racecars. It worries me a little, actually. Where did they go? The next time I blink, am I going to open my eyes and discover that I'm forty-five?

Well, worrying aside, I've got a lot to finish up before Friday. The main thing is writing my Final Report, a technical paper on what I've been doing this summer. I've already got an outline, and today I wrote the first draft of the abstract. This is my first introduction to the moribund world of technical writing, so I've been doing a lot of research on the topic, discovering fascinating little tidbits-- for example, every abstract ever written apparently begins with the magical words "This paper". Perhaps this is a trend I would be better off bucking.

I've also been doing a big-O analysis on the Two Heap algorithm, since it seems like it's the one we're actually going to be using. If I've worked through it correctly, it has a total runtime of O(n²rt + n³t log n), where n is the number of processes, r is the average roll-in/roll-out time, and t is the average runtime of a process. I'm not sure if this is in the ballpark of what we were looking for or not-- I'll ask about it at the meeting tomorrow.
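
Since the report's going to be in LaTeX anyway, here's roughly how I'm planning to typeset the bound (the T label is just my own shorthand, not anything official):

```latex
% Roughly how the bound will be typeset in the report (my notation, not final):
\[
  T_{\mathrm{TwoHeap}} = O\!\left( n^{2} r t + n^{3} t \log n \right)
\]
% n = number of processes, r = average roll-in/roll-out time, t = average process runtime
```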

I'm now using exponential averaging instead of a sliding window to create my share plots. I like exponential averaging a lot-- after coding it, I really grok it, which makes me feel all math-savvy. And the exponentially averaged share plots definitely give us a better picture of instant-by-instant fairness than the sliding window ones. I'll have the new plots up on the project page soon.
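
For future-me's benefit, here's the gist of the exponential averaging as a toy Java sketch-- not my actual plotting code, and the smoothing constant is made up:

```java
// Toy sketch of exponential averaging for a share plot (not the actual plotting code).
// ALPHA is a made-up smoothing constant: smaller = smoother, slower to react.
public class ExpAverageSketch {
    static final double ALPHA = 0.1;

    // instantShares[t] = fraction of quantum t this process actually got (here, 0 or 1).
    static double[] smooth(double[] instantShares) {
        double[] avg = new double[instantShares.length];
        double current = 0.0;
        for (int t = 0; t < instantShares.length; t++) {
            // New average = ALPHA * newest sample + (1 - ALPHA) * old average.
            current = ALPHA * instantShares[t] + (1 - ALPHA) * current;
            avg[t] = current;
        }
        return avg;
    }

    public static void main(String[] args) {
        // A process that gets the CPU for 3 quanta out of every 10.
        double[] got = {1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0};
        for (double s : smooth(got)) System.out.printf("%.3f%n", s);
    }
}
```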

One thing I noticed when I was turning out the latest round of plots is that the Two Heap scheduler is consistently getting smaller runtimes than the Simple Linear scheduler-- which ought to be impossible, since the SL scheduler supposedly has 100% CPU utilization throughout. I think the problem is that near the end of the run, some processes have already finished, but to keep exact fairness the SL scheduler keeps giving them quanta when it's their turn anyway, thus wasting some time that the TH scheduler doesn't waste. I'm going to start counting that time as idle time and see if the numbers add up.

Two days left, and my Final Report is coming right along. I've never done any technical writing before, and I have this strong urge for it to be absolutely perfect-- it's my technical writing debut!-- but obviously it's not. It's a fascinating kind of writing, though: so fully thought-out, so logically ordered. (When it's done right.) While researching to figure out just what a technical paper was, exactly, I came upon some advice that I've tried to keep in mind while I write. The two rules of effective writing: 1. Have something to say. 2. If by chance you find yourself with two things to say, restrain yourself. Say one, and then the other, not both at once. Easier said than done.

So over the course of the summer I've picked up a few new things for my toolbox. I'd never used Emacs or LaTeX before this project; now I'm comfortable with both. And I'd written bash scripts and used gnuplot a few times, but now I'm far more proficient with them. I think this is also the first time I've done a big Java project with layers of inheritance (apart from a very simple example for class). Yay learning.

Here we are at the last day of work, and my final report is officially done. Go admire it. It only encompasses my part of the project, the simulation of the algorithms-- Xin's kernel implementation isn't touched on at all-- but it's a fairly comprehensive look at what I've been spending my time on over all these weeks. And it's in LaTeX, with lovely equations and things.

It's been a fun summer-- more fun than I had any right to expect work to be. I know (I think) a lot more now about what it's like to be a computer science grad student; and I'm sure, now, that it's something I want to do. (Unless I change my mind.) Thanks to Eliot, Scott, Emery, and Xin, and I hope the rest of the RIRO project goes wonderfully. I'd like to stay involved, if that's possible... well, I'll be back in the area in another month, so we'll see how things go. So long!

Week 9

Two Heap Scheduler: finished! Hooray! I'll get the algorithm up on the project page later this week. I haven't pored over the graphs sufficiently to give a final verdict, but at first glance it's pretty exciting: this, unlike the In Order scheduler, is truly non-schedule-based: processes just run willy-nilly, in whatever order they best fit. Because of that, the shares tend to get pretty wild; but it's going to be impossible to tell just how wild, and whether they conform sufficiently to the goal shares, until I get my new, small-window based plots up and running.

So-- I've finished the window-based share plots, and am ready to offer some tentative thoughts on the Two Heap scheduler. Thought 1: if there are any processes big enough to need the whole memory to themselves, it breaks down totally, because while it's rolling Giant Process in, there's not room for anything to be active and running; and as soon as it gets it in, it has to roll it out again, since there's nothing else in the active heap. So Giant Process never gets to run at all, and the algorithm spends eternity shuttling it into and out of the active heap. But! The old schedule-based algorithms require at least two processes to fit at once, too, so I think that's allowable.

Thought 2: it really doesn't conform to the shares very well. I mean, yeah, it gets there-- and it gets there a lot better than the non-cyclic SE algorithm, what with all its idle time-- but it's slow, and unpredictable, and it really doesn't keep very close to the goal shares. But, genius that I am, I think I know why this is, and how to fix (or at least improve) it. The problem is that everything in the active heap is getting quanta at the same rate-- the processes that should be getting 80%, and the processes that should be getting 2%. And even though the heap-ordered rolling in and out is supposed to counter that, it's not doing enough. But if the processes in the active heap got quanta according to their share, I think the fairness would be much improved. I'll try that next, after today's meeting.

All right, several improvements made to the Two Heap Scheduler. First, now the quanta get passed out among the active processes proportionately to their shares, not evenly. And second (this one suggested by Xin), now before an active process swaps places with a passive process, the program checks to see if the active process has a higher excess of quanta than the passive process. If not, the swap doesn't take place-- the active processes just keep receiving quanta until the excesses are reversed, and THEN the two processes swap. The overall effect of these changes was to reduce the processes' total runtime by about a quarter, and to improve fairness a little-- but not a great deal. Ah well, the runtime improvement is nice.
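
So I don't forget how the tweaks work, here's a rough Java sketch of both of them-- the class and field names are my own shorthand, not the real simulator's:

```java
// Rough sketch of the two tweaks; class and field names are my own shorthand,
// not the real simulator's.
import java.util.List;

class Proc {
    String name;
    double goalShare;      // fraction of the CPU this process is supposed to get
    long quantaReceived;   // quanta actually granted so far

    Proc(String name, double goalShare) { this.name = name; this.goalShare = goalShare; }

    // Positive = ahead of its fair share, negative = behind.
    double excess(long totalQuantaSoFar) {
        return quantaReceived - goalShare * totalQuantaSoFar;
    }
}

class TwoHeapTweaks {
    // Tweak 1: hand out a batch of quanta to the active set in proportion to goal
    // shares, instead of evenly.
    static void distribute(List<Proc> active, long batch) {
        double shareSum = active.stream().mapToDouble(p -> p.goalShare).sum();
        for (Proc p : active) {
            p.quantaReceived += Math.round(batch * (p.goalShare / shareSum));
        }
    }

    // Tweak 2 (Xin's): only swap if the active process really is further ahead than
    // the sleeping one; otherwise keep granting quanta until the excesses reverse.
    static boolean shouldSwap(Proc topActive, Proc topSleeping, long totalQuanta) {
        return topActive.excess(totalQuanta) > topSleeping.excess(totalQuanta);
    }
}
```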

Week 8

Oh man, how did it get to be week 8 already? How am I going to finish everything? I have no time left! None! Not to mention this giant academic paper I'm apparently supposed to be writing! Gah! Gah!

Okay. It's okay. I've actually gotten a lot done so far (or so I tell myself)-- I have pages and pages of Java code (that actually, you know, does something), I have lots of pretty plots that tell me things about our different algorithms... it's ooookay. Deep breath.

I've finished the In Order Scheduler and updated my script to make plots of the data it generates. I've also written an addition to my script which generates a new kind of plot: integral plots! (Hooray!) They give you a better idea of the fairness of a schedule, by showing you how far from its goal a process is at any given time. The plot is also labeled with the area under all the curves (a kind of total fairness for the schedule) and a total runtime for the schedule. I'm making a chart of all fairnesses and runtimes, using all three schedulers, and I'll present it at the meeting today along with some of the new plots. Observations: IO and SE tend to have the best total fairness, while SL tends towards the shortest runtimes-- but depending on the process set, any of the three might be the best at any given time. I'm glad to see that the heuristic scheduler is not lagging too far behind the pre-computed schedulers; that means maybe we'll be able to use it.
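
For reference, here's roughly the bookkeeping behind the integral plots, as a little Java sketch-- my own variable names, and just one plausible way of collapsing the curves into a single unfairness number, not the actual plotting script:

```java
// Sketch of the integral-plot bookkeeping; variable names are mine, not the script's.
public class FairnessSketch {
    // received[p][t] = 1 if process p held the CPU during quantum t, else 0.
    // goal[p] = process p's goal share. Returns the total area under all the
    // deviation-from-goal curves (one rough "total unfairness" number).
    static double totalUnfairness(int[][] received, double[] goal) {
        int numProcs = received.length;
        int numQuanta = received[0].length;
        long[] cumulative = new long[numProcs];
        double area = 0.0;
        for (int t = 0; t < numQuanta; t++) {
            for (int p = 0; p < numProcs; p++) {
                cumulative[p] += received[p][t];
                double shareSoFar = cumulative[p] / (double) (t + 1);
                area += Math.abs(shareSoFar - goal[p]);   // how far off goal at this instant
            }
        }
        return area;
    }
}
```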

All right, after the Tuesday meeting I've got a lot to work on! Primarily, I need to write a simulator for a less-stupid heuristic algorithm; because yes, IO does all right, but it tends to rack up big idle times for the CPU, which for some process sets is really a problem. The new algorithm is the Two Heap algorithm, which is a spin-off of an idea Scott and I talked about way back at the end of the school year. More on it later.

The second big thing I need to do is spruce up the plots in various ways. They need to list total idle time; they shouldn't be absolute-valued, they all should be to the same scale, etc. And the biggest change: rather than judging each process's share over the entire length of the run, shares should be judged over a smaller window of time, so we can see how the fairness has been recently. This'll take some doing.

Week 7

Happy Fourth of July! Sawa and I went to all the Amherst festivities-- parade, carnival, band concert, and fireworks. "Ah, small-town ambience," said my dad on the phone; well, if that means it lacks the soullessness of my home suburb, I guess so. I actually loved growing up in the suburbs-- the forest in my backyard, the red foxes chasing each other down the street, the quiet night-- but Amherst (and the assorted other Western Mass towns) has much more of a sense of place than home, if that makes sense. Amherst is a town, a community; Burke is a mailing address.

But anyway. Today I tested out various numbers of processes on my simulator to see just how much it could handle. Looks pretty durn exponential: three processes take 2 seconds, four take 5 seconds, five take 9 seconds, six take 8 minutes, and seven-- well, the simulator's been running for 6.5 hours now, and no sign of a finish yet. I thought about looking for ways of speeding up the algorithm, but it's probably, as Emery put it, a fool's errand. This simulator is only there to give us the optimal schedule as a baseline, anyway.

So my next task: time for a new simulator! This one's algorithm won't be looking for an optimal schedule, or coming up with a schedule at all-- it'll be a heuristic, assigning processes run time on the fly, trying (but not too hard) to balance things out. Maybe a simple greedy algorithm. Coming up with the algorithm and programming the simulator for it should be a pleasant occupation over the next few days (or weeks, who knows...) and then it should be simple enough to plug it into my plotting script and see how it measures up to the optimal ones. Of course anticipating it like this means it'll be jinxed and end up being an agonizing month-long nightmare. The irony gods are cruel.

All right, I've decided on two heuristic algorithms: In Order Scheduler and Two Heap Scheduler. In Order Scheduler is the second stupidest algorithm I could think of. It just runs the processes in order, and if there's time during a run, it rolls in the next process and rolls out the last one; if there's not enough time, it just lets the CPU sit idle until the rolling is done. I chose it because I wanted something simple and quick to write, so we could see just how bad it is compared to the optimal schedules.
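
Here's a toy model of the In Order idea in Java-- emphatically not the real simulator, and it assumes things the real one doesn't (one roll number per process, one slice per pass, process 0 already in memory at time zero):

```java
// Toy model of the In Order idea, not the real simulator. Assumptions: each process
// gets one run slice per pass, one roll time covers a roll-in or a roll-out, the disk
// can do roll work while the CPU runs, and process 0 starts out already in memory.
public class InOrderSketch {
    static void simulate(double[] sliceTime, double[] rollTime) {
        double clock = 0.0, idle = 0.0;
        double diskBusyUntil = 0.0;   // when the disk finishes its queued roll work
        for (int i = 0; i < sliceTime.length; i++) {
            // Can't run process i until the disk has finished rolling it in.
            if (diskBusyUntil > clock) { idle += diskBusyUntil - clock; clock = diskBusyUntil; }
            // While i runs, the disk rolls the previous process out and the next one in.
            double diskWork = (i > 0 ? rollTime[i - 1] : 0.0)
                            + (i + 1 < sliceTime.length ? rollTime[i + 1] : 0.0);
            diskBusyUntil = clock + diskWork;
            clock += sliceTime[i];    // process i gets its slice
        }
        System.out.printf("finished at %.1f with %.1f idle%n", clock, idle);
    }

    public static void main(String[] args) {
        simulate(new double[]{5, 3, 8}, new double[]{2, 6, 1});
    }
}
```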

Two Heap Scheduler is more like the algorithm Scott and I talked about at the beginning of the summer: you have an active heap of processes and a sleeping heap of processes, ordered by how much surplus or deficit of CPU quanta they've gotten (at the top of the active heap is a process that's received way more than its fair share; at the top of the sleeping heap is a process that's received way less). And you constantly swap processes between the heaps, rolling out the top active process and rolling in the top sleeping process. It gets a little complicated because the processes are different sizes, so you might have to roll out more than one active P to fit the top sleeping P, and then maybe there's room for another sleeping P, etc.; but it's not that bad overall.
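
And a bare-bones sketch of the two-heap machinery, using Java's PriorityQueue-- my own shorthand for the description above, not the actual code (in the real thing the excesses change every quantum, so the heap entries would have to be pulled out and re-inserted to stay ordered):

```java
// Bare-bones sketch of the two-heap machinery using java.util.PriorityQueue.
// Field names and the "excess" bookkeeping are my own shorthand, not the real code.
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.PriorityQueue;

class Job {
    final String name;
    final int memSize;   // memory the process occupies when rolled in
    double excess;       // quanta received minus quanta deserved so far

    Job(String name, int memSize) { this.name = name; this.memSize = memSize; }
}

class TwoHeapSketch {
    // Active heap: most over-served job on top. Sleeping heap: most under-served on top.
    final PriorityQueue<Job> active =
        new PriorityQueue<>(Comparator.comparingDouble((Job j) -> j.excess).reversed());
    final PriorityQueue<Job> sleeping =
        new PriorityQueue<>(Comparator.comparingDouble((Job j) -> j.excess));
    int freeMemory;

    TwoHeapSketch(int totalMemory) { this.freeMemory = totalMemory; }

    // One swap step: roll active jobs out until the neediest sleeping job fits,
    // then roll in as many under-served sleeping jobs as the freed memory allows.
    void swapStep() {
        Job wanted = sleeping.peek();
        if (wanted == null) return;
        ArrayDeque<Job> evicted = new ArrayDeque<>();
        while (freeMemory < wanted.memSize && !active.isEmpty()) {
            Job out = active.poll();        // most over-served gets rolled out
            freeMemory += out.memSize;
            evicted.add(out);
        }
        while (!sleeping.isEmpty() && sleeping.peek().memSize <= freeMemory) {
            Job in = sleeping.poll();       // most under-served gets rolled in
            freeMemory -= in.memSize;
            active.add(in);
        }
        sleeping.addAll(evicted);           // the rolled-out jobs go wait their turn
    }
}
```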

So I just finished writing the In Order Scheduler simulator, and I'm getting graphs from it now. They don't look too bad-- definitely a longer runtime than the optimal schedulers, but they reach a point of fairness a lot more quickly, since they're letting each process run just for its share before switching to the next one. One thing worries me, though: while looking over my graphs, I noticed that the processes don't always reach fairness right at the end of each schedule. Does that mean my graphing script is messed up, or my Java program, or what?

Ooh. Ooh. Idea. In the Start-End Scheduler, sometimes "overtime" is added-- the CPU is sometimes allowed to be idle while the processes run in and out. That would mess with the schedule length, and mean that the processes don't always reach fairness right at what the script thinks is the schedule's end. So I'll check over the graphs, and if it's only the SE ones that are having this problem and not the SL ones, then I'll be pretty certain that I'm right.

Week 6

I'm constantly refreshing my memory with regards to shell scripting and gnuplot-- or perhaps not refreshing, since a lot of the things I'm doing now I've never done before (just freshing, perhaps?) It's impressive how powerful the script has become. It plucks each process's goal share from the .proc file using awk, then writes another script on the fly for gnuplot to use; and on and on. I am growing more and more fond of Linux: everything you want to do really can be done simply by making and changing files.

And then sometimes I'm not so fond of Linux. Spent a frustrating day today trying desperately to get my machine to talk to the networked printer so that I could show my plots to Scott, Eliot, and Emery at our meeting... a quest that met in abject failure. Finally I just emailed the plots to them, which probably was not the best thing to do. Sigh. We all did finally get to look at them, though, and they're doing... pretty much what we wanted them to. Some interesting wrinkles-- depending on where it is in the schedule, a process will be consistently over- or under-share-- but the general picture is what it should be.

Big Problem of the day: when I try to run my simulator/plotter on sets of five or more processes, Java runs out of heap space. Whoops. Our limiting factor should definitely be run-time, not memory space. The professors suggested that the problem is that I'm creating an object for every permutation... and when you're permuting ten items (five starts, five ends), that's a lot of objects. Not to mention that each permutation has its own little array of event objects. So, yeah, way too many objects. Time to fix that.

Okay, that was pretty easily repaired. Now, instead of making a list of every possible permutation and then scanning them to find the shortest, my program makes a permutation, checks it against the current shortest time, throws it away, and goes on to the next permutation. And now it's handling sets of five processes with ease. I'll spend some time experimenting to see just how many processes it can handle before we have to start measuring run-time in fortnights.
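
The pattern, roughly, in Java: generate one permutation at a time, score it, keep only the best so far. The cost function here is a stand-in, not the real schedule-length computation:

```java
// Sketch of the memory fix: generate one permutation at a time, score it, keep only
// the best one seen, and let the rest be garbage-collected. 'scheduleLength' stands in
// for whatever the real simulator computes for a candidate ordering.
import java.util.function.ToDoubleFunction;

public class BestPermutation {
    public static int[] findBest(int n, ToDoubleFunction<int[]> scheduleLength) {
        int[] perm = new int[n];
        for (int i = 0; i < n; i++) perm[i] = i;
        int[] best = perm.clone();
        double bestCost = scheduleLength.applyAsDouble(perm);
        int[] c = new int[n];                 // Heap's algorithm counters
        int i = 0;
        while (i < n) {
            if (c[i] < i) {
                int swapWith = (i % 2 == 0) ? 0 : c[i];
                int tmp = perm[swapWith]; perm[swapWith] = perm[i]; perm[i] = tmp;
                double cost = scheduleLength.applyAsDouble(perm);
                if (cost < bestCost) { bestCost = cost; best = perm.clone(); }
                c[i]++;
                i = 0;
            } else {
                c[i] = 0;
                i++;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy cost: prefer permutations that keep items close to their original slots.
        int[] best = findBest(4, p -> {
            double d = 0;
            for (int k = 0; k < p.length; k++) d += Math.abs(p[k] - k);
            return d;
        });
        System.out.println(java.util.Arrays.toString(best));
    }
}
```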

Week 5

Scott is gone on vacation this week, and I haven't seen Eliot or Emery around either. So Xin and I are merrily working away on our own. I've finished coding the Start-End Scheduler, and am debugging (and debugging... and debugging...) The code is jammed with print statements, places where I try to figure out just what it is that's causing an array-index-out-of-bounds error most of the way through the simulation. And as soon as I figure that out, something else jumps up to make it fail. But I keep finding them and sorting them out, so things are coming along. Mostly I find that the errors are the result of not thinking of all the situations in which a piece of code might run; a method works fine until you call it on the last process of the set, etc.

I've gotten the Start-End Scheduler working. Next goal: make attractive graphs that give us all an idea of what these simulators are actually doing. I'm doing it by writing a shell script that first uses my Java program to generate some data files, then uses gnuplot to plot them. My plots get fancier and fancier with time: first just simple CPU vs. time plots, then share vs. time, then with horizontal dotted lines to show the goal shares, and on and on. I'll post them up on the project site soon, so all can share in their beauty.

Week 4

Xin has arrived! (He's the Amherst student-- well, recent Amherst grad-- who'll be working on the RIRO project with me.) He's very nice, and it's awfully nice to have someone around to talk to and bounce ideas (or wadded-up paper towels) off of. We spent a couple hours in front of a giant whiteboard, armed with markers, turning the Start-End Simulator from a vague idea to a complete and exhaustive algorithm. Scott stopped by to see how it was going, and seemed pleased with our progress. I am, too; and all this algorithm-mongering is making me look forward to taking Algorithms next semester.

The professors want both Xin and me to work on everything, not to divide subjects up too much; but to start with, I'm working on extending my simulator to use the Start-End Scheduler, while Xin looks at actually implementing the roll-in roll-out process using signals. A lot of the framework for the SE scheduler is in the code I've already written; but as I code, I keep finding ambiguities in our algorithm that we then have to decide on. It's fun. (Which, yes, means I am a giant geek.)

Today I attended the DaCapo Conference, in the UMass computer science building. Eliot was in charge of organizing it, and he invited Xin and me to come to the day's talks (with breakfast and lunch, too. Nice.) There were speakers-- mainly professors and people working for Adobe, Microsoft, etc., and also (I think) some grad students-- from a lot of different places, with contingents from Texas and Australia. Again, I noticed just how few women there are in this field-- of the 26 people in the room when I counted, 23 were men (and both of the other women, mysteriously, were from Texas.) Everybody's white or Asian, too. But demographics aside, the conference was... bracing. Some of the talks soared completely over my head; some I could follow almost to the finish; and most fell somewhere in between. The format of the presentations was interesting, too. Everyone used a powerpoint, and whenever someone had a question or an issue with something the speaker said, they called right out, often starting a conversation in the middle of the presentation. It was more confrontational than I was expecting, although I suppose that's useful for strengthening ideas.

Week 3

I finally bit the bullet and spent a day becoming exhaustively familiar with Emacs. Now I feel like a real programmer. And I have to admit that it's a lot more powerful than Nedit. Having only one window took some getting used to, but once I did, I quickly learned to appreciate it. My newfound familiarity came in handy more quickly than I was expecting: when I tried to reboot my machine, the GUI wouldn't load, leaving me stuck at the command line. But with Emacs, I could work on my simulator as easily as ever, and not a mouse-click in sight.

We-- the three professors and I-- have been meeting about once a week to brainstorm RIRO ideas and find out how things have been coming. One unexpected benefit of this internship is that I'm learning a lot about the non-teaching part of being a professor. It seems like a big part of the job is to think of interesting problems, try to solve them, and argue with people about getting money for them (not necessarily in that order). This is of particular interest to me because (shh, don't tell) I aspire to be a CS professor myself some day; so I'm glad to get a look at what the job's actually like. So far I really enjoy it.

I finished the Simple Linear Scheduler simulator (hooray!) and am getting actual data from it. Of course, the data doesn't tell us anything yet, since we have nothing to compare it to; but all in good time. At our last meeting we brainstormed ideas for other scheduling algorithms, and Eliot came up with a schedule based on start and end order that we think should work well. So that's probably what I'll be programming next; but the algorithm is very sketchy so far, so I'll need to flesh it out considerably before I start.

Week 2

A week of moving. I'm moving in to my permanent housing for the summer: Tyler dorm, up on the hill. It's absolutely packed with bugs-- I have found spiders, moths, flies, hornets, ants and beetles in my room so far-- and it has no air conditioning, but I love it anyway, because it has a kitchen! Sawa and I are cooking like mad: stuffed baked eggplant, buttermilk pancakes, sweet potato pie, lemon yogurt muffins, corn bread... mmm.

And, more importantly, I'm moving from the Amherst 007 lab to the UMass ALI lab. This definitely feels more like actual work: I've got my own cubicle in a lab full of graduate students, with all the professors' offices close at hand. It's a nice place, and I'm settling in with pleasure, but I do kind of wonder where all the girls are. I haven't met a single female professor or graduate student. Everyone else in the lab is a guy. Huh. Must be because girls' delicate brains aren't built to handle the complexities of computer science. (Joking. Please don't revoke my funding.)

I just met Eliot Moss and Emery Berger, the other two professors we'll be working with. We had our first official meeting in Emery's office to discuss where we'll be taking the project. Their attitude was enlightening: we started the conversation at a very high level, questioning whether RIRO was worth pursuing at all, what benefits we hope using it will bring, and how we'll find out if those benefits are feasible. I could follow what they were saying pretty much the whole time, but I felt like I didn't contribute much to the conversation. I think that'll pick up as I get more involved with the project, though.

I'm still working on the Simple Linear scheduler simulator. I'm trying to write the code in a very organized way, since it may get quite complicated, and other people will probably have to wade through it at some point. Instead of just plunging in with the Simple Linear algorithm, I sketched out a skeleton from the top down: the Simulator class calls a Scheduler, which is an abstract class with two abstract subclasses: PreCompScheduler (computes a schedule beforehand, based on the processes) and HeuristicScheduler (figures out a schedule on the fly, based on current shares). SimpleLinearScheduler is an implementation of PreCompScheduler, and it calls the Process class to manage its processes. Hopefully this framework will make the simulator easier to expand over the next few weeks.
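
Here's a skeletal version of that layout-- the method names and signatures are my guesses at the shape, not the actual project code:

```java
// Skeleton of the class layout described above; method names and signatures are my
// guesses at the shape, not the actual project code.
class Process {
    final String name;
    final double goalShare;   // the fraction of the CPU this process is entitled to
    Process(String name, double goalShare) { this.name = name; this.goalShare = goalShare; }
}

abstract class Scheduler {
    protected final Process[] processes;
    protected Scheduler(Process[] processes) { this.processes = processes; }
    // Which process should hold the CPU during quantum t?
    abstract Process nextToRun(long t);
}

// Computes an entire schedule up front from the process list.
abstract class PreCompScheduler extends Scheduler {
    protected PreCompScheduler(Process[] processes) { super(processes); }
    abstract void computeSchedule();
}

// Decides on the fly, based on the shares received so far.
abstract class HeuristicScheduler extends Scheduler {
    protected HeuristicScheduler(Process[] processes) { super(processes); }
}

class SimpleLinearScheduler extends PreCompScheduler {
    SimpleLinearScheduler(Process[] processes) { super(processes); }
    void computeSchedule() { /* build the fixed schedule here */ }
    Process nextToRun(long t) { return null; /* placeholder: look quantum t up in the schedule */ }
}

class Simulator {
    public static void main(String[] args) {
        // Read the process set, hand it to a scheduler, then step through the quanta...
    }
}
```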

Week 1

Back to Amherst! It seems like I just left. (This is because I did. I finished the year at Amherst College, went home for a week, and then drove right back up to the University of Massachusetts at Amherst.) It is gloomy and miserable here, not a proper sort of May at all. The fan I bought is useless-- what I need is more sweaters.

I'm starting out in Amherst's 007 lab. Scott (that's Scott Kaplan, one of the three professors I'm working with) and I discussed the RIRO project considerably before I went home, and it's definitely clear enough in my head to begin coding. So that's what I do: code, and code, and code some more, translating the Simple Linear Scheduler algorithm into a simulator that reads a file containing a list of processes, creates a schedule for the processes, and simulates a CPU running them based on the schedule. I'm doing all this in Java, on a Linux machine, using Nedit. (Yes. Yes, I know Nedit is for weenies. I'll learn Emacs sometime soon, I promise.)