June 25 - June 29We spent a lot of this week collecting data, trying to figure out why the computer prices perceptron is not working. Amy is just as perplexed as I am. With Aysun's code, we discovered the days on which the linear program is infeasible (which we dealt with by adding "noise" to each inequality and minimizing the noise in the objective function), and got the idea to look at whether the perceptron is performing differently on infeasible days vs. feasible days. I discovered that the perceptron is actually doing better on infeasible days, perhaps because there are more training data on these days. Nevertheless, the prediction errors are still unreasonably large, and I really don't know why. I modified my component prices perceptron to use the same update rule as the computer prices perceptron. The weights no longer overflowed, but when I tried it with a constant learning rate on another small example I came up with (not Amy's, since that one was for inequalities), they didn't converge. Rather, they oscillated between two sets of values. Is the perceptron convergence theorem some theoretical illusion? I was almost tempted to look into the mathematical magic myself, but I was daunted by the prospect of reinventing the wheel. Eventually, I simply used a decaying learning rate to force convergence, and looked at the perceptron's performance. Surprisingly, its predictions were much better than the computer prices perceptron's predictions. They were at least in the right ballpark, though still not as good as the nearest neighbor algorithm's predictions. So it seems that the perceptron itself is okay, and it must be whatever is different on the customer side (perhaps less training data, or the fact that we have only inequalities rather than equalities?) that is causing the computer prices perceptron to do so poorly. When I showed Amy how much better the perceptron is working on the supplier side, she said that I should implement a baseline predictor that just solves the system of linear equations to find the appropriate weights. I found a linear algebra suite on Koders started to implement this, but got sidetracked by more important tasks after I started testing my code and found a bug. Things are getting a little hectic. With so many different things going on at the same time, I'm finding it hard to concentrate on one task and finish it, and I feel like this is decreasing my efficiency. Hopefully I'll have time to fix the bug next week. After solving the system of linear equations with the linear algebra suite I found, I'm thinking of trying it with CPLEX so I can add "noise" to the unsolvable systems and solve them like the LPs on the customer side, and compare the time/space efficiencies of using the two solvers. To test the hypothesis that the perceptron is doing worse on days with fewer training data, we were going to write a script to run the computer prices predictors on more games, collect and analyze the data, and output the resulting statistics (such as the RMS errors for feasible days, infeasible days, and days on which no orders were received, which seemed to be, understandably, particularly problematic for the perceptron). Since neither Aysun nor I knew much about shell scripting, it turned out to be easier for us to write two Java programs, one to output the right data into a file as the game is played and another to read those data and analyze them. Since I had already been hacking the game log parser that was provided to us in the prediction challenge software, I undertook the first task. Eventually, though, I want to learn more about shell scripting. And LaTeX, which I've been slowly reading about. Unfortunately, I seem to be able to learn these skills quickly only when I have some immediate need. The qualifying round is next Monday, so on Friday, I made a copy of all our files with the packages flattened into the default package. I wrote a simple script to run the client code with our prediction agent from the command-line, and learned about screen, a terminal manager for Linux/UNIX. Amy, Aysun, and I met with Greg, the machine learning expert, to talk about what we've done so far and ask for suggestions. Interestingly, he thought that it is not clear whether the perceptron is doing the right thing in this case and that it would require some "black magic" to justify its behavior, even though I thought the perceptron was very simple! He mentioned a regression method and support vector machines, which I would like to look into. Someone also said at this week's machine learning reading group meeting (during which I felt very lost) that the Kalman filter is a special case of sequential Monte Carlo, which does not assume Gaussian noise, so that may be interesting to learn about as well. |