
June 18 - June 22

To fix the problem of the perceptron's weights changing too much, Amy suggested that we implement an alternative update rule that uses only the features, not the magnitude of the error. With this rule the weights no longer overflowed, but they were not converging either. I suppose this means that the training data are not linearly separable.
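
For the record, here is a minimal sketch of that rule as I understand it (the class and method names are mine, not our actual code): the step direction comes only from whether the prediction was too high or too low, never from how far off it was.

```java
/** Minimal sketch of the error-free update rule (my own naming, not our actual code). */
public class ErrorFreeUpdate {
    /** Each weight moves by a fixed fraction of its feature; only the sign of the
     *  error picks the direction, so a huge error can no longer blow up the weights. */
    static void update(double[] w, double[] x, double predicted, double target, double eta) {
        double direction = Math.signum(target - predicted); // +1, -1, or 0
        for (int i = 0; i < w.length; i++) {
            w[i] += eta * direction * x[i];
        }
    }
}
```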

I spent this week adding a convergence check to the perceptron, testing it with different sets of features, and debugging what I thought was rather odd behavior. It was a very frustrating week: sprinkling ugly print statements all over the code, watching the output grow beyond my puny human comprehension, then commenting out some of those print statements and adding others. I did force the perceptron to converge (i.e., the weights to stop changing) by using a decaying learning rate, but the error at the end was huge. The perceptron was making terrible predictions, and I was bewildered because I expected it to do better than that.
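
The convergence check amounted to something like the sketch below. The names and the 1/(1+epoch) decay schedule are stand-ins rather than our exact values: training stops once no weight moves by more than a small threshold during a full pass over the data.

```java
/** Sketch of the convergence check with a decaying learning rate (hypothetical names and values). */
public class PerceptronTraining {
    static void train(double[][] X, double[] y, double[] w,
                      double eta0, double epsilon, int maxEpochs) {
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            double eta = eta0 / (1 + epoch);   // decaying learning rate shrinks the updates over time
            double maxChange = 0.0;            // largest single weight change seen this epoch
            for (int n = 0; n < X.length; n++) {
                double predicted = dot(w, X[n]);
                double direction = Math.signum(y[n] - predicted); // error-free rule from the earlier sketch
                for (int i = 0; i < w.length; i++) {
                    double delta = eta * direction * X[n][i];
                    w[i] += delta;
                    maxChange = Math.max(maxChange, Math.abs(delta));
                }
            }
            if (maxChange < epsilon) {         // weights have (numerically) stopped changing
                return;
            }
        }
    }

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }
}
```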

I eventually gave up and switched to other tasks. Aysun implemented Amy's linear programming approach to the computer price prediction task using CPLEX, a solver for LP problems. In this linear program, the variables are the same weights as those in the perceptron, and the information we acquire every day about which offers were won provides the set of inequalities. The objective function is to minimize the sum of the weights, which seems rather arbitrary to me; the justification is that the weights were always too large after training the perceptron. This approach worked surprisingly well (compared to the perceptron, that is), but still did not perform as well as the method we started from, Botticelli's price-probability model.
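
For anyone curious, my reading of the formulation looks roughly like the sketch below in the CPLEX Java API. The variable names, the nonnegativity bounds on the weights, and the direction of each inequality are my own guesses at the structure, not Aysun's actual code.

```java
import ilog.concert.IloException;
import ilog.concert.IloLinearNumExpr;
import ilog.concert.IloNumVar;
import ilog.cplex.IloCplex;

/** Rough sketch of the LP as I understand it (structure only; details are illustrative). */
public class WeightLP {
    /**
     * features[k] describes offer k, prices[k] is its price, and won[k] says whether we
     * won it.  The weights are the LP variables, each observed offer contributes one
     * inequality, and the objective is simply the sum of the weights.
     */
    static double[] solve(double[][] features, double[] prices, boolean[] won)
            throws IloException {
        int numWeights = features[0].length;
        IloCplex cplex = new IloCplex();
        // Nonnegative weights (the bounds are my assumption).
        IloNumVar[] w = cplex.numVarArray(numWeights, 0.0, Double.MAX_VALUE);

        // Objective: minimize the sum of the weights (the part that seems arbitrary to me).
        IloLinearNumExpr objective = cplex.linearNumExpr();
        for (IloNumVar wi : w) {
            objective.addTerm(1.0, wi);
        }
        cplex.addMinimize(objective);

        // One inequality per observed offer: won offers push the predicted price one way,
        // lost offers the other.  (The directions here are just for illustration.)
        for (int k = 0; k < features.length; k++) {
            IloLinearNumExpr predicted = cplex.linearNumExpr();
            for (int i = 0; i < numWeights; i++) {
                predicted.addTerm(features[k][i], w[i]);
            }
            if (won[k]) {
                cplex.addGe(predicted, prices[k]);
            } else {
                cplex.addLe(predicted, prices[k]);
            }
        }

        double[] solution = cplex.solve() ? cplex.getValues(w) : null;
        cplex.end();
        return solution;
    }
}
```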

The prediction challenge qualifying round is drawing near (July 2), so I made sure that we could connect to the server with the provided client code and our own prediction code and start making predictions. I ran into some difficulties, and I eventually resolved all of them except one. To make development easier, we split our code into packages (so that the component price prediction code is separate from the computer price prediction code, for example). Although everything compiles and runs fine locally, a mysterious ClassNotFoundException occurs when I try to connect to the server. The error disappears if I flatten the directory structure and put everything in the default package. This is strange, because all the client/server code is in the same package and Eclipse doesn't complain about any unresolved references. I emailed the developer of the prediction challenge and its support code about the problem, but he didn't know why it occurs either. At least we can get everything to work without packages, though.

Finally, I implemented a perceptron approach to the component price prediction problem as well, just to see how it would do compared to the current method and perhaps to uncover bugs in my perceptron code. I really do suspect that the perceptron is buggy, since it is doing so poorly, but I'm not sure: it works fine on a small example Amy came up with. I started with the original perceptron update rule, which adjusts each weight by a fraction of the product of the overall error and the corresponding feature, and... yes, the nightmare of "NaN"s and "Infinity"s came back to haunt me! I'm really starting to feel the disheartening disparity between theory and practice, ideology and reality. I was really hoping to use the error, since it is accurate in this case, unlike on the customer side. How can I cross to the other, greener side of this great chasm?
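
For contrast with the error-free rule above, this is the shape of the original, error-driven update, again with my own names. With unscaled features and a learning rate that isn't small enough, the error can grow on every pass, the weights run off toward positive or negative Infinity, and the next dot product turns everything into NaN, which I suspect is exactly what I'm seeing.

```java
/** The original, error-driven update (my own naming; the structure is the standard delta rule). */
public class ErrorDrivenUpdate {
    /** Each weight moves by a fraction of (error * feature).  If the learning rate is too
     *  large for the scale of the features, the error grows on every pass, the weights run
     *  off to +/-Infinity, and the next dot product produces NaN. */
    static void update(double[] w, double[] x, double predicted, double target, double eta) {
        double error = target - predicted;
        for (int i = 0; i < w.length; i++) {
            w[i] += eta * error * x[i];
        }
    }
}
```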

In happier news, I went to the machine learning reading group this Wednesday. I was pleasantly surprised to find other cognitive science people there. I wanted to know who everyone was and to introduce myself, but that didn't seem to be the custom, so I remained silent. The topic this week was computational linguistics, which is not my favorite field, but one that has piqued my interest ever since I discovered that a probabilistic version of the Earley parser I learned about in my compilers class is actually used as a model of human language processing (paper). Unfortunately, I didn't understand much of the talk, but it did get me to look up unfamiliar concepts such as K-L divergence and the Chinese restaurant process.

