Week 4

We went over the results I had received from running the decision trees on the spambase dataset. Dr. Dutta asked me to do the same thing with a probability estimate tree (PET), to see how many ties result, and to see how Laplace smoothing affects performance.

We reviewed the paper "Learning to Rank: From Pairwise to Listwise Approaches." Dr. Dutta introduced SVMs and assigned an introductory paper to read.

We went over how to download WEKA's source files and what Dr. Dutta wanted me to do with them: set them up for editing with Eclipse and find the module that implements the Laplace correction.

I ran a PET on the spambase dataset with and without Laplace smoothing, but there are so many instances in the dataset that I was unable to make a comparison. I did it again on the iris dataset, which has far fewer instances, and was able to see that almost all the probabilities are ties: every instance has a probability of 0, 0.3, or 0.971. Laplace smoothing improved this slightly, but there were still many ties.
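To see why Laplace smoothing reduces (but does not eliminate) these ties, here is a minimal sketch of leaf probability estimation; the function name and the example counts are my own illustration, not WEKA's code. The raw estimate at a leaf is class_count / leaf_total, so two pure leaves of different sizes both report 1.0 and tie; the Laplace-corrected estimate (class_count + 1) / (leaf_total + num_classes) separates them.

```python
from fractions import Fraction

def leaf_probability(class_count, leaf_total, num_classes, laplace=False):
    """Probability estimate for a class at a decision-tree leaf.

    Raw estimate:      class_count / leaf_total
    Laplace-corrected: (class_count + 1) / (leaf_total + num_classes)
    (Illustrative helper, not taken from WEKA's source.)
    """
    if laplace:
        return Fraction(class_count + 1, leaf_total + num_classes)
    return Fraction(class_count, leaf_total)

# Two pure leaves of different sizes on a 3-class problem (like iris):
# the raw estimates tie at 1, but Laplace smoothing breaks the tie,
# giving the larger leaf a higher (more trustworthy) estimate.
raw_small = leaf_probability(2, 2, 3)           # -> 1
raw_large = leaf_probability(40, 40, 3)         # -> 1
lap_small = leaf_probability(2, 2, 3, True)     # -> 3/5
lap_large = leaf_probability(40, 40, 3, True)   # -> 41/43
```

Leaves with identical counts still produce identical smoothed estimates, which is consistent with the many ties remaining on iris even after smoothing.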

I successfully downloaded and unzipped WEKA's source files, but I still had trouble editing them in Eclipse. After much fruitless Internet searching, I finally found a wonderful website, which I was afterwards unable to find again. It outlined, step by step, how to set WEKA up as a project in Eclipse. Following this tutorial, I was able to work with WEKA in Eclipse.

Iris probabilities
Iris probabilities with Laplace smoothing

