Week 3
We spoke on Skype this week. Using the iris dataset as an example, Dr. Dutta explained the .arff file format and how to convert a .csv file to it. We then went over the Explorer GUI of WEKA; she explained the features I would be working with and recommended that I spend time experimenting with them. She asked me to download the Java source files for WEKA and set them up for editing in Eclipse, to read about precision and recall as measures of how well a decision tree performs, and to build an .arff file from the spambase dataset and then run and compare two decision trees on it, one evaluated with 10-fold cross-validation and one with a 70-30 train-test split.
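For reference, precision and recall for one class (say, spam treated as the positive class) can be computed from the counts of true positives, false positives, and false negatives. The following is only a minimal sketch in plain Java: the class name, method names, and the counts in main are made up for illustration and are not part of WEKA.

```java
public class PrecisionRecall {

    // Precision: of all instances predicted positive, the fraction that really are positive.
    static double precision(int truePositives, int falsePositives) {
        int predictedPositive = truePositives + falsePositives;
        // Convention assumed here: return 0 when nothing was predicted positive.
        return predictedPositive == 0 ? 0.0 : (double) truePositives / predictedPositive;
    }

    // Recall: of all instances that really are positive, the fraction the classifier found.
    static double recall(int truePositives, int falseNegatives) {
        int actualPositive = truePositives + falseNegatives;
        // Convention assumed here: return 0 when there are no positive instances.
        return actualPositive == 0 ? 0.0 : (double) truePositives / actualPositive;
    }

    public static void main(String[] args) {
        // Hypothetical counts, not results from the spambase experiment.
        int tp = 80, fp = 10, fn = 20;
        System.out.println("precision = " + precision(tp, fp)); // 80 / (80 + 10)
        System.out.println("recall    = " + recall(tp, fn));    // 80 / (80 + 20)
    }
}
```

A tree can score high on one measure and low on the other, which is why both are worth checking alongside overall accuracy.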
I manually converted the spambase dataset from .csv to .arff, though I realized afterwards that it would have been simple to write a program to do it. I then used what Dr. Dutta had shown me to build the two decision trees and compare them. The results were almost equally accurate, with cross-validation perhaps performing slightly better.
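A conversion program of the kind mentioned above might look like the following sketch. It rests on several assumptions: the .csv file has no header row, every column except the last is numeric, and the last column is a nominal class label. The class name CsvToArff and the attribute names attr1, attr2, ... are placeholders chosen for illustration, not the real spambase attribute names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.TreeSet;

public class CsvToArff {

    // Builds ARFF text from header-less CSV lines.
    // Assumes all columns are numeric except the last, which is the class label.
    static String convert(List<String> csvLines, String relation) {
        StringBuilder data = new StringBuilder();
        TreeSet<String> classLabels = new TreeSet<>(); // sorted, distinct class values
        int columns = -1;

        for (String line : csvLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split(",");
            if (columns == -1) {
                columns = fields.length;
            } else if (fields.length != columns) {
                throw new IllegalArgumentException("Inconsistent column count: " + trimmed);
            }
            classLabels.add(fields[columns - 1].trim());
            data.append(trimmed).append('\n');
        }
        if (columns < 2) {
            throw new IllegalArgumentException("Need at least one attribute and a class column");
        }

        StringBuilder arff = new StringBuilder();
        arff.append("@relation ").append(relation).append("\n\n");
        for (int i = 1; i < columns; i++) {
            // Placeholder names; real attribute names would have to be supplied separately.
            arff.append("@attribute attr").append(i).append(" numeric\n");
        }
        arff.append("@attribute class {").append(String.join(",", classLabels)).append("}\n\n");
        arff.append("@data\n").append(data);
        return arff.toString();
    }

    // Usage: java CsvToArff input.csv output.arff
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), convert(lines, "spambase").getBytes());
    }
}
```

The class attribute is declared nominal rather than numeric because a decision tree classifier needs a nominal class to predict.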
Result of 70-30 train-test split
Result of 10-fold cross-validation