Week 7

Before I continued with PLSA, I took some time to look over my algorithms and results from all of my other attempts to see if I could find any errors or come up with ideas for improvement. For the most part, I just wanted to be thorough, but I especially wanted to see if there was anything wrong with the algorithms that used the noun attributes. I thought it was strange that they seemed to perform exactly the same whether I used all of the attributes or used an entropy calculation to remove attributes that had little information value. I did fix one error, which changed the results slightly, but the algorithms actually ended up doing better when I didn't calculate entropy. I'm still a little suspicious of that, but I'm moving on for now. I also looked at individual children's vocabularies to see if any of them varied more than others. I ran each algorithm 10 times and plotted the results, and it seems that the children who know very few words vary quite a lot in accuracy, while everyone else has very little variation. I think this is to be expected, though, and it doesn't provide much insight. At the end of the week, I finally moved back to PLSA. I couldn't find a good gem to use, and when I followed the algorithm and tried to write it myself, it was very computationally costly, so I'm looking into writing it in Python. I think I've found a good package, though it doesn't appear to have much documentation.
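
For reference, here is a minimal sketch of the standard PLSA EM updates in plain Python. It is not the code or the package mentioned above, and it makes several assumptions: the function name `plsa`, the toy `counts` matrix, and the choice to treat each row as a "document" (for example, one child) and each column as a "word" are all hypothetical and only there for illustration.

```python
import random


def plsa(counts, n_topics, n_iters=50, seed=0):
    """Fit PLSA by EM on a document-by-word count matrix (list of lists).

    Returns (p_z_d, p_w_z): P(topic | doc) and P(word | topic).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalized(row):
        total = sum(row)
        return [x / total for x in row]

    # Random starting point; every row is a probability distribution.
    p_z_d = [normalized([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalized([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iters):
        # Expected-count accumulators that the M-step will normalize.
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue  # zero counts contribute nothing
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    r = n * joint[z] / total
                    new_z_d[d][z] += r
                    new_w_z[z][w] += r
        # M-step: turn expected counts back into distributions.
        # Rows with no mass keep their previous values.
        p_z_d = [normalized(row) if sum(row) > 0 else old
                 for row, old in zip(new_z_d, p_z_d)]
        p_w_z = [normalized(row) if sum(row) > 0 else old
                 for row, old in zip(new_w_z, p_w_z)]
    return p_z_d, p_w_z


# Toy data (made up): 4 "documents" over 4 "words".
counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
p_z_d, p_w_z = plsa(counts, n_topics=2)
for row in p_z_d:
    print([round(p, 3) for p in row])
```

Each EM iteration touches every (document, word, topic) combination with a nonzero count, so the work grows with the product of those three sizes. That is probably a large part of why a hand-written loop version feels costly, and why an array library or an existing package may help.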