Week 6

This week, I found a working SVD/LSA gem and ran it on the data, but the results were no better than anything else I have tried so far. I kept reading and found that a related technique, PLSA, often outperforms LSA, so I began trying to implement it. However, PLSA uses a decomposition method other than SVD, and I have yet to find a gem that can handle it.

I worked on this for a while, but changed direction partway through the week when Professor Colunga gave me an additional set of data for all the nouns. This set contains averaged adult ratings of where each noun falls on scales such as light to heavy or small to large, or how much it has of a certain color. There are almost 100 attributes. With explicit attributes like these, there is no need to extract latent concepts. I tried a couple of implementations with this set, and the results were actually much lower than most of the others I've tried (though still higher than average). They were interesting in one way, though: for example, if a child only knew words for people, like baby, mommy, and daddy, nearly all of the other words for people were at the very top of their list of recommendations. I only got to look in detail at the smaller vocabularies, but the pattern seems significant. Even so, I will probably move back to PLSA to see if that algorithm performs better.
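
To make the attribute-based idea concrete, here is a minimal Ruby sketch of one way recommendations could be ranked from per-noun ratings. It is not the implementation used for the results above. The ratings, the three attribute columns, and the method names (cosine, centroid, recommend) are all made up for illustration, and ranking by cosine similarity to the average of the known words is an assumption.

```ruby
# Hypothetical ratings: each noun maps to [person-like, size, weight].
# These numbers are invented for illustration; the real set has almost
# 100 attributes per noun.
RATINGS = {
  "baby"    => [1.0, 0.2, 0.2],
  "mommy"   => [1.0, 0.6, 0.5],
  "daddy"   => [1.0, 0.7, 0.6],
  "grandma" => [1.0, 0.6, 0.5],
  "brother" => [1.0, 0.4, 0.4],
  "truck"   => [0.0, 0.9, 1.0],
  "spoon"   => [0.0, 0.1, 0.1],
  "rock"    => [0.0, 0.3, 0.9]
}

# Cosine similarity between two equal-length numeric vectors.
def cosine(a, b)
  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

# Average a list of vectors attribute by attribute.
def centroid(vectors)
  vectors.transpose.map { |column| column.sum / column.size.to_f }
end

# Rank the words the child does not know yet by how closely their
# ratings match the average profile of the words the child does know.
def recommend(ratings, known, limit = 3)
  profile = centroid(known.map { |word| ratings.fetch(word) })
  ratings
    .reject { |word, _| known.include?(word) }
    .map { |word, vector| [word, cosine(profile, vector)] }
    .sort_by { |_, score| -score }
    .first(limit)
    .map(&:first)
end

puts recommend(RATINGS, %w[baby mommy daddy], 2).inspect
```

With this toy data, a child who knows only baby, mommy, and daddy gets the two remaining people words ranked above truck, spoon, and rock, which mirrors the pattern described above. Whether cosine similarity is the right measure for the real ratings is an open question.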