Week 4

I finally got to start working with the actual data this week, and I tried out several different approaches. I started by measuring the success rates of two baselines: recommending random words, and recommending the words that are most commonly known across all of the children. Then I moved on to several k-NN approaches to the problem and compared the results. Everything I tried did better than random, but about the same as recommending the most common words. (I've included rough sketches of each of these approaches at the end of this entry.)

Professor Colunga pointed out to me that the words are organized into categories and that we might be able to take advantage of this, which gave me an idea. I tried a couple of approaches that involve calculating the probability that a child knows each word, conditioned on each of the words the child already knows, and using this information along with other attributes of the child to make recommendations. I figured that this approach would automatically pick up on any patterns based on word categories, but so far the results have been about the same as k-NN.

At the end of the week, I started rereading some of the papers I read earlier, and one on collaborative filtering, singular value decomposition, and latent semantic analysis stood out. I had some difficulty understanding the paper before, but it makes a little more sense now that I have actual data to think in terms of. It's going to take more research to fully understand this approach, but I think it will pick up on many of the same patterns that the probability approach does (and probably many that it doesn't) and put them to better use.
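To make the two baselines concrete, here's a minimal sketch. Everything in it is illustrative: I'm assuming the data can be arranged as a binary child-by-word NumPy matrix (rows are children, columns are words, 1 means the child knows the word), and the names and fake stand-in data are my own placeholders, not the project's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up stand-in data: 500 children x 680 words, 1 = child knows the word.
known = (rng.random((500, 680)) < 0.3).astype(int)

def recommend_random(child_known, k, rng):
    """Recommend k words the child doesn't already know, uniformly at random."""
    candidates = np.flatnonzero(child_known == 0)
    return rng.choice(candidates, size=k, replace=False)

def recommend_most_common(child_known, k, popularity):
    """Recommend the k unknown words most commonly known across all children."""
    candidates = np.flatnonzero(child_known == 0)
    return candidates[np.argsort(-popularity[candidates])[:k]]

popularity = known.sum(axis=0)  # how many children know each word
print(recommend_random(known[0], 5, rng))
print(recommend_most_common(known[0], 5, popularity))
```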
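The k-NN approaches I tried varied in the details, so here's just one plausible variant as a sketch: score each unknown word by how many of the child's nearest neighbors know it, with "nearest" measured by Jaccard similarity over known words. The similarity measure and neighbor count are assumptions for illustration, and it reuses the `known` matrix from the sketch above.

```python
import numpy as np

def knn_scores(known, child_idx, n_neighbors=10):
    """Score unknown words by how often the child's nearest neighbors know them."""
    child = known[child_idx]
    inter = known @ child                     # size of overlap with each other child
    union = known.sum(axis=1) + child.sum() - inter
    sim = inter / np.maximum(union, 1)        # Jaccard similarity
    sim[child_idx] = -1.0                     # don't count the child itself
    neighbors = np.argsort(-sim)[:n_neighbors]
    scores = known[neighbors].mean(axis=0)    # fraction of neighbors knowing each word
    scores[child == 1] = -1.0                 # never recommend already-known words
    return scores

# e.g. top 5 recommendations for child 0:
# print(np.argsort(-knn_scores(known, 0))[:5])
```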
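For the probability idea, here's roughly what I mean: estimate P(child knows w | child knows v) from co-occurrence counts across all children, then score each unknown word by combining those conditionals over the words the child already knows. Averaging with Laplace smoothing is my assumed combination rule in this sketch; the exact combination, and how the child's other attributes fold in, isn't pinned down here.

```python
import numpy as np

def conditional_prob_scores(known, child_idx, smoothing=1.0):
    """Score unknown words by the average of P(knows w | knows v) over known v."""
    cooc = known.T @ known                    # children who know both v and w
    counts = np.diag(cooc)                    # children who know v
    # P(w | v) with Laplace smoothing; rows indexed by v, columns by w
    p_w_given_v = (cooc + smoothing) / (counts[:, None] + 2 * smoothing)
    child = known[child_idx]
    known_words = np.flatnonzero(child == 1)  # assumes the child knows >= 1 word
    scores = p_w_given_v[known_words].mean(axis=0)
    scores[child == 1] = -1.0                 # only score unknown words
    return scores
```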
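I don't fully understand the SVD/collaborative filtering paper yet, so this last sketch is only my current guess at the basic mechanic: take a low-rank SVD of the (column-centered) child-by-word matrix and read the reconstructed entries as soft scores for the unknown words. The rank is an arbitrary placeholder, and the method in the paper almost certainly has more to it than this.

```python
import numpy as np

def svd_scores(known, child_idx, rank=20):
    """Score words for one child from a rank-`rank` reconstruction of the matrix."""
    col_means = known.mean(axis=0)
    U, s, Vt = np.linalg.svd(known - col_means, full_matrices=False)
    approx = (U[:, :rank] * s[:rank]) @ Vt[:rank] + col_means
    scores = approx[child_idx].copy()
    scores[known[child_idx] == 1] = -1.0      # only recommend unknown words
    return scores
```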