The Project Journal

Week 4: July 6 - July 12

I mostly focussed on presenting a paper this week. The paper describes an algorithm for applying transfer learning to the problem of named entity recognition. It is called "Exploiting Feature Hierarchy for Transfer Learning in Named Entity Recognition" and it is by Arnold, Nallapati and Cohen from the Machine Learning Department at CMU.

"Transfer learning" is the technique where information gained from learning one task is used to help learn a different task (this is similar to what I did last week, with the experiments). Named entity recognition is the problem of recognising particular kinds of words in text, like proper names or protein names. Two kinds of transfer learning are dealt with in this paper. The first is domain adaptation, where the task remains the same (for example, identifying proper names) but the datasets are different, like news vs. e-mail. The second is multi-task learning, where both the tasks and the datasets are different: for instance, you could use what was learned from recognising names in a news corpus to help identify proteins in a biological journal corpus.

The paper cites three earlier techniques, and melds them together to form a new technique, based on parsing trees. The authors then test their model against each of the three earlier models on its own, and show that it is clearly better.

Presenting this paper was a valuable experience for me. I had to understand the paper thoroughly (I got asked a couple of questions that I couldn't answer, and Jiazhong had to explain them for me). I learned how research is presented and verified, and how you're expected to find criticism. After just observing the research group until now, I was suddenly a participant!