6/29/2012: Further Annotations
I continued refining my annotations on non-literal language. After our discussion on Friday I felt like I had a better idea of which phrases belonged in which category. Ani asked me to look at another 10 leads for each category, so I picked 10 more at random and annotated them with the categories I had come up with. I ended up refining and moving the categories around one last time based on the advice from Ani and my partners during the meeting. I had forgotten the name of one of the categories they had explained, so I had labeled it the TR category. However, this category was meant to be phrasal verbs. I refreshed myself on the definition of a phrasal verb and then moved the phrases around accordingly. Surprisingly, I had already done a very good job of distinguishing them.
These new leads shifted the results away from what they originally had been. Originally the politics category looked like it would be the least likely to contain non-literal language, because it had significantly more leads that didn’t contain any non-literal language. When I annotated 10 more leads, I found that the politics category evened out with the rest of the categories in this respect. It seems more likely that the amount of non-literal language doesn’t differ much from category to category.
Moving on with this side project, Ani wanted me to look into searching for these phrases throughout the entire corpus. She was interested in seeing how the phrases differ from sentence to sentence, and whether there is a way to search for these non-literal phrases, including phrases with replaceable words inside. For instance, one of the phrases I found was “pull all Israeli settlers out”, where “all Israeli settlers” is the part that will be replaced by other words in different sentences. I need to play around with ways to write a script that can search for that.
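As a starting point, a search like this could be sketched with a regular expression that fixes the two ends of the phrase and allows a few filler words in between. This is only a rough sketch under my own assumptions: the function names `gap_pattern` and `find_phrase`, the `max_gap` limit of four words, and the sample sentences (apart from the one phrase quoted above) are all made up for illustration and are not the script we actually used.

```python
import re


def gap_pattern(head, tail, max_gap=4):
    """Build a regex for `head`, then 0..max_gap filler words, then `tail`.

    Hypothetical helper: the filler words are captured in group 1.
    """
    # One filler word: a word character, then letters/apostrophes/hyphens,
    # followed by whitespace.
    filler = r"(?:\w[\w'-]*\s+){0,%d}" % max_gap
    return re.compile(
        r"\b" + re.escape(head) + r"\s+(" + filler + r")" + re.escape(tail) + r"\b",
        re.IGNORECASE,
    )


def find_phrase(sentences, head, tail, max_gap=4):
    """Return (sentence, filler) pairs for every match of head ... tail."""
    pattern = gap_pattern(head, tail, max_gap)
    hits = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            hits.append((sentence, match.group(1).strip()))
    return hits


# Invented sample sentences; only the first phrase comes from my annotations.
sample = [
    "Sharon plans to pull all Israeli settlers out of Gaza.",
    "They will pull troops out soon.",
    "He pulled the door open.",
]

for sentence, filler in find_phrase(sample, "pull", "out"):
    print(repr(filler), "in:", sentence)
```

This only matches the exact word forms given, so an inflected form like "pulled ... out" would be missed unless the words were stemmed or the pattern listed the variants. Capping the gap at a few words is also a guess meant to keep unrelated "pull" and "out" pairs from matching across a long sentence.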
On another note, it seems we have enough data now to begin designing a survey or HIT for Mechanical Turk. To start this process, Ani asked us to think of questions that we wanted answered about the leads.
Additionally, this week we are playing around with machine learning for educational purposes. We were given the task of finding a feature or features that would help a computer classify an article into its correct category. Because this was a sort of eye-opening experience for me, I think I will give it its own entry.