PHILLY:New In The City

7/11/2012: Continuing The Search

I have continued my search for the non-literal phrases throughout our whole database. I've made it all the way to my PV category. I've decided to make sure I look for all forms of each phrase. For instance, for the phrase "line up," I need to search for "line up" but also "lined up" and "lining up." The PV category has more phrases that are used both literally and non-literally. I looked at the xml files created by the Stanford parser to see if there was any difference between the syntax trees when a phrase is used literally versus non-literally. I didn't find any difference, which is not surprising given the research that has already been done on non-literal phrases. Unfortunately, this meant that to distinguish which sentences used the phrases literally or non-literally, I had to read through them all and label them by hand.
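A minimal sketch of searching for every form of a phrase could look like the following. The helper names (phrase_pattern, count_matching_sentences) and the explicit list of verb forms are illustrative assumptions, not the search that was actually run.

```python
import re

def phrase_pattern(verb_forms, particle):
    """Build a case-insensitive regex matching any listed verb form
    followed by the particle, e.g. "lined up" or "lining up"."""
    alternatives = "|".join(re.escape(form) for form in verb_forms)
    return re.compile(
        r"\b(?:" + alternatives + r")\s+" + re.escape(particle) + r"\b",
        re.IGNORECASE,
    )

# Assumption: the inflected forms are listed by hand for each phrase.
LINE_UP = phrase_pattern(["line", "lines", "lined", "lining"], "up")

def count_matching_sentences(sentences, pattern):
    """Count the sentences that contain at least one match."""
    return sum(1 for sentence in sentences if pattern.search(sentence))
```

The word boundaries keep the pattern from matching inside longer words such as "pipeline" or "upgrade". It only finds the verb and particle when they are adjacent, so a split use like "line them up" would need a looser pattern.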

I also wrote a Python script that traverses the Stanford parser xml files and rewrites the articles so that each article text file has one sentence per line. I was then able to use these files to count how many sentences contained the non-literal phrases.
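The conversion to one sentence per line can be sketched roughly as below. This is not the original script: it assumes the xml follows a CoreNLP-style layout in which each sentence element holds tokens/token/word children, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def sentences_from_xml(xml_text):
    """Return one string per sentence, built from the <word> tokens.
    Assumes a layout of <sentence><tokens><token><word>...</word>."""
    root = ET.fromstring(xml_text)
    sentences = []
    for sentence in root.iter("sentence"):
        words = [w.text or "" for w in sentence.findall("./tokens/token/word")]
        # Skip <sentence> elements that carry no tokens (for example,
        # sentence-number references in other parts of the file).
        if words:
            sentences.append(" ".join(words))
    return sentences

def write_one_sentence_per_line(xml_path, out_path):
    """Read a parser xml file and write its sentences, one per line."""
    with open(xml_path, encoding="utf-8") as src:
        sentences = sentences_from_xml(src.read())
    with open(out_path, "w", encoding="utf-8") as dst:
        for sentence in sentences:
            dst.write(sentence + "\n")
```

Because the sentences are rebuilt from tokens, punctuation ends up separated by spaces ("I lined up ."), which is fine for phrase counting but differs from the original article text.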

My results varied by phrase. Some of the phrases were used mostly literally, like "get it," and others mostly non-literally, like "count on." There were also some phrases that were used literally and non-literally about equally often, like "go up." All of these examples are from the PV category.