PHILLY: New In The City

08/16/2012: The Last Meeting

We have our last meeting tomorrow because I'm leaving the day after. I finally got the rest of the mark-ups for Tamil and Bulgarian. I also got an e-mail from a student Ani had asked to mark up the sentences in Mandarin Chinese. She sent me the mark-up for the first category (non-literal noun phrases). I can't analyze all of the categories until she finishes them. That won't be until after I head home, but I will still get to include it in the final report.

                She also marked up the translations differently than the other two annotators, even though Ani sent her the tags that she had used. The problem is that Chinese is structured differently than Bulgarian and most other languages. She added two categories because the Chinese structure produces some different errors. Mandarin actually had the same number of mistranslations as Tamil, but they weren't all the same phrases. Most of them overlapped: in each language only 3 of the mistranslated sentences were not also mistranslated in the other. The last thing I looked at with the unfinished Chinese mark-up was that all three languages failed on the same 9 out of 17 phrases in the non-literal noun phrase category (about 53%). It is pretty impressive that none of the three could translate half of the phrases. I mean, I guess it's not really cool because things aren't being translated, but it helps my cause.
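                The overlap counts above are just set arithmetic, so here is a minimal sketch of how I could check them. The phrase IDs are invented placeholders, not the real mark-up; only the totals (17 noun phrases, 9 failing in all three languages) come from my counts, and the "3 not shared" figure for Tamil and Mandarin is my reading of the comparison.

```python
def failure_rate(failed, total):
    """Percentage of phrases that failed to translate, rounded to one decimal."""
    return round(100 * len(failed) / total, 1)


# Hypothetical phrase IDs (1..17) marked as mistranslated in each language.
# These sets are made up for illustration; they are not the actual annotations.
TOTAL_NOUN_PHRASES = 17
failed = {
    "bulgarian": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "tamil": {1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13},
    "mandarin": {1, 2, 3, 4, 5, 6, 7, 8, 9, 14, 15, 16},
}

# Phrases that no language managed to translate.
failed_everywhere = set.intersection(*failed.values())
shared_rate = failure_rate(failed_everywhere, TOTAL_NOUN_PHRASES)

# Phrases mistranslated in one of Tamil/Mandarin but not the other.
only_tamil = failed["tamil"] - failed["mandarin"]
only_mandarin = failed["mandarin"] - failed["tamil"]

print(f"Failed in all three: {len(failed_everywhere)} of {TOTAL_NOUN_PHRASES} ({shared_rate}%)")
print(f"Tamil only: {len(only_tamil)}, Mandarin only: {len(only_mandarin)}")
```

                Keeping the failures as sets per language makes it easy to add the remaining Mandarin categories once the mark-up is finished.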

                Here are the rest of the basic stats by category. For Bulgarian, 40% of the familiar non-literal verb phrases didn't translate. This makes sense, again, because the words in the phrase are supposed to relate to the non-literal meaning. The idiom category, where the words don't relate to the non-literal meaning, had 83% that didn't translate. This also makes sense: if the words don't help you remember the meaning, then the phrase is less likely to translate. The last category is phrasal verbs, and 47% of these didn't translate. Overall, 53% of the non-literal phrases didn't translate.

                It was a little different for Tamil. Tamil did a little better in the idiom category: 66% of the phrases weren't translated correctly. This time, non-literal noun phrases were trickier to translate than familiar non-literal verb phrases. Some of the noun phrases were pretty out there and not familiar, but the Bulgarian translation still did better on them. For phrasal verbs, 64% of the phrases didn't translate correctly. Overall, 65% of the phrases didn't translate, which is more than for Bulgarian. We do know that the structure of Tamil sentences differs more from English than the structure of Bulgarian sentences does. This may be why Tamil had more garbled translations overall.

                We also finally settled on a mixture of positive and negative questions for the Mechanical Turk study, and we sent out a 4th batch with these altered questions. This time, though, people didn't respond as quickly. Maybe people got sick of answering questions on our leads or something. Anyway, we are still getting results, but the batch is not complete yet. Because we are starting to part ways, Ethan took the responses we had from that batch and ran some of the same tests he ran on the other batches, and so far the results look a lot better. There is higher agreement, and there might actually be correlations between the questions now.

                Then, across all the different sets of libsvm tests with different features, Robin got up to an 87% accuracy at categorizing leads as interesting or boring. That is pretty cool. We wanted to get it higher, but that is always the case.