PHILLY: New In The City

08/08/2012: More Analysis and Negatives

                More of the Tamil mark-up is done: I have finished the mark-up on the next category. This category is less garbled than the non-literal noun phrase category, which makes sense, because it is the non-literal verb phrases, where the words in the phrase relate to the non-literal meaning. For Tamil, 14 out of 26 (54%) were mistranslated. Unfortunately, Annie didn't have time to mark up the other errors in the translations, except where she thought the text was supposed to be non-literal language. If we had more time, we would go through and make sure that each translation mark-up contained the same tags and that we were looking for the same specific things. This is just a basic pilot study, but if we wanted to continue this research, that is what we would do.
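                If we did keep going, one standard way to check that two annotators' mark-ups match would be something like Cohen's kappa over the tags. The post doesn't name a specific measure, so this is just a sketch under that assumption, with a made-up tag set and toy annotations; sklearn's cohen_kappa_score handles the chance correction.

```python
from sklearn.metrics import cohen_kappa_score

# One tag per phrase from each annotator (hypothetical tag set and data).
annie = ["nonliteral-vp", "literal", "nonliteral-np", "literal", "nonliteral-vp"]
mine  = ["nonliteral-vp", "nonliteral-np", "nonliteral-np", "literal", "nonliteral-vp"]

# Raw percent agreement, plus Cohen's kappa to correct for chance agreement.
raw = sum(a == b for a, b in zip(annie, mine)) / len(annie)
kappa = cohen_kappa_score(annie, mine)
print(f"raw agreement: {raw:.2f}, Cohen's kappa: {kappa:.2f}")
```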

                We are still discussing how to change the questions and the scale. We spent a whole meeting arguing over the changes; we all have our opinions, but most of the time went to rewording the questions. We are trying to make the questions as clear as possible, so that our responses are better and a HIT takes less time to complete. We also found that almost all of the questions focus on positive aspects of the writing and not the negative. We wanted to change some of the questions to focus on negative aspects, because that way a person taking the HIT couldn't logically put all highs or all lows. It would be a way to help weed out the responses that are random from the ones that are not, as the sketch below shows.
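                The logic behind mixing polarities is easy to sketch in code: reverse-score the negatively worded items back onto the positive scale, and a rater who clicked all highs (or all lows) without reading ends up with an impossibly wide spread. The 5-point scale, the item polarities, and the threshold here are all assumptions for illustration.

```python
from statistics import pstdev

SCALE_MAX = 5
NEGATIVE_ITEMS = {2, 4}  # hypothetical: which questions are negatively worded

def align(answers):
    """Reverse-score the negatively worded items onto the positive scale."""
    return [SCALE_MAX + 1 - a if i in NEGATIVE_ITEMS else a
            for i, a in enumerate(answers)]

def suspicious(answers, spread_threshold=1.5):
    """Flag raters whose aligned answers spread too widely to be consistent."""
    return pstdev(align(answers)) > spread_threshold

print(suspicious([5, 5, 5, 5, 5]))  # all-highs rater: True (flagged)
print(suspicious([4, 5, 2, 4, 1]))  # consistent rater: False
```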

                We thought we would see what happened if we sent out an all-negative batch, so we sent out another batch of 200 leads with every question switched from positive to negative wording. We were having trouble finding any good correlations between questions. We thought we might find qualities that grouped together to make a good lead or a bad lead, but the answers are too spread out. Ethan ran a bunch of correlations and t-tests to try to find some, using both the raw responses and a version that combined the low and high numbers.
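                For the record, this kind of analysis is straightforward with scipy's pearsonr and ttest_ind. This is only a sketch: the CSV layout, the question column names, and the "label" column are hypothetical, and the binning step stands in for the low/high combining described above.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import pearsonr, ttest_ind

df = pd.read_csv("hit_responses.csv")  # hypothetical file of Turk ratings, 1-5 scale
questions = ["clarity", "interest", "grammar", "flow"]  # hypothetical columns

# Stand-in for the low/high combining: collapse the scale at its midpoint.
binned = df[questions].apply(lambda col: (col >= 4).astype(int))

# Correlate every pair of questions, raw and binned.
for q1, q2 in combinations(questions, 2):
    r_raw, p_raw = pearsonr(df[q1], df[q2])
    r_bin, _ = pearsonr(binned[q1], binned[q2])
    print(f"{q1} vs {q2}: raw r={r_raw:.2f} (p={p_raw:.3f}), binned r={r_bin:.2f}")

# T-test of one question across good vs bad leads (the label column is assumed).
good = df[df["label"] == "good"]["interest"]
bad = df[df["label"] == "bad"]["interest"]
t, p = ttest_ind(good, bad, equal_var=False)
print(f"interest, good vs bad leads: t={t:.2f}, p={p:.3f}")
```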

                We also tried to see whether there was any difference between the genres, since we had noticed that writing structure and quality varied from genre to genre. However, this didn't help either. We were pretty disappointed and started to think that Mechanical Turk was a failure. We then began to wonder how accurate other Mechanical Turk studies were. I know I've read papers that had positive results from Mechanical Turk, and now I'm beginning to wonder if those authors just knew better how to use it. It's possible that, as newbies, we simply weren't working well with the system. Annie had used Mechanical Turk for a similar research project of her own, and that seemed to be more successful.

                Also during this time, Robin has been working on different features to run through libsvm. Some things she looked at were translation difficulty and emotion words, among many others. She spends most of her time running different sets of features to see whether we can find some that separate interesting leads from boring ones.
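                A sketch of what one of these runs might look like, using libsvm's Python interface (the import path below is for the PyPI libsvm package; the classic distribution exposes svmutil directly). The feature files, kernel choice, and 5-fold setup are assumptions; svm_train returns the cross-validation accuracy when given the -v option, which makes feature sets easy to compare.

```python
from libsvm.svmutil import svm_read_problem, svm_train

# Hypothetical files, one per feature set, in libsvm's sparse format:
# <label> <index1>:<value1> <index2>:<value2> ...
feature_files = [
    "leads_difficulty.svm",  # translation-difficulty features
    "leads_emotion.svm",     # emotion-word features
    "leads_combined.svm",    # both together
]

for path in feature_files:
    labels, features = svm_read_problem(path)
    # Linear kernel, C=1, 5-fold cross-validation, quiet mode.
    acc = svm_train(labels, features, "-t 0 -c 1 -v 5 -q")
    print(f"{path}: {acc:.1f}% cross-validation accuracy")
```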