On Tuesday, I flew in from Puerto Rico and moved into Columbia housing in Broadway Hall. The room has a good amount of space for a single, and there is a common kitchen, although we had to provide our own mini-fridges. The location is very convenient: I'm right across from the Columbia entrance on 114th Street, and within two or three streets of several supermarkets and an Asian convenience store.
I started work on Wednesday. Since I worked in the Columbia Speech Lab last summer, I was really happy to be greeted by familiar faces. I practiced running the deception experiment again to get reacquainted with the procedures. I also advertised the experiment and started scheduling subjects for the following week. I learned how to transcribe speech so that I could hand-correct the transcriptions produced by workers on Amazon Mechanical Turk.
I assisted with one experiment to get back into the swing of things. Finding participants for the pair types we still need is getting harder; at this point we're only accepting male American English and Mandarin speakers. I joined the Columbia University Facebook groups page to advertise our study. We got replies right away (about 10 within the first couple of hours), which was more than we had gotten all week despite flyering the campus. I was able to schedule three experiments for the week. I also completed the manual corrections to the transcriptions for our experiment.
I will be learning R, a statistical programming language, in order to run statistics on the data we've collected from the experiment thus far. I watched an introductory lecture on R to get a feel for syntax, libraries, scripting, and statistical analyses. I then went over the experiment findings so far and started some practice statistical analyses on our data: doing correlational tests involving confidence scores, successful lies, and successful guesses with the up-to-date data. I realized that I would need to write some scripts to process the data before I can re-run the statistical analyses performed on the sample of data from Spring 2014.
I'm finally on the protocol, so I scheduled and ran two experiments. I spent the rest of the time re-running the statistics on the same data after correcting two columns that had been calculated incorrectly. The only findings affected were the ones involving those two columns, which gives me confidence that I'm getting the correct results in R (in the past the data was analyzed using the statistical program SPSS). I got started on cleaning up our spreadsheets with R so that they can be analyzed. I'm still in the process of writing more functions to get the data into the right format.
This week we ran three experiments. We are getting closer to our goal of at least 12 pairs of each type: male, female, and male-female pairs of English, Mandarin, and English-Mandarin speakers. We're having the hardest time filling the male Mandarin pairs.
I finally got the data into the right format using R. This took longer than I expected because I kept running into unexpected problems, such as missing values being filled in with assorted markers: "NA", "N/A", "-", "*", "***", etc.! R would have been able to interpret empty cells as missing values, but it instead treated those strings of characters as factor levels, so it was unable to perform simple arithmetic on numeric columns, and it couldn't run simple correlations because it required binary factors. I had to learn to handle each problem as it came by looking up more information on R and its libraries. I'm thinking of putting together a little guide to cleaning up spreadsheets, with a list of functions and libraries to run, for future use in the lab.
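To give a sense of what that cleanup involves, here is a minimal sketch of the idea. It is written in Python rather than the R I actually used, and it is not one of my lab scripts. The marker list covers only the strings named above (the real spreadsheets had others), and the function names and sample scores are made up for illustration.

```python
# Markers that all mean "missing" in this sketch. This list is an assumption:
# it covers only the strings named in the post, plus the empty cell.
MISSING_MARKERS = {"", "NA", "N/A", "-", "*", "***"}


def to_number(cell):
    """Return the cell as a float, or None if it holds a missing-value marker."""
    text = cell.strip()
    if text in MISSING_MARKERS:
        return None
    return float(text)


def clean_column(cells):
    """Convert a column of raw strings to floats, with None for missing cells."""
    return [to_number(cell) for cell in cells]


def column_mean(values):
    """Mean of the non-missing values, or None if every value is missing."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


# Hypothetical confidence scores, not real data from the study.
raw_confidence = ["3", "N/A", "4.5", "***", "-", "2"]
confidence = clean_column(raw_confidence)
print(confidence)
print(column_mean(confidence))
```

Once every marker maps to a single missing value, the column is numeric again and ordinary arithmetic works on it.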
I was able to get started on confirming the findings from the Spring 2014 sample with the rest of the data. Once I'm done with that, I will revisit the research questions of the project to come up with new statistical analyses to run, and check whether previously non-significant results have now reached significance with the larger amount of data.
I confirmed the findings from the Spring 2014 sample with the rest of the data and detailed the changes that happened. Some of our correlations got stronger, but some disappeared from the overall group. Perhaps they can be found in some of the subsets of the data. I ran so many correlations that I decided to work on a script to speed up the process. My script runs all the possible correlations without being redundant or returning results we don't care about, and then prints only the significant and marginally significant results (p-value of 0.06 or less).
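The logic of that script can be sketched roughly as follows. This is an illustration in Python, not my R script. The column names and numbers are invented, and the p-value comes from the Fisher z approximation (chosen so that the sketch needs only the standard library), which is an assumption and not necessarily the test the R script relied on.

```python
import math
from itertools import combinations
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation, or None if either variable has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def approx_p_value(r, n):
    """Two-sided p-value from the Fisher z approximation (an assumption here)."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def notable_correlations(columns, skip=frozenset(), threshold=0.06):
    """Test each unordered pair of columns once; keep results with p <= threshold."""
    results = []
    for a, b in combinations(sorted(columns), 2):
        if frozenset((a, b)) in skip:  # pairs we don't care about
            continue
        # Use only the rows where both values are present.
        pairs = [(x, y) for x, y in zip(columns[a], columns[b])
                 if x is not None and y is not None]
        if len(pairs) < 4:  # the approximation needs n > 3
            continue
        xs, ys = zip(*pairs)
        r = pearson_r(xs, ys)
        if r is None:
            continue
        p = approx_p_value(r, len(pairs))
        if p <= threshold:
            results.append((a, b, r, p))
    return results


# Hypothetical data, for illustration only.
columns = {
    "confidence": [1, 2, 3, 4, 5, 6, 7, 8],
    "successful_lies": [2, 4, 5, 4, 5, 7, 8, 9],
    "successful_guesses": [5, 3, 6, 2, 7, 4, 5, 3],
}
for a, b, r, p in notable_correlations(columns):
    print(f"{a} vs {b}: r = {r:.3f}, p = {p:.4f}")
```

Iterating over unordered pairs is what avoids the redundancy: each pair is tested once instead of twice, and no column is correlated with itself.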
I thought about the best way to aggregate the data to answer our remaining questions, and I reran the correlations on a smaller subset of the data set. Because the older experiments lack recordings, the smaller data set contains only the experiments run from summer 2013 onward. I'm not sure which data set is better. The one with all the data in it seems to have a few more findings. Our major finding from before, a correlation between deception detection and successful lies, still holds for female subjects in the 200s set, and in that data set it also holds for English speakers.
We had a very slow week in terms of running experiments because we were mainly seeking pairs of Mandarin male speakers and a couple of Mandarin male-female pairs. However, it turned out that we need to re-run a few different pair types (including English speakers), so we will probably be able to schedule a lot more for next week. I will also be meeting with one of my mentors, Dr. Michelle Levine, to discuss the statistics in the experiment.
As I predicted, we were able to recruit more subjects this week. We ran four more pairs and already have a few scheduled for next week. I went over the statistics with Dr. Levine, and we concluded that we need to start balancing out the groups, since there is an overwhelming number of female subjects in our data. We will discuss the statistics further on Monday.
After further discussion of the statistics, we concluded that we would run two studies: a separate female-only study, made possible by our abundance of female subjects, where we balance only for language, and an overall study where we balance for both gender and language. I got started on running all of the statistics again and comparing the results from all the data against the results from the data starting summer 2013.
Since this is my final full week, I also tried to organize and document all the information so that it would be easy for others to pick up where I left off. This includes trying to modify my correlations script so that the process is almost automatic given the Excel files.
I met with my mentors to discuss the statistical analyses. We decided to proceed with the data starting summer 2013. I also tried reading more on the NEO-FFI, the personality test our study uses, because we have significant correlations involving it. On Wednesday, my last day, Grace Ulinski and I gave a presentation on our project. We gave an overview of the project and discussed our contributions to it. The presentation can be found here: Presentation