I just finished my first week of research at Carnegie Mellon! Everyone has been very nice and I am about to go out to dinner with some of the LTI students. I started working on an English-Bantu code-switching project, so I started learning how to make clicks. I hadn't realized that there were so many; it turns out they have their own IPA chart. I feel like I'm going to learn a lot this summer!
This week Carolyn had me give a presentation on the research I was doing at Columbia. Public speaking has always been a weakness of mine, so it was really good practice, although a little nerve-wracking. I held up pretty well answering questions, until people started asking me about why we used the statistical methods we used in our analysis. This week I started working on processing Vivian De Klerk's Xhosa-English code-switching corpus.
I found out I'm getting published! A paper my research group back at Columbia submitted
to Interspeech was just accepted.
In other exciting news, Kristine, the other DREU intern, arrived at CMU this week, and she is a little bit awesome, and I am very excited to get to work with her. I kept analyzing the code-switching corpus I had been working on last week, and started looking at some new corpora as well. We have weekly code-switching reading groups, and this week one of Professor Rose's grad students introduced us to code-switching between English and Zulu on Facebook, which made for a pretty good read. On Friday I went to see X-Men on opening night with LTI!
This week I went to see Brian Murphy talk about computational neuroscience,
where he spoke about how semantic meaning is stored in the brain. There was a lot of neuroscience that I didn't really have the
background to understand, but it was still fascinating from a linguistic perspective!
I also started working with some new data called the Ottoman Corpus. It is a series of eight-minute debates between college students where they argue about why the Ottoman Empire fell, using a list of relevant facts as evidence. We are going to be looking at whether the debaters begin to speak more like each other, and whether this is correlated with how much they are learning from each other. I've been doing lots of reading in preparation for this project, and I am very excited to start playing with the data!
This week I went to my first thesis defense, which was a very interesting experience.
The student defending was Andreas Zollmann, an LTI student who
has been working on synchronous-grammar based approaches to statistical machine translation.
Somehow the idea of doing a thesis doesn't seem quite as daunting as it did a couple years ago.
And I always love getting to listen to people talk about their exciting new research!
If there is one thing that I was (pleasantly!) surprised by, it is how much reading there is. I am building up quite the bibliography this summer. It reminds me of that Borges quotation, "I have always imagined that Paradise will be a kind of library." Prof Rose also held a party at her house for the interns! It was really fun to get to meet her family, and eat some very delicious food while hanging out with her grad students.
Most of our research group was out this week at ACL, the Association for Computational Linguistics conference, which is the biggest conference in the field. So I took the time to read some of the papers. Also, in other exciting news, President Obama came to visit a Carnegie Mellon robotics lab this week! I watched Obama's speech with Kaitlyn, another LTI intern. We tried to go take some food from the presidential buffet, but it was all gone by the time the Secret Service cleared out.
For the code-switching project we started looking at a new corpus, which is the transcripts
of the Truth and Reconciliation Commission, which is the group
that was tasked with discovering and documenting the wrongdoing of the government during
the civil unrest that came with the downfall of Apartheid.
Although they are fascinating, reading too many of those can get a little depressing, and
it is very nice to have two projects to go back and forth between.
Tonight I am heading out Contra Dancing with some people from LTI. Rumor has it that there might be a bubble sort dance!
This week I met with Carolyn to talk about my plan for applying to grad schools, and she was very helpful!
I have whittled down the list of potential schools I was looking at, and she gave me some ideas
of schools with outstanding NLP programs that I hadn't known about. I am very excited about my first draft of a list,
and Carnegie Mellon is definitely near the top!
I also finished up my work on topic segmentation models of various code-switching corpora. I have been doing topic modeling so that we can figure out what the corpora are about when code-switching occurs. There are definite themes that seem to inspire speakers to switch back to their native language, such as health, money, and family. If we know when people are likely to switch languages in innocent contexts, we can use that to figure out when people are switching to conceal information in front of other people.
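To give a rough idea of the kind of question this lets us ask, here is a minimal sketch, not our actual pipeline. It assumes each utterance has already been given a topic label (say, from a topic model) and a language tag; the function name, the `(topic, language)` input format, and the `"xh"` tag for Xhosa are all made up for illustration.

```python
from collections import Counter

def native_language_rate_by_topic(utterances, native="xh"):
    """Fraction of utterances in each topic that are in the native language.

    utterances: iterable of (topic, language) pairs. Both the pair format
    and the default "xh" language tag are hypothetical choices.
    """
    totals = Counter()
    native_counts = Counter()
    for topic, lang in utterances:
        totals[topic] += 1
        if lang == native:
            native_counts[topic] += 1
    # Every topic in totals has at least one utterance, so no division by zero.
    return {topic: native_counts[topic] / totals[topic] for topic in totals}
```

Topics with a noticeably higher rate than the rest would be the candidates for themes that pull speakers back to their native language.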
For the Social Accommodation project we need to measure how much the participants in the debates come to speak like each other. The usual method is to look at speech features such as volume, pitch range, speaking rate, and other acoustic and prosodic features. However, we have been looking into using changes in a speaker's vowel space instead: if the vowels of the two speakers begin to sound more like each other, then their vowel spaces are converging. Although it sounds very elegant in theory, and all the literature supports the idea, it turns out to be a bit messier in practice. Even with clear audio files it can still be hard to determine whether tiny changes in speech are meaningful. The team will keep working on it after I am gone, but right now it is looking a little discouraging for this new vowel space convergence method.
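One simple way to picture vowel space convergence is sketched below. This is only an illustration under my own assumptions, not the method the team is using: each vowel token is a hypothetical `(vowel, F1, F2)` tuple with formants in Hz, a speaker's vowel space is summarized by the mean F1/F2 of each vowel, and two speakers are compared by the average distance between their matching vowels early versus late in a debate.

```python
from math import dist
from statistics import mean

def vowel_centroids(tokens):
    """Map each vowel label to its mean (F1, F2).

    tokens: iterable of (vowel, f1, f2) tuples, a hypothetical input format.
    """
    by_vowel = {}
    for vowel, f1, f2 in tokens:
        by_vowel.setdefault(vowel, []).append((f1, f2))
    return {
        vowel: (mean(f1 for f1, _ in points), mean(f2 for _, f2 in points))
        for vowel, points in by_vowel.items()
    }

def vowel_space_distance(tokens_a, tokens_b):
    """Mean Euclidean distance between two speakers' centroids over shared vowels."""
    a = vowel_centroids(tokens_a)
    b = vowel_centroids(tokens_b)
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("speakers share no vowel categories")
    return mean(dist(a[v], b[v]) for v in shared)

def convergence(early_a, early_b, late_a, late_b):
    """Positive when the two speakers' vowels are closer late than early."""
    early = vowel_space_distance(early_a, early_b)
    late = vowel_space_distance(late_a, late_b)
    return early - late
```

The sketch compares raw formant values, so differences in vocal tract size between speakers would need to be normalized away before the numbers mean much, and it says nothing about whether a small positive value is statistically meaningful, which is exactly the messy part.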
It is my last day of work at Carnegie Mellon, which is very sad. This week I started reading Penelope Eckert's Style and Sociolinguistic Variation, and I think I am going to start reading a lot more about variationist linguistics. I'm all ready to drive back to New York later tonight, and am very excited to have a few weeks off before school starts. I had a final meeting with Carolyn today, and we talked about what I will be doing my senior year, and I think I have decided to do a thesis. I've really enjoyed being able to focus on research this summer, and I think I would miss it too much if I didn't get to have a project of my very own next year. I am so grateful to have had this opportunity to really immerse myself in research. Thanks so much to the DREU program!