Luise - DREU Research Intern
This week, I continued working on the scripts to format our data. After writing several scripts, I realized that I might run into problems later on. Basically, we have a bunch of transcripts that need to be matched up with words in our lexicon, but not all tokens in the transcripts are actual words. Instead they might be {laugh}, {distortion}, [door slamming], (()) for unintelligible words, and so on. Luckily, we found that Kaldi provides some example scripts for a similar dataset (in Egyptian Arabic rather than English, the language of my main dataset), so I decided to use these as a starting point for my own scripts. They were written in Perl and bash, which I was not that familiar with, so reading through them and figuring out what is going on has taught me a lot, not just about Kaldi but also about bash and Perl. Bonus! Of course, since the example scripts were made for a different dataset, I have had to tweak them and write supplementary scripts to make them work for our data, but it has been very helpful to have some sort of guideline.

After finishing up my scripts, I started to process our data, but soon realized that I am not yet able to verify all of the output. Is it a problem that LG.fst is not stochastic? Why would some of the utterances not align? Should I be worried if a tree has a pdf-id with no stats? These questions probably only make a bit more sense to me than they do to you, so I realized that it would be nice to have some point of reference against which I could check my results. I therefore decided to process the similar dataset that Kaldi provides scripts for, assuming that its output would be good. I just finished, so next week I am ready to move on to the “real” dataset!

This week I also moved into my permanent room in Columbia Summer Intern Housing. I got a really big single room in an apartment with three other girls who all intern at different companies in Midtown.
They are great, and so are the girls next door, so I think we are going to have a lot of fun this summer!

Last weekend I went to explore Williamsburg, a very lively, hip and young area of Brooklyn. It was great! And the food market was amazing. I also had the first meetup with the Ultimate Frisbee summer league I’m going to play in while I’m here. It was great to play again, and a really good way to meet a bunch of girls from the area. Unfortunately, it was pouring… I look forward to our next round of games, hopefully with better weather!
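
A postscript for the technically curious, on the transcript cleanup mentioned at the top of this post. The snippet below is only a rough Python sketch of the idea, not the actual project scripts (the Kaldi example scripts are in Perl and bash). The function name `clean_transcript`, the `<unk>` symbol, the choice to drop noise tags entirely, and the tiny lexicon are all assumptions made up for illustration; a real recipe might map noise to a dedicated noise word instead.

```python
import re

# Assumed marker styles, taken from the examples in the post:
#   {laugh}, {distortion}  -> curly-brace noise tags
#   [door slamming]        -> bracketed background noise (may contain spaces)
#   (( ))                  -> unintelligible speech
NOISE = re.compile(r"\{[^}]*\}|\[[^\]]*\]")
UNINTELLIGIBLE = re.compile(r"\(\([^)]*\)\)")


def clean_transcript(line, lexicon, unk="<unk>"):
    """Return the tokens of one transcript line, keeping only lexicon words.

    Hypothetical helper: unintelligible spans become `unk`, noise tags are
    dropped, and any remaining word missing from the lexicon becomes `unk`.
    """
    # Replace unintelligible spans first so they survive as a single token.
    line = UNINTELLIGIBLE.sub(f" {unk} ", line)
    # Remove noise tags, including multi-word ones like [door slamming].
    line = NOISE.sub(" ", line)
    tokens = [word.lower() for word in line.split()]
    return [word if word in lexicon else unk for word in tokens]
```

Called on a line such as `I {laugh} closed the [door slamming] door (()) okay` with a lexicon containing only `i`, `closed`, `the` and `door`, the sketch keeps the four known words and turns both the unintelligible span and the out-of-lexicon `okay` into `<unk>`.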