Things are wrapping up here in Corvallis! Rachael, Lawrence and I are working on a final write-up of the work we've done this summer. We would like the write-up to be useful to undergrads who might work on this project in the future, so they'll understand what we've done so far and know how to use the code. At the same time, we are assuming some prior knowledge. The main ideas we want to cover are the source of the data sets, the work we've done on denoising, and recommendations for using Random Forest to segment the data. While I feel that we've accomplished a lot this summer, I wish certain things felt more finalized. I am learning that there is a lot of uncertainty in Machine Learning. Even after comparing a large number of different metrics, it can be very difficult to know for sure what the best approach to classification is.
Rachael and I are also preparing a short abstract about our work this summer to submit to DREU, so we can be considered to present at the Grace Hopper Conference poster fair. Rachael went to Grace Hopper and presented her work last year, and several girls from my school are planning on going this year.
It's weird to be leaving Corvallis so soon! I've become very close with Rachael, Lawrence and my housemates. Overall DREU has been an amazing experience and I highly recommend it!
At the meeting today, we presented some specific examples of the segmentation boxes Random Forest was generating. We showed a selection of individual spectrograms that we considered to be representative of the overall dataset, with generated boxes drawn on. For the most part, the boxes seemed quite reasonable. We did see some very small boxes that didn't appear to contain much birdsong, so we decided to try increasing the minimum box size.
We also talked about the way our audio data is normalized. When we build spectrograms from wave files, the volume is normalized such that the pixel with the greatest value is always set to 1. All the other values are increased or decreased accordingly. This means that even in a very quiet sample, the greatest value in the spectrogram will still be equal to 1. We are concerned that this is leading to amplified noise in spectrograms with very little birdsong. We discussed ways of eliminating this type of normalization from our spectrogram conversion program. One challenge is that the recording boxes in H. J. Andrews are recording at different volumes, and we would like to keep some normalization between the recording boxes.
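To make the trade-off concrete, here's a rough Python sketch (toy function names, not our actual conversion code) contrasting the per-clip peak normalization we use now with a hypothetical fixed per-recorder gain:

```python
def normalize_peak(spec):
    """Per-clip peak normalization: the loudest pixel becomes 1.0.
    In a very quiet clip this amplifies background noise."""
    peak = max(max(row) for row in spec)
    return [[v / peak for v in row] for row in spec]

def normalize_reference(spec, recorder_gain):
    """Hypothetical alternative: divide by a fixed gain calibrated per
    recording box, so quiet clips stay quiet while boxes stay comparable."""
    return [[v / recorder_gain for v in row] for row in spec]
```

With peak normalization, a clip whose loudest pixel is mostly noise still gets scaled up to full intensity; the fixed-gain version avoids that, at the cost of needing a sensible gain estimate for each recording box.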
On the social side of things, Rachael and Lawrence and I went to see Inception. We all thought it was pretty good, but I wanted it to be weirder. I've also been throwing pottery at the Oregon State craft center. For about 50 dollars, I've been able to use the wheels, kilns and glazes there all summer. I'm glad I had this opportunity since I can't throw at Pomona. I've made a bunch of stuff, and now I'm working on getting it all glazed and fired so it will be ready to go when I leave.
We are trying a third approach to denoising the audio clips to get them ready for the website. This approach involves identifying islands of birdsong in the time-frequency domain (either by the hysteresis or Random Forest method), and blacking out (i.e. silencing) all the other areas on the spectrogram. We then convert the denoised spectrogram back into the time domain. This has variable results. In some cases, it sounds great. In others, it introduces a weird robotic-sounding artifact. We are considering refining this procedure by blurring the edges of the region we black out from the spectrogram.
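As a sketch of the idea (in Python, with made-up names rather than our real code): given a binary mask marking the birdsong islands, we silence everything outside them, and the proposed refinement would blur the mask's edges so the cut-off is less abrupt:

```python
def apply_mask(spec, mask):
    # silence (black out) every time-frequency cell not marked as birdsong
    return [[v if m else 0.0 for v, m in zip(srow, mrow)]
            for srow, mrow in zip(spec, mask)]

def soften_mask(mask):
    # hypothetical refinement: box-blur each mask row over its horizontal
    # neighbors, producing soft 0..1 weights instead of a hard 0/1 edge
    out = []
    for row in mask:
        new = []
        for j in range(len(row)):
            lo, hi = max(0, j - 1), min(len(row), j + 2)
            new.append(sum(row[lo:hi]) / (hi - lo))
        out.append(new)
    return out
```

The softened mask would then be multiplied into the spectrogram instead of applied as a hard cut, which may reduce the robotic-sounding artifact at region boundaries.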
Testing our various denoising algorithms has been fairly unscientific so far. We have mostly been listening to the effect each algorithm has on a selection of clips that we believe to be representative of the overall data. In order to make a final decision, we would like to present samples of denoised clips to users of the website.
We presented the ROC curves with varying parameters for Random Forest to Forest and Professor Fern. In many cases, these graphs helped us select which parameters we would like to use. We saw that we can reduce the number of decision trees from 100 to 25 and improve speed significantly without hurting performance. A neighborhood size of 6 and a very large number of training samples provided the best trade-off between speed and accuracy. Previously, Random Forest was designed to pull a certain number of training samples (points in the spectrogram with accompanying feature vector) from each spectrogram. However, we would like the total number of training examples to be insensitive to the number of spectrograms in the training set, so I am changing the code to pull a certain total number of samples. At the meeting, Forest and Professor Fern said that they would like to see more specific examples of the way Random Forest is working on individual spectrograms, so I will also be preparing that data this week.
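The sampling change amounts to pooling candidate points across spectrograms before drawing. A minimal Python sketch (the real code is C++, and these names and data shapes are hypothetical):

```python
import random

def sample_total(points_per_spectrogram, n_total, seed=0):
    """Pool candidate training points from every spectrogram, then draw a
    fixed total number, so the training-set size doesn't grow with the
    number of spectrograms. Each inner list stands in for one
    spectrogram's candidate (point, feature-vector) pairs."""
    rng = random.Random(seed)
    pool = [p for spec in points_per_spectrogram for p in spec]
    return rng.sample(pool, min(n_total, len(pool)))
```

Under the old per-spectrogram scheme, doubling the number of training spectrograms doubled the number of samples; here, `n_total` stays fixed regardless.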
At our Thursday meeting, Forest gave us some suggestions for how we might improve the runtime and accuracy of the Random Forest classifier. Some approaches we're experimenting with are: decreasing the number of decision trees to 25, increasing the number of training samples, and increasing the neighborhood size used to compute the feature vector. I am now generating ROC curves that reflect the performance of Random Forest when each of these parameters is varied. We are also designing an experiment to test our denoising algorithms with actual birders to see if they find the cleaned-up files easier to listen to.
I am very happy with my decision to find housemates in Corvallis using Craigslist. Before coming to Corvallis, I posted a profile about myself and the type of living situation I was looking for on Craigslist. Since Corvallis is a college town and lots of students leave rooms behind in the summer, I got a LOT of responses. I decided to live with a group of students my age who are all members of the OSU varsity rowing team. In some ways they're very different from me, but we are getting along well. It's nice to have people around - I think I would get lonely in the dorms. They've also taken me out to bars and introduced me to lots of people. While there's a good amount of luck involved when it comes to finding housemates, I would definitely recommend this approach to future DREU students.
We are continuing work on both the denoising and segmentation algorithms. The denoising is finally starting to sound good. We are comparing two different methods. In one, the wave file is converted to the frequency domain, filtered, and then converted back to the time domain. In the other, a convolution kernel is computed in the frequency domain, and then applied in the time domain. The time domain convolution seems to be working better. We've generated some spectrograms of the audio clips before and after filtering to demonstrate our results.
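For illustration, the time-domain side of that comparison can be sketched in a few lines of Python (a toy version with made-up names; our actual filtering code is C++). A short moving-average kernel acts as a crude low-pass filter:

```python
def convolve(signal, kernel):
    """Apply a filter kernel by direct time-domain convolution.
    Mathematically equivalent to multiplying the two spectra in the
    frequency domain, but computed sample-by-sample on the wave data."""
    n, k = len(signal), len(kernel)
    out = [0.0] * (n + k - 1)
    for i, s in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i + j] += s * w
    return out
```

Direct convolution like this is slow for long kernels, which is part of why the frequency-domain route is the usual default; in our case, though, the time-domain version has been sounding better.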
We have also completed an ROC curve describing the accuracy of Random Forest for segmenting audio. The true positive rate is shown on the vertical axis, and the false positive rate on the horizontal axis. By varying the threshold at which we label a point birdsong or not birdsong, we can trace out the trade-off between the two rates. Random Forest seems to be performing much better than hysteresis at identifying birdsong (yielding a better trade-off between true positives and false positives).
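Computing the points of an ROC curve is simple in principle. Here's a small Python sketch (assuming a list of classifier scores and 0/1 labels, with both classes present; not our actual evaluation code):

```python
def roc_points(scores, labels):
    """Sweep the decision threshold over every distinct score and record
    the (false positive rate, true positive rate) pair at each setting."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((fp / neg, tp / pos))
    return points
```

A classifier whose curve stays closer to the top-left corner (high true positive rate at low false positive rate) is the better one, which is how we're comparing Random Forest against hysteresis.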
We decided to test our segmentation algorithms (which identify bits of audio containing birdsong) more rigorously. Previously, we tested bounding boxes by finding an analog to each box in the test data, and measuring how much the two boxes overlap. This method was useful for determining how precise our boxes were, but it didn't tell us much about what they were missing. Instead, we are planning to compare the total overlap between all the boxes we generate and all the boxes in the test set. We are testing this with boxes generated using hysteresis and random forest, for a variety of different thresholds.
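A rough Python sketch of the set-level overlap measure (toy box representation; this ignores double-counting when generated boxes overlap each other, which the real evaluation would need to handle):

```python
def overlap_area(a, b):
    # boxes are (x1, y1, x2, y2); returns intersection area, 0 if disjoint
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def total_overlap(generated, truth):
    """Sum the intersection area between every generated box and every
    ground-truth box - a score for the whole set of boxes at once,
    rather than a per-box best match."""
    return sum(overlap_area(g, t) for g in generated for t in truth)
```

Unlike the per-box comparison, a low set-level score also exposes birdsong that our boxes missed entirely.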
Our work is very self-directed right now. I meet about once a week with my mentor (Dr. Fern), the grad student who is working on this project (Forest), and a couple other professors. The other two undergrads and I present what we've been working on, and get some instructions for the next week. Other than that we are able to do work wherever we want. Sometimes we work in the engineering building. Other times it's nice to be able to meet with the other undergrads and work in coffee shops instead of being in an office all the time. The other two students on my project are great - since Rachael and I are both from out of town, we have been exploring Corvallis a lot, and Lawrence often comes with us. There's a really great farmer's market twice a week, and we ran into Forest there on Wednesday.
We are still working on denoising the wave files to get them ready for the website. To make sure our code is working, we have been comparing it step-by-step to the output from Matlab. This is frustrating at times because Matlab has a lot of built-in functions that C++ lacks. We have been working on adapting an inverse Fourier transform written in C++ to work with real (instead of real and imaginary) input. Another student has also joined our group - a DREU student from UNC Charlotte named Rachael.
We've switched modes for a little while, and are now working on getting clips ready for a website that would allow birders to tag recordings with species information. The website will help us get more training data. In order to do this, we need to clean up some very noisy audio clips. We have been using a Wiener filter to remove noise, but we can only apply it to spectrograms (which store the audio information in the time-frequency domain), not directly to the audio files. We're working on converting denoised spectrograms back into wave files so we can use them on the website. This is complicated by the fact that the Fourier transform that creates spectrograms from wave files produces both a real and an imaginary coefficient for each data point. To make a spectrogram, we take the magnitude of each complex coefficient, discarding the phase. We are experimenting with different input for the inverse Fourier transform.
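As a toy illustration of why the inversion is tricky, here's a rough Python sketch (a naive DFT, nothing like our C++ code) of the magnitude step that fills in a spectrogram column. Once we keep only magnitudes, the phase information needed for a faithful inverse transform is gone:

```python
import cmath

def dft(signal):
    # naive discrete Fourier transform: one complex coefficient per bin
    n = len(signal)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(signal))
            for k in range(n)]

def magnitude_spectrum(coeffs):
    """Collapse complex coefficients to magnitudes - the values a
    spectrogram actually stores. abs() of a complex number is
    sqrt(real**2 + imag**2), so the phase angle is discarded."""
    return [abs(c) for c in coeffs]
```

Any inverse transform fed only magnitudes has to guess or reconstruct the missing phase, which is exactly the problem we're experimenting with.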
We're working on segmenting the recordings we have from H. J. Andrews into chunks that contain birdsong, and chunks that don't. That means reading a lot of research about audio segmentation. It's hard to find relevant research because this type of segmentation is unusual. Most of the research I am finding involves separating broad categories of sound: speech, music, and environmental noise, for example. I am hoping to look at some research on segmenting speech into words or syllables, since that might be comparable to segmenting birdsong into syllables (little bits of song).
In some ways, segmenting the birdsong is a computer vision problem. We convert short clips of audio into a visual representation of that audio called a spectrogram. By performing a Fourier transform on the waveform, we are able to represent time on the horizontal axis, and the relative intensity of each component frequency on the vertical axis. Spectrograms are used by (human) bird watchers to describe the calls of different species. Some bird watchers can even "read" a spectrogram and sing out the call it represents.
We want to find the regions of spectrograms that represent birdsong. We're comparing two algorithms for identifying birdsong in spectrograms of the recordings: a machine learning algorithm called Random Forest, and a simpler algorithm called hysteresis, a depth-first flood fill similar to the paint bucket tool in Microsoft Paint. For training and test data, we're using some spectrograms a former student labeled last summer. We're testing a variety of parameters for each, and examining the degree of overlap between the bounding boxes we produce and the boxes in the test data.
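The hysteresis idea fits in a few lines. Here's a rough Python sketch (toy thresholds and 4-connected neighbors; not our actual implementation): seed at pixels above a high threshold, then flood-fill outward into neighbors above a lower one, just like a paint bucket with two tolerance levels:

```python
def hysteresis(spec, high, low):
    """Mark birdsong pixels: seed at cells with intensity >= high, then
    depth-first flood-fill into 4-connected neighbors with intensity >= low."""
    rows, cols = len(spec), len(spec[0])
    mark = [[False] * cols for _ in range(rows)]
    stack = [(r, c) for r in range(rows) for c in range(cols)
             if spec[r][c] >= high]
    while stack:
        r, c = stack.pop()
        if 0 <= r < rows and 0 <= c < cols \
                and not mark[r][c] and spec[r][c] >= low:
            mark[r][c] = True
            stack.extend([(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)])
    return mark
```

The two thresholds are what make it "hysteresis": moderately bright pixels count only when they're connected to a very bright seed, which keeps isolated noise speckles out of the segmentation.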
Hi from Corvallis! I just got here a few days ago, and I've been settling in, learning about the project, and exploring Corvallis. I'm staying with some OSU rowers I met on Craigslist. It's a strange experience living in a house full of strangers, but it's nice to have company in a new town.
Yesterday I met Lawrence, an Oregon State undergrad I'll be working with this summer. He walked me through some of the code we'll be working with. It's a little intimidating coming into a project with so much already written. Mostly we have a lot of utilities for converting between audio recordings and clean spectrograms describing short clips of the audio.
This morning I went to a meeting with Lawrence, Professor Fern (my DREU mentor) and Forest Briggs, a graduate student working on this project. They told me more about work that's been done on the project so far, and we discussed tasks Lawrence and I can work on. Here is what I know about the project so far: