DREU Blog

Weekly research journal.


Week 1

The beginning of anything is always a strange balance of slow and intensive acclimation: slow, in that one meets and begins to learn about their coworkers and the tasks they will accomplish; intensive, in that one must be instructed in the goings-on of the new tasks, whether that be protocols, tools, or background knowledge. A beginning is overwhelming, while simultaneously being inviting and even relaxing. This week, I have experienced all of these emotions, and probably more!

Getting to know everyone I work with has been great. They come from all over the country and the world (even in our small group!), and everyone brings different abilities to the table. The other undergrads I work with are very kind, and I had a lot of fun sharing in the joys and frustrations of programming with them this past week. We worked together to write a Python script that parses data returned by surveys on mTurk, which we will be using as a means to collect large amounts of data relatively cheaply. While the survey itself is still being developed, learning to use mTurk has been (and surely will be) an adventure in and of itself. Simply installing it requires maybe three different accounts, and the Windows installation has a bug in it that we had to correct (many thanks to Jessy, a PhD student of Ani's, for this one). But we got it to work, and that's what counts!
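For anyone curious what that parsing looks like, here's a minimal sketch in the spirit of our script (not the script itself); mTurk batch results come back as a CSV, and the answer column name here ("Answer.rating") is just an illustrative stand-in for whatever fields the survey actually defines:

```python
# Minimal sketch of reading an mTurk batch-results CSV; "Answer.rating" is
# illustrative, since mTurk names answer columns after the survey's own fields.
import csv
from collections import defaultdict

def parse_results(path):
    """Group (worker, rating) pairs by HIT."""
    answers = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            rating = row.get("Answer.rating")
            if rating:
                answers[row["HITId"]].append((row["WorkerId"], int(rating)))
    return answers
```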

This project is planned to be a side project to turn to when we get frustrated or otherwise need a break from the major project we will be working on, but for this week, our focus has been on this one. But the question remains: what are we surveying? The goal is this: build a classifier that can determine the generality versus the specificity of a sentence. Since a very specific sentence in an article likely contains important information about the piece in which it appears, a system that knows where the specificity lies may be able to more accurately find, classify, and summarize a text while ignoring extraneous information. Sounds great! However, we don't quite know what to look for in classifying a sentence into these categories. What makes a sentence more general or specific than another? This is why we are fine-tuning our survey in order to gather data to properly train a classifier. Our main finding this week has been that it is very difficult to write a survey that adequately measures the information you're after. It's a beginning, both for the idea (though it started before us) and for us undergraduate researchers as well!

This week mostly saw us writing the script and running some statistics on the data we received, as well as learning a whole lot about NLP, very quickly. Three of us have never taken an NLP class, so we've had to learn from the ground up the terms, concepts, and such that we need to know. I still have about five different tabs open on NLP subjects right now! This is what has been overwhelming, I believe. It's a lot to take in! It's exciting, however, like a whole new door has been opened. Learning will be a process, but I've already made a lot of progress.

As for the aforementioned joys and frustrations of programming, I love working with others on a project and getting through it together; it's always nice to have someone to high-five when your program runs, after all! I think it helped as a bonding experience in that way. Wenli, another undergrad, had never used Python before, so it was fun teaching her the syntax as we went. mTurk has a very strange method of outputting data (it is entirely reliant on the input data from the first participant, for some reason), and figuring out how to parse it was frustrating at worst, but ultimately rewarding. I have an email containing the files with the subject line “WE DID IT!!!!”. I have much less background in statistics, so I found that portion to be more difficult.

We aren't anywhere near done yet, but it has been a successful week in my book! Hopefully as we begin work on the main project, I will have a firmer grasp on NLP concepts and will be able to dive in with more fervor.

Week 2

Because Monday was Memorial Day, this was a short week, but it certainly wasn't short of things to do. The last undergrad on our team, Beatriz, arrived this week. We were essentially divided into two groups. While Khyathi and Beatriz (who have more NLP experience) worked on sifting through the many, many medical documents we have downloaded from PubMed, Lily, Wenli, and I finished and presented our analysis of our annotation data and, for the rest of the week, explored the LibSVM classifier to gain an understanding of how it works. We also started looking into feature selection for the specificity classifier; well, that's what I was particularly interested in researching.

In the analysis of our annotation data, we didn't figure out too much, largely because we have such a small sample size at the moment. After talking with Ani, though, we believe we may have found a way to really get at the aspects of a sentence that lead it to be more ambiguous by discounting coreference problems, otherwise known as unknown antecedents (known from context) in linguistics-speak. I don't have any previous experience with NLP, so it's been really interesting to learn the field's terms for ideas I know from my language studies, and how NLPers go about solving the problems those ideas raise.

I learned some basics about how to use LibSVM after many failed attempts at understanding their provided "beginner's guide". It talks about astroparticle physics! I'm afraid I'm a little bit too much of a beginner for even that. Thankfully, there were some other helpful tutorials online and I managed to train a classifier to distinguish between sentences about metalworking and sentences about woodworking, just as a test. Lily wrote a script that formatted the files properly to take in our sentence data, using the words themselves as features. It's not a very accurate classifier, but it's definitely a beginning! Feature selection for the task seems a lot more difficult than we had thought. I decided to look for helpful information from cognitive grammar studies, and I have a few ideas to look into next week!
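For reference, the LibSVM input format itself turned out to be simple once I finally understood it: one example per line, a label followed by index:value feature pairs in ascending index order. Here's a toy sketch of producing that format (the vocabulary and labels are invented, and this isn't Lily's actual script):

```python
# Toy sketch of a LibSVM-format line: "label idx:count idx:count ...",
# with feature indices ascending. Vocabulary is made up for illustration.
def to_libsvm_line(label, sentence, vocab):
    counts = {}
    for word in sentence.lower().split():
        if word in vocab:
            idx = vocab[word]
            counts[idx] = counts.get(idx, 0) + 1
    feats = " ".join(f"{i}:{c}" for i, c in sorted(counts.items()))
    return f"{label} {feats}"

vocab = {"lathe": 1, "chisel": 2, "anvil": 3, "forge": 4}
print(to_libsvm_line(1, "the anvil rang in the forge", vocab))  # -> "1 3:1 4:1"
```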

Week 3

This week saw us working more independently on a lot of subjects! While Wenli, Lily, and I still focused on the specificity project and Beatriz and Khyathi worked on PubMed, even within these groups we saw more separation, at least for the first part of the week. Those of us working on specificity spent much of the week reading up on different subjects and coming up with features and different data sets to train our classifiers on.

Ani was attending a conference for the first half of the week, and during that time we mostly did research and worked separately. We put aside the annotation work for the most part and worked with the classifier. I wrote a script to take in mTurk result files and output a single file suitable for either training or testing a classifier, as specified. It still needs a lot of work (really large amounts of data would render it ridiculously over-complicated), but it was a lot of fun to write! I had forgotten how much I enjoy programming, and it made for a nice break between reading dense academic articles. Speaking of which, I ran into a bit of a dead end looking for specificity in cognitive language studies, but I'm too interested in the subject to give it up quite yet! I also attempted to mess with CoreNLP, but with little success as of yet!

Later in the week, we worked together again on analyzing our annotation data by looking at how "in context" the ambiguities of the sentences were. If an ambiguity can be cleared up by reading the context of the sentence, for example, then the sentence is likely not as general as one whose ambiguities can't be resolved from context. We're looking for evidence that this is the case, and for whether annotators agree about where the clarifying information can be found. The former is easier to find, while the latter is more difficult to measure. In addition to the annotations, we worked on using tf-idf as a feature for the classifier. This was more an exercise in using a common NLP method than a means toward the actual building of the classifier, but it was helpful all the same!
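Since tf-idf was new to me, here's the computation in rough form, sketched in Python for my own reference (our actual feature code differs, and there are several idf variants; this is just one common one):

```python
# Bare-bones tf-idf: words frequent in a document but rare across the corpus
# score high. One common variant; smoothing choices differ between libraries.
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus):
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for doc in corpus if term in doc)   # document frequency
    idf = math.log(len(corpus) / (1 + df))         # +1 avoids division by zero
    return tf * idf
```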

My biggest revelation of the week was simply how much data there is! To train a classifier even remotely accurately, a sample size of 10,000 documents is no biggie. We have been working with a small set of 1,000 pre-labeled articles, as we have yet to be given access to the NLP grid that contains A Lot of data, and even still it astounds me how much more accurate the classifier gets! It's really neat how that works, although statistics are not my strong suit. I also had a lot of fun this week talking with everyone about their lives and their cultures! It's sometimes a little quiet and perhaps a little lonely in the lab, but when everyone gets to talking, it's really nice. Lily and I even went to a neat exhibition with some of her friends over the weekend (you can see one of the structures in the video here!). It was beautiful, and walking around the city to get there was lovely as well. There's so much to do in this city!

Week 4

Wow, it doesn't really seem like I've been here that long, but we're almost to the halfway point!

Early in the week, my team and I found ourselves questioning whether we really understood what "specificity" was: vagueness of topic, (purposely) under-specified terms, or perhaps some combination of the two? Together we re-evaluated how we understand it and came up with some potential changes to our annotation task to better reflect our choices. We still have work to do on that, but at least we have a clearer picture of what we want to look for! In addition, we put our project up on GitHub for easier collaboration and, more importantly, version control. We had accidentally lost some code before, and while it was easily fixed, we decided this change would be helpful to us.

As for the research we actually did, we looked at what connections we could find between identified ambiguous phrases in sentences and how specific the sentences are, according to where in context the ambiguities are clarified (in the immediate context, in some previous context, vaguely discussed in some context, or not found in previous context). We're also hoping to look at the rarity of the ambiguous phrases soon, but first we're working on finding inverse document frequencies from a larger corpus. The one we're going to use is from the New York Times and contains about two million articles! The scale of these things is truly astounding to me, but that's part of what's really new for me with the introduction to NLP!

I spent a lot of the week reorganizing and improving the code we had already written to make it more reusable and intuitive. I really enjoy organizing code; it's almost relaxing and very satisfying, if you ask me. Give me code and let me abstract your concepts for you; I'll have a great time! Besides this task, Ani also gave me an idea for something to do with the PubMed data: looking for emerging terms by comparing word frequencies across years to see which topics have risen to popularity. It would take a while to get to that point, as properly tokenizing those medical terms alone would be a difficult task ("breast" and "cancer" vs. "breast cancer", for example), but doing some work on it could lead to some cool results! I hope to work on this in the coming week, but first I will have to get the data through Khyathi (who is our unofficial Guide to PubMed Data Acquisition at this point).
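As a rough illustration of the emerging-terms idea (this is just how I picture it so far; the data structure is made up, and it assumes multi-word terms have already been tokenized properly):

```python
# Sketch of the emerging-terms idea: compare a term's relative frequency in a
# late year against an early year. counts_by_year maps year -> Counter of terms.
from collections import Counter

def growth(term, counts_by_year, early, late):
    def rel_freq(year):
        counts = counts_by_year[year]
        return counts[term] / max(sum(counts.values()), 1)
    eps = 1e-9  # guard against terms entirely absent in the early year
    return (rel_freq(late) + eps) / (rel_freq(early) + eps)
```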

I've gotten to know both Lily and Wenli pretty well, and it's always fun to talk to them! I've learned a lot about UPenn from talking to them, and it's such a different world from my itty-bitty college. I can't even imagine having such big class sizes! It's also so busy here, while Berea is almost separated from the outside world. In Philadelphia, you're always right in the middle of society, and it seems like everyone's ambitions are as big as the city itself. Intimidating but inspiring at the same time!

Week 5

We're officially past the midpoint, and we've been working here for a full month as of June 18th. This week I did a lot of reflection on what I've accomplished, and while in some ways it seems like we haven't made much progress on our projects, I believe that we've learned an incredible amount.

The first half of this week saw me locked in a seemingly endless struggle to parse a corpus from the New York Times consisting of about two million articles from 1987 to 2007. Now that's a nice sample size. I was simply collecting term frequencies in order to create more accurate inverse document frequency values for use with our annotation data, but debugging a program that can run for four hours before a problem comes up was a challenge; every time I thought I had corrected all the errors, more would come up--to which I have one response: Unicode is a cruel beast. In the end, however, I was able to collect all the data, which felt like a real triumph. Now our tf-idf feature can be much more accurate, and I can say that I wrote a script to parse two million news articles. Another cool tidbit that came out of this task is that I learned how to run programs in the background through the terminal, and also that the word "buttface" was used at least once in a Very Serious newspaper.
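For the record, the script's final shape was a streaming pass that never holds the corpus in memory; something like this sketch (simplified, with the tokenization and file layout glossed over), where the explicit encoding and errors="replace" are the kind of thing that finally tamed the Unicode errors:

```python
# Streaming document-frequency count: one article at a time, nothing held in
# memory but the counts. errors="replace" swaps undecodable bytes for U+FFFD
# instead of crashing four hours in. Details simplified from the real script.
from collections import Counter

def document_frequencies(article_paths):
    df = Counter()
    for path in article_paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            df.update(set(f.read().lower().split()))  # each term once per doc
    return df
```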

We also worked on clarifying our instructions for the mTurk tasks so that participant scores better reflect our definition of generality and specificity. To help clarify the issue and test a hypothesis we have, we are planning to put out a task that asks turkers whether a sentence can be interpreted without its context. We hypothesize that the sentences with more varied specificity ratings (i.e., ones that annotators could not agree on) will be those that are more tied to their context. We'll see if the data supports it! We haven't put up the task yet, but I've started working on a script to collect the specificity variations for comparison.
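The start of that script amounts to measuring the spread of ratings per sentence; a minimal sketch (the input structure is hypothetical):

```python
# Sketch: measure annotator disagreement on each sentence via the population
# standard deviation of its 0-6 ratings. Input structure is made up.
from statistics import pstdev

def rating_spread(ratings_by_sentence):
    """ratings_by_sentence: dict of sentence id -> list of 0-6 ratings."""
    return {sid: pstdev(r)
            for sid, r in ratings_by_sentence.items() if len(r) > 1}
```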

Outside of my internship, I did a lot of work on coordinating everything I need for studying abroad next semester. I finally bought the plane tickets (!!!) which is very exciting and very expensive. Well, to me anyway. I also went to a notary to sign a required life insurance policy. I'm not worried about it, but it's certainly something I've never done before! I had hoped to visit the Philadelphia Museum of Art, but I'll save it for another weekend; I still have five more weeks to go, after all! :)

Week 6

This week saw a large revision in the goals of our project--rather than building a general-specific classifier, we are now simply aiming to create the annotated corpus and write a corpus study, with the eventual goal of creating a classifier from this collected data. It is a bit sad that we will be unable to reach the goal we had intended, but in light of the complications with specificity that we have run into, it is an entirely valid result to have as we enter the second month of our research. Specificity of language is not as simple as we had thought, and it appears that it may be better defined at the phrasal or clausal level than at the sentence level, as has been our method until now. We still have a lot planned, but it's unfortunate to see that the results of our research are actually setbacks! It happens, though, and that is the nature of research, as I've seen it.

As a group, Wenli, Lily, and I worked on figuring out different ways to present the ideas our corpus represents, as well as on creating a task for turkers that asks them to rate sentences as understandable and interpretable outside of context, which I described briefly in last week's entry. It has not been released yet, but we've done more to prepare. Individually, I worked with both the New York Times corpus and the unstructured abstract corpus from PubMed, collecting term frequencies and potentially other data from their text. It has been slow going, as CoreNLP found my original text files too large to annotate. With over 9 million abstracts, the PubMed data would take a long time anyway, but when problems arise, it takes many hours to recover from them. I'm getting there though! (:

On Thursday I went to the Baltimore Avenue $1 Stroll, where many stores on the street sell items for $1, alongside local veggie sales and live music. It was pouring rain, but it was really fun anyway. A personal favorite of mine was a man with a pirate hat and an accordion singing sea shanties. I also tried banana whip for the first time, and it was delicious despite the rain that decided it wanted to be involved in my snack. In light of the marriage equality decision, we and a few other UPenn students went to Big Gay Ice Cream to celebrate, which was neat! I had heard of it before, and we certainly don't have such a place in Kentucky (sadly). I teared up in the lab when I found out. I almost wish I had been in Kentucky, where the changes would be felt, but regardless I am so, so indescribably happy. It can be related to my work here as well, though I do not wish to diminish the importance of this historic event. In the same way that we still have a long way to go in our research, there is still much to be done for sexuality and gender minority equality; however, we've taken big steps and are headed toward our goals (even if they needed to be altered along the way).

Week 7

Saturday was Independence Day, and it wasn't until earlier this week that I realized I was in the city for the holiday. They signed the Declaration here! This is why we even have the 4th of July (even if the date is a little off). How cool is it that I got to be here to celebrate it in view of Independence Hall? I saw a replica Revolutionary War Encampment, a concert right outside Independence Hall, and a fireworks show above the Philadelphia Museum of Art (it couldn't compare to Thunder over Louisville in my heart, but it was lovely nonetheless). I have a picture of the concert set up on the right; I still find it hilarious that the Declaration and a Founding Father's statue were the backdrop for a concert.

Because of the holiday, we had a short week at work, but it was not unproductive. I took a break from working with the NYT and PubMed data for the most part, as I had gotten too frustrated with errors involving the file sizes to properly solve them. Instead, Wenli, Lily, and I worked on ideas for making our data presentable and ran new analyses on our annotated corpus, both to find trends that could represent potential features for specificity classification and to verify annotator agreement on the sentence classifications and on the phrases the annotators believe to be under-specified.

The former agreement has increased due to changes in the task instructions and seems promising according to pairwise annotator correlations. The latter is not quite as correlated, but the vast majority of under-specified phrases are noun phrases, which is interesting, as it may point to what information readers consider most essential in sentences, namely the "who and what" of a situation. We've looked into classifying adjectives into two categories: those that require prior knowledge for comparison (tall, stronger) and those that do not (equal, round); however, based on the frequency of adjectives found to be under-specified (less than 2% of words selected), this information may not be very useful as a feature for classification. It's such an interesting idea, but ultimately we should focus only on what will be relevant and useful, and simply leave such ideas for another day, as the data suggests they will have little effect on our results. This is especially true this far into the internship, but it's still a hard lesson to learn!

Week 8

It's incredibly hard to believe that there are only two weeks left! I feel like there is so much more left to do and see before I can leave Philadelphia, but I'll have to make do. This weekend I went to the Franklin Institute, which was very cool. Currently there's an exhibit on Genghis Khan and another with artwork made of Legos by Nathan Sawaya. Both were interesting, as was the rest of the museum. I've included here a picture of the Liberty Bell made of Legos, all patched up! Problem solved. The piece is titled "Fixed". It was a really cute exhibit, and a lot of his more personal pieces were deeper though no less playful in terms of medium.

Work this week was relatively straightforward: we are writing a paper, so we have been brainstorming ideas and gathering data for discussion in said paper. It's somewhat intimidating that this paper could potentially be presented at a conference, if all goes well. I would certainly be unable to attend (Ani mentioned that it's next May...in Slovenia), but it's still really cool. Ani seems to think it's possible, but I'm not sure our corpus is ready. It still feels as if it's too small. It's just a child in my mind...is this how parents feel? Are we sending our little corpus out into the big bad world? Do I need to pack it lunch for its first day of school? I wonder if we've done enough preparation and if our writing will be able to support our work well enough. I haven't written any sort of technical paper since high school (hey, until fairly recently I was under the impression that I would be solely a Spanish major!). I have full confidence in my ability to analyze literature, but will I be able to properly analyze research findings? I feel like I've gotten practice, but doing calculations and writing about calculations are two entirely different matters. Fingers crossed that it will go well!

One of the more interesting (and relieving) analyses we did this week was checking whether our human annotators agreed better than randomly generated annotators would, given the same distribution. Thankfully, they did. To explain: our specificity scale runs 0-6, from most specific to most general. We compared our annotators' ratings pairwise and plotted the differences between those scores. Using the distribution of that data, we also generated the same number of random ratings, and our human annotators had much lower rating differences than the randomly generated annotators did, even though the random ratings were drawn from the human distribution. This means that our human annotators agree significantly on which sentences they find to be specific or not. The graph of this data can be seen above. It only represents one set of randomly generated annotators, but for our final paper we will graph an average over a thousand or so randomly generated sets to better account for variance in the random selections. It will also probably not be purple and green.
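For concreteness, the random baseline boils down to something like this sketch (simplified from what we actually ran, which averages many simulated sets):

```python
# Sketch of the random-annotator baseline: draw ratings from the empirical
# human distribution over the 0-6 scale, then collect pairwise absolute
# differences to compare against the human pairwise differences.
import random
from collections import Counter

def simulated_pairwise_diffs(human_ratings, n_pairs):
    dist = Counter(human_ratings)          # empirical 0-6 distribution
    scores, weights = zip(*dist.items())
    return [abs(a - b)
            for a, b in (random.choices(scores, weights=weights, k=2)
                         for _ in range(n_pairs))]
```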

Phew, hopefully all that made sense! This is why I'm a bit worried about the paper writing; I know what I'm thinking about, but how do I describe it to others? Even so, I have hope that it will go well and serve as a nice conclusion to all that I'll have done here in my ten weeks!

Week 9

Whoops, this post is a little later than I intended! A bunch of us went to New York on Saturday, and it was super fun. I had never been before, but wow is it huge! I mean, objectively I knew this, but actually being there is a whole other matter. I'm just a small girl from Kentucky in this big world! We had really great burgers (mine was an elk burger), went to the MoMA (I almost cried when I saw Starry Night, no kidding), and were prevented from crossing the street due to Obama going by (I just wanted to get across Times Square, but it was pretty funny all the same!). We also visited Nintendo World, which in reality was a pretty small thing, but as a child I had always wanted to go to the Pokémon Center, and it finally became a reality. You bet I bought a couple overpriced plushies. That was definitely a thing I did.

This week was the last in which Ani would be here! She ended up accidentally scheduling a conference in Germany during Lily's, Wenli's, and my last week here, but it's okay. We're still going to talk with her by email and potentially Skype, and we went for a final lunch on Friday. That same day, she approved our outline for the paper we're writing (dun dun dun) with some changes, so we'll write a draft of it this week! Work this week mostly consisted of preparing this outline and collecting all the data we would need to make our points. This task is still a work in progress, but we're getting close to being organized. I'm certain we can handle it!

I've also finally gotten a handle on using CoreNLP without it dying from memory overhead issues, and I managed to use a Python script to call the command on a great number of documents. I'm impressed with myself, but I'm still worried that I won't be able to finish this little project while simultaneously working on the specificity paper. I've decided to continue it only because Khyathi will be here after we leave, and perhaps having IDFs from the unstructured PubMed abstracts will be beneficial to her! Plus, it's something fun to do between all the writing and sentence annotation. :)
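The batching trick looks roughly like this (the paths, memory size, and annotators here are placeholders for whatever a given setup uses; CoreNLP's command-line pipeline accepts a -filelist of documents to process):

```python
# Sketch of driving CoreNLP from Python in chunks, so no single Java process
# has to chew through millions of documents at once. All paths and flag
# values are placeholders for the real setup.
import subprocess

def run_corenlp(paths, chunk_size=1000):
    for i in range(0, len(paths), chunk_size):
        with open("filelist.txt", "w") as f:
            f.write("\n".join(paths[i:i + chunk_size]))
        subprocess.run(
            ["java", "-Xmx4g", "-cp", "stanford-corenlp/*",
             "edu.stanford.nlp.pipeline.StanfordCoreNLP",
             "-annotators", "tokenize,ssplit,pos",
             "-filelist", "filelist.txt",
             "-outputDirectory", "out"],
            check=True,
        )
```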

Week 10

Hello, this is Bridget coming to you live from Louisville, Kentucky after my very, very long drive from Philadelphia! We spent about twelve hours in transit, but it's worth it--I'm glad to be home. Of course I had lots of fun during these ten weeks and it was a great experience, but it's nice to be surrounded by familiar people and places again!

This last week of the internship was completely consumed by working on the draft of the corpus study we have been writing. We didn't even manage to finish it, but we certainly have a nice framework to work with. I hope to be able to utilize that framework for the paper I need to write for Berea. I actually still have lots to do for this experience, even though my ten weeks are over! It's almost overwhelming to think about, but hopefully I can get it all done before the deadlines.

As is always the case when writing papers, I found myself needing more information than I originally anticipated for the sections assigned to me. For example, I found myself wanting the frequency of various interrogative pronouns in English, but the corpora available to me would not be very conducive to this measurement (mostly because journalistic articles rarely ask direct questions). I was able to find the relevant information for some such questions, but others, like this one, still elude me. These are the aspects of paper writing that I find most frustrating, but on the other hand, putting all my ideas to paper is somehow satisfying. It's give and take.

Because this week was focused on writing rather than collecting new data, I have little to say on the "new discoveries" front! Instead, I've been doing some reflection on my experiences in Philly. The city itself was so different from where I'm from, even from Louisville, which is the biggest city in Kentucky. Philadelphia was huge in comparison, but that made it incredibly fun and easy to explore (SEPTA was very helpful). I also enjoyed the diversity of the city. At work, we were a diverse group of undergrads, with three of us being international students and everyone having different backgrounds. We were all women, which just added to the fun, particularly because it's so unusual for computer science! It was great. I was able to learn new things from each of them, and in turn I hope that I was able to teach them something as well. I really appreciate all the help Ani and our team members gave me! I'll be writing a more extensive reflection soon, but I had a lot of fun, and it was very strange to say goodbye after all this time spent with everyone!