Caitlin McCollister

REU in Bioinformatics: Summer 2010

Weekly Journal

Week Ten: August 1 - August 6

Going Home

Thursday, August 5

You can now check my "research" page to see my final report, including fascinating network visualization results!

What could this possibly be? There's only one way to find out!

It's been a good final week at UNCC. I picked up my driving buddy at the airport on Monday. Yesterday, we packed up my car as much as we could, I checked out of my room, and said goodbye to everyone. You guys were the best!

Okay, just one more stop at Trader Joe's before we set sail for Kansas! If I don't try the saltwater taffy I've been ogling all summer, I might regret it for the rest of my life.

P.S. It was worth it. The watermelon ones are my favorite.

Week Nine: July 25 - July 31


Thursday, July 29

We all did our presentations this morning (and spilling into this afternoon) and I'm pretty sure everyone made it out mostly unscathed. We have lowered the group nervousness level down to GREEN. I was the first one on the schedule, followed by Yingxu, followed by 18 more. While I might not have chosen that slot voluntarily, I'm glad it was assigned to me. It was so much easier to be done with it right away and just pay attention to the other presentations. After all, we do have some interesting projects that have been going on here. :)

Well, the DREU gang and I are here for one more week. I'm glad I have good company while wrapping up this final paper, but sad we only have that much time left. Perhaps one of these days we'll go to Amelie's French Bakery and eat delicious pastries, drink coffee and... oh, we'll work a little too, of course.

Winding up, winding down

Monday, July 26

Shannon gave me this paper to read and think about while he was out of town for a conference a while back, and I finally finished it:

Validating module network learning algorithms using simulated data

... and then he heard about this one while attending that very conference and thought it looked like my kind of thing:

Modularity and anti-modularity in networks with arbitrary degree distribution

These will definitely be the last two papers I digest this summer. My eyes have actually threatened to fall out. That's fine though, because now I know my direction for the final paper (I won't spoil the surprise yet!)

On Friday most of the bioinformatics lab group took a field trip to the Discovery Place museum in downtown Charlotte. It's definitely a youth-oriented museum, so Jessica, Amy and Sudeeba brought their children too. I remember going to the Exploratorium in San Francisco and the Tech Museum in San Jose many times when I was young and living in California. Ah, so many memories! This was a perfect way to get out of the lab and approach science with reckless, youthful enthusiasm.


On Thursday, all the computer science REU students at UNCC are giving oral presentations of their work this summer. There are five of us here through DREU and fifteen other students in various arrangements. UNCC's own REU program officially ends on Friday, so this is the last week I'll see many of the friends I've made here. I'll try to remember that when balancing my time this week between preparing the presentation and spending time with the wonderful group I've gotten to know so well in just two months.

Week Eight: July 18 - July 24


Tuesday, July 20

On the research front, things have been pretty lively! I wrote... some code!

After doing some sightseeing around different parts of bioinformatics, I've come back to regulatory networks and decided to stick with them for the rest of the summer. I remembered that one of the first papers I read provided its data set (transcription factors in yeast) for download. It's quite simple, just a two-column table of "a regulates b" type relationships. They've already done the work of assembling this list (I still can't imagine what kind of processes this involved, but I'll leave that aside).

Genomic analysis reveals a tight link between transcription factor dynamics and regulatory network architecture
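A two-column table like that is easy to read into a program. The following is only a minimal sketch in Python (not the MATLAB or Mathematica used this summer), assuming one whitespace-separated "regulator target" pair per line. The function name `parse_edges` and the placeholder names in the sample are made up for illustration and are not taken from the paper's data set.

```python
def parse_edges(text):
    """Turn two-column 'regulator target' lines into (regulator, target) pairs."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        regulator, target = fields
        edges.append((regulator, target))
    return edges


# Hypothetical sample with placeholder names, not real yeast data.
sample = """\
TF_A gene_1
TF_A gene_2
TF_B gene_2

TF_B TF_A
"""

edges = parse_edges(sample)
for regulator, target in edges:
    print(regulator, "regulates", target)
```

Each pair reads as "a regulates b", so the order of the two columns matters and should be kept as-is.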

As a beginning to my actual project for the DREU program, I tried out the MATLAB functionality for graph/network representations. I've used MATLAB before in various math courses, and it was pretty much as I remembered it.

However, I had also heard good things about Mathematica 7, especially for its capabilities in scientific data processing and visualization. I read some tutorials, gave it a try, and I've been enjoying it ever since. I'll give more details when I've taken in a bit more, and settled on a specific research question.

Week Seven: July 11 - July 17


Sunday, July 11

Yesterday my roommate Ashley and I, together with quite a few more UNCC REU people, ventured to the Carowinds theme park! We departed from our apartment area around 10:00 and were in the park by 11:00, and it was surprisingly nice outside. By around 2:00 it was getting pretty warm, so we hung out in the indoor video game arcade for a while before squeezing in a few more rides. By 5:00 it was uncomfortably hot and we'd ridden most of the rides that we were excited about, so we decided to call it a day. The INTIMIDATOR was my favorite ride, by far. I'm so brave, I wasn't even... INTIMIDATED!


Week Six: July 4 - July 10

Ma'am, we're cutting you off

Tuesday, July 6

You're obviously over the legal limit on journal articles.

While Shannon's been gone I've had time to do some reading and thinking about where I would like to go with the research process this summer. I think I will still need some help defining a particular question that I can realistically "do" something with in the rest of the ten week program.

These are three articles I've read, thought through, highlighted and commented... Maybe I went a little crazy with the PDF-gathering. The common theme that interests me is how a well-engineered, modular programming framework can improve computational efficiency and give more scientifically meaningful results.

Classification of DNA sequences using Bloom filters

Stranneheim's paper strikes me as having the most well-defined objectives and thoughtful interpretation of statistics calculated from their data. It's certainly easy (especially for computer science people) to get caught up in finding more matches, more data, new methods. I liked that they didn't ignore or brush over the fact that they performed the same process using different sequence searching algorithms and each one yielded some matches that neither of the other two detected.

A Parallel Programming Framework for Multi-core DNA Sequence Alignment

There are a few points in the Almeida paper where at least I was not convinced they were thinking through the biological basis and significance of what they were doing, but it raised the most questions in my mind about experiment design and methodology for that very reason. I was amused to see the exact processor I have in my desktop listed as what they used for performance profiling on a reasonably priced quad-core machine.

Using Bloom Filters for Large Scale Gene Sequence Analysis in Haskell

The work by Malde is unmistakably a computer science paper, using Haskell to implement some fairly simple sequence alignment tasks. I took a course on Functional Programming this spring (we did our projects in Haskell) and I genuinely enjoyed it. It's a strikingly different way of thinking throughout the "development" process. By far the largest proportion of your time is not writing code, but planning and, well, meditating... becoming one with the monad... Sometimes it helps to stand on your head and look at it all upside down, too.

While I wouldn't expect the bioinformatics field at large to dive head first into functional programming languages (you'd probably hit your head on the bottom of the pool), it really demonstrates how a carefully-written program structure (and included data types) aids in rapid prototyping at a high level, as well as testing for program correctness using generated test cases.

Happy 3rd of July!

Sunday, July 4

Yesterday I went with Charlotte and Millie (DREU buddies) and a few more girls to see an outdoor symphony concert followed by a fireworks display in South Park. I drove through the South Park mall area on one of my first days here and found it thoroughly confusing. However, Charlotte and I were the only two of us with cars here and we needed both to transport everybody, so I bravely faced the challenge and, well, this time I/we completely missed the I-77 exit we needed. I figured it out eventually, though: when we passed the "Welcome to South Carolina" signs I knew we'd gone too far.

Anyway, we had a lovely picnic and enjoyed the show with about 10,000 other people. Quite the place!


Week Five: June 27 - July 3

Adventures of Deliciousness

Thursday, July 1

My roommate/coworker Yingxu and I have had good luck with a couple of restaurants we've tried in the area. She had never had Thai food but wanted to try it, and I love Thai food, so last weekend we had dinner at a place near campus called Thai House (there was also a power outage at the apartment, so we couldn't really cook there).

We both know and love Mexican food also, so for lunch today we went to Zapata's. It's in a kind of silly little shopping mall/center/thing built around a man-made lake with quaint little bridges and paddleboats and swans in the water... the whole works ;) Okay, maybe it's not fair for me to make fun of it now, because I did enjoy walking around outside afterwards. Today we had the nicest weather I've experienced so far this summer. Partly cloudy, less than 85 degrees, and low humidity.


Research Topics

Sunday, June 27

Something I really wanted to see this summer is an inside view of how academic research progresses over time, and as one moves from the role of a student and employee to researcher or professor. As I'm considering the possibilities for graduate school, it would be so nice to have a feeling of how I might spend my days: talking and planning projects on whiteboards, reading, programming, or perhaps making coffee?

Fortunately Shannon takes care of his coffee needs so I don't have to worry about that. He did, however, recommend a book that I've been enjoying reading today (it goes pretty quickly... imagine that!) It's called The Craft of Research, and it's a guide for graduate students or later undergraduates entering the "research" phase: what makes a good research problem?

Even if I have a few years left of following and doing my part of a professor's work, I can't help but wonder what motivations led them to it: did they have a question in mind and then find a way to fund it, or take a topic with money behind it and mold it to their interests? And before anyone starts the investigation itself, how do you know if you even have a scientifically viable question? There has to be something "answerable" or quantifiable about it, for sure, but can you imagine a worthwhile continuation of that topic in future papers?

For now I'll try to focus on writing just one paper for this summer. I'm not really sure what my focus will be. Shannon is going out of town for a week so I'll be exploring more on my own, I think.

Week Four: June 20 - June 26

Home away from home

Wednesday, June 23

I'm growing pretty fond of my little room in the apartment here. The only complaint I had with the room was that the desk chair isn't really a desk chair, so I went to Goodwill hoping I might find something more usable and not too expensive, since I probably won't be able to fit it in my car when I drive back to Kansas.

Aha! What's this? A chair! A nicer one than I expected to find, and for $25. If I need to, I can always re-donate it in August.


It does make me miss my own desk and chair at home, especially the cute furry chair-warmer:


Week Three: June 13 - June 19

Regulatory Networks

Friday, June 18

I've still been working pretty independently, reading new journal articles and chapters of books as needed to fill in missing pieces of background knowledge on genetics.

I hadn't known just how dynamic the system of gene and protein interactions could be. Inside every living cell, a gene may be expressed at varying levels in response to environmental conditions, but also according to the current levels of other genes and proteins. One (large) area of bioinformatics is the effort to represent these systems as a graph-theoretic network, whose nodes are individual genes and directed edges indicate "a regulates b" relationships among them.
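To make the graph picture concrete, here is a minimal sketch in Python of one way such a network might be stored: each directed edge (a, b) means "a regulates b". The gene names are placeholders and the helper `build_network` is hypothetical, written for illustration only. It is not the representation used in any of the papers mentioned here.

```python
def build_network(edges):
    """Build two lookup tables from directed (regulator, target) edges.

    targets[a]    -> set of genes that a regulates (outgoing edges)
    regulators[b] -> set of genes that regulate b  (incoming edges)
    """
    targets = {}
    regulators = {}
    for a, b in edges:
        # Make sure every node appears in both tables, even with no edges.
        for node in (a, b):
            targets.setdefault(node, set())
            regulators.setdefault(node, set())
        targets[a].add(b)
        regulators[b].add(a)
    return targets, regulators


# Placeholder edges, not real data.
edges = [
    ("gene_a", "gene_b"),
    ("gene_a", "gene_c"),
    ("gene_b", "gene_c"),
]

targets, regulators = build_network(edges)

for gene in sorted(targets):
    out_degree = len(targets[gene])    # how many genes it regulates
    in_degree = len(regulators[gene])  # how many genes regulate it
    print(gene, "out:", out_degree, "in:", in_degree)
```

Keeping both directions means that "what does this gene regulate?" and "what regulates this gene?" are each a single lookup, at the cost of storing every edge twice.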

If you'd like to see some of this yourself, here is a link to one of the articles Shannon had me read earlier this month: Stochastic mechanisms in gene expression

It is becoming more clear how my work and school experiences are relevant to this particular area of bioinformatics. As I'm moving on to literature that's more focused on the statistics and graph theory aspects, I feel like I understand much more of the material. When I do encounter an obscure biology or chemistry explanation, it's less shocking to me now and I feel more comfortable inferring the relative meaning rather than looking up absolutely everything I don't recognize.

Working Environment

Tuesday, June 15

My laptop is currently set up at an empty space in the wet lab area. The room has nice, tall windows and people in lab coats performing mysterious procedures. Shannon gave me two articles from bioinformatics journals to digest. So far, my journal article digestion process entails reading the paper on one half of my screen while I look up every tenth word in a web browser on the other half.


As expected, I've had quite a few questions about the biological processes involved in gene transcription and translation. I went to Shannon's office and we had a very helpful, impromptu two hour catch-up lesson. It's good to know he's one of the endurance-class professors who can talk about their field until someone drags them away (or distracts them by asking a question about something else). That should be perfect for this summer.

Week Two: June 6 - June 12

Biotechnology: Now with Real Fruit Funding!

Friday, June 11

This week, Shannon drove five of us to the David H. Murdock Research Institute in Kannapolis, about 20 miles away from UNCC. A professor from North Carolina State University was giving a seminar about her lab's recent research in genomics. Apparently they host such things rather often, as it gets people together from different universities and companies in industry as well. It was a very relevant experience for us to see a non-classroom presentation and observe how people of different seniority levels interact and "network" with each other.

After the talk, we took a brief tour of the genetic sequencing lab space featuring... next generation sequencing machines: the Genome Analyzer IIx from Illumina and the Genome Sequencer FLX from Roche!

It was the first time the other students and I had visited the facility, and honestly we were all taken aback by the extravagance of it. How is all of this funded? Who works here? Why are all the walls painted yellow?

The DHMRI website explains that this building is part of the North Carolina Research Campus, a public/private collaboration to provide lab space and services for North Carolina universities and paid contracting for industry. I'm curious exactly who makes up this particular "industry" scene, and who managed to get this much public funding when so many universities are in the middle of drastic budget cuts, hiring freezes and tuition increases.

Finally, the yellow walls and recurring fruit-themed murals are due to David H. Murdock's career in business which includes, among other things, being the chairman of Dole Food Company. Murdock himself dictated the particular shade of yellow to be used on the walls.


Week One: June 1 - June 5

Lab meeting

Saturday, June 5

I attended the first biweekly lab meeting to take place during my time here. Jessica, Shannon, and all the students working in their labs get together to talk about procedures coming up in the wet lab, request refills on supplies, and make (mostly) lighthearted jokes at each other.

Yingxu is here under the DREU program working with Jessica (and is one of my three roommates in our on-campus apartment!)

Amy and Warren are senior undergraduates at UNCC. They're both biology majors, but Amy's interest is more in the biology side of the field while Warren is more focused on computing. It seems that to give us a chance to gain experience and confidence in giving technical presentations, one of the students will give a presentation at every lab meeting about something new they've been learning. Today it was Amy's turn. She did really well, even if Warren and I accidentally made her nervous by snickering at something we thought was a joke but actually wasn't. Apparently, people pretty routinely talk about "promiscuous" behavior in the expression of certain genes. I have a feeling it won't be the last time I'll see people from the two "halves" of this field fail humorously to understand each other.

Orientation Day

Wednesday, June 2

Today was the first organized day of activities for REU students at UNCC. We all met in Woodward Hall to introduce ourselves and take a survey. Next we went out to the courtyard to do some team-building exercises. We played some games in small groups, like improvising tools and a procedure to lift some objects out of a circle drawn on the ground without stepping inside it.

We had a long break scheduled for lunch, so most of us walked together over to the student union to get our UNCC photo IDs first. While some of the on-campus dining options are closed for the summer, I was amused to see that the food court in the Cone Center has some of the exact same food served on the KU campus in Lawrence, Kansas: Chick-fil-A sandwiches, Sushi with Gusto (too much gusto, not enough fish for me), pizza and energy drinks of all kinds. I was lured in by a bowl of udon noodles, as were about half of the others.

We're already starting to make plans for things to do for fun this summer. The people who know this area say those of us from out of town need to try some of the local food specialties: North Carolina barbecue (looking forward to it!), fried chicken (maybe I'll just conveniently neglect that one) and sweet tea (it's everywhere! really!). Waffle House was briefly mentioned and quickly vetoed. Carowinds, an amusement park south of Charlotte, would be a fun trip, hopefully some weekend when it's not too hot outside.

Everyone reconvened after lunch to take a tour of the different labs students will be working in: small groups of about three to five professors, their graduate students, and their common work spaces. Even after spending thirty minutes at each of the Woodward Hall computer science labs (Human-Computer Interaction, Future of Computing, Visualization, Games for Learning) I still felt like I'd seen just a tiny sliver of everything that goes on inside. By the time we walked over to the Bioinformatics building, we were all getting a little worn out from the long day. I did get to meet my mentor, Professor Schlueter, in person for the first time. He was pretty enthused to show us the variety of things people are working on there, so I have a feeling I'll get plenty more chances to see the building and meet more of the people in it.