Research Journal on RNA/DNA Secondary Structure

May 30, 2001, Week 1

Met with Anne on Monday, and she gave me some papers and books to read. Danielle's also helped out, as she's had a little bit of experience with DNA computing before. I did find out I'm a few steps ahead of the game in one respect: I've had some experience with LaTeX at Rose, and Anne wants us to use that to write up our stuff. On the other hand, I've never bothered to learn HTML before now.. Anyhoo, I've spent the past two days looking over the research Dr. Condon is working on, and I think I've got a pretty decent idea what's going on thus far. I'm going to try and see if I can 'splain everything without math first...

Dynamic Programming

You know how recursion works: we keep calling the same function on progressively smaller bits of the big thing. Well, apparently, we use dynamic programming when, while we're dividing everything up, we keep running into the same things to figure out... Regular recursion just keeps recomputing the smaller things whenever it discovers we need them again... What dynamic programming does is keep a table that stores the values of all the smaller solutions, and we work our way up. So where the function
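 int fibonacci(int n) {
         /* base cases: fibonacci(0) = 0 and fibonacci(1) = 1 */
         if (n == 0 || n == 1)
                 return n;
         else
                 /* each call spawns two more, so small cases get recomputed */
                 return fibonacci(n-1) + fibonacci(n-2);
 }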

keeps having to recompute numbers.. say you want fibonacci of 5: first you compute fibonacci of 4, then you compute fibonacci of 3. However, when you compute fibonacci of 4, you already have to compute fibonacci of 3, so you end up figuring out the same problem lots of times.
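To make the recomputation concrete, here's a rough sketch in Python (my choice for these side examples, not the language of the project code) that counts how many times each value gets asked for. The helper name count_calls is made up for illustration.

```python
from collections import Counter

def count_calls(n):
    """Run the plain recursive Fibonacci and tally how often each k is requested."""
    calls = Counter()

    def fib(k):
        calls[k] += 1          # record every request, repeated or not
        if k == 0 or k == 1:
            return k
        return fib(k - 1) + fib(k - 2)

    value = fib(n)
    return value, calls

value, calls = count_calls(5)
print(value)                  # 5
print(sorted(calls.items()))  # [(0, 3), (1, 5), (2, 3), (3, 2), (4, 1), (5, 1)]
print(sum(calls.values()))    # 15 calls in total
```

So even for fibonacci of 5, fibonacci of 3 gets worked out twice and fibonacci of 1 five times, and it only gets worse as n grows.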

Well, dynamic programming is supposed to start from the bottom up: we get this big table of fibonacci numbers, and we just start filling it in from fib[0] up to whatever number we want. Then, when we need to know, say, the fibonacci of 3, we just grab it from the table rather than computing it again.
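A minimal sketch of that table idea, again in Python. The name fib_table is just something I picked for the example.

```python
def fib_table(n):
    """Fill a table of Fibonacci numbers from index 0 up to n, bottom up."""
    table = [0] * (n + 1)
    if n >= 1:
        table[1] = 1
    for k in range(2, n + 1):
        # both smaller answers are already sitting in the table
        table[k] = table[k - 1] + table[k - 2]
    return table

print(fib_table(5))  # [0, 1, 1, 2, 3, 5]
```

Each entry is computed exactly once, so the work grows linearly with n instead of blowing up the way the plain recursion does.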

Of course, for the fibonacci problem, this is a pretty ridiculous solution.. there are other ways to speed up fibonacci calculations (tail recursion) that don't rely on tables and all that.... It's supposed to be a great solution for optimization problems that have overlapping subproblems, that is, where the recursion keeps calling the same subproblems. If you wanna know more about Dynamic Programming, I suggest finding a copy of the "big white book of algorithms", Introduction to Algorithms, by Cormen, Leiserson, and Rivest. It's a nice resource.

DNA secondary structure

So, what does all'a this have to do with me? Well, background first
DNA is formed from 4 bases: A, T, C, G. As bond to Ts, and Cs bond to Gs. A normal strand consists of two strings of bases that are bonded together.. so, like, a string AATTCCG will be connected to, say, TTAAGGC. These DNA strands like to pair up. When we do DNA computing or other stuff with DNA, instead of working with a double strand, we use a single string... and unfortunately, this single string, lacking a partner to pair up with, tries to pair up with itself, so it makes all these crazy loops and bulges to try and minimize the energy of the strand...(the energy, I think, is just a measure of how stable the shape is... smaller energy is more stable..)
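The pairing rule (A with T, C with G) is easy to write down as a lookup. This is only a sketch, and the function names are my own. One thing the simple picture above glosses over: the two strands of real DNA run in opposite directions, so the partner strand is usually written as the reverse complement.

```python
# A pairs with T, C pairs with G
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Base-by-base partner of a strand, read in the same direction."""
    return "".join(COMPLEMENT[base] for base in strand.upper())

def reverse_complement(strand):
    """Partner strand read in its own direction (the strands are antiparallel)."""
    return complement(strand)[::-1]

print(complement("AATTCCG"))          # TTAAGGC
print(reverse_complement("AATTCCG"))  # CGGAATT
```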

People are using Dynamic Programming to predict what shape a strand of DNA or RNA will take. There's even a website that'll take a random DNA or RNA strand you type in and fold it for you: RNA Folding. There's a whole bunch of nasty recursion relations...(they're not really as nasty as they first look, but they ain't 'xactly fibonaccis either...)
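To give a feel for how a folding table works, here's a toy sketch. Big caveat: this is NOT the energy-based recursion that the real folding programs use. It's a much simpler stand-in (the classic "maximize the number of base pairs" style of recursion), and the min_loop parameter and the function name are my own assumptions for the example.

```python
def max_base_pairs(seq, min_loop=3):
    """Largest number of nested base pairs a strand can form with itself.

    best[i][j] holds the answer for the piece seq[i..j]; short pieces are
    filled in first, so every longer piece only looks up finished entries.
    """
    pairs = {("A", "U"), ("U", "A"), ("A", "T"), ("T", "A"),
             ("C", "G"), ("G", "C")}
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]              # case: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in pairs:   # case: base i pairs with base k
                    inside = best[i + 1][k - 1]
                    outside = best[k + 1][j] if k < j else 0
                    score = max(score, 1 + inside + outside)
            best[i][j] = score
    return best[0][n - 1]

print(max_base_pairs("GGGAAACCC"))  # 3
```

The strand GGGAAACCC folds back on itself into a little hairpin: the three Gs pair with the three Cs and the As form the loop. The real recursions swap "count the pairs" for "add up the energies of the loops", but the table-filling pattern is the same.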

So that's what I spent the past few days digesting. I guess I'll have more info on what in particular I'm doing when I get more stuff digested.


June 1, 2001, Week 1

1.1   notation review

$S = \{\, z_1 z_2 \cdots z_m \mid z_i \text{ is one of the two words allowed at position } i \,\}$
$s \in S$

1.2   new work

Danielle and I met with Anne today to discuss how we were coming with the formulas... Between Danielle and me, we were able to make quite a bit of headway. I worried about one thing, and Danielle worked on another problem. We were focusing on the recurrence
$$V(i,j) = \begin{cases} +\infty & \text{for } i \ge j \\ \min\bigl(eH(i,j),\ eS(i,j) + V(i+1,j-1),\ VBI(i,j),\ VM(i,j)\bigr) & \text{for } i < j \end{cases}$$
In our case, what we want to do is extend this for the situation where we can choose between two words to put at each place. So, first we extend the eH(i,j) part to be $\min_{s \in S} eH_s(i,j)$.
Unfortunately, this looks exponential, since the number of strings in S doubles with every word position... that was kicking me in the rear for a while... fortunately for us, eH(i,j) depends only on $j-i-1$, $s_i$, $s_{i+1}$, $s_{j-1}$, and $s_j$, so we might actually be able to split this into cases and come up with something that won't be exponential. I think I'm going to nail that this weekend, then start on the VBI stuff...
Danielle was working on the second part of that equation, eS(i,j) + V(i+1, j-1). That needs to be divided into cases as well, and it gets even more tangled up, just due to the recursion.
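Here's a rough sketch of why the hairpin term doesn't have to be exponential. Everything in it is an assumption made for illustration: the energy function toy_hairpin_energy is invented (it is not the real eH), all words are taken to have the same length, and positions are counted from 0. The only point is that a function of the loop length and four bases can be minimized by trying just the words that cover those four positions.

```python
from itertools import product

def toy_hairpin_energy(loop_len, b_i, b_i1, b_j1, b_j):
    """Made-up stand-in for eH: depends only on loop length and four bases."""
    closing = -2.0 if {b_i, b_j} in ({"A", "T"}, {"C", "G"}) else 1.0
    mismatch = 0.5 if b_i1 == b_j1 else 0.0
    return closing + mismatch + 0.1 * loop_len

def min_hairpin_brute(slots, i, j):
    """Minimum over every full string: 2^m strings for m word slots."""
    best = float("inf")
    for choice in product(*slots):
        s = "".join(choice)
        energy = toy_hairpin_energy(j - i - 1, s[i], s[i + 1], s[j - 1], s[j])
        best = min(best, energy)
    return best

def min_hairpin_reduced(slots, i, j, word_len):
    """Minimum over only the slots that cover i, i+1, j-1, j: at most 2^4 cases."""
    positions = (i, i + 1, j - 1, j)
    needed = sorted({p // word_len for p in positions})
    best = float("inf")
    for choice in product(*(slots[k] for k in needed)):
        picked = dict(zip(needed, choice))
        bases = [picked[p // word_len][p % word_len] for p in positions]
        best = min(best, toy_hairpin_energy(j - i - 1, *bases))
    return best

# four slots, two candidate words each (hypothetical words)
slots = [("ACG", "TTC"), ("GGA", "CAT"), ("TAC", "CCG"), ("ATG", "GCA")]
print(min_hairpin_brute(slots, 1, 10) == min_hairpin_reduced(slots, 1, 10, 3))  # True
```

With four slots the brute-force version looks at 16 strings either way, but it doubles with every extra slot, while the reduced version never looks at more than 16 combinations no matter how long the strand gets.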

1.3   random kinda related to work stuff

I was actually a bit of a slacker last night, hanging out with a college friend who visited rather than doing work. Wrote this in LaTeX and converted it to HTML using Hevea, isn't that neato!? Here are the links to the LaTeX page (it's a really confusing website) and the Hevea page. Also a guide to LaTeX.

June 8, Week 2

Danielle and I have started to worry about implementation a bit. So, we're basically trying to decide between C, C++, and Java. See, Danielle's had most all her experience with C, and I've only had C++ and Java, additionally, Zuker's code is in C++, and to top it off, I despise that darn language. Zuker's code is about 12,000 LOC, so it'd be really nice not to have to rewrite all of that; it seems like a huge undertaking. I've also discovered the wonder of the "Beta Lab Nutrition Co-op". It's a snack thing operating out of the lab I work in, where you just sign up with a small deposit ($5.00 Can), and mark off every time you take a candy bar or a soda from the fridge. The cost is about half as much as it costs to buy a can at a coffee shop or something. I've been meeting some of the Lab regulars. It's a pretty quiet area really, as it's all theory people. Also discovering that every week on Wednesday or Thursday they have a speaker present a paper on something or other.

June 13, Week 3

Alrighty, I've been pretty lax at keeping this up, so here's a rundown on what I've been doing for the past 2 weeks. I mentioned before that the hangup we were having has to do with the number of times we have to call eH(i,j). At first it looks like we'd have to call it 4 different times, based on whether i, i+1, j-1, and j were either fixed in the same word as the one preceding or not in the same word as the one preceding. This is too simplistic though; the way it really works is that word(i) ≤ word(i+1) ≤ word(j-1) ≤ word(j). This brings the number of cases to worry about to 8, not 4, and since we'll have to do more or less the same thing for eL and eM, it'll get even worse for those! So, I've been working on an algorithm to simplify that. It's not actually that hard, and it's combined with the algorithm to create all the possible strings that can be generated from a given i and j. So, what happens is this: a binary tree is created that has a depth of (j - i + 2) using a for loop. At the start of the loop, it checks if the index of the loop is i, i+1, j-1, or j. If so, then it checks if the current index is fixed in the same word as the one previous or not, and adds nodes appropriately. If the index is not i, i+1, j-1, or j, then it just adds "filler" values appropriately. Then, when we want the possible strings, we just run thru the tree and get them! And then this can easily be extended to deal with the eL and eM functions as well.
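One way to read the "8 cases": each of the three neighbouring comparisons in word(i) ≤ word(i+1) ≤ word(j-1) ≤ word(j) is either "same word" or "later word", which gives 2 × 2 × 2 = 8 combinations. The sketch below spells that reading out. It is not the tree-building code described above; the helper names, the equal word length, and the 0-based positions are all assumptions for the example.

```python
from itertools import product

def word_index(pos, word_len):
    """Which word slot a 0-based position falls in, assuming equal-length words."""
    return pos // word_len

def boundary_case(i, j, word_len):
    """For the pairs (i, i+1), (i+1, j-1), (j-1, j): True if both sit in the same word."""
    idx = [word_index(p, word_len) for p in (i, i + 1, j - 1, j)]
    return tuple(a == b for a, b in zip(idx, idx[1:]))

# every same-word / later-word combination for the three comparisons
all_cases = list(product([True, False], repeat=3))
print(len(all_cases))            # 8
print(boundary_case(1, 10, 3))   # (True, False, True)
print(boundary_case(2, 6, 3))    # (False, True, False)
```

Each case tells you how many independent word choices actually feed into the four bases, which is what decides how many strings need to be looked at.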

So, that's how we're doing that. I'd originally started working on this in C++, because the original program was written in C and I don't know C, but after much irritation (I hate C++), I rewrote it in Java. I've got the stuff working on my craptop, and I was hoping to clean up the code a bit at the beta lab today, but I'm having a bit of trouble actually shifting my files to those computers... The craptop decided to be ornery and wouldn't let me dial in, so I brought it here, and discovered I'm having a heck of a time logging on here as well (sigh). Such is life...


June 14, Week 3

Well... Just had a meeting with Anne today. She's been off at a conference in Florida. She spoke to another guy from SUNY Stony Brook who's just about to publish a paper doing, well, exactly what we're working on, except limited to words of length 3 and extended to more than two branches at a time. We're waiting to hear back from this guy now to see if he'll let us see his code or whatnot, and maybe extend it. So, I won't be cleaning up my code today, as it's now pretty much obsolete; I'll be working on typing up an explanation of the base functions eH, eS, eL, eM, I guess.

June 23, Week 4

This week, we got the code from the SUNY-SB people (Barry and Steve). At first attempt, it didn't compile, but that was easily fixed with modifications due to platform dependencies. We spent quite some time tracing the code, making sure we had a good enough idea how to get it to work for our stuff. We're still having problems making the program work with a protein input instead of a straight RNA strand. Hopefully we can get that to work. So that's the tone for this week: code tracing. Although I must say, it really is quite pretty code. It's easy to read and doesn't make my eyes go crossed trying to figure out the typedefs and names and such. I'm also learning the ins and outs of Makefiles and compiling with something a bit more complicated than Visual Studio. Sometimes it's amazing what knowledge you just don't pick up.

On a side note, while I was chatting with Danielle in the Lab, she made an offhand comment I just gotta note down. She described herself as something like "a tv you can carry on a conversation with at times". I had to stop work and just laugh for a while about that. It's so true. I mean, I'm not a big talker, but Danielle is, and if you don't say anything to her, she strikes up a conversation with you, you can just sit back and be amused. Thought I'd mention that here, as it happened in the lab....


June 30, Week 5

Breakthrough!! Got the program to work with protein inputs!! The problem was pretty simple actually. Barry had originally had two functions that could read in a protein. One was an artifact, the other was the one actually used; however, according to the comments on each, they took files that had the columns in different orders. So, we were trying to read in a file that didn't have columns in the right order. :P We also now have a program that will generate sets of words for a set of constraints, so we can use that to create words to test. Now what we need to do is come up with a good way to test stuff. We plan on doing some playing with it as well, to try and see if we can input designs presented in some of the papers we've read.

July 7, Week 6

Okay, so after a few random runs, Danielle and I came up with a set of tests to look at. What we want to do is take all 4, 5, and 6 letter configurations of the alphabet {a,t,c}, divide them by GC%, and run those sets to see if we can come up with a "good" estimate of how much GC% is good in words, and use that to decide what else we want to run tests of. We also want to see if we can get a limit for the number of concatenations we should test to make sure we have no secondary structure; i.e., if a set of words won't fold for any configuration of 3 words concatenated together, do we then know that any number of concatenations is possible without forming a secondary structure? However, we've run into another difficulty with the code we're working with. Right now we're getting an answer we know is incorrect for one particular set of words. Been talking to Barry about the problem, and hoping to find a solution. We're also working on a proof that we need only test up to a certain number of concatenations.
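Generating the test sets is the easy part, and a sketch of it might look like the following. The function names are invented for the example, and the folding step itself (which is what the SUNY code does) is not shown. Words are grouped by how many g/c letters they contain; GC% is then 100 times that count divided by the word length.

```python
from itertools import product
from collections import defaultdict

def words_by_gc(length, alphabet="atc"):
    """All words of the given length over the alphabet, grouped by g/c count."""
    groups = defaultdict(list)
    for letters in product(alphabet, repeat=length):
        word = "".join(letters)
        gc_count = sum(1 for ch in word if ch in "gc")
        groups[gc_count].append(word)
    return dict(groups)

def concatenations(words, count):
    """Every way of stringing together `count` words from the set, repeats allowed."""
    return ["".join(combo) for combo in product(words, repeat=count)]

groups = words_by_gc(4)
print({gc: len(ws) for gc, ws in sorted(groups.items())})
# {0: 16, 1: 32, 2: 24, 3: 8, 4: 1}
print(len(concatenations(["at", "ca"], 3)))  # 8
```

There are 3^4 = 81 words of length 4 over {a,t,c}, and the group sizes above add up to that. The concatenation count grows as (number of words)^3 for triples, which is why a proof that 3 is enough would be so handy.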

July 14, Week 7

Presented a first draft of a proof that we need test only all configurations of length 3. I could show it off here, but it's pretty full of holes. We spent the next hour or so picking it apart and putting it back together.. which started us on a tangent about context free-ness in secondary structures. Huh, I never really expected to see that reappear in my research. I'm rather happy I took Theory of Computation. Anne gave us two books about it. One of them is the Theory of Computation book by Sipser that I already had for class (and I even brought it too!!!), and the other is one that's more directly related to DNA computing. Will spend some time looking up papers that could give us a better picture of this. By now we're headed off into a tangent that Anne wasn't even expecting. Hey, it's all good.

July 16, Week 8

I've been incredibly lax in keeping this updated, so I'm going to sit down tonight and fix this up a bit. For now, here's how things are looking at the start of this week. Danielle and I are working out a context free grammar to build words with secondary structure. We already know we can't do it for strands in general, but we're pretty sure you can do it for the case where any sub secondary structure has a maximum and minimum bound on its energy. So, that's what I'm working on now. I did find a few papers on language theory relating to secondary structures, and the big one is by Searls. Anne also suggested that the two of us do a talk on our summer research at the weekly Beta lab presentation.

July 21, Week 8

Yep, at work on a Saturday. Danielle and I are working on our paper and a talk we'll be doing on Wednesday. Man, do I hate talks (sigh). Anyhow, the talk is coming along okay, but I think I'm about ready to head off for a while, probably come back to it later today. The way we're splitting it up is that Danielle is doing the stuff on the recursions, and I'm working on the context free stuff. Perhaps if I feel like it, I'll add a link to a pdf of our slides for the talk... Maybe not, who knows. Grr... I hate public speaking.

July 23, Week 9

Arrgg, I'm about to start throwing some computers around the lab! About time to call it quits for the day here and work on practicing my talk somewhere where I won't break anything or disturb anyone. Anyhoo.. I finally got my photos from before scanned in and moved to my html directory, (this adventure is part of the reason I wanted to toss some computers around). Besides that, fleshing out my paper and working on my talk. I'll tell ya how it goes on Wed!

July 25, Week 9

Ahhh... the relaxation of a completed talk. That's a nice feeling. Anyhow, the talk went okay, so I'm taking a break from research stuff and using my time to do some updates to this page and gab a bit about what else I did besides prepare for the talk. And I didn't bother putting notes onto a computer format, so I won't be adding them to this site. The final report should cover all the same stuff anyhow.

Aug 2, Week 10

Who-ee, just finishing up the paper now. Taking a break from writing that to type this up. I'd'a put something up here a bit earlier in the week, but the Rose network went on the fritz for almost a week, and I didn't feel like trying to update something I couldn't post immediately, as I'm swapping off between doing that here at the lab and at the apartment. So the network is up and working now, although I did have to sacrifice some unread email to the network gods. Anyhoo. I spoke to Anne on Monday and Wednesday. I finally got a chance to look at the new batch of code Barry sent Danielle and me a week back. So far it looks like the main problem we had with the last code got fixed, but I think there's still a problem with the output I get. I'm hoping to trace that back today and tomorrow. I'm starting to learn more tricks about using a Linux computer to do useful stuff like compiling and whatnot. The paper's coming along pretty much fine. I think I've made the last major change just now, so I'm rereading the thing, trying to figure out where we accidentally wrote something out a bit too ambiguously or changed a word definition mid-paper. In a few hours I'm also meeting with the undergrad who's taking over the project, hopefully to pass along some stuff and explain some of the more common difficulties I've had. Oh wait... I still have a diagram to add to the paper... nix that line about "last major change". Anyhow, for the last two days, it looks like my work schedule consists of polishing the paper, tracing code, and information transfer. I'll probably add one more entry into this journal tomorrow before I leave, and expect to see the final paper magically appear not much later than Saturday the 4th, but for now, see ya's!

Arrggg!!! Started to dig into the code I got. As far as I can tell, the function that outputs stuff to the file is selectively deciding what it should output and what it shouldn't output to the file. Grrr... I dislike C++. Okay, got that problem squared away (ever notice how it's always something stupid that can hang you up?), now I'm back to the same error I had before the update. Except now I think I may actually be able to solve it. Gotta reread through some emails, I think I remember getting advice on this before.


August 3, Week 10

Ahhh... last day of work. I guess I'm a bit sorry to see it end. The work has definitely been entertaining. Anyhow, today's mostly been cleanup. I cleaned out my directories and tar-ed my files, getting ready to move them off the UBC network and over to my Rose account, just in case I ever feel like looking over that stuff. I've already sent along the final document to the undergrad taking over the research and to Anne. I've worked out how much money I owe to the Beta "nutrition" co-op... those sodas really add up, but I don't think it's been over 15 bucks Canadian for all 10 weeks, so it's not that bad. Before I leave I have to drop off my passkey with Anne, but beyond that, I don't have much else to do. Stick a fork in me, I'm done.