DREU 2009 Ioana Bercea


Biological Intuition

INTRODUCTION
Retrotransposons are genetic elements that can replicate and insert themselves into random locations on the chromosome. They belong to the larger class of transposons. The "retro-" in their name emphasizes that, unlike DNA transposable elements, which move as DNA sequences, retrotransposons use RNA as an intermediary. They are first transcribed into RNA and then, with the help of reverse transcriptase, copied back into DNA that may insert itself into the genome.
The fate of a retrotransposed copy (a retrocopy) depends mainly on where it is inserted. We can distinguish two cases.

      - The retrocopy is inserted into a gene (chimeric case)
         The retrocopy can be inserted into an intron and remain transcriptionally silent, accumulating mutations over time. It can also disrupt the exon-intron structure by becoming an exon. If it is inserted into an exon, it causes mutations and becomes part of the transcript (such a gene is called a chimeric gene).

      - The retrocopy is inserted somewhere else in the genome
         In this case, it can decay and never be transcribed; we then call it a retropseudogene (as opposed to a retrogene). Sometimes, though, such retrocopies recruit promoters and become functional. These retrogenes are generally believed to be intron-less, though there is recent evidence that they can develop introns.
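
The two cases above can be summarized as a small decision function. This is only an illustrative sketch in Python (the project itself lists Perl and BioPerl): the site labels "intron", "exon" and "intergenic", the `transcribed` flag, and the names `Fate` and `classify_fate` are hypothetical and simplify the biology described above.

```python
from enum import Enum


class Fate(Enum):
    SILENT_INTRONIC = "silent copy inside an intron"
    NEW_EXON = "intronic copy that became an exon"
    CHIMERIC_GENE = "chimeric gene"
    RETROPSEUDOGENE = "retropseudogene"
    RETROGENE = "retrogene"


def classify_fate(site: str, transcribed: bool) -> Fate:
    """Map an insertion site and transcription status to a fate.

    `site` is one of "intron", "exon", "intergenic" (hypothetical labels).
    """
    if site == "exon":
        # Inserted into an exon: the copy takes part in the host transcript.
        return Fate.CHIMERIC_GENE
    if site == "intron":
        # Inside an intron: either silent or recruited as a new exon.
        return Fate.NEW_EXON if transcribed else Fate.SILENT_INTRONIC
    if site == "intergenic":
        # Outside any gene: functional only if it gets transcribed.
        return Fate.RETROGENE if transcribed else Fate.RETROPSEUDOGENE
    raise ValueError(f"unknown insertion site: {site!r}")
```

For example, `classify_fate("intergenic", False)` returns `Fate.RETROPSEUDOGENE`, matching the decaying, never-transcribed copy described above.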

IMPORTANCE
Retrotransposition is a process through which genome size increases and new genes are formed. Some of the genes affected by such insertions can even acquire new functions. The process is therefore an important element in the evolution of genomes. For example, retrogenes are believed to be involved in the evolution of sexual traits. It has recently been discovered that the UTP14c retrogene is associated with spermatogenesis and fertility in humans. Other studies revealed a retrogene linked to a human recessive disorder, gelatinous drop-like corneal dystrophy, a form of blindness.

QUESTIONS
   - Can we trace the origin of the identified retrogenes? Even if we assume that the cDNA of retrogenes will be very similar to the parental cDNA, how can we effectively distinguish between the possible parents?
   - How does time affect the survival rate of retrogenes? Can we trace the life of retrogenes and detect certain critical moments? (For example, how many retrocopies survive immediately after they are inserted? How many retrocopies from the same parent have the same age?)
   - How "random" is the place where these retrogenes are inserted? Are there "preferred" places on the genome, or can we find a correlation between the parent's location and the retrocopy's location?
   - Are there any probabilities we can assign to stages of the retrocopy survival process?

Computational Component

PIPELINE
We will run a homology-based search of the input genome, using cDNA transcripts as queries. After locating the candidate hits on the genome, we use GeneWise to determine the exon-intron structure of the genes at those locations. Based on the predicted structure, we classify each hit for a cDNA sequence by the type of retrocopy it represents, and then identify the parent location from which it was originally retrocopied.
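
The classification step could look roughly like the following. This is a minimal sketch, not the project's actual code, and it is written in Python for illustration even though the project lists Perl. It assumes a common heuristic that the text does not spell out: a hit that covers most of the cDNA in a single exon, while the source gene has several exons, is a retrocopy candidate, and a hit with the same exon count is a parent candidate. The `Hit` fields, the labels, and the 0.7 coverage threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    transcript_id: str   # query cDNA
    locus: str           # genomic location of the hit
    hit_exons: int       # exons predicted at the hit (e.g. from GeneWise)
    parent_exons: int    # exons in the gene the cDNA came from
    coverage: float      # fraction of the cDNA aligned at this locus


def classify_hit(hit: Hit, min_coverage: float = 0.7) -> str:
    """Label one genomic hit by comparing exon counts (assumed heuristic)."""
    if hit.coverage < min_coverage:
        return "partial"
    if hit.parent_exons <= 1:
        # Intron loss cannot be observed for an intron-less source gene.
        return "uninformative"
    if hit.hit_exons == 1:
        return "retrocopy_candidate"
    if hit.hit_exons == hit.parent_exons:
        return "parent_candidate"
    return "ambiguous"


def pair_retrocopies_with_parents(hits):
    """Group parent and retrocopy candidate loci by transcript."""
    grouped = defaultdict(lambda: {"parents": [], "retrocopies": []})
    for hit in hits:
        label = classify_hit(hit)
        if label == "parent_candidate":
            grouped[hit.transcript_id]["parents"].append(hit.locus)
        elif label == "retrocopy_candidate":
            grouped[hit.transcript_id]["retrocopies"].append(hit.locus)
    return dict(grouped)
```

Because retrogenes may acquire introns over time, a single-exon rule like this one would miss some real retrocopies; the "ambiguous" label is where such cases would end up for closer inspection.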

TOOLS
   - Perl & BioPerl (main programming language)
   - MySQL (for storing and manipulating the data)
   - GenBank (publicly available genome sequences)
   - BLAST (for matching the cDNA with locations on the genome)
   - GeneWise (for predicting gene structure)
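
As an illustration of how BLAST results might feed the rest of the pipeline, the sketch below reads BLAST's standard 12-column tabular output and keeps only strong hits. It is written in Python for illustration, although the project lists Perl. The function name `parse_blast_tabular`, the thresholds (90% identity, e-value 1e-10), and the sample rows are assumptions made for this example, not values from the project.

```python
import csv
import io

# Column order of BLAST's default tabular output.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_blast_tabular(handle, min_identity=90.0, max_evalue=1e-10):
    """Return the hits that pass the (hypothetical) identity and e-value cutoffs."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(BLAST_COLUMNS, row))
        for name in INT_FIELDS:
            rec[name] = int(rec[name])
        for name in FLOAT_FIELDS:
            rec[name] = float(rec[name])
        # For nucleotide subjects, sstart > send marks a minus-strand hit.
        rec["strand"] = "+" if rec["sstart"] <= rec["send"] else "-"
        if rec["pident"] >= min_identity and rec["evalue"] <= max_evalue:
            hits.append(rec)
    return hits


# Made-up sample rows: one strong hit and one weak minus-strand hit.
SAMPLE = (
    "tx1\tchr1\t98.50\t1200\t18\t0\t1\t1200\t50000\t51199\t0.0\t2100\n"
    "tx1\tchr2\t75.00\t300\t75\t0\t1\t300\t9000\t8701\t1e-20\t150\n"
)
strong_hits = parse_blast_tabular(io.StringIO(SAMPLE))
```

The filtered records could then be stored in MySQL and passed on to GeneWise for structure prediction at each hit location.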

Mathematical Model

Coming soon

PROBABILISTIC MODEL

BIOLOGICAL PARAMETERS