Saturday, June 15, 2013

Thoughts on New Gene Origination

The other day, I wrote a damning critique of Darwin's theory and offered nothing in the way of a positive alternative to the traditional view of accumulated-point-mutations as a driving force for evolution. It's easy to take potshots at someone else's theory and walk away. As a rule, I don't like naysayers who criticize something, then offer nothing in return. So I'd like to take a moment to try to offer a different perspective on evolution. In particular, I'd like to offer my own theory as to how new genes arise.

The question of where new genes come from is, of course, one of the foremost open problems in biology. Current theory revolves mostly around gene duplication followed by modification of the duplicated gene (via mutations and deletions) under survival pressure [4]. Gene fusion and fission have also been proposed as mechanisms for gene origination [3]. In addition, genes derived from noncoding DNA have recently been described in Drosophila [1]. Likewise, transposons (genes that jump from one location to another) have been implicated in gene biogenesis [2].

The problem with these theories is that various enzymes are required in order for duplication, transposition, fusion, fission, etc., to occur (to say nothing of transcription, translation initiation, translation elongation, and so on), and existing theories don't explain how these participating enzymes themselves appeared in the first place. A fully general theory has to start from the assumption that in pre-cellular, pre-chromosomal, pre-organismic times, genes (if they existed) may have occurred singly, with multiple copies arising through non-enzymatic replication. Likewise, we should assume that early protein-making machinery was probably free of protein enzymes, which is to say entirely RNA-based (i.e., ribozymal). If the idea of catalytic RNA is new to you or sounds unreasonably farfetched, please review the research by Altman and Cech that won the 1989 Nobel Prize in Chemistry.

The fundamental mechanisms of de novo gene creation available in pre-enzymatic times might well have been nothing more than ribozymal duplication of nucleic acid sequences followed by erroneous translation. "Erroneous translation" can be of two fundamental types: frameshifted translation, and reverse translation. (Reverse translation here means transcription of the antisense strand of DNA and subsequent translation to a polypeptide.)

DNA is parsed 3 bases at a time (the 3-base combinations are called codons; each codon corresponds to an amino acid). If a single base is spuriously added to, or deleted from, a gene, the reading frame is disrupted and a hugely different amino-acid sequence results. This is called a frameshift error or frameshift mutation.

Spurious addition of a single base to, or deletion of a single base from, a free-floating piece of single-stranded genetic material (RNA or DNA) is all that's needed to cause frameshifted translation. The protein that results from a frameshift error is, in general, vastly different from the original protein.
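To make the frameshift idea concrete, here is a minimal Python sketch. It assumes the standard genetic code and uses a made-up 18-base sequence; the names `translate`, `original`, and `shifted` are mine, chosen purely for illustration. Deleting one base changes every downstream codon.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(seq):
    """Translate DNA three bases at a time (one-letter amino acid codes).

    A trailing partial codon is ignored; stop codons appear as '*'.
    """
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

# Hypothetical sequence, not taken from any real gene.
original = "ATGGCCAAGCTGTTCGAA"
# Simulate a frameshift by deleting the single base at index 3.
shifted = original[:3] + original[4:]

print(translate(original))  # MAKLFE
print(translate(shifted))   # MPSCS
```

Only the first amino acid survives the deletion; everything after it is read in a different frame.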

If pre-organismic nucleic acids were single-stranded, then reverse translation would require 3'-to-5' reading of the nucleic acid as well as 5'-to-3' reading. If, on the other hand, early nucleic acids were double-stranded, then 5'-to-3' (normal direction) translation of each strand would suffice to give one normal and one reverse translation product. (Note for non-biologists: In all known current organisms, reading of DNA and RNA takes place in the 5'-to-3' direction only.)

Nucleic acids (RNA and DNA) have directionality, defined by the orientation of sugar backbone molecules in terms of their 5' and 3' carbons.
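The two senses of "reverse translation" discussed above can be sketched in a few lines of Python. This is only an illustration under the standard genetic code, using a made-up 9-base strand; `reverse_complement` and the variable names are hypothetical helpers of my own. The double-stranded case reads the complementary strand 5'-to-3', while the single-stranded case reads the same strand 3'-to-5'.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(seq):
    """Translate DNA three bases at a time, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def reverse_complement(seq):
    """Return the complementary strand, written 5'-to-3'."""
    return seq.translate(COMPLEMENT)[::-1]

strand = "ATGGCCAAG"  # hypothetical sequence, written 5'-to-3'

forward = translate(strand)                        # normal 5'-to-3' reading
antisense = translate(reverse_complement(strand))  # other strand, read 5'-to-3'
backward = translate(strand[::-1])                 # same strand, read 3'-to-5'

print(forward, antisense, backward)  # MAK LGH EPV
```

The three readings give three unrelated peptides here, which is the general case for a sequence with no special codon structure.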

It's interesting to speculate on the role of reverse translation in production of novel proteins, especially as it applies to early biological systems. We don't know if early systems relied on triplet codons (or even if all four bases—guanine, cytosine, adenine, thymine—existed from the beginning). We also don't know if there were 20 amino acids in the beginning. There may have been fewer (or more).

A novel possibility is that early triplet codons were palindromic (giving identical semantics when read in either direction). There are 16 palindromic codons in the codon lexicon (AGA, GAG, CAC, ACA, ATA, TAT, AAA, and so on) which today encode 15 amino acids out of the 20 commonly used. In a palindromic-codon world, the distinction between "sense" and "antisense" nucleic acid sequences vanishes, because a single-stranded gene made up of palindromic codons could be translated in either direction to give a polypeptide with the same sequence, the only chirality arising from N- to C-terminal polarity. For example, the sequence GGG-CAC-GCG-AAA would give a polypeptide of glycine-histidine-alanine-lysine whether translated forward or backward, the only difference being that the forward version would have glycine at the N-terminus whereas the reverse version would have glycine at the C-terminus. The secondary and tertiary structures of the two versions would be the same. As long as catalytic function didn't directly depend on an amino or carboxy terminus of an end-acid, the two proteins would also be functionally indistinguishable.
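The counts quoted above are easy to check. The following Python sketch assumes the standard genetic code, enumerates every codon whose first and third bases match, and translates the example sequence in both directions. The function and variable names are my own and carry no special meaning.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(seq):
    """Translate DNA three bases at a time, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

# A codon is palindromic when its first and third bases are identical.
palindromic = sorted(c for c in CODON_TABLE if c[0] == c[2])
encoded = {CODON_TABLE[c] for c in palindromic}

print(len(palindromic))  # 16
print(len(encoded))      # 15 (arginine is encoded twice: AGA and CGC)

gene = "GGGCACGCGAAA"             # the example from the text
forward = translate(gene)         # read left to right
backward = translate(gene[::-1])  # read right to left

print(forward, backward)  # GHAK KAHG
```

In one-letter codes, G, H, A, and K are glycine, histidine, alanine, and lysine. The backward reading is the forward peptide with its N- and C-terminal ends swapped, as described.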

Codon palindromicity is potentially important in any system in which single-stranded genes are translated bidirectionally. Where a gene happens to rely heavily on palindromic codons, the reverse-translated product has the potential (for the reasons just explained) to be functionally paralogous to the forward-translated product, to an extent matching the extent of palindromic-codon usage. But this assumes that in early organisms (or pre-organismic soups), single-stranded genes could be translated in either the 5'-to-3' direction or the 3'-to-5' direction.

It turns out modern organisms differ markedly in the degree to which they use palindromic codons, and there are (remarkably) some prokaryotes whose genes use an average of ~40% palindromic codons. The complementary strand of DNA would, of course, contain palindromic complements: AGA opposite TCT, CCC opposite GGG, etc.
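For anyone who wants to measure palindromic-codon usage in a sequence of their own, here is a rough Python sketch. It assumes the sequence is already in frame and uses a made-up 18-base example, not real genome data; `is_palindromic` and `palindromic_fraction` are hypothetical helper names. It also checks that the complement of a palindromic codon is always palindromic.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def is_palindromic(codon):
    """A codon is palindromic when its first and third bases are identical."""
    return codon[0] == codon[2]

def palindromic_fraction(seq):
    """Fraction of in-frame codons that are palindromic (0.0 for an empty input)."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if not codons:
        return 0.0
    return sum(is_palindromic(c) for c in codons) / len(codons)

# Hypothetical in-frame sequence: ATG AAA GAG CTG TTT GGG
gene = "ATGAAAGAGCTGTTTGGG"
fraction = palindromic_fraction(gene)
print(round(fraction, 3))  # 0.667 (4 of 6 codons)

# Every palindromic codon pairs with a palindromic codon on the other strand.
all_palindromic = ["".join(c) for c in product("ACGT", repeat=3) if c[0] == c[2]]
complements_ok = all(is_palindromic(c.translate(COMPLEMENT)) for c in all_palindromic)
print(complements_ok)  # True
```

Because complementing a codon maps identical bases to identical bases, the second check holds for all 16 palindromic codons.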

All of this makes for interesting conjecture, but does any of it really apply to the natural world? For example: Do organisms actually employ strategies of "erroneous translation" in creating new proteins? Did today's microbial meta-proteome arise through mechanisms involving frameshifted and/or reverse translation? Is there any evidence of such processes, one way or the other? Tomorrow I want to continue on this theme, presenting a little data to back up some of these strange ideas. Please join me; and bring a biologist-friend with you!


References
1. Begun, D., et al. Evidence for de novo evolution of testis-expressed genes in the Drosophila yakuba/Drosophila erecta clade. Genetics 176, 1131–1137 (2007).
2. Feschotte, C., & Pritham, E. DNA transposons and the evolution of eukaryotic genomes. Annual Review of Genetics 41, 331–368 (2007).
3. Jones, C. D., & Begun, D. J. Parallel evolution of chimeric fusion genes. Proceedings of the National Academy of Sciences 102, 11373–11378 (2005).
4. Ohno, S. Evolution by Gene Duplication (Springer-Verlag, Berlin, 1970).