Wednesday, December 31, 2014

Transposable Elements and Common Descent of Humans and other Primates

Note on this blog: I set this blog up twice several years ago. The second attempt was started with this post, but it fell into some kind of digital hole at Google, and I couldn't find any way to get back to it to edit the first post or to add any new posts. I've now moved this post to the blog at the original address. I would like to change that address, but Google has never responded to me about fixing this problem, so there seems to be no way to do it. You'll have to save a link, or remember to type in an address that doesn't match the name of the blog.
I am writing this because it seems to me that the type of evidence for evolution that I am going to describe here is both decisive and readily comprehensible by laymen. I am going to concentrate on humans, chimpanzees and other primates because this seems to be what concerns people the most, and because several primate genomes have been fully sequenced (including the human genome, which has of course been studied more intensely than any other). This means that there is a staggering amount of evidence. I will just note that the same kind of analysis could be done for any other group of closely related organisms where substantial amounts of genome sequence are available, because transposable elements are present in nearly all animals and plants and in a large portion of microbes.

The argument, simply put, is this. Primate genomes contain large numbers (millions) of sequences (ranging from 50 or so bp to 6000 bp) which got where they are by being copied from another location in the genome and inserted where they are now. (There are longer sequences that have been duplicated to new locations in the human genome, but they are not my focus here.) The processes by which these sequences get inserted are not target-specific. When a sequence segment "jumps" (actually, in most cases the sequence is copied and inserted), its final location in the genome is largely random. And here is the essential point. When you compare the genomes of different species (say, human and chimp), huge numbers of these transposed sequences are found in exactly the corresponding position in the two genomes. Depending on when the transposon at a particular site was inserted, it may be there in multiple species. Very old transposons can be present at a particular site in all mammals. Generally, the more closely related two species are, the more transposon insertion sites they will share.

Some transposable elements are still "jumping" in the human and other genomes. About 80 instances of genetic diseases have been found where a transposable element inserted into a gene in an individual and was not present in either parent, and over 7000 locations have been identified in the human genomes sequenced so far where a transposable element has inserted in some chromosomes but is absent in other copies of the chromosome and in all chimp chromosomes. These latter new insertions are interesting but not relevant to my argument, except that they may serve to convince people that transposons really do get where they are by being copied and inserted. They aren't just repetitive sequences. One creationist web site did a whole series on Alu transposons and never mentioned that they get where they are by insertion.

Now logically, the presence of a transposable sequence at a particular location in different species could be the result of parallel transpositions of the same element in the two separate species. However, none of the transposon types that occur in primates are targeted to specific locations. There are transposons in bacteria that target specific sites that can be unique in a genome, but none of the transposons in primates work this way. When a transposon in a mammal jumps, it has about 3 billion base pairs to "choose from" when it lands. To pick out a single site in a genome that size would require a recognition sequence of at least 16 bp, even if the enzymes involved had absolute target specificity for one particular sequence. (There are 4^16 = 2^32 possible sequences 16 bp long, which is over 4 billion sequences.) The transposons in mammals do have a statistical preference for much shorter, AT-rich sequences (about 5-6 bases long), but there are millions of short sequences in a mammalian genome that fit these preferences, and the preference isn't absolute. A transposon will sometimes land in a suboptimal sequence. Any break in the DNA caused by various kinds of damage can be a target for insertion of a transposon.
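
For anyone who wants to check the 16 bp figure, here is a minimal sketch of the arithmetic. It assumes a round genome size of 3 billion bp and treats all four bases as equally likely, which is a simplification; the function name is just something I made up for the illustration.

```python
def min_unique_site_length(genome_size_bp: int) -> int:
    """Smallest k for which the number of possible k-bp sequences (4**k)
    is at least as large as the genome."""
    k = 0
    while 4 ** k < genome_size_bp:
        k += 1
    return k


GENOME_SIZE_BP = 3_000_000_000  # assumed round figure for a mammalian genome

k = min_unique_site_length(GENOME_SIZE_BP)
print(k, 4 ** k)  # 16 4294967296

# For comparison: how often one particular 6-base sequence is expected to
# turn up by chance in a genome this size (again assuming equal base usage).
expected_6mer_sites = GENOME_SIZE_BP / 4 ** 6
print(round(expected_6mer_sites))
```

The second number shows why a preference for a 5-6 base sequence doesn't amount to targeting: any one such sequence is expected to occur hundreds of thousands of times.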

The upshot of this is that even a single case of transposons inserting independently at exactly the corresponding site in two different species is a rare event. This has been reported, but it was possible to distinguish the two events even in this case, because the two insertions were by different classes of transposon: the sequence that inserted at the site was completely different in the two species. (In the human genome over 900 classes and subclasses of transposable element have been distinguished.) It is also possible to distinguish insertions of the same element at the same site, because the inserted elements are often truncated from their full length, and the different lengths distinguish separate events.

The result of this is that the hundreds of thousands of cases of the same transposon being inserted at exactly the corresponding position in the genomes of different species can only be accounted for by the different species having common ancestors in which the transposon insertions took place.

An additional level of specificity is added by the fact that transposons often insert within previously inserted transposons. This is not surprising when you realize that the human genome (like other primate genomes) is at least 50% composed of transposable element sequences. When transposons have inserted into previously inserted transposons, it is possible to analyze the sequences and determine the order in which the different transposons were inserted. There are over 600,000 clusters of multiple transposons like this in the human genome. When you compare the human and chimp genomes, you find that the transposons were not only inserted at the same sites in the two genomes, they were inserted in the same order.
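
The logic of reading insertion order out of a nested cluster can be sketched in a few lines. This is only a toy version of the idea, assuming the simplest case where each element sits entirely inside the one it interrupted; real analyses have to cope with fragmented and partially overlapping elements. The coordinates are invented for the illustration, and the function name is mine.

```python
from typing import List, Tuple

# (name, start, end): the genomic span of an element, measured from the start
# of its first fragment to the end of its last fragment. Coordinates invented.
Element = Tuple[str, int, int]


def insertion_order(elements: List[Element]) -> List[str]:
    """Return element names from oldest to youngest for one nested cluster.

    An element that has been split by another must already have been there
    when the second one arrived, so the outermost span is the oldest.
    Assumes strict nesting and raises ValueError otherwise.
    """
    ordered = sorted(elements, key=lambda e: e[2] - e[1], reverse=True)
    for outer, inner in zip(ordered, ordered[1:]):
        if not (outer[1] <= inner[1] and inner[2] <= outer[2]):
            raise ValueError(f"{inner[0]} is not nested inside {outer[0]}")
    return [name for name, _, _ in ordered]


# Hypothetical cluster: an Alu inside an L1 inside an old L2 element.
human_cluster = [("L2", 1000, 9000), ("AluSx", 4000, 4300), ("L1PA7", 2500, 7000)]
chimp_cluster = [("L2", 1010, 9020), ("L1PA7", 2510, 7015), ("AluSx", 4012, 4312)]

same_order = insertion_order(human_cluster) == insertion_order(chimp_cluster)
print(insertion_order(human_cluster), same_order)
```

In this made-up example both species give the order L2, then L1PA7, then AluSx, which is the kind of agreement described above.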

When you put all this together it is apparent that the odds against millions of transposon insertions (the human genome contains about 3 million total) occurring in parallel at the corresponding sites in different species and in the same order are astronomical. (Actually trans-astronomical. The calculation would produce a number larger than any number that is useful in astronomy.)
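
To give a feel for the size of the number, here is a rough back-of-the-envelope sketch. It assumes each insertion is independent and lands with equal probability on any one of a fixed set of candidate positions, which is a deliberate oversimplification. The count of 100,000 shared sites is the low end of the "hundreds of thousands" mentioned above, and the figure of 1 million candidate positions is an assumption I chose to be generous to the parallel-insertion idea.

```python
import math


def log10_prob_all_match(n_sites: int, candidate_positions: float) -> float:
    """log10 of the probability that n_sites independent insertions each hit
    one particular position out of candidate_positions equally likely ones."""
    return -n_sites * math.log10(candidate_positions)


SHARED_SITES = 100_000  # low end of "hundreds of thousands"

# Case 1: an insertion can land anywhere in a 3 billion bp genome.
uniform = log10_prob_all_match(SHARED_SITES, 3e9)

# Case 2 (generous assumption): insertions are confined to only
# 1 million strongly preferred positions.
generous = log10_prob_all_match(SHARED_SITES, 1e6)

print(f"anywhere in genome: about 10^{uniform:.0f}")
print(f"1 million hot spots: about 10^{generous:.0f}")
```

Even the generous case comes out at around 1 chance in 10 to the 600,000th power, and that is before taking the matching order of insertion into account.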

The result of this is that there are only two possibilities to account for the transposons in animal genomes: common ancestors or miracles, millions and millions of miracles. But the trouble with miracles is that once you have invoked them, you have quit doing science, because miracles can account for anything. You can postulate that the whole world, including all our memories and all the physical evidence, was created 5 minutes ago. No one can prove that it didn't happen. It just isn't very interesting. Once you have started doing that, evidence is irrelevant. You might as well just go to the beach or watch TV or whatever you prefer. There's no reason to do all the work and spend all the money that it takes to do science. So, if you want to stick to science, the only way to account for all those transposons inserted at the same position in different species is that those species had common ancestors. At least that's the only scientific possibility I can think of.

That is the argument in brief. There is a huge mass of details that one can get into about the different kinds of transposons in mammalian genomes, their mechanisms of transposition, the different ways of estimating the age of individual insertion sites, the occurrence of sequences in the transposable elements that have some function for the cell, the many ways that transposons alter chromosome sequences by carrying neighboring sequences with them when they transpose, the mechanisms that cells employ to suppress the transcription of TEs most of the time, the way that new TE insertions get into the germ line so that they are inherited in the next generation, the occurrence of transposition in somatic cells and the role it sometimes plays in inducing cancer, etc. But what I have presented here is the basic argument that TEs provide for the fact that current species groups have common ancestors, i.e., that speciation and evolution have occurred.


2001 Endogenous retroviruses in the human genome sequence. Genome Biol. 2:reviews1017.

2001 Initial sequencing and analysis of the human genome. Nature 409, 860. See Section "Repeat content of the human genome."

2004 Whole-genome analysis of Alu repeat elements reveals complex evolutionary history. Genome Res. 14, 2245.

2006 Retroposed elements as archives for the evolutionary history of placental mammals. PLOS Biol. 4(4): e91.

2007 Evolutionary History of Mammalian Transposons Determined by Genome-Wide Defragmentation. PLOS Comput. Biol. 3(7): e137.

2008 Mammalian non-LTR retrotransposons: For better or worse, in sickness and in health. Genome Res. 18, 343.

2009 The impact of retrotransposons on human genome evolution. Nature Rev. Genetics 10, 691.

2013 Mobile element scanning (ME-Scan) identifies thousands of novel Alu insertions in diverse human populations. Genome Res. 23, 1170.

To give some idea of what interspecies comparison of TE insertion sites looks like, I am including a figure that aligns a 50,000 bp region of the human and chimp genomes, with the TEs marked.

Figure 1. The upper panel shows repetitive sequences identified by the RepeatMasker software in a segment of human chromosome 3. The bottom panel is the corresponding segment in chimp. Generally the elements present in human are present in chimp, although they may not line up perfectly due to small insertions or deletions in the intervening sequences. Darker shades of grey mean that the element is very similar to the standard sequence that the software uses for that type of element, and thus that the element is younger and has had less time for its sequence to diverge. In at least one case of an old, highly diverged L2 element, the software detected it in human but missed it in the chimp sequence. SINEs are short interspersed elements, the most common of which in humans are Alu elements. LINEs are long interspersed elements, the most common of which in humans are LINE-1s. LTR elements are endogenous retroviruses and related elements which lack the envelope gene and thus can't form virus particles. LTR stands for long terminal repeat, the diagnostic characteristic of this kind of element. DNA transposons are old elements that moved by a cut-and-paste mechanism, but none of them have been active in the line that led to humans for a very long time. The bottom 2 or 3 tracks of each part of the figure show TEs that have been interrupted by the insertion of another TE.

1 comment:

  1. if those sequence are functional- then we can say that its not the result of insertion but a design.