Repeated Sequence Homogenization Between the Control and Pseudo-Control Regions in the Mitochondrial Genomes of the Subfamily Aquilinae
LUIS CADAHÍA1*, WILHELM PINSKER1, JUAN JOSÉ NEGRO2, MIHAELA PAVLICEV3, VICENTE URIOS4, AND ELISABETH HARING1
1Molecular Systematics, 1st Zoological Department, Museum of Natural History Vienna, Vienna, Austria
2Department of Evolutionary Ecology, Estación Biológica de Doñana, Sevilla, Spain
3Centre of Ecological and Evolutionary Synthesis (CEES), Department of Biology, Faculty for Natural Sciences and Math, University of Oslo, Oslo, Norway
4Estación Biológica Terra Natura (Fundación Terra Natura–CIBIO), Universidad de Alicante, Alicante, Spain
ABSTRACT In birds, the noncoding control region (CR) and its flanking genes are the only parts of the mitochondrial (mt) genome that have been modified by intragenomic rearrangements. In raptors, two noncoding regions are present: the CR has shifted to a new position with respect to the “ancestral avian gene order,” whereas the pseudo-control region (CCR) is located at the original genomic position of the CR. Duplication and transposition have been considered as possible mechanisms for this rearrangement. While characterizing the mt gene order in Bonelli’s eagle Hieraaetus fasciatus, we detected intragenomic sequence similarity between the two regions, supporting the duplication hypothesis. We performed intra- and intergenomic sequence comparisons in H. fasciatus and other falconiform species to trace the evolution of the noncoding mtDNA regions in Falconiformes. We identified sections displaying different levels of similarity between the CR and CCR. On the basis of phylogenetic analyses, we outline an evolutionary scenario of the underlying mutation events involving duplication and homogenization processes followed by sporadic deletions. Apparently, homogenization may easily occur if sufficient sequence similarity between the CR and CCR exists. Moreover, homogenization itself perpetuates this equalization, unless the process is stopped by a deletion. The Pandionidae and the Aquilinae seem to be the only two lineages of Falconiformes in which homology between both regions is still detectable, whereas in other raptors no similarity has been found so far. In these two lineages, sequence degeneration may have been slowed down by homogenization events that retained high sequence similarity in at least some sections.
In contrast to the nuclear genome, where duplications and rearrangements are an important driving force of genome evolution, the mitochondrial (mt) genome of vertebrates has a largely conserved structure. Nevertheless, modifications and rearrangements have been detected in several groups of vertebrates. For example, exchanges in the positions of tRNA genes have been found in marsupials (Pääbo et al., ’91), reptiles (Kumazawa and Nishida, ’95) and fish (Miya and Nishida, ’99). Gene duplications were found in reptiles (Kumazawa et al., ’96, ’98; Macey et al., ’97) as well as in fish (Inoue et al., 2001; Lee et al., 2001). In birds, however, the noncoding control region (CR) and its flanking genes are the only parts of the mt genome that have been involved in intragenomic rearrangements. Desjardins and Morais (’90) described a major deviation from the mammalian gene order in the mt genome of galliform birds. Further studies revealed that this gene arrangement is also present in many other bird species, and it was therefore considered the “standard avian gene order.” Later, however, it turned out that not all birds share this arrangement. Considering the galliform standard gene order as the ancestral state in the avian lineage, from which the other rearrangements derived, we follow Gibb et al. (2007), who introduced the term “ancestral avian gene order.”
Grant sponsors: Terra Natura Foundation; European Synthesys program; Grant number: AT-TAF-2434; Grant sponsors: Government of the Comunidad Valenciana; Spanish Ministry of Education and Science; Grant number: AP2001-1444
Additional Supporting Information may be found in the online version of this article.
*Correspondence to: Luis Cadahía, Molecular Systematics, 1st Zoological Department, Museum of Natural History Vienna, Burgring 7, A-1010 Vienna, Austria. E-mail: email@example.com
Received 28 July 2008; Revised 15 December 2008; Accepted 22
The first deviations from this ancestral gene order were detected in the mt genomes of two raptor species, Falco peregrinus (Mindell et al., ’98) and Buteo buteo (Haring et al., ’99). Furthermore, comparisons of mt genomes across several bird lineages revealed similar rearrangements in species belonging to six different orders: Cuculiformes, Falconiformes, Passeriformes, Piciformes, Psittaciformes and Procellariiformes (Mindell et al., ’98; Haring et al., ’99, 2001; Bensch and Härlid, 2000; Eberhard et al., 2001; Muñoz et al., 2001; Väli, 2002; Abbott et al., 2005; Gibb et al., 2007). As the monophyly of each of these orders is largely undisputed, the sporadic occurrence of these rearrangements, sometimes only in subbranches of the different lineages, implies that they originated independently several times during avian evolution. The common characteristic of this mt rearrangement is an additional noncoding region besides the CR: the functional CR has moved to a position between the tRNAThr and tRNAPro genes, and an additional noncoding section is located between the tRNAGlu and tRNAPhe genes, at the original site of the CR in the ancestral gene order.
To explain the origin of this mt rearrangement, several authors have favored the “duplication hypothesis” (Moritz et al., ’87; Quinn, ’97; Mindell et al., ’98; Bensch and Härlid, 2000). It assumes that the rearrangement was initiated by a tandem duplication of the original CR together with flanking sections (e.g., tRNA-Thr, tRNA-Pro, nd6, tRNA-Glu), followed by deletions or partial degeneration in both duplicated sections. Support for the duplication hypothesis came from the apparent similarity between the two noncoding sequences observed in species of the order Passeriformes (Smithornis: Mindell et al., ’98; Phylloscopus: Bensch and Härlid, 2000). Later, two almost identical copies of the CR were found in the genus Amazona (order Psittaciformes), where both paralogues contain the same conserved sequence motifs (Eberhard et al., 2001). This situation was interpreted as an early stage after CR duplication, before the degeneration of one copy. In most other cases, the second noncoding sequence lacked the conserved motifs characteristic of a functional CR, probably representing an eroding remnant of the original CR that is free of functional constraints. The various designations used for this second noncoding region generally reflect the authors’ interpretation of its lack of function and/or its assumed origin via duplication of the authentic CR (e.g., pseudo-control region, CCR: Haring et al., ’99; noncoding region, nc: Bensch and Härlid, 2000; CR(2): Gibb et al., 2007). For consistency with earlier studies (Haring et al., ’99, 2001; Väli, 2002), we retain the term CCR for the copy located downstream of the functional CR. This seems justified because, in most falconiform species investigated so far, this second copy lacked CR-specific sequence motifs.
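The sequence of events assumed by the duplication hypothesis can be traced on a simplified list of genes. The following Python sketch is illustrative only: the gene labels, the `a`/`b` suffixes marking the two copies, and the particular set of deleted copies are our assumptions, chosen to show one possible path from the ancestral avian gene order to the arrangement found in raptors, not a reconstruction of the actual mutation events.

```python
# Simplified gene labels around the avian control region (CR).
ANCESTRAL = ["cytb", "tRNA-Thr", "tRNA-Pro", "nd6", "tRNA-Glu", "CR", "tRNA-Phe", "12S"]


def tandem_duplicate(order, start, end):
    """Duplicate order[start:end] in tandem; tag the copies with 'a' and 'b'."""
    block = order[start:end]
    return order[:start] + [g + "a" for g in block] + [g + "b" for g in block] + order[end:]


def delete_copies(order, lost):
    """Remove the listed copies (hypothetical deletions of redundant genes)."""
    lost = set(lost)
    return [g for g in order if g not in lost]


# Step 1: tandem duplication of the section tRNA-Thr ... CR (indices 1 to 5).
duplicated = tandem_duplicate(ANCESTRAL, 1, 6)
# Step 2: one possible set of deletions (an assumption for illustration).
raptor = delete_copies(duplicated, ["tRNA-Proa", "nd6a", "tRNA-Glua", "tRNA-Thrb"])
print(raptor)
# ['cytb', 'tRNA-Thra', 'CRa', 'tRNA-Prob', 'nd6b', 'tRNA-Glub', 'CRb', 'tRNA-Phe', '12S']
```

In the resulting order, `CRa` lies between tRNA-Thr and tRNA-Pro (the position of the functional CR in raptors), while `CRb` remains between tRNA-Glu and tRNA-Phe, the original CR position now occupied by the CCR.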
In Falconiformes, until recently, no sequence similarity between CR and CCR had been found. All raptor species analyzed exhibited a CCR with a characteristic structure consisting of a 5′ nonrepetitive region without similarity to the CR, followed by a large cluster of conserved tandem repeats (Mindell et al., ’98; Haring et al., ’99, 2001; Väli, 2002). Nevertheless, neither this structural characteristic nor the lack of sequence similarity necessarily contradicts the duplication hypothesis. It was assumed that the rearrangement was initiated by a single mutation event early in the Falconiformes lineage. Subsequently, the CCR degenerated completely while repetitive sequences accumulated, similar to the situation found, e.g., in the 3′ region of many CRs. A surprising finding was recently reported by Gibb et al. (2007): two almost identical copies of the CR are present in the mt genome of the osprey Pandion haliaetus.
Furthermore, Gibb et al. (2007) concluded that both CR copies may be functional and, consequently, designated the second copy as CR(2). In this study, we present a striking intragenomic similarity between the two noncoding sequences in Bonelli’s eagle Hieraaetus fasciatus (Accipitridae, Falconiformes), which provides strong support for the duplication hypothesis. To investigate this sequence similarity in more detail, we characterize the mt gene order in Bonelli’s eagle in the region comprising the CR and CCR. We also describe the internal structure of both noncoding regions and the conserved sequence blocks present in the CR.
Furthermore, we trace the evolution of the CR and CCR sequences in Falconiformes. To this end, we perform intra- and intergenomic sequence comparisons in H. fasciatus and other falconiform species, including published sequences as well as several new sequences determined in this study. In particular, we were interested in finding out to what degree the second CR copy is retained in the various lineages, and whether sequence homogenization (through mechanisms such as gene conversion or recombination) between the paralogues may have played a role in the aquiline lineage, to which H. fasciatus belongs. On the basis of phylogenetic analyses of different sections of the CR and CCR, we outline an evolutionary scenario that could explain the different levels of similarity observed in these sections and the underlying mutation events.
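One simple way to visualize sections with different levels of CR/CCR similarity is to compute p-distances in a sliding window along an alignment of the two paralogues. This is a hypothetical Python sketch, not the procedure used in this study (which compared predefined sections and NJ trees); the function name, window size and step are assumptions, and gap columns are simply skipped.

```python
def window_p_distances(cr, ccr, window, step):
    """Return (start, p-distance) for windows along two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped; windows left
    with no comparable columns are omitted."""
    if len(cr) != len(ccr):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(cr) - window + 1, step):
        pairs = [(x, y) for x, y in zip(cr[start:start + window], ccr[start:start + window])
                 if x != "-" and y != "-"]
        if pairs:
            mismatches = sum(x != y for x, y in pairs)
            results.append((start, mismatches / len(pairs)))
    return results


# Toy alignment (invented): identical first half, diverged second half.
cr = "ACGTACGTAC" + "GGGGGGGGGG"
ccr = "ACGTACGTAC" + "GGTTGGGGGG"
print(window_p_distances(cr, ccr, window=10, step=10))  # [(0, 0.0), (10, 0.2)]
```

On real data, windows with distances near zero would flag candidate homogenized sections and windows with high distances nonhomogenized ones; any cutoff between the two would be arbitrary.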
MATERIALS AND METHODS
Samples and DNA extraction
To determine CR and CCR sequences, five samples of H. fasciatus were analyzed, one of them consisting of cells frozen after plasma separation (sample 1360, Cádiz, Spain) and the other four consisting of blood, three preserved in ethanol (HFA, Murcia, Spain; PD1, Rabat, Morocco; HFC, Alicante, Spain) and one preserved in Seutin buffer (SA2, Morocco) (Seutin et al., ’91). Furthermore, CR and CCR sequences from one sample of Accipiter gentilis (HAB-1A, bred in captivity, Austria) as well as CR sequences from Aquila heliaca (Ahel1, Lower Austria, Austria), Aquila chrysaetos (Achr1, St. Petersburg, Russia) and Aquila pomarina (Apom1, Slovakia) were determined for sequence comparison.
Genomic DNA was extracted using a slightly modified version of the proteinase K–LiCl method described by Gemmell and Akiyama (’96). Samples (100 μL of blood in ethanol or Seutin buffer) were incubated at 56°C overnight in 300 μL extraction buffer (100 mM NaCl, 50 mM Tris–HCl, 1% SDS, 50 mM EDTA, pH 8, 100 μg/mL proteinase K). Nucleic acids were extracted with 5 M LiCl and chloroform:isoamyl alcohol 24:1, and then precipitated with ethanol. For the Aquila samples, DNA was extracted from feather samples with the DNeasy Kit (Qiagen, Hilden, Germany).
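The extraction buffer recipe (100 mM NaCl, 50 mM Tris–HCl, 1% SDS, 50 mM EDTA, 100 μg/mL proteinase K, 300 μL per sample) can be turned into pipetting volumes with C1V1 = C2V2. The Python sketch below assumes stock concentrations (5 M NaCl, 1 M Tris–HCl, 10% SDS, 0.5 M EDTA, 20 mg/mL proteinase K) that are not given in the protocol; they are typical laboratory stocks chosen only for illustration.

```python
FINAL_VOLUME_UL = 300.0  # extraction buffer per sample, as in the protocol

# component: (final concentration, ASSUMED stock concentration), same units per row
RECIPE = {
    "NaCl (mM)": (100, 5000),
    "Tris-HCl (mM)": (50, 1000),
    "SDS (%)": (1, 10),
    "EDTA (mM)": (50, 500),
    "proteinase K (ug/mL)": (100, 20000),
}


def stock_volumes(recipe, final_volume):
    """Volume of each stock (C1*V1 = C2*V2), plus water to reach final_volume."""
    volumes = {name: final_volume * final / stock for name, (final, stock) in recipe.items()}
    volumes["water"] = final_volume - sum(volumes.values())
    return volumes


for name, ul in stock_volumes(RECIPE, FINAL_VOLUME_UL).items():
    print(f"{name}: {ul:g} uL")
```

With these assumed stocks, 300 μL of buffer needs 6 μL NaCl, 15 μL Tris–HCl, 30 μL SDS, 30 μL EDTA and 1.5 μL proteinase K, made up with 217.5 μL water. Scaling `FINAL_VOLUME_UL` to the number of samples gives a master mix.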
PCR amplification, cloning and sequencing
Primer sequences are listed in Table 1. Primer binding sites and the resulting polymerase chain reaction (PCR) fragments are depicted in Figure 1. To determine the CR sequence of H. fasciatus, either the complete CR was amplified with primers that bind in the flanking tRNAThr and tRNAPro genes (ThrF/ProR), or the sequence was amplified as two overlapping fragments (361 bp overlap) using the primer pairs Thr1/CSB– and CR21/Pro–. The CCR was amplified using the primers nd6-11/12S-1–, which bind in the nd6 and 12S rRNA genes, respectively. A PCR fragment covering the section from the 3′ part of the CR through tRNAPro and nd6 to tRNAGlu was amplified with the primer pair Hier-CR41/Hier-Glu2–. For Ac. gentilis, it was possible to amplify and clone the complete section spanning from tRNAThr to 12S rRNA with primers Thr1/12S-1–; this fragment was sequenced by primer walking. The CR sequences of the Aquilinae species were obtained by amplifying two overlapping PCR fragments with the primer pairs Thr1/CSB– and SpiCR31/Pro– (Table 1).
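Where a CR was obtained as two overlapping fragments, the full sequence is recovered by joining them at their shared overlap (361 bp for the H. fasciatus CR fragments). The Python sketch below assumes error-free, exactly matching overlaps and takes the longest one; the function name and the `min_overlap` parameter are hypothetical, and real sequence assembly would also have to tolerate sequencing errors.

```python
def merge_overlapping(left, right, min_overlap):
    """Join two fragments at the longest exact suffix/prefix overlap.

    Returns (merged_sequence, overlap_length). Assumes error-free overlaps."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:], size
    raise ValueError(f"no exact overlap of at least {min_overlap} bases")


# Toy fragments (invented): the 3' end of `left` equals the 5' start of `right`.
merged, overlap = merge_overlapping("AAAACCCCGG", "CCGGTTTT", min_overlap=3)
print(merged, overlap)  # AAAACCCCGGTTTT 4
```

For the real fragments, `min_overlap` would be set somewhat below the expected overlap so that a spurious short match is not accepted.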
PCR amplification was performed in an Eppendorf thermocycler in a volume of 25 μL containing 2.5 μL of PCR buffer, 0.2 mM of each nucleotide, 1 μM of each primer, 1 unit Dynazyme DNA polymerase (Finnzymes, Espoo, Finland) and 100 ng of DNA. The PCR program comprised an initial heating step of 2 min at 94°C followed by 35 cycles of 10 s at 94°C, 15 s at the annealing temperature and 60 s at 72°C. After the last cycle, a final extension of 5 min at 72°C was performed. PCR products were extracted from agarose gels with the QIAquick Gel Extraction Kit (Qiagen) and cloned using the TOPO TA Cloning Kit (Invitrogen, Carlsbad, CA). Clones were sequenced in both directions by primer walking at MWG-Biotech (Ebersberg, Germany).
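As a quick check of the cycling program, the programmed hold times can be summed. This Python sketch ignores ramping between temperatures, so the real run time on any given thermocycler will be longer; the data layout is our own.

```python
# (step, hold time in seconds, number of repeats) for the PCR program above.
PROFILE = [
    ("initial denaturation, 94 °C", 120, 1),
    ("denaturation, 94 °C", 10, 35),
    ("annealing (primer-specific temperature)", 15, 35),
    ("extension, 72 °C", 60, 35),
    ("final extension, 72 °C", 300, 1),
]


def hold_time_seconds(profile):
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return sum(seconds * repeats for _, seconds, repeats in profile)


total = hold_time_seconds(PROFILE)
print(f"{total} s = {total / 60:.1f} min")  # prints: 3395 s = 56.6 min
```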
TABLE 1. Primers utilized to amplify the mt CR and CCR in Hieraaetus fasciatus, Accipiter gentilis and three Aquila species

Primer      Sequence                    Annealing temperature (°C)  Region amplified      Reference
ThrF        TTGGTCTTGTAAACCAAARANTGAAG  62                          CR                    1
ProR        AATNCCAGCTTTGGGAGYTG        62                          CR                    1
Thr1        AACRTTGGTCTTGTAAACC         50                          5′ section CR         2
CSB–        ATGTCCAACAAGCATTCAC         50                          5′ section CR         This study
CR21        AAACCCCTAGCACTACTTGC        54                          3′ section CR         This study
SpiCR31     CGGACCGGTAGCTGTCGGAC        58                          3′ section CR         This study
Pro–        GAGGTTTGAGTCCTCTTTTTC       54                          3′ section CR         2
nd6-11      ACCCGAATCGCCCCACGAG         57                          CCR                   3
12S-1–      ATAGTGGGGTATCTAATCCCAGTTT   57                          CCR                   3
Hier-CR41   CACCCAAAACAACCTCTA          52                          3′ end CR to tRNAGlu  This study
Hier-Glu2–  TTTGGAGAGAAGCCAAGCA         52                          3′ end CR to tRNAGlu  This study

CR, control region; CCR, pseudo-control region.
1 Godoy et al. (2004).
2 Nittinger et al. (2005).
3 Haring et al. (’99).
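Three primers in Table 1 (ThrF, ProR and Thr1) contain IUPAC ambiguity codes (R = A/G, Y = C/T, N = any base), so each corresponds to a small mixture of sequences. The Python sketch below enumerates the concrete sequences encoded by a degenerate primer; the function name is ours.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


for name, seq in [("ThrF", "TTGGTCTTGTAAACCAAARANTGAAG"),
                  ("ProR", "AATNCCAGCTTTGGGAGYTG"),
                  ("Thr1", "AACRTTGGTCTTGTAAACC")]:
    print(name, len(expand(seq)))
```

ThrF and ProR each encode 8 sequences and Thr1 encodes 2; all other primers in the table are nondegenerate.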
Fig. 1. Mt gene order in the region including the CR and CCR in H. fasciatus. Primer binding sites (arrows) and PCR fragments (f-CR-L, f-CCR, f-nd6, f-CR-5′, f-CR-3′) are indicated. Primer names are given in italics. CR, control region; CCR, pseudo-control region; PCR, polymerase chain reaction.
When targeting mt sequences, using blood as a source of DNA can be problematic: avian erythrocytes are nucleated but relatively depauperate in mtDNA, so PCR might favor the amplification of “numts” (nuclear copies of mitochondrial genes; López et al., ’94). To make sure that we were amplifying mt fragments, we compared the blood-derived sequences used in this study with partial CR and CCR sequences obtained afterwards from muscle and feather samples, which are richer in mtDNA (Sorenson and Quinn, ’98). These sequences were identical to those obtained from blood, supporting the assumption that the sequences studied here are of mitochondrial origin (data not shown).
Sequences were aligned and edited manually with BioEdit 7.0.1 (Hall, ’99). Distances (p-distances) were calculated by hand; each gap was treated as a single mismatch irrespective of its length. Neighbour-joining (NJ) trees based on p-distances were calculated with the software package PAUP (version 4.0b10; Swofford, 2002) to illustrate the complicated pattern of varying sequence similarity between CR and CCR across different sections. These trees are not intended to provide a phylogeny of the taxa involved, all the more so because the alignments of these sections are very short. Instead, we use them to demonstrate the effect of chimerical (homogenized/nonhomogenized) sections. For such chimerical sequences, determining substitution models or applying more sophisticated tree-building algorithms would therefore not be useful. Positions with gaps in pairwise comparisons were excluded from the analysis; for the tree in Figure 5B, all positions with gaps were excluded because of several large deletions in various taxa. Bootstrap values (1,000 replicates) were also calculated with PAUP. Sequences determined for the phylogenetic comparisons are deposited under the following GenBank accession numbers: H. fasciatus (FJ627048), Ac. gentilis (FJ627047), Aq. pomarina (FJ627045), Aq. heliaca (FJ627046) and Aq. chrysaetos (FJ627044). For comparisons, the following sequences from GenBank were used: conserved sequence blocks within CR: B. buteo (AF380305; Haring et al., 2001), Neophron percnopterus (AY542899; Roques et al., 2004), Gypaetus barbatus (AY542900; Roques et al., 2004), F. peregrinus (DQ144188; Nittinger et al., 2005), Ciconia ciconia (AB026818), Alectoris barbara (AJ222726; Randi and Lucchini, ’98) and P. haliaetus (NC008550; Gibb et al., 2007); CR: Spizaetus nipalensis (AP008238; Asai et al., 2006), Grus japonensis (AB017620; Hasegawa et al., ’99) and P. haliaetus (NC008550; Gibb et al., 2007); CCR: Aq. heliaca, Aq. pomarina and Aq. chrysaetos (AF435096, AF487453.1, AF435099; Väli, 2002).
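The distance and tree-building steps described above can be sketched in a few lines of Python. This is not PAUP's implementation. The hand-calculation convention for gaps is only described briefly, so here we assume that each indel (a run of gap columns in one sequence) adds one compared site and one mismatch; a second function shows the pairwise exclusion of gapped positions; and `neighbor_joining` is a plain Saitou and Nei neighbour-joining that returns a topology without branch lengths or bootstrap support. All function names are hypothetical.

```python
from itertools import combinations


def p_distance_indel_events(a, b):
    """p-distance in which each indel (a run of gap columns in one sequence)
    counts as one mismatch and one compared site, whatever its length.
    Columns gapped in both sequences are dropped."""
    sites = mismatches = 0
    prev_gap = None  # which sequence carried the gap in the previous column
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        gap = "a" if x == "-" else ("b" if y == "-" else None)
        if gap is None:
            sites += 1
            mismatches += x != y
        elif gap != prev_gap:  # a new indel event starts here
            sites += 1
            mismatches += 1
        prev_gap = gap
    return mismatches / sites


def p_distance_pairwise_deletion(a, b):
    """p-distance ignoring every column with a gap in either sequence."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in cols) / len(cols)


def distance_matrix(seqs, metric):
    """Symmetric dict-of-dicts of distances for {name: aligned sequence}."""
    d = {name: {} for name in seqs}
    for x, y in combinations(seqs, 2):
        d[x][y] = d[y][x] = metric(seqs[x], seqs[y])
    return d


def neighbor_joining(names, d):
    """Neighbour joining on a full symmetric dict-of-dicts of distances.
    Returns the unrooted topology as a Newick-style string without branch lengths."""
    d = {i: dict(d[i]) for i in names}  # work on a copy
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[i][j] for j in nodes if j != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = f"({i},{j})"
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    return f"({nodes[0]},{nodes[1]});"
```

The two distance functions differ only in how gapped columns enter the count, which is why a few large deletions can shift distances noticeably in short alignments; passing either one to `distance_matrix` and then to `neighbor_joining` gives a tree for a given section.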
Although recent molecular investigations revealed that within Aquilinae the genera Spizaetus, Aquila and Hieraaetus, as currently defined, are paraphyletic (Helbig et al., 2005; Lerner and Mindell, 2005; Griffiths et al., 2007; Haring et al., 2007) and, accordingly, several species mentioned in this article should be renamed, a thorough taxonomic revision comprising all representatives of this subfamily is still lacking. We therefore follow Dickinson (2003) and use the conventional names in this article (also for orders, families and subfamilies).
Gene order of CR, CCR and flanking genes of H. fasciatus
From three samples of H. fasciatus (1360, HFA and SA2), the complete CR was amplified using primers that bind in the flanking tRNA genes (PCR fragment f-CR-L in Fig. 1). From two additional samples (HFC, PD1), the complete CR was obtained as two overlapping fragments (PCR fragments f-CR-5′ and f-CR-3′). Amplification of the whole CCR (f-CCR) succeeded in only one sample (HFC). This fragment includes regions flanking the CCR (5′ side: part of nd6 and tRNAGlu; 3′ side: tRNAPhe and part of 12S rRNA). Sequence analysis confirmed that in H. fasciatus the CR is flanked by the tRNAThr and tRNAPro genes, whereas the CCR is located between the tRNAGlu and tRNAPhe genes. To investigate whether the mt gene order in the CR/CCR section of H. fasciatus is the same as in other birds of prey (e.g., Mindell et al., ’98; Haring et al., ’99, 2001; Roques et al., 2004; Nittinger et al., 2005; Gibb et al., 2007), we amplified the intervening sequence (PCR fragment f-nd6). The expected PCR product of 688 bp was obtained from all five individuals. From one sample (HFC), this fragment was sequenced, so that the complete sequence spanning from tRNAThr to 12S rRNA was determined for this individual. This sequence was used for all subsequent analyses.
Structure of the CR and CCR
The complete CR of H. fasciatus (sample HFC) is 1,158 bp long. The three domains usually distinguished in the CR (DI–DIII) were recognized in H. fasciatus (Fig. 2). To identify previously described conserved sequence blocks in the CR of H. fasciatus, we aligned its sequence with those of two other accipitrid species, S. nipalensis (Asai et al., 2006) and Ac. gentilis (this study); the alignment is shown in the electronic supplement (ES.1). The alignment was readily achieved for DII. In parts of DI and DIII, however, it proved difficult because of length variation caused by repeats, which are located at different positions and vary in length, sequence and number. Furthermore, we constructed an alignment of the conserved sequence boxes described by Randi and Lucchini (’98) and detected in H. fasciatus