Amphipods (Crustacea) of Lake Baikal are a very numerous and diverse group of invertebrates generally believed to have originated by adaptive radiation. The evolutionary history and phylogenetic relationships in Baikalian amphipods still remain poorly understood. Sequencing of mitochondrial genomes is a relatively feasible way for obtaining a set of gene sequences suitable for robust phylogenetic inferences. The architecture of mitochondrial genomes also may provide additional information on the mechanisms of evolution of amphipods in Lake Baikal.
Three complete and four nearly complete mitochondrial genomes of Baikalian amphipods were obtained by high-throughput sequencing using the Illumina platform. A phylogenetic inference based on the nucleotide sequences of all mitochondrial protein coding genes revealed the Baikalian species to be a monophyletic group relative to the nearest non-Baikalian species with a completely sequenced mitochondrial genome - Gammarus duebeni. The phylogeny of Baikalian amphipods also suggests that the shallow-water species Eulimnogammarus has likely evolved from a deep-water ancestor, however many other species have to be added to the analysis to test this hypothesis.
The gene order in all mitochondrial genomes of studied Baikalian amphipods differs from the pancrustacean ground pattern. Mitochondrial genomes of four species possess 23 tRNA genes, and in three genomes the extra tRNA gene copies have likely undergone remolding. Widely varying lengths of putative control regions and other intergenic spacers are typical for the mitochondrial genomes of Baikalian amphipods.
The mitochondrial genomes of Baikalian amphipods display varying organization suggesting an intense rearrangement process during their evolution. Comparison of complete mitochondrial genomes is a potent approach for studying the amphipod evolution in Lake Baikal.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-016-3357-z) contains supplementary material, which is available to authorized users.
Ancient freshwater lakes are the birthplaces of very diverse and mostly endemic biota. Their eco-systems are «natural laboratories of evolution» that offer insights into many evolutionary topics, attracting continuous attention of the scientific community and advancing the elucidation of the speciation mechanisms [1, 2]. About ten of the contemporary freshwater lakes existed for longer than one million years. Among them, Lake Baikal is the oldest (reviewed in [1–4]). The age of Lake Baikal is estimated by different authors to be in the range of 25 to 30 million years [5, 6].
Similarly to other ancient giant lakes, a greater part of the huge diversity of animals inhabiting Lake Baikal belongs to the species flocks . These diverse groups of monophyletic species are believed to evolve in the confines of the lake through adaptive radiation in sympatry [2, 8, 9], although geographic isolation is possible in some cases [10–12].
Amphipods are the most diverse group of Baikalian invertebrates (more than 350 described species). They are extremely diverse morphologically and have a wide range of ecological specificities [13, 14].
Most of Baikalian amphipod species have evolved in the confines of Lake Baikal, although some of them have later spread to other water bodies in Eurasia [13, 14]. The only Holarctic species inhabiting shallow bays of Lake Baikal is Gammarus lacustris Sars, 1864 [14, 15]. Also Gammarus dabanus Takhteev et Mekhanikova, 2000, an endemic species from the mountain streams of Khamar-Daban ridge, was found at the edge of Baikal near the estuaries . The discovery and description of all amphipod species from Lake Baikal is still far from approaching completion (i.e. [12, 16–19]). The work on the revision of their higher level taxonomy is still in progress [13, 14, 17, 19–27]. According to the most modern revision by Kamaltynov  all Baikalian amphipod species belong to 76 genera and eleven families, ten of which are autochthonous: Carinogammaridae Tachteev, 2000, Crypturopodidae Kamaltynov, 2001, Macrohectopodidae Sowinsky, 1915, Micruropodidae Kamaltynov, 1999, Baikalogammaridae Kamaltynov, 2001, Ommatogammaridae Kamaltynov, 2009, Acanthogammaridae Garjajeff, 1901, Eulimnogammaridae Kamaltynov, 1999, Pachyschesidae Kamaltynov, 1999, Pallaseidae Tachteev, 2001.
Application of molecular phylogenetic approaches allowed to refine the taxonomy of Baikalian amphipods and to outline the main trends of their evolution, specifically their phylogenetic position relative to the non-Baikalian amphipod taxa. It was shown that at the very beginning of the diversification Baikalian amphipods had split into at least two major lineages delineated by their ecological and morphological traits [28–32]. This observation could also be explained by two independent colonizations of Lake Baikal. Several groups have shown that the ancestors of the extant Baikalian amphipods were closely related to the ancestors of the Holarctic species of Gammarus Fabricius, 1775 [28, 30–34].
Englisch et al. (2003) showed that Baikalian species Parapallasea lagowskii (Dybowsky, 1874) was «nested within genus Gammarus» . In the study by McDonald et al. performed on 32 species of Baikalian amphipods and 29 non-Baikalian species only species belonging to the family Micruropodidae clustered together with Gammarus . An early study by Sherbakov et al. based on partial 18S rRNA sequences showed close relationship of the two Baikalian lineages to different species of Gammarus . This was later confirmed in the detailed study by Hou and Sket, which utilized more genetic markers and sampled more species of Gammaridae . One lineage of Baikalian amphipods was found to be a sister group to the “Oriental Gammarus clade”, the other one was closer to the “Eurasian clade” . Molecular phylogeny also revealed inconsistencies in the current taxonomy of Baikalian amphipods. Genera Acanthogammarus Stebbing, 1899 and Pallasea Bate, 1862 were shown to be polyphyletic [11, 28, 35].
Majority of the previous molecular studies of evolutionary history and taxonomy of Baikalian amphipods are impeded by insufficient statistical support of the most important clades [28, 30, 35, 36]. The presence of poorly resolved clades might be explained both by insufficient length of sequences used and by their low mutational rate (e.g.18S rDNA) [29, 35]. Unfortunately, more recent works aimed at resolving the complicated relations within Gammarus included only a few Baikalian species or did not address their phylogenetic relationship [31, 32, 37, 38]. Therefore, further study of Baikalian amphipods is necessary to elucidate their phylogeny.
In this study we use nucleotide sequences of completely sequenced mitochondrial genomes (mitogenomes) for the evolutionary inferences and clarification of the taxonomy of Baikalian amphipods. The ten studied species appear to form a single clade relative to Gammarus duebeni Lilljeborg, 1852. We also perform a detailed comparative structural analysis of amphipods mitochondrial genomes and find an unexpectedly high degree of their length variation and extensive gene rearrangements.
Results and Discussion
Mitochondrial genomes organization
We obtained three complete and four nearly complete sequences of mitochondrial genomes from Baikalian amphipod species. Information about animals sampling sites, sequences lengths, numbers of reads, GenBank accession numbers, etc. of currently studied Baikalian amphipod species and that of previously published ones is presented in Table 1. Acanthogammarus victorii (Dybowsky, 1874) and Garjajewia cabanisii (Dybowsky, 1874) have incomplete non-coding regions sequences due to their low complexity and the presence of repeats (the 51-mer tandem repeats in the control region (CR) of A. victorii and the CR sequence duplication in the mitochondrial genome of G. cabanisii) that have prevented automatic assembly process. Mitochondrial genomes of Crypturopus tuberculatus (Dybowsky, 1874) and Linevichella vortex (Dybowsky, 1874) also lack various parts of the coding sequences due to the difficulties in amplification of these regions. The sizes of complete mitochondrial genomes range from 14,370 to 18,114 b.p., which is within the range of mitogenomes of other amphipods (14,113 to 18,424 b.p.). The AT content varies from 62.24 to 68.96% for completely sequenced mitochondrial genomes and from 57.45 to 64.86% for partially sequenced ones (Table 2). In general this is compatible with the values of other known complete amphipod mitogenomes (64.03–76.03%), although the AT content of Brachyuropus grewingkii (62.24%) was found to be the lowest within amphipods [39–47].
Summary of the Baikalian amphipod data presented in the study
Comparison of mitogenomic characteristics of Baikalian amphipods
All completely sequenced mitochondrial genomes of Baikalian amphipods have strong negative values of GC-skew (prevalence of C over G) on the (+) strand (−0.30 to −0.18) and their AT-skew (prevalence of A over T) varies from −0.02 to 0.01 (Table 1, Additional files 1 and 2). Species with partial mitochondrial genome sequences have GC-skew from −0.25 to −0.01 and AT-skew from −0.07 to +0.02. The unequal nucleotide content between two strands is typical for mitochondrial DNA. This is a consequence of an asymmetric mutational process that affects the A and C nucleotides during the replication and transcription [48, 49].
Complete sequences of Baikalian amphipod mitochondrial genomes contain 13 protein coding genes, two rRNA genes and from 22 to 23 tRNA genes, a CR and intergenic spacers of different number and lengths (Fig. 1, annotations of studied mitochondrial genomes are in Additional file 3).
Organization of mitochondrial genomes of Baikalian amphipods in comparison to the pancrustacean ground pattern. Genes on the (+) strand are located above the line, whereas genes coded on the (−) strand are below the line. Transfer RNAs genes are...
Protein-coding genes order
All completely sequenced Baikalian amphipod mitochondrial genomes possess a typical set of 13 protein-coding genes (PCGs). PCGs of each species have the same transcriptional polarity as in the pancrustacean ground pattern  (Fig. 1), however the order of PCGs has been altered relative to the ground pattern in the mitochondrial genomes of three species under study. Pallaseopsis kesslerii (Dybowsky, 1874) has a translocation of the nad1 gene, C. tuberculatus possesses translocation of nad5, cytB, nad1, cox3, nad3, nad4, nad4l, nad6, Gmelinoides fasciatus (Stebbing, 1899) has translocations of nad1, nad5, nad4, nad4l, nad6 and cytB (Fig. 1). These re arrangements have never been observed in other amphipod species.
The only feature in common among the amphipods with the altered gene order is that they are relatively shallow water dwellers (Table 1) belonging to the Micruropus flock . However, there are other shallow water amphipods in the dataset which have an unaltered order of mitochondrial genes.
In the non-Baikalian amphipods one may also find deviations from the pancrustacean ground pattern. Onisimus nanseni G.O. Sars, 1900 has nad6 and cytB translocation, in some species of Metacrangonyx Chevreux, 1909 cytB is inverted [39, 40, 45], the species of Caprella Lamarck, 1801 have nad4, nad4l, nad6 translocations and nad5 inversion. Gene order in the species belonging to Pseudoniphargus Chevreux, 1901 differs from the pancrustacean ground pattern and is variable within the genus. P. daviui Jaume, 1991 has nad1 translocation and P.sorbasiensis Notenboom, 1987 has nad6, cytB translocation.
Non-Baikalian amphipods with rearranged PCGs relative to the pancrustacean ground pattern are distant from each other taxonomically (up to a superfamily level) as well as ecologically. Among the species with an altered gene order there are two genera of freshwater stygobionts: Palearctic genus Metacrangonyx (superfamily Hadzioidea) and Nearctic genus Pseudoniphargus (superfamily Allocrangonyctoidea) , Arctic marine species O.nanseni (superfamily Lysianassoidea)  and some species of Caprella (superfamily Caprelloidea) inhabiting warm seas around the world [51, 52]. Still, amphipods carrying the unchanged ground pattern are at least equally taxonomically and ecologically diverse. Parhyale hawaiiensis (Dana, 1853) (superfamily Talitroidea) is a marine species distributed in north Atlantic , Bahadzia jaraguensis Jaume and Wagner, 1998 (superfamily Hadzioidea) is a stygobiontic species that was found in the caves of West Indian and Caribbean regions [54, 55], G. duebeni (superfamily Gammaroidea) is a North Atlantic distributed species , Gondogeneia antarctica (Chevreux, 1906) (superfamily Calliopioidea) is a South Antarctic species .
It seems that the changes in the gene order of Baikalian as well as non-Baikalian amphipods can not be directly correlated with any meaningful environmental parameter. Similarly, although it is possible that some lineages are more prone to reshuffling of their mitochondrial genes then the others, they do not appear to be phylogenetically clustered. Obviously, further careful investigation with a more comprehensive dataset is necessary to resolve the ancestral pattern of Baikalian amphipod mitochondrial genomes.
Base composition bias in protein-coding genes
The GC-skew values were measured for all protein coding genes similarly to the measure of strand composition bias . Mitochondrial genomes of most Baikalian amphipod species studied here have negative GC-skew in PCGs that are located on the (+) strand. PCGs encoded on the (−) strand, on the contrary, have positive GC-skew (Fig. 2, Additional file 2). Such strand bias is typical for most mitochondrial genomes in Malacostraca [42, 59]. However, in C. tuberculatus, the composition bias was found to be noticeably lower than in the other species. The GC-skew values of the (+) strand PCGs vary from −0.11 to +0.14 and of the (−) strand PCGs GC-skew values vary from +0.02 to +0.09 (Fig. 2). In the nad3 and nad6 genes the strand bias is reversed and thus the GC-skew is positive.
Statistical data for PCGs and ribosomal genes in mitochondrial genomes of Baikalian amphipods. a GC-skew values of PCGs and ribosomal genes. (+) marks genes coded by the positive strand, (−) marks genes coded by the negative strand. b An illustration...
The strand bias reversion was previously detected in the mitochondrial genomes of several different invertebrate species such as Katharina tunicata (Mollusca), Florometra serratissima (Echinodermata), Argiope amoena (Chelicerata), Aleurochiton aceris, Bemisia tabaci, Campanulotes bidentatus, Thrips imaginis, Bothriometopus macrocnemis, Cotesia vestalis, Neomaskellia andropogonis, Tetraleurodes acacia ect. (Hexapoda), Ligia oceanica, Hutchinsoniella macracantha, Tigriopus californicus, Lepeophtheirus salmonis, Calanus hyperboreus, Argulus americanus, Procambarus clarkii, Corallianassa couitierei, Nihonotrypaea japonica, Cambaroides similis, Homarus gammarus ect. (Crustacea) [43, 45, 49, 58]. Within amphipods the reversed strand bias was found in the mitochondrial genomes of Metacrangonyctidae species . As previously suggested by Hassanin et al., 2005, the strand reversion can happen due to the reversion of one or several genes relative to the CR or alternatively, it could be explained by the reversion that includes the CR . We did not succeed in sequencing the CR of C. tuberculatus and thus we can not conclude which scenario is more likely in this case. Still, taking into account the relatively low values of GC skew of the majority of protein coding mitochondrial genes in this organism, one may speculate that the reversion is a relatively recent event on the evolutionary time scale . Taxa with a reversed strand bias should be taken into phylogenetic analysis with caution as they tend to promote long branch attraction artifacts .
AT content in protein-coding genes and codon usage
The AT content of completely sequenced PCGs varies from 57.01% (in A.victorii) to 66.78% (in Eulimnogammarus cyaneus (Dybowsky, 1874)) (Table 2). The PCGs of A. victorii have the lowest AT content among entirely sequenced mitochondrial genomes of amphipods. Codon positions in mitochondrial PCGs were noted to have an unequal AT content. Mitochondrial genomes studied here have the lowest AT content at the first codon position, a slightly higher AT content at the second position and the highest AT content at the third codon position (Fig. 2). The A. victorii and C.tuberculatus species, however, have the highest AT content at the second codon position in their PCGs, and the lowest at the first codon position (Fig. 2).
Codon usage analysis in Baikalian amphipods revealed the presence of all codon types typical for the invertebrate mitochondrial code (Additional file 4). Along with the regular start codons (ATA and ATG), the unusual ones (ATT, ATC, TTG and GTG) were predicted to initiate mitochondrial PCGs. Presence of unusual start and stop codons is generally typical for mitochondrial genomes . Some PCGs of the investigated genomes possess truncated stop codons (T) that are presumably completed after a post-transcriptional polyadenylation  (Additional file 3).
In all currently studied genomes the most frequently used codons are TTT (Phe) (5.40 to 7.05%) and TTA (Leu) (4.69 to 8.72%). Also some of the more frequent codons include ATT (Ile) (2.93 to 5.54%), ATA (Met) (3.69 to 6.07%) (Additional file 4). In non-Baikalian amphipods these four codons are also among the most abundant ones. Bias toward the AT-rich codons is typical for many arthropods .
The Effective Number of Codons (ENC)  in Baikalian amphipods varies widely, ranging from 45.04 in Eulimnogammarus verrucosus (Gerstfeldt, 1858) to 55.67 in C.tuberculatus (Fig. 2). These values exceed slightly the ENC in other amphipod species, which range from 36.47 to 49.74. A strong positive correlation (r = 0.97) between ENC and GC content at the third codon position was found in mitochondrial genomes of Baikalian amphipods (Fig. 2). The ENC in all sequenced genomes increases with the decrease of AT content in protein coding genes.
Nucleotide diversity analysis
Nucleotide diversity (Pi) was estimated for two groups of protein-coding genes, the first one encompasses the Baikalian amphipod species, the second one contains non-Baikalian species (see species list in Additional file 5) (Fig. 2). These data may be useful for designing new molecular markers for phylogenetic inferences of amphipods. The patterns of nucleotide diversity within mitochondrial genes in both groups were nearly identical, with lower values for genes of Baikalian amphipods. This may be due to the much higher evolutionary span of the non-Baikalian amphipod species in comparison to the Baikalian amphipods. Indeed, all Baikalian amphipods belong to a single superfamily Gammaroidea, while the rest of analyzed amphipods represent superfamilies Hadzioidea, Allocrangonyctoidea, Lysianassoidea, Caprelloidea, Talitroidea, Gammaroidea, Calliopioidea.
In both groups the cox1 sequences have the smallest values of diversity (0.23 ± 0.01 for Baikalian species, and 0.24 ± 0.01 for non-Baikalian ones), whereas the atp8 appears to be the most variable (0.43 ± 0.03 for Baikalian species and 0.47 ± 0.01 for non-Baikalian ones) (Fig. 2). The highest difference in the nucleotide diversity between Baikalian and non-Baikalian species was noted in the nad6, which might be caused by the lack of a complete sequence from L.vortex. Other studies dedicated to animal mitochondrial genomes also revealed the cox genes to be the most conserved and the atp genes to be the most variable [45, 64–68]. However, there are animal groups with the different patterns of nucleotide variability distribution within protein coding genes of their mitochondrial genomes [69, 70]. It seems that the pattern of nucleotide diversity is a lineage specific feature.
On the basis of the data of all available Baikalian amphipod mitochondrial genomes it is possible to predict which genes are more suitable for different phylogenetic applications in this taxonomic group. The most variable genes (atp8, nad2, nad4l, nad5, nad6) can be utilized as markers for species level phylogeny and population studies. Alternatively, sequences of the least variable mitochondrial cox genes may be suitable for deep phylogeny (families and higher taxa) investigations.
Transfer RNA genes
From 22 to 23 tRNA genes were identified in the completely sequenced mitogenomes of Baikalian amphipod species. The locations of tRNA genes are highly variable and all studied mitochondrial genomes have tRNA genes with altered positions relative to the pancrustacean ground pattern (Fig. 1).
The secondary structures of 213 mitogenome-encoded tRNAs ranging from 49 to 64 b.p. in length and including those of E.verrucosus  were predicted for the analysis (Additional file 6). Most of the predicted tRNAs possess the expected clover-leaf structures, however some of them have aberrant structures. The tRNA-Ser (UCU) lacks the DHU arm in all studied species. The DHU arm is also missing in the tRNA-Val of B.grewingkii, A.victorii, E.cyaneus, and G.cabanisii, in the tRNA-Ser (UGA) of G.fasciatus and L.vortex, and in the tRNA-Tyr of G.fasciatus. The tRNA-Gln lacks the TψC arm in all studied species except for C.tuberculatus. The TψC arm is also absent in the tRNA-Thr of G.fasciatus and in the tRNA-His and tRNA-Pro of P.kesslerii (Additional file 6).
The presence of tRNAs with aberrant structures was found to be typical for crustaceans [39, 41–45, 47, 72–76] and can also be seen in other invertebrates [77, 78]. The presence of aberrant tRNAs in mitochondrial genomes is usually explained either by selection towards minimization of the mitochondrial genome  or by replication slippage .
In four out of ten studied mitogenomes the additional tRNA gene copies are found along with the standard set of 22 tRNA genes. A copy of trnP is found in Eulimnogammarus vittatus (Dybowsky, 1874), trnL1 in G.cabanisii, trnQ in G. fasciatus and trnD in A. victorii (Fig. 1, Additional files 3 and 6). The copies of tRNA-Pro, tRNA-Leu1 and tRNA-Asp possess the standard clover-leaf structure. In G. fasciatus a clover-leaf structure is seen in one tRNA-Gln copy (Q2), whereas the other copy (Q1) lacks the TψC arm (Additional file 6). P.kesslerii also has an additional tRNA adhering to a typical clover-leaf structure, which however has four bases at the presumed anticodon site. The nucleotides in the middle of the anticodon loop may correspond to either Phe or Ile depending on the exact position of the anticodon. An abnormal anticodon suggests that this copy is a pseudogene (ψ F/I) (Additional file 6).
The presence of additional tRNA genes was noted in many mitochondrial genomes of both vertebrate and invertebrate species. If the duplications occurred recently both copies are either identical or very similar in the primary sequence [80–83]. The copies maintain functionality if they retain some conserved elements of their secondary structure . One of the two copies may subsequently undergo elimination. The vestiges of such copies that accumulated substitutions are identified as pseudogenes [85, 86]. Another possible mechanism involves the switch of tRNA identity as the consequence of a point mutation in the anticodon. This mechanism was called tRNA remolding by Cantatore et al. .
While the trnL1 genes of G.cabanisii share 75.00% identity pointing to their origin from a single trnL1 gene, the tRNA duplicates in other amphipods may have switched their identities: one of the two trnP genes in E. vittatus has 78.68% identity with the trnL1, in G. fasciatus a copy of trnQ has 70.00% identity with the trnH gene, and in A. victorii a copy of trnD gene has 67.21% identity with the trnH gene. It is likely that the additional tRNA genes of E. vittatus, G. fasciatus and A. victorii have been undergone remolding.
Evidence of tRNA remolding was found in mitochondrial genomes of different groups, including Porifera [88–91], Mollusca [92, 93], Echinodermata , Arthropoda [43, 91], Chordata [94, 95]. It was recently shown that the remolding event of Trp (UCA) to Gly (UCC) took place in the ancestor of amphipods . Additional tRNA copies found in the mitogenomes of Baikalian amphipods illustrate different stages of the rearrangement process.
Ribosomal RNA genes
The rRNA genes in mitochondrial genomes of all studied amphipods are located on the (−) strand. In two out of eight mitochondrial genomes with the completely sequenced coding portion (G.fasciatus and P.kesslerii) the rRNA genes have altered positions in comparison to the pancrustacean ground pattern  (Fig. 1, Additional file 3). Rearrangements in the mitochondrial genome involving ribosomal RNA genes or/and PCGs occur much less frequently than those involving only tRNA-coding genes [96–98], thus they are called «major rearrangements» . The rearrangements of rRNA genes and PCGs might potentially affect the effectiveness of replication and transcription of mitochondrial genomes.
The length of completely sequenced Baikalian amphipods’ rrnL genes varies from 976 to 984 b.p., and the rrnS gene length varies from 618 to 630 b.p. (Additional file 7). The lengths of rRNA genes in Baikalian amphipods appear to be slightly smaller than those in previously sequenced amphipod mitochondrial genomes [40–42, 44–47]. The AT content ranges from 64.50 to 73.75% in the rrnL genes, and from 62.38 to 70.90% in the rrnS genes respectively (Additional file 7). The GC-skew calculations for rRNA genes give highly positive values (from 0.23 to 0.39 for completely sequenced genes) that are comparable to the values calculated for PCGs encoded on the (−) strand (Additional file 7).
Control region and intergenic spacers
Mitochondrial genomes of Baikalian amphipods have varying numbers and lengths of non-coding regions (Additional file 3). Control region (CR) is the most important non-coding part involved in replication and transcription of the mitochondrial DNA . For identification of a putative CR we searched features typically associated with such regions in invertebrates (poly-T stretch, tandemly repeated sequences, hairpin structures, AT-rich sequences (Additional file 1) [49, 58, 99–103].
In the mitochondrial genome of E. cyaneus a putative CR is a 181 b.p sequence between the rrnS gene and the trnY-trnQ-trnC-trnI-trnM-nad2 gene cluster. A similar CR location was also observed in the previously published mitochondrial genomes of E.vittatus (1053 b.p.)  and E.verrucosus (473 b.p.)  and also in the mitochondrial genome of G. duebeni  (Fig. 1, Additional file 3).
Mitochondrial genomes of A.victorii and B.grewingkii  possess large non-coding sequences between the rrnS and nad2 genes. These sequences are interrupted by several tRNA genes. A putative CR in A.victorii is located between the locus containing a 13-T stretch and the trnI gene. This region was not sequenced completely, however all features typical for a CR were found in the partial sequence of the 1390 b.p. long locus. A putative CR in B.grewingkii is a 1264 b.p. long stretch between the 11-T locus and the trnD gene (Fig. 1, Additional file 3).
In the mitochondrial genome of G.cabanisii the CR is located between the rrnS gene and one of the duplicated trnL1 (trnL1/1) and measures 1212 b.p. in length. (Fig. 1, Additional file 3). Additionally we found traces of a second CR-like sequence located between the nad2 gene and the trnL1/2-rrnL-trnV-rrnS gene cluster. This region was not sequenced in its entirety, only the regions of 102 b.p. and 232 b.p. corresponding to its flanks were sequenced. The 102 b.p. region has a 98.03% identity with the corresponding region of the original CR, whereas the 232 b.p. region has a 65.60% identity with the other flank of the original CR (Additional file 8). It is likely that the latter region has undergone degeneration after the duplication. Without the complete sequence it is difficult to estimate the time of duplication and the exact sequence of events that led to the appearance of the second CR or speculate about its possible function. Thus we annotated these sequences simply as non-coding regions. The CR duplications were observed in mitochondrial genomes of some invertebrates, including ticks [106, 107], ostracods , sea cucumbers , katydids etc. . Amphipods Caprella mutica Schurin, 1935 and Caprella scaura Templeton, 1836 also possess highly identical duplicated CRs in their mitochondrial genomes [41, 43].
The position of CR between the rrnS and nad2 genes in the mitochondrial genomes is typical for some other amphipods: G. duebeni , O. nanseni , G. antarctica , several species of Pseudoniphargus  and for the pancrustacean ground pattern as well , although the adjacent tRNA genes are often different.
The mitochondrial genome of P.kesslerii has a large non-coding sequence between the nad1 and nad2 genes that is separated by a few tRNA genes and one pseudo tRNA gene. A 340 b.p. sequence located between the 10-T locus and the pseudo tRNA gene (ψ F/I) was defined as a CR (Fig. 1, Additional file 3).
The mitochondrial genome of G. fasciatus possesses two large non-coding regions: one between the rrnS and nad5 genes, the other between the nad5 and nad4 genes. Both of these regions are interrupted with several tRNA genes. A 235 b.p. region between the rrnS and trnV was identified as a CR. All other non-coding regions were annotated as intergenic spacers (Fig. 1, Additional file 3).
It is notable that while most features typical for the CRs are found in the corresponding regions in the majority of sequenced mitochondrial genomes of Baikalian amphipods, the mitogenomes of E. cyaneus and A.victorii only possess tandem repeats.
Variation in the CR location was found in other amphipod species with an altered protein-coding and ribosomal RNA gene order. In the mitochondrial genomes of species from the Metacrangonyx genus the CR sequences were identified between the rrnS and cytB genes [39, 40, 45]. In C. mutica and C. scaura [41, 43] the first CR is located between the nad6 gene and the trnC-cytB gene cluster, and the second CR is located between the nad4l-trnP and the trnI-trnM-trnY-trnQ-nad2 gene clusters. The lengths of CR sequences in non-Baikalian amphipods also vary significantly from 25 b.p. in M. goulmimensis to 1626 b.p. in G. duebeni [40–42, 44–47]. Furthermore, it was previously shown that this feature varies considerably even between individuals of the same species. Such CR length variation was noted in Metacrangonyx longipes Chevreux, 1909 (26 and 40 b.p.) and Metacrangonyx goulmimensis Messouli, Boutin and Coineau, 1991 (25 and 471 b.p.) [40, 45]. Thus, the variable location and length of CR sequences in mitochondrial genomes of the Baikalian amphipod species is in concordance with the characteristics of CRs in other amphipods and invertebrates in general.
The number of intergenic spacers in completely sequenced mitochondrial genomes of Baikalian amphipods varies from 9 to 21 and their total length varies from 268 to 3863 b.p. (Table 2, Additional file 3). Three Baikalian amphipods with entirely sequenced mitochondrial genomes (B.grewingkii, P.kesslerii, and G.fasciatus) possess the largest portions of intergenic spacers which take up 10.80, 11.42, and 21.32% of their complete mitochondrial genomes length. Within the currently sequenced non-Baikalian amphipods only G. antarctica  has a comparable length of the non-coding intergenic spacers (4354 b.p. in total). The presence of large and numerous intergenic spacers in some mitochondrial genomes studied here may be an evidence of former duplication events .
Distinct features of Baikalian amphipod mitochondrial genomes
Structural analysis of Baikalian amphipod mitochondrial genomes identified several features that differ in most of the studied mitogenomes. Alterations in gene order, strand bias reversion, presence of additional tRNA genes and large non-coding regions as well as a CR duplication indicate intense rearrangement processes, which have occurred during the evolution of Baikalian amphipods. Such rearrangements are usually a result of two major mechanisms, i.e. duplication and subsequent loss of mitochondrial genome regions [112, 113] and intramitochondrial recombination . It is likely that different number and type of rearrangements led to the observed patterns of gene order and variance in the non-coding regions in the mitochondrial genomes of Baikalian amphipods. To decipher the mechanisms behind the appearance of every pattern and to predict an ancestral pattern for the mitogenomes of Baikalian amphipods a further careful investigation with a more comprehensive dataset is necessary.
For morphological identification of Baikalian amphipod species in our study we used the most modern taxonomy . This taxonomy become a result of successive revision of previous ones [13, 20–22, 24, 26, 27]. Since new families were introduced in the course of recent revisions of Baikalian amphipods, some species and genera were renamed and thus the species list differs from the one given in the previous publications [28, 30, 35, 36]. To make the phylogenetic results of our work comparable with previous studies we considered them in the context of contemporary taxonomic system that combined all taxonomy alterations .
The phylogenetic analyses of amphipod species based on 13 concatenated mitochondrial protein coding gene sequences using Bayesian Inference (BI) resulted in a well-supported tree. The Baikalian species in the BI tree form a monophyletic group and the amphi-Atlantic amphipod species G. duebeni is the nearest outgroup to the Baikalian clade (Fig. 3).
Phylogenetic inference (BI) of amphipods based on 13 mitochondrial protein-coding genes sequences. Numbers above the branches indicate Bayesian posterior probabilities. Coloured rectangles denote depths of habitats of Baikalian amphipods. Yellow rectangles...
The previous phylogenetic inferences based on the cox3 gene fragment , 18S rRNA and the cox1 gene fragments [28, 35] as well as on a combined set of molecular data (rrnL and 18S rRNA, cox1 genes fragments) and morphological characters  and the most recent study, which involved four molecular markers (cox1, EF-1α, 18S and 28S rRNA) , revealed that different species of a genus Gammarus were included into a clade of Micruropodidae family. The phylogenetic trees derived in those studies had low support in many nodes. Utilization of 13 protein coding genes of the mitochondrial genome helped us to obtain a better supported tree of Baikalian amphipods in comparison with the ones described in previous molecular studies. The relatively small number of taxa sampled in our study does not allow us to rule out the possibility that the Baikalian clade might still be split into two or more independent lineages if more species are added to the analysis, particularly if G. lacustris is included. However, presently our data suggest that the common outgroup to all Baikalian species is the nearest sister species G. duebeni. This result corroborates the recent study by Hou and Sket (2016) .
The phylogenetic tree demonstrates separation of Baikalian amphipod species into two clades where the one includes species from the Micruropodidae and Crypturopodidae families and the other includes species from the Acanthogammaridae, Pallaseidae, and Eulimnogammaridae families (Fig. 3). This clusterization is in concordance with the current view on the existence of two major amphipod lineages in Lake Baikal, which is based on morphological investigations  and molecular phylogenetic data [30–32, 35].
The phylogenetic tree obtained in our study can be used to make some assumptions about the origination of certain ecological features of Baikalian amphipods. The Baikalian amphipods C. tuberculatus, L. vortex and G. fasciatus, which comprise one of the lineages in our study (Fig. 3), are mainly shallow-water inhabitants, having smooth body and display tolerance to warm water [13, 14, 115, 116]. Ancestors of these species may have originated during the Tertiary period when the climate was warm  and therefore they may be warm-water relicts [14, 25]. The second lineage of Baikalian amphipods contains species that possess more diverse morphological features, lifestyle and ecological niches. The radiation of this lineage presumably coincided with the completion of climate cooling period in the late Eocene – early Oligocene according to Kamaltynov . This lineage is subdivided into two clades: the first one includes armored and relatively shallow water species P. kesslerii (Pallaseidae) and A. victorii (Acanthogammaridae) [13, 14], the second clade consists of two armored deep-water predator species (G. cabanisii and B. grewingkii) and a derived clade of smooth littoral amphipods of Gen. Eulimnogammarus Bazikalova, 1945, which differ from the rest by their feeding habits. The separation of species from the second lineage into two clades is also supported by morphological features, as A.victorii and P.kesslerii possess more plesiomorphic traits in comparison to the rest of the species [13, 14]. The topology of the BI tree shows that the species of Eulimnogammarus genus may have originated from an ancestor inhabiting the abyssal zone of Lake Baikal. However, it is possible that an expanded sampling of Baikalian species may yet show the different trends of origination of the extant amphipods inhabiting different depths of the lake.
The topology of the BI tree revealed the paraphyly of family Acanthogammaridae, which in our study includes A.victorii, B.grewingkii and G.cabanisii. The phylogenetic separation of species within the Acanthogammaridae family (according to the contemporary taxonomy) was also observed in previous studies, though the trees in those studies were poorly resolved [28, 30, 35]. The Acanthogammaridae includes species with diverse morphological and ecological features united on the basis of armament characteristics . However, according to Kamaltynov, the taxonomy of contemporary Acanthogammaridae still needs refinement. Some genera have features that may be sufficient for establishing new families. Additional molecular phylogenetic data including some nuclear markers and larger sampling of species from the Acanthogammaridae is necessary for the description of new families.
The detailed comparative study of mitochondrial genomes of endemic Baikalian amphipods belonging to the most diverse species flock of Lake Baikal is reported. The newly sequenced complete and nearly complete mitochondrial genomes of seven Baikalian species were presented in this study i.e. Acanthogammarus victorii, Crypturopus tuberculatus, Eulimnogammarus cyaneus, Garjajewia cabanisii, Gmelinoides fasciatus, Linevichella vortex and Pallaseopsis kesslerii.
The examined mitochondrial genome sequences were used to resolve phylogenetic relationships within the group of Baikalian amphipods. The branching order of the phylogenetic tree supports the separation of Baikalian species into two major lineages. Our data also support the paraphyly of Acanthogammaridae. The measurement of nucleotide diversity of protein-coding parts of Baikalian and non-Baikalian species allowed us to define the fragments of mitogenome most suitable for different scales of phylogenetic and/or population studies of amphipods.
The structural analysis of available Baikalian amphipod mitochondrial genomes revealed high variability in their length and gene order. Gene order of all mitochondrial genomes studied is changed relative to the pancrustacean ground pattern. Four species (G.cabanisii, E.vittatus, A.victorii and G.fasciatus) have extra tRNA genes copies. More severe rearrangements of protein coding genes and ribosomal RNA genes are found in P. kesslerii, C. tuberculatus and G. fasciatus. Multiple and unusually long intergenic spacers are found in mitochondrial genomes of B.grewingkii, P.kesslerii and G.fasciatus. Unusually high structural diversity of the amphipod mitochondrial genomes in Lake Baikal points at the possibility of high mutagenic pressure. This turns Baikalian amphipods into a potentially attractive model for studies of mitochondrial genome evolution in general.
Sampling, DNA extraction, mitochondrial genome amplification, sequencing
Amphipods species used for this study were collected in Lake Baikal from 2011 to 2013. Sampling was performed both manually at the water edge and by trawling from the ship at greater depths (Table 1). The species investigated in the current study were selected to maximally cover the range of species morphologies and ecological niches such as depth inhabited. Additional requirement was the ease of species diagnosis of the specimens.
Total DNA was extracted using modified CTAB method . Depending on size of species we used either a whole animal or its part (a leg) for DNA extraction. The species identification was carried out using taxonomic system of Kamaltynov .
Total DNA samples of three out of seven amphipod species (G. fasciatus, A. victorii and G. cabanisii) was used directly for Illumina next generation sequencing libraries preparation. Paired-end libraries with insert size of 300 b.p. and 600 b.p. were prepared according to protocols provided by manufacturer for HiSeq (TruSeq DNA Sample Prep Kit) and MiSeq (Nextera DNA Library Preparation Kit) Systems (Illumina, San Diego, CA, USA).
Alternatively, DNA samples of other four species (P. kesslerii, L. vortex, C. tuberculatus and E. cyaneus) were used for long-range amplification of mitochondrial DNA. Mitochondrial genomes sequences were amplified as two fragments overlapping between cox1 to rrnL genes (Additional file 9). Primers were designed based on the alignment of cox1 and rrnL gene sequences of Baikalian amphipods available in GenBank. Additionally the universal Folmer’s primers were used . The sequences of primers used in this study are available in Additional file 5.
Primers COI_L1 and 16S_H were used to amplify the c.a. 11Kb long (in all species) fragment spanning from cox1 to rrnL. PCR was performed using EncycloPlus PCR kit (Eurogen, Moscow, Russia). Each reaction contained 1 μl of 10× Encyclo buffer, 0.2 μl of dNTP Mixture (10 mM each), 0.5 μl of each primer, 1 μl of template DNA (10–50 ng), 0.2 μl of Encyclo polymerase mix and 6.6 μl of sterile water up to 10 μl. Amplification was carried out under following conditions: 95 °C for 1 min., followed by 30 cycles of 93 °C for 15 s., 60 °C for 15 s., 72 °C for 9 min., with final elongation at 72 °C for 3 min. Different pairs of primers were used to amplify the second fragment spanning from rrnL to cox1 in cases of different species as summarized in Additional file 5. PCR conditions were the same but the elongation times were decreased.
PCR products were purified by ethanol precipitation and were further utilized for libraries preparation using the Nextera DNA Library Preparation Kit (Illumina, San Diego, CA, USA) provided by manufacturer. The paired-end libraries with insert length of 600 b.p. were constructed and sequenced using MiSeq System (Illumina, San Diego, CA, USA).
Reads processing, mitochondrial genomes assembly and annotation
All sets of reads were cleaned from adapters, and parts with a quality score below 15 were trimmed using Trimmomatic-0.32 . De-novo assembly was carried out with SPAdes 3.0.0 assembler . Scaffolds of mitochondrial genome in the assemblies were identified using BLAST  and the reference sequences of amphipod E. verrucosus. Complete and nearly complete amphipod mitochondrial genome sequences obtained after assemblies were further used as reference sequences for mapping reads of appropriate species using Bowtie2 2.1.0 . All generated read alignment files were used for manual correction of errors in reference sequences and for estimation of coverage depth. Visualization of read alignment files was made by Tablet 1.13.07.31 .
An automatic annotation of mitochondrial genome sequences was performed using the MITOS pipeline with default settings . GenomeView browser  was used for visualization of annotation files and manual correction of gene boundaries. PCGs boundaries were verified by comparison with the orthologs of other amphipods and also taking into account adjacent tRNA genes positions . rRNA genes boundaries were identified by comparison to mitochondrial rRNA genes sequences of other amphipods. tRNA genes and their secondary structures were predicted by MiTfi  as part of the MITOS pipeline .
Protein-coding genes analyses
Nucleotide composition and codon usage were calculated by The Sequence Manipulation Suite . Effective number of codons was assessed using INCA 2.1 . AT and GC skew of entire mitochondrial sequences were calculated using the following formulae: AT-skew = (A-T)/(A + T) and GC-skew = (G-C)/(G + C) . Visualization of AT-skew and GC-skew plots as well as AT content plot for (+) strand were made by a custom Python script with a sliding window of 100 b.p. with steps of 10 b.p. Nucleotide diversity (Pi) was estimated for every protein-coding gene of Baikalian and non-Baikalian amphipods (see the list of species in Additional file 5) using DNAsp v.5 .
Protein coding nucleotide sequences were used for the phylogenetic inferences. Non-Baikalian amphipods were represented in the dataset by a single sequence for each species available in GeneBank by March 2016 (Additional file 5).
Each protein coding gene sequence set was aligned separately in codon-based fashion with TranslatorX web server  using ClustalW algorithm and then the sets were concatenated using Seaview 4.5.4 . A resultant alignment contained 11,013 characters.
The best model of nucleotide substitution (GTR + I + G) was chosen with jModelTest . The phylogenetic trees were built by MrBayes v. 3.2.1. . Four independent runs of four MCMC chains were performed. Chains were run for five million generations, the first 30% of generations were discarded as burn-in. The resultant phylogenetic tree was visualized in FigTree v.1.4.2. . Metacrangonictidae clade was used as a root of the tree.
Computational experiments have been performed with HPC cluster “Academician V.M. Matrosov” (Irkutsk Supercomputer Centre of SB RAS, http://hpc.icc.ru). We are grateful to the three anonymous reviewers for their valuable comments and suggestion that helped us to improve the manuscript.
The work was supported by the governmentally funded project No 0345-2014-0005; Russian Foundation for Basic Research (Grants no. 15-04-05841, 15-07-03827 and 15-04-03848). Phylogenetic analysis in this study was supported by the Russian Science Foundation (Grant Number: 14-50-00029). The publication cost of this article has been funded by the Russian Foundation for Basic Research.
Availability of data and materials
All data generated or analysed during this study are included in this published article and its supplementary information files.
EVR and DYS are responsible for conception and design of the study. EVR, DYS and EAS carried out taxonomic sampling. RMK performed the morphological identification of species. EVR performed the laboratory works. MDL performed the library preparation for Illumina Sequencing. KVM and EVR are responsible for assembly and annotation of mitochondrial genomes. DYS, VVA, RMK, EAS, AVG and ASA participated in statistical and phylogenetic analyses. EVR wrote the manuscript. All authors read and approved the final version of the manuscript.
The authors declared that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
|ENC||Effective number of codons|
Additional file 1:(16M, pdf)
AT content plots for the (+) strand of mitochondrial genomes of Baikalian amphipods under study. The beginning of every graph corresponds to the start of the cox1 gene. Dashed lines indicate the places of disruptions in incompletely sequenced mitochondrial genomes. The putative CRs are marked by the colored boxes. (PDF 16,893 kb)Additional file 2:(18M, pdf)
AT-skew and GC-skew plots for the (+) strand of mitochondrial genomes of Baikalian amphipods under study. The beginning of every graph corresponds to the start of the cox1 gene. Dashed lines indicate the places of disruptions in incompletely sequenced mitochondrial genomes. (PDF 18,605 kb)Additional file 3:(29K, xlsx)
Organization of the mitochondrial genomes of Baikalian amphipods under study. The incomplete stop codons are labeled in the tables with parentheses. (XLSX 28 kb)Additional file 4:(18K, xlsx)
Summary of codon usage in protein coding genes of mitochondrial genomes of studied Baikalian amphipods. Bold notations indicate species with incomplete protein coding sequences. Sign “#” indicates total number of certain codon in protein-coding sequences of every species; “%” indicates percent of certain codon in total coding sequence in every species. (XLSX 17 kb)
Although adequate resolution of higher-level relationships of organisms apparently requires longer DNA sequences than those currently being analyzed, limitations of time and resources present difficulties in obtaining such sequences from many taxa. For fishes, these difficulties have been overcome by the development of a PCR-based approach for sequencing the complete mitochondrial genome (mitogenome), which employs a long PCR technique and many fish-versatile PCR primers. In addition, recent studies have demonstrated that such mitogenomic data are useful and decisive in resolving persistent controversies over higher-level relationships of teleosts. As a first step toward resolution of higher teleostean relationships, which have been described as the “(unresolved) bush at the top of the tree,” we investigated relationships using mitogenomic data from 48 purposefully chosen teleosts, of which those from 38 were newly determined during the present study (a total of 632,315 bp), using the above method. Maximum-parsimony and maximum-likelihood analyses were conducted with the data set that comprised concatenated nucleotide sequences from 12 protein-coding genes (excluding the ND6 gene and third codon positions) and 22 transfer RNA (tRNA) genes (stem regions only) from the 48 species. The resultant two trees from the two methods were well resolved and largely congruent, with many internal branches supported by high statistical values. The tree topologies themselves, however, exhibited considerable variation from the previous morphology-based cladistic hypotheses, with most of the latter being confidently rejected by the mitogenomic data. Such incongruence resulted largely from the phylogenetic positions or limits of long-standing problematic taxa, which were quite unexpected from previous morphological and molecular analyses. We concluded that the present study provided a basis of and guidelines for future investigations of teleostean evolutionary mitogenomics and that purposeful higher-density taxonomic sampling, subsequent sequencing efforts, and phylogenetic analyses of their mitogenomes may be decisive in resolving persistent controversies over higher-level relationships of teleosts, the most diversified group of all vertebrates, comprising over 23,500 extant species.
Recently, much attention has been paid to evolutionary genomics as a postgenomics biology (Easteal 2000 ). In fact, application of genome technologies (e.g., whole-genome shotgun sequencing; Venter, Smith, and Hood 1996 ) capable of producing sequence data in large quantities is now expected in this field (Pollock et al. 2000 ). Pollock et al. (2000) designed a method for applying such genome technologies to evolutionary genomics, using vertebrate mitochondrial genomes (mitogenomes) as examples, demonstrating its feasibility on the basis of simulation experiments. In addition, Pollock et al. (2000) convincingly listed many potential benefits as primary products of such large-scale evolutionary genomic sequencing, with one of the most promising being demonstrably more accurate phylogenetic reconstructions with unprecedentedly long DNA sequences from many taxa. For large-scale evolutionary genomics involving multiple laboratories in different countries, these ideas appear to be sound and far-reaching. For moderate-scale evolutionary genomics (e.g., within a specific group of vertebrates), however, a more practical method that is feasible in a single laboratory or a few laboratories would be desirable, owing to various resource limitations and depending on the purposes of the studies.
For fish mitogenomes, one practical method is a PCR-based approach that employs a long PCR technique and many versatile PCR primers and is accurate and faster than sequencing cloned mitochondrial DNA (mtDNA) (Miya and Nishida 1999 ; for bird mitogenomes, see Sorenson et al. 1999 ). In fact, complete mtDNA sequences for 12 teleost species have already been determined using this method (Miya and Nishida 1999, 2000b ; Inoue et al. 2000, 2001a, 2001b, 2001c, 2001d ; Ishiguro, Miya, and Nishida 2001 ; Kawaguchi, Miya, and Nishida 2001 ). Furthermore, including the additional species determined during the present study, the number of complete mtDNA sequences determined using this method (total 50) has become compatible with those determined in other studies for whole vertebrates, including fishes (total 69; Pollock et al. 2000 ), indicating a certain utility of our approach in moderate-scale evolutionary genomic studies. Thus, technical difficulties in obtaining mitogenomic data from various fishes within a relatively short period of time had been overcome, although there still remained problems as to whether or not they were suitable for inferring phylogenies, thus providing a basis for further comparative evolutionary analyses.
To address this problem, Miya and Nishida (2000b) explored the phylogenetic utility and limits of individual and concatenated mitochondrial genes for reconstructing higher-level relationships, using the complete mtDNA sequences of eight teleosts (of noncontroversial relative phylogenetic positions). They demonstrated that nucleotide sequences from concatenated protein-coding (no third codon positions) plus transfer RNA (tRNA; stem regions only) genes (hereinafter called “mitogenomic data”) were most able to reproduce the expected phylogeny of teleosts with high statistical support, unlike most individual genes. Accordingly, mitogenomic data can be expected to resolve the persistent controversies over the higher-level relationships of teleosts. Subsequently, Inoue et al. (2001d) resolved the interrelationships of five major lineages of basal teleosts (Osteoglossomorpha, Elopomorpha, Clupeomorpha, Ostariophysi, and Protacanthopterygii, given various rankings), for which five alternative phylogenetic hypotheses had previously been proposed on the basis of both morphological and molecular data, using mitogenomic data. Thus, the mitogenomic data not only are able to reproduce the expected phylogeny of teleosts (Miya and Nishida 2000b ), but can also resolve specific phylogenetic questions for higher-level relationships of teleosts (Inoue et al. 2001d ), the most diversified group of all vertebrates, comprising over 23,500 extant species (J. S. Nelson 1994 ).
Higher teleosts have been collectively called Acanthomorpha, Acanthopterygii, or Percomorpha, depending on their limits (fig. 1 ; Rosen 1973 ; Lauder and Liem 1983 ; J. S. Nelson 1984, 1994 ; Stiassny 1986 ; Stiassny and Moore 1992 ; Johnson 1993 ; Johnson and Patterson 1993 ; Stiassny, Parenti, and Johnson 1996 ). If higher teleosts are equated with the most comprehensive group, Acanthomorpha (Rosen 1973 ), they comprise about 14,650 species placed in 20 orders, 296 families, and 2,615 genera (calculated from J. S. Nelson 1994 ). Their interrelationships have long been controversial and so complex that G. Nelson (1989) described them as the “(unresolved) bush at the top of the tree,” which is still evident, as seen in figure 1 . A strict consensus tree (fig. 1C ) generated from two recently proposed hypotheses on higher teleostean relationships (Johnson and Patterson 1993 [fig. 1A] ; J. S. Nelson 1994 [fig. 1B] ) clearly included unresolved “bushes” (fig. 1C ). As a first step toward resolution of higher teleostean phylogenies containing such enormous taxonomic diversity, this study attempted (1) to circumscribe a well-supported monophyletic group encompassing such “bushes” (=Percomorpha) and (2) to determine the phylogenetic position of such a monophyletic group relative to other major lineages using mitogenomic data. By doing so, we hoped to clarify where the phylogenetic problems lay, thus providing a basis and guidelines for subsequent resolution of the “bushy top.” Clearly, this study was not intended to resolve intrarelationships of the bushy top, as such an investigation would require wider and more extensive taxonomic sampling.
Materials and Methods
For resolution of a complex phylogeny with enormous taxonomic diversity, such as that seen in higher teleosts (fig. 1 ), it is essential to conduct purposeful taxonomic sampling that increases phylogenetic accuracy (Hillis 1998 ). We employed two taxonomic sampling strategies, individually or in combination, according to the following recommendation: (1) “select taxa within the monophyletic group of interest that will represent the overall diversity of the group” (strategy 3 in Hillis 1998 ), and (2) “select taxa within the monophyletic group of interest that are expected (based on current taxonomy or previous phylogenetic studies) to subdivide long branches in the initial tree” (strategy 4 in Hillis 1998 ). Hillis (1998) stated that careful addition of taxa to ensure coverage of the group of interest and to purposefully break up long branches (a combination of strategies 3 and 4) seemed to be the optimal taxonomic sampling strategy.
For circumscription of the upper bushes, we chose at least one species each to represent all of the major lineages above Paracanthopterygii (=Acanthopterygii; fig. 2 ); of those major lineages that occupied the three most basal positions in either Johnson and Patterson's (1993) or J. S. Nelson's (1994) hypotheses (see fig. 1A and B ; Mugiliformes, Atherinomorpha, Stephanoberyciformes, Beryciformes, and Zeioidei), we added one to three species so as to locate their phylogenetic positions more correctly, assuming possible cases in which they would be placed outside the upper bushes. For more accurate determination of the relative phylogenetic positions of the upper bushes, we chose two to three species from all major lineages up to the Paracanthopterygii in order to break up possible long branches (fig. 2 ). Final rooting was done using a clupeid, Sardinops melanostictus, with reference to the recent mitogenomic analysis of basal teleostean phylogeny by Inoue et al. (2001d) . All species used in this study are listed in table 1 , along with references and DDBJ/EMBL/GenBank accession numbers.
A portion of the epaxial musculature (ca. 0.25 g) was excised from fresh specimens of each species and immediately preserved in 99.5% ethanol. Total genomic DNA was extracted using the Qiagen QIAamp tissue kit following the manufacturer's protocol.
Mitochondrial DNA Purification by Long PCR
The mitogenomes of the 38 species were amplified in their entirety using a long PCR technique (Cheng et al. 1994 ; Miya and Nishida 1999 ). Seven fish-versatile long PCR primers (S-LA-16S-L, L2508-16S, L12321-Leu, H12293-Leu, H15149-CYB, H1065-12S, and S-LA-16S-H; for locations and sequences of these primers, see Miya and Nishida 2000b ; Inoue et al. 2000, 2001d ; Ishiguro, Miya, and Nishida 2001 ; Kawaguchi, Miya, and Nishida 2001 ) were used in various combinations so as to amplify the entire mitochondrial genome in a single reaction or two reactions. When long PCR using the above primers proved to be unsuccessful for a certain segment, species-specific primers were alternatively designed with reference to the partial nucleotide sequences from either 16S rRNA, ND5, or cytochrome b (cyt b) genes, determined from total DNA with the fish-versatile primers listed below.
Long PCR was done in a Perkin-Elmer Model 9700 thermal cycler, with reactions being carried out with 30 cycles of a 25-μl reaction volume containing 13.25 μl sterile distilled H2O, 2.5 μl 10× LA PCR buffer (Takara), 4.0 μl dNTP (4 mM), 2.5 μl each primer (5 μM), 0.25 μl of 2.5 U LA Taq (Takara), and 1 μl template. The thermal cycle profile was that of “shuttle PCR”: denaturation at 98°C for 10 s, with annealing and extension combined at the same temperature (68°C) for 16–20 min. Long PCR products were electrophoresed on a 1.0% L 03 agarose gel and later stained with ethidium bromide for band characterization via ultraviolet transillumination. The long PCR products were diluted with sterile TE buffer (1:10–100) for subsequent use as PCR templates.
PCR and Sequencing
A total of 182 fish-versatile PCR primers were used in various combinations to amplify contiguous, overlapping segments of the entire mitochondrial genome for each of the 38 species (for primer locations and sequences of these primers, see Miya and Nishida 1999, 2000b ; Inoue et al. 2000, 2001a, 2001b, 2001c, 2001d ; Ishiguro, Miya, and Nishida 2001 ; Kawaguchi, Miya, and Nishida 2001 ). Species-specific primers were designed in cases in which no appropriate fish-versatile primers were available. A list of PCR primers used for a specific species is available from M.M. on request.
PCR was done in a Perkin-Elmer Model 9700 thermal cycler, and reactions were carried out with 30–32 cycles of a 15-μl reaction volume containing 8.3 μl sterile, distilled H2O, 1.5 μl 10× PCR buffer (Takara), 1.2 μl dNTP (4 mM), 1.5 μl of each primer (5 μM), 0.07 μl Taq DNA polymerase (Z Taq, Takara), and 1 μl template (diluted long PCR products). All PCR products overlapped by approximately 50–300 bp.
The thermal cycle profile was as follows: denaturation at 98°C for 1 s; annealing at 50–53°C, depending on primer specificity, for 5 s; and extension at 72°C for 10–20 s, depending on the expected size of the PCR products. The PCR products were electrophoresed on a 1.0% L 03 agarose gel (Takara) and stained with ethidium bromide for band characterization via ultraviolet transillumination.
Double-stranded DNA products, purified using a Pre-Sequencing Kit (USB), were subsequently used for direct cycle sequencing with dye-labeled terminators (Applied Biosystems Inc.). Primers used were the same as those for PCR. All sequencing reactions were performed according to the manufacturer's instructions. Labeled fragments were analyzed on a Model 373S/377 DNA sequencer (Applied Biosystems Inc.). All DNA sequence electropherograms were carefully checked to see whether or not they included the mitochondrial pseudogenes in the nuclear genome (for checkpoints, see Mindell et al. 1999 ). As pointed out by Mindell et al. (1999) , features that were consistent with mitochondrial origin were (1) the presence of a conserved reading frame in protein-coding genes among all taxa, with decreasing rates of variability at third, first, and second codon positions, respectively; (2) the absence of extra stop codons, frameshifts, or unusual amino acid substitutions; and (3) no sequence changes resulting in losses of known secondary structure in tRNA and rRNA genes.
The DNA sequences were edited with EditView, version 1.0.1; AutoAssembler, version 2.1 (Applied Biosystems); and DNASIS, version 3.2 (Hitachi Software Engineering). Individual gene sequence alignments for the 48 teleosts were initiated with Clustal X (Thompson et al. 1997 ) with default gap penalties and adjusted manually using DNASIS. Amino acids were used for alignments of the protein-coding genes and secondary-structure models (Kumazawa and Nishida 1993 ) for alignment of tRNA genes. Since unambiguous alignments of the two rRNA genes (12S and 16S) on the basis of secondary-structure models (e.g., Miya and Nishida 1998, 2000a ; Yamaguchi et al. 2000 ) were not feasible, they were not used in the analyses. The ND6 gene was not used in the phylogenetic analyses because of its heterogeneous base composition and consistently poor phylogenetic performance (Zardoya and Meyer 1996 ; Miya and Nishida 2000b ). Also, tRNA loops and other ambiguous alignment regions, such as the 5′ and 3′ ends of several protein-coding genes, were excluded from the analyses. In addition, third codon positions in the protein-coding genes that would positively mislead an analysis of higher-level relationships of teleosts (Miya and Nishida 2000b ) were excluded from the analyses, leaving 7,002 and 908 available nucleotide positions from the 12 protein-coding and 22 tRNA genes, respectively. Amino acid sequences were not used in phylogenetic analysis, as the resulting trees were not well resolved, with many internal branches being supported by relatively low bootstrap values compared with those obtained by nucleotide sequences. Also, there was no noticeable improvement in tree statistics or bootstrap values when greater weight was given to amino acid replacements, which required more nucleotide changes, by using the PROTPARS weight matrix for amino acids provided in MacClade (Maddison and Maddison 1992 ). Similarly, although Naylor and Brown (1997,1998) suggested that isoleucine (I), leucine (L), and valine (V) were the amino acids responsible for phylogenetic inconsistency, exclusion of these three amino acids through conversion of L and V into I from the data matrix (see Cao et al. 1998 ; Mindell et al. 1999 ) did not improve the tree statistics or the bootstrap values.
MacClade, version 3.08 (Maddison and Maddison 1992 ) was used in various phases of the phylogenetic analyses, such as preparing data matrices in NEXUS format, exporting tree files, and exploring alternative tree topologies. Aligned sequence data in NEXUS format are available from M.M. on request.
All phylogenetic analyses were performed using PAUP 4.0b4a (Swofford 1998 ). Heuristic maximum-parsimony (MP) analyses were conducted with tree bisection-reconnection branch swapping and 100 random-addition sequences. Support for internal branches was assessed using 500 bootstrap replications, with 20 random-addition sequences performed in each replication. All phylogenetically uninformative sites were ignored. Gaps were considered as missing data rather than as fifth characters, to prevent those longer than one or two bases from being taken as representing multiple events (Swofford 1993 ).
Heuristic maximum-likelihood (ML) analyses were conducted to determine the statistically most likely phylogeny with the following parameters: substitution model set at transition/transversion ratio = 2; the HKY85 (Hasegawa, Kishino, and Yano 1985 ) two-parameter model variant for unequal base frequencies; empirical base frequencies; starting branch lengths obtained using the Rogers-Swofford approximation method; and molecular clock not enforced. Nucleotide positions, including gaps, were all excluded.
Tests of alternative phylogenetic hypotheses were accomplished using the constraint tree option in PAUP 4.0b4a (Swofford 1998 ). Differences in tree topologies were compared between the unconstrained and the constrained MP trees, with tree length differences being statistically evaluated using the Templeton (1983) test implemented in PAUP 4.0b4a (Swofford 1998 ).
The complete L-strand nucleotide sequences from the mitogenomes of the 38 species (except for a portion of the putative control region and a few tRNA genes for some species; see table 1 ) have been registered in DDBJ/EMBL/GenBank under accession numbers AP02915–AP02952. The genome content of the 38 species included 2 rRNA, 22 tRNA, and 13 protein-coding genes, plus the putative control region, as found in other vertebrates (Lampris guttatus tRNAThr and tRNAPro genes located downstream of the cyt b gene could not be sequenced owing to technical difficulties). As in other vertebrates, most genes were encoded on the H-strand, except for the ND6 and eight tRNA genes, which were encoded on the L-strand.
Although the gene arrangements of most species were identical to those in typical vertebrates, those of five species (Sigmops gracile, Chauliodus sloani, Myctophum affine, Diaphus splendidus, and Caelorinchus kishinouyei) were unique among vertebrate mitogenomes. To date, only two patterns of gene rearrangements have been found in teleosts (S. gracile [=Gonostoma gracile] [Miya and Nishida 1999] ; Conger myriaster and their allies [Inoue et al. 2001c] ). Notwithstanding, the five species listed above exhibited gene arrangements that differed completely from the two examples. These unique mitogenomes will be described elsewhere, with discussions of their phylogenetic utility and the possible mechanisms generating such gene arrangements.
Heuristic MP analysis of the nucleotide sequences from the concatenated 12 protein-coding (no third codon positions) and 22 tRNA (stem regions only) genes yielded a single most-parsimonious tree (fig. 3 ) with a length of 22,932 steps (consistency index = 0.245; retention index = 0.345; rescaled consistency index = 0.087). Many internal branches were supported by moderate (60%–79%) to high (80%–100%) bootstrap values, except for a within-monophyletic group that was taken to represent upper bushes (fig. 3 ), with the observation of many unbisected long branches (fig. 4 ) probably being due to the lower-density taxonomic sampling compared with their taxonomic diversity (fig. 2 ).
ML analysis of the same data set using the HKY85 model of sequence evolution produced a tree similar to that found in the MP analysis (fig. 4 ), with −ln L = 126,245.01. Topological differences were minor, being observed where bootstrap values in the MP tree were around or lower than 50% (fig. 3 ). No statistically significant differences were found between the two topologies (z = −0.756, P = 0.449).
Molecular studies have been expected to prove decisive in resolving persistent controversies over higher-level relationships of teleosts (J. S. Nelson 1984 ; G. Nelson 1989 ), although they have not yet fulfilled their promise (Stepien and Kocher 1997 ; Miya and Nishida 2000b ). Indeed, an earlier study using partial amino acid sequences from the three mitochondrial protein-coding genes (Normark, McCune, and Harrison 1991 ) suggested an unorthodox tree (e.g., nonmonophyletic teleosts) that Patterson, Williams, and Humphries (1993) criticized as “goofy” from the morphologist's point of view. Bernardi et al. (1993) and Rubin and Dores (1995) analyzed amino acid sequences of growth hormones on the basis of different data sets, including 25 and 24 teleosts, respectively, comparing them with those from two outgroups, and reported that the resulting MP trees agreed well with the morphology-based trees. Lê, Lecointre, and Perasso (1993) analyzed nuclear 28S rRNA sequences from 31 gnathostomes (including 18 teleosts) and found a highly supported node within the teleosts (Clupeomorpha + Ostariophysi) that was incongruent with the morphology-based tree. Using entire mitochondrial cyt b gene sequences from 31 fishes (including 30 teleosts), Lydeard and Roe (1997) reported that the resulting MP trees were largely congruent with the morphology-based tree, although some incongruities (e.g., paraphyletic Neoteleostei) were observed. With the exception of the sister relationship of Clupeomorpha + Ostariophysi (Lê, Lecointre, and Perasso 1993 ), however, no novel molecular phylogenetic hypotheses have been considered significant in recent studies dealing with higher-level relationships among major teleostean lineages (Lecointre and Nelson 1996 ; Johnson and Patterson 1996 ). Also, it should be noted that Meyer (1994) and Ortí and Meyer (1997) explicitly demonstrated limits of partial mitochondrial gene sequences for resolving higher-level relationships of teleosts. Recently, Wiley, Johnson, and Dimmick (2000) employed a total-evidence approach in analyzing acanthomorph relationships, combining partial nucleotide sequences of the mitochondrial 12S rRNA (572 bp) and nuclear 28S rRNA (1,112 bp) genes and morphological data (38 characters) from 27 species, thereafter subjecting the combined data to MP analyses. Although they had expected that a combination of mitochondrial and nuclear ribosomal genes would provide a strong database from which they could evaluate acanthomorph relationships, their expectation was only partly met. Their molecular data seemed to be unsuitable for such an analysis of higher-level relationships owing to the high level of saturation observed in both the highly variable mitochondrial 12S rRNA and the extremely conservative nuclear 28S rRNA genes (Wiley, Johnson, and Dimmick 2000 ). Consequently, their molecular data were of little help in evaluating alternative hypotheses unless used in a total-evidence context, as the authors frankly acknowledged.
It appears that adequate resolution of higher-level relationships of any organism will require longer DNA sequences (Miya and Nishida 2000b ). If so, the question remains as to which types of genes or genomes are useful in reconstructing the higher-level relationships of teleosts. It remains unclear whether or not mitochondrial or nuclear genes are generally more efficacious for such a purpose, although the transition from an apparently unsolvable problem to a solvable problem has come about mainly from the availability of complete mtDNA sequences in mammals (Penny et al. 1999 ; Springer et al. 1999 ). Additional complete mtDNA sequences have also been expected to be useful in answering some questions about fish relationships (Stepien and Kocher 1997 ). Inoue et al. (2001d) , in fact, resolved the interrelationships of five major lineages of basal teleosts, for which five alternative phylogenetic hypotheses had been proposed on the basis of both morphological and molecular data, using mitogenomic data. Accordingly, mitogenomic data can be expected to resolve the persistent controversies over higher teleostean relationships, which are apparently more complex and difficult than those for basal teleosts (G. Nelson 1989 ). Below is a brief discussion of phylogenetic issues for the higher teleosts from the standpoint of both global and local taxonomic congruence between the MP tree obtained in this study (fig. 3 ; statistically indistinguishable from the ML tree) and previously proposed hypotheses, most of them based on morphological analyses. Our discussions of phylogenetic issues are restricted to recent major contributions and are not intended to be exhaustive.
Recently, two major hypotheses on higher teleostean relationships have been proposed (Johnson and Patterson 1993 [fig. 1A] ; J. S. Nelson 1994 [fig. 1B] ). Johnson and Patterson (1993) presented a cladogram (their fig. 24) summarizing their “views” on acanthomorph interrelationships on the basis of their own extensive, comparative anatomical survey. Although they provided many putative synapomorphies (a total of 34 characters) substantiating their hypothesis, along with detailed discussions on their validity, they did not construct a character matrix including all of the species that they examined. Alternatively, their “views” were corroborated by MP analysis of an abbreviated character matrix comprising 39 anatomical features taken from 16 acanthomorph species (their fig. 25). Johnson and Patterson (1993 , p. 621) stated that the number of taxa examined by them and ideally included in their sample far exceeded the limits of available parsimony programs at that time. Major topological differences between the present MP tree (fig. 3 ) and that of Johnson and Patterson (1993) were in the placements of zeiforms, stephanoberyciforms, and beryciforms (for details, see below). When topological constraints of Johnson and Patterson's (1993) tree (those of Johnson  and Olney, Johnson, and Baldwin  being followed below Lampridiformes) were enforced, two minimum-length trees that required an additional 413 steps were found, with the differences being highly significant (z = −11.847, P < 0.0001; z = −10.428, P < 0.0001).
J. S. Nelson (1994) subsequently presented different views on higher teleostean relationships in his comprehensive guide to fish classification (now in its third edition), which has been used as a standard reference and has been highly influential in systematic ichthyology. Although his hypothesis was presented in the form of two separate, sequential cladograms (J. S. Nelson 1994 , pp. 194, 215), he failed to provide any relevant evidence or corroborative statements for such relationships. His hypothesis seemed to have been based on the conventionally accepted views on acanthomorph relationships advocated by Rosen (1973) and subsequently reviewed by Lauder and Liem (1983) , with partial modifications being made where he agreed with Johnson and Patterson's (1993) radically different views. When topological constraints of J. S. Nelson's (1994) tree were enforced, one minimum-length tree that required an additional 588 steps was found, with the difference being highly significant (z = −13.977, P < 0.0001).
Basal interrelationships of the four major lineages (fig. 5 ) generally followed previously proposed hypotheses (Rosen 1973 ; Lauder and Liem 1983 ; Fink 1984 ; Johnson 1992 ), with Ostariophysi, Protacanthopterygii, Stomiiformes, and Aulopiformes sequentially diverging in this sequence. Notwithstanding, the monophyly of each taxon, particularly Ostariophysi and Protacanthopterygii, should be substantiated on the basis of more extensive taxonomic sampling, considering their heterogeneous compositions. It should be noted that two comprehensive major clades within the teleosts, Neoteleostei and Eurypterygii, were strongly supported, with bootstrap values of 93% and 96%, respectively.
Although Olney, Johnson, and Baldwin (1993) thought that Ateleopodiformes formed an unresolved trichotomy with Stomiiformes and more advanced neoteleosts (fig. 1 ), the former did not occupy a position close to the stomiiforms in the present tree (fig. 5 ), instead forming a sister group relationship with Lampridiformes (fig. 6 ), with a moderate bootstrap value (76%). This sister group relationship should not be entirely unexpected by systematic ichthyologists, because present ateleopodiforms (=ateleopodids) had previously been included in lampridiforms (J. S. Nelson 1976, 1984 ). Although Olney, Johnson, and Baldwin's (1993)