Electronic Supplementary Material: Methods


(a) Species sampling

Zalmoxidae were collected over multiple collecting trips, mostly by P.P.S., including trips to Indonesia (2006), New Caledonia (2007), Fiji (2008), Palau (2010), the Philippines (2010), and Australia (2011). Material from numerous museum collections of Zalmoxidae spanning the known range of the family was included, with the exception of Mauritius and the Seychelles Islands. Data from a previous systematic study [1] were obtained from GenBank and/or updated with new sequences. Collected specimens were preserved in 96% EtOH and stored at −80 °C. The list of specimens, including voucher numbers, GenBank accession codes, and collection details, is provided in electronic supplementary material, table S1.

(b) Molecular methods

Total DNA was extracted from the legs of animals using the DNeasy Tissue Kit (Qiagen, Valencia, CA, USA). Purified genomic DNA was used as a template for PCR amplification. Molecular markers comprised two nuclear ribosomal genes (18S rRNA and 28S rRNA), two nuclear protein-encoding genes (histones H3 and H4), and two mitochondrial protein-encoding genes (cytochrome c oxidase subunit I and cytochrome b). Primer sequences and fragment lengths are as in a previous study [1].

Polymerase chain reactions (PCR), visualization by agarose gel electrophoresis, and direct sequencing were conducted as described in a previous study [2]. Chromatograms obtained from the automatic sequencer were read and sequences assembled using the sequence editing software Sequencher (Gene Codes Corporation, Ann Arbor, MI, USA). Sequence data were edited in Se-Al ver. 2.0a11 [3].

(c) Phylogenetic analyses

Maximum likelihood (ML) and Bayesian inference (BI) analyses were conducted on static alignments, which were inferred as follows. Sequences of ribosomal genes were aligned using MUSCLE ver. 3.6 [4] with default parameters, and the alignments were subsequently treated with GBlocks ver. 0.91b [5] to cull positions of ambiguous homology. Sequences of protein-encoding genes were also aligned using MUSCLE ver. 3.6 with default parameters, but these alignments were checked against their protein translations before treatment with GBlocks ver. 0.91b. The size of the data matrix for each gene before and after treatment with GBlocks ver. 0.91b is provided in electronic supplementary material, table S2.
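
The culling of ambiguously aligned positions can be illustrated with a minimal Python sketch. It is a simplified stand-in, not GBlocks itself: GBlocks applies block-aware criteria (minimum block length, flanking positions, runs of non-conserved positions), whereas the hypothetical `filter_columns` helper below only drops columns that are mostly gaps or lack a clear majority residue. Both thresholds are illustrative assumptions.

```python
from collections import Counter


def filter_columns(alignment, max_gap_fraction=0.5, min_majority_fraction=0.5):
    """Keep alignment columns that are not gap-dominated and have a clear majority residue.

    Simplified illustration only; GBlocks uses different, block-aware criteria.
    """
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    kept = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        if column.count("-") / n_seqs > max_gap_fraction:
            continue  # mostly gaps: positional homology is doubtful
        residues = [c for c in column if c != "-"]
        majority = Counter(residues).most_common(1)[0][1]
        if majority / n_seqs < min_majority_fraction:
            continue  # too variable to trust positional homology
        kept.append(i)
    return ["".join(seq[i] for i in kept) for seq in alignment]


aln = ["AC-GT", "ACTGT", "AG--T", "ATCGA"]
print(filter_columns(aln))  # ['ACGT', 'ACGT', 'AG-T', 'ATGA']
```

Here the third column is dropped: it is half gaps and its two residues disagree, so no majority residue reaches the threshold.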

ML analysis was conducted using RAxML ver. 7.2.7 [6] on 40 CPUs of a cluster at Harvard University, FAS Research Computing (odyssey.fas.harvard.edu). For the ML searches, a separate GTR model of sequence evolution with a correction for a discrete gamma distribution of rates (GTR + Γ) was specified for each data partition, and 500 independent searches were conducted. Nodal support was estimated via the rapid bootstrap algorithm (1000 replicates) using the GTR-CAT model [7]. Bootstrap resampling frequencies were then mapped onto the optimal tree from the independent searches.
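
Mapping bootstrap frequencies onto the optimal tree amounts to counting how often each clade of that tree recurs among the replicate trees. The sketch below assumes rooted trees written as nested Python tuples, all rooted on the same outgroup; RAxML itself works with unrooted bipartitions, and `clades` and `map_support` are hypothetical helpers.

```python
from collections import Counter


def clades(tree):
    """Return the non-trivial clades of a nested-tuple tree as frozensets of tip names."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a tip label
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    all_tips = walk(tree)
    found.discard(all_tips)  # the root clade is in every tree, so it carries no information
    return found


def map_support(best_tree, replicate_trees):
    """Percentage of replicate trees containing each clade of the best tree."""
    counts = Counter()
    for rep in replicate_trees:
        counts.update(clades(rep))
    n = len(replicate_trees)
    return {clade: 100.0 * counts[clade] / n for clade in clades(best_tree)}


best = ((("A", "B"), "C"), ("D", "E"))
replicates = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
    ((("A", "B"), "D"), ("C", "E")),
    (("A", ("B", "C")), ("D", "E")),
]
for clade, support in sorted(map_support(best, replicates).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(clade), support)
```

With these four toy replicates, the clade (A, B) is recovered in two of four replicates (50%), while (A, B, C) and (D, E) are each recovered in three (75%).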

BI analysis was performed using MrBayes ver. 3.1.2 [8], with a separate model of sequence evolution, including corrections for a discrete gamma distribution and a proportion of invariant sites, specified for each partition as selected in Modeltest ver. 3.7 [9,10] under the Akaike Information Criterion [11]. The model implemented for each dataset is indicated in electronic supplementary material, table S3. Default priors were used, starting from random trees, and four runs, each with three heated chains and one cold Markov chain, were performed until the average standard deviation of split frequencies fell below 0.01 (4 × 10⁷ generations). After burn-in samples were discarded, the sampled trees were combined into a single majority-rule consensus topology, and the percentage of sampled trees containing each node was taken as its posterior probability.
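
The convergence criterion, the average standard deviation of split frequencies, can be sketched as follows. Each sampled tree is assumed to be already reduced to its set of splits. Splits rarer than `min_freq` in every run are ignored, similar in spirit to the minimum partition frequency setting in MrBayes. The use of the population standard deviation is an assumption, and MrBayes's exact bookkeeping may differ.

```python
from collections import Counter
from statistics import pstdev


def asdsf(runs, min_freq=0.1):
    """Average standard deviation of split frequencies across independent runs.

    `runs` is a list of runs; each run is a list of post-burn-in sampled trees,
    and each tree is represented by its set of splits (frozensets of tip names).
    """
    freqs = []
    for run in runs:
        counts = Counter(split for tree in run for split in tree)
        freqs.append({split: c / len(run) for split, c in counts.items()})

    sds = []
    for split in set().union(*freqs):
        per_run = [f.get(split, 0.0) for f in freqs]  # absent from a run means frequency 0
        if max(per_run) >= min_freq:  # ignore splits that are rare in every run
            sds.append(pstdev(per_run))
    return sum(sds) / len(sds) if sds else 0.0


AB, AC, ABC = frozenset("AB"), frozenset("AC"), frozenset("ABC")
run1 = [{AB, ABC}, {AB, ABC}]
run2 = [{AB, ABC}, {AC, ABC}]
print(asdsf([run1, run1]))            # identical runs: 0.0
print(round(asdsf([run1, run2]), 4))  # 0.1667
```

Runs that sample the same splits at the same frequencies give a value of 0, and a threshold such as 0.01 is reached only when the independent runs agree closely.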

Parsimony analyses were based on a direct optimization (DO) approach [12] using the program POY ver. 4.1.2 [13]. Tree searches were performed with the timed search function in POY on 40 CPUs of a cluster at Harvard University, FAS Research Computing (odyssey.fas.harvard.edu). Each timed search consists of multiple cycles of (a) building Wagner trees, (b) subtree pruning and regrafting (SPR), (c) tree bisection and reconnection (TBR), (d) ratcheting [14], and (e) tree fusing [15,16]. Timed searches of 24 hours were run for each gene individually and for all genes combined under a mixed parameter set. Ribosomal genes were analyzed under parameter set 3221 (indel opening cost = 3; indel extension cost = 1; transversions = transitions = 2), and protein-encoding genes were analyzed under parameter set 121 (indel cost = 2; transversion cost = 2; transition cost = 1). The design of this mixed parameter set follows a previous exploration of Opiliones datasets [17].
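
Direct optimization repeatedly aligns sequences pairwise while building median sequences at internal nodes. The sketch below illustrates how a parameter set enters that cost by computing the minimum pairwise alignment cost under parameter set 121 as described (indel = 2, transversion = 2, transition = 1), with a linear indel cost. It does not reproduce POY's tree-wide optimization. The gap-opening cost of parameter set 3221 would require an affine (Gotoh-style) recursion, which is omitted here.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def substitution_cost(a, b, transition=1, transversion=2):
    """Cost of aligning base a against base b (0 for a match)."""
    if a == b:
        return 0
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_class else transversion


def alignment_cost(x, y, indel=2, transition=1, transversion=2):
    """Minimum cost of aligning x and y with a linear per-position indel cost (Needleman-Wunsch)."""
    prev = [j * indel for j in range(len(y) + 1)]  # cost of aligning "" against y[:j]
    for i in range(1, len(x) + 1):
        curr = [i * indel] + [0] * len(y)
        for j in range(1, len(y) + 1):
            curr[j] = min(
                prev[j - 1] + substitution_cost(x[i - 1], y[j - 1], transition, transversion),
                prev[j] + indel,      # x[i-1] aligned to a gap
                curr[j - 1] + indel,  # y[j-1] aligned to a gap
            )
        prev = curr
    return prev[-1]


print(alignment_cost("ACGT", "AGT"))  # one indel: 2
print(alignment_cost("AC", "GT"))     # two transitions: 1 + 1 = 2
```

Raising the indel cost relative to substitutions makes gaps more expensive, which is the main lever a parameter set gives in a sensitivity analysis.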

For the combined analysis of molecular data under the mixed parameter set, two iterative rounds of tree fusing were conducted, taking all input trees from the timed search. Thereafter, the input trees from the timed search and the optimal trees from tree fusing were subjected to a 6-hour timed search as before. After a third round of tree fusing, all previous input trees, the optimal trees from tree fusing, and the optimal trees from the short timed search were subjected to another 24-hour timed search. Finally, the trees from each previous step were subjected to 20 rounds of tree fusing under the mixed parameter set to check for heuristic stability [18]. Nodal support for the optimal parameter set was estimated via jackknifing (250 replicates) with a deletion probability of e⁻¹ (≈ 0.368) [19].
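
The jackknife resampling rule can be sketched in a few lines. Each character is deleted independently with probability e⁻¹, so each pseudoreplicate retains on average about 1 − e⁻¹ ≈ 63% of the characters. The helpers below are hypothetical and operate on a static list of character indices. How POY resamples dynamic-homology characters internally is not modeled here.

```python
import math
import random

DELETION_PROBABILITY = math.exp(-1)  # e^-1 ≈ 0.368


def jackknife_replicate(n_characters, rng):
    """Indices of characters kept in one pseudoreplicate; each is deleted independently."""
    return [i for i in range(n_characters) if rng.random() >= DELETION_PROBABILITY]


def jackknife_replicates(n_characters, n_replicates=250, seed=1):
    """Generate pseudoreplicates with a seeded RNG so the resampling is reproducible."""
    rng = random.Random(seed)
    return [jackknife_replicate(n_characters, rng) for _ in range(n_replicates)]


reps = jackknife_replicates(1000)
mean_kept = sum(len(r) for r in reps) / (len(reps) * 1000)
print(len(reps), round(mean_kept, 2))  # 250 replicates, roughly 0.63 of characters kept
```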

(d) Estimation of divergence times

Ages of clades were inferred using BEAST ver. 1.6.1 [20,21]. We specified a separate GTR model of sequence evolution with corrections for a discrete gamma distribution and a proportion of invariant sites (GTR + Γ + I) for each partition (as with BI analysis).

Divergence time calibration drew upon a previous study of the suborder Laniatores [22], in which molecular dating was conducted with the same methodology and constrained using fossil taxa. In the present study, we used the 95% highest posterior density (HPD) intervals from that study to constrain three nodes: the superfamily Zalmoxoidea, the superfamily Samooidea, and the split between the two superfamilies (the root). Normal distribution priors were used to characterize these three calibrations because the posterior distributions of these nodes' age estimates in the previous study [22] were Gaussian.
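
Turning a 95% HPD interval into a normal prior is a short calculation once the posterior is taken to be Gaussian. For a symmetric distribution, the HPD interval coincides with the central interval. The mean is therefore its midpoint, and the standard deviation is its half-width divided by z₀.₉₇₅ ≈ 1.96. The interval in the usage example is hypothetical and is not one of the study's calibrations.

```python
from statistics import NormalDist


def normal_prior_from_interval(lower, upper, mass=0.95):
    """Mean and SD of a normal distribution whose central `mass` interval is (lower, upper)."""
    z = NormalDist().inv_cdf(0.5 + mass / 2)  # about 1.96 for mass = 0.95
    mean = (lower + upper) / 2
    sd = (upper - lower) / (2 * z)
    return mean, sd


# Hypothetical 95% HPD interval of 100-140 Ma, not a value from the study.
mean, sd = normal_prior_from_interval(100.0, 140.0)
print(mean, round(sd, 2))  # 120.0 10.2
```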

An uncorrelated lognormal relaxed clock model was specified for each partition, and a Yule speciation process was assumed for the tree prior. We selected the uncorrelated lognormal model because its accuracy is comparable to that of an uncorrelated exponential model, but it yields narrower 95% highest posterior density intervals. Additionally, the variance of the uncorrelated lognormal model can better accommodate data that are already clock-like [20]. Priors were sequentially optimized in a series of iterative test runs; the command files are available upon request from the authors. Four Markov chains were run for 10⁸ generations, sampling every 10⁴ generations. Convergence was assessed using Tracer ver. 1.5 [23].
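
Under an uncorrelated lognormal relaxed clock, each branch draws its rate independently from a single lognormal distribution. The sketch below fixes the real-space mean rate and the log-scale standard deviation for illustration. Both values are hypothetical, whereas BEAST estimates these parameters. The sketch shows the log-scale mean needed so that the expected rate equals the chosen mean.

```python
import math
import random


def ucld_branch_rates(n_branches, mean_rate=1.0, sd_log=0.5, seed=0):
    """Draw one independent rate per branch from a lognormal with real-space mean `mean_rate`."""
    rng = random.Random(seed)
    # For LogNormal(mu, sigma), E[X] = exp(mu + sigma^2 / 2), so solve for mu.
    mu = math.log(mean_rate) - sd_log ** 2 / 2
    return [rng.lognormvariate(mu, sd_log) for _ in range(n_branches)]


rates = ucld_branch_rates(8, mean_rate=1.0, sd_log=0.5, seed=42)
print([round(r, 2) for r in rates])  # rates vary independently from branch to branch
```

Because rates are not inherited from parent branches, adjacent branches can differ sharply. This distinguishes an uncorrelated clock from autocorrelated alternatives.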

However, the use of “secondary” calibrations (i.e., the transitive use of divergence time estimates across studies), particularly as errorless point calibrations, has been criticized for engendering spurious estimates [24,25]. Additionally, normal distribution priors can be inappropriate for dating analyses, particularly if the position of a calibrating fossil along a branch is unknown [26]. To test the appropriateness of the secondary calibrations taken from the previous study, we constructed a 228-taxon dataset. This dataset combined all 147 focal taxa with those used in our previous study [22], in which all known families of Laniatores, as well as representatives of the other three suborders, are sampled. We estimated divergence times in BEAST with the same number of Markov chains and generations as for the 147-taxon dataset. After alignment and treatment with GBlocks ver. 0.91b (as above), the resulting 228-taxon matrix was smaller than the original dataset (6172 versus 6563 nucleotide positions), owing to sequence variability across Opiliones. Divergence times were calibrated with fossil taxa, as follows. We constrained the age of Eupnoi to 410 Ma using the crown-group Devonian harvestman Eophalangium sheari [27]. A normal distribution prior with a standard deviation of 5 Myr was applied to this node to account for uncertainty in the age of the fossil. Dyspnoi were constrained using the Carboniferous fossils Eotrogulus fayoli and Nemastomoides elaveris [27]. Because the most recent common ancestor of Dyspnoi could be older than these fossils (each represents an extant superfamily), we used a lognormal distribution prior for the root of Dyspnoi, permitting its age to predate the Carboniferous taxa (mean in real space of 300 Ma, offset of 2). We then compared the estimates from this dataset with those from the 147-taxon dataset.
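
An offset lognormal prior places a hard lower bound on a node's age, while its right tail allows older ages. This is why it suits a fossil-based minimum age. The sketch below computes prior quantiles for age = offset + X, where X is lognormal with a given real-space mean. The values in the usage example are hypothetical and are not the study's Dyspnoi settings. The parameterization is an assumption and may differ in detail from BEAST's.

```python
import math
from statistics import NormalDist


def offset_lognormal_quantiles(offset, mean_excess, sd_log, probs=(0.025, 0.5, 0.975)):
    """Quantiles of age = offset + X, with X lognormal and E[X] = mean_excess (real space)."""
    mu = math.log(mean_excess) - sd_log ** 2 / 2
    z = NormalDist()  # standard normal, maps probabilities to log-scale quantiles
    return [offset + math.exp(mu + sd_log * z.inv_cdf(p)) for p in probs]


# Hypothetical settings: a 100 Ma fossil minimum and a 20 Myr mean excess age.
low, median, high = offset_lognormal_quantiles(100.0, 20.0, 0.5)
print(round(low, 1), round(median, 1), round(high, 1))
```

Every quantile lies above the offset. The upper quantile, however, can extend well beyond it, allowing the clade to be considerably older than its oldest fossil.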

Tree files of estimated ages and 95% highest posterior density (HPD) intervals for both the 147-taxon and the 228-taxon datasets have been deposited in TreeBASE.
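
The comparison between the two datasets relies on 95% HPD intervals. For a set of MCMC samples of a node age, an empirical HPD interval is the shortest interval containing 95% of the samples. The hypothetical `hpd_interval` helper below sketches that calculation; tools such as Tracer report HPD intervals directly.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing a `mass` fraction of the samples (empirical HPD interval)."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Right-skewed toy sample: the HPD interval excludes the long tail.
ages = [10, 11, 11, 12, 12, 12, 13, 13, 14, 40]
print(hpd_interval(ages, mass=0.9))  # (10, 14)
```

Unlike an equal-tailed interval, the HPD interval shifts toward the bulk of a skewed posterior. This matters when judging whether two datasets give overlapping age estimates.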

(e) Ancestral range reconstruction

To maintain comparability among the BEAST, ML, BI, and DO topologies, we used divergence time estimates from the 147-taxon dataset for biogeographic analyses. Likelihood analysis of range evolution was conducted on the dated phylogeny using the dispersal-extinction-cladogenesis (DEC) model as implemented in the program Lagrange [28,29]. The ranges of terminals were coded using 14 areas. We implemented three models: (a) an unconstrained model; (b) a stepping-stone model, in which spatial (but not temporal) information was incorporated; and (c) a stratified model, in which seven spans of geological time were delimited and the relationships among the areas were recorded for each span. The geological events used to delimit the time spans follow Hall [30] and Sanmartín & Ronquist [31]. The maximum number of areas in ancestral ranges was held at two, reflecting the empirical observation that most Zalmoxidae species are narrowly distributed endemics. Dispersal constraints were set to 1.0 (if landmasses were connected), 0.1 (if landmasses were disjunct), or 0 (if landmasses did not exist). The list of areas and the dispersal constraint matrices of the stratified model are provided in electronic supplementary material, table S4 (Python scripts specifying the dispersal constraint matrices are available upon request from the authors).
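
The range space and the dispersal multipliers described above can be sketched in Python. This is an illustration only; the authors' own scripts are not reproduced here. With 14 areas and at most two areas per range, there are 14 + C(14, 2) = 14 + 91 = 105 non-empty ranges. The helper names, area names, and connectivity in the usage example are hypothetical.

```python
from itertools import combinations

CONNECTED, DISJUNCT, ABSENT = 1.0, 0.1, 0.0  # dispersal multipliers stated in the text


def enumerate_ranges(areas, max_areas=2):
    """All non-empty ranges with at most `max_areas` areas."""
    return [
        frozenset(combo)
        for size in range(1, max_areas + 1)
        for combo in combinations(areas, size)
    ]


def dispersal_matrix(areas, existing, connected_pairs):
    """Dispersal multipliers between every ordered pair of areas for one time slice."""
    matrix = {}
    for a in areas:
        for b in areas:
            if a not in existing or b not in existing:
                value = ABSENT  # a landmass that does not exist in this slice
            elif a == b or frozenset((a, b)) in connected_pairs:
                value = CONNECTED
            else:
                value = DISJUNCT
            matrix[(a, b)] = value
    return matrix


# Hypothetical four-area time slice (not the study's areas or connectivity).
areas = ["W", "X", "Y", "Z"]
slice_matrix = dispersal_matrix(
    areas,
    existing={"W", "X", "Y"},  # Z has not yet emerged in this slice
    connected_pairs={frozenset(("W", "X"))},
)
print(len(enumerate_ranges([f"area{i}" for i in range(14)])))  # 105
print(slice_matrix[("W", "X")], slice_matrix[("W", "Y")], slice_matrix[("W", "Z")])  # 1.0 0.1 0.0
```

In a stratified model, one such matrix would be built for each of the seven time spans.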

Bayesian analysis of range evolution was conducted using the program RASP [32]. Because this analysis does not account for time and therefore does not require an ultrametric tree, we analyzed all four topologies in RASP. The ranges of terminals were coded in the same manner as for the DEC model. Two runs of 10⁶ generations were performed, sampling every 10³ generations, by which point the average standard deviation of split frequencies had fallen below 0.001. After burn-in samples were discarded, the frequencies of the ancestral areas reconstructed at all nodes were combined across the two runs.
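
With 10⁶ generations sampled every 10³ generations, each RASP run yields about 1,000 samples. Pooling the post-burn-in samples of a node across runs can be sketched as below. The burn-in fraction is an assumption because the text does not state it, and the range labels are hypothetical.

```python
from collections import Counter


def pooled_area_frequencies(runs, burnin_fraction=0.1):
    """Pool post-burn-in samples of one node's ancestral range across runs."""
    pooled = Counter()
    total = 0
    for samples in runs:
        kept = samples[int(len(samples) * burnin_fraction):]  # drop the burn-in
        pooled.update(kept)
        total += len(kept)
    return {area: count / total for area, count in pooled.items()}


# Hypothetical samples for a single node from two runs.
run_a = ["W"] * 2 + ["WX"] * 8
run_b = ["X"] * 2 + ["WX"] * 6 + ["W"] * 2
print(pooled_area_frequencies([run_a, run_b], burnin_fraction=0.2))  # {'WX': 0.875, 'W': 0.125}
```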

(f) Analysis of diversification rate

Temporal shifts in diversification rate were examined with the R package LASER [33]. The dated phylogeny of Zalmoxidae was extracted from the BEAST topology and pruned so that each species represented by multiple terminals was represented by a single specimen. Multiple diversification models were fitted to this dated phylogeny, and their fit was compared using the Akaike Information Criterion [11]. Diversification parameters were computed under the best-fitting of two constant-rate and six variable-rate models.
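
Model choice here reduces to computing AIC = 2k − 2 ln L for each fitted model. One summary used with LASER, ΔAIC_RC, is the best constant-rate AIC minus the best variable-rate AIC. Positive values indicate support for rate variation through time. The sketch below shows only that arithmetic. The model names, log-likelihoods, and parameter counts in the usage example are hypothetical, and LASER computes these quantities itself.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model(models):
    """Name of the model with the lowest AIC; `models` maps name -> (ln L, k)."""
    return min(models, key=lambda name: aic(*models[name]))


def delta_aic_rc(constant_rate, variable_rate):
    """Best constant-rate AIC minus best variable-rate AIC (positive favours rate shifts)."""
    best_rc = min(aic(*fit) for fit in constant_rate.values())
    best_rv = min(aic(*fit) for fit in variable_rate.values())
    return best_rc - best_rv


# Hypothetical fits (log-likelihood, number of parameters), not results from the study.
constant = {"pure_birth": (-100.0, 1), "birth_death": (-99.5, 2)}
variable = {"two_rate": (-95.0, 3)}
print(best_model({**constant, **variable}), delta_aic_rc(constant, variable))  # two_rate 6.0
```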


  1. Sharma, P. P. 2012 New Australasian Zalmoxidae (Opiliones: Laniatores) and a new case of male polymorphism in Opiliones. Zootaxa.

  2. Sharma, P. & Giribet, G. 2009 Sandokanid phylogeny based on eight molecular markers—The evolution of a southeast Asian endemic family of Laniatores (Arachnida, Opiliones). Mol. Phylogenet. Evol. 52, 432-447. (doi:10.1016/j.ympev.2009.03.013)

  3. Rambaut, A. E. 1996 Se-Al sequence alignment editor. University of Oxford, UK. Program and documentation available from:

  4. Edgar, R. C. 2004 MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792-1797. (doi:10.1093/nar/gkh340)

  5. Castresana, J. 2000 Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 17, 540-552.

  6. Stamatakis, A. 2006 RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688-2690. (doi:10.1093/bioinformatics/btl446)

  7. Stamatakis, A., Hoover, P. & Rougemont, J. 2008 A rapid bootstrap algorithm for the RAxML Web servers. Syst. Biol. 57, 758-771. (doi:10.1080/10635150802429642)

  8. Ronquist, F. & Huelsenbeck, J. P. 2005 Bayesian analyses of molecular evolution using MrBayes. In Statistical Methods in Molecular Evolution. (Ed. Nielsen, R.) New York, NY: Springer.

  9. Posada, D. 2005 Modeltest 3.7. Program and documentation available from:

  10. Posada, D. & Crandall, K. A. 1998 Modeltest: testing the model of DNA substitution. Bioinformatics 14, 817-818. (doi:10.1093/bioinformatics/14.9.817)

  11. Posada, D. & Buckley, T. 2004 Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Syst. Biol. 53, 793-808. (doi:10.1080/10635150490522304)

  12. Wheeler, W. C. 1996 Optimization alignment: the end of multiple sequence alignment in phylogenetics? Cladistics 12, 1-9. (doi:10.1111/j.1096-0031.1996.tb00189.x)

  13. Varón, A., Vinh, L. S. & Wheeler, W. C. 2010 POY version 4: phylogenetic analysis using dynamic homologies. Cladistics 26, 72-85. (doi:10.1111/j.1096-0031.2009.00282.x)

  14. Nixon, K. C. 1999 The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15, 407-414. (doi:10.1006/clad.1999.0121)

  15. Goloboff, P. A. 1999 Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15, 415-428. (doi:10.1111/j.1096-0031.1999.tb00278.x)

  16. Goloboff, P. A. 2002 Techniques for analyzing large data sets. In Techniques in Molecular Systematics and Evolution. (Eds DeSalle, R., Giribet, G. & Wheeler, W. C.) pp. 70-79. Basel, Switzerland: Birkhäuser Verlag.

  17. Sharma, P. P., Vahtera, V., Kawauchi, G. Y. & Giribet, G. 2011 Running WILD: The case for exploring mixed parameter sets in sensitivity analysis. Cladistics 27, 538-549. (doi:10.1111/j.1096-0031.2010.00345.x)

  18. Giribet, G. 2007 Efficient tree searches with available algorithms. Evol. Bioinform. 3, 341-356.

  19. Farris, J. S., Albert, V. A., Källersjö, M., Lipscomb, D. & Kluge, A. G. 1996 Parsimony jackknifing outperforms neighbor-joining. Cladistics 12, 99-124. (doi:10.1111/j.1096-0031.1996.tb00196.x)

  20. Drummond, A. J., Ho, S. Y. W., Phillips, M. J. & Rambaut, A. 2006 Relaxed phylogenetics and dating with confidence. PLoS Biology 4, e88. (doi:10.1371/journal.pbio.0040088)

  21. Drummond, A. J. & Rambaut, A. 2007 BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214. (doi:10.1186/1471-2148-7-214)

  22. Sharma, P. P. & Giribet, G. 2011 The evolutionary and biogeographic history of the armoured harvestmen—Laniatores phylogeny based on ten molecular markers, with the description of two new families of Opiliones (Arachnida). Invertebr. Syst. 25, 106-142. (doi:10.1071/IS11002)

  23. Rambaut, A. & Drummond, A. J. 2009 Tracer v. 1.5. Program and documentation available from:

  24. Shaul, S. & Graur, D. 2002 Playing chicken (Gallus gallus): methodological inconsistencies of molecular divergence date estimates due to secondary calibration points. Gene 300, 59-61. (doi:10.1016/S0378-1119(02)00851-X)

  25. Graur, D. & Martin, W. 2004 Reading the entrails of chickens: molecular timescales of evolution and the illusion of precision. Trends Genet. 20, 80-86. (doi:10.1016/j.tig.2003.12.003)

  26. Ho, S. Y. W. & Phillips, M. J. 2009 Accounting for calibration uncertainty in phylogenetic estimation of evolutionary divergence times. Syst. Biol. 58, 367-380. (doi:10.1093/sysbio/syp035)

  27. Dunlop, J. A. 2007 Paleontology. In Harvestmen: The Biology of Opiliones. (Eds R. Pinto-da-Rocha, G. Machado & G. Giribet) pp. 247-265. Cambridge, MA: Harvard University Press.

  28. Ree, R. H., Moore, B. R., Webb, C. O. & Donoghue, M. J. 2005 A likelihood framework for inferring the evolution of geographic range on phylogenetic trees. Evolution 59, 2299-2311. (doi:10.1554/05-172.1)

  29. Ree, R. H. & Smith, S. A. 2008 Maximum likelihood inference of geographic range evolution by dispersal, local extinction, and cladogenesis. Syst. Biol. 57, 4-14. (doi:10.1080/10635150701883881)

  30. Hall, R. 2002 Cenozoic geological and plate tectonic evolution of SE Asia and the SW Pacific: computer-based reconstructions and animations. J. Asian Earth Sci. 20, 353-434. (doi:10.1016/S1367-9120(01)00069-4)

  31. Sanmartín, I. & Ronquist, F. 2004 Southern Hemisphere biogeography inferred by event-based models: plant versus animal patterns. Syst. Biol. 53, 216-243. (doi:10.1080/10635150490423430)

  32. Yu, Y., Harris, A. J. & He, X.-J. 2011 RASP (Reconstruct Ancestral State in Phylogenies) 2.0 beta. Program and documentation available from:

  33. Rabosky, D. L. 2006 LASER: A maximum likelihood toolkit for detecting temporal shifts in diversification rates from molecular phylogenies. Evol. Bioinform. 2, 247-250.
