



Electronic Supplementary Material

Appendix 1. Additional material and methods

Taxon sampling. The monophyly of termites and of mantids is well established; only the Blattodea might be non-monophyletic, because the relationship between the cockroaches and the termites has not been resolved. We therefore needed a more thorough taxon sampling of cockroaches and termites than of mantids. We sampled a representative subset of Mantodea (five of the 15 families, ranging from the most basal taxon to some of the most derived taxa), almost all subfamilies of cockroaches, and all subfamilies of termites.

Sequencing. Five genes were chosen for their complementary resolving power: two mitochondrial genes (ribosomal 12S and protein-coding COII), two nuclear ribosomal genes (28S and 18S), and one nuclear protein-coding gene (histone 3). Together these provide some 4,900 base-pair characters. Total DNA was extracted from ethanol-preserved specimens using the DNeasy tissue kit by Qiagen. In some cases the presence of certain mineral or organic matter in the gut was found to inhibit PCR amplification, particularly in soil-feeding termites, so the abdomens and guts were subsequently removed from specimens before extraction. The small size of many termite workers, and their lack of either large mandibular or flight muscles, meant that DNA yields were often low. Soldiers or alates were therefore used preferentially for DNA extraction; where neither soldiers nor alates were available, multiple workers from a single colony were used to increase the DNA yield. These problems did not arise in either cockroaches or mantids, where body sizes were generally much larger.

A range of universal and termite-specific primers was used in PCR amplification (see Supplementary Information). PCR samples were cleaned by PEG precipitation and cycle sequenced using the Applied Biosystems BigDye Terminator v3.1 Cycle Sequencing Kit. Following purification by precipitation, automated sequencing was carried out on an Applied Biosystems 3730 DNA Analyser. GenBank accession numbers are listed in the Supplementary Information section.

The combined character matrix had some missing gene sequences, predominantly due to the difficulty of sequencing histone 3 for the termites. However, we generally included all available taxa in the final analysis, as recent simulations (Wiens 2006) have shown that trees can be reconstructed accurately with far higher proportions of missing data than were found in this dataset.

Alignment and tree building. For each gene we explored a range of gap opening costs (2-60) in the program Clustal X (Thompson et al. 1997). Gap extension costs were held equal to the opening costs, and all costs were likewise held equal between the pairwise and multiple alignment steps (specific parameters are in the Supplementary Information section). Choosing one Clustal alignment over another can be highly subjective and problematic, since each makes a statement about homologies and may lead to a different tree topology. Rather than judging the effectiveness of differing gap costs by subjective assessment of the topologies produced, we attempted to quantify objectively the level of homology established prior to phylogenetic analysis. For each of the five genes, approximately 10 clearly defined, conserved and homologous ‘landmark regions’ were selected. These regions, ranging from 7 to 123 base pairs long and distributed throughout the length of each gene locus, were then used to score the effectiveness of the alignment process in that part of the gene. The landmark regions were identified in each sequence in an alignment, and all sequences were scored according to the presence and degree of misalignment evident within each landmark region and in the adjacent areas of sequence (see Supplementary Information). The set of alignment parameters producing the fewest misaligned regions was judged to give the preferred alignment, which was used for the Bayesian phylogenetic analysis. We tested the effect on key clade recovery of using the suboptimal alignments by running MP analyses on combinations of suboptimal and optimal alignments as follows: all suboptimal; all suboptimal except 18S; all suboptimal except 18S and COII; all optimal except 12S; all optimal.
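
The selection rule described above can be illustrated with a minimal Python sketch. It uses the "Total" cells of the 12S table in Appendix 3 and simply picks the gap cost with the fewest affected sequences. The function names are hypothetical, and two simplifications are assumptions rather than the authors' procedure: a bare "A" is read as "no affected sequences", and all other misalignment categories (B, C, D) are weighted equally.

```python
import re

# "Total" cells copied from the 12S table in Appendix 3 (gap cost -> score).
TOTALS_12S = {
    2: "B12, C4, D2",
    5: "B4",
    10: "A",
    15: "B1, D5",
    20: "C1, D31",
    30: "B1, D60",
}

def affected_sequences(cell: str) -> int:
    """Sum the sequence counts in a score cell such as 'B12, C4, D2'.

    Assumption: a bare 'A' means no affected sequences, and every other
    entry is a category letter followed by a count of sequences.
    """
    return sum(int(n) for n in re.findall(r"[B-Z](\d+)", cell))

def preferred_gap_cost(totals: dict[int, str]) -> int:
    """Return the gap cost whose alignment has the fewest affected sequences."""
    return min(totals, key=lambda cost: affected_sequences(totals[cost]))

print(preferred_gap_cost(TOTALS_12S))
```

Under these assumptions the sketch selects a gap cost of 10 for 12S, which agrees with the alignment marked as preferred in Appendix 3. A weighting that penalises some categories more heavily than others could change the ranking.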

We used MrModeltest (Nylander 2004) to estimate separate ML parameters for each aligned set of gene sequences. All five genes were estimated to have nst = 6 (general time reversible model) and rates = invgamma (gamma-distributed rate variation with a proportion of invariant sites). All but histone 3 had statefreqpr = dirichlet, while histone 3 had statefreqpr = fixed(equal). In the most recent version of MrBayes these parameters can be estimated during the Bayesian analysis proper, but we found that, with a dataset of this size, the computing time needed for this all-in-one analysis was prohibitive.

We ran three parallel analyses of the data using MrBayes 3.1 (Ronquist et al. 2005), with each run having 1,000,000 generations (for exact settings see Supplementary Information). Within each run we had four chains and sampled trees every 200 generations. Inspection of the log-likelihood scores across the 1,000,000 generations indicated that the scores levelled off at around generation 400,000, so we conservatively chose to examine only the trees from the last 500,000 generations. For these we used PAUP* (Swofford 1999) to construct 50% majority rule consensus trees for each run. The software program Mesquite was used to visualise the trees and to examine all nodes with <50% support, regardless of whether they were in our majority rule consensus tree.
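
As a rough check of what this burn-in implies, the following Python sketch counts how many sampled trees per run fall within the last 500,000 generations. The function name is hypothetical, and the handling of the boundary (a sample at generation 0 and exclusion of the tree sampled exactly at generation 500,000) is an assumption, so the true count may differ by one.

```python
def retained_samples(total_generations: int, sample_every: int, keep_last: int) -> int:
    """Count sampled trees whose generation lies in the last `keep_last` generations.

    Assumption: samples are taken at generations 0, sample_every, 2*sample_every, ...
    and the tree sampled exactly at the burn-in boundary is discarded.
    """
    first_kept = total_generations - keep_last
    sampled = range(0, total_generations + 1, sample_every)
    return sum(1 for generation in sampled if generation > first_kept)

kept = retained_samples(1_000_000, 200, 500_000)
# A clade enters a 50% majority rule consensus if it occurs in more than half of these trees.
majority_threshold = kept / 2
print(kept, majority_threshold)
```

With the settings stated above, each run would contribute about 2,500 post-burn-in trees to its consensus.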

Across the three runs there was no important difference in the overall topology of the majority rule trees: all the differences were nested within the Blaberoidae (as defined in Fig 1) and did not affect the position of termites within the cockroaches. We have used the trees from Run 1 in the figures (as this run had the highest maximum likelihood); figures derived from the other runs were essentially identical.

We used the posterior probabilities at each recovered node to test the strength of alternative phylogenetic hypotheses. We also used log-likelihood tests (Shimodaira-Hasegawa tests), as implemented in PAUP* (Swofford 1999), as an additional test of the degree of support for alternative hypotheses. For the latter analysis we constrained the tree a priori so that the node of interest was present in the tree that we tested against our obtained Bayesian tree.

We also analysed the data using maximum parsimony (MP). Phylogenetic searches were conducted using PAUP* version 4.0b10 (Swofford 1999). In each case, analysis was carried out as follows: heuristic search, 10,000 random addition replicates, tree bisection and reconnection (TBR) branch-swapping, all character positions weighted equally, gaps treated as missing data, saving only 1 tree at each step. The shortest tree saved was then used as the starting tree for TBR branch swapping in a subsequent heuristic search, saving multiple trees (maxtrees set to vary as necessary). Consensus trees were then produced. This methodology was adopted to maximise the number of replicates performed on the large data sets and so identify the optimal tree ‘islands’ (Maddison 1991), followed by a more intensive search to determine the most parsimonious trees from the initial set. Bootstrap support values were calculated using 999 random permutations of the combined dataset, again using PAUP*.
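
For readers unfamiliar with bootstrap support, the resampling step can be sketched in Python as follows. This is an illustration only, assuming the standard nonparametric bootstrap in which alignment columns are drawn with replacement; it is not the PAUP* implementation, and the toy alignment, taxon labels and function name are hypothetical.

```python
import random

def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement, keeping each taxon's row intact."""
    n_cols = len(next(iter(alignment.values())))
    # Draw as many column indices as the alignment has columns.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}

# Toy alignment with hypothetical taxon labels (not data from this study).
toy = {"taxon_a": "ACGTAC", "taxon_b": "ACGTTC", "taxon_c": "ATGTTC"}
rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_replicate(toy, rng) for _ in range(999)]
print(len(replicates), replicates[0])
```

Each pseudoreplicate would then be analysed with the same tree search, and the support for a clade is the proportion of replicate trees that contain it.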



The final combined dataset had 4,912 characters in total. Of these, 2,311 were constant, 684 were variable but parsimony-uninformative, and 1,917 were parsimony-informative.
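
The three character classes can be made concrete with a short Python sketch. It applies the usual definition (a character is parsimony-informative when at least two states each occur in at least two taxa) and treats gaps as missing data, as in the MP analyses above. Treating "?" and "N" as missing, the function names and the toy alignment are assumptions for illustration; ambiguity codes other than N would be counted as separate states here.

```python
from collections import Counter

MISSING = {"-", "?", "N"}  # gaps treated as missing; '?' and 'N' are assumed missing too

def classify_column(column: str) -> str:
    """Classify one alignment column as constant, uninformative or informative."""
    counts = Counter(ch for ch in column.upper() if ch not in MISSING)
    if len(counts) <= 1:
        return "constant"
    # Informative: at least two states that are each shared by two or more taxa.
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared_states >= 2 else "uninformative"

def tally(alignment: list[str]) -> Counter:
    """Count character classes over all columns of an aligned set of sequences."""
    return Counter(classify_column("".join(col)) for col in zip(*alignment))

toy = ["ACGTA", "ACGTC", "ATGAC", "ATG-G"]  # hypothetical four-taxon alignment
print(tally(toy))

# The reported class counts add up to the reported total.
assert 2311 + 684 + 1917 == 4912
```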

Appendix 2. Primers used in this study


Primer name      Direction  Gene  Sequence (5’ to 3’)      Length
Modified A-tLeu  Forward    COII  CAGATAAGTGCATTGGATTT     20
B-tLys           Reverse    COII  GTTTAAGAGACCAGTACTTG     20
12S forward      Forward    12S   TACTATGTTACGACTTAT       18
12S reverse      Reverse    12S   AAACTAGGATTAGATACCC      19
Hux              Forward    28S   ACACGGACCAAGGAGTCTAAC    21
MB2              Forward    28S   TCTAACATGTGCGCGAGTC      19
Win              Reverse    28S   GTCCTGCTGTCTTAAGCAACC    21
GPW3             Reverse    28S   TTAACCCGGCGTTTGGTTC      19
18S 1F           Forward    18S   TACCTGGTTGATCCTGCCAGTAG  23
18S a0.7         Forward    18S   ATTAAAGTTGTTGCGGTT       18
18S a2.0         Forward    18S   ATGGTTGCAAAGCTGAAAC      19
18S 9R           Reverse    18S   GATCCTTCCGCAGGTTCACCTAC  23
18S bi           Reverse    18S   GAGTCTCGTTCGTTATCGGA     20
18S b3.9         Reverse    18S   TGCTTTRAGCACTCTAA        17
H3 AF            Forward    H3    ATGGCTCGTACCAAGCAGACVGC  23
H3 AR            Reverse    H3    ATATCCTTRGGCATRATRGTGAC  23
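
Three of the primers listed contain degenerate IUPAC bases (R = A or G; V = A, C or G). The Python sketch below expands such primers into the concrete oligonucleotides they represent and checks the stated lengths. The IUPAC code table is standard; the function name is hypothetical and only the three degenerate primers are included.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(primer: str) -> list[str]:
    """List every non-degenerate sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]

# The three primers from the table that contain degenerate bases.
PRIMERS = {
    "18S b3.9": "TGCTTTRAGCACTCTAA",
    "H3 AF": "ATGGCTCGTACCAAGCAGACVGC",
    "H3 AR": "ATATCCTTRGGCATRATRGTGAC",
}

for name, seq in PRIMERS.items():
    print(name, len(seq), len(expand(seq)))
```

The degeneracy is the product of the number of bases allowed at each position, so H3 AR, with three R positions, represents 2 × 2 × 2 = 8 distinct oligonucleotides.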





Appendix 3. Tables showing the quality of individual sequences aligned under different gap opening and extension costs, using Clustal X.

Sequences scored within conserved regions, number of affected sequences given. Clustal gap opening and extension costs held equal to each other.

* = Preferred alignment, ** = suboptimal alignment

12S

Alignment  Conserved Region
Cost       1       2    3    4    5    6    7    8    9       Total
2          C1      C1   B8   B4   C1   C1   D1   D1   A       B12, C4, D2
5          A       A    B4   A    A    A    A    A    A       B4           ** Suboptimal alignment
10         A       A    A    A    A    A    A    A    A       A            * Preferred (optimal) alignment
15         B1      A    D2   D2   D1   A    A    A    A       B1, D5
20         C1, D8  A    A    D23  A    A    A    A    A       C1, D31
30         D58     A    A    D1   A    A    A    A    B1, D1  B1, D60

28S

Alignment  Conserved Region
Cost       1    2       3            4    5    6    7    8    9    Total
2          D2   D2      A            A    A    A    B1   A    D1   B1, D5        * Preferred (optimal) alignment
5          B1   B1      B35, C2      B11  A    A    A    A    A    B51, C2       ** Suboptimal alignment
10         B2   B1      B44, C1, D2  C12  A    A    A    A    A    B47, C13, D2
15         D2   D6      B14, D7      A    A    A    A    A    A    B14, D15
20         D6   B5, D7  D7           B13  A    A    A    A    A    B18, D20
30         D18  D25     D16          D2   A    A    A    A    A    D61