Supplementary materials and methods
Bacterial strains and culture conditions
We used 18 T. forsythia strains that were isolated from periodontitis patients at Tokyo Medical and Dental University by Dr. Umeda during 1990–2010 (Supplementary Table S18). We tried to use the culture media described previously (Takemoto et al., 1997), but they did not work. Then, we modified that method by adding fetal bovine serum; this approach is effective for T. denticola culture (Van and Smibert, 1983). The isolates were maintained anaerobically (10% CO2, 10% H2, 80% N2 [Mitsubishi Gas Chemical, Tokyo, Japan]) in 6% Gifu anaerobic medium (GAM, [Nissui, Tokyo, Japan]), supplemented with yeast extract (1 mg/mL [Nacalai Tesque, Kyoto, Japan]), hemin (5 µg/mL [Sigma-Aldrich, St. Louis, U.S.A.]), menadione (0.5 µg/mL [Nacalai Tesque]), fetal bovine serum (35 µl/mL [HyClone, Tauranga, N.Z.]) and 98% (w/v) N-acetylmuramic acid (NAM, 15 µg/mL [Sigma-Aldrich, Deisenhofen, Germany]), or on GAM blood agar plates (6% GAM, 5% sheep blood [Nippon Biotest Laboratories, Tokyo, Japan], 1.5% agarose [Nacalai Tesque], yeast extract [1 µg/mL], hemin [5 µg/mL] and menadione [0.5 µg/mL]) with 98% (w/v) NAM disk (20 g/L, antibiotic test filter paper thin, [Advantec, Tokyo, Japan]). After incubation for 7–14 days at 37ºC, colonies were injected into 10 mL GAM liquid medium and cultured for 14 days at 37ºC.
Genomic DNA was extracted from each isolate using the methods described previously (Watanabe et al., 2013).
Determination of T. forsythia complete genome sequences
Complete genome sequences of 2 T. forsythia strains (KS16 and 3313), which differed in clinical parameters, were determined using a combination of GS Junior Titanium instrument (454 Life sciences a Roche Company, CT, U.S.A.) for shotgun/paired-end sequencing and Genome Analyzer IIx (GAIIx; Illumina, CA, U.S.A.) for 101-bp paired-end and traditional Sanger sequencing (ABI 3730; Applied Biosystems, CA, U.S.A.). The Newbler v2.6 (454 Life sciences a Roche Company) and Trimmomatic v0.20 software packages were used for trimming and filtering of the sequenced reads of GS Junior Titanium and GAIIx, respectively. After the trimming/filtering, the total length of all the reads was as the followings: GS Junior Titanium: KS16 47-Mbp and 3313 40-Mbp; GAIIx: KS16 999.8-Mbp and 3313 695.2-Mbp. Then, these reads were assembled in Newbler v2.6 to obtain scaffolds. Sequence gaps between the scaffolds were closed by PCR amplification and Sanger sequencing. SNPs and homopolymers were corrected by mapping of Sanger and GAIIx reads, respectively.
In the complete genome sequences, protein-coding regions (CDSs) were predicted by the Rapid Annotation Transfer Tool (RATT; Otto et al., 2011) using strain ATCC 43037 as a reference. After that, MetaGeneAnnotator (Noguchi et al., 2006), Glimmer v3.02 (Salzberg et al., 1998; Delcher et al., 1999; Delcher et al., 2007) and IMC-GE v3.0.30 (in silico Molecular Cloning Genomics Edition; In Silico Biology, Yokohama, Japan) were used to improve accuracy of CDS prediction. Then the predicted CDSs were annotated by BLASTP search against the NCBI GenBank non-redundant (nr) database (last accessed Aug 27, 2012; Benson et al., 2013) under the thresholds of query coverage (≥60%) and sequence identity (≥60%). Ribosomal RNA (rRNA), transfer RNA (tRNA), and non-coding RNA were identified using RNAmmer v1.2 (Lagesen et al., 2007), tRNAscan-SE v1.3.1 (Schattner et al., 2005), and Rfam v11.0 software packages (Griffiths-Jones et al., 2005), respectively.
Insertion sequences (ISs) were identified using ISSaga v1.0 (Varani et al., 2011) and a BLASTN search against the ISFinder database (last accessed Dec 20, 2001; Kichenaradja et al., 2010) under the thresholds of sequence identity (≥95%) and e value (≤1e-5). Miniature inverted-repeat transposable elements (MITEs) were identified using MUST v1.0 (Chen Y et al., 2009) under the threshold of e value (≤1e-5) and the Markov Cluster Algorithm (MCL) (van Dongen and Abreu-Goodger, 2012). Conjugative transposons (CTns) were identified using Mauve v2.3.1 (Darling et al., 2004) and GenomeMatcher v1.75 (Ohtsubo et al., 2008), if a region fulfilled the following two conditions: i) high nucleotide similarity to both CTns of the reference strains (T. forsythia ATCC 43037, T. denticola ATCC 35405, and P. gingivalis ATCC 33277) and ii) att sites at both ends.
Determination of T. forsythia draft genome sequences
Draft genome sequences of 16 T. forsythia strains (sit3, 15, 20, 2612, MH6, L7, 273, KM4, 3322, 291, 3114, TR1, 2444, 1224, 2442 and K8) were determined using GAIIx sequencing (101-bp or 250-bp paired-end). Trimmomatic v0.20 and Velvet v1.2.03 (Zerbino and Birney, 2008) were used for trimming/filtering and assembly, respectively (Supplementary Table S2). Then the draft genomes were annotated using Rapid Annotation using Subsystem Technology server v4.0 (RAST; http://rast.nmpdr.org/; Aziz et al., 2008). Uncharacterized hypothetical genes were characterized using BLASTP against the nr database (last accessed Aug 27, 2012; Benson et al., 2013) under the threshold of e value (≤1e-5) to find any homologs. Within the hypothetical genes without any homologs, we further searched for presence of any protein motifs by BLASTP against NCBI’s Conserved Domain Database (CDD; last accessed Mar 21, 2013; Marchler-Bauer et al., 2011) under the thresholds of bit score (≥50) and e value (≤0.001). Ribosomal RNA, transfer RNA, noncoding RNA, and the MGEs were identified as described above. The fastq data have been deposited in the DDBJ/EMBL/GenBank databases.
Estimation of pan-genome size
In addition to the genomic data which we determined, we obtained the draft genome sequence data of 8 T. denticola (ASLM, ATCC 33520, ATCC 33521, F0402, MYR-T, SP33, SP37 and US-Trep) and 2 P. gingivalis (W50 and JCVI SC001) strains from the DDBJ/EMBL/GenBank databases. All the sequence data were reannotated by using RAST to obtain annotated data under the same prediction/annotation conditions. In this study, we call the genes that were common for all the strains in each species or among the 3 species the “core genes.” The core genes of each species or among all 3 species were further clustered using the PGAP v1.11 under the thresholds of coverage (≥50), bit score (≥50), and e value (≤1e-8) to identify the core genes. Regression analysis for new genes and the pan-genome was conducted as described previously (Tettelin et al., 2008). The power laws are formulated as n (new) = κ∙N-α and n (pan) = ε∙Nγ. The n (new) and n (pan) are the average of new gene and pan-genome distributions, respectively. N is the number of genomes; κ, α, ε, and γ are free parameters. Values of α less than 1.0 and γ greater than 0 indicate an open pan-genome (Tettelin et al., 2008).
Multiple dot plot analysis
We used the 2 T. forsythia genomes described above for multiple dot plot analysis in which all three species of the red complex were included. In addition, we obtained the complete genome sequence of 1 T. forsythia (ATCC 43037), 6 T. denticola (ATCC 35404, ATCC 35405, AL2, H1-T, H22 and OTK), and 3 P. gingivalis (ATCC 33277, TDC60 and W83) strains from the DDBJ/EMBL/GenBank databases. These sequences were aligned using nucmer in the MUMmer v3.23 (Delcher et al., 2003) and GenomeMatcher. In the dot plot, each dot represents similarity of a 20-bp sequence, and rearrangement breakpoints were identified by the ends of lines that were drawn by continuous plot of the dots. The rearrangement breakpoints were characterized by the features located at both ends, as described previously (Watanabe et al., 2013).
For each bacterial strain, the number of each MGE was divided by the number of CDS to calculate the MGE/CDS ratio (IS/CDS, CTn/CDS, and MITE/CDS). In each MGE, the mean and standard deviation were calculated from the ratios of all the strains for each species, and were statistically tested for significant differences between the strains.
Construction of phylogenetic tree and SNP analysis
In each species in the red complex, the core genes that were single copy in all the strains were tested to see whether rearrangement in a gene was significant, by using the Phi test (Huson and Bryant, 2006) under the threshold p value (≤5.00e-02). An amino acid substitution model was selected using ModelGenerator v0.85 (Keane et al., 2006). MEGA v5.2 (Tamura et al., 2011) and RAxML v7.0.4 (Stamatakis, 2006) with the PROTGAMMAIJTTF model were used to construct a neighbor-joining and minimum-evolution tree, and a maximum likelihood tree, respectively, with 500 bootstrap replicates.
For all the gene clusters in each species where a gene was located in more than 2 genomes, SNP loci were identified by comparing nucleotide sequences after alignment, and a dN/dS ratio was calculated by PAML v4.6 (Yang, 1997). In each gene cluster, the dN/dS ratio was calculated for every gene pair; the median value was calculated from all the ratios in the cluster.
CRISPR sequences and CRISPR-associated (cas) genes were identified by using CRISPRFinder (last accessed Sep 18, 2013; Grissa et al., 2007). The cas gene candidates were searched using BLASTP in the CDD database under the threshold of e value (≤0.001). On the basis of amino acid sequence similarity and a protein motif of the cas genes, CRISPR types and structures in T. forsythia and T. denticola were determined according to CRISPR layouts in Makarova et al., (2011). In each CRISPR type, consensus sequences of all repeat sequences were determined using WebLogo v3.3 (Crooks et al., 2004). In T. forsythia, the repeat sequences were further analyzed using CRISPRmap v3.1.5 (Lange et al., 2013) to characterize these repeats. The phage-like structures in the CRISPRs were identified using both nucleotide sequence similarity with the known phage sequences in the NCBI nucleotide database and gene annotations; PHAST (last accessed Jun 7, 2013; Zhou Y et al., 2011) yielded no phage regions in any T. forsythia strains. For each CRISPR type, a non-redundant unique spacer list was obtained by all-to-all BLASTN search, with the following criterion; two spacers were considered to have high nucleotide similarity to each other if the bit score of BLASTN was more than 50 (Watanabe et al., 2013; Pride et al., 2012). We used the BLASTN search under following conditions: word size 7 and dust filter off. A heatmap and dendrogram were drawn, as described previously (Watanabe et al., 2013).
To characterize a target of each spacer sequence, the spacer list was subjected to a BLASTN search with a word size 7 and the dust filter “off” against 7 databases, as described previously (Pride et al., 2012; Watanabe et al., 2013). Hits were considered significant when a bit score was over 50. The subject sequences were annotated by BLASTX search in the NCBI GenBank nr database (last accessed Aug 27, 2012; Benson et al., 2013) and were annotated using the predicted protein of query sequences under the thresholds of the highest bit score and e value (≤1e-03). For uncharacterized hypothetical genes, we further searched the subjects for presence of any protein motifs as described above. Protospacer-adjacent motifs (PAMs) were predicted from alignments of the 30-bp sequences at both ends of the subjects, using WebLogo.
In each species, mapping analysis of metabolic pathways was carried out; the core genes were mapped on metabolic pathways using KAAS (http://www.genome.jp/kaas-bin/; Moriya et al., 2007) according to the bidirectional best hit method and partial genome mode, and were visualized using the iPath v2 (Yamada et al., 2011). The metabolic pathways included 5 main pathway groups (carbohydrate, energy, lipid, nucleotide, and amino acid), which were further subdivided into 29 small pathway groups. Each of the 29 pathway groups consisted of a collection of stand-alone pathways that determined the changes of substances controlled by bacterial genes. The complementation ratio (the proportion of the mapped genes in all the genes required in each stand-alone pathway) was calculated for each species. The ratio was also obtained under the condition where the core genes of either 2 or 3 species were analyzed. The mean and standard deviation were calculated from all the ratios of each stand-alone pathway, except for the case where all 3 species were included in the above mapping analysis. The proportions of the complementation ratio were calculated between the conditions where different numbers of the species were analyzed. Every small pathway group was analyzed to find any stand-alone pathways, where the genes were mapped to encode all the genes required in each pathway completely.
To test for the presence of the known virulence genes in all the genomes of strains in each species, BLASTP was used against the virulence genes described in the previous reports (Holt et al., 1999; Sharma, 2010; Dashper et al., 2011), under the thresholds of query coverage (≥80%) and sequence identity (≥80%). In addition, we tried to find any novel virulence gene candidates using BLASTP against the virulence factor database (VFDB; last accessed Nov 25, 2013; Chen L et al., 2012) and the microbial database of protein toxins, virulence factors, and antibiotic resistance genes for biodefense applications (MvirDB; last accessed Mar 21, 2007; Zhou CE et al., 2007) under the threshold of e value (≤1e-7). The presence of both the known virulence genes and the novel virulence gene candidates was visualized by drawing a heatmap, using the R software package (http://cran.r-project.org/).
The T. forsythia genome sequences have been deposited in the DDBJ/EMBL/GenBank databases under the following accession numbers: complete genomes of AP013045 (KS16) and AP013044 (3313), draft genomes of AP013045 (KS16), AP013044 (3313), DRS013040 (sit3), DRS013026 (15), DRS013027 (20), DRS013030 (2612), DRS013038 (MH6), DRS013037 (L7), DRS013031 (273), DRS013036 (KM4), DRS013034 (3322), DRS013032 (291), DRS013033 (3114), DRS013039 (TR1), DRS013029 (2444), DRS013025 (1224), DRS013028 (2442) and DRS013035 (K8), respectively.
Genomic characteristics of T. forsythia
Genomic information reflects evolutionary changes within a bacterial species; identification of core genes that are conserved among all the strains and strain-specific genes that give unique characteristics to each strain is important for the understanding of bacterial evolution. Nonetheless, there was only one genome sequence of T. forsythia that is publicly available: the laboratory strain ATCC 43037. In this study, for the purpose of characterization of T. forsythia species based on genomic information, we determined genome sequences of 18 T. forsythia strains, which had been isolated from different patients with several clinical types of periodontitis (Supplementary Table S2 and S18). Two complete genome sequences (Supplementary Fig. S15) and 16 draft genome sequences were included.
The 19 genomes, which included the laboratory strain ATCC 43037 and the 18 sequenced strains, had 2,912 ± 86 (mean ± SD) protein-coding regions (CDSs; Supplementary Table S2). The pan-genome, a sum of core and dispensable genomes, contained 5,418 CDSs, of which 1,733 CDSs (32.0%) were the core genes (genes in common among all the 19 strains; Supplementary Fig. S16, Supplementary Table S4).
All the 19 strains lacked the genes for the synthesis of N-acetylmuramic acid (NAM); this result was consistent with the fact that T. forsythia requires NAM for in vitro (laboratory) bacterial growth (Braham and Moncla, 1992). Furthermore, we constructed phylogenetic trees from the core genes (Supplementary Fig. S5). We found that when we analyzed the 2 clinical types of periodontitis, T. forsythia strains of the same type were distributed in different clusters. These data suggested that there were no obvious associations between a periodontitis type and phylogenetic relationships. Strains TR1, 2444, 2442, and 1224 were clustered together, but there were differences in the parameters between these strains; probing depth (mm), clinical attachment level, and bleeding on probing differed among these strains. This result is suggestive of applicability of CRISPR typing to T. forsythia species as a method for clustering strains with differing clinical parameters.
As comparative analysis of the complete genome sequences, whole-genome dot plot analysis among strains KS16, 3313, and ATCC 43037 showed that the genome organization was highly conserved. Genomic rearrangements were observed only in the comparison of KS16/3313 and KS16/ATCC 43037 (Supplementary Fig. S17), and one of the rearrangement breakpoints contained an rRNA operon.
These results suggested that the genome structure was likely stable, and essential properties such as bacterial growth were highly conserved within T. forsythia strains regardless of the periodontitis type.
Supplementary Figure legends
Supplementary Figure S1 Comparison of gene contents of T. forsythia, T. denticola, and P. gingivalis
Commonality and uniqueness of genes are shown among (A) 19 T. forsythia, (B) 6 T. denticola, (C) 3 P. gingivalis strains, and (D) the red complex species.
Each circle represents the totality of all genes in each strain. The circles are overlapping to show common genes among the strains. The central area represents the core genes of each strain. The areas that consist of only one circle show the strain-specific genes in each strain, with the numbers of the genes.
Supplementary Figure S2 An accumulation curve for the total number of genes, core genes, and new genes as a function of the number of genomes analyzed for the red complex species
Accumulation curves for the pan-genome (squares) and the numbers of the core genes (circles) and new genes (triangles), are drawn for each species: T. forsythia (gray), T. denticola (blue), and P. gingivalis (red). The number of the plots is different among the 3 species because of the difference in the number of the genomes analyzed. The formula that estimates the pan-genome size and the number of the new genes is shown near the curves.
Supplementary Figure S3 Comparative analysis of the rate of MGEs (CTn, IS, and MITE) in the red complex species
The MGE/CDS ratio is shown as follows: (A) IS/CDS, (B) CTn/CDS, and (C) MITE/CDS. Asterisks indicate statistical significance of differences between the ratio of 2 species: *P < 0.01 and **P < 0.05.
Supplementary Figure S4 Genetic layout of CRISPR/Cas loci of 19 strains of T. forsythia
The CRISPR/Cas arrays are shown as colored genes as in the margin note. (A) The arrays of T. forsythia type I-B/rp30 and (B) T. forsythia type ll/rp47. The phage-like structures are indicated by the boxes.
Supplementary Figure S5 Phylogenetic analysis based on the core genes of T. forsythia
Phylogenetic trees are shown with the bootstrap values at each node. (A) A maximum likelihood tree, (B) a neighbor-joining tree, and (C) a minimum-evolution tree, respectively. In the tree in panel A, clinical types of periodontitis are indicated on the right side of the strain names: the aggressive type (“A” in blue) and the chronic type (“C” in red). Phylogenetic distances are indicated by a scale bar beneath the trees.
Supplementary Figure S6 Genetic layout of the CRISPR/Cas locus of 6 strains of T. denticola
The CRISPR/Cas locus (type II-A/rp27) of T. denticola is shown for 6 T. denticola strains, as in Supplementary Figure S4.
Supplementary Figure S7 Polymorphism of repeat lengths and nucleotides in T. forsythia and T. denticola
The arrays of the repeat lengths and nucleotides are indicated by colored boxes for each isolate: (A) T. forsythia type I-B/rp30 (ATCC 43037, KS16, and 3313), (B) T. forsythia type II/rp47 (KS16), (C) T. denticola type II-A/rp27 (AL2 and OTK), (D) T. denticola type II-A/rp23 (ATCC 35404), and (E) T. denticola type II-A/rp36 (ATCC 35405). Polymorphism of repeat lengths is indicated by the lengths of the colored boxes, with the blank boxes showing the absence of repeat sequences. Nucleotide polymorphisms are indicated by the colors as follows: no polymorphism (yellow), 1-bp polymorphism (red), 2-bp (purple), 3-bp (green), 4-bp (pink), and a 5-bp polymorphism (blue).
Supplementary Figure S8 Polymorphism of CRISPR repeat sequences in T. forsythia and T. denticola
Polymorphism of the CRISPR repeat sequences is shown as the visualization by means of the WebLogo software. (A) Repeats of the type rp30 (30-bp), rp47 (47-bp), and rp28 (28-bp) CRISPRs in 19 strains of T. forsythia. (B) Repeats of the type rp27 (27-bp), rp23 (23-bp), and rp36 (36-bp) CRISPRs in 5 strains of T. denticola. The horizontal axis indicates the position of repeats and the vertical axis indicates the number of bits (log24 = 2 bits). The vertical line on the upper side of each nucleotide character is the error bar.
Supplementary Figure S9 Classification of T. forsythia repeat sequences based on the CRISPRmap database
(A) The repeat sequences of the T. forsythia CRISPRs are taxonomically classified and shown in a hierarchical tree constructed using CRISPRmap: (i) repeats of type rp30 (30-bp), (ii) rp47 (47-bp), and (iii) rp28 (28-bp). The bacterial and archaeal domains are indicated by dark brown and blue-green lines in the tree, respectively. The circles outside the tree indicate the following: superclass (the outermost), sequence family (middle), and structure motif (the innermost). (B) Polymorphisms in the CRISPR repeat sequences of the type rp30 against the CRISPRmap database, as visualized using WebLogo. (C) A minimum free energy (MFE) structure is shown for the types rp47 and rp28.
Supplementary Figure S10 Alignment of the sequences adjacent to the target regions of the CRISPR spacers
In each CRISPR type, WebLogo figures are shown with the alignment of the sequences adjacent to the target regions of the CRISPR spacers: (A) T. forsythia type I-B/rp30, (B) T. forsythia type II/rp47 (KS16), and (C) T. forsythia no type/rp28. Each figure and alignment are divided into the left and right part to indicate the alignments of the upstream and downstream sequences of the targets, respectively.
Supplementary Figure S11 Clustering by spacer content in T. forsythia
The presence of each unique CRISPR spacer is shown by means of a heatmap in each T. forsythia strain. The dendrogram is constructed from Euclidian distances. In the heatmap, 2 colors are used according to the bit score: red, ≥ 50; yellow, < 50.
Supplementary Figure S12 Similarity searches using CRISPR spacers of T. denticola
For the T. denticola CRISPR spacers, the presence of targets is shown as in Figure 3.
Supplementary Figure S13 A complementation ratio of the metabolic pathways and its fold increase
(A) For the 29 main pathways, the complementation ratio among the 3 numbers of the species (1, 2, and 3 species) is shown. The bars of either 1 or 2 species are the mean ratio of possible combinations (1 species: T. forsythia, T. denticola, P. gingivalis; 2 species: T. forsythia and T. denticola, T. denticola and P. gingivalis, P. gingivalis and T. forsythia). Additionally, the ratio of the butyric acid pathway is shown (rightmost). (B) The proportions of the complementation ratio are shown between the conditions where different numbers of the species are analyzed (1: 2 species, 2: 3 species, 1: 3 species).
Supplementary Figure S14 Clustering by presence of the virulence genes
Presence of the known virulence genes and the novel virulence gene candidates is visualized by drawing a heatmap, where the red and yellow boxes indicate the presence and absence of the genes, respectively. The phylogenetic tree in T. forsythia to the left of the heatmap is the same as that of the maximum likelihood method in Supplementary Figure S5.
Supplementary Figure S15 Complete genome sequences of T. forsythia KS16 and 3313 genomes
(A) KS16 and (B) 3313 genome sequences. Each circle indicates the following: nucleotide positions (the outermost), forward and reverse CDSs (second and third level, respectively), rRNA (fourth level), tRNA (fifth level), G+C% content (sixth level), GC skew (seventh level), ISs (eighth level), MITEs (ninth level), CTns (tenth level), and CRISPRs (the innermost).
Supplementary Figure S16 Comparison of gene contents of 19 T. forsythia strains
Each circle represents the whole of the genes in each strain. The circles are overlapping to show commonality and uniqueness of genes among the strains. The central area represents the 1,733 core genes of 19 T. forsythia strains. The areas that consist of only one circle show the strain-specific genes in each strain, with the numbers of the genes.
Supplementary Figure S17 Genome structures of T. forsythia and the positions of MITE, IS, CTn, CRISPR, and rRNA
Genome structures of 3 T. forsythia strains (ATCC 43037, KS16 and 3313) are shown as in Figure 1. The positions of the mobile genetic elements (IS, MITE, and CTn), CRISPR, and rRNA are shown along the 2 axes. The positions of rRNA operons in the rearrangement breakpoints are indicated by the yellow dotted lines.
Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA et al (2008). The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9: 75.
Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J et al (2013). GenBank. Nucleic Acids Res 41: D36-42.
Braham PH, Moncla BJ (1992). Rapid presumptive identification and further characterization of Bacteroides forsythus. J Clin Microbiol 30: 649-654.
Chen Y, Zhou F, Li G, Xu Y (2009). MUST: a system for identification of miniature inverted-repeat transposable elements and applications to Anabaena variabilis and Haloquadratum walsbyi. Gene 436: 1-7.
Chen L, Xiong Z, Sun L, Yang J, Jin Q (2012). VFDB 2012 update: toward the genetic diversity and molecular evolution of bacterial virulence factors. Nucleic Acids Res 40: D641-645.
Crooks GE, Hon G, Chandonia JM, Brenner SE (2004). WebLogo: a sequence logo generator. Genome Res 14: 1188-1190.
Darling AC, Mau B, Blattner FR, Perna NT (2004). Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 14: 1394-1403.
Delcher AL, Harmon D, Kasif S, White O, Salzberg SL (1999). Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27: 4636-4641.
Delcher AL, Salzberg SL, Phillippy AM (2003). Using MUMmer to identify similar regions in large sequence sets. Curr Protoc Bioinformatics Chapter 10: Unit 10.13.
Delcher AL, Bratke KA, Powers EC, Salzberg SL (2007). Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23: 673-679.
Dashper SG, Seers CA, Tan KH, Reynolds EC (2011a). Virulence factors of the oral spirochete Treponema denticola. J Dent Res 90: 691-703.
Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy S, Bateman A (2005). Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Research 33: D121-D124.
Grissa I, Vergnaud G, Pourcel C (2007). CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res 35: W52-57.
Holt SC, Kesavalu L, Walker S, Genco CA (1999). Virulence factors of Porphyromonas gingivalis. Periodontol 2000 20: 168-238.
Huson DH, Bryant D (2006). Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 23: 254-267.
Keane T, Creevey C, Pentony M, Naughton T, McInerney J (2006). Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. Bmc Evolutionary Biology 6.
Kichenaradja P, Siguier P, Pérochon J, Chandler M (2010). ISbrowser: an extension of ISfinder for visualizing insertion sequences in prokaryotic genomes. Nucleic Acids Res 38: D62-68.
Lagesen K, Hallin P, Rodland E, Staerfeldt H, Rognes T, Ussery D (2007). RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Research 35: 3100-3108.
Lange SJ, Alkhnbashi OS, Rose D, Will S, Backofen R (2013). CRISPRmap: an automated classification of repeat conservation in prokaryotic adaptive immune systems. Nucleic Acids Res 41: 8034-8044.
Makarova KS, Haft DH, Barrangou R, Brouns SJ, Charpentier E, Horvath P et al (2011). Evolution and classification of the CRISPR-Cas systems. Nat Rev Microbiol 9: 467-477.
Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C et al (2011). CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res 39: D225-229.
Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M (2007). KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res 35: W182-185.
Noguchi H, Park J, Takagi T (2006). MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Research 34: 5623-5630.
Ohtsubo Y, Ikeda-Ohtsubo W, Nagata Y, Tsuda M (2008). GenomeMatcher: a graphical user interface for DNA sequence comparison. BMC Bioinformatics 9: 376.
Otto T, Dillon G, Degrave W, Berriman M (2011). RATT: Rapid Annotation Transfer Tool. Nucleic Acids Research 39.
Pride DT, Salzman J, Haynes M, Rohwer F, Davis-Long C, White RA et al (2012). Evidence of a robust resident bacteriophage population revealed through analysis of the human salivary virome. ISME J 6: 915-926.
Sharma A (2010). Virulence mechanisms of Tannerella forsythia. Periodontol 2000 54: 106-116.
Salzberg SL, Delcher AL, Kasif S, White O (1998). Microbial gene identification using interpolated Markov models. Nucleic Acids Res 26: 544-548.
Schattner P, Brooks A, Lowe T (2005). The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Research 33: W686-W689.
Stamatakis A (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688-2690.
Takemoto T, Kurihara H, Dahlen G (1997). Characterization of Bacteroides forsythus isolates. J Clin Microbiol 35: 1378-1381.
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011). MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular Biology and Evolution 28: 2731-2739.
Tettelin H, Riley D, Cattuto C, Medini D (2008). Comparative genomics: the bacterial pan-genome. Curr Opin Microbiol 11: 472-477.
van Dongen S, Abreu-Goodger C (2012). Using MCL to extract clusters from networks. Methods Mol Biol 804: 281-295.
Van Horn KG, Smibert RM (1983). Albumin requirement of Treponema denticola and Treponema vincentii. Can J Microbiol 29: 1141-1148.
Varani AM, Siguier P, Gourbeyre E, Charneau V, Chandler M (2011). ISsaga is an ensemble of web-based methods for high throughput identification and semi-automatic annotation of insertion sequences in prokaryotic genomes. Genome Biol 12: R30.
Watanabe T, Nozawa T, Aikawa C, Amano A, Maruyama F, Nakagawa I (2013). CRISPR Regulation of Intraspecies Diversification by Limiting IS Transposition and Intercellular Recombination. Genome Biol Evol 5: 1099-1114.
Yamada T, Letunic I, Okuda S, Kanehisa M, Bork P (2011). iPath2.0: interactive pathway explorer. Nucleic Acids Res 39: W412-415.
Yang Z (1997). PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13: 555-556.
Zerbino DR, Birney E (2008). Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18: 821-829.
Zhou CE, Smith J, Lam M, Zemla A, Dyer MD, Slezak T (2007). MvirDB--a microbial database of protein toxins, virulence factors and antibiotic resistance genes for bio-defence applications. Nucleic Acids Res 35: D391-394.
Zhou Y, Liang Y, Lynch KH, Dennis JJ, Wishart DS (2011). PHAST: a fast phage search tool. Nucleic Acids Res 39: W347-352.