The genome of pear (Pyrus bretschneideri Rehd.)





Jun Wu1,11, Zhiwen Wang2,11, Zebin Shi3,11, Shu Zhang2,11, Ray Ming4,11, Shilin Zhu2,11, M. Awais Khan5, Shutian Tao1, Schuyler S. Korban5, Hao Wang6, Nancy J. Chen7, Takeshi Nishio8, Xun Xu2, Lin Cong2, Kaijie Qi1, Xiaosan Huang1, Yingtao Wang1, Xiang Zhao2, Juyou Wu1, Cao Deng2, Caiyun Gou2, Weili Zhou2, Hao Yin1, Gaihua Qin1, Yuhui Sha2, Ye Tao2, Hui Chen1, Yanan Yang1, Yue Song1, Dongliang Zhan2, Juan Wang2, Leiting Li1,4, Meisong Dai3, Chao Gu1, Yuezhi Wang3, Daihu Shi2, Xiaowei Wang2, Huping Zhang1, Liang Zeng2, Danman Zheng5, Chunlei Wang8, Maoshan Chen2, Guangbiao Wang2, Lin Xie2, Valpuri Sovero9, Shoufeng Sha1, Wenjiang Huang1, Shujun Zhang3, Mingyue Zhang1, Jiangmei Sun1, Linlin Xu1, Yuan Li1, Xing Liu1, Qingsong Li1, Jiahui Shen1, Junyi Wang2, Robert E. Paull7, Jeffrey L. Bennetzen6, Jun Wang2,10, Shaoling Zhang1


1Centre of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing 210095, China; 2BGI-Shenzhen, Shenzhen 518083, China; 3Institute of Horticulture, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, China; 4Department of Plant Biology, University of Illinois, Urbana, IL 61801, USA; 5Department of Natural Resources and Environmental Sciences, University of Illinois, Urbana, IL 61801, USA; 6Department of Genetics, University of Georgia, Athens, GA 30602, USA; 7Department of Tropical Plant and Soil Sciences, University of Hawaii, Honolulu, Hawaii 96822, USA; 8Graduate School of Agricultural Science, Tohoku University, Aoba-ku, Sendai 981-8555, Japan; 9Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA; 10Department of Biology, University of Copenhagen, Copenhagen, Denmark; 11These authors contributed equally to this work.
Corresponding authors: Shaoling Zhang (slzhang@njau.edu.cn) and Jun Wang (wangj@genomics.org.cn)
Supplementary Information
Table of Contents

Supplementary Data

Supplementary Tables 1 to 15

Supplementary Figures 1 to 29

Supplementary References

Table of Contents

Species: Common and Latin Name

Supplementary Data

Identification of orthologous genes

Gene family evolution

Analysis of transcription factors

Supplementary Tables 1-16

Supplementary Figures 1-29

Supplementary References


Species: Common and Latin Name

Pear: Pyrus bretschneideri

Apple: Malus × domestica

Strawberry: Fragaria vesca

Grape: Vitis vinifera

Papaya: Carica papaya

Poplar: Populus trichocarpa

Rice: Oryza sativa

Tomato: Solanum lycopersicum

Arabidopsis: Arabidopsis thaliana



Supplementary Data

Identification of orthologous genes

Based on the gene family clusters (Supplementary Table 15), orthologs and unique paralogs were tallied for the eight plant species (Supplementary Fig. 26). Genes were classified as follows: single-copy orthologs are genes in families in which no paralogs are present; multiple-copy orthologs are genes in families that contain at least two genes of the given species as well as genes from all other species; unique paralogs are genes in families that contain genes of only that species; and other orthologs are all remaining genes, including genes that were not clustered into any family. This comparative analysis of the eight plant species clustered 273,401 non-redundant protein sequences into 27,413 gene families (Supplementary Table 15). Of the 42,812 protein-coding genes in pear, 34,083 were clustered into 16,960 gene families, of which 1,207 are unique to pear; the remaining 8,729 genes could not be clustered into any family. The 1,207 pear-specific families comprise 2,978 genes, of which 1,214 contain InterPro domains and were assigned to Gene Ontology (GO) categories, while the remaining 1,764 are previously unidentified predicted genes of unknown function. These numbers are consistent with the relative proportions reported for other sequenced genomes. Of these eight species, strawberry, papaya, apple, and pear were selected to study the intersection of gene families (Supplementary Fig. 27). A core set of 9,118 families is shared among these four species, reflecting a relatively recent common origin. While 1,376 families (3,693 genes) are specific to pear, 16,955 families are conserved with at least one other species and may be related to characteristics of fruit trees.
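The four-way classification described above can be expressed as a small function. This is a minimal sketch, assuming (an assumption not stated in the source) that a single-copy ortholog family contains exactly one gene from every species; the input layout `{family_id: [(species, gene_id), ...]}` and the function name `classify_focal_genes` are hypothetical and do not reflect the output format of the clustering software used.

```python
from collections import Counter

def classify_focal_genes(families, species, focal, focal_genes):
    """Label every gene of `focal` with one of four ortholog categories.

    families:    {family_id: [(species_name, gene_id), ...]}  (hypothetical layout)
    species:     all species included in the clustering
    focal:       species whose genes are being classified, e.g. "pear"
    focal_genes: every protein-coding gene of `focal`, clustered or not
    """
    all_species = set(species)
    labels = {}
    for members in families.values():
        copies = Counter(sp for sp, _ in members)          # genes per species
        genes = [g for sp, g in members if sp == focal]
        if not genes:
            continue                                       # no focal-species gene here
        present = set(copies)
        if present == {focal}:
            category = "unique paralog"
        elif present == all_species and all(n == 1 for n in copies.values()):
            category = "single-copy ortholog"              # assumed: one copy everywhere
        elif present == all_species and copies[focal] >= 2:
            category = "multiple-copy ortholog"
        else:
            category = "other ortholog"
        for g in genes:
            labels[g] = category
    # Genes never placed in any family are counted as "other ortholog".
    for g in focal_genes:
        labels.setdefault(g, "other ortholog")
    return labels
```

Counting `Counter(labels.values())` over all pear genes then gives per-category totals of the kind plotted in Supplementary Fig. 26.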

A total of 840 unique genes were found in pear and annotated with GO terms; their distribution across GO categories is similar to that of all annotated pear genes (Supplementary Fig. 14). A notable feature of the pear gene clusters is the relatively high representation of metabolic and cellular processes, followed by transcription factor categories, kinase domains, and enzymatic activities related to fruit development, ripening, and sugar metabolism.
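The comparison above is descriptive. As an illustration only, one standard way to check whether a single GO category is over-represented in a gene subset relative to all annotated genes is a one-sided hypergeometric test, sketched below; the function name and the choice of test are assumptions, not the analysis behind Supplementary Fig. 14.

```python
from math import comb

def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated genes in the background (e.g. all GO-annotated pear genes)
    K: background genes that carry the GO category of interest
    n: genes in the subset (e.g. the unique genes)
    k: subset genes that carry that category
    """
    upper = min(n, K)
    # Sum the probability of drawing exactly i category genes, for i = k..upper.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

A small tail probability suggests the category is enriched in the subset; when many categories are tested, the p-values would also need a multiple-testing correction.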

Gene family evolution

A total of 7,721 gene families shared by all eight species, excluding rice, were selected to study gene family evolution with CAFÉ (De Bie et al. 2006). Species that have undergone a recent whole-genome duplication (WGD) and rapid evolution show more expanded families (Supplementary Fig. 28): pear (2,339 expansions and 351 contractions), apple (1,558 expansions and 519 contractions), poplar (3,359 expansions and 163 contractions), and Arabidopsis (1,168 expansions and 673 contractions). Gene family expansions may enhance the corresponding gene functions and may reflect a greater demand for specialized biological pathways in different plant species.
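CAFÉ models gene family size along each branch of a species tree as a stochastic birth-death process. The sketch below evaluates the transition probability of such a process for the simplest case of one rate λ shared by gene births and deaths, and then labels a branch as an expansion or contraction once ancestral and extant sizes are known. It illustrates the model class only and is not CAFÉ's implementation; the function names and parameter values are hypothetical.

```python
from math import comb

def bd_transition_prob(s: int, c: int, lam: float, t: float) -> float:
    """P(family size c at the end of a branch | size s at its start).

    Linear birth-death process with equal per-gene birth and death
    rate `lam`, over a branch of length `t`.
    """
    if s == 0:
        return 1.0 if c == 0 else 0.0      # a lost family cannot reappear
    alpha = lam * t / (1.0 + lam * t)
    prob = 0.0
    for j in range(min(s, c) + 1):
        prob += (comb(s, j) * comb(s + c - j - 1, s - 1)
                 * alpha ** (s + c - 2 * j) * (1.0 - 2.0 * alpha) ** j)
    return prob

def branch_change(ancestral: int, extant: int) -> str:
    """Classify one family on one branch once both sizes are known."""
    if extant > ancestral:
        return "expansion"
    if extant < ancestral:
        return "contraction"
    return "no change"
```

As a sanity check, a family starting with one gene keeps exactly one gene with probability 1/(1 + λt)², and the probabilities over all end sizes sum to 1.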



Analysis of transcription factors

To identify genes encoding transcription factors, the pear proteome was searched with the hmmpfam program of the HMMER suite (http://hmmer.janelia.org/), using both previously constructed domain alignments from the Pfam database version 23.0 (http://pfam.sanger.ac.uk/) and newly established alignments from PlnTFDB (http://plntfdb.bio.uni-potsdam.de/v3.0/). In total, 3,221 putative pear transcription factors (TFs) distributed over 84 families were identified (Supplementary Table 8), representing 7.5% of the 42,812 predicted protein-coding loci. A similar analysis of the apple genome identified 3,627 filtered putative TFs, representing 8.0% of 45,293 predicted protein-coding loci (Supplementary Fig. 29). The TF families present in pear and apple are largely the same, but the number of members in individual families differs. In general, TF families have many members in both pear and apple (Supplementary Table 8). For ~40 TF families, pear and apple have at least twice as many members as grape, strawberry, papaya, poplar, and Arabidopsis, which may reflect differences in biological function. For example, the MYB, NAC, AP2-EREBP, and bHLH families each have more than 170 members. These TF families are known to be involved in the regulation of plant development, resistance to biotic and abiotic stresses, hormone responses, and signal transduction.
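The domain search yields, for each protein, a set of domain hits, and family membership is then decided by which domains are present or absent. The sketch below assumes the hmmpfam output has already been parsed into `{protein_id: set_of_domain_names}`; the rule table `TF_RULES` is a hypothetical, heavily simplified stand-in for PlnTFDB's family rules, with Pfam domain names used only as examples. The last lines recompute the percentages quoted above.

```python
from collections import Counter

# Hypothetical, simplified rules: family -> (required domains, forbidden domains).
# Real PlnTFDB rules are more detailed; these only illustrate the mechanism.
TF_RULES = {
    "MYB": ({"Myb_DNA-binding"}, set()),
    "NAC": ({"NAM"}, set()),
    "AP2-EREBP": ({"AP2"}, set()),
    "bHLH": ({"HLH"}, set()),
    "MADS": ({"SRF-TF"}, set()),
}

def assign_family(domains, rules=TF_RULES):
    """Return the first family whose required domains are all present and
    whose forbidden domains are all absent; None if no rule matches."""
    for family, (required, forbidden) in rules.items():
        if required <= domains and not (forbidden & domains):
            return family
    return None

def tally_tfs(proteome_domains, rules=TF_RULES):
    """Count putative TFs per family across a parsed proteome."""
    counts = Counter()
    for domains in proteome_domains.values():
        family = assign_family(domains, rules)
        if family is not None:
            counts[family] += 1
    return counts

def percent_of_loci(n_tfs, n_loci):
    return 100.0 * n_tfs / n_loci

print(f"pear:  {percent_of_loci(3_221, 42_812):.1f}%")   # pear:  7.5%
print(f"apple: {percent_of_loci(3_627, 45_293):.1f}%")   # apple: 8.0%
```

Forbidden-domain sets matter for families that share a DNA-binding domain but are separated by an accessory domain; they are left empty here because the real rules are not reproduced.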

MYB TFs have been implicated in regulating diverse plant processes, including growth, primary (sucrose) and secondary (lignin and phenylpropanoid) metabolism in response to hormones and to abiotic and biotic stresses, light responses, and circadian rhythms. Overall, pear has 326 MYB and MYB-related TFs, compared with 385 in apple and 202 in strawberry (Supplementary Table 8).

A total of 99 MADS-box genes were detected in pear. MADS-box genes form a major family of TFs involved in controlling all major aspects of development, including male and female gametophyte, embryo, seed, flower, fruit, and root development. MADS proteins are divided into two types (Type I and Type II) in different plant species. In pear, 43 are Type I and 56 are Type II, similar to strawberry (40 Type I and 47 Type II) but fewer than apple (49 Type I and 82 Type II) (Supplementary Table 9).
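Type II (MIKC-type) MADS proteins typically carry a K (keratin-like) domain in addition to the MADS domain, whereas Type I proteins lack it. The sketch below uses that as a simple heuristic, assuming domain hits have been parsed into `{protein_id: set_of_domain_names}` with the Pfam names `SRF-TF` (MADS domain) and `K-box`. Published Type I/II assignments are usually confirmed phylogenetically, so this is not necessarily how Supplementary Table 9 was produced.

```python
from collections import Counter

MADS_DOMAIN = "SRF-TF"  # Pfam name of the MADS-box DNA-binding domain
K_DOMAIN = "K-box"      # Pfam name of the keratin-like K domain

def mads_type(domains):
    """Heuristic: MADS + K domain -> Type II; MADS domain only -> Type I."""
    if MADS_DOMAIN not in domains:
        return None
    return "Type II" if K_DOMAIN in domains else "Type I"

def count_mads_types(proteome_domains):
    """Tally Type I and Type II MADS proteins in a parsed proteome."""
    counts = Counter()
    for domains in proteome_domains.values():
        kind = mads_type(domains)
        if kind is not None:
            counts[kind] += 1
    return counts
```

MIKC proteins with a degenerate K domain would be misassigned to Type I by this rule, which is one reason a phylogenetic check is preferred.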



Supplementary Table 1. Global statistics of BAC sequencing of the pear genome.

Library insert size     No. of BACs   Total data (Gb)   Sequence depth (X)
250 bp                  38,304        171.59            44.80
500 bp                  38,304        158.06            41.26
Total                   38,304        329.65            86.06
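Sequence depth is the total number of sequenced bases divided by the length of the sequenced target. Inverting that formula with the values in Supplementary Table 1 suggests that depth was computed against roughly 3.83 Gb of BAC sequence, or about 100 kb per BAC; this back-calculation is an inference and is not stated in the source. The function names below are hypothetical.

```python
def sequence_depth(total_gb: float, target_gb: float) -> float:
    """Average depth (X) = sequenced bases / length of the sequenced target."""
    return total_gb / target_gb

def implied_target_gb(total_gb: float, depth_x: float) -> float:
    """Invert the depth formula to recover the target length in Gb."""
    return total_gb / depth_x

# (library insert size, no. of BACs, total data in Gb, depth in X) from Table 1
TABLE1 = [("250 bp", 38_304, 171.59, 44.80), ("500 bp", 38_304, 158.06, 41.26)]

for insert, n_bacs, total_gb, depth_x in TABLE1:
    target_gb = implied_target_gb(total_gb, depth_x)
    kb_per_bac = target_gb * 1e9 / n_bacs / 1e3
    print(f"{insert}: ~{target_gb:.2f} Gb target, ~{kb_per_bac:.0f} kb per BAC")
```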



Supplementary Table 2. Global statistics of whole-genome shotgun (WGS) sequencing of pear.

Paired-end libraries (Solexa reads)

Insert size   Read length (bp)   Total data (Gb)   Sequence depth (X)
180 bp        100                30.5              57.8
500 bp        100                8.8               16.6
800 bp        100                5.0               9.4
2 kb          49                 4.3               8.1
5 kb          49                 3.1               5.8
10 kb         49                 0.3               0.5
20 kb         49                 3.6               6.8
40 kb         49                 1.4               2.6
Total                            57.0              107.6
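For large-insert (mate-pair) libraries, the coverage that matters for scaffolding is usually physical coverage, the number of read pairs times the insert size divided by genome size, rather than read-base depth. The sketch below computes both from Supplementary Table 2, assuming that Total Data counts both reads of each pair and using the genome size implied by the table's own totals (57.0 Gb / 107.6X, about 0.53 Gb); both are assumptions, and the function names are hypothetical.

```python
# (insert size in bp, read length in bp, total data in Gb, depth in X) from Table 2
LIBRARIES = [
    (180, 100, 30.5, 57.8), (500, 100, 8.8, 16.6), (800, 100, 5.0, 9.4),
    (2_000, 49, 4.3, 8.1), (5_000, 49, 3.1, 5.8), (10_000, 49, 0.3, 0.5),
    (20_000, 49, 3.6, 6.8), (40_000, 49, 1.4, 2.6),
]

def read_pairs(total_gb: float, read_len: int) -> float:
    """Number of read pairs, assuming Total Data counts both reads of a pair."""
    return total_gb * 1e9 / (2 * read_len)

def physical_coverage(total_gb, read_len, insert_bp, genome_gb) -> float:
    """Coverage of the genome by spanned inserts rather than by read bases."""
    return read_pairs(total_gb, read_len) * insert_bp / (genome_gb * 1e9)

total_gb = sum(row[2] for row in LIBRARIES)
total_depth = sum(row[3] for row in LIBRARIES)
genome_gb = total_gb / total_depth  # genome size implied by the table itself

for insert, read_len, gb, depth in LIBRARIES:
    phys = physical_coverage(gb, read_len, insert, genome_gb)
    print(f"{insert:>6} bp insert: sequence depth {depth:5.1f}X, physical ~{phys:,.0f}X")
```

Long-insert libraries contribute little sequence depth but a large physical coverage, which is what allows them to link contigs across repeats.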



Supplementary Table 3. Assessment of sequence coverage of the pear genome assembly using BAC sequences. Each BAC sequence was used as the query and mapped to the assembled genome sequence. Coverage ratio is the fraction of the BAC covered by scaffolds (1.00 = complete coverage), and gap ratio (%) is the total gap length as a percentage of the length of the aligned scaffold.

BAC/Fosmid ID  Length (bp)  Coverage ratio  No. of alignment blocks  No. of scaffolds  Scaffold length (bp)  No. of gaps  Gap length (bp)  Gap ratio (%)
BAC1_prspcxa   118877       1.00            4                        1                 1172172               2            126              0.01
BAC2_prspaxa   94558        1.00            12                       1                 496520                8            507              0.10
BAC3_prspexa   100007       0.90            19                       1                 602150                4            339              0.06
BAC4_prspbxa   128225       0.98            27                       1                 651105                1            7                0.00
BAC5_prspgxa   118978       1.00            22                       1                 1563720               9            583              0.04
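Gaps in a scaffold are runs of N that stand in for sequence of estimated length, so the gap count and gap length in Supplementary Table 3 can be recovered directly from a scaffold's sequence. The sketch below does that and expresses the gap ratio as a percentage of scaffold length, which reproduces the table's values (for example, 126 / 1,172,172 bp ≈ 0.01%); the function names are hypothetical.

```python
import re

def gap_stats(scaffold: str) -> tuple[int, int]:
    """Return (number of gaps, total gap length) for runs of N or n."""
    runs = [m.end() - m.start() for m in re.finditer(r"[Nn]+", scaffold)]
    return len(runs), sum(runs)

def gap_ratio_percent(gap_len: int, scaffold_len: int) -> float:
    """Gap length as a percentage of scaffold length."""
    return 100.0 * gap_len / scaffold_len

# BAC1 row of Supplementary Table 3: 126 bp of gaps in a 1,172,172 bp scaffold.
print(f"{gap_ratio_percent(126, 1_172_172):.2f}%")  # 0.01%
```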



Supplementary Table 4. Statistics of repetitive elements in the pear genome. RepBase TEs: transposable elements (TEs) identified by RepeatMasker (Smit et al. 2004) against Repbase (Jurka et al. 2005); TE proteins: TEs identified by RepeatProteinMask (Smit et al. 2004) against Repbase; De novo: repeats identified by de novo prediction; Combined TEs: combined results of RepBase TEs, TE proteins, and de novo repeats.

Type      RepBase TEs              TE Proteins              De novo                  Combined TEs
          Length (bp)   % genome   Length (bp)   % genome   Length (bp)   % genome   Length (bp)   % genome
DNA       39,773,928    7.77       5,062,498     0.99       36,144,170    7.06       62,047,493    12.12
LINE      7,046,284     1.38       5,116,554     1.00       8,491,177     1.66       14,995,897    2.93
LTR       145,689,828   28.45      52,537,852    10.26      208,790,537   40.77      220,102,125   42.98
SINE      137,735       0.03       0             0.00       183,888       0.04       301,828       0.06
Other     1,703         0.00       0             0.00       0             0.00       1,703         0.00
Unknown   0             0.00       12,795        0.00       4,283,753     0.84       4,296,548     0.84
Total     190,708,350   37.24      62,720,884    12.25      249,240,782   48.67      271,937,641   53.10
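The Combined TEs column is smaller than the sum of the other three because regions found by more than one method are counted only once. A minimal sketch of that non-redundant merge, assuming each source is given as half-open (start, end) intervals per sequence (a hypothetical input format), is shown below; % in genome is then the merged length divided by the assembly length.

```python
def merge_intervals(intervals):
    """Merge half-open [start, end) intervals on one sequence."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # overlapping or adjacent: extend
        else:
            merged.append([start, end])
    return merged

def combined_length(*sources):
    """Non-redundant bp covered by several annotation sources.

    Each source maps sequence id -> list of (start, end) intervals.
    """
    per_seq = {}
    for source in sources:
        for seq_id, intervals in source.items():
            per_seq.setdefault(seq_id, []).extend(intervals)
    return sum(end - start
               for intervals in per_seq.values()
               for start, end in merge_intervals(intervals))

def percent_in_genome(length_bp: int, genome_bp: int) -> float:
    return 100.0 * length_bp / genome_bp
```

For example, a RepBase hit at 0-100 and a de novo hit at 50-150 on the same scaffold contribute 150 bp to the combined total, not 200 bp.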



Supplementary Table 5. Repeat content of the pear genome by repeat class and family. The classification follows that of the RepeatMasker (Smit et al. 2004) program.

Type                          Size (bp)     Percent (%)
DNA/Academ                    6,123         0.00120
DNA/CMC-Chapaev               61,034        0.01192
DNA/CMC-Chapaev-3             6,164         0.00120
DNA/CMC-EnSpm                 5,608,352     1.09546
DNA/CMC-Transib               56,761        0.01109
DNA/DNA                       19,622,007    3.83271
DNA/En-Spm                    1,994,298     0.38954
DNA/Ginger                    196,541       0.03839
DNA/Harbinger                 3,646,458     0.71225
DNA/Helitron                  3,873,651     0.75663
DNA/IS                        60,029        0.01173
DNA/IS4EU                     1,299         0.00025
DNA/Kolobok-Hydra             61,922        0.01210
DNA/Kolobok-T2                24,388        0.00476
DNA/MULE-F                    56            0.00001
DNA/MULE-MuDR                 2,812,256     0.54931
DNA/MULE-NOF                  345           0.00007
DNA/Maverick                  193,396       0.03778
DNA/MuDR                      1,070,695     0.20914
DNA/NOF                       402           0.00008
DNA/Novosib                   128,149       0.02503
DNA/P                         63,160        0.01234
DNA/PIF-Harbinger             13,681,892    2.67245
DNA/PIF-ISL2EU                57            0.00001
DNA/PiggyBac                  18,892        0.00369
DNA/Sola                      303,751       0.05933
DNA/TcMar                     42,519        0.00831
DNA/TcMar-Ant1                615           0.00012
DNA/TcMar-Fot1                45,426        0.00887
DNA/TcMar-ISRm11              7,020         0.00137
DNA/TcMar-Marin               2,631         0.00051
DNA/TcMar-Mariner             2,183         0.00043
DNA/TcMar-Pogo                32,881        0.00642
DNA/TcMar-Sagan               143           0.00003
DNA/TcMar-Stowaway            7,194         0.00141
DNA/TcMar-Tc1                 10,513        0.00205
DNA/TcMar-Tc2                 100           0.00002
DNA/TcMar-Tc4                 38            0.00001
DNA/TcMar-Tigger              369           0.00007
DNA/TcMar-m44                 237           0.00005
DNA/Zator                     31,807        0.00621
DNA/hAT                       344,494       0.06729
DNA/hAT-Ac                    10,510,482    2.05298
DNA/hAT-Blackjack             1,983         0.00039
DNA/hAT-Charlie               171,762       0.03355
DNA/hAT-Pegasus               2,957         0.00058
DNA/hAT-Restles               1,590         0.00031
DNA/hAT-Restless              43            0.00001
DNA/hAT-Tag1                  2,689,092     0.52525
DNA/hAT-Tip100                3,298,090     0.64421
DNA/hAT-Tol2                  366           0.00007
DNA/hAT-hAT5                  1,108         0.00022
DNA/hAT-hATm                  20,448        0.00399
DNA/hAT-hATw                  3,571         0.00070
DNA/hAT-hATx                  333           0.00007
DNA/hAT-hobo                  153           0.00003
LINE/Ambal                    1,090         0.00021
LINE/CR1                      16,635        0.00325
LINE/CR1-L2                   735           0.00014
LINE/CR1-Zenon                102           0.00002
LINE/CRE                      1,031         0.00020
LINE/DRE                      3,516         0.00069
LINE/Dong-R4                  2,085         0.00041
LINE/I                        6,326         0.00124
LINE/I-Nimb                   1,719         0.00034
LINE/Jockey                   68,591        0.01340
LINE/L1                       10,075,654    1.96805
LINE/L1-Tx1                   19,055        0.00372
LINE/L2                       267,363       0.05222
LINE/LINE                     1,185         0.00023
LINE/LOA                      2,062         0.00040
LINE/Penelope                 89,787        0.01754
LINE/Proto1                   572           0.00011
LINE/Proto2                   3,534         0.00069
LINE/R1                       46,955        0.00917
LINE/R2                       45,308        0.00885
LINE/R2-Hero                  53            0.00001
LINE/RTE                      743           0.00015
LINE/RTE-BovB                 4,408,634     0.86113
LINE/RTE-RTE                  18,677        0.00365
LINE/RTE-X                    3,955         0.00077
LINE/Rex-Babar                6,000         0.00117
LINE/Tad1                     3,419         0.00067
LINE/Zorro                    102           0.00002
LTR/Caulimoviru               1,591,831     0.31093
LTR/Caulimovirus              2,625,049     0.51274
LTR/Copia                     86,429,855    16.88210
LTR/DIRS                      8,222         0.00161
LTR/ERV1                      274,548       0.05363
LTR/ERVK                      127,235       0.02485
LTR/ERVL                      7,243         0.00141
LTR/ERVL-MaLR                 116           0.00002
LTR/Foamy                     7,617         0.00149
LTR/Gypsy                     130,449,009   25.48024
LTR/Gypsy-Cigr                65,613        0.01282
LTR/Gypsy-Gmr1                2,202         0.00043
LTR/Gypsy-Troyk               1,290         0.00025
LTR/Gypsy-Troyka              45            0.00001
LTR/LTR                       38,166,045    7.45487
LTR/Lenti                     1,392         0.00027
LTR/Ngaro                     9,849         0.00192
LTR/Pao                       111,660       0.02181
Other/Composite               540           0.00011
Other/DNA_virus               1,029         0.00020
Other/Other                   134           0.00003
SINE/5S                       1,193         0.00023
SINE/Alu                      357           0.00007
SINE/B2                       688           0.00013
SINE/B4                       8,002         0.00156
SINE/C                        9,293         0.00182
SINE/Deu                      210           0.00004
SINE/ID                       66,227        0.01294
SINE/MIR                      115           0.00002
SINE/Mermaid                  64            0.00001
SINE/SINE                     208,975       0.04082
SINE/Salmon                   517           0.00010
SINE/V                        78            0.00002
SINE/tRNA-7SL                 1,733         0.00034
SINE/tRNA-CR1                 184           0.00004
SINE/tRNA-Glu                 296           0.00006
SINE/tRNA-Lys                 1,805         0.00035
SINE/tRNA-RTE                 2,440         0.00048
Satellite/Satellite           350,647       0.06849
Simple_repeat/Simple_repeat   1,131,844     0.22108
Unknown/Unknown               4,296,548     0.83923
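Rows in Supplementary Table 5 are labelled Class/Family, so class-level totals can be obtained by grouping on the prefix before the slash, and percentages by dividing by the assembly length (the table's own size/percent ratios imply roughly 512 Mb, an inference rather than a stated value). The helper below is a hypothetical sketch; because annotations from different families can overlap, such sums need not match the non-redundant totals in Supplementary Table 4.

```python
from collections import defaultdict

def summarize_by_class(rows, genome_bp):
    """Aggregate (Class/Family, size_bp) rows into {class: (size_bp, % of genome)}."""
    totals = defaultdict(int)
    for label, size_bp in rows:
        repeat_class = label.split("/", 1)[0]  # "LTR/Copia" -> "LTR"
        totals[repeat_class] += size_bp
    return {cls: (size, 100.0 * size / genome_bp) for cls, size in totals.items()}

rows = [("LTR/Copia", 86_429_855), ("LTR/Gypsy", 130_449_009), ("DNA/hAT-Ac", 10_510_482)]
for cls, (size, pct) in summarize_by_class(rows, genome_bp=512_000_000).items():
    print(f"{cls}: {size:,} bp ({pct:.2f}%)")
```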
