Articles https://doi.org/10.1038/s41477-019-0452-6 Musa balbisiana genome reveals subgenome evolution and functional divergence Zhuo Wang1,12, Hongxia Miao1,12, Juhua Liu1,2,12, Biyu Xu1,12, Xiaoming Yao3,12, Chunyan Xu3,12, Shancen Zhao4, Xiaodong Fang3, Caihong Jia1, Jingyi Wang1, Jianbin Zhang1, Jingyang Li2, Yi Xu2, Jiashui Wang2, Weihong Ma2, Zhangyan Wu3, Lili Yu3, Yulan Yang3, Chun Liu3, Yu Guo3, Silong Sun3, Franc-Christophe Baurens5,6, Guillaume Martin   5,6, Frederic Salmon6,7, Olivier Garsmeur5,6, Nabila Yahiaoui5,6, Catherine Hervouet5,6, Mathieu Rouard   8, Nathalie Laboureau9,10, Remy Habas9,10, Sebastien Ricci6,7, Ming Peng1, Anping Guo1, Jianghui Xie1, Yin Li   11, Zehong Ding1, Yan Yan1, Weiwei Tie1, Angélique D’Hont   5,6*, Wei Hu   1* and Zhiqiang Jin   1,2* Banana cultivars (Musa ssp.) are diploid, triploid and tetraploid hybrids derived from Musa acuminata and Musa balbisiana. We presented a high-quality draft genome assembly of M. balbisiana with 430 Mb (87%) assembled into 11 chromosomes. We identified that the recent divergence of M. acuminata (A-genome) and M. balbisiana (B-genome) occurred after lineage- specific whole-genome duplication, and that the B-genome may be more sensitive to the fractionation process compared to the A-genome. Homoeologous exchanges occurred frequently between A- and B-subgenomes in allopolyploids. Genomic varia- tion within progenitors resulted in functional divergence of subgenomes. Global homoeologue expression dominance occurred between subgenomes of the allotriploid. Gene families related to ethylene biosynthesis and starch metabolism exhibited signif- icant expansion at the pathway level and wide homoeologue expression dominance in the B-subgenome of the allotriploid. The independent origin of 1-aminocyclopropane-1-carboxylic acid oxidase (ACO) homoeologue gene pairs and tandem duplication- driven expansion of ACO genes in the B-subgenome contributed to rapid and major ethylene production post-harvest in allotrip- loid banana fruits. The findings of this study provide greater context for understanding fruit biology, and aid the development of tools for breeding optimal banana cultivars. Bananas (Musa ssp.) are large herbaceous plants that are peren- Musa acuminata (A-genome, 2n = 2x = 22; n represents the haploid nial but monocarpic. They originated in Southeast Asia and the chromosome number), Musa balbisiana (B-genome, 2n = 2x = 22), Western Pacific and were one of the first crops to be domesti- Musa schizocarpa (S-genome, 2n = 2x = 22) and Australimusa spe- cated, about 7,000 years ago, in Southeast Asia1,2. Bananas are widely cies (T-genome, 2n = 2x = 20)2. The majority of edible cultivated distributed throughout the tropics and subtropics, where they are a bananas originated from intraspecific or interspecific hybridization staple food and fruit for millions of people2,3. Moreover, bananas are between wild diploid M. acuminata (A-genome) and M. balbisiana one of the major export commodities of several developing coun- (B-genome) species. Combinations of these A- and B-genomes have tries and represent the largest international trade in fruit3,4. Thus, resulted in various genotypes of cultivated edible bananas, includ- bananas are an essential food resource and have important socio- ing diploid (AA, BB and AB), triploid (AAA, AAB and ABB) and economic and ecological roles. tetraploid (AAAB, AABB, ABBB) variants6. The triploid genotype The genus Musa belongs to the monocotyledon Musaceae family variants constitute the predominant cultivated varieties that are along with the genus Ensete. Its wild species have traditionally been planted worldwide. subdivided into four sections: Eumusa (x = 11; x represents the num- Genome sequencing of the A-genome banana has provided ber of chromosomes), Rhodochlamys (x = 11), Australimusa (x = 10) insights into the evolution of monocotyledonous plants1,9. Although and Callimusa (x = 9 or 10)5–7, and refined recently to two sections the A-genome represents a crucial step in the genetic improvement in which the Rhodochlamys and Australimusa were merged into the of banana, the lack of a high-quality B-genome sequence greatly Eumusa and Callimusa, respectively8. Most edible bananas belong to hinders germplasm characterization and the molecular breeding the Eumusa (or Musa) section, and are categorized into the dessert of banana. A draft B-genome has previously been reported, but or cooking group based on their usage. Furthermore, bananas of exhibited low quality, based on assembly and annotation via map- this section are distinguished based on their genetic background as ping reads to the A-genome2. Here, we sequenced the genome of the 1Key Laboratory of Biology and Genetic Resources of Tropical Crops, Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, Haikou, China. 2Key Laboratory of Genetic Improvement of Bananas, Hainan province, Haikou Experimental Station, China Academy of Tropical Agricultural Sciences, Haikou, China. 3BGI Genomics, BGI-Shenzhen, Shenzhen, China. 4BGI Institute of Applied Agriculture, BGI-Shenzhen, Shenzhen, China. 5CIRAD, UMR AGAP, Montpellier, France. 6AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France. 7CIRAD, UMR AGAP, Guadeloupe, France. 8Bioversity International, Montpellier, France. 9CIRAD, UMR BGPI, Montpellier, France. 10BGPI, CIRAD, INRA, Montpellier SupAgro, Montpellier, France. 11Waksman Institute of Microbiology, Rutgers, The State University of New Jersey, Piscataway, NJ, USA. 12These authors contributed equally: Zhuo Wang, Hongxia Miao, Juhua Liu, Biyu Xu, Xiaoming Yao, Chunyan Xu. *e-mail: dhont@cirad.fr; huwei2010916@126.com; 18689846976@163.com 810 NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants Nature PlaNts Articles double haploid of the wild diploid genotype Pisang Klutuk Wulung single-copy orthologous genes indicated a very recent diver- (DH-PKW, 2n = 2x = 22), belonging to the species M. balbisiana that gence time of about 5.4 MYA for the A- and B-genomes, with contributed the B-subgenome to cultivated allotriploid bananas. We their common ancestor having diverged from Poaceae ~134 MYA further compared the B- and A-genomes to investigate subgenome (Supplementary Fig. 5). This estimate is consistent with a diver- evolution, genetic diversity and the functional divergence of sub- gence time of 4.6 MYA between the A- and B-genomes, which genomes in polyploid bananas. Our analyses provide insights into was estimated based on 17 bacterial artificial chromosome (BAC) the evolution and regulation of fruit-ripening processes in bananas. clones that contained 23.5 Kb of coding sequence18. Our estimate In particular, the results highlight a significant contribution of the is more recent than the 20.9 MYA (estimated by three genes and B-genome towards ethylene biosynthesis and starch metabolism one internal transcribed spacer region) or the 27.9 MYA (estimated during fruit ripening. by 19F genes)19,20. However, the increased sampling of informative characters in our genome-wide study for phylogenetic analyses Results should contribute to a more accurate divergence time estimation. Genome assembly and annotation. To reduce heterozygosity, Three rounds of whole-genome duplication (WGD) events (α-, we used the DH-PKW genotype for our genome sequencing and β- and γ-WGD) have occurred in the Musa lineage1, which was assembly10. A total of 58.99 gigabases (Gb) (113×) of PacBio single- validated by our four-fold synonymous third-codon transversion molecule long reads and 86.34 Gb (166×) Illumina paired-end and (4dTv) analysis (Supplementary Fig. 6). WGD is frequently, and mate-pair reads were used for assembly (Supplementary Table 1), almost always, followed by diploidization and fractionation, which producing 492.77 megabases (Mb) of scaffolds. The contig N50 and includes chromosome rearrangement, gene loss and biased reten- scaffold N50 of the final assembly were 1.83 and 5.05 Mb, respec- tion21. After diploidization and divergence, A- and B-genomes start tively (Supplementary Table 2). K-mer analysis suggested that the to evolve independently. To investigate the evolutionary differences draft assembly covers approximately 95% of the genome size of between the A- and B-genomes after divergence, we performed DH-PKW (Supplementary Fig. 1). We further evaluated the com- three types of comparative analysis: (1) assessment of gene family pleteness of the scaffold assembly using the BUSCO (v.3) plants expansion/contraction between A- and B-genomes by comparison datasets11. Precise exon placement of 91.3% of the total 1,440 single- to 14 other plant genomes; (2) comparison of structural variation in copy orthologue groups in the embryophyta dataset was identified the A- and B-genomes by synteny analysis; and (3) comparison of in the B-genome assembly. To anchor the scaffolds to chromo- synteny between the ancestral syntenic block and the A/B-genomes. somes, we constructed high-throughput chromosome conforma- We analysed the gene family clustering and expansion/contraction tion capture (Hi-C) libraries of DH-PKW, generating 72 Gb (138×) of banana genomes of M. acuminata (A-genome) and M. balbisiana Hi-C pair-end reads (Supplementary Table 1). Duplicate removal, (B-genome), compared to 14 other plant genomes using OrthoMCL sorting and quality assessment were performed with HiC-Pro12, and CAFE22,23 (Supplementary Table 7 and Supplementary Fig. 7). and uniquely mapped valid reads were used for Hi-C scaffolding by We found 9,038 gene families that were conserved in M. balbisiana, LACHESIS software13 (Supplementary Fig. 2). As a result, 430 Mb M. acuminata, Oryza sativa, Brachypodium distachyon and Vitis (87.27%) of the assembly and 94.0% of the genes were placed vinifera. In contrast, 348 and 639 gene families were specific to the on 11 chromosome groups (Fig. 1 and Supplementary Table 3). A- and B-genome, respectively (Supplementary Fig. 8). After their The 11 pseudo-molecules were named in accordance with the divergence, 1,761 gene families were expanded and 203 were con- M. acuminata (A-genome) reference sequence1,9. tracted in the A-genome, while 392 gene families were expanded About 55.75% of the B-genome assembly was composed of repet- and 1,008 contracted in the B-genome (Supplementary Fig. 9 and itive sequences, which is higher than the 41.85% of the A-genome Supplementary Tables 8 and 9). Kyoto Encyclopedia of Genes and assembly (Supplementary Table 4). This may be due to the span- Genomes (KEGG) pathway enrichment analysis of the significantly ning of repetitive regions by long reads14. Long terminal-repeat expanded gene families (P < 0.05) in the B-genome suggested that it (LTR) retrotransposons represented the most abundant fraction of was enriched in the photosynthesis and biosynthesis of secondary transposable elements in the A- and B-genomes, among which the metabolite pathways, including those associated with the metabo- families Gypsy and Copia accounted for 12.88 and 28.04% of the lism of inositol, starch and sucrose, linoleic acid and arachidonic B-genome, respectively. DNA transposons comprised 2.12% of the acid (Supplementary Fig. 10 and Supplementary Table 10). Plants B-genome and 2.03% of the A-genome (Supplementary Table 4). produce a high diversity of secondary metabolites with prominent LTR retrotransposons tended to accumulate near the centromeric functions in defence against a variety of herbivores and pathogens, and pericentromeric regions (Fig. 1). Active insertions of LTR ret- in addition to the mitigation of various types of abiotic stresses24. rotransposons occurred more recently in the B-genome (peak at Thus, these observations are consistent with the association of the 0–0.5 million years ago (MYA)) relative to the A-genome (peak at B-genome with improved vigour and tolerance to both biotic and 1.5–2.0 MYA) after their divergence (Supplementary Fig. 3). Both abiotic stresses2. genomes experienced a wave of LTR retrotransposon amplification Synteny analysis indicated a high level of genomic colinearity and but differed in insertion histories. sequence similarity between the A- and B-genomes (Supplementary We annotated the genome using the MAKER pipeline15 incorpo- Fig. 11). We identified 72 large syntenic blocks between the A- and rating ab initio predictions, homologous proteins and transcriptome B-genomes, including 15 large blocks each containing over 900 gene data from six samples, resulting in 35,148 protein-coding genes in the pairs. These 72 syntenic blocks comprised 75.02% of A-genome space B-genome (Supplementary Table 5). Of these, 33,137 (94.27%) were (containing 23% transposable elements) and 68.01% of B-genome located on the 11 pseudo-chromosomes (Supplementary Table 3). space (containing 22% transposable elements) (Supplementary Overall, 86% of the genes transcribed in our RNA sequence (RNA- Table 11). We also identified two large translocations and two inver- Seq) analysis. Additionally, we identified 3,329 transcription factors sions between the A- and B-genome after their divergence. One large among 88 families in the B-genome using the iTAK programme16 reciprocal translocation comprises 7.09 Mb on chromosome (chr)1 (Supplementary Table 6). of the B-genome and 7.03 Mb on chr:3 of the A-genome, and one large inversion of 9.39 Mb on chr 5 of the B-genome and 8.83 Mb The B-genome is more sensitive to fractionation than the on chr 5 of the A-genome (Fig. 1 and Supplementary Fig. 11). These A-genome. Compared to other monocots, the Musa lineage exhib- translocations and inversions were also supported by the rearrange- its a relatively slower evolutionary rate as demonstrated previ- ments based on genetic mapping25, and can serve to introduce novel ously17 (Supplementary Fig. 4). Phylogenetic analyses based on 519 genetic diversity into the A- and B-genomes26. NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants 811 Articles Nature PlaNts Gypsy Copia Orthologous genes Genes ACHR01 ACHR02 BCHR01 ACH 3 R 0 03 20 BCHR02 20 3010 40 10 30 20 20 30 10 40 40 10 30 20 2 30 0 10 10 40 20 30 30 20 10 10 20 30 30 20 40 10 10 40 20 30 30 B R10 4 C 0 HR ACH 09 H CHR11 BC R1 A 0 BCHR11 Fig. 1 | Characterization of M. balbisiana (B-genome) and M. acuminata (A-genome) chromosomes. Elements are arranged in the following scheme (from outer to inner). (1) Distribution of Gypsy elements (non-overlapping, window size, 50 kb); (2) distribution of Copia elements (non-overlapping, window size, 50 kb); (3) distribution of orthologous gene pairs between two genomes (non-overlapping, window size, 50 kb); (4) gene density (non-overlapping, window size, 50 kb); (5) syntenic relationships between A- and B-genomes. The connecting blue lines represent alignment blocks, red lines represent inversions, green lines represent translocations and grey lines show small blocks with<30 gene pairs. Previously, 12 Musa ancestral blocks were assembled that repre- (gene loss) and diploidization processes had occurred extensively sented the ancestral genome before α/β-WGD1,27,28. We identified after WGD events in the Musa lineage (Supplementary Table 12). 97 syntenic blocks resulting from α/β-WGD events in the B-genome Taken together, our results indicated that the B-genome exhibited by comparison to the 12 Musa ancestral blocks. These blocks con- less expansion and more contraction of gene families, less syntenic tained 24,639 genes and represented 70.10% of all gene models coverage ratio and a higher singleton ratio in the ancestral blocks involved in WGD-driven regions. We also identified 100 syntenic compared to the A-genome after divergence. Cycles of WGD fol- blocks that contained 26,780 genes (75.61%) in the A-genome lowed by diploidization and fractionation have occurred across land (Supplementary Table 12). Of these ancestral Musa α/β blocks, plants, and are important in determining chromosome structure and 56.89% (15,236) and 60.59% (14,930) were singletons in the A- gene content. Consequently, these processes have significantly con- and B-genome, respectively, suggesting that genome fractionation tributed to the evolutionary success of plants21,29. The diploidization 812 NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants BCHR05 BCHR0 2 40 B 1 C0 HR03 30 20 10 40 30 20 10 20 10 20 10 20 10 20 30 ACH 1R 009 ACHR08 ACHR07 ACHR0 6 AC HR 05 ACHR04 10 20 30 10 20 10 40 30 20 10 30 20 08 10 BCHR CH R07 B BCHR06 Nature PlaNts Articles a FenJiao b Kamaramasenge c Pelipita 30 30 30 10 28 Mb chr01 10 28 Mb 10 chr01 10 28 Mb chr01 10 48 Mb 10 30 48 Mb 30 30 48 Mb 30 30 30 10 29 Mb 29 Mb chr02 10 29 Mb chr02 10 10 chr02 10 32 Mb 32 Mb 10 32 Mb 30 30 30 30 30 30 10 34 Mb 34 Mb chr03 10 34 Mb chr03 10 10 chr03 10 30 Mb 30 Mb 10 30 Mb 30 30 30 30 30 30 10 37 Mb 37 Mb 37 Mb chr04 chr04 10 10 10 chr04 10 42 Mb 42 Mb 10 42 Mb 30 30 30 30 30 30 10 41 Mb 41 Mb chr05 10 41 Mb chr05 10 10 chr05 10 42 Mb 42 Mb 10 42 Mb 30 30 30 30 30 30 10 37 Mb 10 37 Mb 37 Mb chr06 10 chr06 10 chr06 10 41 Mb 41 Mb 10 41 Mb 30 30 30 30 30 30 10 34 Mb 34 Mb 10 34 Mb chr07 chr07 10 10 10 chr07 36 Mb 36 Mb 10 36 Mb 30 30 30 30 44 Mb 30 44 Mb 30 10 44 Mb chr08 chr08 10 10 10 chr08 10 10 30 45 Mb 30 45 Mb 30 45 Mb 30 10 41 Mb 30 30 41 Mb 41 Mb chr09 chr09 10 10 10 chr09 10 37 Mb 37 Mb 10 37 Mb 30 30 30 30 30 30 10 37 Mb 10 37 Mb 10 37 Mb chr10 10 chr10 10 chr10 42 Mb 42 Mb 10 30 42 Mb 30 30 30 10 27 Mb 30 30 27 Mb 27 Mb chr11 chr11 10 10 29 Mb 10 chr11 10 30 29 Mb 10 29 Mb 30 30 Fig. 2 | Coverage depth and genome structure summary for three allotriploid banana accessions. a–c, Chromosome coverage and structure for accessions FenJiao (genome group, ABB) (a), Kamaramasenge (genome group, AAB) (b) and Pelipita (genome group, ABB) (c) with 100 kb non-overlapping sliding windows. The upper red bar and lower blue bar represent coverage depth of the A- and B-subgenome, respectively. and fractionation processes involve a series of evolutionary events, A-subgenome were replaced by the B-subgenome and there were including repetitive DNA loss, chromosome rearrangements and 18 segmental homoeologous exchanges on chromosomes 6, 9 and 10 complex patterns of gene loss29,30. The above evidence supports the (Fig. 2c and Supplementary Table 15). This indicates the eight A and hypothesis that the B-genome was more sensitive to fractionation 25 B chromosome constitution of Pelipita, consistent with previous than the A-genome after their divergence. genomic in  situ hybridization studies33. This classification is fur- ther supported by phylogenetic analyses based on genotyping data Genetic diversity in polyploid bananas and functional divergence (Supplementary Fig. 12). A total of 18,475,661 single-nucleotide of subgenomes. Polyploid species usually exhibit vigorous growth, polymorphisms (SNPs), 1,425,391 small insertions and deletions including high-quality production and high fitness31. Most banana and 220,452 structural variations were identified in the samples cultivars are polyploid and exhibit various levels of ploidy and (Supplementary Tables 16–18). Analysis of gene and SNP density on genomic background32. The genetic classification of some bananas the chromosomes indicated that SNPs were preferentially located on is discordant, as is the case for Pelipita. Previous studies have shown non-gene-rich regions (Supplementary Fig. 13). There were ~2.5- that its karyotype comprises eight A and 25 B chromosomes as fold SNPs on the A-genome of Pisang_Mas and Pisang_Kra com- opposed to the predicted 11 A and 22 B chromosome distribution33. pared to the B-genome of Balbisiana (Supplementary Table 16). The Understanding the genetic diversity and genomic constitution of nucleotide diversity (π,0.0059) of A-subgenomes was higher than Musa accessions would inform genomic group classifications, in that of the B-subgenomes (0.0031) in accessions Fenjiao, Pelipita addition to conservation and breeding strategies. Therefore, we re- and Kamaramasenge. sequenced five triploid bananas and four diploid bananas to investi- Gene family expansion and contraction analysis of M. acumi- gate their genetic diversity (Supplementary Table 13). nata and M. balbisiana in comparison to other sequenced genomes Simultaneous alignment of re-sequencing data to the A- and indicated that there are 83 gene families significantly expanded in B-subgenomes identified the uniquely mapped reads that were the A-genome (and conversely contracted in the B-genome). These used to analyse coverage depth, variations calling and homoeolo- families included plant–pathogen interactions, glycosphingolipid gous exchanges on each chromosome (Supplementary Table 14). biosynthesis-ganglio series and glycosaminoglycan degradation Homoeologous exchanges were characterized by read coverage pathways among others (Supplementary Table 19). Conversely, that showed a chromosomal region with a duplicated copy from 33 gene families were significantly expanded in the B-genome the corresponding homoeologous subgenome34. These analyses (and contracted in the A-genome). These families included those confirmed that genome constitutions for the banana accessions involved in photosynthesis, metabolic pathways and ribosome, were, in most cases, consistent with previous genome group classi- among others (Supplementary Table 20). This indicates that the A- fications based on morphological traits. We identified 48 segmental and B-genomes may have functionally diverged at the genome level homoeologous exchanges in the accession FenJiao, including nine during their respective genome evolution. from the B- to the A-subgenome and 39 in the reverse direction To explore the transcription of allopolyploid subgenomes, we (Fig. 2a and Supplementary Table 15). We also found four segmen- assessed the expression of homoeologue genes from the A- and tal homoeologous exchanges from the B- to the A-subgenome in the B-subgenomes of the triploid FenJiao. Expression levels were mea- accession Kamaramasenge, and replacement of chromosome 10 of sured within different tissues, at different stages of fruit development the B-subgenome by the A-subgenome. (Fig. 2b and Supplementary and ripening and in banana seedlings after abiotic stress treat- Table 15). For the accession Pelipita, chromosomes 2, 7 and 11 of the ments (Supplementary Table 21). A total of 25,717 homoeologous NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants 813 Articles Nature PlaNts gene pairs were identified between the A- and B-subgenomes based five gene pairs were dominantly expressed in the A-subgenome of on the gene alignment method with cumulative identity percent- FJ (Fig. 3b and Supplementary Tables 33 and 34). The dominant age (CIP) ≥ 60% and cumulative alignment length percentage expression of these genes in various tissues may be related to the (CALP) ≥ 60% (refs. 35,36). Among the homoeologue gene pairs, fundamental role of ethylene biosynthesis. 81.83% were further supported by syntenic analysis (Supplementary Both ACS and ACO have previously been demonstrated as Table 22). Expression of all homoeologue gene pairs was assessed to limiting enzymes within ethylene biosynthesis41. The expression determine the distribution of expression fold change of B/A in the abundance of MA-ACS1 (AB021906) and MA-ACO1 (X91076) is triploid ‘FJ’. The log2(RPKM B/RPKM A) was 1.2/1, where RPKM consistent with ethylene production during the post-harvest banana- is reads per kilobase million, which differed from the genomic ripening period43. Of the ten ACS gene pairs, MaACS7/MbACS7, constitution value of 2/1 (Supplementary Fig. 14). This result which is a homologue of MA-ACS1, exhibited high expression could be explained by dosage compensation, wherein the triploid levels during fruit ripening and was dominantly expressed in the expression levels are reduced to a diploid state37,38. We further char- B-genome (Fig. 3b). MbACS6 and MbACS7 are paralogous in a acterized 1,075 and 4,032 homoeologue gene pairs that showed large syntenic block (the block contains 19 gene pairs), and main- expression dominance in the A- and B-subgenome, respectively tain synteny and close evolutionary relationships to MaACS6 and (Supplementary Table 23). KEGG enrichment analysis indicated MaACS7, respectively, suggesting that these genes duplicated that genes with expression dominance in the B-subgenome were from WGD (Fig. 3c and Supplementary Fig. 18). Of the nine ACO associated with 2-oxocarboxylic acid metabolism and the arginine gene pairs, three (MaACO2/MbACO6, MaACO3/MbACO7 and biosynthesis pathway (q-value < 0.05) (Supplementary Table 24), MaACO8/MbACO13) exhibited high expression levels during whereas those showing expression dominance in the A-subgenome fruit ripening and were dominantly expressed in the B-subgenome were not significantly enriched in KEGG pathways. (Fig. 3b). MaACO2/MbACO6 and MaACO3/MbACO7 are Non-synonymous/synonymous substitution (Ka/Ks) ratios in the syntenic block of chr:5 and chr:6 between the A- and were calculated for all homoeologue gene pairs between the A- and B-genome, respectively, and belong to the same phylogenetic clade B-subgenomes. The Ka/Ks ratios of genes with expression domi- (Fig. 3d,e and Supplementary Table 35). These results indicated that nance in the A-subgenome (median, 0.157) were slightly lower MaACO2/MbACO6 and MaACO3/MbACO7 originated indepen- than those in the B-subgenome (median, 0.196) and non-dominant dently and developed crucial functions during fruit ripening. genes (median, 0.186) (Supplementary Fig. 15). We also observed high expression levels (log2RPKM > 4 in at We then constructed a gene co-expression network for those least one stage of fruit ripening in the B-genome) of homoeologue genes with expression dominance using weighted gene co-expres- gene pairs, probably related to fruit softening (pectin methylester- sion network analysis (WGCNA)39. The results indicated that 87 ases, galactosidases, expansions and pectatelyase)44, cell wall modi- and 295 genes with dominance expression interacted with 4,302 and fication (xyloglucan endotransglucosylase/hydrolases, fasciclin-like 4,612 genes in the A- and B-subgenome, respectively. KEGG path- arabinogalactan proteins and β-d-xylosidase)45,46 and aroma produc- way enrichment analysis suggested that genes in the co-expression tion (alcohol dehydrogenases)40. These genes are closely involved in network of the A- and B-subgenomes were commonly associated fruit ripening and are regulated by ethylene44. Almost all of these with starch and sucrose metabolism (ko00500) and other metabolic gene pairs showed expression dominance in the B-subgenome of FJ pathways. In particular, ubiquinone and other terpenoid-quinone during fruit ripening, which is the same as the dominance expres- biosynthesis, photosynthesis–antenna proteins, carotenoid biosyn- sion of gene pairs related to ethylene biosynthesis (Supplementary thesis and other glycan degradation pathways were specially enriched Fig. 19 and Supplementary Tables 36 and 37). This co-dominance of in the A-subgenome (q-value < 0.05) (Supplementary Fig. 16 and homoeologue gene pairs in the B-subgenome further supports the Supplementary Table 25). In contrast, selenocompound metabo- significant contribution of the B-genome to ethylene biosynthesis lism and cyanoamino acid metabolism pathways were particularly and fruit ripening. enriched in the B-subgenome (q-value < 0.05) (Supplementary Gene duplication is a major mechanism that generates new Fig. 17 and Supplementary Table 26). Overall, these results further genetic diversity as a basis for evolutionary innovation in eukary- support the hypothesis of functional divergence between the A- and otes47. Compared to the 11 ACO genes within the A-genome, the B-genomes at the transcriptional level. ACO genes in the B-genome expanded significantly to 18 members. The expansion of ACO genes, including MbACO2, -3, -4, -5 in chr 3, Expression dominance of homoeologue gene pairs in the ethyl- MbACO8, -9, -11 in chr 6 and MbACO16, -18 in scaffolds, was ene biosynthesis pathway and expansion of the ACO family affect driven by tandem duplications in the B-genome (Fig. 3b,d). Of the fruit ripening. Ethylene plays a key role in the regulation of climac- ACO genes that were expanded in the B-genome, MbACO2 and teric fruit ripening post-harvest40. The core enzymatic steps in the MbACO3 showed strong expression levels during the fruit-ripen- ethylene biosynthesis pathways are well characterized, and include ing stages with log2RPKM > 11 at 6 days post-harvest (DPH) in FJ, S-adenosyl-l-methionine synthase (SAMS), 1-aminocyclopropane- which is coincident with the ethylene climacteric period (Fig. 3b,f 1-carboxylic acid synthase (ACS) and ACO41,42 (Fig. 3a). However, and Supplementary Table 38). In addition, MbACO8, -9, -11, -16, the expansion and expression dominance of these homoeologue -18 exhibited high expression levels in roots and fruits at 0 days after gene pairs during polyploid fruit ripening remains largely unknown. flowering (DAF) (Fig. 3b and Supplementary Table 38). These genes We identified 12 SAMS, 11 ACS and 11 ACO genes from belonged to the same cluster and their expression patterns were the A-genome and 10 SAMS, 11 ACS and 18 ACO genes in the highly concordant with their duplication (Fig. 3d,e), suggesting that B-genome, which represents a significant expansion compared the expansion and evolution of ACO genes in the B-genome con- to the seven other sequenced plant species among the mono- tributed to tissue development and fruit ripening. cots and eudicots22 (Supplementary Tables 27 and 28). We fur- Ripening of FJ was more rapid than BX during post-harvest rip- ther characterized 28 homoeologue gene pairs from the A- and ening. BX required 8 and 14 DPH to reach the more-green-than- B-genomes (Supplementary Table 29). These gene pairs displayed yellow and full-yellow stages, respectively, whereas FJ required similar expression profiles in the BaXiJiao (Musa AAA group, cv. 3 and 6 DPH to reach these stages, respectively (Supplementary Cavendish, BX), the A-subgenome of FenJiao (Musa ABB group, Fig. 20). The dominant expression of ethylene biosynthesis and fruit cv Pisang Awak, FJ) and the B-subgenome of FJ (Fig. 3b and ripening-related gene pairs and the expansion of the ACO family in Supplementary Tables 30–32). Interestingly, eight gene pairs exhib- the B-genome could have contributed to increased ethylene produc- ited homoeologue expression dominance in the B-subgenome and tion and faster fruit ripening in FJ compared to BX. 814 NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants Nature PlaNts Articles a b Methionine SAMS MaSAMS1 MaSAMS2 * MaSAMS1 MbSAMS1 MaSAMS2 MbSAMS2 MaSAMS3 MaSAMS3 MaSAMS4 S-adenosyl- * MaSAMS4 MbSAMS3 * * * * * L-methionine MaSAMS5 MaSAMS5 MbSAMS4 MaSAMS6 MaSAMS7 * * * * * * * MaSAMS6 MbSAMS5 MaSAMS7 MbSAMS7 MaSAMS8 MaSAMS8 * * * * 5 MaSAMS9 MaSAMS9 MbSAMS6 ACS MaSAMS10 MaSAMS11 * * * * * MaSAMS10 MbSAMS8 MaSAMS11 MaSAMS12 MaSAMS12 MbSAMS10 0 MbSAMS9 MaACS1 MaACS1 MbACS2 MaACS2 MaACS2 MbACS3 1-aminocyclopropane-1-carboxylate MaACS3 MaACS3 MbACS1 −5 MaACS4 MaACS4 MbACS4 MaACS5 MaACS5 MbACS5 MaACS6 MaACS6 MbACS6 ACO MaACS7 MaACS7 MaACS8 * * * * MbACS7 −10 MaACS8 MbACS9 MaACS9 MaACS9 MbACS10 MaACS10 MaACS10 MaACS11 MaACS11 MbACS11 H H MbACS8 MaACO1 MaACO1 * MbACO1 C=C MaACO2 MaACO2 * * MbACO6 MaACO3 MaACO3 * * * ** ** MbACO7 H H MaACO4 MaACO4 MbACO14 MaACO5 MaACO5 MbACO10 MaACO6 MaACO7 * MaACO6 * * MbACO17 MaACO7 MbACO12 MaACO8 MaACO8 MbACO13 MaACO9 MaACO9 * * * * * MaACO10 MaACO10 MaACO11 MaACO11 * * MbACO15 MbACO9 MbACO11 MbACO2 MbACO4 MbACO18 MbACO8 MbACO16 MbACO3 MbACO5 c MaACS1 d MaACO1 MbACS2 MaACS2 MbACO1 MaACO2 MaACS3 MbACS1 ACH MbACO5 R01A AC CH MbACO4 HR01 AC R01 R0 2 2 HR MaACO3 3 A .54 C 0 . BCH H 3 9 BCHR01 05 1 . . 9 9 5 2.01 MbACS3 01 R 1.1 1 MaACS4 2 MbACO3 7 4 3 03 2 0 .0 1.12 .7 0 MbACO2 1 6 .9 8 1.13 26.28 .3 6 2.0 9.84 7 MaACO4 40.7 MbACS9 7 MaACS5 40. 26.29 MbACO6 6.3 9.85 6 3 7 9 6 .7 MaACO5 9 26.30 31.78 6.3 9.86 5 39.78 MaACS6 MbACO1210.38 MaACO6 MbACS8 31.79 MbACO11 9.87 MbACS7 39.77 37.23 MbACO10 10.37 31.80 MaACO7 34.3 9 1 .88 37.22 MbACO9 34 10.36 .32 MbACO8 9. MbACS6 21 2 8 37. 7 9 .68 31.39 3 3 4 MaACS7 5 .3 10.35 2.16 .3 3 3 27 1.38 .6 3 3 MbACO7 9 5.3 MaACS8 MbACS5 3 4 31.37 2.15 27 MaACO8 7.33 .70 ACHR08 16 1.25 R05 H 6 5 B C . .3 .5 C A 32 5 1.2 4 7 HR 7.3 2 BCHR MaACO9 05 R09 MaACS9 08 MbACS4 BC ACH MbACO13 HR B 1 CH 0 R10 ACHR10 ACHR10 MaACO10 MbACS10 MaACS10 MbACS11 MbACO14 MaACO11 MaACS11 MbACO15 e f 18 O1 M bA C 1 a O AC MaAC Mb M O13 O 8 A 3 0.2 CO7 MbAC 8 CO MaA MaAC 1 O 00 2 95.6 99.6 1 M 0 bA 0 C 7 O6 6 99 . .8 9 10 0 52 1 . 01.3 9 98.5 9.2 7.4 100 99.6 9 77.4 100 1 71.124.6 87.5 . 9 2 9.7 86.3 99.6 12 S. bicolor 48 89 .7 100 BX S. lycopersicum M. acuminata FJ M. balbisiana P. dactylifera A. thaliana A. officinalis 6 Mb B. distachyon M A O. sativa b C A O C 10 99.3 M O bAC 8 9 O 2.95 9.3 11 100 9 78 .2 M 9 a .6 A 92 CO6 96 96.1 M 8 aA 5 CO5 959739... 5786. 185.4 10 0 MbACO9 MbACO16 0 CO10 MaA MaACO9 BX_0 DAF FJ_ 0D AF BX_2 0D AF FJ_ 20 DAF BX_0 DPH FJ_ 0D PH BX_8 DPH FJ_ 3D PH BX_1 4D PH FJ_ 6D PH MaACO4 Fig. 3 | Phylogeny and expression patterns of ethylene biosynthesis genes between M. acuminata (A-genome) and M. balbisiana (B-genome). a, Overview of the ethylene biosynthesis pathway. b, Expression patterns of SAMS, ACS and ACO family genes in the root and leaf, and at different stages of fruit development and ripening in BX, the A-subgenome of FJ and the B-subgenome of FJ. Genes aligned horizontally in the heat map indicate homoeologue gene pairs between the A- and B-genomes. White boxes with diagonals indicate the lack of homoeologue gene pairs between the A- and B-subgenomes. Asterisks indicate expression dominance of homoeologue gene pairs between the A- and B-subgenomes of FJ. c,d, Synteny analysis of ACS (c) and ACO (d) families between the A- and B-genomes. Red lines indicate paralogous gene pairs resulting from WGD, blue lines indicate homoeologous gene pairs, purple lines indicate tandem duplication, light blue strips indicate aligned syntenic blocks, light green strip indicates translocation block and light red strips indicate inversion blocks.The blocks in outer ring represent location and length of genes; blue blocks represent genes from A-genome and orange blocks represent genes from B-genome. e, Phylogenetic analysis of ACO family genes among nine species: M. acuminata, M. balbisiana, A. thaliana, O. sativa, Sorghum bicolor, Solanum lycopersicum, Phoenix dactylifera, Asparagus officinalis and B. distachyon. f, Ethylene production at different stages of fruit development and ripening in BX and FJ. Error bars show standard error of the mean from three independent experiments (n = 3). NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants 815 1 BC 6 . H. 2 8 3 R4 02 6.83 6 3 .8 .4 2 2 M 3.b 41 A M Cb O M A 1 b C 8 A O 3.40 M C 1 bA O 7 C 1 O 4 MbA 2 C 5.64 O3 100 5 78 .. 62 599.9 89.6 80.541.6 92.2 36.2 5. 100 2 6 0 6 .04 98.6 20.05 85.3 97.5 2 3 00 .. 0 5 6100 88.7 7 9 5 80 3 .1 .9 3 2 93.8 0.53 M M bb A A C C O O 5 4 BX_Root BX_Leaf BX_0DAF BX_20DAF BX_0DPH BX_8DPH BX_14DPH Ethylene production (ng g–1 h–1) FJ_Root BC FJ_leaf HR05 FJ_0DAF B 1 7 C FJ_20DAF4 .. 3 7 1 H 8 R0 FJ_0DPH3 1 FJ_3DPH 4.77 FJ_6DPH 14. 1 7 0 6 .53 10.52 8.34 FJ_Root 8.3 FJ_leaf 5 FJ_0DAF 8. FJ_20DAF 6 3 . 6 78 FJ_0DPH 6 FJ_3DPH .79 FJ_6DPH 6 2 .. 8 0 0AC 0 HR07 ACH R06 16 .58 16 .59 27 .94 27.9 5 27.96 10.51 16.30 16.2 9 16. 28 9.9 7 9.9 6 HR07 BC BCH R06 ACHR04 6.3 3 6.3 4 20 .08 20. 09 89.2 99.2 99.3 20.1 0 99 .6 34.7 6 MbAC O15 98 .7 93. 8 MaACO 11 86. 6 100 78.2 MbACO1299.934.77 97.1 MaACO7 34.78 9 72. 13.45 4 13.4 10 0 43 .81 7 13.4.4 9 8 4.4 4.4 7 23 .0 2 BCHR0 4 Articles Nature PlaNts a b c Light reaction SWEET Calvin cycle 12 SUT MaAMY1 MaAMY1 MaAMY2 MaAMY2 * * * MbAMY3 MbAMY6 MaAMY3 MaAMY3 Sucrose * * * MbAMY7 10 SuSy MaAMY4 MaAMY4 * * MbAMY8 SuSy MaAMY5 MaAMY5 10 SWEET SUT 8 MaAMY6 MaAMY6 * * MbAMY2 UGPase MaAMY7 MaAMY7 MbAMY11 UDPG MaAMY8 MaAMY8 MbAMY12 5 6 AGPase MaAMY9 MaAMY9 MbAMY13 UGPase MbAMY1 UGPase MbAMY4 0 UDPG 4 G-1-P GBSS MbAMY5 AGPase MbAMY9 2 −5 SSS MbAMY10 ADPG MaBMY1 MaBMY1 * * * MbBMY2 SBE MaBMY2 MaBMY2 * MbBMY3 −10 DBE GBSS SBE MaBMY3 MaBMY3 MbBMY4 SSS SSS MaBMY4 MaBMY4 * * * MbBMY5 MaBMY5 MaBMY5 MbBMY1 DBE MaBMY6 MaBMY6 * * * MbBMY6 Amylopectin Starch Amylose MaBMY7 MaBMY7 MbBMY8 AMY MaBMY8 MaBMY8 MbBMY7 MaBMY9 AMY MaBMY9 MbBMY9 MaBMY10 MaBMY10 BMY MaBMY11 MaBMY11 * * * MbBMY11 MaBMY12 Linear glucans MaBMY12 MbBMY12 MaBMY13 MaBMY13 BMY DPE MbBMY10 MbBMY13 Maltotriose Maltose a a MaDPE1 MaDPE1 ina t MbDPE1 ian at iva MaDPE2 MaDPE2 * * MbDPE2 . a cu m alb is . b A. t ha lia na S. ly co pe rs icu m P. p er sic a P. t ric ho ca rp a V. v ini fe ra M B. d ist ac hy um DPE DPE s O. Glucose Heteroglucan M d MaAMY1 e MaBMY1 MbBMY2 MbAMY5 MaBMY2 MaAMY2 MbAMY4 MbBMY1 MaBMY3 ACHR MaAMY3 03 ACH MbAMY3 R01 A MbBMY4 01 CH MaBMY4 A R 2 BCHR 0 C 2 5. HR MaAMY4 00 0 24 .44 2 6 0. . 5 3 4 61 2 6.45 MaBMY5 4 0 31.50 .9 24.45 5.97 9 MbBMY3 .55 MbAMY2 MaAMY5 31.51 20. 6 5 .6 4 5 31.52 4.98 5.98 20.27 6.64 MbBMY5 20.28 4.97 5.99 6 MaBMY6 MbAMY1 7.43 .6 2 3 5.06 20.29 5.70 4.96 29.5 7.44 5 25.0 MaBMY7 5 MaAMY6 5.71 MbBMY6 7.45 25.04 29.54 7.48 5.72 28.21 7.08 MbAMY8 MaBMY8 7.47 28. 7 2 . 2 09 29.53 MbBMY8 7.46 7 MbAMY7 MaAMY7 .1 5 0 2 .30 29.52 6.09 28 . . 823 7 6.08 5.31 MaBMY9 MbBMY7 29.51 2.88 6.07 5 2 .3 MbAMY6 6 2 2 MaAMY8 30.27 .19 29.50 3 . 6 8 7 26. . 9 2 7.30 BC 4 30.26 4 MbBMY10 5 2 0 MaBMY10 HR0 3 6 6. 30.2 5.63 AC 8 7.29 HR05 B . C 0 .21 45 H MbAMY10 R 8 7 0 0 7 B HR BC MaAMY9 HR C 07 H CHR08 BCHR10 MbBMY9 R AC MaBMY11 B 08 BCHR10 ACHR10 MbAMY9 MaBMY12 MbBMY11 MbAMY11 MbAMY13 MbAMY12 MbBMY12 MaBMY13 MbBMY13 f g 90 60 BX FJ BX_0DAF BX_20DAF BX_80DAF BX_8DPH BX_14DPH 30 0 10µm FJ_0DAF FJ_20DAF FJ_80DAF FJ_3DPH FJ_6DPH 0D AF BX_ FJ_ 0D AF BX_2 0D AF FJ_ 20 DAF BX_0 DPH FJ_ 0D PH BX_8 DPH FJ_ 3D PH BX_1 4D PH FJ_ 6D PH Fig. 4 | Comparison of genomic expansion, evolutionary history and differential expression patterns of the starch metabolic pathway between M. acuminata (A-genome) and M. balbisiana (B-genome). a, Overview of the starch biosynthesis and degradation pathway. b, Gene families in the starch metabolic pathway that are expanded in M. acuminata and M. balbisiana. c, Expression patterns of families AMY, BMY and DPE in the starch degradation pathway in BX, the A-subgenome of FJ and the B-subgenome of FJ during fruit-ripening stages. Horizontally oriented genes in the heat map indicate homoeologue gene pairs between the A- and B-genomes. White boxes with diagonals indicate that no homoeologue gene pairs were identified between the A- and B-genomes. Asterisks indicate expression dominance of homoeologue gene pairs between the A-subgenome of FJ and the B-subgenome of FJ. d,e, Synteny analyses of AMYs (d) and BMYs (e) between the A- and B-genomes. Red lines indicate paralogous gene pairs resulting from segmental/WGD- driven duplication, blue lines indicate homoeologous gene pairs, purple lines indicate tandem duplication, light blue strips indicate aligned syntenic blocks, light green strip indicates translocation block and light red strips indicate inversion blocks. The blocks in the outer ring represent location and length of genes; blue blocks represent genes from A-genome and orange blocks represent genes from B-genome. f, Starch contents at different stages of fruit development and ripening in BX and FJ. Error bars show standard error of the mean from three independent experiments (n = 3). g, Scanning electron microscopy of starch granules at different stages of fruit development and ripening in BX and FJ. The experiment was repeated three times independently with similar results. 816 NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants Starch content (%) 7 7 .. 2 1 8 BC 9 HR 7 0 .1 3 8 7. 6 1 . 7 89 6.88 6.87 3.04 4.88 4.89 4.9 2 0 4.40 24.41 A 2C 4.4 H 2R1 20 4.4 A 3 CHR08 ACHR07 BX_0DPH BCHR04 BCHR BX_8DPH 5 0. 3 62 B BX_14DPH 2 5 C8 . H. 6 1 1 R0 2 1 2 8.1 2 0 8 1 . 2 0 . 9 88 12.87 FJ_0DPH 12. 1 8 4 6 .2 FJ_3DPH 2 FJ_6DPH 14.21 14.20 7.68 7.69 7.70 17.52 FJ_0DPH 17.53 FJ_3DPH 17 FJ_6DPH 2 .3 5 . 4 42 23.4 2 3 6 3 . . 4 44 6 3 A . C 4 H 4 R06 ACH R04 ACHR05 log (RPKM) 2 HR 03 8. 08 AC 8.0 9 15 .53 15. 54 15. 55 24.3 0 24.3 1 24.32 15.61 5.601 15.5 9 3.4 5 4 3.4 3.4 3 26 .33 HR06 BC 26 .3 2 BCHR05 ACHR 05 36 .46 1.6 1 1.62 1.63 Gene number 3.02 3.03 5.05 5.04 5.0 3 5.0 2 1 5.0 BCHR 04 Nature PlaNts Articles The active starch metabolic pathway in the B-genome during genes related to starch degradation in the B-subgenome also prob- fruit development and the ripening process. Starch is the most ably led to elevated starch degradation in FJ (Fig. 4c,f,g). Therefore, widespread and abundant storage carbohydrate in plants. It is also the active starch metabolic pathway in the B-genome leads to a major component of cultivated banana, accumulating to high lev- marked starch accumulation and degradation during the fruit els (~60–75% of dry weight) and leading to the presence of large development and ripening processes, respectively. starch granules (~8–30 m) during banana fruit development, along with near-complete conversion to soluble sugars during post-har- Discussion vest ripening48–51. Thus, banana could serve as an excellent model Most cultivated bananas are triploid, having evolved from two for the investigatation of starch metabolism in fresh starchy fruits. wild diploid species, M. acuminata and M. balbisiana6. Our previ- The major enzymes that are responsible for starch biosynthesis ous M. acuminata genome sequencing efforts provided genomic (sugars will eventually be exported transporter: SWEET; sucrose resources to inform banana breeding1. However, there is also an transporter: SUT; sucrose synthase: SuSy; UDP-glucose pyrophos- urgent need to develop the genomic resources of M. balbisiana, phorylase: UGPase; ADP-glucose pyrophosphorylase: AGPase; which is a crucial contributor to cultivated varieties. Improved granule-bound starch synthase: GBSS; soluble starch synthase: SSS; understanding of the B-genome structure, subgenome evolution, starch branching enzyme: SBE; and starch debranching enzyme: homoeologous exchange, genetic diversity in polyploidy bananas DBE) and degradation (α-amylase: AMY; β-amylase: BMY; and and gene expansion and expression patterns will help in the design starch phosphorylase: DPE) are encoded by multigenic families51–56 and application of breeding strategies for novel banana cultivars (Fig. 4a). We identified 101 starch metabolism-related genes in the with improved traits. A-genome, including 77 in the starch synthesis pathway and 24 in Cycles of WGD followed by diploidization events have occurred the starch degradation pathway. Ninety-six such genes were identi- across land plants, and have significantly contributed to their evo- fied in the B-genome, including 68 in the starch synthesis pathway lutionary success24,25. Our results indicate that the B-genome may and 28 in the starch degradation pathway (Supplementary Table 39). be more sensitive to fractionation than the A-genome after WGD, In the starch synthesis pathway, five (SuSy, GBSS, SSS, SBE although the A- and B-subgenomes have diverged very recently. and DBE) out of nine families showed obvious expansion in the Variation in genomic structure between the A- and B-genomes A- and B-genomes of banana, compared to seven other plant consists of chromosome rearrangements and gene loss during species that included two fruit plants, tomato and grape. These diploidization, which has resulted in the functional divergence of expansions suggest a potential significance in banana (Fig. 4b subgenomes in polyploidy bananas. This divergence is supported and Supplementary Table 40). We characterized 54 homoeologue by differential enrichment of expanded/contracted gene families gene pairs from these families in the A- and B-genomes. Of these, between the A- and B-genomes and the expression dominance 27 homoeologue gene pairs had expression dominance in root, leaf of homoeologue genes from A- and B-subgenomes in triploids. and fruit tissues with seven dominant in the A-subgenome and 20 Although homoeologue expression dominance has been identi- in the B-subgenome (Supplementary Fig. 21 and Supplementary fied in certain polyploid species57–60, the relationship between Tables 41–45). Consequently, the starch synthesis pathway is more homoeologue expression dominance and functional divergence active in different tissues within the B-subgenome than in the (especially in regard to ethylene biosynthesis/starch metabo- A-subgenome. Of those gene pairs with dominant expression in the lism) of subgenomes in triploids remains to be elucidated. Thus, B-subgenome, MbSWEET17, MbSuSy1, MbSuSy2 and MbSuSy9 these results provide an important basis for the improvement of had high expression levels (log2-based RPKM > 6) at 0 DAF and agriculturally important traits by focusing selection on transcrip- 20 DAF, suggesting an important role in starch synthesis during tionally dominant genes. It is worth noting that homoeologous fruit development (Supplementary Fig. 21 and Supplementary exchanges may obscure the signal of expression dominance in Table 43). In contrast, most of the starch synthesis-related genes subgenomes of allopolyploids61. The extensive homoeologue that were unique to the A- or B-genome exhibited low expression exchanges in allopolyploid bananas may remove many progenitor levels (Supplementary Fig. 22). genome conflicts that result in subgenome biases in gene content Within the starch degradation pathway, genome annotation and expression. Thus, homoeologous exchanges may contribute indicated that families AMY and BMY had an obvious expansion to the diversity of homoeologue expression dominance and in the banana A- and B-genomes compared to seven other plant induce a series of rapid genetic and epigenetic modifications for species (Fig. 4b and Supplementary Table 40). We characterized agronomic traits61. 21 homoeologue gene pairs within this pathway from the A- and Previous studies have suggested an important role of ethylene B-genomes. Among these gene pairs, 11 had dominant expression production in fruit ripening and starch metabolism with regard to and were associated with the B-subgenome during fruit ripening fruit quality post-harvest27,43,55. However, the genetic mechanisms (Fig. 4c and Supplementary Tables 46–50). Among the dominant underlying polyploid fruit ripening are less well known. Here, we genes, MbAMY-2, MbAMY-3, MbAMY-8, MbBMY-6, MbBMY-8 analysed biological processes related to these two pathways in trip- and MbDPE-2 exhibited high expression (log2-based RPKM > 6 in loid bananas. We identified significant genomic expansions and at least one stage) during fruit ripening (Fig. 4c and Supplementary dominant expression of homoeologue genes in the B-genome at Table 48). MbAMY-1, -2, -3, -4, -5 and MbAMY-6, -7, -8 showed close the pathway level of gene families and, most notably, in the ACO proximity and an evolutionary relationship, suggesting that these family, known to play a critical role in ethylene production. Our genes duplicated from a tandem copy (Fig. 4d and Supplementary analysis revealed the origin and evolution of crucial gene families Fig. 23). MbBMY-5, -8 and MbBMY-6, -12 are two paralogous pairs in these pathways, particularly for the independent origin of the in syntenic blocks of the B-genome and showed close evolutionary MaACO2/MbACO6 and MaACO3/MbACO7 gene pairs and their relationships, suggesting that these genes duplicated from WGD specific function in fruit ripening. Moreover, we identified that this (Fig. 4e and Supplementary Fig. 24). Taken together, these results tandem event has led to expansion of ACO genes in the B-genome suggest that tandem-driven AMY duplication and WGD-driven and to strong expression of these genes during fruit ripening. Our BMY duplication in the B-genome contribute to starch degradation. analysis also indicated a potential link between the dominant expres- The dominant expression of genes involved in starch biosynthe- sion of homoeologue genes and the expansion of gene families with sis in the B-subgenome could have led to increased starch accumu- fruit ripening and starch metabolism. Previous studies have dem- lation during fruit development in FJ relative to BX (Fig. 4f,g and onstrated that the B-genome is associated with improved vigour Supplementary Fig. 21). In addition, the dominant expression of and tolerance to both biotic and abiotic stresses. Consequently, the NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants 817 Articles Nature PlaNts B-genome is a target for banana breeding programmes2. Here, we and PlantTFDB71,72, with families from PlantTFcat73 and AtTFDB74 used as highlight that the B-genome is of great importance in ethylene bio- supporting evidence. In total, we identified 3,329 transcription factor genes in synthesis and post-harvest banana ripening, which will further our M. balbisiana and 3,899 in M. acuminata (Supplementary Table 6). understanding of ripening mechanisms in polyploid fruit. Gene family analysis. A total of 500,142 genes from 16 plant species with The M. balbisiana genome assembly, along with our previously available whole-genome sequences were used for gene family clustering analysis. acquired M. acuminata genome data, may aid in the discovery of BLAST67 (v.2.2.26; -p blastp) was used to generate pairwise protein sequence cultivar-specific sequences that are related to important cultivar- alignments with E-values of < 1 × 10–5. Then, OrthoMCL22 was used to cluster specific traits, including shelf life, quality and stress tolerance. similar genes by setting the main inflation value at 1.5 and using default settings for the other parameters. These analyses resulted in 39,358 gene families Thus, these resources can be used to build molecular characteriza- comprising 393,700 genes from the 16 species (Supplementary Table 7 tion strategies for various cultivars to assist in rapid quality control and Supplementary Fig. 7). and the conservation of biodiversity. The data from this study also We identified 519 single-copy gene families shared among the 16 species, pave the way for whole-genome association studies, germplasm and constructed a phylogenetic tree using MrBayes (v.3.1.2)75 software with the improvement and genetic modification of bananas to meet increas- general time-reversible model (Supplementary Fig. 4). Divergence times for the 16 species were also estimated based on fourfold degenerate sites of all single- ing commercial demands. These genomic resources and results also copy orthologous genes using the MCMCTree programme in the PAML package reinforce the use of the banana as an appropriate model to study (v.4.4)76 (Supplementary Fig. 5). subgenome evolution and fruit biology in triploid variants. Due to CAFÉ (v.2.1)23 was used to analyse the expansion and contraction of gene the sterility and seedlessness of banana cultivars, further efforts will families. A random birth-and-death model was used to assess gene gain or loss in gene families across the specified phylogenetic tree (Supplementary Fig. 9). be needed to leverage the key gene resources from precisely charac- Families with P < 0.05 were considered as significant expansion or contraction, and terized germplasms to achieve effective breeding schemes. pathway enrichment analysis of these families was conducted using the enrichment pipeline77. Additional details are available in the Supplementary note. Methods Sample collection. A double haploid of the wild diploid genotype PKW; 2n = 2x = 22 Whole-genome alignment analysis. MCSCAN78 (parameters: -a -e 1e-5 -s 5) was was provided by the Centre de Coopération Internationale en Recherche used to detect collinearity within M. acuminata (A-genome) and M. balbisiana Agronomique pour le Développement (CIRAD) for genome sequencing. Fresh, (B-genome) and among various species. Syntenic blocks containing at least ten unexpanded leaves were harvested and then frozen immediately with liquid nitrogen gene pairs were obtained. All of the orthologous and paralogous gene pairs were to preserve genomic DNA for isolation. High-molecular weight genomic DNA was extracted from the syntenic blocks for calculation of 4dTv79 distances using the extracted using a standard cetyltrimethyl ammonium bromide (CTAB) method62. HKY substitution model80 (Supplementary Fig. 8). Additional details are available DNA integrity was assessed by agarose gel electrophoresis (concentration of agarose in the Supplementary note. gel, 1%; voltage, 150 V; electrophoresis time, 40 min). Finally, DNA was purified from the gel using a QIAquick Gel Extraction kit (QIAGEN). Orthologous gene pair analysis. BLAST67 (v.2.2.26, -p blastp) was used to align M. acuminata proteins to M. balbisiana proteins for identification of orthologous Library construction and sequencing. One paired-end and eight mate-pair genes. The value 1 × 10–5 was used as a cut-off to define the raw orthologues. We libraries were constructed for short-read sequencing on the Illumina HiSeq 2000 then filtered the BLAST results using two parameters (CIP ≥ 60% and CALP ≥ 60% platform, which generated around 86.34 Gb (166× coverage) of high-quality data. (ref. 35). We identified 25,717 orthologous gene pairs (81.83% consistency with For long-read DNA sequencing, 5.79 million SMRT long reads (58.99 Gb data, syntenic blocks) between M. acuminata and M. balbisiana using these two 113× coverage) were sequenced using the PacBio Sequel system with libraries of parameters (Supplementary Table 22). The orthologous gene pairs were first 20-kb insert size; sub-reads had a mean length of 10.2 kb and N50 length of 16.6 kb. aligned using MUSCLE (v.3.8.31)81, then the Ka/Ks ratio of each gene pair was One Hi-C library was prepared and sequenced on Illumina NovaSeq 6000 to calculated using KaKs_Calculator (v.2.0)82 with model yn00. The significant generate 71.96 Gb (138× coverage) of high-quality data63 (Supplementary Table 1). difference between Ka/Ks values was detected by Student’s t-test. Additional details are available in the Supplementary note. Re-sequencing analysis. Nine different genotypes of banana were used for re- Genome assembly. De novo assembly of DH-PKW was performed using wtdbg sequencing, including the triploid plants BaXiJiao (subgroup Cavendish, AAA), (v.1.2.8; https://github.com/ruanjue/wtdbg) based on PacBio data (only reads Gros_Michel (subgroup Gros Michel, AAA), FenJiao (subgroup Pisang Awak, longer than 1 kb were used in assembly). The assembled genome was corrected for ABB), Kamaramasenge (AAB) and Pelipita (ABB), in addition to the diploid plants two rounds using the ‘wtdbg-cns’ programme in the wtdbg package. We then used Pisang_Mas (subgroup Sucrier, AA), Pisang_Kra (subsp. malaccensis, AA), DH- the algorithm Arrow (https://github.com/PacificBiosciences/GenomicConsensus), PKW (BB) and balbisiana (BB) (Supplementary Table 13). BaXiJiao, Gros_Michel which takes into account all of the underlying data and the raw quality values and FenJiao were obtained from the Tissue Culture Centre of the Chinese Academy inherent in SMRT sequencing, to polish the assembly again for the final consensus of Tropical Agricultural Sciences (CATAS). Pelipita, Pisang_Mas, Pisang_Kra, accuracies. The final consensus contigs were scaffolded using the SSPACE-standard M. balbisiana and Kamaramasenge were provided by the Bioversity International programme64 (v.3.0) with meta-pair reads from libraries of insert size 2–20 kb. Musa Transit Centre. Genomic DNA was extracted from fresh leaves of seedlings at Based on Hi-C data, 430.02-Mb scaffolds were anchored to 11 pseudo-molecules the five-leaf stage using the CTAB method62. using LACHESIS software13 (Supplementary Fig. 2). Chromosomes were numbered Paired-end reads with libraries of 500-bp insert size were aligned to the according to the linkage group nomenclature adopted for M. acuminata. Additional A- and B-genomes simultaneously using BWA (v.0.7.12)66 with the parameters details regarding genome assembly are provided in the Supplementary note. ‘bwa aln -t 20 -l 35’ (Supplementary Table 14), and only uniquely mapped reads were kept. Potential PCR duplicates were marked using Picard (v.1.54, https:// broadinstitute.github.io/picard/) and indexed using the SAMtools package83. Evaluation of assembly quality. BUSCO (v.3) was used to assess assembly The Genome Analysis Toolkit84 was then used to infer SNPs and InDels. SNP completeness11. We mapped 29,610 M. acuminata-expressed sequence tags (ESTs) identifications were filtered based on the following parameters: ‘QD < 2.0 || to the assembled genome using BLAT65 (v.35) with default parameters. In total, FS > 60.0 || MQ < 40.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0’, and 93.59% of the ESTs were aligned to the genome with identity >90%. Additionally, InDels were filtered based on the following parameters: ‘QD < 2.0 || FS > 200.0 || BWA66 v.0.7.12 (aln -l 35) was used to map 59× Illumina reads to the assembly, ReadPosRankSum < −20.0’ (Supplementary Tables 16 and 17). Breakdancer (http:// and 96.11% of the reads were mapped to the assembled genome. Additional details breakdancer.sourceforge.net/)85 was used to detect structural variations by checking are available in the Supplementary note. paired-end reads with an abnormal orientation and/or span. The final structural variations were filtered using the following requirements: (1) minimum size of 100 Genome annotation. Repetitive sequences within the M. balbisiana genome and maximum size of 1,000,000; (2) minimum score of ≥30; and (3) minimum were identified by a combination of homology-based and de novo approaches number of required reads supporting each structural variation ≥5 (Supplementary (Supplementary Table 1). Gene structures were annotated iteratively using three Table 18). Nucleotide diversity (π) was analysed using VCFtools (v.0.1.13; https:// main approaches (ab initio predictions, homologue proteins and transcriptome vcftools.github.io/man_latest.html). data) that were combined using MAKER15 (v.2.31.8) (Supplementary Table 5). Gene functions were annotated according to the best match of the alignments Analysis of homoeologous exchanges. Assessment of read coverage depth was using blastp67 (E-value < 1 × 10–5) against the Swiss-Prot68, TrEMBL68, NR used to detect homoeologous exchanges between the A- and B-subgenomes. (https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/), KOG69 and KEGG70 databases. We inferred these based on cases where reads coverage of a given region on one Additional details are available in the Supplementary note. parental genome was significantly high while the orthologous one was low or had no coverage. High coverage indicates duplication, and low or no coverage indicates Transcription factors. We used the iTAK programme16 to identify transcription loss34. The uniquely mapped paired-end reads were used to calculate the coverage factors based on the consensus rules that are mainly summarized within PlnTFDB depth of each sample on the A- and B-genomes (Supplementary Figs. 25–27). 818 NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants Nature PlaNts Articles According to coverage depth, we detected homoeologous exchanges in the triploids Received: 7 May 2018; Accepted: 20 May 2019; ‘FenJiao (ABB)’, ‘Pelipita (ABB)’ and ‘Kamaramasenge (AAB)’ (Supplementary Published online: 15 July 2019 Table 15). Additional details are available in the Supplementary note. Transcriptome analysis. Banana fruits at different stages of development (0, References 20 and 80 DAF) and ripening (8 and 14 DPH for BX, 3 and 6 DPH for FJ) were 1. D’Hont, A. et al. The banana (Musa acuminata) genome and the evolution of obtained from the banana plantation at the Institute of Tropical Bioscience and monocotyledonous plants. Nature 488, 213–217 (2012). Biotechnology (Chengmai, Hainan, 20° N, 110° E). Two-month-old BX and FJ 2. Davey, M. W. et al. A draft Musa balbisiana genome sequence for molecular banana seedlings were obtained from the Tissue Culture Centre of CATAS and genetics in polyploid, inter-and intra-specific Musa hybrids. BMC Genom. 14, cultured in Hoagland’s solution. Seedlings at the five-leaf stage were treated 683 (2013). with 200 mM mannitol for 7 days, 300 mM NaCl for 7 days and low-temperature 3. De Langhe, E. et al. Why bananas matter: an introduction to the history of conditions (4 °C) for 22 h. Fruit, root and leaf samples were frozen in liquid banana domestication. Ethnobot. Res. Appl. 7, 165–177 (2009). nitrogen and stored at −80 °C until RNA extraction for transcriptome analysis. 4. Paul, J. Y. et al. Golden bananas in the field: elevated fruit pro‐vitamin A Total RNAs were isolated using a plant RNA extraction kit (TIANGEN). Total from the expression of a single banana transgene. Plant Biotechnol. J. 15, RNA (3 μg) from each sample was converted to complementary DNA using a 520–532 (2017). RevertAid First-Strand cDNA Synthesis Kit (Fermentas). cDNA libraries were 5. Cheesman, E. E. Classification of the bananas. Kew Bull. 2, 97–117 (1947). constructed using TruSeq RNA Library Preparation Kit v.2, and were subsequently 6. Simmonds, N. W. & Shepherd, K. The taxonomy and origins of the cultivated sequenced on the Illumina HiSeq 2000 platform using the Illumina RNA-Seq bananas. Bot. J. Linn. Soc. 55, 302–312 (1955). protocol. Two biological replicates were used for each sample. A total of 159.14 Gb 7. Daniells, J. et al. Diversity in the Genus Musa (INIBAP, 2001). (Supplementary Table 21) of high-quality clean data were produced. Gene 8. Häkkinen, M. Reappraisal of sectional taxonomy in Musa (Musaceae). expression levels were calculated as RPKM86. Differentially expressed genes were Taxon 62, 809–813 (2013). identified by methods previously established with the read count of two replicates 9. Martin, G. et al. Improvement of the banana ‘Musa acuminata’ reference for each gene (fold change ≥2; false discovery rate ≤0.001)87. Additional details are sequence using NGS data and semi-automated bioinformatics methods. available in the Supplementary note. BMC Genom. 17, 243 (2016). 10. Assani, A. et al. Production of haploids from anther culture of banana Weighted gene co-expression network analysis. Gene expression patterns for all [Musa balbisiana (BB)]. Plant Cell Rep. 21, 511–516 (2003). identified genes were used to construct a co-expression network using WGCNA 11. Simao, F. A. et al. BUSCO: assessing genome assembly and annotation (v.1.47)39. Genes without expression detected in all tissues were removed before completeness with single-copy orthologs. Bioinformatics 31, analyses. Soft thresholds were set based on the scale-free topology criterion 3210–3212 (2015). employed in ref. 88. An adjacency matrix was developed using squared Euclidean 12. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data distance values, and the topological overlap matrix was calculated for unsigned processing. Genome Biol. 16, 259 (2015). network detection using the Pearson method. Co-expression coefficients >0.55 13. Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome between the target genes were then selected. Finally, the network connections were assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119 (2013). visualized using cytoscape89. 14. Li, C. et al. Genome sequencing and assembly by long reads in plants. Genes 9, 6 (2017). Identification of gene families involved in ethylene biosynthesis and starch 15. Campbell, M. S. et al. Genome annotation and curation using MAKER and metabolism pathways. We compared genes related to ethylene biosynthesis and MAKER‐P. Curr. Protoc. Bioinformatics 48, 4–11 (2014). starch metabolism in the M. balbisiana genome to those annotated in both M. 16. Zheng, Y. et al. iTAK: a program for genome-wide prediction and acuminata and other plant genomes, including B. distachyon, O. sativa, Arabidopsis classification of plant transcription factors, transcriptional regulators, and thaliana, Solanum lycopersicum, Prunus persica, Populus trichocarpa and V. protein kinases. Mol. Plant 9, 1667–1670 (2016). vinifera. We retrieved protein sequences of these gene families from 16 species 17. Givnish, T. J. et al. Monocot plastid phylogenomics, timeline, net rates of (Supplementary Table 51) for homologue-based searches with the criteria similarity species diversification, the power of multi-gene analyses, and a functional >80% and coverage >80%. We then confirmed the presence of the conserved domain model for the origin of monocots. Am. J. Bot. 105, 1888–1910 (2018). within all protein sequences and removed members without a complete domain. 18. Lescot, M. et al. Insights into the Musa genome: syntenic relationships to rice and between Musa species. BMC Genomics 9, 58 (2008). Determination of total starch content. Banana pulp was immersed in 0.5% 19. Christelova, P. et al. A multi gene sequence-based phylogeny of the Musaceae sodium bisulfite for 10 min to prevent browning, and then dried at 40 °C for 24 h. (banana) family. BMC Evol. Biol. 11, 103 (2011). Pulp was then ground and centrifuged. The residue was suspended in 5 ml of 20. Janssens, S. B. et al. Evolutionary dynamics and biogeography of Musaceae 80% Ca(NO3)2 in a boiling water bath for 10 min then centrifuged for 4 min at reveal a correlation between the diversification of the banana family and 4,000 r.p.m. The supernatant was transferred to a 20-ml volumetric flask and the the geological and climatic history of Southeast Asia. New Phytol. 210, 1453–1465 (2016). residue was extracted twice with 80% Ca(NO3)2, which yielded a combined extract 21. Mandáková, T. et al. How diploidization turned a tetraploid into a volume of 20 ml. All experiments were repeated three times. The total starch content was assessed following methods described by Yang et al.90 pseudotriploid. Am. J. Bot. 103, 1187–1196 (2016). . 22. Li, L., Stoeckert, C. J. & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003). Scanning electron microscopy. Samples were fixed in stubs using double-faced 23. De Bie, T. et al. CAFE: a computational tool for the study of gene family tape and coated with a 10-nm platinum layer in a Bal-tec MEDo020 coating system evolution. Bioinformatics 22, 1269–1271 (2006). (Kettleshulme). The prepared samples were analysed using an FEI Quanta 600 24. Bennett, R. N. & Wallsgrove, R. M. Secondary metabolites in plant defence FEG scanning electron microscope (FEI Co.). Observations were performed in mechanisms. New Phytol. 127, 617–633 (1994). secondary electron mode while operating at 15 kV. 25. Baurens, F. C. et al. Recombination and large structural variations shape interspecific edible bananas genomes. Mol. Biol. Evol. 36, 97–111 (2018). Measurement of ethylene production during fruit post-harvest stage. Ethylene 26. Saxena, R. K., Edwards, D. & Varshney, R. K. Structural variations in plant production was measured by enclosing fruit samples in an airtight container for 2 h genomes. Brief. Funct. Genom. 13, 296–307 (2014). at 25 °C. After incubation, 1 ml of the headspace gas was withdrawn and injected 27. Jourda, C. et al. Expansion of banana (Musa acuminata) gene families into a gas chromatograph (GC-2010, Shimadzu) fitted with a flame ionization involved in ethylene biosynthesis and signalling after lineage specific whole detector and an activated alumina column. Ethylene production measurements genome duplications. New Phytol. 202, 986–1000 (2014). were obtained as recommended by the manufacturer. 28. Garsmeur, O. et al. Two evolutionarily distinct classes of paleopolyploidy. Mol. Biol. Evol. 31, 448–454 (2013). Reporting Summary. Further information on research design is available in the 29. Dodsworth, S., Chase, M. W. & Leitch, A. R. Is post‐polyploidization Nature Research Reporting Summary linked to this article. diploidization the key to the evolutionary success of angiosperms? Bot. J. Linn. Soc. 180, 1–5 (2016). Data availability 30. Langham, R. J. et al. Genomic duplication, fractionation and the origin of Raw sequence reads for B-genome assembly and transcriptome for all samples regulatory novelty. Genetics 166, 935–945 (2004). were deposited in the CNSA (https://db.cngb.org/search/project/CNP0000292/) of 31. Ni, Z. et al. Altered circadian rhythms regulate growth vigor in hybrids and CNGBdb with accession number CNP0000292 and Sequence Read Archive of the allopolyploids. Nature 457, 327 (2009). National Centre for Biotechnology Information (NCBI) under the BioProject (No. 32. de Jesus, O. N. et al. Genetic diversity and population structure of Musa PRJNA432894). Genome assembly and annotation of DH-PKW were submitted accessions in ex situ conservation. BMC Plant Biol. 13, 41 (2013). to NCBI (No. PYDT00000000). Assembly and gene annotation of the A-genome 33. D’Hont, A. et al. The interspecific genome structure of cultivated banana, (DH-Pahang) are available on the Banana Genome Hub (http://banana-genome- Musa spp. revealed by genomic DNA in situ hybridization. Theo. Appl. Genet. hub.southgreen.fr/). 100, 177–183 (2000). NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants 819 Articles Nature PlaNts 34. Chalhoub, Boulos et al. Early allopolyploid evolution in the post-Neolithic 66. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows- Brassica napus oilseed genome. Science 345, 950–953 (2014). Wheeler transform. Bioinformatics 25, 1754–1760 (2009). 35. Salse, J. et al. Improved criteria and comparative genomics tool provide new 67. Mount, D. W. Using the basic local alignment search tool (BLAST). Cold insights into grass paleogenomics. Brief. Bioinformatics 10, 619–630 (2009). Spring Harb. Protoc. https://doi.org/10.1101/pdb.top17 (2007). 36. Salse, J. et al. Reconstruction of monocotelydoneous proto-chromosomes 68. Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and reveals faster evolution in plants than in animals. Proc. Natl Acad. Sci. USA its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000). 106, 14908–14913 (2009). 69. Tatusov, R. L. et al. The COG database: an updated version includes 37. Ren, L. et al. Determination of dosage compensation and comparison of gene eukaryotes. BMC Bioinformatics 4, 41 (2003). expression in a triploid hybrid fish. BMC Genom. 18, 38 (2017). 70. Ogata, H. et al. KEGG: Kyoto Encyclopedia of Genes and Genomes. 38. Guo, M., Davis, D. & Birchler, J. A. Dosage effects on gene expression in a Nucleic Acids Res. 27, 29–34 (1999). maize ploidy series. Genetics 142, 1349–1355 (1996). 71. Pérez-Rodríguez, P. et al. PlnTFDB: updated content and new features 39. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation of the plant transcription factor database. Nucleic Acids Res. 38, network analysis. BMC Bioinformatics 9, 559 (2008). D822–D827 (2009). 40. Kumar, R., Khurana, A. & Sharma, A. K. Role of plant hormones and their 72. Jin, J. et al. PlantTFDB 3.0: a portal for the functional and evolutionary study interplay in development and ripening of fleshy fruits. J. Exp. Bot. 65, of plant transcription factors. Nucleic Acids Res. 42, D1182–D1187 (2013). 4561–4575 (2014). 73. Dai, X. et al. PlantTFcat: an online plant transcription factor and transcriptional 41. Adams, D. O. & Yang, S. F. Ethylene biosynthesis: identification of regulator categorization and analysis tool. BMC Bioinformatics 14, 321 (2013). 1-aminocyclopropane-1-carboxylic acid as an intermediate in the conversion 74. Davuluri, R. V. et al. AGRIS: Arabidopsis gene regulatory information server, of methionine to ethylene. Proc. Natl Acad. Sci. USA 76, 170–174 (1979). an information resource of Arabidopsis cis-regulatory elements and 42. Yang, S. F. & Hoffman, N. E. Ethylene biosynthesis and its regulation in transcription factors. BMC Bioinformatics 4, 25 (2003). higher plants. Annu. Rev. Plant Physiol. 35, 155–189 (1984). 75. Huelsenbeck, J. P. & Ronquist, F. MRBAYES: Bayesian inference of 43. Liu, X. et al. Characterization of ethylene biosynthesis associated with phylogenetic trees. Bioinformatics 17, 754–755 (2001). ripening in banana fruit. Plant Physiol. 121, 1257–1265 (1999). 76. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. 44. Iqbal, N. et al. Ethylene role in plant growth, development and senescence: Mol. Biol. Evol. 24, 1586–1591 (2007). interaction with other phytohormones. Front. Plant. Sci. 8, 475 (2017). 77. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment 45. Tomato Genome Consortium. The tomato genome sequence provides insights tools: paths toward the comprehensive functional analysis of large gene lists. into fleshy fruit evolution. Nature 485, 635 (2012). Nucleic Acids Res. 37, 1–13 (2008). 46. Teh, B. T. et al. The draft genome of tropical fruit durian (Durio zibethinus). 78. Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, Nat. Genet. 49, 1633 (2017). 486–488 (2008). 47. Lynch, M. & Conery, J. S. The evolutionary fate and consequences of 79. Huang, S. et al. The genome of the cucumber, Cucumis sativus L. Nat. Genet. 41, duplicate genes. Science 290, 1151–1155 (2000). 1275–1281 (2009). 48. Hubbard, N. L., Pharr, D. M. & Huber, S. C. Role of sucrose phosphate 80. Hasegawa, M., Kishino, H. & Yano, T. A. Dating of the human-ape splitting synthase in sucrose biosynthesis in ripening bananas and its relationship to by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22, 160–174 (1985). the respiratory climacteric. Plant Physiol. 94, 201–208 (1990). 81. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and 49. do Nascimento, J. R. O. et al. Beta-amylase expression and starch degradation high throughput. Nucleic Acids Res. 32, 1792–1797 (2004). during banana ripening. Postharvest Biol. Tec. 40, 41–47 (2006). 82. Wang, D. et al. KaKs_Calculator 2.0: a toolkit incorporating gamma-series 50. Tribess, T. et al. Thermal properties and resistant starch content of green methods and sliding window strategies. Genom. Proteom. Bioinf. 8, banana flour (Musa cavendishii) produced at different drying conditions. 77–80 (2010). LWT-Food Sci. Technol. 42, 1022–1025 (2009). 83. Li, H. et al. The sequence alignment/map format and SAMtools. 51. Jourda, C. et al. Lineage-specific evolutionary histories and regulation of Bioinformatics 25, 2078–2079 (2009). major starch metabolism genes during banana ripening. Front. Plant. Sci. 7, 84. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework 1778 (2016). for analyzing next-generation DNA sequencing data. Genome Res. 20, 52. Martin, C. & Smith, A. M. Starch biosynthesis. Plant Cell 73, 1297–1303 (2010). 2141–2145 (1995). 85. Chen, K. et al. BreakDancer: an algorithm for high-resolution mapping of 53. Tiessen, A. et al. Starch synthesis in potato tubers is regulated by post- genomic structural variation. Nat. Methods 6, 677–681 (2009). translation redox modification of ADP-glucose pyrophosphorylase: a novel 86. Mortazavi, A. et al. Mapping and quantifying mammalian transcriptomes regulatory mechamism linking starch synthesis to the sucrose supply. by RNA-Seq. Nat. Methods 5, 621–628 (2008). Plant Cell 14, 2191–2213 (2002). 87. Audic, S. & Claverie, J. M. The significance of digital gene expression profiles. 54. Tetlow, I. J. et al. Analysis of protein complexes in wheat amyloplasts reveals Genome Res. 7, 986–995 (1997). functional interactions among starch biosynthetic enzymes. Plant Physiol. 88. Zhang, B. & Horvath, S. A general framework for weighted gene co- 146, 1878–1891 (2008). expression network analysis. Stat. Appl. Genet. Mol. Biol. 4, 17 (2005). 55. Xiao, Y. Y. et al. A comprehensive investigation of starch degradation process 89. Shannon, P. et al. Cytoscape: a software environment for integrated models and identification of a transcriptional activator MabHLH6 during banana of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003). fruit ripening. Plant Biotechnol. J. 16, 151–164 (2017). 90. Yang, J. & Bi, C. A half-grain method for analyzing the amylose content in 56. Miao, H. et al. Soluble starch synthase III-1 in amylopectin metabolism of rice grains and its application. Acta Agron. Sin. 18, 366–372 (1992). banana fruit: characterization, expression, enzyme activity, and functional analyses. Front. Plant. Sci. 8, 454 (2017). Acknowledgements 57. Zhang, T. et al. Sequencing of allotetraploid cotton (Gossypium hirsutum L. This work was supported by the earmarked fund for the National Key Research and acc. TM-1) provides a resource for fiber improvement. Nat. Biotechnol. 33, Development Programme (No. 2018YFD1000200), Modern Agro-industry Technology 531–537 (2015). Research System (No. CARS-32), National Nonprofit Institute Research Grant of 58. International Wheat Genome Sequencing Consortium. A chromosome-based CATAS (No. 1630052012012), the National Natural Science Foundation of China draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. (No. 31872161),the Natural Science Foundation of Hainan Province (No. 2019CXTD412), Science 345, 1251788 (2014). the Central Public-interest Scientific Institution Basal Research Fund for CATAS (Nos. 59. Grover, C. E. et al. Homoeolog expression bias and expression level 1630052017018, 1630052019024, 1630052016005, 1630052016006 and 1630052017017), dominance in allopolyploids. New Phytol. 196, 966–971 (2012). and the CGIAR Research Programme on Roots, Tubers and Bananas. 60. Yang, J. et al. The genome sequence of allopolyploid Brassica juncea and analysis of differential homoeolog gene expression influencing selection. Author contributions Nat. Genet. 48, 1225–1232 (2016). Z.J., B.X., X.F. and A.D.-H. conceived the experiments. Z.W., H.M., J.H.L., C.J., J.Y.W., 61. Bird, K. A. et al. The causes and consequences of subgenome dominance in J.Z., W.H., W.T., Y.Y., Z.D., J.Y. L., W.M., Y.X., J.S.W., M.P., A.G., J.X., Y.L., C.H., N.L., hybrids and recent polyploids. New Phytol. 220, 87–93 (2018). R.H., F.S. and S.R. participated in various aspects of biological sample collection, 62. Murray, M. G. & Thompson, W. F. Rapid isolation of high molecular weight preparation and quality control. X.Y., C.X., Z.W., L.Y., Y.L.Y., C.L., Y.G. and S.S. plant DNA. Nucleic Acids Res. 8, 4321–4326 (1980). sequenced the genomes. X.Y., C.X., Z.W., M.R., F.-C.B. and G.M. assembled the genomes. 63. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range X.Y., C.X. and Z.W. annotated the genomes. Z.W., H.M., W.H., X.Y., C.X., S.Z., Z.Y.W., interactions reveals folding principles of the human genome. Science 326, L.Y., Y.L.Y., C.L., Y.G., O.G. and N.Y. analysed the genomes. W.H., Z.W., H.M., C.X., B.X., 289–293 (2009). Z.J., A.D.-H. and M.R. wrote the manuscript. 64. Boetzer, M. et al. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27, 578–579 (2010). 65. Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, Competing interests 656–664 (2002). The authors declare no competing interests. 820 NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants Nature PlaNts Articles Additional information Open Access This article is licensed under a Creative Commons Supplementary information is available for this paper at https://doi.org/10.1038/ Attribution 4.0 International License, which permits use, sharing, s41477-019-0452-6. adaptation, distribution and reproduction in any medium or format, Reprints and permissions information is available at www.nature.com/reprints. as long as you give appropriate credit to the original author(s) and the source, provide Correspondence and requests for materials should be addressed to A.D., W.H. or Z.J. a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Peer review information: Nature Plants thanks M. Schranz, Robert VanBuren and the Commons license, unless indicated otherwise in a credit line to the material. If mate- other, anonymous, reviewer(s) for their contribution to the peer review of this work. rial is not included in the article’s Creative Commons license and your intended use is Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in not permitted by statutory regulation or exceeds the permitted use, you will need to published maps and institutional affiliations. obtain permission directly from the copyright holder. To view a copy of this license, © The Author(s), under exclusive licence to Springer Nature Limited 2019 visit http://creativecommons.org/licenses/by/4.0/. NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants 821 Corresponding author(s): Wei Hu, Zhiqiang Jin and Angélique D’Hont Last updated by author(s): Apr 29, 2019 Reporting Summary Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist. Statistics For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. n/a Confirmed The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one- or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section. A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted Give P values as exact values whenever suitable. For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated Our web collection on statistics for biologists contains articles on many of the points above. Software and code Policy information about availability of computer code Data collection Homolog data (homolog species genome and gene set ) was downloaded from public databases: NCBI, JGI, Uniprot, KEGG, GO and NR. Genome data of Musa acuminata was downloaded from http://banana-genome-hub.southgreen.fr/. Genomes of Musa balbisiana (DH- PKW) and nine different accessions, and transcriptome data of Baxijiao and Fenjiao were sequenced by ourselves in this research. Data analysis We used lots of software for data analysis in this paper, and all data and software used was described in Methods section of manuscript. Genome assembly: wtdbg v1.2.8, SSPACE v3, Arrow, LACHESIS, HiC-Pro v2.8.1 Evaluation of assembly quality: BUSCO v3, BLAT v35, BWA v0.712 Repeat annotation: RepeatMasker v4.0.6, RepeatProteinMask, Repbase v21.01, Piler v1.0, RepeatScout v1.0.5, LTR-FINDER v1.0.5, Tandem Repeats Finder v4.09 Gene structure annotation: Blast v2.2.26, Augustus v3.2.1, SNAP, HISAT2 v2.0.1-beta, StringTie v1.2.1, PASA_lite, MAKER v3.31.8 Genome annotation completeness: BUSCO v3 Gene function annotation: BLAST v2.2.26, InterProScan v5.16 ncRNA annotation: tRNAscan-SE v1.23, INFERNAL, BLAST v2.2.26 Gene family analysis: OrthoMCL v1.4, CAFE v2.1, blast v2.2.26 Phylogentic analysis: PAML package v4.4, MrBayes v3.1.2 Transcription factor prediction: iTAK v1.5 Resequencing analysis: bwa 0.7.12, GATK v3.3-0, Breakdancer Nucleotide diversity: VCFtools v0.1.13 Genome syntenic analysis: blast v2.2.26, MCscan v1.5.1, KaKs_Calculator v2.0 Gene co-expression network analysis:R platform v3.2.2, WGCNA package v1.47 RNA-Seq analysis: SOAPaligner/SOAP2 v2.21, DEGseq For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 1 nature research | reporting summary October 2018 Data Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: - Accession codes, unique identifiers, or web links for publicly available datasets - A list of figures that have associated raw data - A description of any restrictions on data availability Raw sequence reads for gene assembly and gene expression for all samples were deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) under the BioProject (PRJNA432894). Genome assembly and annotation of DH-PKW was submitted to NCBI (PYDT00000000). A full data availability statement is included in the manuscript. Field-specific reporting Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf Life sciences study design All studies must disclose on these points even when the disclosure is negative. Sample size One double haploid (DH-PKW) sample from M. balbisiana was used for genome assembly. Nine different genotypes (AAA, ABB, AA, BB, AAB) of banana accessions were used for resequencing. Two cultivated varieties of BaXiJiao and FenJiao were used for transcriptomic analysis and forty samples were collected from different tissues and treatments including development and postharvest ripening process of fruits, and osmotic, salt and low temperature treatments. Two biological replicates were used for each sample. Data exclusions N/A Replication For gene expression profiling of BaXijiao and FenJiao, we produced RNA-seq data of fruits, roots, and leaves with two biological replicates. Randomization N/A Blinding N/A Reporting for specific materials, systems and methods We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. Materials & experimental systems Methods n/a Involved in the study n/a Involved in the study Antibodies ChIP-seq Eukaryotic cell lines Flow cytometry Palaeontology MRI-based neuroimaging Animals and other organisms Human research participants Clinical data 2 nature research | reporting summary October 2018