Articles
https://doi.org/10.1038/s41477-019-0452-6
Musa balbisiana genome reveals subgenome 
evolution and functional divergence
Zhuo Wang1,12, Hongxia Miao1,12, Juhua Liu1,2,12, Biyu Xu1,12, Xiaoming Yao3,12, Chunyan Xu3,12, 
Shancen Zhao4, Xiaodong Fang3, Caihong Jia1, Jingyi Wang1, Jianbin Zhang1, Jingyang Li2, Yi Xu2, 
Jiashui Wang2, Weihong Ma2, Zhangyan Wu3, Lili Yu3, Yulan Yang3, Chun Liu3, Yu Guo3, Silong Sun3, 
Franc-Christophe Baurens5,6, Guillaume Martin   5,6, Frederic Salmon6,7, Olivier Garsmeur5,6, 
Nabila Yahiaoui5,6, Catherine Hervouet5,6, Mathieu Rouard   8, Nathalie Laboureau9,10, Remy Habas9,10, 
Sebastien Ricci6,7, Ming Peng1, Anping Guo1, Jianghui Xie1, Yin Li   11, Zehong Ding1, Yan Yan1, 
Weiwei Tie1, Angélique D’Hont   5,6*, Wei Hu   1* and Zhiqiang Jin   1,2*
Banana cultivars (Musa ssp.) are diploid, triploid and tetraploid hybrids derived from Musa acuminata and Musa balbisiana. 
We presented a high-quality draft genome assembly of M. balbisiana with 430 Mb (87%) assembled into 11 chromosomes. 
We identified that the recent divergence of M. acuminata (A-genome) and M. balbisiana (B-genome) occurred after lineage-
specific whole-genome duplication, and that the B-genome may be more sensitive to the fractionation process compared to 
the A-genome. Homoeologous exchanges occurred frequently between A- and B-subgenomes in allopolyploids. Genomic varia-
tion within progenitors resulted in functional divergence of subgenomes. Global homoeologue expression dominance occurred 
between subgenomes of the allotriploid. Gene families related to ethylene biosynthesis and starch metabolism exhibited signif-
icant expansion at the pathway level and wide homoeologue expression dominance in the B-subgenome of the allotriploid. The 
independent origin of 1-aminocyclopropane-1-carboxylic acid oxidase (ACO) homoeologue gene pairs and tandem duplication-
driven expansion of ACO genes in the B-subgenome contributed to rapid and major ethylene production post-harvest in allotrip-
loid banana fruits. The findings of this study provide greater context for understanding fruit biology, and aid the development 
of tools for breeding optimal banana cultivars.
Bananas (Musa ssp.) are large herbaceous plants that are peren- Musa acuminata (A-genome, 2n = 2x = 22; n represents the haploid 
nial but monocarpic. They originated in Southeast Asia and the chromosome number), Musa balbisiana (B-genome, 2n = 2x = 22), 
Western Pacific and were one of the first crops to be domesti- Musa schizocarpa (S-genome, 2n = 2x = 22) and Australimusa spe-
cated, about 7,000 years ago, in Southeast Asia1,2. Bananas are widely cies (T-genome, 2n = 2x = 20)2. The majority of edible cultivated 
distributed throughout the tropics and subtropics, where they are a bananas originated from intraspecific or interspecific hybridization 
staple food and fruit for millions of people2,3. Moreover, bananas are between wild diploid M. acuminata (A-genome) and M. balbisiana 
one of the major export commodities of several developing coun- (B-genome) species. Combinations of these A- and B-genomes have 
tries and represent the largest international trade in fruit3,4. Thus, resulted in various genotypes of cultivated edible bananas, includ-
bananas are an essential food resource and have important socio- ing diploid (AA, BB and AB), triploid (AAA, AAB and ABB) and 
economic and ecological roles. tetraploid (AAAB, AABB, ABBB) variants6. The triploid genotype 
The genus Musa belongs to the monocotyledon Musaceae family variants constitute the predominant cultivated varieties that are 
along with the genus Ensete. Its wild species have traditionally been planted worldwide.
subdivided into four sections: Eumusa (x = 11; x represents the num- Genome sequencing of the A-genome banana has provided 
ber of chromosomes), Rhodochlamys (x = 11), Australimusa (x = 10) insights into the evolution of monocotyledonous plants1,9. Although 
and Callimusa (x = 9 or 10)5–7, and refined recently to two sections the A-genome represents a crucial step in the genetic improvement 
in which the Rhodochlamys and Australimusa were merged into the of banana, the lack of a high-quality B-genome sequence greatly 
Eumusa and Callimusa, respectively8. Most edible bananas belong to hinders germplasm characterization and the molecular breeding 
the Eumusa (or Musa) section, and are categorized into the dessert of banana. A draft B-genome has previously been reported, but 
or cooking group based on their usage. Furthermore, bananas of exhibited low quality, based on assembly and annotation via map-
this section are distinguished based on their genetic background as ping reads to the A-genome2. Here, we sequenced the genome of the 
1Key Laboratory of Biology and Genetic Resources of Tropical Crops, Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical 
Agricultural Sciences, Haikou, China. 2Key Laboratory of Genetic Improvement of Bananas, Hainan province, Haikou Experimental Station, China Academy 
of Tropical Agricultural Sciences, Haikou, China. 3BGI Genomics, BGI-Shenzhen, Shenzhen, China. 4BGI Institute of Applied Agriculture, BGI-Shenzhen, 
Shenzhen, China. 5CIRAD, UMR AGAP, Montpellier, France. 6AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France. 7CIRAD, 
UMR AGAP, Guadeloupe, France. 8Bioversity International, Montpellier, France. 9CIRAD, UMR BGPI, Montpellier, France. 10BGPI, CIRAD, INRA, Montpellier 
SupAgro, Montpellier, France. 11Waksman Institute of Microbiology, Rutgers, The State University of New Jersey, Piscataway, NJ, USA. 12These authors 
contributed equally: Zhuo Wang, Hongxia Miao, Juhua Liu, Biyu Xu, Xiaoming Yao, Chunyan Xu. *e-mail: dhont@cirad.fr; huwei2010916@126.com; 
18689846976@163.com
810 NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants
Nature PlaNts Articles
double haploid of the wild diploid genotype Pisang Klutuk Wulung single-copy orthologous genes indicated a very recent diver-
(DH-PKW, 2n = 2x = 22), belonging to the species M. balbisiana that gence time of about 5.4 MYA for the A- and B-genomes, with 
contributed the B-subgenome to cultivated allotriploid bananas. We their common ancestor having diverged from Poaceae ~134 MYA 
further compared the B- and A-genomes to investigate subgenome (Supplementary Fig. 5). This estimate is consistent with a diver-
evolution, genetic diversity and the functional divergence of sub- gence time of 4.6 MYA between the A- and B-genomes, which 
genomes in polyploid bananas. Our analyses provide insights into was estimated based on 17 bacterial artificial chromosome (BAC) 
the evolution and regulation of fruit-ripening processes in bananas. clones that contained 23.5 Kb of coding sequence18. Our estimate 
In particular, the results highlight a significant contribution of the is more recent than the 20.9 MYA (estimated by three genes and 
B-genome towards ethylene biosynthesis and starch metabolism one internal transcribed spacer region) or the 27.9 MYA (estimated 
during fruit ripening. by 19F genes)19,20. However, the increased sampling of informative 
characters in our genome-wide study for phylogenetic analyses 
Results should contribute to a more accurate divergence time estimation.
Genome assembly and annotation. To reduce heterozygosity, Three rounds of whole-genome duplication (WGD) events (α-, 
we used the DH-PKW genotype for our genome sequencing and β- and γ-WGD) have occurred in the Musa lineage1, which was 
assembly10. A total of 58.99 gigabases (Gb) (113×) of PacBio single- validated by our four-fold synonymous third-codon transversion 
molecule long reads and 86.34 Gb (166×) Illumina paired-end and (4dTv) analysis (Supplementary Fig. 6). WGD is frequently, and 
mate-pair reads were used for assembly (Supplementary Table 1), almost always, followed by diploidization and fractionation, which 
producing 492.77 megabases (Mb) of scaffolds. The contig N50 and includes chromosome rearrangement, gene loss and biased reten-
scaffold N50 of the final assembly were 1.83 and 5.05 Mb, respec- tion21. After diploidization and divergence, A- and B-genomes start 
tively (Supplementary Table 2). K-mer analysis suggested that the to evolve independently. To investigate the evolutionary differences 
draft assembly covers approximately 95% of the genome size of between the A- and B-genomes after divergence, we performed 
DH-PKW (Supplementary Fig. 1). We further evaluated the com- three types of comparative analysis: (1) assessment of gene family 
pleteness of the scaffold assembly using the BUSCO (v.3) plants expansion/contraction between A- and B-genomes by comparison 
datasets11. Precise exon placement of 91.3% of the total 1,440 single- to 14 other plant genomes; (2) comparison of structural variation in 
copy orthologue groups in the embryophyta dataset was identified the A- and B-genomes by synteny analysis; and (3) comparison of 
in the B-genome assembly. To anchor the scaffolds to chromo- synteny between the ancestral syntenic block and the A/B-genomes.
somes, we constructed high-throughput chromosome conforma- We analysed the gene family clustering and expansion/contraction 
tion capture (Hi-C) libraries of DH-PKW, generating 72 Gb (138×) of banana genomes of M. acuminata (A-genome) and M. balbisiana 
Hi-C pair-end reads (Supplementary Table 1). Duplicate removal, (B-genome), compared to 14 other plant genomes using OrthoMCL 
sorting and quality assessment were performed with HiC-Pro12, and CAFE22,23 (Supplementary Table 7 and Supplementary Fig. 7). 
and uniquely mapped valid reads were used for Hi-C scaffolding by We found 9,038 gene families that were conserved in M. balbisiana, 
LACHESIS software13 (Supplementary Fig. 2). As a result, 430 Mb M. acuminata, Oryza sativa, Brachypodium distachyon and Vitis 
(87.27%) of the assembly and 94.0% of the genes were placed vinifera. In contrast, 348 and 639 gene families were specific to the 
on 11 chromosome groups (Fig. 1 and Supplementary Table 3). A- and B-genome, respectively (Supplementary Fig. 8). After their 
The 11 pseudo-molecules were named in accordance with the divergence, 1,761 gene families were expanded and 203 were con-
M. acuminata (A-genome) reference sequence1,9. tracted in the A-genome, while 392 gene families were expanded 
About 55.75% of the B-genome assembly was composed of repet- and 1,008 contracted in the B-genome (Supplementary Fig. 9 and 
itive sequences, which is higher than the 41.85% of the A-genome Supplementary Tables 8 and 9). Kyoto Encyclopedia of Genes and 
assembly (Supplementary Table 4). This may be due to the span- Genomes (KEGG) pathway enrichment analysis of the significantly 
ning of repetitive regions by long reads14. Long terminal-repeat expanded gene families (P < 0.05) in the B-genome suggested that it 
(LTR) retrotransposons represented the most abundant fraction of was enriched in the photosynthesis and biosynthesis of secondary 
transposable elements in the A- and B-genomes, among which the metabolite pathways, including those associated with the metabo-
families Gypsy and Copia accounted for 12.88 and 28.04% of the lism of inositol, starch and sucrose, linoleic acid and arachidonic 
B-genome, respectively. DNA transposons comprised 2.12% of the acid (Supplementary Fig. 10 and Supplementary Table 10). Plants 
B-genome and 2.03% of the A-genome (Supplementary Table 4). produce a high diversity of secondary metabolites with prominent 
LTR retrotransposons tended to accumulate near the centromeric functions in defence against a variety of herbivores and pathogens, 
and pericentromeric regions (Fig. 1). Active insertions of LTR ret- in addition to the mitigation of various types of abiotic stresses24. 
rotransposons occurred more recently in the B-genome (peak at Thus, these observations are consistent with the association of the 
0–0.5 million years ago (MYA)) relative to the A-genome (peak at B-genome with improved vigour and tolerance to both biotic and 
1.5–2.0 MYA) after their divergence (Supplementary Fig. 3). Both abiotic stresses2.
genomes experienced a wave of LTR retrotransposon amplification Synteny analysis indicated a high level of genomic colinearity and 
but differed in insertion histories. sequence similarity between the A- and B-genomes (Supplementary 
We annotated the genome using the MAKER pipeline15 incorpo- Fig. 11). We identified 72 large syntenic blocks between the A- and 
rating ab initio predictions, homologous proteins and transcriptome B-genomes, including 15 large blocks each containing over 900 gene 
data from six samples, resulting in 35,148 protein-coding genes in the pairs. These 72 syntenic blocks comprised 75.02% of A-genome space 
B-genome (Supplementary Table 5). Of these, 33,137 (94.27%) were (containing 23% transposable elements) and 68.01% of B-genome 
located on the 11 pseudo-chromosomes (Supplementary Table 3). space (containing 22% transposable elements) (Supplementary 
Overall, 86% of the genes transcribed in our RNA sequence (RNA- Table 11). We also identified two large translocations and two inver-
Seq) analysis. Additionally, we identified 3,329 transcription factors sions between the A- and B-genome after their divergence. One large 
among 88 families in the B-genome using the iTAK programme16 reciprocal translocation comprises 7.09 Mb on chromosome (chr)1 
(Supplementary Table 6). of the B-genome and 7.03 Mb on chr:3 of the A-genome, and one 
large inversion of 9.39 Mb on chr 5 of the B-genome and 8.83 Mb 
The B-genome is more sensitive to fractionation than the on chr 5 of the A-genome (Fig. 1 and Supplementary Fig. 11). These 
A-genome. Compared to other monocots, the Musa lineage exhib- translocations and inversions were also supported by the rearrange-
its a relatively slower evolutionary rate as demonstrated previ- ments based on genetic mapping25, and can serve to introduce novel 
ously17 (Supplementary Fig. 4). Phylogenetic analyses based on 519 genetic diversity into the A- and B-genomes26.
NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants 811
Articles Nature PlaNts
Gypsy
Copia
Orthologous
genes
Genes
ACHR01 ACHR02
BCHR01
ACH
3 R
0 03
20
BCHR02
20
3010
40 10
30 20
20 30
10 40
40 10
30 20
2 30
0
10
10
40 20
30 30
20
10 10
20
30 30
20 40
10
10
40 20
30
30
B R10
4
C 0
HR ACH
09
H CHR11
BC R1 A
0 BCHR11
Fig. 1 | Characterization of M. balbisiana (B-genome) and M. acuminata (A-genome) chromosomes. Elements are arranged in the following scheme  
(from outer to inner). (1) Distribution of Gypsy elements (non-overlapping, window size, 50 kb); (2) distribution of Copia elements (non-overlapping, 
window size, 50 kb); (3) distribution of orthologous gene pairs between two genomes (non-overlapping, window size, 50 kb); (4) gene density  
(non-overlapping, window size, 50 kb); (5) syntenic relationships between A- and B-genomes. The connecting blue lines represent alignment blocks,  
red lines represent inversions, green lines represent translocations and grey lines show small blocks with<30 gene pairs.
Previously, 12 Musa ancestral blocks were assembled that repre- (gene loss) and diploidization processes had occurred extensively 
sented the ancestral genome before α/β-WGD1,27,28. We identified after WGD events in the Musa lineage (Supplementary Table 12).
97 syntenic blocks resulting from α/β-WGD events in the B-genome Taken together, our results indicated that the B-genome exhibited 
by comparison to the 12 Musa ancestral blocks. These blocks con- less expansion and more contraction of gene families, less syntenic 
tained 24,639 genes and represented 70.10% of all gene models coverage ratio and a higher singleton ratio in the ancestral blocks 
involved in WGD-driven regions. We also identified 100 syntenic compared to the A-genome after divergence. Cycles of WGD fol-
blocks that contained 26,780 genes (75.61%) in the A-genome lowed by diploidization and fractionation have occurred across land 
(Supplementary Table 12). Of these ancestral Musa α/β blocks, plants, and are important in determining chromosome structure and 
56.89% (15,236) and 60.59% (14,930) were singletons in the A- gene content. Consequently, these processes have significantly con-
and B-genome, respectively, suggesting that genome fractionation tributed to the evolutionary success of plants21,29. The diploidization 
812 NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants
BCHR05
BCHR0
2 40 B
1 C0 HR03
30
20
10
40
30
20
10
20
10
20
10
20
10
20
30
ACH 1R 009
ACHR08
ACHR07
ACHR0
6
AC
HR
05
ACHR04 10
20
30
10
20
10
40
30
20
10
30
20 08
10
BCHR
CH
R07
B
BCHR06
Nature PlaNts Articles
a FenJiao b Kamaramasenge c Pelipita
30 30 30
10 28 Mb
chr01 10 28 Mb
10 chr01 10 28 Mb
chr01 10 48 Mb 10
30 48 Mb 30 30 48 Mb
30 30 30
10 29 Mb 29 Mb
chr02 10 29 Mb
chr02 10 10 chr02 10
32 Mb 32 Mb 10 32 Mb
30 30 30
30 30 30
10 34 Mb 34 Mb
chr03 10 34 Mb
chr03 10 10 chr03 10
30 Mb 30 Mb 10 30 Mb
30 30 30
30 30 30
10 37 Mb 37 Mb 37 Mb
chr04 chr04 10
10 10 chr04 10
42 Mb 42 Mb 10 42 Mb
30 30 30
30 30 30
10 41 Mb 41 Mb
chr05 10 41 Mb
chr05 10 10 chr05 10
42 Mb 42 Mb 10 42 Mb
30 30 30
30 30 30
10 37 Mb 10 37 Mb 37 Mb
chr06 10 chr06 10 chr06 10
41 Mb 41 Mb 10 41 Mb
30 30 30
30 30 30
10 34 Mb 34 Mb 10 34 Mb
chr07 chr07 10
10 10 chr07
36 Mb 36 Mb 10 36 Mb
30 30 30
30 44 Mb 30 44 Mb 30
10 44 Mb
chr08 chr08 10
10 10 chr08 10
10
30 45 Mb 30 45 Mb 30 45 Mb
30
10 41 Mb 30 30
41 Mb 41 Mb
chr09 chr09 10
10 10 chr09 10
37 Mb 37 Mb 10 37 Mb
30 30 30
30 30 30
10 37 Mb 10 37 Mb 10 37 Mb
chr10 10 chr10 10 chr10
42 Mb 42 Mb 10
30 42 Mb
30 30
30
10 27 Mb 30 30
27 Mb 27 Mb
chr11 chr11 10
10 29 Mb 10 chr11 10
30 29 Mb 10 29 Mb
30 30
Fig. 2 | Coverage depth and genome structure summary for three allotriploid banana accessions. a–c, Chromosome coverage and structure  
for accessions FenJiao (genome group, ABB) (a), Kamaramasenge (genome group, AAB) (b) and Pelipita (genome group, ABB) (c) with 100 kb  
non-overlapping sliding windows. The upper red bar and lower blue bar represent coverage depth of the A- and B-subgenome, respectively.
and fractionation processes involve a series of evolutionary events, A-subgenome were replaced by the B-subgenome and there were 
including repetitive DNA loss, chromosome rearrangements and 18 segmental homoeologous exchanges on chromosomes 6, 9 and 10 
complex patterns of gene loss29,30. The above evidence supports the (Fig. 2c and Supplementary Table 15). This indicates the eight A and 
hypothesis that the B-genome was more sensitive to fractionation 25 B chromosome constitution of Pelipita, consistent with previous 
than the A-genome after their divergence. genomic in  situ hybridization studies33. This classification is fur-
ther supported by phylogenetic analyses based on genotyping data 
Genetic diversity in polyploid bananas and functional divergence (Supplementary Fig. 12). A total of 18,475,661 single-nucleotide 
of subgenomes. Polyploid species usually exhibit vigorous growth, polymorphisms (SNPs), 1,425,391 small insertions and deletions 
including high-quality production and high fitness31. Most banana and 220,452 structural variations were identified in the samples 
cultivars are polyploid and exhibit various levels of ploidy and (Supplementary Tables 16–18). Analysis of gene and SNP density on 
genomic background32. The genetic classification of some bananas the chromosomes indicated that SNPs were preferentially located on 
is discordant, as is the case for Pelipita. Previous studies have shown non-gene-rich regions (Supplementary Fig. 13). There were ~2.5-
that its karyotype comprises eight A and 25 B chromosomes as fold SNPs on the A-genome of Pisang_Mas and Pisang_Kra com-
opposed to the predicted 11 A and 22 B chromosome distribution33. pared to the B-genome of Balbisiana (Supplementary Table 16). The 
Understanding the genetic diversity and genomic constitution of nucleotide diversity (π,0.0059) of A-subgenomes was higher than 
Musa accessions would inform genomic group classifications, in that of the B-subgenomes (0.0031) in accessions Fenjiao, Pelipita 
addition to conservation and breeding strategies. Therefore, we re- and Kamaramasenge.
sequenced five triploid bananas and four diploid bananas to investi- Gene family expansion and contraction analysis of M. acumi-
gate their genetic diversity (Supplementary Table 13). nata and M. balbisiana in comparison to other sequenced genomes 
Simultaneous alignment of re-sequencing data to the A- and indicated that there are 83 gene families significantly expanded in 
B-subgenomes identified the uniquely mapped reads that were the A-genome (and conversely contracted in the B-genome). These 
used to analyse coverage depth, variations calling and homoeolo- families included plant–pathogen interactions, glycosphingolipid 
gous exchanges on each chromosome (Supplementary Table 14). biosynthesis-ganglio series and glycosaminoglycan degradation 
Homoeologous exchanges were characterized by read coverage pathways among others (Supplementary Table 19). Conversely, 
that showed a chromosomal region with a duplicated copy from 33 gene families were significantly expanded in the B-genome 
the corresponding homoeologous subgenome34. These analyses (and contracted in the A-genome). These families included those 
confirmed that genome constitutions for the banana accessions involved in photosynthesis, metabolic pathways and ribosome, 
were, in most cases, consistent with previous genome group classi- among others (Supplementary Table 20). This indicates that the A- 
fications based on morphological traits. We identified 48 segmental and B-genomes may have functionally diverged at the genome level 
homoeologous exchanges in the accession FenJiao, including nine during their respective genome evolution.
from the B- to the A-subgenome and 39 in the reverse direction To explore the transcription of allopolyploid subgenomes, we 
(Fig. 2a and Supplementary Table 15). We also found four segmen- assessed the expression of homoeologue genes from the A- and 
tal homoeologous exchanges from the B- to the A-subgenome in the B-subgenomes of the triploid FenJiao. Expression levels were mea-
accession Kamaramasenge, and replacement of chromosome 10 of sured within different tissues, at different stages of fruit development 
the B-subgenome by the A-subgenome. (Fig. 2b and Supplementary and ripening and in banana seedlings after abiotic stress treat-
Table 15). For the accession Pelipita, chromosomes 2, 7 and 11 of the ments (Supplementary Table 21). A total of 25,717 homoeologous 
NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants 813
Articles Nature PlaNts
gene pairs were identified between the A- and B-subgenomes based five gene pairs were dominantly expressed in the A-subgenome of 
on the gene alignment method with cumulative identity percent- FJ (Fig. 3b and Supplementary Tables 33 and 34). The dominant 
age (CIP) ≥ 60% and cumulative alignment length percentage expression of these genes in various tissues may be related to the 
(CALP) ≥ 60% (refs. 35,36). Among the homoeologue gene pairs, fundamental role of ethylene biosynthesis.
81.83% were further supported by syntenic analysis (Supplementary Both ACS and ACO have previously been demonstrated as 
Table 22). Expression of all homoeologue gene pairs was assessed to limiting enzymes within ethylene biosynthesis41. The expression 
determine the distribution of expression fold change of B/A in the abundance of MA-ACS1 (AB021906) and MA-ACO1 (X91076) is 
triploid ‘FJ’. The log2(RPKM B/RPKM A) was 1.2/1, where RPKM consistent with ethylene production during the post-harvest banana-
is reads per kilobase million, which differed from the genomic ripening period43. Of the ten ACS gene pairs, MaACS7/MbACS7, 
constitution value of 2/1 (Supplementary Fig. 14). This result which is a homologue of MA-ACS1, exhibited high expression 
could be explained by dosage compensation, wherein the triploid levels during fruit ripening and was dominantly expressed in the 
expression levels are reduced to a diploid state37,38. We further char- B-genome (Fig. 3b). MbACS6 and MbACS7 are paralogous in a 
acterized 1,075 and 4,032 homoeologue gene pairs that showed large syntenic block (the block contains 19 gene pairs), and main-
expression dominance in the A- and B-subgenome, respectively tain synteny and close evolutionary relationships to MaACS6 and 
(Supplementary Table 23). KEGG enrichment analysis indicated MaACS7, respectively, suggesting that these genes duplicated 
that genes with expression dominance in the B-subgenome were from WGD (Fig. 3c and Supplementary Fig. 18). Of the nine ACO 
associated with 2-oxocarboxylic acid metabolism and the arginine gene pairs, three (MaACO2/MbACO6, MaACO3/MbACO7 and 
biosynthesis pathway (q-value < 0.05) (Supplementary Table 24), MaACO8/MbACO13) exhibited high expression levels during 
whereas those showing expression dominance in the A-subgenome fruit ripening and were dominantly expressed in the B-subgenome 
were not significantly enriched in KEGG pathways. (Fig. 3b). MaACO2/MbACO6 and MaACO3/MbACO7 are 
Non-synonymous/synonymous substitution (Ka/Ks) ratios in the syntenic block of chr:5 and chr:6 between the A- and 
were calculated for all homoeologue gene pairs between the A- and B-genome, respectively, and belong to the same phylogenetic clade 
B-subgenomes. The Ka/Ks ratios of genes with expression domi- (Fig. 3d,e and Supplementary Table 35). These results indicated that 
nance in the A-subgenome (median, 0.157) were slightly lower MaACO2/MbACO6 and MaACO3/MbACO7 originated indepen-
than those in the B-subgenome (median, 0.196) and non-dominant dently and developed crucial functions during fruit ripening.
genes (median, 0.186) (Supplementary Fig. 15). We also observed high expression levels (log2RPKM > 4 in at 
We then constructed a gene co-expression network for those least one stage of fruit ripening in the B-genome) of homoeologue 
genes with expression dominance using weighted gene co-expres- gene pairs, probably related to fruit softening (pectin methylester-
sion network analysis (WGCNA)39. The results indicated that 87 ases, galactosidases, expansions and pectatelyase)44, cell wall modi-
and 295 genes with dominance expression interacted with 4,302 and fication (xyloglucan endotransglucosylase/hydrolases, fasciclin-like 
4,612 genes in the A- and B-subgenome, respectively. KEGG path- arabinogalactan proteins and β-d-xylosidase)45,46 and aroma produc-
way enrichment analysis suggested that genes in the co-expression tion (alcohol dehydrogenases)40. These genes are closely involved in 
network of the A- and B-subgenomes were commonly associated fruit ripening and are regulated by ethylene44. Almost all of these 
with starch and sucrose metabolism (ko00500) and other metabolic gene pairs showed expression dominance in the B-subgenome of FJ 
pathways. In particular, ubiquinone and other terpenoid-quinone during fruit ripening, which is the same as the dominance expres-
biosynthesis, photosynthesis–antenna proteins, carotenoid biosyn- sion of gene pairs related to ethylene biosynthesis (Supplementary 
thesis and other glycan degradation pathways were specially enriched Fig. 19 and Supplementary Tables 36 and 37). This co-dominance of 
in the A-subgenome (q-value < 0.05) (Supplementary Fig. 16 and homoeologue gene pairs in the B-subgenome further supports the 
Supplementary Table 25). In contrast, selenocompound metabo- significant contribution of the B-genome to ethylene biosynthesis 
lism and cyanoamino acid metabolism pathways were particularly and fruit ripening.
enriched in the B-subgenome (q-value < 0.05) (Supplementary Gene duplication is a major mechanism that generates new 
Fig. 17 and Supplementary Table 26). Overall, these results further genetic diversity as a basis for evolutionary innovation in eukary-
support the hypothesis of functional divergence between the A- and otes47. Compared to the 11 ACO genes within the A-genome, the 
B-genomes at the transcriptional level. ACO genes in the B-genome expanded significantly to 18 members. 
The expansion of ACO genes, including MbACO2, -3, -4, -5 in chr 3, 
Expression dominance of homoeologue gene pairs in the ethyl- MbACO8, -9, -11 in chr 6 and MbACO16, -18 in scaffolds, was 
ene biosynthesis pathway and expansion of the ACO family affect driven by tandem duplications in the B-genome (Fig. 3b,d). Of the 
fruit ripening. Ethylene plays a key role in the regulation of climac- ACO genes that were expanded in the B-genome, MbACO2 and 
teric fruit ripening post-harvest40. The core enzymatic steps in the MbACO3 showed strong expression levels during the fruit-ripen-
ethylene biosynthesis pathways are well characterized, and include ing stages with log2RPKM > 11 at 6 days post-harvest (DPH) in FJ, 
S-adenosyl-l-methionine synthase (SAMS), 1-aminocyclopropane- which is coincident with the ethylene climacteric period (Fig. 3b,f 
1-carboxylic acid synthase (ACS) and ACO41,42 (Fig. 3a). However, and Supplementary Table 38). In addition, MbACO8, -9, -11, -16, 
the expansion and expression dominance of these homoeologue -18 exhibited high expression levels in roots and fruits at 0 days after 
gene pairs during polyploid fruit ripening remains largely unknown. flowering (DAF) (Fig. 3b and Supplementary Table 38). These genes 
We identified 12 SAMS, 11 ACS and 11 ACO genes from belonged to the same cluster and their expression patterns were 
the A-genome and 10 SAMS, 11 ACS and 18 ACO genes in the highly concordant with their duplication (Fig. 3d,e), suggesting that 
B-genome, which represents a significant expansion compared the expansion and evolution of ACO genes in the B-genome con-
to the seven other sequenced plant species among the mono- tributed to tissue development and fruit ripening.
cots and eudicots22 (Supplementary Tables 27 and 28). We fur- Ripening of FJ was more rapid than BX during post-harvest rip-
ther characterized 28 homoeologue gene pairs from the A- and ening. BX required 8 and 14 DPH to reach the more-green-than-
B-genomes (Supplementary Table 29). These gene pairs displayed yellow and full-yellow stages, respectively, whereas FJ required 
similar expression profiles in the BaXiJiao (Musa AAA group, cv. 3 and 6 DPH to reach these stages, respectively (Supplementary 
Cavendish, BX), the A-subgenome of FenJiao (Musa ABB group, Fig. 20). The dominant expression of ethylene biosynthesis and fruit 
cv Pisang Awak, FJ) and the B-subgenome of FJ (Fig. 3b and ripening-related gene pairs and the expansion of the ACO family in 
Supplementary Tables 30–32). Interestingly, eight gene pairs exhib- the B-genome could have contributed to increased ethylene produc-
ited homoeologue expression dominance in the B-subgenome and tion and faster fruit ripening in FJ compared to BX.
814 NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants
Nature PlaNts Articles
a b
Methionine
SAMS MaSAMS1
MaSAMS2 * MaSAMS1 MbSAMS1
MaSAMS2 MbSAMS2
MaSAMS3 MaSAMS3
MaSAMS4
S-adenosyl- * MaSAMS4 MbSAMS3
* * * * *
L-methionine MaSAMS5 MaSAMS5 MbSAMS4
MaSAMS6
MaSAMS7 * * * * * * * MaSAMS6 MbSAMS5
MaSAMS7 MbSAMS7
MaSAMS8 MaSAMS8 * * * * 5
MaSAMS9 MaSAMS9 MbSAMS6
ACS MaSAMS10
MaSAMS11 * * * * * MaSAMS10 MbSAMS8
MaSAMS11
MaSAMS12 MaSAMS12 MbSAMS10 0
MbSAMS9
MaACS1 MaACS1 MbACS2
MaACS2 MaACS2 MbACS3
1-aminocyclopropane-1-carboxylate MaACS3 MaACS3 MbACS1 −5
MaACS4 MaACS4 MbACS4
MaACS5 MaACS5 MbACS5
MaACS6 MaACS6 MbACS6
ACO MaACS7 MaACS7
MaACS8 * * * * MbACS7 −10
MaACS8 MbACS9
MaACS9 MaACS9 MbACS10
MaACS10 MaACS10
MaACS11 MaACS11 MbACS11
H H MbACS8
MaACO1 MaACO1 * MbACO1
C=C MaACO2 MaACO2 * * MbACO6
MaACO3 MaACO3 * * * ** ** MbACO7
H H MaACO4 MaACO4 MbACO14
MaACO5 MaACO5 MbACO10
MaACO6
MaACO7 * MaACO6 * * MbACO17
MaACO7 MbACO12
MaACO8 MaACO8 MbACO13
MaACO9 MaACO9 * * * * *
MaACO10 MaACO10
MaACO11 MaACO11 * * MbACO15
MbACO9
MbACO11
MbACO2
MbACO4
MbACO18
MbACO8
MbACO16
MbACO3
MbACO5
c MaACS1 d MaACO1
MbACS2 MaACS2
MbACO1 MaACO2
MaACS3
MbACS1 ACH MbACO5
R01A AC
CH MbACO4 HR01 AC
R01 R0
2 2 HR MaACO3
3 A .54
C 0
. BCH
H 3 9 BCHR01
05
1 .
. 9
9 5 2.01
MbACS3 01 R 1.1
1
MaACS4
2 MbACO3 7
4 3 03 2
0 .0 1.12
.7 0 MbACO2 1
6 .9
8 1.13
26.28 .3 6 2.0
9.84
7 MaACO4
40.7
MbACS9 7 MaACS5
40. 26.29 MbACO6 6.3 9.85
6
3 7
9 6
.7 MaACO5
9 26.30
31.78 6.3 9.86
5
39.78 MaACS6 MbACO1210.38 MaACO6
MbACS8 31.79 MbACO11 9.87
MbACS7 39.77
37.23 MbACO10 10.37
31.80 MaACO7
34.3 9
1 .88
37.22   MbACO9
34 10.36
.32    MbACO8 9.
MbACS6 21 2 8
37. 7 9
.68
31.39 3
3 4 MaACS7
5 .3 10.35
2.16
.3 3
3 27
1.38 .6
3 3 MbACO7 9
5.3 MaACS8
MbACS5 3 4
31.37 2.15
27 MaACO8
7.33
.70
ACHR08 16
1.25 R05
H 6 5
B C . .3 .5
C A 32 5
1.2
4 7
HR 7.3
2 BCHR MaACO9
05 R09
MaACS9 08
MbACS4 BC ACH MbACO13
HR B
1 CH
0 R10 ACHR10
ACHR10 MaACO10
MbACS10 MaACS10
MbACS11 MbACO14 MaACO11
MaACS11 MbACO15
e f
18
O1
M
bA
C 1
a O
AC
MaAC
Mb M O13
O
8 A 3
0.2 CO7 MbAC 8
CO
MaA
MaAC 1
O 00
2 95.6
99.6 1
M 0
bA 0
C 7
O6 6
99 .
.8 9 10
0
52
1 .
01.3
9 98.5
9.2 7.4
100
99.6
9 77.4 100
1 71.124.6 87.5
. 9
2 9.7
86.3
99.6 12
S. bicolor 48
89
.7
100 BX
S. lycopersicum M. acuminata
FJ
M. balbisiana
P. dactylifera
A. thaliana
A. officinalis 6
Mb
B. distachyon M A O. sativa
b C
A O
C 10 99.3
M O
bAC 8 9
O 2.95 9.3
11 100 9
78
.2
M 9
a .6
A 92
CO6 96
96.1
M 8
aA 5
CO5 959739...
5786.
185.4
10
0
MbACO9
MbACO16 0
CO10
MaA
MaACO9
BX_0
DAF
FJ_
0D
AF
BX_2
0D
AF
FJ_
20
DAF
BX_0
DPH
FJ_
0D
PH
BX_8
DPH
FJ_
3D
PH
BX_1
4D
PH
FJ_
6D
PH
MaACO4
Fig. 3 | Phylogeny and expression patterns of ethylene biosynthesis genes between M. acuminata (A-genome) and M. balbisiana (B-genome).  
a, Overview of the ethylene biosynthesis pathway. b, Expression patterns of SAMS, ACS and ACO family genes in the root and leaf, and at different 
stages of fruit development and ripening in BX, the A-subgenome of FJ and the B-subgenome of FJ. Genes aligned horizontally in the heat map indicate 
homoeologue gene pairs between the A- and B-genomes. White boxes with diagonals indicate the lack of homoeologue gene pairs between the A- and 
B-subgenomes. Asterisks indicate expression dominance of homoeologue gene pairs between the A- and B-subgenomes of FJ. c,d, Synteny analysis of ACS 
(c) and ACO (d) families between the A- and B-genomes. Red lines indicate paralogous gene pairs resulting from WGD, blue lines indicate homoeologous 
gene pairs, purple lines indicate tandem duplication, light blue strips indicate aligned syntenic blocks, light green strip indicates translocation block and 
light red strips indicate inversion blocks.The blocks in outer ring represent location and length of genes; blue blocks represent genes from A-genome and 
orange blocks represent genes from B-genome. e, Phylogenetic analysis of ACO family genes among nine species: M. acuminata, M. balbisiana, A. thaliana, 
O. sativa, Sorghum bicolor, Solanum lycopersicum, Phoenix dactylifera, Asparagus officinalis and B. distachyon. f, Ethylene production at different stages of fruit 
development and ripening in BX and FJ. Error bars show standard error of the mean from three independent experiments (n = 3).
NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants 815
1 BC
6 . H. 2
8 3 R4 02
6.83
6
3 .8
.4 2
2
M 3.b 41
A
M Cb O
M A 1
b C 8
A O 3.40
M C 1
bA O 7
C 1
O 4
MbA 2
C 5.64
O3
100 5
78 .. 62 599.9 89.6
80.541.6
92.2
36.2 5.
100 2 6
0 6
.04
98.6 20.05
85.3 97.5 2
3 00 .. 0
5 6100
88.7 7
9 5 80
3 .1
.9 3 2
93.8 0.53
M
M bb A
A C
C O
O 5
4 BX_Root
BX_Leaf
BX_0DAF
BX_20DAF
BX_0DPH
BX_8DPH
BX_14DPH
Ethylene production (ng g–1 h–1)
FJ_Root
BC FJ_leaf
HR05 FJ_0DAF
B
1 7 C FJ_20DAF4 .. 3
7 1
H
8 R0 FJ_0DPH3
1 FJ_3DPH
4.77 FJ_6DPH
14.
1 7
0 6
.53
10.52
8.34
FJ_Root
8.3 FJ_leaf
5
FJ_0DAF
8. FJ_20DAF
6 3
. 6
78 FJ_0DPH
6 FJ_3DPH
.79 FJ_6DPH
6
2 .. 8
0 0AC 0
HR07
ACH
R06
16
.58
16
.59
27
.94
27.9
5
27.96
10.51
16.30
16.2
9
16.
28
9.9
7
9.9
6 HR07
BC
BCH
R06
ACHR04
6.3
3
6.3
4
20
.08
20.
09
89.2 99.2
99.3
20.1
0 99
.6
34.7
6
MbAC
O15
98
.7
93.
8 MaACO
11
86.
6
100
78.2 MbACO1299.934.77 97.1 MaACO7
34.78
9
72.
13.45
4
13.4 10
0
43 .81
7
13.4.4
9
8
4.4
4.4
7
23
.0
2
BCHR0
4
Articles Nature PlaNts
a b c
Light reaction
SWEET
Calvin cycle
12 SUT MaAMY1 MaAMY1
MaAMY2 MaAMY2 * * * MbAMY3
MbAMY6
MaAMY3 MaAMY3
Sucrose * * * MbAMY7
10 SuSy MaAMY4 MaAMY4 * * MbAMY8
SuSy MaAMY5 MaAMY5 10
SWEET SUT 8 MaAMY6 MaAMY6 * * MbAMY2
UGPase MaAMY7 MaAMY7 MbAMY11
UDPG MaAMY8 MaAMY8 MbAMY12 5
6 AGPase MaAMY9 MaAMY9 MbAMY13
UGPase MbAMY1
UGPase MbAMY4 0
UDPG 4
G-1-P GBSS MbAMY5
AGPase MbAMY9
2 −5
SSS MbAMY10
ADPG MaBMY1 MaBMY1 * * * MbBMY2
SBE MaBMY2 MaBMY2 * MbBMY3 −10
DBE GBSS SBE MaBMY3 MaBMY3 MbBMY4
SSS SSS MaBMY4 MaBMY4 * * * MbBMY5
MaBMY5 MaBMY5 MbBMY1
DBE MaBMY6 MaBMY6
* * * MbBMY6
Amylopectin Starch Amylose MaBMY7 MaBMY7 MbBMY8
AMY MaBMY8 MaBMY8 MbBMY7
MaBMY9
AMY MaBMY9 MbBMY9
MaBMY10 MaBMY10
BMY MaBMY11 MaBMY11 * * * MbBMY11
MaBMY12
Linear glucans MaBMY12 MbBMY12
MaBMY13 MaBMY13
BMY DPE MbBMY10
MbBMY13
Maltotriose Maltose a a  MaDPE1 MaDPE1
ina
t MbDPE1
ian at
iva MaDPE2 MaDPE2 * * MbDPE2
. a
cu
m
alb
is
. b A. t
ha
lia
na
S. ly
co
pe
rs
icu
m
P. p
er
sic
a
P. t
ric
ho
ca
rp
a
V. v
ini
fe
ra
M B. d
ist
ac
hy
um
DPE DPE  s
O.
Glucose Heteroglucan M
d MaAMY1 e MaBMY1
MbBMY2
MbAMY5 MaBMY2
MaAMY2
MbAMY4 MbBMY1 MaBMY3
ACHR MaAMY3
03 ACH
MbAMY3 R01 A
MbBMY4 01 CH MaBMY4
A R
2 BCHR
0
C 2
5. HR MaAMY4
00 0 24
.44 2 6
0. .
5 3
4 61
2 6.45
MaBMY5
4 0 31.50
.9 24.45
5.97
9 MbBMY3 .55
MbAMY2 MaAMY5 31.51
20.
6 5
.6 4
5 31.52
4.98 5.98
20.27
6.64
MbBMY5 20.28
4.97 5.99
6 MaBMY6
MbAMY1 7.43 .6
2 3
5.06 20.29
5.70
4.96
29.5 7.44
5 25.0 MaBMY7
5
MaAMY6 5.71
MbBMY6
7.45 25.04
29.54 7.48 5.72
28.21 7.08
MbAMY8 MaBMY8
7.47
28. 7
2 .
2 09
29.53 MbBMY8
7.46 7
MbAMY7 MaAMY7 .1
5 0
2 .30
29.52 6.09
28
. .
823
7 6.08 5.31 MaBMY9
MbBMY7
29.51 2.88 6.07 5
2 .3
MbAMY6 6 2
2 MaAMY8 30.27
.19
29.50
3 .
6 8 7 26.
. 9 2
7.30 BC 4 30.26
4 MbBMY10 5 2 0 MaBMY10
HR0
3 6
6. 30.2
5.63 AC 8
7.29 HR05 B .
C 0 .21
45 H
MbAMY10 R 8 7
0 0
7 B HR
BC MaAMY9
HR C
07 H
CHR08 BCHR10
MbBMY9 R AC MaBMY11
B 08 BCHR10 ACHR10
MbAMY9 MaBMY12
MbBMY11
MbAMY11 MbAMY13
MbAMY12 MbBMY12 MaBMY13
MbBMY13
f g
90
60 BX
FJ
BX_0DAF BX_20DAF BX_80DAF BX_8DPH BX_14DPH
30
0
10µm FJ_0DAF FJ_20DAF FJ_80DAF FJ_3DPH FJ_6DPH
0D
AF
BX_
FJ_
0D
AF
BX_2
0D
AF
FJ_
20
DAF
BX_0
DPH
FJ_
0D
PH
BX_8
DPH
FJ_
3D
PH
BX_1
4D
PH
FJ_
6D
PH
Fig. 4 | Comparison of genomic expansion, evolutionary history and differential expression patterns of the starch metabolic pathway between M. 
acuminata (A-genome) and M. balbisiana (B-genome). a, Overview of the starch biosynthesis and degradation pathway. b, Gene families in the starch 
metabolic pathway that are expanded in M. acuminata and M. balbisiana. c, Expression patterns of families AMY, BMY and DPE in the starch degradation 
pathway in BX, the A-subgenome of FJ and the B-subgenome of FJ during fruit-ripening stages. Horizontally oriented genes in the heat map indicate 
homoeologue gene pairs between the A- and B-genomes. White boxes with diagonals indicate that no homoeologue gene pairs were identified between 
the A- and B-genomes. Asterisks indicate expression dominance of homoeologue gene pairs between the A-subgenome of FJ and the B-subgenome of FJ. 
d,e, Synteny analyses of AMYs (d) and BMYs (e) between the A- and B-genomes. Red lines indicate paralogous gene pairs resulting from segmental/WGD-
driven duplication, blue lines indicate homoeologous gene pairs, purple lines indicate tandem duplication, light blue strips indicate aligned syntenic blocks, 
light green strip indicates translocation block and light red strips indicate inversion blocks. The blocks in the outer ring represent location and length of genes; 
blue blocks represent genes from A-genome and orange blocks represent genes from B-genome. f, Starch contents at different stages of fruit development 
and ripening in BX and FJ. Error bars show standard error of the mean from three independent experiments (n = 3). g, Scanning electron microscopy of starch 
granules at different stages of fruit development and ripening in BX and FJ. The experiment was repeated three times independently with similar results.
816 NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants
Starch content (%)
7
7 .. 2
1 8
BC
9 HR
7 0
.1 3
8
7.
6 1
. 7
89
6.88
6.87
3.04
4.88
4.89
4.9
2 0
4.40
24.41
A 2C 4.4
H 2R1 20 4.4
A 3
CHR08
ACHR07
BX_0DPH
BCHR04 BCHR BX_8DPH
5 0. 3
62 B BX_14DPH
2 5 C8 . H. 6
1 1 R0
2 1 2
8.1
2 0
8
1 .
2 0
. 9
88
12.87 FJ_0DPH
12.
1 8
4 6
.2 FJ_3DPH
2
FJ_6DPH
14.21
14.20 7.68
7.69
7.70
17.52 FJ_0DPH
17.53 FJ_3DPH
17 FJ_6DPH
2 .3 5
. 4
42
23.4
2 3
6 3
. .
4 44
6 3
A .
C 4
H 4
R06
ACH
R04
ACHR05
log (RPKM)
2
HR
03 8.
08
AC 8.0
9
15
.53
15.
54
15.
55
24.3
0
24.3
1
24.32
15.61
5.601
15.5
9
3.4
5
4
3.4
3.4
3
26
.33 HR06
BC
26
.3
2
BCHR05
ACHR
05
36
.46
1.6
1
1.62
1.63
Gene number 3.02
3.03
5.05
5.04
5.0
3
5.0
2
1
5.0
BCHR
04
Nature PlaNts Articles
The active starch metabolic pathway in the B-genome during genes related to starch degradation in the B-subgenome also prob-
fruit development and the ripening process. Starch is the most ably led to elevated starch degradation in FJ (Fig. 4c,f,g). Therefore, 
widespread and abundant storage carbohydrate in plants. It is also the active starch metabolic pathway in the B-genome leads to 
a major component of cultivated banana, accumulating to high lev- marked starch accumulation and degradation during the fruit 
els (~60–75% of dry weight) and leading to the presence of large development and ripening processes, respectively.
starch granules (~8–30 m) during banana fruit development, along 
with near-complete conversion to soluble sugars during post-har- Discussion
vest ripening48–51. Thus, banana could serve as an excellent model Most cultivated bananas are triploid, having evolved from two 
for the investigatation of starch metabolism in fresh starchy fruits. wild diploid species, M. acuminata and M. balbisiana6. Our previ-
The major enzymes that are responsible for starch biosynthesis ous M. acuminata genome sequencing efforts provided genomic 
(sugars will eventually be exported transporter: SWEET; sucrose resources to inform banana breeding1. However, there is also an 
transporter: SUT; sucrose synthase: SuSy; UDP-glucose pyrophos- urgent need to develop the genomic resources of M. balbisiana, 
phorylase: UGPase; ADP-glucose pyrophosphorylase: AGPase; which is a crucial contributor to cultivated varieties. Improved 
granule-bound starch synthase: GBSS; soluble starch synthase: SSS; understanding of the B-genome structure, subgenome evolution, 
starch branching enzyme: SBE; and starch debranching enzyme: homoeologous exchange, genetic diversity in polyploidy bananas 
DBE) and degradation (α-amylase: AMY; β-amylase: BMY; and and gene expansion and expression patterns will help in the design 
starch phosphorylase: DPE) are encoded by multigenic families51–56 and application of breeding strategies for novel banana cultivars 
(Fig. 4a). We identified 101 starch metabolism-related genes in the with improved traits.
A-genome, including 77 in the starch synthesis pathway and 24 in Cycles of WGD followed by diploidization events have occurred 
the starch degradation pathway. Ninety-six such genes were identi- across land plants, and have significantly contributed to their evo-
fied in the B-genome, including 68 in the starch synthesis pathway lutionary success24,25. Our results indicate that the B-genome may 
and 28 in the starch degradation pathway (Supplementary Table 39). be more sensitive to fractionation than the A-genome after WGD, 
In the starch synthesis pathway, five (SuSy, GBSS, SSS, SBE although the A- and B-subgenomes have diverged very recently. 
and DBE) out of nine families showed obvious expansion in the Variation in genomic structure between the A- and B-genomes 
A- and B-genomes of banana, compared to seven other plant consists of chromosome rearrangements and gene loss during 
species that included two fruit plants, tomato and grape. These diploidization, which has resulted in the functional divergence of 
expansions suggest a potential significance in banana (Fig. 4b subgenomes in polyploidy bananas. This divergence is supported 
and Supplementary Table 40). We characterized 54 homoeologue by differential enrichment of expanded/contracted gene families 
gene pairs from these families in the A- and B-genomes. Of these, between the A- and B-genomes and the expression dominance 
27 homoeologue gene pairs had expression dominance in root, leaf of homoeologue genes from A- and B-subgenomes in triploids. 
and fruit tissues with seven dominant in the A-subgenome and 20 Although homoeologue expression dominance has been identi-
in the B-subgenome (Supplementary Fig. 21 and Supplementary fied in certain polyploid species57–60, the relationship between 
Tables 41–45). Consequently, the starch synthesis pathway is more homoeologue expression dominance and functional divergence 
active in different tissues within the B-subgenome than in the (especially in regard to ethylene biosynthesis/starch metabo-
A-subgenome. Of those gene pairs with dominant expression in the lism) of subgenomes in triploids remains to be elucidated. Thus, 
B-subgenome, MbSWEET17, MbSuSy1, MbSuSy2 and MbSuSy9 these results provide an important basis for the improvement of 
had high expression levels (log2-based RPKM > 6) at 0 DAF and agriculturally important traits by focusing selection on transcrip-
20 DAF, suggesting an important role in starch synthesis during tionally dominant genes. It is worth noting that homoeologous 
fruit development (Supplementary Fig. 21 and Supplementary exchanges may obscure the signal of expression dominance in 
Table 43). In contrast, most of the starch synthesis-related genes subgenomes of allopolyploids61. The extensive homoeologue 
that were unique to the A- or B-genome exhibited low expression exchanges in allopolyploid bananas may remove many progenitor 
levels (Supplementary Fig. 22). genome conflicts that result in subgenome biases in gene content 
Within the starch degradation pathway, genome annotation and expression. Thus, homoeologous exchanges may contribute 
indicated that families AMY and BMY had an obvious expansion to the diversity of homoeologue expression dominance and 
in the banana A- and B-genomes compared to seven other plant induce a series of rapid genetic and epigenetic modifications for 
species (Fig. 4b and Supplementary Table 40). We characterized agronomic traits61.
21 homoeologue gene pairs within this pathway from the A- and Previous studies have suggested an important role of ethylene 
B-genomes. Among these gene pairs, 11 had dominant expression production in fruit ripening and starch metabolism with regard to 
and were associated with the B-subgenome during fruit ripening fruit quality post-harvest27,43,55. However, the genetic mechanisms 
(Fig. 4c and Supplementary Tables 46–50). Among the dominant underlying polyploid fruit ripening are less well known. Here, we 
genes, MbAMY-2, MbAMY-3, MbAMY-8, MbBMY-6, MbBMY-8 analysed biological processes related to these two pathways in trip-
and MbDPE-2 exhibited high expression (log2-based RPKM > 6 in loid bananas. We identified significant genomic expansions and 
at least one stage) during fruit ripening (Fig. 4c and Supplementary dominant expression of homoeologue genes in the B-genome at 
Table 48). MbAMY-1, -2, -3, -4, -5 and MbAMY-6, -7, -8 showed close the pathway level of gene families and, most notably, in the ACO 
proximity and an evolutionary relationship, suggesting that these family, known to play a critical role in ethylene production. Our 
genes duplicated from a tandem copy (Fig. 4d and Supplementary analysis revealed the origin and evolution of crucial gene families 
Fig. 23). MbBMY-5, -8 and MbBMY-6, -12 are two paralogous pairs in these pathways, particularly for the independent origin of the 
in syntenic blocks of the B-genome and showed close evolutionary MaACO2/MbACO6 and MaACO3/MbACO7 gene pairs and their 
relationships, suggesting that these genes duplicated from WGD specific function in fruit ripening. Moreover, we identified that this 
(Fig. 4e and Supplementary Fig. 24). Taken together, these results tandem event has led to expansion of ACO genes in the B-genome 
suggest that tandem-driven AMY duplication and WGD-driven and to strong expression of these genes during fruit ripening. Our 
BMY duplication in the B-genome contribute to starch degradation. analysis also indicated a potential link between the dominant expres-
The dominant expression of genes involved in starch biosynthe- sion of homoeologue genes and the expansion of gene families with 
sis in the B-subgenome could have led to increased starch accumu- fruit ripening and starch metabolism. Previous studies have dem-
lation during fruit development in FJ relative to BX (Fig. 4f,g and onstrated that the B-genome is associated with improved vigour 
Supplementary Fig. 21). In addition, the dominant expression of and tolerance to both biotic and abiotic stresses. Consequently, the 
NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants 817
Articles Nature PlaNts
B-genome is a target for banana breeding programmes2. Here, we and PlantTFDB71,72, with families from PlantTFcat73 and AtTFDB74 used as 
highlight that the B-genome is of great importance in ethylene bio- supporting evidence. In total, we identified 3,329 transcription factor genes in  
synthesis and post-harvest banana ripening, which will further our M. balbisiana and 3,899 in M. acuminata (Supplementary Table 6).
understanding of ripening mechanisms in polyploid fruit. Gene family analysis. A total of 500,142 genes from 16 plant species with 
The M. balbisiana genome assembly, along with our previously available whole-genome sequences were used for gene family clustering analysis. 
acquired M. acuminata genome data, may aid in the discovery of BLAST67 (v.2.2.26; -p blastp) was used to generate pairwise protein sequence 
cultivar-specific sequences that are related to important cultivar- alignments with E-values of < 1 × 10–5. Then, OrthoMCL22 was used to cluster 
specific traits, including shelf life, quality and stress tolerance. similar genes by setting the main inflation value at 1.5 and using default settings 
for the other parameters. These analyses resulted in 39,358 gene families 
Thus, these resources can be used to build molecular characteriza- comprising 393,700 genes from the 16 species (Supplementary Table 7  
tion strategies for various cultivars to assist in rapid quality control and Supplementary Fig. 7).
and the conservation of biodiversity. The data from this study also We identified 519 single-copy gene families shared among the 16 species, 
pave the way for whole-genome association studies, germplasm and constructed a phylogenetic tree using MrBayes (v.3.1.2)75 software with the 
improvement and genetic modification of bananas to meet increas- general time-reversible model (Supplementary Fig. 4). Divergence times for the 
16 species were also estimated based on fourfold degenerate sites of all single-
ing commercial demands. These genomic resources and results also copy orthologous genes using the MCMCTree programme in the PAML package 
reinforce the use of the banana as an appropriate model to study (v.4.4)76 (Supplementary Fig. 5).
subgenome evolution and fruit biology in triploid variants. Due to CAFÉ (v.2.1)23 was used to analyse the expansion and contraction of gene 
the sterility and seedlessness of banana cultivars, further efforts will families. A random birth-and-death model was used to assess gene gain or loss 
in gene families across the specified phylogenetic tree (Supplementary Fig. 9). 
be needed to leverage the key gene resources from precisely charac- Families with P < 0.05 were considered as significant expansion or contraction, and 
terized germplasms to achieve effective breeding schemes. pathway enrichment analysis of these families was conducted using the enrichment 
pipeline77. Additional details are available in the Supplementary note.
Methods
Sample collection. A double haploid of the wild diploid genotype PKW; 2n = 2x = 22 Whole-genome alignment analysis. MCSCAN78 (parameters: -a -e 1e-5 -s 5) was 
was provided by the Centre de Coopération Internationale en Recherche used to detect collinearity within M. acuminata (A-genome) and M. balbisiana 
Agronomique pour le Développement (CIRAD) for genome sequencing. Fresh, (B-genome) and among various species. Syntenic blocks containing at least ten 
unexpanded leaves were harvested and then frozen immediately with liquid nitrogen gene pairs were obtained. All of the orthologous and paralogous gene pairs were 
to preserve genomic DNA for isolation. High-molecular weight genomic DNA was extracted from the syntenic blocks for calculation of 4dTv79 distances using the 
extracted using a standard cetyltrimethyl ammonium bromide (CTAB) method62. HKY substitution model80 (Supplementary Fig. 8). Additional details are available 
DNA integrity was assessed by agarose gel electrophoresis (concentration of agarose in the Supplementary note.
gel, 1%; voltage, 150 V; electrophoresis time, 40 min). Finally, DNA was purified from 
the gel using a QIAquick Gel Extraction kit (QIAGEN). Orthologous gene pair analysis. BLAST67 (v.2.2.26, -p blastp) was used to align 
M. acuminata proteins to M. balbisiana proteins for identification of orthologous 
Library construction and sequencing. One paired-end and eight mate-pair genes. The value 1 × 10–5 was used as a cut-off to define the raw orthologues. We 
libraries were constructed for short-read sequencing on the Illumina HiSeq 2000 then filtered the BLAST results using two parameters (CIP ≥ 60% and CALP ≥ 60% 
platform, which generated around 86.34 Gb (166× coverage) of high-quality data. (ref. 35). We identified 25,717 orthologous gene pairs (81.83% consistency with 
For long-read DNA sequencing, 5.79 million SMRT long reads (58.99 Gb data, syntenic blocks) between M. acuminata and M. balbisiana using these two 
113× coverage) were sequenced using the PacBio Sequel system with libraries of parameters (Supplementary Table 22). The orthologous gene pairs were first 
20-kb insert size; sub-reads had a mean length of 10.2 kb and N50 length of 16.6 kb. aligned using MUSCLE (v.3.8.31)81, then the Ka/Ks ratio of each gene pair was 
One Hi-C library was prepared and sequenced on Illumina NovaSeq 6000 to calculated using KaKs_Calculator (v.2.0)82 with model yn00. The significant 
generate 71.96 Gb (138× coverage) of high-quality data63 (Supplementary Table 1). difference between Ka/Ks values was detected by Student’s t-test.
Additional details are available in the Supplementary note. Re-sequencing analysis. Nine different genotypes of banana were used for re-
Genome assembly. De novo assembly of DH-PKW was performed using wtdbg sequencing, including the triploid plants BaXiJiao (subgroup Cavendish, AAA), 
(v.1.2.8; https://github.com/ruanjue/wtdbg) based on PacBio data (only reads Gros_Michel (subgroup Gros Michel, AAA), FenJiao (subgroup Pisang Awak, 
longer than 1 kb were used in assembly). The assembled genome was corrected for ABB), Kamaramasenge (AAB) and Pelipita (ABB), in addition to the diploid plants 
two rounds using the ‘wtdbg-cns’ programme in the wtdbg package. We then used Pisang_Mas (subgroup Sucrier, AA), Pisang_Kra (subsp. malaccensis, AA), DH-
the algorithm Arrow (https://github.com/PacificBiosciences/GenomicConsensus), PKW (BB) and balbisiana (BB) (Supplementary Table 13). BaXiJiao, Gros_Michel 
which takes into account all of the underlying data and the raw quality values and FenJiao were obtained from the Tissue Culture Centre of the Chinese Academy 
inherent in SMRT sequencing, to polish the assembly again for the final consensus of Tropical Agricultural Sciences (CATAS). Pelipita, Pisang_Mas, Pisang_Kra, 
accuracies. The final consensus contigs were scaffolded using the SSPACE-standard M. balbisiana and Kamaramasenge were provided by the Bioversity International 
programme64 (v.3.0) with meta-pair reads from libraries of insert size 2–20 kb. Musa Transit Centre. Genomic DNA was extracted from fresh leaves of seedlings at 
Based on Hi-C data, 430.02-Mb scaffolds were anchored to 11 pseudo-molecules the five-leaf stage using the CTAB method62.
using LACHESIS software13 (Supplementary Fig. 2). Chromosomes were numbered Paired-end reads with libraries of 500-bp insert size were aligned to the 
according to the linkage group nomenclature adopted for M. acuminata. Additional A- and B-genomes simultaneously using BWA (v.0.7.12)66 with the parameters 
details regarding genome assembly are provided in the Supplementary note. ‘bwa aln -t 20 -l 35’ (Supplementary Table 14), and only uniquely mapped reads 
were kept. Potential PCR duplicates were marked using Picard (v.1.54, https://
broadinstitute.github.io/picard/) and indexed using the SAMtools package83. 
Evaluation of assembly quality. BUSCO (v.3) was used to assess assembly The Genome Analysis Toolkit84 was then used to infer SNPs and InDels. SNP 
completeness11. We mapped 29,610 M. acuminata-expressed sequence tags (ESTs) identifications were filtered based on the following parameters: ‘QD < 2.0 || 
to the assembled genome using BLAT65 (v.35) with default parameters. In total, FS > 60.0 || MQ < 40.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0’, and 
93.59% of the ESTs were aligned to the genome with identity >90%. Additionally, InDels were filtered based on the following parameters: ‘QD < 2.0 || FS > 200.0 || 
BWA66 v.0.7.12 (aln -l 35) was used to map 59× Illumina reads to the assembly,  ReadPosRankSum < −20.0’ (Supplementary Tables 16 and 17). Breakdancer (http://
and 96.11% of the reads were mapped to the assembled genome. Additional details breakdancer.sourceforge.net/)85 was used to detect structural variations by checking 
are available in the Supplementary note. paired-end reads with an abnormal orientation and/or span. The final structural 
variations were filtered using the following requirements: (1) minimum size of 100 
Genome annotation. Repetitive sequences within the M. balbisiana genome and maximum size of 1,000,000; (2) minimum score of ≥30; and (3) minimum 
were identified by a combination of homology-based and de novo approaches number of required reads supporting each structural variation ≥5 (Supplementary 
(Supplementary Table 1). Gene structures were annotated iteratively using three Table 18). Nucleotide diversity (π) was analysed using VCFtools (v.0.1.13; https://
main approaches (ab initio predictions, homologue proteins and transcriptome vcftools.github.io/man_latest.html).
data) that were combined using MAKER15 (v.2.31.8) (Supplementary Table 5). 
Gene functions were annotated according to the best match of the alignments Analysis of homoeologous exchanges. Assessment of read coverage depth was 
using blastp67 (E-value < 1 × 10–5) against the Swiss-Prot68, TrEMBL68, NR  used to detect homoeologous exchanges between the A- and B-subgenomes. 
(https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/), KOG69 and KEGG70 databases. We inferred these based on cases where reads coverage of a given region on one 
Additional details are available in the Supplementary note. parental genome was significantly high while the orthologous one was low or had 
no coverage. High coverage indicates duplication, and low or no coverage indicates 
Transcription factors. We used the iTAK programme16 to identify transcription loss34. The uniquely mapped paired-end reads were used to calculate the coverage 
factors based on the consensus rules that are mainly summarized within PlnTFDB depth of each sample on the A- and B-genomes (Supplementary Figs. 25–27). 
818 NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants
Nature PlaNts Articles
According to coverage depth, we detected homoeologous exchanges in the triploids Received: 7 May 2018; Accepted: 20 May 2019;  
‘FenJiao (ABB)’, ‘Pelipita (ABB)’ and ‘Kamaramasenge (AAB)’ (Supplementary Published online: 15 July 2019
Table 15). Additional details are available in the Supplementary note.
Transcriptome analysis. Banana fruits at different stages of development (0, References
20 and 80 DAF) and ripening (8 and 14 DPH for BX, 3 and 6 DPH for FJ) were  1. D’Hont, A. et al. The banana (Musa acuminata) genome and the evolution of 
obtained from the banana plantation at the Institute of Tropical Bioscience and monocotyledonous plants. Nature 488, 213–217 (2012).
Biotechnology (Chengmai, Hainan, 20° N, 110° E). Two-month-old BX and FJ  2. Davey, M. W. et al. A draft Musa balbisiana genome sequence for molecular 
banana seedlings were obtained from the Tissue Culture Centre of CATAS and genetics in polyploid, inter-and intra-specific Musa hybrids. BMC Genom. 14, 
cultured in Hoagland’s solution. Seedlings at the five-leaf stage were treated 683 (2013).
with 200 mM mannitol for 7 days, 300 mM NaCl for 7 days and low-temperature  3. De Langhe, E. et al. Why bananas matter: an introduction to the history of 
conditions (4 °C) for 22 h. Fruit, root and leaf samples were frozen in liquid banana domestication. Ethnobot. Res. Appl. 7, 165–177 (2009).
nitrogen and stored at −80 °C until RNA extraction for transcriptome analysis.  4. Paul, J. Y. et al. Golden bananas in the field: elevated fruit pro‐vitamin A 
Total RNAs were isolated using a plant RNA extraction kit (TIANGEN). Total from the expression of a single banana transgene. Plant Biotechnol. J. 15, 
RNA (3 μg) from each sample was converted to complementary DNA using a 520–532 (2017).
RevertAid First-Strand cDNA Synthesis Kit (Fermentas). cDNA libraries were  5. Cheesman, E. E. Classification of the bananas. Kew Bull. 2, 97–117 (1947).
constructed using TruSeq RNA Library Preparation Kit v.2, and were subsequently  6. Simmonds, N. W. & Shepherd, K. The taxonomy and origins of the cultivated 
sequenced on the Illumina HiSeq 2000 platform using the Illumina RNA-Seq bananas. Bot. J. Linn. Soc. 55, 302–312 (1955).
protocol. Two biological replicates were used for each sample. A total of 159.14 Gb  7. Daniells, J. et al. Diversity in the Genus Musa (INIBAP, 2001).
(Supplementary Table 21) of high-quality clean data were produced. Gene  8. Häkkinen, M. Reappraisal of sectional taxonomy in Musa (Musaceae).  
expression levels were calculated as RPKM86. Differentially expressed genes were Taxon 62, 809–813 (2013).
identified by methods previously established with the read count of two replicates  9. Martin, G. et al. Improvement of the banana ‘Musa acuminata’ reference 
for each gene (fold change ≥2; false discovery rate ≤0.001)87. Additional details are sequence using NGS data and semi-automated bioinformatics methods.  
available in the Supplementary note. BMC Genom. 17, 243 (2016).
 10. Assani, A. et al. Production of haploids from anther culture of banana  
Weighted gene co-expression network analysis. Gene expression patterns for all [Musa balbisiana (BB)]. Plant Cell Rep. 21, 511–516 (2003).
identified genes were used to construct a co-expression network using WGCNA  11. Simao, F. A. et al. BUSCO: assessing genome assembly and annotation 
(v.1.47)39. Genes without expression detected in all tissues were removed before completeness with single-copy orthologs. Bioinformatics 31,  
analyses. Soft thresholds were set based on the scale-free topology criterion 3210–3212 (2015).
employed in ref. 88. An adjacency matrix was developed using squared Euclidean  12. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data 
distance values, and the topological overlap matrix was calculated for unsigned processing. Genome Biol. 16, 259 (2015).
network detection using the Pearson method. Co-expression coefficients >0.55  13. Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome 
between the target genes were then selected. Finally, the network connections were assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119 (2013).
visualized using cytoscape89.  14. Li, C. et al. Genome sequencing and assembly by long reads in plants.  
Genes 9, 6 (2017).
Identification of gene families involved in ethylene biosynthesis and starch  15. Campbell, M. S. et al. Genome annotation and curation using MAKER and 
metabolism pathways. We compared genes related to ethylene biosynthesis and MAKER‐P. Curr. Protoc. Bioinformatics 48, 4–11 (2014).
starch metabolism in the M. balbisiana genome to those annotated in both M.  16. Zheng, Y. et al. iTAK: a program for genome-wide prediction and 
acuminata and other plant genomes, including B. distachyon, O. sativa, Arabidopsis classification of plant transcription factors, transcriptional regulators, and 
thaliana, Solanum lycopersicum, Prunus persica, Populus trichocarpa and V. protein kinases. Mol. Plant 9, 1667–1670 (2016).
vinifera. We retrieved protein sequences of these gene families from 16 species  17. Givnish, T. J. et al. Monocot plastid phylogenomics, timeline, net rates of 
(Supplementary Table 51) for homologue-based searches with the criteria similarity species diversification, the power of multi-gene analyses, and a functional 
>80% and coverage >80%. We then confirmed the presence of the conserved domain model for the origin of monocots. Am. J. Bot. 105, 1888–1910 (2018).
within all protein sequences and removed members without a complete domain.  18. Lescot, M. et al. Insights into the Musa genome: syntenic relationships to rice 
and between Musa species. BMC Genomics 9, 58 (2008).
Determination of total starch content. Banana pulp was immersed in 0.5%  19. Christelova, P. et al. A multi gene sequence-based phylogeny of the Musaceae 
sodium bisulfite for 10 min to prevent browning, and then dried at 40 °C for 24 h. (banana) family. BMC Evol. Biol. 11, 103 (2011).
Pulp was then ground and centrifuged. The residue was suspended in 5 ml of  20. Janssens, S. B. et al. Evolutionary dynamics and biogeography of Musaceae 
80% Ca(NO3)2 in a boiling water bath for 10 min then centrifuged for 4 min at reveal a correlation between the diversification of the banana family and  
4,000 r.p.m. The supernatant was transferred to a 20-ml volumetric flask and the the geological and climatic history of Southeast Asia. New Phytol. 210, 
1453–1465 (2016).
residue was extracted twice with 80% Ca(NO3)2, which yielded a combined extract  21. Mandáková, T. et al. How diploidization turned a tetraploid into a 
volume of 20 ml. All experiments were repeated three times. The total starch 
content was assessed following methods described by Yang et al.90 pseudotriploid. Am. J. Bot. 103, 1187–1196 (2016).
.  22. Li, L., Stoeckert, C. J. & Roos, D. S. OrthoMCL: identification of ortholog 
groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
Scanning electron microscopy. Samples were fixed in stubs using double-faced  23. De Bie, T. et al. CAFE: a computational tool for the study of gene family 
tape and coated with a 10-nm platinum layer in a Bal-tec MEDo020 coating system evolution. Bioinformatics 22, 1269–1271 (2006).
(Kettleshulme). The prepared samples were analysed using an FEI Quanta 600  24. Bennett, R. N. & Wallsgrove, R. M. Secondary metabolites in plant defence 
FEG scanning electron microscope (FEI Co.). Observations were performed in mechanisms. New Phytol. 127, 617–633 (1994).
secondary electron mode while operating at 15 kV.  25. Baurens, F. C. et al. Recombination and large structural variations shape 
interspecific edible bananas genomes. Mol. Biol. Evol. 36, 97–111 (2018).
Measurement of ethylene production during fruit post-harvest stage. Ethylene  26. Saxena, R. K., Edwards, D. & Varshney, R. K. Structural variations in plant 
production was measured by enclosing fruit samples in an airtight container for 2 h genomes. Brief. Funct. Genom. 13, 296–307 (2014).
at 25 °C. After incubation, 1 ml of the headspace gas was withdrawn and injected  27. Jourda, C. et al. Expansion of banana (Musa acuminata) gene families 
into a gas chromatograph (GC-2010, Shimadzu) fitted with a flame ionization involved in ethylene biosynthesis and signalling after lineage specific whole 
detector and an activated alumina column. Ethylene production measurements genome duplications. New Phytol. 202, 986–1000 (2014).
were obtained as recommended by the manufacturer.  28. Garsmeur, O. et al. Two evolutionarily distinct classes of paleopolyploidy. 
Mol. Biol. Evol. 31, 448–454 (2013).
Reporting Summary. Further information on research design is available in the  29. Dodsworth, S., Chase, M. W. & Leitch, A. R. Is post‐polyploidization 
Nature Research Reporting Summary linked to this article. diploidization the key to the evolutionary success of angiosperms?  
Bot. J. Linn. Soc. 180, 1–5 (2016).
Data availability  30. Langham, R. J. et al. Genomic duplication, fractionation and the origin of 
Raw sequence reads for B-genome assembly and transcriptome for all samples regulatory novelty. Genetics 166, 935–945 (2004).
were deposited in the CNSA (https://db.cngb.org/search/project/CNP0000292/) of  31. Ni, Z. et al. Altered circadian rhythms regulate growth vigor in hybrids and 
CNGBdb with accession number CNP0000292 and Sequence Read Archive of the allopolyploids. Nature 457, 327 (2009).
National Centre for Biotechnology Information (NCBI) under the BioProject (No.  32. de Jesus, O. N. et al. Genetic diversity and population structure of Musa 
PRJNA432894). Genome assembly and annotation of DH-PKW were submitted accessions in ex situ conservation. BMC Plant Biol. 13, 41 (2013).
to NCBI (No. PYDT00000000). Assembly and gene annotation of the A-genome  33. D’Hont, A. et al. The interspecific genome structure of cultivated banana, 
(DH-Pahang) are available on the Banana Genome Hub (http://banana-genome- Musa spp. revealed by genomic DNA in situ hybridization. Theo. Appl. Genet. 
hub.southgreen.fr/). 100, 177–183 (2000).
NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants 819
Articles Nature PlaNts
 34. Chalhoub, Boulos et al. Early allopolyploid evolution in the post-Neolithic  66. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-
Brassica napus oilseed genome. Science 345, 950–953 (2014). Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
 35. Salse, J. et al. Improved criteria and comparative genomics tool provide new  67. Mount, D. W. Using the basic local alignment search tool (BLAST). Cold 
insights into grass paleogenomics. Brief. Bioinformatics 10, 619–630 (2009). Spring Harb. Protoc. https://doi.org/10.1101/pdb.top17 (2007).
 36. Salse, J. et al. Reconstruction of monocotelydoneous proto-chromosomes  68. Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and 
reveals faster evolution in plants than in animals. Proc. Natl Acad. Sci. USA its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
106, 14908–14913 (2009).  69. Tatusov, R. L. et al. The COG database: an updated version includes 
 37. Ren, L. et al. Determination of dosage compensation and comparison of gene eukaryotes. BMC Bioinformatics 4, 41 (2003).
expression in a triploid hybrid fish. BMC Genom. 18, 38 (2017).  70. Ogata, H. et al. KEGG: Kyoto Encyclopedia of Genes and Genomes.  
 38. Guo, M., Davis, D. & Birchler, J. A. Dosage effects on gene expression in a Nucleic Acids Res. 27, 29–34 (1999).
maize ploidy series. Genetics 142, 1349–1355 (1996).  71. Pérez-Rodríguez, P. et al. PlnTFDB: updated content and new features  
 39. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation of the plant transcription factor database. Nucleic Acids Res. 38,  
network analysis. BMC Bioinformatics 9, 559 (2008). D822–D827 (2009).
 40. Kumar, R., Khurana, A. & Sharma, A. K. Role of plant hormones and their  72. Jin, J. et al. PlantTFDB 3.0: a portal for the functional and evolutionary study 
interplay in development and ripening of fleshy fruits. J. Exp. Bot. 65, of plant transcription factors. Nucleic Acids Res. 42, D1182–D1187 (2013).
4561–4575 (2014).  73. Dai, X. et al. PlantTFcat: an online plant transcription factor and transcriptional 
 41. Adams, D. O. & Yang, S. F. Ethylene biosynthesis: identification of regulator categorization and analysis tool. BMC Bioinformatics 14, 321 (2013).
1-aminocyclopropane-1-carboxylic acid as an intermediate in the conversion  74. Davuluri, R. V. et al. AGRIS: Arabidopsis gene regulatory information server, 
of methionine to ethylene. Proc. Natl Acad. Sci. USA 76, 170–174 (1979). an information resource of Arabidopsis cis-regulatory elements and 
 42. Yang, S. F. & Hoffman, N. E. Ethylene biosynthesis and its regulation in transcription factors. BMC Bioinformatics 4, 25 (2003).
higher plants. Annu. Rev. Plant Physiol. 35, 155–189 (1984).  75. Huelsenbeck, J. P. & Ronquist, F. MRBAYES: Bayesian inference of 
 43. Liu, X. et al. Characterization of ethylene biosynthesis associated with phylogenetic trees. Bioinformatics 17, 754–755 (2001).
ripening in banana fruit. Plant Physiol. 121, 1257–1265 (1999).  76. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood.  
 44. Iqbal, N. et al. Ethylene role in plant growth, development and senescence: Mol. Biol. Evol. 24, 1586–1591 (2007).
interaction with other phytohormones. Front. Plant. Sci. 8, 475 (2017).  77. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment 
 45. Tomato Genome Consortium. The tomato genome sequence provides insights tools: paths toward the comprehensive functional analysis of large gene lists. 
into fleshy fruit evolution. Nature 485, 635 (2012). Nucleic Acids Res. 37, 1–13 (2008).
 46. Teh, B. T. et al. The draft genome of tropical fruit durian (Durio zibethinus).  78. Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 
Nat. Genet. 49, 1633 (2017). 486–488 (2008).
 47. Lynch, M. & Conery, J. S. The evolutionary fate and consequences of  79. Huang, S. et al. The genome of the cucumber, Cucumis sativus L. Nat. Genet. 41, 
duplicate genes. Science 290, 1151–1155 (2000). 1275–1281 (2009).
 48. Hubbard, N. L., Pharr, D. M. & Huber, S. C. Role of sucrose phosphate  80. Hasegawa, M., Kishino, H. & Yano, T. A. Dating of the human-ape splitting 
synthase in sucrose biosynthesis in ripening bananas and its relationship to by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22, 160–174 (1985).
the respiratory climacteric. Plant Physiol. 94, 201–208 (1990).  81. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and 
 49. do Nascimento, J. R. O. et al. Beta-amylase expression and starch degradation high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
during banana ripening. Postharvest Biol. Tec. 40, 41–47 (2006).  82. Wang, D. et al. KaKs_Calculator 2.0: a toolkit incorporating gamma-series 
 50. Tribess, T. et al. Thermal properties and resistant starch content of green methods and sliding window strategies. Genom. Proteom. Bioinf. 8,  
banana flour (Musa cavendishii) produced at different drying conditions. 77–80 (2010).
LWT-Food Sci. Technol. 42, 1022–1025 (2009).  83. Li, H. et al. The sequence alignment/map format and SAMtools. 
 51. Jourda, C. et al. Lineage-specific evolutionary histories and regulation of Bioinformatics 25, 2078–2079 (2009).
major starch metabolism genes during banana ripening. Front. Plant. Sci. 7,  84. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework 
1778 (2016). for analyzing next-generation DNA sequencing data. Genome Res. 20, 
 52. Martin, C. & Smith, A. M. Starch biosynthesis. Plant Cell 73,  1297–1303 (2010).
2141–2145 (1995).  85. Chen, K. et al. BreakDancer: an algorithm for high-resolution mapping of 
 53. Tiessen, A. et al. Starch synthesis in potato tubers is regulated by post- genomic structural variation. Nat. Methods 6, 677–681 (2009).
translation redox modification of ADP-glucose pyrophosphorylase: a novel  86. Mortazavi, A. et al. Mapping and quantifying mammalian transcriptomes  
regulatory mechamism linking starch synthesis to the sucrose supply.  by RNA-Seq. Nat. Methods 5, 621–628 (2008).
Plant Cell 14, 2191–2213 (2002).  87. Audic, S. & Claverie, J. M. The significance of digital gene expression profiles. 
 54. Tetlow, I. J. et al. Analysis of protein complexes in wheat amyloplasts reveals Genome Res. 7, 986–995 (1997).
functional interactions among starch biosynthetic enzymes. Plant Physiol.  88. Zhang, B. & Horvath, S. A general framework for weighted gene co-
146, 1878–1891 (2008). expression network analysis. Stat. Appl. Genet. Mol. Biol. 4, 17 (2005).
 55. Xiao, Y. Y. et al. A comprehensive investigation of starch degradation process  89. Shannon, P. et al. Cytoscape: a software environment for integrated models  
and identification of a transcriptional activator MabHLH6 during banana of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
fruit ripening. Plant Biotechnol. J. 16, 151–164 (2017).  90. Yang, J. & Bi, C. A half-grain method for analyzing the amylose content in 
 56. Miao, H. et al. Soluble starch synthase III-1 in amylopectin metabolism of rice grains and its application. Acta Agron. Sin. 18, 366–372 (1992).
banana fruit: characterization, expression, enzyme activity, and functional 
analyses. Front. Plant. Sci. 8, 454 (2017). Acknowledgements
 57. Zhang, T. et al. Sequencing of allotetraploid cotton (Gossypium hirsutum L. This work was supported by the earmarked fund for the National Key Research and 
acc. TM-1) provides a resource for fiber improvement. Nat. Biotechnol. 33, Development Programme (No. 2018YFD1000200), Modern Agro-industry Technology 
531–537 (2015). Research System (No. CARS-32), National Nonprofit Institute Research Grant of  
 58. International Wheat Genome Sequencing Consortium. A chromosome-based CATAS (No. 1630052012012), the National Natural Science Foundation of China  
draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. (No. 31872161),the Natural Science Foundation of Hainan Province (No. 2019CXTD412),  
Science 345, 1251788 (2014). the Central Public-interest Scientific Institution Basal Research Fund for CATAS (Nos. 
 59. Grover, C. E. et al. Homoeolog expression bias and expression level 1630052017018, 1630052019024, 1630052016005, 1630052016006 and 1630052017017), 
dominance in allopolyploids. New Phytol. 196, 966–971 (2012). and the CGIAR Research Programme on Roots, Tubers and Bananas.
 60. Yang, J. et al. The genome sequence of allopolyploid Brassica juncea and 
analysis of differential homoeolog gene expression influencing selection.  Author contributions
Nat. Genet. 48, 1225–1232 (2016). Z.J., B.X., X.F. and A.D.-H. conceived the experiments. Z.W., H.M., J.H.L., C.J., J.Y.W., 
 61. Bird, K. A. et al. The causes and consequences of subgenome dominance in J.Z., W.H., W.T., Y.Y., Z.D., J.Y. L., W.M., Y.X., J.S.W., M.P., A.G., J.X., Y.L., C.H., N.L., 
hybrids and recent polyploids. New Phytol. 220, 87–93 (2018). R.H., F.S. and S.R. participated in various aspects of biological sample collection, 
 62. Murray, M. G. & Thompson, W. F. Rapid isolation of high molecular weight preparation and quality control. X.Y., C.X., Z.W., L.Y., Y.L.Y., C.L., Y.G. and S.S. 
plant DNA. Nucleic Acids Res. 8, 4321–4326 (1980). sequenced the genomes. X.Y., C.X., Z.W., M.R., F.-C.B. and G.M. assembled the genomes. 
 63. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range X.Y., C.X. and Z.W. annotated the genomes. Z.W., H.M., W.H., X.Y., C.X., S.Z., Z.Y.W., 
interactions reveals folding principles of the human genome. Science 326, L.Y., Y.L.Y., C.L., Y.G., O.G. and N.Y. analysed the genomes. W.H., Z.W., H.M., C.X., B.X., 
289–293 (2009). Z.J., A.D.-H. and M.R. wrote the manuscript.
 64. Boetzer, M. et al. Scaffolding pre-assembled contigs using SSPACE. 
Bioinformatics 27, 578–579 (2010).
 65. Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12,  Competing interests
656–664 (2002). The authors declare no competing interests.
820 NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants
Nature PlaNts Articles
Additional information Open Access This article is licensed under a Creative Commons 
Supplementary information is available for this paper at https://doi.org/10.1038/ Attribution 4.0 International License, which permits use, sharing, 
s41477-019-0452-6. adaptation, distribution and reproduction in any medium or format, 
Reprints and permissions information is available at www.nature.com/reprints. as long as you give appropriate credit to the original author(s) and the source, provide 
Correspondence and requests for materials should be addressed to A.D., W.H. or Z.J. a link to the Creative Commons license, and indicate if changes were made. The 
images or other third party material in this article are included in the article’s Creative 
Peer review information: Nature Plants thanks M. Schranz, Robert VanBuren and the Commons license, unless indicated otherwise in a credit line to the material. If mate-
other, anonymous, reviewer(s) for their contribution to the peer review of this work. rial is not included in the article’s Creative Commons license and your intended use is 
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in not permitted by statutory regulation or exceeds the permitted use, you will need to 
published maps and institutional affiliations. obtain permission directly from the copyright holder. To view a copy of this license, 
© The Author(s), under exclusive licence to Springer Nature Limited 2019 visit http://creativecommons.org/licenses/by/4.0/.
NATuRe PLANTS | VOL 5 | AUGUST 2019 | 810–821 | www.nature.com/natureplants 821
Corresponding author(s): Wei Hu, Zhiqiang Jin and Angélique D’Hont
Last updated by author(s): Apr 29, 2019
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency 
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested
A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) 
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Software and code
Policy information about availability of computer code
Data collection Homolog data (homolog species genome and gene set ) was downloaded from public databases: NCBI, JGI, Uniprot, KEGG, GO and NR. 
Genome data of Musa acuminata was downloaded from http://banana-genome-hub.southgreen.fr/.  Genomes of Musa balbisiana (DH-
PKW) and nine different accessions, and transcriptome data of Baxijiao and Fenjiao were sequenced by ourselves in this research.
Data analysis We used lots of software for data analysis in this paper, and all data and software used was described in Methods section of manuscript. 
Genome assembly: wtdbg v1.2.8, SSPACE v3, Arrow, LACHESIS, HiC-Pro v2.8.1 
Evaluation of assembly quality: BUSCO v3, BLAT v35, BWA v0.712 
Repeat annotation: RepeatMasker v4.0.6, RepeatProteinMask, Repbase v21.01, Piler v1.0, RepeatScout v1.0.5, LTR-FINDER v1.0.5, 
Tandem Repeats Finder v4.09  
Gene  structure annotation: Blast v2.2.26, Augustus v3.2.1, SNAP, HISAT2 v2.0.1-beta, StringTie v1.2.1, PASA_lite, MAKER v3.31.8 
Genome annotation completeness: BUSCO v3 
Gene function annotation: BLAST v2.2.26, InterProScan v5.16 
ncRNA annotation: tRNAscan-SE v1.23, INFERNAL,  BLAST v2.2.26 
Gene family analysis: OrthoMCL v1.4, CAFE v2.1, blast v2.2.26 
Phylogentic analysis: PAML package v4.4, MrBayes v3.1.2 
Transcription factor prediction: iTAK v1.5 
Resequencing analysis: bwa 0.7.12, GATK v3.3-0, Breakdancer 
Nucleotide diversity: VCFtools v0.1.13 
Genome syntenic analysis: blast v2.2.26, MCscan v1.5.1, KaKs_Calculator v2.0 
Gene co-expression network analysis:R platform v3.2.2, WGCNA package v1.47 
RNA-Seq analysis: SOAPaligner/SOAP2 v2.21, DEGseq
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
1
nature research  |  reporting summary October 2018
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 
- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability
Raw sequence reads for gene assembly and gene expression for all samples were deposited in the Sequence Read Archive (SRA) of the National Center for 
Biotechnology Information (NCBI) under the BioProject (PRJNA432894). Genome assembly and annotation of DH-PKW was submitted to NCBI (PYDT00000000). A 
full data availability statement is included in the manuscript.
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences  Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Life sciences study design
All studies must disclose on these points even when the disclosure is negative.
Sample size One double haploid (DH-PKW) sample from M. balbisiana was used for genome assembly. Nine different genotypes (AAA, ABB, AA, BB, AAB) 
of banana accessions were used for resequencing. Two cultivated varieties of BaXiJiao and FenJiao were used for transcriptomic analysis and 
forty samples were collected from different tissues and treatments including development and postharvest ripening process of fruits, and 
osmotic, salt and low temperature treatments. Two biological replicates were used for each sample.
Data exclusions N/A
Replication For gene expression profiling of BaXijiao and FenJiao, we produced RNA-seq data of fruits, roots, and leaves with two biological replicates.
Randomization N/A
Blinding N/A
Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 
Materials & experimental systems Methods
n/a Involved in the study n/a Involved in the study
Antibodies ChIP-seq
Eukaryotic cell lines Flow cytometry
Palaeontology MRI-based neuroimaging
Animals and other organisms
Human research participants
Clinical data
2
nature research  |  reporting summary October 2018