Data in Brief 18 (2018) 285–293
Journal homepage: www.elsevier.com/locate/dib
https://doi.org/10.1016/j.dib.2018.03.026
ISSN 2352-3409

Data Article

Genome sequence data from 17 accessions of Ensete ventricosum, a staple food crop for millions in Ethiopia
Zerihun Yemataw a,b, Sadik Muzemil a, Daniel Ambachew a,
Leena Tripathi c, Kassahun Tesfaye d,e, Alemayheu Chala f,
Audrey Farbos g,h, Paul O’Neill g,h, Karen Moore g,h,
Murray Grant i, David J. Studholme g,*
a Southern Agricultural Research Institute, Areka Agricultural Research Center, P.O. Box 79, Areka, Ethiopia
b Department of Microbial, Cellular and Molecular Biology, Addis Ababa University, Addis Ababa, Ethiopia
c International Institute of Tropical Agriculture, P.O. Box 30709, Nairobi, Kenya
d Addis Ababa University, Institute of Biotechnology, P.O. Box 1176, Addis Ababa, Ethiopia
e Ethiopian Biotechnology Institute, Ministry of Science and Technology, P.O. Box 32853, Addis Ababa,
Ethiopia
f Hawassa University, Awassa College of Agriculture, P.O. Box 05, Hawassa, Ethiopia
g Biosciences, University of Exeter, Exeter EX4 4QD, United Kingdom
h Exeter Sequencing Service, University of Exeter, Exeter EX4 4QD, United Kingdom
i Life Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom

Article info
Article history:
Received 29 January 2018
Received in revised form 2 March 2018
Accepted 5 March 2018
Available online 11 March 2018

* Corresponding author. E-mail address: d.j.studholme@exeter.ac.uk (D.J. Studholme).

Abstract
We present raw sequence reads and genome assemblies derived from 17 accessions of the Ethiopian orphan crop plant enset (Ensete ventricosum (Welw.) Cheesman), sequenced using the Illumina HiSeq and MiSeq platforms. Also presented is a catalogue of single-nucleotide polymorphisms inferred from the sequence data at an average density of approximately one per kilobase of genomic DNA.

© 2018 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Specifications Table

Subject area | Biology
More specific subject area | Genomics of crop plants
Type of data | Deoxyribonucleic acid (DNA) sequence
How data was acquired | Illumina HiSeq 2500; Illumina MiSeq
Data format | Raw sequence reads; genome sequence assemblies
Experimental factors | Genomic DNA was extracted from a selection of 15 enset cultivars and two wild accessions
Experimental features | Genome sequencing
Data source location | Ethiopia
Data accessibility | Sequence data are available from the Sequence Read Archive via BioProjects PRJNA344540 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA344540), PRJNA342253 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA342253), PRJNA341828 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA341828) and PRJNA252658 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA252658)

Value of the data
• Here we present the first genome-wide sequence data available for enset accessions cultivated or growing wild in Ethiopia.
• There is potential to exploit genetic diversity (e.g. large numbers of single-nucleotide polymorphisms) to generate markers that assist selection of enset for key agronomic traits.
• Given the long lifespan of enset, which makes phenotypic evaluation slow, patterns of genetic variation can be used to classify germplasm and to prioritise and select germplasm for use in breeding.

1. Data
The data presented here include enset genomic resequencing data, in the form of sequence reads generated using the Illumina massively parallel deoxyribonucleic acid (DNA) sequencing platform. Also included are draft genome assemblies, a catalogue of single-nucleotide polymorphisms (SNPs) inferred from the sequence data, and images of agarose gels showing the results of genotyping assays for several SNPs. Enset (Ensete ventricosum (Welw.) Cheesman) is a perennial, herbaceous plant belonging to the same botanical family as bananas and plantains, namely the Musaceae [1]. Although it does not yield edible fruits, it is the most important cultivated staple food crop in the highlands of central, southern and southwestern Ethiopia, where it has cultural significance [2] and plays a key role in food security [3,4]. Its main food value lies in the large, starch-rich corm, which can be boiled and eaten in a similar way to tubers such as potato, or used to produce a fermented product known as kocho [3,5–9].
Enset varieties display a great range of genetic and phenotypic variation [7,10–16] (Fig. 1), and 15 phenotypic traits have been assayed for a collection of 387 enset accessions [17]. Integrating phenotypic measurements with genetic markers could be of great value in breeding improved varieties with enhanced resistance to abiotic and biotic stresses. Despite its importance for the food security of millions in Ethiopia, enset has been relatively neglected in molecular research, and few genomic resources are available. We previously published a first draft genome sequence of E. ventricosum [18], but the sequenced individual was obtained from the nursery trade (from the UK-based company Jungle Seeds); its provenance is unknown, and its relevance to Ethiopian agriculture is therefore uncertain. It is phylogenetically rather distant from the Ethiopian varieties (Fig. 2), clustering much more closely with E. ventricosum e4 (GenBank: FJ428156.1) [19], whose provenance is also unknown. In contrast, the data presented here originate from enset accessions collected in Ethiopia.

Most of these enset accessions were sourced from the germplasm collection of the Southern Agricultural Research Institute (SARI); the exception is Bedadeti, which originated from the collection of the International Institute of Tropical Agriculture (IITA). The data presented here complement previously published genomic resequencing data from Ensete species: targeted sequencing of repeats in Ensete gilletii [20] and E. ventricosum variety Gena [21], and exon sequencing of Ensete superbum and E. ventricosum [22].

Fig. 1. Phenotypic variation among sequenced accessions of E. ventricosum. Panels A, B and C show cultivars Mazia, Lochingie and Nobo, respectively.

Fig. 2. Phylogenetic positions of the enset accessions sequenced here compared with that of the previously sequenced enset genome, based on sequences of the trnF–trnT barcode voucher region of the chloroplast DNA. This locus has previously been used as a barcode and phylogenetic indicator, and sequence data for it are available from previously published studies (Bekele and Shigeta [36]; Li et al. [19]; Harrison et al. [18]). There was no sequence variation at this locus among the 17 genomes presented here, as judged by BWA alignments of raw sequence reads against the trnF–trnT sequence. Thus, the branch indicated by the black circle represents the phylogenetic position of all 17 sequenced accessions. A black triangle highlights the position of the “Jungle Seeds” individual whose genome was previously sequenced. The Maximum Likelihood tree presented here is based on a multiple sequence alignment of trnF–trnT sequences generated using MUSCLE (Edgar, 2004). Evolutionary history was inferred using the Maximum Likelihood method based on the Tamura-Nei model (Tamura and Nei [37]). The tree with the highest log likelihood (-1249.11) is shown. The percentage of trees in which the associated taxa clustered together is shown next to the branches. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with superior log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 32 nucleotide sequences. All positions containing gaps and missing data were eliminated. There were a total of 666 positions in the final dataset. Evolutionary analyses were conducted in MEGA7 (Kumar et al. [38]).

Table 1
Illumina sequencing of E. ventricosum accessions. Pairs of 100-bp reads were generated using the Illumina HiSeq 2500 in normal mode except where indicated. A single asterisk (*) indicates use of the Illumina HiSeq 2500 in rapid-run mode to generate pairs of 300-bp reads; two asterisks (**) indicate use of the Illumina MiSeq to generate pairs of 300-bp reads.

SARI ID | Name | Collected from | Depth of coverage of genome | SRA accession numbers
362 | Arkiya | Dawro | 7.36× | SRR4304969, SRR4304970
455 | Arkiya | Wolaita | 8.04× | SRR4304981*, SRR4304987
112 | Astara | Sidama | 15.64× | SRR4304989
n/a | Bedadeti | Unknown | 45.81× | SRR1515268, SRR1515269**
406 | Buffero | West Arsi | 18.25× | SRR4304990
435 | Derea | Gurage | 18.43× | SRR4308285, SRR4308286
451 | Erpha 13 | Dawro | 9.21× | SRR4304991*, SRR4304992
449 | Erpha 20 | Dawro | 9.43× | SRR4304971, SRR4304993*
221 | Lochingie | Dawro | 8.86× | SRR4304972*, SRR4304973
253 | Lochingie | Wolaita | 8.66× | SRR4304974*, SRR4304975
208 | Mazia | Wolaita | 7.00× | SRR4304976*, SRR4304977
429 | Mazia | Dawro | 8.24× | SRR4304978*, SRR4304979
39 | Nechuwe | Gurage | 20.69× | SRR4304982
49 | Nobo | Sheka | 17.16× | SRR4304983
170 | Onjamo | Kembata-Tembaro | 21.75× | SRR4308284
183 | Siyuti | Wolaita | 16.54× | SRR4304984
54 | Yako | Kaffa | 17.96× | SRR4304985

Table 2
Assembly statistics for E. ventricosum genomes.

GenBank accession number | Enset accession | Total length (bp) | Contig N50 (bp) | Scaffold N50 (bp)
GCA_000818735.2 | Bedadeti | 451,284,018 | 20,943 | 21,097
GCA_001884805.1 | Derea (435) | 429,479,738 | 10,278 | n.d.
GCA_001884845.1 | Onjamo (170) | 444,841,970 | 15,546 | 16,208
n.d., not determined (only contigs were assembled for Derea).

2. Experimental design, materials and methods
Genomic DNA was extracted from young emerging (cigar) leaves using a previously published mini-prep protocol [23]. Between 0.2 and 0.5 g of young, clean leaf tissue was collected per plant and dried in silica gel. From these dried leaves, 0.2 g per sample was ground with a sterile pestle and mortar, and genomic DNA was isolated from the pulverized tissue using a modified triple cetyltrimethyl ammonium bromide (CTAB) extraction technique [24]. DNA yield and quality were assessed by agarose gel electrophoresis and with a NanoDrop spectrophotometer (NanoDrop Technologies, Wilmington, Delaware), and DNA was quantified with the Qubit broad-range assay (Thermo Fisher Scientific). Illumina sequencing libraries were prepared after fragmenting 500 ng of DNA to an average size of 500 bp, using the Nextflex Rapid DNAseq kit for Illumina sequencing (Bioo Scientific) with indexed adapters and 5–8 cycles of polymerase chain reaction (PCR) [25]. Library quality was determined using D1000 ScreenTapes (Agilent), and libraries were either sequenced individually or combined in equimolar pools.

Fig. 3. Overview of genetic variation in the sequenced E. ventricosum genomes. Each column in the heat-map represents one of 20,000 single-nucleotide variant sites, and each row represents one of the sequenced genomes. Colour indicates the relative frequency of aligned sequence reads carrying the variant nucleotide at that site in that genome, on a yellow-orange-red palette. Thus, heterozygous sites would be expected to be orange, while homozygous sites would be yellow (same as the Bedadeti reference genome sequence) or red (different from the Bedadeti reference genome sequence). These frequency values were inferred from mpileup-formatted files generated from alignments of genomic sequence reads against the Bedadeti reference genome sequence. The Perl script used to extract them from the mpileup files is included in the Supplementary Material.
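Equimolar pooling requires converting each library's mass concentration (for example, a Qubit reading in ng/µl) into a molar concentration using the mean fragment length. The following Python sketch shows that arithmetic. It assumes the commonly used approximation of 660 g/mol per base pair of double-stranded DNA. The library names and concentrations are hypothetical, and the sketch is not a record of the authors' own pooling calculation.

```python
"""Sketch: volumes for an equimolar pool of Illumina libraries (hypothetical inputs)."""
from typing import Dict, Tuple

BP_MOLAR_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def library_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a mass concentration (ng/µl) to a molar concentration (nM)."""
    return conc_ng_per_ul * 1e6 / (BP_MOLAR_MASS * mean_fragment_bp)


def pooling_volumes(libraries: Dict[str, Tuple[float, float]],
                    fmol_per_library: float) -> Dict[str, float]:
    """Volume (µl) of each library that contributes the same molar amount.

    `libraries` maps a name to (concentration in ng/µl, mean fragment length in bp).
    Because 1 nM equals 1 fmol/µl, volume = fmol / nM.
    """
    return {name: fmol_per_library / library_nm(conc, bp)
            for name, (conc, bp) in libraries.items()}


if __name__ == "__main__":
    # Hypothetical Qubit readings for two libraries with a 500-bp mean fragment size.
    libs = {"library_A": (3.3, 500), "library_B": (6.6, 500)}
    for name, vol in pooling_volumes(libs, fmol_per_library=50).items():
        print(f"{name}: {vol:.1f} µl")
```

In this example, library_A is about 10 nM and library_B about 20 nM. To contribute 50 fmol each, the pool therefore takes 5.0 µl of library_A and 2.5 µl of library_B.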
We sequenced the enset genomic DNA using the Illumina [26,27] MiSeq and/or HiSeq 2500 platforms, the latter in either normal or rapid-run mode, as detailed in Table 1. The 17 sequenced accessions included 15 distinct named varieties. We sequenced two different accessions for cultivar Mazia and two different accessions for cultivar Lochingie (a result of complex vernacular naming systems for enset landraces arising from multiple ethno-linguistic communities); one accession was sequenced for each of the other varieties. Raw sequence reads were submitted to the Sequence Read Archive (SRA) [28] under the accession numbers listed in Table 1.
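Depth of coverage, as reported in Table 1, is conventionally the total number of sequenced bases divided by the genome size. The following Python sketch makes two assumptions: paired reads of equal length, and an idealized Poisson (Lander–Waterman) model of read placement. Real coverage is usually more uneven than this model predicts. The read counts and genome size below are hypothetical and are not the values behind Table 1. The second function estimates, under the same idealization, the fraction of sites expected to reach a given minimum depth, such as the 5× threshold used for SNP detection in this study.

```python
"""Sketch: expected sequencing depth and Poisson coverage (hypothetical inputs)."""
import math


def mean_depth(read_pairs: int, read_length: int, genome_size: int) -> float:
    """Mean depth = total sequenced bases / genome size (two reads per pair)."""
    return 2 * read_pairs * read_length / genome_size


def fraction_at_least(k: int, depth: float) -> float:
    """P(X >= k) for X ~ Poisson(depth), the idealized Lander-Waterman model."""
    below = sum(math.exp(-depth) * depth ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    # Hypothetical: 20 million pairs of 100-bp reads over a 500-Mb genome.
    c = mean_depth(20_000_000, 100, 500_000_000)
    print(f"mean depth {c:.1f}x; fraction of sites >= 5x: {fraction_at_least(5, c):.3f}")
```

With these hypothetical inputs, the mean depth is 8.0× and roughly 90% of sites would be expected to reach 5× under the Poisson idealization.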
Prior to further analysis, sequence reads were trimmed and filtered using TrimGalore with the options “-q 30 --paired”. We performed de novo sequence assembly for sequence reads from Bedadeti, Derea and Onjamo (Table 2). For Bedadeti, we used the St. Petersburg genome assembler (SPAdes) v. 3.6.1 [29] to assemble contigs and then scaffolded them using SSAKE-based Scaffolding of Pre-Assembled Contigs after Extension (SSPACE) v. 3.0 [30], where SSAKE stands for Short Sequence Assembly by progressive K-mer search and 3′ read Extension. For Onjamo, we generated contigs and scaffolds using SPAdes v. 3.9.0; for Derea, we generated contigs only, also using SPAdes v. 3.9.0. All SPAdes assemblies were run with the “--careful” option.
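Table 2 summarizes the assemblies by contig and scaffold N50. N50 is the length of the sequence at which the cumulative length, summed from the longest sequence downwards, first reaches half of the total assembly length. The following Python sketch implements that standard definition. The example lengths are hypothetical, and the values in Table 2 come from the authors' own assembly statistics, not from this code.

```python
"""Sketch: N50 of an assembly from its sequence lengths."""
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length at which the cumulative sum (longest first) reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() requires at least one sequence")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8: total 54, and 10 + 9 + 8 = 27
```

The same function applies to either contig lengths or scaffold lengths.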
We identified single-nucleotide polymorphisms by alignment against the reference genome
sequence, according to the following procedure. After trimming and filtering with TrimGalore,
sequence reads were aligned against the Bedadeti reference genome sequence (GenBank:
GCA_000818735.2) using Burrows-Wheeler Aligner (BWA) mem [31,32] version 0.7.15-r1140 with
default options and parameter values.
Candidate SNVs were identified using the Sequence Alignment/Map tools (SAMtools)/binary call format tools (BCFtools) package [33], version 1.6, with the following command lines:

samtools mpileup -u -f genome.fasta alignment.bam > alignment.bcf
bcftools call -m -v -Ov alignment.bcf > alignment.vcf

The candidate variants were then filtered using the following command line:

bcftools filter --SnpGap 100 --include '(REF="A" | REF="C" | REF="G" | REF="T") & %QUAL>=35 & MIN(IDV)>=2 & MIN(DP)>=5 & INDEL=0' alignment.vcf > alignment.filtered.vcf
Table 3
Oligonucleotide primers for PCR-RFLP genotyping assays.

No. | Forward and reverse primer sequences | PCR product size (bp) | Restriction enzyme | Genomic coordinates of PCR target (GenBank accession number: start–end) | Corresponding location in banana genome
1 | TAGACTGCCAAGAGACTGCC, GAGTTTGTTCTCCACTTGCTG | 395 | EcoRV | JTFG02000023: 86778–87172 | Chromosome 9
2 | CAATGAAATGAGCTCTCGAATGA, CCTCCCTCCCTCTACACAAG | 453 | ClaI | JTFG02000451: 2383–2835 | Chromosome 3
3 | AGCTGCCTACTTATGTGCCA, AGGATGGGAGGATTTCACTCA | 296 | ClaI | JTFG02001079: 44094–44389 | No match
4 | GAAAGATTCAACCACGCAACA, CAAAGTTGCCCAAATAATAGGGG | 100 | HindIII | JTFG02001701: 16598–16697 | Chromosome 9
5 | ACGTAGGAAACAGAAGGCGT, AGAATGAAAACCGGACAGATGA | 400 | BglII | JTFG02004430: 21696–22095 | Chromosome 10
6 | GACCAAGGTTGCAACGATGT, AACTCCCTAAAGTGGACCCG | 296 | HindIII | JTFG02004708: 2865–3160 | No match
7 | TGCCAATTGTAGCACGCTTT, TCCCAATGATCAGGATGTCATC | 321 | BglII | JTFG02007725: 4758–5078 | Chromosome 4
8 | AGCTGATCGGTAGGCTGTTT, TGTTCACTTGCTCAACTTCAATG | 329 | EcoRV | JTFG02008123: 5568–5896 | Chromosome 4
9 | CGAAGGAACAAGAGGACGT, CGGCATGAACTAACCGCTTA | 380 | BglII | JTFG02010045: 2436–2815 | No match
10 | AGAGTAGAGGTCAGCGCATC, AGGCGAGTGACTAAAGTGCT | 385 | HindIII | JTFG02015245: 4512–4896 | No match
11 | GTCATGTAGAATTCAAAAGCCCA, ACCCATGACCAAGACTTTTCT | 458 | ClaI | JTFG02000797: 35394–35851 | Chromosome 10
12 | GCAGAATCCCGTGAACCATC, TGTAAGTTTCTTCTCCTCCGCT | 377 | BglII | JTFG02001387: 44650–45026 | Chromosome 10
13 | TGCTTTAACCTAGTGAGCTACAA, ACGTCGCCCTTTTACTTTTCT | 400 | BamHI | JTFG02001793: 29736–30135 | Chromosome 7
14 | GCCCATGCCATTCTTAAGGA, TCCAATTCCATCCTTCTTCATCT | 398 | BglII | JTFG02003127: 17456–17853 | Matches multiple chromosomes
15 | ACTACACAATCCTGGTCCAAAA, CGTAGTTTCCGCCCTTTGAG | 113 | EcoRV | JTFG02004277: 15220–15332 | Chromosome 5
16 | CCTGGTTGAGAATGCGGATG, CGACCAATTACACTAAGCCCA | 419 | BglII | JTFG02006088: 4069–4489 | Matches several chromosomes
17 | TCCAGCCCAACAATTGATTCTT, CTGAACCTCGGCCAACCT | 400 | ClaI | JTFG02006206: 13985–14384 | Matches several chromosomes
18 | TGCCAACCGAACCTCTCAG, TCAGCCATCTACGACATTTACA | 400 | PstI | JTFG02010369: 10275–10674 | No match
19 | TGCTTACTGACTATGGAGAGCT, TGCCTGTTTGAGTCCATATAAGT | 487 | BamHI | JTFG02011833: 6273–6759 | Matches several chromosomes
20 | CTCGTTAAGGTTCCCCATGC, CCAGCGTGGGAGATCTTTTG | 452 | EcoRV | JTFG02024842: 425–876 | No match
21 | CGAGGGCTTCATCGAAAAGG, GCTGCCGACGAGTTGTTC | 391 | BamHI | JTFG02043259: 629–1019 | No match
22 | CGATCGTTACGTTGCTTCAG, GGAGCCACAACCAACCAATT | 446 | PstI | JTFG02009519: 11979–12424 | No match
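Each assay in Table 3 depends on the targeted SNP creating or abolishing a recognition site for the listed restriction enzyme. As a result, the two alleles give different fragment sizes after digestion. The following Python sketch predicts the expected band patterns in silico. It assumes the standard recognition sequences and top-strand cut positions of the six enzymes in Table 3. The amplicon in the example is a short hypothetical sequence, not one of the enset targets; those sequences would have to be retrieved from the GenBank coordinates listed in the table.

```python
"""Sketch: expected PCR-RFLP band patterns for the two alleles of a SNP."""
import re
from typing import Dict, List

# Recognition site and top-strand cut offset. All six sites are palindromic,
# so scanning one strand finds every site.
ENZYMES = {
    "EcoRV": ("GATATC", 3),    # GAT^ATC (blunt)
    "ClaI": ("ATCGAT", 2),     # AT^CGAT
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BglII": ("AGATCT", 1),    # A^GATCT
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "PstI": ("CTGCAG", 5),     # CTGCA^G
}


def digest(amplicon: str, enzyme: str) -> List[int]:
    """Fragment lengths (bp) after complete digestion of a linear amplicon."""
    site, offset = ENZYMES[enzyme]
    seq = amplicon.upper()
    # A lookahead match also finds overlapping occurrences of the site.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def band_patterns(amplicon: str, snp_index: int, alt_base: str,
                  enzyme: str) -> Dict[str, List[int]]:
    """Distinct band sizes (largest first) for each genotype; snp_index is 0-based."""
    ref_bands = set(digest(amplicon, enzyme))
    alt_seq = amplicon[:snp_index] + alt_base + amplicon[snp_index + 1:]
    alt_bands = set(digest(alt_seq, enzyme))
    return {
        "ref/ref": sorted(ref_bands, reverse=True),
        "alt/alt": sorted(alt_bands, reverse=True),
        "ref/alt": sorted(ref_bands | alt_bands, reverse=True),  # both alleles present
    }


if __name__ == "__main__":
    toy = "AAAAAGATATCAAAAAAAAA"  # hypothetical 20-bp amplicon with one EcoRV site
    print(band_patterns(toy, snp_index=7, alt_base="G", enzyme="EcoRV"))
```

In this example, the T>G change at 0-based position 7 destroys the GATATC site. The homozygous alternative genotype therefore gives a single uncut 20-bp band, the homozygous reference genotype gives 12-bp and 8-bp fragments, and a heterozygote shows all three bands.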
The bcftools filtering step eliminates indels and low-confidence single-nucleotide variant calls. It also eliminates candidate SNVs within 10 base pairs of an indel, since alignment artefacts are relatively common in the close vicinity of indels.
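The idea behind discarding SNVs close to indels can be sketched in Python as follows. This is a simplified illustration and not a reimplementation of the bcftools `--SnpGap` option. It assumes a single chromosome, treats each indel as a single position (ignoring its length), and drops an SNV when it lies within `gap` bases of any indel. The positions in the example are hypothetical.

```python
"""Sketch: drop SNVs lying within `gap` bp of an indel (single chromosome)."""
from bisect import bisect_left
from typing import List, Sequence


def drop_snvs_near_indels(snv_positions: Sequence[int],
                          indel_positions: Sequence[int],
                          gap: int) -> List[int]:
    """Keep SNVs whose distance to every indel exceeds `gap` (input order kept)."""
    indels = sorted(indel_positions)
    kept = []
    for pos in snv_positions:
        i = bisect_left(indels, pos)
        # The closest indel is either just before or just after this SNV.
        nearest = indels[max(i - 1, 0):i + 1]
        if all(abs(pos - p) > gap for p in nearest):
            kept.append(pos)
    return kept


if __name__ == "__main__":
    print(drop_snvs_near_indels([5, 950, 1050, 1200], [1000], gap=100))  # [5, 1200]
```

Sorting the indel positions once and using binary search keeps the check efficient, because each SNV is compared against at most two neighbouring indels.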
Allele frequencies at each SNP site were estimated from frequencies of each base (adenine (A),
cytosine (C), guanine (G) or thymine (T)) among the aligned reads. Thus, we would expect an allele
frequency of close to zero or one for homozygous sites and approximately 0.5 for heterozygous sites
in a diploid genome. The binary alignment/map (BAM)-formatted BWA-mem alignments were con-
verted to pileup format using the samtools mpileup command in SAMtools [33] version 1.6 with
default options and parameter values. From the resulting pileup files, we used a custom Perl script
(included in Supplementary material) to detect SNPs. For SNP detection, we considered only sites
where depth of coverage by aligned reads was at least 5× for all 17 datasets. The distribution of a
random sample of variants across the 17 accessions is summarized in Fig. 3.
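The authors' Perl script is provided in the Supplementary material and is not reproduced here. As an illustration only, the following Python sketch shows one way to estimate per-sample allele frequencies from a line of multi-sample `samtools mpileup` output. Each line contains the chromosome, position and reference base, followed by the depth, read bases and base qualities for each sample. The sketch applies the same 5× minimum in every sample. Two choices are assumptions of this sketch: treating the most common non-reference base across all samples as the alternative allele, and ignoring symbols other than A/C/G/T and the reference-match characters.

```python
"""Sketch: per-sample allele frequencies from one multi-sample mpileup line."""
import re
from collections import Counter
from typing import List, Optional

_INDEL = re.compile(r"[+-](\d+)")


def count_bases(ref_base: str, bases: str) -> Counter:
    """Count A/C/G/T calls in one sample's mpileup read-base string."""
    counts: Counter = Counter()
    ref = ref_base.upper()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":              # read start: skip '^' and its mapping-quality char
            i += 2
            continue
        if c in "+-":             # indel: skip '+N'/'-N' and the N bases that follow
            m = _INDEL.match(bases, i)
            i = m.end() + int(m.group(1))
            continue
        if c in ".,":             # match to the reference (forward/reverse strand)
            counts[ref] += 1
        elif c.upper() in "ACGT":  # mismatch to the reference
            counts[c.upper()] += 1
        # '$', '*', '#', '<', '>' and 'N' are ignored
        i += 1
    return counts


def site_frequencies(line: str, min_depth: int = 5) -> Optional[List[float]]:
    """Per-sample frequency of the commonest non-reference base, or None if
    any sample has fewer than `min_depth` A/C/G/T calls at this site."""
    fields = line.rstrip("\n").split("\t")
    ref = fields[2]
    # Depth, bases and qualities repeat for each sample, starting at column 4.
    per_sample = [count_bases(ref, fields[i]) for i in range(4, len(fields), 3)]
    depths = [sum(c.values()) for c in per_sample]
    if min(depths) < min_depth:
        return None
    pooled = sum(per_sample, Counter())
    non_ref = [(n, b) for b, n in pooled.items() if b != ref.upper()]
    if not non_ref:
        return [0.0] * len(per_sample)
    alt = max(non_ref)[1]
    return [c[alt] / d for c, d in zip(per_sample, depths)]


if __name__ == "__main__":
    example = "chr1\t100\tA\t8\t..,,GgG^].\tIIIIIIII\t6\t..,,..\tIIIIII"
    print(site_frequencies(example))  # [0.375, 0.0]
```

In this hypothetical two-sample line, the first sample has 3 G calls out of 8 reads (consistent with a heterozygous site). The second sample matches the reference at all 6 reads.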
The identification of relatively high-confidence SNPs, distributed throughout the genome at a density of approximately one SNP per kilobase, makes it possible to develop markers that could be used for genotyping large numbers of plant accessions without the need for large-scale sequencing. One straightforward approach is polymerase chain reaction restriction fragment length polymorphism (PCR-RFLP) [34]. Another is the cleaved amplified polymorphic sequence (CAPS) approach [35]. In a PCR-RFLP assay, oligonucleotide primers are designed to amplify a PCR product spanning a SNP that falls within the recognition site of a restriction enzyme, such that one allele is cleaved by the enzyme whilst the other is not. The genotype at that SNP can therefore be assessed from the pattern of bands obtained by agarose gel electrophoresis of the digested product. As a proof of principle, we designed 22 pairs of oligonucleotide primers targeting SNPs identified from the genome sequencing data; these are listed in Table 3. We applied 5 of these assays to several hundred E. ventricosum accessions; agarose gels showing the digested PCR products can be found in the Supplementary material.

Acknowledgements
The authors are grateful to Satish Kulasakaran, John Sidda and Joana Furtardo at the University of
Warwick for assistance with PCR-RFLP assays and to James Harrison at the University of Exeter for
assistance with handling Bedadeti genomic DNA. Zerihun Yemataw was supported by the McKnight
Foundation. Murray Grant was supported by the Biotechnology and Biological Sciences Research Council (BBSRC) IAA award BB/GCRF-IAA/22. DNA sequencing was performed using the Exeter Sequencing Service and Computational core facilities at the University of Exeter, which are supported by a Medical Research Council Clinical Infrastructure award (MR/M008924/1), a Wellcome Trust Institutional Strategic Support Fund (WT097835MF), a Wellcome Trust Multi User Equipment Award (WT101650MA) and a BBSRC LOLA award (BB/K003240/1). David Studholme is supported by the European Community Horizon 2020 grant Project ID 727624, “Microbial uptakes for sustainable management of major banana pests and diseases (MUSA)”.

Transparency document. Supporting information
Supplementary data associated with this article can be found in the online version at doi:10.1016/j.dib.2018.03.026.

Appendix A. Supplementary material
Supplementary data associated with this article can be found in the online version at doi:10.1016/j.dib.2018.03.026.
References
[1] E. Cheesman, Classification of the bananas: the Genus Ensete Horan, Kew Bull. 2 (1947) 97–106.
[2] Y. Tsehaye, F. Kebebew, Diversity and cultural use of enset (Enset ventricosum (Welw.) Cheesman) in Bonga in-situ Conservation Site, Ethiopia, Ethnobot. Res. Appl. 4 (2006) 147. http://dx.doi.org/10.17348/era.4.0.147-158.
[3] S.A. Brandt, A. Spring, C. Hiebsch, J.T. McCabe, E. Tabogie, M. Diro, G. Wolde-Michael, G. Yntiso, M. Shigeta, S. Tesfaye, The “Tree Against Hunger”: Enset-based agricultural systems in Ethiopia, Am. Assoc. Adv. Sci. (1997) 〈http://www.aaas.org/international/africa/enset/〉.
[4] A. Negash, A. Niehof, The significance of enset culture and biodiversity for rural household food and livelihood security in southwestern Ethiopia, Agric. Human Values 21 (2004) 61–71. http://dx.doi.org/10.1023/B:AHUM.0000014023.30611.ad.
[5] M.T. Yirmaga, Improving the indigenous processing of kocho, an Ethiopian traditional fermented food, J. Nutr. Food Sci. 3 (2013) 1–6. http://dx.doi.org/10.4172/2155-9600.1000182.
[6] A. Bosha, A.L. Dalbato, T. Tana, W. Mohammed, B. Tesfaye, L.M. Karlsson, Nutritional and chemical properties of fermented food of wild and cultivated genotypes of enset (Ensete ventricosum), Food Res. Int. 89 (2016) 806–811. http://dx.doi.org/10.1016/j.foodres.2016.10.016.
[7] Tobiaw, Analysis of genetic diversity among cultivated enset (Ensete ventricosum) populations from Essera and Kefficho, southwestern part of Ethiopia using inter simple sequence repeats (ISSRs) marker, Afr. J. Biotechnol. 10 (2011) 15697–15709. http://dx.doi.org/10.5897/AJB11.885.
[8] L.T.J. Pijls, A.A.M. Timmer, Z. Wolde-Gebriel, C.E. West, Cultivation, preparation and consumption of ensete (Ensete ventricosum) in Ethiopia, J. Sci. Food Agric. 67 (1995) 1–11. http://dx.doi.org/10.1002/jsfa.2740670102.
[9] T. Bezuxeh, A. Feleke, The production and utilization of the Genus Ensete in Ethiopia, Econ. Bot. (1966) 〈http://www.springerlink.com/index/k96280651m27x672.pdf〉 (Accessed 30 July 2013).
[10] B. Tesfaye, P. Lüdders, Diversity and distribution patterns of enset landraces in Sidama, Southern Ethiopia, Genet. Resour. Crop Evol. (2003) 359–371. http://dx.doi.org/10.1023/A:1023918919227 〈http://link.springer.com/article/〉 (Accessed 12 July 2013).
[11] G. Birmeta, H. Nybom, E. Bekele, RAPD analysis of genetic diversity among clones of the Ethiopian crop plant Ensete ventricosum, Euphytica 124 (2002) 315–325. http://dx.doi.org/10.1023/A:1015733723349 〈http://link.springer.com/article/〉 (Accessed 19 October 2013).
[12] G. Birmeta, H. Nybom, E. Bekele, Distinction between wild and cultivated enset (Ensete ventricosum) gene pools in Ethiopia using RAPD markers, Hereditas 140 (2004) 139–148. http://dx.doi.org/10.1111/j.1601-5223.2004.01792.x.
[13] B. Tesfaye, On Sidama folk identification, naming, and classification of cultivated enset (Ensete ventricosum) varieties, Genet. Resour. Crop Evol. 55 (2008) 1359–1370. http://dx.doi.org/10.1007/s10722-008-9334-x.
[14] Z. Yemataw, H. Mohamed, M. Diro, T. Addis, G. Blomme, Genetic Variability, Inter-Relationships and Path Analysis in Enset (Ensete ventricosum) Clones, 2012.
[15] Z. Yemataw, H. Mohamed, M. Diro, T. Addis, G. Blomme, Ethnic-based diversity and distribution of enset (Ensete ventricosum) clones in southern Ethiopia, J. Ecol. Nat. Environ. 6 (2014) 244–251. http://dx.doi.org/10.5897/JENE2014.0450.
[16] K. Zippel, Diversity Over Time and Space in Enset Landraces (Ensete Ventricosum) in Ethiopia, in: African Biodivers, Springer, US, Boston, MA (2005) 423–438.
[17] Z. Yemataw, A. Chala, D. Ambachew, D.J. Studholme, M. Grant, K. Tesfaye, Morphological variation and inter-relationships of quantitative traits in enset (Ensete ventricosum (Welw.) Cheesman) germplasm from south and south-western Ethiopia, Plants 6 (2017) 56. http://dx.doi.org/10.3390/plants6040056.
[18] J. Harrison, K. Moore, K. Paszkiewicz, T. Jones, M. Grant, D. Ambacheew, S. Muzemil, D. Studholme, A draft genome sequence for Ensete ventricosum, the drought-tolerant “Tree Against Hunger”, Agronomy 4 (2014) 13–33. http://dx.doi.org/10.3390/agronomy4010013.
[19] L.-F. Li, M. Häkkinen, Y.-M. Yuan, G. Hao, X.-J. Ge, Molecular phylogeny and systematics of the banana family (Musaceae) inferred from multiple nuclear and chloroplast DNA fragments, with a special reference to the genus Musa, Mol. Phylogenet. Evol. 57 (2010) 1–10. http://dx.doi.org/10.1016/j.ympev.2010.06.021.
[20] P. Novák, E. Hřibová, P. Neumann, A. Koblížková, J. Doležel, J. Macas, Genome-wide analysis of repeat diversity across the family Musaceae, PLoS One 9 (2014) e98918. http://dx.doi.org/10.1371/journal.pone.0098918.
[21] T.M. Olango, B. Tesfaye, M.A. Pagnotta, M.E. Pè, M. Catellani, Development of SSR markers and genetic diversity analysis in enset (Ensete ventricosum (Welw.) Cheesman), an orphan food security crop from Southern Ethiopia, BMC Genet. 16 (2015) 98. http://dx.doi.org/10.1186/s12863-015-0250-8.
[22] C. Sass, W.J.D. Iles, C.F. Barrett, S.Y. Smith, C.D. Specht, Revisiting the Zingiberales: using multiplexed exon capture to resolve ancient and recent phylogenetic splits in a charismatic plant lineage, PeerJ 4 (2016) e1584. http://dx.doi.org/10.7717/peerj.1584.
[23] J. Doyle, J. Doyle, Isolation of plant DNA from fresh tissue, Focus (Madison) 12 (1990) 13–15.
[24] T. Borsch, K.W. Hilu, D. Quandt, V. Wilde, C. Neinhuis, W. Barthlott, Noncoding plastid trnT-trnF sequences reveal a well resolved phylogeny of basal angiosperms, J. Evol. Biol. 16 (2003) 558–576. http://dx.doi.org/10.1046/j.1420-9101.2003.00577.x.
[25] S.R. Head, H.K. Komori, S.A. LaMere, T. Whisenant, F. Van Nieuwerburgh, D.R. Salomon, P. Ordoukhanian, Library construction for next-generation sequencing: overviews and challenges, Biotechniques (2014). http://dx.doi.org/10.2144/000114133.
[26] R.A. Holt, S.J.M. Jones, The new paradigm of flow cell sequencing, Genome Res. 18 (2008) 839–846. http://dx.doi.org/10.1101/gr.073262.107.
[27] E.R. Mardis, Next-generation sequencing platforms, Annu. Rev. Anal. Chem. 6 (2013) 287–303. http://dx.doi.org/10.1146/annurev-anchem-062012-092628.
[28] R. Leinonen, H. Sugawara, M. Shumway, The sequence read archive, Nucleic Acids Res. 39 (2011) D19–D21. http://dx.doi.org/10.1093/nar/gkq1019.
[29] A. Bankevich, S. Nurk, D. Antipov, A.A. Gurevich, M. Dvorkin, A.S. Kulikov, V.M. Lesin, S.I. Nikolenko, S. Pham, A.D. Prjibelski, A.V. Pyshkin, A.V. Sirotkin, N. Vyahhi, G. Tesler, M.A. Alekseyev, P.A. Pevzner, SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing, J. Comput. Biol. 19 (2012) 455–477. http://dx.doi.org/10.1089/cmb.2012.0021.
[30] M. Boetzer, C.V. Henkel, H.J. Jansen, D. Butler, W. Pirovano, Scaffolding pre-assembled contigs using SSPACE, Bioinformatics 27 (2011) 578–579. http://dx.doi.org/10.1093/bioinformatics/btq683.
[31] H. Li, Aligning Sequence Reads, Clone Sequences and Assembly Contigs with BWA-MEM, 2013. 〈http://arxiv.org/abs/1303.3997〉 (Accessed 20 July 2014).
[32] H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform, Bioinformatics 25 (2009) 1754–1760. http://dx.doi.org/10.1093/bioinformatics/btp324.
[33] H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, 1000 Genome Project Data Processing Subgroup, The Sequence Alignment/Map format and SAMtools, Bioinformatics 25 (2009) 2078–2079. http://dx.doi.org/10.1093/bioinformatics/btp352.
[34] C. Pourzand, P. Cerutti, Genotypic mutation analysis by RFLP/PCR, Mutat. Res. 288 (1993) 113–121. 〈http://www.ncbi.nlm.nih.gov/pubmed/7686255〉.
[35] A. Konieczny, F.M. Ausubel, A procedure for mapping Arabidopsis mutations using co-dominant ecotype-specific PCR-based markers, Plant J. 4 (1993) 403–410. http://dx.doi.org/10.1046/j.1365-313X.1993.04020403.x.
[36] E. Bekele, M. Shigeta, Genet. Resour. Crop Evol. 58 (2011) 259.
[37] K. Tamura, M. Nei, Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees, Mol. Biol. Evol. 10 (1993) 512–526.
[38] S. Kumar, G. Stecher, K. Tamura, MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets, Mol. Biol. Evol. 33 (2016) 1870–1874. http://dx.doi.org/10.1093/molbev/msw054.