Publisher: AGRONOMY; Journal: AGROJNL:Agronomy Journal; Copyright: Will notify... 
Volume: Will notify...; Issue: Will notify...; Manuscript: aj-2017-08-0123-a; DOI: ; PII: <txtPII> 
TOC Head: ; Section Head: ; Article Type: ARTICLE 
This article has been accepted for publication and undergone full peer review but has not been 
through the copyediting, typesetting, pagination and proofreading process, which may lead to 
differences between this version and the Version of Record. Please cite this article as doi: 
10.1002/csc2.20003. 
This article is protected by copyright. All rights reserved. 
Page 1 of 42 
 Running Title 
Candidates for training populations and implication on accuracy 
 
Title 
Genomic prediction and QTL discovery in a cassava training population constructed from multiple 
breeding stages 
 
Authors 
Mohamed Somo
1*
 
Heneriko Kulembeka
2
 
Kiddo Mtunda
2
 
Emmanuel Mrema
2
  
Kasele Salum
2 
Marnin D. Wolfe
1
 
Ismail Y. Rabbi
3
 
Chiedozie Egesi
3
 
Robert Kawuki
4
 
 This article is protected by copyright. All rights reserved. 
Alfred Ozimati
4
 
Roberto Lozano
1
 
Jean-Luc Jannink
1, 5 
 
 
Affiliations 
(1) Dept. of Plant Breeding and Genetics, Cornell University., Ithaca, NY, USA 
(2) Tanzania Agricultural Research Institute, Kibaha and Mwanza, Tanzania 
(3) International Institute for Tropical Agriculture, Ibadan, Oyo, Nigeria  
(4) National Crops Resources Research Institute, Namulonge, Uganda 
(5) USDA--ARS, Ithaca, NY, USA 
 
Abbreviations 
Please list abbreviations in alphabetical order with the abbreviation first, separated from its 
definition by a comma. Please use semicolons to distinguish separate abbreviations. 
BLUPs, Best Linear Unbiased Predictors  
GBS, Genotyping-by-Sequencing  
GEBVs, Genomic Estimated Breeding Values  
GS, Genomic Selection 
 This article is protected by copyright. All rights reserved. 
MAF, Minor Allele Frequency  
MAP, Months After Planting  
PCA, Principal Component Analysis 
SNP, Single Nucleotide Polymorphism  
TP, Training Population 
 This article is protected by copyright. All rights reserved. 
Abstract 
Assembly of training population (TP) is an important component of effective genomic 
selection (GS) based breeding program. In this study, we examined the power of diverse 
germplasm assembled from two breeding programs in Tanzania, at different breeding stages 
to predict traits and discover QTL. This is the first GS and Genome-wide association study 
(GWAS) analysis on Tanzanian cassava data. We detected QTL associated with resistance to 
Cassava mosaic disease (CMD) on chromosomes 12 and 16, QTL conferring resistance to 
Cassava brown streak disease (CBSD) on chromosomes 9 and 11, and QTL on chromosomes 
2, 3, 8, and 10 associated with resistance to CBSD for root necrosis (CBSDR). We also 
detected a QTL on chromosome 4 and two QTL on chromosome 12 conferring dual 
resistance to CMD and CBSD. Using clones in the same breeding stage to construct TP 
provided higher trait prediction accuracies than using TP with mixture of clones from 
multiple breeding stages. Moreover, clones in the early breeding stage provided more reliable 
trait prediction accuracies and therefore are better candidates for constructing TP. Although 
larger training population sizes have been associated with improved accuracies, in this study, 
adding clones from Kibaha location to Ukiriguru and vice versa did not improve prediction 
accuracy of either population. Similarly including Uganda TP in either population did not 
improve trait prediction accuracies. This study applied genomic prediction to understand the 
implication of constituting TP using clones at different breeding stages and pooled from 
different locations on trait accuracies in cassava. 
  
 This article is protected by copyright. All rights reserved. 
Introduction 
Cassava is an important source of dietary calories for millions of people in tropical regions 
(Howeler et al., 2013; Salvador et al., 2014). Because of cassava‟s adaptation and resiliency 
to dry and marginal environments, it is likely to continue to be a lead contributor to food 
security for many. Traditionally, cassava breeders have used phenotypes for selection, but 
this strategy has not always generated adequate genetic gain for yield especially in Africa 
(Ceballos et al., 2012). Similarly, the use of marker assisted selection (MAS) on complex 
traits in cassava has remained largely ineffective because of the multiple minor loci which are 
difficult to detect and deploy (Ceballos et al., 2012).  
 
Genomic selection (GS) allows superior plants to be selected earlier on at seedling stage 
before phenotyping is started, and then used for crossing (Heffner et al., 2009; Lorenzana and 
Bernardo, 2009; Jannink et al., 2010).  In principle, adoption of GS is expected to increase 
the rate of genetic gain and also reduce selection cycle time. For the case of clonal plants, the 
added advantage is that once elite clones are identified, they are genetically fixed, and thus 
can immediately be advanced to downstream evaluation for faster variety replacement. 
Recent implementation of GS in three breeding programs in Africa have improved trait 
prediction accuracies. Wolfe et al. (2017) reported a prediction accuracy increase of 57% for 
CMD in Nigerian cassava population. Similarly, Kayondo et al. (2017) reported improved 
prediction accuracy of 0.42 for both CBSD foliar severity and root severity in two Ugandan 
cassava populations.   
 
 This article is protected by copyright. All rights reserved. 
In principle, successful implementation of GS program hinges on several critical factors, one 
of which is how TP is constituted. Mindful of breeding objectives, breeders can select 
individuals from available diverse germplasm, improved breeding lines, within and across 
breeding stages to construct TPs. They could also compose them using clones from different 
breeding programs or even clones from different countries. Most early crop GS studies used 
genotypes from the same breeding stages to form TPs. These individuals were either tested in 
replicated trials in multiple environments or were part of larger historical phenotypic data 
(Dawson et al., 2013; Storlie and Charmet. 2013; Ly et al., 2013; Rutkoski et al., 2015).  
 
Often breeding programs transitioning to GS-based approach may not have access to 
sufficient and diverse genetic pool and adequate historical data to choose TP candidates from. 
For the case of clonal crops, the inadequacy of planting materials of potential candidates 
could further reduce the number of genotypes available for selection. In GS breeding, large 
TP size is generally preferred because it is associated with better trait prediction accuracies 
(Zhong et al., 2009; Cericola et al., 2017). 
Recently, The Tanzania Agriculture Research Institute (TARI) transitioned to GS-based 
cassava breeding pipeline. Traditionally, TARI has maintained two distinct cassava breeding 
programs at Chambezi, Kibaha (KIB) and Ukiriguru (UKG), Mwanza, serving Coastal and 
Lake Zone parts of the country, respectively. The two breeding programs evolved 
independently due to restrictions on germplasm movement between regions to curb the 
spread of cassava brown streak disease (CBSD). Although the breeding objectives are quite 
similar between the two programs, cassava breeders at KIB largely selected for fresh root 
yield while those at UKG focused mainly on dry matter content. This dichotomy of trait 
 This article is protected by copyright. All rights reserved. 
selection was largely driven by the consumption pattern of the local communities in the 
respective regions.   
 
The TARI TPs were constructed using clones from different trial types. The KIB trials 
included clones at preliminary yield trial (PYT) and advanced yield trial (AYT) selection 
stages while those at UKG consisted of clones at clonal evaluation trial (CET) and PYT. In a 
breeding pipeline, CET genotypes are minimally selected in seedling nurseries before 
cloning, whereas PYT and AYT would have undergone one and two specific selection 
rounds, respectively. In the present study, each breeding cycle represented several trials 
(Table 1). Each trial had different plot sizes, replications, and different number of plants per 
plot. Additionally, there were no common checks for the trials across the two programs. 
Because there were few clones in each trial to form sizeable TP, clones from different trial 
types, breeding stages, and locations were pooled together to form TPs. The Uganda (UG) 
training population was added to the clones from the TARI two programs to evaluate whether 
their inclusion could improve trait prediction accuracies. There is limited knowledge on the 
implication of using cassava clones from multiple trials, multiple locations, and clones from 
multiple breeding stages to construct TP on trait predictions. This study was therefore 
designed to: (1) examine which combination of breeding materials could provide the best 
training set, (2) evaluate trait prediction accuracies for within and across breeding stages, 
locations, and combined TARI data sets, (3) assess the impact of adding Ugandan clones to 
clones of either program on trait predictions, and (4) identify loci associated with disease 
resistance particularly for CMD, CBSD, and CBSDRS in Tanzanian cassava. The findings of 
this study will guide breeders in selecting the most suitable candidates as well as the best 
source of TP candidates when initiating GS-assisted breeding. We also expect the markers 
identified in this study will be valuable addition for MAS breeding in cassava. 
 This article is protected by copyright. All rights reserved. 
 
Materials And Methods 
Germplasm and field design  
A diverse panel of breeding lines from two independent breeding programs were used to construct the 
training population. The panel comprised of 432 and 408 clones from Kibaha and Ukiriguru breeding 
programs, respectively. Kibaha clones were at PYT (PYTKIB) and AYT (AYTKIB) breeding stages 
while those from Ukiriguru were at CET (CETUKG) and PYT (PYTUKG). The KIB clones were 
drawn from nine PYTs and six AYTs while UKG clones consisted of two CETs and a PYT. Clones in 
each trial type arose from different progenitors and as such each trial in both programs had a unique 
set of individuals, varied number of replicates, and different plot sizes (Table 1). An additional 402 
clones from Uganda (UG) training population previously described by Wolfe et al. (2016) were 
included in the study.  
For each trial set, clones were evaluated in a randomized complete block design during the 2016/2017 
cropping season at Kibaha (lat/lng: 6.8138° S, 38.6949° E, Alt: 156 m above sea level) and Ukiriguru 
(lat/lng: 2° 43' 0" S, 33° 1' 0" E, Alt: 1194 m above sea level). Kibaha PYT trials were laid as a 2-row 
plot with 14 plants while AYT trials were 42 plants planted in 6-row plots. The CET plots at UKG 
consisted of a single-row of 5 plants while the PYT were 2 row plots with 10 plants. At both 
locations, the clones were spaced at 1 m x 1m. With exception of two AYT trials which were 
replicated three times, the rest of the trials in both locations were replicated twice (Table 1). For all 
KIB trials, Kiroba, Mfaransa, and Mkuranga1 were used as common checks while Mkombozi, 
Lwakitangaza, and Liongokwimba were the checks in UKG trials. For both locations, standard 
agronomic practices under rainfed cropping were applied. All the field management and phenotyping 
took place between March 2016 and May 2017. 
 This article is protected by copyright. All rights reserved. 
 
Phenotyping  
Nine traits were evaluated for this study. Cassava mosaic disease severity (CMDS), foliar cassava 
brown streak disease severity (CBSDS), and cassava green mite severity (CGMS) were scored at 3, 6, 
and 9 months after planting (MAP) on a scale of 1 (No symptoms) to 5 (severe symptoms). For 
analysis we used the season wide mean severity (MCMDS, MCBSDS, and MCGMS) for 3, 6, and 9 
MAP. For root necrosis severity (CBSDRS), necrotic symptoms on a scale of 1-5 were scored 12 
MAP using cross sections of root (Hillocks and Thresh, 2000), where 1 = no necrosis and 5 is > 25% 
necrotic and severe root constriction. We then used average disease scores for each experimental plot 
for analysis. 
The root number (RTNO) are the number of fresh roots harvested per plot and used to calculate 
RTNO per hectare. Root weight (RTWT) and shoot weight (SHTWT) were both measured in kg per 
plot and then used to calculate below and above ground yield in tons per hectare, respectively. 
Harvest index (HI) was the ratio between RTWT and total biomass (RTWT+SHTWT) per unit area. 
Dry matter (DM) was expressed as a percentage of fresh root (FYLD) weight using the specific 
gravity method. The specific gravity of each sample (of the whole root, including the peel) was 
determined from the weights in air and in water (Kawano et al., 1987). The experimentation and 
phenotypic data capture of UG population is as described previously (Wolfe et al., 2016; Ozimati et 
al. 2018; Kayondo et al., 2018). Eight variables were common to both UG and TARI populations 
except MCGMS which is specific to TARI. 
 
Genotyping  
SNP marker genotypes were obtained using genotyping-by-sequencing (GBS; Elshire et al., 2011). 
The markers were called using the TASSEL 5.0 GBS pipeline version 2 (Glaubitz et al., 2014) after 
aligning the resulting reads to the Manihot esculenta Version 6 assembly available at 
 This article is protected by copyright. All rights reserved. 
http://phytozome.jgi.doe.gov/. Genotype calls were accepted only when there was a minimum of two 
reads, otherwise the genotype was set to missing and imputed downstream. The markers were initially 
converted to a matrix of allele dosages, with REF/REF, REF/ALT, ALT/ALT genotypes coded as -1, 
0 and +1, respectively. The GBS data were filtered so that clones with >80% missing markers and 
markers with >60% missing genotype calls were removed.  Beagle version 4.1 was used for 
imputation of the data (Browning and Browning, 2009). After imputation we retained 121,246 bi-
allelic SNP markers with AR2 (Estimated Allelic r-squared) threshold higher than 0.3. Of the imputed 
markers, 116,837 markers with minor allele frequency (MAF) higher than 0.01 were retained. These 
markers were then used for genomic prediction. Similarly, a total of 88,434 markers were retained 
after filtering with MAF threshold higher than at 0.05 and used for GWAS analysis. 37,776 markers 
common to both UG and TARI populations  
were filtered for minor allele frequency (0.01) and the resulting 36,847 markers were then used for 
downstream joint analysis. 
 
 
 
 
 
Statistical analysis 
Estimation of observed values  
To estimate these observed values, we fitted a mixed linear model across trial types and locations 
using lme4, R package (R Development Core Team 2010; Vazquez et al., 2010; Bates et al., 2015). 
For trials within locations, we fitted the model: y              ZTrial (Trial: Rep)r  , where y is 
raw phenotypic observations, β included a fixed effect for the population mean and for plot-basis 
traits (FYLD, RTNO, and SHTWT), the proportion of plants harvested per plot was included as a 
covariate; vector   and corresponding incidence matrix        represented a random effect for clone 
 This article is protected by copyright. All rights reserved. 
where    (     
 ) and I represented the identity matrix. The incidence of replication was nested in 
trial and was represented by matrix ZTrial (Trial.Rep) and random effects vector such that r  (     
 ) 
and residuals   were considered to be distributed as    (     
 ).  
The model for combined analysis across locations was: y               Zrep (Loc.Trial. Rep)b  . The 
fixed and clone effects for combined across location were the same as those for each location 
described above. Trial and rep effects were nested in locations and incorporated as random with 
incidence matrix Zrep and the effects vector b  (     
 ). All the random effects in each of the models 
are independent and identically distributed (iid). For both models described above, the best linear 
unbiased predictors (BLUPs) for clones were extracted and used as the “observed values” that were 
correlated with genomic estimated breeding values (GEBVs) to determine trait prediction accuracies.  
Under combined location model, our emphasis was on prediction of new genetic materials from 
another breeding program, and therefore we considered the environment effects to be nuisance 
parameters. Since we were not interested in location, trial or rep effects per se, but instead interested 
in genomic estimated breeding values (GEBVs). We assumed the three-way interaction of location, 
trial and rep is sufficient to correct for these nuisance terms, and no main effects are required to be fit 
as they are simply linear combinations of the interaction predictors.  
We also considered the issue of heterogeneity of error variances for different trials across the two 
locations. Although error variance estimates did differ across trials, those differences had little effect 
on the estimated BLUP values (the correlation between BLUPs with and without heterogeneous error 
variance were between 0.92 and 0.96 except for fresh yield (FYLD) which had reduced BLUP 
correlations (~0.48 data not shown). To overcome the challenges and complexity of modelling for 
heterogeneity of error variances particularly for yield, we did fit model for each trial type in each 
location separately and then extracted BLUPs from each experiment and then used combined BLUPs 
to correlate with genomic estimated breeding values to determine prediction accuracies. For the rest of 
the trait, because of the large number of trials (18), we felt that fitting an error variance for each trial 
 This article is protected by copyright. All rights reserved. 
would result in too many parameters to estimate, and therefore we used a simpler homogenous error 
variance model.  
 
Variance Components, Heritability and Trait Correlation 
The variance components were extracted from the model used for the trials within location. The 
model statement is as described above under “Estimation of observed values.” Accordingly, broad 
sense heritability (H
2
) was then computed for each trial using  
     
 (  
    
 )⁄  where   
  = clone variance, and    
  = residual variance. 
The phenotypic and genotypic correlations between the nine traits were obtained using combined data 
from the two locations. We also assessed trait phenotypic correlations using data from each breeding 
stage in each program; UKG (CETUKG & PYTUKG), and KIB (PYTKIB & AYTKIB) to determine 
the consistency of trait correlations across different breeding stages. For the combined locations, we 
used raw plot data to estimate phenotypic correlations, without accounting for the experimental 
designs. For estimating the genetic correlations, we accounted for the trial, location, and replication 
effects as described above. We then extracted and used BLUPs to estimate genetic correlations. The 
estimates were plotted (Fig. 2) in R 3.4.1 (R Core Team 2017) using the „corrplot‟ package (Wei, 
2013). For all the cases, correlation values were considered significantly different from zero at P value 
≤ 0.05. 
 
 
Genomic Prediction  
To perform genomic prediction, we fitted separate genomic best linear unbiased prediction (GBLUP) 
model as           for within breeding stages, across breeding stages, and across breeding 
programs. where y is a vector of the raw phenotype   is a vector of fixed effect (different for different 
 This article is protected by copyright. All rights reserved. 
scenarios) with design matrix X, the vector g is the random effect representing GEBV for each 
individual, Z is a design matrix linking observations to genomic values, and   is a vector of residuals. 
For within breeding stages, the fixed is the grand mean, while the fixed effects for across breeding 
stages were grand mean and trial. For across breeding programs the fixed effect included trial and 
location. The GEBV was obtained assuming    (     
 ), where   
  the additive genetic variance 
and K is the square, symmetric genomic relationship matrix based on SNP markers. The genomic 
relationship matrix was constructed with the function A.mat in the R package rrBLUP (Endelman, 
2011) that used VanRaden‟s (2008) method 1. The GBLUP predictions were made with the function 
emmreml in the R 3.1.0 package EMMREML (Akdemir and Okeke, 2015).  
Assessment of Prediction Accuracy 
In order to assess prediction accuracy, we used 30 replicates of 5-fold cross-validation. For each 
replicate, we divided the population randomly into five equal and mutually-exclusive subsets or folds. 
We then trained the prediction model on four (training sets) of the five folds to predict the fifth 
(validation set). For scenarios where we used one location to predict another location, our training and 
test sets were fixed and therefore were not replicated. The following prediction scenarios were 
explored in our analysis: (1) within breeding stages in each location: CETUKG, PYTUKG, PYTKIB, 
and AYTKIB, (2) combined locations (UKG+KIB), (3) within locations: prediction of UKG alone 
(CETUKG + PYTUKG) and KIB alone (PYTKIB + AYTKIB), (4) inter-location prediction: UKG to 
predict KIB and KIB to predict UKG, and UKG+KIB to predict UKG and vice versa.  We further 
used UKG+UG to predict UKG, KIB+UG to predict KIB and UKG+KIB+UG to predict UKG and 
KIB. This was done to determine if adding UG clones to either TARI program would improve trait 
prediction accuracies. For scenarios 3 and 4, we maintained a fixed test set (20%) picked randomly 
from the population under prediction in each iteration. For each prediction, accuracies were computed 
as the Pearson correlation coefficient between the GEBV predicted for the test set and the 
corresponding estimated observed breeding values; BLUPs. Correlation values were considered 
significantly different from zero at P value ≤ 0.05. 
 This article is protected by copyright. All rights reserved. 
Population structure 
To visualize population structure, principal component analysis (PCA) using common markers in the 
three breeding programs; KIB, UG, and UKG was performed. The prcomp function in R was used to 
generate PCs. The first two PCs were then used for plotting to visualize structure between the three 
populations (KIB, UKG and UG). Thereafter the loadings (eigenvector coefficients) for all the 
markers on 18 chromosomes on PC1 and PC2 were assessed to determine the markers contributing to 
the greatest variations in TARI alone and combined TARI and UG in the respective PCs. 
 
Genome-Wide Association Study  
GWAS was performed on different subsets of TARI accessions. The subsets included trial types 
(CET, PYT, AYT), locations (KIB and UKG), and four clusters generated based on marker kinship 
matrices of the individuals. We used these sets of data combination because we suspected that certain 
chunks of data could provide better results than others. Clustering was considered because we thought 
it is an objective way to find population structure. By doing GWAS within each cluster we were 
hoping to avoid population structure and find better allele frequencies. 
In implementing GWAS, we used BLUPs extracted from the linear mixed model described under 
estimated observed breeding values section as phenotypes. For each trial, the GWAS was carried out 
using genome-wide complex trait analysis tool (GTCA, Yang et al., 2011). This followed a leave one 
chromosome out approach in which the chromosome with the tested candidate SNP markers is 
excluded from the genomic relationship calculation. The linear mixed model:            with 
var (y) = V = Kσ2g + Iσ
2
ɛ, were fitted for each case. Here   is an n × 1 vector of phenotype (BLUPs) 
with n being the sample size, β is a vector of fixed effect (genetic marker information), g is an n × 1 
vector of the total genetic effects of the individuals with g∼ N (0, Kσ2g), K is the genetic relationship 
matrix between individuals, same as symmetric genomic realized relationship matrix based on SNP 
markers, I is an n × n identity matrix, and ɛ is a vector of residual effects with ɛ ∼ N (0, Iσ2ɛ).   
 This article is protected by copyright. All rights reserved. 
Bonferroni threshold was applied to correct for multiple testing for each data set used for GWAS. The 
cut off was computed as −log 10(α/t*n) where α is 0.05, which is the standard significance threshold, t 
is the number of different subset of data, and n is the number of SNPs. In this study, the Bonferroni 
correction significance cut off was − log10 (0.05/15*88,434) = 7.42. We used the marker-wised P 
value of 0.001 as the threshold to declare significant market trait association (MTA) and the false 
discovery rate (FDR) corrected P value of 0.2 to select the highly significant MTA. QTL boundaries 
were defined with linkage disequilibrium (LD) pairwise correlation coefficient r2 ≥ 0.7 coupled with 
the MTA information. We chose the tagging marker for each QTL with the criteria of the smallest 
marker-wised P value. Manhattan and QQplots plots were generated using R package 
ggplots2 (Wickham et al., 2016) with customization for joint Manhattan-QQplot display using R 
package ggpubr (Kassambara, 2018). We used the M. esculenta_305_v6.1 available in Phytozome 
to report presence of annotated genes that are related to plant defense system within the QTL 
region.  
 
Results and Discussion 
Population Structure 
We performed principal component analysis to assess structure within and among the UKG, 
KIB, and UG populations (Fig.1). This is necessarily to determine how individuals from these 
programs are related to each other in order to understand the effect of germplasm sharing 
between Uganda and Tanzania. It will also be useful to understand whether the existing 
restrictions on clone movement between KIB and UKG might have facilitated population 
differentiation within Tanzanian germplasm. In the combined data (UKG+KIB+UG), the first 
and second PC accounted for 11% and 5% of the genetic variation, respectively. In TARI 
(KIB+UKG), PC1 and PC2 explained 7% and 5% of the variation, respectively. Some 
population stratification within each breeding program was observed (Fig. 1). 
 This article is protected by copyright. All rights reserved. 
 
Three clusters of clones were evident arrayed along PC1 (Fig. 1b). The loadings of markers 
on PC1 (Fig. 1d) showed a strong effect of chromosome 1 on this component, suggesting that 
the clusters are due to the known introgression from M. glaziovii on this chromosome 
(Bredeson et al., 2016). This clustering along PC1 has been observed previously in Ugandan 
cassava (Ozimati et al., 2019) and we suggest that it depends on the dosage of the 
chromosome 1 introgression, either 0, 1, or 2 copies. Interestingly, PC2 (Supplementary Fig. 
1) strongly separated UKG from KIB while the UG accessions were intermediate between 
these two. When analyzing TARI alone, differences between UKG and KIB aligned along 
PC1 (Fig. 1a) whereas the chromosome 1 introgression aligned along PC2 (Fig. 1c, 
Supplementary Fig. 1). This analysis may also provide valuable information on the role of 
introgression in driving population stratification. Despite, Tanzania being the presumed 
origin of chromosome introgressions in the 1930s, (Hahn et al., 1979, 1980; Fauquet et al., 
1990), based on this analysis, this introgression may have been subjected to more 
recombination events in Tanzanian population (Fig 1:c &d).  
The Uganda population was constituted in 2011 by combining progenitors sourced from 
CIAT, IITA, Tanzanian, and a few clones from within Uganda (Kawuki, 2019 personal 
communication). Because of this recent constitution of UG TP, it is possible that the UG 
introgression is newer than Tanzania resulting in differentiation between sub populations 
with introgression versus ones without it.  However, further investigation is needed to 
validate these hypotheses. 
 
Trait Correlations 
 This article is protected by copyright. All rights reserved. 
Trait phenotypic and genetic correlations were quite similar (Fig. 2; Supplementary Fig. 2). 
Phenotypic and genotypic correlations of the yield traits were moderately positive except for 
DM. The highest correlation was between FYLD and SHTWT, followed by RTNO and 
SHTWT, and RTNO and FYLD, (r=0.6, r=0.5, r=0.4, respectively). These values are only 
somewhat lower than previous findings; Kundy et al. (2014) reported a strong positive 
correlation between RTNO and FYLD (r=0.7), but fewer clones were tested which could 
have skewed their findings. In addition, effects associated with locations and trial types in the 
present study might have contributed to the observed deviations. Generally, the positive 
correlations of FYLD, RTNO, and SHTWT could be exploited for simultaneous trait 
improvements. We also observed a slight positive genetic correlation between MCMDS and 
MCBSDS (r=0.3) but a very low association between MCBSDS and CBSD root necrosis 
which agrees with findings in other studies (Rwegasira and Rey, 2012; Ozimati et al., 2019). 
The positive relationship between these two foliar diseases provides an opportunity to 
increase resistance against both concurrently. MCGMS showed weak negative correlations 
with MCMDS, SHTWT, and FYLD (r=-0.3, -0.3, -0.4, respectively). The negative 
correlation between MCGMS and FYLD corroborates Chipeta et al., (2008) result (r= -0.53). 
Breeders should keep these negative associations in mind and incorporate sources of 
resistance when breeding for yield. Assessments of trait correlations in each breeding stage 
showed similar patterns observed in combined data sets from across breeding stages. 
However, we also noticed slight improvement in some trait although we think it could just be 
noise in the data (Supplementary Fig. 2).  
 
Variance Components and Broad-Sense Heritability Estimates 
 This article is protected by copyright. All rights reserved. 
Variance components and broad-sense heritability estimates for each of the nine traits are in Table 2. 
Generally, KIB trials yielded the largest genetic variance estimates for MCMDS, CBSDRS, FYLD, 
and SHTWT while UKG trials had the largest estimates for MCBSDS and RTNO. Conversely, both 
FYLD and SHTWT had the largest residual estimates in the AYTKIB and UKG trials, respectively. 
Errors associated with CBSDRS, MCGMS, RTNO, HI, and DM in AYTKIB were lower than other 
trials. Meanwhile both genetic and error estimates for RTNO in the UKG trials were higher than in 
the KIB trials. Additionally, HI and MCBSDS had similar error estimates across all the trials. 
Heritability estimates ranged between 0.06 and 0.91. Based on cassava heritability classification, only 
HI in AYTKIB, MCBSDS in PYTUKG, and MCMDS in both AYTKIB and PYTUKG could be 
classified as high (>0.50) (Bhateria et al., 2006). Other researchers reported higher heritability for 
MCMDS, MCBSDS, and HI in different cassava populations (Wolfe et al., 2017; Kanyondo et al., 
2018; Adeniji et al., 2011). High heritability estimates for MCMDS and MCBSDS could be 
associated with varying levels of resistance in Tanzanian accessions owing to specific responses to 
various strains of CBSVs and CGMs.  In the Lake Zone, there is a high prevalence of Africa cassava 
mosaic virus, Eastern African cassava mosaic virus (EACMV), the Ugandan variant of cassava 
mosaic virus, cassava brown streak virus (CBSV), and the Ugandan variant cassava brown streak 
virus. However, only EACMV and CBSV are known to occur in the Eastern Zone; includes Kibaha 
(Raya et al., 1993; Jeremiah et al., 2015). Extensive pathogen variation within and between the two 
regions of Tanzania could have caused the variability detected in the germplasm. Significant 
differences in the HI were reported among cassava cultivars (Kawano et al., 1978). There is intensive 
selection against clones with heavy branching in early generations. Clones with fewer branches are 
known to have low HI. It appears from this data that the clones from AYTKIB trials were heavily 
selected for low branching and consequently resulting in higher heritability (h2=0.80). 
 
The CBSDRS, SHTWT, FYLD, and DM in the CETUKG trials and MCBSDS and DM in the KIB 
trials, all had medium heritability estimates. Similar estimates were reported in other cassava 
 This article is protected by copyright. All rights reserved. 
populations (Wolfe et al., 2017; Kayondo et al., 2018; Ozimati et al., 2019). We observed low FYLD 
heritability in AYTKIB than in any other trials. We can only speculate that intensive selection for 
disease resistance and DM in early breeding stages could have limited and reduced the amount of 
FYLD genetic variation in advanced lines (AYTKIB). Therefore, we need to balance between trait 
selection preference and genetic variation in breeding stages.  
For most traits, CET trials with a smaller plot size (5m
2
) had lower error variance and a better 
heritability than trials with medium and large plots (e.g. 14m
2
 and 20m
2
). High trait genetic variances 
were reported in sugarcane clonal evaluation trials at Louisiana State University Agricultural Center, 
but the heritability estimates in different plot sizes were similar (Milligan et al., 2007). The substantial 
increase in genetic variance and trait heritability in smaller plots than larger plot sizes needs further 
investigations. This will help establish optimal plot sizes for cassava breeding for better prediction 
accuracies. Generally, trait heritability estimates varied across trials similar to the findings of Wolfe et 
al. (2017), Kayondo et al. (2018), and Ozimati et al. (2019). Confounding effects (locations and 
trials), environmental condition differences, trait selection priority in each of the TARI breeding 
programs, and differences in data collection between the two programs could have affected 
heritability estimates. 
 
Genomic prediction 
Within Breeding Stage Prediction 
Within breeding stage prediction accuracies for all the traits in CETUKG, PYTUKG, PYTKIB and 
AYTKIB were 0.23, 0.20, 0.15, and 0.20, respectively (Table 3). The accuracies within the breeding 
stages were lower than accuracies from other cassava populations for most traits except DM and 
FYLD (Wolfe et al., 2017). In this study, the use of data from CETUKG stage significantly improved 
prediction accuracies for all the traits except MCMDS and CBSDRS compared to PYTUKG, 
PYTKIB, and AYTKIB data sets. Overall prediction accuracies obtained from cross-validations were 
 This article is protected by copyright. All rights reserved. 
higher for HI and FYLD when Individuals in CETUKG and PYTUKG were used for model training. 
On the contrary, the two breeding stages evaluated at Kibaha produced markedly lower prediction 
accuracies for most traits except for MCGMS and DM.  
Interestingly, there was consistent prediction of FYLD (~35%) in both CETUKG and PYTUKG. The 
stability in yield prediction in these breeding cycles at Ukiriguru suggests that breeders can opt to 
select candidates for training population from either cycle particularly when the interest is yield. An 
added advantage of equal prediction in CETUKG and PYTUKG is that clones can directly be 
transferred CET to AYT stage for evaluation. Skipping of PYT stage could help accelerate the varietal 
replacement process.  
Dry matter prediction accuracy improved significantly by 53% when clones in advanced 
Kibaha trials (AYTKIB) were used to model training than those in PYTKIB. Comparison 
between breeding stages in the two locations showed high FYLD prediction accuracy in 
CETUKG and a high DM in AYTKIB. One would have expected opposite results because of 
trait selection preferences between the two locations as well as the larger plot size associated 
with AYTs. We are not certain why clones in early generation evaluated in smaller plot s had 
higher yield than clones in advanced generation evaluated in larger plots. The difference 
could be attributed to high genetic variation and high heritability estimates for CET clones 
than AYT clones. 
 
We also observed that using clones from the same breeding stage for TP gave higher prediction 
accuracy estimates than clones aggregated from multiple breeding stages. Our estimates agree with 
those found by Hofheinz et al. (2012) using data from two consecutive breeding cycles of sugar beets 
and Michel et al. (2016) study on genomic selection using multiple breeding cycles in bread wheat. 
Ceballos et al. (2016) recommends the used of phenotypic information from clones in advanced 
breeding stage during genomic selection because of their “stable” genotypic performance, in this 
 This article is protected by copyright. All rights reserved. 
study we observed that phenotypic records of clones in early breeding stage predicted test set better 
for most traits than advanced ones. From this study we suggest that consideration should be given to 
the within the breeding stage clones particularly from early generation when forming training 
populations. Additionally, CET clones are generally tested in smaller plots because of limited number 
of planting stakes. The use of smaller plots for trait evaluation can further help reduce phenotyping 
cost. Additional investigation using different cassava population is needed to validate our results. 
 
Within and Cross Location Prediction  
Using one location data to predict the performance in an independent location can help accelerate 
breeding process. We evaluated across-location prediction and compared the trait prediction accuracy 
estimates to the within location prediction accuracy estimates. Except for MCGMS, the cross location 
predictions were generally low, averaging between 0.10 and 0.14 when KIB was used to predict UKG 
and vice versa, respectively (Table 4). MCGMS was the highest and most consistently predicted trait 
for both within and between locations. Adding the two populations 
together to increase the training population size did not improve trait prediction accuracies. We also 
noticed that the use of KIB to predict UKG reduced accuracies for CBSDRS, RTNO, SHTWT, HI, 
and FYLD compared to the within-UKG scenario. The unrelatedness of materials across cycles, 
confounding effects (trials and locations), and variable heritability across locations could have caused 
low accuracies. Perhaps existing quarantine between the two TARI programs associated with subtle 
population differentiation (Fig.1) as well as the absence of common checks between trials may have 
caused low trait prediction estimates between locations.  
 
Reduced common ancestors overtime results in reduced kinship, which reduces accuracy (de los 
Campos et al., 2013). Lorenz and Smith. (2015) reported a decrease in prediction accuracy when 
unrelated lines are added to the calibration set. Similarly, Song et al. (2017) reported a decrease in 
 This article is protected by copyright. All rights reserved. 
prediction accuracy when predicting yield across cycles compared to within cycles in wheat. Our 
results are also in agreement with Ly et al. (2013) findings. They observed a decrease in prediction 
accuracies across locations in cassava. Therefore, based on this population, cross-location prediction 
may not be useful for programs implementing GS. Although generally, it would be useful to 
implement GS across locations, using populations with structure and weak SNP-QTL LD associations 
across populations could limit GS-assisted breeding in cassava and other species. This limitation has 
also been observed in livestock, Hayes et al. (2009), reported that pooling animals from different 
populations did not improve trait predictions due to non-persistent association between SNPs and 
QTL across breeds (populations).  
 
In our study we did not find consistent relationship between heritability and prediction accuracy 
estimates across breeding cycles. In fact, we observed same accuracies for FYLD in both CET and 
PYT at Ukiriguru. Other studies have reported positive relationship between heritability and 
prediction accuracy across breeding cycles. For example, Sallam et al. (2015) reported high prediction 
accuracy for fusarium head blight resistance than yield in cross breeding cycles.  
This is because highly heritable traits have a less complex genetic architecture and therefore 
considered to be stable across multiple cycles. Combs and Bernardo (2013) and Daetwyler et al. 
(2010) have both attributed the positive relationship between heritability and accuracy to the 
preserved haplotype structures and relatedness across breeding cycles.  
 
 
Across-Program Prediction  
The UKG population had slightly better prediction accuracies for most traits than KIB (Table 5). We 
added the UKG clones to KIB and vice versa and predicted a fixed number (20%) of KIB or UKG 
 This article is protected by copyright. All rights reserved. 
accessions to determine whether including these clones could improve trait prediction. Adding UKG 
clones to the KIB training set did not improve prediction accuracies for most traits. On the other hand, 
adding the KIB clones to the UKG training set reduced the prediction accuracies for most traits. This 
effect was more severe on FYLD than any other trait (from r=0.30 to r=0.08). Similarly, adding the 
UG clones to either KIB or UKG did not improve prediction accuracies for either program. However, 
the UKG+UG to predict UKG test set slightly improved the results compared to the KIB+UG to 
predict KIB test set.  
Further, we used UKG+KIB+UG to predict UKG and KIB to determine if adding UG clones to TARI 
clones would improve trait prediction accuracies. There was no improvement in trait prediction. In 
fact, there were decreases for CBSDS, CBSDRS, and SHTWT (r=0.16 to 0.07, r=0.14 to 0.07, r=0.16 
to 0.08, respectively) when UKG+KIB+UG was used to predict KIB. We observed similar decreases 
for CBSDRS, SHTWT, and FYLD (r=0.23 to 0.15, r=0.20 to 0.14, 0.30 to 0.26, respectively) when 
UKG+KIB+UG was used to predict UKG. Decrease in accuracies as result of combining unrelated 
populations have been reported in other crops. For example, Lorenz and Smith (2015) reported low 
prediction accuracy estimates when they combined population from North Dakota state University 
barley program and a second population from University of Minnesota barley breeding program to 
form training population. Other researchers have also reported low accuracies in cross-population 
predictions (Wolfe et al., 2017; Crossa et al., 2010; Endelman, 2011). Our results and evidences from 
other studies suggests that breeders can achieve better and a more reliable prediction accuracy 
estimates using smaller population with closely related genotypes than using large population with 
unrelated individuals. Population structure, environmental condition differences, differences in 
experimental design, direction of trait selection in each of the TARI breeding program, and variation 
in heritability could have impacted the accuracy estimates. Based on published results, the Ugandan 
training population had slightly higher accuracies for most traits compared to TARI (Wolfe et al., 
2017; Ozimati et al., 2018). More locations and replication of clones across environments could have 
improved their accuracies. The poor prediction results across-programs reported in our study will 
 This article is protected by copyright. All rights reserved. 
make it harder for breeders to use training data from different locations, breeding programs, and 
countries.  
 
Genetic Architecture of Disease Resistance 
Genetic associations with MCMDS were distributed across all chromosomes except 4, 5 6, 8, 11, 13, 
and 18 (Fig. 3, Table 6, and Supplementary File1). Previous GWAS study for cassava mosaic disease 
severity resistance revealed CMD2 locus on chromosome 12 in other cassava populations (Wolfe et 
al., 2016). SNPs S12_5529819 and S12_11351823 delimit this QTL region (6.3-8.7 Mbps). 
S12_7270219 was the most significant marker tagging the QTL in all the TARI sub populations and 
accounted for 14 to 37% of the phenotypic variance explained (PVE). This result validates the 
presence of previously reported CMD2 locus. In addition to CMD2 locus, we also detected another 
major CMD QTL (QTL-cmds16-1) on chromosome 16 tagged by marker S16_421670 when we used 
PYTUKG population. This region accounted for 87% of PVE (Fig.3, Table 6, and Supplementary 
File1).  There are no known CMD resistance genes in this region. However, Manes.16G036200.1, 
Manes.16G036300.1, and Manes.16G036400.1 genes have been annotated in this region and known 
to encode pentatricopeptide repeats.  These genes are positively expressed when plants are under 
attack from pathogens (Park et al., 2014). 
A QTL on chromosome 4 (QTL-cbsd4|cmd-1) and two on chromosome 12 (QTL-cbsd12|cmd-1 and 
QTL-cbsd12|cmd-2) showed significant association with both MCBSDS and MCMDS resistance (Fig. 
3, Table 6, and Supplementary File 1). Marker S12_7929439 tagging cbsd12|cmd-2 QTL accounted 
for 8% PVE for MCMDS and 33% MCBSDS in several subsets of TARI population. In the same 
region, a single marker (S12_929320) was found associated with MCMDS and MCBSDS resistance 
QTL (QTL-cbsd12|cmd-1) in PYTKIB dataset and accounted for 37% and 8% PVE, respectively. 
QTL-cbsd4|cmd-1 tagged by marker S4_24670203 was detected in K4_cluster1 dataset and explained 
14% of PVE. Manes.04G113900.1 gene occur in the marker region detected on chromosome 4. This 
gene is known to encode for phospholipased α and activates the plant responses to pathogen attacks 
 This article is protected by copyright. All rights reserved. 
(De Torres et al., 2002). There are no known annotated genes in the region of QTL-cbsd12|cmd-1 and 
QTL-cbsd12|cmd-2 QTL. Further studies need to be conducted to confirm the presence of QTL 
conferring resistance to both CMD and CBSD on chromosome 4 and 12. 
 
On chromosome 9 and 11, QTL-cbsd9-1 and QTL-cbsd11-2 tagged by SNP S9_10707044 and 
S11_22942418 were associated with responses to MCBSDS in UKG and PYTKIB sub populations, 
respectively. QTL-cbsd9-1 accounted for 9% PVE while QTL-cbsd11-2 accounted for 5% PVE. The 
annotated gene within the QTL-cbsd11-2 is Manes.11G120800.1. This gene is known to encode for a 
protein kinase. An overexpression of similar gene in tobacco stimulated plant defense response (Li et 
al., 2018). Similarly, the region on chromosome 9, with a significant marker S9_10707044, contains a 
single gene (Manes.09G074200.1) that encodes a kinase family protein. Wen-Yuan Song et al. (1995) 
reported that a receptor kinase-like protein is encoded by the rice disease resistance gene Xa21. The 
two genes on chromosomes 9 and 11 could play a role in cassava defense mechanism to confer CBSD 
resistance. However, further investigation to validate the presence of CBSD resistant is needed. 
 
Nine markers representing four loci on chromosomes 2, 3, 8, and 10 were significantly associated 
with CBSDRS responses (Table 6, Supplementary File 1). Two loci on chromosomes 3 (QTL-
cbsdrs3-1) and 8 (QTL-cbsdrs8-1), occurring in AYTKIB accessions, accounted for 30% and 32% of 
the phenotypic variation (chromosomes 3 and 8, respectively). While the other two minor loci on 
chromosomes 2 (QTL-cbsdrs2-1) and 10 (QTL-cbsdrs10-1), detected in UKG accessions, explained 
8% and 11% of the phenotypic variation (chromosomes 2 and 10, respectively). The 
Manes.08G079900.1 gene within the QTL region on chromosome 8 is known to express a wall-
associated kinase like receptor. Shi et al. (2016) cloned Snn1 which is a member of the wall-
associated kinase class receptors and found that they drive pathways for biotrophic pathogen 
resistance in wheat. No annotated genes within the region significant for resistance to CBSDRS on 
chromosome 3 are in cassava reference genome. 
 This article is protected by copyright. All rights reserved. 
In conclusion, discovery of new loci and associated markers will facilitate early selection during the 
season so breeders can have adequate information early enough to make decisions. Although CBSDS 
resistance genes in some Ugandan accessions are thought to have originated from Tanzania during 
germplasm exchange, the MCBSDS genes discovered in this study are not localized on chromosomes 
5, 11, and 18 as reported for Ugandan germplasm (Kayondo et al., 2018). From this study we 
observed that using fewer but closely related individuals particularly from same clusters improves the 
discovery of QTL compared to the use of large number of individuals. These results are similar to the 
GWAS findings of Bradbury et al. (2011) who reported that accounting for individual relatedness in 
barley improved detection of true QTL.   
 
Conclusions 
Genomic prediction and selection have been touted as tools that could greatly modernize plant 
breeding and accelerate genetic gain. In this study, we examined the power of diverse breeding lines 
assembled from two breeding programs, at different breeding stages to predict traits and discover 
QTL. TARI population differentiation could have resulted from existing restrictions on clonal 
movement between programs. This restriction was imposed to contain the spread of cassava foliar 
diseases between Lake and Coastal Zone (Legg and Thresh, 2000). Although, there is long tradition of 
germplasm sharing between Tanzania and Uganda, the introgression occurring in Uganda TP is large 
and restricted to chromosome 1 while those in Tanzania are spread across the entire 18 chromosomes. 
This could suggest that the Uganda introgression could be more recent than in Tanzania. This may 
explain the population differentiation between sub populations with the introgression versus ones 
without.  
There was no relationship between increase in plot size and decrease in error variance across all traits. 
For some traits, larger plots were preferable while for other traits smaller plots were. However, this 
conclusion needs more proven evidence to verify the results. If this finding is confirmed, then 
breeders need to reconsider whether utilizing larger plots is cost effective.  
 This article is protected by copyright. All rights reserved. 
An inverse relationship between heritability estimates and trait prediction accuracies were observed 
for some traits, contrary to other plant studies (Combs and Bernardo, 2013; Lian et al., 2014). We are 
not certain whether this inverse relationship is due to noise in the data or is true and therefore further 
assessment is needed to determine this relationship.  
Prediction accuracies within and between locations (KIB and UKG) were generally low compared to 
other cassava populations. Although larger training population sizes have been associated with 
improved accuracies, in this study, adding clones from KIB to UKG and vice versa did not improve 
prediction accuracy of either population. Similarly, adding the UG clones to either KIB or UKG did 
not improve accuracies of either. The lack of relatedness between germplasm, population structure, 
and the impact of genotype-by-environment interaction negatively impacted accuracy estimates.  
Generally, across breeding cycle GS is more difficult in plants than in animals. This is because every 
year, plant breeders largely use new parents, some with unknow background from other breeding 
programs or competitors while animal breeders work in closed populations. This could make it 
challenging for the breeders to keep materials adequately related across breeding cycle GS. The 
successful use of GS is dependent on the close relationship between individuals in training and test 
set. Clones in similar breeding stages were more valuable than mixing clones from different breeding 
stages when constructing training population. This because clones in the same generation are likely to 
share ancestor few generations back and therefore marker-QTL linkage are preserved due to limited 
number of recombination events (Habier et al., 2013). It is also possible that closely related 
population share more polymorphic loci and share large fraction of genetic background causing 
sufficient genetic variation (Lorenz and Cohen, 2012; Mohammadi et al., 2015).  
Consistent with the findings of other researchers, we conclude that clones from within-breeding 
cycles are presently better option as candidates for GS training population (Song et al., 2017; Michel 
et al., 2016; Cericola et al., 2017; Riedelsheimer et al., 2013; Albrecht et al., 2014; Lehermeier et al., 
2014). Also, it may not be useful to constitute training populations from programs with divergent 
populations or populations separated by barriers like the existing restriction on clone movement 
 This article is protected by copyright. All rights reserved. 
between the two Tanzania programs. Across-breeding cycle GS especially those in more advanced 
breeding stages needs to be investigated further because prediction accuracies are very low. The 
impact of GxE modeling on unreplicated clones across environments need to be investigated further. 
Moreover, clones in the early breeding stage provided more reliable trait prediction accuracies 
because of their inherent genetic variation.  Therefore, these clones are better candidates for training 
population construction.  
 
We identified accessions carrying MCMDS, MCBSDS, and CBSDRS resistance. Some of the loci 
identified in these accessions have been previously reported. However, other loci are new. These 
results will be valuable for cassava breeding against CMD, CBSDS, and CBSDRS. Although we have 
learned valuable lesson from this study, we still need to continue to improve our experimental 
designs, data capture, and constitution of training populations so that genomic prediction and 
accuracies will be more reliable. We echo the lessons from Wolfe et al. (2017) to continue to improve 
on data quality and selection of individuals to make training populations so that we can maximize 
genetic gain. 
 
Conflict of Interest  
The authors declare that there is no conflict of interest 
Acknowledgments 
This work was supported by the Next Generation Cassava Breeding Project through funds 
from the Bill and Melinda. Gates Foundation and the Department for International 
Development of the United Kingdom. We thank the technical field staff of the TARI and 
NaCRRI cassava breeding programs, who helped in the trial management and/or data 
collection (Nsajigwa Mwakyusa, Karoline Leonard Sichalwe, Joseph Orone, Charles Majara, 
 This article is protected by copyright. All rights reserved. 
Gerald Adiga, and Vincent Kyaligonza). Special thanks to Guillaume Bauchet for initial 
curation of sequence data and the entire Next Generation Cassava Breeding team 
(www.nextgencassava.org) as a whole 
 
Data Availability 
The phenotypic and genotypic data generated and analyzed during this study are available in the 
CASSAVABASE repository, (https://www.cassavabase.org/)                                                                                                                                           
References 
Akdemir, D., and O.U. Godfrey. 2015. EMMREML: fitting mixed models with known 
covariance structures. R package version 3.1. 
Adeniji, O.T., Odo, P.E. and B. Ibrahim. 2011. Genetic relationships and selection indices for 
cassava root yield in Adamawa State, Nigeria. African Journal of Agricultural Research, 
6(13):2931-2934. 
Bates, D., M. Maechler, B. Bolker, S. Walker, R.H.B. Christensen, H. Singmann, B. Dai, and 
G. Grothendieck. 2015. Package „lme4‟. Convergence, 12(1). 
Bhateria, S., Sood, S.P. and A. Pathania, 2006. Genetic analysis of quantitative traits across 
environments in linseed (Linum usitatissimum L.). Euphytica, 150(1-2):185-194. 
Bradbury, P., T. Parker, M.T. Hamblin, and J. L. Jannink. 2011. Assessment of power and false 
discovery rate in genome-wide association studies using the BarleyCAP germplasm. Crop 
Sci. 51(1):52–59. 
Bredeson, J., V. Lyons, J. B. Prochnik, S. E. Wu, G. A., Ha, C. M. Edsinger-Gonzales, E. and 
D. S. Rokhsar. 2016. Sequencing wild and cultivated cassava and related species reveals 
extensive interspecific hybridization and genetic diversity. Nature Biotechnology 
34(5):562–570.  
Browning, B.L., and S.R. Browning. 2009. A unified approach to genotype imputation and 
haplotype-phase inference for large data sets of trios and unrelated individuals. The 
American Journal of Human Genetics, 84(2):210-223. 
Ceballos, H., P. Kulakow, and C. Hershey. 2012. Cassava breeding: current status, 
bottlenecks and the potential of biotechnology tools. Tropical plant biology, 5(1):73-87. 
Cericola, F., A. Jahoor, J. Orabi, J. R. Andersen, L. L. Janss, and J. Jensen. 2017. Optimizing 
training population size and genotyping strategy for genomic prediction using association 
study results and pedigree information. A case of study in advanced wheat breeding lines. 
PloS one, 12(1)0169606. 
Chipeta, M. M., J. M. Bokosi, V.W. Saka, and I. R. Benesi. 2013. Combining ability and 
mode of gene action in cassava for resistance to cassava green mite and cassava mealy 
bug in Malawi. J Plant Breed Crop Sci, 5(9):195-202. 
 This article is protected by copyright. All rights reserved. 
Combs, E. and R. Bernardo. 2013. Accuracy of genomewide selection for different traits with 
constant population size, heritability, and number of markers. The Plant Genome, 6(1). 
Crossa. J., G. de los Campos, P. Pérez, D. Gianola, J. Burgueño, J. L. Araus, D. Makumbi, R. 
P. Singh, S. Dreisigacker, J. Yan, V. Arief, M. Banzinger, and H. J. Braun. 2010. 
Prediction of genetic values of quantitative traits in plant breeding using pedigree and 
molecular markers. Genetics 186:713–724. 
Dawson, J. C., J. B. Endelman, N. Heslot, J. Crossa, J. Poland, S. Dreisigacker, Y. Manès, M. 
E. Sorrells, and J. L. Jannink. 2013. The use of unbalanced historical data for genomic 
selection in an international wheat breeding program." Field Crops Research, 154:12-22. 
de los Campos, G., J.M. Hickey, R. Pong-Wong, H. D. Daetwyler, and M. P. Calus. 2013. 
Whole-genome regression and prediction methods applied to plant and animal breeding. 
Genetics, 193(2):327-345. 
Elshire, R.J., J.C. Glaubitz, Q. Sun, J.A Poland, K. Kawamoto, E.S. Buckler, and S.E. 
Mitchell. 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high 
diversity species. PloS one, 6(5):19379. 
Endelman, J.B. 2011. Ridge regression and other kernels for genomic selection with R 
package rrBLUP. The Plant Genome 4(3):250-255. 
Fauquet, C., D. Fargette, and C. Munihor. 1990. African cassava mosaic virus: Etiology, 
epidemiology, and control. Plant Disease 74:404–411.  
Glaubitz, J.C., T.M. Casstevens, F. Lu, J. Harriman, R.J. Elshire, Q. Sun, and E.S. Buckler. 
2014. TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PloS 
one, 9(2):e90346. 
Habier, D., R. L. Fernando, and D. J. Garrick. 2013. Genomic BLUP decoded: a look into the black 
box of genomic prediction. Genetics, 194(3):597-607. 
Hahn, S., E. Terry, and K. Leuschner. 1980. Breeding cassava for resistance to cassava 
mosaic disease. Euphytica 29:673–683.  
Hahn, S., E. Terry, K. Leuschner, I. Akobundu, C. Okali, and R. Lal. 1979. Cassava 
improvement in Africa. Field Crops Res. 2:193–226.  
Hayes, B.J., P. J. Bowman, A. J. Chamberlain, and M. E. Goddard. 2009. Invited review: Genomic 
selection in dairy cattle: Progress and challenges. Journal of dairy science, 92(2):433-443. 
Heffner, E.L., M.E. Sorrells, and J.L. Jannink. 2009. Genomic selection for crop 
improvement. Crop Science, 49(1):1-12. 
Hillocks, R.J. and J.M. Thresh. 2000. Cassava mosaic and cassava brown streak virus 
diseases in Africa: a comparative guide to symptoms and aetiologies. Roots, 7(1):1-8. 
Hofheinz, N., D. Borchardt, K. Weissleder, and M. Frisch. 2012. Genome-based prediction of test 
cross performance in two subsequent breeding cycles. Theoretical and Applied 
Genetics, 125(8):1639-1645. 
Howeler, R., N. Lutaladio, and G. Thomas. 2013. Save and grow: cassava. A guide to 
sustainable production intensification. FAO. 
Jannink, J.L., A.J. Lorenz, and H. Iwata. 2010. Genomic selection in plant breeding: from 
theory to practice. Briefings in functional genomics, 9(2):166-177. 
Jeremiah, S.C., I.L. Ndyetabula, G.S. Mkamilo, S. Haji, M.M. Muhanna, C. Chuwa, S. 
Kasele, H. Bouwmeester, J. N. Ijumba, J. P. Legg. 2015. The dynamics and 
 This article is protected by copyright. All rights reserved. 
environmental influence on interactions between cassava brown streak disease and the 
whitefly, Bemisia tabaci. Phytopathology, 105(5):646-655. 
Kassambara, A. 2018. ggpubr:„ggplot2‟Based Publication Ready Plots. R package version 0.2. 
Kawano, K., W.M.G. Fukuda, and U. Cenpukdee. 1987. Genetic and Environmental Effects 
on Dry Matter Content of Cassava Root 1. Crop Science, 27(1):69-74. 
Kawano, K., P. Daza, A. Amaya, M. Rios and W. M. Goncalves. 1978. Evaluation of 
Cassava Germplasm for Productivity 1. Crop Science, 18(3):377-380. 
Kayondo, S.I., D.P. Del Carpio, R. Lozano, A. Ozimati, M. Wolfe, Y. Baguma, V. Gracen, S. 
Offei, M. Ferguson, R. Kawuki, and J.L. Jannink. 2018. Genome-wide association 
mapping and genomic prediction for CBSD resistance in Manihot esculenta. Sci. Rep. 
8(1): 1–11. 
Kundy, A.C., G.S. Mkamilo, and R.N. Misangu. 2014. Correlation and Path Analysis 
between Yield and Yield Components in Cassava (Manihot esculenta Crantz) in Southern 
Tanzania. Journal of Natural Sciences Research, 4:6-10. 
Legg, J.P. and T, J.M Thresh. 2000. Cassava mosaic virus disease in East Africa: a dynamic 
disease in a changing environment. Virus research, 71(1-2):135-149. 
Li, W., X. Li, J. Chao, Z. Zhang, W. Wang, and Y. Guo. 2018. NAC family transcription 
factors in tobacco and their potential role in regulating leaf senescence. Frontiers in plant 
science, 9, p.1900. 
Lian, L., A. Jacobson, S. Zhong, and R. Bernardo. 2014. Genomewide prediction accuracy 
within 969 maize biparental populations. Crop Science, 54(4):1514-1522. 
Lian, L., A. Jacobson, S. Zhong and R. Bernardo. 2015. Prediction of genetic variance in biparental 
maize populations: Genomewide marker effects versus mean genetic variance in prior 
populations. Crop Science, 55(3):1181-1188. 
Lorenzana, R. E., and R. Bernardo. 2009. Accuracy of genotypic value predictions for 
marker-based selection in biparental plant populations. Theoretical and applied genetics 
120(1):151-161. 
Lorenz, A.J. and K.P. Smith, 2015. Adding genetically distant individuals to training 
populations reduces genomic prediction accuracy in barley. Crop science, 55(6):2657-
2667.  
Lorenz, K. and B. A. Cohen. 2012. Small-and large-effect quantitative trait locus interactions underlie 
variation in yeast sporulation efficiency. Genetics, 192(3):1123-1132. 
Ly, D., M. Hamblin, I. Rabbi, G. Melaku, M. Bakare, H. G. Gauch, R. Okechukwu, A. G. 
Dixon, P. Kulakow, and J. L. Jannink. 2013. Relatedness and genotype× environment 
interaction affect prediction accuracies in genomic selection: a study in cassava. Crop 
Science, 53(4):1312-1325. 
Michel, S., C. Kummer, M. Gallee, J. Hellinger, C. Ametz, B. Akgol, D. Epure, H. Gungor, 
F. Loschenberger, H. Buerstmayr. 2018. Improving the baking quality of bread wheat by genomic 
selection in early generations. Theor. Appl. Genet. (131): 477–493. 
Milligan, S.B., M. Balzarini, K. A. Gravois, and K. P. Bischoff. 2007. Early stage sugarcane 
selection using different plot sizes. Crop science, 47(5):1859-1864. 
 This article is protected by copyright. All rights reserved. 
Mohammadi M., T. Tiede, and K. P. Smith.2015. "PopVar: A genome-wide procedure for 
predicting genetic variance and correlated response in biparental breeding 
populations." Crop Science 55(5):2068-2077.Ozimati, A., R. Kawuki, W. Esuma, S. I. 
Kayondo, A. Pariyo, M. Wolfe, and J. L. Jannink. 2019. Genetic Variation and Trait 
Correlations in an East African Cassava Breeding Population for Genomic Selection. 
Crop Science. 
Park, Y.J., H.J., Lee, K.J., Kwak, K., Lee, S.W. Hong, and H. Kang2014. MicroRNA400-
guided cleavage of pentatricopeptide repeat protein mRNAs renders Arabidopsis thaliana 
more susceptible to pathogenic bacteria and fungi. Plant and Cell Physiology, 55(9):1660-
1668. 
Raya, M.D., S. C. Jeremiah, and J. Legg. 1993. African Cassava Mosaic Virus (ACMV) and 
Brown Streak survey in Tanzania.  
Robertsen, C.D., R.L. Hjortshøj, and L. L. Janss, 2019. Genomic Selection in Cereal 
Breeding. Agronomy, 9(2):95.  
Rutkoski, J., R. P. Singh, J. Huerta-Espino, S. Bhavani, J. Poland, J. L. Jannink, J.L. and M. 
E. Sorrells. 2015. Efficient use of historical data for genomic selection: a case study of 
stem rust resistance in wheat. The Plant Genome,8(1). 
Rwegasira, G. M. and C. M. Rey, 2012. Response of selected cassava varieties to the 
incidence and severity of cassava brown streak disease in Tanzania. J Agric Sci, 4(7):237-
245. 
Salvador, E.M., V. Steenkamp, and C. M. E. McCrindle, C.M., E. 2014. Production, 
Consumption and Nutritional Value of Cassava (Manihot esculenta, Crantz) in 
Mozambique: An Overview, 6(3):29-38. 
Shi, G., Z., Zhang, T.L. Friesen, D. Raats, T. Fahima, R. S. Brueggeman, S. Lu, H. N. Trick, 
Z. Liu, W. Chao, and Z. Frenkel. 2016. The hijacking of a receptor kinase–driven 
pathway by a wheat fungal pathogen leads to disease. Science advances, 2(10):1600822. 
Song, J., B.F. Carver, C. Powers, L. Yan, J. Klápště, Y. A. El-Kassaby, and C. Chen. 2017. 
Practical application of genomic selection in a doubled-haploid winter wheat breeding 
program. Molecular breeding, 37(10):117. 
Song, Wen-Yuan, Guo-Liang Wang, Li-Li Chen, Han-Suk Kim, Li-Ya Pi, Tom Holsten, J. 
Gardner et al. 1995. A receptor kinase-like protein encoded by the rice disease resistance 
gene, Xa21." Science, 270(5243):1804-1806. 
Storlie E. and G. Charmet. 2013. Genomic selection accuracy using historical data generated 
in a wheat breeding program. The Plant Genome.  
VanRaden, P.M. 2008. Efficient methods to compute genomic predictions. Journal of dairy 
science, 91(11):4414-4423. 
Vazquez, A.I., D.M. Bates, G.J.M. Rosa, D. Gianola and K.A Weigel. 2010. An R package 
for fitting generalized linear mixed models in animal breeding 1. Journal of animal 
science, 88(2):497-504. 
Wei. T. 2013. Corrplot: Visualization of correlation matrix. R package version 0.2. 
Wickham, H., W. Chang, and M.H. Wickham, 2016. Package „ggplot2‟. Create Elegant Data 
Visualisations Using the Grammar of Graphics. Version, 2(1):1-189. 
Wolfe, M.D., D.P. Del Carpio, O. Alabi, L.C. Ezenwaka, U.N. Ikeogu, I.S. Kayondo, R. 
Lozano, U.G. Okeke, A.A. Ozimati, E. Williams, C. Egesi, R.S. Kawuki, P. Kulakow, 
 This article is protected by copyright. All rights reserved. 
I.Y. Rabbi, and J.-L. Jannink. 2017. Prospects for Genomic Selection in Cassava 
Breeding. Plant Genome. 
Wolfe, M.D., I.Y. Rabbi, C. Egesi, M. Hamblin, R. Kawuki, P. Kulakow, R. Lozano, D.P. 
Del Carpio, P. Ramu, and J. L. Jannink.2016. Genome-Wide Association and Prediction 
Reveals Genetic Architecture of Cassava Mosaic Disease Resistance and Prospects for 
Rapid Genetic Improvement. Plant Genome 9. 
Yang, J. S. Hong Lee, M. E. Goddard, and P. M. Visscher. 2011. GCTA: a tool for genome-
wide complex trait analysis. The American Journal of Human Genetics 88, (1):76-82. 
Zhong, S., J.C. Dekkers, R. L. Fernando, and J.L. Jannink. 2009. Factors affecting accuracy 
from genomic selection in populations derived from multiple inbred lines: a barley case 
study. Genetics, 182(1):355-364. 
 
 
 
 
 
 
 
 
 
 
 
  
 This article is protected by copyright. All rights reserved. 
Figure Legends 
Figure 1. PCA plot of the TARI clones alone (A) and the combined TARI and UG clones (B) based 
on SNP marker matrix and the loadings values (eigenvector coefficients) for each marker on PC1 
against marker positions on all cassava chromosomes for TARI alone (C) and combined TARI and 
UG (D). For TARI PCA, 121K SNPs were used and for combined PCA, we used 37K SNPs common 
to both TARI and UG. For both cases, marker loadings were based on common markers (37K) 
 
  
 This article is protected by copyright. All rights reserved. 
Figure 2. Phenotypic (left) and genetic (right) correlations between nine traits in Cassava using 
combined TARI data. Blue and red colors represent positive and negative correlations respectively. 
The strength of trait relationships is depicted by the intensity of the color. Cells with correlation 
values that are not significant at P-value < 0.05 were left blank. 
 
  
 This article is protected by copyright. All rights reserved. 
Figure 3. The Manhattan (A) and quantile–quantile (B) plots from mixed linear models summarizing 
genome-wide association results for three significant traits in all subsets of combined TARI 
population. The QQ plot demonstrate the differences between various population structure controls. 
CET=clonal evaluation trial, PYT=preliminary yield trial, AYT=advanced yield trial, K3 & K4 
=cluster 3 and 4 generated using kinship relationship matrix. The horizontal solid line indicates the 
genome-wide Bonferroni significance threshold (−log10 (P-value) = 7.27), the dashed line indicates 
FDRpt1 and Dotted line indicate=FDRpt2. Additional detail for significant markers, Pvalue, and 
pFDR values are detailed in supplementary File 1 (Sheet 1). Additional plot for all the other traits 
below threshold are in Supplementary Figure 3. 
 
  
 This article is protected by copyright. All rights reserved. 
Table 1. The trial structure for the materials used for training population. Reps are the number of 
replicates in each trial. The number of individual clones in each of the three replicates after 
sprouting are represented by Rep1, Rep2, and Rep3.  
 
 
Trial 
 
 
Stage 
 
 
Location 
 
 
No. Reps 
Clones/Rep  Plot Area (M2)  
 
Rep1 
 
Rep2 
 
Rep3 
  
Planted 
 
Harvested 
CET1 CET Ukiriguru 2 108 91    5 5 
CET2 CET Ukiriguru 2 213 214    5 5 
PYT1 PYT Ukiriguru 2 96 81    10 10 
PYT1 PYT Kibaha 2 64 65    14 14 
PYT2 PYT Kibaha 2 74 77    14 14 
PYT3 PYT Kibaha 2 45 48    14 14 
PYT4 PYT Kibaha 2 21 23    14 14 
PYT5 PYT Kibaha 2 24 23    14 14 
PYT6 PYT Kibaha 2 15 15    14 14 
PYT7 PYT Kibaha 2 20 19    14 14 
PYT8 PYT Kibaha 2 55 51    14 14 
PYT9 PYT Kibaha 2 58 63    14 14 
AYT1 PYT Kibaha 2 26 25    42 20 
AYT2 PYT Kibaha 3 12 12 12  42 20 
AYT3 PYT Kibaha 3 16 13 15  42 20 
AYT4 PYT Kibaha 2 23 23    42 20 
AYT5 PYT Kibaha 2 25 20    42 20 
AYT6 PYT Kibaha 2 21 21    42 20 
 
 
Table 2. Heritability, genetic and residual variance components for four plot sizes 
 This article is protected by copyright. All rights reserved. 
Plot Size 
Variance 
Componen
t 
MCMD
S 
MCBSD
S 
CBSDR
S 
MCGM
S 
RTNO 
SHTW
T 
HI FYLD DM 
5m2 
(CETUKG
) 
σ2G 0.042 0.103 0.097 0.009 
2195
0 
12.71 
0.00
5 
11.01 7.11 
σ2TR 0.000 0.002 0.000 0.184 9516 9.10 
0.00
5 
0.46 1.69 
σ2e 0.049 0.121 0.215 0.142 
7192
4 
28.66 
0.02
0 
25.12 15.20 
H2 0.46 0.46 0.31 0.06 0.23 0.31 0.20 0.30 0.32 
10m2 
(PYTUKG
) 
 
σ2G 0.050 0.114 0.008 0.037 
2205
9 
0.215 
0.00
3 
1.80 3.72 
σ2TR 0.000 0.000 0.005 0.000 
1571
8 
4.155 
0.00
3 
0.06 2.67 
σ2e 0.005 0.097 0.129 0.131 
7714
2 
10.605 
0.01
7 
5.80 16.84 
H2 0.91 0.54 0.06 0.22 0.22 0.02 0.15 0.24 0.18 
14m2 
(PYTKIB) 
σ2G 0.349 0.077 0.042 0.029 
1376
3 
14.89 
0.00
4 
50.09 5.23 
σ2TR 0.026 0.018 0.03 0.013 9203 28.08 
0.00
2 
24.69 1.74 
σ2e 0.432 0.100 0.21 0.118 
4391
1 
105.48 
0.01
8 
129.2
1 
12.28 
H2 0.43 0.39 0.14 0.18 0.24 0.10 0.17 0.25 0.27 
20m2 
(AYTKIB) 
σ2G 0.28 0.083 0.031 0.051 4760 36.00 
0.00
4 
59.84 2.0885 
σ2TR 0.04 0.020 0.001 0.036 6688 70.08 
0.00
3 
66.79 0.9732 
σ2e 0.30 0.122 0.121 0.117 
5530
3 
197.90 
0.00
1 
211.4
5 
10.252
6 
H2 0.44 0.38 0.20 0.25 0.08 0.12 0.80 0.18 0.16 
σ2G =Genetic variance among clones, σ
2
TR=Variance among reps in trials, σ
2
e=Residual error variance, 
H2 = plot-based broad-sense heritability estimates. CETUKG=Clonal Evaluation Trial at Ukiriguru, 
 This article is protected by copyright. All rights reserved. 
PYTUKG=Preliminary Yield Trial at Ukiriguru, PYTKIB=Preliminary Yield Trial at Kibaha; and 
AYTKIB=Advanced Yield Trial at Kibaha. 
 
Table 3. Summary of cross-validated prediction accuracies by trait and breeding stages 
Trait CETUKG PYTUKG PYTKIB AYTKIB Mean 
MCMDS 0.11ns 0.06ns 0.17ns 0.32ns 0.14 
MCBSDS 0.26* 0.33ns 0.08ns 0.24ns 0.23 
CBSDRS 0.26ns 0.10ns 0.09ns 0.10ns 0.14 
MCGMS 0.22* 0.09ns 0.21ns 0.43* 0.24 
RTNO 0.28* 0.27ns 0.16ns 0.03ns 0.19 
SHTWT 0.22ns 0.03ns 0.12ns 0.08ns 0.11 
HI 0.22* 0.38* 0.13ns 0.19n 0.23 
FYLD 0.36* 0.35* 0.10ns 0.09ns 0.23 
DM 0.14ns 0.18ns 0.28* 0.43* 0.26 
Mean 0.23 0.20 0.15 0.20 0.19 
ns = non-significant prediction accuracies (r), * accuracy significantly different from zero (P ≤ 0.05). 
MCMDS=mean cassava mosaic disease severity, MCBSDS=mean cassava brown streak disease 
severity, CBSDRS=mean Cassava brown streak disease root severity, MCGMS=Mean Cassava green 
mite severity, RTNO=root number, SHTWT=shoot weight, HI=harvest index, FYLD=Fresh root yield, 
DM=Dry matter. 
 Publisher: AGRONOMY; Journal: AGROJNL:Agronomy Journal; Copyright: Will notify... 
Volume: Will notify...; Issue: Will notify...; Manuscript: aj-2017-08-0123-a; DOI: ; PII: <txtPII> 
TOC Head: ; Section Head: ; Article Type: ARTICLE 
This article has been accepted for publication and undergone full peer review but has not been 
through the copyediting, typesetting, pagination and proofreading process, which may lead to 
differences between this version and the Version of Record. Please cite this article as doi: 
10.1002/csc2.20003. 
This article is protected by copyright. All rights reserved. 
Page 40 of 42 
Table 4. Average prediction trait accuracies for cross-location and combined TARI  
Trait TARI UKG to UKG KIB to UKG KIB to KIB UKG to KIB 
MCMDS  0.15* 0.09 0.02ns 0.18* 0.06ns 
MCBSDS  0.23* 0.25 0.11* 0.08ns 0.09ns 
CBSDRS 0.09* 0.21 0.16* 0.05ns 0.10* 
MCGMS  0.25* 0.22 0.23* 0.23* 0.25* 
RTNO  0.16* 0.24 0.08ns 0.09ns 0.11* 
SHTWT  0.11ns 0.22 0.19ns 0.09ns 0.01ns 
HI  0.16ns 0.18 0.15* 0.11* 0.12* 
FYLD 0.18* 0.29 0.13* 0.08ns 0.05ns 
DM  0.23* 0.13 0.16* 0.28* 0.24* 
Mean 0.17 0.20 0.14 0.13 0.10 
ns = non-significant prediction accuracies, * accuracy significantly different from zero (P ≤ 0.05). The 
values in the bracket are the accuracies for within the location prediction. 
Table 5. Summary of cross-validated prediction accuracies by trait and cross programs 
Scenario 
MCMD
S 
MCBSD
S 
CBSDR
S 
RTN
O 
SHTW
T 
HI FYLD DM 
Mea
n 
KIB_to_KIB 0.19ns 0.16ns 0.14ns 
0.16n
s 
0.15ns 
0.12n
s 
0.09n
s 
0.29* 0.16 
UKG_to_UKG 0.01ns 0.28* 0.23* 0.26* 0.20* 
0.17n
s 
0.30* 
0.13n
s 
0.20 
KIB+UKG_to_KIB 0.17ns 0.14ns 0.14ns 
0.16n
s 
0.14ns 
0.14n
s 
0.09n
s 
0.30* 0.16 
 This article is protected by copyright. All rights reserved. 
UKG+KIB_to_UKG 0.01ns 0.28* 0.18ns 0.27* 0.14ns 0.19* 
0.08n
s 
0.15n
s 
0.16 
KIB+UG_to_KIB 0.20* 0.07ns 0.06ns 
0.14n
s 
0.08ns 
0.15n
s 
0.08n
s 
0.27* 0.13 
UKG+UG_to_UKG 0.02ns 0.30* 0.15ns 0.24* 0.14ns 0.19* 0.26* 
0.15n
s 
0.18 
KIB+UKG+UG_to_KI
B 
0.17ns 0.11ns 0.06ns 
0.18n
s 
0.10ns 
0.15n
s 
0.08n
s 
0.28* 0.14 
UKG+ 
KIB+UG_to_UKG 
-0.01ns 0.31* 0.17* 0.26* 0.12ns 0.20* 
0.11n
s 
0.17n
s 
0.17 
Mean 0.09 0.21 0.14 0.21 0.13 0.16 0.14 0.22 0.16 
Table 6. Significant markers associated with MCMDS, MCBSDS, and CBSDRS resistance detected in 
TARI TP 
QTL Trait 
C
hr
. 
Reg
ion 
(Mb
s) 
Tag-
SNP 
Positi
on 
All
ele 
Fr
eq 
b 
Log10P
value 
PVE 
(%)  
Popula
tion 
QTL-
cbsdrs2-1 
CBSDRS 2 
9.15
-
9.16 
S2_9258
334 
92583
34 
A/
C 
0.1
5 
0.
11 
6.843 8 UKG 
QTL-
cbsdrs3-1 
CBSDRS 3 
10.1
6 
S3_1016
5675 
10165
675 
A/
G 
0.0
5 
0.
20 
6.137 3 
AYTKI
B 
QTL-
cbsdrs8-1 
CBSDRS 8 1.74 
S8_1736
3721 
17363
721 
A/
G 
0.0
5 
0.
21 
6.35 32 
AYTKI
B 
QTL-
cbsdrs10-1 
CBSDRS 10 0.16 
S10_160
775 
16077
5 
G/
A 
0.0
8 
0.
16 
5.24 11 
CETU
KG 
QTL-cbsd9-
1 
MCBSDS 9 
10.7
1 
S9_1070
7044 
10707
044 
G/
A 
0.5 
0.
57 
6.457 9 
PYTKI
B 
QTL-
cbsd11-1 
MCBSDS 11 
22.8
8-
22.9
4 
S11_229
42418 
22942
418 
A/
G 
0.4
5 
0.
08 
6.01 5 UKG 
QTL-
cmds16-1 
MCMDS 16 0.42 
S16_421
670 
42167
0 
T/
C 
0.4
9 
1.
86 
9.844 87 
PYTU
KG 
QTL-
cmds12-1 
MCMDS 12 
6.3-
8.7 
S12_727
0219 
72702
19 
T/
C 
0.4
9 
0.
24 
11.246 
14 - 
37 
TARI 
QTL-
cbsd4/cmd-
MCBSDS|
4 
24.6 S4_2467 24670 T/ 0.2 0.
5.441 14 
K4_Clu
 This article is protected by copyright. All rights reserved. 
1 MCMDS 7 0203 203 G 4 14 ster1 
QTL-
cbsd12/cmd
-1 
MBSDS|M
CMDS 
12 0.93 
S12_929
320 
92932
0 
T/
G 
0.0
6 
0.
13 
5.941 37|8 
PYTKI
B 
QTL-
cbsd12/cmd
-2 
MBSDS|M
CMDS 
12 
7.93
-
7.95 
S12_792
9439 
79294
39 
G/
C 
0.3
7 
0.
23 
8.749 8|33 Several