The Crop Journal 12 (2024) 558–568

The Crop Journal

journal homepage: www.keaipublishing.com/en/journals/the-crop-journal/
Genome-wide association mapping and genomic prediction of stalk rot
in two mid-altitude tropical maize populations
https://doi.org/10.1016/j.cj.2024.02.004
2214-5141/© 2024 Crop Science Society of China and Institute of Crop Science, CAAS. Production and hosting by Elsevier B.V. on behalf of KeAi Communications Co., Ltd.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

⇑ Corresponding authors.
E-mail addresses: chunpingw@haust.edu.cn (C. Wang), d.thanda@cgiar.org (T. Dhliwayo).
Junqiao Song a,b,c, Angela Pacheco b, Amos Alakonya b, Andrea S. Cruz-Morales b, Carlos Muñoz-Zavala b,
Jingtao Qu d, Chunping Wang a,⇑, Xuecai Zhang b, Felix San Vicente b, Thanda Dhliwayo b,⇑
a College of Agronomy, Henan University of Science and Technology, Luoyang 471000, Henan, China
b International Maize and Wheat Improvement Center (CIMMYT), El Batan, Mexico
c Anyang Academy of Agricultural Sciences, Anyang 455000, Henan, China
d CIMMYT-China Specialty Maize Research Center, Crop Breeding, and Cultivation Research Institute, Shanghai Academy of Agricultural Sciences, Shanghai, China

Article info
Article history:
Received 3 September 2023
Revised 13 February 2024
Accepted 18 February 2024
Available online 11 March 2024

Keywords:
Maize stalk rot
Genome-wide association mapping
Haplotype analysis
Genomic prediction
G × E interaction

Abstract

Maize stalk rot reduces grain yield and quality. Information about the genetics of resistance to maize
stalk rot could help breeders design effective breeding strategies for the trait. Genomic prediction may
be a more effective breeding strategy for stalk-rot resistance than marker-assisted selection. We per-
formed a genome-wide association study (GWAS) and genomic prediction of resistance in testcross
hybrids of 677 inbred lines from the Tuxpeño and non-Tuxpeño heterotic pools grown in three environ-
ments and genotyped with 200,681 single-nucleotide polymorphisms (SNPs). Eighteen SNPs associated
with stalk rot shared genomic regions with gene families previously associated with plant biotic and abi-
otic responses. More favorable SNP haplotypes traced to tropical than to temperate progenitors of the
inbred lines. Incorporating genotype-by-environment (G × E) interaction increased genomic prediction
accuracy.

© 2024 Crop Science Society of China and Institute of Crop Science, CAAS. Production and hosting by
Elsevier B.V. on behalf of KeAi Communications Co., Ltd. This is an open access article under the
CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction

Stalk rot is one of the most destructive diseases of maize, caus-
ing yield losses ranging from 5% to 100% [1]. It is an endemic dis-
ease in Mexico, particularly in the central states, including
Guanajuato, which ranks among the five top maize-producing
states in Mexico [2]. The pathogens causing stalk rot are complex
and varied, making their identification difficult. Many mycotoxin-
producing fungi, including Fusarium verticillioides (Fv), Fusarium
graminearum, syn. Gibberella (Fg), Colletotrichum graminicola
(Anthracnose), Stenocarpella maydis (Diplodia), Acremonium strictum
(Cephalosporium), Macrophomina phaseolina, and Pythium aphani-
dermatum, alone or in combination may cause severe stalk rot
[1]. These pathogens block vascular bundles, resulting in prema-
ture plant death [3]. In Mexico, Fv and Fg are the most common
and have been reported in Guanajuato [4].

Although some management practices can reduce the incidence
of stalk rot, there is no effective chemical control for the disease
[1]. Breeding resistant cultivars is an objective of maize breeding
programs where the disease is endemic. However, stalk rot phenotyping
is laborious, and the trait tends to have high spatial variability
and genotype-by-environment (G × E) interaction. Identifying
genomic regions associated with stalk rot is essential to develop
DNA markers for marker-assisted selection or to inform genomic-
assisted breeding strategies for the trait.

Classical quantitative genetics and quantitative trait locus (QTL)
mapping studies of stalk rot [5–9] have shown that the trait is
quantitative and controlled by multiple loci with minor effects.
QTL with major effects on stalk rot have also been reported, includ-
ing qRfg1, qRfg2, and qRfg3 associated with resistance to Gibberella
stalk rot [10–13]. These QTL have been fine-mapped: qRfg1 was
mapped to a 500 kb contig on chromosome 10, qRfg2 to a 300 kb
contig on chromosome 1, and qRfg3 to a 350 kb contig on chromosome 3
[10–13]. A major QTL for anthracnose stalk rot on chromosome 4, Rcg1,
has been cloned [14]. Two major QTL, RpiX178-1
and RpiX178-2, associated with resistance to Pythium stalk rot
have been mapped to chromosomes 1 and 10 [15,16].

Genetic mapping studies of stalk rot resistance are needed to
elucidate the genetic architecture of the trait in tropical germplasm
and environments. Most maize stalk rot mapping studies have
been conducted in recombinant inbred line (RIL), backcross, and
F2-derived F3 populations using simple sequence repeat and
chip-based SNP markers [11,13,15]. Testcross populations have
been used to map other maize traits, including grain yield, but
not stalk rot, and we are not aware of any studies conducted using
SNP markers derived from genotyping-by-sequencing to map stalk
rot. Most stalk rot mapping studies have been conducted in tem-
perate germplasm, with few studies conducted in the tropics.
Genomic regions detected in temperate germplasm may not coincide
with those in tropical germplasm owing to G × E interaction,
genetic background (epistasis), pathogen diversity, and a possible
lack of intraspecific genetic collinearity, in which the order of genes
on a chromosome is not maintained or some genes are missing
in some individuals within a species [17].

An alternative to QTL mapping and marker-assisted selection is
genomic selection. Genomic selection (GS) assumes that each mar-
ker is associated with minor effects and uses all markers to calcu-
late a breeding value for each selection candidate [18]. In GS, a
training set, which is a set of genotyped and phenotyped individu-
als, is used to estimate the effects of the markers. The estimated
marker effects are then used to predict the genetic merit or geno-
mic estimated breeding values (GEBVs) of individuals in the pre-
diction set consisting of individuals that have been genotyped
but not phenotyped. The prediction accuracy, which is the correla-
tion between the GEBVs and true breeding values, is affected by
many factors, including training population size, G × E interaction,
heritability of the trait, and the genetic relationship between the
training and the prediction sets [19]. Although the genetic archi-
tecture of stalk rot [5–9] suggests that genomic selection could
be used to improve resistance in maize, its effectiveness and fac-
tors affecting prediction accuracy have not been investigated.

In this study, we used testcross data of 677 inbred lines evaluated
for stalk rot under natural infection in three tropical environments
to conduct a genome-wide association study (GWAS) and identify
genomic regions, and putative candidate genes within them, associated
with stalk rot. We used the same dataset to evaluate the effectiveness
of genomic selection using models with and without G × E interaction
and with varying training population size relative to the total
population. The specific objectives of this study were to i) identify
genomic regions and putative candidate genes associated with resistance
to stalk rot in tropical germplasm and environments, ii) identify
favorable haplotypes and their sources for each region, and iii) assess
the effectiveness of genomic prediction for stalk rot in two tropical
maize populations under selection in a hybrid breeding program.
2. Materials and methods

2.1. Plant materials

Of 677 inbred lines in the first year of testing in the CIMMYT
mid-altitude tropical breeding program, 381 were from the Tux-
peño and 296 from the non-Tuxpeño heterotic groups (Tables S1,
S2). CIMMYT’s heterotic groups and their alignment with known
maize germplasm groups have been described by Guo et al. [20].
The Tuxpeño subset was derived from 14 tropical and 15 temper-
ate inbreds with expired U.S. plant variety protection (exPVP). The
non-Tuxpeño subset comprised lines derived from 25 tropical and
18 temperate exPVP lines. More details about the background of
the lines are presented in Tables S1, S2, and S3.

Each population was crossed to one heterotic tester for trait
evaluation. The Tuxpeño lines were crossed to CSL1663, a line
derived from CML444, and the non-Tuxpeño lines were crossed
to CML312, an important line for the mid-altitude tropics in sub-
Saharan Africa and Mexico. The testcross hybrids were made at
CIMMYT’s Tlaltizapán research station in Morelos state, Mexico,
during the 2020 summer (May–November) season.

2.2. Field trials and experimental design

The 677 testcross hybrids and checks were subdivided into
seven trial sets of 84 to 108 entries. The testcross hybrids were
subdivided into smaller trials to facilitate field layout and control
spatial field variation. The experimental design for each trial was
an alpha lattice (0, 1) with two replications. The plot field row
and column coordinates were recorded. The seven trials were each
evaluated at three locations during summer 2021 in Guanajuato
state: Cortazar (20°26′00″N, 100°56′30″W; 1736 masl), Valle de
Santiago (20°23′34″N, 101°11′29″W; 1717 masl), and Juventino
Rosas (20°41′00″N, 100°59′00″W; 1847 masl). Each plot consisted
of two rows, each 4 m long, with 0.75 m between rows and
0.16 m between plants within a row for a density of about
93,000 plants ha⁻¹. During the 2021 summer season, stalk rot
symptoms were observed, and laboratory assays showed that the
predominant pathogens in these fields were Fv and Fg.

For each plot, the number of plants with stalk rot infection was
recorded about 8 weeks after anthesis, corresponding to physiological
maturity. A plant was classified as infected with stalk rot when it
was dead and failed either of two tests: the pinch test, in which the
stalk is crushed when pinched between the lowest two internodes; or
the push test, in which the plant does not snap back to a vertical
position after being pushed to an angle of 30° from vertical [21,22].
The number of infected plants was expressed as a percentage of the
total number of plants per plot. Grain yield (t ha⁻¹), percent grain
moisture, plant height (cm), number of days to anthesis, and test
weight (kg 100 L⁻¹) were also recorded.

2.3. Phenotypic data analysis

The Tuxpeño and non-Tuxpeño populations were analyzed separately.
For each population, two models were compared for analysis of
variance (ANOVA) of the phenotypic data: one based on the
alpha-lattice design and the other using the field row and column
coordinates to adjust for spatial variation. The row-column model
had a lower Bayesian information criterion (BIC) value [23] and
therefore fitted the data better than the alpha-lattice model. The
row-column model was fitted to the phenotypic data as follows:

$$Y_{ijklmn} = \mu + E_i + G_j + T_k + R_{l(ik)} + p_{m(ikl)} + u_{n(ikl)} + GE_{ij} + e_{ijklmn} \tag{1}$$

where $Y_{ijklmn}$ is the phenotype of the jth (j = 1, ..., J) genotype (inbred
line) tested in the ith (i = 1, ..., I) environment, kth (k = 1, ..., K) trial,
lth (l = 1, ..., L) replication nested in the ith environment and kth trial,
and mth (m = 1, ..., M) row and nth (n = 1, ..., N) column, both nested in
the lth replication of the kth trial and ith environment; $\mu$ is the overall
mean, $E_i$ is the effect of the ith environment, $G_j$ is the effect of the jth
genotype, $T_k$ is the effect of the kth trial, $R_{l(ik)}$ is the effect of the lth
replication nested in the ith environment and kth trial, $GE_{ij}$ is
the interaction between the ith environment and the jth genotype, and
$e_{ijklmn}$ is the residual. The effects $p_{m(ikl)}$ and $u_{n(ikl)}$ are for the
mth row and nth column, respectively, both nested in the ith environment,
kth trial, and lth replication. The model fitted to the data for each
environment can be obtained by dropping the environment effects from the
model described above. The best linear unbiased estimate (BLUE) for each
trait was calculated assuming genotypes as fixed and the other factors
as random. Variance components from the mixed models were
estimated by the restricted maximum likelihood (REML) method with
the lmer function of the lme4 package in R [24]. Heritability ($h^2$)
was then calculated as:


$$h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \dfrac{\sigma_{ge}^2}{e} + \dfrac{\sigma_e^2}{re}} \tag{2}$$

where $\sigma_g^2$ is the genetic variance, $\sigma_{ge}^2$ is the G × E interaction
variance, $\sigma_e^2$ is the residual variance, e is the number of environments,
and r is the number of replications within each environment. Correlation
coefficients among the six traits were calculated based on the
least-squares means across environments.
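
As a worked check of Eq. (2), the short Python sketch below recomputes two of the Tuxpeño heritability estimates from the variance components reported in Table 1, using r = 2 replications. The function name and the single-environment reduction (e = 1, no G × E term) are assumptions made for illustration; the analysis itself was run in R.

```python
def heritability(var_g, var_e, n_reps, var_ge=0.0, n_envs=1):
    """Entry-mean heritability following Eq. (2):
    var_g / (var_g + var_ge / e + var_e / (r * e))."""
    return var_g / (var_g + var_ge / n_envs + var_e / (n_reps * n_envs))

# Tuxpeño, combined across three environments (Table 1).
h2_combined = heritability(var_g=63.14, var_e=216.12, n_reps=2,
                           var_ge=73.93, n_envs=3)

# Tuxpeño at Cortazar only: assumed reduction with e = 1 and no G x E term.
h2_cortazar = heritability(var_g=252.35, var_e=447.56, n_reps=2)

print(round(h2_combined, 2), round(h2_cortazar, 2))  # 0.51 0.53
```

Both values agree with the h2 column of Table 1.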

2.4. Genotyping

The 677 inbred lines and their parents were grown in a green-
house at CIMMYT in Texcoco, state of Mexico, Mexico. Leaf tissue
was collected from 10 plants of each line, and genomic DNA was
extracted from bulked young leaves using the CTAB method [25].
All 737 lines were genotyped at the Biotechnology Center–DNA
Sequencing Facility, University of Wisconsin-Madison, Wisconsin,
USA, using genotyping by sequencing (GBS). The DNA was digested
with the ApeKI restriction enzyme and sequenced on an Illumina
NovaSeq 6000 instrument [26].

The GBS reads were anchored to the B73 reference genome
using the GBS 2.7 TOPM (tags on physical map) file retrieved from
Panzea (www.panzea.org), and the SNPs were called using the TAS-
SEL 5.0 [27] SNP calling pipeline. In total, 955,690 SNPs were
called, including 955,120 SNPs mapped to the 10 maize chromo-
somes and 570 SNPs that could not be mapped to a chromosome.

The SNP genotypes of the lines and progenitors were filtered
and imputed before the 677 lines were separated into the Tuxpeño and
non-Tuxpeño populations. Markers with > 50% missing data, minor
allele frequency < 0.05, or heterozygosity > 5% were filtered out
before imputation with the LD KNNi method [28] using the default
parameters in TASSEL. After filtering and imputation, 200,681
high-quality SNPs remained and were used for GWAS.
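
The three filtering criteria can be sketched in a few lines of Python. This is an illustration only: filtering was performed in TASSEL, and the 0/1/2 allele-count coding, the function names, and the toy genotypes below are hypothetical.

```python
def marker_stats(calls):
    """Return (missing rate, minor allele frequency, heterozygosity) for one SNP.

    calls: allele counts coded 0/1/2 per line, with None for a missing call.
    """
    observed = [c for c in calls if c is not None]
    missing_rate = 1.0 - len(observed) / len(calls)
    if not observed:
        return missing_rate, 0.0, 0.0
    alt_freq = sum(observed) / (2.0 * len(observed))
    maf = min(alt_freq, 1.0 - alt_freq)
    het_rate = sum(1 for c in observed if c == 1) / len(observed)
    return missing_rate, maf, het_rate

def keep_marker(calls, max_missing=0.50, min_maf=0.05, max_het=0.05):
    """Apply the thresholds stated in the text: a SNP is kept only if it passes all three."""
    missing_rate, maf, het_rate = marker_stats(calls)
    return missing_rate <= max_missing and maf >= min_maf and het_rate <= max_het

# Hypothetical genotypes for 20 lines at four SNPs.
snps = {
    "snp_ok": [0] * 12 + [2] * 7 + [None],       # passes all filters
    "snp_missing": [0] * 5 + [2] * 4 + [None] * 11,  # 55% missing
    "snp_rare": [0] * 20,                        # monomorphic, MAF = 0
    "snp_het": [0] * 8 + [1] * 4 + [2] * 8,      # 20% heterozygous
}
kept = [name for name, calls in snps.items() if keep_marker(calls)]
print(kept)  # ['snp_ok']
```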

2.5. GWAS and candidate genes

The genetic structure of Tuxpeño and non-Tuxpeño populations
and all 737 lines was estimated by principal component analysis
(PCA) in TASSEL. For each population, the first two principal com-
ponents were plotted using the ggplot2 package in R [29]. The first
three principal components were retained to account for population
structure. The n × n pairwise matrix of kinship coefficients
(K) and linkage disequilibrium (LD) decay were also computed
for each population using TASSEL.

GWAS was performed in TASSEL using a mixed linear model:

$$y = X\beta + Zu + e \tag{3}$$

where y is the vector of phenotypic observations (BLUEs); $\beta$ represents
unknown vectors of fixed effects, including the m × 1 vector of
SNP markers tested and the p × 1 vector of the first p principal components
explaining population structure; X and Z are design matrices
containing variables intended to explain the observed phenotypic
data; u is an unknown vector of additive genetic effects; and e is
a vector of random residuals. The variance of u was estimated as
$\mathrm{Var}(u) = K\sigma_a^2$, where K is the n × n pairwise matrix of kinship
among the inbreds and $\sigma_a^2$ is the additive genetic variance, with
$e \sim N(0, I\sigma^2)$. GWAS was performed in the Tuxpeño and
non-Tuxpeño populations separately using the 200,681 imputed
SNPs and the BLUEs for each environment and across environments.
The negative logarithm of the probability that an SNP and stalk rot
are associated by random chance, −log10(P), was plotted against
the chromosome position of each SNP to produce a Manhattan plot;
the observed −log10(P) was plotted against the expected −log10(P)
to produce a quantile–quantile (Q–Q) plot. Both the Manhattan and
Q–Q plots were produced using the CMplot package in R [30]. The
genome-wide statistical significance threshold was determined for
each population using the algorithm proposed by Li et al. [31] and
implemented in the Genetic type 1 error calculator (version 1.0)
tool. Thresholds of P < 1.23 × 10⁻⁵ in the Tuxpeño and P < 1.03 ×
10⁻⁵ in the non-Tuxpeño population were adopted to maintain a
genome-wide α = 0.05 and used to declare significant SNP–trait
associations. The distance in kb flanking a significant SNP over
which LD decays to r² < 0.2 was defined as a genomic region
associated with stalk rot resistance, and genes within this region were
considered candidate genes. Candidate genes were identified and
annotated on the MaizeGDB website (https://www.maizegdb.org)
using the B73 version 2 reference genome
(https://ensembl.gramene.org/Zea_mays).
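
The region definition can be illustrated with a minimal Python sketch that keeps SNPs below the significance threshold and merges those whose flanking windows overlap. The function name is hypothetical, and the 1,160 bp flank is back-calculated from the chromosome 10 region bounds in Table 2 rather than stated in the text, so it should be read as an assumption.

```python
def merge_regions(snps, flank_bp):
    """Group significant SNPs on one chromosome into genomic regions.

    snps: list of (position_bp, p_value) tuples.
    flank_bp: LD-decay distance (bp) added on each side of a SNP.
    SNPs whose flanking windows overlap are merged into a single region.
    """
    regions = []
    for pos, _p in sorted(snps):
        start, end = pos - flank_bp, pos + flank_bp
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions

threshold = 1.23e-5  # Tuxpeño genome-wide threshold from the text

# Chromosome 10 SNPs and P-values from Table 2, plus one hypothetical
# non-significant SNP to show the filtering step.
chr10 = [(147125575, 8.63e-6), (147125579, 8.63e-6),
         (147125588, 8.63e-6), (120000000, 3.0e-3)]
significant = [(pos, p) for pos, p in chr10 if p < threshold]
regions = merge_regions(significant, flank_bp=1160)
print(regions)  # [(147124415, 147126748)]
```

Under this assumption the three chromosome 10 SNPs collapse into the single region chr10:147124415–147126748 listed in Table 2.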

2.6. Haplotype analysis in genomic regions associated with stalk rot

LD blocks in genomic regions containing SNPs associated with
stalk rot were identified using LDBlockShow software [32] via
standardized disequilibrium coefficients (D′) [33]. The favorable
haplotype of each LD block was the sequence associated with the lowest
value of stalk rot in the individual and combined environments.
The relative frequency of each favorable haplotype was calculated
separately for the Tuxpeño and non-Tuxpeño populations and their
tropical and temperate progenitors. The effect of each favorable
haplotype in each population was calculated as the average pheno-
typic deviation of the individuals carrying the favorable haplotype
relative to the population mean.
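
The haplotype summary described above can be sketched as follows. The haplotype strings and stalk rot values are hypothetical, and the helper names are illustrative rather than part of any software used in the study.

```python
from statistics import mean

def favorable_haplotype(haplotypes, phenotypes):
    """Return the haplotype with the lowest mean stalk rot."""
    groups = {}
    for hap, y in zip(haplotypes, phenotypes):
        groups.setdefault(hap, []).append(y)
    return min(groups, key=lambda hap: mean(groups[hap]))

def haplotype_summary(haplotypes, phenotypes, favorable):
    """Frequency (%) of a haplotype and its effect, defined as the mean
    phenotypic deviation of its carriers from the population mean."""
    carriers = [y for hap, y in zip(haplotypes, phenotypes) if hap == favorable]
    frequency = 100.0 * len(carriers) / len(haplotypes)
    effect = mean(carriers) - mean(phenotypes)
    return frequency, effect

# Hypothetical two-SNP haplotypes and percent stalk rot for eight lines.
haps = ["GT", "GT", "AC", "AC", "AC", "GC", "GC", "AC"]
rot = [10.0, 12.0, 20.0, 22.0, 18.0, 16.0, 14.0, 24.0]

best = favorable_haplotype(haps, rot)
freq, effect = haplotype_summary(haps, rot, best)
print(best, freq, effect)  # GT 25.0 -6.0
```

A negative effect means that carriers show less stalk rot than the population average.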

2.7. Genomic prediction

Genomic prediction for stalk rot was conducted using models
described by Jarquin et al. [34] and Mageto et al. [35]. Phenotypic
data analysis to calculate the BLUEs was based on the baseline
model:

$$Y_{ij} = \mu + E_i + G_j + GE_{ij} + e_{ij} \tag{4}$$

where $Y_{ij}$ is the response of the jth (j = 1, ..., J) genotype tested in the
ith (i = 1, ..., I) environment, $\mu$ is the overall mean, $E_i$ is the random
environmental main effect [$E_i \overset{iid}{\sim} N(0, \sigma_E^2)$], $G_j$ is the random
genotype effect [$G_j \overset{iid}{\sim} N(0, \sigma_G^2)$], $GE_{ij}$ is the random interaction
between the jth genotype and the ith environment
[$GE_{ij} \overset{iid}{\sim} N(0, \sigma_{GE}^2)$], and $e_{ij}$ is the random residual
[$e_{ij} \overset{iid}{\sim} N(0, \sigma_e^2)$]. In this model, $E_i$, $G_j$, $GE_{ij}$,
and $e_{ij}$ are assumed to be normally distributed, N(·,·), and independent
and identically distributed (iid); $\sigma_E^2$, $\sigma_G^2$, $\sigma_{GE}^2$, and
$\sigma_e^2$ are the variances for environment, genotype, G × E interaction,
and residual error, respectively. This baseline model does not
exploit the covariance among genotypes because the genotypes
were treated as independent outcomes. Models used for genomic
prediction were derived from the baseline model above by exclud-
ing terms, modifying assumptions, or incorporating marker infor-
mation. Below is a brief description of the genomic models:

Model 1 (Environment + Line) is obtained by retaining the first
three components from the baseline model (l, Ei, and Gj) while
their underlying assumptions remain unchanged. Model 1 uses
only the observed phenotypic values to predict the phenotypic val-
ues of genetically related lines.

Model 2 (Environment + Line + Marker) is derived from an alter-
native representation of the random main effect of line (Gj) in the
baseline model as a linear combination of markers and their corre-
sponding effects:

$$Y_{ij} = \mu + E_i + G_j + q_j + e_{ij} \tag{5}$$

where $q_j = \sum_{m=1}^{p} x_{jm}\beta_m$; $\beta_m \overset{iid}{\sim} N(0, \sigma_\beta^2)$ represents the
random effect of the mth (m = 1, ..., p) marker, $x_{jm}$ is the genotype of the
jth line at the mth marker, and $\sigma_\beta^2$ is its corresponding variance. Thus,
$q = (q_1, \cdots, q_J)$ represents the vector of marker genetic effects and
is normally distributed with mean zero and covariance matrix
$\mathrm{Cov}(q) = G\sigma_q^2$, where $G = XX'/p$ is the genomic relationship matrix,
with X representing the centered and standardized marker matrix
such that $\sigma_\beta^2 = \sigma_q^2$ [35]. The line effect was retained in the model
to account for imperfect information and model misspecification
because of potential imperfect linkage disequilibrium between
markers [36].
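
For illustration, the genomic relationship matrix G = XX′/p can be computed from a small marker matrix as sketched below. The genotypes are hypothetical, the function name is illustrative, and dropping monomorphic markers before standardization is an assumption added to avoid division by zero.

```python
from statistics import mean, pstdev

def genomic_relationship(markers):
    """Compute G = XX'/p for an n-lines x p-SNPs matrix of allele counts.

    Each SNP column is centered and scaled to unit variance; monomorphic
    SNPs are dropped (assumption) because they cannot be standardized.
    """
    columns = [col for col in zip(*markers) if pstdev(col) > 0]
    p = len(columns)
    standardized = [[(v - mean(col)) / pstdev(col) for v in col] for col in columns]
    rows = list(zip(*standardized))  # back to one row per line
    return [[sum(a * b for a, b in zip(ri, rj)) / p for rj in rows] for ri in rows]

# Hypothetical genotypes: 4 inbred lines x 5 SNPs, coded 0/2.
M = [[0, 2, 0, 2, 0],
     [0, 2, 0, 2, 2],
     [2, 0, 2, 0, 2],
     [2, 0, 2, 0, 0]]
G = genomic_relationship(M)
for row in G:
    print([round(v, 2) for v in row])
```

Lines 1 and 2 differ at a single SNP and therefore have a high relationship coefficient, whereas lines 1 and 3 carry opposite alleles at every SNP and have a coefficient of −1.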

Model 3 (Environment + Line + Marker + Marker × Environment)
extends the Genomic Best Linear Unbiased Predictor (GBLUP)
random effect model by modeling the main effects of lines (genotypes),
markers, environments, and their interactions using covariance
structures that are functions of marker genotypes and
environments [36]. The model can be expressed as

$$Y_{ij} = \mu + E_i + G_j + q_j + qE_{ij} + e_{ij} \tag{6}$$

where $qE_{ij}$ is the random interaction between the genetic value of
the jth marker genotype and the ith environment.
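
One common way to build a covariance structure for the marker × environment term, assumed here for illustration and not taken from the text, is the element-wise (Hadamard) product of the genomic relationship among lines and an indicator of shared environment. The sketch below uses a hypothetical 2 × 2 relationship matrix and hypothetical records.

```python
def interaction_kernel(G, records):
    """Covariance structure for a marker x environment term (assumed form).

    G: genomic relationship matrix among lines.
    records: list of (line_index, env_index), one per phenotypic record.
    Entry (a, b) equals G[line_a][line_b] when both records come from the
    same environment and 0 otherwise.
    """
    return [[G[la][lb] if ea == eb else 0.0 for (lb, eb) in records]
            for (la, ea) in records]

# Hypothetical relationship matrix for two lines, each tested in two environments.
G = [[1.0, 0.4],
     [0.4, 1.0]]
records = [(0, 0), (1, 0), (0, 1), (1, 1)]
K_ge = interaction_kernel(G, records)
for row in K_ge:
    print(row)
```

The result is block-diagonal: records borrow information from related lines only within the same environment.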
The three models were fitted using the Bayesian generalized
linear regression (BGLR) R package [37,38]. Because the BGLR
software cannot handle heterogeneous error variances, all models were
fitted assuming homogeneous error variances across environments.
First, each model was fitted to the entire dataset for each
population to estimate variance components and assess model fit based
on the deviance information criterion (DIC) [39]. Next, genomic
prediction was performed using the Gaussian model (Bayesian ridge
regression), assuming Gaussian priors for the marker effects with
default parameters in BGLR.

Two random cross-validation schemes were used. The first
scheme (CV1) used an independent but related training dataset
to evaluate the prediction accuracies of models when the testing
set has not been evaluated in any environment. Thus, CV1 tests
the ability of the model to predict the breeding values of new lines
that were not used to train the model. The second scheme (CV2)
mimics unbalanced field trials (e.g., sparse testing designs) and
aims to predict the breeding values of genotypes that were not
tested in one or more environments. Thus, the goal of CV2 is to test
the ability of the model to predict the breeding values of individuals
in environments in which they have not yet been tested. In CV2, prediction
accuracies can be improved by exploiting the covariance
among lines within an environment, lines across environments,
and correlated environments.

For both CV1 and CV2, a fivefold cross-validation was per-
formed to assess the prediction accuracy for stalk rot within each
population. The inbred lines and their corresponding SNP and phenotypic
data were randomly divided into five subsets, with four subsets (80%)
used for training and one (20%) for validation. Rotating the validation
subset gave five training and validation sets. The
procedure was repeated 20 times for each population, and the
mean correlation between the observed stalk rot breeding values
and the genomic estimated breeding values (GEBVs) was defined
as the prediction accuracy (rMP).
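
The difference between the two schemes lies in what is held out, which the Python sketch below makes explicit. It is a simplified illustration with hypothetical line names: the function is not the authors' implementation, and assigning CV2 folds by random record does not guarantee that every held-out line is observed in another environment.

```python
import random

def cv_folds(lines, envs, scheme, k=5, seed=1):
    """Assign each (line, environment) record to one of k folds.

    CV1: folds are drawn by line, so a held-out line is unobserved in
         every environment.
    CV2: folds are drawn by record, so a held-out record's line is usually
         still observed in other environments.
    """
    rng = random.Random(seed)
    records = [(line, env) for line in lines for env in envs]
    units = list(lines) if scheme == "CV1" else list(records)
    rng.shuffle(units)
    fold_of_unit = {unit: i % k for i, unit in enumerate(units)}
    if scheme == "CV1":
        return {(line, env): fold_of_unit[line] for line, env in records}
    return {rec: fold_of_unit[rec] for rec in records}

lines = [f"line_{i}" for i in range(10)]  # hypothetical line names
envs = ["Cortazar", "Valle de Santiago", "Juventino Rosas"]
cv1 = cv_folds(lines, envs, "CV1")
cv2 = cv_folds(lines, envs, "CV2")

# In CV1 every line falls in exactly one fold across all environments.
print(all(len({cv1[(line, env)] for env in envs}) == 1 for line in lines))
```

In each round, records in one fold are masked and predicted from the remaining four, and the correlation between predicted and observed values is averaged over folds and replicates.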

The effect of the size of the training set relative to the validation
set on the prediction accuracy was assessed using a series of ran-
dom selections of 70%, 60%, 50%, 40%, 30%, and 20% of the lines
as the training set to predict the breeding values of the rest of
the lines using Model 3 and CV2. The effect of the genetic relationship
between the training and the prediction sets on prediction
accuracy was assessed using either population as the training set
to predict the values of the other with Model 3 and CV2. In all
cases, predictions were based on 30,000 samples from the poste-
rior distribution and a burn-in of 15,000 samples.
3. Results

3.1. Phenotypic characteristics of stalk rot

The non-Tuxpeño population generally had a higher mean for
stalk rot in the individual and combined environments than the
Tuxpeño population. The mean stalk rot across environments for
non-Tuxpeño was about 10 percentage points higher than the Tuxpeño mean (25% vs. 16%). The
means of both populations were highest at Cortazar and lowest
at Juventino Rosas (Table 1).

Heritability estimates for stalk rot were high (> 0.50) for both
populations at Cortazar and Valle de Santiago, and low (< 0.25)
at Juventino Rosas (Table 1). For non-Tuxpeño, the heritability
was ≈ 0.01 at Juventino Rosas, indicating a lack of genetic variance
for stalk rot at this location. Consequently, Juventino Rosas was
excluded from further analyses for the non-Tuxpeño population,
leaving a heritability estimate of 0.76 (Table 1).

3.2. Correlation between stalk rot and other agronomic traits

The Tuxpeño and non-Tuxpeño populations differed for grain
yield, grain moisture, number of days to anthesis, and test weight
but not for plant height (Table S4). Compared with the non-
Tuxpeño population, the Tuxpeño population had higher grain
yield, lower grain moisture, and more days to anthesis (Table S4).
The pair-wise correlation coefficients among the six traits in Tux-
peño and non-Tuxpeño showed a moderate to strong negative cor-
relation between stalk rot and grain yield, grain moisture, and
plant height in both populations (Table S5). The correlation with
stalk rot in the Tuxpeño population ranged from r = −0.24
(P < 0.01) for plant height to r = −0.64 (P < 0.01) for grain moisture.
Grain moisture and grain yield were strongly and negatively
correlated (r < −0.51; P < 0.01) with stalk rot.

3.3. SNP summary statistics and population structure

The full dataset of 200,681 SNPs for the 737 lines had a missing
rate of 0.23%, an average heterozygosity of 1.44%, and an average
minor allele frequency (MAF) of 0.23 (Table S6). These summary
statistics were similar for the Tuxpeño and non-Tuxpeño popula-
tions, excluding the progenitors.

The first two principal components (PCs) explained respectively
23.8%, 17.2%, and 14.7% of the total SNP marker variance for Tux-
peño, non-Tuxpeño, and the complete set of 737 lines. The first
two PCs subdivided the populations into respectively five, three,
and four clusters associated with germplasm subgroups within
Tuxpeño, non-Tuxpeño, and the entire dataset (Fig. 1A–C). The first
two PCs also clearly separated the Tuxpeño and non-Tuxpeño het-
erotic groups among the 737 genotyped lines (Fig. 1C).

The average LD decay distance at r² = 0.2 across the ten chromosomes
was 1.27 kb for Tuxpeño, 0.94 kb for non-Tuxpeño, and
0.92 kb for the combined population (Fig. 1D–F).

3.4. Genomic regions associated with stalk rot resistance

Eighteen SNPs were associated with stalk rot in the two populations
at P values ranging from 1.15 × 10⁻⁵ to 1.09 × 10⁻⁶
(Fig. 2A, B; Table 2). Eight of the 18 SNPs were detected in the
Tuxpeño population on chromosomes 1, 3, 4, and 10, and 10 SNPs in
non-Tuxpeño on chromosomes 5, 6, and 7. The phenotypic vari-
ance explained by each SNP ranged from 6.20% to 9.09%, suggesting
that the trait was controlled by many loci with minor effects. The
18 SNPs were consistently detected across environments
(Table S7).

Table 1
Descriptive statistics (range and mean), genetic variances (VG), genotype-environment interaction variances (VGE), error variances (Ve), and heritability estimates (h2) for percent stalk rot in the Tuxpeño and non-Tuxpeño populations across three environments.

| Population | Environment | Range | Mean | VG | Ve | VGE | h2 |
|---|---|---|---|---|---|---|---|
| Tuxpeño | Cortazar | 7.66–91.99 | 31 ± 1.2 | 252.35 | 447.56 | – | 0.53 |
| Tuxpeño | Valle de Santiago | 5.21–87.36 | 14 ± 0.9 | 171.51 | 201.46 | – | 0.63 |
| Tuxpeño | Juventino Rosas | 1.27–19.02 | 3 ± 0.2 | 1.88 | 13.33 | – | 0.22 |
| Tuxpeño | Combined | 5.27–55.51 | 16 ± 0.6 | 63.14 | 216.12 | 73.93 | 0.51 |
| non-Tuxpeño | Cortazar | 3.04–105.1 | 44 ± 1.7 | 587.92 | 378.94 | – | 0.76 |
| non-Tuxpeño | Valle de Santiago | 2.42–97.23 | 28 ± 1.5 | 231.62 | 173.42 | – | 0.73 |
| non-Tuxpeño | Juventino Rosas | 3.08–16.37 | 3 ± 0.2 | 0.04 | 15.92 | – | 0.005 |
| non-Tuxpeño | Combined | 4.36–64.11 | 25 ± 0.9 | 938.48 | 558.61 | 313.42 | 0.76 |

Fig. 1. Principal component (PC) plots for Tuxpeño (A), non-Tuxpeño (B), and the 737 inbred lines, including progenitors (C); linkage disequilibrium (r²) decay in Tuxpeño (D), non-Tuxpeño (E), and the 677 inbred lines, excluding the progenitors (F).

Based on the LD decay distance, the eight SNPs could be
grouped into six genomic regions in Tuxpeño on chromosome 1
at positions 187 Mb and 191 Mb, on chromosome 3 at 215 Mb,
chromosome 4 at 168 Mb and 190 Mb, and chromosome 10 at
147 Mb. In contrast, four genomic regions were identified in non-
Tuxpeño on chromosome 5 at 22 Mb and 33 Mb, chromosome 6
at 136 Mb, and chromosome 7 at 8 Mb (Table 2). These genomic
regions also contained annotated genes with known predicted
functions (Table S8). Genomic regions detected in Tuxpeño did
not overlap with those in non-Tuxpeño.

Eight favorable haplotypes were detected in the 10 genomic
regions: five in the Tuxpeño and three in the non-Tuxpeño
population (Fig. S1). These haplotypes contained 13 SNPs associated
with stalk rot resistance (Table S9). The frequencies of the favorable
haplotypes ranged from 3.37% to 39.27% in Tuxpeño and from 0.36% to
73.30% in non-Tuxpeño (Table 3). A search for the haplotypes
among the tropical and temperate progenitors found all eight
haplotypes in tropical progenitors but only five in
temperate progenitors (Table 3).

The top 13 lines with the lowest percent stalk rot in both Tux-
peño and non-Tuxpeño populations contained one to three favor-
able haplotypes (Table S10). The presence of more favorable
haplotypes in the most resistant lines suggests that the results of
the haplotype analysis were consistent with the performance of
the hybrids for stalk rot.

3.5. Prediction accuracy in different models and cross-validation schemes

Model 3 showed the lowest DIC values for both Tuxpeño and
non-Tuxpeño populations, indicating that it provided the best fit
(Table S11). Among the three models, Model 1 showed the lowest
prediction accuracy for CV1 (rMP ≈ 0) and a moderate to high
prediction accuracy for CV2 in both the Tuxpeño (rMP = 0.31) and
non-Tuxpeño (rMP = 0.47) populations (Fig. 3A–D). These results show
that the stalk rot phenotypic values of a set of lines are a poor
predictor of the stalk rot values of a genetically related set of lines. In
contrast, Model 1 CV2 shows that prediction accuracies can be
improved by taking advantage of the covariance structure among
environments.

Fig. 2. Manhattan and quantile–quantile (Q–Q) plots for GWAS of the Tuxpeño and non-Tuxpeño populations: Manhattan plots showing significant SNPs and associated genes for the Tuxpeño (A) and non-Tuxpeño (B) populations; Q–Q plots for the Tuxpeño (C) and non-Tuxpeño (D) populations.

Table 2
SNPs associated with stalk rot in the Tuxpeño and non-Tuxpeño populations based on GWAS.

| Population | SNP | Allele | P-value | PVE (%) a | Bin b | Genomic region |
|---|---|---|---|---|---|---|
| Tuxpeño | S1_187642387 | G/A | 1.09E-06 | 7.53 | 1.06 | chr1:187640937–187643837 |
| Tuxpeño | S1_191581422 | G/A | 9.56E-06 | 6.31 | 1.06 | chr1:191579972–191582872 |
| Tuxpeño | S3_215475106 | A/T | 7.34E-06 | 6.45 | 3.08 | chr3:215474116–215476096 |
| Tuxpeño | S4_168991386 | C/A | 5.32E-06 | 6.64 | 4.06 | chr4:168989856–168992916 |
| Tuxpeño | S4_190444220 | G/C | 1.15E-05 | 6.2 | 4.08 | chr4:190442690–190445750 |
| Tuxpeño | S10_147125575 | A/C | 8.63E-06 | 6.36 | 10.07 | chr10:147124415–147126748 |
| Tuxpeño | S10_147125579 | G/T | 8.63E-06 | 6.36 | 10.07 | |
| Tuxpeño | S10_147125588 | G/C | 8.63E-06 | 6.36 | 10.07 | |
| non-Tuxpeño | S5_22556254 | G/T | 3.81E-06 | 7.51 | 5.03 | chr5:22555334–22557174 |
| non-Tuxpeño | S5_33821881 | A/T | 8.10E-06 | 8.21 | 5.03 | chr5:33820961–33822804 |
| non-Tuxpeño | S5_33821884 | C/T | 8.10E-06 | 8.21 | 5.03 | |
| non-Tuxpeño | S6_136411596 | C/G | 6.75E-06 | 8.34 | 6.05 | chr6:136410666–136412541 |
| non-Tuxpeño | S6_136411602 | C/G | 6.75E-06 | 8.34 | 6.05 | |
| non-Tuxpeño | S6_136411611 | C/T | 6.05E-06 | 8.42 | 6.05 | |
| non-Tuxpeño | S7_8047244 | G/C | 4.72E-06 | 8.63 | 7.01 | chr7:8046354–8048176 |
| non-Tuxpeño | S7_8047279 | G/A | 4.72E-06 | 8.63 | 7.01 | |
| non-Tuxpeño | S7_8047282 | G/A | 4.72E-06 | 8.63 | 7.01 | |
| non-Tuxpeño | S7_8047286 | A/G | 2.54E-06 | 9.09 | 7.01 | |

a Percentage of the phenotypic variance explained by each QTL.
b Chromosome bin.

With Model 2, the prediction accuracy for CV1 relative to Model
1 increased from 0.0 in both Tuxpeño and non-Tuxpeño to 0.31 in
Tuxpeño and 0.60 in non-Tuxpeño (Fig. 3A, C). The prediction
accuracy also increased from 0.31 to 0.38 in Tuxpeño and from
0.47 to 0.62 in non-Tuxpeño for CV2 (Fig. 3B, D), indicating the
advantage of including marker effects for both cross-validation
schemes.

Table 3
The genetic location, frequency, effect, and source populations of seven favorable haplotypes associated with stalk rot resistance in the Tuxpeño and non-Tuxpeño populations.

LD block           Favorable haplotype  Effect (%)            Frequency (%)
Physical position  ID    Sequence       Tuxpeño  non-Tuxpeño  Tuxpeño  non-Tuxpeño  Temperate parents  Tropical parents

Tuxpeño
Chr.1, ~187.6 Mb   H1-1  GTT            −2.63    −13.15       13.55    18.28        27.60              25.80
Chr.1, ~191.5 Mb   H1-2  GT             −5.78    −4.54        10.3     3.58         0                  9.70
Chr.3, ~215.4 Mb   H3    TTGCGTCAT      −2.03    −3.76        34.9     51.05        69.00              19.40
Chr.4, ~168.8 Mb   H4-1  CGGGATGGC      −1.09    −6.11        3.37     12.21        0                  3.20
Chr.4, ~190.4 Mb   H4-2  GT             −5.8     −2.58        39.27    73.3         50.00              50.00
non-Tuxpeño
Chr.5, ~22.5 Mb    H5    GTCCCCT        −1.04    −20.13       3.86     8.33         10.30              3.20
Chr.6, ~136.4 Mb   H6    TTGGCCC        −10.9    −15.86       9.26     0.36         0                  3.20
Chr.7, ~8.0 Mb     H7    GGGGTATC       −5.52    −11.93       14.88    31.56        17.24              6.45

Fig. 3. Genomic prediction accuracies for stalk rot in the Tuxpeño and non-Tuxpeño populations when training the model on 80% of the population to predict the remaining
20%. (A–D) Results for the baseline model based on phenotypic data alone (M1: Environment + Line), the model incorporating genomic effects without G × E (M2:
Environment + Line + Genomic), and the model incorporating genomic effects and G × E (M3: Environment + Line + Genomic + Genomic × Environment). The genomic
prediction was conducted using two cross-validation schemes: CV1, equivalent to using an existing independent but related data set to predict breeding values of newly
developed lines, and CV2, equivalent to predicting breeding values in unbalanced multi-environment trials. (E, F) Genomic prediction accuracies for Model 3 and CV2 in
non-Tuxpeño using the stalk rot data from positively correlated environments only (E) and the phenotypic data from all three environments, including a negatively correlated
environment (F). (G) The prediction accuracies for Model 3 and CV2 of each individual environment, including the uncorrelated environment (Juventino Rosas), in the
non-Tuxpeño population.

The G × E model (Model 3), which includes the interaction
between markers and the environment, gave higher prediction
accuracy than the genotype (line) main effects models (Model 1
and Model 2). The mean prediction accuracy for Model 3 increased
for CV1 relative to Model 2 from 0.31 to 0.35 in Tuxpeño and from
0.60 to 0.70 in the non-Tuxpeño (Fig. 3A, C). A similar trend was
observed for CV2, especially in non-Tuxpeño, where prediction
accuracy increased from 0.62 in Model 2 to 0.77 in Model 3
(Fig. 3B, D).
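
The three models differ only in which covariance terms they include (Fig. 3). The following is a minimal sketch of how such terms could be assembled, assuming a reaction-norm style formulation in the spirit of Jarquín et al. [34], in which the interaction covariance is the element-wise product of the genomic and environment kernels. The function names, the term labels, and the marker scaling are illustrative assumptions and are not taken from the analysis scripts of this study.

```python
import numpy as np

# Terms per model, following the Fig. 3 caption (labels are illustrative).
MODEL_TERMS = {
    "M1": ["environment", "line"],
    "M2": ["environment", "line", "genomic"],
    "M3": ["environment", "line", "genomic", "genomic_x_environment"],
}


def genomic_relationship(markers):
    """Return G = XX'/p from centred, scaled markers (lines x SNPs)."""
    x = markers - markers.mean(axis=0)
    sd = x.std(axis=0)
    keep = sd > 0                       # drop monomorphic markers
    x = x[:, keep] / sd[keep]
    return x @ x.T / keep.sum()


def reaction_norm_kernels(markers, line_idx, env_idx):
    """Build one covariance kernel per model term for a set of records.

    line_idx and env_idx are integer arrays giving the line and the
    environment of each record (one record = one line in one environment).
    """
    n_lines = markers.shape[0]
    n_envs = int(np.max(env_idx)) + 1
    z_line = np.eye(n_lines)[line_idx]  # records x lines incidence matrix
    z_env = np.eye(n_envs)[env_idx]     # records x environments incidence matrix
    k_env = z_env @ z_env.T
    k_line = z_line @ z_line.T
    k_gen = z_line @ genomic_relationship(markers) @ z_line.T
    k_gxe = k_gen * k_env               # Hadamard (element-wise) product
    return {
        "environment": k_env,
        "line": k_line,
        "genomic": k_gen,
        "genomic_x_environment": k_gxe,
    }
```

Under this construction the interaction kernel is zero between records from different environments, so the G × E term can only capture genomic covariance that is specific to each environment.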

The prediction accuracy of all models was affected by the
correlation among environments. In the non-Tuxpeño population,
the correlation between Cortazar and Valle de Santiago stalk rot
means was significant and positive, whereas Juventino Rosas was
not correlated with either Cortazar or Valle de Santiago (Fig. S2).
Excluding Juventino Rosas from Model 3 increased prediction
accuracy from 0.50 to 0.77. A similar trend was observed for Model 1
and Model 2 (Fig. 3E, F). Moreover, prediction accuracies within
each environment were ≈ 0.0 or negative for all three models for
Juventino Rosas, indicating a lack of genetic variance for stalk rot
at this location (Fig. 3G).
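
A simple way to screen environments before fitting a multi-environment model is to inspect the pairwise correlations of line means. The sketch below is illustrative only: the function names and the 0.2 threshold are hypothetical choices, not values used in this study, and missing line-by-environment means are assumed to be coded as NaN.

```python
import numpy as np


def environment_correlations(means):
    """Pairwise Pearson correlations among environments.

    means: 2-D array, rows are lines and columns are environments;
    NaN marks a line that was not tested in an environment.
    """
    n_env = means.shape[1]
    corr = np.eye(n_env)
    for i in range(n_env):
        for j in range(i + 1, n_env):
            # use only lines observed in both environments
            ok = ~np.isnan(means[:, i]) & ~np.isnan(means[:, j])
            r = np.corrcoef(means[ok, i], means[ok, j])[0, 1]
            corr[i, j] = corr[j, i] = r
    return corr


def weakly_correlated_environments(corr, names, threshold=0.2):
    """Names of environments whose mean correlation with the others is below threshold."""
    n = len(names)
    mean_r = (corr.sum(axis=1) - 1.0) / (n - 1)  # exclude the diagonal
    return [name for name, r in zip(names, mean_r) if r < threshold]
```

Environments flagged in this way can then be fitted with and without exclusion to compare prediction accuracies, as was done here for Juventino Rosas.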

3.6. Prediction with varying training population sizes and between
populations

Prediction accuracies tended to increase with the size of the
training set. The prediction accuracy in Tuxpeño increased from
0.29 when 20% of the population was used to predict the remaining
80% to 0.38 when 80% of the population was used to predict the
remaining 20% (Fig. 4A). A similar trend was observed in the
non-Tuxpeño population, where the prediction accuracy increased
from 0.62 when 20% was used to predict 80% to 0.77 when 80% of
the population was used to predict the remaining 20% (Fig. 4B).
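
The training-size experiment can be illustrated with a simplified stand-in. The sketch below uses single-environment ridge regression on marker data rather than Model 3 with CV2, so it does not reproduce the analysis reported here; the function names, the default shrinkage parameter (set to the number of markers), and the number of replicates are all assumptions made for illustration.

```python
import numpy as np


def ridge_predict(x_train, y_train, x_test, lam):
    """Ridge (RR-BLUP-like) prediction solved in its dual form."""
    col_means = x_train.mean(axis=0)
    xc_train = x_train - col_means       # centre markers on the training set
    xc_test = x_test - col_means
    mu = y_train.mean()
    k = xc_train @ xc_train.T
    alpha = np.linalg.solve(k + lam * np.eye(len(y_train)), y_train - mu)
    return mu + xc_test @ xc_train.T @ alpha


def accuracy_by_training_fraction(x, y, fractions, n_reps=20, lam=None, seed=0):
    """Mean correlation between predicted and observed values per training fraction."""
    rng = np.random.default_rng(seed)
    lam = float(x.shape[1]) if lam is None else lam  # arbitrary default shrinkage
    result = {}
    for frac in fractions:
        cors = []
        for _ in range(n_reps):
            perm = rng.permutation(len(y))
            n_train = int(round(frac * len(y)))
            train, test = perm[:n_train], perm[n_train:]
            pred = ridge_predict(x[train], y[train], x[test], lam)
            cors.append(np.corrcoef(pred, y[test])[0, 1])
        result[frac] = float(np.mean(cors))
    return result
```

Calling `accuracy_by_training_fraction(x, y, [0.2, 0.4, 0.6, 0.8])` on a marker matrix `x` (lines × SNPs) and a phenotype vector `y` returns one mean accuracy per training fraction, which can be plotted in the same way as Fig. 4A, B.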

Prediction accuracy was low when either population was used
to train the model to predict stalk rot in the other. When Tuxpeño
was used as the training population, prediction accuracies in non-
Tuxpeño were 0.00, 0.09, and 0.17 for Model 1, Model 2, and Model
3, respectively (Fig. 4C). The prediction accuracies were equally
low when non-Tuxpeño was used to train the model to predict
stalk rot in Tuxpeño (Fig. 4C).

Fig. 4. Genomic prediction accuracies for stalk rot resistance in the Tuxpeño and non-Tuxpeño populations when the relative size of the training set was varied from 20% to
80% to predict the rest of the population using Model 3 and CV2 in the Tuxpeño (A) and non-Tuxpeño (B) populations. Prediction accuracy when training on the non-Tuxpeño
population to predict the Tuxpeño population and vice versa (C).

4. Discussion

4.1. Linkage disequilibrium decay and genome-wide association
mapping

The LD estimates from this study were higher than those
reported in other studies that used inbred lines selected to capture
most of the genetic diversity in a germplasm pool. The Tuxpeño
and non-Tuxpeño populations were under selection for combining
ability for grain yield and were relatively closed, resulting in high LD.
The large LD decay distance implies that the resolution of genomic
regions detected in this study was lower than that of regions detected
in other studies using populations with shorter LD decay distances.

The lack of overlap in the genomic regions detected in the two
populations suggests that the two heterotic pools were genetically
heterogeneous for stalk rot resistance. Although the tester effect could
explain some of the heterogeneity, it is also possible that the populations
indeed had unique genomic regions associated with stalk rot. This
genetic heterogeneity could result from the genetic divergence
expected from hybrid breeding [40] or other population genetic
forces such as drift. Whatever the cause, this genetic divergence
could facilitate hybrid breeding for stalk rot resistance by enabling
independent selection for a few regions in each heterotic group
and maximizing resistance in hybrids through complementary
dominance and additive effects of the favorable alleles from the
parents.

The regions detected in this study did not overlap with regions
detected for other stalk rot pathogens in previous studies. None of
the ten regions overlapped with known QTL for Gibberella stalk rot,
including qRfg1, qRfg2, and qRfg3 [11–13]. Likewise, Ma et al. [14]
reported a major QTL for Anthracnose stalk rot in bin 4.07, but
the regions detected on chromosome 4 in this study were at posi-
tion 168.99 Mb in bin 4.06 and 190.44 Mb in bin 4.08 in Tuxpeño.
In another study, two genes associated with Pythium stalk rot, one
on chromosome 1 (bin 1.03) and the other on chromosome 10 (bin
10.02), were reported by Song et al. [15]; however, the regions
detected in this study were in bins 1.06 and 10.07. In yet another
study of Pythium stalk rot, Duan et al. [16] reported two genes,
one on chromosome 1 (bin 1.09) and the other on chromosome 4 (bin 4.08). The
genomic position of the gene on chromosome 4, RpiX178-2, was
about 5 Mb from the region detected in bin 4.08 in the Tuxpeño
population in this study (Table 2). In a study conducted using a
tropical maize population in India, Rashid et al. [7] reported five
SNP markers significantly associated with stalk rot caused by Fv
on chromosomes 1, 2, and 6. None of the five SNPs were in regions
detected in this study, even though the soil and plant residue anal-
yses indicated that Fv was one of the two main pathogens present
in the fields used for this study. Differences in inoculation methods
could explain some of the lack of congruence between the regions
detected in this study and those of Rashid et al. [7]. Natural disease
inoculation was used in this study, whereas Rashid et al. [7] relied
on artificial inoculation.

The lines that made up the Tuxpeño and non-Tuxpeño popula-
tions were derived from tropical and temperate germplasm
(Fig. 1C). This genetic background allowed us to track the favorable
haplotypes for stalk rot in the tropical and temperate progenitors.
The haplotype tracking results suggested more favorable alleles for
stalk rot in tropical than in temperate germplasm (Table 3). While
more genetic diversity for disease resistance is expected in tropical
maize than in temperate maize [41], the difference in the number
of favorable haplotypes could be due to sampling, drift, or other
genetic forces. The finding that no single line among the tropical
progenitors carried all seven favorable haplotypes indicates the
potential to improve stalk rot resistance by pyramiding several
favorable haplotypes through breeding.

Although the effects of the QTL detected in this study were
small, marker-assisted selection may be an effective breeding
strategy in populations segregating for a few major QTL. Such a
strategy works well with well-validated QTL or genes with large
effects in the target population [15,16]. Most genes identified in
the 10 regions are involved in plant growth and development
(Table S8). However, some genes, including GRMZM2G122025 (a
stress response NST1-like protein) [42], GRMZM2G046021 (a histone
acetyltransferase GNAT/MYST) [43], and GRMZM5G860810 (a
leucine-rich repeat (LRR) protein kinase family protein), have been
associated with biotic and abiotic stress responses in plants
[44,45].

4.2. Factors affecting genomic prediction accuracy

The moderate to high prediction accuracies observed in this
study indicate that genomic selection is an effective strategy for
identifying superior genotypes for stalk rot. The prediction accura-
cies were slightly higher than those of previous studies showing
moderate to high prediction accuracies for ear rot, including Fusar-
ium ear rot and Fumonisin ear rot [46–48]. Higher prediction accu-
racies in this study could be attributed to the higher heritability for
stalk rot (Table 1) than for ear rot [47,48], consistent with the gen-
eral expectation for prediction accuracies to increase with an
increase in heritability [19,35].
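
The expected relationship between heritability and prediction accuracy can be illustrated with a commonly cited deterministic approximation attributed to Daetwyler and colleagues, r = sqrt(N h² / (N h² + Me)), where N is the training set size, h² the heritability, and Me the number of independent chromosome segments. This approximation was not used in this study; it refers to the correlation with true genetic values rather than with phenotypes, and the input values in the example are hypothetical.

```python
import math


def expected_accuracy(n_train, heritability, n_effective_segments):
    """Approximate expected accuracy r = sqrt(N*h2 / (N*h2 + Me))."""
    if n_train <= 0 or n_effective_segments <= 0:
        raise ValueError("n_train and n_effective_segments must be positive")
    if not 0.0 <= heritability <= 1.0:
        raise ValueError("heritability must lie between 0 and 1")
    nh2 = n_train * heritability
    return math.sqrt(nh2 / (nh2 + n_effective_segments))


# Hypothetical inputs: 300 training lines, 500 effective segments.
low_h2 = expected_accuracy(300, 0.3, 500)
high_h2 = expected_accuracy(300, 0.7, 500)
```

With all other inputs held constant, the higher heritability gives the higher expected accuracy, in line with the general expectation stated above.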

In addition to heritability, the prediction accuracy could also be
affected by the prediction model used, the size of the training set,
G × E interaction (which affects the heritability), and the relationship
between the training and the testing populations. The G × E
model had the highest prediction accuracy of the three models
used in both the Tuxpeño and non-Tuxpeño populations (Fig. 3). Further, a
relatively small training set is unlikely to sufficiently sample the
genetic and phenotypic diversity in the entire population, resulting
in low prediction accuracies [35,49]. The prediction accuracy for
stalk rot in Tuxpeño increased with an increase in the size of the
training set (Fig. 4). However, depending on the genetic architec-
ture of the trait and population structure, a small training set could
still result in high prediction accuracies. Using 20% of the popula-
tion to predict stalk rot in the rest of the population resulted in a
prediction accuracy > 0.6 in the non-Tuxpeño population.

Using the Tuxpeño population as a training set to predict non-
Tuxpeño and vice versa resulted in poor prediction accuracies. Poor
prediction accuracy when training and predicting across popula-
tions was expected because, in addition to the tester effect, hybrid
breeding with two heterotic pools drives allele frequencies of the
germplasm pools in opposite directions [40]. Similar observations
have been reported for the effects of G × E, the relationship
between the training and testing sets, and training population size
for various traits [35,49].

In general, prediction accuracies were lower when predicting
stalk rot for new untested lines (CV1) than when predicting stalk
rot in unbalanced multi-environment trials (CV2). Higher prediction
accuracy for CV2 is achieved by exploiting the phenotypic values of
lines already tested, the genetic covariance among lines, and
the correlation among environments. For this reason,
excluding the uncorrelated environment (Juventino Rosas)
increased the prediction accuracy for the G × E model in
non-Tuxpeño (Fig. 3). Therefore, when designing genomic prediction
breeding schemes for complex traits such as stalk rot and grain
yield, selecting sufficiently correlated environments is essential
[50]. In CV1, the prediction set is not tested in any of the environ-
ments, resulting in a less effective exploitation of the covariance
structure among environments compared to CV2 where all the
lines are tested in at least one environment.
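
The difference between the two schemes can be sketched as two ways of masking records before model fitting. This is a minimal illustration under the definitions given above (CV1 hides all records of a subset of lines; CV2 hides individual records while keeping every line observed in at least one environment). The function names and the way records are sampled are assumptions, not the authors' implementation.

```python
import numpy as np


def cv1_mask(line_idx, n_lines, test_fraction, rng):
    """CV1: hide every record of a random subset of lines (untested lines)."""
    n_test = int(round(test_fraction * n_lines))
    test_lines = rng.choice(n_lines, size=n_test, replace=False)
    return np.isin(line_idx, test_lines)


def cv2_mask(line_idx, n_lines, test_fraction, rng):
    """CV2: hide random records, keeping each line observed at least once."""
    n_records = len(line_idx)
    n_test = int(round(test_fraction * n_records))
    mask = np.zeros(n_records, dtype=bool)
    remaining = np.bincount(line_idx, minlength=n_lines)  # observed records per line
    hidden = 0
    for rec in rng.permutation(n_records):
        if hidden == n_test:
            break
        line = line_idx[rec]
        if remaining[line] > 1:        # never hide a line's last observed record
            mask[rec] = True
            remaining[line] -= 1
            hidden += 1
    return mask
```

In either scheme, phenotypes where the mask is True are set to missing before fitting, and accuracy is computed as the correlation between the predictions and the withheld observations.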

Both CV1 and CV2 are useful for breeding programs, but they
increase genetic gain in two distinct ways. With an appropriate
training dataset, the CV1 scheme permits the prediction of trait
values for new lines without phenotypic data, allowing breeders
to skip testing stages and reduce breeding cycle time. The CV2
scheme, in contrast, increases genetic gain primarily by increasing
selection accuracy in unbalanced experiments. However, in most
cases, breeders prefer to sacrifice some accuracy for speed and
have a product on the market as quickly as possible. Whatever
the case, reducing cycle time has the largest effect on genetic gain
[51], and the associated benefits may be great enough to compensate
for the lower prediction accuracy of CV1. Indeed, with the CV1
prediction accuracies of 0.35 in Tuxpeño and 0.70 in non-Tuxpeño,
and the marginal increases in prediction accuracy observed with
CV2, the advantages of predicting untested lines for stalk rot can
still be substantial.
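
The trade-off between accuracy and cycle time can be made concrete with the standard breeder's equation for genetic gain per unit time, ΔG = i · r · σ_A / L, where i is the selection intensity, r the selection accuracy, σ_A the additive genetic standard deviation, and L the cycle length. In the sketch below, the accuracies 0.70 and 0.77 are the non-Tuxpeño Model 3 values reported above for CV1 and CV2; the selection intensity, the genetic standard deviation, and the cycle lengths of 2 and 3 years are hypothetical. Treating the reported prediction accuracies as r is a simplification.

```python
def genetic_gain_per_year(selection_intensity, accuracy, genetic_sd, cycle_years):
    """Breeder's equation: expected genetic gain per year."""
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return selection_intensity * accuracy * genetic_sd / cycle_years


# Hypothetical scenario: predicting untested lines (CV1) saves one year per cycle.
gain_cv1 = genetic_gain_per_year(1.4, 0.70, 1.0, cycle_years=2)
gain_cv2 = genetic_gain_per_year(1.4, 0.77, 1.0, cycle_years=3)
print(round(gain_cv1, 2), round(gain_cv2, 2))  # 0.49 0.36
```

Under these assumed values, the shorter cycle more than offsets the lower accuracy of CV1.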

Data availability

The datasets generated and analyzed for this study are available
from the CIMMYT data and software repository network:
https://hdl.handle.net/11529/10548947.

CRediT authorship contribution statement

Junqiao Song: Data curation, Formal analysis, Visualization,
Writing – original draft, Writing – review & editing. Angela
Pacheco: Investigation, Formal analysis. Amos Alakonya: Investi-
gation, Methodology. Andrea S. Cruz-Morales: Investigation,
Methodology. Carlos Muñoz-Zavala: Investigation, Methodology.
Jingtao Qu: Investigation, Formal analysis. Chunping Wang: Conceptualization,
Methodology, Writing – review & editing. Xuecai
Zhang: Conceptualization, Methodology, Data curation, Formal
analysis, Writing – review & editing. Felix San Vicente: Project
administration, Conceptualization. Thanda Dhliwayo: Conceptual-
ization, Methodology, Data curation, Project administration,
Supervision.

Declaration of competing interest

The authors declare that they have no known competing finan-
cial interests or personal relationships that could have appeared to
influence the work reported in this paper.

Acknowledgments

This work was funded by the CGIAR Research Program (CRP) on
MAIZE, the USAID through the Accelerating Genetic Gains Supple-
mental Project (Amend. No. 9 MTO 069033), and the One CGIAR
Initiative on Accelerated Breeding. The MAIZE CRP received fund-
ing from the governments of Australia, Belgium, Canada, China,
France, India, Japan, the Republic of Korea, Mexico, the Nether-
lands, New Zealand, Norway, Sweden, Switzerland, the United
Kingdom, the United States, and the World Bank. JS was supported
by the China Scholarship Council. The authors thank Oscar Garcia-
Romero and Jorge Martinez-Ruiz for their technical support of the
work.

Appendix A. Supplementary data

Supplementary data for this article can be found online at
https://doi.org/10.1016/j.cj.2024.02.004.

References

[1] D.G. White, Compendium of corn diseases, third ed., American
Phytopathological Society Press, St Paul, MN, USA, 1999.

[2] S. Zahniser, N.F. López-López, M. Motamed, Z.Y. Silva-Vargas, T. Capehart, The
growing corn economies of Mexico and the United States (a report from the
Economic Research Service, OCS-19F-02), Economic Research Service, USDA,
USA, 2019.

[3] W.J. Li, P. He, J.Y. Jin, Effect of potassium on ultrastructure of maize stalk pith
and young root and their relation to stalk rot resistance, Agric. Sci. China 9
(2010) 1467–1474.

[4] M.G. Figueroa-Rivera, R. Rodríguez-Guerra, B.Z. Guerrero-Aguilar, M.M.
González-Chavira, J.L. Pons-Hernández, Characterization of fusarium species
associated with rotting of corn root in Guanajuato, Mexico, Revista Mexicana de
Fitopatología 28 (2010) 124–134.

[5] Z.R. Mir, P.K. Singh, P.H. Zaidi, M.T. Vinayan, S.S. Sharma, M.K. Krishna, A.K.
Vemula, A. Rathore, S.K. Nair, Genetic analysis of resistance to post flowering
stalk rot in tropical germplasm of maize (Zea mays L.), Crop Protect. 106 (2018)
42–49.

[6] P.J. Donahue, E.L. Stromberg, C.W. Roane, A diallel study of stalk rot resistance
in elite maize and its interaction with yield, Virginia J. Sci. 40 (1989) 157–170.

[7] Z. Rashid, V. Babu, S.S. Sharma, P.K. Singh, S.K. Nair, Identification and
validation of a key genomic region on chromosome 6 for resistance to fusarium
stalk rot in tropical maize, Theor. Appl. Genet. 135 (2022) 4549–4563.

[8] S. Liu, J. Fu, Z. Shang, X. Song, M. Zhao, Combination of genome-wide
association study and QTL mapping reveals the genetic architecture of
fusarium stalk rot in maize, Front. Agron. 2 (2021) 590374.

[9] Y. Kou, S. Wang, Broad-spectrum and durability: understanding of quantitative
disease resistance, Curr. Opin. Plant Biol. 13 (2010) 181–185.

[10] C. Wang, Q. Yang, W. Wang, Y. Li, Y. Guo, D. Zhang, X. Ma, W. Song, J. Zhao, M.
Xu, A transposon-directed epigenetic change in ZmCCT underlies quantitative
resistance to gibberella stalk rot in maize, New Phytol. 215 (2017) 1503–1515.

[11] Q. Yang, G. Yin, Y. Guo, D. Zhang, S. Chen, M. Xu, A major QTL for resistance to
gibberella stalk rot in maize, Theor. Appl. Genet. 121 (2010) 673–687.

[12] D. Zhang, Y. Liu, Y. Guo, Q. Yang, J. Ye, S. Chen, M. Xu, Fine-mapping of qRfg2, a
QTL for resistance to gibberella stalk rot in maize, Theor. Appl. Genet. 124
(2012) 585–596.

[13] C. Ma, X. Ma, L. Yao, Y. Liu, F. Du, X. Yang, M. Xu, qRfg3, a novel quantitative
resistance locus against gibberella stalk rot in maize, Theor. Appl. Genet. 130
(2017) 1723–1734.
[14] W. Ma, X. Gao, T. Han, M.T. Mohammed, J. Yang, J. Ding, W. Zhao, Y.L. Peng, V.
Bhadauria, Molecular genetics of anthracnose resistance in maize, J. Fungi
(Basel) 8 (2022) 540.

[15] F.J. Song, M.G. Xiao, C.X. Duan, H.J. Li, Z.D. Zhu, B.T. Liu, S.L. Sun, X.F. Wu, X.M.
Wang, Two genes conferring resistance to pythium stalk rot in maize inbred
line Qi319, Mol. Genet. Genomics 290 (2015) 1543–1549.

[16] C. Duan, F. Song, S. Sun, C. Guo, Z. Zhu, X. Wang, Characterization and
molecular mapping of two novel genes resistant to pythium stalk rot in maize,
Phytopathology 109 (2019) 804–809.

[17] H. Fu, H.K. Dooner, Intraspecific violation of genetic colinearity and its
implications in maize, Proc. Natl. Acad. Sci. U. S. A. 99 (2002) 9573–9578.

[18] T.H. Meuwissen, B.J. Hayes, M.E. Goddard, Prediction of total genetic value
using genome-wide dense marker maps, Genetics 157 (2001) 1819–1829.

[19] H. Zhang, L. Yin, M. Wang, X. Yuan, X. Liu, Factors affecting the accuracy of
genomic selection for agricultural economic traits in maize, cattle, and pig
populations, Front Genet. 10 (2019) 189.

[20] R. Guo, J. Chen, C.D. Petroli, A. Pacheco, X. Zhang, F. San Vicente, S.J. Hearne, T.
Dhliwayo, The genetic structure of CIMMYT and U.S. inbreds and its
implications for tropical maize breeding, Crop Sci. 61 (2021) 1666–1681.

[21] A.M. Stucker, E. Morris, C.J. Stubbs, D.J. Robertson, The crop clamp - a non-
destructive electromechanical pinch test to evaluate stalk lodging resistance,
HardwareX 10 (2021) e00226.

[22] T.A. Jackson-Ziems, J.M. Rees, R.M. Harveson, Common stalk rot diseases of
corn, Papers in Plant Pathology 532 (2014), http://digitalcommons.unl.
edu/plantpathpapers/532.

[23] A. Chakrabarti, J.K. Ghosh, AIC, BIC and recent advances in model selection, in:
P.S. Bandyopadhyay, M.R. Forster (Eds.), Philosophy of Statistics,
North-Holland, Amsterdam, the Netherlands, 2011, pp. 583–605.

[24] D. Bates, M. Mächler, B. Bolker, S. Walker, Fitting linear mixed-effects models
using lme4, J. Statistical Soft. 67 (2015) 1–48.

[25] J. Doyle, J. Doyle, A rapid procedure for DNA purification from small quantities
of fresh leaf tissue, Phytochem. Bull. 19 (1987) 11–15.

[26] R.J. Elshire, J.C. Glaubitz, Q. Sun, J.A. Poland, K. Kawamoto, E.S. Buckler, S.E.
Mitchell, A robust, simple genotyping-by-sequencing (GBS) approach for high
diversity species, PLoS ONE 6 (2011) e19379.

[27] P. Bradbury, Z. Zhang, D. Kroon, T. Casstevens, Y. Ramdoss, E. Buckler, TASSEL:
software for association mapping of complex traits in diverse samples,
Bioinformatics 23 (2007) 2633–2635.

[28] D. Money, K. Gardner, Z. Migicovsky, H. Schwaninger, G.Y. Zhong, S. Myles,
LinkImpute: fast and accurate genotype imputation for nonmodel organisms,
G3-Genes Genomes Genet. 5 (2015) 2383–2390.

[29] H. Wickham, ggplot2: elegant graphics for data analysis, 2nd, Springer, New
York, NY, USA, 2009.

[30] T. Van den Ende, F.A. Abe Nijenhuis, H.G. van den Boorn, E. Ter Veer, M.C.C.M.
Hulshof, S.S. Gisbertz, M.G.H. van Oijen, H.W.M. van Laarhoven, COMplot, a
graphical presentation of complication profiles and adverse effects for the
curative treatment of gastric cancer: a systematic review and meta-analysis,
Front Oncol. 9 (2019) 684.

[31] M.X. Li, J.M. Yeung, S.S. Cherny, P.C. Sham, Evaluating the effective numbers of
independent tests and significant p-value thresholds in commercial
genotyping arrays and public imputation reference datasets, Hum. Genet.
131 (2012) 747–756.

[32] S.S. Dong, W.M. He, J.J. Ji, C. Zhang, Y. Guo, T.L. Yang, LDBlockShow: a fast and
convenient tool for visualizing linkage disequilibrium and haplotype blocks
based on variant call format files, Brief. Bioinformatics 22 (2020) bbaa227.

[33] S.A. Flint-Garcia, J.M. Thornsberry, E.S. Buckler, Structure of linkage
disequilibrium in plants, Annu. Rev. Plant Biol. 54 (2003) 357–374.

[34] D. Jarquin, J. Crossa, X. Lacaze, P. Du Cheyron, J. Daucourt, J. Lorgeou, F. Piraux,
L. Guerreiro, P. Perez, M. Calus, J. Burgueno, G. de los Campos, A reaction norm
model for genomic selection using high-dimensional genomic and
environmental data, Theor. Appl. Genet. 127 (2014) 595–607.

[35] E.K. Mageto, J. Crossa, P. Pérez-Rodríguez, T. Dhliwayo, N. Palacios-Rojas, M.
Lee, R. Guo, F. San Vicente, X. Zhang, V. Hindu, Genomic prediction with
genotype by environment interaction analysis for kernel zinc concentration in
tropical maize germplasm, G3-Genes Genomes Genet. 10 (2020) 2629–2639.

[36] M. Lopez-Cruz, J. Crossa, D. Bonnett, S. Dreisigacker, J. Poland, J.L. Jannink, R.P.
Singh, E. Autrique, G. de los Campos, Increased prediction accuracy in wheat
breeding trials using a marker × environment interaction genomic selection
model, G3-Genes Genomes Genet. 5 (2015) 569–582.

[37] P. Pérez, G. de los Campos, Genome-wide regression and prediction with the
BGLR statistical package, Genetics 198 (2014) 483–495.

[38] P. Pérez, G. de Los Campos, J. Crossa, D. Gianola, Genomic-enabled prediction
based on molecular markers and pedigree using the Bayesian linear regression
package in R, Plant Genome 3 (2010) 106–116.

[39] D.J. Spiegelhalter, N.G. Best, B.P. Carlin, A. van der Linde, Bayesian measures of
model complexity and fit, J. R. Stat. Soc. Ser. B-Stat. Methodol. 64 (2002) 583–
639.

[40] E.A. Lee, M. Tollenaar, Physiological basis of successful breeding strategies for
maize grain yield, Crop Sci. 47 (2007) S202–S215.

[41] M.M. Goodman, Genetic and germplasm stocks worth conserving, J. Hered. 81
(1990) 11–16.

[42] Q. Zhang, F. Luo, Y. Zhong, J. He, L. Li, Modulation of NAC transcription factor
NST1 activity by XYLEM NAC DOMAIN1 regulates secondary cell wall
formation in arabidopsis, J. Exp. Bot. 71 (2020) 1449–1458.

https://doi.org/10.1016/j.cj.2024.02.004
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0005
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0005
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0005
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0010
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0010
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0010
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0010
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0010
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0015
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0015
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0015
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0020
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0020
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0020
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0020
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0025
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0025
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0025
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0025
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0030
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0030
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0035
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0035
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0035
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0040
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0040
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0040
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0045
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0045
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0050
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0050
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0050
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0055
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0055
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0060
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0060
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0060
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0065
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0065
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0065
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0070
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0070
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0070
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0075
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0075
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0075
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0080
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0080
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0080
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0085
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0085
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0090
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0090
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0095
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0095
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0095
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0100
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0100
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0100
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0105
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0105
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0105
http://digitalcommons.unl.edu/plantpathpapers/532
http://digitalcommons.unl.edu/plantpathpapers/532
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0115
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0115
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0115
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0115
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0115
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0120
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0120
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0125
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0125
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0130
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0130
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0130
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0135
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0135
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0135
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0140
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0140
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0140
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0145
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0145
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0145
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0150
http://refhub.elsevier.com/S2214-5141(24)00044-8/h0150


[43] X. Liu, M. Luo, W. Zhang, J. Zhao, J. Zhang, K. Wu, L. Tian, J. Duan, Histone
acetyltransferases in rice (Oryza sativa L.): phylogenetic analysis, subcellular
localization and expression, BMC Plant Biol. 12 (2012) 145.

[44] Y. Zan, Y. Ji, Y. Zhang, S. Yang, Y. Song, J. Wang, Genome-wide identification,
characterization and expression analysis of Populus leucine-rich repeat
receptor-like protein kinase genes, BMC Genomics 14 (2013) 318.

[45] K.U. Torii, Leucine-rich repeat receptor kinases in plants: structure, function,
and signal transduction pathways, in: International Review of Cytology,
Academic Press, New York, NY, USA, 2004, pp. 1–46.

[46] Y.B. Liu, G.H. Hu, A. Zhang, A. Loladze, Y.X. Hu, H. Wang, J.T. Qu, X.C. Zhang, M.
Olsen, F. San Vicente, J. Crossa, F. Lin, B.M. Prasanna, Genome-wide association
study and genomic prediction of Fusarium ear rot resistance in tropical maize
germplasm, Crop J. 9 (2021) 325–341.

[47] J.B. Holland, T.P. Marino, H.C. Manching, R.J. Wisser, Genomic prediction for
resistance to Fusarium ear rot and fumonisin contamination in maize, Crop Sci.
60 (2020) 1863–1875.
[48] M.C. Kuki, R.J.B. Pinto, F.A.B. Bertagna, D.J. Tessmann, A. Teixeira do Amaral
Júnior, C.A. Scapim, J.B. Holland, Association mapping and genomic prediction
for ear rot disease caused by Fusarium verticillioides in a tropical maize
germplasm, Crop Sci. 60 (2020) 2867–2881.

[49] J. Crossa, P. Pérez-Rodríguez, J. Cuevas, O. Montesinos-López, D. Jarquín, G. de
Los Campos, J. Burgueño, J.M. González-Camacho, S. Pérez-Elizalde, Y. Beyene,
S. Dreisigacker, R. Singh, X. Zhang, M. Gowda, M. Roorkiwal, J. Rutkoski, R.K.
Varshney, Genomic selection in plant breeding: methods, models, and
perspectives, Trends Plant Sci. 22 (2017) 961–975.

[50] J.E. Spindel, S.R. McCouch, When more is better: how data sharing would
accelerate genomic selection of crop plants, New Phytol. 212 (2016) 814–826.

[51] D.V. Butruille, F.H. Birru, M.L. Boerboom, E.J. Cargill, D.A. Davis, P. Dhungana, G.M.
Dill, F. Dong, A.E. Fonseca, B.W. Gardunia, G.J. Holland, N. Hong, P. Linnen, T.E.
Nickson, N. Polavarapu, J.K. Pataky, J. Popi, S.B. Stark, Maize breeding in the United
States: views from within Monsanto, in: J. Janick (Ed.), Plant Breeding Reviews,
Volume 39, John Wiley & Sons Inc, Hoboken, NJ, USA, 2015, pp. 199–282.

