G3, 2023,  
https://doi.org/10.1093/g3journal/jkad148 
Advance Access Publication Date: 5 July 2023 
Investigation 
Complex traits and candidate genes: estimation of genetic 
variance components across multiple genetic architectures 
Mitchell J. Feldmann,1,* Giovanny Covarrubias-Pazaran,2,3 Hans-Peter Piepho4 
1Department of Plant Sciences, University of California Davis, One Shields Ave, Davis, CA 95616, USA 
2International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, El Batán, 56130 Texcoco, Edo. de México, México 
3Present address: International Rice Research Institute, Los Baños, 4031 Laguna, Philippines 
4Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart 70599, Germany 
*Corresponding author: Department of Plant Science, University of California Davis, One Shields Ave, Davis, CA, 95616, USA. Email: mjfeldmann@ucdavis.edu 
Abstract  
Large-effect loci—those statistically significant loci discovered by genome-wide association studies or linkage mapping—associated 
with key traits segregate amidst a background of minor, often undetectable, genetic effects in wild and domesticated plants and animals. 
Accurately attributing mean differences and variance explained to the correct components in the linear mixed model analysis is vital for 
selecting superior progeny and parents in plant and animal breeding, gene therapy, and medical genetics in humans. Marker-assisted 
prediction and its successor, genomic prediction, have many advantages for selecting superior individuals and understanding disease 
risk. However, these two approaches are less often integrated to study complex traits with different genetic architectures. This simulation 
study demonstrates that the average semivariance can be applied to models incorporating Mendelian, oligogenic, and polygenic terms 
simultaneously and yields accurate estimates of the variance explained for all relevant variables. Our previous research focused on large- 
effect loci and polygenic variance separately. This work aims to synthesize and expand the average semivariance framework to various 
genetic architectures and the corresponding mixed models. This framework independently accounts for the effects of large-effect loci 
and the polygenic genetic background and is universally applicable to genetics studies in humans, plants, animals, and microbes. 
Keywords: average semivariance, linear mixed model, variance component estimation, polygenic inheritance, oligogenic inheritance, 
Mendelian inheritance 
Introduction value or genetic merit of a candidate individual (VanRaden 
2008). For loci to provide actionable gains or diagnoses, they 
Today, linear mixed models (LMMs) are routinely applied in plant 
must explain a significant proportion of phenotypic and genetic 
breeding and quantitative genetics research. They are used for the 
variation in a population with alleles in segregation at target loci. 
prediction of genetic values in plants and animals (VanRaden 
Candidate gene discovery through genome-wide association 
2008; Heffner et al. 2010; Meuwissen et al. 2016), or polygenic risk 
studies (GWAS) and quantitative trait locus (QTL) mapping is pro-
scores (PRSs) in humans (de Los Campos et al. 2013; Wray et al. lific in plant and animal populations (Lander and Schork 1994;  
2019), to estimate the heritability of traits in target populations Visscher et al. 2012, 2017). Despite decades of directional selection 
(Visscher et al. 2008; Legarra 2016), and to estimate ecological in many plant populations, loci impacting traits of interest still seg-
and evolutionary genetic parameters of behavioral traits regate, even in advanced breeding materials. These genome-wide 
(Ariyomo et al. 2013; Walsh and Lynch 2018). Genetic values are analyses have implicated numerous genes and genomic regions 
constructed from a combination of genetic effects; including in controlling a wide variety of simple and complex traits 
Mendelian factors; which may have both additive and dominant (Anderson et al. 2007; Septiningsih et al. 2009; Han et al. 2018;  
sources of variance (Pincot et al. 2022), oligogenic factors consist- Demmings et al. 2019; Xin et al. 2020). However, the utility of such 
ing of few genetic factors and their epistatic interactions appropri- marker–trait associations may not be fully realized (Bernardo 
ate for marker-assisted prediction (MAP) (Tang et al. 2006), a 2004, 2016). Large-effect and statistically significant loci typically 
polygenic term consisting of a dense genome-wide framework of only explain a fraction of the genetic and phenotypic variance in 
markers assumed to have minor effects suitable for genomic pre- a population (Feldmann, Piepho, Bridges, et al. 2021), along with 
diction (GP); which may also account of additive and dominance the polygenic fraction (Feldmann, Piepho, et al. 2022), except in ex-
sources of variance (Brandariz and Bernardo 2019), and a residual treme scenarios when Mendelian factors wholly control a trait. 
genetic term consisting of all genetic effects not accounted for by Discovered loci rarely, if ever, explain 100% of the genetic vari-
the previous genetic factors (Rutkoski et al. 2014; Rice and Lipka ance, and understanding the multiple sources of variation and 
2019; DeWitt et al. 2021). The ultimate objective in breeding appli- how they relate can help breeders and research prioritize targets 
cations is, typically, predicting the genotypic value, e.g. breeding and mitigate risk (Bernardo 2004, 2014). Genes with significant 
Received: April 27, 2023. Accepted: June 12, 2023 
© The Author(s) 2023. Published by Oxford University Press on behalf of The Genetics Society of America. 
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which 
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.  
Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023
2 | G3, 2023, Vol. 00, No. 0 
effects often dominate the “nonmissing heritability,” but they can mmer(fixed = Y ∼ 1,   
mask or obscure the effects of other quantitatively acting genes random = ∼ vsr(M, Gu = KM) +        
and pleiotropically affect multiple quantitative phenotypes vsr(G, Gu = Kasv) +         
(Mackay 2001; Mackay et al. 2009; Eichler et al. 2010; De GR,   
Villemereuil et al. 2018). For example, mutations in the BRCA2 rcov  = ∼ units,   
gene can have large effects but be incompletely penetrant, inter- data  = data) 
act with other genes, and may be necessary but insufficient for where KM is the matrix KM = k−1
M InM . All other variables are the 
predicting breast, ovarian, and other cancer risks in women same as previously defined. 
(Gaudet et al. 2010). Accurately partitioning the Mendelian, oligo- We generated 18 experiment designs with different population 
genic, and polygenic sources of variance allows researchers to as- sizes of n = 500, 1,000, and 1,814, and number of clonal replicates 
sess the value conferred by specific loci. per entry r = 1, 2, and 4 for outbred H = 0.38 and inbred H = 0.0 popu-
Here, we use simulations to show that the average semivariance lations. Clonal replicates are a particular case in plant genetics of 
(ASV) provides accurate variance component estimates (VCEs) and hybrid (e.g. maize, rice, and sorghum) cropping systems and in clon-
variance component ratios for all relevant genetic terms regardless ally propagated species (e.g. strawberry, potato, and apple). In all ex-
of study design or population type, e.g. outbred or inbred. We sought amples, 100 populations are genotyped at m = 5, 000 loci. These 
to provide a synthesis and extension of the previously published 5,000 single nucleotide polymorphisms (SNPs) generated the purely 
works on the ASV (Piepho 2019; Feldmann, Piepho, Bridges, et al. additive polygenic background and one locus for the simple genetic 
2021; Feldmann, Piepho, et al. 2022) and to present a fully realized effect. Marker genotypes, e.g. alleles, were drawn from a multivari-
and efficient ASV approach for typical LMM analyses in human, ate normal distribution to replicate the population structure of the 
plant, animal, and microbial genetics. We demonstrate how these 1,814 mice from Valdar et al. (2006) using R/MASS::mvrnorm() and 
models can be extended to handle more complex genetic structures, transformed such that the population was heterozygosity H = 0.38. 
including adding multiple explanatory loci and marker–marker in- We then estimated KASV and excluded the targeted locus from the 
teractions, incorporating nonadditive dominance and epistasis vari- calculation of KASV. We also simulated residual genetic and residual 
√���
ance, partitioning marker variance into additive and dominance effects each from a normal distribution with μ = 0 and θASV = 50
√��� gR
components, and performing fully efficient stagewise analysis. To and θASV
R = 40 using R/stats::rnorm(). A single explanatory lo-
accommodate the models proposed in this research, we enabled cus was simulated with a segregation ratio of approximately 1 : 2 : 1 
􏽰��������
the flexibility to provide the weights into the mixed model machin- for AA:Aa:aa marker genotypes with μ = 0 and θASV
m = kM · 66 using 
ery in the form of a matrix (diagonal or nondiagonal) instead of a vec- R/stats::rnorm(). We simulated marker effects for all m = 5, 000 
√���
tor, which is now available in R/sommer >= v4.2.0. We provide loci following a normal distribution μ = 0 and θASV
g = 66, and each 
examples of expressing the different models and extensions in the locus contributes equally. When multiplied by the centered marker 
freely available R/sommer package (Covarrubias-Pazaran 2016). genotypes and summed, the score is taken as each individual’s true 
The ASV is a powerful tool for answering these questions regardless additive genetic value g. For each simulated population we ex-
of the organism, population, or trait. pressed LMM (1) using R/sommer::mmer() (Covarrubias-Pazaran 
2016). In the second set of simulations, we used the same approach 
and the same mean and variance parameters. However, in this ex-
Methods & materials ample, we simulated inbred lines in the background polygenic mar-
Computer simulations model statements in kers (H = 0.0) and the foreground markers, e.g. 1 : 0 : 1 for AA:Aa:aa. 
R/sommer v4.2.0 Incorporating multiple target loci into GBLUP 
We use computer simulations that follow the same style as in  LMM (8) is expressed as 
Feldmann, Piepho, Bridges, et al. (2021) and Feldmann, Piepho, et al. 
mmer(fixed = Y ∼ 1,   
(2022) to demonstrate under fairly general conditions that ASV 
random = ∼ M1 + M2 + M3 +         
yields accurate estimates of variance components when (1) includ-
M12 + M13 + M23 +         
ing main-effect loci alongside polygenic background, (2) partitioning 
M123 +         
additive and dominance sources of variance for single markers and 
vsr(G, Gu = Kasv) +         
polygene, and (3) performing fully efficient stagewise analyses. 
GR,   
rcov  = ∼ units,   
Incorporating one target locus into GBLUP data  = data) 
LMM (1) is expressed as where data is an n × 10 matrix containing the phenotypic observa-
mmer(fixed = Y ∼ 1,   tions Y, seven columns corresponding to the marker effects and in-
random = ∼ M +         teractions, a factor-coding entries G, and a factor-coding levels of gR. 
vsr(G, Gu = Kasv) +         Due to the similarities between our first set of experiments and 
GR,   this extension, we do not provide any additional simulations dem-
rcov  = ∼ units,   onstrating the successes of this model extension. Feldmann, 
data  = data) Piepho, Bridges, et al. (2021) demonstrated that multiple loci could 
where data is an n × 4 matrix containing the phenotypic observa- be fit simultaneously with their interactions, and variance compo-
tions Y, levels of the marker genotypes, entries, and levels of the nents can be estimated accurately. The same is true for models 
residual genetic term, i.e. entries. The variable units is inferred incorporating a polygenic genomic relationship matrix (GRM) as 
by R/sommer::mmer() and can be considered as a column with well. However, the user is encouraged to check that the higher or-
as many levels as rows in the data (Covarrubias-Pazaran 2016). der locus–locus interactions do not saturate the model and are not 
The version of this model with kM embedded is expressed as correlated with KASV.  
Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023
M. J. Feldmann et al. | 3 
Partitioning marker variance into additive and dominance rcov  = ∼ units,   
components data  = data) 
LMM (9) is expressed as where data is an n × 3 matrix containing the phenotypic observa-
mmer(fixed = Y ∼ 1,   tions Y, one factor coding for the entry ID and one-factor coding 
random = ∼ Ma + Md +        for Blocks within the ne environment. Blocks and other within- 
vs(G, Gu = Kasv) +         location design elements can be incorporated as random effects 
GR,   using the random = syntax. In R/sommer, Σe is obtained from 
rcov  = ∼ units,   each location as the ‘VarBeta‘ matrix in the R/sommer::mmer() 
data  = data) output. “VarBeta” is the name of the model estimated variance– 
where data is an n × 5 matrix containing the phenotypic observa- covariance matrix among entry means in R/sommer. The Σes are 
tions Y, a factor-coding levels of mA, a factor-coding levels of mD, a then bound corner-to-corner, which is accomplished using R/ 
factor-coding entries G, and a factor-coding levels of gR. The factor sommer::adiag1() to obtain Ω. We then take the inverse of Ω 
coding of mA has three levels corresponding to AA : Aa : aa, and a using R/base::solve(). 
factor coding of mD has two groups corresponding to the genetic The LMM for stage 2 (17) is expressed as 
state—either homozygous or heterozygous. mmer(fixed = Y2 ∼ Env - 1,   
We performed one set of simulations for this model extension random = ∼ vsr(M, Gu = KM) +         
that follows the exact parameters as the first simulation set vsr(G, Gu = Kasv) +         
(m = 5, 000, n = 500, H = 0.38). In this simulation, we estimate G:Env + GR,    
which portion of the variance explained by a marker is from addi- rcov = ∼ vsr(units,         
tive variance and which is from dominance variance. In this simu- Gti = matrix(invSigma2,1,1),         
lation, we estimate which portion of the additive genetic variance Gtc = matrix(3,1,1)),  
√���
(θASV
g = 66), the marker explained variance by additive nIters  = 25,  
√��� √���
(θASV emWeight = rep(1,25),  
m = 20) or dominance variance (θASV
m = 20), the residual gen-
A √��� D
etic variance (θASV
g = 50), and the residual variance (on an entry- W     = invOmega,  
R √���
mean basis) (θASV
R = 40). In our simulations, 50% of the variance data   = data) 
explained by the focal marker is from additive variation and where data is an n × 5 matrix containing the adjusted entry means, 
50% is dominance variation. The other parameters of the simula- or BLUEs, from stage 1 (Y2) a factor-coding levels of M, two equiva-
tion are equal to the first set. We examined the accuracy of esti- lent factor-coding entries, e.g. G and gR, and factor-coding environ-
mating each term as well as the accuracy of estimating the total ments Env. In this approach, we must fix the residual variance 
variance explained by the focal marker. component equal to 1 so that all the scaling of the invOmega = Ω−1 
is unaffected by the model estimation process. Within the vsr() ar-
Incorporating a genomic dominance relationship matrix into gument, the Gti() and Gtc() arguments are used to set the initial 
GBLUP value of the variance component equal to the inverse of the variance 
among adjusted entry means (invSigma2 = σ̂−2) and to constrain the 
LMM (13) is expressed as 
variance component estimation to a fixed value by setting the first 
mmer(fixed = Y ∼ 1,   
argument equal to 3 (Covarrubias-Pazaran 2023). In this example, 
random = ∼ M +         
we use 25 iterations of the 100% expectation-maximization (EM) al-
vsr(Ga, Gu = Kasv) +         
gorithm; however, the EM and Newton-Raphson (NR) methods can 
vsr(Gd, Gu = Kasv˙D) +        
be exchanged or averaged, i.e. average information, by changing 
GR,   
the emWeight argument. This is not a general rule or recommenda-
rcov  = ∼ units,   
tion. The large number of iterations we used caused this analysis to 
data  = data) 
be computationally expensive and inefficient. 
where data is an n × 5 matrix containing the phenotypic observa-
We performed one set of simulations for this model extension 
tions Y, a factor-coding levels of the marker genotypes, and three 
following the exact parameters of the first simulation set 
equivalent factor-coding entries, to be used for the additive, dom-
(m = 5, 000, n = 500, H = 0.38). In this simulation, we estimate 
inance, and residual genetic terms. √���
which portion of the additive genetic variance (θASV
g = 66), the 
We performed one set of simulations for this model extension √���
marker explained variance (θASV
m = 40), the residual genetic vari-
that follows the exact parameters as the first simulation set √���
ance (θASV
g = 50), the genotype-by-environment interaction vari-
(m = 5, 000, n = 500, H = 0.38). In this simulation, we estimate R √���
ance (θASV
GE = 90), and the residual variance (on an entry-mean 
which portion of the polygenic variance is from additive √���
√��� √��� basis) (θASV
ϵ = 40). In this simulation, the dominance polygenic 
(θASV
g = 33) or dominance (θASV
g = 33). In this simulation, the 
A D variance is the same magnitude as the additive polygenic vari-
dominance polygenic variance is the same magnitude as the addi-
ance, and the other simulation parameters are equal to the first 
tive polygenic variance, and the other simulation parameters are 
set. We examined the accuracy of estimating each term. 
equal to the first set. We also controlled the residual genetic vari-
√���
ance (θASV
g = 50) and the residual variance (on an entry-mean ba-
R √���
sis) (θASV
R = 40), as in all simulations. We examined the accuracy 
of estimating each term. Results and discussion 
Candidate genes and complex traits 
Incorporating stagewise meta-analysis into GBLUP Bernardo (2014) was the first to propose an integration of MAP and 
LMM (15) for stage 1 is expressed as GP. Since then, empirical studies have validated the methodology 
mmer(fixed = Y ∼ G,   (Rutkoski et al. 2014; Spindel et al. 2016; Rice and Lipka 2019). In 
random = ∼ Block,   contrast, others have shown little-to-no improvement over GP  
Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023
4 | G3, 2023, Vol. 00, No. 0 
(Li et al. 2015; Galli et al. 2020), suggesting that modeling significant effective degrees of freedom. Also, the targeted markers should 
markers can improve prediction accuracy only when markers ex- not fit the marker effect size distribution assumptions used for 
plain a substantial portion of both genetic and phenotypic vari- the marker background, e.g. that all marker effects contribute 
ance (Galli et al. 2020). With the high densities of genome-wide equally to the genomic variance and are drawn from the same 
markers commonly assayed in gene finding studies, investigators distribution (Habier et al. 2007; Endelman 2011; Morota and 
often identify DNA markers tightly linked to a candidate or known Gianola 2014) and should not be in high LD with a large number 
causal genes as exemplified by diverse real-world examples of other markers. 
(Hayes and Goddard 2001; Hayes et al. 2010; Jensen et al. 2012;  
Visscher et al. 2012, 2017; Li et al. 2021). The candidate marker The entry mean, not the observation, is the 
loci are nearly always initially identified by genome-wide searches “phenotype” 
using sequential (marker-by-marker) approaches such as GWAS We believe the “phenotype” is the entry mean for a given subdiv-
and QTL analysis. Following the discovery of statistically signifi- ision of environments, not the individual observations that consti-
cant marker–trait associations from a marker-by-marker tute that entry mean. Our discussion here is primarily predicated 
genome-wide scan, the natural progression would be to analyze on plants, but does not necessarily exclude other organisms, 
single locus or multilocus genetic models where the effects of where replicate observations may be available per entry. In the 
the discovered loci are simultaneously corrected for the effects words of Dr. Rex Bernardo, “…the main focus of quantitative gen-
of other discovered loci, e.g. polygenic variation (Stroup et al. etics is on identifying candidates with the best genotypic value for 
2018; Gbur et al. 2020). a target population of environments” (Bernardo 2020). However, 
A marker will not explain a large portion of variance if that fine- or broad-scaled any subdivision is of a target population of 
marker does not have a large, detectable effect. Thus, markers environments or market segment, we argue that several environ-
that explain a large part of the genetic variance will be the most ments must be sampled from each subdivision. Ultimately, an 
useful for MAP and other diagnostic techniques. For example, average across those environments will be used to communicate 
consider Fusarium race one wilt resistance in strawberry, which the value of an entry to a specific subdivision of target environ-
is conferred by a single dominant acting locus Fw1 (Pincot et al. ments or to all target environments, if appropriate. These subdivi-
2022). This locus explains nearly 100% of the phenotypic and gen- sions may be defined by market segments, maturity zones, 
etic variance, and the mean differences delineate resistant vs. patterns of G × E, management strategies, geopolitics, and other 
susceptible genotypes. Thus there is almost no added benefit of elements of interest to a breeding or research program. The 
a genome-wide sample of markers over the single-marker assay granularity of the entry mean is important since not all environ-
(m) for product delivery and germplasm improvement. While ments, micro or macro, or market segments can be considered 
the variance explained is directly linked to the effect size, it is equal, and severe genotype-by-environment interactions (G × E) 
not a direct substitute. However, the random effect machinery may limit the information contained in the entry mean (Heslot 
allows researchers to obtain variance component estimates, and et al. 2013; González-Barrios et al. 2019). Conceptualizing the 
effect sizes (e.g. best linear unbiased predictors, BLUPs) simultan- phenotype as the entry mean should pose little practical conse-
eously (Searle et al. 1992), eliminating the need for multiple statis- quence as stagewise analyses, common in GWAS and GP, explicit-
tical models to assess the variance explained and the effect size of ly express this idea (Dias et al. 2020; Pincot et al. 2020; Endelman 
a target locus. The BLUP procedure is directly applied in this mod- 2022) and variance component ratios, such as the broad sense her-
el, so it is natural to use the same statistical machinery to esti- itability (H2), are often reported on an entry-mean basis (Bernardo 
mate genome-estimated breeding values (GEBVs) by genomic 2020). This concept is also concordant with single-stage analyses 
best linear unbiased prediction (GBLUP) and the genetic effect of incorporating all entries and subdivisions as main effects and the 
a locus. interaction, such as in product placement and other late-stage 
As a point of contrast, yield in maize (Zea mays) is heritable, but trials (Buntaran et al. 2020). Below, we show that ASV can be accur-
no single locus explains any appreciable amount of phenotypic or ately applied in single-stage and stagewise analyses. 
genotypic variance (Heffner et al. 2009, 2010). To improve yield in 
maize, GP is potentially a more valuable approach because the re- LMM analysis and the ASV 
searcher, or breeder, can predict the polygenic value (g) without The ASV estimator of total variance (Piepho 2019) and the vari-
relying on any particular locus but instead capturing variation ance of single markers and marker–marker interactions 
of a genome-wide sample of markers. The more challenging scen- (Feldmann, Piepho, Bridges, et al. 2021) is half the average total 
ario is the intermediate case in which a trait is controlled by both pairwise variance of a difference between entries and can be de-
loci that are discernible from the polygenic background and a composed into independent sources of variance, e.g. genetic and 
quantitative polygenic effect. residual. In this article, we assume that researchers can replicate 
The ratio between the variance explained by the oligogenic entries independently—as in clonally propagated or inbred crop 
and polygenic terms with the total genetic or phenotypic vari- species—or can collect repeated measures on entries (e.g. indivi-
ance is likely a significant factor determining the cost–benefit duals, families, or strains)—as in humans and animals—and 
of incorporating MAP, GP, or both into a breeding or diagnostic then estimate the least square means (LSMs), best linear unbiased 
program. Modeling an individual locus can be advantageous estimators (BLUEs), or other adjusted entry means in the first 
when the proportion of the phenotypic and genetic variance ex- stage of a stagewise analysis (Piepho et al. 2012; Damesa et al. 
plained by the locus is reasonably large and not partially cap- 2017, 2019). For simplicity, we assume that the residual vari-
tured by other markers in linkage disequilibrium (LD) with the ance–covariance matrix, which can take many forms (Piepho 
target (Bernardo 2014; Rutkoski et al. 2014; Pincot et al. 2022). 2019), is R = Inσ2
ϵ , where n is the number of entries (e.g. individuals, 
In this case, one could factor code a pseudomarker from mul- accessions, genotypes, lines, or animals). In stagewise analysis, R 
tiple markers bracketing a QTL to capture the variance ex- is estimated in the first stage and therefore does not need to be re- 
plained by that locus, assuming that SNPs used to define a estimated in the second stage. Instead, it is forwarded to the se-
QTL region are highly correlated and will not saturate a model’s cond stage by proper weighting.  
Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023
M. J. Feldmann et al. | 5 
The form of the LMM for this analysis assuming only one ex- extend this using the approach for multilocus models as in 
planatory marker is Equation (8), with and without marker–marker interactions, de-
scribed in Feldmann, Piepho, Bridges, et al. (2021). Specifically, 
y̅ = 1μ + Zmm + Ig + IgR + ϵ̅ (1) θASV
m is the total variance explained by a marker and is analogous 
to the total genetic variance used to calculate broad sense herit-
where y̅ is the vector of LSMs with y ∼ N (1μ, V), μ is the population ability, not the additive genetic variance. 
mean and the only fixed effect, Zm is the design matrix linking en- It is important to consider the relationship between the main 
try means to marker genotypes, m is the vector of random effects effect of markers and marker–marker interactions and KASV. 
of the main-effects locus with m ∼ N (0, Iσ2
m), g is the vector of ran- When markers are highly correlated—due to linkage disequilib-
dom additive genetic effects associated with the genome-wide rium (LD) or selection bias—the LMM framework will fail to accur-
framework of marker excluding m with g ∼ N (0, KASVσ2
g), gR is ately partition variance between two main effects, even if an 
the vector of random residual genetic term—the portion of the to- estimator is “unbiased.” One possible strategy here is to create 
tal genetic effect not accounted for by m or g—with gR ∼ N (0, Iσ2
g ), multilocus genotypes, e.g. AA.AA, AA.AB,…, BB.AB, BB.BB, from 
R several SNPs defining a target QTL region. If LD is high in the re-
and ̅ϵ is the random residual term with ̅ϵ ∼ N (0, R). We use a pooled 
gion, there should be far fewer levels of the multilocus genotype 
estimate of σ2
ϵ̅ obtained from the first stage, so this term is known. than possible combinations. The same is true if the marker geno-
We then calculated KASV as types are highly correlated with the geometry of the KASV—the 
LMM framework will fail to accurately partition the variance be-
X̅X̅T
KASV = tween the oligogenic foreground and the polygenic background. 
(n − 1)− (2) 
1 tr(X̅X̅T
) One way to assess this is to examine the correlation between the 
first few eigenvectors of KASV and the main-effect marker geno-
where X̅ = PX is the mean-centered marker matrix, X is the mark- types. If the correlation is large in magnitude, regardless of direc-
er matrix coded [−1,0,1] for [aa,Aa,AA] genotypes, K̅ = X̅X̅T 
is the tion, the LMM will likely struggle to partition the variance 
realized genomic relationship or kinship matrix, P = I − n−11n1T components between the two terms accurately. 
n 
is the idempotent mean-centering matrix, and tr(·) is the trace. The ASV definition of the residual genetic variance is 
The ASV definition of total variance from LMM (1) is 
θASV
g = (n − 1)−1σ2 tr(I TP)
R g nI
R n
(6) 
θASV = σ2
y̅ = (n − 1)−1 tr(VP) gR
(3) 
= θASV
m + θASV
g + θASV + θASV
gR ϵ̅ Importantly, all terms are estimated on the same scale as the re-
sidual variance θASV
ϵ on an entry-mean basis. As with the marker 
where θASV
y̅ is the total phenotypic variance, V is the variance–co- variance, the residual genetic variance will not be accurately par-
variance among LSMs, θASV
m is the average semivariance of the sim- titioned from the polygenic background as KASV → I. While KASV 
ple genetic term, θASV
g is the average semivariance of the polygenic needs to have similar global features—n−1 tr(K) = 1 and 
􏽐
term, θASV
g is the average semivariance of the residual genetic n−2 􏽐
R i j Kij = 0—it is important that KASV ≠ I. 
term, and θASV
ϵ̅ is the average semivariance of the residuals. The ASV definition of the residual variance is 
The ASV definition of genomic variance is 
θASV
ϵ̅ = (n − 1)−1σ2
ϵ tr(InIT
nP)
(7) 
θASV
g = (n − 1)−1σ2
g tr(XXTP) = σ2
ϵ̅
􏼔 􏼕
tr(K̅) (4) 
= σ2
n − 1 g The residual variance σ2
ϵ̅ is estimated in the first stage and the es-
timate is carried forward to the second stage. 
In general, we replace the unknown parameter values (σ2
g) with Two crucial results from Piepho (2019) and Feldmann, Piepho, 
Bridges, et al. (2021) are that (1) the ASV variance component esti-
their REML estimates (σ̂2
g) to obtain the ASV estimates (θ̂ASV
g ). 
mates for the total genetic variance from a simple model are 
Following this form, it is possible to extend LMM (1) to include equivalent to the REML variance components and (2) for REML es-
dominance and epistatic sources of variance (see below). timates that are not ASV equivalent there are simple constants 
The ASV definition of marker-associated genetic variance is that can be applied post hoc to obtain ASV variance component 
estimates. Feldmann, Piepho, et al. (2022), and this article shows 
θASV
m = (n − 1)−1σ2
m tr(ZmZT
mPm) that some ASV variance components (e.g. additive genetic vari-
􏼢 􏼣
(n − n−1 􏽐nm ance, a single marker variance) can conveniently be obtained by 
h=1 n2
g : m )
= h σ2
m (5) 
n − 1 scaling the variance–covariance matrices for the specific random 
effects in the model directly. 
= kmσ̂2
m
Simulations confirm that ASV yields accurate 
where Pm = I − n −1
m 1 T
nm 1n is the idempotent mean-centering estimates of all genetic variance components and 
m 
marker genotype design matrix, nm is the number of marker gen- ratios 
otypes, and ng : mh is the number of entries nested in the hth marker As shown in previous studies (Piepho 2019; Feldmann, Piepho, 
genotype. We are factor-coding marker genotypes in these ana- Bridges, et al. 2021; Feldmann, Piepho, et al. 2022), ASV is ideal 
lyses and the marker genotypes are treated as discrete categorical for estimating the variance explained by both single loci and 
values instead of continuous values (dosage). It is possible to GRMs. In our simulations, we included variation in population  
Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023
6 | G3, 2023, Vol. 00, No. 0 
n = 500 n = 1,000 n = 1,814
1.0
0.5
0.0
−0.5
1.0
0.5
0.0
−0.5
1.0
0.5
0.0
−0.5
2
Ĥ2 2 2 2
ĥ2 ŝ ŝ
g Ĥ2 2 2 2 2 2
m g ŝm R ŝGR Ĥ ĥg Ĥ2 ŝ ŝ
m g ŝm R ŝ2 2 2 2 2
GR Ĥ ĥ ŝ ŝ2
g Ĥm g m ŝ2 2
R ŝGR
Variances and Ratios
Fig. 1. Effect of n and r on the relative bias of variance components and ratios in simulated outbred populations. Phenotypic observations were simulated 
for 100 samples with n = 500, 1,000, and 1,814 (left to right) genotyped for m = 5, 000 SNPs and the average heterozygosity H = 0.38. The relative bias of 
marker heritability, genomic heritability estimates (ĥ2
g), broad sense heritability, genomic variance, marker variance, residual genetic variance, and 
residual variance heritability when the number of replicates of each entry (r) =1 (upper panel), 2 (middle panel), and 4 (lower panel). Each box’s upper and 
lower halves correspond to the first and third quartiles (the 25th and 75th percentiles). The notch corresponds to the median (the 50th percentile). The 
upper whisker extends from the box to the highest value within 1.5 × IQR of the third quartile, where IQR is the inter-quartile range or distance between 
the first and third quartiles. The lower whisker extends from the first quartile to the lowest value within 1.5 × IQR of the quartile. The dashed line in each 
plot is the true value from simulations.   
size, e.g. n = 500, 1,000, and 1,814, and replication of entries, e.g. nonadditive polygenic terms, and achieve fully efficient stagewise 
r = 1, 2, and 4 for both outbred (Fig. 1) and inbred populations analysis. Depending on the population, trait, environment, etc., 
(Fig. 2). We can see that the same pattern has emerged as in pre- the unique components of the models demonstrated here can be 
vious studies; the ASV approach yields accurate and consistent combined to accurately and holistically decompose the multitude 
estimates of variance components and variance component ratios of potential sources of genetic variation. The code to execute these 
from LMM analyses regardless of the constitution of the popula- models using the R/sommer > = v4.2.0 (Covarrubias-Pazaran 2016) 
tion or the study design. Even when there is only one replicate is provided in the Methods & Materials section. 
per entry (r = 1), all explanatory genetic terms are accurately par-
titioned from the total variance. As n increased from 500 to 1,814, Extension #1: Incorporating multiple target loci and 
the precision of estimates increased dramatically (the sampling locus–locus interactions 
variance decreased). Increasing r from 1 to 4 did not affect the pre- It is common for multiple QTL to be implicated from genetic stud-
cision or accuracy of genomic and marker-associated variances. ies (Rutkoski et al. 2014; Lopdell et al. 2019; Rice and Lipka 2019), 
However, increased numbers of replicates did improve the preci- the utility of which is not always certain (Bernardo 2001, 2004). 
sion of residual variance components. This is because entries While the simulations in this paper rely exclusively on LMM (1), 
are replicated among plots (n · r), but markers and other genetic this model can be easily expanded to include multiple explana-
components are replicated among entries (n). Our simulations, tory loci and their interactions or statistical epistasis 
in conjunction with our previous results (Piepho 2019;  (Álvarez-Castro and Carlborg 2007), as demonstrated by  
Feldmann, Piepho, Bridges, et al. 2021; Feldmann, Piepho, et al. Feldmann, Piepho, Bridges, et al. (2021). For example, the LMM 
2022), demonstrate that in most populations—human, animal, with three main-effect loci is 
plant, or microbe—the ASV will yield accurate and easily inter-
preted estimates of different variance components. 􏽘3 􏽘2 􏽘3
y̅ = 1nμ + Zm m +
i i Zm m
ij ij
LMM extensions incorporating the ASV i = 1 i=1 j=i+1 (8) 
While an important model, LMM (1), only covers a narrow scope of + Zm123 m123 + Ig + IgR + ϵ̅
the possible genetic models and experiments, we want to provide re-
searchers with a clear strategy for expanding this approach to more where mi is the random effect of the ith main-effect marker, mij is 
complex systems. This section demonstrates how to partition the the random effect of the two-way interaction between the ith and 
additive and dominance variance from a single marker, incorporate jth markers, and m123 is the random effect of the three-way inter-
multiple explanatory loci, their interactions into the model, and action between the three main-effect loci. Zm , Zm , and Z
ij m123 are 
i
Relative Bias
Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023
r = 1 r = 2 r = 4
M. J. Feldmann et al. | 7 
n = 500 n = 1,000 n = 1,814
1.0
0.5
0.0
−0.5
−1.0
1.0
0.5
0.0
−0.5
−1.0
1.0
0.5
0.0
−0.5
−1.0
2 2
Ĥ2 2 2 2
ĥ2 ŝ ŝ
g Ĥ2
m g ŝm R ŝ2 2 2 2 2
GR Ĥ ĥg Ĥ2 ŝ ŝ
m g ŝm R ŝ2 2 2 2 2
GR Ĥ ĥ ŝ ŝ
g Ĥm g m ŝ2 2
R ŝGR
Variances and Ratios
Fig. 2. Effect of n and r on the relative bias of variance components and ratios in simulated inbred populations. Phenotypic observations were simulated 
for 100 samples with n = 500, 1,000, and 1,814 (left to right) genotyped for m = 5, 000 SNPs and the average heterozygosity H = 0. The relative bias of marker 
heritability, genomic heritability estimates (ĥ2
g), broad sense heritability, genomic variance, marker variance, residual genetic variance, and residual 
variance heritability when the number of replicates of each entry (r) =1 (upper panel), 2 (middle panel), and 4 (lower panel). Each box’s upper and lower 
halves correspond to the first and third quartiles (the 25th and 75th percentiles). The notch corresponds to the median (the 50th percentile). The upper 
whisker extends from the box to the highest value within 1.5 × IQR of the third quartile, where IQR is the inter-quartile range or distance between the first 
and third quartiles. The lower whisker extends from the first quartile to the lowest value within 1.5 × IQR of the quartile. The dashed line in each plot is the 
true value from simulations.   
design matrices that link levels of the explanatory marker and in- variance can be significant, and partitioning the additive and 
teractions to LSMs in y. The rest of the terms have the same defi- dominance sources of variance from significant markers can be 
nitions. LMM (8) follows directly from Equation (1) and the results useful in hybrid crop breeding and disease risk prognoses. Our 
from Feldmann, Piepho, Bridges, et al. (2021), specifically the two goal is to partition θASV
m , the variance explained by a focal locus, 
and three loci examples. into its additive (θASV
m ) and dominance (θASV
A m ) components. 
D
Since we are factor-coding marker genotypes in these models, Here, we demonstrate an LMM that can partition the main- 
that is we are thinking of the marker genotypes as discrete cat- effect marker’s additive and dominance sources of variance by 
egorical values instead of continuous values (dosage), it is possible transforming the marker genotypes into two factors. The form 
to fully saturate the multilocus interaction with more levels than of the linear mixed model (LMM) for this analysis assuming only 
are observed in a given data set. Hence, it is important to consider one explanatory marker is 
the number of interaction terms evaluated. In this situation, 
packages such as lme4::lmer() will report an error that the y̅ = 1μ + ZmA mA + ZmD mD + Ig + IgR + ϵ̅ (9) 
“number of levels of each grouping factor must be < number of where mA is the random additive effect of the main-effect locus 
[LSMs]” (Bates et al. 2015). Further, these models assume that ran-
with mA ∼ N (0, Iσ2
m ) and mD is the random dominance effect of 
dom effects are independent, so we do not advise incorporating A
main effects from the SNPs used to define a target QTL region. the main-effect locus with mD ∼ N (0, Iσ2
m ). ZmA is an n × 3 design 
D
Instead, it is possible to factor code a pseudohaplotype from the matrix linking marker genotypes to LSMs and ZmD is an n × 2 de-
best markers bracketing a QTL to capture the variance explained sign matrix linking genotypic state, either homozygous (AA and 
by that locus, which can be more informative than a single SNP. aa) or heterozygous (Aa), to LSMs. For example, the ZmA and ZmD 
This approach assumes that SNPs used to define a QTL region design matrices for five individuals (rows) with marker genotypes 
are not independent and do not fully saturate the model. at a focal locus of [AA, Aa, Aa, aa, aa] = [1, 0, 0, − 1, − 1] are 
⎡ ⎤ ⎡ ⎤
Extension #2: Partitioning θASV
m into additive (θASV
m ) and 1 0 0 1 0
A
dominance (θASV ⎢ ⎥
m ) components ⎢ 0 1 0 ⎢ ⎥
⎥ ⎢0 1⎥
D ZmA = ⎢ 0 1 0⎥⎢ ⎥, ZmD = ⎢ ⎥
⎢0 1⎥ (10) 
The factor-coding of the Mendelian and oligogenic markers is a ⎣ 0 0 1⎦ ⎣1 0⎦
different approach than is standard in GWAS. In GWAS, markers 0 0 1 1 0
are typically treated as fixed and coded as continuous values, e.g. 
the dosage model. Assuming that a researcher is working with an Other terms are defined in LMM (1). This extension is a partition 
outbred species and the heterozygosity (H) ≠ 0, the dominance of Equation (1). So we expect that Equations (1) and (9) are 
Relative Bias
Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023
r = 1 r = 2 r = 4
8 | G3, 2023, Vol. 00, No. 0 
equivalent, except that Equation (9) will yield a variance compo- Single Stage Analysis
nent for each of the additive and dominance terms, while 1.0
Equation (1) only yield the total genetic variance. 
The ASV estimate of the additive variance explained by a locus 0.5
is obtained as in Equation (5) by 
0.0
θ̂ASV
m = (n − 1)−1σ̂2
m tr(Z T
A A mA Zm P
A mA )
⎡ 􏽐n ⎤
n − n−1 mD −0.5
n2
= h=1 g : m (11) 
⎣ Ah⎦σ̂2
n − 1 mA
−1.0
2 2 2 2 2
ŝm ŝg ŝG ŝ
R GxE ŝR
where PmA = I − n−1 T
m 1
A nm 1n , n a e t e n m
A m mA r  h  u ber of levels coding 
A
the marker additive effects, ng : mA is the number of entries nested Stage−wise Analysis
h 
in the hth marker genotype (Feldmann, Piepho, Bridges, et al. 1.0
2021). The average semivariance estimate of the dominance vari-
ance explained by a locus is obtained by 0.5
θ̂ASV
m = 0.0
(n − 1)−1σ̂2
m tr(Z T
D D mD Zm P
D mD )
⎡ 􏽐n ⎤
n − n−1 mD 2
j=1 ng : m (12) −0.5
D
= ⎣ j⎦σ̂2
n − 1 mD
−1.0
2 2 2 2 2
where PmD = I − n−1 T ŝm ŝg ŝG ŝ ŝ
m 1
D nm 1n , n a e t e n m
m mD r  h  u ber of levels coding R GxE R
D D Variance Component
the genetic status, e.g. homozygous or heterozygous, ng : mD is 
i 
the number of entries nested in the jth genetic state. The sum of Fig. 3. Comparison of VCEs estimated from single-stage and stagewise 
[kmA σ̂2
m + k A V
mD σ̂2
m ] = [θ̂ S θ̂ S
D m + A V
m ] = θ̂ASV and analyses of 500 entries replicated in four environments with block effects. 
A A D m 
[θ̂ASV
m + θ̂ASV θ̂ S = 2 2 ×
A m ] − A V . 1 10−5. θ̂ASV The relative bias of genomic variance (σ̂2
g), marker variance (σ̂2
m), residual 
D m m is an accurate and consist- genetic variance (σ̂2
g ), genotype-by-environment interaction variance 
R
ent estimate of the variance explained by a marker (Feldmann, (σ̂2
G×E), and residual variance (σ̂2
R) analyzed in a single stage (upper panel) 
Piepho, Bridges, et al. 2021). The likelihood ratio (LR) between or stagewise stages (lower panel). Each box’s upper and lower halves 
LMM (1) and (9) was LR ≈ 0. It was not significant in any simulated correspond to the first and third quartiles (the 25th and 75th percentiles). 
The notch corresponds to the median (the 50th percentile). The upper 
populations (PLR > 0.2), suggesting that there is no appreciable dif- whisker extends from the box to the highest value within 1.5 × IQR of the 
ference between the model likelihood of Equations (1) and (9). For third quartile, where IQR is the inter-quartile range or distance between 
each term, θ̂ASV
m and θ̂ASV
A m , the average bias’ across the 100 simu- the first and third quartiles. The lower whisker extends from the first 
 D quartile to the lowest value within 1.5 × IQR of the quartile. The dashed 
lated populations was 1.06% and −1.24%, respectively. line in each plot is the true value from simulations.   
Extension #3: Incorporating additional polygenic terms for 
genome-wide dominance (gD) substantively different than both of the matrices proposed by  
LMM (1) can also be extended to include both additive (gA) and Nishio and Satoh (2014) and Su et al. (2012). 
dominance (gD) sources of genomic variance (Vitezica et al. 2013,  Feldmann, Piepho, et al. (2022) showed that, regardless of popu-
2017; Zhang et al. 2021). The form of the LMM for analysis with lation quality, a GRM with an average diagonal value of 1 and an 
both gA and gD assuming only one explanatory marker M is average element value of 0 will produce consistent variance com-
ponent estimates of the genomic variance. A matrix with the same 
y̅ = 1μ + Zmm + IgA + IgD + IgR + ϵ̅ (13) properties calculated from a dominance coding will produce simi-
larly unbiased parameter estimates. The dominance GRM pro-
where gA and gD are random effect vectors for the additive and dom- posed by Su et al. (2012) has an average diagonal value of 1, but 
inance polygenic effects, respectively, with gA ∼ N (0, KASVσ2
g ) and the average element value is >0, leading to a systematic under-
A estimating since the covariances are overestimated. The domin-
gD ∼ N (0, KD
ASVσ2
g ). The ASV dominance kernel is 
D ance GRM proposed by Nishio and Satoh (2014) has an average 
element value of 0, but the average element value is <1, leading 
KD W̅W̅T
ASV =  
(n − 1)− (14) to a systematic overestimating since the variances (diagonals) 
1 tr(W̅W̅T
) are underestimated. This is true for a wide range of population 
heterozygosities. 
where W = 1 − |X|, assuming X is coded [ − 1, 0, 1], and W̅ = PW. 
This is a feasible approach to improve genetic performance in cross- Extension #4: Stagewise LMM analysis for 
bred populations with large dominance genetic variation (Nishio multienvironment trials (METs) and meta-analysis 
and Satoh 2014; Vitezica et al. 2017; Xiang et al. 2018). Both KASV 
Stagewise analyses are common in plant breeding trials in aca-
and KD
ASV have the matrix properties proposed by Speed and 
􏽐 demic studies and the seed industry (Damesa et al. 2017, 2019). 
Balding (2015); i.e. n−1 tr(K) = 1 and n−2 􏽐
i j Kij = 0. The dominance 
One reason for this is that plant breeders are often not interested 
variance estimated with KD
ASV was accurate, and the relative bias in the performance per se of a line or hybrid within a specific loca-
from 100 simulated populations was −3.32%. Interestingly, KD
ASV is tion unless the presence of cross-over (e.g. rank change) G × E is 
Relative Bias Relative Bias
Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023
M. J. Feldmann et al. | 9 
large enough to make data from one target environment unin- This approach is very common in plant sciences because it 
formative in another set of target environments. Instead, plant is simple but problematic outside a specific set of unrealistic 
breeders are often more interested in the performance of entries conditions, i.e. IID entry means. It is simple because it does 
across environments (Bernardo 2020). It is common then to fit a not require any information on precision from stage 1. It is 
first model that accounts for the variation of random design ele- problematic because the residual and genotype-by- 
ments, e.g. locations, years, blocks, and fixed genotype effects, environment variances are confounded. The naive approach 
to obtain the phenotype—estimated marginal means (EMMs) or does not require additional arguments for LMM software 
best linear unbiased estimators (BLUEs)—for use in subsequent and can be executed in any LMM software. 
analyses. In subsequent stages, these entry-means within envir- 2) In a weighted stagewise analysis, the Ω̃ = D(ω)−1
n matrix is diag-
ge 
onments in a subdivision are used as the response variable. onal, but each diagonal element may differ based on data- 
In general, the single-stage analysis, when performed correctly, driven weight (D(ω)n ), where D(ω)n is an n -
ge ge ge dimension di-
should be considered the “gold standard.” However, there are ex-
perimental conditions where the stagewise analysis may be sim- agonal matrix, estimated in the first stage of the analysis. 
pler to execute and functionally equivalent to the single-stage Importantly, these weights are derived as one of many pos-
analysis when performed correctly (Fig. 3), and, given the fre- sible diagonal approximations of Ωnge (Móhring and Piepho 
quency of naive stagewise analyses—those that fail to incorporate 2009) or its inverse, e.g. Ω−1
n , from the first stage of the ana-
ge
the variance–covariance matrix of entry means, or even appropri- lysis (Smith et al. 2001). The weighted approach may take 
ate weights, from stage 1 into the second stage—we felt it prudent multiple forms that may or may not neglect the covariances 
to highlight the simplicity of these approaches to a general audi- among entry mean, leading to discrepancies between the 
ence. The purpose here is not to convince the reader that multi- single-stage and stagewise analyses (Smith et al. 2001,  
stage analyses are superior (they are not), nor to provide a 2005; Móhring and Piepho 2009). This approach requires an 
one-size-fits-all solution for every experiment (that is impossible), additional argument in LMM software, typically “weights,” 
but to provide a path for users to accomplish fully efficient, multi- which is input as a vector corresponding to entry means 
stage analyses using free, open-source software. Generally, the and internally transformed into a diagonal matrix, and can 
stagewise analysis should be considered a possible backup to be executed in several free or paid software (Inc. 2013;  
the single-stage analysis, not the standard (Schulz-Streeck et al. Covarrubias-Pazaran 2016; Butler 2021).  
2013; Gogel et al. 2018; Damesa et al. 2019; Buntaran et al. 2020). 3) In a fully efficient stagewise analysis, entry means are allowed 
The LMM for stage one is to have nondiagonal covariance structures with Ω̃ = Ωnge , 
where Ωnge is the full variance–covariance matrix of entry 
ye = Xeg∗e + Zeue + ϵe (15) means from the different environments [defined in 
Equation (16)]. This approach is the most general solution 
where Xe is the fixed effect design matrix linking observations to for implementing stagewise meta-analyses, maintaining 
entries, and g∗e are the fixed effects (e.g. BLUEs) for the entries in all variances and covariances without approximation, but 
the eth environment, Ze is the random effect design matrix for de- it is the most limited in terms of software implementations. 
sign (e.g. blocks) elements within each environment (e.g. years The full variance–covariance matrix of the entry means will 
and locations), and ϵe are the residuals and ϵe ∼ N (0, Re), where be nondiagonal in most cases, and a diagonal matrix 
Re is the residual variance–covariance matrix estimated in the (weighted or unweighted) is almost invariably an approxi-
eth environment. This model is fit within each environment mation as the random main effects of the environment, or 
independently. block, will induce a positive covariance among all entry 
From these models, we obtain the adjusted entry means y̅ and means. Incorporating the full variance–covariance matrix 
the variance–covariance matrices of the entry means Σe from requires a lot of additional data and significantly reduces 
each of ne environments, where ne are the number of environ- the computational efficiency of the LMM, which may out-
ments. We can then construct the nge × nge block-diagonal stage weigh potential practice benefits. This approach requires 
one Ω matrix as in Equation (16). the inverse of the full variance–covariance matrix, Ω−1
n , as an 
ge
⎡ ⎤ input argument and can now be executed in R/sommer > = 
Σ1 · · · 0
v4.2.0 (Covarrubias-Pazaran 2016).  
Ωnge = ⎢ .. . . ⎥
⎣ . . . .. ⎦ (16) 
0 · · · Σn The LMM for stage two is then: 
e
where nge is the number of entries nested in environments; for ex- y̅ = 1μ + Xe + Zmm + Zgg + ZgR gR + ZGEgGE + ϵ̅ (17) 
ample, if there are 500 entries in four environments, nge = 2, 000. 
This method allows us to carry the full Ωnge over from stage one where y̅ are the adjusted entry means from stage one, μ is the 
to stage two of the analysis. population mean, X is the fixed effect design matrix linking envir-
The second stage can take several forms with varying complex- onments to adjusted entry means, e are the fixed environmental 
ities, more complete approximations of Ωnge or Ω−1
n , and software main effects, g is the random additive genetic effect associated 
ge
accessibility. Briefly:  with the genome-wide framework of marker excluding m with 
g ∼ N (0, KASVσ2
g), gR is the random residual genetic term—the por-
1) In a naive stagewise analysis, the residual matrix is given as tion of the total genetic effect not accounted for by m or g—with 
the identity matrix multiplied by a scalar gR ∼ N (0, Inσ2
g ), gGE is the genotype-by-environment interaction 
R
(Ω̃ = Inge ω−1 = Inge σ2), where ω is a scalar (σ−2) estimated by term with gGE ∼ N (0, Inge σ2
g ), and ϵ̅ is the structured residual 
GE
the second stage LMM, assuming that the variances for entry term from stage one with ̅ϵ ∼ N (0, Ω) With this model, we can es-
means are identical with 0 covariances (independent); IID. timate the breeding values across environments with marker  
Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023
10 | G3, 2023, Vol. 00, No. 0 
information (K) as in GBLUP and can perform GWAS by adding an genetics, particularly as it relates to genomic relatedness, genom-
􏽐
iterative term for single marker regression, such as j
i=1 βixi where ic heritability, and GP (VanRaden 2008; Kang et al. 2010; Yang et al. 
j is the number of markers, βi is the linear regression coefficient of 2010; Habier et al. 2013) but it also deviates from the classic quan-
the ith marker, and xi is the numeric coding of the ith markers gen- titative genetic model conceptually in that it assumes that marker 
otypes, e.g. [ − 1, 0, 1]. effects are random variables (Falconer and Mackay 1996; Lynch 
We created 1,000 simulated populations with 1,000 entries and and Walsh 1998). Despite the conceptual deviation, this approach 
5,000 markers using a similar approach to the other simulations in has been demonstrated to have statistically valid assumptions 
this experiment. However, we included Environmental and Block and applied in several studies (Verbyla et al. 2012; Schreck et al. 
within Environment effects in this experiment. We estimate the 2019; Taylor et al. 2023). 
variance explained by the polygenic background, a large-effect lo- ASV has several beneficial elements, making it a viable option 
cus, the residual genetic variance, the genotype-by-environment for quantitative genetics. More importantly, it is appropriate for 
interaction variance, and the nongenetic residual. The single any quantitative discipline where variance components are of 
stage analysis yielded relative biases of −1.55%, −3.04%, −0.45%, interest, from plant and microbial biology to psychology and in-
−0.12%, and 0.03% for the marker variance (σ̂2
m), genomic variance fant research. Namely:  
(σ̂2
g), residual genetic variance (σ̂2
g ), genotype-by-environment 
R
interaction variance (σ̂2
g ), and residual variance (σ̂2
ϵ̅ ), respectively 1) The definitions of the variance components using ASV are additive 
GE
(Fig. 3). The two stage analysis yielded relative biases of −1.39%, and sum to the phenotypic variance. Consequently, the LMM can 
−3.09%, 0.48%, −0.21%, and 0.03% for the marker variance (σ̂2
m), be extended to incorporate many explanatory components, 
genomic variance (σ̂2
g), residual genetic variance (σ̂2
g ), e.g. dominance, epistasis, and transcriptomic, and will yield 
R
genotype-by-environment interaction variance (σ̂2
g ), and residual accurate VCEs for all terms. They will sum to the total vari-
GE
variance (σ̂2
ϵ̅ ), respectively (Fig. 3). ance. This is not necessarily true for all definitions of vari-
ance components, such as the Average Marginal Variance 
Extension #5: Incorporating kM directly into LMM analyses (Piepho 2019; Feldmann, Piepho, Bridges, et al. 2021).  
2) ASV is well suited for stagewise analyses. At the center of ASV is 
Feldmann, Piepho, Bridges, et al. (2021) introduced kM (5) as a post the idea that the “entry mean” is the phenotype per se, and 
hoc adjustment of the REML estimated variance explained by a not the observations directly. One interpretation is that indi-
marker to obtain ASV equivalent VCEs. This led to Feldmann, viduals, not observations, are the primary source of vari-
Piepho, et al. (2022), who showed that ASV estimates of the genom- ation or at least the primary source of interest. This 
ic variance could be obtained by scaling the genomic relationship concept can be easily extracted from single-stage analyses 
before or after the LMM analysis and introduced KASV, eliminating but seems at the heart of stagewise analyses (Piepho et al. 
the need for any post hoc adjustment. Scaling variance compo- 2012). Specifically, a single-stage analysis based on plot 
nents a priori is not novel and is routine in genomic evaluation data can be shown to be equivalent to a stagewise analysis 
across species (VanRaden 2008; Astle and Balding 2009; Yang in which entry means and their associated variance–covari-
et al. 2010; Endelman and Jannink 2012; Legarra 2016; Vitezica ance matrix is carried forward to the second stage, in which 
et al. 2017). LMMs can directly scale the variance–covariance ma- BLUPs are computed for the genetic effects (Piepho et al. 
trix for large-effect loci M by kM. 2012). ASV yields accurate estimates of the genetic and gen-
Instead, if we define: omic variance components in unreplicated or partially repli-
cated designs common in humans, plants, and animals. ASV 
KM = k−1
M InM (18) also yields accurate VCEs in fully efficient multistage 
approaches.  
where KM is nM × nM and nM is the number of marker genotypes at a 3) ASV does not impact the BLUPs or breeding value predictions in 
given locus, we can essentially think of KM as a genomic relation- Genomic (G)-BLUP. ASV is only used to obtain accurate VCEs. 
ship matrix, e.g. KASV, except that we apply KM to the levels of the It has been demonstrated that marker coding and different 
marker genotype instead of entries. strategies for scaling and centering Z and K do not impact 
The form of the LMM for this analysis assuming only one ex- BLUPs or prediction accuracy (Strandén and Christensen 
planatory marker is the same as Equation (1), but where m is 2011; Legarra 2016; Feldmann, Piepho, et al. 2022) and be-
the random effect of the main-effect locus with m ∼ N (0, KMσ2
m). cause ASV essentially works through a set of scalar coeffi-
With this approach, we maintain the levels of the factor come cients determined by the experiment and population to 
from the same variance and zero covariance, but our scaling fac- obtain the expected features for the genomic relationship 
tor is embedded directly in the model eliminating the need for ad- matrix. Practically, ASV does not change the information 
justment. Embedding kM in the LMM analysis using KM is embedded in the LMM or data, only the scaling of the VCEs.  
equivalent to the post hoc adjustment that was proposed in  4) ASV works under many model assumptions in GLMM analyses. 
Feldmann, Piepho, Bridges, et al. (2021), and so it is up to the Beyond the often-assumed variance–covariance structure 
user to determine which approach they prefer. in this study, e.g. R = Iσ2
ϵ , many structures will lead to non-
zero covariance between entry means. ASV can be applied 
Conclusions to designs accounting for spatial structures with auto- 
regressive correlations or spline-models (Rodríguez-Álvarez 
ASV is a strategy that can be used for estimating and partitioning et al. 2018; Selle et al. 2019). ASV can also be applied to data 
the total variance into components (Piepho 2019), such as the vari- sets where the observational units lead to nonnormality of 
ance explained by loci and locus–locus (Feldmann, Piepho, residuals, i.e. ordinal disease scores and proportion scores 
Bridges, et al. 2021) and the genomic variance (Feldmann, (Piepho 2019).  
Piepho, et al. 2022). The approach we are suggesting shares some As substantiated by our simulations in this study and the con-
common threads with the current thinking in quantitative text of our previous studies, ASV with REML estimation of the  
Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023
M. J. Feldmann et al. | 11 
underlying variance components yields accurate estimates for Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects 
oligo- and polygenic effect, both individually and collectively, models using lme4. J Stat Softw. 2015;67:1–48. doi:10.18637/jss. 
and BLUPs of the additive and dominance effects of marker loci v067.i01 
(Piepho 2019; Feldmann, Piepho, Bridges, et al. 2021; Feldmann, Bernardo R. What if we knew all the genes for a quantitative trait 
Piepho, et al. 2022). ASV directly yields accurate estimates of gen- in hybrid crops? Crop Sci. 2001;41:1–4. doi:10.2135/cropsci2001.4111 
omic heritability in the observed population and can be used to Bernardo R. What proportion of declared QTL in plants are false? Theor 
adjust deviations that arise from other commonly used methods Appl Genet. 2004;109:419–424. doi:10.1007/s00122-004-1639-3 
for calculating genomic relationships regardless of the population Bernardo R. Genomewide selection when major genes are known. 
constitution, such as inbred lines and F1 hybrids, unstructured Crop Sci. 2014;54:68–75. doi:10.2135/cropsci2013.05.0315 
GWAS populations, or animal herds and flocks. We believe that Bernardo R. Bandwagons I, too, have known. Theor Appl Genet. 2016; 
KASV provides a powerful approach for directly estimating genom- 129:2323–2332. doi:10.1007/s00122-016-2772-5 
ic heritability for the observed population regardless of study or- Bernardo R. Reinventing quantitative genetics for plant breeding: 
ganism or experiment design (Visscher et al. 2006, 2008, 2010). In something old, something new, something borrowed, something 
conclusion, we recommend that genetics researchers studying blue. Heredity. 2020;125:375–385. doi:10.1038/s41437-020-0312-1 
humans, microbes, or (un)domesticated plants and animals con- Brandariz SP, Bernardo R. Small ad hoc versus large general training 
sider the ASV approach. populations for genomewide selection in maize biparental 
crosses. Theor Appl Genet. 2019;132:347–353. doi:10.1007/ 
s00122-018-3222-3 
Data availability Buntaran H, Piepho HP, Schmidt P, Rydén J, Halling M, Forkman J. 
Cross-validation of stagewise mixed-model analysis of Swedish 
Code and output for simulations are provided in the publicly avail-
variety trials with winter wheat and spring barley. Crop Sci. 
able Zenodo repository (https://doi.org/10.5281/zenodo.6981359) 
2020;60:2221–2240. doi:10.1002/csc2.20177 
(Feldmann, Covarrubias-Pazaran, et al. 2022). 
Butler D. asreml: Fits the Linear Mixed Model. R package version 
4.1.0.160; 2021. 
Covarrubias-Pazaran G. Changes and FAQs for the sommer package; 
Acknowledgements 2023. https://r-mirror.zim.unidue.de/web/packages/sommer/ 
MJF thanks Steven J. Knapp for guidance, support, and vignettes/v2.sommer.changes.and.faqs.pdf 
mentorship. Covarrubias-Pazaran G. Genome-assisted prediction of quantitative 
traits using the R package sommer. PLoS ONE. 2016;11:e0156744. 
doi:10.1371/journal.pone.0156744 
Funding Damesa TM, Hartung J, Gowda M, Beyene Y, Das B, Semagn K, Piepho 
HP. Comparison of weighted and unweighted stage-wise analysis 
MJF and this research were supported by grants to Steven J. Knapp for genome-wide association studies and genomic selection. Crop 
from the United States Department of Agriculture (http://dx.doi. Sci. 2019;59:2572–2584. doi:10.2135/cropsci2019.04.0209 
org/10.13039/100000199), National Institute of Food and Damesa TM, Möhring J, Worku M, Piepho HP. One step at a time: 
Agriculture (NIFA) Specialty Crops Research Initiative (# stage-wise analysis of a series of experiments. Agron J. 2017; 
2017-51181-26833 and # 2022-51181-38328), and California 109:845–857. doi:10.2134/agronj2016.07.0395 
Strawberry Commission (http://dx.doi.org/10.13039/100006760), de Los Campos G, Vazquez AI, Fernando R, Klimentidis YC, Sorensen 
in addition to funding from the University of California, Davis D. Prediction of complex human traits using the genomic best lin-
(http://dx.doi.org/10.13039/100007707). The German Research ear unbiased predictor. PLoS Genet. 2013;9:e1003608. doi:10. 
Foundation (DFG) supported H-PP, grant PI 377/24-1. The funders 1371/journal.pgen.1003608 
had no role in study design, data collection, analysis, publication Demmings EM, Williams BR, Lee CR, Barba P, Yang S, Hwang CF, 
decision, or manuscript preparation. Reisch BI, Chitwood DH, Londo JP. Quantitative trait locus ana-
lysis of leaf morphology indicates conserved shape loci in grape-
vine. Front Plant Sci. 2019;10:1373. doi:10.3389/fpls.2019.01373 
Conflicts of interest De Villemereuil P, Morrissey MB, Nakagawa S, Schielzeth H. 
The author(s) declare no conflict of interest. Fixed-effect variance and the estimation of repeatabilities and 
heritabilities: issues and solutions. J Evol Biol. 2018;31:621–632. 
doi:10.1111/jeb.13232 
Literature cited DeWitt N, Guedira M, Lauer E, Murphy JP, Marshall D, Mergoum M, 
Johnson J, Holland JB, Brown-Guedira G. Characterizing the oligo-
Álvarez-Castro JM, Carlborg O. A unified model for functional and genic architecture of plant growth phenotypes informs genomic 
statistical epistasis and its application in quantitative trait loci selection approaches in a common wheat population. BMC 
analysis. Genetics. 2007;176:1151–1167. Genomics. 2021;22:1–18. doi:10.1186/s12864-021-07574-6 
Anderson JA, Chao S, Liu S. Molecular breeding using a major QTL for Dias K, Piepho H, Guimarães L, Guimarães PdO, Parentoni S, Pinto 
fusarium head blight resistance in wheat. Crop Sci. 2007;47: MdO, Noda R, Magalhães J, Guimarães C, Garcia A, et al. Novel 
1–112. doi:10.2135/cropsci2006.05.0359 strategies for genomic prediction of untested single-cross maize 
Ariyomo TO, Carter M, Watt PJ. Heritability of boldness and aggres- hybrids using unbalanced historical data. Theor Appl Genet. 
siveness in the zebrafish. Behav Genet. 2013;43:161–167. doi:10. 2020;133:443–455. doi:10.1007/s00122-019-03475-1 
1007/s10519-013-9585-y Eichler EE, Flint J, Gibson G, Kong A, Leal SM, Moore JH, Nadeau JH. 
Astle W, Balding DJ. Population structure and cryptic relatedness in Missing heritability and strategies for finding the underlying 
genetic association studies. Stat Sci. 2009;24:451–471. doi:10. causes of complex disease. Nat Rev Genet. 2010;11:446–450. doi: 
1214/09-STS307 10.1038/nrg2809  
Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023
12 | G3, 2023, Vol. 00, No. 0 
Endelman JB. Fully efficient, two-stage analysis of multi- Heslot N, Jannink JL, Sorrells ME. Using genomic prediction to char-
environment trials with directional dominance and multi-trait acterize environments and optimize prediction accuracy in ap-
genomic selection. bioRxiv, 2022. plied breeding data. Crop Sci. 2013;53:921–933. doi:10.2135/ 
Endelman JB. Ridge regression and other kernels for genomic selec- cropsci2012.07.0420 
tion with R package rrBLUP. Plant Genome. 2011;4:250–255. doi: Inc. SI. SAS/STAT 13.1 User’s Guide: Chapter 43—The GLIMMIX 
10.3835/plantgenome2011.08.0024 Procedure. Cary (NC): Author; 2013. 
Endelman JB, Jannink JL. Shrinkage estimation of the realized rela- Jensen J, Su G, Madsen P. Partitioning additive genetic variance into 
tionship matrix. G3: Genes, Genomes, Genet. 2012;2:1405–1413. genomic and remaining polygenic components for complex traits 
doi:10.1534/g3.112.004259 in dairy cattle. BMC Genet. 2012;13:44. doi:10.1186/1471-2156-13-44 
Falconer D, Mackay T. Introduction to Quantitative Genetics. Harlow Kang HM, Sul JH, Service SK, Zaitlen NA, Kong Sy, Freimer NB, Sabatti 
(UK): Longmans Green; 1996. C, Eskin E. Variance component model to account for sample 
Feldmann M, Covarrubias-Pazaran G, Piepho HP. Data for “complex structure in genome-wide association studies. Nat Genet. 2010; 
traits and candidate genes”; 2022. [accessed 2022 August 10].  42:348–354. doi:10.1038/ng.548 
https://doi.org/10.5281/zenodo.6981359 Lander ES, Schork NJ. Genetic dissection of complex traits. Science. 
Feldmann MJ, Piepho HP, Bridges WC, Knapp SJ. Average semivariance 1994;265:2037–2048. doi:10.1126/science.8091226 
yields accurate estimates of the fraction of marker-associated gen- Legarra A. Comparing estimates of genetic variance across different 
etic variance and heritability in complex trait analyses. PLoS Genet. relationship models. Theor Popul Biol. 2016;107:26–30. doi:10. 
2021;17:e1009762. doi:10.1371/journal.pgen.1009762 1016/j.tpb.2015.08.005 
Feldmann MJ, Piepho HP, Knapp SJ. Average semivariance directly Li B, VanRaden P, Null D, O’Connell J, Cole J. Major quantitative trait 
yields accurate estimates of the genomic variance in complex loci influencing milk production and conformation traits in 
trait analyses. G3 (Bethesda). 2022;12:jkac080. doi:10.1093/ Guernsey dairy cattle detected on Bos taurus autosome 19. J 
g3journal/jkac080 Dairy Sci. 2021;104:550–560. doi:10.3168/jds.2020-18766 
Galli G, Alves FC, Morosini JS, Fritsche-Neto R. On the usefulness of Li H, Wang J, Bao Z. A novel genomic selection method combining 
parental lines GWAS for predicting low heritability traits in trop- GBLUP and LASSO. Genetica. 2015;143:299–304. doi:10.1007/ 
ical maize hybrids. PLoS ONE. 2020;15:e0228724. doi:10.1371/ s10709-015-9826-5 
journal.pone.0228724 Lopdell TJ, Tiplady K, Couldrey C, Johnson TJ, Keehan M, Davis SR, 
Gaudet MM, Kirchhoff T, Green T, Vijai J, Korn JM, Guiducci C, Segrè Harris BL, Spelman RJ, Snell RG, Littlejohn MD. Multiple QTL 
AV, McGee K, McGuffog L, Kartsonaki C, et al. Common genetic underlie milk phenotypes at the CSF2RB locus. Genet Sel Evol. 
variants and modification of penetrance of BRCA2-associated 2019;51:3. doi:10.1186/s12711-019-0446-x 
breast cancer. PLoS Genet. 2010;6:e1001183. doi:10.1371/ Lynch M, Walsh B. Genetics and Analysis of Quantitative Traits. Vol. 
journal.pgen.1001183 1. Sunderland (MA): Sinauer; 1998. 
Gbur EE, et al. Analysis of Generalized Linear Mixed Models in the Mackay TF. Quantitative trait loci in Drosophila. Nat Rev Genet. 2001; 
Agricultural and Natural Resources Sciences. Vol. 156. Hoboken, 2:11–20. doi:10.1038/35047544 
NJ: John Wiley & Sons; 2020. Mackay TFC, Stone EA, Ayroles JF. The genetics of quantitative traits: 
Gogel B, Smith A, Cullis B. Comparison of a one-and two-stage mixed challenges and prospects. Nat Rev Genet. 2009;10:565–577. doi: 
model analysis of Australia’s national variety trial southern re- 10.1038/nrg2612 
gion wheat data. Euphytica. 2018;214:1–21. Meuwissen T, Hayes B, Goddard M. Genomic selection: a paradigm 
González-Barrios P, Díaz-García L, Gutiérrez L. Mega-environmental shift in animal breeding. Animal Front. 2016;6:6–14. doi:10. 
design: using genotype× environment interaction to optimize re- 2527/af.2016-0002 
sources for cultivar testing. Crop Sci. 2019;59:1899–1915. Móhring J, Piepho H. Comparison of weighting in two-stage analyses 
Habier D, Fernando RL, Dekkers JC. The impact of genetic relation- of series of experiments. Crop Sci. 2009;49:1988. 
ship information on genome-assisted breeding values. Genetics. Morota G, Gianola D. Kernel-based whole-genome prediction of com-
2007;177:2389–2397. doi:10.1534/genetics.107.081190 plex traits: a review. Front Genet. 2014;5:363. 
Habier D, Fernando RL, Garrick DJ. Genomic BLUP decoded: a look Nishio M, Satoh M. Including dominance effects in the genomic BLUP 
into the black box of genomic prediction. Genetics. 2013;194: method for genomic evaluation. PLoS ONE. 2014;9:e85792. doi:10. 
597–607. doi:10.1534/genetics.113.152207 1371/journal.pone.0085792 
Han K, Lee HY, Ro NY, Hur OS, Lee JH, Kwon JK, Kang BC. QTL map- Piepho HP. A coefficient of determination (R2) for generalized linear 
ping and GWAS reveal candidate genes controlling capsaicinoid mixed models. Biom J. 2019;61:860–872. 
content in capsicum. Plant Biotech J. 2018;16:1546–1558. doi:10. Piepho HP, Moehring J, Schulz-Streeck T, Ogutu JO. A stage-wise ap-
1111/pbi.12894 proach for the analysis of multi-environment trials. Biom J. 2012; 
Hayes B, Goddard ME. The distribution of the effects of genes affect- 54:844–860. doi:10.1002/bimj.201100219 
ing quantitative traits in livestock. Genet Sel Evol. 2001;33:1–21. Pincot DD, et al. Novel fusarium wilt resistance genes uncovered in 
doi:10.1186/1297-9686-33-3-209 natural and cultivated strawberry populations are found on three 
Hayes BJ, Pryce J, Chamberlain AJ, Bowman PJ, Goddard ME. Genetic non-homoeologous chromosomes. Theor Appl Genet. 2022; 
architecture of complex traits and accuracy of genomic predic- 135(6):1–25. 
tion: coat colour, milk-fat percentage, and type in holstein cattle Pincot DD, Hardigan MA, Cole GS, Famula RA, Henry PM, Gordon TR, 
as contrasting model traits. PLoS Genet. 2010;6:e1001139. doi:10. Knapp SJ. Accuracy of genomic selection and long-term genetic 
1371/journal.pgen.1001139 gain for resistance to verticillium wilt in strawberry. Plant 
Heffner EL, Lorenz AJ, Jannink JL, Sorrells ME. Plant breeding with Genome. 2020;13:e20054. doi:10.1002/tpg2.20054 
genomic selection: gain per unit time and cost. Crop Sci. 2010; Rice B, Lipka AE. Evaluation of rr-BLUP genomic selection models 
50:1681–1690. doi:10.2135/cropsci2009.11.0662 that incorporate peak genome-wide association study signals in 
Heffner EL, Sorrells ME, Jannink JL. Genomic selection for crop improve- maize and sorghum. Plant Genome. 2019;12:180052. doi:10. 
ment. Crop Sci. 2009;49:1–12. doi:10.2135/cropsci2008.08.0512 3835/plantgenome2018.07.0052  
Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023
M. J. Feldmann et al. | 13 
Rodríguez-Álvarez MX, Boer MP, van Eeuwijk FA, Eilers PH. association of complex traits in heterogeneous stock mice. Nat 
Correcting for spatial heterogeneity in plant breeding experi- Genet. 2006;38:879–887. doi:10.1038/ng1840 
ments with p-splines. Spat Stat. 2018;23:52–71. VanRaden PM. Efficient methods to compute genomic predictions. J 
Rutkoski JE, Poland JA, Singh RP, Huerta-Espino J, Bhavani S, Barbier Dairy Sci. 2008;91:4414–4423. doi:10.3168/jds.2007-0980 
H, Rouse MN, Jannink JL, Sorrells ME. Genomic selection for quan- Verbyla AP, Taylor JD, Verbyla KL. RWGAIM: an efficient high- 
titative adult plant stem rust resistance in wheat. Plant Genome. dimensional random whole genome average (QTL) interval map-
2014;7. doi:10.3835/plantgenome2014.02.0006 ping approach. Genet Res (Camb). 2012;94:291–306. doi:10.1017/ 
Schreck N, Piepho HP, Schlather M. Best prediction of the additive S0016672312000493 
genomic variance in random-effects models. Genetics. 2019; Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS 
213:379–394. doi:10.1534/genetics.119.302324 discovery. Am J Hum Genet. 2012;90:7–24. doi:10.1016/j.ajhg. 
Schulz-Streeck T, Ogutu JO, Piepho HP. Comparisons of single-stage 2011.11.029 
and two-stage approaches to genomic selection. Theor Appl Visscher PM, Hill WG, Wray NR. Heritability in the genomics era— 
Genet. 2013;126:69–82. doi:10.1007/s00122-012-1960-1 concepts and misconceptions. Nat Rev Genet. 2008;9:255–266. 
Searle SR, Casella G, McCulloch C. Variance Components. Hoboken, doi:10.1038/nrg2322 
NJ: John Wiley & Sons; 1992. Visscher PM, Medland SE, Ferreira MA, Morley KI, Zhu G, Cornes BK, 
Selle ML, Steinsland I, Hickey JM, Gorjanc G. Flexible modelling of Montgomery GW, Martin NG. Assumption-free estimation of her-
spatial variation in agricultural field trials with the R package itability from genome-wide identity-by-descent sharing between 
INLA. Theor Appl Genet. 2019;132:3277–3293. doi:10.1007/ full siblings. PLoS Genet. 2006;2:e41. doi:10.1371/journal.pgen. 
s00122-019-03424-y 0020041 
Septiningsih EM, Pamplona AM, Sanchez DL, Neeraja CN, Vergara Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, 
GV, Heuer S, Ismail AM, Mackill DJ. Development of Yang J. 10 years of GWAS discovery: biology, function, and trans-
submergence-tolerant rice cultivars: the sub1 locus and beyond. lation. Am J Hum Genet. 2017;101:5–22. doi:10.1016/j.ajhg.2017. 
Ann Bot. 2009;103:151–160. doi:10.1093/aob/mcn206 06.005 
Smith A, Cullis B, Gilmour A. Applications: the analysis of crop var- Visscher PM, Yang J, Goddard ME. A commentary on “Common SNPs 
iety evaluation data in Australia. Aust N Z J Stat. 2001;43: explain a large proportion of the heritability for human height” by 
129–145. doi:10.1111/1467-842X.00163 Yang et al. (2010). Twin Res Hum Genet. 2010;13:517–524. doi:10. 
Smith A, Cullis BR, Thompson R. The analysis of crop cultivar 1375/twin.13.6.517 
breeding and evaluation trials: an overview of current mixed Vitezica ZG, Legarra A, Toro MA, Varona L. Orthogonal estimates of 
model approaches. J Agric Sci. 2005;143:449–462. doi:10.1017/ variances for additive, dominance, and epistatic effects in popu-
S0021859605005587 lations. Genetics. 2017;206:1297–1307. doi:10.1534/genetics.116. 
Speed D, Balding DJ. Relatedness in the post-genomic era: is it still 199406 
useful? Nat Rev Genet. 2015;16:33–44. doi:10.1038/nrg3821 Vitezica ZG, Varona L, Legarra A. On the additive and dominant vari-
Spindel J, Begum H, Akdemir D, Collard B, Redoña E, Jannink J, ance and covariance of individuals within the genomic selection 
McCouch S. Genome-wide prediction models that incorporate scope. Genetics. 2013;195:1223–1230. doi:10.1534/genetics.113. 
de novo GWAS are a powerful new tool for tropical rice improve- 155176 
ment. Heredity. 2016;116:395–408. doi:10.1038/hdy.2015.113 Walsh B, Lynch M. Evolution and Selection of Quantitative Traits. 
Strandén I, Christensen OF. Allele coding in genomic evaluation. Hoboken, NJ: Oxford University Press; 2018. 
Genet Sel Evol. 2011;43:1–11. Wray NR, Kemper KE, Hayes BJ, Goddard ME, Visscher PM. Complex 
Stroup WW, Milliken GA, Claassen EA, Wolfinger RD. SAS for Mixed trait prediction from genome data: contrasting EBV in livestock to 
Models: Introduction and Basic Applications. Hoboken, NJ: SAS PRS in humans: genomic prediction. Genetics. 2019;211: 
Institute; 2018. 1131–1141. doi:10.1534/genetics.119.301859 
Su G, Christensen OF, Ostersen T, Henryon M, Lund MS. Estimating Xiang T, Christensen OF, Vitezica ZG, Legarra A. Genomic model with 
additive and non-additive genetic variances and predicting gen- correlation between additive and dominance effects. Genetics. 
etic merits using genome-wide dense single nucleotide poly- 2018;209:711–723. doi:10.1534/genetics.118.301015 
morphism markers. PLoS ONE. 2012;7:e45293. doi:10.1371/ Xin F, Zhu T, Wei S, Han Y, Zhao Y, Zhang D, Ma L, Ding Q. QTL map-
journal.pone.0045293 ping of kernel traits and validation of a major QTL for kernel 
Tang S, Leon A, Bridges WC, Knapp SJ. Quantitative trait loci for gen- length-width ratio using SNP and bulked segregant analysis in 
etically correlated seed traits are tightly linked to branching and wheat. Sci Rep. 2020;10:1–12. 
pericarp pigment loci in sunflower. Crop Sci. 2006;46:721–734. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, 
doi:10.2135/cropsci2005.0006-7 Madden PA, Heath AC, Martin NG, Montgomery GW. Common 
Taylor J, Jorgensen D, Moffat CS, Chalmers KJ, Fox R, Hollaway GJ, SNPs explain a large proportion of the heritability for human 
Cook MJ, Neate SM, See PT, Shankar M. An international wheat di- height. Nat Genet. 2010;42:565–569. doi:10.1038/ng.608 
versity panel reveals novel sources of genetic resistance to tan Zhang J, Liu F, Reif JC, Jiang Y. On the use of GBLUP and its extension 
spot in Australia. Theor Appl Genet. 2023;136:61. doi:10.1007/ for GWAS with additive and epistatic effects. G3: Genes, Genomes, 
s00122-023-04332-y Genetics. 2021;11(7):jkab122. doi:10.1093/g3journal/jkab122  
Valdar W, Solberg LC, Gauguier D, Burnett S, Klenerman P, Cookson 
WO, Taylor MS, Rawlins JNP, Mott R, Flint J. Genome-wide genetic Editor: A. Lipka   
Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023