G3, 2023, https://doi.org/10.1093/g3journal/jkad148 Advance Access Publication Date: 5 July 2023 Investigation Complex traits and candidate genes: estimation of genetic variance components across multiple genetic architectures Mitchell J. Feldmann,1,* Giovanny Covarrubias-Pazaran,2,3 Hans-Peter Piepho4 1Department of Plant Sciences, University of California Davis, One Shields Ave, Davis, CA 95616, USA 2International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, El Batán, 56130 Texcoco, Edo. de México, México 3Present address: International Rice Research Institute, Los Baños, 4031 Laguna, Philippines 4Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart 70599, Germany *Corresponding author: Department of Plant Science, University of California Davis, One Shields Ave, Davis, CA, 95616, USA. Email: mjfeldmann@ucdavis.edu Abstract Large-effect loci—those statistically significant loci discovered by genome-wide association studies or linkage mapping—associated with key traits segregate amidst a background of minor, often undetectable, genetic effects in wild and domesticated plants and animals. Accurately attributing mean differences and variance explained to the correct components in the linear mixed model analysis is vital for selecting superior progeny and parents in plant and animal breeding, gene therapy, and medical genetics in humans. Marker-assisted prediction and its successor, genomic prediction, have many advantages for selecting superior individuals and understanding disease risk. However, these two approaches are less often integrated to study complex traits with different genetic architectures. This simulation study demonstrates that the average semivariance can be applied to models incorporating Mendelian, oligogenic, and polygenic terms simultaneously and yields accurate estimates of the variance explained for all relevant variables. Our previous research focused on large- effect loci and polygenic variance separately. This work aims to synthesize and expand the average semivariance framework to various genetic architectures and the corresponding mixed models. This framework independently accounts for the effects of large-effect loci and the polygenic genetic background and is universally applicable to genetics studies in humans, plants, animals, and microbes. Keywords: average semivariance, linear mixed model, variance component estimation, polygenic inheritance, oligogenic inheritance, Mendelian inheritance Introduction value or genetic merit of a candidate individual (VanRaden 2008). For loci to provide actionable gains or diagnoses, they Today, linear mixed models (LMMs) are routinely applied in plant must explain a significant proportion of phenotypic and genetic breeding and quantitative genetics research. They are used for the variation in a population with alleles in segregation at target loci. prediction of genetic values in plants and animals (VanRaden Candidate gene discovery through genome-wide association 2008; Heffner et al. 2010; Meuwissen et al. 2016), or polygenic risk studies (GWAS) and quantitative trait locus (QTL) mapping is pro- scores (PRSs) in humans (de Los Campos et al. 2013; Wray et al. lific in plant and animal populations (Lander and Schork 1994; 2019), to estimate the heritability of traits in target populations Visscher et al. 2012, 2017). Despite decades of directional selection (Visscher et al. 2008; Legarra 2016), and to estimate ecological in many plant populations, loci impacting traits of interest still seg- and evolutionary genetic parameters of behavioral traits regate, even in advanced breeding materials. These genome-wide (Ariyomo et al. 2013; Walsh and Lynch 2018). Genetic values are analyses have implicated numerous genes and genomic regions constructed from a combination of genetic effects; including in controlling a wide variety of simple and complex traits Mendelian factors; which may have both additive and dominant (Anderson et al. 2007; Septiningsih et al. 2009; Han et al. 2018; sources of variance (Pincot et al. 2022), oligogenic factors consist- Demmings et al. 2019; Xin et al. 2020). However, the utility of such ing of few genetic factors and their epistatic interactions appropri- marker–trait associations may not be fully realized (Bernardo ate for marker-assisted prediction (MAP) (Tang et al. 2006), a 2004, 2016). Large-effect and statistically significant loci typically polygenic term consisting of a dense genome-wide framework of only explain a fraction of the genetic and phenotypic variance in markers assumed to have minor effects suitable for genomic pre- a population (Feldmann, Piepho, Bridges, et al. 2021), along with diction (GP); which may also account of additive and dominance the polygenic fraction (Feldmann, Piepho, et al. 2022), except in ex- sources of variance (Brandariz and Bernardo 2019), and a residual treme scenarios when Mendelian factors wholly control a trait. genetic term consisting of all genetic effects not accounted for by Discovered loci rarely, if ever, explain 100% of the genetic vari- the previous genetic factors (Rutkoski et al. 2014; Rice and Lipka ance, and understanding the multiple sources of variation and 2019; DeWitt et al. 2021). The ultimate objective in breeding appli- how they relate can help breeders and research prioritize targets cations is, typically, predicting the genotypic value, e.g. breeding and mitigate risk (Bernardo 2004, 2014). Genes with significant Received: April 27, 2023. Accepted: June 12, 2023 © The Author(s) 2023. Published by Oxford University Press on behalf of The Genetics Society of America. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023 2 | G3, 2023, Vol. 00, No. 0 effects often dominate the “nonmissing heritability,” but they can mmer(fixed = Y ∼ 1, mask or obscure the effects of other quantitatively acting genes random = ∼ vsr(M, Gu = KM) + and pleiotropically affect multiple quantitative phenotypes vsr(G, Gu = Kasv) + (Mackay 2001; Mackay et al. 2009; Eichler et al. 2010; De GR, Villemereuil et al. 2018). For example, mutations in the BRCA2 rcov = ∼ units, gene can have large effects but be incompletely penetrant, inter- data = data) act with other genes, and may be necessary but insufficient for where KM is the matrix KM = k−1 M InM . All other variables are the predicting breast, ovarian, and other cancer risks in women same as previously defined. (Gaudet et al. 2010). Accurately partitioning the Mendelian, oligo- We generated 18 experiment designs with different population genic, and polygenic sources of variance allows researchers to as- sizes of n = 500, 1,000, and 1,814, and number of clonal replicates sess the value conferred by specific loci. per entry r = 1, 2, and 4 for outbred H = 0.38 and inbred H = 0.0 popu- Here, we use simulations to show that the average semivariance lations. Clonal replicates are a particular case in plant genetics of (ASV) provides accurate variance component estimates (VCEs) and hybrid (e.g. maize, rice, and sorghum) cropping systems and in clon- variance component ratios for all relevant genetic terms regardless ally propagated species (e.g. strawberry, potato, and apple). In all ex- of study design or population type, e.g. outbred or inbred. We sought amples, 100 populations are genotyped at m = 5, 000 loci. These to provide a synthesis and extension of the previously published 5,000 single nucleotide polymorphisms (SNPs) generated the purely works on the ASV (Piepho 2019; Feldmann, Piepho, Bridges, et al. additive polygenic background and one locus for the simple genetic 2021; Feldmann, Piepho, et al. 2022) and to present a fully realized effect. Marker genotypes, e.g. alleles, were drawn from a multivari- and efficient ASV approach for typical LMM analyses in human, ate normal distribution to replicate the population structure of the plant, animal, and microbial genetics. We demonstrate how these 1,814 mice from Valdar et al. (2006) using R/MASS::mvrnorm() and models can be extended to handle more complex genetic structures, transformed such that the population was heterozygosity H = 0.38. including adding multiple explanatory loci and marker–marker in- We then estimated KASV and excluded the targeted locus from the teractions, incorporating nonadditive dominance and epistasis vari- calculation of KASV. We also simulated residual genetic and residual √��� ance, partitioning marker variance into additive and dominance effects each from a normal distribution with μ = 0 and θASV = 50 √��� gR components, and performing fully efficient stagewise analysis. To and θASV R = 40 using R/stats::rnorm(). A single explanatory lo- accommodate the models proposed in this research, we enabled cus was simulated with a segregation ratio of approximately 1 : 2 : 1 􏽰�������� the flexibility to provide the weights into the mixed model machin- for AA:Aa:aa marker genotypes with μ = 0 and θASV m = kM · 66 using ery in the form of a matrix (diagonal or nondiagonal) instead of a vec- R/stats::rnorm(). We simulated marker effects for all m = 5, 000 √��� tor, which is now available in R/sommer >= v4.2.0. We provide loci following a normal distribution μ = 0 and θASV g = 66, and each examples of expressing the different models and extensions in the locus contributes equally. When multiplied by the centered marker freely available R/sommer package (Covarrubias-Pazaran 2016). genotypes and summed, the score is taken as each individual’s true The ASV is a powerful tool for answering these questions regardless additive genetic value g. For each simulated population we ex- of the organism, population, or trait. pressed LMM (1) using R/sommer::mmer() (Covarrubias-Pazaran 2016). In the second set of simulations, we used the same approach and the same mean and variance parameters. However, in this ex- Methods & materials ample, we simulated inbred lines in the background polygenic mar- Computer simulations model statements in kers (H = 0.0) and the foreground markers, e.g. 1 : 0 : 1 for AA:Aa:aa. R/sommer v4.2.0 Incorporating multiple target loci into GBLUP We use computer simulations that follow the same style as in LMM (8) is expressed as Feldmann, Piepho, Bridges, et al. (2021) and Feldmann, Piepho, et al. mmer(fixed = Y ∼ 1, (2022) to demonstrate under fairly general conditions that ASV random = ∼ M1 + M2 + M3 + yields accurate estimates of variance components when (1) includ- M12 + M13 + M23 + ing main-effect loci alongside polygenic background, (2) partitioning M123 + additive and dominance sources of variance for single markers and vsr(G, Gu = Kasv) + polygene, and (3) performing fully efficient stagewise analyses. GR, rcov = ∼ units, Incorporating one target locus into GBLUP data = data) LMM (1) is expressed as where data is an n × 10 matrix containing the phenotypic observa- mmer(fixed = Y ∼ 1, tions Y, seven columns corresponding to the marker effects and in- random = ∼ M + teractions, a factor-coding entries G, and a factor-coding levels of gR. vsr(G, Gu = Kasv) + Due to the similarities between our first set of experiments and GR, this extension, we do not provide any additional simulations dem- rcov = ∼ units, onstrating the successes of this model extension. Feldmann, data = data) Piepho, Bridges, et al. (2021) demonstrated that multiple loci could where data is an n × 4 matrix containing the phenotypic observa- be fit simultaneously with their interactions, and variance compo- tions Y, levels of the marker genotypes, entries, and levels of the nents can be estimated accurately. The same is true for models residual genetic term, i.e. entries. The variable units is inferred incorporating a polygenic genomic relationship matrix (GRM) as by R/sommer::mmer() and can be considered as a column with well. However, the user is encouraged to check that the higher or- as many levels as rows in the data (Covarrubias-Pazaran 2016). der locus–locus interactions do not saturate the model and are not The version of this model with kM embedded is expressed as correlated with KASV. Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023 M. J. Feldmann et al. | 3 Partitioning marker variance into additive and dominance rcov = ∼ units, components data = data) LMM (9) is expressed as where data is an n × 3 matrix containing the phenotypic observa- mmer(fixed = Y ∼ 1, tions Y, one factor coding for the entry ID and one-factor coding random = ∼ Ma + Md + for Blocks within the ne environment. Blocks and other within- vs(G, Gu = Kasv) + location design elements can be incorporated as random effects GR, using the random = syntax. In R/sommer, Σe is obtained from rcov = ∼ units, each location as the ‘VarBeta‘ matrix in the R/sommer::mmer() data = data) output. “VarBeta” is the name of the model estimated variance– where data is an n × 5 matrix containing the phenotypic observa- covariance matrix among entry means in R/sommer. The Σes are tions Y, a factor-coding levels of mA, a factor-coding levels of mD, a then bound corner-to-corner, which is accomplished using R/ factor-coding entries G, and a factor-coding levels of gR. The factor sommer::adiag1() to obtain Ω. We then take the inverse of Ω coding of mA has three levels corresponding to AA : Aa : aa, and a using R/base::solve(). factor coding of mD has two groups corresponding to the genetic The LMM for stage 2 (17) is expressed as state—either homozygous or heterozygous. mmer(fixed = Y2 ∼ Env - 1, We performed one set of simulations for this model extension random = ∼ vsr(M, Gu = KM) + that follows the exact parameters as the first simulation set vsr(G, Gu = Kasv) + (m = 5, 000, n = 500, H = 0.38). In this simulation, we estimate G:Env + GR, which portion of the variance explained by a marker is from addi- rcov = ∼ vsr(units, tive variance and which is from dominance variance. In this simu- Gti = matrix(invSigma2,1,1), lation, we estimate which portion of the additive genetic variance Gtc = matrix(3,1,1)), √��� (θASV g = 66), the marker explained variance by additive nIters = 25, √��� √��� (θASV emWeight = rep(1,25), m = 20) or dominance variance (θASV m = 20), the residual gen- A √��� D etic variance (θASV g = 50), and the residual variance (on an entry- W = invOmega, R √��� mean basis) (θASV R = 40). In our simulations, 50% of the variance data = data) explained by the focal marker is from additive variation and where data is an n × 5 matrix containing the adjusted entry means, 50% is dominance variation. The other parameters of the simula- or BLUEs, from stage 1 (Y2) a factor-coding levels of M, two equiva- tion are equal to the first set. We examined the accuracy of esti- lent factor-coding entries, e.g. G and gR, and factor-coding environ- mating each term as well as the accuracy of estimating the total ments Env. In this approach, we must fix the residual variance variance explained by the focal marker. component equal to 1 so that all the scaling of the invOmega = Ω−1 is unaffected by the model estimation process. Within the vsr() ar- Incorporating a genomic dominance relationship matrix into gument, the Gti() and Gtc() arguments are used to set the initial GBLUP value of the variance component equal to the inverse of the variance among adjusted entry means (invSigma2 = σ̂−2) and to constrain the LMM (13) is expressed as variance component estimation to a fixed value by setting the first mmer(fixed = Y ∼ 1, argument equal to 3 (Covarrubias-Pazaran 2023). In this example, random = ∼ M + we use 25 iterations of the 100% expectation-maximization (EM) al- vsr(Ga, Gu = Kasv) + gorithm; however, the EM and Newton-Raphson (NR) methods can vsr(Gd, Gu = Kasv˙D) + be exchanged or averaged, i.e. average information, by changing GR, the emWeight argument. This is not a general rule or recommenda- rcov = ∼ units, tion. The large number of iterations we used caused this analysis to data = data) be computationally expensive and inefficient. where data is an n × 5 matrix containing the phenotypic observa- We performed one set of simulations for this model extension tions Y, a factor-coding levels of the marker genotypes, and three following the exact parameters of the first simulation set equivalent factor-coding entries, to be used for the additive, dom- (m = 5, 000, n = 500, H = 0.38). In this simulation, we estimate inance, and residual genetic terms. √��� which portion of the additive genetic variance (θASV g = 66), the We performed one set of simulations for this model extension √��� marker explained variance (θASV m = 40), the residual genetic vari- that follows the exact parameters as the first simulation set √��� ance (θASV g = 50), the genotype-by-environment interaction vari- (m = 5, 000, n = 500, H = 0.38). In this simulation, we estimate R √��� ance (θASV GE = 90), and the residual variance (on an entry-mean which portion of the polygenic variance is from additive √��� √��� √��� basis) (θASV ϵ = 40). In this simulation, the dominance polygenic (θASV g = 33) or dominance (θASV g = 33). In this simulation, the A D variance is the same magnitude as the additive polygenic vari- dominance polygenic variance is the same magnitude as the addi- ance, and the other simulation parameters are equal to the first tive polygenic variance, and the other simulation parameters are set. We examined the accuracy of estimating each term. equal to the first set. We also controlled the residual genetic vari- √��� ance (θASV g = 50) and the residual variance (on an entry-mean ba- R √��� sis) (θASV R = 40), as in all simulations. We examined the accuracy of estimating each term. Results and discussion Candidate genes and complex traits Incorporating stagewise meta-analysis into GBLUP Bernardo (2014) was the first to propose an integration of MAP and LMM (15) for stage 1 is expressed as GP. Since then, empirical studies have validated the methodology mmer(fixed = Y ∼ G, (Rutkoski et al. 2014; Spindel et al. 2016; Rice and Lipka 2019). In random = ∼ Block, contrast, others have shown little-to-no improvement over GP Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023 4 | G3, 2023, Vol. 00, No. 0 (Li et al. 2015; Galli et al. 2020), suggesting that modeling significant effective degrees of freedom. Also, the targeted markers should markers can improve prediction accuracy only when markers ex- not fit the marker effect size distribution assumptions used for plain a substantial portion of both genetic and phenotypic vari- the marker background, e.g. that all marker effects contribute ance (Galli et al. 2020). With the high densities of genome-wide equally to the genomic variance and are drawn from the same markers commonly assayed in gene finding studies, investigators distribution (Habier et al. 2007; Endelman 2011; Morota and often identify DNA markers tightly linked to a candidate or known Gianola 2014) and should not be in high LD with a large number causal genes as exemplified by diverse real-world examples of other markers. (Hayes and Goddard 2001; Hayes et al. 2010; Jensen et al. 2012; Visscher et al. 2012, 2017; Li et al. 2021). The candidate marker The entry mean, not the observation, is the loci are nearly always initially identified by genome-wide searches “phenotype” using sequential (marker-by-marker) approaches such as GWAS We believe the “phenotype” is the entry mean for a given subdiv- and QTL analysis. Following the discovery of statistically signifi- ision of environments, not the individual observations that consti- cant marker–trait associations from a marker-by-marker tute that entry mean. Our discussion here is primarily predicated genome-wide scan, the natural progression would be to analyze on plants, but does not necessarily exclude other organisms, single locus or multilocus genetic models where the effects of where replicate observations may be available per entry. In the the discovered loci are simultaneously corrected for the effects words of Dr. Rex Bernardo, “…the main focus of quantitative gen- of other discovered loci, e.g. polygenic variation (Stroup et al. etics is on identifying candidates with the best genotypic value for 2018; Gbur et al. 2020). a target population of environments” (Bernardo 2020). However, A marker will not explain a large portion of variance if that fine- or broad-scaled any subdivision is of a target population of marker does not have a large, detectable effect. Thus, markers environments or market segment, we argue that several environ- that explain a large part of the genetic variance will be the most ments must be sampled from each subdivision. Ultimately, an useful for MAP and other diagnostic techniques. For example, average across those environments will be used to communicate consider Fusarium race one wilt resistance in strawberry, which the value of an entry to a specific subdivision of target environ- is conferred by a single dominant acting locus Fw1 (Pincot et al. ments or to all target environments, if appropriate. These subdivi- 2022). This locus explains nearly 100% of the phenotypic and gen- sions may be defined by market segments, maturity zones, etic variance, and the mean differences delineate resistant vs. patterns of G × E, management strategies, geopolitics, and other susceptible genotypes. Thus there is almost no added benefit of elements of interest to a breeding or research program. The a genome-wide sample of markers over the single-marker assay granularity of the entry mean is important since not all environ- (m) for product delivery and germplasm improvement. While ments, micro or macro, or market segments can be considered the variance explained is directly linked to the effect size, it is equal, and severe genotype-by-environment interactions (G × E) not a direct substitute. However, the random effect machinery may limit the information contained in the entry mean (Heslot allows researchers to obtain variance component estimates, and et al. 2013; González-Barrios et al. 2019). Conceptualizing the effect sizes (e.g. best linear unbiased predictors, BLUPs) simultan- phenotype as the entry mean should pose little practical conse- eously (Searle et al. 1992), eliminating the need for multiple statis- quence as stagewise analyses, common in GWAS and GP, explicit- tical models to assess the variance explained and the effect size of ly express this idea (Dias et al. 2020; Pincot et al. 2020; Endelman a target locus. The BLUP procedure is directly applied in this mod- 2022) and variance component ratios, such as the broad sense her- el, so it is natural to use the same statistical machinery to esti- itability (H2), are often reported on an entry-mean basis (Bernardo mate genome-estimated breeding values (GEBVs) by genomic 2020). This concept is also concordant with single-stage analyses best linear unbiased prediction (GBLUP) and the genetic effect of incorporating all entries and subdivisions as main effects and the a locus. interaction, such as in product placement and other late-stage As a point of contrast, yield in maize (Zea mays) is heritable, but trials (Buntaran et al. 2020). Below, we show that ASV can be accur- no single locus explains any appreciable amount of phenotypic or ately applied in single-stage and stagewise analyses. genotypic variance (Heffner et al. 2009, 2010). To improve yield in maize, GP is potentially a more valuable approach because the re- LMM analysis and the ASV searcher, or breeder, can predict the polygenic value (g) without The ASV estimator of total variance (Piepho 2019) and the vari- relying on any particular locus but instead capturing variation ance of single markers and marker–marker interactions of a genome-wide sample of markers. The more challenging scen- (Feldmann, Piepho, Bridges, et al. 2021) is half the average total ario is the intermediate case in which a trait is controlled by both pairwise variance of a difference between entries and can be de- loci that are discernible from the polygenic background and a composed into independent sources of variance, e.g. genetic and quantitative polygenic effect. residual. In this article, we assume that researchers can replicate The ratio between the variance explained by the oligogenic entries independently—as in clonally propagated or inbred crop and polygenic terms with the total genetic or phenotypic vari- species—or can collect repeated measures on entries (e.g. indivi- ance is likely a significant factor determining the cost–benefit duals, families, or strains)—as in humans and animals—and of incorporating MAP, GP, or both into a breeding or diagnostic then estimate the least square means (LSMs), best linear unbiased program. Modeling an individual locus can be advantageous estimators (BLUEs), or other adjusted entry means in the first when the proportion of the phenotypic and genetic variance ex- stage of a stagewise analysis (Piepho et al. 2012; Damesa et al. plained by the locus is reasonably large and not partially cap- 2017, 2019). For simplicity, we assume that the residual vari- tured by other markers in linkage disequilibrium (LD) with the ance–covariance matrix, which can take many forms (Piepho target (Bernardo 2014; Rutkoski et al. 2014; Pincot et al. 2022). 2019), is R = Inσ2 ϵ , where n is the number of entries (e.g. individuals, In this case, one could factor code a pseudomarker from mul- accessions, genotypes, lines, or animals). In stagewise analysis, R tiple markers bracketing a QTL to capture the variance ex- is estimated in the first stage and therefore does not need to be re- plained by that locus, assuming that SNPs used to define a estimated in the second stage. Instead, it is forwarded to the se- QTL region are highly correlated and will not saturate a model’s cond stage by proper weighting. Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023 M. J. Feldmann et al. | 5 The form of the LMM for this analysis assuming only one ex- extend this using the approach for multilocus models as in planatory marker is Equation (8), with and without marker–marker interactions, de- scribed in Feldmann, Piepho, Bridges, et al. (2021). Specifically, y̅ = 1μ + Zmm + Ig + IgR + ϵ̅ (1) θASV m is the total variance explained by a marker and is analogous to the total genetic variance used to calculate broad sense herit- where y̅ is the vector of LSMs with y ∼ N (1μ, V), μ is the population ability, not the additive genetic variance. mean and the only fixed effect, Zm is the design matrix linking en- It is important to consider the relationship between the main try means to marker genotypes, m is the vector of random effects effect of markers and marker–marker interactions and KASV. of the main-effects locus with m ∼ N (0, Iσ2 m), g is the vector of ran- When markers are highly correlated—due to linkage disequilib- dom additive genetic effects associated with the genome-wide rium (LD) or selection bias—the LMM framework will fail to accur- framework of marker excluding m with g ∼ N (0, KASVσ2 g), gR is ately partition variance between two main effects, even if an the vector of random residual genetic term—the portion of the to- estimator is “unbiased.” One possible strategy here is to create tal genetic effect not accounted for by m or g—with gR ∼ N (0, Iσ2 g ), multilocus genotypes, e.g. AA.AA, AA.AB,…, BB.AB, BB.BB, from R several SNPs defining a target QTL region. If LD is high in the re- and ̅ϵ is the random residual term with ̅ϵ ∼ N (0, R). We use a pooled gion, there should be far fewer levels of the multilocus genotype estimate of σ2 ϵ̅ obtained from the first stage, so this term is known. than possible combinations. The same is true if the marker geno- We then calculated KASV as types are highly correlated with the geometry of the KASV—the LMM framework will fail to accurately partition the variance be- X̅X̅T KASV = tween the oligogenic foreground and the polygenic background. (n − 1)− (2) 1 tr(X̅X̅T ) One way to assess this is to examine the correlation between the first few eigenvectors of KASV and the main-effect marker geno- where X̅ = PX is the mean-centered marker matrix, X is the mark- types. If the correlation is large in magnitude, regardless of direc- er matrix coded [−1,0,1] for [aa,Aa,AA] genotypes, K̅ = X̅X̅T is the tion, the LMM will likely struggle to partition the variance realized genomic relationship or kinship matrix, P = I − n−11n1T components between the two terms accurately. n is the idempotent mean-centering matrix, and tr(·) is the trace. The ASV definition of the residual genetic variance is The ASV definition of total variance from LMM (1) is θASV g = (n − 1)−1σ2 tr(I TP) R g nI R n (6) θASV = σ2 y̅ = (n − 1)−1 tr(VP) gR (3) = θASV m + θASV g + θASV + θASV gR ϵ̅ Importantly, all terms are estimated on the same scale as the re- sidual variance θASV ϵ on an entry-mean basis. As with the marker where θASV y̅ is the total phenotypic variance, V is the variance–co- variance, the residual genetic variance will not be accurately par- variance among LSMs, θASV m is the average semivariance of the sim- titioned from the polygenic background as KASV → I. While KASV ple genetic term, θASV g is the average semivariance of the polygenic needs to have similar global features—n−1 tr(K) = 1 and 􏽐 term, θASV g is the average semivariance of the residual genetic n−2 􏽐 R i j Kij = 0—it is important that KASV ≠ I. term, and θASV ϵ̅ is the average semivariance of the residuals. The ASV definition of the residual variance is The ASV definition of genomic variance is θASV ϵ̅ = (n − 1)−1σ2 ϵ tr(InIT nP) (7) θASV g = (n − 1)−1σ2 g tr(XXTP) = σ2 ϵ̅ 􏼔 􏼕 tr(K̅) (4) = σ2 n − 1 g The residual variance σ2 ϵ̅ is estimated in the first stage and the es- timate is carried forward to the second stage. In general, we replace the unknown parameter values (σ2 g) with Two crucial results from Piepho (2019) and Feldmann, Piepho, Bridges, et al. (2021) are that (1) the ASV variance component esti- their REML estimates (σ̂2 g) to obtain the ASV estimates (θ̂ASV g ). mates for the total genetic variance from a simple model are Following this form, it is possible to extend LMM (1) to include equivalent to the REML variance components and (2) for REML es- dominance and epistatic sources of variance (see below). timates that are not ASV equivalent there are simple constants The ASV definition of marker-associated genetic variance is that can be applied post hoc to obtain ASV variance component estimates. Feldmann, Piepho, et al. (2022), and this article shows θASV m = (n − 1)−1σ2 m tr(ZmZT mPm) that some ASV variance components (e.g. additive genetic vari- 􏼢 􏼣 (n − n−1 􏽐nm ance, a single marker variance) can conveniently be obtained by h=1 n2 g : m ) = h σ2 m (5) n − 1 scaling the variance–covariance matrices for the specific random effects in the model directly. = kmσ̂2 m Simulations confirm that ASV yields accurate where Pm = I − n −1 m 1 T nm 1n is the idempotent mean-centering estimates of all genetic variance components and m marker genotype design matrix, nm is the number of marker gen- ratios otypes, and ng : mh is the number of entries nested in the hth marker As shown in previous studies (Piepho 2019; Feldmann, Piepho, genotype. We are factor-coding marker genotypes in these ana- Bridges, et al. 2021; Feldmann, Piepho, et al. 2022), ASV is ideal lyses and the marker genotypes are treated as discrete categorical for estimating the variance explained by both single loci and values instead of continuous values (dosage). It is possible to GRMs. In our simulations, we included variation in population Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023 6 | G3, 2023, Vol. 00, No. 0 n = 500 n = 1,000 n = 1,814 1.0 0.5 0.0 −0.5 1.0 0.5 0.0 −0.5 1.0 0.5 0.0 −0.5 2 Ĥ2 2 2 2 ĥ2 ŝ ŝ g Ĥ2 2 2 2 2 2 m g ŝm R ŝGR Ĥ ĥg Ĥ2 ŝ ŝ m g ŝm R ŝ2 2 2 2 2 GR Ĥ ĥ ŝ ŝ2 g Ĥm g m ŝ2 2 R ŝGR Variances and Ratios Fig. 1. Effect of n and r on the relative bias of variance components and ratios in simulated outbred populations. Phenotypic observations were simulated for 100 samples with n = 500, 1,000, and 1,814 (left to right) genotyped for m = 5, 000 SNPs and the average heterozygosity H = 0.38. The relative bias of marker heritability, genomic heritability estimates (ĥ2 g), broad sense heritability, genomic variance, marker variance, residual genetic variance, and residual variance heritability when the number of replicates of each entry (r) =1 (upper panel), 2 (middle panel), and 4 (lower panel). Each box’s upper and lower halves correspond to the first and third quartiles (the 25th and 75th percentiles). The notch corresponds to the median (the 50th percentile). The upper whisker extends from the box to the highest value within 1.5 × IQR of the third quartile, where IQR is the inter-quartile range or distance between the first and third quartiles. The lower whisker extends from the first quartile to the lowest value within 1.5 × IQR of the quartile. The dashed line in each plot is the true value from simulations. size, e.g. n = 500, 1,000, and 1,814, and replication of entries, e.g. nonadditive polygenic terms, and achieve fully efficient stagewise r = 1, 2, and 4 for both outbred (Fig. 1) and inbred populations analysis. Depending on the population, trait, environment, etc., (Fig. 2). We can see that the same pattern has emerged as in pre- the unique components of the models demonstrated here can be vious studies; the ASV approach yields accurate and consistent combined to accurately and holistically decompose the multitude estimates of variance components and variance component ratios of potential sources of genetic variation. The code to execute these from LMM analyses regardless of the constitution of the popula- models using the R/sommer > = v4.2.0 (Covarrubias-Pazaran 2016) tion or the study design. Even when there is only one replicate is provided in the Methods & Materials section. per entry (r = 1), all explanatory genetic terms are accurately par- titioned from the total variance. As n increased from 500 to 1,814, Extension #1: Incorporating multiple target loci and the precision of estimates increased dramatically (the sampling locus–locus interactions variance decreased). Increasing r from 1 to 4 did not affect the pre- It is common for multiple QTL to be implicated from genetic stud- cision or accuracy of genomic and marker-associated variances. ies (Rutkoski et al. 2014; Lopdell et al. 2019; Rice and Lipka 2019), However, increased numbers of replicates did improve the preci- the utility of which is not always certain (Bernardo 2001, 2004). sion of residual variance components. This is because entries While the simulations in this paper rely exclusively on LMM (1), are replicated among plots (n · r), but markers and other genetic this model can be easily expanded to include multiple explana- components are replicated among entries (n). Our simulations, tory loci and their interactions or statistical epistasis in conjunction with our previous results (Piepho 2019; (Álvarez-Castro and Carlborg 2007), as demonstrated by Feldmann, Piepho, Bridges, et al. 2021; Feldmann, Piepho, et al. Feldmann, Piepho, Bridges, et al. (2021). For example, the LMM 2022), demonstrate that in most populations—human, animal, with three main-effect loci is plant, or microbe—the ASV will yield accurate and easily inter- preted estimates of different variance components. 􏽘3 􏽘2 􏽘3 y̅ = 1nμ + Zm m + i i Zm m ij ij LMM extensions incorporating the ASV i = 1 i=1 j=i+1 (8) While an important model, LMM (1), only covers a narrow scope of + Zm123 m123 + Ig + IgR + ϵ̅ the possible genetic models and experiments, we want to provide re- searchers with a clear strategy for expanding this approach to more where mi is the random effect of the ith main-effect marker, mij is complex systems. This section demonstrates how to partition the the random effect of the two-way interaction between the ith and additive and dominance variance from a single marker, incorporate jth markers, and m123 is the random effect of the three-way inter- multiple explanatory loci, their interactions into the model, and action between the three main-effect loci. Zm , Zm , and Z ij m123 are i Relative Bias Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023 r = 1 r = 2 r = 4 M. J. Feldmann et al. | 7 n = 500 n = 1,000 n = 1,814 1.0 0.5 0.0 −0.5 −1.0 1.0 0.5 0.0 −0.5 −1.0 1.0 0.5 0.0 −0.5 −1.0 2 2 Ĥ2 2 2 2 ĥ2 ŝ ŝ g Ĥ2 m g ŝm R ŝ2 2 2 2 2 GR Ĥ ĥg Ĥ2 ŝ ŝ m g ŝm R ŝ2 2 2 2 2 GR Ĥ ĥ ŝ ŝ g Ĥm g m ŝ2 2 R ŝGR Variances and Ratios Fig. 2. Effect of n and r on the relative bias of variance components and ratios in simulated inbred populations. Phenotypic observations were simulated for 100 samples with n = 500, 1,000, and 1,814 (left to right) genotyped for m = 5, 000 SNPs and the average heterozygosity H = 0. The relative bias of marker heritability, genomic heritability estimates (ĥ2 g), broad sense heritability, genomic variance, marker variance, residual genetic variance, and residual variance heritability when the number of replicates of each entry (r) =1 (upper panel), 2 (middle panel), and 4 (lower panel). Each box’s upper and lower halves correspond to the first and third quartiles (the 25th and 75th percentiles). The notch corresponds to the median (the 50th percentile). The upper whisker extends from the box to the highest value within 1.5 × IQR of the third quartile, where IQR is the inter-quartile range or distance between the first and third quartiles. The lower whisker extends from the first quartile to the lowest value within 1.5 × IQR of the quartile. The dashed line in each plot is the true value from simulations. design matrices that link levels of the explanatory marker and in- variance can be significant, and partitioning the additive and teractions to LSMs in y. The rest of the terms have the same defi- dominance sources of variance from significant markers can be nitions. LMM (8) follows directly from Equation (1) and the results useful in hybrid crop breeding and disease risk prognoses. Our from Feldmann, Piepho, Bridges, et al. (2021), specifically the two goal is to partition θASV m , the variance explained by a focal locus, and three loci examples. into its additive (θASV m ) and dominance (θASV A m ) components. D Since we are factor-coding marker genotypes in these models, Here, we demonstrate an LMM that can partition the main- that is we are thinking of the marker genotypes as discrete cat- effect marker’s additive and dominance sources of variance by egorical values instead of continuous values (dosage), it is possible transforming the marker genotypes into two factors. The form to fully saturate the multilocus interaction with more levels than of the linear mixed model (LMM) for this analysis assuming only are observed in a given data set. Hence, it is important to consider one explanatory marker is the number of interaction terms evaluated. In this situation, packages such as lme4::lmer() will report an error that the y̅ = 1μ + ZmA mA + ZmD mD + Ig + IgR + ϵ̅ (9) “number of levels of each grouping factor must be < number of where mA is the random additive effect of the main-effect locus [LSMs]” (Bates et al. 2015). Further, these models assume that ran- with mA ∼ N (0, Iσ2 m ) and mD is the random dominance effect of dom effects are independent, so we do not advise incorporating A main effects from the SNPs used to define a target QTL region. the main-effect locus with mD ∼ N (0, Iσ2 m ). ZmA is an n × 3 design D Instead, it is possible to factor code a pseudohaplotype from the matrix linking marker genotypes to LSMs and ZmD is an n × 2 de- best markers bracketing a QTL to capture the variance explained sign matrix linking genotypic state, either homozygous (AA and by that locus, which can be more informative than a single SNP. aa) or heterozygous (Aa), to LSMs. For example, the ZmA and ZmD This approach assumes that SNPs used to define a QTL region design matrices for five individuals (rows) with marker genotypes are not independent and do not fully saturate the model. at a focal locus of [AA, Aa, Aa, aa, aa] = [1, 0, 0, − 1, − 1] are ⎡ ⎤ ⎡ ⎤ Extension #2: Partitioning θASV m into additive (θASV m ) and 1 0 0 1 0 A dominance (θASV ⎢ ⎥ m ) components ⎢ 0 1 0 ⎢ ⎥ ⎥ ⎢0 1⎥ D ZmA = ⎢ 0 1 0⎥⎢ ⎥, ZmD = ⎢ ⎥ ⎢0 1⎥ (10) The factor-coding of the Mendelian and oligogenic markers is a ⎣ 0 0 1⎦ ⎣1 0⎦ different approach than is standard in GWAS. In GWAS, markers 0 0 1 1 0 are typically treated as fixed and coded as continuous values, e.g. the dosage model. Assuming that a researcher is working with an Other terms are defined in LMM (1). This extension is a partition outbred species and the heterozygosity (H) ≠ 0, the dominance of Equation (1). So we expect that Equations (1) and (9) are Relative Bias Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023 r = 1 r = 2 r = 4 8 | G3, 2023, Vol. 00, No. 0 equivalent, except that Equation (9) will yield a variance compo- Single Stage Analysis nent for each of the additive and dominance terms, while 1.0 Equation (1) only yield the total genetic variance. The ASV estimate of the additive variance explained by a locus 0.5 is obtained as in Equation (5) by 0.0 θ̂ASV m = (n − 1)−1σ̂2 m tr(Z T A A mA Zm P A mA ) ⎡ 􏽐n ⎤ n − n−1 mD −0.5 n2 = h=1 g : m (11) ⎣ Ah⎦σ̂2 n − 1 mA −1.0 2 2 2 2 2 ŝm ŝg ŝG ŝ R GxE ŝR where PmA = I − n−1 T m 1 A nm 1n , n a e t e n m A m mA r h u ber of levels coding A the marker additive effects, ng : mA is the number of entries nested Stage−wise Analysis h in the hth marker genotype (Feldmann, Piepho, Bridges, et al. 1.0 2021). The average semivariance estimate of the dominance vari- ance explained by a locus is obtained by 0.5 θ̂ASV m = 0.0 (n − 1)−1σ̂2 m tr(Z T D D mD Zm P D mD ) ⎡ 􏽐n ⎤ n − n−1 mD 2 j=1 ng : m (12) −0.5 D = ⎣ j⎦σ̂2 n − 1 mD −1.0 2 2 2 2 2 where PmD = I − n−1 T ŝm ŝg ŝG ŝ ŝ m 1 D nm 1n , n a e t e n m m mD r h u ber of levels coding R GxE R D D Variance Component the genetic status, e.g. homozygous or heterozygous, ng : mD is i the number of entries nested in the jth genetic state. The sum of Fig. 3. Comparison of VCEs estimated from single-stage and stagewise [kmA σ̂2 m + k A V mD σ̂2 m ] = [θ̂ S θ̂ S D m + A V m ] = θ̂ASV and analyses of 500 entries replicated in four environments with block effects. A A D m [θ̂ASV m + θ̂ASV θ̂ S = 2 2 × A m ] − A V . 1 10−5. θ̂ASV The relative bias of genomic variance (σ̂2 g), marker variance (σ̂2 m), residual D m m is an accurate and consist- genetic variance (σ̂2 g ), genotype-by-environment interaction variance R ent estimate of the variance explained by a marker (Feldmann, (σ̂2 G×E), and residual variance (σ̂2 R) analyzed in a single stage (upper panel) Piepho, Bridges, et al. 2021). The likelihood ratio (LR) between or stagewise stages (lower panel). Each box’s upper and lower halves LMM (1) and (9) was LR ≈ 0. It was not significant in any simulated correspond to the first and third quartiles (the 25th and 75th percentiles). The notch corresponds to the median (the 50th percentile). The upper populations (PLR > 0.2), suggesting that there is no appreciable dif- whisker extends from the box to the highest value within 1.5 × IQR of the ference between the model likelihood of Equations (1) and (9). For third quartile, where IQR is the inter-quartile range or distance between each term, θ̂ASV m and θ̂ASV A m , the average bias’ across the 100 simu- the first and third quartiles. The lower whisker extends from the first D quartile to the lowest value within 1.5 × IQR of the quartile. The dashed lated populations was 1.06% and −1.24%, respectively. line in each plot is the true value from simulations. Extension #3: Incorporating additional polygenic terms for genome-wide dominance (gD) substantively different than both of the matrices proposed by LMM (1) can also be extended to include both additive (gA) and Nishio and Satoh (2014) and Su et al. (2012). dominance (gD) sources of genomic variance (Vitezica et al. 2013, Feldmann, Piepho, et al. (2022) showed that, regardless of popu- 2017; Zhang et al. 2021). The form of the LMM for analysis with lation quality, a GRM with an average diagonal value of 1 and an both gA and gD assuming only one explanatory marker M is average element value of 0 will produce consistent variance com- ponent estimates of the genomic variance. A matrix with the same y̅ = 1μ + Zmm + IgA + IgD + IgR + ϵ̅ (13) properties calculated from a dominance coding will produce simi- larly unbiased parameter estimates. The dominance GRM pro- where gA and gD are random effect vectors for the additive and dom- posed by Su et al. (2012) has an average diagonal value of 1, but inance polygenic effects, respectively, with gA ∼ N (0, KASVσ2 g ) and the average element value is >0, leading to a systematic under- A estimating since the covariances are overestimated. The domin- gD ∼ N (0, KD ASVσ2 g ). The ASV dominance kernel is D ance GRM proposed by Nishio and Satoh (2014) has an average element value of 0, but the average element value is <1, leading KD W̅W̅T ASV = (n − 1)− (14) to a systematic overestimating since the variances (diagonals) 1 tr(W̅W̅T ) are underestimated. This is true for a wide range of population heterozygosities. where W = 1 − |X|, assuming X is coded [ − 1, 0, 1], and W̅ = PW. This is a feasible approach to improve genetic performance in cross- Extension #4: Stagewise LMM analysis for bred populations with large dominance genetic variation (Nishio multienvironment trials (METs) and meta-analysis and Satoh 2014; Vitezica et al. 2017; Xiang et al. 2018). Both KASV Stagewise analyses are common in plant breeding trials in aca- and KD ASV have the matrix properties proposed by Speed and 􏽐 demic studies and the seed industry (Damesa et al. 2017, 2019). Balding (2015); i.e. n−1 tr(K) = 1 and n−2 􏽐 i j Kij = 0. The dominance One reason for this is that plant breeders are often not interested variance estimated with KD ASV was accurate, and the relative bias in the performance per se of a line or hybrid within a specific loca- from 100 simulated populations was −3.32%. Interestingly, KD ASV is tion unless the presence of cross-over (e.g. rank change) G × E is Relative Bias Relative Bias Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023 M. J. Feldmann et al. | 9 large enough to make data from one target environment unin- This approach is very common in plant sciences because it formative in another set of target environments. Instead, plant is simple but problematic outside a specific set of unrealistic breeders are often more interested in the performance of entries conditions, i.e. IID entry means. It is simple because it does across environments (Bernardo 2020). It is common then to fit a not require any information on precision from stage 1. It is first model that accounts for the variation of random design ele- problematic because the residual and genotype-by- ments, e.g. locations, years, blocks, and fixed genotype effects, environment variances are confounded. The naive approach to obtain the phenotype—estimated marginal means (EMMs) or does not require additional arguments for LMM software best linear unbiased estimators (BLUEs)—for use in subsequent and can be executed in any LMM software. analyses. In subsequent stages, these entry-means within envir- 2) In a weighted stagewise analysis, the Ω̃ = D(ω)−1 n matrix is diag- ge onments in a subdivision are used as the response variable. onal, but each diagonal element may differ based on data- In general, the single-stage analysis, when performed correctly, driven weight (D(ω)n ), where D(ω)n is an n - ge ge ge dimension di- should be considered the “gold standard.” However, there are ex- perimental conditions where the stagewise analysis may be sim- agonal matrix, estimated in the first stage of the analysis. pler to execute and functionally equivalent to the single-stage Importantly, these weights are derived as one of many pos- analysis when performed correctly (Fig. 3), and, given the fre- sible diagonal approximations of Ωnge (Móhring and Piepho quency of naive stagewise analyses—those that fail to incorporate 2009) or its inverse, e.g. Ω−1 n , from the first stage of the ana- ge the variance–covariance matrix of entry means, or even appropri- lysis (Smith et al. 2001). The weighted approach may take ate weights, from stage 1 into the second stage—we felt it prudent multiple forms that may or may not neglect the covariances to highlight the simplicity of these approaches to a general audi- among entry mean, leading to discrepancies between the ence. The purpose here is not to convince the reader that multi- single-stage and stagewise analyses (Smith et al. 2001, stage analyses are superior (they are not), nor to provide a 2005; Móhring and Piepho 2009). This approach requires an one-size-fits-all solution for every experiment (that is impossible), additional argument in LMM software, typically “weights,” but to provide a path for users to accomplish fully efficient, multi- which is input as a vector corresponding to entry means stage analyses using free, open-source software. Generally, the and internally transformed into a diagonal matrix, and can stagewise analysis should be considered a possible backup to be executed in several free or paid software (Inc. 2013; the single-stage analysis, not the standard (Schulz-Streeck et al. Covarrubias-Pazaran 2016; Butler 2021). 2013; Gogel et al. 2018; Damesa et al. 2019; Buntaran et al. 2020). 3) In a fully efficient stagewise analysis, entry means are allowed The LMM for stage one is to have nondiagonal covariance structures with Ω̃ = Ωnge , where Ωnge is the full variance–covariance matrix of entry ye = Xeg∗e + Zeue + ϵe (15) means from the different environments [defined in Equation (16)]. This approach is the most general solution where Xe is the fixed effect design matrix linking observations to for implementing stagewise meta-analyses, maintaining entries, and g∗e are the fixed effects (e.g. BLUEs) for the entries in all variances and covariances without approximation, but the eth environment, Ze is the random effect design matrix for de- it is the most limited in terms of software implementations. sign (e.g. blocks) elements within each environment (e.g. years The full variance–covariance matrix of the entry means will and locations), and ϵe are the residuals and ϵe ∼ N (0, Re), where be nondiagonal in most cases, and a diagonal matrix Re is the residual variance–covariance matrix estimated in the (weighted or unweighted) is almost invariably an approxi- eth environment. This model is fit within each environment mation as the random main effects of the environment, or independently. block, will induce a positive covariance among all entry From these models, we obtain the adjusted entry means y̅ and means. Incorporating the full variance–covariance matrix the variance–covariance matrices of the entry means Σe from requires a lot of additional data and significantly reduces each of ne environments, where ne are the number of environ- the computational efficiency of the LMM, which may out- ments. We can then construct the nge × nge block-diagonal stage weigh potential practice benefits. This approach requires one Ω matrix as in Equation (16). the inverse of the full variance–covariance matrix, Ω−1 n , as an ge ⎡ ⎤ input argument and can now be executed in R/sommer > = Σ1 · · · 0 v4.2.0 (Covarrubias-Pazaran 2016). Ωnge = ⎢ .. . . ⎥ ⎣ . . . .. ⎦ (16) 0 · · · Σn The LMM for stage two is then: e where nge is the number of entries nested in environments; for ex- y̅ = 1μ + Xe + Zmm + Zgg + ZgR gR + ZGEgGE + ϵ̅ (17) ample, if there are 500 entries in four environments, nge = 2, 000. This method allows us to carry the full Ωnge over from stage one where y̅ are the adjusted entry means from stage one, μ is the to stage two of the analysis. population mean, X is the fixed effect design matrix linking envir- The second stage can take several forms with varying complex- onments to adjusted entry means, e are the fixed environmental ities, more complete approximations of Ωnge or Ω−1 n , and software main effects, g is the random additive genetic effect associated ge accessibility. Briefly: with the genome-wide framework of marker excluding m with g ∼ N (0, KASVσ2 g), gR is the random residual genetic term—the por- 1) In a naive stagewise analysis, the residual matrix is given as tion of the total genetic effect not accounted for by m or g—with the identity matrix multiplied by a scalar gR ∼ N (0, Inσ2 g ), gGE is the genotype-by-environment interaction R (Ω̃ = Inge ω−1 = Inge σ2), where ω is a scalar (σ−2) estimated by term with gGE ∼ N (0, Inge σ2 g ), and ϵ̅ is the structured residual GE the second stage LMM, assuming that the variances for entry term from stage one with ̅ϵ ∼ N (0, Ω) With this model, we can es- means are identical with 0 covariances (independent); IID. timate the breeding values across environments with marker Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023 10 | G3, 2023, Vol. 00, No. 0 information (K) as in GBLUP and can perform GWAS by adding an genetics, particularly as it relates to genomic relatedness, genom- 􏽐 iterative term for single marker regression, such as j i=1 βixi where ic heritability, and GP (VanRaden 2008; Kang et al. 2010; Yang et al. j is the number of markers, βi is the linear regression coefficient of 2010; Habier et al. 2013) but it also deviates from the classic quan- the ith marker, and xi is the numeric coding of the ith markers gen- titative genetic model conceptually in that it assumes that marker otypes, e.g. [ − 1, 0, 1]. effects are random variables (Falconer and Mackay 1996; Lynch We created 1,000 simulated populations with 1,000 entries and and Walsh 1998). Despite the conceptual deviation, this approach 5,000 markers using a similar approach to the other simulations in has been demonstrated to have statistically valid assumptions this experiment. However, we included Environmental and Block and applied in several studies (Verbyla et al. 2012; Schreck et al. within Environment effects in this experiment. We estimate the 2019; Taylor et al. 2023). variance explained by the polygenic background, a large-effect lo- ASV has several beneficial elements, making it a viable option cus, the residual genetic variance, the genotype-by-environment for quantitative genetics. More importantly, it is appropriate for interaction variance, and the nongenetic residual. The single any quantitative discipline where variance components are of stage analysis yielded relative biases of −1.55%, −3.04%, −0.45%, interest, from plant and microbial biology to psychology and in- −0.12%, and 0.03% for the marker variance (σ̂2 m), genomic variance fant research. Namely: (σ̂2 g), residual genetic variance (σ̂2 g ), genotype-by-environment R interaction variance (σ̂2 g ), and residual variance (σ̂2 ϵ̅ ), respectively 1) The definitions of the variance components using ASV are additive GE (Fig. 3). The two stage analysis yielded relative biases of −1.39%, and sum to the phenotypic variance. Consequently, the LMM can −3.09%, 0.48%, −0.21%, and 0.03% for the marker variance (σ̂2 m), be extended to incorporate many explanatory components, genomic variance (σ̂2 g), residual genetic variance (σ̂2 g ), e.g. dominance, epistasis, and transcriptomic, and will yield R genotype-by-environment interaction variance (σ̂2 g ), and residual accurate VCEs for all terms. They will sum to the total vari- GE variance (σ̂2 ϵ̅ ), respectively (Fig. 3). ance. This is not necessarily true for all definitions of vari- ance components, such as the Average Marginal Variance Extension #5: Incorporating kM directly into LMM analyses (Piepho 2019; Feldmann, Piepho, Bridges, et al. 2021). 2) ASV is well suited for stagewise analyses. At the center of ASV is Feldmann, Piepho, Bridges, et al. (2021) introduced kM (5) as a post the idea that the “entry mean” is the phenotype per se, and hoc adjustment of the REML estimated variance explained by a not the observations directly. One interpretation is that indi- marker to obtain ASV equivalent VCEs. This led to Feldmann, viduals, not observations, are the primary source of vari- Piepho, et al. (2022), who showed that ASV estimates of the genom- ation or at least the primary source of interest. This ic variance could be obtained by scaling the genomic relationship concept can be easily extracted from single-stage analyses before or after the LMM analysis and introduced KASV, eliminating but seems at the heart of stagewise analyses (Piepho et al. the need for any post hoc adjustment. Scaling variance compo- 2012). Specifically, a single-stage analysis based on plot nents a priori is not novel and is routine in genomic evaluation data can be shown to be equivalent to a stagewise analysis across species (VanRaden 2008; Astle and Balding 2009; Yang in which entry means and their associated variance–covari- et al. 2010; Endelman and Jannink 2012; Legarra 2016; Vitezica ance matrix is carried forward to the second stage, in which et al. 2017). LMMs can directly scale the variance–covariance ma- BLUPs are computed for the genetic effects (Piepho et al. trix for large-effect loci M by kM. 2012). ASV yields accurate estimates of the genetic and gen- Instead, if we define: omic variance components in unreplicated or partially repli- cated designs common in humans, plants, and animals. ASV KM = k−1 M InM (18) also yields accurate VCEs in fully efficient multistage approaches. where KM is nM × nM and nM is the number of marker genotypes at a 3) ASV does not impact the BLUPs or breeding value predictions in given locus, we can essentially think of KM as a genomic relation- Genomic (G)-BLUP. ASV is only used to obtain accurate VCEs. ship matrix, e.g. KASV, except that we apply KM to the levels of the It has been demonstrated that marker coding and different marker genotype instead of entries. strategies for scaling and centering Z and K do not impact The form of the LMM for this analysis assuming only one ex- BLUPs or prediction accuracy (Strandén and Christensen planatory marker is the same as Equation (1), but where m is 2011; Legarra 2016; Feldmann, Piepho, et al. 2022) and be- the random effect of the main-effect locus with m ∼ N (0, KMσ2 m). cause ASV essentially works through a set of scalar coeffi- With this approach, we maintain the levels of the factor come cients determined by the experiment and population to from the same variance and zero covariance, but our scaling fac- obtain the expected features for the genomic relationship tor is embedded directly in the model eliminating the need for ad- matrix. Practically, ASV does not change the information justment. Embedding kM in the LMM analysis using KM is embedded in the LMM or data, only the scaling of the VCEs. equivalent to the post hoc adjustment that was proposed in 4) ASV works under many model assumptions in GLMM analyses. Feldmann, Piepho, Bridges, et al. (2021), and so it is up to the Beyond the often-assumed variance–covariance structure user to determine which approach they prefer. in this study, e.g. R = Iσ2 ϵ , many structures will lead to non- zero covariance between entry means. ASV can be applied Conclusions to designs accounting for spatial structures with auto- regressive correlations or spline-models (Rodríguez-Álvarez ASV is a strategy that can be used for estimating and partitioning et al. 2018; Selle et al. 2019). ASV can also be applied to data the total variance into components (Piepho 2019), such as the vari- sets where the observational units lead to nonnormality of ance explained by loci and locus–locus (Feldmann, Piepho, residuals, i.e. ordinal disease scores and proportion scores Bridges, et al. 2021) and the genomic variance (Feldmann, (Piepho 2019). Piepho, et al. 2022). The approach we are suggesting shares some As substantiated by our simulations in this study and the con- common threads with the current thinking in quantitative text of our previous studies, ASV with REML estimation of the Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023 M. J. Feldmann et al. | 11 underlying variance components yields accurate estimates for Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects oligo- and polygenic effect, both individually and collectively, models using lme4. J Stat Softw. 2015;67:1–48. doi:10.18637/jss. and BLUPs of the additive and dominance effects of marker loci v067.i01 (Piepho 2019; Feldmann, Piepho, Bridges, et al. 2021; Feldmann, Bernardo R. What if we knew all the genes for a quantitative trait Piepho, et al. 2022). ASV directly yields accurate estimates of gen- in hybrid crops? Crop Sci. 2001;41:1–4. doi:10.2135/cropsci2001.4111 omic heritability in the observed population and can be used to Bernardo R. What proportion of declared QTL in plants are false? Theor adjust deviations that arise from other commonly used methods Appl Genet. 2004;109:419–424. doi:10.1007/s00122-004-1639-3 for calculating genomic relationships regardless of the population Bernardo R. Genomewide selection when major genes are known. constitution, such as inbred lines and F1 hybrids, unstructured Crop Sci. 2014;54:68–75. doi:10.2135/cropsci2013.05.0315 GWAS populations, or animal herds and flocks. We believe that Bernardo R. Bandwagons I, too, have known. Theor Appl Genet. 2016; KASV provides a powerful approach for directly estimating genom- 129:2323–2332. doi:10.1007/s00122-016-2772-5 ic heritability for the observed population regardless of study or- Bernardo R. Reinventing quantitative genetics for plant breeding: ganism or experiment design (Visscher et al. 2006, 2008, 2010). In something old, something new, something borrowed, something conclusion, we recommend that genetics researchers studying blue. Heredity. 2020;125:375–385. doi:10.1038/s41437-020-0312-1 humans, microbes, or (un)domesticated plants and animals con- Brandariz SP, Bernardo R. Small ad hoc versus large general training sider the ASV approach. populations for genomewide selection in maize biparental crosses. Theor Appl Genet. 2019;132:347–353. doi:10.1007/ s00122-018-3222-3 Data availability Buntaran H, Piepho HP, Schmidt P, Rydén J, Halling M, Forkman J. Cross-validation of stagewise mixed-model analysis of Swedish Code and output for simulations are provided in the publicly avail- variety trials with winter wheat and spring barley. Crop Sci. able Zenodo repository (https://doi.org/10.5281/zenodo.6981359) 2020;60:2221–2240. doi:10.1002/csc2.20177 (Feldmann, Covarrubias-Pazaran, et al. 2022). Butler D. asreml: Fits the Linear Mixed Model. R package version 4.1.0.160; 2021. Covarrubias-Pazaran G. Changes and FAQs for the sommer package; Acknowledgements 2023. https://r-mirror.zim.unidue.de/web/packages/sommer/ MJF thanks Steven J. Knapp for guidance, support, and vignettes/v2.sommer.changes.and.faqs.pdf mentorship. Covarrubias-Pazaran G. Genome-assisted prediction of quantitative traits using the R package sommer. PLoS ONE. 2016;11:e0156744. doi:10.1371/journal.pone.0156744 Funding Damesa TM, Hartung J, Gowda M, Beyene Y, Das B, Semagn K, Piepho HP. Comparison of weighted and unweighted stage-wise analysis MJF and this research were supported by grants to Steven J. Knapp for genome-wide association studies and genomic selection. Crop from the United States Department of Agriculture (http://dx.doi. Sci. 2019;59:2572–2584. doi:10.2135/cropsci2019.04.0209 org/10.13039/100000199), National Institute of Food and Damesa TM, Möhring J, Worku M, Piepho HP. One step at a time: Agriculture (NIFA) Specialty Crops Research Initiative (# stage-wise analysis of a series of experiments. Agron J. 2017; 2017-51181-26833 and # 2022-51181-38328), and California 109:845–857. doi:10.2134/agronj2016.07.0395 Strawberry Commission (http://dx.doi.org/10.13039/100006760), de Los Campos G, Vazquez AI, Fernando R, Klimentidis YC, Sorensen in addition to funding from the University of California, Davis D. Prediction of complex human traits using the genomic best lin- (http://dx.doi.org/10.13039/100007707). The German Research ear unbiased predictor. PLoS Genet. 2013;9:e1003608. doi:10. Foundation (DFG) supported H-PP, grant PI 377/24-1. The funders 1371/journal.pgen.1003608 had no role in study design, data collection, analysis, publication Demmings EM, Williams BR, Lee CR, Barba P, Yang S, Hwang CF, decision, or manuscript preparation. Reisch BI, Chitwood DH, Londo JP. Quantitative trait locus ana- lysis of leaf morphology indicates conserved shape loci in grape- vine. Front Plant Sci. 2019;10:1373. doi:10.3389/fpls.2019.01373 Conflicts of interest De Villemereuil P, Morrissey MB, Nakagawa S, Schielzeth H. The author(s) declare no conflict of interest. Fixed-effect variance and the estimation of repeatabilities and heritabilities: issues and solutions. J Evol Biol. 2018;31:621–632. doi:10.1111/jeb.13232 Literature cited DeWitt N, Guedira M, Lauer E, Murphy JP, Marshall D, Mergoum M, Johnson J, Holland JB, Brown-Guedira G. Characterizing the oligo- Álvarez-Castro JM, Carlborg O. A unified model for functional and genic architecture of plant growth phenotypes informs genomic statistical epistasis and its application in quantitative trait loci selection approaches in a common wheat population. BMC analysis. Genetics. 2007;176:1151–1167. Genomics. 2021;22:1–18. doi:10.1186/s12864-021-07574-6 Anderson JA, Chao S, Liu S. Molecular breeding using a major QTL for Dias K, Piepho H, Guimarães L, Guimarães PdO, Parentoni S, Pinto fusarium head blight resistance in wheat. Crop Sci. 2007;47: MdO, Noda R, Magalhães J, Guimarães C, Garcia A, et al. Novel 1–112. doi:10.2135/cropsci2006.05.0359 strategies for genomic prediction of untested single-cross maize Ariyomo TO, Carter M, Watt PJ. Heritability of boldness and aggres- hybrids using unbalanced historical data. Theor Appl Genet. siveness in the zebrafish. Behav Genet. 2013;43:161–167. doi:10. 2020;133:443–455. doi:10.1007/s00122-019-03475-1 1007/s10519-013-9585-y Eichler EE, Flint J, Gibson G, Kong A, Leal SM, Moore JH, Nadeau JH. Astle W, Balding DJ. Population structure and cryptic relatedness in Missing heritability and strategies for finding the underlying genetic association studies. Stat Sci. 2009;24:451–471. doi:10. causes of complex disease. Nat Rev Genet. 2010;11:446–450. doi: 1214/09-STS307 10.1038/nrg2809 Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023 12 | G3, 2023, Vol. 00, No. 0 Endelman JB. Fully efficient, two-stage analysis of multi- Heslot N, Jannink JL, Sorrells ME. Using genomic prediction to char- environment trials with directional dominance and multi-trait acterize environments and optimize prediction accuracy in ap- genomic selection. bioRxiv, 2022. plied breeding data. Crop Sci. 2013;53:921–933. doi:10.2135/ Endelman JB. Ridge regression and other kernels for genomic selec- cropsci2012.07.0420 tion with R package rrBLUP. Plant Genome. 2011;4:250–255. doi: Inc. SI. SAS/STAT 13.1 User’s Guide: Chapter 43—The GLIMMIX 10.3835/plantgenome2011.08.0024 Procedure. Cary (NC): Author; 2013. Endelman JB, Jannink JL. Shrinkage estimation of the realized rela- Jensen J, Su G, Madsen P. Partitioning additive genetic variance into tionship matrix. G3: Genes, Genomes, Genet. 2012;2:1405–1413. genomic and remaining polygenic components for complex traits doi:10.1534/g3.112.004259 in dairy cattle. BMC Genet. 2012;13:44. doi:10.1186/1471-2156-13-44 Falconer D, Mackay T. Introduction to Quantitative Genetics. Harlow Kang HM, Sul JH, Service SK, Zaitlen NA, Kong Sy, Freimer NB, Sabatti (UK): Longmans Green; 1996. C, Eskin E. Variance component model to account for sample Feldmann M, Covarrubias-Pazaran G, Piepho HP. Data for “complex structure in genome-wide association studies. Nat Genet. 2010; traits and candidate genes”; 2022. [accessed 2022 August 10]. 42:348–354. doi:10.1038/ng.548 https://doi.org/10.5281/zenodo.6981359 Lander ES, Schork NJ. Genetic dissection of complex traits. Science. Feldmann MJ, Piepho HP, Bridges WC, Knapp SJ. Average semivariance 1994;265:2037–2048. doi:10.1126/science.8091226 yields accurate estimates of the fraction of marker-associated gen- Legarra A. Comparing estimates of genetic variance across different etic variance and heritability in complex trait analyses. PLoS Genet. relationship models. Theor Popul Biol. 2016;107:26–30. doi:10. 2021;17:e1009762. doi:10.1371/journal.pgen.1009762 1016/j.tpb.2015.08.005 Feldmann MJ, Piepho HP, Knapp SJ. Average semivariance directly Li B, VanRaden P, Null D, O’Connell J, Cole J. Major quantitative trait yields accurate estimates of the genomic variance in complex loci influencing milk production and conformation traits in trait analyses. G3 (Bethesda). 2022;12:jkac080. doi:10.1093/ Guernsey dairy cattle detected on Bos taurus autosome 19. J g3journal/jkac080 Dairy Sci. 2021;104:550–560. doi:10.3168/jds.2020-18766 Galli G, Alves FC, Morosini JS, Fritsche-Neto R. On the usefulness of Li H, Wang J, Bao Z. A novel genomic selection method combining parental lines GWAS for predicting low heritability traits in trop- GBLUP and LASSO. Genetica. 2015;143:299–304. doi:10.1007/ ical maize hybrids. PLoS ONE. 2020;15:e0228724. doi:10.1371/ s10709-015-9826-5 journal.pone.0228724 Lopdell TJ, Tiplady K, Couldrey C, Johnson TJ, Keehan M, Davis SR, Gaudet MM, Kirchhoff T, Green T, Vijai J, Korn JM, Guiducci C, Segrè Harris BL, Spelman RJ, Snell RG, Littlejohn MD. Multiple QTL AV, McGee K, McGuffog L, Kartsonaki C, et al. Common genetic underlie milk phenotypes at the CSF2RB locus. Genet Sel Evol. variants and modification of penetrance of BRCA2-associated 2019;51:3. doi:10.1186/s12711-019-0446-x breast cancer. PLoS Genet. 2010;6:e1001183. doi:10.1371/ Lynch M, Walsh B. Genetics and Analysis of Quantitative Traits. Vol. journal.pgen.1001183 1. Sunderland (MA): Sinauer; 1998. Gbur EE, et al. Analysis of Generalized Linear Mixed Models in the Mackay TF. Quantitative trait loci in Drosophila. Nat Rev Genet. 2001; Agricultural and Natural Resources Sciences. Vol. 156. Hoboken, 2:11–20. doi:10.1038/35047544 NJ: John Wiley & Sons; 2020. Mackay TFC, Stone EA, Ayroles JF. The genetics of quantitative traits: Gogel B, Smith A, Cullis B. Comparison of a one-and two-stage mixed challenges and prospects. Nat Rev Genet. 2009;10:565–577. doi: model analysis of Australia’s national variety trial southern re- 10.1038/nrg2612 gion wheat data. Euphytica. 2018;214:1–21. Meuwissen T, Hayes B, Goddard M. Genomic selection: a paradigm González-Barrios P, Díaz-García L, Gutiérrez L. Mega-environmental shift in animal breeding. Animal Front. 2016;6:6–14. doi:10. design: using genotype× environment interaction to optimize re- 2527/af.2016-0002 sources for cultivar testing. Crop Sci. 2019;59:1899–1915. Móhring J, Piepho H. Comparison of weighting in two-stage analyses Habier D, Fernando RL, Dekkers JC. The impact of genetic relation- of series of experiments. Crop Sci. 2009;49:1988. ship information on genome-assisted breeding values. Genetics. Morota G, Gianola D. Kernel-based whole-genome prediction of com- 2007;177:2389–2397. doi:10.1534/genetics.107.081190 plex traits: a review. Front Genet. 2014;5:363. Habier D, Fernando RL, Garrick DJ. Genomic BLUP decoded: a look Nishio M, Satoh M. Including dominance effects in the genomic BLUP into the black box of genomic prediction. Genetics. 2013;194: method for genomic evaluation. PLoS ONE. 2014;9:e85792. doi:10. 597–607. doi:10.1534/genetics.113.152207 1371/journal.pone.0085792 Han K, Lee HY, Ro NY, Hur OS, Lee JH, Kwon JK, Kang BC. QTL map- Piepho HP. A coefficient of determination (R2) for generalized linear ping and GWAS reveal candidate genes controlling capsaicinoid mixed models. Biom J. 2019;61:860–872. content in capsicum. Plant Biotech J. 2018;16:1546–1558. doi:10. Piepho HP, Moehring J, Schulz-Streeck T, Ogutu JO. A stage-wise ap- 1111/pbi.12894 proach for the analysis of multi-environment trials. Biom J. 2012; Hayes B, Goddard ME. The distribution of the effects of genes affect- 54:844–860. doi:10.1002/bimj.201100219 ing quantitative traits in livestock. Genet Sel Evol. 2001;33:1–21. Pincot DD, et al. Novel fusarium wilt resistance genes uncovered in doi:10.1186/1297-9686-33-3-209 natural and cultivated strawberry populations are found on three Hayes BJ, Pryce J, Chamberlain AJ, Bowman PJ, Goddard ME. Genetic non-homoeologous chromosomes. Theor Appl Genet. 2022; architecture of complex traits and accuracy of genomic predic- 135(6):1–25. tion: coat colour, milk-fat percentage, and type in holstein cattle Pincot DD, Hardigan MA, Cole GS, Famula RA, Henry PM, Gordon TR, as contrasting model traits. PLoS Genet. 2010;6:e1001139. doi:10. Knapp SJ. Accuracy of genomic selection and long-term genetic 1371/journal.pgen.1001139 gain for resistance to verticillium wilt in strawberry. Plant Heffner EL, Lorenz AJ, Jannink JL, Sorrells ME. Plant breeding with Genome. 2020;13:e20054. doi:10.1002/tpg2.20054 genomic selection: gain per unit time and cost. Crop Sci. 2010; Rice B, Lipka AE. Evaluation of rr-BLUP genomic selection models 50:1681–1690. doi:10.2135/cropsci2009.11.0662 that incorporate peak genome-wide association study signals in Heffner EL, Sorrells ME, Jannink JL. Genomic selection for crop improve- maize and sorghum. Plant Genome. 2019;12:180052. doi:10. ment. Crop Sci. 2009;49:1–12. doi:10.2135/cropsci2008.08.0512 3835/plantgenome2018.07.0052 Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023 M. J. Feldmann et al. | 13 Rodríguez-Álvarez MX, Boer MP, van Eeuwijk FA, Eilers PH. association of complex traits in heterogeneous stock mice. Nat Correcting for spatial heterogeneity in plant breeding experi- Genet. 2006;38:879–887. doi:10.1038/ng1840 ments with p-splines. Spat Stat. 2018;23:52–71. VanRaden PM. Efficient methods to compute genomic predictions. J Rutkoski JE, Poland JA, Singh RP, Huerta-Espino J, Bhavani S, Barbier Dairy Sci. 2008;91:4414–4423. doi:10.3168/jds.2007-0980 H, Rouse MN, Jannink JL, Sorrells ME. Genomic selection for quan- Verbyla AP, Taylor JD, Verbyla KL. RWGAIM: an efficient high- titative adult plant stem rust resistance in wheat. Plant Genome. dimensional random whole genome average (QTL) interval map- 2014;7. doi:10.3835/plantgenome2014.02.0006 ping approach. Genet Res (Camb). 2012;94:291–306. doi:10.1017/ Schreck N, Piepho HP, Schlather M. Best prediction of the additive S0016672312000493 genomic variance in random-effects models. Genetics. 2019; Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS 213:379–394. doi:10.1534/genetics.119.302324 discovery. Am J Hum Genet. 2012;90:7–24. doi:10.1016/j.ajhg. Schulz-Streeck T, Ogutu JO, Piepho HP. Comparisons of single-stage 2011.11.029 and two-stage approaches to genomic selection. Theor Appl Visscher PM, Hill WG, Wray NR. Heritability in the genomics era— Genet. 2013;126:69–82. doi:10.1007/s00122-012-1960-1 concepts and misconceptions. Nat Rev Genet. 2008;9:255–266. Searle SR, Casella G, McCulloch C. Variance Components. Hoboken, doi:10.1038/nrg2322 NJ: John Wiley & Sons; 1992. Visscher PM, Medland SE, Ferreira MA, Morley KI, Zhu G, Cornes BK, Selle ML, Steinsland I, Hickey JM, Gorjanc G. Flexible modelling of Montgomery GW, Martin NG. Assumption-free estimation of her- spatial variation in agricultural field trials with the R package itability from genome-wide identity-by-descent sharing between INLA. Theor Appl Genet. 2019;132:3277–3293. doi:10.1007/ full siblings. PLoS Genet. 2006;2:e41. doi:10.1371/journal.pgen. s00122-019-03424-y 0020041 Septiningsih EM, Pamplona AM, Sanchez DL, Neeraja CN, Vergara Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, GV, Heuer S, Ismail AM, Mackill DJ. Development of Yang J. 10 years of GWAS discovery: biology, function, and trans- submergence-tolerant rice cultivars: the sub1 locus and beyond. lation. Am J Hum Genet. 2017;101:5–22. doi:10.1016/j.ajhg.2017. Ann Bot. 2009;103:151–160. doi:10.1093/aob/mcn206 06.005 Smith A, Cullis B, Gilmour A. Applications: the analysis of crop var- Visscher PM, Yang J, Goddard ME. A commentary on “Common SNPs iety evaluation data in Australia. Aust N Z J Stat. 2001;43: explain a large proportion of the heritability for human height” by 129–145. doi:10.1111/1467-842X.00163 Yang et al. (2010). Twin Res Hum Genet. 2010;13:517–524. doi:10. Smith A, Cullis BR, Thompson R. The analysis of crop cultivar 1375/twin.13.6.517 breeding and evaluation trials: an overview of current mixed Vitezica ZG, Legarra A, Toro MA, Varona L. Orthogonal estimates of model approaches. J Agric Sci. 2005;143:449–462. doi:10.1017/ variances for additive, dominance, and epistatic effects in popu- S0021859605005587 lations. Genetics. 2017;206:1297–1307. doi:10.1534/genetics.116. Speed D, Balding DJ. Relatedness in the post-genomic era: is it still 199406 useful? Nat Rev Genet. 2015;16:33–44. doi:10.1038/nrg3821 Vitezica ZG, Varona L, Legarra A. On the additive and dominant vari- Spindel J, Begum H, Akdemir D, Collard B, Redoña E, Jannink J, ance and covariance of individuals within the genomic selection McCouch S. Genome-wide prediction models that incorporate scope. Genetics. 2013;195:1223–1230. doi:10.1534/genetics.113. de novo GWAS are a powerful new tool for tropical rice improve- 155176 ment. Heredity. 2016;116:395–408. doi:10.1038/hdy.2015.113 Walsh B, Lynch M. Evolution and Selection of Quantitative Traits. Strandén I, Christensen OF. Allele coding in genomic evaluation. Hoboken, NJ: Oxford University Press; 2018. Genet Sel Evol. 2011;43:1–11. Wray NR, Kemper KE, Hayes BJ, Goddard ME, Visscher PM. Complex Stroup WW, Milliken GA, Claassen EA, Wolfinger RD. SAS for Mixed trait prediction from genome data: contrasting EBV in livestock to Models: Introduction and Basic Applications. Hoboken, NJ: SAS PRS in humans: genomic prediction. Genetics. 2019;211: Institute; 2018. 1131–1141. doi:10.1534/genetics.119.301859 Su G, Christensen OF, Ostersen T, Henryon M, Lund MS. Estimating Xiang T, Christensen OF, Vitezica ZG, Legarra A. Genomic model with additive and non-additive genetic variances and predicting gen- correlation between additive and dominance effects. Genetics. etic merits using genome-wide dense single nucleotide poly- 2018;209:711–723. doi:10.1534/genetics.118.301015 morphism markers. PLoS ONE. 2012;7:e45293. doi:10.1371/ Xin F, Zhu T, Wei S, Han Y, Zhao Y, Zhang D, Ma L, Ding Q. QTL map- journal.pone.0045293 ping of kernel traits and validation of a major QTL for kernel Tang S, Leon A, Bridges WC, Knapp SJ. Quantitative trait loci for gen- length-width ratio using SNP and bulked segregant analysis in etically correlated seed traits are tightly linked to branching and wheat. Sci Rep. 2020;10:1–12. pericarp pigment loci in sunflower. Crop Sci. 2006;46:721–734. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, doi:10.2135/cropsci2005.0006-7 Madden PA, Heath AC, Martin NG, Montgomery GW. Common Taylor J, Jorgensen D, Moffat CS, Chalmers KJ, Fox R, Hollaway GJ, SNPs explain a large proportion of the heritability for human Cook MJ, Neate SM, See PT, Shankar M. An international wheat di- height. Nat Genet. 2010;42:565–569. doi:10.1038/ng.608 versity panel reveals novel sources of genetic resistance to tan Zhang J, Liu F, Reif JC, Jiang Y. On the use of GBLUP and its extension spot in Australia. Theor Appl Genet. 2023;136:61. doi:10.1007/ for GWAS with additive and epistatic effects. G3: Genes, Genomes, s00122-023-04332-y Genetics. 2021;11(7):jkab122. doi:10.1093/g3journal/jkab122 Valdar W, Solberg LC, Gauguier D, Burnett S, Klenerman P, Cookson WO, Taylor MS, Rawlins JNP, Mott R, Flint J. Genome-wide genetic Editor: A. Lipka Downloaded from https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkad148/7219628 by guest on 05 August 2023