Plant Communications
Resource article

Metabolic marker-assisted genomic prediction
improves hybrid breeding
Yang Xu1,7, Wenyan Yang1,7, Jie Qiu2,7, Kai Zhou1, Guangning Yu1, Yuxiang Zhang1, Xin Wang1,
Yuxin Jiao1, Xinyi Wang1, Shujun Hu1, Xuecai Zhang3, Pengcheng Li1, Yue Lu1, Rujia Chen1,
Tianyun Tao1, Zefeng Yang1, Yunbi Xu4,5,6,* and Chenwu Xu1,*
1Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan

Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University,

Yangzhou 225009, China

2Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China

3International Maize and Wheat Improvement Center (CIMMYT), Mexico D.F. 06600, Mexico

4Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China

5BGI Bioverse, Shenzhen 518083, China

6MolBreeding Biotechnology Co., Ltd., Shijiazhuang 050035, China

7These authors contributed equally to this article.

*Correspondence: Yunbi Xu (yunbi.xu@pku-iaas.edu.cn), Chenwu Xu (cwxu@yzu.edu.cn)

https://doi.org/10.1016/j.xplc.2024.101199

ABSTRACT

Hybrid breeding is widely acknowledged as the most effective method for increasing crop yield, particularly

in maize and rice. However, a major challenge in hybrid breeding is the selection of desirable combinations

from the vast pool of potential crosses. Genomic selection (GS) has emerged as a powerful tool to tackle

this challenge, but its success in practical breeding depends on prediction accuracy. Several strategies

have been explored to enhance prediction accuracy for complex traits, such as the incorporation of functional

markers and multi-omics data. Metabolome-wide association studies (MWAS) help to identify metabolites

that are closely linked to phenotypes, known as metabolic markers. However, the use of preselected

metabolic markers from parental lines to predict hybrid performance has not yet been explored. In this

study, we developed a novel approach called metabolic marker-assisted genomic prediction (MM_GP),

which incorporates significant metabolites identified from MWAS into GS models to improve the accuracy

of genomic hybrid prediction. In maize and rice hybrid populations, MM_GP outperformed genomic predic-

tion (GP) for all traits, regardless of the method used (genomic best linear unbiased prediction or eXtreme

gradient boosting). On average, MM_GP demonstrated 4.6% and 13.6% higher predictive abilities than GP

for maize and rice, respectively. MM_GP could also match or even surpass the predictive ability of M_GP

(integrated genomic-metabolomic prediction) for most traits. In maize, the integration of only six metabolic

markers significantly associated with multiple traits resulted in 5.0% and 3.1% higher average predictive

ability compared with GP and M_GP, respectively. With advances in high-throughput metabolomics tech-

nologies and prediction models, this approach holds great promise for revolutionizing genomic hybrid

breeding by enhancing its accuracy and efficiency.

Keywords: genomic prediction, hybrid, metabolome-wide association studies, metabolic marker, predictive

ability

Xu Y., Yang W., Qiu J., Zhou K., Yu G., Zhang Y., Wang X., Jiao Y., Wang X., Hu S., Zhang X., Li P., Lu Y., Chen
R., Tao T., Yang Z., Xu Y., and Xu C. (2025). Metabolic marker-assisted genomic prediction improves hybrid
breeding. Plant Comm. 6, 101199.
Plant Communications 6, 101199, March 10 2025. © 2024 The Authors. Published by Elsevier Inc. on behalf of CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, and Chinese Society for Plant Biology.

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

INTRODUCTION

Hybrid breeding has proved to be the most efficient approach for

increasing yield potential in various crops, notably maize and rice

(Tu et al., 2000; Duvick, 2001). However, selection of the optimum

combinations from a wide range of potential crosses presents a

great challenge in hybrid breeding. Genomic selection (GS) has

emerged as a solution to this challenge, using genome-wide
markers to predict the genomic values of individuals before phe-

notyping (Meuwissen et al., 2001; Hickey et al., 2014). Genomic

hybrid breeding, a special form of GS, leverages markers

derived from parental lines to predict hybrid performance,

thereby reducing breeding cycles and enhancing genetic gain

(Xu et al., 2014; Crossa et al., 2017; Cui et al., 2020). Several

studies have confirmed the effectiveness of genomic hybrid

breeding (Technow et al., 2014; Zhao et al., 2015; Yang et al.,

2022). The success of GS in practical breeding largely depends

on the accuracy of genomic prediction (GP) (Xu et al., 2021a).

Despite the availability of whole-sequence information, GS may

not fully capture the intricate interactions among genes and

their downstream regulation, which are integral to the entire

process linking genotype to phenotype (Westhues et al., 2017;

Hu et al., 2019). For complex quantitative traits, particularly

those heavily influenced by environmental factors, such as

grain yield, there exists a bottleneck that hinders the

improvement of prediction accuracy (Xu et al., 2020; Resende

et al., 2024).

With advances in high-throughput molecular biotechnology,

it has become possible to predict phenotypes using metabolo-

mic data. The metabolome serves as a link between genotype

and phenotype, offering the potential to enhance predictive

abilities compared with genomic data by shedding light on

downstream interactions (Washburn et al., 2020). For example,

the predictive ability of metabolomic data from parental

lines to predict the yield of rice hybrids was nearly twice

that of genomic data (Xu et al., 2016). Using 56 110 SNPs

and 130 metabolites from 285 maize inbred lines and two

testers, the general combining abilities of seven traits in

maize were predicted, and the results indicated comparable

predictive abilities between the two data types (Riedelsheimer

et al., 2012). The integration of multi-omics data is

increasingly being explored to further enhance prediction

accuracy. The combination of genomic, metabolomic, and tran-

scriptomic data can significantly improve predictive abilities for

various agronomic traits across diverse plant species (Hu et al.,

2021; Wu et al., 2022), highlighting the potential of integrating

genomic and metabolomic data to enhance genomic

prediction accuracy.

The incorporation of prior or preselected biological information

into GP models is another viable approach to enhance prediction

accuracy. For instance, the integration of GWAS findings into

genomic best linear unbiased prediction (GBLUP) resulted in a

4.8% improvement in the prediction of loin muscle area in pigs

(Liu et al., 2023a). Similarly, the use of single-nucleotide

polymorphisms (SNPs) preselected from whole-genome se-

quencing (WGS) data on the basis of expression quantitative trait

locus mapping of all genes led to better predictive abilities for

startle responses in fruit flies compared with the use of WGS

data alone (Ye et al., 2020). In rice, the GS + de novo GWAS

strategy outperformed six other models in a tropical breeding

population across several traits and environments (Spindel

et al., 2016). Together, these studies suggest that the

integration of prior or preselected biological information can

further enhance the accuracy of GS.

Previous studies have demonstrated the effectiveness of

metabolome-wide association studies (MWAS) in identifying
metabolic markers, i.e. metabolites that are closely linked to phe-

notypes (Gamboa-Becerra et al., 2019; Xu et al., 2021b). Because

of the high dimensionality, noise, and variability in metabolomics

data, the identification of metabolic markers is challenging.

Current methods for the detection of metabolic markers include

partial least-squares discriminant analysis, orthogonal partial

least-squares discriminant analysis, artificial neural networks,

support vector machines, and other multivariate analysis methods

(Worley and Powers, 2013). In a study involving 368 maize inbred

lines, 43 metabolites significantly associated with 100-kernel

weight were identified using stepwise regression (Wen et al.,

2014). Using an improved least absolute shrinkage and selection

operator (LASSO) method, 15 metabolites significantly associated

with six agronomic traits were identified in 339 maize inbred lines

(Xu et al., 2017). A simulation study indicated that the LASSO

method had the highest power and lowest false-positive rate

among four MWAS methods, detecting 25 metabolites signifi-

cantly associated with yield-related traits in 533 rice varieties

(Wei et al., 2018). These metabolic markers directly influence

phenotypic traits, reflecting immediate physiological status and

environmental interactions, and are thus expected to provide

more accurate predictions. However, the integration of such

preselected biological information into GS remains to be explored.

In this study, we developed a novel approach called metabolic

marker-assisted GP (MM_GP), which incorporates significant

metabolites identified from parental lines by MWAS into GS

models to improve the accuracy of hybrid prediction. The

performance of MM_GP was evaluated using 425 maize

hybrids derived from 205 inbred lines and 278 rice hybrids from

210 recombinant inbred lines (RILs). The proposed MM_GP

approach offers a distinct advantage in refining GP, facilitating

more precise and effective selection for desirable traits in crop

hybrid breeding.
RESULTS

Metabolite profiling of seedling leaves in maize inbred
lines

Using a non-targeted liquid chromatography–mass spectrometry

(LC–MS) method, 925 metabolites were identified from the seedling

leaves of 205 maize inbred lines, each with two biological replicates.

After excluding metabolites with significantly different

concentrations (p < 0.01) between replicates, 777 metabolites remained.

Among these metabolite features, 557 were annotated

and classified into 11 categories (Figure 1A and Supplemental

Table 1). The three most abundant categories were

benzenoids (14.0%), organic oxygen compounds (13.6%), and

organoheterocyclic compounds (13.5%). Levels of metabolite

accumulation varied substantially among the inbred lines, with

an average coefficient of variation (CV) of 72.8%. A majority of

the metabolites (66.0%) exhibited a CV of >50%, particularly

the benzenoids (Figure 1B and Supplemental Table 1).
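
The coefficient of variation reported above is the standard deviation of a metabolite's abundance across lines divided by its mean. The following is a minimal sketch of that calculation, not the authors' pipeline: the abundance values and metabolite names are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV of one metabolite across lines, as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # Assumption: sample standard deviation (n - 1 denominator).
    return 100.0 * stdev(values) / m


# Hypothetical abundances of two metabolites across five inbred lines.
abundances = {
    "metabolite_a": [10.0, 12.0, 11.0, 9.0, 13.0],
    "metabolite_b": [1.0, 8.0, 2.0, 15.0, 4.0],
}
cvs = {name: coefficient_of_variation(v) for name, v in abundances.items()}

# Share of metabolites whose CV exceeds 50%, analogous to the 66.0% figure.
share_above_50 = sum(cv > 50.0 for cv in cvs.values()) / len(cvs)
```

In this toy input only the second metabolite varies enough to pass the 50% threshold.
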
Identification of metabolic markers that influence
agronomic traits in maize

Using the LASSO method, 78 significant metabolites were

identified in maize inbred lines by MWAS: 30, 28, 31, and 24

metabolites for ear weight (EW), ear grain weight (EGW), ear

diameter (ED), and ear length (EL), respectively (Figure 2A and


Figure 1. Metabolic profiling of 777 metabolites from 205 maize
inbred lines.
(A) Classification of 777 metabolites.

(B) Distribution of the coefficients of variation (CVs) of 777 metabolites.

Supplemental Table 2). Forty-seven of the identified metabolites

were annotated and classified into 10 categories, with

benzenoids (17.0%), organic oxygen compounds (14.8%), and

phenylpropanoids and polyketides (14.8%) being the most

numerous. In addition, 28, six, and one metabolites showed

significant associations with two, three, and four traits,

respectively (Supplemental Table 3). For instance, metabolite

m863 (salicylic acid) exhibited significant correlations with both

EW and EGW. Metabolite m36 (leucine) had significant

associations with EW, EGW, and EL, and metabolite m111

(taurine) was significantly associated with all four traits.

The percentage of phenotypic variation explained depended on

traits and metabolic markers, ranging from 1.0% to 6.0%

(Supplemental Table 2). Metabolite m126 (hypoxanthine)

explained the most phenotypic variation for EW and ED, and

m136 (valeric acid) and m36 (leucine) were the top contributors

to EGW and EL, respectively. Functional enrichment analysis

was performed on the 47 annotated metabolic markers,

resulting in the identification of 22 enriched metabolic

pathways. The top five pathways were pyruvate metabolism,

galactose metabolism, linoleic acid metabolism, purine

metabolism, and pyrimidine metabolism (Figure 2B and

Supplemental Table 4). Notably, the enrichment of pyruvate

metabolism reached a significant level.
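
To illustrate how a LASSO penalty can single out trait-associated metabolites, the sketch below fits a plain LASSO by coordinate descent and reports the metabolites with non-zero coefficients. It is only an illustration under stated assumptions: it is not the improved LASSO procedure used in this study, it performs no significance testing, and the toy data, the function names, and the penalty value `lam` are all hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def standardize(column):
    """Center a column and scale it to unit (population) variance."""
    n = len(column)
    m = sum(column) / n
    sd = (sum((v - m) ** 2 for v in column) / n) ** 0.5
    if sd == 0:
        raise ValueError("constant column cannot be standardized")
    return [(v - m) / sd for v in column]


def lasso_coefficients(columns, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1."""
    n = len(y)
    cols = [standardize(c) for c in columns]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    beta = [0.0] * len(cols)
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Covariance of column j with the residual that excludes its own fit.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return beta


# Hypothetical, noise-free toy data: 8 lines, 3 metabolites;
# only the first metabolite drives the trait.
metabolites = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0],
    [5.0, 3.0, 4.0, 4.0, 3.0, 5.0, 4.0, 4.0],
]
trait = [3.0 * v for v in metabolites[0]]
beta = lasso_coefficients(metabolites, trait, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
```

Here the penalty sets the coefficients of the two unrelated metabolites exactly to zero, so only index 0 is retained. In real data the penalty would normally be tuned, for example by cross-validation.
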

Evaluation of MM_GP for hybrid prediction in maize

To examine the capacity of MM_GP for hybrid prediction in

maize, we compared the predictive abilities of five prediction
models: GP, metabolomic prediction (MP), metabolic marker prediction

(MMP), integrated genomic-metabolomic prediction

(M_GP), and metabolic marker-assisted GP (MM_GP). Metabo-

lites that showed significant associations with the target trait

were considered to be metabolic markers and were used in

MMP and MM_GP. The predictive abilities from 10-fold cross-

validation with 20 repetitions varied from 0.259 to 0.499 for GP,

0.130 to 0.442 for MP, 0.076 to 0.237 for MMP, 0.269 to 0.494

for M_GP, and 0.268 to 0.503 for MM_GP across the four agro-

nomic traits tested (Figure 3). Among these traits, prediction

performance was highest for ED, followed by EW, EGW, and

EL. Among the models, MP and MMP exhibited the worst

prediction performance. MM_GP displayed better predictive abilities

than GP. Specifically, with GBLUP, MM_GP improved the

predictive ability for EW by 4.1%, EGW by 5.3%, ED by 0.8%,

and EL by 2.7%. Similarly, with eXtreme gradient boosting

(XGBoost), MM_GP increased predictive ability for EW by

5.2%, EGW by 4.4%, ED by 4.2%, and EL by 9.7%.
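
The evaluation scheme described above (10-fold cross-validation with 20 repetitions) can be sketched as follows. This is a hedged illustration rather than the study's code: it assumes that predictive ability is the Pearson correlation between observed and cross-validated predicted values, pooled over folds within each repetition and then averaged over repetitions. The one-feature least-squares predictor is a hypothetical stand-in for GBLUP or XGBoost, and all function names are illustrative.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def repeated_kfold_ability(features, y, fit_predict, k=10, repeats=20, seed=1):
    """Mean correlation of observed vs. held-out predictions over repeats."""
    rng = random.Random(seed)
    n = len(y)
    abilities = []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        predicted = [0.0] * n
        for fold in range(k):
            test = order[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            preds = fit_predict(
                [features[i] for i in train],
                [y[i] for i in train],
                [features[i] for i in test],
            )
            for i, p in zip(test, preds):
                predicted[i] = p
        abilities.append(pearson(predicted, y))
    return sum(abilities) / repeats


def simple_linear_fit_predict(x_train, y_train, x_test):
    """Stand-in predictor: one-feature least squares (not GBLUP or XGBoost)."""
    n = len(x_train)
    mx, my = sum(x_train) / n, sum(y_train) / n
    sxx = sum((a - mx) ** 2 for a in x_train)
    slope = sum((a - mx) * (b - my) for a, b in zip(x_train, y_train)) / sxx
    return [my + slope * (x - mx) for x in x_test]


# Hypothetical noise-free data, so the held-out predictions are exact.
x = [float(i) for i in range(40)]
y = [2.0 * v + 1.0 for v in x]
ability = repeated_kfold_ability(x, y, simple_linear_fit_predict)
```

Any model can be plugged in through `fit_predict`, so the same loop could compare GP, M_GP, and MM_GP feature sets on identical folds.
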

The predictive ability of MM_GP also matched or even exceeded

that of M_GP. When using GBLUP, MM_GP increased predictive

ability by 1.8% for EW, 5.9% for EGW, and 1.8% for

ED compared with M_GP, although their predictive abilities for

EL were similar. When using XGBoost, MM_GP increased predictive

ability by 3.0% for EW, 3.3% for EGW, 0.5% for ED, and 5.4%

for EL compared with M_GP. Notably, M_GP did not improve the

predictive ability for some traits compared with GP, whereas

MM_GP did. For example, in the case of EGW with GBLUP,

M_GP decreased predictive ability by 0.6% compared with GP,

whereas MM_GP increased it by 5.3%. Overall, MM_GP consis-

tently performed the best among the five models, regardless of

the method used (GBLUP or XGBoost).

To determine whether the enhanced predictive ability of MM_GP

was attributable to the small number of metabolic markers, we

randomly selected an equal number of metabolites from the me-

tabolomic data to match the number of metabolic markers.

Averaged over 10 replicate random samples, the predictive abilities

of the randomly selected metabolites for assisting in GP were

significantly lower than those of MM_GP (Figure 4). Specifically,

using GBLUP, the randomly selected metabolites resulted in a

significant decrease in predictive ability for EW, EGW, ED, and

EL by 4.8%, 5.8%, 1.4%, and 4.1%, respectively, compared

with MM_GP. Similarly, with XGBoost, the randomly selected

metabolites significantly reduced predictive ability for EW,

EGW, and EL by 6.3%, 6.6%, and 7.1%, respectively. Therefore,

we conclude that the improved predictive ability of MM_GP

cannot be attributed solely to the small number of metabolic

markers.
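
The random-metabolite control described above can be sketched as drawing, for each replicate, as many metabolites as there are metabolic markers for the trait. This is a minimal illustration under assumptions: the identifiers are hypothetical placeholders, the function name is invented for this example, and the seed is arbitrary.

```python
import random


def random_control_sets(all_metabolites, n_markers, replicates=10, seed=0):
    """Draw `replicates` random metabolite sets, each of size `n_markers`."""
    rng = random.Random(seed)
    # Sampling is without replacement within each replicate.
    return [rng.sample(all_metabolites, n_markers) for _ in range(replicates)]


# 777 placeholder identifiers (hypothetical; not the study's metabolite IDs).
metabolite_ids = [f"met_{i}" for i in range(1, 778)]

# Example: 30 markers were reported for EW, so each control set has 30 members.
control_sets = random_control_sets(metabolite_ids, n_markers=30)
```

Each control set would then replace the metabolic markers in the prediction model, and the resulting predictive abilities would be averaged over the 10 replicates before comparison with MM_GP.
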
Integration of shared significant metabolic markers in
MM_GP

Six metabolites were found to be significantly associated with

three or more traits (Figure 5A). To test the contribution of these

shared significant metabolic markers to GP, we combined them

with genomic data to predict the four traits in hybrid maize

(Figure 5B). The predictive abilities using GBLUP were 0.387

(EW), 0.349 (EGW), 0.502 (ED), and 0.260 (EL), and those using

XGBoost were 0.392 (EW), 0.338 (EGW), 0.482 (ED), and 0.283

(EL). MM_GP, which integrated the six shared metabolic markers,


Figure 2. Identification of metabolites associated with four traits in maize.
(A) Metabolites significantly associated with four traits of 205 maize inbred lines. The horizontal black lines represent the critical values at the 0.05

significance level.

(B) Enriched pathways of metabolic markers.

showed greater predictive ability than GP and M_GP. Compared

with GP, MM_GP with GBLUP significantly increased predictive

ability by 3.6% for EW and 6.3% for EGW, although their predic-

tive abilities for ED and EL were similar. Likewise, MM_GP with

XGBoost significantly increased predictive ability by 6.8% for
EW, 7.6% for EGW, 6.0% for ED, and 9.4% for EL. Compared

with M_GP, MM_GP with GBLUP significantly increased predic-

tive ability by 6.9% for EGW and 1.7% for ED, and MM_GP with

XGBoost significantly increased predictive ability by 4.6% for

EW, 6.4% for EGW, 2.2% for ED, and 5.1% for EL. These findings


Figure 3. Predictive abilities for four traits in 425 maize hybrids obtained from five prediction models using GBLUP and XGBoost
methods.
The four traits are ear weight (EW), ear grain weight (EGW), ear diameter (ED), and ear length (EL). The five prediction models are GP, MP, MMP, M_GP,

and MM_GP, representing genomic prediction, metabolomic prediction, metabolic marker prediction, integrated genomic–metabolomic prediction, and

metabolic marker-assisted genomic prediction, respectively. In each histogram, different lowercase letters above the bars indicate significant differences

(p < 0.05) between the models.

highlight the greater potential of MM_GP to improve the accuracy

of genomic hybrid prediction compared with other methods.
Evaluation of MM_GP for hybrid prediction in rice

To confirm the advantages of MM_GP observed in maize, we performed

a similar analysis in rice. Using the LASSO method, we

detected 171 metabolites significantly associated with four traits

in rice RIL populations: 48 for yield per plant (YIELD), 40 for tiller

number per plant (TILLER), 55 for grain number per panicle

(GRAIN), and 64 for 1000-grain weight (KGW) (Figure 6

and Supplemental Table 5). Among these metabolites, 138

were significantly associated with one trait, 30 with two traits,

and three with three traits (Supplemental Table 6). For example,

metabolite m0149-L (sn-glycero-3-phosphocholine) was signifi-

cantly associated with only one trait (YIELD), m0092-L (D-panto-

thenic acid) with two traits (YIELD and GRAIN), and m0643-L

(chrysoeriol C-hexoside derivative) with three traits (YIELD,

GRAIN, and KGW). No metabolites were significantly associated

with all the tested traits.
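
The rice counts reported above can be cross-checked against each other: the per-trait totals count a shared metabolite once per trait, whereas the sharing breakdown counts each metabolite once. The short check below uses only numbers stated in the text; the variable names are illustrative.

```python
# Significant metabolites per trait, as reported for the rice RILs.
per_trait = {"YIELD": 48, "TILLER": 40, "GRAIN": 55, "KGW": 64}

# Number of metabolites associated with exactly one, two, or three traits.
by_trait_count = {1: 138, 2: 30, 3: 3}

# Each metabolite counted once.
unique_metabolites = sum(by_trait_count.values())

# Each metabolite counted once per associated trait.
associations_from_sharing = sum(k * n for k, n in by_trait_count.items())
associations_from_traits = sum(per_trait.values())
```

Both ways of counting give 207 trait-metabolite associations for 171 unique metabolites, so the reported figures are internally consistent.
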

We next examined the predictive abilities of the five aforemen-

tioned models for four traits in hybrid rice (Figure 7). Predictive

abilities varied from 0.138 to 0.694 for GP, 0.120 to 0.673 for

MP, 0.128 to 0.531 for MMP, 0.178 to 0.707 for M_GP, and

0.190 to 0.712 for MM_GP across the four agronomic traits.

MM_GP and M_GP performed well for most traits, whereas

MMP performed poorly. Comparison of the predictive abilities

of GP and MM_GP for the four traits in hybrid rice yielded results

consistent with those in maize. Using GBLUP, MM_GP demon-
strated significantly higher predictive ability for YIELD (by

37.5%), TILLER (13.6%), GRAIN (15.4%), and KGW (2.6%)

compared with GP. Using XGBoost, MM_GP significantly outper-

formed GP for three traits: YIELD (by 8.3%), TILLER (16.7%), and

GRAIN (17.5%). MM_GP also outperformed M_GP in the predic-

tion of TILLER, GRAIN, and KGW. Using GBLUP, MM_GP ex-

hibited significantly higher predictive ability for TILLER (by

6.1%) and GRAIN (3.4%). Using XGBoost, MM_GP exhibited

significantly higher predictive ability for TILLER (by 26.2%) and

KGW (14.5%). On average, MM_GP increased predictive ability

by 3.4% (relative to M_GP), 13.6% (relative to GP), and 24.1%

(relative to MP) across all traits and methods. These findings

demonstrate the greater potential of MM_GP in hybrid rice

compared with other tested methods.

We then compared the predictive ability of metabolic markers

with that of an equivalent number of randomly selected metabo-

lites and observed results similar to those found in maize

(Supplemental Figure 1). Specifically, using GBLUP, the

randomly selected metabolites significantly reduced the predic-

tive ability for YIELD, TILLER, GRAIN, and KGW by 4.8%,

10.0%, 9.4%, and 1.8%, respectively, compared with MM_GP.

Similarly, using XGBoost, the randomly selected metabolites

significantly reduced the predictive ability for YIELD, TILLER,

GRAIN, and KGW by 6.9%, 27.5%, 24.0%, and 2.9%. We also

analyzed the metabolites in two tissues, flag leaves and germi-

nated seeds, and evaluated the MM_GP model separately for

these two tissues (designated MM_GP_leaf and MM_GP_seed).

Predictive ability ranged from 0.201 to 0.717 for MM_GP_leaf

and from 0.158 to 0.704 for MM_GP_seed across the four traits


Figure 4. Predictive abilities for four traits in
hybrid maize obtained from integrated
genomic data and randomly selected
metabolites using GBLUP and XGBoost
methods.
The number of randomly selected metabolites

corresponds to the number of metabolic markers.

**p < 0.01.

(Supplemental Figure 2). Notably, MM_GP_leaf exhibited a

higher predictive ability than MM_GP_seed. Using GBLUP,

MM_GP_leaf demonstrated significantly greater predictive ability

for YIELD (by 27.4%), TILLER (17.8%), GRAIN (11.0%), and KGW

(1.9%) compared with MM_GP_seed. Using XGBoost, MM_GP_

leaf significantly outperformed MM_GP_seed for TILLER (by

21.9%) and GRAIN (24.9%).

Predicting untested crosses using MM_GP

Using parameters estimated from the training sample, we pre-

dicted EW for all 20 910 potential hybrids in maize and YIELD

for 21 945 potential hybrids in rice using the MM_GP model.

The average predicted values of the top 100 crosses were signif-

icantly higher than those of the bottom 100 crosses for both EW

and YIELD (Supplemental Tables 7 and 8). When GBLUP was

used, the average predicted values of the top 100 crosses for

EW and YIELD increased by 62.7% and 48.8%, respectively,

compared with the average predicted phenotypic values of the

bottom 100 crosses. Similarly, when XGBoost was used, the

average predicted values of the top 100 crosses for EW and

YIELD rose by 60.5% and 50.4%, respectively, compared with

the average predicted phenotypic values of the bottom 100

crosses. Supplemental Figures 3 and 4 illustrate the average

predicted phenotypic values of EW and YIELD when selecting

the top crosses for hybrid breeding. For instance, if the top 10

crosses predicted by XGBoost were used for hybrid breeding,

the average predicted EW and YIELD of these crosses would

be 198.27 and 51.15, respectively, indicating gains of 26.6%

and 17.9% in EW and YIELD. If the top 10 crosses predicted by

GBLUP were used for hybrid breeding, the average predicted

values would be 198.36 for EW and 52.11 for YIELD, reflecting

gains of 26.4% and 19.6% in EW and YIELD, respectively.
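
The numbers of potential hybrids quoted above equal the number of unordered parent pairs among 205 maize inbred lines and 210 rice RILs, which suggests that reciprocal crosses and selfings are not counted (an inference, not something stated in this section). The sketch below checks that arithmetic and shows one way to compare the top and bottom crosses of a ranked prediction list. The function names and toy predictions are hypothetical, and expressing the increase relative to the bottom-group mean is an assumption.

```python
from math import comb


def count_single_crosses(n_parents):
    """Number of unordered parent pairs (no reciprocals, no selfing)."""
    return comb(n_parents, 2)


def top_vs_bottom_gain(predicted, k):
    """Percent increase of the top-k mean over the bottom-k mean."""
    ranked = sorted(predicted, reverse=True)
    top_mean = sum(ranked[:k]) / k
    bottom_mean = sum(ranked[-k:]) / k
    return 100.0 * (top_mean - bottom_mean) / bottom_mean


maize_crosses = count_single_crosses(205)
rice_crosses = count_single_crosses(210)

# Hypothetical predicted values for ten crosses.
toy_predictions = [float(v) for v in range(1, 11)]
toy_gain = top_vs_bottom_gain(toy_predictions, k=2)
```
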

DISCUSSION

In this study, we propose an innovative approach, MM_GP, which is the

first to integrate metabolic markers from parental lines with GS

models to predict hybrid performance in maize and rice popula-

tions. Our findings indicate that incorporating a small proportion

of selected metabolic markers enhances the accuracy of GP.

Compared with conventional GP models, the integration of me-
tabolomic data resulted in higher predictive

abilities for maize (1.8%) and rice (12.6%),

and the integration of selected metabolic

markers increased predictive abilities further

(4.6% for maize and 13.6% for rice), high-

lighting the potential of leveraging metabolic

data to predict yield-related traits. This result

may be due to the additional genetic infor-

mation implicitly captured by metabolites.

Whereas GP models focus on genetic varia-
tions at the gene level, M_GP and MM_GP are capable of

capturing a broader spectrum of genetic variation and physiolog-

ical epistasis (Fernie and Schauer, 2009; Feher et al., 2014; Guo

et al., 2016; Wang et al., 2021b).

Integration of selected metabolic markers has shown promise

in enhancing predictive abilities, potentially surpassing the

integration of entire metabolomic data. Our analysis indicated

that the MM_GP model generally exhibited superior predictive

abilities compared with the M_GP model in maize and rice

populations. Notably, the integration of only six selected meta-

bolic markers significantly associated with multiple traits re-

sulted in 3.1% higher predictive ability compared with the

M_GP model in maize. This improvement may be attributed

to the benefits of feature selection (Xu et al., 2022). Feature

selection not only reduced overfitting in the MLR algorithm

but also significantly improved the predictive ability of the

GLM algorithm for rapeseed seed yield (Shahsavari et al.,

2023). In Chinese Holsteins, the use of regularized regression

models for feature selection of WGS data demonstrated that

combining preselected SNPs with 50K SNP chip data could

improve the predictive abilities for milk, protein, and fat yields

compared with WGS data and 50K SNP chip data alone (Li

et al., 2022). In our study, the identification of metabolic

markers via MWAS enabled feature selection of metabolomic

data, potentially aiding in the elimination of irrelevant or

redundant features, preventing overfitting, and enhancing

model generalization.

The improved predictive ability of MM_GP might also be attrib-

uted to the incorporation of prior biological information. This

assertion is supported by a comparison of the predictive perfor-

mance of selected metabolic markers with an equivalent number

of randomly selected metabolites. Through integration of GWAS

results from public databases, GS accuracy increased for two out

of three traits in a dairy cattle dataset and nine out of 11 traits in a

rice dataset (Zhang et al., 2014). The inclusion of significant SNPs

from GWAS improved the prediction accuracy of GS models for

1000-grain weight and amylose content in hybrid rice (Yu et al.,

2022) and for nine agronomic traits by 4.0%–19.9% in rice

(Zhang et al., 2023). Selection of optimal marker sets and


Figure 5. Metabolites significantly associated with three or more traits in maize.
(A) The number of metabolites significantly associated with four traits of 205 maize inbred lines. The red font indicates the numbers of metabolites

significantly associated with three or more traits.

(B) Predictive abilities for four traits in hybrid maize obtained from MM_GP using GBLUP and XGBoost methods with metabolic markers identified from

the parental lines. In each histogram, different lowercase letters above the bars indicate significant differences (p < 0.05) between the models.

prediction of phenotypes in rice and soybean data using the

GMStool developed for GWAS analysis demonstrated higher

prediction accuracy than using all SNP markers (Jeong et al.,

2020). Other studies also showed that integration of prior

GWAS information enhanced predictive ability in livestock

species and traits, such as live weight in alpine merino sheep

(Li et al., 2023), milk fatty acid composition in dairy cattle

(Gebreyesus et al., 2019), and multiple traits in Hanwoo beef

cattle (de Las Heras-Saldana et al., 2020). These studies

underscore the advantages of incorporating existing biological

knowledge at the DNA level. Our results suggest that leveraging

prior information at the metabolite level can improve predictive
ability in maize and rice, offering potential for wider applications

across diverse populations and crop species.

The improvement in predictive ability of MM_GP relative to GP was

substantially greater in rice, averaging 13.6%,

compared with an average of 4.6% in maize. This discrepancy

may stem from the tissues used for metabolite analysis and the

timing of sample collection (Westhues et al., 2017). In maize, the

predictive ability for 100-grain weight in tropical and subtropical

environments using metabolites from mature seeds was compa-

rable to that using genomic data, as metabolites in mature seeds

are directly linked to yield (Guo et al., 2016). In our study, maize


Figure 6. Metabolites significantly associated with four traits in 210 rice RILs.
The horizontal black lines represent the critical values at the 0.05 significance level.

metabolomic data were obtained from seedling leaves in a climate

chamber, whereas rice metabolomic data were obtained from flag

leaves and germinated seeds, which are more relevant to yield

traits. The instability of metabolites in phenotype prediction

arises from the dynamic nature of metabolic profiles.

Characteristic-level perturbations in metabolites are significantly

greater than those in genomic sequences or marker data and

are susceptible to variations in sampling conditions, as well as

the age and type of tissue (Schrag et al., 2018). Therefore, to

enhance prediction accuracy effectively, it is crucial to be

explicit about the time points or tissues being sampled. Our

study focused on maize metabolomic data collected from

seedlings in climate chambers to minimize the impact of

environmental fluctuations compared with field conditions.

Previous studies have shown the viability of using metabolic

profiles obtained from 3.5-day-old roots cultivated in climate

chambers for prediction of hybrid performance (de Abreu e

Lima et al., 2017). The use of metabolomics in hybrid breeding

can benefit from sampling seedlings under controlled

conditions, enabling year-round evaluation with available parental

lines and simultaneous sampling of multiple tissues such as

leaves and roots. The shorter cultivation period leads to more

rapid availability of prediction results when developing superior

hybrids for further testing (Schrag et al., 2018). Although

metabolites in tissues at later developmental stages, such as

mature seeds, are associated with yield-related traits, time and

resource costs must also be considered. Early-stage sampling
under controlled conditions facilitates early selection, thereby

reducing breeding cycles and enhancing annual genetic gain.

We also used MM_GP to predict the phenotypic values of

20 910 potential hybrids for EW in maize. The genotypes and

metabolites of these future hybrids are not directly measured;

instead, they are inferred from their parental lines. The top

crosses can be immediately used and transformed into high-

performing hybrids. In addition, selection of the top 100 crosses

for EW results in gains of 192:24 � 156:96 = 35:28± 2:68 and

191:68 � 156:66 = 35:02± 2:69 g per plant when using

GBLUP and XGBoost, respectively. Although the improvement

in predictive ability of MM_GP in maize appears modest, the

gains of 35.28/156.96 = 22.5% and 35.02/156.66 = 22.4%

achieved through selection of the top 100 hybrids using

GBLUP and XGBoost, respectively, represent a noteworthy

accomplishment. Among the top 100 maize crosses, A017/

A037 had been designated as Suyu 161, a variety developed

by Jiangsu Yanjiang Institute of Agricultural Sciences, China.

It is worth noting that 24 and nine crosses exhibited a predicted

EW greater than that of A017/A037 when using GBLUP and

XGBoost, respectively. These crosses merit further validation

and could contribute to the development of new varieties aimed

at enhancing maize yield.
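
The arithmetic behind the reported gains can be checked directly. The following is a minimal sketch that recomputes the absolute and relative gains from the means quoted above. The function name `selection_gain` is illustrative, and reading the first number as the mean predicted EW of the top 100 crosses and the second as the reference mean is an assumption, because the text gives only the subtraction.

```python
def selection_gain(top_mean, reference_mean):
    """Return (absolute gain, relative gain in %) of the selected crosses."""
    gain = top_mean - reference_mean
    return gain, 100.0 * gain / reference_mean


# Values quoted in the text (g per plant); their interpretation as
# "top-100 mean" and "reference mean" is an assumption of this sketch.
reported = {"GBLUP": (192.24, 156.96), "XGBoost": (191.68, 156.66)}

for method, (top, reference) in reported.items():
    gain, percent = selection_gain(top, reference)
    print(f"{method}: gain = {gain:.2f} g per plant ({percent:.1f}%)")
```

Both methods give a relative gain of roughly 22%, which is why the two rankings are treated as comparable in the discussion above.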

In this study, we identified metabolites significantly associated

with agronomic traits of maize and rice. The well-predicted


Figure 7. Predictive abilities for four traits in 278 rice hybrids obtained from five prediction models using GBLUP and XGBoost
methods.
The four traits are yield per plant (YIELD), tiller number per plant (TILLER), grain number per panicle (GRAIN), and 1000-grain weight (KGW). The five

prediction models are GP, MP, MMP, M_GP, and MM_GP. In each histogram, different lowercase letters above the bars indicate significant differences

(p < 0.05) between the models.

metabolic markers exhibited various degrees of correlation,

showing a roughly equal distribution of both positive and negative

correlations. The correlation coefficients ranged from −0.41 to

0.97 in maize and from −0.71 to 0.98 in rice (Supplemental

Tables 9 and 10). A total of 411 significant correlations

(p < 0.01) were identified in maize compared with 3350 in rice.

Notably, significant correlations were observed not only

between metabolic markers within the same categories but

also between markers from different categories (Supplemental

Figures 5 and 6). In addition, in maize, nine metabolic markers

were associated with shared metabolic pathways and exhibited

either upstream or downstream associations (Supplemental

Table 4). For instance, metabolites m819 (S-lactoylglutathione)

and m838 (malic acid) are both involved in pyruvate

metabolism. Metabolites m126 (hypoxanthine), m893 (inosine),

and m98 (deoxyguanosine) are associated with purine

metabolism. A literature search and information from the Kyoto

Encyclopedia of Genes and Genomes database revealed that,

among these metabolites, m819 (S-lactoylglutathione) can be

converted to m838 (malic acid) through several pathways (Long

et al., 2015; Dafre et al., 2017; Schwörer et al., 2021).

Metabolites m893 (inosine) and m126 (hypoxanthine) can be

interconverted via laccase domain containing 1 (LACC1)

(Saveljeva et al., 2022).

By assessing the phenotypic variation explained by parental genotypes

for 78 metabolic markers in maize, we found that these

markers are influenced by parental genotypes to various degrees.

Specifically, parental genotypes explained less than

10% of the phenotypic variation in 16 metabolic markers, between

10% and 50% in 37 metabolic markers, and more than

50% in 25 metabolic markers (Supplemental Table 11). A

metabolome-based genome-wide association study (mGWAS)

of the metabolic markers using the FarmCPU (fixed and

random model circulating probability unification) method (Liu

et al., 2016) detected a total of 30, 19, 75, and 111 significant

(p < 4.83 × 10⁻⁷) SNPs corresponding to nine, seven, 13, and 15

metabolite markers for EW, EGW, ED, and EL, respectively

(Supplemental Figure 7 and Supplemental Table 12). Notably,

four common significant SNPs were identified. SNP_3_16890062

and SNP_3_223717387, both located on

chromosome 3, were significantly associated with metabolites

m126 (hypoxanthine) and m753 (ortho-hydroxyphenylacetic

acid); SNP_1_197177004 was significantly associated

with metabolites m706 (parthenin) and m375 (histamine);

and SNP_7_120230279 was significantly associated with

metabolites m614 and m684. These findings suggest shared

genetic control over these metabolites. In summary, our study

identified a set of SNPs that regulate significant metabolites

associated with maize yield traits. These results will facilitate

the functional verification of genes and enhance our

understanding of metabolic networks, ultimately contributing to

the improvement of maize yield.

Some of these metabolic markers play key roles in various plant

growth and development processes, directly or indirectly influencing

agronomic traits. For instance, metabolite m838 (malic

acid) was significantly correlated with EGW and EL in maize. Previous

research also found that malic acid was linked to flag-leaf

width in wheat (Shi et al., 2020). Malic acid, an organic acid,

plays an essential part in regulating carbon metabolism in

plants by linking mitochondrial respiratory metabolism to
cytosolic biosynthetic pathways. It has important functions in the

tricarboxylic acid cycle and metabolic signaling as well (Shan

et al., 2023). Another metabolite, m36 (leucine), was found to

be related to EW, EGW, and EL in maize. An association

between leucine and heading date has been reported in rice (Li

et al., 2019). Leucine has been shown to regulate stress

tolerance via the plant’s respiratory system (Pires et al., 2016)

and can also serve as a plant growth regulator to increase

antioxidant capacity and heat resistance (Liu et al., 2023b).

Metabolite m863 (salicylic acid) was found to be associated

with maize EW and EGW in the present study, and salicylic acid

has also been identified at three developmental stages of

wheat, namely grain-filling kernels, mature kernels, and

germinating kernels (Yin et al., 2024). Another metabolite,

m0021-L (trigonelline), which was associated with yield per plant

and grain number per panicle in rice in our analysis, has also

shown correlations with grain width (Chen et al., 2016; Wei

et al., 2018) and grain length (Li et al., 2019). Trigonelline, an

alkaloid, plays an important role in the regulation of cell growth

and development (Mazzuca et al., 2000). A study on peanuts

suggested that reduction of trigonelline level could enhance

peanut yield (Cho et al., 2011). Identification of these

metabolites can help to reveal biological networks involving

genomic loci, metabolites, and traits, enabling us to better

understand the genetic mechanisms that underlie different traits.

Our research demonstrates the distinct advantages of metabolic

marker-assisted GP (MM_GP) for hybrid prediction in two staple

crops, maize and rice. With advances in high-throughput metabolomics

technologies and prediction models, this approach has

the potential to transform GS by improving its accuracy and efficiency.

It not only accelerates the crop breeding process by

enabling early selection but also offers valuable insights for advances

in precision breeding.

METHODS

Maize materials

The maize plant materials consisted of 425 hybrids produced using a

sparse partial diallel crossing experiment involving 205 inbred lines

that were a subset of a previously described maize panel (Wang et al.,

2021a). These maize materials were planted in Yangzhou (119.27° E,

32.36° N) and Taian (116.39° E, 35.83° N) in 2018, following a

randomized block design with two rows and two replications. Each

row contained 13 plants with a plant spacing of 25 cm and a row

spacing of 60 cm. Field management practices, including irrigation,

weeding, disease and pest control, and fertilization, were performed

according to local plot-trial management guidelines. For each inbred

line and hybrid, five maize ears of uniform size were selected for

evaluation of four traits: EW, EGW, ED, and EL. The 205 maize inbred

lines were genotyped by the genotyping-by-sequencing method

with fresh young leaves collected during the vegetative growth stage.

After filtering SNPs with low allelic frequency (<0.05) and high missing

rates (>0.1), 104 011 high-quality SNPs were retained for subsequent

analysis. The genotypes of the 425 hybrids were inferred from those of

their parents.
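
The filtering thresholds and the inference of hybrid genotypes described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes SNPs are coded as allele counts (0, 1, or 2) with `None` for missing calls, and that a hybrid's expected allele count is the mean of its two parents' counts (0 × 2 gives 1). All function names are hypothetical.

```python
def maf_and_missing(calls):
    """Minor-allele frequency and missing rate for one SNP.

    `calls` holds allele counts (0, 1 or 2) per inbred line, None if missing.
    """
    observed = [c for c in calls if c is not None]
    missing_rate = 1.0 - len(observed) / len(calls)
    if not observed:
        return 0.0, missing_rate
    freq = sum(observed) / (2.0 * len(observed))
    return min(freq, 1.0 - freq), missing_rate


def keep_snp(calls, min_maf=0.05, max_missing=0.1):
    """Drop SNPs with MAF < 0.05 or missing rate > 0.1, as in the text."""
    maf, missing_rate = maf_and_missing(calls)
    return maf >= min_maf and missing_rate <= max_missing


def hybrid_genotype(female, male):
    """Expected allele counts of an F1 from its parents (assumed coding)."""
    return [(f + m) / 2 for f, m in zip(female, male)]


common = [0, 2, 2, 0, 2, 0, 0, 2, 2, 0]         # MAF 0.5, nothing missing
rare = [0] * 39 + [2]                           # MAF 0.025
sparse = [0, 2, None, None, 2, 0, 2, 0, 0, 2]   # 20% missing

print([keep_snp(s) for s in (common, rare, sparse)])
print(hybrid_genotype([0, 2, 2], [0, 0, 2]))
```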

Metabolite analysis by LC–MS

Non-targeted LC–MS was used to analyze metabolites in seedling leaves

of 205 maize inbred lines. For each maize material, plump and uniform

maize seeds were selected for hydroponic experiments in a climate

chamber under controlled conditions. Two biological replicates were established

for each material, with 10 plants per replicate. At the three-leaf,

one-heart stage, leaves from three plants per replicate were collected for

metabolomic analysis. These samples were promptly frozen in liquid nitrogen

and transferred to −80°C. Each sample was weighed to 200 mg

(±1%) in a 2-ml EP tube with 0.6 ml of methanol (−20°C) containing 4

ppm 2-chlorophenylalanine. The mixture was vortexed for 30 s, followed

by grinding in a tissue-grinding machine at 65 Hz for 60 s and ultrasonic

crushing at 40 kHz for 30 min. The samples were then centrifuged at

12 000 rpm for 10 min at 25°C. The filtered supernatant (300 μl) was

transferred to a sample bottle for LC–MS analysis. Chromatographic

separation was performed on a Thermo Vanquish system equipped

with an ACQUITY UPLC HSS T3 column (150 × 2.1 mm, 1.8 μm, Waters)

maintained at 40°C. The temperature of the autosampler was set to 8°C.

The gradient elution conditions are given in Supplemental Tables 13 and

14. The ESI-MSn experiments were performed using a Thermo Q Exactive

mass spectrometer with a spray voltage of 3.8 kV in positive mode

and −2.5 kV in negative mode. The raw data were converted into mzXML

format using ProteoWizard software (version 3.0.8789). The XCMS package

in R (version 3.1.3) was used for peak identification, filtration, and

alignment. A data matrix containing information on mass-to-charge ratio

(m/z), retention time, intensity, and other relevant details was generated.

To facilitate comparison of data across different magnitudes, the intensity

values were subjected to batch normalization. The identification of

metabolites was initially confirmed on the basis of exact molecular

weight (with a molecular weight error of ≤30 ppm), followed by

analysis of the tandem mass spectrometry fragmentation pattern. The

Human Metabolome Database (HMDB) (http://www.hmdb.ca/),

METLIN (http://metlin.scripps.edu), MassBank (http://www.massbank.jp/),

LipidMaps (http://www.lipidmaps.org), mzCloud (https://www.mzcloud.org),

and the Panomix proprietary standard database were

used to verify annotations and identify metabolites.
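
The mass-accuracy criterion used for the initial annotation can be made concrete with a short sketch. The relative mass error in parts per million is the absolute difference between the observed and theoretical values divided by the theoretical value, times 10^6. The function names and the example m/z values below are hypothetical and chosen only to illustrate the 30-ppm tolerance.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=30.0):
    """True if the candidate passes the mass-error filter (<= tol_ppm)."""
    return ppm_error(observed_mz, theoretical_mz) <= tol_ppm


# Illustrative values only: a 0.004 offset at m/z 200 is about 20 ppm,
# whereas a 0.010 offset is about 50 ppm and would be rejected.
print(within_tolerance(200.0040, 200.0000))
print(within_tolerance(200.0100, 200.0000))
```

A candidate that passes this filter would still need to be confirmed against the tandem mass spectrometry fragmentation pattern, as described above.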

Statistical analysis of metabolomic data

Statistical significance testing was performed on the concentration of

each detected metabolite in the two biological replicates. Metabolites

that showed a significant difference (p < 0.01) between the two replicates

were excluded, leaving 777 metabolites for further analysis. The metabolite

concentrations were normalized, and the mean value of the two biological

replicates was used for subsequent analysis. The CV was calculated

for each metabolite, and the phenotypic variation explained by each metabolic

marker was determined by the relevant r². The LASSO method

(Tibshirani, 1997) was used in an MWAS to identify metabolites

significantly associated (p < 0.05) with agronomic traits of the parental

lines. Specifically, the lassopv R package was used for LASSO computation

(Wang and Michoel, 2017). The p value was calculated for each

metabolite, and those with a p value below 0.05 were considered to be

significant metabolites. These metabolites were then integrated into GP

models as metabolic markers.
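
Two of the summary statistics named above, the CV and the r² of a single metabolite with a trait, can be sketched in a few lines. This sketch assumes that the CV is the sample standard deviation divided by the mean and that r² is the squared Pearson correlation between the replicate-averaged metabolite concentration and the trait; the LASSO step with lassopv is not reproduced here. Function names and data are illustrative.

```python
from statistics import mean, stdev


def replicate_mean(rep1, rep2):
    """Average the two biological replicates line by line."""
    return [(a + b) / 2 for a, b in zip(rep1, rep2)]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return stdev(values) / mean(values)


def r_squared(x, y):
    """Squared Pearson correlation between a metabolite and a trait."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical concentrations for four parental lines and a made-up trait.
metabolite = replicate_mean([1.0, 2.2, 2.9, 4.1], [1.0, 1.8, 3.1, 3.9])
trait = [10.0, 12.5, 14.0, 17.5]
print(coefficient_of_variation(metabolite), r_squared(metabolite, trait))
```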

Rice dataset

The rice datasets consisted of 210 RILs obtained from a cross

between two rice varieties (Zhenshan 97 and Minghui 63), along with

278 hybrids formed by random pairing of the 210 RILs (Hua et al.,

2003). The genomic data included 1619 bins identified from 270 820

SNPs by sequencing all 210 RILs (Yang et al., 2022). The metabolomic

data included 1000 metabolites, with 317 detected in germinated

seeds and the remaining 683 detected in flag leaves (Gong et al.,

2013). Four agronomic traits were analyzed: YIELD, TILLER, GRAIN,

and KGW.

The MM_GP model

We used two GS methods to demonstrate the effectiveness of MM_GP in

maize and rice. The first method, GBLUP, used kinship matrices to represent

the genetic relationships among individuals based on a mixed linear

model (VanRaden, 2008). The second method, XGBoost, is a machine-

learning algorithm capable of capturing non-linear relationships without

requiring prior information from potential genetic models (Chen and

Guestrin, 2016). Detailed information about model structure and

optimization is provided below.

GBLUP for MM_GP

The GBLUP model for MM_GP is described as

$$y = X\beta + Z_G\gamma_G + A_M\gamma_{Ma} + D_M\gamma_{Md} + \varepsilon \qquad \text{(Equation 1)}$$

where $y$ is an $n \times 1$ vector of phenotypic observations of hybrids; $X$ is an $n \times p$ design matrix for the fixed effect; $\beta$ is the fixed effect; $Z_G$ is an $n \times g$ genotype matrix of the hybrids; $A_M$ and $D_M$ are $n \times m$ additive and dominance coding matrices of metabolites, respectively, where $A_M = \frac{1}{2}(M + F)$ and $D_M = \frac{1}{2}|M - F|$; and $M$ and $F$ represent the matrices of metabolic marker concentrations for male and female parents, respectively. The details of the coding system were described in our previous research (Xu et al., 2021c). $\gamma_G$, $\gamma_{Ma}$, and $\gamma_{Md}$ were assumed to follow the normal distributions $\gamma_G \sim N\left(0, \frac{1}{g}\phi_G^2\right)$, $\gamma_{Ma} \sim N\left(0, \frac{1}{m}\phi_{Ma}^2\right)$, and $\gamma_{Md} \sim N\left(0, \frac{1}{m}\phi_{Md}^2\right)$, respectively, where $\phi_G^2$, $\phi_{Ma}^2$, and $\phi_{Md}^2$ are the corresponding polygenic variances, $g$ and $m$ are the numbers of SNPs and metabolites, and $\varepsilon$ is an $n \times 1$ vector of residual errors with a normal distribution $N(0, \sigma_\varepsilon^2)$. The expectation of $y$ is $E(y) = X\beta$, and the variance–covariance matrix is

$$\mathrm{var}(y) = V = \frac{1}{g} Z_G Z_G^T \phi_G^2 + \frac{1}{m} A_M A_M^T \phi_{Ma}^2 + \frac{1}{m} D_M D_M^T \phi_{Md}^2 + I\sigma_\varepsilon^2 = K_G\phi_G^2 + K_{Ma}\phi_{Ma}^2 + K_{Md}\phi_{Md}^2 + I\sigma_\varepsilon^2 \qquad \text{(Equation 2)}$$

where $K_G$, $K_{Ma}$, and $K_{Md}$ are kinship matrices for random effects $\gamma_G$, $\gamma_{Ma}$, and $\gamma_{Md}$, respectively. The variance components were estimated using the restricted maximum likelihood (Patterson and Thompson, 1971; Yin et al., 2023).
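
The metabolite coding and the kinship matrices in Equation 2 can be illustrated with a small dependency-free sketch. It codes each hybrid from its parents' metabolic marker concentrations (mid-parent value for the additive term, half the absolute parental difference for the dominance term) and forms a kinship matrix as the cross-product of the coding matrix divided by the number of columns. Whether the authors standardize columns before this step is not stated in this section, so no standardization is applied here; function names and concentrations are hypothetical.

```python
def additive_dominance(male, female):
    """Code one hybrid from its parents' metabolite concentrations.

    Additive code: (M + F) / 2. Dominance code: |M - F| / 2.
    """
    additive = [(m + f) / 2 for m, f in zip(male, female)]
    dominance = [abs(m - f) / 2 for m, f in zip(male, female)]
    return additive, dominance


def kinship(rows):
    """K = W W^T / c for an n x c coding matrix W given as a list of rows."""
    c = len(rows[0])
    return [[sum(u * v for u, v in zip(ri, rj)) / c for rj in rows]
            for ri in rows]


# Hypothetical concentrations of two metabolic markers in three inbred lines.
p1, p2, p3 = [1.0, 4.0], [3.0, 2.0], [5.0, 4.0]

a12, d12 = additive_dominance(p1, p2)   # hybrid P1 x P2
a13, d13 = additive_dominance(p1, p3)   # hybrid P1 x P3

k_ma = kinship([a12, a13])              # additive metabolite kinship
k_md = kinship([d12, d13])              # dominance metabolite kinship
print(k_ma, k_md)
```

The same `kinship` helper applies to the genotype matrix, with the number of SNPs as the divisor.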

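After parameters are estimated from the training set, they can be used to predict the phenotypic values of the test set. Assuming $y_1$ is an $n_1 \times 1$ vector of the phenotypic values in the training set, $y_2$ is an $n_2 \times 1$ vector of the phenotypic values in the testing set, and $n_1 + n_2 = n$, where $n$ is the size of the entire sample, Equation 1 can be rewritten as

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} X_1\beta \\ X_2\beta \end{bmatrix} + \begin{bmatrix} Z_{G1}\gamma_G \\ Z_{G2}\gamma_G \end{bmatrix} + \begin{bmatrix} A_{M1}\gamma_{Ma} \\ A_{M2}\gamma_{Ma} \end{bmatrix} + \begin{bmatrix} D_{M1}\gamma_{Md} \\ D_{M2}\gamma_{Md} \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix} \qquad \text{(Equation 3)}$$

The expectation and variance–covariance of $y$ can be modified as:

$$E\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} X_1\beta \\ X_2\beta \end{bmatrix} \qquad \text{(Equation 4)}$$

$$\mathrm{var}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix} = \begin{bmatrix} K_{G11} & K_{G12} \\ K_{G21} & K_{G22} \end{bmatrix}\phi_G^2 + \begin{bmatrix} K_{Ma11} & K_{Ma12} \\ K_{Ma21} & K_{Ma22} \end{bmatrix}\phi_{Ma}^2 + \begin{bmatrix} K_{Md11} & K_{Md12} \\ K_{Md21} & K_{Md22} \end{bmatrix}\phi_{Md}^2 + \begin{bmatrix} I_{n_1} & 0 \\ 0 & I_{n_2} \end{bmatrix}\sigma_\varepsilon^2 \qquad \text{(Equation 5)}$$

where the kinship matrices have been partitioned into $2 \times 2$ blocks. After the parameter vector $\theta = [\beta, \phi_G^2, \phi_{Ma}^2, \phi_{Md}^2, \sigma_\varepsilon^2]$ is estimated, the predicted phenotypic values of the testing set can be obtained from the following formula:

$$\hat{y}_2 = E(y_2 \mid y_1) = X_2\hat{\beta} + \left(K_{G21}\phi_G^2 + K_{Ma21}\phi_{Ma}^2 + K_{Md21}\phi_{Md}^2\right)V_{11}^{-1}\left(y_1 - X_1\hat{\beta}\right) \qquad \text{(Equation 6)}$$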
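
The prediction step in Equation 6 can be sketched without external libraries for a toy case. The sketch assumes an intercept-only fixed effect (so that the fixed part reduces to a single estimated mean, `mu`) and takes the variance components as already estimated; REML itself is not implemented. The helper names `solve`, `weighted_sum`, and `predict_test` are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def weighted_sum(matrices, weights):
    """Element-wise sum of matrices, each scaled by its variance component."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights))
             for j in range(cols)] for i in range(rows)]


def predict_test(k11_list, k21_list, variances, sigma2, y1, mu):
    """Equation 6 with an intercept-only fixed effect (assumption)."""
    v11 = weighted_sum(k11_list, variances)       # training covariance
    for i in range(len(v11)):
        v11[i][i] += sigma2                       # add residual variance
    c21 = weighted_sum(k21_list, variances)       # test-by-training covariance
    alpha = solve(v11, [y - mu for y in y1])      # V11^{-1} (y1 - X1 beta-hat)
    return [mu + sum(c * a for c, a in zip(row, alpha)) for row in c21]


# Toy case: one random effect, two training hybrids, one test hybrid that is
# related only to the first training hybrid.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(predict_test([identity], [[[1.0, 0.0]]], [1.0], 1.0, [12.0, 8.0], 10.0))
```

In the toy case the training deviation of +2 is shrunk by half, because the polygenic and residual variances are equal, so the test hybrid is predicted at 11.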

XGBoost for MM_GP

XGBoost, proposed by Chen and Guestrin (2016), is an effective and

flexible ensemble machine learning algorithm (Ma et al., 2022). The

process of XGBoost for MM_GP involved training on a dataset $D = \{(X_i, y_i)\}$ ($|D| = n$, $X_i \in \mathbb{R}^q$, $y_i \in \mathbb{R}$) with $n$ samples and $q$ features, where $y_i$ represents the phenotypic observation value of the $i$-th hybrid and $X_i = [Z_{G_i}\; A_{M_i}\; D_{M_i}]$ is a $1 \times q$ feature vector comprising the genotype vector ($Z_{G_i}$), metabolite additive coding vector ($A_{M_i}$), and metabolite dominance coding vector ($D_{M_i}$) of the $i$-th hybrid. Initially, XGBoost generates predicted values by training a tree on the samples, and subsequent trees are built using the residual errors of the previous tree (Yan et al., 2021). After $K$ iterations, the predicted phenotypic value ($\hat{y}_i$) can be expressed as

$$\hat{y}_i = \sum_{k=1}^{K} f_k(X_i), \quad f_k \in \mathcal{F} \qquad \text{(Equation 7)}$$

where $f_k(X_i)$ represents the prediction value of the $k$-th decision tree

for the i-th individual. The tree-structured Parzen estimator, a Bayesian

optimization algorithm, was used to explore the hyperparameter

space and optimize the hyperparameters of each trait by minimizing

the root-mean-square error (Ozaki et al., 2020). The analysis

codes are available on GitHub (https://github.com/171702120/yangxu89-GS2024).
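
The additive structure of Equation 7, in which each new tree is fitted to the residuals left by the previous ones, can be illustrated with a toy boosting loop. This is not XGBoost itself: it uses one-split regression stumps, squared-error loss, a fixed learning rate, and no regularization or hyperparameter search. The feature rows stand in for the concatenated genotype, additive, and dominance codes, and all names and values are hypothetical.

```python
def fit_stump(x, residual):
    """Best one-split regression stump under squared error."""
    best = None
    for j in range(len(x[0])):
        for t in sorted({row[j] for row in x})[:-1]:
            left = [r for row, r in zip(x, residual) if row[j] <= t]
            right = [r for row, r in zip(x, residual) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no feature varies: fall back to a constant
        m = sum(residual) / len(residual)
        return 0, float("inf"), m, m
    return best[1:]


def stump_predict(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def boost(x, y, n_trees=50, learning_rate=0.3):
    """Fit stumps sequentially, each on the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residual)
        trees.append(stump)
        pred = [p + learning_rate * stump_predict(stump, row)
                for p, row in zip(pred, x)]
    return base, learning_rate, trees


def predict(model, row):
    """Sum of the base value and the scaled outputs of all stumps."""
    base, learning_rate, trees = model
    return base + learning_rate * sum(stump_predict(s, row) for s in trees)


# Each row: [genotype code, additive metabolite code, dominance metabolite code]
features = [[0, 1.0, 0.5], [1, 2.0, 0.0], [2, 3.0, 1.5], [1, 2.5, 1.0]]
phenotypes = [10.0, 12.0, 15.0, 13.0]
model = boost(features, phenotypes)
print([round(predict(model, row), 2) for row in features])
```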

Assessing the predictive abilities of prediction models

The predictive abilities of different prediction models in maize and rice datasets

were evaluated using 10-fold cross-validation. This procedure

involved randomly dividing the sample into 10 subsets, with nine used

for parameter estimation and one for prediction. This process was

repeated until all subsets were predicted. Predictive ability was calculated

as the determination coefficient between the observed and predicted

phenotypic values. To reduce random errors from sample partitioning,

the cross-validation procedure was iterated 20 times, and the average

of these iterations was calculated to determine the final predictive ability

of the models.
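
The evaluation scheme can be sketched as a repeated 10-fold cross-validation loop. The sketch assumes that predictive ability is the squared Pearson correlation between observed values and the pooled out-of-fold predictions, and it uses a one-feature least-squares line as a stand-in for GBLUP or XGBoost. All names and data are illustrative.

```python
import random


def squared_correlation(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    sop = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    soo = sum((o - mo) ** 2 for o in obs)
    spp = sum((p - mp) ** 2 for p in pred)
    return sop * sop / (soo * spp)


def k_fold_indices(n, k, rng):
    """Randomly split the indices 0..n-1 into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Stand-in model: simple least-squares line on one feature."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def cross_validate(x, y, k=10, repeats=20, seed=0):
    """Average predictive ability over repeated k-fold cross-validation."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        pred = [None] * len(y)
        for fold in k_fold_indices(len(y), k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            model = fit_line([x[i] for i in train], [y[i] for i in train])
            for i in fold:
                pred[i] = predict_line(model, x[i])
        scores.append(squared_correlation(y, pred))
    return sum(scores) / len(scores)


xs = [float(i) for i in range(30)]
ys = [2.0 * v + 1.0 for v in xs]
print(cross_validate(xs, ys))
```

Pooling the out-of-fold predictions before computing the correlation gives one score per repetition; averaging across the 20 repetitions then smooths out the effect of any single random partition.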
FUNDING
This work was supported by grants from the National Key Research

and Development Program of China (2023YFD1202200), the National

Natural Science Foundation of China (32170636, 32061143030,

32261143462, 32100448, 32070558), the Seed Industry Revitalization

Project of Jiangsu Province (JBGS[2021]009), the Key Research and

Development Program of Jiangsu Province (BE2022343, BE2023336),

Jiangsu Province Agricultural Science and Technology Independent

Innovation (CX(21)1003), the Shenzhen Science and Technology

Program (KQTD202303010928390070), the Hebei Science and Technology

Program (215A7612D), the Shanghai Agricultural Science and

Technology Innovation Program (T2023204), the Provincial Technology

Innovation Program of Shandong, China, Qing Lan Project of Jiangsu

Province, Yangzhou University High-end Talent Support Program,

and the Priority Academic Program Development of Jiangsu Higher

Education Institutions (PAPD).
ACKNOWLEDGMENTS
No conflict of interest is declared.
AUTHOR CONTRIBUTIONS
C.X., Yang Xu, and Yunbi Xu designed the research. W.Y., J.Q., K.Z., G.Y.,

Y.Z., Y.L., R.C., and T.T. performed the research. Xin Wang, Y.J., Xinyi

Wang, S.H., and P.L. analyzed the data. Y.X. and W.Y. wrote the paper.

Y.X., Z.Y., and C.X. revised the manuscript. All authors read and approved

the final manuscript.
SUPPLEMENTAL INFORMATION
Supplemental information is available at Plant Communications Online.
Received: June 25, 2024

Revised: October 31, 2024

Accepted: November 26, 2024

Published: November 29, 2024

REFERENCES
Chen, T., and Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting

System. In Proceedings of the 22nd ACM SIGKDD International

Conference on Knowledge Discovery and Data Mining (ACM),

pp. 785–794. https://doi.org/10.1145/2939672.2939785.

Chen, W., Wang, W., Peng, M., Gong, L., Gao, Y., Wan, J., Wang, S.,

Shi, L., Zhou, B., Li, Z., et al. (2016). Comparative and parallel

genome-wide association studies for metabolic and agronomic

traits in cereals. Nat. Commun. 7:12767. https://doi.org/10.1038/

ncomms12767.

Cho, Y., Kodjoe, E., Puppala, N., and Wood, A. (2011). Reduced

trigonelline accumulation due to rhizobial activity improves grain yield

in peanut (Arachis hypogaea L.). Acta Agric. Scand. Sect. B Soil

Plant Sci 61:395–403. https://doi.org/10.1080/09064710.2010.

494614.

Crossa, J., Pérez-Rodríguez, P., Cuevas, J., Montesinos-López, O.,

Jarquín, D., De Los Campos, G., Burgueño, J., González-

Camacho, J.M., Pérez-Elizalde, S., Beyene, Y., et al. (2017).

Genomic selection in plant breeding: methods, models, and

perspectives. Trends Plant Sci. 22:961–975. https://doi.org/10.1016/

j.tplants.2017.08.011.

Cui, Y., Li, R., Li, G., Zhang, F., Zhu, T., Zhang, Q., Ali, J., Li, Z., and Xu,

S. (2020). Hybrid breeding of rice via genomic selection. Plant

Biotechnol. J. 18:57–67. https://doi.org/10.1111/pbi.13170.

Dafre, A.L., Schmitz, A.E., and Maher, P. (2017). Methylglyoxal-induced

AMPK activation leads to autophagic degradation of thioredoxin 1 and

glyoxalase 2 in HT22 nerve cells. Free Radic. Biol. Med. 108:270–279.

https://doi.org/10.1016/j.freeradbiomed.2017.03.028.

de Abreu e Lima, F., Westhues, M., Cuadros-Inostroza, Á., Willmitzer,

L., Melchinger, A.E., and Nikoloski, Z. (2017). Metabolic robustness

in young roots underpins a predictive model of maize hybrid

performance in the field. Plant J. 90:319–329. https://doi.org/10.

1111/tpj.13495.

de Las Heras-Saldana, S., Lopez, B.I., Moghaddar, N., Park, W., Park,

J.-e., Chung, K.Y., Lim, D., Lee, S.H., Shin, D., and van Der Werf,

J.H. (2020). Use of gene expression and whole-genome sequence

information to improve the accuracy of genomic prediction for

carcass traits in Hanwoo cattle. Genet. Sel. Evol. 52:1–16. https://

doi.org/10.1186/s12711-020-00574-2.

Duvick, D.N. (2001). Biotechnology in the 1930s: the development of

hybrid maize. Nat. Rev. Genet. 2:69–74. https://doi.org/10.1038/

35047587.

Feher, K., Lisec, J., Römisch-Margl, L., Selbig, J., Gierl, A., Piepho, H.-

P., Nikoloski, Z., and Willmitzer, L. (2014). Deducing hybrid

performance from parental metabolic profiles of young primary roots

of maize using a multivariate diallel approach. PLoS One 9:e85435.

https://doi.org/10.1371/journal.pone.0085435.

Fernie, A.R., and Schauer, N. (2009). Metabolomics-assisted breeding: a

viable option for crop improvement? Trends Genet. 25:39–48. https://

doi.org/10.1016/j.tig.2008.10.010.

Gamboa-Becerra, R., Hernández-Hernández, M.C., González-Ríos,

Ó., Suárez-Quiroz, M.L., Gálvez-Ponce, E., Ordaz-Ortiz, J.J., and

Winkler, R. (2019). Metabolomic markers for the early selection of

Coffea canephora plants with desirable cup quality traits. Metabolites

9:214. https://doi.org/10.3390/metabo9100214.

Gebreyesus, G., Bovenhuis, H., Lund, M.S., Poulsen, N.A., Sun, D.,

and Buitenhuis, B. (2019). Reliability of genomic prediction for milk

fatty acid composition using a multi-population reference and
incorporating GWAS results. Genet. Sel. Evol. 51:16. https://doi.org/

10.1186/s12711-019-0460-z.

Gong, L., Chen, W., Gao, Y., Liu, X., Zhang, H., Xu, C., Yu, S., Zhang, Q.,

and Luo, J. (2013). Genetic analysis of the metabolome exemplified

using a rice population. Proc. Natl. Acad. Sci. USA 110:20320–

20325. https://doi.org/10.1073/pnas.1319681110.

Guo, Z., Magwire, M.M., Basten, C.J., Xu, Z., and Wang, D. (2016).

Evaluation of the utility of gene expression and metabolic information

for genomic prediction in maize. Theor. Appl. Genet. 129:2413–2427.

https://doi.org/10.1007/s00122-016-2780-5.

Hickey, J.M., Dreisigacker, S., Crossa, J., Hearne, S., Babu, R.,

Prasanna, B.M., Grondona, M., Zambelli, A., Windhausen, V.S.,

Mathews, K., et al. (2014). Evaluation of genomic selection training

population designs and genotyping strategies in plant breeding

programs using simulation. Crop Sci. 54:1476–1488. https://doi.org/

10.2135/cropsci2013.03.0195.

Hu, H., Campbell, M.T., Yeats, T.H., Zheng, X., Runcie, D.E.,

Covarrubias-Pazaran, G., Broeckling, C., Yao, L., Caffe-Treml,

M., Gutiérrez, L.a., et al. (2021). Multi-omics prediction of oat

agronomic and seed nutritional traits across environments and in

distantly related populations. Theor. Appl. Genet. 134:4043–4054.

https://doi.org/10.1007/s00122-021-03946-4.

Hu, X., Xie, W., Wu, C., and Xu, S. (2019). A directed learning strategy

integrating multiple omic data improves genomic prediction. Plant

Biotechnol. J. 17:2011–2020. https://doi.org/10.1111/pbi.13117.

Hua, J., Xing, Y., Wu, W., Xu, C., Sun, X., Yu, S., and Zhang, Q. (2003).

Single-locus heterotic effects and dominance by dominance

interactions can adequately explain the genetic basis of heterosis in

an elite rice hybrid. Proc. Natl. Acad. Sci. USA 100:2574–2579.

https://doi.org/10.1073/pnas.0437907100.

Jeong, S., Kim, J.-Y., and Kim, N. (2020). GMStool: GWAS-based

marker selection tool for genomic prediction from genomic data. Sci.

Rep. 10:19653. https://doi.org/10.1038/s41598-020-76759-y.

Li, C., Li, J., Wang, H., Zhang, R., An, X., Yuan, C., Guo, T., and Yue, Y.

(2023). Genomic Selection for Live Weight in the 14th Month in Alpine

Merino Sheep Combining GWAS Information. Animals. 13:3516.

https://doi.org/10.3390/ani13223516.

Li, K., Wang, D., Gong, L., Lyu, Y., Guo, H., Chen, W., Jin, C., Liu, X.,

Fang, C., and Luo, J. (2019). Comparative analysis of metabolome

of rice seeds at three developmental stages using a recombinant

inbred line population. Plant J. 100:908–922. https://doi.org/10.1111/

tpj.14482.

Li, S., Yu, J., Kang, H., and Liu, J. (2022). Genomic Selection in Chinese

Holsteins Using Regularized Regression Models for Feature Selection

of Whole Genome Sequencing Data. Animals. 12:2419. https://doi.org/

10.3390/ani12182419.

Liu, H., Su, Y., Fan, Y., Zuo, D., Xu, J., Liu, Y., Mei, X., Huang, H., Yang,

M., and Zhu, S. (2023b). Exogenous leucine alleviates heat stress and

improves saponin synthesis in Panax notoginseng by improving

antioxidant capacity and maintaining metabolic homeostasis. Front.

Plant Sci. 14:1175878. https://doi.org/10.3389/fpls.2023.1175878.

Liu, X., Huang, M., Fan, B., Buckler, E.S., and Zhang, Z. (2016). Iterative

Usage of Fixed and Random Effect Models for Powerful and Efficient

Genome-Wide Association Studies. PLoS Genet. 12:e1005767.

https://doi.org/10.1371/journal.pgen.1005767.

Liu, Y., Zhang, Y., Zhou, F., Yao, Z., Zhan, Y., Fan, Z., Meng, X., Zhang,

Z., Liu, L., Yang, J., et al. (2023a). Increased Accuracy of Genomic

Prediction Using Preselected SNPs from GWAS with Imputed Whole-

Genome Sequence Data in Pigs. Animals. 13:3871. https://doi.org/

10.3390/ani13243871.

Long, L., Xin, Z., Hyun-Dong, S., R, C.R., Jianghua, L., Guocheng, D.,

and Jian, C. (2015). Improved production of propionic acid in

Propionibacterium jensenii via combinational overexpression of

glycerol dehydrogenase and malate dehydrogenase from Klebsiella

pneumoniae. Appl. Environ. Microbiol. 81:2256–2264. https://doi.org/

10.1128/AEM.03572-14.

Ma, B., Yan, G., Chai, B., and Hou, X. (2022). XGBLC: an improved

survival prediction model based on XGBoost. Bioinformatics

38:410–418. https://doi.org/10.1093/bioinformatics/btab675.

Mazzuca, S., Bitonti, M.B., Innocenti, A.M., and Francis, D. (2000).

Inactivation of DNA replication origins by the cell cycle regulator,

trigonelline, in root meristems of Lactuca sativa. Planta 211:127–132.

https://doi.org/10.1007/s004250000272.

Meuwissen, T.H., Hayes, B.J., and Goddard, M.E. (2001). Prediction of

total genetic value using genome-wide dense marker maps. Genetics

157:1819–1829. https://doi.org/10.1093/genetics/157.4.1819.

Ozaki, Y., Tanigaki, Y., Watanabe, S., and Onishi, M. (2020).

Multiobjective tree-structured parzen estimator for computationally

expensive optimization problems. Proceedings of the 2020 Genetic

and Evolutionary Computation Conference, 533–541. https://doi.org/

10.1145/3377930.3389817.

Patterson, H.D., and Thompson, R. (1971). Recovery of inter-block

information when block sizes are unequal. Biometrika 58:545–554.

https://doi.org/10.1093/biomet/58.3.545.

Pires, M.V., Pereira Júnior, A.A., Medeiros, D.B., Daloso, D.M., Pham,

P.A., Barros, K.A., Engqvist, M.K.M., Florian, A., Krahnert, I.,

Maurino, V.G., et al. (2016). The influence of alternative pathways of

respiration that use branched-chain amino acids following water

shortage in Arabidopsis. Plant Cell Environ. 39:1304–1319. https://

doi.org/10.1111/pce.12682.

Resende, R.T., Hickey, L., Amaral, C.H., Peixoto, L.L., Marcatti, G.E.,

and Xu, Y. (2024). Satellite-enabled enviromics to enhance crop

improvement. Mol. Plant 17:848–866. https://doi.org/10.1016/j.molp.

2024.04.005.

Riedelsheimer, C., Czedik-Eysenberg, A., Grieder, C., Lisec, J.,

Technow, F., Sulpice, R., Altmann, T., Stitt, M., Willmitzer, L., and

Melchinger, A.E. (2012). Genomic and metabolic prediction of

complex heterotic traits in hybrid maize. Nat. Genet. 44:217–220.

https://doi.org/10.1038/ng.1033.

Schrag, T.A., Westhues, M., Schipprack, W., Seifert, F., Thiemann, A.,

Scholten, S., and Melchinger, A.E. (2018). Beyond genomic

prediction: combining different types of omics data can improve

prediction of hybrid performance in maize. Genetics 208:1373–1385.

https://doi.org/10.1534/genetics.117.300374.

Schwörer, S., Pavlova, N.N., Cimino, F.V., King, B., Cai, X., Sizemore,

G.M., and Thompson, C.B. (2021). Fibroblast pyruvate carboxylase is

required for collagen production in the tumour microenvironment. Nat.

Metab. 3:1484–1499. https://doi.org/10.1038/s42255-021-00480-x.

Shahsavari, M., Mohammadi, V., Alizadeh, B., and Alizadeh, H. (2023).

Application of machine learning algorithms and feature selection in

rapeseed (Brassica napus L.) breeding for seed yield. Plant Methods

19:57. https://doi.org/10.1186/s13007-023-01035-9.

Shan, N., Zhang, Y., Guo, Y., Zhang, W., Nie, J., Fernie, A.R., and Sui,

X. (2023). Cucumber malate decarboxylase, CsNADP-ME2, functions

in the balance of carbon and amino acid metabolism in fruit. Hortic.

Res. 10:uhad216. https://doi.org/10.1093/hr/uhad216.

Shi, T., Zhu, A., Jia, J., Hu, X., Chen, J., Liu, W., Ren, X., Sun, D., Fernie,

A.R., Cui, F., et al. (2020). Metabolomics analysis and metabolite-

agronomic trait associations using kernels of wheat (Triticum

aestivum) recombinant inbred lines. Plant J. 103:279–292. https://

doi.org/10.1111/tpj.14727.

Spindel, J.E., Begum, H., Akdemir, D., Collard, B., Redoña, E.,

Jannink, J.L., and McCouch, S. (2016). Genome-wide prediction

models that incorporate de novo GWAS are a powerful new tool for
tropical rice improvement. Heredity 116:395–408. https://doi.org/10.

1038/hdy.2015.113.

Saveljeva, S., Sewell, G.W., Ramshorn, K., Cader, M.Z., West, J.A.,

Clare, S., Haag, L.M., de Almeida Rodrigues, R.P., Unger, L.W.,

Iglesias-Romero, A.B., et al. (2022). A purine metabolic checkpoint

that prevents autoimmunity and autoinflammation. Cell Metabol.

34:106–124.e110. https://doi.org/10.1016/j.cmet.2021.12.009.

Technow, F., Schrag, T.A., Schipprack, W., Bauer, E., Simianer, H.,

and Melchinger, A.E. (2014). Genome properties and prospects of

genomic prediction of hybrid performance in a breeding program of

maize. Genetics 197:1343–1355. https://doi.org/10.1534/genetics.

114.165860.

Tibshirani, R. (1997). The lasso method for variable selection in the Cox

model. Stat. Med. 16:385–395. https://doi.org/10.1002/(sici)1097-

0258(19970228).

Tu, J., Zhang, G., Datta, K., Xu, C., He, Y., Zhang, Q., Khush, G.S., and

Datta, S.K. (2000). Field performance of transgenic elite commercial

hybrid rice expressing Bacillus thuringiensis δ-endotoxin. Nat.

Biotechnol. 18:1101–1104. https://doi.org/10.1038/80310.

VanRaden, P.M. (2008). Efficient methods to compute genomic

predictions. J. Dairy Sci. 91:4414–4423. https://doi.org/10.3168/jds.

2007-0980.

Wang, H., Tang, X., Yang, X., Fan, Y., Xu, Y., Li, P., Xu, C., and Yang, Z.

(2021a). Exploiting natural variation in crown root traits via genome-

wide association studies in maize. BMC Plant Biol. 21:346. https://

doi.org/10.1186/s12870-021-03127-x.

Wang, L., and Michoel, T. (2017). Controlling false discoveries in

Bayesian gene networks with lasso regression p-values. Preprint at

arXiv. https://arxiv.org/abs/1701.07011.

Wang, S., Xu, Y., Qu, H., Cui, Y., Li, R., Chater, J.M., Yu, L., Zhou, R.,

Ma, R., Huang, Y., et al. (2021b). Boosting predictabilities of

agronomic traits in rice using bivariate genomic selection. Briefings

Bioinf. 22:bbaa103. https://doi.org/10.1093/bib/bbaa103.

Washburn, J.D., Burch, M.B., and Franco, J.A.V. (2020). Predictive

breeding for maize: Making use of molecular phenotypes, machine

learning, and physiological crop models. Crop Sci. 60:622–638.

https://doi.org/10.1002/csc2.20052.

Wei, J., Wang, A., Li, R., Qu, H., and Jia, Z. (2018). Metabolome-wide

association studies for agronomic traits of rice. Heredity

120:342–355. https://doi.org/10.1038/s41437-017-0032-3.

Wen, W., Li, D., Li, X., Gao, Y., Li, W., Li, H., Liu, J., Liu, H., Chen, W.,

Luo, J., et al. (2014). Metabolome-based genome-wide association

study of maize kernel leads to novel biochemical insights. Nat.

Commun. 5:3438. https://doi.org/10.1038/ncomms4438.

Westhues, M., Schrag, T.A., Heuer, C., Thaller, G., Utz, H.F.,

Schipprack, W., Thiemann, A., Seifert, F., Ehret, A., Schlereth, A.,

et al. (2017). Omics-based hybrid prediction in maize. Theor. Appl.

Genet. 130:1927–1939. https://doi.org/10.1007/s00122-017-2934-0.

Worley, B., and Powers, R. (2013). Multivariate analysis in metabolomics.

Curr. Metabolomics 1:92–107. https://doi.org/10.2174/2213235X1

1301010092.

Wu, P.-Y., Stich, B., Weisweiler, M., Shrestha, A., Erban, A., Westhoff,

P., and Inghelandt, D.V. (2022). Improvement of prediction ability by

integrating multi-omic datasets in barley. BMC Genom. 23:200.

https://doi.org/10.1186/s12864-022-08337-7.

Xu, S., Zhu, D., and Zhang, Q. (2014). Predicting hybrid performance in

rice using genomic best linear unbiased prediction. Proc. Natl.

Acad. Sci. USA 111:12456–12461. https://doi.org/10.1073/pnas.

1413750111.

Xu, S., Xu, Y., Gong, L., and Zhang, Q. (2016). Metabolomic prediction of

yield in hybrid rice. Plant J. 88:219–227. https://doi.org/10.1111/tpj.

13242.

Xu, Y., Xu, C., and Xu, S. (2017). Prediction and association mapping of

agronomic traits in maize using multiple omic data. Heredity

119:174–184. https://doi.org/10.1038/hdy.2017.27.

Xu, Y., Ma, Y., Wang, X., Li, C., Zhang, X., Li, P., Yang, Z., and Xu, C.

(2021b). Kernel metabolites depict the diversity of relationship

between maize hybrids and their parental lines. Crops J. 9:181–191.

https://doi.org/10.1016/j.cj.2020.05.009.

Xu, Y., Zhao, Y., Wang, X., Ma, Y., Li, P., Yang, Z., Zhang, X., Xu, C., and

Xu, S. (2021c). Incorporation of parental phenotypic data into multi-

omic models improves prediction of yield-related traits in hybrid rice.

Plant Biotechnol. J. 19:261–272. https://doi.org/10.1111/pbi.13458.

Xu, Y., Zhang, X., Li, H., Zheng, H., Zhang, J., Olsen, M.S., Varshney,

R.K., Prasanna, B.M., and Qian, Q. (2022). Smart breeding driven

by big data, artificial intelligence, and integrated genomic-enviromic

prediction. Mol. Plant 15:1664–1695. https://doi.org/10.1016/j.molp.

2022.09.001.

Xu, Y., Liu, X., Fu, J., Wang, H., Wang, J., Huang, C., Prasanna, B.M.,

Olsen, M.S., Wang, G., and Zhang, A. (2020). Enhancing genetic gain

through genomic selection: from livestock to plants. Plant Commun.

1:100005. https://doi.org/10.1016/j.xplc.2019.100005.

Xu, Y., Ma, K., Zhao, Y., Wang, X., Zhou, K., Yu, G., Li, C., Li, P., Yang,

Z., Xu, C., et al. (2021a). Genomic selection: A breakthrough

technology in rice breeding. Crops J. 9:669–677. https://doi.org/10.

1016/j.cj.2021.03.008.

Yan, J., Xu, Y., Cheng, Q., Jiang, S., Wang, Q., Xiao, Y., Ma, C., Yan, J.,

and Wang, X. (2021). LightGBM: accelerated genomically designed

crop breeding through ensemble learning. Genome Biol. 22:271.

https://doi.org/10.1186/s13059-021-02492-y.

Yang, W., Guo, T., Luo, J., Zhang, R., Zhao, J., Warburton, M.L., Xiao,

Y., and Yan, J. (2022). Target-oriented prioritization: targeted selection

strategy by integrating organismal and molecular traits through
predictive analytics in breeding. Genome Biol. 23:80. https://doi.org/

10.1186/s13059-022-02650-w.

Ye, S., Li, J., and Zhang, Z. (2020). Multi-omics-data-assisted genomic

feature markers preselection improves the accuracy of genomic

prediction. J. Anim. Sci. Biotechnol. 11:109. https://doi.org/10.1186/

s40104-020-00515-5.

Yin, B., Jia, J., Sun, X., Hu, X., Ao, M., Liu, W., Tian, Z., Liu, H., Li, D.,

Tian, W., et al. (2024). Dynamic metabolite QTL analyses provide

novel biochemical insights into kernel development and nutritional

quality improvement in common wheat. Plant Commun. 5:100792.

https://doi.org/10.1016/j.xplc.2024.100792.

Yin, L., Zhang, H., Tang, Z., Yin, D., Fu, Y., Yuan, X., Li, X., Liu, X., and

Zhao, S. (2023). HIBLUP: an integration of statistical models on the

BLUP framework for efficient genetic evaluation using big genomic

data. Nucleic Acids Res. 51:3501–3512. https://doi.org/10.1093/nar/

gkad074.

Yu, P., Ye, C., Li, L., Yin, H., Zhao, J., Wang, Y., Zhang, Z., Li, W., Long,

Y., Hu, X., et al. (2022). Genome-wide association study and genomic

prediction for yield and grain quality traits of hybrid rice. Mol. Breed.

42:16. https://doi.org/10.1007/s11032-022-01289-6.

Zhang, Y., Zhang, M., Ye, J., Xu, Q., Feng, Y., Xu, S., Hu, D., Wei, X., Hu,

P., and Yang, Y. (2023). Integrating genome-wide association study

into genomic selection for the prediction of agronomic traits in rice

(Oryza sativa L.). Mol. Breed. 43:81. https://doi.org/10.1007/s11032-

023-01423-y.

Zhang, Z., Ober, U., Erbe, M., Zhang, H., Gao, N., He, J., Li, J., and

Simianer, H. (2014). Improving the accuracy of whole genome

prediction for complex traits using the results of genome wide

association studies. PLoS One 9:e93017. https://doi.org/10.1371/

journal.pone.0093017.

Zhao, Y., Mette, M.F., and Reif, J.C. (2015). Genomic selection in hybrid

breeding. Plant Breed. 134:1–10. https://doi.org/10.1111/pbr.12231.

