Plant Communications Resource article Metabolic marker-assisted genomic prediction improves hybrid breeding Yang Xu1,7, Wenyan Yang1,7, Jie Qiu2,7, Kai Zhou1, Guangning Yu1, Yuxiang Zhang1, Xin Wang1, Yuxin Jiao1, Xinyi Wang1, Shujun Hu1, Xuecai Zhang3, Pengcheng Li1, Yue Lu1, Rujia Chen1, Tianyun Tao1, Zefeng Yang1, Yunbi Xu4,5,6,* and Chenwu Xu1,* 1Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China 2Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China 3International Maize and Wheat Improvement Center (CIMMYT), Mexico D.F. 06600, Mexico 4Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China 5BGI Bioverse, Shenzhen 518083, China 6MolBreeding Biotechnology Co., Ltd., Shijiazhuang 050035, China 7These authors contributed equally to this article. *Correspondence: Yunbi Xu (yunbi.xu@pku-iaas.edu.cn), Chenwu Xu (cwxu@yzu.edu.cn) https://doi.org/10.1016/j.xplc.2024.101199 ABSTRACT Hybrid breeding is widely acknowledged as the most effective method for increasing crop yield, particularly in maize and rice. However, a major challenge in hybrid breeding is the selection of desirable combinations from the vast pool of potential crosses. Genomic selection (GS) has emerged as a powerful tool to tackle this challenge, but its success in practical breeding depends on prediction accuracy. Several strategies have been explored to enhance prediction accuracy for complex traits, such as the incorporation of functional markers and multi-omics data. Metabolome-wide association studies (MWAS) help to identify metabolites that are closely linked to phenotypes, known as metabolic markers.
However, the use of preselected metabolic markers from parental lines to predict hybrid performance has not yet been explored. In this study, we developed a novel approach called metabolic marker-assisted genomic prediction (MM_GP), which incorporates significant metabolites identified from MWAS into GS models to improve the accuracy of genomic hybrid prediction. In maize and rice hybrid populations, MM_GP outperformed genomic prediction (GP) for all traits, regardless of the method used (genomic best linear unbiased prediction or eXtreme gradient boosting). On average, MM_GP demonstrated 4.6% and 13.6% higher predictive abilities than GP for maize and rice, respectively. MM_GP could also match or even surpass the predictive ability of M_GP (integrated genomic-metabolomic prediction) for most traits. In maize, the integration of only six metabolic markers significantly associated with multiple traits resulted in 5.0% and 3.1% higher average predictive ability compared with GP and M_GP, respectively. With advances in high-throughput metabolomics technologies and prediction models, this approach holds great promise for revolutionizing genomic hybrid breeding by enhancing its accuracy and efficiency. Keywords: genomic prediction, hybrid, metabolome-wide association studies, metabolic marker, predictive ability Xu Y., Yang W., Qiu J., Zhou K., Yu G., Zhang Y., Wang X., Jiao Y., Wang X., Hu S., Zhang X., Li P., Lu Y., Chen R., Tao T., Yang Z., Xu Y., and Xu C. (2025). Metabolic marker-assisted genomic prediction improves hybrid breeding. Plant Comm. 6, 101199. Plant Communications 6, 101199, March 10 2025. © 2024 The Authors. Published by Elsevier Inc. on behalf of CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, and Chinese Society for Plant Biology. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). INTRODUCTION Hybrid breeding has proved to be the most efficient approach for increasing yield potential in various crops, notably maize and rice (Tu et al., 2000; Duvick, 2001). However, selecting the optimum combinations from a wide range of potential crosses presents a great challenge in hybrid breeding. Genomic selection (GS) has emerged as a solution to this challenge, using genome-wide markers to predict the genomic values of individuals before phenotyping (Meuwissen et al., 2001; Hickey et al., 2014). Genomic hybrid breeding, a special form of GS, leverages markers derived from parental lines to predict hybrid performance, thereby shortening breeding cycles and enhancing genetic gain (Xu et al., 2014; Crossa et al., 2017; Cui et al., 2020). Several studies have confirmed the effectiveness of genomic hybrid breeding (Technow et al., 2014; Zhao et al., 2015; Yang et al., 2022). The success of GS in practical breeding largely depends on the accuracy of genomic prediction (GP) (Xu et al., 2021a). Despite the availability of whole-genome sequence information, GS may not fully capture the intricate interactions among genes and their downstream regulation, which are integral to the entire process linking genotype to phenotype (Westhues et al., 2017; Hu et al., 2019). For complex quantitative traits, particularly those heavily influenced by environmental factors, such as grain yield, there exists a bottleneck that hinders the improvement of prediction accuracy (Xu et al., 2020; Resende et al., 2024). With advances in high-throughput molecular biotechnology, it has become possible to predict phenotypes using metabolomic data.
The metabolome serves as a link between genotype and phenotype, offering the potential to enhance predictive abilities compared with genomic data by shedding light on downstream interactions (Washburn et al., 2020). For example, the predictive ability of metabolomic data from parental lines to predict the yield of rice hybrids was nearly twice that of genomic data (Xu et al., 2016). Using 56 110 SNPs and 130 metabolites from 285 maize inbred lines and two testers, the general combining abilities of seven traits in maize were predicted, and the results indicated comparable predictive abilities between the two data types (Riedelsheimer et al., 2012). The integration of multi-omics data is increasingly being explored to further enhance prediction accuracy. The combination of genomic, metabolomic, and transcriptomic data can significantly improve predictive abilities for various agronomic traits across diverse plant species (Hu et al., 2021; Wu et al., 2022), highlighting the potential of integrating genomic and metabolomic data to enhance genomic prediction accuracy. The incorporation of prior or preselected biological information into GP models is another viable approach to enhance prediction accuracy. For instance, the integration of GWAS findings into genomic best linear unbiased prediction (GBLUP) resulted in a 4.8% improvement in the prediction of loin muscle area in pigs (Liu et al., 2023a). Similarly, the use of single-nucleotide polymorphisms (SNPs) preselected from whole-genome sequencing (WGS) data on the basis of expression quantitative trait locus mapping of all genes led to better predictive abilities for startle responses in fruit flies compared with the use of WGS data alone (Ye et al., 2020). In rice, the GS + de novo GWAS strategy outperformed six other models in a tropical breeding population across several traits and environments (Spindel et al., 2016).
Together, these studies suggest that the integration of prior or preselected biological information can further enhance the accuracy of GS. Previous studies have demonstrated the effectiveness of metabolome-wide association studies (MWAS) in identifying metabolic markers, i.e., metabolites that are closely linked to phenotypes (Gamboa-Becerra et al., 2019; Xu et al., 2021b). Because metabolomic data are high-dimensional, noisy, and variable, identifying metabolic markers is challenging. Current methods for detecting metabolic markers include partial least-squares discriminant analysis, orthogonal partial least-squares discriminant analysis, artificial neural networks, support vector machines, and other multivariate analysis methods (Worley and Powers, 2013). In a study involving 368 maize inbred lines, 43 metabolites significantly associated with 100-kernel weight were identified using stepwise regression (Wen et al., 2014). Using an improved least absolute shrinkage and selection operator (LASSO) method, 15 metabolites significantly associated with six agronomic traits were identified in 339 maize inbred lines (Xu et al., 2017). A simulation study indicated that the LASSO method had the highest power and lowest false-positive rate among four MWAS methods, and the method detected 25 metabolites significantly associated with yield-related traits in 533 rice varieties (Wei et al., 2018). These metabolic markers are closely linked to phenotypic traits, reflecting immediate physiological status and environmental interactions, and are thus expected to provide more accurate predictions. However, the integration of such preselected biological information into GS remains to be explored. In this study, we developed a novel approach called metabolic marker-assisted GP (MM_GP), which incorporates significant metabolites identified from parental lines by MWAS into GS models to improve the accuracy of hybrid prediction.
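The core idea of MM_GP, adding only the MWAS-selected metabolites of the parents to the genomic predictors of a hybrid, can be sketched as a feature-construction step. This is a hypothetical illustration, not the authors' implementation: it assumes 0/1/2 SNP dosage coding and approximates both hybrid SNP features and hybrid metabolite levels by mid-parent averages, and the function names and toy values are invented.

```python
from typing import Dict, List, Sequence


def mid_parent(p1: Sequence[float], p2: Sequence[float]) -> List[float]:
    """Element-wise average of two parental vectors (an assumed hybrid proxy)."""
    if len(p1) != len(p2):
        raise ValueError("parental vectors must have the same length")
    return [(a + b) / 2 for a, b in zip(p1, p2)]


def mm_gp_features(
    snp_p1: Sequence[int],
    snp_p2: Sequence[int],
    met_p1: Dict[str, float],
    met_p2: Dict[str, float],
    markers: Sequence[str],
) -> List[float]:
    """Hypothetical MM_GP predictor vector for one hybrid.

    SNP dosages (0/1/2) of both parents are averaged, and only the metabolites
    listed in `markers` (those flagged by MWAS for the target trait) are
    appended; every other metabolite is ignored. Passing all metabolite IDs
    instead would mimic an M_GP-style predictor set.
    """
    snp_part = mid_parent(snp_p1, snp_p2)
    met_part = mid_parent([met_p1[m] for m in markers],
                          [met_p2[m] for m in markers])
    return snp_part + met_part


# Toy parents with four SNPs and three measured metabolites (invented values).
parent_a_snps, parent_b_snps = [0, 2, 1, 2], [2, 2, 0, 0]
parent_a_mets = {"m36": 1.2, "m111": 0.4, "other": 3.0}
parent_b_mets = {"m36": 0.8, "m111": 0.6, "other": 1.0}
features = mm_gp_features(parent_a_snps, parent_b_snps,
                          parent_a_mets, parent_b_mets,
                          markers=["m36", "m111"])
print(features)
```

In a GBLUP setting, the genomic and marker parts might instead enter as separate relationship matrices or as fixed effects, whereas XGBoost can take a concatenated vector like this one directly; which variant the study used is not stated in this section.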
The performance of MM_GP was evaluated using 425 maize hybrids derived from 205 inbred lines and 278 rice hybrids from 210 recombinant inbred lines (RILs). The proposed MM_GP approach offers a distinct advantage in refining GP, facilitating more precise and effective selection for desirable traits in crop hybrid breeding. RESULTS Metabolite profiling of seedling leaves in maize inbred lines Using a non-targeted liquid chromatography–mass spectrometry (LC–MS) method, 925 metabolites were identified from the seedling leaves of 205 maize inbred lines, each with two biological replicates. After excluding metabolites with significantly different concentrations (p < 0.01) between replicates, 777 metabolites remained. Among these metabolite features, 557 were annotated and classified into 11 categories (Figure 1A and Supplemental Table 1). The three most abundant categories were benzenoids (14.0%), organic oxygen compounds (13.6%), and organoheterocyclic compounds (13.5%). Levels of metabolite accumulation varied substantially among the inbred lines, with an average coefficient of variation (CV) of 72.8%. A majority of the metabolites (66.0%) exhibited a CV of >50%, particularly the benzenoids (Figure 1B and Supplemental Table 1). Identification of metabolic markers that influence agronomic traits in maize Using the LASSO method, 78 significant metabolites were identified in maize inbred lines by MWAS: 30, 28, 31, and 24 metabolites for ear weight (EW), ear grain weight (EGW), ear diameter (ED), and ear length (EL), respectively; because some metabolites are associated with more than one trait, these per-trait counts sum to more than 78 (Figure 2A and Supplemental Table 2). Figure 1. Metabolic profiling of 777 metabolites from 205 maize inbred lines. (A) Classification of 777 metabolites. (B) Distribution of the coefficients of variation (CVs) of 777 metabolites.
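The marker screen relies on LASSO, whose L1 penalty shrinks the coefficients of uninformative metabolites exactly to zero. Below is a minimal coordinate-descent sketch in plain Python. It is ordinary LASSO with a user-chosen penalty `lam` (a hypothetical parameter), not the improved LASSO with significance thresholds used in the study, and it assumes metabolite columns are standardized and the phenotype is centered before fitting.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def standardize(col: Sequence[float]) -> List[float]:
    """Center a metabolite column and scale it to unit (population) variance."""
    mu, sd = fmean(col), pstdev(col)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in col]


def soft_threshold(z: float, g: float) -> float:
    """Shrink z toward zero by g; values within [-g, g] become exactly 0."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def lasso_cd(X: List[List[float]], y: Sequence[float], lam: float,
             n_iter: int = 200) -> List[float]:
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1.

    X holds one standardized column per metabolite and y is centered, so
    each coordinate update is a single soft-threshold step.
    """
    n, p = len(y), len(X)
    b = [0.0] * p
    resid = list(y)  # y - Xb; equals y while all coefficients are zero
    for _ in range(n_iter):
        for j in range(p):
            xj = X[j]
            # Correlation of column j with the partial residual (own fit added back).
            rho = sum(xj[i] * (resid[i] + xj[i] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xj[i] * delta
                b[j] = new_bj
    return b


def select_markers(metabolites: List[List[float]], phenotype: Sequence[float],
                   lam: float) -> List[int]:
    """Indices of metabolites whose LASSO coefficient is non-zero."""
    X = [standardize(col) for col in metabolites]
    mu = fmean(phenotype)
    y = [v - mu for v in phenotype]
    return [j for j, c in enumerate(lasso_cd(X, y, lam)) if c != 0.0]


# Toy data: the phenotype depends only on the first metabolite.
m0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
m1 = [2.0, 1.0, 2.0, 1.0, 2.0, 1.0]
m2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
pheno = [2.0 * v for v in m0]
print(select_markers([m0, m1, m2], pheno, lam=0.1))
```

Raising `lam` removes more metabolites; in practice the penalty (or, as in the study, a significance threshold) controls how many metabolic markers are passed on to MMP and MM_GP.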
Forty-seven of the identified metabolites were annotated and classified into 10 categories, with benzenoids (17.0%), organic oxygen compounds (14.8%), and phenylpropanoids and polyketides (14.8%) being the most numerous. In addition, 28, six, and one metabolites were significantly associated with at least two, at least three, and all four traits, respectively (Supplemental Table 3). For instance, metabolite m863 (salicylic acid) was significantly associated with both EW and EGW. Metabolite m36 (leucine) had significant associations with EW, EGW, and EL, and metabolite m111 (taurine) was significantly associated with all four traits. The percentage of phenotypic variation explained depended on the trait and the metabolic marker, ranging from 1.0% to 6.0% (Supplemental Table 2). Metabolite m126 (hypoxanthine) explained the most phenotypic variation for EW and ED, and m136 (valeric acid) and m36 (leucine) were the top contributors to EGW and EL, respectively. Functional enrichment analysis of the 47 annotated metabolic markers identified 22 enriched metabolic pathways. The top five pathways were pyruvate metabolism, galactose metabolism, linoleic acid metabolism, purine metabolism, and pyrimidine metabolism (Figure 2B and Supplemental Table 4). Notably, the enrichment of pyruvate metabolism reached statistical significance. Evaluation of MM_GP for hybrid prediction in maize To examine the capacity of MM_GP for hybrid prediction in maize, we compared the predictive abilities of five prediction models: GP, metabolomic prediction (MP), metabolic marker prediction (MMP), integrated genomic-metabolomic prediction (M_GP), and metabolic marker-assisted GP (MM_GP). Metabolites that showed significant associations with the target trait were considered to be metabolic markers and were used in MMP and MM_GP.
The predictive abilities from 10-fold cross-validation with 20 repetitions varied from 0.259 to 0.499 for GP, 0.130 to 0.442 for MP, 0.076 to 0.237 for MMP, 0.269 to 0.494 for M_GP, and 0.268 to 0.503 for MM_GP across the four agronomic traits tested (Figure 3). Among these traits, prediction performance was highest for ED, followed by EW, EGW, and EL. Among the models, MP and MMP exhibited the worst prediction performance. MM_GP displayed better predictive abilities than GP. Specifically, with GBLUP, MM_GP improved the predictive ability for EW by 4.1%, EGW by 5.3%, ED by 0.8%, and EL by 2.7%. Similarly, with eXtreme gradient boosting (XGBoost), MM_GP increased predictive ability for EW by 5.2%, EGW by 4.4%, ED by 4.2%, and EL by 9.7%. The predictive ability of MM_GP also matched or even exceeded that of M_GP. When using GBLUP, MM_GP increased predictive ability by 1.8% for EW, 5.9% for EGW, and 1.8% for ED compared with M_GP, although their predictive abilities for EL were similar. When using XGBoost, MM_GP increased predictive ability by 3.0% for EW, 3.3% for EGW, 0.5% for ED, and 5.4% for EL compared with M_GP. Notably, M_GP did not improve the predictive ability for some traits compared with GP, whereas MM_GP did. For example, in the case of EGW with GBLUP, M_GP decreased predictive ability by 0.6% compared with GP, whereas MM_GP increased it by 5.3%. Overall, MM_GP consistently performed the best among the five models, regardless of the method used (GBLUP or XGBoost). To determine whether the enhanced predictive ability of MM_GP was attributable merely to the small number of metabolic markers, we randomly selected the same number of metabolites from the metabolomic data as there were metabolic markers. Averaged over 10 replicated samples, the predictive abilities obtained when these randomly selected metabolites assisted GP were significantly lower than those of MM_GP (Figure 4).
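The size-matched control can be sketched as repeated draws of random metabolite sets, each the same size as the marker set, with the change in predictive ability expressed relative to MM_GP. This is a hypothetical helper: the metabolite IDs `m1` to `m777` are invented labels, the evaluation of each random set (GBLUP or XGBoost with cross-validation) is left out, and the toy abilities in the example are made up only to show the arithmetic.

```python
import random
from statistics import fmean, stdev
from typing import List, Sequence, Tuple


def random_marker_sets(all_ids: Sequence[str], n_markers: int,
                       n_reps: int = 10, seed: int = 2024) -> List[List[str]]:
    """Draw n_reps random metabolite sets, each matching the marker count."""
    rng = random.Random(seed)
    pool = list(all_ids)
    return [rng.sample(pool, n_markers) for _ in range(n_reps)]


def change_vs_mm_gp(mm_gp_ability: float,
                    random_abilities: Sequence[float]) -> Tuple[float, float]:
    """Relative change (%) of the mean random-set ability versus MM_GP,
    plus the standard deviation across the random replicates."""
    mean_random = fmean(random_abilities)
    pct = 100.0 * (mean_random - mm_gp_ability) / mm_gp_ability
    return pct, stdev(random_abilities)


# 777 maize metabolites and 30 EW markers -> ten random 30-metabolite sets.
ids = [f"m{i}" for i in range(1, 778)]
sets = random_marker_sets(ids, n_markers=30)
print(len(sets), len(sets[0]))

# Invented abilities: a mean of 0.48 against 0.50 for MM_GP is a drop of about 4%.
print(change_vs_mm_gp(0.50, [0.47, 0.48, 0.49]))
```

Expressing the loss relative to MM_GP matches how the decreases are reported in this section (for example, 4.8% for EW with GBLUP).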
Specifically, using GBLUP, the randomly selected metabolites resulted in a significant decrease in predictive ability for EW, EGW, ED, and EL by 4.8%, 5.8%, 1.4%, and 4.1%, respectively, compared with MM_GP. Similarly, with XGBoost, the randomly selected metabolites significantly reduced predictive ability for EW, EGW, and EL by 6.3%, 6.6%, and 7.1%, respectively. Therefore, we conclude that the improved predictive ability of MM_GP cannot be attributed solely to the small number of metabolic markers. Figure 2. Identification of metabolites associated with four traits in maize. (A) Metabolites significantly associated with four traits of 205 maize inbred lines. The horizontal black lines represent the critical values at the 0.05 significance level. (B) Enriched pathways of metabolic markers. Integration of shared significant metabolic markers in MM_GP Six metabolites were found to be significantly associated with three or more traits (Figure 5A). To test the contribution of these shared significant metabolic markers to GP, we combined them with genomic data to predict the four traits in hybrid maize (Figure 5B). The predictive abilities using GBLUP were 0.387 (EW), 0.349 (EGW), 0.502 (ED), and 0.260 (EL), and those using XGBoost were 0.392 (EW), 0.338 (EGW), 0.482 (ED), and 0.283 (EL). MM_GP, which integrated the six shared metabolic markers, showed greater predictive ability than GP and M_GP. Compared with GP, MM_GP with GBLUP significantly increased predictive ability by 3.6% for EW and 6.3% for EGW, although their predictive abilities for ED and EL were similar. Likewise, MM_GP with XGBoost significantly increased predictive ability by 6.8% for EW, 7.6% for EGW, 6.0% for ED, and 9.4% for EL.
Compared with M_GP, MM_GP with GBLUP significantly increased predictive ability by 6.9% for EGW and 1.7% for ED, and MM_GP with XGBoost significantly increased predictive ability by 4.6% for EW, 6.4% for EGW, 2.2% for ED, and 5.1% for EL. These findings highlight the greater potential of MM_GP to improve the accuracy of genomic hybrid prediction compared with other methods. Figure 3. Predictive abilities for four traits in 425 maize hybrids obtained from five prediction models using GBLUP and XGBoost methods. The four traits are ear weight (EW), ear grain weight (EGW), ear diameter (ED), and ear length (EL). The five prediction models are GP, MP, MMP, M_GP, and MM_GP, representing genomic prediction, metabolomic prediction, metabolic marker prediction, integrated genomic–metabolomic prediction, and metabolic marker-assisted genomic prediction, respectively. In each histogram, different lowercase letters above the bars indicate significant differences (p < 0.05) between the models. Evaluation of MM_GP for hybrid prediction in rice To confirm the advantages of MM_GP observed in maize, we performed a similar analysis in rice. Using the LASSO method, we detected 171 metabolites significantly associated with four traits in rice RIL populations: 48 for yield per plant (YIELD), 40 for tiller number per plant (TILLER), 55 for grain number per panicle (GRAIN), and 64 for 1000-grain weight (KGW) (Figure 6 and Supplemental Table 5). Among these metabolites, 138 were significantly associated with one trait, 30 with two traits, and three with three traits (Supplemental Table 6). For example, metabolite m0149-L (sn-glycero-3-phosphocholine) was significantly associated with only one trait (YIELD), m0092-L (D-pantothenic acid) with two traits (YIELD and GRAIN), and m0643-L (chrysoeriol C-hexoside derivative) with three traits (YIELD, GRAIN, and KGW).
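The per-trait MWAS tallies can be cross-checked by counting trait-metabolite associations two ways. The sketch below uses only numbers stated in the text: in rice, 138 + 2×30 + 3×3 = 207 associations, which equals 48 + 40 + 55 + 64. For maize, reading the 28, six, and one metabolites as associated with at least two, at least three, and all four traits reproduces both the per-trait total of 113 and the 78 distinct metabolites, whereas reading them as exactly two, three, and four traits does not. The helper names are illustrative.

```python
from typing import Dict, List, Tuple


def exactly_from_at_least(total: int, at_least: Dict[int, int],
                          max_k: int) -> Dict[int, int]:
    """Convert 'associated with >= k traits' tallies into 'exactly k' tallies."""
    cumulative = {1: total, **at_least}
    return {k: cumulative[k] - cumulative.get(k + 1, 0)
            for k in range(1, max_k + 1)}


def check(per_trait: List[int], exactly: Dict[int, int]) -> Tuple[int, int, bool]:
    """Distinct metabolites, total associations, and whether the totals agree."""
    distinct = sum(exactly.values())
    associations = sum(k * count for k, count in exactly.items())
    return distinct, associations, associations == sum(per_trait)


rice = check([48, 40, 55, 64], {1: 138, 2: 30, 3: 3})
maize_exactly = exactly_from_at_least(78, {2: 28, 3: 6, 4: 1}, max_k=4)
maize = check([30, 28, 31, 24], maize_exactly)
# Literal reading: 28, 6, 1 metabolites with exactly 2, 3, 4 traits; 43 with one.
maize_literal = check([30, 28, 31, 24], {1: 78 - 35, 2: 28, 3: 6, 4: 1})

print(rice)           # (171, 207, True)
print(maize_exactly)  # {1: 50, 2: 22, 3: 5, 4: 1}
print(maize)          # (78, 113, True)
print(maize_literal)  # (78, 121, False)
```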
No metabolites were significantly associated with all the tested traits. We next examined the predictive abilities of the five aforementioned models for four traits in hybrid rice (Figure 7). Predictive abilities varied from 0.138 to 0.694 for GP, 0.120 to 0.673 for MP, 0.128 to 0.531 for MMP, 0.178 to 0.707 for M_GP, and 0.190 to 0.712 for MM_GP across the four agronomic traits. MM_GP and M_GP performed well for most traits, whereas MMP performed poorly. Comparison of the predictive abilities of GP and MM_GP for the four traits in hybrid rice yielded results consistent with those in maize. Using GBLUP, MM_GP demonstrated significantly higher predictive ability for YIELD (by 37.5%), TILLER (13.6%), GRAIN (15.4%), and KGW (2.6%) compared with GP. Using XGBoost, MM_GP significantly outperformed GP for three traits: YIELD (by 8.3%), TILLER (16.7%), and GRAIN (17.5%). MM_GP also outperformed M_GP in the prediction of TILLER, GRAIN, and KGW. Using GBLUP, MM_GP exhibited significantly higher predictive ability for TILLER (by 6.1%) and GRAIN (3.4%). Using XGBoost, MM_GP exhibited significantly higher predictive ability for TILLER (by 26.2%) and KGW (14.5%). On average, MM_GP increased predictive ability by 3.4% (relative to M_GP), 13.6% (relative to GP), and 24.1% (relative to MP) across all traits and methods. These findings demonstrate the greater potential of MM_GP in hybrid rice compared with the other tested methods. We then compared the predictive ability of metabolic markers with that of an equivalent number of randomly selected metabolites and observed results similar to those found in maize (Supplemental Figure 1). Specifically, using GBLUP, the randomly selected metabolites significantly reduced the predictive ability for YIELD, TILLER, GRAIN, and KGW by 4.8%, 10.0%, 9.4%, and 1.8%, respectively, compared with MM_GP.
Similarly, using XGBoost, the randomly selected metabolites significantly reduced the predictive ability for YIELD, TILLER, GRAIN, and KGW by 6.9%, 27.5%, 24.0%, and 2.9%, respectively. We also analyzed the metabolites in two tissues, flag leaves and germinated seeds, and evaluated the MM_GP model separately for these two tissues (designated MM_GP_leaf and MM_GP_seed). Predictive ability ranged from 0.201 to 0.717 for MM_GP_leaf and from 0.158 to 0.704 for MM_GP_seed across the four traits (Supplemental Figure 2). Notably, MM_GP_leaf exhibited a higher predictive ability than MM_GP_seed. Using GBLUP, MM_GP_leaf demonstrated significantly greater predictive ability for YIELD (by 27.4%), TILLER (17.8%), GRAIN (11.0%), and KGW (1.9%) compared with MM_GP_seed. Using XGBoost, MM_GP_leaf significantly outperformed MM_GP_seed for TILLER (by 21.9%) and GRAIN (24.9%). Figure 4. Predictive abilities for four traits in hybrid maize obtained from integrated genomic data and randomly selected metabolites using GBLUP and XGBoost methods. The number of randomly selected metabolites corresponds to the number of metabolic markers. **p < 0.01. Predicting untested crosses using MM_GP Using parameters estimated from the training sample, we predicted EW for all 20 910 potential hybrids in maize and YIELD for 21 945 potential hybrids in rice using the MM_GP model. The average predicted values of the top 100 crosses were significantly higher than those of the bottom 100 crosses for both EW and YIELD (Supplemental Tables 7 and 8). When GBLUP was used, the average predicted values of the top 100 crosses for EW and YIELD were 62.7% and 48.8% higher, respectively, than the average predicted values of the bottom 100 crosses.
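The candidate counts match all pairwise single crosses among the parents, n(n − 1)/2, assuming reciprocal crosses are not distinguished: 205 × 204 / 2 = 20 910 for maize and 210 × 209 / 2 = 21 945 for rice. A minimal sketch of enumerating the crosses and summarizing selection on predicted values follows. `selection_summary` and the toy predictions are hypothetical, `gain_over_mean_pct` assumes the baseline is the mean of all candidates, and the last lines recompute the relative EW gains quoted in the text against the baseline means given in the Discussion (156.96 for GBLUP, 156.66 for XGBoost).

```python
from itertools import combinations
from math import comb
from statistics import fmean
from typing import Dict, List, Tuple

Cross = Tuple[int, int]


def all_single_crosses(n_lines: int) -> List[Cross]:
    """All unordered parent pairs (no selfs, reciprocals counted once)."""
    return list(combinations(range(n_lines), 2))


def selection_summary(predicted: Dict[Cross, float], k: int) -> Dict[str, float]:
    """Compare the top-k and bottom-k predicted crosses with each other and
    with the mean of all candidates."""
    values = sorted(predicted.values(), reverse=True)
    top, bottom, overall = fmean(values[:k]), fmean(values[-k:]), fmean(values)
    return {
        "top_vs_bottom_pct": 100.0 * (top - bottom) / bottom,
        "gain_over_mean_pct": 100.0 * (top - overall) / overall,
    }


def relative_gain_pct(selected_mean: float, baseline: float) -> float:
    """Gain of a selected group's mean over a baseline mean, in percent."""
    return 100.0 * (selected_mean - baseline) / baseline


print(len(all_single_crosses(205)), comb(210, 2))   # 20910 21945

# Toy predictions for five crosses among four lines (invented values).
toy = {(0, 1): 150.0, (0, 2): 170.0, (1, 2): 160.0, (0, 3): 180.0, (1, 3): 140.0}
print(selection_summary(toy, k=2))

# Reported EW means for the top 100 and top 10 crosses against the baselines.
print(round(relative_gain_pct(192.24, 156.96), 1),   # top 100, GBLUP
      round(relative_gain_pct(191.68, 156.66), 1),   # top 100, XGBoost
      round(relative_gain_pct(198.36, 156.96), 1),   # top 10, GBLUP
      round(relative_gain_pct(198.27, 156.66), 1))   # top 10, XGBoost
```

The last line gives 22.5, 22.4, 26.4, and 26.6, matching the percentages reported for EW, which suggests the same baselines underlie both the top-10 and top-100 figures.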
Similarly, when XGBoost was used, the average predicted values of the top 100 crosses for EW and YIELD were 60.5% and 50.4% higher, respectively, than the average predicted values of the bottom 100 crosses. Supplemental Figures 3 and 4 illustrate the average predicted phenotypic values of EW and YIELD when selecting the top crosses for hybrid breeding. For instance, if the top 10 crosses predicted by XGBoost were used for hybrid breeding, the average predicted EW and YIELD of these crosses would be 198.27 and 51.15, respectively, indicating gains of 26.6% and 17.9%. If the top 10 crosses predicted by GBLUP were used, the average predicted values would be 198.36 for EW and 52.11 for YIELD, reflecting gains of 26.4% and 19.6%, respectively. DISCUSSION In this study, we propose an innovative approach, MM_GP, which is the first to integrate metabolic markers from parental lines with GS models to predict hybrid performance in maize and rice populations. Our findings indicate that incorporating a small proportion of selected metabolic markers enhances the accuracy of GP. Compared with conventional GP models, the integration of metabolomic data resulted in higher predictive abilities for maize (1.8%) and rice (12.6%), and the integration of selected metabolic markers increased predictive abilities further (4.6% for maize and 13.6% for rice), highlighting the potential of leveraging metabolic data to predict yield-related traits. This result may be due to the additional genetic information implicitly captured by metabolites. Whereas GP models focus on genetic variation at the gene level, M_GP and MM_GP are capable of capturing a broader spectrum of genetic variation and physiological epistasis (Fernie and Schauer, 2009; Feher et al., 2014; Guo et al., 2016; Wang et al., 2021b).
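The 4.6% figure for maize is consistent with a simple mean of the eight trait-by-method improvements of MM_GP over GP reported in the Results. A short sketch of that arithmetic, assuming each improvement is a relative change in predictive ability, 100 × (r_MM_GP − r_GP)/r_GP, and that the summary is an unweighted mean over traits and methods (neither convention is spelled out in this section):

```python
from statistics import fmean


def relative_gain_pct(new_ability: float, old_ability: float) -> float:
    """Relative change in predictive ability, in percent."""
    return 100.0 * (new_ability - old_ability) / old_ability


# Reported MM_GP-over-GP improvements (%) in maize, by trait and method.
gblup = {"EW": 4.1, "EGW": 5.3, "ED": 0.8, "EL": 2.7}
xgboost = {"EW": 5.2, "EGW": 4.4, "ED": 4.2, "EL": 9.7}

mean_gain = fmean([*gblup.values(), *xgboost.values()])
print(f"{mean_gain:.2f}")  # 4.55, reported as 4.6%

# A relative gain differs from an absolute one: 0.40 -> 0.42 is about +5%, not +0.02.
print(relative_gain_pct(0.42, 0.40))
```

Averaging relative gains is not the same as taking the relative gain of averaged abilities; with maize abilities ranging from about 0.26 to 0.50 across traits, the two summaries can differ.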
Integration of selected metabolic markers has shown promise in enhancing predictive abilities, potentially surpassing the integration of entire metabolomic data. Our analysis indicated that the MM_GP model generally exhibited superior predictive abilities compared with the M_GP model in maize and rice populations. Notably, the integration of only six selected metabolic markers significantly associated with multiple traits resulted in 3.1% higher predictive ability compared with the M_GP model in maize. This improvement may be attributed to the benefits of feature selection (Xu et al., 2022). Feature selection not only reduced overfitting in a multiple linear regression (MLR) model but also significantly improved the predictive ability of a generalized linear model (GLM) for rapeseed seed yield (Shahsavari et al., 2023). In Chinese Holsteins, the use of regularized regression models for feature selection of WGS data demonstrated that combining preselected SNPs with 50K SNP chip data could improve the predictive abilities for milk, protein, and fat yields compared with WGS data and 50K SNP chip data alone (Li et al., 2022). In our study, the identification of metabolic markers via MWAS enabled feature selection of metabolomic data, potentially aiding in the elimination of irrelevant or redundant features, preventing overfitting, and enhancing model generalization. The improved predictive ability of MM_GP might also be attributed to the incorporation of prior biological information. This assertion is supported by the comparison of the predictive performance of selected metabolic markers with that of an equivalent number of randomly selected metabolites. Through integration of GWAS results from public databases, GS accuracy increased for two out of three traits in a dairy cattle dataset and nine out of 11 traits in a rice dataset (Zhang et al., 2014).
The inclusion of significant SNPs from GWAS improved the prediction accuracy of GS models for 1000-grain weight and amylose content in hybrid rice (Yu et al., 2022) and for nine agronomic traits by 4.0%–19.9% in rice (Zhang et al., 2023). Selection of optimal marker sets and prediction of phenotypes in rice and soybean data using GMStool, a GWAS-based marker selection tool, yielded higher prediction accuracy than using all SNP markers (Jeong et al., 2020). Other studies also showed that integration of prior GWAS information enhanced predictive ability in livestock species and traits, such as live weight in alpine merino sheep (Li et al., 2023), milk fatty acid composition in dairy cattle (Gebreyesus et al., 2019), and multiple traits in Hanwoo beef cattle (de Las Heras-Saldana et al., 2020). These studies underscore the advantages of incorporating existing biological knowledge at the DNA level. Our results suggest that leveraging prior information at the metabolite level can improve predictive ability in maize and rice, offering potential for wider applications across diverse populations and crop species. Figure 5. Metabolites significantly associated with three or more traits in maize. (A) The number of metabolites significantly associated with four traits of 205 maize inbred lines. The red font indicates the numbers of metabolites significantly associated with three or more traits. (B) Predictive abilities for four traits in hybrid maize obtained from MM_GP using GBLUP and XGBoost methods with metabolic markers identified from the parental lines. In each histogram, different lowercase letters above the bars indicate significant differences (p < 0.05) between the models. The improvement in predictive ability of MM_GP relative to GP was considerably greater in rice (13.6% on average) than in maize (4.6%).
This discrepancy may stem from the tissues used for metabolite analysis and the timing of sample collection (Westhues et al., 2017). In maize, the predictive ability for 100-grain weight in tropical and subtropical environments using metabolites from mature seeds was comparable to that using genomic data, as metabolites in mature seeds are directly linked to yield (Guo et al., 2016). In our study, maize metabolomic data were obtained from seedling leaves in a climate chamber, whereas rice metabolomic data were obtained from flag leaves and germinated seeds, which are more relevant to yield traits. Figure 6. Metabolites significantly associated with four traits in 210 rice RILs. The horizontal black lines represent the critical values at the 0.05 significance level. The instability of metabolites in phenotype prediction arises from the dynamic nature of metabolic profiles. Perturbations at the level of individual metabolite features are much greater than those in genomic sequences or marker data, and they are sensitive to sampling conditions as well as to the age and type of tissue (Schrag et al., 2018). Therefore, to enhance prediction accuracy effectively, the sampling time points and tissues must be carefully defined. Our study focused on maize metabolomic data collected from seedlings in climate chambers to minimize the impact of environmental fluctuations compared with field conditions. Previous studies have shown the viability of using metabolic profiles obtained from 3.5-day-old roots cultivated in climate chambers for prediction of hybrid performance (de Abreu e Lima et al., 2017). The use of metabolomics in hybrid breeding can benefit from sampling seedlings under controlled conditions, enabling year-round evaluation with available parental lines and simultaneous sampling of multiple tissues such as leaves and roots.
The shorter cultivation period also makes prediction results available sooner when superior hybrids are being developed for further testing (Schrag et al., 2018). Although metabolites in tissues at later developmental stages, such as mature seeds, are associated with yield-related traits, the time and resource costs of such sampling must also be considered. Early-stage sampling under controlled conditions facilitates early selection, thereby shortening breeding cycles and increasing annual genetic gain.

We also used MM_GP to predict EW for 20 910 potential maize hybrids. The genotypes and metabolites of these untested hybrids are not measured directly; instead, they are inferred from their parental lines. The top-ranked crosses can therefore be made immediately and developed into high-performing hybrids. Selecting the top 100 crosses for EW yields gains of 192.24 − 156.96 = 35.28 ± 2.68 g per plant with GBLUP and 191.68 − 156.66 = 35.02 ± 2.69 g per plant with XGBoost. Although the improvement in predictive ability of MM_GP in maize appears modest, the corresponding relative gains from selecting the top 100 hybrids, 35.28/156.96 = 22.5% with GBLUP and 35.02/156.66 = 22.4% with XGBoost, are substantial. Among the top 100 maize crosses, A017/A037 had already been designated as Suyu 161, a variety developed by the Jiangsu Yanjiang Institute of Agricultural Sciences, China. Notably, 24 crosses (GBLUP) and nine crosses (XGBoost) had a predicted EW greater than that of A017/A037. These crosses merit further validation and could contribute to the development of new varieties aimed at increasing maize yield.

Figure 7. Predictive abilities for four traits in 278 rice hybrids obtained from five prediction models using GBLUP and XGBoost methods. The four traits are yield per plant (YIELD), tiller number per plant (TILLER), grain number per panicle (GRAIN), and 1000-grain weight (KGW). The five prediction models are GP, MP, MMP, M_GP, and MM_GP. In each histogram, different lowercase letters above the bars indicate significant differences (p < 0.05) between the models.

In this study, we identified metabolites significantly associated with agronomic traits of maize and rice. The well-predicted metabolic markers exhibited various degrees of correlation, with positive and negative correlations roughly equally represented. The correlation coefficients ranged from −0.41 to 0.97 in maize and from −0.71 to 0.98 in rice (Supplemental Tables 9 and 10). A total of 411 significant correlations (p < 0.01) were identified in maize, compared with 3350 in rice. Notably, significant correlations were observed not only between metabolic markers within the same category but also between markers from different categories (Supplemental Figures 5 and 6). In addition, in maize, nine metabolic markers were involved in shared metabolic pathways and lay upstream or downstream of one another (Supplemental Table 4). For instance, metabolites m819 (S-lactoylglutathione) and m838 (malic acid) are both involved in pyruvate metabolism, and metabolites m126 (hypoxanthine), m893 (inosine), and m98 (deoxyguanosine) are associated with purine metabolism. A literature search and information from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database revealed that, among these metabolites, m819 (S-lactoylglutathione) can be converted to m838 (malic acid) through several pathways (Long et al., 2015; Dafre et al., 2017; Schwörer et al., 2021), and m893 (inosine) and m126 (hypoxanthine) can be interconverted via laccase domain containing 1 (LACC1) (Saveljeva et al., 2022).
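The top-100 selection gains reported earlier in this section are differences and ratios of mean predicted EW. The minimal sketch below reproduces the reported relative gains and shows how such a gain could be computed from a vector of predicted values. It assumes the gain is measured against the mean of all candidate crosses; the text does not state exactly how the reference means (156.96 and 156.66 g) were obtained, and the function names are hypothetical.

```python
import numpy as np

def relative_gain(selected_mean, baseline_mean):
    """Relative selection gain (%) computed from two mean values."""
    return 100.0 * (selected_mean - baseline_mean) / baseline_mean

def selection_gain(predicted, top_k=100):
    """Absolute and relative (%) gain from keeping the top_k crosses by
    predicted value. Assumption: the baseline is the mean of all candidates."""
    predicted = np.asarray(predicted, dtype=float)
    baseline = predicted.mean()
    selected = np.sort(predicted)[-top_k:]   # the top_k largest predictions
    gain = selected.mean() - baseline
    return gain, 100.0 * gain / baseline

# Reported mean EW values (g per plant): top 100 crosses versus reference mean
print(round(relative_gain(192.24, 156.96), 1))  # GBLUP: 22.5
print(round(relative_gain(191.68, 156.66), 1))  # XGBoost: 22.4
```

With real data, `selection_gain` would be applied to the vector of predicted EW values for all candidate crosses.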
By assessing the phenotypic variation explained by parental genotypes for the 78 metabolic markers in maize, we found that these markers are influenced by parental genotypes to varying degrees. Specifically, parental genotypes explained less than 10% of the phenotypic variation in 16 metabolic markers, between 10% and 50% in 37 metabolic markers, and more than 50% in 25 metabolic markers (Supplemental Table 11). A metabolome-based genome-wide association study (mGWAS) of the metabolic markers using the FarmCPU (fixed and random model circulating probability unification) method (Liu et al., 2016) detected a total of 30, 19, 75, and 111 significant SNPs (p < 4.83 × 10⁻⁷) corresponding to nine, seven, 13, and 15 metabolic markers for EW, EGW, ED, and EL, respectively (Supplemental Figure 7 and Supplemental Table 12). Notably, four common significant SNPs were identified. SNP_3_16890062 and SNP_3_223717387, both located on chromosome 3, were significantly associated with metabolites m126 (hypoxanthine) and m753 (ortho-hydroxyphenylacetic acid); SNP_1_197177004 was significantly associated with metabolites m706 (parthenin) and m375 (histamine); and SNP_7_120230279 was significantly associated with metabolites m614 and m684. These findings suggest shared genetic control over these metabolites. In summary, our study identified a set of SNPs that may regulate significant metabolites associated with maize yield traits. These results will facilitate functional verification of the underlying genes and deepen our understanding of metabolic networks, ultimately contributing to the improvement of maize yield.

Some of these metabolic markers play key roles in various plant growth and development processes, directly or indirectly influencing agronomic traits. For instance, metabolite m838 (malic acid) was significantly correlated with EGW and EL in maize. Previous research also found that malic acid was linked to flag-leaf width in wheat (Shi et al., 2020).
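Finding SNPs shared among metabolic markers amounts to grouping significant mGWAS hits by SNP. The minimal sketch below assumes the FarmCPU output has already been reduced to (SNP, metabolite, p value) rows; the threshold is the one reported above, while the function name and example rows are hypothetical and are not results from this study.

```python
from collections import defaultdict

MGWAS_THRESHOLD = 4.83e-7   # significance threshold reported for the maize mGWAS

def shared_significant_snps(hits, threshold=MGWAS_THRESHOLD):
    """hits: iterable of (snp_id, metabolite_id, p_value) rows.
    Returns {snp_id: sorted metabolite ids} for SNPs significant for >= 2 metabolites."""
    by_snp = defaultdict(set)
    for snp, metabolite, p in hits:
        if p < threshold:                      # keep only genome-wide significant hits
            by_snp[snp].add(metabolite)
    return {snp: sorted(mets) for snp, mets in by_snp.items() if len(mets) >= 2}

# Hypothetical rows for illustration only; they are not results from this study
rows = [
    ("snpA", "m1", 1.2e-8), ("snpA", "m2", 3.0e-7),
    ("snpB", "m3", 2.0e-9), ("snpB", "m4", 5.0e-6),   # m4 is not significant
    ("snpC", "m5", 1.0e-10), ("snpC", "m6", 4.0e-7),
]
print(shared_significant_snps(rows))
```

A SNP appearing in the result is a candidate for shared genetic control of the listed metabolites, which is the pattern described for the four common SNPs above.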
Malic acid, an organic acid, plays an essential role in regulating carbon metabolism in plants by linking mitochondrial respiratory metabolism to cytosolic biosynthetic pathways. It also has important functions in the tricarboxylic acid cycle and in metabolic signaling (Shan et al., 2023). Another metabolite, m36 (leucine), was related to EW, EGW, and EL in maize. An association between leucine and heading date has been reported in rice (Li et al., 2019). Leucine has been shown to regulate stress tolerance via the plant's respiratory system (Pires et al., 2016) and can also act as a plant growth regulator that increases antioxidant capacity and heat resistance (Liu et al., 2023b). Metabolite m863 (salicylic acid) was associated with maize EW and EGW in the present study, and salicylic acid has also been identified at three developmental stages of wheat, namely in grain-filling, mature, and germinating kernels (Yin et al., 2024). Another metabolite, m0021-L (trigonelline), which was associated with yield per plant and grain number per panicle in rice in our analysis, has also shown correlations with grain width (Chen et al., 2016; Wei et al., 2018) and grain length (Li et al., 2019). Trigonelline, an alkaloid, plays an important role in regulating cell growth and development (Mazzuca et al., 2000). A study in peanut suggested that reducing trigonelline levels could increase yield (Cho et al., 2011). Identifying these metabolites can help to reveal biological networks that link genomic loci, metabolites, and traits, enabling a better understanding of the genetic mechanisms underlying different traits. Our research demonstrates the distinct advantages of metabolic marker-assisted GP (MM_GP) for hybrid prediction in two staple crops, maize and rice.
With advances in high-throughput metabolomics technologies and prediction models, this approach has the potential to transform GS by improving its accuracy and efficiency. It not only accelerates crop breeding by enabling early selection but also offers valuable insights for advances in precision breeding.

METHODS

Maize materials
The maize plant materials consisted of 425 hybrids produced in a sparse partial diallel crossing experiment involving 205 inbred lines, which were a subset of a previously described maize panel (Wang et al., 2021a). These materials were planted in Yangzhou (119.27°E, 32.36°N) and Taian (116.39°E, 35.83°N) in 2018 in a randomized block design with two-row plots and two replications. Each row contained 13 plants, with a plant spacing of 25 cm and a row spacing of 60 cm. Field management practices, including irrigation, weeding, disease and pest control, and fertilization, followed local plot-trial management guidelines. For each inbred line and hybrid, five ears of uniform size were selected for evaluation of four traits: EW, EGW, ED, and EL. The 205 maize inbred lines were genotyped by genotyping-by-sequencing using fresh young leaves collected during the vegetative growth stage. After SNPs with low minor allele frequency (<0.05) and high missing rates (>0.1) were filtered out, 104 011 high-quality SNPs were retained for subsequent analysis. The genotypes of the 425 hybrids were inferred from those of their parents.

Metabolite analysis by LC–MS
Non-targeted LC–MS was used to analyze metabolites in seedling leaves of the 205 maize inbred lines. For each line, plump and uniform seeds were selected for hydroponic culture in a climate chamber under controlled conditions. Two biological replicates were established for each line, with 10 plants per replicate.
At the three-leaf, one-heart stage, leaves from three plants per replicate were collected for metabolomic analysis. The samples were immediately frozen in liquid nitrogen and stored at −80 °C. Each sample was weighed to 200 mg (±1%) in a 2-ml EP tube with 0.6 ml of methanol (−20 °C) containing 4 ppm 2-chlorophenylalanine. The mixture was vortexed for 30 s, ground in a tissue-grinding machine at 65 Hz for 60 s, and sonicated at 40 kHz for 30 min. The samples were then centrifuged at 12 000 rpm and 25 °C for 10 min. The filtered supernatant (300 μl) was transferred to a sample vial for LC–MS analysis. Chromatographic separation was performed on a Thermo Vanquish system equipped with an ACQUITY UPLC HSS T3 column (150 × 2.1 mm, 1.8 μm, Waters) maintained at 40 °C. The autosampler temperature was set to 8 °C. The gradient elution conditions are given in Supplemental Tables 13 and 14. ESI-MSn experiments were performed on a Thermo Q Exactive mass spectrometer with a spray voltage of 3.8 kV in positive mode and −2.5 kV in negative mode. The raw data were converted into mzXML format using ProteoWizard software (version 3.0.8789). The XCMS package in R (version 3.1.3) was used for peak identification, filtration, and alignment, generating a data matrix containing the mass-to-charge ratio (m/z), retention time, intensity, and other relevant information. To allow comparison of data across different magnitudes, the intensity values were batch-normalized. Metabolites were first identified on the basis of exact molecular weight (molecular weight error ≤ 30 ppm) and then confirmed from their tandem mass spectrometry fragmentation patterns. The Human Metabolome Database (HMDB) (http://www.hmdb.ca/), METLIN (http://metlin.scripps.edu), MassBank (http://www.massbank.jp/), LipidMaps (http://www.lipidmaps.org), mzCloud (https://www.mzcloud.org), and the Panomix proprietary standard database were used to verify annotations and identify metabolites.

Statistical analysis of metabolomic data
Statistical significance testing was performed on the concentration of each detected metabolite across the two biological replicates. Metabolites that differed significantly (p < 0.01) between the two replicates were excluded, leaving 777 metabolites for further analysis. Metabolite concentrations were normalized, and the mean of the two biological replicates was used for subsequent analysis. The coefficient of variation (CV) was calculated for each metabolite, and the phenotypic variation explained by each metabolic marker was determined by the corresponding r². The LASSO method (Tibshirani, 1997) was used in an MWAS to identify metabolites significantly associated (p < 0.05) with agronomic traits of the parental lines. Specifically, the lassopv R package was used for the LASSO computation (Wang and Michoel, 2017). A p value was calculated for each metabolite, and metabolites with p < 0.05 were considered significant. These metabolites were then integrated into the GP models as metabolic markers.

Rice dataset
The rice datasets consisted of 210 RILs derived from a cross between two rice varieties (Zhenshan 97 and Minghui 63), along with 278 hybrids produced by random pairing of the 210 RILs (Hua et al., 2003). The genomic data included 1619 bins identified from 270 820 SNPs obtained by sequencing all 210 RILs (Yang et al., 2022). The metabolomic data included 1000 metabolites, 317 detected in germinated seeds and the remaining 683 in flag leaves (Gong et al., 2013). Four agronomic traits were analyzed: YIELD, TILLER, GRAIN, and KGW.

The MM_GP model
We used two GS methods to demonstrate the effectiveness of MM_GP in maize and rice.
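Both GS methods take as input the metabolic markers retained by the LASSO-based MWAS described under "Statistical analysis of metabolomic data". As we understand it, lassopv (Wang and Michoel, 2017) derives its p values from the penalty at which each variable first enters the LASSO path. The minimal NumPy sketch below computes only those entry penalties, using a hand-written coordinate-descent LASSO rather than the lassopv implementation, and ranks metabolites by them; the conversion to p values is not reproduced, and the function names, penalty grid, and simulated data are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_entry_penalties(X, y, n_alphas=40, n_iter=100, tol=1e-8):
    """Largest penalty on a decreasing grid at which each column of X has a
    non-zero LASSO coefficient (0.0 if it never enters).
    Objective: (1/2n)||y - Xb||^2 + alpha * ||b||_1 (illustrative, not lassopv)."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)    # standardized metabolites
    yc = y - y.mean()                            # centered trait, so no intercept
    col_sq = (Xs ** 2).sum(axis=0) / n
    alpha_max = np.max(np.abs(Xs.T @ yc)) / n    # smallest alpha giving b = 0
    alphas = alpha_max * np.logspace(0, -2, n_alphas)
    beta, resid, entry = np.zeros(p), yc.copy(), np.zeros(p)
    for alpha in alphas:                         # warm-started coordinate descent
        for _ in range(n_iter):
            max_change = 0.0
            for j in range(p):
                old = beta[j]
                rho = Xs[:, j] @ resid / n + col_sq[j] * old
                beta[j] = soft_threshold(rho, alpha) / col_sq[j]
                if beta[j] != old:
                    resid -= Xs[:, j] * (beta[j] - old)
                    max_change = max(max_change, abs(beta[j] - old))
            if max_change < tol:
                break
        newly = (beta != 0) & (entry == 0)       # record first (largest) entry penalty
        entry[newly] = alpha
    return entry

# Simulated data: 205 lines x 20 hypothetical metabolites, two of them causal
rng = np.random.default_rng(0)
X = rng.normal(size=(205, 20))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + rng.normal(size=205)
entry = lasso_entry_penalties(X, y)
print(np.argsort(entry)[::-1][:5])   # metabolites ranked by entry penalty
```

Metabolites that enter the path at a large penalty are the strongest candidates; in the study, the lassopv p values (p < 0.05) rather than this raw ranking determined which metabolites became metabolic markers.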
The first method, GBLUP, used kinship matrices to represent the genetic relationships among individuals within a mixed linear model (VanRaden, 2008). The second method, XGBoost, is a machine-learning algorithm capable of capturing non-linear relationships without requiring prior assumptions about the underlying genetic model (Chen and Guestrin, 2016). Detailed information about model structure and optimization is provided below.

GBLUP for MM_GP
The GBLUP model for MM_GP is

$$y = X\beta + Z_G\gamma_G + A_M\gamma_{Ma} + D_M\gamma_{Md} + \varepsilon \qquad \text{(Equation 1)}$$

where $y$ is an $n \times 1$ vector of phenotypic observations of hybrids; $X$ is an $n \times p$ design matrix for the fixed effects; $\beta$ is the vector of fixed effects; $Z_G$ is an $n \times g$ genotype matrix of the hybrids; $A_M$ and $D_M$ are $n \times m$ additive and dominance coding matrices of metabolites, respectively, where $A_M = \frac{1}{2}(M + F)$ and $D_M = \frac{1}{2}\lvert M - F\rvert$; and $M$ and $F$ are the matrices of metabolic marker concentrations for the male and female parents, respectively. Details of this coding system were described in our previous research (Xu et al., 2021c). The random effects $\gamma_G$, $\gamma_{Ma}$, and $\gamma_{Md}$ were assumed to follow the normal distributions $\gamma_G \sim N(0, \frac{1}{g}\phi^2_G)$, $\gamma_{Ma} \sim N(0, \frac{1}{m}\phi^2_{Ma})$, and $\gamma_{Md} \sim N(0, \frac{1}{m}\phi^2_{Md})$, respectively, where $\phi^2_G$, $\phi^2_{Ma}$, and $\phi^2_{Md}$ are the corresponding polygenic variances and $g$ and $m$ are the numbers of SNPs and metabolites; $\varepsilon$ is an $n \times 1$ vector of residual errors following the normal distribution $N(0, \sigma^2_\varepsilon)$. The expectation of $y$ is $E(y) = X\beta$, and the variance–covariance matrix is

$$\operatorname{var}(y) = V = \frac{1}{g}Z_G Z_G^{T}\phi^2_G + \frac{1}{m}A_M A_M^{T}\phi^2_{Ma} + \frac{1}{m}D_M D_M^{T}\phi^2_{Md} + I\sigma^2_\varepsilon = K_G\phi^2_G + K_{Ma}\phi^2_{Ma} + K_{Md}\phi^2_{Md} + I\sigma^2_\varepsilon \qquad \text{(Equation 2)}$$

where $K_G$, $K_{Ma}$, and $K_{Md}$ are the kinship matrices for the random effects $\gamma_G$, $\gamma_{Ma}$, and $\gamma_{Md}$, respectively.
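Equation 2 can be assembled directly from parental data. In the minimal sketch below, the metabolite codings follow the $A_M$ and $D_M$ definitions above and the kinships follow Equation 2. The hybrid SNP coding is an assumption, because the study's genotype coding is not given in this section: inbred parents are coded −1/1 and each hybrid takes the parental average, so heterozygous loci become 0. Function names and toy data are hypothetical.

```python
import numpy as np

def hybrid_genotypes(parent_snps, male_idx, female_idx):
    """Hypothetical coding: average of parental -1/1 codes (heterozygote -> 0)."""
    return 0.5 * (parent_snps[male_idx] + parent_snps[female_idx])

def metabolite_codings(M, F):
    """A_M = (M + F)/2 and D_M = |M - F|/2; rows are hybrids, columns metabolites."""
    return 0.5 * (M + F), 0.5 * np.abs(M - F)

def kinships(Z_G, A_M, D_M):
    """K_G = Z_G Z_G^T / g, K_Ma = A_M A_M^T / m, K_Md = D_M D_M^T / m (Equation 2)."""
    g, m = Z_G.shape[1], A_M.shape[1]
    return Z_G @ Z_G.T / g, A_M @ A_M.T / m, D_M @ D_M.T / m

def covariance(K_G, K_Ma, K_Md, phi2_G, phi2_Ma, phi2_Md, sigma2_e):
    """V = K_G phi2_G + K_Ma phi2_Ma + K_Md phi2_Md + I sigma2_e (Equation 2)."""
    return K_G * phi2_G + K_Ma * phi2_Ma + K_Md * phi2_Md + np.eye(K_G.shape[0]) * sigma2_e

# Toy data: 6 inbred lines, 50 SNPs, 8 metabolic markers, 4 hypothetical crosses
rng = np.random.default_rng(1)
parent_snps = rng.choice([-1.0, 1.0], size=(6, 50))
parent_mets = rng.lognormal(size=(6, 8))
male, female = np.array([0, 0, 1, 2]), np.array([3, 4, 5, 5])
Z_G = hybrid_genotypes(parent_snps, male, female)
A_M, D_M = metabolite_codings(parent_mets[male], parent_mets[female])
V = covariance(*kinships(Z_G, A_M, D_M), 1.0, 0.5, 0.2, 1.0)   # illustrative variances
print(V.shape)
```

Note that $A_M + D_M$ equals the element-wise maximum of the two parental values, so the dominance coding captures how far the hybrid's higher parent lies above the mid-parent value.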
The variance components were estimated using restricted maximum likelihood (Patterson and Thompson, 1971; Yin et al., 2023). After the parameters are estimated from the training set, they can be used to predict the phenotypic values of the testing set. Let $y_1$ be an $n_1 \times 1$ vector of phenotypic values in the training set and $y_2$ an $n_2 \times 1$ vector of phenotypic values in the testing set, with $n_1 + n_2 = n$, where $n$ is the size of the entire sample. Equation 1 can then be rewritten as

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} X_1\beta \\ X_2\beta \end{bmatrix} + \begin{bmatrix} Z_{G1}\gamma_G \\ Z_{G2}\gamma_G \end{bmatrix} + \begin{bmatrix} A_{M1}\gamma_{Ma} \\ A_{M2}\gamma_{Ma} \end{bmatrix} + \begin{bmatrix} D_{M1}\gamma_{Md} \\ D_{M2}\gamma_{Md} \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix} \qquad \text{(Equation 3)}$$

The expectation and variance–covariance matrix of $y$ become

$$E\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} X_1\beta \\ X_2\beta \end{bmatrix} \qquad \text{(Equation 4)}$$

$$\operatorname{var}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix} = \begin{bmatrix} K_{G11} & K_{G12} \\ K_{G21} & K_{G22} \end{bmatrix}\phi^2_G + \begin{bmatrix} K_{Ma11} & K_{Ma12} \\ K_{Ma21} & K_{Ma22} \end{bmatrix}\phi^2_{Ma} + \begin{bmatrix} K_{Md11} & K_{Md12} \\ K_{Md21} & K_{Md22} \end{bmatrix}\phi^2_{Md} + \begin{bmatrix} I_{n_1} & 0 \\ 0 & I_{n_2} \end{bmatrix}\sigma^2_\varepsilon \qquad \text{(Equation 5)}$$

where the kinship matrices have been partitioned into $2 \times 2$ blocks. After the parameter vector $\theta = [\beta, \phi^2_G, \phi^2_{Ma}, \phi^2_{Md}, \sigma^2_\varepsilon]$ is estimated, the predicted phenotypic values of the testing set are obtained as

$$\hat{y}_2 = E(y_2 \mid y_1) = X_2\hat{\beta} + \left(K_{G21}\phi^2_G + K_{Ma21}\phi^2_{Ma} + K_{Md21}\phi^2_{Md}\right)V_{11}^{-1}\left(y_1 - X_1\hat{\beta}\right) \qquad \text{(Equation 6)}$$

XGBoost for MM_GP
XGBoost, proposed by Chen and Guestrin (2016), is an effective and flexible ensemble machine-learning algorithm (Ma et al., 2022). In MM_GP, XGBoost was trained on a dataset $\mathcal{D} = \{(X_i, y_i)\}$ ($\lvert\mathcal{D}\rvert = n$, $X_i \in \mathbb{R}^q$, $y_i \in \mathbb{R}$) with $n$ samples and $q$ features, where $y_i$ is the phenotypic observation of the $i$-th hybrid and $X_i = [Z_{Gi}\; A_{Mi}\; D_{Mi}]$ is a $1 \times q$ feature vector comprising the genotype vector ($Z_{Gi}$), the metabolite additive coding vector ($A_{Mi}$), and the metabolite dominance coding vector ($D_{Mi}$) of the $i$-th hybrid. XGBoost first fits a tree to the samples to generate predicted values, and each subsequent tree is fitted to the residual errors left by the previous trees (Yan et al., 2021).
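The residual-fitting idea described above can be illustrated with plain squared-error gradient boosting on depth-1 trees (stumps). This is a deliberately simplified stand-in, not XGBoost: it omits XGBoost's regularized objective, second-order (Hessian) information, and subsampling, and the learning rate `eta` and number of trees are arbitrary illustrative choices rather than tuned values. Function names and the simulated data are hypothetical.

```python
import numpy as np

def fit_stump(X, r):
    """Depth-1 regression tree: split (feature j, threshold t) and leaf means
    (left, right) minimizing the squared error of the residuals r."""
    best_sse, best = np.inf, (0, -np.inf, r.mean(), r.mean())
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:          # every split between observed values
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best_sse:
                best_sse, best = sse, (j, t, lv, rv)
    return best

def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

def boost(X, y, n_trees=50, eta=0.3):
    """Start from the mean; fit each new stump to the residuals of the current
    ensemble and add a shrunken (eta) copy of its predictions."""
    base = y.mean()
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        stump = fit_stump(X, y - pred)
        pred = pred + eta * predict_stump(stump, X)
        trees.append(stump)
    return base, eta, trees

def predict(model, X):
    """Prediction is the base value plus the (shrunken) sum of all tree outputs."""
    base, eta, trees = model
    return base + eta * sum(predict_stump(s, X) for s in trees)

# Simulated non-linear (step) signal: 80 hybrids x 4 toy features
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 4))
y = np.where(X[:, 0] > 0, 2.0, -1.0) + 0.1 * rng.normal(size=80)
model = boost(X, y)
train_mse = np.mean((predict(model, X) - y) ** 2)
print(round(train_mse, 4))
```

Because each stump is a least-squares fit to the current residuals, the training error cannot increase from one round to the next for 0 < `eta` < 2; held-out data, not training error, should be used to judge predictive ability.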
After $K$ iterations, the predicted phenotypic value $\hat{y}_i$ can be expressed as

$$\hat{y}_i = \sum_{k=1}^{K} f_k(X_i), \quad f_k \in \mathcal{F} \qquad \text{(Equation 7)}$$

where $f_k(X_i)$ is the prediction of the $k$-th decision tree for the $i$-th individual and $\mathcal{F}$ is the space of regression trees. The tree-structured Parzen estimator, a Bayesian optimization algorithm, was used to search the hyperparameter space and optimize the hyperparameters for each trait by minimizing the root-mean-square error (Ozaki et al., 2020). The analysis code is available on GitHub (https://github.com/171702120/yangxu89-GS2024).

Assessing the predictive abilities of prediction models
The predictive abilities of the different prediction models in the maize and rice datasets were evaluated using 10-fold cross-validation. The sample was randomly divided into 10 subsets, nine of which were used for parameter estimation and the remaining one for prediction; this was repeated until every subset had been predicted. Predictive ability was calculated as the coefficient of determination between the observed and predicted phenotypic values. To reduce random error arising from sample partitioning, the cross-validation procedure was repeated 20 times, and the average over these repetitions was taken as the final predictive ability of each model.
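The cross-validation scheme above, combined with the prediction formula in Equation 6, can be sketched in NumPy. Several simplifications are assumptions of this sketch rather than details from the study: the fixed effect is an intercept only ($X = 1$), the variance components are supplied as known values instead of being estimated by REML in each fold, and the coefficient of determination is computed as the squared Pearson correlation between the observed values and the pooled out-of-fold predictions. Function names and the simulated data are hypothetical.

```python
import numpy as np

def gblup_predict(kins, phis, sigma2_e, y, train, test):
    """Equation 6 with an intercept-only fixed effect (assumption) and known
    variance components (assumption; the study used REML)."""
    V11 = sum(K[np.ix_(train, train)] * p for K, p in zip(kins, phis))
    V11 = V11 + sigma2_e * np.eye(len(train))
    V21 = sum(K[np.ix_(test, train)] * p for K, p in zip(kins, phis))
    y1, ones = y[train], np.ones(len(train))
    Vinv_1 = np.linalg.solve(V11, ones)
    b_hat = (Vinv_1 @ y1) / (Vinv_1 @ ones)       # GLS estimate of the intercept
    return b_hat + V21 @ np.linalg.solve(V11, y1 - b_hat)

def predictive_ability(y, kins, phis, sigma2_e, n_folds=10, n_reps=20, seed=0):
    """Repeated k-fold CV. Assumption: ability = squared Pearson r between
    observed values and pooled out-of-fold predictions, averaged over repeats."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_reps):
        folds = np.array_split(rng.permutation(n), n_folds)
        pred = np.empty(n)
        for test in folds:                         # each subset predicted once
            train = np.setdiff1d(np.arange(n), test)
            pred[test] = gblup_predict(kins, phis, sigma2_e, y, train, test)
        scores.append(np.corrcoef(y, pred)[0, 1] ** 2)
    return float(np.mean(scores))

# Simulated hybrids: 150 individuals x 300 toy SNP codes, single genomic kinship
rng = np.random.default_rng(3)
Z = rng.choice([-1.0, 0.0, 1.0], size=(150, 300))
K = Z @ Z.T / Z.shape[1]                           # K_G as in Equation 2
y = 10.0 + Z @ rng.normal(scale=0.06, size=300) + rng.normal(size=150)
pa = predictive_ability(y, [K], [1.0], 1.0)
print(round(pa, 3))
```

Additional kinships ($K_{Ma}$, $K_{Md}$) and their variances can be passed in the `kins` and `phis` lists to mirror the full MM_GP model.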
FUNDING
This work was supported by grants from the National Key Research and Development Program of China (2023YFD1202200), the National Natural Science Foundation of China (32170636, 32061143030, 32261143462, 32100448, 32070558), the Seed Industry Revitalization Project of Jiangsu Province (JBGS[2021]009), the Key Research and Development Program of Jiangsu Province (BE2022343, BE2023336), Jiangsu Province Agricultural Science and Technology Independent Innovation (CX(21)1003), the Shenzhen Science and Technology Program (KQTD202303010928390070), the Hebei Science and Technology Program (215A7612D), the Shanghai Agricultural Science and Technology Innovation Program (T2023204), the Provincial Technology Innovation Program of Shandong, China, the Qing Lan Project of Jiangsu Province, the Yangzhou University High-end Talent Support Program, and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

ACKNOWLEDGMENTS
No conflict of interest is declared.

AUTHOR CONTRIBUTIONS
C.X., Yang Xu, and Yunbi Xu designed the research. W.Y., J.Q., K.Z., G.Y., Y.Z., Y.L., R.C., and T.T. performed the research. Xin Wang, Y.J., Xinyi Wang, S.H., and P.L. analyzed the data. Y.X. and W.Y. wrote the paper. Y.X., Z.Y., and C.X. revised the manuscript. All authors read and approved the final manuscript.

SUPPLEMENTAL INFORMATION
Supplemental information is available at Plant Communications Online.

Received: June 25, 2024 Revised: October 31, 2024 Accepted: November 26, 2024 Published: November 29, 2024

REFERENCES
Chen, T., and Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM), pp. 785–794. https://doi.org/10.1145/2939672.2939785.
Chen, W., Wang, W., Peng, M., Gong, L., Gao, Y., Wan, J., Wang, S., Shi, L., Zhou, B., Li, Z., et al. (2016). Comparative and parallel genome-wide association studies for metabolic and agronomic traits in cereals. Nat. Commun. 7:12767. https://doi.org/10.1038/ncomms12767.
Cho, Y., Kodjoe, E., Puppala, N., and Wood, A. (2011). Reduced trigonelline accumulation due to rhizobial activity improves grain yield in peanut (Arachis hypogaea L.). Acta Agric. Scand. Sect. B Soil Plant Sci 61:395–403. https://doi.org/10.1080/09064710.2010.494614.
Crossa, J., Pérez-Rodríguez, P., Cuevas, J., Montesinos-López, O., Jarquín, D., De Los Campos, G., Burgueño, J., González-Camacho, J.M., Pérez-Elizalde, S., Beyene, Y., et al. (2017). Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci. 22:961–975. https://doi.org/10.1016/j.tplants.2017.08.011.
Cui, Y., Li, R., Li, G., Zhang, F., Zhu, T., Zhang, Q., Ali, J., Li, Z., and Xu, S. (2020). Hybrid breeding of rice via genomic selection. Plant Biotechnol. J. 18:57–67. https://doi.org/10.1111/pbi.13170.
Dafre, A.L., Schmitz, A.E., and Maher, P. (2017). Methylglyoxal-induced AMPK activation leads to autophagic degradation of thioredoxin 1 and glyoxalase 2 in HT22 nerve cells. Free Radic. Biol. Med. 108:270–279. https://doi.org/10.1016/j.freeradbiomed.2017.03.028.
de Abreu e Lima, F., Westhues, M., Cuadros-Inostroza, Á., Willmitzer, L., Melchinger, A.E., and Nikoloski, Z. (2017). Metabolic robustness in young roots underpins a predictive model of maize hybrid performance in the field. Plant J. 90:319–329. https://doi.org/10.1111/tpj.13495.
de Las Heras-Saldana, S., Lopez, B.I., Moghaddar, N., Park, W., Park, J.-e., Chung, K.Y., Lim, D., Lee, S.H., Shin, D., and van Der Werf, J.H. (2020). Use of gene expression and whole-genome sequence information to improve the accuracy of genomic prediction for carcass traits in Hanwoo cattle. Genet. Sel. Evol. 52:1–16. https://doi.org/10.1186/s12711-020-00574-2.
Duvick, D.N. (2001). Biotechnology in the 1930s: the development of hybrid maize. Nat. Rev. Genet. 2:69–74. https://doi.org/10.1038/35047587.
Feher, K., Lisec, J., Römisch-Margl, L., Selbig, J., Gierl, A., Piepho, H.-P., Nikoloski, Z., and Willmitzer, L. (2014). Deducing hybrid performance from parental metabolic profiles of young primary roots of maize using a multivariate diallel approach. PLoS One 9:e85435. https://doi.org/10.1371/journal.pone.0085435.
Fernie, A.R., and Schauer, N. (2009). Metabolomics-assisted breeding: a viable option for crop improvement? Trends Genet. 25:39–48. https://doi.org/10.1016/j.tig.2008.10.010.
Gamboa-Becerra, R., Hernández-Hernández, M.C., González-Ríos, Ó., Suárez-Quiroz, M.L., Gálvez-Ponce, E., Ordaz-Ortiz, J.J., and Winkler, R. (2019). Metabolomic markers for the early selection of Coffea canephora plants with desirable cup quality traits. Metabolites 9:214. https://doi.org/10.3390/metabo9100214.
Gebreyesus, G., Bovenhuis, H., Lund, M.S., Poulsen, N.A., Sun, D., and Buitenhuis, B. (2019). Reliability of genomic prediction for milk fatty acid composition using a multi-population reference and incorporating GWAS results. Genet. Sel. Evol. 51:16. https://doi.org/10.1186/s12711-019-0460-z.
Gong, L., Chen, W., Gao, Y., Liu, X., Zhang, H., Xu, C., Yu, S., Zhang, Q., and Luo, J. (2013). Genetic analysis of the metabolome exemplified using a rice population. Proc. Natl. Acad. Sci. USA 110:20320–20325. https://doi.org/10.1073/pnas.1319681110.
Guo, Z., Magwire, M.M., Basten, C.J., Xu, Z., and Wang, D. (2016). Evaluation of the utility of gene expression and metabolic information for genomic prediction in maize. Theor. Appl. Genet. 129:2413–2427. https://doi.org/10.1007/s00122-016-2780-5.
Hickey, J.M., Dreisigacker, S., Crossa, J., Hearne, S., Babu, R., Prasanna, B.M., Grondona, M., Zambelli, A., Windhausen, V.S., Mathews, K., et al. (2014).
Evaluation of genomic selection training population designs and genotyping strategies in plant breeding programs using simulation. Crop Sci. 54:1476–1488. https://doi.org/10.2135/cropsci2013.03.0195.
Hu, H., Campbell, M.T., Yeats, T.H., Zheng, X., Runcie, D.E., Covarrubias-Pazaran, G., Broeckling, C., Yao, L., Caffe-Treml, M., Gutiérrez, L.a., et al. (2021). Multi-omics prediction of oat agronomic and seed nutritional traits across environments and in distantly related populations. Theor. Appl. Genet. 134:4043–4054. https://doi.org/10.1007/s00122-021-03946-4.
Hu, X., Xie, W., Wu, C., and Xu, S. (2019). A directed learning strategy integrating multiple omic data improves genomic prediction. Plant Biotechnol. J. 17:2011–2020. https://doi.org/10.1111/pbi.13117.
Hua, J., Xing, Y., Wu, W., Xu, C., Sun, X., Yu, S., and Zhang, Q. (2003). Single-locus heterotic effects and dominance by dominance interactions can adequately explain the genetic basis of heterosis in an elite rice hybrid. Proc. Natl. Acad. Sci. USA 100:2574–2579. https://doi.org/10.1073/pnas.0437907100.
Jeong, S., Kim, J.-Y., and Kim, N. (2020). GMStool: GWAS-based marker selection tool for genomic prediction from genomic data. Sci. Rep. 10:19653. https://doi.org/10.1038/s41598-020-76759-y.
Li, C., Li, J., Wang, H., Zhang, R., An, X., Yuan, C., Guo, T., and Yue, Y. (2023). Genomic Selection for Live Weight in the 14th Month in Alpine Merino Sheep Combining GWAS Information. Animals 13:3516. https://doi.org/10.3390/ani13223516.
Li, K., Wang, D., Gong, L., Lyu, Y., Guo, H., Chen, W., Jin, C., Liu, X., Fang, C., and Luo, J. (2019). Comparative analysis of metabolome of rice seeds at three developmental stages using a recombinant inbred line population. Plant J. 100:908–922. https://doi.org/10.1111/tpj.14482.
Li, S., Yu, J., Kang, H., and Liu, J. (2022). Genomic Selection in Chinese Holsteins Using Regularized Regression Models for Feature Selection of Whole Genome Sequencing Data. Animals 12:2419.
https://doi.org/10.3390/ani12182419.
Liu, H., Su, Y., Fan, Y., Zuo, D., Xu, J., Liu, Y., Mei, X., Huang, H., Yang, M., and Zhu, S. (2023b). Exogenous leucine alleviates heat stress and improves saponin synthesis in Panax notoginseng by improving antioxidant capacity and maintaining metabolic homeostasis. Front. Plant Sci. 14:1175878. https://doi.org/10.3389/fpls.2023.1175878.
Liu, X., Huang, M., Fan, B., Buckler, E.S., and Zhang, Z. (2016). Iterative Usage of Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association Studies. PLoS Genet. 12:e1005767. https://doi.org/10.1371/journal.pgen.1005767.
Liu, Y., Zhang, Y., Zhou, F., Yao, Z., Zhan, Y., Fan, Z., Meng, X., Zhang, Z., Liu, L., Yang, J., et al. (2023a). Increased Accuracy of Genomic Prediction Using Preselected SNPs from GWAS with Imputed Whole-Genome Sequence Data in Pigs. Animals 13:3871. https://doi.org/10.3390/ani13243871.
Long, L., Xin, Z., Hyun-Dong, S., R, C.R., Jianghua, L., Guocheng, D., and Jian, C. (2015).
Improved production of propionic acid in Propionibacterium jensenii via combinational overexpression of glycerol dehydrogenase and malate dehydrogenase from Klebsiella pneumoniae. Appl. Environ. Microbiol. 81:2256–2264. https://doi.org/10.1128/AEM.03572-14.
Ma, B., Yan, G., Chai, B., and Hou, X. (2022). XGBLC: an improved survival prediction model based on XGBoost. Bioinformatics 38:410–418.
https://doi.org/10.1093/bioinformatics/btab675.
Mazzuca, S., Bitonti, M.B., Innocenti, A.M., and Francis, D. (2000). Inactivation of DNA replication origins by the cell cycle regulator, trigonelline, in root meristems of Lactuca sativa. Planta 211:127–132. https://doi.org/10.1007/s004250000272.
Meuwissen, T.H., Hayes, B.J., and Goddard, M.E. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829. https://doi.org/10.1093/genetics/157.4.1819.
Ozaki, Y., Tanigaki, Y., Watanabe, S., and Onishi, M. (2020). Multiobjective tree-structured Parzen estimator for computationally expensive optimization problems. Proceedings of the 2020 Genetic and Evolutionary Computation Conference, 533–541. https://doi.org/10.1145/3377930.3389817.
Patterson, H.D., and Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika 58:545–554. https://doi.org/10.1093/biomet/58.3.545.
Pires, M.V., Pereira Júnior, A.A., Medeiros, D.B., Daloso, D.M., Pham, P.A., Barros, K.A., Engqvist, M.K.M., Florian, A., Krahnert, I., Maurino, V.G., et al. (2016). The influence of alternative pathways of respiration that use branched-chain amino acids following water shortage in Arabidopsis. Plant Cell Environ. 39:1304–1319. https://doi.org/10.1111/pce.12682.
Resende, R.T., Hickey, L., Amaral, C.H., Peixoto, L.L., Marcatti, G.E., and Xu, Y. (2024). Satellite-enabled enviromics to enhance crop improvement. Mol. Plant 17:848–866. https://doi.org/10.1016/j.molp.2024.04.005.
Riedelsheimer, C., Czedik-Eysenberg, A., Grieder, C., Lisec, J., Technow, F., Sulpice, R., Altmann, T., Stitt, M., Willmitzer, L., and Melchinger, A.E. (2012). Genomic and metabolic prediction of complex heterotic traits in hybrid maize. Nat. Genet. 44:217–220. https://doi.org/10.1038/ng.1033.
Schrag, T.A., Westhues, M., Schipprack, W., Seifert, F., Thiemann, A., Scholten, S., and Melchinger, A.E. (2018).
Beyond genomic prediction: combining different types of omics data can improve prediction of hybrid performance in maize. Genetics 208:1373–1385. https://doi.org/10.1534/genetics.117.300374.
Schwörer, S., Pavlova, N.N., Cimino, F.V., King, B., Cai, X., Sizemore, G.M., and Thompson, C.B. (2021). Fibroblast pyruvate carboxylase is required for collagen production in the tumour microenvironment. Nat. Metab. 3:1484–1499. https://doi.org/10.1038/s42255-021-00480-x.
Shahsavari, M., Mohammadi, V., Alizadeh, B., and Alizadeh, H. (2023). Application of machine learning algorithms and feature selection in rapeseed (Brassica napus L.) breeding for seed yield. Plant Methods 19:57. https://doi.org/10.1186/s13007-023-01035-9.
Shan, N., Zhang, Y., Guo, Y., Zhang, W., Nie, J., Fernie, A.R., and Sui, X. (2023). Cucumber malate decarboxylase, CsNADP-ME2, functions in the balance of carbon and amino acid metabolism in fruit. Hortic. Res. 10:uhad216. https://doi.org/10.1093/hr/uhad216.
Shi, T., Zhu, A., Jia, J., Hu, X., Chen, J., Liu, W., Ren, X., Sun, D., Fernie, A.R., Cui, F., et al. (2020). Metabolomics analysis and metabolite-agronomic trait associations using kernels of wheat (Triticum aestivum) recombinant inbred lines. Plant J. 103:279–292. https://doi.org/10.1111/tpj.14727.
Spindel, J.E., Begum, H., Akdemir, D., Collard, B., Redoña, E., Jannink, J.L., and McCouch, S. (2016). Genome-wide prediction models that incorporate de novo GWAS are a powerful new tool for tropical rice improvement. Heredity 116:395–408. https://doi.org/10.1038/hdy.2015.113.
Saveljeva, S., Sewell, G.W., Ramshorn, K., Cader, M.Z., West, J.A., Clare, S., Haag, L.M., de Almeida Rodrigues, R.P., Unger, L.W., Iglesias-Romero, A.B., et al. (2022). A purine metabolic checkpoint that prevents autoimmunity and autoinflammation. Cell Metabol. 34:106–124.e110. https://doi.org/10.1016/j.cmet.2021.12.009.
Technow, F., Schrag, T.A., Schipprack, W., Bauer, E., Simianer, H., and Melchinger, A.E. (2014).
Genome properties and prospects of genomic prediction of hybrid performance in a breeding program of maize. Genetics 197:1343–1355. https://doi.org/10.1534/genetics.114.165860.
Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Stat. Med. 16:385–395. https://doi.org/10.1002/(sici)1097-0258(19970228).
Tu, J., Zhang, G., Datta, K., Xu, C., He, Y., Zhang, Q., Khush, G.S., and Datta, S.K. (2000). Field performance of transgenic elite commercial hybrid rice expressing Bacillus thuringiensis δ-endotoxin. Nat. Biotechnol. 18:1101–1104. https://doi.org/10.1038/80310.
VanRaden, P.M. (2008). Efficient methods to compute genomic predictions. J. Dairy Sci. 91:4414–4423. https://doi.org/10.3168/jds.2007-0980.
Wang, H., Tang, X., Yang, X., Fan, Y., Xu, Y., Li, P., Xu, C., and Yang, Z. (2021a). Exploiting natural variation in crown root traits via genome-wide association studies in maize. BMC Plant Biol. 21:346. https://doi.org/10.1186/s12870-021-03127-x.
Wang, L., and Michoel, T. (2017). Controlling false discoveries in Bayesian gene networks with lasso regression p-values. Preprint at arXiv. https://arxiv.org/abs/1701.07011.
Wang, S., Xu, Y., Qu, H., Cui, Y., Li, R., Chater, J.M., Yu, L., Zhou, R., Ma, R., Huang, Y., et al. (2021b). Boosting predictabilities of agronomic traits in rice using bivariate genomic selection. Briefings Bioinf. 22:bbaa103. https://doi.org/10.1093/bib/bbaa103.
Washburn, J.D., Burch, M.B., and Franco, J.A.V. (2020). Predictive breeding for maize: Making use of molecular phenotypes, machine learning, and physiological crop models. Crop Sci. 60:622–638. https://doi.org/10.1002/csc2.20052.
Wei, J., Wang, A., Li, R., Qu, H., and Jia, Z. (2018). Metabolome-wide association studies for agronomic traits of rice. Heredity 120:342–355. https://doi.org/10.1038/s41437-017-0032-3.
Wen, W., Li, D., Li, X., Gao, Y., Li, W., Li, H., Liu, J., Liu, H., Chen, W., Luo, J., et al. (2014).
Metabolome-based genome-wide association study of maize kernel leads to novel biochemical insights. Nat. Commun. 5:3438. https://doi.org/10.1038/ncomms4438. Westhues, M., Schrag, T.A., Heuer, C., Thaller, G., Utz, H.F., Schipprack, W., Thiemann, A., Seifert, F., Ehret, A., Schlereth, A., et al. (2017). Omics-based hybrid prediction in maize. Theor. Appl. Genet. 130:1927–1939. https://doi.org/10.1007/s00122-017-2934-0. Worley, B., and Powers, R. (2013). Multivariate analysis inmetabolomics. Curr. Metabolomics 1:92–107. https://doi.org/10.2174/2213235X1 1301010092. Wu, P.-Y., Stich, B., Weisweiler, M., Shrestha, A., Erban, A., Westhoff, P., and Inghelandt, D.V. (2022). Improvement of prediction ability by integrating multi-omic datasets in barley. BMC Genom. 23:200. https://doi.org/10.1186/s12864-022-08337-7. Xu, S., Zhu, D., and Zhang, Q. (2014). Predicting hybrid performance in rice using genomic best linear unbiased prediction. Proc. Natl. Acad. Sci. USA 111:12456–12461. https://doi.org/10.1073/pnas. 1413750111. Xu, S., Xu, Y., Gong, L., and Zhang, Q. (2016). Metabolomic prediction of yield in hybrid rice. Plant J. 88:219–227. https://doi.org/10.1111/tpj. 13242. 
Plant Communications 6, 101199, March 10 2025 13 https://doi.org/10.1128/AEM.03572-14 https://doi.org/10.1128/AEM.03572-14 https://doi.org/10.1093/bioinformatics/btab675 https://doi.org/10.1007/s004250000272 https://doi.org/10.1093/genetics/157.4.1819 https://doi.org/10.1145/3377930.3389817 https://doi.org/10.1145/3377930.3389817 https://doi.org/10.1093/biomet/58.3.545 https://doi.org/10.1111/pce.12682 https://doi.org/10.1111/pce.12682 https://doi.org/10.1016/j.molp.2024.04.005 https://doi.org/10.1016/j.molp.2024.04.005 https://doi.org/10.1038/ng.1033 https://doi.org/10.1534/genetics.117.300374 https://doi.org/10.1038/s42255-021-00480-x https://doi.org/10.1186/s13007-023-01035-9 https://doi.org/10.1093/hr/uhad216 https://doi.org/10.1111/tpj.14727 https://doi.org/10.1111/tpj.14727 https://doi.org/10.1038/hdy.2015.113 https://doi.org/10.1038/hdy.2015.113 https://doi.org/10.1016/j.cmet.2021.12.009 https://doi.org/10.1534/genetics.114.165860 https://doi.org/10.1534/genetics.114.165860 https://doi.org/10.1002/(sici)1097-0258(19970228) https://doi.org/10.1002/(sici)1097-0258(19970228) https://doi.org/10.1038/80310 https://doi.org/10.3168/jds.2007-0980 https://doi.org/10.3168/jds.2007-0980 https://doi.org/10.1186/s12870-021-03127-x https://doi.org/10.1186/s12870-021-03127-x https://arxiv.org/abs/1701.07011 https://doi.org/10.1093/bib/bbaa103 https://doi.org/10.1002/csc2.20052 https://doi.org/10.1038/s41437-017-0032-3 https://doi.org/10.1038/ncomms4438 https://doi.org/10.1007/s00122-017-2934-0 https://doi.org/10.2174/2213235X11301010092 https://doi.org/10.2174/2213235X11301010092 https://doi.org/10.1186/s12864-022-08337-7 https://doi.org/10.1073/pnas.1413750111 https://doi.org/10.1073/pnas.1413750111 https://doi.org/10.1111/tpj.13242 https://doi.org/10.1111/tpj.13242 Plant Communications Metabolic marker-assisted genomic prediction Xu, Y., Xu, C., and Xu, S. (2017). Prediction and association mapping of agronomic traits in maize using multiple omic data. 
Heredity 119:174–184. https://doi.org/10.1038/hdy.2017.27. Xu, Y., Ma, Y., Wang, X., Li, C., Zhang, X., Li, P., Yang, Z., and Xu, C. (2021b). Kernel metabolites depict the diversity of relationship between maize hybrids and their parental lines. Crops J. 9:181–191. https://doi.org/10.1016/j.cj.2020.05.009. Xu, Y., Zhao, Y.,Wang, X.,Ma, Y., Li, P., Yang, Z., Zhang, X., Xu, C., and Xu, S. (2021c). Incorporation of parental phenotypic data into multi- omic models improves prediction of yield-related traits in hybrid rice. Plant Biotechnol. J. 19:261–272. https://doi.org/10.1111/pbi.13458. Xu, Y., Zhang, X., Li, H., Zheng, H., Zhang, J., Olsen, M.S., Varshney, R.K., Prasanna, B.M., and Qian, Q. (2022). Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. Mol. Plant 15:1664–1695. https://doi.org/10.1016/j.molp. 2022.09.001. Xu, Y., Liu, X., Fu, J., Wang, H., Wang, J., Huang, C., Prasanna, B.M., Olsen, M.S., Wang, G., and Zhang, A. (2020). Enhancing genetic gain through genomic selection: from livestock to plants. Plant Commun. 1:100005. https://doi.org/10.1016/j.xplc.2019.100005. Xu, Y., Ma, K., Zhao, Y., Wang, X., Zhou, K., Yu, G., Li, C., Li, P., Yang, Z., Xu, C., et al. (2021a). Genomic selection: A breakthrough technology in rice breeding. Crops J. 9:669–677. https://doi.org/10. 1016/j.cj.2021.03.008. Yan, J., Xu, Y., Cheng, Q., Jiang, S., Wang, Q., Xiao, Y., Ma, C., Yan, J., and Wang, X. (2021). LightGBM: accelerated genomically designed crop breeding through ensemble learning. Genome Biol. 22:271. https://doi.org/10.1186/s13059-021-02492-y. Yang, W., Guo, T., Luo, J., Zhang, R., Zhao, J., Warburton, M.L., Xiao, Y., and Yan, J. (2022). Target-oriented prioritization: targeted selection strategy by integrating organismal and molecular traits through 14 Plant Communications 6, 101199, March 10 2025 predictive analytics in breeding. Genome Biol. 23:80. https://doi.org/ 10.1186/s13059-022-02650-w. 
Ye, S., Li, J., and Zhang, Z. (2020). Multi-omics-data-assisted genomic feature markers preselection improves the accuracy of genomic prediction. J. Anim. Sci. Biotechnol. 11:109. https://doi.org/10.1186/ s40104-020-00515-5. Yin, B., Jia, J., Sun, X., Hu, X., Ao, M., Liu, W., Tian, Z., Liu, H., Li, D., Tian, W., et al. (2024). Dynamic metabolite QTL analyses provide novel biochemical insights into kernel development and nutritional quality improvement in common wheat. Plant Commun. 5:100792. https://doi.org/10.1016/j.xplc.2024.100792. Yin, L., Zhang, H., Tang, Z., Yin, D., Fu, Y., Yuan, X., Li, X., Liu, X., and Zhao, S. (2023). HIBLUP: an integration of statistical models on the BLUP framework for efficient genetic evaluation using big genomic data. Nucleic Acids Res. 51:3501–3512. https://doi.org/10.1093/nar/ gkad074. Yu, P., Ye, C., Li, L., Yin, H., Zhao, J., Wang, Y., Zhang, Z., Li, W., Long, Y., Hu, X., et al. (2022). Genome-wide association study and genomic prediction for yield and grain quality traits of hybrid rice. Mol. Breed. 42:16. https://doi.org/10.1007/s11032-022-01289-6. Zhang, Y., Zhang,M., Ye, J., Xu, Q., Feng, Y., Xu, S., Hu, D.,Wei, X., Hu, P., and Yang, Y. (2023). Integrating genome-wide association study into genomic selection for the prediction of agronomic traits in rice (Oryza sativa L.). Mol. Breed. 43:81. https://doi.org/10.1007/s11032- 023-01423-y. Zhang, Z., Ober, U., Erbe, M., Zhang, H., Gao, N., He, J., Li, J., and Simianer, H. (2014). Improving the accuracy of whole genome prediction for complex traits using the results of genome wide association studies. PLoS One 9:e93017. https://doi.org/10.1371/ journal.pone.0093017. Zhao, Y., Mette, M.F., and Reif, J.C. (2015). Genomic selection in hybrid breeding. Plant Breed. 134:1–10. https://doi.org/10.1111/pbr.12231. 