Genomic prediction powered by
multi-omics data

Osval A. Montesinos-López  1, Abelardo Montesinos-López2*,
Brandon Alejandro Mosqueda-González3, Iván Delgado-Enciso4,
Moises Chavira-Flores5, José Crossa6, Susanne Dreisigacker7,
Jin Sun8 and Rodomiro Ortiz9*
1Facultad de Telemática, Universidad de Colima, Colima, Mexico, 2Centro Universitario de Ciencias
Exactas e Ingenierías (CUCEI), Universidad deGuadalajara, Guadalajara, Jalisco, Mexico, 3Institut National
des Sciences Appliquées de Lyon, Villeurbanne, France, 4School of Medicine, University of Colima,
Colima, Mexico, 5Instituto de Investigaciones en Matemáticas Aplicadas y Sistemas (IIMAS), Universidad
Nacional Autónoma de México (UNAM), Mexico City, Mexico, 6Colegio de Postgraduados (COLPOS),
Montecillos, Mexico, 7International Maize and Wheat Improvement Center (CIMMYT), Mexico,
8Department of Statistics, School of Science, Yanshan University, Qinhuangdao, China, 9Department of
Plant Breeding at SLU, Swedish University of Agricultural Sciences, Uppsala, Sweden

Genomic selection (GS) has transformed plant breeding by enabling early and
accurate prediction of complex traits. However, its predictive performance is
often constrained by the limited information captured through genomic markers
alone, especially for traits influenced by intricate biological pathways. To address
this, the integration of complementary omics layers—such as transcriptomics and
metabolomics—has emerged as a promising strategy to enhance prediction
accuracy by providing a more comprehensive view of the molecular
mechanisms underlying phenotypic variation. We used three datasets, each
collected under a single-environment condition, which allowed us to isolate
the effects of omics integration without the confounding influence of genotype-
by-environment interaction. We assessed 24 integration strategies combining
three omics layers: genomics, transcriptomics, and metabolomics. These
strategies encompassed both early data fusion (concatenation) and model-
based integration techniques capable of capturing non-additive, nonlinear,
and hierarchical interactions across omics layers. The evaluation was
conducted using three real-world datasets from maize and rice, which varied
in population size, trait complexity, and omics dimensionality. Our results indicate
that specific integration methods—particularly those leveraging model-based
fusion—consistently improve predictive accuracy over genomic-only models,
especially for complex traits. Conversely, several commonly used concatenation
approaches did not yield consistent benefits and, in some cases,
underperformed. These findings underscore the importance of selecting
appropriate integration strategies and suggest that more sophisticated
modeling frameworks are necessary to fully exploit the potential of multi-
omics data. Overall, this work highlights both the value and limitations of
multi-omics integration for genomic prediction and offers practical insights
into the design of omics-informed selection strategies for accelerating
genetic gain in plant breeding programs.

KEYWORDS

genomic selection, omics data, optimal integration, plant breeding, prediction accuracy

OPEN ACCESS

EDITED BY

Kazutoshi Okuno,
University of Tsukuba, Japan

REVIEWED BY

Moyses Nascimento,
Universidade Federal de Viçosa, Brazil
Asif Bashir Shikari,
Sher-e-Kashmir University of Agricultural
Sciences and Technology of Kashmir, India

*CORRESPONDENCE

Abelardo Montesinos-López,
aml_uach2004@hotmail.com

Rodomiro Ortiz,
rodomiro.ortiz@slu.se

RECEIVED 27 May 2025
ACCEPTED 31 July 2025
PUBLISHED 17 September 2025

CITATION

Montesinos-López OA, Montesinos-López A,
Mosqueda-González BA, Delgado-Enciso I,
Chavira-Flores M, Crossa J, Dreisigacker S,
Sun J and Ortiz R (2025) Genomic prediction
powered by multi-omics data.
Front. Genet. 16:1636438.
doi: 10.3389/fgene.2025.1636438

COPYRIGHT

© 2025 Montesinos-López, Montesinos-López,
Mosqueda-González, Delgado-Enciso,
Chavira-Flores, Crossa, Dreisigacker, Sun and
Ortiz. This is an open-access article distributed
under the terms of the Creative Commons
Attribution License (CC BY). The use,
distribution or reproduction in other forums is
permitted, provided the original author(s) and
the copyright owner(s) are credited and that the
original publication in this journal is cited, in
accordance with accepted academic practice.
No use, distribution or reproduction is
permitted which does not comply with these
terms.

Frontiers in Genetics frontiersin.org01

TYPE Original Research
PUBLISHED 17 September 2025
DOI 10.3389/fgene.2025.1636438

https://www.frontiersin.org/articles/10.3389/fgene.2025.1636438/full
https://www.frontiersin.org/articles/10.3389/fgene.2025.1636438/full
https://orcid.org/0000-0002-3973-6547
https://crossmark.crossref.org/dialog/?doi=10.3389/fgene.2025.1636438&domain=pdf&date_stamp=2025-09-17
mailto:aml_uach2004@hotmail.com
mailto:aml_uach2004@hotmail.com
mailto:rodomiro.ortiz@slu.se
mailto:rodomiro.ortiz@slu.se
https://doi.org/10.3389/fgene.2025.1636438
https://creativecommons.org/licenses/by/4.0/
https://creativecommons.org/licenses/by/4.0/
https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org/journals/genetics#editorial-board
https://www.frontiersin.org/journals/genetics#editorial-board
https://doi.org/10.3389/fgene.2025.1636438


Introduction

Genomic selection (GS) has revolutionized the field of plant
breeding by enabling the selection of superior genotypes based on
genomic estimated breeding values (GEBVs) derived from dense
molecular marker information. Initially proposed by Meuwissen
et al. (2001), GS bypasses the need for direct phenotypic selection,
allowing for early and more efficient selection decisions, thereby
shortening the breeding cycle and enhancing genetic gain. This
methodology represents a fundamental shift in the breeder’s
toolbox, moving from phenotype-based to genotype-driven
decision-making, and has been successfully implemented in
numerous crop breeding programs worldwide (Crossa et al.,
2017; Desta and Ortiz, 2014).

Despite its transformative potential, the implementation of GS
in real-world breeding programs faces several challenges. One key
limitation is the variability in prediction accuracy across different
environments and breeding populations. Factors such as genotype-
by-environment (G × E) interactions, limited training population
sizes, and the genetic architecture of traits of interest can
significantly hinder the robustness of genomic predictions
(Roorkiwal et al., 2016). Additionally, the cost and logistics of
genotyping and phenotyping large populations remain substantial
obstacles, particularly in resource-limited settings. These challenges
necessitate the development of novel strategies to optimize the
accuracy and applicability of GS.

In response to these challenges, a growing body of research has
focused on improving the prediction accuracy of GS models.
Strategies such as optimizing training population design (Rincent
et al., 2012), incorporating G × E interactions (Jarquín et al., 2014),
and applying advanced statistical learning techniques (Montesinos-
López et al., 2018) have shown promising results. However, even
with these improvements, the integration of additional layers of
biological data offers a compelling avenue for further enhancing
model performance.

The integration of multi-omics data—including genomics,
transcriptomics, metabolomics, and proteomics—has emerged as
a powerful strategy to enhance the performance of genomic
prediction (GP) models in plant and animal breeding. These
diverse yet complementary datasets provide a multidimensional
view of the complex biological systems that govern phenotypic
expression, enabling a more precise dissection of the genotype-
to-phenotype relationship. Unlike single-layer genomic data, which
often capture only a portion of the heritable variance, multi-omics
approaches can account for regulatory, transcriptional, post-
transcriptional, and metabolic interactions that influence trait
architecture. For example, transcriptomic data capture gene
expression levels across tissues or developmental stages, shedding
light on functional genes and regulatory networks underlying
complex traits (Guo et al., 2016; Azodi et al., 2020). Similarly,
metabolomic profiles offer dynamic snapshots of cellular
biochemical processes, which are often directly associated with
phenotypic traits such as growth, stress response, or yield
(Riedelsheimer et al., 2012; Wen et al., 2014). Proteomics data,
although less frequently used due to technical constraints, provide
critical insights into post-translational modifications and protein
abundance, which are closely tied to phenotypic outcomes (Misra
et al., 2019). The synergistic integration of these omics layers can

substantially improve the explanatory power of prediction models,
particularly for complex traits governed by multiple small-effect loci
and their downstream interactions (Wang M. et al., 2024).
Furthermore, machine learning and statistical modeling
techniques have increasingly enabled the effective fusion of high-
dimensional omics data into genomic selection pipelines, resulting
in significant gains in predictive accuracy (Montesinos-López et al.,
2022). As such, multi-omics integration not only enriches the
biological relevance of genomic predictions but also facilitates
more informed decision-making in breeding programs aimed at
improving crop resilience, productivity, and nutritional quality.

Several studies have demonstrated the utility of integrating
multi-omics data into GS models. For example, Azodi et al.
(2020) showed that combining gene expression data with
genomic information improved the prediction of complex traits
in maize. Similarly, Riedelsheimer et al. (2012) found that metabolite
profiles significantly contributed to the prediction of biomass traits
in maize hybrids.

Wang et al. (2024b) developed an extensive multi-omics atlas for
wheat, integrating transcriptomic, proteomic, phosphoproteomic,
and acetylproteomic data across various tissues and developmental
stages. This integrative approach enhanced the understanding of
complex traits, including disease resistance and grain quality,
highlighting the potential of multi-omics data to improve the
predictive accuracy of genomic selection models.

Despite its potential, the statistical integration of heterogeneous
omics datasets presents significant challenges. These arise from
inherent differences in data dimensionality, measurement scales,
noise levels, and patterns of missingness across various omics
platforms. Additionally, capturing the intricate—and often
nonlinear—interactions both within and between omics layers,
and their combined influence on complex phenotypes, requires
highly sophisticated analytical frameworks. Traditional linear
models commonly employed in GS may lack the flexibility to
adequately model these multidimensional relationships.
Consequently, there is an increasing need to adopt advanced
machine learning approaches, including deep learning
architectures, kernel-based methods, and Bayesian hierarchical
models, which offer greater adaptability and capacity to uncover
hidden structures in complex biological data (Montesinos-López
et al., 2019; Montesinos-López et al., 2022).

Another critical aspect is the model tuning process. Although
machine learning approaches are often highly competitive
compared to traditional methods in predictive accuracy, they are
frequently associated with complex and computationally intensive
tuning procedures. This complexity can limit their practical
applicability, especially in high-dimensional omics contexts. As
such, the development of methodologies that strike a balance
between predictive performance and user-friendly tuning remains
a pressing research priority. Furthermore, the standardization of
data preprocessing pipelines and the assurance of data quality across
omics layers are essential for enhancing the reliability and
reproducibility of integrative analyses.

Several recent studies have explored multi-omics integration
using deep learning to predict phenotypic traits in crops and model
species (Angermueller et al., 2016; Zingaretti et al., 2020;
Montesinos-López et al., 2021). Although promising, most
approaches have been limited by dataset size, environmental

Frontiers in Genetics frontiersin.org02

Montesinos-López et al. 10.3389/fgene.2025.1636438

https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://doi.org/10.3389/fgene.2025.1636438


heterogeneity, or lack of benchmarking across model types. Our
study addresses these limitations by evaluating predictive
performance across three distinct datasets using standardized
cross-validation procedures and multiple deep learning
architectures.

In this research, we aim to address these challenges by
integrating genomic, transcriptomic, and metabolomic data to
explore alternative modeling approaches for improving GS
methodology. The availability of these datasets enables the
application of statistical and machine learning methods to predict
complex traits by integrating genomic (G), transcriptomic (T), and
metabolomic (M) data. By leveraging the complementary
information provided by each omics layer, our goal is to enhance
the prediction of complex agronomic traits in plant breeding. This
integrative framework not only holds promise for improving model
accuracy but also offers a deeper understanding of the biological
mechanisms driving trait variation. We first evaluate how omics-
based similarity among these datasets relates to trait variation.

Our study contributes to the growing body of evidence
supporting multi-omics integration in GS and aims to identify
modeling strategies that effectively harness the rich biological
information embedded in diverse omics datasets. We explore
conventional statistical learning methodologies capable of
addressing the unique challenges of multi-omics integration, with
an emphasis on practical implementation in plant breeding
programs. Ultimately, this research aims to provide breeders with
more accurate and biologically informed tools to accelerate genetic
improvement.

Materials and methods

Datasets

We used three datasets previously presented by Yang et al.
(2022) for benchmarking the proposed predictors. These datasets
were collected under a single-environment condition and contain
various continuous traits along with metabolomic and
transcriptomic data. Table 1 summarizes the characteristics of
three multi-omics datasets used in this research. The
Maize282 dataset includes 279 lines evaluated for 22 phenotypic
traits, with high-density genotypic data comprising 50,878 markers,
along with 18,635 metabolomic and 17,479 transcriptomic features.
TheMaize368 dataset consists of 368 lines assessed for 20 traits, with
a larger genotypic matrix of 100,000 markers, complemented by
748 metabolomic and 28,769 transcriptomic variables. The
Rice210 dataset comprises 210 lines evaluated for four traits, with
comparatively fewer genotypic markers (1,619) and metabolomic

features (1,000), but a similarly large transcriptomic profile of
24,994 features.

These datasets illustrate the diversity in sample size, trait
complexity, and omics layer dimensionality across species,
highlighting the analytical challenges in integrative modeling.
More details about these datasets can be found at the following
link: https://doi.org/10.6084/m9.figshare.19312205.v1.

Statistical models

Model 1 (M1)
The basic Bayesian genomic best linear unbiased predictor

(GBLUP) model incorporates only genomic main effects using
the genomic relationship matrix (G) and is defined as follows:

Y � 1μ + P + . (1)

Here, Y represents the vector of the continuous response
variable observed of order n × 1. 1 denotes a vector of ones of
order n × 1. μ stands for the general mean or intercept. P �
(P1, . . . , Pn)T denotes a predictor that contains at least one
random effect associated with the vector of genotypes. In the
case of model M1 the predictor (P) contains only the genomic
information. Additionally,  denotes the vector random error
components for vector of genotypes, where each error is
independently and normally distributed with a mean of 0 and a
variance of σ2 � Ve. All predictors implemented are provided
in Table 2.

Predictors

All predictors evaluated (Equation 1) varied depending on the
source of omics data used. For this reason, the predictors (given in
Equation 1) comprise different combinations of markers (G),
metabolomic (M) and transcriptomic (T) data, each evaluated
with two different kernel functions: linear, denoted with L
subscript, and Gaussian, denoted with G subscript. The
integration of omics datasets was performed using early data
fusion: after normalization and mean imputation, all block
features from the genomic, transcriptomic, and metabolomic
layers and combinations of these feature layers were modeled as
separate random effects under a mixed-model framework. This
modeling framework has the advantage of correctly partitioning
variance among different sources (genomic, transcriptomic,
metabolomic, etc.) and leads to more accurate and unbiased
estimates of fixed effects. It also improves the prediction accuracy
of random components, such as breeding values in plant and animal

TABLE 1 Summary of the three datasets used in the study.

Dataset Lines Traits Markers Metabolomics Transcriptomics

Maize282 279 22 50,878 18,635 17,479

Maize368 368 20 100,000 748 28,769

Rice210 210 4 1,619 1,000 24,994

Frontiers in Genetics frontiersin.org03

Montesinos-López et al. 10.3389/fgene.2025.1636438

https://doi.org/10.6084/m9.figshare.19312205.v1
https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://doi.org/10.3389/fgene.2025.1636438


genetics. Although this approach facilitates unified modeling, it
introduces limitations such as computational intensity, noise
accumulation, and increased risk of overfitting due to the high
number of predictors in the input datasets.

Model 1 (M1)
For example, when P � gL, the model denoted as M1 represents

a random effect with genomic information in terms of a linear
kernel. It is assumed that gL ~ (g1, . . . , gn)T ~ NJ(0, σ2gKg), where
σ2g is the variance component for lines using genomic information
and Kg is a linear kernel referred to as the genomic relationship
matrix, calculated using the method outlined by VanRaden (2008).

Models 2 and 3 (M2 and M3)
In a similar fashion, when P � tL, which is called model 2 (M2),

this predictor represents a random effect of lines with the
transcriptomic information in terms of a linear kernel. That is,
tL ~ (t1, . . . , tn)T ~ Nn(0, σ2tK t), where σ2t is the variance

component for lines using transcriptomic information and K t is
a linear transcriptomic relationship matrix. Furthermore, when
P � mL, the model is called model M3; this predictor represents
a random effect of lines with the metabolomic information in terms
of a linear kernel. That is, mL ~ (m1, . . . , mn)T ~ Nn(0, σ2mKm),
where σ2m is the variance component for lines using metabolomic
information and Km is a linear metabolomic relationship matrix.
Thus, M3 integrates transcriptomic data using kernel K t.

Models 4 and 5 (M4 and M5)
Model 4 (M4) is a predictor that includes the three previous

random effects, that is, P � gL + tL +mL. Model M5 contains the
predictor P � gL + tL +mL + (gLtL)CC + (gLtL)PP. That is, it is
equivalent to model M4 plus two additional terms. The first term
in this predictor is distributed as (gLtL)CC ~ Nn(0, σ2gtKCC

gt ), where
σ2gt is the variance component for lines using transcriptomic and
genomic information and KCC

gt is computed using the upper
triangular (UT) part of the matrix resulting from the

TABLE 2 Evaluated models. Here, each model corresponds to a different predictor.

Model Predictor (P)

M1 gL

M2 tL

M3 mL

M4 gL + tL +mL

M5 gL + tL +mL + (gLtL)CC + (gLtL)PP

M6 gL + tL +mL + (gLmL)CC + (gLmL)PP

M7 gL + tL +mL + (tLmL)CC + (tLmL)PP

M8 gL + tL +mL + (gLtL)CC + (gLtL)PP + (gLmL)CC + (gLmL)PP

M9 gL + tL +mL + (gLtL)CC + (gLtL)PP + (tLmL)CC + (tLmL)PP

M10 gL + tL +mL + (gLmL)CC + (gLmL)PP + (tLmL)CC + (tLmL)PP

M11 gL + tL +mL + (gLtL)CC + (gLtL)PP + (gLmL)CC + (gLmL)PP + (tLmL)CC + (tLmL)PP

M12 gG

M13 tG

M14 mG

M15 gG + tG +mG

M16 gG + tG +mG + (gGtG)CC + (gGtG)PP

M17 gG + tG +mG + (gGmG)CC + (gGmG)PP

M18 gG + tG +mG + (tGmG)CC + (tGmG)PP

M19 gG + tG +mG + (gGtG)CC + (gGtG)PP + (gGmG)CC + (gGmG)PP

M20 gG + tG +mG + (gGtG)CC + (gGtG)PP + (tGmG)CC + (tGmG)PP

M21 gG + tG +mG + (gGmG)CC + (gGmG)PP + (tGmG)CC + (tGmG)PP

M22 gG + tG +mG + (gGtG)CC + (gGtG)PP + (gGmG)CC + (gGmG)PP + (tGmG)CC + (tGmG)PP

M23 gL + tL +mL + gG + tG +mG

M24 gL + tL +mL + gG + tG +mG + (gGtG)CC + (gGtG)PP + (gGmG)CC + (gGmG)PP + (tGmG)CC + (tGmG)PP

Frontiers in Genetics frontiersin.org04

Montesinos-López et al. 10.3389/fgene.2025.1636438

https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://doi.org/10.3389/fgene.2025.1636438


multiplication of the linear kernels Kg and K t, that is, KCC
gt � UT + t

(UT) (Cuevas, et al., 2025). On the other hand, the second term in
this predictor is distributed as (gLtL)PP ~ Nn(0, σ2gtpKPP

gt ), where
σ2gtp is the variance component for lines using transcriptomic and
genomic information and KPP

gt is computed using the lower
triangular (LT) part of the same matrix multiplication of the
linear kernels Kg and K t, that is, KPP

gt � LT + t (LT) (Cuevas,
et al., 2025).

In this case, the KCC
gt and KPP

gt kernels capture complex
interactions between genomic and transcriptomic information
that are not accounted for by the traditional interaction term
based on the Hadamard product of the Kg and K t kernels.
Moreover, these new kernels are valid because they satisfy the
three essential conditions for a kernel: (1) symmetry, (2) positive
semi-definiteness (PSD), and (3) Mercer’s condition for
continuous kernels.

Models 5–7 (M5–M7)
In the predictor of model M6, the first three terms are equal to

those of model M5, but the remaining two are different; these last
two terms are distributed as (gLmL)CC ~ Nn(0, σ2gmKCC

gm) and
(gLmL)PP ~ Nn(0, σ2gmpK

PP
gm), where σ2gm and σ2gmp are variance

components for lines using genomic and metabolomic
information and KCC

gm and KPP
gm were computed in the same way

as in model M5, but here the UT and LT parts are taken from the
multiplication of the linear kernels Kg and Km (Cuevas, et al., 2025).
Furthermore, model M7 is equivalent to model M5, except that the
last two terms differ. In M7, these terms are distributed as
(tLmL)CC ~ Nn(0, σ2tmKCC

tm ) and (tLmL)PP ~ Nn(0, σ2tmpK
PP
tm ),

where σ2tm and σ2tmp are variance components for lines using
transcriptomic and metabolomic information and KCC

tm and KPP
tm

were computed in the same way as in model M5, but here the UT
and LT parts are taken from the multiplication of the linear kernels
K t and Km (Cuevas, et al., 2025).

Note that the matrices K t and Km were computed as Gaussian
kernel matrices from the transcriptomic and metabolomic data,
respectively, using Euclidean distance and a default bandwidth
parameter of σ = median (d2), where d2 denotes squared distances
among individuals. As previously mentioned, to capture nonlinear
interactions between transcriptomic and metabolomics effects, we
decomposed the matrix multiplication of K t and Km into two
components: the UT matrix and the LT matrix. These components
capture complex interactions that are not accounted for by the
conventional interaction term based on the Hadamard product of
the K t and Km kernels. This approach aims to reflect distinct
transcriptional and metabolite mechanisms across the distribution
and improve the biological relevance of interaction modeling.

Models 8–14 (M8–M14)
The predictor of M8 is equivalent to that of M5 plus the last two

terms of M6, while the predictor of model M9 is equivalent to that of
M5 plus the last two terms of M7. The predictor of M10 is equivalent
to that of M6 plus the last two terms of M7, while the predictor of
M11 is equivalent to that of M10 plus the last two terms of model
M5. On the other hand, model (M12) is equivalent to model M1,
except that a Gaussian Kernel is used in place of a linear kernel. In
this case, P � gG and gG ~ (g1, . . . , gn)T ~ NJ(0, σ2GgKGg), where
KGg is a Gaussian kernel computed using marker information.

Similarly, model (M13) is equivalent to model M2 but replaces
the linear kernel with a Gaussian Kernel. In this case, P � tG and
tG ~ (t1, . . . , tn)T ~ Nn(0, σ2GtKGt), where KGt is a Gaussian kernel
computed using transcriptomic information. Similarly, model
(M14) is equivalent to model M3 but uses a Gaussian Kernel
mG ~ (m1, . . . , mn)T ~ Nn(0, σ2GmKGm), where KGm is a Gaussian
kernel computed using metabolomic information.

Models 15–24 (M15 and M24)
Model M15 is equivalent to model M4 but replaces the linear

kernels with the corresponding Gaussian kernels. Model M16 is
equivalent to model M5 but replaces the linear kernels with the
corresponding Gaussian kernels. Model M17 is equivalent to model
M6 but replaces the linear kernels with the corresponding Gaussian
kernels. Model M18 is equivalent to model M7 but replaces the linear
kernels with the corresponding Gaussian kernels. In a similar fashion,
models M19, M20, M21, and M22 are equivalent to models M8, M9,
M10 and M11, respectively, but replace the linear kernels with the
corresponding Gaussian kernels. On the other hand, model
M23 combines the terms of models M4 and M15. Finally, model
M24 is equivalent to model M22 plus the terms of model M15. More
details of these 24 models are provided in Table 2. In addition, we use
the CC and PP superscripts to refer to those terms in each model that
contain hybrid kernels derived from the product of two linear or
Gaussian kernels. All these models were implemented in R statistical
software (R Core Team, 2025) utilizing the BGLR library (Pérez and de
los Campos, 2014).

Model’s relationships

The 24 evaluated models represent a structured hierarchy of
increasing complexity, ranging from single-omics baselines to fully
integrated multi-omics predictors. Models M1 to M3 and M12 to
M14 serve as baselines, each incorporating a single omics
layer—genomics (g), transcriptomics (t), or metabolomics (m)—at
either the low (subscript _L) or global level (subscript _G) (Figure 1).
Building upon these,modelsM4 andM15 combine all three omics layers
additively within their respective low or global levels, while
M23 incorporates additive terms from both levels without interaction
effects. A second tier of models (M5 to M11 and M16 to M22)
introduces pairwise interaction terms to capture potential nonlinear
and hierarchical relationships among omics layers. These interactions are
modeled using both upper and lower triangular matrices resulting of
matrix multiplication of two original kernels. Models M5 toM11 extend
M4 by adding one to three pairwise interactions at the low level, while
M16 to M22 do the same for M15 at the global level. The most
comprehensive model, M24, integrates all low- and global-level
additive terms with all global-level interaction terms, representing a
fully fused multi-omics framework. This systematic design allows for a
nuanced assessment of how each omics source and its interactions
contribute to improving genomic prediction accuracy.

Heritability estimates

For each model provided in Table 2, its corresponding variance
components were computed using the full datasets. Then, for each

Frontiers in Genetics frontiersin.org05

Montesinos-López et al. 10.3389/fgene.2025.1636438

https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://doi.org/10.3389/fgene.2025.1636438


model, the variance component of error (Ve) and the genetic
variance component (VP) were computed. For the computation of
VP, all the variance components contained in each model were
summed. For example, for model M4, the variance component VP
was computed as VP � σ2g + σ2t + σ2m, while for model M5, VP was
computed as VP � σ2g + σ2t + σ2m + σ2gt + σ2gtp , and in a similar
fashion, the variance component VP was computed for the
remaining models. Then, the heritability of each model was
computed as h2 � VP

VP+Ve.

Cross-validation and evaluation metrics

To evaluate and compare the predictive performance of different
models, we used the random cross-validation method, where a
percentage of samples is randomly selected as a training set and
the remaining lines are used as the testing set. In our experiments, we
use 50% of the samples as a training set and the remaining 50% as a
testing set over 20 partitions. We used a 50/50 split for cross-
validation to ensure a balanced and robust estimation of accuracy
under a conservative evaluation scenario.

To evaluate prediction performance, we compute the
normalized root mean squared error (NRMSE) and the average
Pearson correlation (APC) metrics in each of the 20 partitions using
testing sets. The average of the NRMSE and APC of these
20 partitions across the traits is reported as prediction accuracy
for each dataset.

While both metrics (APC and NRMSE) reflect prediction
performance, NRMSE measures the absolute relative prediction
error; on the other hand, APC (average Pearson correlation)

evaluates the consistency of predictions with observed values
regardless of scale, making them complementary. To assess the
convergence of the posterior distribution, we used trace plots
and verified that the potential scale reduction factor
(Gelman–Rubin statistic) was <1.1 across all parameters.

Results

We present the results in four sections. The first section reports
the results for the Rice210 dataset. The second section provides the
results for the Maize282 dataset, the third section provides the
results for the Maize368 dataset, and the fourth section summarizes
the results across datasets.

Rice210

In Table 3, models M19 and M22 exhibited the highest
prediction accuracy in the rice dataset and a high heritability
value (0.863), followed closely by model M21 (0.862) and model
M20 (0.861). In contrast, model M1 (0.666) showed the lowest
heritability. Consequently, models M19 and M22 outperformed
M21, M20, and M1 in terms of heritability by 0.116%, 0.232%,
and 29.57%, respectively.

In Figures 2A,B, we present the results for the Rice210 dataset in
terms of APC and NRMSE, respectively. In Figure 2A, we can
observe that model M4 exhibited the highest APC value (0.7324 ±
0.008), followed closely by model M23 (0.7306 ± 0.0086) and model
M24 (0.7281± 0.0090). In contrast, model M13 (0.5434± 0.0125)

FIGURE 1
Hierarchical structure of multi-omics prediction models, displaying low- and global-level additive models and interaction models, along with
combined and full integrated models.

Frontiers in Genetics frontiersin.org06

Montesinos-López et al. 10.3389/fgene.2025.1636438

https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://doi.org/10.3389/fgene.2025.1636438


showed the lowest APC. Consequently, model M4 outperformed
M23, M24, and M13 in terms of APC by 0.246%, 0.591%, and
34.781%, respectively.

Model M12 = G (genomics) using genomic-only data
produced an APC of 0.5956 ± 0.0125, while model M23 = G
(genomics) + T (transcriptomics) + M (metabolomics), which
integrates all three omics layers, achieved an APC of (0.7324±
0.008). This represents a 22.96% improvement in prediction
accuracy due to omics integration.

However, in Figure 2B, we can observe that model
M4 exhibited the lowest NRMSE value (0.1003 ± 0.0016),
followed closely by model M23 (0.1021 ± 0.0016) and model
M5 (0.1022 ± 0.0017). In contrast, model M13 (0.1247 ± 0.0017)
showed the largest NRMSE. Consequently, model
M4 outperformed M23, M5, and M13 in terms of NRMSE by
1.795%, 1.894%, and 24.327%, respectively.

Maize282

As shown in Table 4, model M17 attained the highest heritability
estimate (0.807), with model M19 (0.804) and models M21 andM22
(both 0.803) yielding slightly lower values. In contrast, model M2
(0.345) recorded the lowest heritability. Accordingly, model
M17 surpassed M19, M21, M22, and M2 in heritability by
0.373%, 0.498%, 0.498%, and 133.913%, respectively.

Figures 3A,B display the outcomes for the Maize282 dataset
based on APC and NRMSE metrics, respectively. As illustrated in
Figure 3A, model M1 achieved the highest APC score (0.5078 ±
0.0101), with model M12 (0.5043 ± 0.0101) and model M24
(0.4854 ± 0.0108) ranking next. In contrast, model
M13 registered the lowest APC value (0.267 ± 0.0150). As a
result, M1 exceeded the APC performance of M12, M24, and
M13 by 0.694%, 4.615%, and 90.187%, respectively.

TABLE 3 Estimates of posterior mean of variance (Vi) and heritability (h2) from the Rice210 dataset across traits.

Model VP_mean VP_sd Ve_mean Ve_sd h2_mean h2_sd

M1 77.062 144.134 38.111 67.687 0.666 0.157

M2 95.784 175.781 18.857 33.515 0.809 0.053

M3 70.170 129.160 22.433 40.478 0.727 0.077

M4 71.497 131.506 15.801 28.059 0.793 0.055

M5 69.712 128.361 16.469 29.456 0.787 0.053

M6 70.676 129.974 16.622 29.675 0.788 0.048

M7 67.688 124.616 17.036 30.379 0.781 0.062

M8 70.488 129.982 17.112 30.677 0.781 0.049

M9 68.211 125.988 17.418 31.139 0.777 0.063

M10 68.372 126.246 17.435 31.118 0.776 0.060

M11 68.027 125.776 17.675 31.607 0.772 0.059

M12 109.774 202.336 25.023 44.225 0.806 0.063

M13 111.953 204.362 20.790 37.559 0.833 0.025

M14 93.087 170.306 20.608 37.657 0.813 0.036

M15 92.597 169.404 16.565 30.055 0.847 0.017

M16 103.214 188.912 17.103 30.995 0.857 0.020

M17 102.799 187.758 17.070 30.930 0.857 0.017

M18 102.493 187.731 17.034 30.897 0.856 0.019

M19 109.274 199.995 17.170 31.097 0.863 0.020

M20 106.963 195.456 17.282 31.335 0.861 0.018

M21 107.491 196.457 17.111 30.996 0.862 0.018

M22 110.724 202.461 17.593 31.875 0.863 0.019

M23 78.856 144.249 15.238 27.357 0.828 0.032

M24 89.822 164.262 15.928 28.503 0.838 0.037

Bold values denotes the worst and best estimates of heritability.

Frontiers in Genetics frontiersin.org07

Montesinos-López et al. 10.3389/fgene.2025.1636438

https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://doi.org/10.3389/fgene.2025.1636438


Meanwhile, Figure 3B shows that model M1 obtained the lowest
NRMSE (0.1710 ± 0.0019), indicating superior predictive accuracy.
It was followed by model M12 (0.1730 ± 0.0019) and model M6
(0.1732 ± 0.0019). Conversely, model M13 had the highest NRMSE
(0.1934 ± 0.0020). Therefore, M1 outperformed M12, M6, and
M13 in terms of NRMSE by 1.169%, 1.287%, and 13.99%,
respectively.

Maize368

Table 5 indicates that models M16 andM19 achieved the highest
heritability value (0.829). Models M20 and M22 followed closely
with estimates of 0.826, while M15 and M17 (both 0.823) ranked
third with slightly lower values. In contrast, model M3 exhibited the
lowest heritability (0.290). Thus, M16 and M19 exceeded the
heritability of M20, M22, M15, M17, and M3 by 0.363%, 0.363%,
0.729%, 0.729%, and 197.132%, respectively.

Figures 4A,B summarize the results for the Maize368 dataset
with respect to APC and NRMSE, respectively. In Figure 4A, model
M1 recorded the highest APC value (0.4832 ± 0.0100), followed by
models M15 (0.4732 ± 0.0086) and M12 (0.473 ± 0.0098). On the

other hand, model M3 showed the lowest APC (0.2678 ± 0.0105).
Accordingly, M1 surpassed the APC values of M15, M12, andM3 by
2.113%, 2.156%, and 80.433%, respectively.

In Figure 4B, model M1 reported the minimum NRMSE
(0.1041 ± 0.0100), reflecting the most accurate predictions. This
was closely followed by model M5 (0.1044 ± 0.0011) and models
M6 and M8, both registering an NRMSE of 0.1046 ± 0.0011. In
contrast, model M3 exhibited the highest NRMSE (0.1146 ± 0.0010).
Consequently, M1 outperformed M5, M6, M8, and M3 in NRMSE
by 0.288%, 0.480%, 0.480%, and 10.087%, respectively.

Across dataset

According to Table 6, across all traits and datasets, model
M19 exhibited the highest heritability value (0.832). This was
closely followed by model M22 (0.831), while models M16, M17,
M20, andM21 (each with 0.829) showedmarginally lower estimates.
Conversely, model M3 demonstrated the lowest heritability (0.487).
Consequently, M19 exceeded the heritability of M22, M16, M17,
M20, M21, andM3 by 0.160%, 0.402%, 0.402%, 0.402%, 0.402%, and
70.725%, respectively. Overall, across all datasets, we observed that

FIGURE 2
Prediction performance of each model in terms of APC (A) and NRMSE (B) for the Rice210 dataset. Error bars represent standard deviations across
20-fold cross-validation.

Frontiers in Genetics frontiersin.org08

Montesinos-López et al. 10.3389/fgene.2025.1636438

https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://doi.org/10.3389/fgene.2025.1636438


the most complex models—specifically those excluding the simpler
predictors used in models M1 to M3 and M12 to M14—consistently
reported higher heritability estimates. This suggests that integrating
the three sources of information (genomic, metabolic, and
transcriptomic data) enhances the ability to capture meaningful
biological signals rather than noise.

Figures 5A,B present a comparative analysis across datasets based
on APC and NRMSE, respectively. As illustrated in Figure 5A, model
M4 achieved the top APC value (0.5634 ± 0.0094), closely followed by
models M23 (0.5614 ± 0.0094) andM24 (0.5586 ± 0.0097). In contrast,
model M13 registered the lowest APC score (0.3999 ± 0.0125). Thus,
M4 outperformed M23, M24, and M13 in APC by margins of 0.356%,
0.859%, and 40.885%, respectively.

In terms of NRMSE (Figure 5B), model M4 yielded the lowest
value (0.1263 ± 0.0015), indicating the highest level of predictive
precision. Model M6 (0.1265 ± 0.0016) and model M5 (0.1269 ±
0.0016) followed in performance. Conversely, model M13 showed
the greatest NRMSE (0.1426 ± 0.0016). Accordingly, M4 exhibited

superior performance over M6, M5, and M13 by 0.158%, 0.475%,
and 12.91%, respectively.

Summary of results

The study assessed 24 statistical models integrating genomics,
transcriptomics, and metabolomics across three
datasets—Rice210, Maize282, and Maize368—to determine
their effectiveness in enhancing genomic prediction accuracy.

Our results support the idea that integrating multi-omics data
into genomic prediction models has the potential to improve
predictive accuracy by leveraging complementary biological
information. However, it is hindered by significant differences in
dimensionality, measurement scales, noise levels, and missing data
patterns. Addressing these issues through advanced preprocessing,
normalization, regularization, and modeling strategies is essential
for the effective and unbiased integration of multi-omics datasets.

TABLE 4 Estimates of posterior mean of variance components (Vi) and heritability (h2) from the Maize282 dataset across traits.

Model Vg_mean Vg_sd Ve_mean Ve_sd h2_mean h2_sd

M1 382.202 1173.784 172.264 464.831 0.659 0.108

M2 294.031 979.809 413.711 1227.466 0.345 0.060

M3 345.915 1145.114 329.699 937.095 0.456 0.123

M4 440.775 1361.122 152.962 408.773 0.710 0.081

M5 435.008 1349.434 167.915 446.892 0.694 0.090

M6 426.439 1300.493 175.455 478.378 0.685 0.088

M7 443.654 1370.434 157.978 421.567 0.706 0.089

M8 431.281 1331.394 181.369 489.226 0.674 0.091

M9 433.742 1347.032 172.508 458.596 0.688 0.097

M10 431.039 1335.374 177.673 477.820 0.677 0.097

M11 429.684 1342.524 184.911 489.197 0.669 0.099

M12 624.124 1936.033 159.934 441.158 0.769 0.077

M13 414.492 1377.521 440.686 1338.788 0.400 0.073

M14 495.751 1668.184 349.142 1006.293 0.520 0.136

M15 685.346 2125.390 147.142 407.136 0.799 0.056

M16 707.529 2192.624 147.820 407.395 0.800 0.059

M17 724.953 2246.408 145.278 399.245 0.807 0.056

M18 701.319 2175.069 147.014 408.095 0.801 0.058

M19 731.771 2271.649 147.658 405.264 0.804 0.060

M20 721.242 2249.651 148.925 409.034 0.799 0.068

M21 735.898 2292.289 147.627 407.976 0.803 0.062

M22 735.365 2277.802 148.204 410.638 0.803 0.063

M23 529.615 1616.155 146.115 396.228 0.757 0.063

M24 556.607 1708.231 148.529 397.585 0.758 0.067

Bold values denotes the worst and best estimates of heritability.

Frontiers in Genetics frontiersin.org09

Montesinos-López et al. 10.3389/fgene.2025.1636438

https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://doi.org/10.3389/fgene.2025.1636438


Heritability estimates
Models that integrated all three omics layers (e.g., M19 and

M22) generally showed the highest heritability across datasets.
Rice210: M19 and M22 had the highest heritability (0.863).
Maize282: M17 was highest (0.807). Maize368: M16 and
M19 reached 0.829. The lowest heritability was often found in
single-omics models, particularly those using only
metabolomics (e.g., M3).

Prediction accuracy
For the Rice210 dataset, the best-performing (Figures 6, 7)

was M4 (additive integration of gL + tL + mL), which achieved
the highest Pearson correlation (APC: 0.7324) and the lowest
NRMSE (0.1003).

For the Maize282 (Figures 6, 7), the best model was M1 (only
genomic info, linear kernel), which surprisingly outperformed
multi-omics models in prediction accuracy (APC: 0.5078,
NRMSE: 0.1710). For the Maize368 (Figures 6, 7), the best model
was again M1, achieving the highest prediction accuracy (APC:
0.4832, NRMSE: 0.1041). In terms of cross-dataset trends, while
multi-omics models improved heritability across all datasets,
improvements in prediction accuracy were dataset-specific. Model

M4, which used only additive effects from the three omics layers (no
interactions), consistently performed well across datasets. Gaussian
kernel models (e.g., M13) generally underperformed compared to
linear kernel-based models.

Implications
Integrating multi-omics data has shown promise in improving

heritability estimation, particularly for complex traits. However,
increases in heritability do not always translate into improved
prediction accuracy, underscoring the importance of careful
model selection and data preprocessing. Although
sophisticated models incorporating interaction terms (e.g.,
M19 and M22) effectively capture biological variance, they do
not consistently enhance predictive performance. This
discrepancy suggests the possibility of heritability
overestimation or model overfitting. Therefore, although our
framework for multi-omics integration is promising, further
research is necessary to refine the approach and achieve
optimal predictive utility.

Integrating genomics, transcriptomics, and metabolomics is a
promising strategy for enriching genomic prediction models,
particularly in understanding trait architecture.

FIGURE 3
Prediction performance of each model in terms of APC (A) and NRMSE (B) for the Maize282 dataset. Error bars represent standard deviations across
20-fold cross-validation.

Frontiers in Genetics frontiersin.org10

Montesinos-López et al. 10.3389/fgene.2025.1636438

https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://doi.org/10.3389/fgene.2025.1636438


Nevertheless, the choice of integration strategy, data type, and
model complexity critically affects performance. The study supports
the continued development of multi-omics frameworks, balancing
complexity with predictive utility.

Discussion

Despite the promise of GS in accelerating genetic gain in plant
breeding programs, its widespread adoption in real-world
applications remains constrained by several challenges. Key
among these is the variability in prediction accuracy across traits,
environments, and populations, often limiting the robustness and
generalizability of the models (Crossa et al., 2017; Hickey et al.,
2017). Moreover, the complex genetic architecture of many
agronomic traits—often governed by numerous small-effect loci
and subject to genotype-by-environment interactions—poses a
significant barrier to achieving consistent and high predictive
performance. Additionally, limitations in training population size,

data quality, and the inability to fully capture the underlying biology
of traits further impede the effective application of GS in breeding
pipelines (Xu et al., 2017).

Our results align with recent developments in omics-based
genomic prediction. Several studies have shown that combining
genomics with transcriptomic or metabolomic data improves trait
prediction, particularly when trait heritability is moderate or low.
Compared with traditional models like GBLUP or kernel methods,
deep learning architectures—especially those utilizing multi-layer
feedforward or convolutional layers—can better capture nonlinear
patterns inherent in omics data. The consistent superiority of MLP-
based models in our results aligns with the findings of Zingaretti
et al. (2020) and Montesinos-López et al. (2021).”

In response to these limitations, a variety of methodological
strategies have been explored to improve the performance and
reliability of GS. Among the most promising is the integration of
multi-omics data, including genomics, transcriptomics, and
metabolomics, which offer complementary layers of biological
information. These integrative approaches aim to enhance the

TABLE 5 Estimates of posterior mean of variance components (Vi) and heritability (h2) from the Maize368 dataset across traits.

Model Vg_mean Vg_sd Ve_mean Ve_sd h2_mean h2_sd

M1 25.115 67.569 8.272 22.646 0.705 0.077

M2 33.408 93.164 10.490 27.391 0.690 0.094

M3 10.145 27.096 29.906 82.120 0.279 0.064

M4 27.451 75.048 7.664 20.231 0.739 0.069

M5 26.877 73.222 8.760 23.650 0.719 0.068

M6 27.001 73.651 7.998 21.042 0.722 0.072

M7 26.673 72.784 7.835 20.838 0.731 0.067

M8 26.233 71.617 9.163 24.584 0.704 0.076

M9 26.211 71.367 8.768 23.623 0.712 0.071

M10 26.469 72.127 8.170 21.786 0.719 0.071

M11 25.913 70.622 8.926 23.626 0.700 0.078

M12 39.334 106.403 7.500 20.228 0.811 0.047

M13 42.967 119.407 9.641 25.235 0.765 0.073

M14 16.888 44.869 27.743 76.357 0.395 0.097

M15 40.514 110.253 7.263 19.444 0.823 0.040

M16 45.076 123.978 7.273 19.243 0.829 0.043

M17 41.274 112.465 7.385 19.601 0.823 0.046

M18 40.740 110.509 7.479 20.008 0.821 0.042

M19 44.416 120.970 7.369 19.688 0.829 0.045

M20 43.728 119.316 7.502 20.043 0.826 0.049

M21 41.589 112.550 7.414 19.834 0.821 0.047

M22 43.660 118.309 7.521 19.939 0.826 0.046

M23 31.901 86.917 7.331 19.406 0.782 0.048

M24 33.442 91.384 7.731 20.507 0.779 0.059

Bold values denotes the worst and best estimates of heritability.

Frontiers in Genetics frontiersin.org11

Montesinos-López et al. 10.3389/fgene.2025.1636438

https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://doi.org/10.3389/fgene.2025.1636438


predictability and interpretability of GS models by capturing
intermediate phenotypes and regulatory mechanisms that
mediate the genotype-to-phenotype relationship (Fernie and
Schauer, 2009; Acharjee et al., 2016). For example, transcriptomic
data can provide insights into gene expression patterns linked to
trait variability, while metabolomic profiles may reflect physiological
states more directly related to phenotype expression. Such
integrative models have shown potential to improve the
biological relevance of predictions and increase their accuracy
across different contexts (Azodi et al., 2020).

The integration of omics data into genomic prediction
frameworks is thus emerging as a critical frontier for enhancing
predictive accuracy and model interpretability. By leveraging
synergistic information from genomics and other omics layers,
researchers can better account for the biological complexity of
traits, ultimately leading to more precise selection decisions. This
approach aligns with systems biology paradigms, which emphasize
the interconnected nature of biological data and advocate for a holistic
perspective in predictive modeling (Sandhu et al., 2022). Moreover,

integrating omics data enables breeders to gain mechanistic insights
into trait architecture, supporting both prediction and discovery and
offering a dual benefit to crop improvement programs.

Interestingly, the performance gain from integrating omics varied
by dataset. In rice, where metabolomic and transcriptomic signals are
rich and traits show complex regulation, omics integration yielded
substantial gains. In contrast, in the chickpea dataset, where
transcriptomic features may have less variance relative to genomic
information, the improvement was more modest. These differences
illustrate the context dependency ofmulti-omicsmodeling, highlighting
that data type relevance varies by species and trait architecture.”

Using only genomic information (model M1) led to relatively
lower prediction accuracy compared to integrated models
(M19–M22), especially in datasets with high-dimensional
transcriptomics. This highlights the added value of capturing
gene expression patterns or metabolite activity, which can serve
as proximal indicators of phenotypic variance. Our study reinforces
the view that multi-omics layers provide complementary biological
information not captured by markers alone.

FIGURE 4
Prediction performance of each model in terms of APC (A) and NRMSE (B) for the Maize368 dataset. Error bars represent standard deviations across
20-fold cross-validation.

Frontiers in Genetics frontiersin.org12

Montesinos-López et al. 10.3389/fgene.2025.1636438

https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://doi.org/10.3389/fgene.2025.1636438


Our empirical results underscore the value of an integrative
approach to genomic prediction. By incorporating transcriptomic
and metabolomic data along with genomic information, we
observed a notable increase in heritability across all three
datasets. However, a significant improvement in prediction
accuracy was observed in only one dataset. This gain in accuracy
is not merely incremental—it offers compelling empirical evidence
for the critical importance of multi-omics integration within
genomic prediction frameworks. These findings highlight the
advantages of expanding the data foundation of GS models
beyond genomics alone and support the systematic inclusion of
all available omics data when feasible. As plant breeding moves
further into a data-driven era, leveraging the full spectrum of
biological information is no longer optional but essential for
achieving the next generation of genetic gain.

Integrating genomic, transcriptomic, and metabolomic data did
not, however, enhance prediction accuracy in two of the datasets,
thus indicating that multi-omics integration is not universally
advantageous. This result highlights key statistical challenges

inherent to combining heterogeneous data types. Omics datasets
often differ in scale, dimensionality, noise levels, and correlation
structures, making integration complex. These differences can
complicate model training, particularly in high-dimensional
settings where the number of features vastly exceeds the number
of observations, increasing the risk of overfitting and reducing
generalizability. Additionally, the contribution of each omics
layer may vary across samples, with some data sources offering
useful signals in certain cases while adding noise in others. Such
variability underscores the need for careful preprocessing, feature
selection, and model design to fully harness the potential of multi-
omics approaches.

Furthermore, the presence of multicollinearity within and across
omics layers can inflate variance estimates and obscure the
identification of truly informative predictors. Another critical
issue is the potential for redundant or weakly informative signals
in additional omics layers, which may dilute the predictive power
when not appropriately weighted or regularized. The statistical
challenge of determining which features contribute meaningfully

TABLE 6 Estimates of posterior mean of (Vi) and heritability (h2) across traits and datasets.

Model Vg_mean Vg_sd Ve_mean Ve_sd h2_mean h2_sd

M1 161.460 461.829 72.882 185.055 0.677 0.114

M2 141.074 416.251 147.686 429.457 0.615 0.069

M3 142.077 433.790 127.346 353.231 0.487 0.088

M4 179.908 522.559 58.809 152.354 0.747 0.068

M5 177.199 517.006 64.381 166.666 0.733 0.070

M6 174.705 501.373 66.692 176.365 0.732 0.069

M7 179.338 522.611 60.950 157.595 0.739 0.073

M8 176.001 510.998 69.215 181.496 0.720 0.072

M9 176.055 514.796 66.231 171.119 0.726 0.077

M10 175.293 511.249 67.759 176.908 0.724 0.076

M11 174.541 512.974 70.504 181.477 0.714 0.079

M12 257.744 748.257 64.152 168.537 0.795 0.062

M13 189.804 567.097 157.039 467.194 0.666 0.057

M14 201.909 627.786 132.498 373.436 0.576 0.090

M15 272.819 801.682 56.990 152.212 0.823 0.038

M16 285.273 835.171 57.399 152.544 0.829 0.041

M17 289.675 848.877 56.578 149.925 0.829 0.040

M18 281.517 824.436 57.176 153.000 0.826 0.040

M19 295.154 864.205 57.399 152.016 0.832 0.042

M20 290.644 854.808 57.903 153.471 0.829 0.045

M21 294.993 867.099 57.384 152.935 0.829 0.042

M22 296.583 866.191 57.773 154.151 0.831 0.043

M23 213.457 615.774 56.228 147.664 0.789 0.048

M24 226.624 654.626 57.396 148.865 0.792 0.054

Bold values denotes the worst and best estimates of heritability.

Frontiers in Genetics frontiersin.org13

Montesinos-López et al. 10.3389/fgene.2025.1636438

https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://doi.org/10.3389/fgene.2025.1636438


to the prediction—and how to combine them effectively—requires
sophisticated modeling strategies such as dimension reduction,
penalization, or multi-view learning approaches.

These findings underscore the need for rigorous and statistical
frameworks that can manage the complexity and high
dimensionality of multi-omics data while also capturing their
complementary information. Addressing these challenges is
essential to fully harness the promise of integrative omics in
genomic prediction and avoid misleading conclusions based on
suboptimal integration methods.

In general, despite the demonstrated efficiency of the integration
of multi-omics data in prediction, there is still considerable room for
improvement in learning from these different datasets. For this
reason, the integration of genomics, transcriptomics, and
metabolomics represents a transformative strategy to overcome
current limitations in genomic prediction. Our findings highlight
the empirical and theoretical advantages of this approach and
provide strong justification for future breeding programs to
adopt multi-omics data integration as a core component of
predictive breeding methodologies.

Additional considerations

Although the current study provides a comprehensive
comparison of multi-omics integration strategies for genomic
prediction, several limitations merit attention. First, the lack of
post hoc biological interpretation of model outputs limits our
understanding of why certain omics layers contributed more
effectively in specific datasets. For example, the consistent
outperformance of simple genomic models in the maize datasets
suggests that not all omics layers contributed meaningful biological
signals, yet this observation remains unexplored. Second, although
the final remarks highlight the importance of model interpretability
tools such as SHAP values (Shapley additive explanations help us
understand how complex models make their predictions), these
were not implemented in the current study. Incorporating such tools
in future analyses could enhance biological insights and practical
usability by breeders. Finally, all datasets used in this study are
derived from single-environment trials, which restricts the
generalizability of findings to real-world breeding conditions that
are inherently multi-environmental and subject to genotype-by-

FIGURE 5
Prediction performance of each model in terms of APC (A) and NRMSE (B) for across datasets. Error bars represent standard deviations across 20-
fold cross-validation.

Frontiers in Genetics frontiersin.org14

Montesinos-López et al. 10.3389/fgene.2025.1636438

https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://doi.org/10.3389/fgene.2025.1636438


environment interactions. As a result, future work should prioritize
model validation under diverse environmental scenarios and
consider integrating G × E effects into multi-omics prediction

frameworks. Addressing these limitations will be essential for the
successful translation of multi-omics prediction models into
practical breeding applications.

FIGURE 6
APC values of selected models across Rice210, Maize282, Maize368, and overall datasets. Models M4 and M24 showed balanced performance,
while M1 excelled in maize but not rice. M13 (transcriptomics with the Gaussian kernel) had the lowest APC across all datasets. Error bars represent
standard deviations across 20-fold cross-validation.

FIGURE 7
Heatmap of APC values across all models (M1–M24) and datasets (Rice210, Maize282, Maize368, and overall).

Frontiers in Genetics frontiersin.org15

Montesinos-López et al. 10.3389/fgene.2025.1636438

https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://doi.org/10.3389/fgene.2025.1636438


Final remarks

Although the results are promising, some limitations must be
acknowledged. First, all datasets were collected under a single-
environment condition, limiting our ability to evaluate genotype-
by-environment interactions. Second, the early fusion strategy may
not fully exploit hierarchical interactions between omics types.
Future work should explore advanced integration methods (e.g.,
deep learning methods with attention mechanisms or graph neural
networks) and test model robustness across environments and larger
populations. Additionally, evaluating trait-specific model
performance would provide more refined insights.

Given the increasing availability of large-scale public omics
databases and computational resources, there is a strong
incentive to build flexible, modular platforms that can be
customized by crop, trait, and available data type. Open-source,
community-driven initiatives are likely to accelerate the adoption of
multi-omics genomic prediction in diverse agricultural contexts.

Additionally, the importance of explainable models cannot be
overstated. As multi-omics models grow in complexity,
understanding which features or interactions are driving predictions
becomes essential for both interpretability and acceptance by breeders.
Model explainability tools such as SHAP values or saliency maps in
deep learning can aid in identifying the biological significance of
predictors (Lundberg and Lee, 2017). Applying such methods to the
current modeling framework may offer breeders mechanistic insights
that support not just selection but hypothesis generation and discovery.
Moreover, the integration of proteomic and epigenomic data remains
an untapped opportunity not only in the context of genomic prediction
(Wang et al., 2024b) but also in other fields like human medicine (Lin
et al., 2025) and animal science (Wang et al., 2024a). While this study
focused on genomics, transcriptomics, and metabolomics, recent
advances in high-throughput proteomics and epigenetic profiling
(e.g., DNA methylation and histone modifications) have shown
potential to capture trait-associated regulatory variation that is not
accessible through other omics layers (Langfelder and Horvath, 2017;
Chen et al., 2021). Including these additional layers may further
enhance both the biological insights and predictive performance of
multi-omics models.

Another aspect not extensively discussed in the current manuscript
is the role of tissue specificity and developmental timing in
transcriptomic and metabolomic data acquisition. Omics layers are
inherently dynamic; transcript abundance and metabolite levels change
throughout development and in response to environmental stimuli. As
such, integrating omics data collected at a single time point or from a
limited tissue type might obscure important regulatory mechanisms
relevant to trait expression (Rai et al., 2021; Do et al., 2020). Future
research should consider the temporal and spatial aspects of omics data
collection to improve predictive resolution.

Conclusion

The integration of genomics, transcriptomics, and metabolomics
into genomic prediction models has shown promise in enhancing
predictive accuracy, although improvements have been observed
only in certain datasets compared to models relying solely on
genomic data. This variability underscores both the potential and

the complexity of incorporating multi-omics information into
prediction frameworks. Although empirical evidence highlights the
added value of multi-omics integration, it also reveals significant
challenges in achieving effective data fusion and optimal model
performance. Nonetheless, the incorporation of multi-omics data
holds great potential to enhance predictive power and enable more
informed decision-making in plant breeding programs. These findings
emphasize the transformative capacity of holistic, data-driven strategies
in modern breeding efforts. We strongly advocate for continued
research aimed at developing and refining integrative multi-omics
frameworks. Their effective implementation could substantially
improve the identification of superior candidate lines and accelerate
genetic gain in genomic selection programs.

Data availability statement

Publicly available datasets were analyzed in this study. These data
can be found at: https://doi.org/10.6084/m9.figshare.19312205.v1.

Author contributions

OM-L: Formal Analysis, Visualization, Writing – original draft, Data
curation, Conceptualization, Investigation, Validation, Writing – review
and editing, Methodology. AM-L: Visualization, Validation, Formal
Analysis, Conceptualization, Investigation, Writing – review and
editing, Supervision, Writing – original draft. BM-G: Validation,
Writing – review and editing, Data curation, Conceptualization. ID-E:
Validation, Writing – review and editing, Writing – original draft. MC-F:
Validation, Writing – review and editing, Writing – original draft. JC:
Validation, Writing – review and editing, Writing – original draft. SD:
Resources,Writing – review and editing, Investigation, Conceptualization,
Funding acquisition. JS: Writing – review and editing, Funding
acquisition. RO: Resources, Writing – review and editing.

Funding

The author(s) declare that financial support was received for the
research and/or publication of this article. The authors are thankful to
the Hebei Natural Science Foundation under grant no. C2022203005.

Conflict of interest

The authors declare that the research was conducted in the
absence of any commercial or financial relationships that could be
construed as a potential conflict of interest.

The author(s) declared that they were an editorial board
member of Frontiers, at the time of submission. This had no
impact on the peer review process and the final decision.

Generative AI statement

The author(s) declare that no Generative AI was used in the
creation of this manuscript.

Frontiers in Genetics frontiersin.org16

Montesinos-López et al. 10.3389/fgene.2025.1636438

https://doi.org/10.6084/m9.figshare.19312205.v1
https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://doi.org/10.3389/fgene.2025.1636438


Publisher’s note

All claims expressed in this article are solely those of the authors
and do not necessarily represent those of their affiliated

organizations, or those of the publisher, the editors and the
reviewers. Any product that may be evaluated in this article, or
claim that may be made by its manufacturer, is not guaranteed or
endorsed by the publisher.

References

Acharjee, A., Kloosterman, B., Visser, R. G., and Maliepaard, C. (2016). Integration of
multi-omics data for prediction of phenotypic traits using random forest. BMC
Bioinforma. 17, 180–373. doi:10.1186/s12859-016-1043-4

Angermueller, C., Pärnamaa, T., Parts, L., and Stegle, O. (2016). Deep learning for
computational biology. Mol. Syst. Biol. 12 (7), 878. doi:10.15252/msb.20156651

Azodi, C. B., Tang, J., and Shiu, S. H. (2020). Opening the Black box: interpretable machine
learning for geneticists. Trends Genet. 36 (6), 442–455. doi:10.1016/j.tig.2020.03.005

Chen, M., Penfold, C. A., Buchanan-Wollaston, V., and Doonan, J. H. (2021).
Dissecting plant complex traits using omics and data integration approaches. Curr.
Opin. Plant Biol. 60, 102047.

Crossa, J., Pérez-Rodríguez, P., Cuevas, J., Montesinos-López, O., Jarquín, D., De Los
Campos, G., et al. (2017). Genomic selection in plant breeding: methods, models, and
perspectives. Trends Plant Sci. 22 (11), 961–975. doi:10.1016/j.tplants.2017.08.011

Cuevas, J., Crossa, J., Montesinos-López, A., Martini, J. W., Gerard, G. S., Ortegón, J., et al.
(2025). Enhancing wheat genomic prediction by a hybrid kernel approach. Front. Plant Sci.
16, 1605202. doi:10.3389/fpls.2025.1605202

Desta, Z. A., and Ortiz, R. (2014). Genomic selection: genome-wide prediction in
plant improvement. Trends Plant Sci. 19 (9), 592–601. doi:10.1016/j.tplants.2014.05.006

Do, P. T., Dreher, K., DiLeo, M. V., and Schmelz, E. A. (2020). Integrated multi-omics
approaches provide insights into maize defense metabolome and transcriptome
regulation. Plant Physiol. 182 (1), 243–258.

Fernie, A. R., and Schauer, N. (2009). Metabolomics-assisted breeding: a viable option
for crop improvement? Trends Genet. 25 (1), 39–48. doi:10.1016/j.tig.2008.10.010

González-Camacho, J. M., Ornella, L., Pérez-Rodríguez, P., Gianola, D., Dreisigacker,
S., and Crossa, J. (2018). Appl. Mach. Learn. methods genomic Sel. Breed. wheat rust
Resist. Plant Genome 11, 1–15. doi:10.3835/plantgenome2017.11.0104

Guo, Z., Magwire, M. M., Basten, C. J., Xu, Z., andWang, D. (2016). Evaluation of the
utility of gene expression and metabolic information for genomic prediction in maize.
Theor. Appl. Genet. 129, 2413–2427. doi:10.1007/s00122-016-2780-5

Hickey, J. M., Chiurugwi, T., Mackay, I., Powell, W., and Implementing Genomic
Selection in CGIAR Breeding Programs Workshop Participants (2017). Genomic
prediction unifies animal and plant breeding programs to form platforms for
biological discovery. Nat. Genet. 49 (9), 1297–1303. doi:10.1038/ng.3920

Jarquín, D., Crossa, J., Lacaze, X., Du Cheyron, P., Daucourt, J., Lorgeou, J., et al. (2014). A
reaction norm model for genomic selection using high-dimensional genomic and
environmental data. Theor. Appl. Genet. 127, 595–607. doi:10.1007/s00122-013-2243-1

Langfelder, P., and Horvath, S. (2017). Integrated genomics approaches for system-
level understanding of complex traits. Nat. Rev. Genet. 18 (10), 585–598.

Lin, M., Guo, J., Gu, Z., Tang, W., Tao, H., You, S., et al. (2025). Machine learning and
multi-omics integration: advancing cardiovascular translational research and clinical
practice. J. Transl. Med. 23 (1), 388. doi:10.1186/s12967-025-06425-2

Lundberg, S. M., and Lee, S.-I. (2017). A unified approach to interpreting model
predictions. Adv. Neural Inf. Process. Syst. 30.

Meuwissen, T. H., Hayes, B. J., and Goddard, M. (2001). Prediction of total genetic
value using genome-wide dense marker maps. Genetics 157 (4), 1819–1829. doi:10.
1093/genetics/157.4.1819

Misra, B. B., Langefeld, C., Olivier,M., andCox, L.A. (2019). Integrated omics: tools, advances
and future approaches. J. Mol. Endocrinol. 62 (1), R21–R45. doi:10.1530/JME-18-0055

Montesinos López, O. A., Montesinos López, A., and Crossa, J. (2022). Multivariate
statistical machine learning methods for genomic prediction. Cham, Switzerland:
Springer Nature, 691.

Montesinos-López, A., Montesinos-López, O. A., Gianola, D., Crossa, J., and
Hernández-Suárez, C. M. (2018). Multi-environment genomic prediction of plant
traits using deep learners with dense architecture. G3 Genes, Genomes, Genet. 8
(12), 3813–3828. doi:10.1534/g3.118.200740

Montesinos-López, O. A., Martín-Vallejo, J., Crossa, J., Gianola, D., Hernández-
Suárez, C. M., Montesinos-López, A., et al. (2019). New deep learning genomic-based
prediction model for multiple traits with binary, ordinal, and continuous phenotypes.
G3 Genes|Genomes|Genetics 9 (5), 1545–1556. doi:10.1534/g3.119.300585

Montesinos-López, O. A., Montesinos-López, A., Crossa, J., Gianola, D., Hernández-
Suárez, C. M., and Dreisigacker, S. (2021). The complete genome and comparative
analysis of the phage phiC120 infecting multidrug-resistant Escherichia coli and
Salmonella strains. G3 Genes|Genomes|Genetics 11 (3), jkab014. doi:10.1093/
g3journal/jkab014

Pérez, P., and de los Campos, G. (2014). BGLR: a statistical package for whole genome
regression and prediction. Genetics 198 (2), 483–495. doi:10.1534/genetics.114.164442

R Core Team (2025). R: a Language and environment for statistical computing.
Vienna, Austria: R Foundation for Statistical Computing.

Rai, A., Saito, K., and Yamazaki, M. (2021). Integrated omics analysis of specialized
metabolism in plants. Curr. Opin. Plant Biol. 60, 102040.

Riedelsheimer, C., Lisec, J., Czedik-Eysenberg, A., Sulpice, R., Flis, A., Grieder, C.,
et al. (2012). Genome-wide association mapping of leaf metabolic profiles for dissecting
complex traits in maize. Proc. Natl. Acad. Sci.U. S. A. 109 (23), 8872–8877. doi:10.1073/
pnas.1120813109

Rincent, R., Laloë, D., Nicolas, S., Altmann, T., Brunel, D., Revilla, P., et al. (2012).
Maximizing the reliability of genomic selection by optimizing the calibration set of
reference individuals: comparison of methods in two diverse groups of maize inbreds
(Zea mays L.). Genetics 192 (2), 715–728. doi:10.1534/genetics.112.141473

Roorkiwal, M., Rathore, A., Das, R. R., Singh, M. K., Jain, A., Srinivasan, S., et al.
(2016). Genome-enabled prediction models for yield related traits in chickpea. Front.
Plant Sci. 7, 1666. doi:10.3389/fpls.2016.01666

Sandhu, K. S., Shiv, A., Kaur, G., Meena, M. R., Raja, A. K., Vengavasi, K., et al. (2022).
Integrated approach in genomic selection to accelerate genetic gain in sugarcane. Plants
11 (16), 2139. doi:10.3390/plants11162139

Wang, M., Yang, N., Laterrière, M., Gagné, D., Omonijo, F., and Ibeagha-Awemu, E.
M. (2024a). Multi-omics integration identifies regulatory factors underlying bovine
subclinical mastitis. J. Animal Sci. Biotechnol. 15 (1), 46. doi:10.1186/s40104-024-
00996-8

Wang, P., Lehti-Shiu, M. D., Lotreck, S., Segura Abá, K., Krysan, P. J., and Shiu, S. H.
(2024b). Prediction of plant complex traits via integration of multi-omics data. Nat.
Commun. 15 (1), 6856. doi:10.1038/s41467-024-50701-6

Wen, W., Li, D., Li, X., Gao, Y., Li, W., Li, H., et al. (2014). Metabolome-based
genome-wide association study of maize kernel leads to novel biochemical insights.Nat.
Commun. 5 (1), 3438. doi:10.1038/ncomms4438

Xu, Y., Li, P., Zou, C., Lu, Y., Xie, C., Zhang, X., et al. (2017). Enhancing genetic gain in
the era of molecular breeding. J. Exp. Bot. 68 (11), 2641–2666. doi:10.1093/jxb/erx135

Yang, W., Guo, T., Luo, J., Zhang, R., Zhao, J., Warburton, M. L., et al. (2022). Target-
oriented prioritization: targeted selection strategy by integrating organismal and
molecular traits through predictive analytics in breeding. Genome Biol. 23 (1), 80.
doi:10.1186/s13059-022-02650-w

Zingaretti, L. M., Gezan, S. A., Ferrão, L. F. V., Osorio, L. F., Monfort, A., Muñoz, P.
R., et al. (2020). Exploring deep learning for complex trait genomic prediction in
polyploid outcrossing species. Front. Plant Sci. 11, 25. doi:10.3389/fpls.2020.00025

Frontiers in Genetics frontiersin.org17

Montesinos-López et al. 10.3389/fgene.2025.1636438

https://doi.org/10.1186/s12859-016-1043-4
https://doi.org/10.15252/msb.20156651
https://doi.org/10.1016/j.tig.2020.03.005
https://doi.org/10.1016/j.tplants.2017.08.011
https://doi.org/10.3389/fpls.2025.1605202
https://doi.org/10.1016/j.tplants.2014.05.006
https://doi.org/10.1016/j.tig.2008.10.010
https://doi.org/10.3835/plantgenome2017.11.0104
https://doi.org/10.1007/s00122-016-2780-5
https://doi.org/10.1038/ng.3920
https://doi.org/10.1007/s00122-013-2243-1
https://doi.org/10.1186/s12967-025-06425-2
https://doi.org/10.1093/genetics/157.4.1819
https://doi.org/10.1093/genetics/157.4.1819
https://doi.org/10.1530/JME-18-0055
https://doi.org/10.1534/g3.118.200740
https://doi.org/10.1534/g3.119.300585
https://doi.org/10.1093/g3journal/jkab014
https://doi.org/10.1093/g3journal/jkab014
https://doi.org/10.1534/genetics.114.164442
https://doi.org/10.1073/pnas.1120813109
https://doi.org/10.1073/pnas.1120813109
https://doi.org/10.1534/genetics.112.141473
https://doi.org/10.3389/fpls.2016.01666
https://doi.org/10.3390/plants11162139
https://doi.org/10.1186/s40104-024-00996-8
https://doi.org/10.1186/s40104-024-00996-8
https://doi.org/10.1038/s41467-024-50701-6
https://doi.org/10.1038/ncomms4438
https://doi.org/10.1093/jxb/erx135
https://doi.org/10.1186/s13059-022-02650-w
https://doi.org/10.3389/fpls.2020.00025
https://www.frontiersin.org/journals/genetics
https://www.frontiersin.org
https://doi.org/10.3389/fgene.2025.1636438

	Genomic prediction powered by multi-omics data
	Introduction
	Materials and methods
	Datasets
	Statistical models
	Model 1 (M1)

	Predictors
	Model 1 (M1)
	Models 2 and 3 (M2 and M3)
	Models 4 and 5 (M4 and M5)
	Models 5–7 (M5–M7)
	Models 8–14 (M8–M14)
	Models 15–24 (M15 and M24)

	Model’s relationships
	Heritability estimates
	Cross-validation and evaluation metrics

	Results
	Rice210
	Maize282
	Maize368
	Across dataset
	Summary of results
	Heritability estimates
	Prediction accuracy
	Implications


	Discussion
	Additional considerations
	Final remarks

	Conclusion
	Data availability statement
	Author contributions
	Funding
	Conflict of interest
	Generative AI statement
	Publisher’s note
	References