Article

Boosting Genomic Prediction Transferability with Sparse Testing

Osval A. Montesinos-López 1, Jose Crossa 2,3, Paolo Vitale 2, Guillermo Gerard 2, Leonardo Crespo-Herrera 2, Susanne Dreisigacker 2, Carolina Saint Pierre 2, Iván Delgado-Enciso 4, Abelardo Montesinos-López 5,* and Reka Howard 6,*

1 Facultad de Telemática, Universidad de Colima, Colima 28040, Col., Mexico; osval78t@gmail.com
2 International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, Texcoco 52640, Edo. Mex., Mexico; j.crossa@cgiar.org (J.C.); p.vitale@cgiar.org (P.V.); g.gerard@cgiar.org (G.G.); l.crespo@cgiar.org (L.C.-H.); s.dreisigacker@cgiar.org (S.D.); c.saintpierre@cgiar.org (C.S.P.)
3 Colegio de Postgraduados, Montecillos, Texcoco 56230, Edo. Mex., Mexico
4 School of Medicine, University of Colima, Colima 28040, Col., Mexico; ivan_delgado_enciso@ucol.mx
5 Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Jal., Mexico
6 Department of Statistics, University of Nebraska-Lincoln, 343C Hardin Hall, Lincoln, NE 68583-0963, USA
* Correspondence: abelardo.montesinos@academicos.udg.mx or amlcimat@gmail.com (A.M.-L.); rekahoward@unl.edu (R.H.)

Academic Editor: Quan Zou
Received: 4 June 2025; Revised: 19 June 2025; Accepted: 26 June 2025; Published: 16 July 2025
Citation: Montesinos-López, O.A.; Crossa, J.; Vitale, P.; Gerard, G.; Crespo-Herrera, L.; Dreisigacker, S.; Saint Pierre, C.; Delgado-Enciso, I.; Montesinos-López, A.; Howard, R. Boosting Genomic Prediction Transferability with Sparse Testing. Genes 2025, 16, 827. https://doi.org/10.3390/genes16070827
Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract

Background/Objectives: Improving sparse testing is essential for enhancing the efficiency of genomic prediction (GP). Accordingly, new strategies are being explored to refine genomic selection (GS) methods under sparse testing conditions. Methods: In this study, a sparse testing approach was evaluated, specifically in the context of predicting performance for tested lines in untested environments. Sparse testing is particularly practical in large-scale breeding programs because it reduces the cost and logistical burden of evaluating every genotype in every environment, while still enabling accurate prediction through strategic data use. To achieve this, we used training data from CIMMYT (Obregon, Mexico), along with partial data from India, to predict line performance in India using observations from Mexico. Results: Our results show that incorporating data from Obregon into the training set improved prediction accuracy, with greater effectiveness when the data were temporally closer. Across environments, Pearson's correlation improved by at least 219% (at a testing proportion of 50%), while gains in the percentage of matching among the top 10% and top 20% of lines were 18.42% and 20.79%, respectively (also at a testing proportion of 50%). Conclusions: These findings emphasize that enriching training data with relevant, temporally proximate information is key to enhancing genomic prediction performance; conversely, incorporating unrelated data can reduce prediction accuracy.

Keywords: sparse testing; tested lines in untested environment; genomic prediction

1. Introduction

Genomic prediction (GP) is transforming plant breeding by enabling scientists to identify high-performing genetic profiles earlier in the breeding process, significantly reducing the time and costs associated with developing improved crop varieties.
Unlike traditional breeding, which relies heavily on observable traits and lengthy field trials, GP leverages genomic data to predict plant performance, even for complex traits like yield stability and disease resistance. By integrating vast amounts of genetic information with machine learning algorithms, GP allows breeders to make faster and more accurate selection decisions, improving both the precision and efficiency of breeding programs. As a result, it is now possible to breed plants that are better adapted to specific climates and stresses, supporting food security and resilience against climate change worldwide. This shift towards data-driven selection is helping to sustain agricultural productivity in the face of environmental challenges, ultimately benefiting both breeders and farmers globally [1,2].

Implementing genomic prediction in plant breeding remains challenging due to complex genetic and statistical factors. One significant hurdle is the high dimensionality of genomic data, where the number of markers often exceeds the sample size, creating multicollinearity issues. This complexity demands sophisticated statistical models that can handle these data intricacies, especially for polygenic traits controlled by numerous small-effect loci.
Additionally, genotype-by-environment (G × E) interactions complicate predictions, as the performance of genotypes can vary widely across environments. Accounting for these interactions requires advanced models to capture genetic correlations across diverse environments, which increases computational demands. Another challenge is the high cost of genotyping large populations, especially in developing countries where resources may be limited, further slowing the adoption of genomic selection technologies [1,3,4].

For this reason, many strategies have been implemented in GP with the goal of improving its efficiency. One of these strategies is called sparse testing. Sparse testing is crucial in genomic prediction as it enables the evaluation of a wide variety of cultivars across multiple environments without the cost and logistical constraints of fully testing each of them in every environment. By strategically selecting and testing only a subset of genotypes in specific environments, sparse testing helps generate sufficient data to build accurate prediction models that account for G × E, allowing breeders to predict untested combinations effectively. This approach is particularly beneficial in large-scale breeding programs, where it reduces field trial costs and resource demands while maintaining the prediction power required for selecting cultivars suited to varied environmental conditions. Moreover, sparse testing supports data efficiency, enhancing the ability to predict performance in unobserved environments, ultimately accelerating the breeding cycle and improving genetic gains across diverse climates [1,5].

Recent developments in machine learning have led to the integration of non-linear and deep learning models into genomic prediction, offering the potential to capture complex trait architectures and G × E interactions more effectively than traditional linear methods.
Models such as convolutional neural networks (CNNs), multilayer perceptrons (MLPs), and hybrid ensemble frameworks have demonstrated competitive performance, especially when dealing with high-dimensional genomic and environmental data [6]. While these models offer advantages in flexibility and potential accuracy, they also require large datasets and careful tuning, which may not always be feasible in breeding contexts with limited training data. Thus, GBLUP remains a robust and widely used benchmark model for evaluating genomic prediction strategies, including those involving sparse testing.

In plant breeding, multi-environment trials (METs) are critical for accurately evaluating genotype performance and stability under diverse environmental conditions. Genomic prediction (GP) models that incorporate genotype-by-environment (G × E) interactions have significantly advanced breeding programs by predicting the performance of unobserved genotype–environment combinations. In crop improvement, many cultivars (varieties, called genotypes) have been observed in different places or years (called environments). Breeders have data from those varieties in some environments, but not in others, and must predict how those same varieties would perform in the missing environments. So, breeders train the model using the observed environments and then test the model by predicting performance in the environments where varieties were not observed. The CV2-type cross-validation scheme, initially introduced by Burgueño et al. (2012) [7], specifically addresses realistic scenarios encountered in plant breeding programs where some genotype–environment combinations are deliberately masked, simulating situations where genotypes have incomplete environmental testing due to resource limitations or logistical constraints.
This approach allows for a realistic assessment of genomic prediction models' capability to estimate genotype performance in environments where no direct phenotypic data exist. Since its initial proposal, the CV2 methodology has evolved to reflect practical constraints and opportunities within breeding programs. For example, Montesinos et al. (2024) [8] integrated sparse testing methodologies, applying incomplete block and random allocation designs to further simulate realistic breeding scenarios. Additionally, this study further expanded upon the CV2 concept by strategically enriching training datasets with related environmental data, aiming to enhance predictive accuracy in untested environments. These advancements illustrate the versatility and adaptability of CV2-based strategies within modern genomic selection practices.

In this research, we explore sparse testing for tested lines in untested environments. This type of sparse testing allows breeders to predict the performance of tested genotypes in untested environments by leveraging information from strategically tested lines in various conditions. This approach helps to identify robust genotypes capable of thriving across different environments, even when complete testing in all conditions is impractical. Sparse testing frameworks rely on statistical and genomic models that use data from tested genotypes to infer the potential of similar but untested genotypes, addressing G × E with fewer resources. By optimizing the selection of test sites and genotypes, sparse testing improves efficiency, reducing costs and labor while maintaining high predictive accuracy. This method is particularly advantageous in large-scale breeding programs with limited testing budgets and in regions with diverse and variable climates, where anticipating genotype adaptation is essential [1,5,7].
In this study, we assess the predictive capacity of sparse testing for tested lines in untested environments using a real-world dataset from South Asian Target Populations of Environments (TPEs), encompassing 25 unique site–year combinations. Our analysis simulates scenarios where specific genotypes are evaluated in certain environments but are absent in others. These approaches include methods for predicting missing lines for a specific environment using information from other environments with related lines. This work builds upon our previous study [8], which evaluated sparse testing under random and incomplete block designs. Here, we focus on a more realistic and operationally relevant sparse testing scenario—predicting tested lines in untested environments—while leveraging multi-year, multi-environment data enrichment. By explicitly comparing enriched versus non-enriched training sets, this study adds new insights into the transferability of genomic predictions under practical field conditions.

2. Materials and Methods

2.1. Datasets

The experimental material comprised 941 elite wheat lines from CIMMYT (Table 1). These genotypes were evaluated for grain yield (GY) over two consecutive crop seasons across three target population environments (TPEs). Of the total wheat lines, 444 were tested in the 2021–2022 growing season, and the remaining 497 were evaluated in the 2022–2023 season. In the 2021–2022 season, 166 lines were assigned to TPE1 (4 locations in India and 3 locations in Obregon, México), 165 to TPE2 (5 locations in India and 3 locations in Obregon, México), and 112 to TPE3 (2 locations in India and 3 locations in Obregon, México). In the 2022–2023 season, 166 genotypes were planted in each TPE: TPE1 (6 locations in India and 6 in Obregon, México), TPE2 (6 locations in India and 6 in Obregon, México), and TPE3 (3 locations in India and 6 in Obregon, México).
At each location, an alpha lattice design with two replications was established to optimize cost efficiency while ensuring robust parameter estimation, yielding reliable results for CIMMYT's breeding programs.

Table 1. Description of the wheat datasets. MAF denotes the minor allele frequency and PMV denotes the threshold of percentage of missing values.

No.  Data             Lines  Markers  Env_India  Env_Mexico  MAF   PMV
1    TPE_1_2021_2022  166    18,238   4          3           0.05  50%
2    TPE_1_2022_2023  166    18,238   6          6           0.05  50%
3    TPE_2_2021_2022  166    18,238   5          3           0.05  50%
4    TPE_2_2022_2023  165    18,238   6          6           0.05  50%
5    TPE_3_2021_2022  112    18,238   2          3           0.05  50%
6    TPE_3_2022_2023  166    18,238   3          6           0.05  50%

Description of the Target Population of Environments (TPEs)

In Mexico, all evaluations were conducted at CENEB (Centro Experimental Norman E. Borlaug) in Ciudad Obregón, Sonora (27.4936° N, 109.9380° W), under fully irrigated conditions typical of the northwestern wheat belt. Obregón has a median maximum daily temperature of 32 °C during the growing season, with total seasonal rainfall below 50 mm, necessitating full irrigation. Soils are predominantly clay loam with high fertility, and trials are managed with high-input protocols.

In India, trials were carried out at representative sites of the All India Coordinated Wheat Improvement Program (AICWIP), including the following: Ludhiana (30.9010° N, 75.8573° E)—northwest plains; timely sown, moderate rainfall (300–400 mm), clay loam soils. Pusa (25.9852° N, 85.6638° E)—Eastern Indo-Gangetic plains; warmer, sub-tropical climate with annual rainfall ~1000 mm, sandy loam soils. Wellington (11.3724° N, 76.7850° E)—southern hills; temperate climate with high humidity (~70–90%), cooler night temperatures, and well-drained forest soils.

Regarding the genetic material, all evaluated wheat lines were elite breeding lines from CIMMYT's spring wheat program. A total of 941 unique genotypes were included in the study, with subsets planted across TPEs.
In each TPE × year combination, distinct but partially overlapping subsets of genotypes were evaluated. For example, 166 lines were planted in TPE1 in 2021–2022 and another 166 in 2022–2023. Some genotypes were shared across years and sites to enable sparse testing designs. Environments were grouped into TPEs using expert knowledge of breeding programs and the clustering of historical yield and environmental covariates (e.g., temperature, rainfall). This TPE classification allows us to evaluate the potential for sparse testing, where only a subset of lines is evaluated in a subset of sites within each TPE, and genomic prediction is used to infer performance in untested environments within the same TPE. This approach is consistent with the operational needs of large-scale breeding programs in both countries.

It is important to highlight that the same lines under study in each dataset were evaluated across all environments in both countries (India and Mexico). In Mexico, all evaluations were conducted in Cd. Obregon, Sonora, while in India, they were carried out in Ludhiana. This consistent evaluation approach within each country ensures the comparability of results across environments and strengthens the reliability of genotype performance assessments.

2.2. Bayesian GBLUP Model

The multi-environment GBLUP model implemented was

Y_ij = μ + E_i + g_j + gE_ij + ϵ_ij    (1)

where Y_ij represents the Best Linear Unbiased Estimate (BLUE) of the j-th genotype in the i-th environment. The grand mean is denoted by μ, and the random effects associated with environments, E_i for i = 1, ..., I, are assumed to follow a multivariate normal distribution E = (E_1, ..., E_I)^T ~ N_I(0, σ²_E I_E), where I_E is the identity covariance matrix of environments and σ²_E represents the variance component attributed to environmental effects. Additionally, g_j, j = 1, ..., J, are the random effects of genotypes (lines), and gE_ij denotes the random effects associated with the genotype-by-environment interaction. The genotypic random effects vector g = (g_1, ..., g_J)^T ~ N_J(0, σ²_g G), where G is the genomic relationship matrix [9] and σ²_g is the genetic variance component. The genotype-by-environment interaction effects, gE = (gE_11, ..., gE_1J, ..., gE_IJ)^T, are modeled as following a multivariate normal distribution gE ~ N_IJ(0, σ²_gE (Z_g G Z_g^T ∘ Z_E I_E Z_E^T)), where Z_g is the incidence matrix for the additive genetic effects, the variance component σ²_gE corresponds to the genotype-by-environment interaction, ∘ denotes the Hadamard product, Z_E is the incidence matrix representing the environmental effects, and I_E is the identity matrix denoting independent environments. Finally, the residual errors ϵ_ij are assumed to be independent and normally distributed, ϵ_ij ~ N(0, σ²_ϵ), where σ²_ϵ is the error variance. The implementation of this model was carried out using the BGLR package [10].

Why Using GBLUP and GBLUP_Ad?

In this study, we focused on the genomic best linear unbiased predictor (GBLUP) and its enriched variant (GBLUP_Ad) to isolate and evaluate the effects of training data composition under sparse testing conditions. While more complex models such as reproducing kernel Hilbert space (RKHS) regression, Bayesian Lasso, and deep learning approaches have been successfully applied in genomic prediction, our aim was not to compare predictive algorithms but to assess how strategic data enrichment can improve prediction accuracy in untested environments. GBLUP was selected for its widespread use, ease of implementation, and ability to provide a stable reference point for evaluating the impact of cross-environment training scenarios.
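As an illustration of how a genomic relationship matrix G of the kind used in the GBLUP model is typically constructed, the following is a minimal VanRaden-style sketch in Python. The marker matrix M and all variable names are hypothetical toy data for illustration; the study's actual model fitting used the BGLR package in R.

```python
import numpy as np

def vanraden_grm(M):
    """VanRaden-style genomic relationship matrix.

    M: (n_lines, n_markers) array of allele counts coded 0/1/2.
    Returns an (n_lines, n_lines) relationship matrix G.
    """
    p = M.mean(axis=0) / 2.0             # estimated allele frequency per marker
    W = M - 2.0 * p                      # center each marker by twice its frequency
    denom = 2.0 * np.sum(p * (1.0 - p))  # scaling so diagonal elements average near 1
    return (W @ W.T) / denom

# Hypothetical toy data: 5 lines, 8 markers
rng = np.random.default_rng(42)
M = rng.integers(0, 3, size=(5, 8)).astype(float)
G = vanraden_grm(M)
print(G.shape)  # (5, 5)
```

In practice, markers would first be filtered by minor allele frequency and missing-value thresholds (MAF = 0.05 and PMV = 50% in Table 1) before computing G.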
Future work may incorporate non-linear models to further investigate whether they can better capture G × E interactions under similar sparse testing settings.

2.3. Cross-Validation Schemes

Two primary cross-validation strategies were employed to evaluate the prediction accuracy of sparse testing approaches.

2.3.1. Cross-Validation Strategy 1

A 10-fold random partitioning scheme was used for all target environments in India. The training data consisted of 85%, 70%, 50%, and 30% of the lines, while the remaining 15%, 30%, 50%, and 70%, respectively, were reserved for testing (target population). The results from this strategy, using only data from the target environment in India, were denoted as GBLUP.

2.3.2. Cross-Validation Strategy 2 (Incorporating Additional Training Data to Target Data)

This strategy enhanced the training set by including data from previous years in India, along with data from Obregon, Sonora, Mexico (both from the current and previous years, when available). This approach was labeled GBLUP_Ad, emphasizing the impact of enriched, multi-environmental training datasets on model performance. For instance, when the testing set consisted of 15%, 30%, 50%, and 70% of the lines from India in the target environment TPE_3_2022_2023, the training set comprised the following:
• The remaining 85%, 70%, 50%, and 30% of lines from India in TPE_3_2022_2023.
• All lines from India in TPE_3_2021_2022.
• All lines from Obregon, Sonora, Mexico, from both TPE_3_2021_2022 and TPE_3_2022_2023.

2.4. Model Performance Evaluation and Comparisons

Model performance was evaluated using two key metrics: (1) average Pearson's correlation (COR), which is a measure of the linear correlation between observed and predicted values across 10 partitions, and (2) Percentage of Matching among the top-performing lines, which is the percentage of overlap between observed and predicted lines in the top 10% (PM_10) and top 20% (PM_20) of performance.
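As a concrete sketch of how these two metrics can be computed for a single cross-validation fold, consider the following Python code. The function names and toy data are hypothetical illustrations, not the authors' implementation; higher trait values (e.g., grain yield) are assumed to be better.

```python
import numpy as np

def pearson_cor(y_obs, y_pred):
    """Pearson's correlation (COR) between observed and predicted values."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1])

def pct_matching_top(y_obs, y_pred, fraction):
    """Percentage of Matching: overlap between the observed and predicted
    top lines. fraction=0.10 gives PM_10; fraction=0.20 gives PM_20."""
    k = max(1, int(round(fraction * len(y_obs))))
    top_obs = set(np.argsort(y_obs)[::-1][:k])    # indices of best observed lines
    top_pred = set(np.argsort(y_pred)[::-1][:k])  # indices of best predicted lines
    return 100.0 * len(top_obs & top_pred) / k

# Hypothetical fold with 20 testing lines
rng = np.random.default_rng(0)
y_obs = rng.normal(size=20)
y_pred = y_obs + rng.normal(scale=0.5, size=20)  # noisy predictions
print(round(pearson_cor(y_obs, y_pred), 3))
print(pct_matching_top(y_obs, y_pred, 0.10))     # PM_10
print(pct_matching_top(y_obs, y_pred, 0.20))     # PM_20
```

In the study, such per-fold values were averaged over the 10 random partitions, with standard deviations and standard errors reported across folds.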
Collectively, these metrics provided a comprehensive assessment of prediction accuracy across all random partitions. Although statistical tests such as paired t-tests or confidence intervals are widely used in other contexts, they are not appropriate for comparing model performance within standard k-fold cross-validation frameworks. This is because the cross-validation folds are not independent: the training and testing partitions typically overlap, violating the assumption of independent and identically distributed samples required for valid statistical inference. As demonstrated by [11], there exists no unbiased estimator of the variance in k-fold cross-validation, and any attempt to estimate significance based on such partitions may lead to incorrect conclusions. Similarly, [12] highlighted that performing model selection and evaluation within the same cross-validation framework can introduce bias and artificially inflate significance. For this reason, we follow established best practices in genomic prediction by reporting the average prediction metrics (e.g., Pearson's correlation, PM_10, PM_20), along with their standard deviations and standard errors across folds, which offer a more robust and interpretable measure of model performance.

3. Results

The results are presented in four sections. Sections 3.1–3.3 contain the results for the datasets TPE_1_2021_2022, TPE_2_2021_2022, and TPE_3_2022_2023, respectively, while Section 3.4 provides the results across all datasets (Across data). Finally, Appendices B and C provide the figures and tables corresponding to the datasets TPE_1_2022_2023, TPE_2_2022_2023, and TPE_3_2021_2022. The results are presented in terms of three metrics: Pearson's correlation (COR), Percentage of Matching in the top 10% (PM_10), and Percentage of Matching in the top 20% (PM_20) for each dataset.
In some scenarios, the baseline GBLUP model produced negative Pearson's correlation values or extreme relative efficiency (RE) scores. These negative values reflect instances where the model failed to generalize to the testing set, often due to limited or uninformative training data. The RE metric was calculated as the percentage change in the squared correlation of GBLUP relative to GBLUP_Ad, which can result in large or undefined values when the baseline model's correlation approaches zero or becomes negative. While such values may seem extreme, they are useful in highlighting the extent to which GBLUP_Ad improves prediction under sparse or biologically dissimilar training conditions. Importantly, these results also emphasize the need to carefully interpret low or negative correlations as signals of limited transferability between training and testing environments.

3.1. TPE_1_2021_2022

Figure 1 presents the results for the dataset TPE_1_2021_2022 under a comparative analysis of the models GBLUP and GBLUP_Ad in terms of their predictive efficiency, measured by Pearson's correlation (COR), and the Percentage of Matching for the selected optimal lines in the top 10% and 20% (PM_10 and PM_20). For further details, please refer to Table A1 in Appendix A. In the analysis, the GBLUP_Ad model demonstrates superior performance across all evaluated metrics (COR, PM_10, PM_20) compared to GBLUP for several scenarios, especially for COR. For the COR metric, GBLUP_Ad maintains positive averages, with means ranging from 0.101 to 0.179 across different Tst values (where Tst denotes the proportion of the testing set, with possible values of 0.15, 0.30, 0.50, and 0.70), while GBLUP shows negative averages for the lower Tst values, such as −0.017 for Tst = 0.15 and −0.045 for Tst = 0.30, reflecting its lower performance. Regarding the PM_10 and PM_20 metrics, GBLUP_Ad outperforms GBLUP in some cases.
For Tst = 0.15 and PM_20, the mean value for GBLUP_Ad is 25.000 compared to 7.500 for GBLUP. Also, for Tst = 0.30 and PM_20, the mean is 27.778 for GBLUP_Ad compared to 17.778 for GBLUP. For the other scenarios comparing the metrics PM_10 and PM_20, GBLUP outperforms GBLUP_Ad in terms of the mean. Overall, the relative efficiency of GBLUP is negative or significantly lower, whereas GBLUP_Ad establishes itself as the reference model with a relative efficiency of 0%, consolidating its superiority in all evaluated aspects.

Figure 1. Comparative performance of genomic prediction models in terms of Pearson correlation (COR) (A), percentage of agreement in the top 10% (PM_10) (B) and top 20% (PM_20) (C) for TPE_1_2021_2022, using random cross-validation. Tst denotes the proportion of testing set. For each metric (COR, PM_10, PM_20), standard errors were calculated across the 10 cross-validation folds. These error bars provide an estimate of variability and aid in the interpretation of model stability across replicates.

3.2. TPE_2_2021_2022

Figure 2 presents the results for TPE_2_2021_2022 under a comparative analysis of the GBLUP and GBLUP_Ad models in terms of COR, PM_10, and PM_20. For further details, please refer to Table A2 in Appendix A. For the COR metric, GBLUP shows better performance at Tst = 0.15 and Tst = 0.70, with averages of 0.024 and 0.081, respectively, while GBLUP_Ad presents negative averages across all evaluated Tst, ranging from −0.148 to −0.194. However, the standard deviation of GBLUP_Ad is generally lower, suggesting more consistent predictions, although with overall lower performance. The relative efficiency (RE) of GBLUP is negative at Tst = 0.15 and Tst = 0.70, indicating inferior performance compared to GBLUP_Ad.
For the PM_10 metric, GBLUP_Ad shows little variability in the early Tst, with averages of 0.000 at several points, while GBLUP has higher averages, such as 13.636 at Tst = 0.70. However, the relative efficiency of GBLUP is negative or low across all Tst, reinforcing the superiority of GBLUP_Ad in terms of efficiency and accuracy. Finally, for the PM_20 metric, GBLUP_Ad has lower averages and smaller standard deviations compared to GBLUP, which has averages like 28.696 for Tst = 0.70. The relative efficiency of GBLUP is negative in most cases, while GBLUP_Ad demonstrates greater consistency and efficiency.

Figure 2. Comparative performance of genomic prediction models in terms of Pearson correlation (COR) (A), and percentage of agreement in the top 10% (PM_10) (B) and top 20% (PM_20) (C) for TPE_2_2021_2022, using random cross-validation. Tst denotes the proportion of testing set. For each metric (COR, PM_10, PM_20), standard errors were calculated across the 10 cross-validation folds. These error bars provide an estimate of variability and aid in the interpretation of model stability across replicates.

Although GBLUP shows some positive average values in certain metrics and Tst, GBLUP_Ad excels in terms of consistency and lower variability, making it generally more efficient, as reflected by the low or zero relative efficiency rates compared to GBLUP.

3.3. TPE_3_2022_2023

The results for the TPE_3_2022_2023 dataset are presented in Figure 3. For more details, please refer to Table A3 in Appendix A. For the COR metric, at Tst = 0.15, the GBLUP_Ad model demonstrates superior performance with a mean value of 0.455 and a low standard deviation of 0.104, suggesting more consistent and accurate predictions. In contrast, GBLUP has a mean value of 0.073 and a higher standard deviation of 0.236, indicating lower accuracy.
The relative efficiency (RE) of GBLUP is high, suggesting inferior performance compared to GBLUP_Ad. As Tst increases, GBLUP_Ad continues to outperform GBLUP. For example, at Tst = 0.70, GBLUP_Ad shows a mean of 0.418 and a standard deviation of 0.029, while GBLUP shows a negative mean of −0.029 and a standard deviation of 0.196, with a negative RE, reflecting significantly inferior performance. For the PM_10 (Top 10% Prediction Accuracy) metric, at Tst = 0.15, GBLUP_Ad performs better with a mean of 30.000 compared to 20.000 for GBLUP. Both models have the same standard deviation of 25.820, indicating that GBLUP_Ad is superior in terms of prediction accuracy. As Tst increases, GBLUP_Ad continues to show better results. At Tst = 0.70, GBLUP_Ad has a mean of 34.545 and a standard deviation of 11.175, while GBLUP shows a mean of 12.727 and a similar standard deviation, highlighting the advantage of GBLUP_Ad.

Figure 3. Comparative performance of genomic prediction models in terms of Pearson correlation (COR) (A), and percentage of agreement in the top 10% (PM_10) (B) and top 20% (PM_20) (C) for TPE_3_2022_2023, using random cross-validation. Tst denotes the proportion of testing set. For each metric (COR, PM_10, PM_20), standard errors were calculated across the 10 cross-validation folds. These error bars provide an estimate of variability and aid in the interpretation of model stability across replicates.

Finally, for the PM_20 (Top 20% Prediction Accuracy) metric, at Tst = 0.15, GBLUP_Ad again outperforms GBLUP with a mean of 40.000 compared to 20.000. Although GBLUP_Ad has a higher standard deviation (21.082 vs. 15.811), its overall performance is superior.
At Tst = 0.70, GBLUP_Ad maintains its advantage with a mean of 47.391 and a standard deviation of 8.056, while GBLUP has a mean of 20.435 and a slightly higher standard deviation, confirming the better performance of GBLUP_Ad.

3.4. Across Data

Finally, the across-data results are presented in Figure 4. For further details, please refer to Table A4 in Appendix A.

Figure 4. Comparative performance of genomic prediction models in terms of Pearson correlation (COR) (A), and percentage of agreement in the top 10% (PM_10) (B) and top 20% (PM_20) (C) for across data, using random cross-validation. Tst denotes the proportion of testing set. For each metric (COR, PM_10, PM_20), standard errors were calculated across the 10 cross-validation folds. These error bars provide an estimate of variability and aid in the interpretation of model stability across replicates.

For the COR (Correlation) metric, at Tst = 0.15, GBLUP shows a mean value close to zero (−0.001) and a standard deviation of 0.243, indicating high variability in predictions. Additionally, the relative efficiency (RE) is extremely negative (−16,136.276), suggesting very poor performance compared to GBLUP_Ad. As Tst increases, GBLUP continues to show low or negative mean values and higher standard deviations, indicating inconsistent predictions. For instance, at Tst = 0.70, GBLUP has a mean of −0.004 and a standard deviation of 0.186, with a negative RE of −3316.083. In the PM_10 (Top 10% Prediction Accuracy) and PM_20 (Top 20% Prediction Accuracy) metrics, GBLUP also demonstrates lower performance compared to GBLUP_Ad. For example, at Tst = 0.15, GBLUP has a mean of 7.500 in PM_10 and 14.167 in PM_20, with relatively high standard deviations, indicating variability in predictions. In comparison, GBLUP_Ad has higher means in both metrics. As Tst increases, GBLUP continues to show lower means and considerable standard deviations.
At Tst = 0.70, GBLUP has a mean of 10.909 in PM_10 and 20.995 in PM_20, with standard deviations that indicate significant dispersion in the results, compared to means of 13.030 and 26.415 for GBLUP_Ad for PM_10 and PM_20, respectively.

4. Discussion

Predicting the performance of tested lines in new environments poses significant challenges in genomic prediction due to the complexity of genotype-by-environment (G × E) interactions [13]. When moving to new environments, conditions such as climate, soil quality, and local agricultural practices may vary considerably, impacting the expression of genetic traits in ways that are often unpredictable from data in known environments [5]. This variability in environmental factors can interact with the genetic composition of a line, complicating the extrapolation of performance predictions [13].

Another major issue is the limited data on how different lines perform across diverse environments. Genomic prediction models rely on historical data, which often represent only a subset of possible conditions, limiting the models' ability to generalize to new environments [1]. Moreover, these models are usually calibrated with specific environmental trials, making them highly tailored to those conditions. As a result, predictions in new settings may fail to accurately capture relevant environmental interactions, leading to reduced prediction accuracy [5,14].

Addressing these limitations often requires collecting extensive multi-environment trial data or developing sophisticated models that can better capture and adjust for G × E interactions. These approaches, however, involve significant resource investments, underscoring the ongoing challenge of predicting performance in new environments for genomic selection and plant breeding programs [14,15].
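Both models compared in the Results share the GBLUP predictor as their core. The study fits Bayesian GBLUP with the BGLR R package; purely for orientation, the central computation can be sketched in a ridge-equivalent form, assuming the residual-to-genomic variance ratio is known (an assumption; BGLR instead samples the variances) and using a simplified VanRaden-style relationship matrix:

```python
import numpy as np

def gblup_predict(G, y, obs, lam=1.0):
    """Ridge-equivalent GBLUP: BLUP of genomic values for all lines given
    phenotypes y[obs]. `lam` is the residual-to-genomic variance ratio,
    treated here as known for illustration."""
    obs = np.asarray(obs)
    mu = y[obs].mean()                      # overall mean as the only fixed effect
    Goo = G[np.ix_(obs, obs)]               # relationships among observed lines
    alpha = np.linalg.solve(Goo + lam * np.eye(len(obs)), y[obs] - mu)
    return mu + G[:, obs] @ alpha           # predictions for every line, tested or not

# Toy example: G built from centered marker dosages (simplified scaling).
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(8, 200)).astype(float)  # 8 lines x 200 markers
P = M - M.mean(axis=0)
G = P @ P.T / P.shape[1]
y = rng.normal(size=8)
pred = gblup_predict(G, y, obs=[0, 1, 2, 3, 4], lam=0.5)
print(pred.shape)  # (8,)
```

The key point for sparse testing is the last line: the genomic relationship matrix G lets information flow from phenotyped to unphenotyped lines, which is what the enrichment strategies in this study exploit.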
Our results show that across datasets, the proposed strategy of enriching the training set with data from other environments significantly outperforms the approach of using only target-environment data. Gains observed in Pearson's correlation were notable across all tested proportions of the testing set. For instance, with testing proportions of 15%, 30%, 50%, and 70%, the observed Pearson's correlation gains were at least 189.00%, 219.23%, 328.125%, and 2950%, respectively. Similarly, improvements in PM_10 were observed, with gains of 100% (in 15% testing), 69.84% (in 30% testing), 18.42% (in 50% testing), and 19.44% (in 70% testing), while PM_20 gains reached 82.35%, 61.83%, 20.79%, and 25.82%, respectively. These findings underscore the importance of incorporating data from additional environments into the training set. However, it is worth noting that despite the substantial relative gains, the absolute prediction accuracies achieved in these environments were generally below 0.5 in terms of Pearson's correlation. This suggests a limited relationship between the environments used for enrichment and the target environment, India. This observation aligns with the fact that the enrichment environments included data from Obregon, Mexico, as well as from India in a previous year, and in some cases, from both locations combined.

These results underscore the potential of enriching target environments with information from other environments. However, the gains achieved are not uniform, which can be attributed to the significant heterogeneity among the environments used for enrichment. Consequently, it is recommended to prioritize enrichment using environments that closely resemble the target environment. Nonetheless, this approach is not always practical, as the number of available environments for enrichment may be limited, and they may not closely align with the target environment.
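Operationally, the enrichment strategy amounts to augmenting each target-environment training fold with records from donor environments, while the testing fold stays strictly within the target environment. A schematic of that data split (illustrative only; the record layout and function name are our own, and the actual fold construction is in the study's R scripts):

```python
import random

def enrich_training_split(target_records, donor_records, tst_prop, seed=0):
    """Split the target environment into train/test at proportion `tst_prop`,
    then enrich the training portion with all donor-environment records.
    The test set contains target-environment records only."""
    rng = random.Random(seed)
    idx = list(range(len(target_records)))
    rng.shuffle(idx)
    n_tst = round(len(idx) * tst_prop)
    test = [target_records[i] for i in idx[:n_tst]]
    train = [target_records[i] for i in idx[n_tst:]] + list(donor_records)
    return train, test

# 100 target-environment plots (e.g., India) enriched with 250 donor plots
# (e.g., Obregon) at a 30% testing proportion:
india = [("line_%d" % i, "India_2023") for i in range(100)]
obregon = [("line_%d" % i, "Obregon_2022") for i in range(250)]
train, test = enrich_training_split(india, obregon, tst_prop=0.30)
print(len(train), len(test))  # 320 30
```

Note that as Tst grows, the donor records make up an ever larger share of the training set, which is consistent with the pattern above: relative gains from enrichment are largest at high testing proportions, where little target-environment data remain for training.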
Despite these challenges, the findings are generally promising, as they demonstrate that enriching target environments with data from similar environments can effectively enhance prediction performance. These challenges are well documented in the literature [14,15], and they underscore the need for models that can more effectively account for non-additive G × E patterns or integrate environmental covariables directly into prediction frameworks.

Finally, these results further strengthen the empirical evidence supporting the effectiveness of the GS methodology in uni-environment settings. When genetic material is relatively homogeneous and management practices are well standardized, GS demonstrates a remarkable ability to deliver accurate predictions. This is particularly advantageous in controlled breeding programs where minimizing environmental variability is crucial for isolating genetic effects. The consistency of GS in such settings not only enhances prediction reliability but also supports more efficient selection decisions, ultimately accelerating genetic gain. Furthermore, these findings highlight the importance of carefully managing experimental conditions and selecting environments with minimal heterogeneity to maximize the utility of GS in practical applications [3,17].

4.1. Contrasting Sparse Testing Methodologies and Results from This Study, Montesinos et al. (2024) [8], and Burgueño et al. (2012) [7]

4.1.1.
Montesinos et al. (2024) [8]

This study explored genomic predictions under sparse conditions, employing both incomplete block design (IBD) and random allocation of genotypes to environments. Six GBLUP models were assessed, with one model (GBLUP_TRN) directly utilizing observed data without imputing missing values. The primary goal was to ascertain the benefits or disadvantages of pre-imputation versus the direct use of available genomic and phenotypic information. The practical advantages are no reliance on imputation, reduced computational complexity, and a realistic scenario for breeding programs with resource constraints.

4.1.2. This Research

In this study, the authors advanced the CV2 concept by assessing prediction strategies for tested genotypes in previously untested environments. The genomic prediction was implemented through two major approaches: training exclusively on the target-environment data and training enriched by additional relevant environments, notably Obregon (Mexico) and historical Indian trials. Predictive accuracy was evaluated using correlations and the percentage of top-performing lines correctly identified (PM_10, PM_20), emphasizing practical implications in selection efficiency. Enhanced predictive accuracy through enriched training datasets and improved identification of high-performing genotypes in untested environments are some advantages, whereas disadvantages include dependency on the availability and relevance of external historical data and potential biases if external data differ significantly from target environments.

4.1.3. Burgueño et al. (2012) [7]

This foundational study served as a benchmark for evaluating various statistical models' robustness and predictive capabilities under realistically masked data.
Advantages are the robust framework for evaluating model performance under realistic breeding conditions and the comprehensive modeling of G × E interactions; however, the method requires extensive computational resources for factor-analytic (FA) model implementation and may be overly complex for small-scale or less-resourced breeding programs.

Collectively, Table 2 shows that the results from [8], this study, and [7] underscore the critical role CV2 validation plays in realistically assessing genomic prediction models in plant breeding. Each study uniquely contributes to the methodological refinement and application of CV2 schemes, demonstrating different advantages: direct genomic prediction from sparse testing conditions [8], leveraging enriched datasets to enhance accuracy in untested environments (this study), and comprehensive model comparison under structured masking conditions [7].

Table 2. Comparative summary of methodologies.

Feature | Montesinos et al. (2024) [8] | This Study | Burgueño et al. (2012) [7]
Crop | Wheat | Wheat | Wheat
Cross-Validation Scheme | CV2 | CV2 | CV2
Data Design | Sparse testing: IBD and random | Sparse testing: targeted enrichment | Systematic random masking
Genotype–Environment Coverage | All genotypes observed at least once | Some genotypes entirely unobserved | Balanced masking across environments
Prediction Models | GBLUP (multiple variants) | GBLUP enriched with external datasets | Pedigree, markers, FA structures
Modeling G × E Interaction | Yes (covariance structure) | Yes (multi-environment integration) | Yes (FA models explicitly modeling covariance)
Evaluation Metrics | COR, NRMSE, PM_10, PM_20 | COR, PM_10, PM_20 | COR

Overall, the strategic use of CV2 validations, combined with methodological adaptations tailored to practical breeding scenarios and the integration of environmental covariables, highlights a powerful pathway toward more accurate and resource-efficient genomic selection in plant breeding programs.

4.2.
Factors Limiting Prediction Accuracy Across Environments

Despite the consistent performance improvement of GBLUP_Ad over GBLUP, we observed that the overall Pearson's correlation values remained below 0.5 in many cases. This is not unexpected in multi-environment genomic prediction involving sparse testing across heterogeneous environments. One major factor limiting predictive accuracy is the presence of strong genotype-by-environment (G × E) interactions, where the expression of genetic effects varies with environmental context. The contrasting environmental conditions and agronomic management practices between the Indian test sites and Obregon (Mexico) likely contribute to non-transferable genotype performance, especially for yield-related traits that are highly sensitive to local stresses. These challenges are well documented in the literature [13,14]; for example, Taïbi et al. (2015) [16] demonstrated how phenotypic plasticity and local adaptation strongly influenced reforestation success in Pinus halepensis, underlining the critical role of G × E interaction and environmental fit in predictive performance. Our findings highlight the practical reality faced by breeders: even when model improvement is observed, absolute prediction accuracy may remain modest due to underlying biological complexity and environmental divergence between training and testing sets.

5. Conclusions

From our results, we conclude that utilizing data from diverse environments can significantly enhance prediction accuracy in new environments with sparse testing. By integrating information from multiple environmental contexts, genomic prediction models can capture a broader range of genotype-by-environment (G × E) interactions, thereby improving their ability to generalize to unfamiliar conditions. This approach allows models to more accurately estimate genetic responses under varying environmental pressures, increasing their robustness and reliability in settings with limited testing data.
While challenges in data collection and model complexity remain, leveraging multi-environment data offers a promising strategy to overcome the limitations of sparse testing, facilitating better decision making in plant breeding and selection. However, even with improved prediction accuracy through data from diverse environments, the overall accuracy remains relatively low. This limitation arises because G × E interactions are highly complex and often specific to environmental conditions, which are challenging to fully capture and generalize. While multi-environmental data enrich the model, they cannot account for all potential environmental variables or their interactions with genotypes in every new setting. Thus, despite gains from this approach, prediction accuracies in new environments remain constrained by the inherent variability and unpredictable nature of G × E interactions, underscoring the need for continuous model refinement and advanced strategies to enhance prediction reliability in plant breeding.

Author Contributions: Conceptualization, O.A.M.-L. and A.M.-L.; methodology, O.A.M.-L., A.M.-L., J.C., P.V., G.G., L.C.-H., I.D.-E. and R.H.; software, O.A.M.-L. and A.M.-L.; validation, O.A.M.-L., A.M.-L., J.C., P.V., G.G., S.D., C.S.P., L.C.-H., I.D.-E. and R.H.; formal analysis, O.A.M.-L. and A.M.-L. All authors have read and agreed to the published version of the manuscript.

Funding: We acknowledge the financial support provided by the BMGF/FCDO Accelerating Genetic Gains in Maize and Wheat for Improved Livelihoods (AGG), USAID-CIMMYT Wheat/AGGMW, and CGIAR Accelerated Breeding Initiative (ABI).

Informed Consent Statement: Not applicable.

Data Availability Statement: All phenotypic data, genotype marker matrices, R scripts, and parameter settings used in this study are fully available at the following GitHub repository: https://github.com/osval78/Sparse_testing_Across (accessed on 28 July 2024).
The repository includes scripts for data preprocessing, model fitting using the BGLR package [10], and performance evaluation across cross-validation scenarios. A detailed README file provides instructions for reproducing the analyses presented in this manuscript.

Conflicts of Interest: The authors declare no conflicts of interest.

Appendix A

Table A1. Comparative performance of genomic prediction models in terms of Pearson's correlation (COR) (A), and Percentage of Matching in top 10% (PM_10) (B) and top 20% (PM_20) (C) for the TPE_1_2021_2022 dataset under random cross-validation. Tst denotes the proportion of testing set.

Metric Model Tst Min Mean Max Sd RE (%)
COR GBLUP 0.15 −0.390 −0.017 0.618 0.312 −1156.801
COR GBLUP_Ad 0.15 −0.172 0.179 0.439 0.180 0.000
COR GBLUP 0.30 −0.322 −0.045 0.262 0.177 −402.346
COR GBLUP_Ad 0.30 −0.049 0.137 0.344 0.113 0.000
COR GBLUP 0.50 −0.218 0.103 0.390 0.184 45.852
COR GBLUP_Ad 0.50 0.033 0.150 0.245 0.064 0.000
COR GBLUP 0.70 −0.192 0.091 0.391 0.207 10.133
COR GBLUP_Ad 0.70 0.028 0.101 0.177 0.047 0.000
PM_10 GBLUP 0.15 0.000 5.000 50.000 15.811 0.000
PM_10 GBLUP_Ad 0.15 0.000 5.000 50.000 15.811 0.000
PM_10 GBLUP 0.30 0.000 7.500 25.000 12.076 −66.667
PM_10 GBLUP_Ad 0.30 0.000 2.500 25.000 7.906 0.000
PM_10 GBLUP 0.50 0.000 13.750 50.000 18.114 −63.636
PM_10 GBLUP_Ad 0.50 0.000 5.000 25.000 8.740 0.000
PM_10 GBLUP 0.70 9.091 15.455 27.273 7.484 −88.235
PM_10 GBLUP_Ad 0.70 0.000 1.818 18.182 5.750 0.000

Table A1. Cont.
Metric Model Tst Min Mean Max Sd RE (%)
PM_20 GBLUP 0.15 0.000 7.500 50.000 16.874 233.333
PM_20 GBLUP_Ad 0.15 0.000 25.000 75.000 20.412 0.000
PM_20 GBLUP 0.30 0.000 17.778 44.444 14.055 56.250
PM_20 GBLUP_Ad 0.30 11.111 27.778 44.444 14.103 0.000
PM_20 GBLUP 0.50 6.250 24.375 43.750 11.200 −5.128
PM_20 GBLUP_Ad 0.50 12.500 23.125 31.250 7.247 0.000
PM_20 GBLUP 0.70 8.696 27.391 43.478 12.971 −12.698
PM_20 GBLUP_Ad 0.70 17.391 23.913 34.783 5.519 0.000

Table A2. Comparative performance of genomic prediction models in terms of Pearson's correlation (COR) (A), and Percentage of Matching in top 10% (PM_10) (B) and top 20% (PM_20) (C) for the TPE_2_2021_2022 dataset under random cross-validation. Tst denotes the proportion of testing set.

Metric Model Tst Min Mean Max Sd RE (%)
COR GBLUP 0.15 −0.419 0.024 0.343 0.234 −718.437
COR GBLUP_Ad 0.15 −0.464 −0.148 0.135 0.212 0.000
COR GBLUP 0.30 −0.510 −0.166 0.024 0.181 −21.570
COR GBLUP_Ad 0.30 −0.335 −0.130 0.046 0.124 0.000
COR GBLUP 0.50 −0.200 −0.016 0.177 0.115 809.800
COR GBLUP_Ad 0.50 −0.271 −0.148 −0.087 0.057 0.000
COR GBLUP 0.70 −0.181 0.081 0.361 0.159 −340.741
COR GBLUP_Ad 0.70 −0.264 −0.194 −0.107 0.046 0.000
PM_10 GBLUP 0.15 0.000 5.000 50.000 15.811 −100.000
PM_10 GBLUP_Ad 0.15 0.000 0.000 0.000 0.000 NA
PM_10 GBLUP 0.30 0.000 7.500 25.000 12.076 −100.000
PM_10 GBLUP_Ad 0.30 0.000 0.000 0.000 0.000 NA
PM_10 GBLUP 0.50 0.000 12.500 25.000 8.333 −90.000
PM_10 GBLUP_Ad 0.50 0.000 1.250 12.500 3.953 0.000
PM_10 GBLUP 0.70 0.000 13.636 54.545 16.177 −100.000
PM_10 GBLUP_Ad 0.70 0.000 0.000 0.000 0.000 NA
PM_20 GBLUP 0.15 0.000 17.500 50.000 20.582 −57.143
PM_20 GBLUP_Ad 0.15 0.000 7.500 50.000 16.874 0.000
PM_20 GBLUP 0.30 0.000 16.667 44.444 14.103 −80.000
PM_20 GBLUP_Ad 0.30 0.000 3.333 22.222 7.499 0.000
PM_20 GBLUP 0.50 0.000 21.250 37.500 12.569 −76.471
PM_20 GBLUP_Ad 0.50 0.000 5.000 12.500 3.953 0.000
PM_20 GBLUP 0.70 13.043 28.696 47.826 12.332 −84.848
PM_20 GBLUP_Ad 0.70 0.000 4.348 8.696 2.899 0.000
Table A3. Comparative performance of genomic prediction models in terms of Pearson's correlation (COR) (A), and Percentage of Matching in top 10% (PM_10) (B) and top 20% (PM_20) (C) for the TPE_3_2022_2023 dataset under random cross-validation. Tst denotes the proportion of testing set.

Metric Model Tst Min Mean Max Sd RE (%)
COR GBLUP 0.15 −0.366 0.073 0.364 0.236 519.809
COR GBLUP_Ad 0.15 0.335 0.455 0.677 0.104 0.000
COR GBLUP 0.30 −0.404 0.018 0.436 0.263 2501.594
COR GBLUP_Ad 0.30 0.378 0.481 0.640 0.072 0.000
COR GBLUP 0.50 −0.285 0.031 0.285 0.193 1284.182
COR GBLUP_Ad 0.50 0.336 0.425 0.486 0.044 0.000
COR GBLUP 0.70 −0.366 −0.029 0.274 0.196 −1522.158
COR GBLUP_Ad 0.70 0.372 0.418 0.476 0.029 0.000
PM_10 GBLUP 0.15 0.000 20.000 50.000 25.820 50.000
PM_10 GBLUP_Ad 0.15 0.000 30.000 50.000 25.820 0.000
PM_10 GBLUP 0.30 0.000 5.000 25.000 10.541 750.000
PM_10 GBLUP_Ad 0.30 25.000 42.500 75.000 16.874 0.000
PM_10 GBLUP 0.50 0.000 17.500 37.500 12.076 92.857
PM_10 GBLUP_Ad 0.50 12.500 33.750 62.500 14.494 0.000
PM_10 GBLUP 0.70 0.000 12.727 27.273 10.671 171.429
PM_10 GBLUP_Ad 0.70 18.182 34.545 54.545 11.175 0.000
PM_20 GBLUP 0.15 0.000 20.000 50.000 15.811 100.000
PM_20 GBLUP_Ad 0.15 0.000 40.000 75.000 21.082 0.000
PM_20 GBLUP 0.30 0.000 20.000 44.444 17.213 122.222
PM_20 GBLUP_Ad 0.30 33.333 44.444 66.667 9.072 0.000
PM_20 GBLUP 0.50 18.750 28.125 43.750 9.433 55.556
PM_20 GBLUP_Ad 0.50 31.250 43.750 62.500 10.206 0.000
PM_20 GBLUP 0.70 4.348 20.435 30.435 8.946 131.915
PM_20 GBLUP_Ad 0.70 39.130 47.391 60.870 8.056 0.000

Table A4. Comparative performance of genomic prediction models in terms of Pearson's correlation (COR) (A), and Percentage of Matching in top 10% (PM_10) (B) and top 20% (PM_20) (C) for the across data under random cross-validation. Tst denotes the proportion of testing set.
Metric Model Tst Min Mean Max Sd RE (%)
COR GBLUP 0.15 −0.591 −0.001 0.618 0.243 18900
COR GBLUP_Ad 0.15 −0.464 0.190 0.677 0.271 0.000
COR GBLUP 0.30 −0.510 −0.052 0.436 0.214 219.23
COR GBLUP_Ad 0.30 −0.335 0.166 0.655 0.245 0.000
COR GBLUP 0.50 −0.357 0.032 0.390 0.165 328.125
COR GBLUP_Ad 0.50 −0.271 0.137 0.486 0.199 0.000
COR GBLUP 0.70 −0.385 −0.004 0.391 0.186 2950
COR GBLUP_Ad 0.70 −0.264 0.122 0.476 0.194 0.000
PM_10 GBLUP 0.15 0.000 7.500 50.000 18.004 100.000
PM_10 GBLUP_Ad 0.15 0.000 15.000 100.000 28.074 0.000
PM_10 GBLUP 0.30 0.000 8.750 50.000 13.413 69.841
PM_10 GBLUP_Ad 0.30 0.000 14.861 75.000 19.951 0.000
PM_10 GBLUP 0.50 0.000 12.667 50.000 12.186 18.421
PM_10 GBLUP_Ad 0.50 0.000 15.000 62.500 15.404 0.000
PM_10 GBLUP 0.70 0.000 10.909 54.545 10.900 19.444
PM_10 GBLUP_Ad 0.70 0.000 13.030 54.545 14.353 0.000
PM_20 GBLUP 0.15 0.000 14.167 75.000 17.847 82.353
PM_20 GBLUP_Ad 0.15 0.000 25.833 100.000 24.390 0.000
PM_20 GBLUP 0.30 0.000 17.222 44.444 14.014 61.828
PM_20 GBLUP_Ad 0.30 0.000 27.870 66.667 18.679 0.000
PM_20 GBLUP 0.50 0.000 22.045 43.750 10.769 20.790
PM_20 GBLUP_Ad 0.50 0.000 26.629 62.500 14.212 0.000
PM_20 GBLUP 0.70 0.000 20.995 47.826 11.963 25.817
PM_20 GBLUP_Ad 0.70 0.000 26.415 60.870 13.894 0.000

Appendix B

Appendix B.1. TPE_1_2022_2023

Figure A1. Comparative performance of genomic prediction models in terms of Pearson correlation (COR) (A), and percentage of agreement in the top 10% (PM_10) (B) and top 20% (PM_20) (C) for TPE_1_2022_2023, using random cross-validation. Tst denotes the proportion of testing set. For each metric (COR, PM_10, PM_20), standard errors were calculated across the 10 cross-validation folds. These error bars provide an estimate of variability and aid in the interpretation of model stability across replicates.

Appendix B.2. TPE_2_2022_2023

Figure A2.
Comparative performance of genomic prediction models in terms of Pearson correlation (COR) (A), and percentage of agreement in the top 10% (PM_10) (B) and top 20% (PM_20) (C) for TPE_2_2022_2023, using random cross-validation. Tst denotes the proportion of testing set. For each metric (COR, PM_10, PM_20), standard errors were calculated across the 10 cross-validation folds. These error bars provide an estimate of variability and aid in the interpretation of model stability across replicates.

Appendix B.3. TPE_3_2021_2022

Figure A3. Comparative performance of genomic prediction models in terms of Pearson correlation (COR) (A), and percentage of agreement in the top 10% (PM_10) (B) and top 20% (PM_20) (C) for TPE_3_2021_2022, using random cross-validation. Tst denotes the proportion of testing set. For each metric (COR, PM_10, PM_20), standard errors were calculated across the 10 cross-validation folds. These error bars provide an estimate of variability and aid in the interpretation of model stability across replicates.

Appendix C

Table A5. Comparative performance of genomic prediction models in terms of Pearson's correlation (COR) (A), and Percentage of Matching in top 10% (PM_10) (B) and top 20% (PM_20) (C) for TPE_1_2022_2023, TPE_2_2022_2023 and TPE_3_2021_2022 datasets under random cross-validation. Tst denotes the proportion of testing set.
Data Metric Model Tst Min Mean Max Sd RE (%)
TPE_1_2022_2023 COR GBLUP 0.15 −0.20 0.06 0.22 0.14 224.83
TPE_1_2022_2023 COR GBLUP_Ad 0.15 −0.07 0.20 0.51 0.18 0.00
TPE_1_2022_2023 COR GBLUP 0.30 −0.37 −0.09 0.27 0.21 −302.93
TPE_1_2022_2023 COR GBLUP_Ad 0.30 0.09 0.19 0.32 0.07 0.00
TPE_1_2022_2023 COR GBLUP 0.50 −0.07 0.12 0.25 0.11 15.11
TPE_1_2022_2023 COR GBLUP_Ad 0.50 −0.08 0.14 0.29 0.12 0.00
TPE_1_2022_2023 COR GBLUP 0.70 −0.29 −0.03 0.33 0.19 −620.93
TPE_1_2022_2023 COR GBLUP_Ad 0.70 0.06 0.15 0.22 0.05 0.00
TPE_1_2022_2023 PM_10 GBLUP 0.15 0.00 10.00 50.00 21.08 50.00
TPE_1_2022_2023 PM_10 GBLUP_Ad 0.15 0.00 15.00 50.00 24.15 0.00
TPE_1_2022_2023 PM_10 GBLUP 0.30 0.00 7.50 50.00 16.87 0.00
TPE_1_2022_2023 PM_10 GBLUP_Ad 0.30 0.00 7.50 25.00 12.08 0.00
TPE_1_2022_2023 PM_10 GBLUP 0.50 0.00 12.50 25.00 8.33 −30.00
TPE_1_2022_2023 PM_10 GBLUP_Ad 0.50 0.00 8.75 12.50 6.04 0.00
TPE_1_2022_2023 PM_10 GBLUP 0.70 0.00 10.91 36.36 11.18 −58.33
TPE_1_2022_2023 PM_10 GBLUP_Ad 0.70 0.00 4.55 9.09 4.79 0.00
TPE_1_2022_2023 PM_20 GBLUP 0.15 0.00 12.50 25.00 13.18 120.00
TPE_1_2022_2023 PM_20 GBLUP_Ad 0.15 0.00 27.50 50.00 14.19 0.00
TPE_1_2022_2023 PM_20 GBLUP 0.30 0.00 14.44 33.33 12.88 69.23
TPE_1_2022_2023 PM_20 GBLUP_Ad 0.30 11.11 24.44 33.33 8.76 0.00
TPE_1_2022_2023 PM_20 GBLUP 0.50 6.25 21.88 43.75 10.31 17.14
TPE_1_2022_2023 PM_20 GBLUP_Ad 0.50 12.50 25.63 37.50 8.56 0.00
TPE_1_2022_2023 PM_20 GBLUP 0.70 8.70 23.04 47.83 13.29 11.32
TPE_1_2022_2023 PM_20 GBLUP_Ad 0.70 17.39 25.65 30.43 5.21 0.00
TPE_2_2022_2023 COR GBLUP 0.15 −0.59 −0.09 0.48 0.31 −225.53
TPE_2_2022_2023 COR GBLUP_Ad 0.15 −0.42 0.11 0.35 0.27 0.00
TPE_2_2022_2023 COR GBLUP 0.30 −0.20 0.01 0.17 0.11 −659.51
TPE_2_2022_2023 COR GBLUP_Ad 0.30 −0.28 −0.04 0.17 0.14 0.00
TPE_2_2022_2023 COR GBLUP 0.50 −0.21 −0.03 0.16 0.10 −75.61
TPE_2_2022_2023 COR GBLUP_Ad 0.50 −0.11 −0.01 0.18 0.08 0.00
TPE_2_2022_2023 COR GBLUP 0.70 −0.39 −0.12 0.04 0.13 −130.06
TPE_2_2022_2023 COR GBLUP_Ad 0.70 −0.05 0.04 0.14 0.06 0.00
TPE_2_2022_2023 PM_10 GBLUP 0.15 0.00 5.00 50.00 15.81 100.00
TPE_2_2022_2023 PM_10 GBLUP_Ad 0.15 0.00 10.00 50.00 21.08 0.00
TPE_2_2022_2023 PM_10 GBLUP 0.30 0.00 15.00 25.00 12.91 −33.33
TPE_2_2022_2023 PM_10 GBLUP_Ad 0.30 0.00 10.00 25.00 12.91 0.00
TPE_2_2022_2023 PM_10 GBLUP 0.50 0.00 13.75 37.50 13.76 −18.18
TPE_2_2022_2023 PM_10 GBLUP_Ad 0.50 0.00 11.25 25.00 9.22 0.00
TPE_2_2022_2023 PM_10 GBLUP 0.70 0.00 2.73 9.09 4.39 533.33
TPE_2_2022_2023 PM_10 GBLUP_Ad 0.70 0.00 17.27 36.36 10.88 0.00
TPE_2_2022_2023 PM_20 GBLUP 0.15 0.00 17.50 75.00 23.72 42.86
TPE_2_2022_2023 PM_20 GBLUP_Ad 0.15 0.00 25.00 75.00 28.87 0.00
TPE_2_2022_2023 PM_20 GBLUP 0.30 0.00 21.11 44.44 14.30 5.26
TPE_2_2022_2023 PM_20 GBLUP_Ad 0.30 0.00 22.22 44.44 16.56 0.00
TPE_2_2022_2023 PM_20 GBLUP 0.50 6.25 19.38 31.25 9.06 29.03
TPE_2_2022_2023 PM_20 GBLUP_Ad 0.50 18.75 25.00 31.25 5.10 0.00
TPE_2_2022_2023 PM_20 GBLUP 0.70 4.35 11.74 21.74 6.17 125.93
TPE_2_2022_2023 PM_20 GBLUP_Ad 0.70 17.39 26.52 34.78 5.96 0.00
TPE_3_2021_2022 COR GBLUP 0.15 −0.43 −0.06 0.28 0.20 −640.49
TPE_3_2021_2022 COR GBLUP_Ad 0.15 −0.08 0.35 0.66 0.22 0.00
TPE_3_2021_2022 COR GBLUP 0.30 −0.46 −0.03 0.31 0.29 −1220.66
TPE_3_2021_2022 COR GBLUP_Ad 0.30 0.03 0.36 0.66 0.19 0.00
TPE_3_2021_2022 COR GBLUP 0.50 −0.36 −0.02 0.24 0.22 −1508.08
TPE_3_2021_2022 COR GBLUP_Ad 0.50 0.18 0.26 0.40 0.08 0.00
TPE_3_2021_2022 COR GBLUP 0.70 −0.20 −0.01 0.28 0.17 −1825.50
TPE_3_2021_2022 COR GBLUP_Ad 0.70 0.13 0.22 0.33 0.07 0.00
TPE_3_2021_2022 PM_10 GBLUP 0.15 0.00 0.00 0.00 0.00 Inf
TPE_3_2021_2022 PM_10 GBLUP_Ad 0.15 0.00 30.00 100.00 48.30 0.00
TPE_3_2021_2022 PM_10 GBLUP 0.30 0.00 10.00 33.33 16.10 166.67
TPE_3_2021_2022 PM_10 GBLUP_Ad 0.30 0.00 26.67 66.67 21.08 0.00
TPE_3_2021_2022 PM_10 GBLUP 0.50 0.00 6.00 20.00 9.66 400.00
TPE_3_2021_2022 PM_10 GBLUP_Ad 0.50 20.00 30.00 40.00 10.54 0.00
TPE_3_2021_2022 PM_10 GBLUP 0.70 0.00 10.00 28.57 9.64 100.00
TPE_3_2021_2022 PM_10 GBLUP_Ad 0.70 14.29 20.00 28.57 7.38 0.00
TPE_3_2021_2022 PM_20 GBLUP 0.15 0.00 10.00 33.33 16.10 200.00
TPE_3_2021_2022 PM_20 GBLUP_Ad 0.15 0.00 30.00 100.00 33.15 0.00
TPE_3_2021_2022 PM_20 GBLUP 0.30 0.00 13.33 33.33 13.15 237.50
TPE_3_2021_2022 PM_20 GBLUP_Ad 0.30 16.67 45.00 66.67 15.81 0.00
TPE_3_2021_2022 PM_20 GBLUP 0.50 0.00 17.27 36.36 10.88 115.79
TPE_3_2021_2022 PM_20 GBLUP_Ad 0.50 18.18 37.27 45.45 7.96 0.00
TPE_3_2021_2022 PM_20 GBLUP 0.70 0.00 14.67 26.67 8.20 109.09
TPE_3_2021_2022 PM_20 GBLUP_Ad 0.70 20.00 30.67 40.00 6.44 0.00

References

1. Werner, C.R.; Zaman-Allah, M.; Assefa, T.; Cairns, J.E.; Atlin, G.N. Accelerating genetic gain through early-stage on-farm sparse testing. Trends Plant Sci. 2025, 30, 17–20. [CrossRef] [PubMed]
2. Varshney, R.K.; Roorkiwal, M.; Sorrells, M.E. Genomic Selection for Crop Improvement: Current Status and Prospects. In Frontiers in Genetics; Springer International Publishing: Cham, Switzerland, 2021; pp. 1–10. [CrossRef]
3. Meuwissen, T.H.; Hayes, B.J.; Goddard, M.E. Prediction of total genetic value using genome-wide dense marker maps. Genetics 2001, 157, 1819–1829. [CrossRef] [PubMed]
4. Heffner, E.L.; Lorenz, A.J.; Jannink, J.-L.; Sorrells, M.E. Plant Breeding with Genomic Selection: Gain per Unit Time and Cost. Crop Sci. 2010, 50, 1681–1690. [CrossRef]
5. Jarquín, D.; Crossa, J.; Lacaze, X.; Cheyron, P.H.; Daucourt, J.; Lorgeou, J.; Burgueño, J. A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor. Appl. Genet. 2014, 127, 595–607. [CrossRef] [PubMed]
6. Sandhu, K.S.; Lozada, D.N.; Zhang, Z.; Belamkar, V. Deep learning for predicting complex traits in spring wheat. Front. Plant Sci. 2021, 12, 634909. [CrossRef]
7. Burgueño, J.; de los Campos, G.; Weigel, K.; Crossa, J.
Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Sci. 2012, 52, 707–719. [CrossRef]
8. Montesinos-López, O.A.; Vitale, P.; Gerard, G.; Crespo-Herrera, L.; Saint Pierre, C.; Montesinos-López, A.; Crossa, J. Genotype Performance Estimation in Targeted Production Environments by Using Sparse Genomic Prediction. Plants 2024, 13, 3059. [CrossRef] [PubMed]
9. Goddard, M.E.; Hayes, B.J.; Meuwissen, T.H. Using the genomic relationship matrix to predict the accuracy of genomic selection. J. Anim. Breed. Genet. 2011, 128, 409–421. [PubMed]
10. Pérez, P.; de los Campos, G. BGLR: A statistical package for whole genome regression and prediction. Genetics 2014, 198, 483–495. [CrossRef] [PubMed]
11. Bengio, Y.; Grandvalet, Y. No unbiased estimator of the variance of k-fold cross-validation. J. Mach. Learn. Res. 2004, 5, 1089–1105. Available online: https://www.jmlr.org/papers/volume5/grandvalet04a/grandvalet04a.pdf (accessed on 1 December 2004).
12. Varma, S.; Simon, R. Bias in error estimation when using cross-validation for model selection. BMC Bioinform. 2006, 7, 91. [CrossRef] [PubMed]
13. de los Campos, G.; Sorensen, D. Genomic heritability: What is it? PLoS Genet. 2018, 14, e1007209. [CrossRef] [PubMed]
14. Cooper, M.; Hammer, G.L.; Messina, C.D. Modeling plant adaptation and breeding for drought-prone environments. Theor. Appl. Genet. 2014, 127, 713–733. [CrossRef]
15. Millet, E.J.; Welcker, C.; Kruijer, W.; Negro, S.; Nicolas, S.D.; Praud, S.; Tardieu, F. Genome-by-environment interactions to dissect candidate genes for drought tolerance in maize. Plant Cell Environ. 2019, 42, 1838–1856. [CrossRef]
16. Taïbi, K.; del Campo, A.D.; Aguado, A.; Mulet, J.M. The effect of genotype by environment interaction, phenotypic plasticity and adaptation on Pinus halepensis reforestation establishment under expected climate drifts. Ecol. Eng. 2015, 84, 218–228. [CrossRef]
17. VanRaden, P.M. Efficient methods to compute genomic predictions. J. Dairy Sci. 2008, 91, 4414–4423. [CrossRef] [PubMed]

Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.