Remote Sensing Applications: Society and Environment 27 (2022) 100782
journal homepage: www.elsevier.com/locate/rsase

Estimation of soybean grain yield from multispectral high-resolution UAV data with machine learning models in West Africa

Tunrayo R. Alabi a,*, Abush T. Abebe a, Godfree Chigeza b, Kayode R. Fowobaje a
a International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
b International Institute of Tropical Agriculture (IITA), Lusaka, Zambia
* Corresponding author. E-mail address: t.alabi@cgiar.org (T.R. Alabi).

https://doi.org/10.1016/j.rsase.2022.100782
Received 12 October 2021; Received in revised form 12 May 2022; Accepted 20 May 2022; Available online 25 May 2022
2352-9385/© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

ARTICLE INFO

Keywords: Soybean; Gray level co-occurrence matrix (GLCM); Yield prediction; Machine learning models; Vegetation indices

ABSTRACT

Soybean (Glycine max (L.) Merr.) is a leguminous oil crop of rapidly growing importance in Africa, following the increasing demand for oil and for livestock and poultry feed in sub-Saharan Africa. However, soybean productivity is low in most countries of sub-Saharan Africa, especially in West Africa, where productivity is below one ton per ha. Hence, concerted soybean varietal development and testing efforts have been underway at the International Institute of Tropical Agriculture (IITA), in collaboration with various African and US-based soybean breeding programs. Integrating new varietal evaluation approaches based on advanced phenotyping techniques into IITA’s soybean breeding program is crucial for designing efficient crop genetic improvement techniques. Hence, this work investigates machine learning (ML) models and unmanned aerial vehicles (UAVs) as aids to a rapid, high-throughput phenotyping workflow for soybean yield estimation. We acquired multispectral images with a Sequoia® camera aboard a senseFly eBee X UAV from five variety trials during the 2020 growing season in Nigeria. UAV-based spectral bands, canopy height, vegetation indices (VI), and texture features generated by the gray level co-occurrence matrix (GLCM) were integrated to predict crop grain yield using five ML regression models: Cubist, Extreme Gradient Boosting (XGBoost), Stochastic Gradient Boosting (GBM), Support Vector Machine (SVM), and Random Forest (RF). The main finding is that the textural information generated using GLCM slightly outperformed predictors based mainly on VI and provides a promising alternative to the conventional use of VI in crop yield estimation. All five ML models performed moderately well in predicting grain yield for all the soybean trials investigated, though the Cubist and RF models stood out, with R² reaching 0.89. The study provides a framework for assessing crop breeding trials more effectively and consistently at high spatial scales, an approach that African crop breeding programs have not commonly applied. The workflow can also be modified and applied for high-throughput phenotyping of breeding platforms in other crops.

1. Introduction

Soybean (Glycine max (L.) Merr.) has enormous economic benefits in African smallholder farming systems because of its high nutritional value, its use as raw material for edible oil processing factories, and its soil-enriching potential through symbiotic N2 fixation (Sinclair et al., 2014).
Soybean seeds are high in protein and therefore have excellent nutritional value for the low-income population of the developing world who cannot afford animal-source proteins. Soybean is a crop recently introduced into African farming systems, with few high-yielding varieties available, especially for smallholder farmers. Smallholder farmers commonly obtain yields lower than the global average (FAO, 2021; Khojely et al., 2018) and lower than those of soybean-producing Western countries (Diers and Scaboo, 2019; Santos, 2019). Therefore, high-yielding soybean varieties are of paramount importance for ensuring nutritional security and diversifying farmers’ on-farm income sources. Thus, the International Institute of Tropical Agriculture soybean breeding program (IITASBP) has been working to develop new, improved soybean varieties for sub-Saharan Africa (SSA). IITASBP employs hybridization and development of breeding lines, together with the introduction of exotic germplasm from various countries, to evaluate and identify high-yielding and well-adapted soybean varieties. The program has played a substantial role in providing advanced soybean breeding lines for the national soybean breeding programs in SSA countries, including Nigeria, which led to the release of several IITA varieties, most of which are still under production by farmers (Chigeza et al., 2019). As part of this effort, the IITASBP implemented five trials in the 2020 cropping season at Ibadan, Nigeria: two advanced yield trials, namely the Advanced Variety Trial Early Set (AVTES) and Medium Set (AVTMS); one Pan-African Variety Trial (PAVT); and two preliminary variety trials (PVT), namely Set 01 (PVT01) and Set 02 (PVT02). In crop improvement programs, breeders evaluate breeding lines for high yield and for resistance to diseases and abiotic stresses.
Plant phenotyping involves evaluating various plant attributes, such as biophysical properties, leaf arrangement, and biochemical traits, to identify critical determinants of yield and growth parameters (Johansen et al., 2019; Yang et al., 2017). Obtaining good-quality phenotypic data from field trials is still a constraint to enhancing the efficiency of breeding programs (Singh et al., 2016). Several recent studies have reported the shortcomings of traditional techniques for obtaining crop traits, such as yield, leaf color, aboveground biomass, and chlorophyll content (Fukano et al., 2021; Yang et al., 2017). Conventional phenotyping based on manual sampling is costly, labor-intensive, and destructive. Moreover, the standard procedure of manual phenotyping often involves visual assessment, which can introduce subjectivity into data collection, thereby limiting accuracy and capacity (Sankaran et al., 2018). In contrast, digital phenotyping methods can consistently and rapidly acquire extensive data that are impracticable to collect with manual measurements.

Remote sensing technologies have been employed in many applications to gather consistent, non-destructive agricultural data in near real time (Chang et al., 2021; Johansen et al., 2019). Unfortunately, satellite-based imagery products have some inherent disadvantages for detailed plant- or plot-level assessment of breeding trials, such as low spatial resolution, sensitivity to cloud cover, and limited data acquisition frequency (Chang et al., 2021). In the recent literature, Unmanned Aerial Vehicle (UAV) technologies have been shown to have great potential to address the limitations of traditional field assessment and conventional satellite remote sensing in measuring crop traits in breeding programs (Chang et al., 2017).
The UAV approach offers high spectral resolution images over time and space, containing more detailed canopy and other phenological features than conventional satellite products (Sagan et al., 2019; Sidike et al., 2018). It also provides new opportunities to acquire vast, consistent phenotyping data at higher spatio-temporal resolution. UAV systems and machine learning-based high-throughput phenotyping of plants are paramount in developing high-precision and low-cost crop genetic programs. Recent progress in multispectral imaging and drone technology presents a cheaper platform for obtaining high-precision phenotyping data (Araus et al., 2018; Maimaitijiang et al., 2020). Several studies reported that UAV-acquired datasets, such as spectral, textural, and structural features, have successfully predicted various plant traits, such as grain yield, biomass, plant density, emergence, and senescence (Hassan et al., 2019; Malambo et al., 2018; Randelović et al., 2020; Roth and Streit, 2018). Recently, Chang et al. (2021) employed UAV-based high-throughput phenotyping methods to accurately estimate tomato yield, while Johansen et al. (2019) used the same approach to evaluate salinity stress tolerance in wild tomato cultivars. Many other crops, such as maize, wheat, sorghum, and dry bean, have been assessed successfully for some of their phenotypic parameters using drone-based high-throughput phenotyping. Makanza et al. (2018) used UAV-based red-green-blue (RGB) imagery to appraise crop biomass and crop senescence in a maize trial and found moderately high performance in predicting both traits. Additional examples of UAV-based high-throughput phenotyping include evaluating dry bean responses to drought stress and nitrogen deficit (Sankaran et al., 2018); genomic prediction modeling of sorghum plant height (Watanabe et al., 2017); and discriminating the vigor of different barley genotypes (Di Gennaro et al., 2017).
Furthermore, several studies that used UAV data for soybean trait prediction and modeling are now available. Toda et al. (2021) reported proper determination of genetic variation of soybean for growth parameters using UAV-generated data for genomic prediction of soybean biomass. Fukano et al. (2021) also reported a significant beneficial effect of soybean cultivar traits on wheat yield, examining the benefits of soybean-wheat rotation based on UAV-derived vegetative indices. Maimaitijiang et al. (2020) reported a successful soybean yield estimation based on canopy spectral information obtained from UAV images from an experimental site in Columbia, Missouri, USA. Additionally, Herrero-Huerta et al. (2020) employed an array of vegetation indices (VI) and structural features to model soybean productivity. Furthermore, including GLCM features can boost prediction or classification accuracy (Iqbal et al., 2021; Kwak and Park, 2019; Räsänen and Virtanen, 2019).

Machine learning (ML) algorithms combined with UAV data have shown outstanding achievement in estimating and modeling crop traits, such as yield, biomass, and height (Herrero-Huerta et al., 2020). These methods employ advanced statistical techniques to model complex non-linear functions between spectral information and biophysical features. For example, Maimaitijiang et al. (2020) effectively used an array of regression ML models to estimate soybean grain yield in the humid climates of the USA. Equally, Eugenio et al. (2020) used the Multi-Layer Perceptron algorithm to predict soybean yield in the soybean-growing region of Brazil and obtained satisfactory results. Moreover, Herrero-Huerta et al. (2020) accurately estimated the grain yield of a soybean trial at a site in Indiana, USA, using eXtreme Gradient Boosting (XGBoost) and Random Forest regression models. However, few studies have used ML models, UAV-derived vegetation indices, and textural features to estimate soybean yield for rapid phenotypic pipelines within African farming systems and soybean breeding programs. Thus, there is a critical and urgent need to adopt spectral vegetative indices and texture derived from UAV sensors in predicting crop phenotypes to modernize the soybean breeding program of IITA, with implications for enhancing the efficiency of breeding programs for other crops and countries.

Therefore, this study aims to assess the use of UAV-derived vegetation indices (VI) and texture information from GLCM, in combination with structural height, to predict soybean yield as an aid to a rapid high-throughput phenotyping workflow.

2. Materials and methods

2.1. Description of the study area

The soybean yield trials were carried out at the International Institute of Tropical Agriculture (IITA) experimental research station in Ibadan (Latitude 07.5°N, Longitude 003.9°E), Oyo State, Nigeria (Fig. 1). IITA is a not-for-profit organization that has been committed to agricultural research for food security in Africa for more than 50 years. About half of the 1000-ha research station is primarily forest, while about a third of the station comprises agricultural experiment fields. Researchers grow cassava, maize, cowpea, banana, and soybean crops for breeding or agronomic trials. The mean annual rainfall of the experimental station is 1370 mm, with minimum and maximum temperatures of 22.1 and 31.5 °C, respectively. The soil type of the breeding trial site is mainly Ferric Luvisols, with a soil pH between 6 and 6.5 (Oladoye, 2015). Soil texture is sandy loam with relatively low water holding capacity.

2.2. Experimental setup

Five soybean yield trials were conducted at the experimental site used for this study (Fig. 1) during the rainy season of 2020, comprising two advanced trials, two preliminary trials, and one additional variety trial. The advanced trials were the Advanced Variety Trial-Early Set (AVTES) and the Advanced Variety Trial-Medium Set (AVTMS). The preliminary trials were the Preliminary Variety Trial Set-01 (PVT01) and the Preliminary Variety Trial Set-02 (PVT02). The additional trial was the Pan African Variety Trial (PAVT). The AVTES trial consisted of 16 genotypes, including the standard check varieties. The genotypes in this trial were specially selected for early maturity, with days to maturity ranging between 90 and 100 days. The AVTMS consisted of 45 soybean genotypes mainly selected for medium maturity, with days to maturity varying between 100 and 120 days. The PVT01 and PVT02 consisted of 50 and 60 entries, respectively, combining early and medium maturing genotypes with the standard checks. The genotypes that performed best at this stage were promoted to advanced yield trials in 2021. The PAVT conducted in Nigeria in the 2020 cropping season was composed of 45 soybean varieties that are the best performing and registered in different African countries, including the standard checks recently registered in the country. The PAVT is a variety testing network of 59 public- and private-sector partners from 24 countries across 113 locations, jointly coordinated by the Feed the Future Soybean Innovation Lab (SIL) and the IITA soybean breeding program. Each trial used in this study was laid out in an alpha lattice design with three replications and planted with a standard soybean plot size of four rows of 4 m in length, with inter- and intra-row spacings of 50 cm and 5 cm, respectively (Fig. 2e and f). The inorganic fertilizers N, P, K, and S were applied at 20, 12, 100, and 15 kg ha⁻¹, respectively.
All the genotypes evaluated in the five trials were treated with a commercial Rhizobia inoculum called Nodumax (a product of the IITA Business Incubation Platform, IITA-BIP) at the recommended rate of 100 g of inoculum for 10 kg of seed.

Fig. 1. UAV image showing the soybean plot area (pink border), IITA Ibadan, Oyo State, Nigeria. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

2.3. UAV image collection

This study used a senseFly eBee X fixed-wing drone with a 116 cm wingspan, weighing 1.1 to 1.4 kg (Fig. 2d). The drone carried a Parrot Sequoia multispectral camera integrated with an RGB camera (Fig. 2c). The Sequoia sensor is a self-calibrating radiometric system with four bands (near-infrared (NIR): 770–810 nm; red: 640–680 nm; green: 530–570 nm; red edge: 730–740 nm). Furthermore, it uses a sunshine sensor that synchronizes brightness values with the Inertial Measurement Unit (IMU) and onboard GPS. In addition, we employed a senseFly GeoBase station (Fig. 2b) that enables high-precision positioning during every flight. The GeoBase system helped achieve an accuracy of about 3 cm without Ground Control Points (GCPs). Flights were planned with the eMotion software. The flight missions were performed twice during the soybean growing season: on 21 September 2020 (about 60 days after planting (DAP), at the start of flowering, R1 stage) and on 27 October 2020 (95 DAP, at the commencement of pod setting, R6 stage). The flights took place between 11:00 a.m. and 2:00 p.m. local time in clear weather conditions. Lateral overlap was set at 60% and longitudinal overlap at 80%, as recommended for optimal UAV image overlap (Böhler et al., 2018).

2.4. UAV image processing

After each flight, the eMotion software was used to carry out post-flight processing. Pix4D mapper version 4.6.4 was used to generate orthomosaic images. Capturing accurate geo-referenced information for multispectral images is a significant advancement incorporated into the Parrot Sequoia camera. Pix4D processed the orthomosaic images through geo-referencing, camera alignment, dense point cloud development, and digital surface model (DSM) and digital terrain model (DTM) generation (Maimaitijiang et al., 2020). Moreover, to produce accurate orthomosaic images, Pix4D mapper automatically computed GCPs by matching the tie-point positions of images. Furthermore, it performed radiometric calibration using vignetting correction, lens distortion information, and the irradiance values obtained by the sunshine sensor during the flight. All multispectral reflectance bands were generated at a spatial resolution of 12 cm and RGB mosaic images at 2.6 cm.

Fig. 2. (a) Soybean plots, (b) senseFly GeoBase, (c) Parrot Sequoia camera, (d) senseFly eBee X fixed-wing drone, (e & f) PVT02, AVTMS, and AVTES field layout showing plot no. and replication.

2.5. Texture and vegetation indices extraction

2.5.1. Canopy texture information

Many crop classification and trait prediction studies use textural features as input layers for machine learning models. These features help reduce the effect of pixel noise and give valuable information about a spatial object’s architectural arrangement, color, and intensities (Iqbal et al., 2021). The gray level co-occurrence matrix (GLCM), proposed by Haralick et al. (1973), is a well-known statistical technique for handling remotely sensed data to classify land cover and model vegetation structure (Kwak and Park, 2019). Haralick et al.
(1973) identified fourteen texture features from an image object, some of which are correlated and therefore carry redundant information. Hence, in this study, seven commonly used GLCM features were considered: (1) mean, (2) standard deviation, (3) homogeneity, (4) dissimilarity, (5) entropy, (6) angular second moment, and (7) variance. The seven texture parameters used in the present work are well described in Haralick et al. (1973) and Kwak and Park (2019). We used the “glcm” package in R to calculate the GLCM metrics (Zvoleff, 2020). Texture parameters obtained from the NIR, Rededge, Red, and Green reflectance bands were the input variables for soybean yield prediction.

2.5.2. Spectral vegetative indices

In recent literature, vegetative indices (VIs) have been used in UAV-based crop phenotyping workflows in addition to canopy texture information (Randelović et al., 2020). These VIs are indicators of the rate of photosynthesis, chlorophyll level, leaf area index (LAI), and green biomass. They are also commonly used to assess growth parameters and crop yield (Hassan et al., 2019). Twenty VIs were used in this study and are described briefly in Table 1. For the computation of VIs, we used the function “spectralIndices” from the RStoolbox package within R (Leutner et al., 2019; Suab and Avtar, 2020).

Table 1
Description of vegetation indices.

| Index | Description | Formula | Reference |
|---|---|---|---|
| CTVI | Corrected Transformed Vegetation Index | (NDVI + 0.5)/sqrt(ABS(NDVI + 0.5)) | Perry & Lautenschlager (1984) |
| DVI | Difference Vegetation Index | SL * NR - RD | Richardson & Weigand (1977) |
| GEMI | Global Environmental Monitoring Index | (((NR^2 - RD^2) * 2 + (NR * 1.5) + (RD * 0.5))/(NR + RD + 0.5)) * (1 - ((((NR^2 - RD^2) * 2 + (NR * 1.5) + (RD * 0.5))/(NR + RD + 0.5)) * 0.25)) - ((RD - 0.125)/(1 - RD)) | Pinty & Verstraete (1992) |
| GCVI | Green Chlorophyll Vegetation Index | (NR/GR) - 1 | Gitelson et al. (2005) |
| GNDVI | Green Normalized Difference Vegetation Index | (NR - GR)/(NR + GR) | Gitelson & Merzlyak (1998) |
| MCARI | Modified Chlorophyll Absorption Ratio Index | ((RE - RD) - (RE - GR)) * (RE/RD) | Daughtry et al. (2000) |
| MSAVI | Modified Soil Adjusted Vegetation Index | NR + 0.5 - (0.5 * sqrt((2 * NR + 1)^2 - 8 * (NR - (2 * RD)))) | Qi et al. (1994) |
| MSAVI2 | Modified Soil Adjusted Vegetation Index 2 | (2 * (NR + 1) - sqrt((2 * NR + 1)^2 - 8 * (NR - RD)))/2 | Qi et al. (1994) |
| NDRE | Normalized Difference Red Edge Index | (NR - RE)/(NR + RE) | Barnes et al. (2000) |
| NDVI | Normalized Difference Vegetation Index | (NR - RD)/(NR + RD) | Rouse et al. (1974) |
| NDWI | Normalized Difference Water Index | (GR - NR)/(GR + NR) | McFeeters (2007) |
| NRVI | Normalized Ratio Vegetation Index | (RD/NR - 1)/(RD/NR + 1) | Baret & Guyot (1991) |
| RECI | Red Edge Chlorophyll Index | (NR/RE) - 1 | Gitelson et al. (2003) |
| RVI | Ratio Vegetation Index | RD/NR | Jordan (1969) |
| SAVI | Soil Adjusted Vegetation Index | (NR - RD) * (1 + SB)/(NR + RD + SB) | Huete (1988) |
| SR | Simple Ratio Vegetation Index | NR/RD | Birth & McVey (1968) |
| TTVI | Thiam’s Transformed Vegetation Index | sqrt(ABS((NR - RD)/(NR + RD) + 0.5)) | Thiam (1997) |
| TVI | Transformed Vegetation Index | sqrt((NR - RD)/(NR + RD) + 0.5) | Deering et al. (1975) |
| WDVI | Weighted Difference Vegetation Index | NR - SL * RD | Richardson & Weigand (1977) |
| WDRVI | Wide Dynamic Range Vegetation Index | (SL * NR - RD)/(SL * NR + RD) | Gitelson (2004) |

NR = Near Infrared, GR = Green, RD = Red, RE = Rededge, SL = slope of the soil line, SB = soil brightness factor, ABS = absolute value, sqrt = square root.

2.5.3. Plot polygon generation and spectral feature extraction

Plot boundary polygons were generated by on-screen digitizing from the UAV RGB mosaic (Fig. 3). Pixel values of spectral bands, vegetative indices, canopy height, and textural features were extracted from the plot polygons. To avoid border effects caused by weeds or soil around each plot, we created a negative (inward) buffer of 0.7 m from the plot boundary using the ArcGIS 10.7 buffer tool (Fig. 4). Pixel values for testing and training were then obtained within the buffered polygons.

Fig. 3. Plot images of the five soybean trials acquired by the UAV on 21 September 2020.

2.6. Predictive model description

Five predictive machine learning models were used to estimate soybean yield from the five soybean variety trials. A brief description of these machine learning models is presented in the following sections.

2.6.1. Cubist

Cubist is a regression procedure commonly used for prediction (Quinlan and Quinlan, 1992). It works by building a tree structure and converting each path through the tree into a rule. Each rule is then associated with a regression model fitted to the data subset specified by that rule.
The different rules are trimmed or possibly merged, and the candidate variables for the linear regression procedure are the predictor variables used in the parts of the rule that were trimmed out. Tuning this model requires two parameters, namely committees and neighbors. The R package caret (caret::train() function) has bindings to find the optimal values of the parameters.

Fig. 4. Plot layout showing GNDVI and GCVI for September and October. White polygons show individual plot boundaries, while black polygons are 70 cm buffers within each plot to exclude borders in feature extraction.

2.6.2. Random forest (RF)

RF is a commonly employed ML model for classification and regression, initially introduced by Breiman (2001). As an ensemble ML algorithm, it constructs several independent decision trees for model fitting. For classification, the class with the most votes across trees becomes the outcome; for regression, the predictions of the individual trees are averaged. A certain number of input variables are selected randomly at each node, and the subsets are used to calculate the best model output (Herrero-Huerta et al., 2020).

2.6.3. Stochastic gradient boosting (GBM)

GBM is a widely used ML technique successfully implemented across many areas of modeling studies and involves resampling observations and features in each round (Freeman et al., 2015). RF constructs an ensemble of deep independent trees, while GBM creates an ensemble of shallow sequential trees, with each tree using the previous model estimations for improvement. The combination of these numerous weak consecutive predictions often produces a robust model. The main principle of stochastic gradient boosting is the sequential addition of new models to the ensemble. At each iteration, a new weak base-learner model is trained to reduce the error of the entire ensemble.
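The sequential, residual-fitting idea described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the implementation used in this study, which relied on R packages. The sketch assumes a single predictor, squared-error loss, and one-split trees ("stumps") as weak learners; the names `fit_stump` and `gradient_boost` and all default values are hypothetical. Only observations are resampled here, not features.

```python
import random
from statistics import mean


def fit_stump(x, residuals):
    """Fit a one-split regression tree (a "stump") on one feature by least squares."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # every candidate leaves both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # only one distinct x value: fall back to a constant
        constant = mean(residuals)
        return lambda xi: constant
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def gradient_boost(x, y, n_trees=100, learning_rate=0.1, subsample=1.0, seed=0):
    """Return a predict(xi) function built by boosting stumps on squared error."""
    rng = random.Random(seed)
    n = len(y)
    base = mean(y)  # initial constant model
    predictions = [base] * n
    stumps = []
    for _ in range(n_trees):
        # Residuals are the negative gradient of the squared-error loss.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        # "Stochastic" part: each round may see only a random subset of rows.
        if subsample < 1.0:
            rows = rng.sample(range(n), max(1, round(subsample * n)))
        else:
            rows = range(n)
        stump = fit_stump([x[i] for i in rows], [residuals[i] for i in rows])
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate.
        predictions = [pi + learning_rate * stump(xi)
                       for xi, pi in zip(x, predictions)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)
```

In this sketch, `n_trees` and `learning_rate` play the role of the boosting hyperparameters, while the depth of the weak learner is fixed at one split.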
A simple GBM model contains boosting and tree-specific hyperparameters. There are two boosting hyperparameters, i.e., the learning rate and the number of trees. The tree-specific hyperparameters include the minimum number of observations in terminal nodes and the tree depth.

2.6.4. Extreme Gradient Boosting (XGBoost)

XGBoost is an open-source ML model that efficiently implements gradient boosting decision trees (Chen and Guestrin, 2016). XGBoost accelerates the boosted tree construction using parallelization and uses a more efficient tree searching procedure. The weight of each input to the trained model is considered when boosted trees are built to obtain accurate feature scores. Some hyperparameters have to be tuned to develop a robust XGBoost model. They include the learning rate, which helps shrink the weights at each iteration, and the regularisation alpha, a parameter to prevent overfitting. Other essential parameters include the maximum tree depth; the subsample ratio, which indicates the proportion of the total training samples used at each iteration; and colsample_bytree, which denotes the percentage of features sampled randomly at each step.

2.6.5. Support vector machines (SVM)

SVMs are supervised learning techniques based on vectors separated by hyperplanes or decision boundaries. SVMs are founded on statistical learning theory and are very powerful for complex classification or regression problems (Cortes and Vapnik, 1995). SVMs are highly flexible, can use various kernel functions, and can estimate complex non-linear decision boundaries. The present work uses the highly flexible radial basis kernel to implement SVM. Two optimization parameters, namely sigma (σ) and cost (C), were tuned using the train function of the caret package for prediction in this study.

2.7. Feature integration for yield prediction

Diverse feature integration techniques have been explored to explain the influence of various types of data fusion on the performance of ML prediction models in crop trait estimation (Maimaitijiang et al., 2020). Additionally, several authors have substantiated the capability of combined spectral and structure data in crop yield prediction (Stanton et al., 2017; Bendig et al., 2015). Hence, this study explored the prediction model accuracies on three variants of datasets, where spectral bands (GREEN, RED, NIR, and REDEDGE) and canopy height were common to all variants. The first consisted of GLCM features coupled with spectral bands (SP) and canopy height (CH), hereafter called the GLCM model. The second was based on integrating vegetation indices (VI), spectral bands (GREEN, RED, NIR, and REDEDGE), and canopy height, subsequently called the VI model. The third was the integration of the GLCM and VI datasets, resulting in the GLCM + VI model.

The two UAV flights on 21 September and 27 October 2020, corresponding to the R1 and R6 soybean growth stages, respectively, generated 106 predictors. Eight of these parameters were the multispectral bands of the two dates (GREEN, NIR, RED, REDEDGE; 4 bands × 2 dates), while 40 vegetation indices were obtained from the reflectance bands for the two dates (20 indices × 2 dates; Table 1). The GLCM-derived variables formed 56 input layers to the models (7 texture features × 4 bands × 2 dates). The last two parameters were the canopy heights derived from the point cloud analysis during the photogrammetric workflow in Pix4D software, one for each date.

2.8. Development of predictive models

All the regression ML models employed in the study have different hyperparameters that must be tuned for optimal performance. The k-fold cross-validation technique is a well-established approach for hyperparameter tuning and for estimating prediction error (Eugenio et al., 2020; Fushiki, 2009). This approach also helps guard against overfitting of predictive models.
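To make the resampling scheme concrete, the following minimal sketch generates the index splits for repeated k-fold cross-validation. It is an illustration in Python under stated assumptions, not the caret-based procedure used in this study: the function name `repeated_kfold_indices`, the seed, and the way samples are dealt into folds are hypothetical choices.

```python
import random


def repeated_kfold_indices(n_samples, k=10, repeats=3, seed=42):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for repeat in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repeat
        folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
        for fold_number, held_out in enumerate(folds):
            held_out_set = set(held_out)
            train = [i for i in indices if i not in held_out_set]
            yield repeat, fold_number, train, sorted(held_out)


if __name__ == "__main__":
    # Hypothetical example: 50 plots, 10 folds, 3 repeats.
    for repeat, fold, train_idx, test_idx in repeated_kfold_indices(50):
        print(repeat, fold, len(train_idx), len(test_idx))
```

Each candidate hyperparameter setting would be fitted on every `train_idx` and scored on the matching `test_idx`; averaging the scores over all folds and repeats gives the value used to compare settings.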
Hence, the study’s overall predictive model development process involved 10-fold cross-validation repeated three times, implemented using the caret::train function. Modeling data are often divided into training and testing subsets to avoid overfitting problems. Recent studies in ML predictive modeling used various ratios, such as 80/20, 60/40, 70/30, and 50/50, to split field data into training and test datasets. Nguyen et al. (2021) evaluated different ML techniques with several ratios and reported that 70/30 gave the best result. Thus, following the recommended ratio of data splitting, the field data in this study were divided in a 70/30 ratio into training and testing samples, respectively.

2.9. Predictive model performance evaluation

Four commonly used statistical indicators of accuracy, i.e., the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and normalized RMSE (NRMSE%), were computed to assess quantitatively the performance of all the yield prediction models. NRMSE% was calculated by dividing RMSE by the measured soybean grain yield range. The computation of these metrics is expressed in the following equations, where E_i and G_i are the predicted and observed grain yield, respectively, Ē and Ḡ are their means, n is the number of observations, and y_max and y_min are the maximum and minimum measured grain yield.

$$R^2 = \left( \frac{\sum_{i=1}^{n} (G_i - \bar{G})(E_i - \bar{E})}{\sqrt{\sum_{i=1}^{n} (G_i - \bar{G})^2}\,\sqrt{\sum_{i=1}^{n} (E_i - \bar{E})^2}} \right)^2 \tag{1}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| G_i - E_i \right| \tag{2}$$

$$\mathrm{RMSE} = \left[ \frac{1}{n} \sum_{i=1}^{n} (G_i - E_i)^2 \right]^{1/2} \tag{3}$$

$$\mathrm{NRMSE\%} = \frac{\mathrm{RMSE}}{y_{\max} - y_{\min}} \times 100 \tag{4}$$

2.10. Statistical analysis

A descriptive summary for each soybean trial was computed using the tidyverse and moments libraries of R. The statistical distribution of soybean yield in the variety trials was visualized using boxplots, and multcompBoxplot and the Tukey HSD function were used to estimate statistical significance in R statistical software version 4.1.

2.11. Feature importance computation

We performed a feature importance analysis to identify significant input variables contributing to soybean grain yield prediction using the Boruta package in R (Kursa et al., 2021). The Boruta package, an advanced variable selection technique for finding all relevant features, is one of the most accurate and robust feature selection methods (Degenhardt et al., 2019; Sanchez-Pinto et al., 2018; Speiser et al., 2019). The technique is built around the random forest regression model and removes the features shown statistically to be irrelevant (Kursa and Rudnicki, 2010).

3. Results

This study employed five ML models, i.e., RF, Cubist, SVM, GBM, and XGBoost, for soybean yield prediction using extracted canopy structural features (height), vegetation indices, and textural elements derived from multispectral UAV imagery.

3.1. Performance among regression models for soybean grain yield prediction

Mean quantitative measures of model performance in soybean grain yield prediction are shown in Fig. 5. Considering all the performance metrics (R², RMSE, NRMSE, and MAE), Cubist exhibited the best prediction accuracy and was followed closely by Random Forest (RF) across the three datasets, viz., GLCM, VI, and GLCM + VI. Extreme Gradient Boosting (XGBoost) was the third-best performing model across all datasets. Stochastic Gradient Boosting (GBM) was the least accurate estimator of soybean grain yield among the five models. The GLCM dataset yielded the highest R² values when used with the RF and Cubist models (Fig. 5). In contrast, no noticeable differences were found in RMSE values between the GLCM and GLCM + VI datasets for the other three models (GBM, SVM, and XGBoost).
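For reference, the four metrics compared in this subsection (Eqs. (1) to (4)) can be computed as in the following minimal sketch. It is written in Python for illustration and is not the code used in this study; the function names are hypothetical and the yield values in the usage example are made up.

```python
import math


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values, Eq. (1)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cross = sum((g - mean_obs) * (e - mean_pred) for g, e in zip(observed, predicted))
    ss_obs = sum((g - mean_obs) ** 2 for g in observed)
    ss_pred = sum((e - mean_pred) ** 2 for e in predicted)
    return (cross / (math.sqrt(ss_obs) * math.sqrt(ss_pred))) ** 2


def mae(observed, predicted):
    """Mean absolute error, Eq. (2)."""
    return sum(abs(g - e) for g, e in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error, Eq. (3)."""
    return math.sqrt(sum((g - e) ** 2 for g, e in zip(observed, predicted)) / len(observed))


def nrmse_percent(observed, predicted):
    """RMSE divided by the range of the observed values, in percent, Eq. (4)."""
    return rmse(observed, predicted) / (max(observed) - min(observed)) * 100


if __name__ == "__main__":
    # Made-up plot yields (g/plot), for illustration only.
    observed = [1200.0, 1500.0, 1800.0, 2100.0]
    predicted = [1250.0, 1450.0, 1900.0, 2000.0]
    print(r_squared(observed, predicted), mae(observed, predicted),
          rmse(observed, predicted), nrmse_percent(observed, predicted))
```

Because Eq. (1) is a squared correlation, it rewards predictions that vary with the observations even when they are biased, which is why RMSE, MAE, and NRMSE% are reported alongside it.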
Furthermore, the Cubist model’s superior performance is presented in Table 2, where its R2 varied from 0.73 to 0.89 for the GLCM dataset across the five trials. The RF algorithm closely followed this, with R2 ranging from 0.62 to 0.89. Moreover, these two models’ superior performance trends are displayed in Table 2; NRMSE varied from 6.2 to 10.7% for the Cubist and 7.1–13% for the RF model.

3.2. Influence of feature integration on soybean yield prediction

To understand the influence of different configurations of feature types on the prediction power of the ML models, we trained and tested the prediction algorithms with three subsets of datasets, namely, texture information (GLCM), vegetation indices (VI), and integration of both datasets (GLCM + VI). As depicted in Fig. 5 and Table 2, the model based on the texture information (GLCM) offered the most accurate soybean yield prediction, slightly outperforming the model based on the combined datasets (GLCM + VI). Using all performance metrics (R2, RMSE, MAE, and NRMSE), the VI-based model displayed the lowest accuracy in estimating yield. The mean R2 obtained using the VI-based model ranged from 0.44 to 0.66, while those observed using the GLCM model ranged from 0.55 to 0.73. Equally, the percentage error of grain yield estimation (NRMSE%) from the VI-based model (8.5–18.7%) was consistently higher than those achieved with the GLCM model (7.8–14.5%) (Table 2). Integrating both VI and GLCM data did not significantly increase the performance metrics, probably due to the critical input variables such as canopy height and spectral bands common to both models.

3.3. Yield prediction for the different preliminary and advanced soybean variety trials

The prediction accuracy varied across trials by the different yield estimation ML models. There exist minor differences between the two best performing dataset configurations, as can be seen from the mean R2 and NRMSE (Table 2).
Hence, to elucidate the prediction patterns across the different trials, we concentrate on the GLCM + VI model. Soybean grain yield in the PVT01 trial was predicted with the lowest mean R2 (0.50). Judged by NRMSE, the most accurately predicted trial was PVT02: although its mean R2 (0.70) was slightly lower than that of AVTES (0.72), its NRMSE of 8.0% was much better than the 15.0% obtained for AVTES. Overall, the ML models’ performance for soybean yield estimation compares consistently well across the five trials and, in most cases, is not significantly different (Table 2).

Fig. 5. Mean of model performance metrics (R2, RMSE (g/plot), NRMSE (%), and MAE (g/plot)) across the five experiments.

Table 2. Performance metrics (R2 and NRMSE, %) of the five ML models using the GLCM, VI, and GLCM + VI datasets for five soybean trials.

| Dataset | Model | R2 AVTES | R2 AVTMS | R2 PAVT | R2 PVT01 | R2 PVT02 | NRMSE AVTES | NRMSE AVTMS | NRMSE PAVT | NRMSE PVT01 | NRMSE PVT02 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GLCM | RF | 0.89 | 0.62 | 0.77 | 0.67 | 0.77 | 10.2 | 13.1 | 11.1 | 10.5 | 7.1 |
| GLCM | GBM | 0.58 | 0.44 | 0.61 | 0.38 | 0.62 | 18.7 | 15.5 | 14.4 | 13.8 | 8.9 |
| GLCM | SVM | 0.64 | 0.44 | 0.61 | 0.64 | 0.62 | 17.4 | 15.5 | 14.2 | 17.4 | 9.0 |
| GLCM | Cubist | 0.89 | 0.74 | 0.83 | 0.73 | 0.81 | 9.9 | 10.7 | 9.5 | 9.1 | 6.3 |
| GLCM | XGBoost | 0.68 | 0.50 | 0.69 | 0.39 | 0.71 | 16.2 | 14.6 | 12.6 | 13.7 | 7.9 |
| GLCM | Mean | 0.73 | 0.55 | 0.70 | 0.56 | 0.71 | 14.5 | 13.9 | 12.3 | 12.9 | 7.8 |
| VI | RF | 0.62 | 0.56 | 0.71 | 0.53 | 0.71 | 17.9 | 13.9 | 12.3 | 12.2 | 7.9 |
| VI | GBM | 0.52 | 0.41 | 0.58 | 0.35 | 0.59 | 20.0 | 15.9 | 14.8 | 14.1 | 9.3 |
| VI | SVM | 0.57 | 0.44 | 0.62 | 0.37 | 0.61 | 18.9 | 15.4 | 13.9 | 13.9 | 9.1 |
| VI | Cubist | 0.63 | 0.58 | 0.74 | 0.54 | 0.73 | 17.6 | 13.3 | 11.7 | 11.8 | 7.5 |
| VI | XGBoost | 0.56 | 0.46 | 0.63 | 0.42 | 0.64 | 19.2 | 15.2 | 13.7 | 13.3 | 8.8 |
| VI | Mean | 0.58 | 0.49 | 0.66 | 0.44 | 0.66 | 18.7 | 14.7 | 13.3 | 13.1 | 8.5 |
| GLCM + VI | RF | 0.77 | 0.60 | 0.75 | 0.59 | 0.74 | 14.1 | 13.3 | 11.4 | 11.5 | 7.56 |
| GLCM + VI | GBM | 0.59 | 0.43 | 0.59 | 0.38 | 0.62 | 18.4 | 15.6 | 14.6 | 13.8 | 9.03 |
| GLCM + VI | SVM | 0.70 | 0.46 | 0.62 | 0.40 | 0.63 | 15.7 | 15.2 | 14.0 | 13.6 | 8.92 |
| GLCM + VI | Cubist | 0.87 | 0.74 | 0.83 | 0.70 | 0.82 | 10.3 | 10.7 | 9.5 | 9.6 | 6.22 |
| GLCM + VI | XGBoost | 0.68 | 0.52 | 0.69 | 0.46 | 0.68 | 16.3 | 14.3 | 12.7 | 12.9 | 8.22 |
| GLCM + VI | Mean | 0.72 | 0.55 | 0.70 | 0.50 | 0.70 | 15.0 | 13.8 | 12.4 | 12.3 | 8.0 |

ANOVA was performed for the different soybean trials. The means were separated with Tukey HSD tests (at the P < 0.05 level of significance) to evaluate further the differences in predicted soybean grain yield produced with the combined datasets (GLCM + VI).

Fig. 6. Boxplot showing the distribution of measured soybean grain yield and ML models predicted yields for the five trials.

The boxplot (Fig. 6) of the variation among the soybean grain yields estimated by the five ML algorithms revealed that the outputs of the models were not statistically different. The pattern of the predicted soybean yields also followed the distribution of the observed grain yield (Fig. 6).
These findings suggest that all ML models effectively detected grain yield differences among the five varietal evaluations. The boxplot further revealed slight differences in grain yield distribution and patterns measured in the five experiments. There were no outliers within the distribution of the predicted and actual yields of the PAVT and AVTES and a few outliers for the AVTMS (Fig. 6). As displayed in the box plots, the performance of genotypes for grain yield was higher in AVTMS, PVT01, and PVT02 than in AVTES and PAVT.

The scatterplots of yield estimates developed from the different regression models were compared with the corresponding actual values measured in the variety trials (Fig. 7). The distribution of the scatterplots for the various experiments was similar. The coefficient of determination (R2) obtained from Cubist, the best performing model, is shown on the graph, ranging from 0.77 to 0.84, suggesting good predictive power. As can be noticed from the scatterplots, most models overestimated grain yield at low measured values while underestimating it at high measured values. The extent of this discrepancy differed among the experiments. For instance, in AVTES, the extremely low yields were more accurately predicted by Cubist, RF, and XGBoost. A similar trend of performance can be seen in PAVT and PVT01.

Fig. 7. Comparison of measured against predicted soybean plot yield from the five models for the breeding experiments. The best fit line was obtained from the Cubist model.

3.4. Visual assessment of the spatial variation of soybean yield prediction map

Effective grain yield prediction models must capture spatial variabilities caused by changes in terrain, irrigation, soil fertility, and other environmental factors influencing crop development (Rischbeck et al., 2016).
Hence, predicted yield maps were compared with the measured grain yield map for the two trials, as shown in Fig. 8. At the left uppermost corner, the measured plot yield map shows the spatial variability of soybean grain yield for the two advanced variety trials (AVTMS and AVTES). The AVTES trial consisted of 16 genotypes, including standard check varieties, while the AVTMS consisted of 45 soybean genotypes. As seen in Fig. 8, the five models captured the spatial variation of grain yield shown for the measured plots. The models successfully delineated fields with high, medium, and low grain yields for AVTMS and AVTES. This result indicates that the five regression models were adaptable over space and accurately detected the within-field heterogeneity.

Fig. 8. Comparison of actual soybean plot yield against prediction maps by five ML models (Cubist, RF, SVM and XGBoost) for two trials (AVTES and AVTMS).

3.5. Impact of input features on soybean yield prediction

The impact of different predictor variables on yield prediction by the Random Forest algorithm is presented in Fig. 9. Variable importance obtained from the models that used vegetation indices coupled with canopy height and spectral bands (VI model) is presented in Fig. 9a. Fig. 9b shows the feature importance obtained from the model that utilized textural data combined with canopy height and spectral bands (GLCM model). Fig. 9a and b show that the canopy height at the two soybean phenological stages was highly influential in soybean yield prediction. The two canopy height features ranked 1st and 2nd among the predictors of the VI model, and 1st and 4th among the predictors of the GLCM model. Spectral bands (GREEN2, GREEN1, NIR2) were among the most critical predictors of soybean grain yield, as they ranked 3rd, 5th, and 7th in order of importance (Fig. 9a). Consequently, the VIs based on GREEN and NIR bands (NDWI, GNDVI, GCVI) were prominently influential among input variables to the VI model. Surprisingly, NDVI, the most widely applied index in vegetation and crop prediction studies, did not rank among the best 20 estimators of soybean grain yield.

The variable importance plots (Fig. 9a and b) also reveal the special significance of estimators collected during the later growth stage of soybean (R6). For example, the four most influential VIs (NDWI2, GNDVI2, MCARI2, and GCVI2) displayed in Fig. 9a were based on the UAV data collected in October (R6 growth stage), signifying that the later soybean reproductive phase is crucial for yield modeling. Furthermore, seven of the ten most significant textural features influencing grain yield prediction (mGreen2, vGreen2, Green2, vRed2, mRed2, mNIR2, vRededge2; Fig. 9b) were from the data collected during the soybean R6 stage in October, further confirming the great utility of variables collected at that soybean reproductive phase.

Fig. 9. Importance for soybean yield prediction of (a) spectral bands, canopy height, and vegetation indices; and (b) spectral bands, canopy height, and texture attributes based on the GLCM. Features obtained in September (soybean R1 stage) are distinguished with “1” added to their names, while “2” was added to October features (soybean R6 stage). Only about 75% of the predictors are shown in both cases. The GLCM feature names (Fig. 9b) are shortened (e.g. mGreen = mean of Green, vGreen = variance of Green, enGreen = entropy of Green, dsGreen = dissimilarity of Green, ctGreen = contrast of Green, hGreen = homogeneity of Green, smGreen = second moment of Green). (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

4. Discussion

Soybean grain yield prediction is of paramount importance as it can inform policy on agriculture, climate change adaptation, and understanding of crop phenotyping. Several researchers have demonstrated the utility of ML models combined with UAV-based imagery to estimate crop grain yields. This study used five ML models for quantitative soybean grain yield prediction. Generally, our results show that the five ML regression models (RF, Cubist, XGBoost, GBM, and SVM) can estimate soybean yields accurately. However, among the five ML models, the performance of Cubist was the best, closely followed by the RF models. Some regression studies show that the Cubist model outperformed RF. For instance, Zhou et al. (2019) observed a superior performance of the Cubist models over the RF, while Noi et al. (2017) recommended both the Cubist and RF models for predicting air temperature over other regression methods. Wang et al. (2016) similarly reported the complementary performance of both Cubist and RF algorithms. Overall, the R2 obtained for soybean grain yield prediction in this study was similar to recent studies (Eugenio et al., 2020; Herrero-Huerta et al., 2020; Randelović et al., 2020). Maimaitijiang et al. (2020) employed some ML regression models in their study to predict soybean yield. They reported that the Deep Neural Network (DNN) and RF were outstanding in predicting soybean yield under irrigated and non-irrigated conditions. They reported R2 values of about 0.85 and 0.83 for irrigated and rainfed soybean, respectively, similar to the prediction by Cubist and RF models in the present work. Furthermore, they found lower values of R2 for Partial Least Squares Regression (PLSR) and SVM than what was obtained for RF and DNN.
They concluded that although some models gave a better performance, prediction accuracy enhancement was insignificant.

The impact of the inclusion of diverse feature attributes in crop trait modeling has been explored in many studies (da Silva et al., 2020; Rischbeck et al., 2016; Toda et al., 2021; Zheng et al., 2019). This study evaluated the power of UAV-derived crop features such as canopy height, spectral bands, vegetation indices, and GLCM in soybean yield prediction. The findings revealed that crop height data derived from the 3D point cloud model are valuable in soybean grain yield prediction. The importance of UAV-derived canopy height for crop yield estimation or classification has been highlighted in recent literature (Tao et al., 2020; Yu et al., 2016). Sagan et al. (2019) tested the influence of crop height on soybean yield prediction. They proposed that plant height information is a promising alternative to traditional vegetation indices in crop yield prediction. Furthermore, Kedia et al. (2021) reported that incorporating a UAV-derived canopy height feature increased overall accuracy from 80 to 93% when mapping invasive vegetation types in the arid regions of the USA.

This study further elucidates the potential of canopy textural attributes (GLCM) as essential input variables for soybean yield modeling. Our results suggest that GLCM variables are promising alternatives to the popular vegetation indices (VIs), as the GLCM-based model outperformed the VI model. Among the indicators often used in crop yield prediction and identification research, VIs are the most commonly used. Several studies have documented that VIs are notable indicators of crop maturity, stress, and many other attributes (Ballester et al., 2017; Sankaran et al., 2018; Yeom et al., 2019; X. Zhou et al., 2017).
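For readers less familiar with the indices named in this paper, the sketch below computes NDVI and the three GREEN/NIR-based indices (GNDVI, NDWI, GCVI) from band reflectances. It assumes the standard published definitions, which may differ in detail from the exact formulations used in this study; the function name and reflectance values are hypothetical, and the code is in Python rather than the R workflow used by the authors.

```python
def vegetation_indices(green, red, nir):
    """Compute four vegetation indices from band reflectances (0-1 scale).

    Assumed standard definitions:
      NDVI  = (NIR - Red)   / (NIR + Red)
      GNDVI = (NIR - Green) / (NIR + Green)
      NDWI  = (Green - NIR) / (Green + NIR)   (McFeeters form)
      GCVI  = NIR / Green - 1
    """
    return {
        "NDVI": (nir - red) / (nir + red),
        "GNDVI": (nir - green) / (nir + green),
        "NDWI": (green - nir) / (green + nir),
        "GCVI": nir / green - 1.0,
    }


# Hypothetical plot-mean reflectances, for illustration only
sparse_canopy = vegetation_indices(green=0.10, red=0.12, nir=0.30)
dense_canopy = vegetation_indices(green=0.08, red=0.03, nir=0.50)
```

Under these definitions NDWI is simply the negative of GNDVI, so the two indices carry the same information and would be expected to receive similar importance in a tree-based model.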
However, some studies have demonstrated that NDVI, one of the most popular VIs, has limitations as it saturates under dense vegetation cover (Kedia et al., 2021; Thenkabail et al., 2000; Zheng et al., 2019). Hence, other indicators with a wider range have proved efficient for applications in landcover discrimination and crop attribute modeling (Garonna et al., 2009; Yeom et al., 2019; Zhang and Liu, 2014). Maimaitijiang et al. (2020) reported that coupling texture information in the ML models significantly enhanced soybean yield estimation accuracy. The texture attributes enable a better description of crop spatial configurations, color, and intensities. Several researchers have established the potential of texture information in crop identification and trait estimation (Böhler et al., 2018; Iqbal et al., 2021; Kwak and Park, 2019; Zheng et al., 2019). The study by Kwak and Park (2019) revealed an improved accuracy of crop identification using texture information with ML models, while Zheng et al. (2019) reported that textural indices significantly outperformed vegetation indices in modeling the aboveground biomass in rice. This study revealed that applying texture information coupled with spectral and canopy height data proved superior to the VI model for soybean yield prediction.

In addition, the use of UAV imagery for soybean grain yield prediction in this study also helped to compare the input variables at two phenological stages of the crop. Our findings suggest that attributes obtained during the R6 stage featured prominently in yield prediction. The findings are consistent with other studies that reported the importance of phenological stages for successful soybean trait modeling (Eugenio et al., 2020; Herrero-Huerta et al., 2020; Yoosefzadeh-Najafabadi et al., 2021).
Several studies have suggested the optimal time window for soybean yield prediction to be from flowering (R2) to commencement of podding (R5) (Eugenio et al., 2020; Gao et al., 2020; Wu et al., 2013). Ma et al. (2001) observed that the best development phase for predicting soybean yield was the initial seed-filling stage (R5). We collected the UAV data at the beginning (R1) and late in the reproductive phase (R6), which might be responsible for the observed excellent yield prediction. However, the results of this study differed slightly from those of Eugenio et al. (2020), who reported V6 as the best modeling stage, but agreed with other researchers who reported better yield prediction from phenotypic traits measured at the later stages of soybean development. Herrero-Huerta et al. (2020) analyzed prediction errors in soybean grain yield based on UAV data collected at different soybean growth stages. They concluded that estimators collected at a later growth stage of the crop showed better performance in predicting grain yield. Additionally, Randelović et al. (2020) compared soybean plant density prediction at two middle growth stages (V4 and R3) of soybean using a random forest model. They found better predictions from VIs collected from UAV data at the R3 stage. Authors still differ on the stage of development most crucial to soybean yield modeling, attributing the differences to location and cultivar characteristics. Additional research will be needed to ascertain the optimal phases and conditions for the best soybean trait prediction.

The box plots based on the ML algorithms revealed several outliers in PVT01 and PVT02, indicating a considerable mixture of materials. This might be because the materials in these trials consisted of soybean genotypes from different maturity groups (early and medium maturity), which were evaluated separately at later advanced variety trial stages.
Conversely, the absence of outliers in the PAVT and AVTES and the few outliers in AVTMS might be because these trials were composed of varieties that had passed through several stages of selection and were grouped by maturity, leading to similar performance and, consequently, no or few outliers. The fact that materials with different genetic backgrounds and maturity groups performed differently indicates that genetic differences among genotypes have implications for yield prediction in soybean and other crops. In line with this, Shook et al. (2021) reported the insignificance of short-season variables for yield predictions of late-maturing genotypes. An Ensemble-Stacking (ES) algorithm based on either the full or selected spectral reflectance can be effective in performing early selection (R5 stage) using yield prediction and identifying the best performing genotypes out of a large number of genotypes (Yoosefzadeh-Najafabadi et al., 2021).

5. Conclusions

Our findings demonstrate the enormous potential of high-resolution drone multispectral images to predict soybean yield based on texture information, vegetation indices, and reflectance bands. Five ML regression models (Cubist, RF, SVM, GBM, and XGBoost) combined with UAV-derived multispectral reflectance data were employed to predict grain yield in soybean variety trials. The main findings are that GLCM-based models slightly outperformed VI-based predictors and provided a promising alternative to the conventional use of VIs in crop yield estimation. All five ML models performed moderately well in all the soybean variety trials investigated, though the Cubist and RF models stood out, with R2 reaching 0.89. The study provides an effective practical approach for crop variety evaluations that African-based crop breeding programs have not commonly used.
With advances in UAV technology, more comprehensive studies involving the collection of additional traits such as plant height, biomass, and chlorophyll concentration can provide further insights into these modern methods. Finally, this modeling framework can be easily modified and implemented for many other crops to modernize the variety testing techniques of breeding programs.

Author contributions

Conceptualization, TA and AA; methodology, TA; software, TA and FK; experimental design, AA and GC; formal analysis, TA; resources, AA and GC; data curation, AA; writing (original draft preparation), TA; writing (review and editing), TA and AA; visualization, FK; supervision, GC; funding acquisition, AA and CG.

Funding

The soybean trials used in this study were funded by USAID, and the Pan African Variety Trial was supported by the Feed the Future Soybean Innovation Lab, University of Illinois. The International Institute of Tropical Agriculture (IITA) provided logistics, research facilities, and administrative support for the implementation of the trials.

Data availability statement

All the data used in this study are included in this published article and can be made available upon request.

Ethical statement

The authors declare that all ethical practices have been followed in relation to the article’s development, writing, and publication.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We would like to acknowledge the Soybean Innovation Lab for facilitating and financially supporting the Pan African Variety Trial (PAVT) in Nigeria. We also acknowledge the different National Soybean Breeding Programs that contributed the soybean varieties used in the Pan African Variety Trial.

References

Araus, J., Kefauver, S.C., Zaman-Allah, M., Olsen, M.S., Cairns, J.E., 2018.
Translating high-throughput phenotyping into genetic gain. Trends Plant Sci. 23 (5), 451–466. https://doi.org/10.1016/J.TPLANTS.2018.02.001. Ballester, C., Hornbuckle, J., Brinkhoff, J., Smith, J., Quayle, W., 2017. Assessment of in-season cotton nitrogen status and lint yield prediction from unmanned aerial system imagery. Rem. Sens. 9 (11), 1149. https://doi.org/10.3390/RS9111149. 2017, Vol. 9, Page 1149. Baret, F., Guyot, G., 1991. Potentials and limits of vegetation indices for LAI and APAR assessment. Remote Sensing of Environment 35 (2–3), 161–173. https://doi. org/10.1016/0034-4257(91)90009-U. Barnes, E.M., Clarke, T.R., Richards, S.E., 2000. Coincident Detection of Crop Water Stress, Nitrogen Status and Canopy Density Using Ground Based Multispectral Data. Fifth International Conference on Precision Agriculture, Madison, WI. https://www.researchgate.net/profile/Peter-Waller/publication/43256762_ Coincident_detection_of_crop_water_stress_nitrogen_status_and_canopy_density_using_ground_based_multispectral_data/links/55ac358c08ae481aa7ff4da7/ Coincident-detection-of-crop-water-stress-nitrogen-status-and-canopy-density-using-ground-based-multispectral-data.pdf. Bendig, J., Yu, K., Aasen, H., Bolten, A., Bennertz, S., Broscheit, J., Gnyp, M.L., Bareth, G., 2015. Combining UAV-based plant height from crop surface models, visible, and near infrared vegetation indices for biomass monitoring in barley. Int. J. Appl. Earth Obs. Geoinf. 39, 79–87. https://doi.org/10.1016/J.JAG.2015.02.012. Birth, G.S., McVey, G.R., 1968. Measuring the color of growing turf with a reflectance Spectrophotometer1. Agron. J. 60 (6), 640–643. https://doi.org/10.2134/ AGRONJ1968.00021962006000060016X. Böhler, J.E., Schaepman, M.E., Kneubühler, M., 2018. Crop classification in a heterogeneous arable landscape using uncalibrated UAV data. Rem. Sens. 10 (8) https:// doi.org/10.3390/rs10081282. Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32. 
https://doi.org/10.1023/A:1010933404324. 45(1), 2001. Chang, A., Jung, J., Maeda, M.M., Landivar, J., 2017. Crop height monitoring with digital imagery from Unmanned Aerial System (UAS). Comput. Electron. Agric. 141, 232–237. https://doi.org/10.1016/J.COMPAG.2017.07.008. Chang, A., Jung, J., Yeom, J., Maeda, M.M., Landivar, J.A., Enciso, J.M., Avila, C.A., Anciso, J.R., 2021. Unmanned aircraft system- (UAS-) based high-throughput phenotyping (HTP) for tomato yield estimation. J. Sens. https://doi.org/10.1155/2021/8875606, 2021. Chen, T., Guestrin, C., 2016. XGBoost: a scalable tree boosting system. Proceed. ACM SIGKDD Int. Conf. Knowled. Discov. Data Min. 785–794. https://doi.org/ 10.1145/2939672.2939785. 13-17-August-2016. 15 T.R. Alabi et al. R e m o t e S e n s in g A p p l i c a t io n s : S o c ie t y a n d E n v i r o n m e n t 27 (2022) 100782 Chigeza, G., Boahen, S., Gedil, M., Agoyi, E., Mushoriwa, H., Denwar, N., Gondwe, T., Tesfaye, A., Kamara, A., Alamu, O.E., Chikoye, D., 2019. Public sector soybean (Glycine max) breeding: advances in cultivar development in the African tropics. Plant Breed. 138 (4), 455–464. https://doi.org/10.1111/pbr.12682. Cortes, C., Vapnik, V., 1995. Support-vector networks. Mach. Learn. 20 (3), 273–297. https://doi.org/10.1007/BF00994018, 1995 20:3. da Silva, E.E., Rojo Baio, F.H., Ribeiro Teodoro, L.P., da Silva Junior, C.A., Borges, R.S., Teodoro, P.E., 2020. UAV-multispectral and vegetation indices in soybean grain yield prediction based on in situ observation. Remote Sens. Appl.: Soc. Environ. 18, 100318 https://doi.org/10.1016/J.RSASE.2020.100318. Daughtry, C.S.T., Walthall, C.L., Kim, M.S., De Colstoun, E.B., McMurtrey, J.E., 2000. Estimating corn leaf chlorophyll concentration from leaf and canopy reflectance. Remote Sensing of Environment 74 (2), 229–239. https://doi.org/10.1016/S0034-4257(00)00113-9. Deering, D.W., Rouse, J.W., Haas, R.H., Schell, J.A., 1975. 
Measuring “forage production” of grazing units from landsat MSS data. In: Proceedings of the 10th International Symposium on Remote Sensing of Environment, pp. 1169–1178. Degenhardt, F., Seifert, S., Szymczak, S., 2019. Evaluation of variable selection methods for random forests and omics data sets. Briefings Bioinf. 20 (2), 492–503. https://doi.org/10.1093/BIB/BBX124. Di Gennaro, S.F., Rizza, F., Badeck, F.W., Berton, A., Delbono, S., Gioli, B., Toscano, P., Zaldei, A., Matese, A., 2017. UAV-based high-throughput phenotyping to discriminate barley vigour with visible and near-infrared vegetation indices. Https://Doi.Org/10.1080/01431161.2017.1395974, 39,15–16,5330-5344. https:// doi.org/10.1080/01431161.2017.1395974. Diers, B., Scaboo, A., 2019. Soybean breeding in africa. Afr. J. Food Nutr. Sci. 19 (5), 15121–15125. https://doi.org/10.18697/ajfand.88.SILFarmDoc03. Eugenio, F.C., Grohs, M., Venancio, L.P., Schuh, M., Bottega, E.L., Ruoso, R., Schons, C., Mallmann, C.L., Badin, T.L., Fernandes, P., 2020. Estimation of soybean yield from machine learning techniques and multispectral RPAS imagery. Remote Sens. Appl.: Soc. Environ. 20 (April) https://doi.org/10.1016/j.rsase.2020.100397. FAO, 2021. FAOstat. FAO, Rome, Italy. https://www.fao.org/faostat/en/#data/QCL. accessed 12/11/2021. Freeman, E.A., Moisen, G.G., Coulston, J.W., Wilson, B.T., 2015. Random forests and stochastic gradient boosting for predicting tree canopy cover: comparing tuning processes and model performance. Can. J. For. Res. 46 (3), 323–339. https://doi.org/10.1139/cjfr-2014-0562. Fukano, Y., Guo, W., Aoki, N., Ootsuka, S., Noshita, K., Uchida, K., Kato, Y., Sasaki, K., Kamikawa, S., Kubota, H., 2021. GIS-based analysis for UAV-supported field experiments reveals soybean traits associated with rotational benefit. Front. Plant Sci. 12 (May), 1–11. https://doi.org/10.3389/fpls.2021.637694. Fushiki, T., 2009. Estimation of prediction error by using K -fold cross-validation. Stat. Comput. 
21 (2), 137–146. https://doi.org/10.1007/S11222-009-9153-8, 2009 21:2. Gao, F., Anderson, M., Daughtry, C., Karnieli, A., Hively, D., Kustas, W., 2020. A within-season approach for detecting early growth stages in corn and soybean using high temporal and spatial resolution imagery. Remote Sensing of Environment 242, 111752. https://doi.org/10.1016/J.RSE.2020.111752. Garonna, I., Fazey, I., Brown, M.E., Pettorelli, N., 2009. Rapid primary productivity changes in one of the last coastal rainforests: the case of Kahua, Solomon Islands. Environ. Conserv. 36 (3), 253–260. https://doi.org/10.1017/S0376892909990208. Gitelson, A.A., 2004. Wide dynamic range vegetation index for remote quantification of biophysical characteristics of vegetation. J. Plant Physiol. 161 (2), 165–173. https://doi.org/10.1078/0176-1617-01176. Gitelson, A.A., Gritz, Y., Merzlyak, M.N., 2003. Relationships between leaf chlorophyll content and spectral reflectance and algorithms for non-destructive chlorophyll assessment in higher plant leaves. J. Plant Physiol. 160 (3), 271–282. https://doi.org/10.1078/0176-1617-00887. Gitelson, A.A., Merzlyak, M.N., 1998. Remote sensing of chlorophyll concentration in higher plant leaves. Adv. Space Res. 22 (5), 689–692. https://doi.org/10.1016/ S0273-1177(97)01133-2. Gitelson, A.A., Viña, A., Ciganda, V., Rundquist, D.C., Arkebauer, T.J., 2005. Remote estimation of canopy chlorophyll content in crops. Geophys. Res. Lett. 32 (8), 1–4. https://doi.org/10.1029/2005GL022688. Haralick, R.M., Dinstein, I., Shanmugam, K., 1973. Textural features for image classification. IEEE Transact. Sys. Man and Cybernet., SMC- 3 (6), 610–621. https:// doi.org/10.1109/TSMC.1973.4309314. Hassan, M.A., Yang, M., Fu, L., Rasheed, A., Zheng, B., Xia, X., Xiao, Y., He, Z., 2019. Accuracy assessment of plant height using an unmanned aerial vehicle for quantitative genomic analysis in bread wheat. Plant Methods 15 (1), 1–12. https://doi.org/10.1186/s13007-019-0419-7. 
Herrero-Huerta, M., Rodriguez-Gonzalvez, P., Rainey, K.M., 2020. Yield prediction by machine learning from UAS-based mulit-sensor data fusion in soybean. Plant Methods 16 (1), 1–16. https://doi.org/10.1186/s13007-020-00620-6. Huete, A.R., 1988. A soil-adjusted vegetation index (SAVI). Remote Sensing of Environment 25 (3), 295–309. https://doi.org/10.1016/0034-4257(88)90106-X. Iqbal, N., Mumtaz, R., Shafi, U., Zaidi, S.M.H., 2021. Gray level co-occurrence matrix (GLCM) texture based crop classification using low altitude remote sensing platforms. PeerJ Comp. Sci. 7, e536. https://doi.org/10.7717/peerj-cs.536. Johansen, K., Morton, M.J.L., Malbeteau, Y.M., Aragon, B., Al-Mashharawi, S.K., Ziliani, M.G., Angel, Y., Fiene, G.M., Negrão, S.S.C., Mousa, M.A.A., Tester, M.A., McCabe, M.F., 2019. Unmanned aerial vehicle-based phenotyping using morphometric and spectral analysis can quantify responses of wild tomato plants to salinity stress. Front. Plant Sci. 10, 370. https://doi.org/10.3389/FPLS.2019.00370. Jordan, C.F., 1969. Derivation of leaf-area index from quality of light on the forest floor. Ecology 50 (4), 663–666. https://doi.org/10.2307/1936256. Kedia, A.C., Kapos, B., Liao, S., Draper, J., Eddinger, J., Updike, C., Frazier, A.E., 2021. An integrated spectral–structural workflow for invasive vegetation mapping in an arid region using drones. Drones 5 (1), 19. https://doi.org/10.3390/DRONES5010019. Khojely, D.M., Ibrahim, S.E., Sapey, E., Han, T., 2018. History, current status, and prospects of soybean production and research in sub-Saharan Africa. Crop J. 6 (3), 226–235. https://doi.org/10.1016/j.cj.2018.03.006. Crop Science Society of China/Institute of Crop Sciences. Kursa, Miron B., Rudnicki, W.R., 2010. Feature selection with the boruta package. J. Stat. Software 36 (11), 1–13. https://doi.org/10.18637/jss.v036.i11. Kursa, Bartosz, Miron, Rudnicki, W.R., 2021. Package ‘boruta’-wrapper algorithm for all relevant feature selection. 
https://gitlab.com/mbq/Boruta/. Kwak, G.H., Park, N.W., 2019. Impact of texture information on crop classification with machine learning and UAV images. Appl. Sci. 9 (4). https://doi.org/10.3390/app9040643. Leutner, B., Horning, N., Schwalb-Willmann, J., Hijmans, R.J., 2019. Tools for remote sensing data analysis-package ‘RStoolbox’. https://github.com/bleutner/RStoolbox. Ma, B.L., Dwyer, L.M., Costa, C., Cober, E.R., Morrison, M.J., 2001. Early prediction of soybean yield from canopy reflectance measurements. Agron. J. 93 (6), 1227–1234. https://doi.org/10.2134/AGRONJ2001.1227. Maimaitijiang, M., Sagan, V., Sidike, P., Hartling, S., Esposito, F., Fritschi, F.B., 2020. Soybean yield prediction from UAV using multimodal data fusion and deep learning. Remote Sensing of Environment 237 (December 2019). https://doi.org/10.1016/j.rse.2019.111599. Makanza, R., Zaman-Allah, M., Cairns, J.E., Magorokosho, C., Tarekegne, A., Olsen, M., Prasanna, B.M., 2018. High-throughput phenotyping of canopy cover and senescence in maize field trials using aerial digital canopy imaging. Rem. Sens. 10 (2), 330. https://doi.org/10.3390/RS10020330. Malambo, L., Popescu, S.C., Murray, S.C., Putman, E., Pugh, N.A., Horne, D.W., Richardson, G., Sheridan, R., Rooney, W.L., Avant, R., Vidrine, M., McCutchen, B., Baltensperger, D., Bishop, M., 2018. Multitemporal field-based plant height estimation using 3D point clouds generated from small unmanned aerial systems high-resolution imagery. Int. J. Appl. Earth Obs. Geoinf. 64, 31–42. https://doi.org/10.1016/J.JAG.2017.08.014. McFeeters, S.K., 2007. The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. Int. J. Rem. Sens. 17 (7), 1425–1432. https://doi.org/10.1080/01431169608948714. Nguyen, Q.H., Ly, H.B., Ho, L.S., Al-Ansari, N., Van Le, H., Tran, V.Q., Prakash, I., Pham, B.T., 2021.
Influence of data splitting on performance of machine learning models in prediction of shear strength of soil. Math. Probl. Eng. 2021. https://doi.org/10.1155/2021/4832864. Noi, P.T., Degener, J., Kappas, M., 2017. Comparison of multiple linear regression, cubist regression, and random forest algorithms to estimate daily air surface temperature from dynamic combinations of MODIS LST data. Rem. Sens. 9 (5), 398. https://doi.org/10.3390/RS9050398. Oladoye, A., 2015. Physicochemical properties of soil under two different depths in a tropical forest of International Institute of Tropical Agriculture, Abeokuta, Ibadan, Nigeria. J. Res. Forest. Wildlife Environ. 7 (1), 40–54. https://www.ajol.info/index.php/jrfwe/article/view/116910. Perry, C.R., Lautenschlager, L.F., 1984. Functional equivalence of spectral vegetation indices [Species, leaf area, stress, biomass, multispectral scanner measurements, Landsat, remote sensing]. Remote Sensing of Environment. https://agris.fao.org/agris-search/search.do?recordID=US19850043085. Pinty, B., Verstraete, M.M., 1992. GEMI: a non-linear index to monitor global vegetation from satellites. Vegetatio 101 (1), 15–20. https://doi.org/10.1007/BF00031911. Qi, J., Chehbouni, A., Huete, A.R., Kerr, Y.H., Sorooshian, S., 1994. A modified soil adjusted vegetation index. Remote Sensing of Environment 48 (2), 119–126. https://doi.org/10.1016/0034-4257(94)90134-1. Quinlan, J.R., 1992. Learning with continuous classes, pp. 343–348. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.885. Ranđelović, P., Đorđević, V., Milić, S., Balešević-Tubić, S., Petrović, K., Miladinović, J., Đukić, V., 2020. Prediction of soybean plant density using a machine learning model and vegetation indices extracted from RGB images taken with a UAV. Agronomy 10 (8).
https://doi.org/10.3390/agronomy10081108. Räsänen, A., Virtanen, T., 2019. Data and resolution requirements in mapping vegetation in spatially heterogeneous landscapes. Remote Sensing of Environment 230 (December 2018), 111207. https://doi.org/10.1016/j.rse.2019.05.026. Richardson, A.J., Weigand, C., 1977. Distinguishing vegetation from soil background information. Photogramm. Eng. Rem. Sens. http://www.asprs.org/wp-content/uploads/pers/1977journal/dec/1977_dec_1541-1552.pdf. Rischbeck, P., Elsayed, S., Mistele, B., Barmeier, G., Heil, K., Schmidhalter, U., 2016. Data fusion of spectral, thermal and canopy height parameters for improved yield prediction of drought stressed spring barley. Eur. J. Agron. 78, 44–59. https://doi.org/10.1016/J.EJA.2016.04.013. Roth, L., Streit, B., 2018. Predicting cover crop biomass by lightweight UAS-based RGB and NIR photography: an applied photogrammetric approach. Precis. Agric. 19 (1), 93–114. https://doi.org/10.1007/S11119-017-9501-1. Rouse, J.W., Hass, R., Schell, J.A., Deering, D.W., 1974. Monitoring vegetation systems in the Great Plains with ERTS. In: Freden, S.C., Mercanti, E.P., Becker, M.A. (Eds.), Third Earth Resources Technology Satellite-1 Symposium, vol. 1. Technical Presentations, NASA, Washington, D.C. https://www.scopus.com/record/display.uri?eid=2-s2.0-24344476424&origin=inward&txGid=efc9a15464a8e9860966c08a43cafc7b. Sagan, V., Maimaitijiang, M., Sidike, P., Maimaitiyiming, M., Erkbol, H., Hartling, S., Peterson, K.T., Peterson, J., Burken, J., Fritschi, F., 2019. UAV/satellite multiscale data fusion for crop monitoring and early stress detection. Int. Arch. Photogrammet. Rem. Sens. Spat. Inform. Sci. ISPRS Arch. 42 (2/W13), 715–722. https://doi.org/10.5194/isprs-archives-XLII-2-W13-715-2019. Sanchez-Pinto, L.N., Venable, L.R., Fahrenbach, J., Churpek, M.M., 2018. Comparison of variable selection methods for clinical predictive modeling. Int. J. Med. Inf. 116 (October 2017), 10–17.
https://doi.org/10.1016/j.ijmedinf.2018.05.006. Sankaran, S., Zhou, J., Khot, L.R., Trapp, J.J., Mndolwa, E., Miklas, P.N., 2018. High-throughput field phenotyping in dry bean using small unmanned aerial vehicle based multispectral imagery. Comput. Electron. Agric. 151, 84–92. https://doi.org/10.1016/J.COMPAG.2018.05.034. Santos, M., 2019. Soybean varieties in sub-Saharan Africa. Afr. J. Food Nutr. Sci. 19 (5), 15136–15139. https://doi.org/10.18697/ajfand.88.SILFarmDoc06. Shook, J., Gangopadhyay, T., Wu, L., Ganapathysubramanian, B., Sarkar, S., Singh, A.K., 2021. Crop yield prediction integrating genotype and weather variables using deep learning. PLoS One 16 (6), 1–19. https://doi.org/10.1371/journal.pone.0252402. Sidike, P., Sagan, V., Qumsiyeh, M., Maimaitijiang, M., Essa, A., Asari, V., 2018. Adaptive trigonometric transformation function with image contrast and color enhancement: application to unmanned aerial system imagery. Geosci. Rem. Sens. Lett. IEEE 15 (3), 404–408. https://doi.org/10.1109/LGRS.2018.2790899. Sinclair, T.R., Marrou, H., Soltani, A., Vadez, V., Chandolu, K.C., 2014. Soybean production potential in Africa. Global Food Secur. 3 (1), 31–40. https://doi.org/10.1016/j.gfs.2013.12.001. Singh, A., Ganapathysubramanian, B., Singh, A.K., Sarkar, S., 2016. Machine learning for high-throughput stress phenotyping in plants. Trends Plant Sci. 21 (2), 110–124. https://doi.org/10.1016/J.TPLANTS.2015.10.015. Speiser, J.L., Miller, M.E., Tooze, J., Ip, E., 2019. A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst. Appl. 134, 93–101. https://doi.org/10.1016/j.eswa.2019.05.028. Stanton, C., Starek, M.J., Elliott, N., Brewer, M., Maeda, M.M., Chu, T., 2017. Unmanned aircraft system-derived crop height and normalized difference vegetation index metrics for sorghum yield and aphid stress assessment. J. Appl. Remote Sens. 11 (2), 026035. https://doi.org/10.1117/1.JRS.11.026035.
Suab, S.A., Avtar, R., 2020. Unmanned aerial vehicle system (UAVS) applications in forestry and plantation operations: experiences in Sabah and Sarawak, Malaysian Borneo. In: Unmanned Aerial Vehicle: Applications in Agriculture and Environment. https://doi.org/10.1007/978-3-030-27157-2_8. Tao, H., Feng, H., Xu, L., Miao, M., Yang, G., Yang, X., Fan, L., 2020. Estimation of the yield and plant height of winter wheat using UAV-based hyperspectral images. Sensors 20 (4), 1231. https://doi.org/10.3390/S20041231. Thenkabail, P.S., Smith, R.B., De Pauw, E., 2000. Hyperspectral vegetation indices and their relationships with agricultural crop characteristics. Remote Sensing of Environment 71 (2), 158–182. https://doi.org/10.1016/S0034-4257(99)00067-X. Thiam, A.K., 1997. Geographic Information Systems and Remote Sensing Methods for Assessing and Monitoring Land Degradation in the Sahel: The Case of Southern Mauritania. Clark University, Worcester, Massachusetts. Toda, Y., Kaga, A., Kajiya-Kanegae, H., Hattori, T., Yamaoka, S., Okamoto, M., Tsujimoto, H., Iwata, H., 2021. Genomic prediction modeling of soybean biomass using UAV-based remote sensing and longitudinal model parameters. Plant Genome 14 (3), e20157. https://doi.org/10.1002/TPG2.20157. Wang, B., Oldham, C., Hipsey, M.R., 2016. Comparison of machine learning techniques and variables for groundwater dissolved organic nitrogen prediction in an urban area. Procedia Eng. 154, 1176–1184. https://doi.org/10.1016/J.PROENG.2016.07.527. Watanabe, K., Guo, W., Arai, K., Takanashi, H., Kajiya-Kanegae, H., Kobayashi, M., Yano, K., Tokunaga, T., Fujiwara, T., Tsutsumi, N., Iwata, H., 2017. High-throughput phenotyping of sorghum plant height using an unmanned aerial vehicle and its application to genomic prediction modeling. Front. Plant Sci. 8. https://doi.org/10.3389/FPLS.2017.00421. Wu, Q., Qi, B., Zhao, T.-J., Yao, X.-F., Zhu, Y., Gai, J.-Y., 2013.
A tentative study on utilization of canopy hyperspectral reflectance to estimate canopy growth and seed yield in soybean. Acta Agron. Sin. 39 (2), 309. https://doi.org/10.3724/SP.J.1006.2013.00309. Yang, G., Liu, J., Zhao, C., Li, Z., Huang, Y., Yu, H., Xu, B., Yang, X., Zhu, D., Zhang, X., Zhang, R., Feng, H., Zhao, X., Li, Z., Li, H., Yang, H., 2017. Unmanned aerial vehicle remote sensing for field-based crop phenotyping: current status and perspectives. Front. Plant Sci. 8, 1111. https://doi.org/10.3389/FPLS.2017.01111. Yeom, J., Jung, J., Chang, A., Ashapure, A., Maeda, M., Maeda, A., Landivar, J., 2019. Comparison of vegetation indices derived from UAV data for differentiation of tillage effects in agriculture. Rem. Sens. 11 (13). https://doi.org/10.3390/rs11131548. Yoosefzadeh-Najafabadi, M., Earl, H.J., Tulpan, D., Sulik, J., Eskandari, M., 2021. Application of machine learning algorithms in plant breeding: predicting yield from hyperspectral reflectance in soybean. Front. Plant Sci. 11 (January), 1–14. https://doi.org/10.3389/fpls.2020.624273. Yu, N., Li, L., Schmitz, N., Tian, L.F., Greenberg, J.A., Diers, B.W., 2016. Development of methods to improve soybean yield estimation and predict plant maturity with an unmanned aerial vehicle based platform. Remote Sensing of Environment 187, 91–101. https://doi.org/10.1016/J.RSE.2016.10.005. Zhang, S., Liu, L., 2014. The potential of the MERIS Terrestrial Chlorophyll Index for crop yield prediction. Rem. Sens. Lett. 5 (8), 733–742. https://doi.org/10.1080/2150704X.2014.963734. Zheng, H., Cheng, T., Zhou, M., Li, D., Yao, X., Tian, Y., Cao, W., Zhu, Y., 2019. Improved estimation of rice aboveground biomass combining textural and spectral analysis of UAV imagery. Precis. Agric. 20 (3), 611–629. https://doi.org/10.1007/s11119-018-9600-7. Zhou, J., Li, E., Wei, H., Li, C., Qiao, Q., Armaghani, D.J., 2019.
Random forests and cubist algorithms for predicting shear strengths of rockfill materials. Appl. Sci. 9 (8), 1621. https://doi.org/10.3390/APP9081621. Zhou, X., Zheng, H.B., Xu, X.Q., He, J.Y., Ge, X.K., Yao, X., Cheng, T., Zhu, Y., Cao, W.X., Tian, Y.C., 2017. Predicting grain yield in rice using multi-temporal vegetation indices from UAV-based multispectral and digital imagery. ISPRS J. Photogrammetry Remote Sens. 130, 246–255. https://doi.org/10.1016/J.ISPRSJPRS.2017.05.003. Zvoleff, A., 2020. glcm: calculate textures from grey-level co-occurrence matrices (GLCMs), version 1.6.5 from CRAN. CRAN package ‘glcm’. https://rdrr.io/cran/glcm/.