SOIL, 10, 189–209, 2024
https://doi.org/10.5194/soil-10-189-2024
© Author(s) 2024. This work is distributed under the Creative Commons Attribution 4.0 License.

Reference soil groups map of Ethiopia based on legacy data and machine learning-technique: EthioSoilGrids 1.0

Ashenafi Ali1,2,3,4, Teklu Erkossa3, Kiflu Gudeta2, Wuletawu Abera4, Ephrem Mesfin2, Terefe Mekete2, Mitiku Haile6, Wondwosen Haile7, Assefa Abegaz1, Demeke Tafesse12, Gebeyhu Belay7, Mekonen Getahun8,9, Sheleme Beyene10, Mohamed Assen1, Alemayehu Regassa11, Yihenew G. Selassie9, Solomon Tadesse12, Dawit Abebe13, Yitbarek Wolde13, Nesru Hussien2, Abebe Yirdaw2, Addisu Mera2, Tesema Admas2, Feyera Wakoya2, Awgachew Legesse2, Nigat Tessema2,10, Ayele Abebe14, Simret Gebremariam2, Yismaw Aregaw2, Bizuayehu Abebaw2, Damtew Bekele12, Eylachew Zewdie7, Steffen Schulz3, Lulseged Tamene4, and Eyasu Elias2,5

1Department of Geography and Environmental Studies, Addis Ababa University (AAU), Addis Ababa, Ethiopia
2Ministry of Agriculture (MoA), Addis Ababa, Ethiopia
3Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ), Addis Ababa, Ethiopia
4International Center for Tropical Agriculture (CIAT), Addis Ababa, Ethiopia
5Center for Environmental Science, Addis Ababa University, Addis Ababa, Ethiopia
6Land Resource Management and Environmental Protection, Mekelle University, Mekelle, Ethiopia
7Private consultant, Addis Ababa, Ethiopia
8Amhara Design and Supervision Enterprise (ADSE), Bahir Dar, Ethiopia
9Department of Natural Resources Management, Bahir Dar University (BDU), Bahir Dar, Ethiopia
10School of Plant and Horticultural Science, Hawassa University (HU), Hawassa, Ethiopia
11Department of Natural Resource Management, Jimma University (JU), Jimma, Ethiopia
12Ethiopian Construction Design and Supervision Works Corporation (ECDSWCo), Addis Ababa, Ethiopia
13Engineering Corporation of Oromia, Addis Ababa, Ethiopia
14National Soil Testing Center, MoA, Addis Ababa, Ethiopia

Correspondence: Ashenafi Ali (ashenafi.ali@aau.edu.et, ashenafi2010ali@gmail.com)

Received: 6 May 2022 – Discussion started: 23 May 2022
Revised: 21 November 2023 – Accepted: 9 January 2024 – Published: 5 March 2024

Abstract. Up-to-date digital soil resource information and a comprehensive understanding of it are crucial to supporting crop production and sustainable agricultural development. Generating such information through conventional approaches consumes time and resources and is difficult for developing countries. In Ethiopia, the soil resource map in use was qualitative, dated (from 1984), and small-scale (1 : 2 M), which limited its practical applicability. Yet, a large legacy soil profile dataset accumulated over time, together with emerging machine-learning approaches, can help generate a high-quality, quantitative digital soil map that provides better soil information. Thus, a group of researchers formed a Coalition of the Willing for soil and agronomy data sharing, collated about 20 000 soil profiles, and stored them in a central database. The data were cleaned and harmonized using the latest soil profile data template, and 14 681 profiles were prepared for modeling. Random forest was used to develop a continuous quantitative digital map of 18 World Reference Base (WRB) reference soil groups at 250 m resolution by integrating environmental covariates representing the major soil-forming factors. The map was validated through a rigorous expert process in which senior soil specialists and pedologists checked it against purposely selected district-level geographic windows across Ethiopia. Given its improved spatial resolution and quantitative digital representation, the map is expected to be of tremendous value for soil management and other land-based development planning.

Published by Copernicus Publications on behalf of the European Geosciences Union.
1 Introduction

Soils are important resources that support the provision of various economic, social, and ecosystem services and are useful in climate change mitigation and adaptation (Baveye et al., 2016). Data on the physical and chemical characteristics of soils and their spatial distribution are needed to define and plan soil functions over time and space, which are important steps toward the sustainable use and management of soils (Elias, 2016; Hengl et al., 2017).

In Ethiopia, soil surveys and mapping have been conducted at various scales with varying scopes, approaches, methodologies, quality, and levels of detail (Abayneh, 2001; Abayneh and Berhanu, 2007; Berhanu, 1994; Elias, 2016; Zewdie, 2013). The most recent countrywide digital soil mapping efforts focused primarily on soil characteristics (Ali et al., 2020; Iticha and Chalsissa, 2019; Tamene et al., 2017), although soil class maps are equally important for allocating particular soil units to specific uses (Leenaars et al., 2020a; Wadoux et al., 2020). Many attempts have been made to improve digital soil information systems (Hengl et al., 2021, 2017, 2015; Poggio et al., 2021). However, these initiatives were based on limited and unevenly distributed soil profile data (e.g., 1.15 soil profiles per 1000 km² for Ethiopia), which restricts the accuracy and applicability of their products. In Ethiopia, data from thousands of soil profiles have been collected since the 1960s (Erkossa et al., 2022), but these data were scattered across different institutions and individuals (Ali et al., 2020). Furthermore, countrywide quantitative and gridded spatial soil-type information does not exist (Elias, 2016).
The Ethiopian Soil Information System (EthioSIS) project attempted to develop a countrywide digital soil map focusing on topsoil characteristics, including plant nutrient content, but overlooked soil resource mapping (Ali et al., 2020; Elias, 2016), despite a strong need for a high-resolution soil resource map (Mulualem et al., 2018). Ethiopia has an area of about 1.14 × 10⁶ km² with varied environments, making its soils extremely heterogeneous. Capturing this heterogeneity using conventional soil survey and mapping approaches is an expensive and time-consuming endeavor (Hounkpatin et al., 2018). This can be circumvented by using the legacy soil profile data accumulated over decades and by tapping into the potential of advanced analytical techniques to develop high-resolution digital soil maps (Hounkpatin et al., 2018; Kempen, 2012, 2009). Therefore, the objectives of this study were to (i) develop a national legacy soil profile dataset that can be used as an input for various digital soil mapping exercises and (ii) generate an improved 250 m digital map of the Reference Soil Groups (RSGs) of Ethiopia.

2 Methods

2.1 The study area

The study area covered the entire area of Ethiopia (1.14 × 10⁶ km²), located between 3 and 15° N and between 33 and 48° E (Fig. 1). The topography of the country is marked by large altitudinal variation, ranging from 126 m below sea level at Dalol in the northeast to 4620 m at Ras Dashen Mountain in the northwest (Billi, 2015; Enyew and Steeneveld, 2014). Ethiopia’s wide range of topography, climate, parent material, and land use types created conditions for the formation of different soil types (Abayneh, 2005; Berhanu and Ochtman, 1974; Donahue, 1972; Mesfin, 1998; Nyssen et al., 2019; Virgo and Munro, 1978; Zewdie, 2013, 1999). More than 33 % of the country is covered by the central, upper, and highland complex (Abegaz et al., 2022), which embraces Africa’s most prominent mountain system (Hurni, 1998).
The country’s complex topography strongly determines both rainfall and temperature patterns by modifying the influence of large-scale ocean–land–atmosphere patterns, thus creating diverse localized climates. Spatially, rainfall generally decreases from the west toward the east, north, northeast, south, and southeast. The lowlands in the southeast and northeast, covering approximately 55 % of the country’s land area, are characterized by arid and semi-arid climates. Annual rainfall ranges from less than 300 mm in the southeastern and northwestern lowlands to over 2000 mm in the southwestern highlands (the southern portion of the western highlands). The eastern lowlands receive rain twice a year, in April–May and October–November, with two dry periods in between; total annual precipitation in this region varies from less than 500 to 1000 mm. The driest of all regions is the Denakil Plain, which receives less than 500 mm of rain and sometimes none (Fazzini et al., 2015). Temperatures are also greatly influenced by the rapidly changing altitude, with mean monthly values ranging from ∼35 °C in the northeastern lowlands to less than 7.5 °C over the northern and central highlands.

The country is characterized by a wide variety of geological formations (Abayneh, 2005; Alemayehu et al., 2014; Elias, 2016; Zewdie, 2013). These include (i) products of recent and old volcanic activity; (ii) highlands consisting of igneous rocks (mainly basalts); (iii) steep-sided valleys characterized by extensive colluvial and alluvial deposits; (iv) metamorphic rocks exposed by denudation; and (v) various sedimentary rocks, such as limestone and sandstone, in the relatively lower areas.

Diverse biophysical factors affect the spatial distribution of vegetation cover, and these factors, singly and in combination, result in diverse soil types and properties across Ethiopia’s landscapes (Hurni, 1998; Nyssen et al., 2019; WLRC-AAU, 2018).
The spatiotemporal vegetation cover of the country has been shaped by a long history of land use and land cover change (WLRC-AAU, 2018). According to the national 2016 land use and land cover map, woody vegetation (forest, woodland, and shrub and bush lands) covers about 57 % of the country (WLRC-AAU, 2018), followed by cultivated land (20 %) and grasslands (12 %). Barren lands are estimated to cover about one-tenth of the country, while other minor land classes of ecological significance (i.e., wetlands, water bodies, and sub-afro-alpine and afro-alpine areas) cover about 1.2 % of the country’s land mass.

Figure 1. Location map of Ethiopia (inset) and overview map based on the Esri World Topographic Map.

2.2 Legacy soil profile data collation and preparation

The soil profile data generated over decades through various soil survey missions were kept in a variety of formats with limited accessibility, and no institution had a mandate to coordinate the generation, collation, harmonization, and sharing of soil profile data. This led to the formation of a group of individuals and institutions willing to exchange soil and agronomy data. Established in 2018, the group, known as the Coalition of the Willing (CoW), was committed to addressing the challenges posed by the lack of access to, and sharing of, soil and agronomy data in the country (Tamene et al., 2021). The CoW conducted a national soil and agronomy data ecosystem mapping, which revealed that a plethora of legacy soil resource datasets existed across different institutions and individuals (Ali et al., 2020). The assessment also revealed that a sizable proportion of the data holders were willing to share the data in their custody, provided that regulations were put in place to administer the data.
The CoW developed and approved internal data-sharing guidelines (CoW, 2020) and facilitated data collation campaigns, which involved both formal and informal approaches to data holders. Through these campaigns, soil profile data collected between the 1970s and 2021 were acquired from over 88 diverse sources (Ali et al., 2020; Tamene et al., 2021). Initially, 8000 profile data points were collated and subjected to improved modeling techniques to create a provisional WRB reference soil group map of Ethiopia. This map was presented to various partners and data-holding institutions to demonstrate the power of data sharing, which created awareness and enabled us to mobilize and collate over 20 000 legacy soil profiles. These data were then added to the national data repository.

The data had varying levels of completeness in terms of field soil and environmental descriptions and laboratory analysis. They therefore required rigorous expert-based quality assessment and standardization before being compiled into a harmonized format. The template of the expanded version of the Africa Soil Profile (AfSP) database (Leenaars et al., 2014) was used for standardizing and harmonizing the data. From the collated soil profile data, 14 681 georeferenced profiles were extracted for modeling on the basis of completeness and cleanness. The cleaned soil profile dataset contained, at a minimum, the reference soil group (RSG) name as defined in the WRB. Although the original soil profile records were set in different coordinate systems, all were transformed to the adopted standard georeferencing system, namely WGS84 in decimal degrees, in the QGIS (3.20.2) environment (QGIS Development Team, 2021).
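Before the visual checks against site descriptions and project outlines, a coarse automated screen can catch the most common georeferencing errors left after reprojection. The sketch below is a minimal illustration only, assuming the study-area extent from Sect. 2.1 (3–15° N, 33–48° E) as a bounding box; the profile identifiers and coordinates are hypothetical, and the study itself relied on expert plotting rather than on this code.

```python
# Coarse georeferencing screen for profile coordinates in WGS84 decimal degrees.
# Assumption: the study-area extent from Sect. 2.1 serves as a bounding box;
# a real check would use the national boundary and source-project outlines.
LAT_MIN, LAT_MAX = 3.0, 15.0   # degrees north
LON_MIN, LON_MAX = 33.0, 48.0  # degrees east


def in_extent(lat: float, lon: float) -> bool:
    """True if the point lies inside the Ethiopian bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def flag_profile(lat: float, lon: float) -> str:
    """Return 'ok', 'swapped' (latitude and longitude transposed) or 'outside'."""
    if in_extent(lat, lon):
        return "ok"
    if in_extent(lon, lat):
        # The point only fits the extent when the two coordinates are exchanged.
        return "swapped"
    return "outside"


# Hypothetical profile records: id -> (latitude, longitude).
profiles = {"P001": (9.03, 38.74), "P002": (38.74, 9.03), "P003": (-9.03, 38.74)}
for pid, (lat, lon) in profiles.items():
    print(pid, flag_profile(lat, lon))  # P001 ok, P002 swapped, P003 outside
```

Because the latitude and longitude ranges of the box do not overlap, a record can never be flagged both "ok" and "swapped", which keeps the rule unambiguous.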
To verify their positions, the soil profile locations were plotted in the WGS84 coordinate system and checked to confirm that the points matched the site description, the geomorphological setting, and, at the very least, the boundary outline of the source project.

The accuracy of the data depends on the quality and reliability of the survey data themselves, which in turn require expert knowledge and experience in soil description and classification (Leenaars et al., 2020a). In this study, data cleaning, validation, reclassification, and verification were carried out by a team of prominent national pedologists and soil surveyors, including some who had been involved in generating the soil profile data (Fig. 2).

In addition, soil survey and mapping experts of the Ministry of Agriculture (MoA) and other volunteers validated the legacy soil profile observations, which led to the reclassification of soil types where deemed necessary. This validation and reclassification involved re-examining the geomorphological setting of the soil profile locations using Google Earth, reviewing the site and soil descriptions and the corresponding laboratory data, and reviewing the proposed soil type. The harmonized datasets in the database were used as the input soil profile data for modeling and mapping the IUSS WRB reference soil groups.

2.3 Preparation and selection of environmental covariates

2.3.1 Covariate acquisition and preparation

To develop spatially continuous soil class or soil type maps, environmental covariates that directly or indirectly represent the soil-forming factors have to be integrated with soil profile data (Hengl and MacMillan, 2019). Environmental covariates are spatially explicit proxies of soil-forming factors based on the soil–environment relationship (McBratney et al., 2003; Shi et al., 2018). The acquisition and preparation of covariates are a crucial step in digital soil mapping with machine-learning algorithms (McBratney et al., 2003). In this study, 68 candidate environmental variables representing the soil-forming factors (climate, organisms, relief, parent material, and time) were derived from diverse remote sensing products and thematic maps (Hengl and MacMillan, 2019; McBratney et al., 2003). Relief- and topography-related covariates were derived from the 90 m Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) (Vågen, 2010). Climate-related variables, including the long-term mean, minimum, maximum, and standard deviation of temperature and precipitation for the period 1983–2016, were acquired at 4 km resolution from the Enhancing National Climate Services (ENACTS-NMA) initiative (Dinku et al., 2014). Moderate Resolution Imaging Spectroradiometer (MODIS) raw bands and derived indices (Vågen, 2010) were downloaded from USGS EarthExplorer (https://earthexplorer.usgs.gov/, last access: 12 November 2021) to represent vegetation-related factors. National geological (Tefera et al., 1996) and land use and land cover (WLRC-AAU, 2018) thematic maps of Ethiopia were gathered to represent parent material and organisms, respectively.

Rasters were downscaled (disaggregated) or upscaled (aggregated) as needed to match the target resolution. A spatial resolution of 250 m was chosen to accommodate the resolution of the major covariate inputs and to make the product applicable to large-area analysis. All layers were clipped to the national boundary of Ethiopia and masked for buildings and water bodies, and a stacked layer was created using the raster package (R Core Team, 2020) to extract covariate values at the soil profile locations.
One-hot encoding, using the dummyVars function of the caret package (Kuhn, 2008), was applied to convert categorical covariates into binary vectors in which each element represents the presence or absence of one category. One-hot encoding is beneficial because it enables machine-learning algorithms to treat categorical variables as numerical features. Covariate pre-processing, visual inspection for inconsistencies, and resampling to a 250 m target grid were conducted in QGIS 3.20.2 (QGIS Development Team, 2021), SAGA GIS 7.8.2 (Conrad et al., 2015), and R 4.0.5 (R Core Team, 2020). All input data were projected to a common Lambert azimuthal equal-area projection with the latitude of origin at 8.65° N and the central meridian at 39.64° E, the center point of Ethiopia. This projection was selected because, as an equal-area projection, it preserves areas across the country. Each covariate was brought to an identical spatial resolution, extent, and projection using two resampling methods: continuous covariates were resampled using the bilinear spline method, whereas categorical covariates were resampled using the nearest-neighbor method.

Figure 2. Schematic presentation of data acquisition and workflow.

2.3.2 Covariate selection

Selecting an optimal set of covariates that effectively represents the soil–environment relationship is a key step in digital soil mapping (DSM), since improper selection of covariates degrades the quality of model outputs (Shi et al., 2018). In this study, a near-zero variance assessment was conducted using the nearZeroVar function of the R caret package (Kuhn, 2008) to identify and remove environmental variables with little or no variance.
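The two pre-processing steps named in this section, one-hot encoding and near-zero variance screening, can be illustrated without R. The sketch below is a plain-Python approximation, not caret itself: the one-hot function mirrors dummyVars with its default of one indicator column per level, and the screening rule mirrors, to our understanding, caret's nearZeroVar defaults (a variable is flagged when it has a single distinct value, or when the ratio of its two most frequent values exceeds 95/5 and its distinct values make up at most 10 % of the samples). Covariate names and values are hypothetical.

```python
from collections import Counter


def one_hot(values):
    """Return (levels, rows): one 0/1 indicator column per level."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


def near_zero_var(values, freq_cut=95 / 5, unique_cut=10.0):
    """Flag a covariate with zero or near-zero variance (rule modeled on caret's defaults)."""
    counts = Counter(values)
    if len(counts) <= 1:
        return True  # a single distinct value: zero variance
    (_, most), (_, second) = counts.most_common(2)
    freq_ratio = most / second                          # most vs. second most frequent value
    percent_unique = 100.0 * len(counts) / len(values)  # distinct values as % of samples
    return freq_ratio > freq_cut and percent_unique <= unique_cut


# Hypothetical lithology classes at four profile locations.
levels, rows = one_hot(["basalt", "limestone", "basalt", "sandstone"])
print(levels)  # ['basalt', 'limestone', 'sandstone']
print(rows)    # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

# Hypothetical binary covariate that is almost constant: 98 zeros and 2 ones.
print(near_zero_var([0] * 98 + [1] * 2))  # True
```

Both criteria must hold for a non-constant variable to be dropped, so a continuous covariate with many distinct values is kept even if one value dominates.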
In addition, a preliminary random forest model was trained to identify covariates with high variable importance. After expert judgment, a total of 27 environmental variables (24 continuous and 3 categorical) were selected for modeling and predicting RSGs.

2.4 Modeling and mapping soil types or reference soil groups

2.4.1 Model tuning and quantitative evaluation

In digital soil mapping, machine-learning techniques have been used extensively to determine the relationship between soil types and environmental variables (McBratney et al., 2003). Many machine-learning models have been developed over the past decades to spatially predict soil classes from existing soil data and soil-forming environmental covariates (Heung et al., 2016). Random forest (RF), a tree-based ensemble method, is one of the most promising machine-learning techniques for digital soil mapping (Breiman, 2001; Heung et al., 2016). RF has gained popularity due to its high overall accuracy and has been widely used in predictive soil mapping (Brungard et al., 2015; Hengl et al., 2018). The main strengths of the RF model include its ability to handle numerical and categorical data without assumptions about their probability distribution, its ability to capture nonlinear relationships, and its robustness against overfitting (Breiman, 2001; Svetnik et al., 2003). To build the RF model, the data were split by random sampling into a training set (80 %) and a testing set (20 %), used for training the model and evaluating its performance, respectively (Kuhn, 2008). Hyper-parameter optimization and repeated cross-validation on the training dataset were performed using the ranger method of the caret package. The three tuning parameters of the ranger method are mtry, splitrule, and min.node.size.
The caret train function is generally used to tune such parameters in an automated fashion: it evaluates the candidate parameter combinations and returns the combination for which the model gives the best accuracy. Model tuning was performed with repeated 10-fold cross-validation over multiple combinations of hyper-parameters for the ranger method, a fast implementation of RF particularly suited for high-dimensional data (Wright and Ziegler, 2017). The number of covariates sampled as split candidates at each node (mtry), the splitting rule (splitrule), and the minimum node size (min.node.size) were optimized. The number of trees (ntree) was set to 1000, and mtry values (10, 15, 20), min.node.size values (5, 10, 15), and splitrule values ("variance", "extratrees", and "maxstat") were fed to the optimization procedure. The accuracy on the testing dataset was taken as an indication of model performance on new data, i.e., of the model's capacity to predict at unsampled locations. A confusion matrix was also used to cross-tabulate observed and predicted classes with associated statistics, i.e., producer's accuracy and user's accuracy.

2.4.2 Software and computational framework

In this study, various open-source software packages providing a comprehensive set of tools were used for data preparation, analysis, and visualization. Data pre-processing and preparation were performed using QGIS (QGIS Development Team, 2021) and SAGA GIS (Conrad et al., 2015). For statistical analysis and machine-learning modeling, R (R Core Team, 2020) and the relevant libraries were installed on a Windows Server 2016 Standard machine with 250 GB of working memory to handle large-scale data processing and analysis.
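The tuning design of Sect. 2.4.1 (an 80/20 random split, repeated 10-fold cross-validation on the training part, and a 3 × 3 × 3 grid over mtry, splitrule, and min.node.size) can be sketched as below. This is a language-neutral illustration, not the caret/ranger code used in the study: the number of cross-validation repeats is not stated in the text and is assumed here, `fit_and_score` is a hypothetical placeholder for fitting a random forest and returning its accuracy, and caret's own partitioning helpers, to our understanding, stratify by class, whereas this sketch samples purely at random.

```python
import itertools
import random

# Grid values copied from Sect. 2.4.1; the number of CV repeats below is an assumption.
MTRY = (10, 15, 20)
SPLITRULE = ("variance", "extratrees", "maxstat")
MIN_NODE_SIZE = (5, 10, 15)


def param_grid():
    """All 27 (mtry, splitrule, min.node.size) combinations."""
    return list(itertools.product(MTRY, SPLITRULE, MIN_NODE_SIZE))


def train_test_split(ids, train_frac=0.8, seed=1):
    """Simple random 80/20 split of profile ids (no class stratification)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def repeated_kfold(ids, k=10, repeats=3, seed=1):
    """Yield (repeat, fold, train_ids, held_out_ids) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        shuffled = list(ids)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for i, held_out in enumerate(folds):
            train = [x for j, fold in enumerate(folds) if j != i for x in fold]
            yield r, i, train, held_out


def tune(ids, fit_and_score, k=10, repeats=3):
    """Return the grid point with the highest mean cross-validated score."""
    splits = list(repeated_kfold(ids, k, repeats))
    best, best_score = None, float("-inf")
    for params in param_grid():
        scores = [fit_and_score(params, tr, te) for _, _, tr, te in splits]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best, best_score = params, mean_score
    return best, best_score


def toy_score(params, train_ids, held_out_ids):
    """Hypothetical stand-in for 'fit ranger on train_ids, return accuracy on held_out_ids'."""
    mtry, splitrule, min_node = params
    return mtry / 20 + (splitrule == "extratrees") - min_node / 100


train_ids, test_ids = train_test_split(range(100))
best_params, _ = tune(train_ids, toy_score)
print(best_params)  # (20, 'extratrees', 5)
```

Keeping the 20 % test set entirely outside `tune` is what allows its accuracy to stand in for performance at unsampled locations.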
2.4.3 Expert evaluation of spatial patterns of the beta-version soil map

Visual inspection of the DSM output over the terrain was used to identify abnormalities and to assess how effectively the map depicts landscape components (Rossiter et al., 2022). For this, we employed an expert-based qualitative assessment of the model output, to complement the model-based accuracy assessment and to confirm the map's agreement with expert knowledge. It involved senior soil specialists or pedologists checking the map against purposely selected district-level geographic windows across Ethiopia that represent different agro-ecological zones known to have diverse soil occurrences and that were familiar to the panel of experts. Accordingly, an expert validation workshop was conducted using the first version of the reference soil groups (RSGs) map. About 45 multidisciplinary scientists, including soil surveyors, pedologists, geologists, and geomorphologists, were drawn from national and international research, development, and higher-learning institutions to review the draft RSG map in plenary discussions. This was followed by breakout sessions in which groups of experts examined the geographic windows and evaluated the map based on their experience and knowledge of the soil–landscape relations of the country.

Most importantly, disagreements regarding RSG occurrence and patterns in the modeling outputs across toposequences and contrasting soil-forming factor sequences were identified and discussed. Further, the parts of the DSM framework that required improvement were identified and recommendations were made. After finalizing the group-level assessment, each group presented its results in the plenary, followed by a discussion to obtain feedback from the other participants.
Following the plenary discussions, the participants formed a group of six senior pedologists to work on the recommendations, including changing the quality mask layer, validating the additional data obtained during the event, and assessing the re-modeling outputs. After the model was re-run, the group of senior pedologists, together with geospatial experts, re-evaluated the output for the selected districts based on the feedback from the first review, which concerned mainly areas with "minor" and "major" concerns. Consequently, some improvements were made, e.g., in areas where Vertisols, Fluvisols, and Leptosols had been overestimated. Further, previously underestimated RSGs (Alisols, Solonetz, Planosols, Acrisols, Lixisols, Phaeozems, and Gleysols) showed a slight increase in area coverage and improved patterns. However, the total area of Leptosols and Cambisols increased relative to the first run due to the partial exclusion of the mask layer used in the first round of modeling. That mask layer had been criticized for quality issues, as it excluded significant soil areas and poorly captured non-soil areas such as rock outcrops, salt flats, swamps, and sand dunes. Nevertheless, the spatial patterns of these soils across the previously "non-soil" areas were examined by the panel of experts. In parallel, geospatial and soil experts checked the RSG raster map in the GIS environment to ensure that areas of "no concern" before the model was re-run either remained the same or had changes accepted by the panel of experts. The map from the second run is presented in this paper as the EthioSoilGrids version 1.0 product.

3 Results and discussion

3.1 Soil profile datasets

Using the IUSS WRB (2015), the 14 742 preliminarily identified georeferenced legacy soil profiles were classified or reclassified into 23 RSGs.
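In this section, RSGs represented by fewer than 30 profiles are excluded from modeling. The count rule alone can be sketched as below; the class counts are toy values, not the study's, and the study additionally weighed each RSG's accuracy and spatial distribution before excluding it. The last lines check the arithmetic linking the reported figures: WRB 2015 defines 32 RSGs, so 23 confirmed groups is about 72 %, and excluding 5 leaves the 18 mapped groups.

```python
from collections import Counter

MIN_PROFILES = 30     # threshold stated in Sect. 3.1
N_RSGS_WRB_2015 = 32  # reference soil groups defined in WRB 2015


def screen_classes(labels, min_count=MIN_PROFILES):
    """Split RSG labels into (kept, dropped) by the number of profiles per class."""
    counts = Counter(labels)
    kept = sorted(c for c, n in counts.items() if n >= min_count)
    dropped = sorted(c for c, n in counts.items() if n < min_count)
    return kept, dropped


# Toy profile labels (counts are hypothetical, not the study's).
labels = ["Vertisols"] * 40 + ["Alisols"] * 30 + ["Stagnosols"] * 5
print(screen_classes(labels))  # (['Alisols', 'Vertisols'], ['Stagnosols'])

# Arithmetic behind the reported figures.
print(round(100 * 23 / N_RSGS_WRB_2015))  # 72 (% of WRB 2015 RSGs found in Ethiopia)
print(23 - 5)                             # 18 RSGs mapped
```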
Nearly 90 % of the soil profile points represented Vertisols, Luvisols, Cambisols, Leptosols, Fluvisols, and Nitisols, in that order, which were found to be the dominant soil types in Ethiopia (Fig. 3). The remaining 10 % represented Regosols, Alisols, Andosols, Arenosols, Calcisols, Solonetz, Lixisols, Phaeozems, Solonchaks, Acrisols, Planosols, Gleysols, Umbrisols, Ferralsols, Gypsisols, Plinthosols, and Stagnosols. According to this study, about 72 % of the IUSS WRB (2015) RSGs were confirmed to occur in Ethiopia. This reconfirms the characterization of Ethiopia as a land of soil diversity, endowed with a wide range of soil types (Elias, 2016; Mishra et al., 2004). One limitation of legacy soil data in categorical mapping is class imbalance, in that not all classes are equally represented (Wadoux et al., 2020). In this study, RSGs represented by fewer than 30 profile observations were excluded from the model after examining the accuracy and spatial distribution of each RSG. Five RSGs (Umbrisols, Ferralsols, Gypsisols, Plinthosols, and Stagnosols) were therefore excluded from the model and from the EthioSoilGrids version 1.0 map.

Figure 3. Number of soil profile points per WRB reference soil group.

After excluding built-up and water surface areas, the average soil profile density was 13.1 profiles per 1000 km² (Fig. 4), but the actual density varied across the country. The variation tends to follow the river basins, sub-basins, and agricultural land-use-based studies from which most of the legacy data were drawn. For instance, in 30 intervention districts of the Capacity Building for Scaling up of Evidence-Based Best Practices in Agricultural Production in Ethiopia (CASCAPE) project, the average profile density was about 87 profiles per 1000 km² over a total area of about 26 830 km² (Leenaars et al., 2020a). Similarly, semi-detailed soil mapping missions in 15 districts conducted through the Bilateral Ethiopia–Netherlands Effort for Food, Income and Trade (BENEFIT)-REALISE project generated about 217 observations per 1000 km² (Leenaars et al., 2020b). A soil type and depth map compilation and updating mission at 1 : 250 000 scale by the Water and Land Resource Center (WLRC) of Addis Ababa University collated and used about 3949 legacy soil profiles for the entire country (Ali et al., 2020), i.e., approximately 3.5 profiles per 1000 km². Although the distribution is uneven and the eastern lowlands are sparsely represented, the number of profiles used in this study is 8.5 times higher than the 1712 legacy soil profiles currently in the Africa soil profile database (Batjes et al., 2020; Leenaars et al., 2014).

Figure 4. Spatial distribution of collated legacy soil profile data.

The distribution of the soil profiles across the 32 agro-ecological zones (AEZs) of Ethiopia revealed that all AEZs except two, tepid per-humid mid-highland (0.13 % of the landmass) and very cold sub-humid sub-afro-alpine to afro-alpine (0.03 % of the landmass), were represented by soil profile observations. Furthermore, about 95 % of the profile observations represented 91 % of the AEZ areal coverage (Appendix A). The distribution of legacy soil profiles varied across AEZs. In general, the top-ranked lowland AEZs, with roughly 56 % area coverage, were represented by 23 % of the total profile observations, whereas the top-ranked highland AEZs, with 20 % area coverage, received 47 % of the profile observations. For instance, the warm desert, warm moist, hot arid, and warm sub-moist lowlands, with area coverage of around 20 %, 15 %, 11 %, and 10 %, were represented by roughly 3 %, 11 %, 2 %, and 7 % of the total profiles, respectively.
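The sampling-density and representation figures above reduce to simple ratios. The sketch below recomputes some of them from numbers given in the text. The country area of 1.14 × 10⁶ km² is taken from Sect. 2.1, so the gross density it yields (about 12.9 per 1000 km²) is slightly below the reported 13.1, which excludes built-up and water areas. The representation index (share of profiles divided by share of area) is an illustrative summary introduced here, not a statistic used by the authors.

```python
AREA_KM2 = 1.14e6  # area of Ethiopia, Sect. 2.1


def density_per_1000_km2(n_profiles, area_km2=AREA_KM2):
    """Profiles per 1000 km²."""
    return 1000.0 * n_profiles / area_km2


def representation_index(profile_share_pct, area_share_pct):
    """Illustrative ratio: > 1 over-sampled, < 1 under-sampled relative to area."""
    return profile_share_pct / area_share_pct


print(round(density_per_1000_km2(3949), 1))   # 3.5 (WLRC compilation)
print(round(density_per_1000_km2(14681), 1))  # 12.9 (gross, whole country)
print(round(14681 / 1712, 2))                 # 8.58 (relative to the AfSP profiles)

# (profile share %, area share %) for AEZ groups quoted in the text.
aez = {"top lowland AEZs": (23, 56), "top highland AEZs": (47, 20),
       "warm desert": (3, 20), "hot arid": (2, 11)}
for name, (p, a) in aez.items():
    print(f"{name}: {representation_index(p, a):.2f}")
```

An index well below 1 for the lowland zones and well above 1 for the highland zones restates, in one number per zone, the sampling bias toward the highlands described above.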
Tepid moist mid-highlands (8 % area coverage), tepid sub-humid mid-highlands (7 % area coverage), and tepid sub-moist mid-highlands (5 % area coverage) were represented by 20 %, 15 %, and 12 % of the profiles, respectively.

3.2 Modeling and mapping

3.2.1 Variable importance

The spatial pattern of RSGs is primarily influenced by long-term average surface reflectance, flow-based DEM indices, and precipitation. Figure 5 shows the importance of the variables for the spatial prediction of RSGs. The top-ranked variables were (i) long-term MODIS near-infrared (NIR) reflectance, (ii) the multiresolution index of valley bottom flatness, (iii) long-term mean daytime land surface temperature, (iv) long-term mean soil moisture, (v) the standard deviation of long-term precipitation, (vi) long-term mean precipitation, and (vii) the topographic wetness index.

MODIS long-term mean spectral signatures showed high relative importance. According to Hengl et al. (2017), long-term temporal signatures of the soil surface, derived as monthly averages from long-term MODIS imagery, were more effective because they account for seasonal vegetation fluctuation and inter-annual variation in surface reflectance. Furthermore, Hengl and MacMillan (2019) explained that long-term average seasonal signatures of surface reflectance provide a better indication of soil characteristics than a single snapshot of surface reflectance.

The multiresolution valley bottom flatness index, a DEM-derived topographic index, is the second top-ranked covariate driving soil variability across Ethiopia.
This hydrological index of soil removal and accumulation (deposition) is used to distinguish valley floor and ridgetop landscape positions (Soil Science Division Staff, 2017). These positions largely determine which soil-forming processes operate over a particular landscape, and hence produce a wide range of soil development. The influence of topography on spatial soil variation is manifested in every landscape of Ethiopia (Belay, 1997; Mesfin, 1998; Nyssen et al., 2019; Zewdie, 2013).

Long-term mean daytime land surface temperature, mean soil moisture, rainfall standard deviation, and mean annual rainfall were among the top-ranked covariates for predicting the spatial variation of RSGs across the country. In Ethiopia, several soil genesis studies have shown that climate significantly influences soil development and properties. Climate is therefore responsible for the widely varying soils in the country (Abayneh, 2005; Abayneh et al., 2006; Fikru, 1988, 1980; Zewdie, 2013).

In the Ethiopian highlands, the most important covariates for predicting RSGs include the following (Leenaars et al., 2020a): monthly average soil moisture for January (ranked third), long-term average soil moisture (ranked fourth), and monthly average soil moisture for August (ranked fifth). In the current study, soil moisture was among the 10 top-ranked covariates for modeling and explaining long-distance soil type variability across the country. Lithology showed a relatively low influence on soil variability. This may be due to the use of a coarse-scale, less detailed lithology map that may not sufficiently capture the spatial variability of the parent materials.

3.2.2 Model performance

The parameter optimization resulted in mtry = 20, split rule = extra trees, and minimum node size = 5. The overall accuracy of the model was 56.24 %, with a 95 % confidence interval of 54.43 % to 58.1 %.
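As a rough cross-check of the interval reported above, the sketch below computes a normal-approximation (Wald) 95 % interval for overall accuracy. It uses the test-set counts: 1648 correctly classified observations (the sum of the diagonal of Table 1) out of 2930. The authors do not state which interval method they used. An exact binomial interval would differ slightly in the second decimal place, so treat this as an approximation rather than a reproduction of their computation.

```python
import math


def accuracy_wald_ci(n_correct: int, n_total: int, z: float = 1.959964) -> tuple[float, float, float]:
    """Overall accuracy with a normal-approximation (Wald) confidence interval.

    z = 1.959964 is the two-sided 95 % standard normal quantile.
    Returns (accuracy, lower, upper), with the bounds clipped to [0, 1].
    """
    if n_total <= 0 or not 0 <= n_correct <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_correct <= n_total")
    p = n_correct / n_total
    half_width = z * math.sqrt(p * (1.0 - p) / n_total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Test-set counts read off Table 1: diagonal sum and grand total.
acc, lower, upper = accuracy_wald_ci(1648, 2930)
print(f"overall accuracy = {acc:.2%}, 95 % CI = [{lower:.2%}, {upper:.2%}]")
```

With these counts the interval is approximately 54.4 % to 58.0 %, close to the reported 54.43 % to 58.1 %.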
The kappa value, based on the internal cross-validation (10-fold, with repeated fitting) and the testing dataset, was 48 %. The overall accuracy is in line with accuracies typically reported by comparable area-based digital soil class mapping efforts. These include soil class maps developed with RF models (Leenaars et al., 2020a) and with statistical methods (Heung et al., 2016).

Table 1 shows the confusion matrix at the validation/testing points, i.e., 20 % of the observations. The matrix also shows that the producer's and user's accuracies differed among RSGs:
- Producer's accuracy is the proportion of observations of a class that the map predicts correctly.
- User's accuracy is the proportion of predictions of a class that are correct.

Ranked by map purity (user's accuracy), the RSGs are in the order Lixisols, Calcisols, Alisols, Phaeozems, Vertisols, Andosols, Solonchaks, Fluvisols, Arenosols, Leptosols, Luvisols, Nitisols, and Cambisols. In terms of producer's accuracy, however, Vertisols, Calcisols, and Andosols are the observed classes best represented by the map. These are followed by Fluvisols, Alisols, Nitisols, Leptosols, Luvisols, and Cambisols.

The global SoilGrids product at 250 m resolution used machine-learning algorithms to map the global WRB RSGs, with a map purity of 28 % and a weighted kappa of 42 % (Hengl et al., 2017). The SoilGrids 250 m WRB soil group predictions (spatial soil patterns) were not evaluated against expert knowledge. In this study, by contrast, a panel of pedologists carried out an extensive, iterative qualitative assessment. The quantitative accuracy of the present study (about 56 %), together with this expert-based qualitative evaluation of the predicted maps, indicates a substantially enhanced national product for users of spatial soil resource information.
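The summary statistics discussed above can be recomputed directly from Table 1. The sketch below is plain Python with no external libraries. As in Table 1, rows are predicted classes and columns are reference (observed) classes. It computes overall accuracy, Cohen's kappa, and per-class user's and producer's accuracy. We assume the unweighted form of kappa, since the text does not say which variant was used. With the Table 1 counts, kappa comes out at about 0.48, in line with the 48 % reported.

```python
# Table 1 confusion matrix: rows = predicted RSG, columns = reference (observed) RSG,
# both in the order of CLASSES.
CLASSES = [
    "Acrisols", "Alisols", "Andosols", "Arenosols", "Calcisols", "Cambisols",
    "Fluvisols", "Gleysols", "Leptosols", "Lixisols", "Luvisols", "Nitisols",
    "Phaeozems", "Planosols", "Regosols", "Solonchaks", "Solonetz", "Vertisols",
]
TABLE_1 = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0],
    [0, 40, 0, 0, 0, 0, 1, 1, 0, 0, 9, 4, 0, 0, 2, 0, 0, 2],
    [0, 0, 28, 1, 1, 3, 5, 0, 2, 0, 2, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 11, 0, 2, 1, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 21, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 5],
    [2, 3, 6, 9, 1, 197, 28, 2, 35, 2, 47, 16, 5, 1, 16, 3, 3, 28],
    [1, 0, 3, 5, 1, 34, 144, 0, 9, 0, 15, 7, 0, 0, 1, 5, 5, 17],
    [0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 4, 3, 3, 47, 11, 0, 176, 0, 27, 7, 1, 0, 32, 0, 0, 24],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [2, 16, 3, 8, 0, 34, 13, 2, 33, 3, 216, 30, 3, 0, 25, 1, 0, 41],
    [6, 8, 0, 0, 1, 23, 8, 3, 18, 8, 29, 132, 0, 1, 8, 0, 1, 21],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 5, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 7, 1, 0, 7, 1, 8, 1, 0, 0, 22, 0, 0, 5],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 1, 0],
    [0, 0, 0, 0, 1, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 6, 0],
    [3, 1, 3, 5, 5, 92, 32, 2, 61, 3, 81, 31, 5, 5, 25, 2, 6, 641],
]


def row_totals(m):
    """Number of test observations predicted as each class."""
    return [sum(row) for row in m]


def col_totals(m):
    """Number of test observations observed (reference) in each class."""
    return [sum(row[j] for row in m) for j in range(len(m[0]))]


def overall_accuracy(m):
    """Share of observations on the diagonal (correctly classified)."""
    n = sum(row_totals(m))
    return sum(m[i][i] for i in range(len(m))) / n


def cohen_kappa(m):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = sum(row_totals(m))
    p_o = overall_accuracy(m)
    p_e = sum(r * c for r, c in zip(row_totals(m), col_totals(m))) / n**2
    return (p_o - p_e) / (1.0 - p_e)


def users_accuracy(m):
    """Map purity per class: diagonal / row total (share of predictions that are correct)."""
    return [m[i][i] / t if t else float("nan") for i, t in enumerate(row_totals(m))]


def producers_accuracy(m):
    """Per class: diagonal / column total (share of observed cases predicted correctly)."""
    return [m[j][j] / t if t else float("nan") for j, t in enumerate(col_totals(m))]


print(f"overall accuracy = {overall_accuracy(TABLE_1):.3f}")
print(f"Cohen's kappa    = {cohen_kappa(TABLE_1):.3f}")
for name, ua, pa in zip(CLASSES, users_accuracy(TABLE_1), producers_accuracy(TABLE_1)):
    print(f"{name:<11} user's = {ua:.2f}  producer's = {pa:.2f}")
```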
The accuracy achieved here is a step forward and is acceptable. SoilGrids maps are not expected to be as accurate as locally produced maps and models, which use many more local point data and finer local variables (Mulder et al., 2016). Further, the data and findings of this study can help improve the soil maps of Africa. They partially address the concern raised by Hengl et al. (2017), who recognized that WRB RSG modeling in the global SoilGrids 250 m is critically uncertain for parts of Africa. This uncertainty is mainly attributed to the limited access of regional and global modeling initiatives to local point data. The present study, by contrast, accessed a large number of legacy soil profile datasets.

3.2.3 Modeling and mapping: EthioSoilGrids version 1.0

The study identified 18 RSGs in Ethiopia, mapped at 250 m resolution (Fig. 6). The model prediction showed that seven reference soil groups (Cambisols, Leptosols, Vertisols, Fluvisols, Nitisols, Luvisols, and Calcisols) covered nearly 98 % of the total land area of the country (Fig. 7). Five reference soil groups (Solonchaks, Arenosols, Regosols, Andosols, and Alisols) were estimated to cover about 2 % of the land area. Trace coverages of Solonetz, Planosols, Acrisols, Lixisols, Phaeozems, and Gleysols were also found in some pocket areas.

In terms of spatial distribution, Nitisols and Luvisols dominated the northwestern and southwestern highlands. The southeastern lowlands were dominantly covered by Cambisols, Calcisols, and Fluvisols, with some Solonchaks. Vertisols extensively cover the northern and southwestern lowlands, along with the Ethiopia–Sudan border areas and the central highland plateaus. The probability of occurrence of each RSG was mapped (Appendix C) for each spatial modeling window, i.e., each 250 m × 250 m cell. The dominant RSG in each spatial modeling window was then taken to be the most probable RSG. There was high correspondence between the seven top-ranked prediction probabilities and the observed soil types. This was confirmed visually by overlaying the observed classes on the prediction probabilities.

The overall occurrence and relative position of each RSG along the topo-sequence, and its association with other RSGs, agree with previous works (Abayneh et al., 2006; Ali et al., 2010; Abdenna et al., 2018; Asmamaw and Mohammed, 2012; Belay, 2000, 1998, 1997, 1996; Driessen et al., 2001; Elias, 2016; FAO, 1984a; Fikre, 2003; Mitiku, 1987; Mohammed and Belay, 2008; Mohammed and Solomon, 2012; Mulugeta et al., 2021; Nyssen et al., 2019; Sheleme, 2017; Shimeles et al., 2007; Tolossa, 2015; Zewdie, 2013). However, in some cases, the position of the RSGs along the topo-sequence and their association with other RSGs require further investigation.

Figure 5. Random forest covariate relative importance for modeling RSGs. Note: prep = precipitation; prep_sd = standard deviation of precipitation; tmax = maximum temperature; tmin = minimum temperature; trange = temperature range; tav_sd = standard deviation of average temperature; pet = potential evapotranspiration; lstd = land surface temperature (day); lstn = land surface temperature (night); soil_moist = soil moisture; soil_temp = soil temperature; DEM = digital elevation model (elevation); twi = topographic wetness index; aspect = topographic aspect; curv = topographic curvature; conv = topographic convergence index; ls = slope length and steepness factor (ls_factor); morph = terrain morphometry; mrvbf = multiresolution index of valley bottom flatness; slope = slope class (%); ndvi = normalized difference vegetation index (NDVI); evi = enhanced vegetation index (EVI); lulc = land use/land cover; lithology = geology; ref1 = red band; ref2 = near-infrared; ref7 = mid-infrared.
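The aggregation step described in Sect. 3.2.3 assigns each 250 m cell the RSG with the highest predicted probability and then summarizes the area shares, as in Fig. 7. It can be sketched in Python as follows. The per-cell probabilities below are hypothetical toy values; in the study they come from the random forest for every cell (Appendix C). The tie-breaking rule (first class listed wins) is our assumption, not something the study specifies.

```python
from collections import Counter

CELL_AREA_KM2 = 0.25 * 0.25  # one 250 m x 250 m cell = 0.0625 km^2


def dominant_rsg(probabilities: dict[str, float]) -> str:
    """Return the RSG with the highest predicted probability (ties: first key wins)."""
    return max(probabilities, key=probabilities.get)


def area_summary(cells: list[dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Map each dominant RSG to (area in km^2, share of mapped area in %)."""
    counts = Counter(dominant_rsg(p) for p in cells)
    n_cells = len(cells)
    return {
        rsg: (n * CELL_AREA_KM2, 100.0 * n / n_cells)
        for rsg, n in counts.most_common()
    }


# Hypothetical per-cell probabilities for four cells (illustration only).
toy_cells = [
    {"Cambisols": 0.50, "Vertisols": 0.30, "Leptosols": 0.20},
    {"Cambisols": 0.20, "Vertisols": 0.60, "Leptosols": 0.20},
    {"Cambisols": 0.40, "Vertisols": 0.10, "Leptosols": 0.50},
    {"Cambisols": 0.70, "Vertisols": 0.20, "Leptosols": 0.10},
]
for rsg, (km2, pct) in area_summary(toy_cells).items():
    print(f"{rsg:<10} {km2:.4f} km^2  {pct:.1f} %")
```

In this toy case, Cambisols dominate two of the four cells and so receive 50 % of the mapped area.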
The disparities observed might be attributed to three factors:
- the positional accuracy of the legacy point observations;
- the modeling approach;
- most importantly, the level of detail and scale/resolution of the environmental variables used in this study.

We used the currently available coarse-resolution national geological map. Soil parent material might therefore be inadequately represented in the model, which probably resulted in irregular RSG sequences. For instance, in the May-Leiba catchment of northern Ethiopia, the main factors establishing and explaining the soil–landscape variability were geology (soil parent material) and various mass movements (Van de Wauw et al., 2008). These factors led to Cambisols–Vertisols catenas on basalt and Regosols–Cambisols–Vertisols catenas on limestone formations. Similar studies found that parent material strongly determines the soil type, e.g., Vertisol, Luvisol, or Cambisol (Nyssen et al., 2019). In general, where soil diversity and distribution are complex, one of the most important tasks is to identify the parent material. Effective techniques are also needed to capture and delineate mass movement bodies and areas of human-induced soil erosion and deposition (Leenaars et al., 2020a; Nyssen et al., 2019; Van de Wauw et al., 2008).

Table 1. Confusion matrix of random forest RSG prediction (at validation/testing observations). Rows are predicted classes; columns are reference (observed) classes.

Prediction    AC   AL   AN   AR   CL   CM   FL   GL   LP   LX   LV   NT   PH   PL   RG   SC   SN   VR  User  Total
Acrisols       1    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    1    0  0.33      3
Alisols        0   40    0    0    0    0    1    1    0    0    9    4    0    0    2    0    0    2  0.68     59
Andosols       0    0   28    1    1    3    5    0    2    0    2    0    0    0    0    0    1    1  0.64     44
Arenosols      0    0    0   11    0    2    1    0    0    0    5    0    0    0    0    0    0    1  0.55     20
Calcisols      0    0    0    0   21    0    1    0    0    0    2    0    0    0    0    0    0    5  0.72     29
Cambisols      2    3    6    9    1  197   28    2   35    2   47   16    5    1   16    3    3   28  0.49    404
Fluvisols      1    0    3    5    1   34  144    0    9    0   15    7    0    0    1    5    5   17  0.58    247
Gleysols       0    0    0    0    0    0    1    2    0    0    1    0    0    1    0    0    0    0  0.40      5
Leptosols      0    1    4    3    3   47   11    0  176    0   27    7    1    0   32    0    0   24  0.52    336
Lixisols       0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0  1.00      1
Luvisols       2   16    3    8    0   34   13    2   33    3  216   30    3    0   25    1    0   41  0.50    430
Nitisols       6    8    0    0    1   23    8    3   18    8   29  132    0    1    8    0    1   21  0.49    267
Phaeozems      0    0    0    0    0    0    0    0    0    0    1    0    2    0    0    0    0    0  0.67      3
Planosols      0    0    0    0    0    0    0    0    0    0    1    1    0    5    1    0    0    1  0.55      9
Regosols       0    0    0    0    0    7    1    0    7    1    8    1    0    0   22    0    0    5  0.42     52
Solonchaks     0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    3    1    0  0.60      5
Solonetz       0    0    0    0    1    4    1    0    0    0    0    0    0    0    0    1    6    0  0.46     13
Vertisols      3    1    3    5    5   92   32    2   61    3   81   31    5    5   25    2    6  641  0.64   1003
Prod. acc.  0.07 0.58 0.60 0.26 0.62 0.44 0.58 0.17 0.51 0.06 0.49 0.58 0.13 0.38 0.17 0.20 0.25 0.81  0.56      –
Total         15   69   47   42   34  443  247   12  342   18  445  229   16   13  132   15   24  787     –   2930

Column codes (reference classes): AC = Acrisols, AL = Alisols, AN = Andosols, AR = Arenosols, CL = Calcisols, CM = Cambisols, FL = Fluvisols, GL = Gleysols, LP = Leptosols, LX = Lixisols, LV = Luvisols, NT = Nitisols, PH = Phaeozems, PL = Planosols, RG = Regosols, SC = Solonchaks, SN = Solonetz, VR = Vertisols. User = user's accuracy; Prod. acc. = producer's accuracy. The value 0.56 in the User column of the Prod. acc. row is the overall accuracy.

Cambisols rank third in frequency of occurrence among the point observations, after Vertisols and Luvisols. On the map, however, they rank first, so they appear to be over-represented. This seems to come at the expense of Vertisols and Luvisols and, to some extent, of Leptosols and other RSGs.
This might be attributed to the fact that Cambisols form a geographical continuum with neighboring soils. They adjoin Vertisols and/or Luvisols on the lower slopes and Leptosols/Regosols on the higher slopes, suggesting the presence of some intergrading soil properties in the respective transitional zones (Ali et al., 2010; Asmamaw and Mohammed, 2012; Sheleme, 2017; Zewdie, 2013).

The proportion of area mapped as Cambisols (34 %) provides new insights compared with the most cited spatial soil maps. In those maps, Cambisols ranked second (21 %), second (16 %), fourth (9 %), and fourth (8 %), as reported by Berhanu (1980), FAO (1984b, 1998), and SoilGrids (Hengl et al., 2017), respectively. This difference might be due to:
(i) the number and distribution of profile observations, which are more extensive than in previous studies;
(ii) the type and level of detail of the covariates considered;
(iii) variations and rearrangements in the keys for classifying RSGs among the soil classification versions used in previous studies, and misclassification/confusion of Vertisols with Vertic Cambisols, since the legacy soil profile data come from diverse sources.

3.3 Expert validation of the soil map

Expert knowledge of soil–landscape relations and soil distribution remains important for evaluating predictive soil mapping results. It helps assess whether the predicted spatial patterns make sense from a pedological viewpoint (Hengl et al., 2017; Poggio et al., 2021; Rossiter et al., 2022). An important step in qualitative model evaluation is, therefore, expert assessment, in which professionals with broad experience in soil survey and mapping evaluate and improve the quality of the soil resource map. This can highlight areas of agreement or concern across the landscape (Rossiter et al., 2022). The expert validation workshop provided useful insights and tangible improvements to the map.
Figure 6. Major reference soil groups of Ethiopia (EthioSoilGrids V1.0).

Figure 7. The area coverage (in %) of the major WRB RSGs. Note: the remaining 10 RSGs were not plotted because of their relatively small area coverage: Arenosols (0.44 %), Regosols (0.35 %), Andosols (0.31 %), Alisols (0.16 %), Solonetz (0.04 %), Planosols (0.04 %), Acrisols (0.02 %), Lixisols (0.02 %), Phaeozems (0.02 %), and Gleysols (0.01 %).

The plenary discussion provided an overview of the approaches followed in developing the map. The group discussions then enabled an in-depth review of the map polygons assigned to each group. Participants were split into five groups of 8–10 members each. Each group chose up to 60 polygons representing areas for which at least one group member had sufficient information, including data sources. Overall, the groups checked a total of 126 polygons (Fig. 8), which were fairly evenly distributed across the country. The group members displayed the polygons one by one in a GIS environment and discussed the predicted dominant and associated RSGs. They then assigned each polygon to one of three confirmation categories: (1) confirmed with “no concern”, (2) confirmed with “minor concern”, and (3) confirmed with “major concern”. Confirmation with “no concern” was given when all members of a group agreed on the types, relative coverage, and patterns of the predicted soils within the polygon. Confirmation with “minor concern” was given when all or some of the team members agreed on the predicted soil types within the polygon. In this case, however, they did not agree on the order of abundance or the probability of occurrence of one or two soils, including the observed spatial patterns.
Confirmation with “major concern” was given when the team members did not agree on the predicted soil type, or when the presence of a soil type other than the predicted types was noted. Overall, the groups rated the accuracy of the map at over 60 %. Of the 126 polygons, they expressed no concern for 63 %, minor concern for 23 %, and major concern for 14 %. Furthermore, the expert input identified two further issues, which are elaborated on in the following section. These are differences in the prevalence of RSGs and in the patterns of the modeling outputs across different soil-forming factor sequences, and the areas of the DSM framework that still need work.

3.4 Evaluation of results, limitations, and future direction

Up-to-date spatial soil resource information is critically missing at the required scale and extent in Ethiopia. As a result, resource management strategies miss their targets. Furthermore, the absence of such data at the required resolution and extent has forced developers of decision support tools to pick and use whatever data they can access and afford. As a result, model outputs appear more site-specific, or the representation becomes homogeneous over what are in reality very heterogeneous landscapes.

On the other hand, in large areas with complex landscapes such as Ethiopia, it is very difficult to meet the demand for reasonably accurate and detailed soil-type maps using a conventional approach. This is because of the costs, resources, and time involved. For instance, given the vastness of the country and its heterogeneous landscapes, a new conventional soil survey would require at least 170 000 profile point observations. This is the number needed to map the entire terrestrial land mass of Ethiopia at a scale of 1 : 250 000 with at least one observation per square centimeter of map. Moreover, the soil profile data requirement would be much higher at larger mapping scales and higher observation densities.

In the present study, machine-learning techniques combined with expert input were implemented to produce a countrywide soil resource map of Ethiopia. The map has reasonably high accuracy and took less time and cost to produce than conventional methods would. In addition, about 14 681 geo-referenced legacy soil profiles were rescued, compiled, and standardized. These can be included in the National Soil Information System (NSIS) of Ethiopia and the World Soil Information Center, and will support future national, regional, and global DSM efforts. The approach used here demonstrates the power of data and analytics to map the soil resources of Ethiopia. The output is an exemplary use case for similar digital content development efforts in Ethiopia and beyond.

Moreover, in this study, quality-monitoring processes and methods were followed to filter out dubious soil profiles, together with soil classification and harmonization protocols. The study then followed a robust modeling framework and generated new insights into the relative area coverage of WRB RSGs in Ethiopia. In addition, the study provides coherent, up-to-date, quantitative gridded spatial soil resource information. This information can support the implementation of various digital agricultural solutions and decision support tools (DSTs).

The spatially explicit limitations of the present study were revealed by the expert-based qualitative evaluation of spatial patterns. This evaluation covered objectively selected geographic windows and prominent contrasting landscapes of Ethiopia.

Figure 8. The spatial distribution of districts validated by stakeholders and feedback categories according to the level of concerns raised.
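The observation requirement quoted in Sect. 3.4 for a 1 : 250 000 survey follows from simple map-scale arithmetic. At 1 : 250 000, 1 cm on the map covers 2.5 km on the ground, so 1 cm² of map corresponds to 6.25 km². The sketch below assumes the 1.14 × 10⁶ km² total area given in the Appendix A note. On that basis it yields about 182 000 observations, consistent with "at least 170 000"; the exact figure depends on the land area assumed. The function name is ours, for illustration only.

```python
def observations_needed(area_km2: float, scale_denominator: int, obs_per_map_cm2: float = 1.0) -> float:
    """Observations needed to reach a given density of observations per cm^2 of map."""
    # At 1:scale_denominator, 1 cm on the map covers scale_denominator cm on the ground;
    # there are 100 000 cm in 1 km.
    ground_km_per_map_cm = scale_denominator / 100_000
    ground_km2_per_map_cm2 = ground_km_per_map_cm ** 2
    return area_km2 / ground_km2_per_map_cm2 * obs_per_map_cm2


ETHIOPIA_AREA_KM2 = 1.14e6  # total area stated in the note to Table A1

for scale in (250_000, 100_000, 50_000):
    n_obs = observations_needed(ETHIOPIA_AREA_KM2, scale)
    print(f"1:{scale:,}: about {n_obs:,.0f} observations")
```

The loop also shows why the requirement grows quickly at larger scales. Halving the scale denominator quadruples the number of observations.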
The expert-based qualitative assessment indicated areas of concern regarding how well EthioSoilGrids version 1.0 represents soil geography across the mosaic of the country's landscapes. For instance, in the northeastern lowlands of Ethiopia, mainly along the Denakil depression, the map shows Fluvisols, Cambisols, and Vertisols where other soil types would normally be expected. In this area, the expected prediction and area coverage of Leptosols have probably been overshadowed by Fluvisols and Cambisols. Similarly, in parts of western Ethiopia, the prediction of Vertisols overshadows other RSGs. This results in an underestimation of the area coverage of Alisols and of Fluvisols (along the Akobo, Gilo, and Baro rivers and their tributaries). Likewise, in the central parts of northwestern Ethiopia, the prediction of Nitisols was overshadowed by Vertisols and Luvisols. The Nitisols area coverage there is therefore likely underestimated.

Some of the examined geographic windows showed relatively low model performance and some classification errors. Examples are the Denakil depression, the areas along the Akobo, Baro, and Gilo rivers, and the Somali region. These problems are probably due to the paucity of samples from those areas (Fig. 4), too few observations for some RSGs, and the over-representation of others, such as Vertisols, Luvisols, and Cambisols. Balanced datasets are ideal for decision tree algorithms to produce good classifications. With uneven class sizes, the generated classification model may be biased toward the majority classes (Hounkpatin et al., 2018; Wadoux et al., 2020). Several further factors could have biased the modeling results in some geographic areas:
- uncertainty about the quality of the covariates included;
- covariates not considered in the modeling process, such as management;
- the use of validation methods that do not sufficiently control for the effect of clustered samples;
- the small sample size for some RSGs.

To improve modeling performance, future studies could explore:
(i) adding data for under-represented geographic areas, land uses, and covariate spaces;
(ii) including other covariates (parent material and management) that could capture the variability of the country's heterogeneous landscapes;
(iii) dimension reduction of the covariates;
(iv) remedial measures for imbalanced sample sizes;
(v) comparing different cross-validation methods;
(vi) an ensemble modeling approach and/or a robust modeling technique that accommodates neighborhood size and connectivity analyses;
(vii) a better-resolution/quality mask layer to separate non-soil areas (rock outcrops, salt flats, sand dunes, and water bodies) from the mapping area;
(viii) quantitative and qualitative comparisons of national, regional, and global legacy soil maps/soil grids with new DSM products, in terms of how well the DSM products represent soil geography.

Future digital soil mapping strategies in Ethiopia may also require:
- new soil sampling missions in under-represented areas;
- the adoption of standard soil sampling and description guidelines and soil classification systems, including soil physicochemical and mineralogical analysis;
- the combination of local soil nomenclature/classification systems with RSGs;
- the development of a map of RSGs with qualifiers.

At present, the under-sampled and under-represented areas are the Somali region, the Denakil, and the western and northwestern border areas of Ethiopia (Fig. 4). Despite these limitations, and to the best of our knowledge, the EthioSoilGrids v1.0 product provides the most complete soil information available for Ethiopia.
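Item (iv) in Sect. 3.4, remedial measures for imbalanced sample sizes, can take several forms. One common option, sketched below, is to weight each class inversely to its frequency so that every RSG contributes the same total weight to the split criterion during training. The "balanced" formula used here (n_total / (n_classes × n_class)) is our choice of heuristic; oversampling or undersampling are alternatives. Random forest implementations accept class or case weights through library-specific parameters, so the sketch only computes the weights. The counts are illustrative: they are the reference totals of four RSGs in the Table 1 test set, not the training counts.

```python
def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Inverse-frequency ("balanced") class weights: n_total / (n_classes * n_class).

    After weighting, every class carries the same total weight (n_total / n_classes),
    so rare RSGs are not swamped by frequent ones during training.
    """
    if not counts or any(n <= 0 for n in counts.values()):
        raise ValueError("need at least one class, each with at least one observation")
    n_total = sum(counts.values())
    n_classes = len(counts)
    return {rsg: n_total / (n_classes * n) for rsg, n in counts.items()}


# Illustrative counts only: reference totals of four RSGs in the Table 1 test set.
example_counts = {"Vertisols": 787, "Luvisols": 445, "Cambisols": 443, "Gleysols": 12}
for rsg, weight in balanced_class_weights(example_counts).items():
    print(f"{rsg:<10} weight = {weight:.3f}")
```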
4 Conclusions

Coherent and up-to-date countrywide digital soil information is essential to support digital agricultural transformation efforts. This study involved the collation, cleaning, harmonization, and validation of legacy soil profile datasets. The work was done by soil scientists with different backgrounds, individually and in groups. To develop the 250 m digital soil resource map, a machine-learning modeling approach and expert validation were applied to the harmonized soil database and to environmental covariates affecting soil-forming processes. About 20 000 soil profiles were collated, of which 14 681 were used for modeling and mapping 18 of the 23 RSGs identified. Although unevenly distributed, the legacy soil profile data used in the modeling covered most of the agro-ecologies of the country.

Among the 18 RSGs mapped, Vertisols had the highest number of observed profiles (3935), followed by Luvisols, Cambisols, and Leptosols. Gleysols had the lowest number of profiles (63). The modeling revealed that the most important covariates for predicting RSGs in Ethiopia are:
- MODIS long-term reflectance;
- the multiresolution index of valley bottom flatness;
- land surface temperature;
- soil moisture;
- long-term mean annual rainfall;
- the topographic wetness index.

Our 10-fold spatial cross-validation showed an overall accuracy of about 56 %, with accuracy levels varying among RSGs. Seven major reference soil groups covered nearly 98 % of the total land area of the country: Cambisols (34 %), Leptosols (20 %), Vertisols (18 %), Fluvisols (10 %), Nitisols (7 %), Luvisols (6 %), and Calcisols (3 %). Minor coverage of other RSGs was also detected in some areas: Solonchaks, Arenosols, Regosols, Andosols, Alisols, Solonetz, Planosols, Acrisols, Lixisols, Phaeozems, and Gleysols.
Compared with the existing soil resource map, the coverage of the first three major soil groups has substantially increased. This increase is related to the greater availability of soil profile data covering larger areas of the country, implying that these soils were previously underestimated. Cambisols and Vertisols, which together cover about half of the total land area, are relatively young soils with inherent fertility, suggesting a high agricultural potential for the country. However, given their limitations, these and the other soil types require suitable land, water, and crop management techniques to exploit their potential sustainably.

The EthioSoilGrids version 1.0 product from this first countrywide RSG modeling effort requires complementary activities. Modeling and mapping should go beyond RSGs to include second-level classifications with principal and supplementary qualifiers. Furthermore, a soil atlas of Ethiopia detailing soil physicochemical properties needs to be prepared together with the map. The authors and/or others responsible should prioritize this in future research.

Appendix A: Legacy soil profile data distribution

Table A1. Distribution of legacy soil profile data by agro-ecological zones.
Major agro-ecological zone                          Area (%)a  Profiles (%)b
Warm arid lowland plains                                19.76           3.40
Warm moist lowlands                                     15.12          10.74
Hot arid lowland plains                                 10.79           2.44
Warm sub-moist lowlands                                  9.63           6.94
Tepid moist mid highlands                                8.05          20.21
Warm sub-humid lowlands                                  7.11           5.69
Tepid sub-humid mid highlands                            6.63          15.26
Tepid sub-moist mid highlands                            5.17          12.39
Warm semi-arid lowlands                                  2.75           3.23
Tepid humid mid highlands                                2.65           2.48
Warm humid lowlands                                      2.29           0.45
Cool moist mid highlands                                 1.74           4.15
Hot sub-humid lowlands                                   1.67           0.07
Cool sub-moist mid highlands                             1.16           3.00
Cool humid mid highlands                                 0.82           1.01
Warm per-humid lowlands                                  0.68           0.01
Hot moist lowlands                                       0.59           3.56
Hot sub-moist lowlands                                   0.56           0.03
Cool sub-humid mid highlands                             0.52           1.38
Tepid arid mid highlands                                 0.43           0.39
Hot semi-arid lowlands                                   0.40           2.05
Tepid semi-arid mid highlands                            0.19           0.67
Cold moist sub-afro-alpine to afro-alpine                0.07           0.16
Cold sub-moist mid highlands                             0.07           0.04
Cold sub-humid sub-afro-alpine to afro-alpine            0.06           0.03
Cold humid sub-afro-alpine to afro-alpine                0.06           0.01
Very cold humid sub-afro-alpine                          0.04           0.02
Very cold sub-moist mid highlands                        0.02           0.02
Very cold moist sub-afro-alpine to afro-alpine           0.01           0.03
Hot per-humid lowlands                                   0.01           0.15
Tepid per-humid mid highland                             0.13              0
Very cold sub-humid sub-afro-alpine to afro-alpine       0.03              0

Note: a total area of Ethiopia 1.14 × 10⁶ km²; b total number of profiles 14 681.

Appendix B: Environmental covariates

Table B1. List, description, spatial and temporal extent, and source of covariates used in modeling the reference soil groups.
Category / covariate: description; spatial resolution; temporal extent; source

Climate
prep: precipitation; 4 km; 1981–2016; ENACTS (Dinku et al., 2014)
prep_sd: standard deviation of precipitation; 4 km; 1981–2016; derived from ENACTS (Dinku et al., 2014)
tmax: maximum temperature; 4 km; 1983–2016; ENACTS (Dinku et al., 2014)
tmin: minimum temperature; 4 km; 1983–2016; ENACTS (Dinku et al., 2014)
trange: temperature range; 4 km; 1983–2016; ENACTS (Dinku et al., 2014)
tav_sd: standard deviation of average temperature; 4 km; 1983–2016; derived from ENACTS (Dinku et al., 2014)
pet: potential evapotranspiration; 4 km; 1981–2016; derived from ENACTS (Dinku et al., 2014) using the modified Penman method
lstd: land surface temperature, day (Aqua MODIS MYD11A2, monthly average time series); 1000 m; 2002–2018; AfSIS (a)
lstn: land surface temperature, night (Aqua MODIS MYD11A2, monthly average time series); 1000 m; 2002–2018; AfSIS
soil_moist: soil moisture (derived from a one-dimensional soil water balance); 4 km; 1981–2016; Ethiopian Digital AgroClimate Advisory Platform (EDACaP)
soil_temp: soil temperature; 30 km; 1979–2019; ERA5 reanalysis, ECMWF data (b)

Topography
DEM: digital elevation model (elevation); 90 m; –; SRTM-DEM (Vågen, 2010)
twi: topographic wetness index; 90 m; –; SAGA GIS-based SRTM-DEM derivative
aspect: topographic aspect; 90 m; –; SAGA GIS-based SRTM-DEM derivative
curv: topographic curvature; 90 m; –; SAGA GIS-based SRTM-DEM derivative
conv: topographic convergence index; 90 m; –; SAGA GIS-based SRTM-DEM derivative
ls: slope length and steepness factor (ls_factor); 90 m; –; SAGA GIS-based SRTM-DEM derivative
morph: terrain morphometry; 90 m; –; SAGA GIS-based SRTM-DEM derivative
mrvbf: multiresolution index of valley bottom flatness; 90 m; –; SAGA GIS-based SRTM-DEM derivative
slope: slope class (%); 90 m; –; SAGA GIS-based SRTM-DEM derivative

Table B1. Continued.
Category / covariate: description; spatial resolution; temporal extent; source

Vegetation
ndvi: normalized difference vegetation index (NDVI) (MODIS MOD13Q1, monthly average time series); 250 m; 2000–2021; AfSIS (a)
evi: enhanced vegetation index (EVI) (MODIS MOD13Q1, monthly average time series); 250 m; 2000–2021; AfSIS
lulc: land use/land cover; 30 m; 2010; Water and Land Resource Center, Addis Ababa University (WLRC-AAU, 2018)

Parent material
lithology: geology/parent material; 1 : 2 000 000; 1996; The Ethiopian Geological Survey (Tefera et al., 1996)

MODIS spectral reflectance
ref1: red band (MODIS MOD13Q1, monthly average time series); 250 m; 2000–2018; AfSIS (a)
ref2: near-infrared (MODIS MOD13Q1, monthly average time series); 250 m; 2000–2018; AfSIS
ref7: mid-infrared (MODIS MOD13Q1, monthly average time series); 250 m; 2000–2018; AfSIS

(a) Africa Soil Information Service (AfSIS). (b) Fifth-generation European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis of the global climate.

Appendix C: Probability of occurrence of reference soil groups

Figure C1. Occurrence probability maps of Cambisols, Leptosols, Vertisols, and Fluvisols.

Figure C2. Occurrence probability maps of Nitisols, Luvisols, and Calcisols.

Data availability. Full data will be available upon request based on the CoW guideline (CoW, 2020; https://ethioagridata.com/, last access: 7 November 2023) and the MoA “Soil and Agronomy Data Management, Use and Sharing” directive No. 974/2023 Ethiopia (https://nsis.moa.gov.et/, last access: 7 November 2023).

Author contributions. AshA, TE, KG, WA, and LT conceived and designed the study, performed the analysis, and wrote the first draft with substantial input and feedback from all authors. EM, TM, NH, AY, AM, TA, FW, AL, NT, AyeA, SG, YA, and BA contributed to input data preparation, data encoding, and harmonization.
Legacy data validation and review of subsequent versions of the paper were performed by MH, WH, AssA, DT, GB, MG, SB, MA, AR, YGS, ST, DA, YW, DB, EZ, SS, and EE.

Competing interests. The contact author has declared that none of the authors has any competing interests.

Disclaimer. Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Acknowledgements. We sincerely appreciate the Coalition of the Willing (CoW) members, who were instrumental in providing, collating, cleaning, standardizing, and harmonizing the legacy soil profile data. These data were used to generate the soil resource map of Ethiopia at 250 m resolution. The CoW team also deserves credit for inspiring many to share data and develop an integrated national database of agronomy and soil profile data. The leadership of the Natural Resource Development Sector and of the Soil Resource Information and Mapping Directorate of the Ministry of Agriculture (MoA) played a crucial role. This included assigning experts from the ministry and other organizations who worked on collating, encoding, harmonizing, and processing the soil survey legacy data. These experts also modeled and predicted reference soil groups using robust machine-learning algorithms and high-performance computing servers, which form the foundation of the soil resource map. Various institutions, as well as the late and current soil surveyors and pedologists, deserve special recognition for their contributions to generating and sharing soil profile data.
We owe a debt of gratitude to ISRIC and the bilateral Ethiopia–Netherlands projects (cascape and BENEFIT- REALISE) funded by the Directorate-General for International Co- operation (DGIS) of the Netherlands Ministry of Foreign Affairs through the Netherlands Embassy in Ethiopia, which have been crucial in providing capacity building to the MoA, and national soil and geospatial experts. Many thanks are due to Eyasu Elias, Arie van Kekem, Tewodros Tefera, Mulugeta Diro, Johan Leenaars, Bas Kempen, Stephan Mantel, and Maria Ruiperez Gonzalez who have been organizing and providing training on soil classifica- tions and digital soil mapping to the MoA, as well as national soil and geospatial experts, during the Ethiopia–Netherlands bilat- eral projects period. The senior pedologists and soil surveyors who provided invaluable support to check and harmonize thousands of soil profiles and laboratory results are sincerely appreciated. They worked very hard with positive energy, for which we are very grate- ful. In addition, the same group of experts and additional experts who supported the validation of the preliminary soil resource map https://doi.org/10.5194/soil-10-189-2024 SOIL, 10, 189–209, 2024 https://ethioagridata.com/ https://nsis.moa.gov.et/ 206 A. Ali et al.: EthioSoilGrids 1.0 deserve credit for their commitment to contributing their expertise. We thank Degefe Tebebe, Sileshi Gudeta, and Neil Munro for sup- port in the extraction of climate covariates as well as for providing critical technical support and comments that helped improve the pa- per. Our sincere appreciation also goes to the continued and persis- tent support of GIZ-Ethiopia mainly through the project Support- ing Soil Health Interventions in Ethiopia (SSHI), which supported and facilitated the activities of the CoW. 
The Alliance of Bioversity and CIAT is gratefully acknowledged for coordinating the CoW and its efforts and for supporting the implementation of activities of high national importance. We would also like to sincerely thank the Excellence in Agronomy (EiA) CGIAR Initiative, which has made huge contributions to this project in terms of funding and building the skills of the various teams. The Water, Land and Ecosystems (WLE) and Climate Change, Agriculture and Food Security (CCAFS) programs of the CGIAR also provided support in various forms. Recently, our work has benefited from the Accelerating Impacts of CGIAR Climate Research in Africa (AICCRA) project, supported by the World Bank, in terms of data, analytics, and resources to support data linkage and integration.

Financial support. This work is financially supported by the Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) through the project “Supporting Soil Health Interventions in Ethiopia,” funded by the Bill & Melinda Gates Foundation. This work was supported, in whole or in part, by the Bill & Melinda Gates Foundation (INV-005460). Under the grant conditions of the Foundation, a Creative Commons Attribution 4.0 Generic License has already been assigned to the Author Accepted Manuscript version that might arise from this submission.

Review statement. This paper was edited by Kristof Van Oost and reviewed by Skye Wills and one anonymous referee.

References

Abayneh, E.: Application of Geographic Information System (GIS) for soil resource study in Ethiopia, in: Proceedings of the National Sensitization Workshop on Agro metrology and GIS, 17–18 December 2001, Addis Ababa, Ethiopia, 162–169, 2001.
Abayneh, E.: Characteristics, Genesis and Classification of Reddish Soils from Sidamo Region of Ethiopia, PhD thesis, Universiti Putra Malaysia, 2005.
Abayneh, E. and Berhanu, D.: Soil Survey in Ethiopia: Past, Present and the Future, in: Proceedings of the 8th Conference of the Ethiopian Society of Soil Science, Soils for sustainable development, 27–28 April 2006, Addis Ababa, Ethiopia, 2007.
Abayneh, E., Zauyah, S., Hanafi, M. M., and Rosenani, A. B.: Genesis and classification of sesquioxidic soils from volcanic rocks in sub-humid tropical highlands of Ethiopia, Geoderma, 136, 682–695, https://doi.org/10.1016/j.geoderma.2006.05.006, 2006.
Abdenna, D., Yli-Halla, M., Mohamed, M., and Wogi, L.: Soil classification of humid Western Ethiopia: A transect study along a toposequence in Didessa watershed, Catena, 163, 184–195, https://doi.org/10.1016/j.catena.2017.12.020, 2018.
Abegaz, A., Ashenafi, A., Tamene, L., Abera, W., and Smith, J. U.: Modeling long-term attainable soil organic carbon sequestration across the highlands of Ethiopia, Environ. Dev. Sustain., 24, 5131–5162, https://doi.org/10.1007/s10668-021-01653-0, 2022.
Alemayehu, R., Van Daele, K., De Paepe, P., Dumon, M., Deckers, J., Asfawossen, A., and Van Ranst, E.: Characterizing weathering intensity and trends of geological materials in the Gilgel Gibe catchment, southwestern Ethiopia, J. Afr. Earth Sci., 99, 568–580, https://doi.org/10.1016/j.jafrearsci.2014.05.012, 2014.
Ali, A., Abayneh, E., and Sheleme, B.: Characterizing soils of Delbo Wegene watershed, J. Soil Sci. Environ. Manage., 1, 184–199, 2010.
Ali, A., Tamene, L., and Erkossa, T.: Identifying, Cataloguing, and Mapping Soil and Agronomic Data in Ethiopia, CIAT Publication No. 506, International Center for Tropical Agriculture (CIAT), Addis Ababa, Ethiopia, https://hdl.handle.net/10568/110868 (last access: 21 November 2021), 2020.
Asmamaw, L. and Mohammed, A.: Characteristics and classification of the soils of Gerado catchment, Northeastern Ethiopia, Ethiopian Journal of Natural Resources, 12, 1–22, 2012.
Batjes, N. H., Ribeiro, E., and van Oostrum, A.: Standardised soil profile data to support global mapping and modelling (WoSIS snapshot 2019), Earth Syst. Sci. Data, 12, 299–320, https://doi.org/10.5194/essd-12-299-2020, 2020.
Baveye, P. C., Baveye, J., and Gowdy, J.: Soil “Ecosystem” Services and Natural Capital: Critical Appraisal of Research on Uncertain Ground, Front. Environ. Sci., 4, 41, https://doi.org/10.3389/fenvs.2016.00041, 2016.
Belay, T.: Characteristics and Landscape relationships of Vertisols and Vertic Luvisols of Melbe, Tigray, Ethiopia, SINET, 19, 93–115, 1996.
Belay, T.: Variabilities of Soil Catena on Degraded Hill Slopes of Wtiya Catchment, Wello, Ethiopia, SINET, 20, 151–175, 1997.
Belay, T.: Pedogenesis and soil-geomorphic relationships on the Piedmont slopes of Wurgo Valley, South Welo, Ethiopia, SINET, 21, 91–111, 1998.
Belay, T.: Characteristics and classification of soils of Gora Daget forest, South Welo highlands, Ethiopia, SINET, 23, 35–51, 2000.
Berhanu, D.: A survey of studies conducted about soil resources appraisal and evaluation for rural development in Ethiopia, Institute of Agricultural Research, Addis Ababa, Ethiopia, 1980.
Berhanu, D.: The soils of Ethiopia: Annotated bibliography, Regional Soil Conservation Unit (RSCU), Swedish International Development Authority (SIDA), Tech. handbook no. 9, 1994.
Berhanu, D. and Ochtman, L.: Soil resource appraisal and evaluation studies for rural development in Ethiopia, meeting of the east African sub-committee for soil correlation and land evaluation, Nairobi, Kenya, FAO World Soil Resources Rep. 46, 63–70, 1974.
Billi, P.: Geomorphological landscapes of Ethiopia, in: Landscapes and Landforms of Ethiopia, World Geomorphological Landscapes, Springer, Dordrecht, 3–32, https://doi.org/10.1007/978-94-017-8026-1_1, 2015.
Breiman, L.: Random Forests, Mach. Learn., 45, 5–32, https://doi.org/10.1023/A:1010933404324, 2001.
Brungard, C. W., Boettinger, J. L., Duniway, M. C., Wills, S. A., and Edwards Jr., T. C.: Machine learning for predicting soil classes in three semi-arid landscapes, Geoderma, 239–240, 68–83, https://doi.org/10.1016/j.geoderma.2014.09.019, 2015.
CoW (Coalition of the Willing): Coalition of the Willing for soil and agronomy data access, management and sharing, Data Sharing Guidelines, Ethiopian Institute of Agricultural Research (EIAR), Addis Ababa, Ethiopia, 28 pp., https://hdl.handle.net/10568/107988 (last access: 5 December 2021), 2020.
Conrad, O., Bechtel, B., Bock, M., Dietrich, H., Fischer, E., Gerlitz, L., Wehberg, J., Wichmann, V., and Böhner, J.: System for Automated Geoscientific Analyses (SAGA) v. 2.1.4, Geosci. Model Dev., 8, 1991–2007, https://doi.org/10.5194/gmd-8-1991-2015, 2015.
Dinku, T., Block, P., Sharoff, J., Hailemariam, K., Osgood, D., del Corral, J., Cousin, R., and Thomson, M. C.: Bridging critical gaps in climate services and applications in Africa, Earth Perspectives, 1, 1–13, https://doi.org/10.1186/2194-6434-1-15, 2014.
Donahue, R. L.: Ethiopia: Taxonomy, cartography and ecology of soils, Michigan State Univ., African Stud. Center and Inst. Int. Agric., Comm., Ethiopian Stud., Occasional Papers Series, Monograph 1, 1972.
Driessen, P. M., Deckers, J., Spaargaren, O., and Nachtergaele, F.: Lecture notes on the major soils of the world, World Soil Resources Reports No. 94, FAO, Rome, https://edepot.wur.nl/82729 (last access: 12 December 2021), 2001.
Elias, E.: Soils of the Ethiopian Highlands: Geomorphology and Properties, CASCAPE Project, ALTERRA, Wageningen UR, the Netherlands, https://library.wur.nl/WebQuery/isric/2259099 (last access: 11 November 2021), 2016.
Enyew, B. D. and Steeneveld, G. J.: Analysing the impact of topography on precipitation and flooding on the Ethiopian highlands, J. Geol. Geosci., 3, https://gert-jan.steeneveld.wur.nl/enyewsteeneveld2014.pdf (last access: 13 August 2021), 2014.
Erkossa, T., Laekemariam, F., Abera, W., and Tamene, L.: Evolution of soil fertility research and development in Ethiopia: From reconnaissance to data-mining approaches, Exp. Agr., 58, E4, https://doi.org/10.1017/S0014479721000235, 2022.
FAO: Assistance to Land Use-Planning, Ethiopia: Provisional Soil Association Map of Ethiopia, Field document No. 6, The United Nations Development Programme and Food and Agriculture Organization, FAO, Rome, https://www.fao.org/3/ar767e/ar767e.pdf (last access: 5 July 2021), 1984a.
FAO: Assistance to Land Use-Planning, Ethiopia: Geomorphology and soils, Field Document AG DP/ETH/78/003, The United Nations Development Programme and FAO, FAO, Rome, 1984b.
FAO: The Soil and Terrain Database for north-eastern Africa, Crop production systems zones of the GAD sub region, Land and water digital media series no. 2, FAO, Rome, Italy, 1998.
FAO: Guideline for Soil Description, 4th Edn., FAO, Rome, Italy, https://www.fao.org/publications/card/en/c/903943c7-f56a-521a-8d32-459e7e0cdae9/ (last access: 23 February 2021), 2006.
Fazzini, M., Bisci, C., and Billi, P.: The Climate of Ethiopia, in: Landscapes and Landforms of Ethiopia, World Geomorphological Landscapes, edited by: Billi, P., Springer, Dordrecht, the Netherlands, 65–87, https://doi.org/10.1007/978-94-017-8026-1_3, 2015.
Fikre, M.: Pedogenesis of major volcanic soils of the southern central Rift Valley region, Ethiopia, MSc Thesis, 270 pp., University of Saskatchewan, Saskatoon, Canada, 2003.
Fikru, A.: Soil resources of Ethiopia, in: Natural Resources Degradation a Challenge to Ethiopia, First Natural Resources Conservation Conference, Institute of Agricultural Research (IAR), 7–8 February 1980, Addis Ababa, Ethiopia, 1980.
Fikru, A.: Need for Soil Survey Studies, in: Proceedings of the first soils science research review workshop, 11–14 February 1987, Addis Ababa, Ethiopia, 1988.
Hengl, T. and MacMillan, R. A.: Predictive Soil Mapping with R, OpenGeoHub foundation, Wageningen, the Netherlands, https://soilmapper.org/ (last access: 14 September 2021), ISBN 978-0-359-30635-0, 2019.
Hengl, T., Heuvelink, G. B. M., Kempen, B., Leenaars, J. G. B., Walsh, M. G., Shepherd, K. D., Sila, A., MacMillan, R. A., Mendes de Jesus, J., Tamene, L., and Tondoh, J. E.: Mapping soil properties of Africa at 250 m resolution: random forests significantly improve current predictions, PLoS ONE, 10, e0125814, https://doi.org/10.1371/journal.pone.0125814, 2015.
Hengl, T., Mendes de Jesus, J., Heuvelink, G. B., Ruiperez Gonzalez, M., Kilibarda, M., Blagotić, A., Shangguan, W., Wright, M. N., Geng, X., Bauer-Marschallinger, B., Guevara, M. A., Vargas, R., MacMillan, R. A., Batjes, N. H., Leenaars, J. G., Ribeiro, E., Wheeler, I., Mantel, S., and Kempen, B.: SoilGrids250m: Global gridded soil information based on machine learning, PLoS ONE, 12, e0169748, https://doi.org/10.1371/journal.pone.0169748, 2017.
Hengl, T., Nussbaum, M., Wright, M. N., Heuvelink, G. B. M., and Gräler, B.: Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables, PeerJ, 6, e5518, https://doi.org/10.7717/peerj.5518, 2018.
Hengl, T., Miller, M., Križan, J., Shepherd, K. D., Sila, A., Kilibarda, M., Antonijević, O., Glušica, L., Dobermann, A., Haefele, S. M., McGrath, S. P., Acquah, G. E., Collinson, J., Parente, L., Sheykhmousa, M., Saito, K., Johnson, J. M., Chamberlin, J., Silatsa, F., Yemefack, M., Wendt, J., MacMillan, R. A., Wheeler, I., and Crouch, J.: African soil properties and nutrients mapped at 30 m spatial resolution using two-scale ensemble machine learning, Sci. Rep., 11, 6130, https://doi.org/10.1038/s41598-021-85639-y, 2021.
Heung, B., Hung, C. H., Zhang, J., Knudby, A., Bulmer, C. E., and Schmidt, M. G.: An overview and comparison of machine-learning techniques for classification purposes in digital soil mapping, Geoderma, 265, 62–77, 2016.
Hounkpatin, K. O. L., Schmidt, K., Stumpf, F., Forkuor, G., Behrens, T., Scholten, T., Amelung, W., and Welp, G.: Predicting reference soil groups using legacy data: A data pruning and Random Forest approach for tropical environment (Dano catchment, Burkina Faso), Sci. Rep., 8, 9959, https://doi.org/10.1038/s41598-018-28244-w, 2018.
Hurni, H.: Agro-ecological Belts of Ethiopia: Explanatory Notes on three maps at a scale of 1 : 1,000,000, Soil Cons. Res. Pro., University of Bern, (Switzerland) in Association with the Ministry of Agriculture, Addis Ababa, https://edepot.wur.nl/484855 (last access: 6 June 2021), 1998.
Iticha, B. and Chalsissa, T.: Digital soil mapping for site-specific management of soils, Geoderma, 351, 85–91, https://doi.org/10.1016/j.geoderma.2019.05.026, 2019.
IUSS WRB (IUSS Working Group): World Reference Base for Soil Resources 2014, update 2015, International soil classification system for naming soils and creating legends for soil maps, World Soil Resources Reports No. 106, FAO, Rome, https://www.fao.org/3/i3794en/I3794en.pdf (last access: 11 February 2019), 2015.
Kempen, B., Brus, D. J., Heuvelink, G. B. M., and Stoorvogel, J. J.: Updating the 1 : 50,000 Dutch soil map using legacy soil data: A multinomial logistic regression approach, Geoderma, 151, 311–326, https://doi.org/10.1016/j.geoderma.2009.04.023, 2009.
Kempen, B., Brus, D. J., Stoorvogel, J. J., Heuvelink, G. B. M., and de Vries, F.: Efficiency comparison of conventional and digital soil mapping for updating soil maps, Soil Sci. Soc. Am. J., 76, 2097–2115, https://doi.org/10.2136/sssaj2011.0424, 2012.
Kuhn, M.: Building predictive Models in R using the caret package, J. Stat. Softw., 28, 1–26, https://doi.org/10.18637/jss.v028.i05, 2008.
Leenaars, J. G. B., van Oostrum, A. J. M., and Ruiperez, G. M.: Africa Soil Profiles Database, Version 1.2. A compilation of georeferenced and standardised legacy soil profile data for Sub-Saharan Africa (with dataset), ISRIC Report 2014/01, Africa Soil Information Service (AfSIS) project and ISRIC – World Soil Information, Wageningen, https://library.wur.nl/WebQuery/isric/2259472 (last access: 7 August 2023), 2014.
Leenaars, J. G. B., Elias, E., Wösten, J. H. M., Ruiperez-González, M., and Kempen, B.: Mapping the major soil-landscape resources of the Ethiopian Highlands using random forest, Geoderma, 361, 114067, https://doi.org/10.1016/j.geoderma.2019.114067, 2020a.
Leenaars, J. G. B., Ruiperez González, M., Kempen, B., and Mantel, S.: Semi-detailed soil resource survey and mapping of REALISE woredas in Ethiopia, Project report to the BENEFIT-REALISE programme, December 2020, ISRIC – World Soil Information, Wageningen, The Netherlands, https://www.isric.org/projects/realise-survey-and-mapping-soil-resources (last access: 18 October 2021), 2020b.
McBratney, A. B., Santos, M. M., and Minasny, B.: On digital soil mapping, Geoderma, 117, 3–52, 2003.
Mesfin, A.: Nature and Management of Ethiopian Soils, 272 pp., Alamaya University of Agriculture, Alamaya, Ethiopia, 1998.
Mishra, B. B., Gebrekidan, H., and Kibret, K.: Soils of Ethiopia: Perception, appraisal and constraints in relation to food security, International journal of food, agriculture and environment, 2, 269–279, 2004.
Mitiku, H.: Genesis, characteristic and classification of the Central Highland soils of Ethiopia, PhD Thesis, 399 pp., State University of Ghent, Belgium, 1987.
Mohammed, A. and Belay, T.: Characteristics and classification of the soils of the Plateau of Simen Mountains National Park (SMNP), Ethiopia, SINET, 31, 89–102, 2008.
Mohammed, A. and Solomon, T.: Characteristics and fertility quality of the irrigated soils of Sheneka, Ethiopia, Ethiopian Journal of Natural Resources, 12, 1–22, 2012.
Mulder, V. L., Lacoste, M., Richer de Forges, A. C., and Arrouays, D.: GlobalSoilMap France: high resolution spatial modelling the soils of France up to two meter depth, Sci. Total Environ., 573, 1352–1369, 2016.
Mulualem, A., Gobezie, T. B., Kasahun, B., and Demese, M.: Recent Developments in Soil Fertility Mapping and Fertilizer Advisory Services in Ethiopia, A Position Paper, https://www.researchgate.net/publication/327764748/ (last access: 7 October 2021), 2018.
Mulugeta, T., Seid, A., Kefyialew, T., Mulugeta, F., and Tadla, G.: Characterization and Classification of Soils of Askate Subwatershed, Northeastern Ethiopia, Agri. For. Fisheries, 10, 112–122, https://doi.org/10.11648/j.aff.20211003.13, 2021.
Nyssen, J., Tielens, S., Tesfamichael, G., Tigist, A., Kassa, T., Wauw, J., Degeyndt, K., Descheemaeker, K., Kassa, A., Mitiku, H., and Amanuel, Z.: Understanding spatial patterns of soils for sustainable agriculture in northern Ethiopia’s tropical mountains, PLoS ONE, 14, e0224041, https://doi.org/10.1371/journal.pone.0224041, 2019.
Poggio, L., de Sousa, L. M., Batjes, N. H., Heuvelink, G. B. M., Kempen, B., Ribeiro, E., and Rossiter, D.: SoilGrids 2.0: producing soil information for the globe with quantified spatial uncertainty, SOIL, 7, 217–240, https://doi.org/10.5194/soil-7-217-2021, 2021.
QGIS Development Team: QGIS Geographic Information System, Open Source Geospatial Foundation Project, https://qgis.org/en/site/ (last access: 17 August 2021), 2021.
R Core Team: R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, http://www.R-project.org/ (last access: 14 September 2021), 2020.
Rossiter, D. G., Poggio, L., Beaudette, D., and Libohova, Z.: How well does digital soil mapping represent soil geography? An investigation from the USA, SOIL, 8, 559–586, https://doi.org/10.5194/soil-8-559-2022, 2022.
Sheleme, B.: Topographic positions and land use impacted soil properties along Humbo Larena-Ofa Sere toposequence, Southern Ethiopia, Journal of Soil Science and Environmental Management, 8, 135–147, https://doi.org/10.5897/JSSEM2017.0643, 2017.
Shi, J., Yang, L., Zhu, A.-X., Qin, C., Liang, P., Zeng, C., and Pei, T.: Machine-Learning Variables at Different Scales vs. Knowledge-based Variables for Mapping Multiple Soil Properties, Soil Sci. Soc. Am. J., 82, 645–656, https://doi.org/10.2136/sssaj2017.11.0392, 2018.
Shimeles, D., Mohamed, A., and Abayneh, E.: Characteristics and classification of the soils of Tenocha Wenchacher Micro catchment, South west Shewa, Ethiopia, Ethiopian Journal of Natural Resources, 9, 37–62, 2007.
Soil Science Division Staff: Soil survey manual, edited by: Ditzler, C., Scheffe, K., and Monger, H. C., USDA Handbook 18, Government Printing Office, Washington, D.C., USA, https://www.nrcs.usda.gov/sites/default/files/2022-09/The-Soil-Survey-Manual.pdf (last access: 6 October 2020), 2017.
Svetnik, V., Liaw, A., Tong, C., Culberson, J. C., Sheridan, R. P., and Feuston, B. P.: Random forest: a classification and regression tool for compound classification and QSAR modeling, J. Chem. Inf. Comp. Sci., 43, 1947–1958, https://doi.org/10.1021/ci034160g, 2003.
Tamene, L., Erkossa, T., Tafesse, T., Abera, W., and Schulz, S.: A coalition of the willing powering data-driven solutions for Ethiopian agriculture, CIAT Publication No. 518, CIAT, Addis Ababa, Ethiopia, 2021.
Tamene, L. D., Amede, T., Kihara, J., Tibebe, D., and Schulz, S.: A review of soil fertility management and crop response to fertilizer application in Ethiopia: towards development of site- and context-specific fertilizer recommendation, CIAT Publication No. 443, International Center for Tropical Agriculture (CIAT), Addis Ababa, Ethiopia, https://hdl.handle.net/10568/82996 (last access: 17 July 2021), 2017.
Tefera, M., Chernet, T., and Workineh, H.: Geological Map of Ethiopia, Addis Ababa, Ethiopia: Federal Democratic Republic of Ethiopia, Ministry of Mines and Energy, Ethiopian Institute of Geological Surveys, Addis Ababa, Ethiopia, 1996.
Tolossa, A. R.: Vertic Planosols in the Highlands of South-Western Ethiopia: Genesis, Characteristics and Use, Ghent University, Faculty of Sciences, Ghent, Belgium, http://hdl.handle.net/1854/LU-5991501 (last access: 23 June 2021), 2015.
Vågen, T. G.: Africa Soil Information Service: Hydrologically Corrected/Adjusted SRTM DEM (AfrHySRTM), International Center for Tropical Agriculture – Tropical Soil Biology and Fertility Institute (CIAT-TSBF), World Agroforestry Centre (ICRAF), Center for International Earth Science Information Network (CIESIN), Columbia University, https://cmr.earthdata.nasa.gov/search/concepts/C1214155420-SCIOPS (last access: 18 February 2021), 2010.
Van de Wauw, J., Baert, G., Moeyersons, J., Nyssen, J., De Geyndt, K., Nurhussen, T., Amanauel, A., Poesen, J., and Deckers, J.: Soil-landscape relationships in the basalt-dominated highlands of Tigray, Ethiopia, Catena, 75, 117–127, 2008.
Virgo, K. J. and Munro, R. N.: Soil and erosion features of the Central Plateau region of Tigrai, Ethiopia, Geoderma, 20, 131–157, 1978.
Wadoux, A. M. J. C., Minasny, B., and McBratney, A. B.: Machine learning for digital soil mapping: Applications, challenges and suggested solutions, Earth Sci. Rev., 210, 103359, https://doi.org/10.31223/osf.io/8eq6s, 2020.
Westphal, E.: Agricultural Systems in Ethiopia, Agricultural Research Report 826, https://edepot.wur.nl/361350 (last access: 19 March 2021), 1975.
WLRC-AAU (Water and Land Resource Centre-Addis Ababa University): Land use/land cover mapping, change detection and characterization of Ethiopia, Water Land Resource Centre, Addis Ababa University, Addis Ababa, Ethiopia, 2018.
Wright, M. N. and Ziegler, A.: Ranger: A fast implementation of random forests for high dimensional data in C++ and R, J. Stat. Softw., 77, 1–17, https://doi.org/10.18637/jss.v077.i01, 2017.
Zwedie, E.: Selected physical, chemical, and mineralogical characteristics of major soils occurring in Chercher highlands, Eastern Ethiopia, Ethiopian Journal of Natural Resources, 1, 173–185, 1999.
Zewdie, E.: Properties of major Agricultural Soils of Ethiopia, Lambert Academic Publishing, Germany, 2013.