Soil legacy data rescue via GlobalSoilMap and other international and national initiatives

Legacy soil data have been produced over 70 years in nearly all countries of the world. Unfortunately, data, information and knowledge are still fragmented and at risk of being lost if they remain in paper format. To process these legacy data into consistent, spatially explicit and continuous global soil information, data are being rescued and compiled into databases. Thousands of soil survey reports and maps have been scanned and made available online. The soil profile data reported by these data sources have been captured and compiled into databases. The total number of soil profiles rescued in the selected countries is about 800,000. Currently, data for 117,000 profiles are compiled and harmonized according to GlobalSoilMap specifications in a world-level database (WoSIS). The results presented at the country level are likely to be an underestimate. The majority of soil data have still not been rescued and this effort should be pursued. The data have been used to produce soil property maps. We discuss the pros and cons of top-down and bottom-up approaches to produce such maps and we stress their complementarity. We give examples of success stories. The first global soil property maps using rescued data were produced by a top-down approach and were released at a limited resolution of 1 km in 2014, followed by an update at a resolution of 250 m in 2017. By the end of 2020, we aim to deliver the first worldwide product that fully meets the GlobalSoilMap specifications.


Introduction
Unprecedented demands are being placed on the world's soil resources [1][2][3][4][5]. Responding to these demands requires relevant, reliable and applicable information [6][7]. Unfortunately, data, information and knowledge of the world's soil resources are currently fragmented and even at risk of being lost or forgotten. The causes are the costs of maintaining analogue, paper-based soil data holdings and archives, the physical deterioration or disintegration of these paper-based sources, especially in tropical conditions, and the risks to the storage buildings (fire, storm, war…). Such a loss would be a disaster, not only because soil data are central to many of the major global issues the world is facing [3][4][5], but also because tremendous resources went into collecting and analyzing these data. Comparable future soil data collection would certainly be cost-prohibitive in many countries and not justifiable without first having made optimal use of earlier collected data. Therefore, existing legacy and heritage soil survey data holdings across the world are being rescued, compiled and processed into a common, consistent and geographically contiguous dataset of relevant soil properties covering the planet's land surface. The legacy soil data holdings, including tens of thousands of published soil reports and soil maps, have been produced over 70 years by nearly all countries and numerous institutions using different procedures, laboratory methods, standards, scales, taxonomic classification systems and geo-referencing systems. They represent a true myriad of primary data (millions of soil profile point observations) and secondary data (derived properties and conventional soil polygon maps).
Hence, obtaining the required amount of primary soil data to produce the above-mentioned products by sampling through new soil surveys would entail astronomical costs. In comparison, it is relatively cost-efficient to utilize existing soil data and make them available and suitable for use. However, one of the major challenges is to integrate the best available legacy data from various local and national sources. This challenge became vital to the GlobalSoilMap project, as it relies upon rescuing soil data from a myriad of fragmented analogue soil data holdings worldwide and turning them into a globally coherent and complete soil information product.

A C C E P T E D M A N U S C R I P T
Rescuing soil data includes three major steps:
1) Maintenance of libraries and holdings, including scanning thousands of analogue paper reports and maps into digital formats and assigning metadata to each object, so that each object is queryable, accessible and available online. This step also covers ensuring the safety of the data through proper backup of existing digital data entries.
2) Compilation of the soil data under a common standard from the rescued data sources. This is done by entry and collation of legacy soil profiles and data (e.g. lineage, point location and year of recording, soil classification and, for soil depth intervals, soil morphologic observations and soil analytical measurements including values, units and methods used) from soil reports into a dedicated soil profile database, and by digitizing legacy soil maps from published paper soil maps into a digital soil polygon database, followed by data standardization, harmonization and quality control.
3) Once compiled under a common standard, the legacy data are used to generate gridded soil property maps within the GlobalSoilMap initiative according to the GlobalSoilMap specifications [11].
The gridded maps are subsequently made freely available online to a wider user community. This community is potentially very large and includes soil scientists and soil mappers, agronomists, climate change modelers, biodiversity conservation specialists, economists, hydrologists, land-use planners, governments and policy makers, among others.
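The kind of record described in step 2 can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, field names and the two quality-control checks are hypothetical and are not taken from WoSIS or from the GlobalSoilMap specifications.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LayerMeasurement:
    """One analytical value for a depth interval, with its unit and method."""
    property_name: str  # hypothetical label, e.g. "organic_carbon"
    value: float
    unit: str           # e.g. "g/kg"
    method: str         # laboratory method as reported in the source


@dataclass
class SoilLayer:
    """A depth interval of a profile (depths in cm below the soil surface)."""
    upper_cm: float
    lower_cm: float
    morphology: str = ""
    measurements: List[LayerMeasurement] = field(default_factory=list)


@dataclass
class SoilProfile:
    """A legacy soil profile with lineage, location, year and classification."""
    profile_id: str
    source: str               # lineage: the report or dataset it came from
    latitude: float
    longitude: float
    year: Optional[int]       # year of recording, if known
    classification: str
    layers: List[SoilLayer] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return basic quality-control findings (empty list if none found)."""
        issues = []
        if not -90 <= self.latitude <= 90 or not -180 <= self.longitude <= 180:
            issues.append("coordinates out of range")
        for layer in self.layers:
            if layer.lower_cm <= layer.upper_cm:
                issues.append(
                    f"layer {layer.upper_cm}-{layer.lower_cm} cm has non-positive thickness"
                )
        return issues
```

Keeping the unit and method next to each value is what makes later standardization and harmonization possible; a real database schema would carry many more attributes.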
In this paper we provide an overview of the recent soil data rescuing activities linked to the GlobalSoilMap project and other international and national initiatives. Finally, we give some examples of success stories at the world, continental and country level from selected projects that achieved SoilGrids or final GlobalSoilMap products, thereby demonstrating the importance of rescuing existing soil data.
properties in a gridded format, without giving access to the original soil profile point data that was used for these predictions. The final GlobalSoilMap product represents an updateable outcome, i.e. when new or additional soil profile data are available, a new updated soil map can be quickly produced, thus continuously improving the accuracy of the collaborative product.
The final product will be a globally distributed and harmonized grid map. However, besides data availability, achieving these global results would require distributed datasets to be harmonized at national, continental and global levels [e.g. [12][13][14]]. To achieve this goal, the GlobalSoilMap project developed guidelines and specifications [11]. Distributed and strong computational capacities are needed to generate the maps at the targeted resolution.
Regardless of being national, continental and/or global, the following data rescue and grid map production steps are generally necessary, including references to GlobalSoilMap specific activities:
1. Identify and rescue legacy soil reports and maps and make digital scans with metadata publicly available (analogue carriers of data),
A general framework has been proposed by Minasny and McBratney [15] and the complete process is fully described in the GlobalSoilMap specifications [11] and in a synthesis paper [7]. In this paper, we illustrate steps 1 to 4 and the efforts made for rescuing the primary soil data; we then provide a few examples of success stories achieving final products derived from the rescued primary data (steps 5-8) and we discuss the potential of future soil profile data rescue and the main issues related to their use.
[16-18; 134] and those data are, for the GlobalSoilMap properties, all standardized and available at www.isric.org/explore/wosis/accessing-wosis-deriveddatasets. In absolute terms, the total number of soil profiles existing and stored in the selected countries' databases is obviously much higher and is currently about 800,000. Regrettably, large numbers of soil profiles stored in many country databases are not yet standardized and harmonized according to a global standard and are not shared. Note that the numbers given in the table of soil profiles at the world level, at the continental level (ISRIC [16][17][18], Sub-Saharan Africa [19][20][21], Latin America and Caribbean [22], European Union [23][24][25][26]) and at the country level cannot be summed together. Large numbers of profiles compiled in the world database originate from the continental databases, which in turn originate to a large extent from the national ones and from national survey reports.
The difference between the number of data in the WoSIS database (World Soil Information Service) and the continental databases and the number in the selected countries is likely due to the time and capacity needed to identify the data sources and to capture, translate and harmonize the data, which is a job most efficiently and effectively done by the national data holders. Indeed, as stated by Rossiter [27], much of the data are still proprietary and regrettably not generally accessible, and the question of open access to primary soil data is not resolved. Nevertheless, considerable successful efforts have been made since 2009 by ISRIC to rescue and add value to soil data in many countries where quality soil data have been generated and reported over the years, but where the data infrastructure is not up to standards and the data is in great danger of being lost (e.g. Sub-Saharan countries, [19][20][21]).

Synthesis of legacy soil profile data
Overall, we observe large discrepancies between countries, either in the total number of soil profiles compiled or in the efforts put in place in data rescuing, between 2009 and 2015. Table 2 provides the links to databases when they are available on the web. Database models and management systems are described by Batjes [17,18,134] at the world level, by Leenaars et al. [19,20] for Africa and by Hiederer [23] and Hollis et al. [25] for Europe.
Netherlands [59][60], Denmark [44][45]), whereas some very large countries are just beginning their data rescuing efforts (e.g., Russia [46]).

Soil profile data rescue efforts
In the following sections, we present a few of many soil profile data rescue efforts. We focus on data rescue efforts that have led to final products in line with the GlobalSoilMap specifications.

WoSIS data (World Soil Information Service)
The World Soil Information Service (WoSIS) database is developed at ISRIC [134] within the conceptual framework of the Global Soil Information Facility, which facilitates collaborative bottom-up initiatives to process and exchange soil data at the global level (www.isric.org/explore/wosis). Ideally, primary soil profile data are managed and maintained by the national data owners, whereby the data are connected and made queryable online by an interoperable infrastructure through data exchange standards. Since 2009, these standards have continued to be defined and developed by the global soil community, but this is a very slow process. Anticipating further development of these standards, WoSIS is configured as a centralized database which accommodates current, more conventional data exchange mechanisms between collaborating organizations to collate and harmonize soil data, and which therewith meets both short-term and long-term goals of collaborative soil mapping.

The databases at the higher level (world, continent) are actually compilations of data, under a common standard, from databases and reports originating at the lower level (national and subnational) shared by collaborative partner organizations. So far, one snapshot of the WoSIS data has been released, in July 2016 (http://geonode.isric.org/layers/geonode:wosis 201607 profiles). The world-level data are spatially irregularly distributed, with some parts of the world being relatively densely covered while other parts still have very sparse point data or no data at all (Fig. 2).
This distribution is strongly related to the amount of data previously shared through collaborative projects and to the amounts of data currently published by the various countries and institutions under current and recent data policies, but is also influenced by limited capacities and a prioritization of the effort. Very large differences are observed between densities at the country level and the density at the world level (for instance in France, Iran, Indonesia). More generally, we hope that a map such as that presented in Figure 2 will encourage countries to collaborate through a bottom-up approach and to provide data access to WoSIS and/or to develop and share their own country-level products according to the GlobalSoilMap specifications, similar to the most recent ones developed in some countries [e.g. France, Scotland, USA, Australia, Denmark]. The WoSIS data collection effort has proven to be very useful in producing the first world-wide SoilGrids at 1 km resolution [16], followed by a world-wide grid at a 250 m resolution [95]. These global grids were preceded by grids at similar resolution for the Sub-Saharan Africa region [96][97].
The data were captured from 540 data sources with full lineage specified; about 25% of the profiles were extracted from earlier ISRIC datasets, 30% from other digital datasets and 45% from analogue reports (503). It includes data for approximately 140 soil properties, including soil analytical data
The data rescue in Sub-Saharan Africa has resulted in gridded soil maps for all primary and derived soil properties mentioned in the GlobalSoilMap specifications [11], including electrical conductivity, bulk density, plant-available water holding capacity, depth to bedrock and effective root zone depth (for maize) [104][105][106][107]. In this region, legacy data proved particularly relevant, compared to newly sampled topsoil data, 1) to allow cost-effective mapping that is detailed and consistent at both the continental and national extent and 2) to assess the effective depth and volume of the soil in which

France
In France, an important data rescue effort led to a 69% increase in the number of soil profiles from 2009 to 2015 (Fig. 5) [41][42], giving coverage of the French territory at adequate density.

Australia
Australia has a rich but non-uniform and incomplete archive of existing soil mapping and site data. The state and territory government agencies are primarily responsible for the collection and management of soil data within their territories. In addition, CSIRO, universities and geoscience institutions have collected data and hold records. Thus, there are at least 13 independent and unique soil data management systems, some eight with formal responsibilities for regional, national or specific data [28]. For at least the last 70 years, these agencies have been collecting soil site data, and for some 40 years they have used various forms of data systems (in most cases developed within the institution). Before the GlobalSoilMap project was initiated, these soil site datasets were not compiled into a consistent dataset conforming to a single standard. The GlobalSoilMap project provided the impetus for combining some 281,000 soil profiles into a single uniform database using data interoperability approaches and a consistent database schema for the project data collation [28][29]. This database also contains 2.5 million laboratory measurements. Figure 6 shows the progress between 2009 (the launch of the GlobalSoilMap project) and 2015. Very large areas that had very sparse information in a consistent national collation (for instance in western and northern parts of the country) are now covered by a large amount of soil profile data available for new mapping and estimation.

Other
It was found that some countries not only rescue soil profile data but also soil descriptions captured by hand auger borings. This is partly the case for France (see http://www.fao.org/soils-portal/soil-survey/soil-maps-and-databases/fao-soil-legacy-maps/en/. During the AfSIS/GlobalSoilMap project [19][20][21], thousands of selected soil reports and maps of Sub-Saharan Africa were scanned at ISRIC and made available online. Moreover, thousands of additional soil maps, and associated soil reports, of Africa were identified from other libraries and holdings in Europe and Africa (i.e., IRD, WOSSAC, FAO, UGhent) and after duplicate removal were added to the ISRIC library collection, including online access to digital scans with full metadata (Fig. 7).
The Africa Soil Maps database represents a spatial inventory of approximately 5,000 legacy soil maps recently made available online at the ISRIC library. Soil maps originating from six European archives and a few African countries were identified and added to the library through a large effort to harmonize metadata and exclude duplicates (Figure 7). Some legacy soil maps that had been scanned have also been digitized into a GIS-database format, including information about the topology, geometry and legends. The Malawi data has been used by ISRIC for producing a Soil and Terrain
Georeferencing and data quality control proved to be major challenges in collating these legacy soil data, and are described in [70][71], and the first soil mapping applications in [72]. The national database of Nigerian soil profiles currently contains about 1,900 profiles; nearly 50% more soil profiles have been added since 2011 and used for a range of applications [72][73][74], and we expect these additional soil profile data from Nigeria to be made publicly available online with the original collaborative initiative.

India
In India, the National Bureau of Soil Survey and Land Use Planning (NBSS&LUP), under the Indian Council of Agricultural Research (ICAR), is the agency responsible for collecting and generating soil data.
With a network of centers throughout the country, the agency has generated soil resource maps at the 1:1,000,000 scale at the country level, at the 1:250,000 scale at state and union territory levels, at the 1:50,000 scale for 83 out of 640 districts, and at the 1:5,000 scale for 70 watersheds. These resource maps provide layer-wise soil information on soil texture, organic carbon content, pH, nutrients, cation exchange capacity and, in limited cases, water holding capacity. A few other organizations also compile such data; however, a harmonized and searchable soil database is yet to be developed.

Indonesia
In Indonesia, soil resource inventories have been conducted since 1905 by the Indonesian Centre for Agricultural Land Resource Research and Development (ICALRD) and its colonial and post-independence predecessors for various purposes (e.g. agricultural planning, erosion hazard assessment, and soil fertility monitoring). This has resulted in soil survey reports and soil maps (e.g. [47]). Various databases have been developed to store soil data in Indonesia. As of 2016, 100% of Indonesia is covered by a 1:250,000 scale map and 40% by detailed maps (≤1:50,000 scale). In addition, a land system map at the scale of 1:250,000 is available for the whole country and there is an ongoing effort to scan soil survey reports and hardcopy maps.

South Korea
In South Korea, detailed soil maps (1:25,000) are now available for the entire country, both in hard copy and in digital format. Furthermore, highly detailed soil maps (1:5,000), surveyed from 1995 to 1999 for the entire country, were digitized and made available to the public through the website http://soil.rda.go.kr. Two soil databases were constructed as part of the soil information system of Korea. The first is a spatial database of computerized soil maps at a variety of scales (1:250,000, 1:50,000, 1:25,000, and 1:5,000). The second is a parcel-based soil fertility (chemical properties) database, containing around 7,000,000 data objects.

France
In France, a preliminary analysis of national soil information and its potential for delivering GlobalSoilMap products was made in 2013 and published in 2014 [112]. At the end of 2015, a catalogue of 5,854 soil maps became available at http://www.gissol.fr/outils/refersols-340. About half of the collection is currently being digitized and 407 soil maps are accessible as complete databases. This effort is a long-term ongoing process, with major emphasis on building a harmonized database. Priority is given to maps with scales ranging between 1:250,000 and 1:50,000 [41][42].

Scotland
In

Latvia
In Latvia, analogue soil maps of agricultural land at the scale of 1:10,000 were digitized and a database was created. The database consists of two data sets: 1) polygon characterization, including the year of mapping, the soil type according to genetic classification and the textural group; and 2) soil profile data, including the year of mapping, the soil type according to genetic classification, the textural group (topsoil, bottom layer), the integrated textural group (topsoil and bottom layer), the pH value and the depth of CaCO3. Altogether, the database contains data from 543,601 polygons and 746 soil profile descriptions [87]. An attempt was made to convert the soil units from the national classification to the WRB 2014. The technical work is finished, but the database is not yet publicly available because it is still being discussed which portal should host it and who will be responsible for its maintenance.

Usefulness and limitations of rescued soil maps for GlobalSoilMap
Soil properties can be derived from both detailed soil maps (generally a cartographic scale of 1:100,000 or more detailed) and soil point data (i.e. measurements down the soil profile at a georeferenced location). When using soil maps only, the most used methods are: extracting soil properties from a soil map, using a spatially weighted measure of central tendency (e.g. the mean), or spatial disaggregation of soil maps (e.g., [38,54,[113][114][115]]).
When only soil maps are available, soil properties can be extracted from soil maps according to the distributional concepts underlying the soil mapping units. In some cases, it will be appropriate to estimate soil properties using an area-weighted mean, as was done for example in the United States [51][52]. However, in most circumstances, the original soil map will have information on the factors controlling soil distribution within an individual map unit. This is most commonly based on terrain (e.g. a catena or characteristic toposequence). The widespread availability of fine-resolution terrain variables now allows the soil properties to be 'disaggregated' at the level of the soil types occurring within soil mapping polygons. Recent examples of this kind of approach can be found in [38,54,[113][114][115]].
An extension of this approach is to use areas where there is a detailed understanding of soil distribution as a basis for extrapolation to a broader domain, examples can be found in [116][117][118].
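The area-weighted mean mentioned above can be sketched as follows. This is a minimal illustration, assuming that each map unit is described by a list of soil components, each with an area share and a representative property value; the function name and input layout are hypothetical and do not reproduce the procedure used in [51][52].

```python
from typing import Iterable, Tuple


def area_weighted_mean(components: Iterable[Tuple[float, float]]) -> float:
    """Estimate a map-unit property value from its soil components.

    components: (area_share, property_value) pairs for one map unit.
    Shares may be percentages or fractions; they are normalized here,
    so they do not need to sum to 100 or to 1.
    """
    pairs = list(components)
    total_share = sum(share for share, _ in pairs)
    if total_share <= 0:
        raise ValueError("area shares must sum to a positive number")
    return sum(share * value for share, value in pairs) / total_share


# Usage: a map unit with 60% of a soil at 10 g/kg organic carbon
# and 40% of a soil at 20 g/kg.
unit_value = area_weighted_mean([(60, 10.0), (40, 20.0)])
```

The result is a single value per map unit, which is why this approach cannot show the within-unit variation that disaggregation aims to recover.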
Moreover, soil map units and soil point data can be used together to improve gridded predictions of soil properties. Soil map units can be used as a co-variate for scorpan kriging (i.e. a prediction method using both spatial co-variates linked to the controlling factors of soil distribution and to the point locations [9]), for instance [119][120][121][122][123][124]. This often implies merging different soil map units in order to reduce their number [123][124]. Specific information can be extracted from soil maps (e.g., parent material, broad soil classes, soil textural classes, e.g., [124]) and also used as a co-variate. This will often require some merging of classes too. Note that the most efficient merging of classes can differ depending on the target soil property, and often requires the soil surveyor's expert knowledge. For instance, in France, different parent material classifications may be used as co-variates for soil texture and for pH mapping [124]. Finally, independent predictions from soil maps and from point data can be merged and weighted through ensemble methods (e.g. [98]).
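One simple way to weight and merge two predictions for the same grid cell is inverse-variance weighting, sketched below. This is an assumption chosen for illustration and is not necessarily the ensemble method of [98]; it presumes that both predictions are unbiased, that their errors are independent, and that a prediction error variance is available for each. The function and parameter names are hypothetical.

```python
from typing import Tuple


def merge_predictions(
    pred_map: float, var_map: float, pred_points: float, var_points: float
) -> Tuple[float, float]:
    """Merge a map-based and a point-based prediction for one grid cell.

    Each prediction is weighted by the inverse of its error variance, so the
    more certain prediction contributes more. Returns the merged value and
    its variance (valid only under the independence assumption stated above).
    """
    if var_map <= 0 or var_points <= 0:
        raise ValueError("variances must be positive")
    w_map = 1.0 / var_map
    w_points = 1.0 / var_points
    merged = (w_map * pred_map + w_points * pred_points) / (w_map + w_points)
    merged_var = 1.0 / (w_map + w_points)
    return merged, merged_var


# Usage: the point-based prediction has the smaller variance,
# so the merged value lies closer to it.
value, variance = merge_predictions(10.0, 4.0, 20.0, 1.0)
```

In practice the weights would vary from cell to cell, since map-based and point-based predictions are not equally reliable everywhere.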
Using soil maps over large territories often requires huge harmonizing efforts. Indeed, different soil maps may have been produced by different soil surveyors, with different objectives and various pedological concepts. The scales may also differ between soil maps. For instance, huge efforts have been invested in harmonizing the European geographical Soil Database (e.g., [125]) and the US soil map (e.g., [108]). Attempts to update the world soil map using SOTER methodology are still ongoing in various parts of the world (e.g., [111; 126]).
Finally, even if soil maps cannot be considered as truly independent validation data, they are often useful to evaluate some gridded products and to check inconsistencies between gridded predictions and expert delineations of broad soil classes.

Success stories
The final goal of the project is to provide a global freely available high-resolution dataset on key soil properties which is either downloadable or accessible through web-services. This dataset will include

World-level
SoilGrids (e.g., [16; 95]) are the first globally consistent, contiguous and complete gridded soil property maps of the world, derived from rescued legacy soil profile data through DSM techniques, and were released by ISRIC. Despite some limitations (grid cell area, and rather low accuracy in some areas), they constitute a first proof of concept and an example of what can potentially be achieved at the world level. However, they do not describe sufficient variability at short distances. Despite these limitations at the local level, the SoilGrids provide key support for global modeling efforts.
SoilGrids250m [95] was recently released on the ISRIC website, showing significant improvements compared to the 1 km product. ISRIC is waiting for feedback from countries. However, the number of soil profiles available for model calibration remained limited (only just over 100,000). One of the main advantages of releasing such products may be to identify the parts of the world where data are obviously missing. This may convince countries to provide data to ISRIC and therewith to the global soil science community, to develop their own bottom-up products through collaborative efforts to fill the gaps, to correct the obvious errors, or simply to enhance the accuracy where it is insufficient for national purposes. Obviously, there will also be parts of the world where there will be no data at all or where data have been lost. SoilGrids will therefore be useful to fill these gaps. Another possibility is to collaborate by evaluating and validating global SoilGrids products with national profile datasets or predictions, or to make national datasets available to improve the global predictions.

Continent-level
The situation in Sub-Saharan Africa is similar to that of the world level, with two products released: AfSoilgrids1km [96] and AfSoilgrids250m [97]. A considerable effort has been made to rescue soil profile data that were in danger of being lost and that are now compiled into the Africa Soil Profiles database [14,[19][20][21]]. This effort involved two full-time positions over a period of nearly five years, plus a number of students assisting in the digitization process and collaboration with six countries, including training sessions. The data rescue in this region has resulted in maps for all properties mentioned in the GlobalSoilMap specifications [11].
Considerable efforts have been made in training and raising technical capacity at locations in seven countries as well as more generally through the yearly Springschool and guest research at ISRIC.
Confidence Limits. Figure 8 shows the Version 0.1 map of soil organic carbon [52]. Here, the highest amounts of organic carbon are found in the north-central and north-eastern US, mainly associated with forest, and in the south-east, mainly associated with wetlands. The US product was produced mainly using harmonized soil maps from the Digital General Soil Map of the United States (STATSGO2). This is a broad-based inventory of soils at a scale of 1:250,000, available online at http://websoilsurvey.nrcs.usda.gov/.
In addition to that, numerous countries have indicated their willingness to join the GlobalSoilMap project.

Discussion
The number of soil profiles available in national databases is likely underestimated, since responses to our questionnaire were missing from a large number of countries. Moreover, rescuing soil data is an ongoing effort and the number of rescued soil profiles is anticipated to increase substantially. Some countries are involved in long-term soil data rescuing efforts and are far from having completed their programmes. France, for instance, continues an effort to enrich the national soil database. Unfortunately, no organized data archiving systems exist in these countries to integrate these data and make them available for further use, so these data sources remain only in personal datasets. Making use of the WoSIS database could contribute to solving this issue.
Other countries (e.g., India, China, Russia, South Korea) have indicated that their legacy databases are still under construction. Indeed, most of these countries are still actively searching for legacy soil information, with many survey reports potentially still to be rescued or retrieved. Therefore, it seems that an enormous potential remains in many countries. The largest country of the world,
Indeed, very large discrepancies exist among, and even within, national soil databases irrespective of their geographical support (points or polygons). These databases strongly differ in their range of measured soil parameters and in the analytical measurement standards used. Moreover, uniformity in methodology and coverage, albeit existing in some countries, is far from common even among national systems. In view of this situation, it is clear that harmonisation and co-ordination are necessary in order to develop approaches that rescue, harmonize, and curate the existing amount of legacy soil data that is being collected [e.g. 14,17,20,22,35,47,53,79,134]. Furthermore, results from different analytical protocols can be converted to one standard by applying pedotransfer functions, such as those listed in [11]; this was recently done in the US for pH and bulk density [12][13] and in Africa for available water holding capacity and root zone depth [105].
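As an illustration of converting results between analytical protocols, the sketch below fits a simple linear relation on paired measurements (the same samples analysed by both methods) and applies it to new values. This is an assumption made for illustration only: the functions listed in [11] and those used in [12][13][105] may take other forms, and the function names here are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear_conversion(
    source_values: Sequence[float], target_values: Sequence[float]
) -> Tuple[float, float]:
    """Fit target = slope * source + intercept by ordinary least squares.

    source_values and target_values are paired measurements of the same
    samples obtained with the source method and the target (standard) method.
    """
    n = len(source_values)
    if n != len(target_values) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_x = sum(source_values) / n
    mean_y = sum(target_values) / n
    sxx = sum((x - mean_x) ** 2 for x in source_values)
    if sxx == 0:
        raise ValueError("source values must not all be identical")
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(source_values, target_values)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def convert(value: float, slope: float, intercept: float) -> float:
    """Convert one measurement from the source method to the target method."""
    return slope * value + intercept
```

A fitted relation of this kind is only defensible within the range and soil types of the paired samples, so the conversion should be recorded as part of the lineage of each converted value.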
Nevertheless, soil data rescue efforts have already proven effective in delivering harmonized gridded products of soil properties, with various degrees of resolution and accuracy, and in some cases even covering the world. Numerous countries and institutions have indicated their willingness to join the GlobalSoilMap initiative. A new working group of the International Union of Soil Sciences was created at the end of 2016. As the number of rescued soil data will greatly increase in the near future, it will enable us to deliver consistent high-quality products more easily, updated when newly collected data become available. We define a process as 'bottom-up' when it comes from a country-level action. Most data rescue programmes are based on curating original data from countries and may therefore be considered as 'bottom-up'. However, the spatial modelling for prediction can be done at the country level, or at the world level as a whole. One of the major expected outcomes of data rescuing is the encouragement and development of country-specific bottom-up products (or 'mixed' products using ensemble techniques) and capacity development. This should limit the use of generic top-down product approaches, which will nevertheless remain necessary to fill gaps where soil data are missing or lost. We emphasize that GlobalSoilMap is not a static product, but is planned to evolve continuously as new data or new techniques become available. Legal restrictions related to data property and privacy are serious issues for building an operational worldwide centralized or distributed database of soil profiles and for delivering a complete, consistent worldwide product usable by global modelers and a host of other users. This is why, when possible, bottom-up approaches in compiling data and producing maps are preferable to top-down ones.
Another advantage of local modelling is that it may give better results than global modelling, which generalizes the relations between co-variates and soil properties to a greater extent. Indeed, the relative importance of driving factors and co-variates may differ strongly between physiographic areas. This is why using all the data available at country level generally yields better quality products. It also encourages countries to develop their own capacities, take ownership, and support future revisions of the maps representing their mandated territories. Nevertheless, top-down products, in soil modelling as well as in soil data compilation, are certainly useful for GlobalSoilMap as a whole, for a number of reasons:
- They provided early proof of concept,
- They provide a generic product which is complete and covers the globe, being relevant for global users and updateable through country specific, possibly collaborative, initiatives,
- They make it possible to fill gaps where soil data are missing or lost,
- They provide geographically continuous data products that are synchronized/harmonized at state/country boundaries and will certainly be useful for final worldwide harmonization,
- They can be combined with country level products, for instance by using ensemble approaches (refs).
Ultimately, the 90 × 90 m grid resolution sought by GlobalSoilMap, in addition to providing a seamless product for the global modeling community, is intended to provide suitable data to a wide variety of communities that make decisions at various levels, from local (field) to national scale and beyond.
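The idea of combining a country level product with a global one can be illustrated with a very simple ensemble: an inverse-variance weighted average computed cell by cell, with a fallback to the global prediction where the local product has no value. This is only one possible ensemble scheme, chosen here for brevity; the function name, the tiny 2 × 2 grids and the assumption of independent prediction errors are all hypothetical and are not taken from the cited studies.

```python
import numpy as np

def inverse_variance_ensemble(pred_local, var_local, pred_global, var_global):
    """Combine two gridded predictions cell by cell, weighting each by 1/variance.

    Cells where the local product is missing (NaN) fall back to the global one.
    The combined variance assumes independent errors, which is an assumption.
    """
    pred_local = np.asarray(pred_local, dtype=float)
    var_local = np.asarray(var_local, dtype=float)
    pred_global = np.asarray(pred_global, dtype=float)
    var_global = np.asarray(var_global, dtype=float)

    w_local = 1.0 / var_local
    w_global = 1.0 / var_global
    combined = (w_local * pred_local + w_global * pred_global) / (w_local + w_global)
    combined_var = 1.0 / (w_local + w_global)

    # Gap filling: use the global (top-down) product where local data are missing.
    missing = np.isnan(pred_local) | np.isnan(var_local)
    combined = np.where(missing, pred_global, combined)
    combined_var = np.where(missing, var_global, combined_var)
    return combined, combined_var

# Hypothetical 2 x 2 grids of a soil property and its prediction variance.
pred_local = [[2.0, np.nan], [1.0, 3.0]]
var_local = [[1.0, np.nan], [4.0, 1.0]]
pred_global = [[4.0, 5.0], [1.0, 3.0]]
var_global = [[1.0, 2.0], [4.0, 3.0]]

combined, combined_var = inverse_variance_ensemble(
    pred_local, var_local, pred_global, var_global
)
print(combined)
print(combined_var)
```

Where both products are equally uncertain the result is their plain average; where the local product is more precise it dominates, which matches the preference for bottom-up products expressed above.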
In this context, the end-user must be informed about the quality of the products, since these maps are predictions that come with a prediction uncertainty. However, how to properly estimate the prediction uncertainties (and even the uncertainty of the uncertainty) is still a matter of discussion and a subject for further research. Several options are described in the GlobalSoilMap specifications [11] and in [129]. Higher level products can be validated relatively easily with lower level data.
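One diagnostic commonly used to check mapped uncertainties against independent data is the prediction interval coverage probability (PICP): the share of validation observations that fall inside the mapped prediction interval, to be compared with the nominal level of that interval. The sketch below is an illustration only; it is not necessarily one of the options described in [11] or [129], and the validation values are hypothetical.

```python
import numpy as np

def picp(observed, lower, upper):
    """Fraction of observations lying inside their prediction interval."""
    observed = np.asarray(observed, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    inside = (observed >= lower) & (observed <= upper)
    return float(inside.mean())

# Hypothetical validation set: observed values and the mapped interval limits
# extracted at the same locations.
observed = [1.0, 2.0, 3.0, 4.0]
lower = [0.5, 2.5, 2.0, 3.0]
upper = [1.5, 3.5, 4.0, 5.0]

coverage = picp(observed, lower, upper)
print(coverage)
```

A coverage well below the nominal level suggests that the mapped intervals are too narrow, and a coverage well above it suggests that they are too wide.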
Furthermore, there is an ongoing effort to better define the accuracy of predictions [51,78,86,93,129,130,131] and the sources of uncertainty. Another challenge is how to take into account large uncertainties, or imprecision, in the original locations of soil profiles. This is especially relevant and challenging when high-resolution data (3 arc-sec) are envisioned as the final products. The question of the influence of the age of the rescued data also has to be resolved. Most soil properties are rather stable and change little (coarse fragments, texture, CEC, soil depth) or change only slowly and steadily over time. However, some properties change rather rapidly due to changes in land use (e.g. pH, soil organic carbon). For instance, a significant change in peat extent in the Netherlands has recently been shown, leading to updated soil maps [132]. Moreover, some soil properties may change very rapidly at a very local scale due to farm management practices, so that the data become obsolete for representing the current state of the soil. At a minimum, a map of the sampling dates should be added to the GlobalSoilMap specifications. A first draft of this map could be produced rather simply, e.g. by kriging the sampling dates of the original point data, and would indicate places where data are obviously obsolete.
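A first draft of such a sampling date map could be sketched with ordinary kriging of the sampling year. The snippet below is a bare-bones illustration in NumPy: the exponential variogram and its parameters are assumed rather than fitted, the coordinates are hypothetical planar coordinates in km, and the sampling years are invented. A real application would fit the variogram to the data and use a dedicated geostatistics package.

```python
import numpy as np

def exponential_variogram(h, sill=100.0, practical_range=50.0):
    """Exponential semivariogram without nugget; parameters are assumed, not fitted."""
    return sill * (1.0 - np.exp(-3.0 * h / practical_range))

def ordinary_kriging(xy, values, xy_new, variogram):
    """Ordinary kriging predictions and kriging variances at the rows of xy_new."""
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_new = np.asarray(xy_new, dtype=float)
    n = len(values)

    # Left-hand side: semivariances between data points, bordered by the
    # unbiasedness constraint (weights sum to one) and its Lagrange multiplier.
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = variogram(np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1))
    lhs[n, n] = 0.0

    # Right-hand side: semivariances between data points and prediction points.
    rhs = np.ones((n + 1, len(xy_new)))
    rhs[:n, :] = variogram(np.linalg.norm(xy[:, None, :] - xy_new[None, :, :], axis=-1))

    solution = np.linalg.solve(lhs, rhs)
    weights = solution[:n, :]
    prediction = weights.T @ values
    # Kriging variance: sum of weights times semivariances, plus the multiplier.
    kriging_variance = np.sum(solution * rhs, axis=0)
    return prediction, kriging_variance

# Hypothetical profile locations (km) and their sampling years.
profile_xy = [[0, 0], [10, 0], [0, 10], [10, 10], [5, 5]]
sampling_year = [1965, 1972, 1988, 2005, 1979]

# Prediction points: one data location and two unsampled locations.
grid_xy = [[0, 0], [5, 0], [7, 7]]
year_map, year_var = ordinary_kriging(
    profile_xy, sampling_year, grid_xy, exponential_variogram
)
print(year_map)
print(year_var)
```

Without a nugget the predictor reproduces the sampling year at a profile location, and the kriging variance grows with distance from the profiles, which helps flag areas where the date map itself is poorly supported.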
The issues related to dates apply not only to sampling periods but also to the co-variates used. Obviously, given the long time needed for soil formation, many of the co-variates used in digital soil mapping do not reflect the reality of some periods of pedogenesis. Topographic indices are generally computed from up-to-date digital terrain models and do not reflect the successive geomorphological changes over time. The relevance of current climatic data can also be questioned, as many soils developed under very different climatic periods. Indeed, as outlined by Grunwald [10], the time factor is much less used in digital soil mapping than the other scorpan factors.
Ideally, if GlobalSoilMap products are to be used for monitoring, the products should be harmonized to a common date (e.g. 2010) and, if funds permit, should also be based on newly sampled data. Most of the current initiatives emphasizing the need for newly sampled data, based on the arguments presented here, focus on collecting new data from topsoil only (e.g. [99][100][101][102][103]). Compared to topsoil sampling, a major advantage of legacy soil profile data is that they were generally sampled to a depth of 120 cm or more, providing a more in-depth understanding of soil functions related to various environmental aspects and adequate data for analyses and modelling. Therefore, we recommend that new sampling campaigns sample the full soil profile as well. Indeed, data collected at different times may be used to assess temporal changes and to perform multi-temporal data updates and queries.
Using legacy soil profile data, Stockmann et al. [133] recently generated products that follow the GlobalSoilMap specifications and incorporate a dynamic component.

8 Conclusion
GlobalSoilMap is the first digital soil mapping project to have set specifications agreed upon by an international soil science community. Its aim is to cover the entire world with a high resolution grid of predicted key soil properties along with their prediction uncertainties, thereby supporting other scientific disciplines and local management efforts. Significant progress has been achieved since its launch. Data rescue is considered an essential prerequisite for achieving the products, and tremendous progress has been made. It is essential that this process be continued; myriads of soil reports and soil maps are certainly still collecting dust on shelves. We encourage soil scientists and librarians to make them available to the soil science community, ideally with digitized georeferenced soil profile data, at country, continental or world level. Fortunately, numerous countries have indicated their willingness to join the project and continue this important work.
We believe that combining country-level and worldwide predictions could lead to a first product completely [...]
[...]) at the 6 standard depths for continental USA.