Geosci. Model Dev., 15, 4709–4738, 2022
https://doi.org/10.5194/gmd-15-4709-2022
© Author(s) 2022. This work is distributed under the Creative Commons Attribution 4.0 License.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Model description paper

A map of global peatland extent created using machine learning (Peat-ML)

Joe R. Melton¹, Ed Chan², Koreen Millard³, Matthew Fortier¹, R. Scott Winton⁴,⁵,⁶, Javier M. Martín-López⁷, Hinsby Cadillo-Quiroz⁸, Darren Kidd⁹, and Louis V. Verchot⁷

¹Climate Research Division, Environment and Climate Change Canada, Victoria, BC, Canada
²Climate Research Division, Environment and Climate Change Canada, Toronto, ON, Canada
³Geography and Environmental Studies, Carleton University, Ottawa, ON, Canada
⁴Institute of Biogeochemistry and Pollutant Dynamics, ETH Zurich, 8092 Zurich, Switzerland
⁵Department of Surface Waters, Eawag, Swiss Federal Institution of Aquatic Science and Technology, 6047 Kastanienbaum, Switzerland
⁶Department of Earth System Science, Stanford University, Stanford, CA 94305, USA
⁷Agroecosystems and Sustainable Landscapes Program, Alliance Bioversity-CIAT, Cali, Colombia
⁸School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
⁹Natural Values Science Services, Department of Natural Resources and Environment, Hobart, Tasmania, Australia

Correspondence: Joe R. Melton (joe.melton@ec.gc.ca)

Received: 21 December 2021 – Discussion started: 14 February 2022
Revised: 4 May 2022 – Accepted: 6 May 2022 – Published: 20 June 2022

Abstract. Peatlands store large amounts of soil carbon and freshwater, constituting an important component of the global carbon and hydrologic cycles. Accurate information on the global extent and distribution of peatlands is presently lacking but is needed by Earth system models (ESMs) to simulate the effects of climate change on the global carbon and hydrologic balance. Here, we present Peat-ML, a spatially continuous global map of peatland fractional coverage generated using machine learning (ML) techniques and suitable for use as a prescribed geophysical field in an ESM. Inputs to our statistical model follow the drivers of peatland formation and include spatially distributed climate, geomorphological, and soil data, and remotely sensed vegetation indices. Available maps of peatland fractional coverage for 14 relatively extensive regions were used, along with mapped ecoregions of non-peatland areas, to train the statistical model. In addition to qualitative comparisons with other maps in the literature, we estimated model error in two ways. The first estimate used the training data in a blocked leave-one-out cross-validation strategy designed to minimize the influence of spatial autocorrelation. That approach yielded an average r² of 0.73, with a root-mean-square error and mean bias error of 9.11 % and −0.36 %, respectively. Our second error estimate was generated by comparing Peat-ML against a high-quality, extensively ground-truthed map generated by Ducks Unlimited Canada for the Canadian Boreal Plains region. This comparison suggests that our map is of comparable quality to mapping products generated through more traditional approaches, at least for boreal peatlands.

Copyright statement. The works published in this journal are distributed under the Creative Commons Attribution 4.0 License. This license does not affect the Crown copyright work, which is re-usable under the Open Government Licence (OGL). The Creative Commons Attribution 4.0 License and the OGL are interoperable and do not conflict with, reduce or limit each other.

© Crown copyright 2022

1 Introduction

Peatlands are estimated to cover about three percent of the land surface but contain approximately a third of the soil carbon and roughly a tenth of surface freshwater (Joosten and Clarke, 2002; Jackson et al., 2017). They are also vulnerable to destabilization due to climate change and anthropogenic pressures, including drainage and land use change. Their importance in the carbon and hydrologic cycles motivates their inclusion in Earth system models (ESMs) to better understand their potential impact on the climate system. Since the land surface of ESMs is grid based, a prerequisite for integrating peatlands into these models is to define the location and fractional cover of peatlands on the model grid. However, peatlands have generally been overlooked in landscape databases, and their mapping remains challenging (e.g. Krankina et al., 2008; Minasny et al., 2019).

As peatlands are commonly considered a type of wetland that contains large amounts of organic carbon in the soil, several studies have based peatland distribution on maps of soil organic matter density (e.g. Wania et al., 2009; Bechtold et al., 2019; Hugelius et al., 2020). However, determining peatland distribution from soil organic matter databases alone tends to overlook vegetation and subsurface hydrology and, most importantly, relies heavily on the fidelity of the soil carbon dataset. Another approach has been to use a soil map together with global wetland or inundation extent maps (e.g. Köchy et al., 2015). These wetland and inundated area databases have mostly been produced either by mapping shallow surface water from remote-sensing data, as in the Global Inundation Extent from Multi-Satellites initiative (GIEMS; Prigent et al., 2007; Papa et al., 2010) and the Surface WAter Microwave Product Series (SWAMPS; Schroeder et al., 2015), or by land cover mapping using surface observations and moderate-resolution imaging spectroradiometer (MODIS) data, as in the Global Lake and Wetlands Database (GLWD-3; Lehner and Döll, 2004). These wetland mapping products are, however, of limited utility for peatland modelling applications: they generally do not agree well amongst themselves (Melton et al., 2013), which is also the case for peatland mapping products (as discussed later), and they may exhibit biases depending on how they were generated (see discussion in Bohn et al., 2015). In addition, in the boreal zone and some areas of the tropics such as the Amazon (Lähteenoja and Roucoux, 2010), some peatlands are not inundated, so using hydrological characteristics alone can underestimate their extent (Matthews, 1989; Prigent et al., 2007).

Other studies, such as Largeron et al. (2018) or Leifeld and Menichetti (2018), have used a global peatland distribution map derived from a paleontological perspective (Yu et al., 2010). However, Yu et al. (2010) is an estimated map of binary polygons that does not provide quantitative information on fractional coverage. The most comprehensive global peatland map we are aware of is PEATMAP (Xu et al., 2018), which was generated through a meta-analysis of regional-scale mapping products of varying spatial resolution and provenance (general land cover maps, soil databases, and a hybrid expert system). This dataset is not well suited as a peatland mask for ESM use: the coarse resolution of some of its parent datasets leaves large polygons of complete peatland cover in regions where this is unlikely, and, because it depends on mapping products existing for each region, it misses peatlands in regions where peatland coverage is known to exist, e.g. the Republic of Sakha (Yakutia, Russia).

In describing their dataset, Yu et al. (2010) state that "accurate true peatland coverage and distribution is not available for many mapped regions". Over a decade after the publication of Yu et al. (2010), this statement remains accurate. Peatlands have traditionally been mapped through field surveys and manual inspection of aerial photography (e.g. Tarnocai et al., 2011). These approaches are costly and labour-intensive and become impractical as the study region becomes large or remote. As noted by Loisel et al. (2017), it is also difficult to use aerial photography to distinguish upland forests from forested peatlands in the boreal region, and (sub)arctic tundra vegetation from peatlands at higher latitudes. Digital soil mapping (DSM) is an alternative approach to determining global peatland cover. DSM techniques typically combine field surveys with peatland covariates and statistical models to produce maps of predicted peatland area (McBratney et al., 2003). Following Minasny et al. (2019), the peatland covariates useful to DSM can be determined from the drivers of peatland formation, indicators of peat presence, and sensors able to measure the indicators.

The drivers of peatland formation are scale dependent (Limpens et al., 2008), so the intended spatial extent and mapping resolution of the DSM product are important considerations. For DSM on a regional to global scale, as is the case when mapping for ESM use, the principal drivers of peatland formation are climate, vegetation, and terrain. For these drivers at this spatial scale, Minasny et al. (2019) suggest that the indicators of peatland presence are climate data (primarily temperature and precipitation); land use and land cover information; and elevation, slope, and terrain attributes. Possible sensors for regional- to global-scale mapping include optical and radar imagery, topographic remote-sensing data (digital elevation models, DEMs), and climate datasets. The statistical models used as part of DSM vary; here we use a machine learning (ML) algorithm to derive a global map of peatland extent intended for use in ESM applications. As field surveys are impractical to conduct on a global scale, we rely upon regional-scale peatland mapping studies to train our ML models and evaluate their results. In Sect. 2 we define peatlands in the context of our mapping approach and describe the datasets used for model training as well as the ML approach and algorithms used. Section 3 discusses the results of the ML algorithms, our model performance estimation strategy, and the limitations of our approach. Section 4 presents our overall conclusions.

2 Materials and methods

2.1 Definition of peatlands

As there is no single, universally adopted definition of peatlands, we follow Joosten and Clarke (2002) in defining them as areas, with or without vegetation, that contain a naturally accumulating peat layer at the surface. The definition of peat in terms of its minimum percentage of dead organic material by dry mass varies considerably in the literature (e.g. Gumbricht et al., 2017; Page et al., 2011); we chose a relatively inclusive minimum of 30 % to ensure we capture the diversity of global peatlands. When using peatland mapping datasets that contain continuous peat depths (Sect. 2.3), we used a minimum peat thickness of 30 cm to delineate peatlands, similar to Gumbricht et al. (2017). This depth limit is the most common amongst national datasets (but see the discussion of exceptions and of the implications of different values in Loisel et al., 2017).

2.2 Data acquisition and preparation

The general process of data preparation, model training, and evaluation is illustrated in Fig. 1. All training (regional peatland and non-peatland mapping products) and predictor (peatland covariates) data were converted from their native format (commonly GeoTIFF rasters or vector-based GIS formats such as shapefiles) to netCDF format and remapped onto a common 5 arcmin grid (ca. 0.0833°, corresponding to 9.26 km at the Equator and 4.63 km at 60° N) using Climate Data Operators (CDO; Schulzweida, 2020), the Geospatial Data Abstraction software Library (GDAL/OGR; GDAL/OGR contributors, 2021), and/or netCDF Operators (NCO; Zender, 2008). The original resolutions of the data sources are listed below. All ML runs and evaluations were performed on the 5 arcmin grid.

Figure 1. Flow chart of the machine learning procedure.

2.2.1 Predictors (peatland covariates)

We used a suite of predictors that fell into four main types: climate, soils, vegetation, and terrain (geomorphology). Table 1 lists each predictor grouped by predictor source and type. The climate, vegetation, and soil predictors were extracted from the Google Earth Engine data catalogue (Gorelick et al., 2017). The geomorphological dataset was downloaded directly from its authors' website (Amatulli et al., 2020, last access: 16 January 2020). Differences in the years sampled by each dataset are assumed to be relatively unimportant, as peatland extent is not highly dynamic on decadal timescales, especially at the scale of our grid cells (Loisel et al., 2013). An additional predictor was the calculated length of the longest day of the year (hours) for each cell on the 5 arcmin grid, which the model could use to distinguish tropical from extratropical regions.

The climate predictors were derived from the TerraClimate dataset (Abatzoglou et al., 2018). TerraClimate is available at high spatial resolution (1/24°) and provides monthly climate and climatic water balance variables spanning 1958 to 2015. TerraClimate uses the WorldClim dataset for high-spatial-resolution climatic normals, combined with the time-varying climate of the Climatic Research Unit TS4.0 (CRU TS4.0; Harris et al., 2020): the time-varying anomalies of CRU TS4.0 are interpolated onto the high-resolution climatology of WorldClim. The Japanese 55-year Reanalysis (JRA55; Kobayashi et al., 2015) is used to fill in where CRU TS4.0 has no climate stations contributing to its record (such as parts of South America, Africa, and smaller islands) and is the sole data source for solar radiation and wind speed. Abatzoglou et al. (2018) note that the water balance model used to generate some of the variables listed in Table 1 is simple and does not account for vegetation heterogeneity or the physiological response of vegetation to varying environmental conditions. For the climate predictors, we computed seasonal means across the available years, i.e. December–February (DJF), March–May (MAM), June–August (JJA), and September–November (SON). Because these seasonal means are likely less informative in tropical regions, we also investigated using annual minimum and maximum values in place of seasonal ones but did not see a significant impact on predicted peatland fractional cover.

Soil predictors were obtained from the 250 m resolution OpenLandMap (Hengl, 2018), including soil bulk density (kg m⁻³), clay content (%), sand content (%), organic carbon content (%), and soil water content at field capacity (33 kPa). These soil variables are derived from an ensemble of machine learning algorithms trained on a global compilation of soil profiles (Hengl and MacMillan, 2019). We used the 30 cm depth estimate for all soil variables.

Terrain information is provided by the 250 m resolution version of Geomorpho90m (Amatulli et al., 2020) for 17 geomorphometric variables describing numerous aspects of the land surface (see Table 1). The original resolution of this geomorphology dataset is 90 m, the same as that of the Multi-Error-Removed Improved Terrain (MERIT) DEM (Yamazaki et al., 2017) from which it was derived. MERIT-DEM is a merged and error-corrected product based on the ALOS World 3D – 30 m (AW3D30) and Shuttle Radar Topography Mission (SRTM3) datasets.

Information about the vegetation state was provided by several datasets. Shimada et al. (2014) created a seamless global mosaic image from the Phased Array type L-band Synthetic Aperture Radar (PALSAR/PALSAR2), with 25 m grid cells on an annual timescale. In creating the mosaic, the images chosen at each location within a year were those showing the minimum response to surface moisture. The images were then ortho-rectified and slope corrected, and a destriping procedure was applied to equalize differences between strips that could arise from conditions at the time of acquisition. Because the dataset's intended purpose was to provide a global mask of forest cover (Shimada et al., 2014), soil moisture differences were purposefully minimized; this dataset is therefore likely of more limited use for predicting peatland extent than would otherwise be expected for an L-band radar product (Izumi et al., 2019; Touzi et al., 2018). However, likely due to the significant computational effort required to produce a global L-band product, we are not aware of another publicly available product.

The MODIS Terra net primary productivity product (MOD17A3.055 NPP) is available annually on a 1 km grid (Running et al., 2011). This version of MODIS NPP (v. 5.5) is corrected for issues relating to cloud-contaminated MODIS leaf area index and fraction of photosynthetically active radiation (LAI/FPAR) inputs to the MOD17 algorithm. We averaged the data over the available 2000–2015 period.

Vegetation indices are provided by the Suomi National Polar-Orbiting Partnership (S-NPP) NASA Visible Infrared Imaging Radiometer Suite (VIIRS) product VNP13A1, which is generated by selecting the best pixel at 500 m resolution over a 16 d acquisition period. The VIIRS data include three vegetation indices: the normalized difference vegetation index (NDVI), which uses the red and near-infrared (NIR) bands; the enhanced vegetation index (EVI), which also uses the blue band; and EVI2, a two-band version designed for intercomparison with EVI products that do not use a blue band (Table 1). EVI is more sensitive to canopy cover, while NDVI is more sensitive to chlorophyll (Huete et al., 2002). All snow, cloud, or cloud shadow pixels and any pixels that were not of excellent, good, or acceptable quality (according to the dataset's quality flags) were excluded. As the original data do not have composite monthly values, the mean, minimum (min), maximum (max), and standard deviation (SD) were calculated from all values within each year and then averaged across all years.

Vegetation phenology information is provided by the MCD12Q2 V6 Land Cover Dynamics product (Friedl et al., 2019). This MODIS product provides phenological information such as the dates of green-up, peak, and senescence, along with variables related to the range and summation of the EVI (see Table 1). Since this is an annual product, the mean, min, max, and SD values are calculated across all years.

We also considered the global surface water (GSW) dataset of Pekel et al. (2016) but did not include it as a predictor dataset. We found it unsuitable for peatland prediction because it relies on Landsat imagery, which detects open surface water at 30 m resolution: treed peatlands, peatlands smaller than 30 m by 30 m, and peatlands where the water table is below the peat surface, such as bogs, would not be well captured by GSW. A visual inspection of GSW over some of our training regions (see Sect. 2.3) showed poor correlation between GSW water presence and mapped peatland area.
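The grid spacing quoted in Sect. 2.2 follows from simple arithmetic. The sketch below assumes that one arcminute of arc spans one nautical mile (1.852 km), which reproduces the 9.26 km and 4.63 km values; `cell_width_km` is a hypothetical helper, and it returns the east–west width, which shrinks with the cosine of latitude while the north–south extent stays near 9.26 km.

```python
import math

# Assumption: one arcminute of arc is taken as one nautical mile (1.852 km),
# which reproduces the 9.26 km and 4.63 km values quoted in Sect. 2.2.
KM_PER_ARCMIN = 1.852


def cell_width_km(lat_deg: float, res_arcmin: float = 5.0) -> float:
    """Approximate east-west width (km) of a res_arcmin grid cell at lat_deg."""
    # Meridians converge towards the poles, so the width scales with cos(latitude).
    return res_arcmin * KM_PER_ARCMIN * math.cos(math.radians(lat_deg))


print(round(5.0 / 60.0, 4))           # cell size in degrees
print(round(cell_width_km(0.0), 2))   # width at the Equator
print(round(cell_width_km(60.0), 2))  # width at 60 degrees N
```

The same helper makes clear why a fixed angular grid has cells of very different areas at high latitudes, where much of the boreal training data lie.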
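Sections 2.1 and 2.3 describe converting continuous peat depths into peatland presence with a 30 cm threshold, and Sect. 2.2 describes remapping all training data onto the 5 arcmin grid. The following is a minimal sketch of how those two steps could combine into a fractional-cover target, assuming that the fine-resolution cells nest exactly inside each coarse cell and have equal area. The actual remapping used CDO, GDAL/OGR, and NCO, whose regridding choices are not reproduced here, and `depth_to_fractional_cover` is a hypothetical name.

```python
from typing import List


def depth_to_fractional_cover(depth_cm: List[List[float]], factor: int,
                              threshold_cm: float = 30.0) -> List[List[float]]:
    """Aggregate a fine-resolution peat-depth grid to coarse-cell peatland cover (%).

    Fine cells with peat deeper than threshold_cm count as 100 % peatland and all
    others as 0 %; each coarse cell is the mean over the factor x factor block of
    fine cells it contains. Assumes exact nesting and equal fine-cell areas.
    """
    n_rows, n_cols = len(depth_cm), len(depth_cm[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("the fine grid must nest exactly within the coarse grid")
    coarse = []
    for r0 in range(0, n_rows, factor):
        coarse_row = []
        for c0 in range(0, n_cols, factor):
            block = [depth_cm[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            n_peat = sum(1 for d in block if d > threshold_cm)  # binary presence
            coarse_row.append(100.0 * n_peat / len(block))
        coarse.append(coarse_row)
    return coarse


depths = [[0, 45, 10, 80],
          [35, 50, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 31, 29]]
print(depth_to_fractional_cover(depths, factor=2))
```

Here a 4 × 4 depth grid becomes a 2 × 2 grid of percentages; a general-purpose regridding tool is needed when source and target grids do not nest this neatly.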
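The longest-day predictor depends only on latitude, but the text does not state how it was calculated. One common approximation, used here purely as an assumption, takes the sunset hour angle ω0 from cos ω0 = −tan φ tan δ, with the solar declination δ set to Earth's axial tilt (about 23.44°) at the local summer solstice, and ignores atmospheric refraction and the size of the solar disc. `longest_day_hours` is a hypothetical name.

```python
import math

# Assumption: solar declination at the June (December) solstice equals Earth's
# axial tilt; refraction and the finite size of the solar disc are ignored.
SOLSTICE_DECLINATION_DEG = 23.44


def longest_day_hours(lat_deg: float) -> float:
    """Approximate daylength (h) on the longest day of the year at lat_deg."""
    # The longest day falls at the June solstice in the north and the December
    # solstice in the south, so only the absolute latitude matters.
    phi = math.radians(abs(lat_deg))
    delta = math.radians(SOLSTICE_DECLINATION_DEG)
    cos_w0 = -math.tan(phi) * math.tan(delta)  # cosine of the sunset hour angle
    if cos_w0 <= -1.0:  # polar day: the sun does not set
        return 24.0
    # The Earth turns 15 degrees of hour angle per hour; daylength spans sunrise to sunset.
    return 2.0 * math.degrees(math.acos(cos_w0)) / 15.0


for lat in (0.0, 45.0, -45.0, 70.0):
    print(lat, round(longest_day_hours(lat), 2))
```

The predictor is 12 h everywhere on the Equator and saturates at 24 h poleward of the polar circles, which is what lets a tree model separate tropical from extratropical cells with a single split.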
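The temporal aggregation of predictors described in Sect. 2.2.1 can be sketched for a single grid cell: seasonal means for the TerraClimate variables, and per-year statistics averaged across years for the VIIRS indices. This sketch assumes that months are pooled across years (so DJF averages every available December, January, and February value rather than contiguous winters) and that the SD is the population standard deviation; neither choice is specified in the text, and both function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Meteorological seasons as listed in Sect. 2.2.1.
SEASONS = {"DJF": (12, 1, 2), "MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}


def seasonal_means(monthly: Dict[Tuple[int, int], float]) -> Dict[str, float]:
    """Seasonal means for one grid cell from a {(year, month): value} mapping.

    Months are pooled across all available years (an assumption), so DJF is
    the mean of every December, January, and February value present.
    """
    return {
        name: mean(value for (_, month), value in monthly.items() if month in months)
        for name, months in SEASONS.items()
    }


def annual_stats_averaged(valid_obs: Dict[int, List[float]]) -> Dict[str, float]:
    """Mean, min, max, and SD of the valid observations within each year,
    then averaged across years (SD taken as the population SD, an assumption).

    valid_obs maps year -> observations that passed quality screening.
    """
    per_year = [(mean(v), min(v), max(v), pstdev(v)) for v in valid_obs.values()]
    names = ("mean", "min", "max", "sd")
    return {name: mean(stats[i] for stats in per_year) for i, name in enumerate(names)}


monthly = {(year, month): float(month) for year in (2000, 2001) for month in range(1, 13)}
print(seasonal_means(monthly))
print(annual_stats_averaged({2019: [1.0, 2.0, 3.0], 2020: [3.0, 4.0, 5.0]}))
```

For the annual MCD12Q2 phenology product, each year contributes a single value, so the statistics would instead be taken directly across years.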
EVI2 designed for intercomparison with other EVI products that do not use a blue band (Table 1). EVI is more sensitive 2.3 Training data to canopy cover, while NDVI is more sensitive to chlorophyll (Huete et al., 2002). All snow, cloud, or cloud shadow pix- For training and testing the machine learning model, peatland els and any pixels that were not excellent, good, or accept- fractional cover was selected as the target variable. However, accurate estimates of peatland fractional cover are not widely Geosci. Model Dev., 15, 4709–4738, 2022 https://doi.org/10.5194/gmd-15-4709-2022 Joe R. Melton et al.: Machine-learning-based peatland extent 4713 Table 1. Potential peatland co-variates used as predictor variables for the ML algorithms to predict peatland fractional cover. The treatment of variables is discussed in Sect. 2.2.1. The predictor variables in bold were selected for the final model (see Sect. 2.4.3). Type Source and resolution (time period) Predictor Climate TerraClimate (Abatzoglou et al., 2018) Actual evapotranspirationa, climate water deficita, soil watera, 1/24◦ (1985–2015) potential evapotranspiration (Penman–Monteith), precipitation accumulated, downward surface shortwave radiation, snow water equivalenta, runoffa, Palmer Drought Severity Index (PDSI), minimum temperature, maximum temperature, vapour pressure, vapour pressure deficit, 10 m wind speed Soils Open Land Maps (Hengl, 2018) Soil bulk density, clay content, sand content, soil water content, 250 m (–) at field capacity (33 kPa), organic carbon content Terrain Geomorpho90m (Amatulli et al., 2020) Slope, aspect, eastness, northness, convergence indexb, 250 m (–) compound topographic indexc, stream power indexd, first and second directional derivatives (east–west, north–south), profile curvaturee, tangential curvaturef, elevation standard deviation, geomorphology landformg, roughness indices, topographic position index, maximum elevation deviation Vegetation PALSAR/PALSAR2 (Shimada et al., 
2014) HHh and HVi polarization backscattering coefficient 25 m (2007–2010) MOD17A3 V055 (Running et al., 2011) Net primary productivity 1 km (2000–2015) S-NPP VIIRS vegetation indices Enhanced vegetation index (EVI)j, EVI2k, near-infrared (VNP13A1) (Didan and Barreto, 2018) radiation (NIR), shortwave infrared radiation reflectance (SWIR1l), 500 m (2012–2019) SWIR2m, SWIR3n, normalized difference vegetation index (NDVI), NIR reflectanceo, green reflectancep, blue reflectanceq, red reflectancer MODIS Global Vegetation Phenology Dormancy, EVI_Amplitude, EVI_Area, EVI_Minimum, (MCD12Q2 V6 Land Cover Dynamics) Greenup, Maturity, MidGreendown, MidGreenup, (Friedl et al., 2019) 500 m (2001–2018) Peak, Senescence Geographic Calculated Length of the longest day of the year in hours a Derived using a one-dimensional soil water balance model. b Ranges between 100 for sinks (convergent areas) and −100 for ridges (divergent areas). Flat areas are 0. c Also known as topographic wetness index (Beven and Kirkby, 1979). d Defined as the product of the tangent of the local slope angle and the upstream catchment area. e Measures the rate of change of a slope along a flow line. Convex slopes accelerate water flowing along them while concave slopes decelerate flow. f Measures the perpendicular rate of change to the slope gradient. This captures the convergence (concave curvature) and divergence (convex curvature) of flow across a surface. g For example, flat, spur, valley, calculated using morphometry techniques based on pattern recognition. h Horizontal transmit and horizontal receive. i Horizontal transmit and vertical receive. j Three-band enhanced vegetation index. k Two-band EVI using only red and NIR band. l 1230–1250 nm. m 1580–1640 nm. n 2225–2275 nm. o 846–885 nm. p 545–656 nm. q 478–498 nm. r 600–680 nm. available, as discussed in Sect. 1. Recently, Minasny et al. tent (tens of thousands of square kilometres, but we allow (2019) reviewed the present state of peatland mapping. 
They smaller mapping products if they are located in highly under- found 90 recent studies mapping peatlands, with many de- represented regions), that have attempted to validate their lineating peatland extents using ecological and environmen- peatland extents, and which are readily available in digital tal field studies in combination with land cover from remote formats. We have acquired peatland extent estimates for 14 sensing; however, the studies seldom conduct validation of major regions (Fig. 2) including Canada, the taiga zone of the their mapping, and uncertainty estimates are rare (e.g. Ta- West Siberian Lowlands (WSL), Scotland, the Netherlands, ble 4 in Minasny et al., 2019). Additionally, the definition the St. Petersburg region of Russia, New Zealand, Tasmania, of peat can vary between countries and studies (e.g. Table 2 the Cuvette Centrale in the Congo, Indonesia, the Pastaza- in Minasny et al., 2019), making assembling an internally Marañón foreland basin (PMFB) in northeastern Peru, and consistent global dataset of peatland extents challenging. In the peatlands along the Peruvian Rio Madre de Dios, along selecting peatland extent estimates for our training data, we with some peatland-free regions. have chosen studies that are of sufficiently large spatial ex- https://doi.org/10.5194/gmd-15-4709-2022 Geosci. Model Dev., 15, 4709–4738, 2022 4714 Joe R. Melton et al.: Machine-learning-based peatland extent > 60 cm depth (Geological Survey of Finland, 2018). The dataset was created through air photo interpretation and field mapping with the smallest polygon size about 6 ha. A database for the peatlands of Scotland was recently pub- lished by Aitkenhead and Coull (2019). Peatland cover was determined using back-propagation neural networks trained with peatland survey, climate, topography, Landsat imagery, geologic, and land cover data. 
Aitkenhead and Coull (2019) reports an r2 of 0.67 for peat depth, which we used to de- termine peatland fractional cover. Peatlands were assumed to have > 30 cm peat, and pixels with peat deeper than that were assigned 100 % peatland cover and 0 % elsewhere. The Derived Irish Peat Map version 2 (DIPMv2) (Con- Figure 2. Training data for the LightGBM algorithm. Areas in white nolly and Holden, 2009) was compiled from the land cover have no data. The green letters denote the blocks used for the cross- and soil maps of Ireland using a rules-based decision tree validation scheme. The training block limits were chosen as de- methodology. Connolly and Holden (2009) estimate the scribed in Sect. 2.4.2. overall accuracy of DIPMv2 to be 85 %. From the DIPMv2, we included raised bogs, low-level Atlantic blanket bogs, and high-level montane blanket bogs in producing a peatland Peatland coverage data for Canada, which has ca. 13 % of cover map. the land surface covered with peatlands, comes from Ducks Wageningen Environmental Research recently updated the Unlimited Canada (hereafter DUC; Smith et al., 2007) and Soil Map of the Netherlands (1 : 50000 scale) including peat The Peatlands of Canada database (Tarnocai et al., 2011). depth using a combination of boreholes and ordinary krig- Both datasets defined peatlands as wetlands (bogs, fens, ing (Brouwer et al., 2018; Brouwer and Walvoort, 2019). For swamps, or marshes) with massive deposits of peat at least each region, a number of boreholes were not used in cali- 40 cm thick, as is the convention in Canada. The Peatlands bration of the kriging model (roughly 10 %) and retained for of Canada database was primarily derived from soil surveys evaluation. Based on evaluation against the validation bore- and air photo interpretation. Shapefiles were available with hole subset, the average peat depth error varied between re- information on bog, fen, and bog–fen features with ≥ 1 % gions but was commonly between 10 and 20 cm. 
We used the peat coverage (Tarnocai et al., 2011). The DUC dataset cov- peat depth to delineate peatland area based on a threshold of ers the 74.1× 104 km2 Boreal Plains region and was derived 30 cm where thicknesses greater than that were assumed to from a satellite-based remote sensing classification system be 100 % peatland and 0 % elsewhere. validated by 5034 field sites (Smith et al., 2007). Draper et al. (2014) mapped peatlands for a region of The peatlands of the taiga zone of the West Siberian Low- Amazonia in northwestern Peru (the Pastaza-Marañón fore- lands (WSL) is estimated by Terentieva et al. (2016) to be land basin; PMFB). A support vector machine (SVM) clas- 52.4× 104 km2, or 4 %–12 % of the global wetland area. To sifier was trained with Landsat, ALOS/PALSAR, and Shut- conduct this mapping, Terentieva and co-workers used a su- tle Radar Topography Mission (SRTM) elevation data. Along pervised classification scheme for Landsat imagery that was with forest census plots and peat thickness measurements, a trained on field data and high-resolution images from 28 test supervised classification method was used to train the SVM sites. They estimate their accuracy at 79 % based on 1082 and determine the distribution of peatland vegetation types, 10× 10 pixel size validation polygons. as well as above- and below-ground carbon stocks. The three The St. Petersburg region of Russia was mapped by Pflug- peat-forming vegetation types were pole forest, palm swamp, macher et al. (2007) using MODIS Nadir bidirectional re- and open peatlands. flectance distribution function adjusted reflectance (NBAR). The Cuvette Centrale is located in the central Congo basin. The MODIS-NBAR reflectances were combined with empir- Dargie et al. (2017) used a digital elevation model (DEM) ical regression models to determine sub-pixel peatland cover- to remove steep slopes and high ground, optical data (Land- age. To fit the models, Pflugmacher et al. 
(2007) drew upon sat Enhanced Thematic Mapper, ETM+) to identify probable forest inventory data for observed peatland fractional cover swamp vegetation, which we used as a proxy for peatland over 1105 MODIS pixels with half used for model fitting fractional coverage, and radar backscatter (L-band synthetic and half for validation. Error analysis showed good predic- aperture radar; ALOS PALSAR) to identify surface water un- tion capability with correlation with observations of r = 0.92 der forest cover. Together these approaches were used to pro- for unmined peatlands. Pflugmacher et al. (2007) found the duce a maximum likelihood tree. They then conducted nine region to have approximately 10 % peatland cover. transects of length 2.5 to 20 km to ground truth the data. Most The Finnish Geologic Survey superficial deposits 1 : peatlands in this region are located within large interfluvial 200000 map displays peat deposits at 0–30, 30–60, and basins and are largely rain-fed and ombrotrophic. The areal Geosci. Model Dev., 15, 4709–4738, 2022 https://doi.org/10.5194/gmd-15-4709-2022 Joe R. Melton et al.: Machine-learning-based peatland extent 4715 extent of peat in the Cuvette Centrale was estimated to be gions in Sect. 3.3. A further region of zero peatland extent 14.6× 104 km2 (Dargie et al., 2017). was defined according to a map of soil organic carbon for Indonesian peatlands have been mapped by Wetlands In- the Casanare flooded savannas of Colombia (Martín-López ternational in a series of publications (Wetlands Interna- et al., 2022) and expert opinion based upon field observa- tional, 2003, 2004, 2006). The maps have been derived from tions. We also set peatland area to zero for any pixels that are regional-scale maps and project reports, soil maps, Land- ice covered as shown in the Global Land Ice Measurements sat imagery, and ground truthing. This dataset uses a 30 cm from Space (GLIMS) dataset (GLIMS and NSIDC, 2018). 
threshold of peat thickness to delineate peatlands. National maps of New Zealand peatlands were derived 2.4 Machine learning approach from the Fundamental Soil Layers (FSL) soil maps published at 1 : 50000 scale by the New Zealand Land Resource Inven- 2.4.1 LightGBM and hyperparameter optimization tory (NZLRI; Landcare Research NZ Ltd, 2000). The poly- gons in the FSL maps were manually created from aerial The statistical modelling was conducted in the Python pro- photograph analysis with ground truthing. Peatlands were se- gramming language (v. 3.8.3). We use a gradient boosting lected by choosing the organic soils class. decision tree (GBDT) algorithm called LightGBM (Ke et al., Organic soil and peat mapping was undertaken by the De- 2017). Decision tree algorithms make iterative splits to par- partment of Natural Resources and Environment, Tasmania, tition data according to different criteria. The decision tree to provide decision support for fire management and sup- will split each node at the feature with the largest information pression activities in the Tasmanian Wilderness World Her- gain, i.e. the most informative. For GBDTs, the information itage Area (Kidd et al., 2021). A DSM approach was used gain is usually measured by the variance after splitting. To to predict organic soil and peat areas using new and exist- avoid issues with overfitting of a decision tree, GBDT algo- ing soil site data, intersected with a range of environmen- rithms use the boosting technique, which combines multiple tal predictor datasets, which included vegetation mapping, decision trees in series to achieve better predictive power as legacy soil mapping, wetlands, digital elevation models, ter- each tree in the series attempts to minimize the errors in the rain derivatives, remote sensing (multispectral green or bare previous tree. The error minimization steps occur through a areas, gamma radiometrics, Sentinel RADAR), and climate. 
form of gradient descent in function space where each tree is A binary “presence–absence” calibration set of site data was trained on a residual vector that measures the magnitude and used to create a digital map index (0–1). Modelling was un- direction of the true target relative to the previous tree (loss dertaken using regression trees with 10-fold cross-validation, function), which successive iterations minimize. where spatial output values closer to “1” were deemed to be meeting the environmental conditions conducive to peat for- 2.4.2 Cross-validation approach mation. The organic soil extent modelling R2 calibration and validation values were 0.77 and 0.70, respectively. Map vali- To provide estimates of the error associated with the dation by expert review determined that spatial index values LightGBM predictions we adopted a blocked-leave-one-out > 0.75 were highly likely to be peat (or organic) soils (Kidd (BLOO) strategy, which is recommended for applications et al., 2021). where the predictors could be expected to exhibit spatial au- Peatlands along the Rio Madre de Dios in Peru were tocorrelation (Roberts et al., 2017; Meyer et al., 2019; Ploton mapped by Householder et al. (2012) using Landsat imagery et al., 2020). BLOO tends to produce estimates of prediction and field observations. They identified 295 peatlands from error that are closer to the “true” error (Roberts et al., 2017), remote-sensing imagery covering 294 km2 and from 0.1 to particularly in cases where the sampling strategy is clustered 35.0 km2 in size. Field verification was performed at 35 peat- (Rocha et al., 2021). We chose to block our cross-validation lands giving over 800 georeference validation data points. (CV) regions based on longitudinal limits to allow both bo- To increase the number of cells for model training and also real and tropical peatlands to potentially be represented in improve representation of peatland-free landscapes, we in- each block. 
The optimal number of training blocks is an im- cluded polygons of ecoregions that should contain little to portant determination. Choosing blocks that are too small no peatlands from Olson et al. (2001), thus all areas in these risks incorrectly increasing our CV-determined model ac- ecoregions and biomes were considered to have zero peat- curacy due to spatial autocorrelation issues, while choosing land extent. The ecoregions chosen were the global distribu- overly large blocks will result in information loss and wors- tion of the Desert and Xeric Shrublands biome, excluding 15 ens our assessed model accuracy unduly. We determine the ecoregions that had a non-zero peatland extent within at least optimal number of blocks by comparing the length scale of one grid cell according to PEATMAP. This was to ensure we autocorrelation of the model residuals with our block sizes. take a conservative approach to the use of these non-peatland Figure A1 shows the autocorrelation tends to zero at a length masks. Two South American ecoregions (Beni Savanna and scale (sill) of around 500 km. To accommodate this we set a the Rio Negro Campinarana; Fig. 2) were also included as minimum block size of 10◦ of longitude (which corresponds peat-free regions. We discuss the inclusion of these ecore- to roughly 500 km at 65◦ latitude). Based on the constraints https://doi.org/10.5194/gmd-15-4709-2022 Geosci. Model Dev., 15, 4709–4738, 2022 4716 Joe R. Melton et al.: Machine-learning-based peatland extent Table 2. Training data (regional peatland mapping products) for the machine learning model. Region Source Peatland determination technique Boreal Plains of Canada Ducks Unlimited Canada Satellite imagery with > 5000 site visits (Smith et al., 2007) Rest of Canada Tarnocai et al. (2011) Primarily from soil surveys and air photo interpretation West Siberian Lowlands Terentieva et al. (2016) Supervised classification of Landsat trained on field data (taiga zone) St. 
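The boosting scheme of Sect. 2.4.1 can be illustrated with a deliberately small sketch. It fits a sequence of depth-one regression trees (“stumps”), each trained on the residuals of the current ensemble, i.e. the negative gradient of a squared-error loss. This is a simplified example under stated assumptions: the function names and toy data are hypothetical, and it is neither LightGBM nor the Peat-ML configuration. LightGBM grows deeper, histogram-based trees leaf-wise and has many more hyperparameters. Each stump picks the split that minimizes the squared error of its two children, which is equivalent to maximizing the variance reduction used as the information gain.

```python
# Minimal gradient-boosting sketch (squared-error loss, regression stumps).
# Illustrative only: names, loss, tree depth, and data are assumptions, not LightGBM.
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """Pick the split minimizing the children's squared error on the residuals r,
    which is the same as maximizing the variance reduction (information gain)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue  # a valid split must send samples to both children
            lv = sum(left) / len(left)
            rv = sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # no valid split (e.g. all feature values equal): constant leaf
        mean = sum(r) / len(r)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump: Stump, row: Sequence[float]) -> float:
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_gbdt(X, y, n_trees: int = 50, learning_rate: float = 0.1):
    """Each stump is fitted to the current residuals y - F(x), i.e. the negative
    gradient of the squared-error loss, and added to the ensemble with shrinkage."""
    base = sum(y) / len(y)  # F_0(x): constant initial prediction
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, row) for pi, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict_gbdt(model, row: Sequence[float]) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


if __name__ == "__main__":
    # Toy data: one predictor, target cover jumps from 0 to 1.
    X = [[0.1], [0.2], [0.8], [0.9]]
    y = [0.0, 0.0, 1.0, 1.0]
    model = fit_gbdt(X, y)
    print([round(predict_gbdt(model, row), 3) for row in X])
```

On this toy dataset every stump fits the residuals exactly. Each boosting step therefore shrinks the residuals by a factor of (1 − learning rate), and the predictions converge towards 0 and 1.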
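The blocked leave-one-out procedure of Sect. 2.4.2 can be sketched in the same hedged, hypothetical style. The sketch makes three simplifying assumptions:

- It uses equal-width longitude bands, whereas the 14 Peat-ML blocks were chosen with a 10° minimum width and roughly even numbers of training cells.
- It assumes a spherical Earth (radius 6371 km) when converting degrees to kilometres.
- It takes the model as generic `fit`/`predict` callables, so any regressor could be plugged in.

All names are illustrative. The sketch reports per-block RMSE and MBE and omits r² for brevity. The helper `band_width_km` gives about 470 km for 10° of longitude at 65° latitude, consistent with the “roughly 500 km” quoted above.

```python
# Sketch of blocked leave-one-out (BLOO) cross-validation over longitude bands.
# Illustrative only: equal-width bands and all names are assumptions, not Peat-ML code.
import math
from typing import Any, Callable, Dict, List, Sequence

EARTH_RADIUS_KM = 6371.0  # assumed spherical Earth


def band_width_km(width_deg: float, lat_deg: float) -> float:
    """East-west length (km) of a band width_deg wide in longitude at latitude lat_deg."""
    km_per_degree = 2.0 * math.pi * EARTH_RADIUS_KM / 360.0
    return width_deg * km_per_degree * math.cos(math.radians(lat_deg))


def block_ids(lons: Sequence[float], width_deg: float = 10.0) -> List[int]:
    """Assign each grid cell (longitude in degrees, -180 to 180) to a longitude band."""
    return [int((lon + 180.0) // width_deg) for lon in lons]


def bloo_cv(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    lons: Sequence[float],
    fit: Callable[[List[Sequence[float]], List[float]], Any],
    predict: Callable[[Any, Sequence[float]], float],
    width_deg: float = 10.0,
) -> Dict[int, Dict[str, float]]:
    """Hold out each block in turn, train on all other blocks, score the held-out block."""
    blocks = block_ids(lons, width_deg)
    scores: Dict[int, Dict[str, float]] = {}
    for held_out in sorted(set(blocks)):
        train = [i for i, b in enumerate(blocks) if b != held_out]
        test = [i for i, b in enumerate(blocks) if b == held_out]
        if not train:
            continue  # nothing to train on if all cells share one block
        model = fit([X[i] for i in train], [y[i] for i in train])
        errors = [predict(model, X[i]) - y[i] for i in test]
        scores[held_out] = {
            "rmse": math.sqrt(sum(e * e for e in errors) / len(errors)),
            "mbe": sum(errors) / len(errors),  # negative means underestimation
        }
    return scores


def mean_score(scores: Dict[int, Dict[str, float]], key: str) -> float:
    """Average a metric over all held-out blocks, giving the CV error estimate."""
    return sum(s[key] for s in scores.values()) / len(scores)


if __name__ == "__main__":
    # Placeholder model: predict the mean training value everywhere.
    def fit_mean(X_train, y_train):
        return sum(y_train) / len(y_train)

    def predict_mean(model, row):
        return model

    lons = [-120.0, -115.0, 20.0, 25.0, 110.0, 115.0]
    X = [[lon] for lon in lons]
    y = [30.0, 40.0, 10.0, 0.0, 50.0, 60.0]  # toy peatland cover (%)
    scores = bloo_cv(X, y, lons, fit_mean, predict_mean)
    print(round(band_width_km(10.0, 65.0)), scores, mean_score(scores, "rmse"))
```

Because no held-out cell is ever seen during training, spatially autocorrelated neighbours within a block cannot inflate the scores. This is the motivation for blocking given above.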
Given the constraints of our minimum block size and the need for a roughly even number of training cells in each block, we partitioned the globe into 14 blocks, as shown in Fig. 2. The CV was performed by holding out one block, training the LightGBM algorithm over the other blocks, and then using that trained model to predict the peatland extent over the held-out block. This was performed for each block in turn, and the results were averaged to give an estimate of the prediction error.

Table 2 (continued).
Region | Source | Peatland determination technique
St. Petersburg region (Russia) | Pflugmacher et al. (2007) | Regression models from MODIS-NBAR reflectance
Finland | Geological Survey of Finland (2018) | Field mapping and air photo interpretation
Scotland | Aitkenhead and Coull (2019) | Neural networks trained with survey data and covariates
Ireland | Connolly and Holden (2009) | Rules-based decision tree with land cover and soil maps
Netherlands | Brouwer et al. (2018); Brouwer and Walvoort (2019) | Ordinary kriging with boreholes for calibration and evaluation
Amazonia* | Draper et al. (2014) | SVM supervised classification using elevation, optical, and radar remote-sensing data
Congo basin (Cuvette Centrale) | Dargie et al. (2017) | Combination of DEM, Landsat ETM+, and ALOS PALSAR along with ground truthing transects
Indonesia | Wetlands International (2003, 2004, 2006) | Collation of regional maps, soil surveys, and Landsat imagery verified by ground truthing
New Zealand | Landcare Research NZ Ltd (2000) | Collation of regional maps and soil surveys
Tasmania | Kidd et al. (2021) | ML with terrain, vegetation mapping, and satellite spectra covariates including seasonal Sentinel-1 coverage
Rio Madre de Dios (Peru) | Householder et al. (2012) | Landsat imagery with field mapping
* Pastaza-Marañón foreland basin (PMFB) in northwestern Peru.

2.4.3 Predictor selection and model optimization

From the potential peatland covariates listed in Table 1, and discussed in Sect. 2.2.1, we processed 163 global peatland features that could be used by the machine learning model. However, many of these predictors are likely to have low predictive power or to duplicate information provided by other predictors, leading to over-fitting by the ML algorithm (Dormann et al., 2013). To select only the most relevant features, we used two steps. The first was iterative feature removal based on the calculated multicollinearity. The second was recursive feature elimination with cross-validation (RFECV) (Pedregosa et al., 2011), which is a form of backward feature elimination.

Multicollinearity was accounted for by using the variance inflation factor (VIF) to identify and remove highly correlated variables (Alin, 2010). VIF uses ordinary least-squares regression to determine collinearity, with the score given by

$$\mathrm{VIF} = \frac{1}{1 - R_j^2}, \qquad (1)$$

where $R_j^2$ is the multiple coefficient of determination for the regression of feature $j$ on the other features (covariates). It is defined as the ratio between the sum of squares due to the regression (SSR) and the total sum of squares (SST):

$$R_j^2 = \frac{\mathrm{SSR}}{\mathrm{SST}}. \qquad (2)$$

One approach would be to simply set a threshold VIF value and remove all features with VIF values higher than this threshold in a single step. However, to avoid eliminating potentially important features, we chose instead to conduct the exclusion process iteratively. First, the VIF score of each feature was calculated. Then all features with a VIF value higher than 5 (corresponding to an $R_j^2$ of 0.8) were ranked according to their information gain calculated by LightGBM, and the feature with the lowest gain was removed. The model was then retrained and the VIF values recalculated. While any feature still had a VIF above the threshold, the same ranking and removal was repeated, until all remaining features had VIF values below the threshold. This step retained 30 features (listed in Table A1). The chosen VIF threshold is quite stringent, well below the critical value of 10 suggested by Dormann et al. (2013).

We use RFECV with BLOO CV (using the same blocks as described in Sect. 2.4.2) in an iterative manner to ascertain the optimal number of features. RFECV first trains the LightGBM algorithm on the original set of features (here 30), ranks the features by importance (based on information gain), and records the model's root-mean-square error (RMSE) as determined by the BLOO CV. The least important feature is then removed and the model is retrained using the new subset of features. By retraining the model after each feature is removed, we avoid the issue of extrapolation that can occur in permutation-based approaches (as described in Hooker et al., 2021). The algorithm thus produces an estimate of model skill as a function of the number of features used in training (Fig. A2), and RFECV chooses the number of features that gives the greatest model skill. Based on Fig. A2, 16 features (highlighted in Table 1) were selected as the optimal number to retain for the optimization process and the final model.

GBDT algorithms tend to require hyperparameter tuning to ensure the model is performing optimally. We employed Bayesian optimization on 11 LightGBM hyperparameters (Table A2) using the hyperopt package (Bergstra et al., 2015) over 500 trials. Each trial used the final 16 predictors identified in the steps above, and the hyperparameters were tuned to minimize the model's RMSE as calculated by the BLOO CV. The optimized parameters were then used to generate the Peat-ML map.

3 Results and discussion

3.1 Predictor importance

The top 10 predictors based on information gain as determined by the LightGBM algorithm are shown in Fig. 3. Based on the full LightGBM model runs (hereafter Peat-ML), the most informative feature is the geomorphological landform (e.g. flat, spur, valley, peak), which is calculated using morphometry techniques based on pattern recognition (Amatulli et al., 2020). The next most important predictor is terrain slope, defined as the rate of change in elevation along the direction of the water flow line (Amatulli et al., 2020). The third and fourth most important variables are soil organic carbon at 30 cm depth and shortwave infrared radiation reflectance at 2225–2275 nm (SWIR3). The remaining, less important features (≲ 5 %) relate to climate (DJF snow water equivalent, MAM vapour pressure, DJF shortwave radiation and wind speed, SON runoff, and TerraClimate-derived DJF soil water).

Minasny et al. (2019) suggest that the indicators of peatland presence on a regional to global scale are climate data (primarily temperature and precipitation); land use and land cover information; and elevation, slope, and terrain attributes. Slope has also been used in several terrestrial ecosystem models to predict wetland areas: the flatter a region, the more likely water is to stagnate, allowing wetland formation (e.g. Kaplan, 2002; Arora et al., 2018). Interestingly, the top two predictors are important components of the Kaplan (2002) wetland determination scheme. The geomorphological features appear to provide further information, beyond slope alone, about the land surface characteristics that allow peatland formation. The importance of the SOC variables demonstrates the close relationship between SOC and peat soils, which has been exploited for peatland mapping in the past (e.g. Hugelius et al., 2020).

The importance of SWIR3 likely reflects its utility in distinguishing wet from dry earth and in providing information about vegetation water status. SWIR reflectance is generally sensitive to soil moisture, soil type, and vegetation leaf area index and water content (Wang et al., 2008; Tian and Philpot, 2015). SWIR3 is therefore particularly useful for delineating fens, since the ML model otherwise lacks a predictor of groundwater contributions to surface water, and for separating peatlands from uplands in general. Of the climate predictors, DJF SWE and DJF shortwave radiation could have been used by the ML model to distinguish boreal from tropical peatlands. Vapour pressure may also have some utility in determining peatlands because of the differing evapotranspiration response of peatlands and upland forests (Helbig et al., 2020). In general, however, all the climate variables were of relatively small importance, with roughly 5 % or less importance as measured by information gain.

Figure 3 also shows the feature importance found by the BLOO CV for each block (each block in the figure shows the feature importance ranking when that block was not trained upon for the CV). Broken down in this manner, some predictors show remarkable consistency, e.g. relatively low-importance predictors (< 10 %) remain consistently less important. Other features, principally slope, geomorphon, and SOC-30 cm, have highly variable importance. These three variables can switch order of importance when trained to exclude certain training blocks during the BLOO CV. When trained with all training data (full model; black diamonds in Fig. 3), predictor importance is generally close to the middle of the
range set by the blocks from the BLOO CV, excluding some of the more minor features such as SON runoff or DJF wind speed. This demonstrates that, given there are only 14 blocks, excluding training data as part of the BLOO CV can have relatively large consequences. This is especially so because each peatland region has its own particular characteristics, as evidenced by the changing predictor importance. For example, the Cuvette Centrale, western Amazonia, and tropical islands of Asia all appear to differ significantly in characteristics such as peat depth, structure, and carbon density (see Table 1 in Dargie et al., 2017).

Figure 3. Predictor importance based on percent information gain for the top 10 features as determined by the LightGBM algorithm. The feature ranking is shown for each of the blocks used during the BLOO CV (coloured dots; see Sect. 2.4.2). The feature importance from the full model simulation is shown by the black diamonds. SWIR3 is the shortwave infrared radiation reflectance for 2225–2275 nm, geomorphon is the geomorphological landform, SWE is the snow water equivalent, and SOC is soil organic carbon at 30 cm depth. See Table 1 and Sect. 2.2.1 for more details.

3.2 Predicted peatland extents

3.2.1 Global

Global peatland extent as predicted by Peat-ML is shown in Fig. 4. When Peat-ML is compared to PEATMAP (Xu et al., 2018), many major peatland regions appear similar, including Canada, the WSL, the Cuvette Centrale of the Congo, and parts of Scandinavia. However, the two maps also differ substantially. The regions with the most notable differences between the two products include Alaska, parts of Africa excluding the Congo, and eastern Siberia. Peat-ML predicts more intermediate peatland extents, whereas PEATMAP tends to show more regions of 100 % peatland extent with less gradation between peatlands. Our estimated global peatland extent of 4.04 × 10⁶ km² is similar to the PEATMAP estimate of 4.23 × 10⁶ km² (Table 3).

Our Northern Hemisphere (> 23° N) estimate of 3.0 million square kilometres is lower than the other available estimates, though it lies at the lower bound suggested by Loisel et al. (2017). The other estimates are those of PEATMAP (3.2 million square kilometres), the lower bound of Hugelius et al. (2020) (3.2 million square kilometres), and an older estimate by Gorham (1991). In the tropics, our model estimate is roughly the same as PEATMAP but only a little over half of the extent estimated by Gumbricht et al. (2017). The Gumbricht et al. (2017) map was produced through a hybrid approach that uses hydrological modelling, remote-sensing products, hydro-geomorphology from topographic data, and expert assessment. It is only available across the tropics (maximum 40° N).

The spatial distribution of the predicted peatlands will now be examined in detail. We focus on regions that either have multiple other peatland mapping products for comparison or contain large areas of predicted peatlands.

Figure 4. Global peatland extent as estimated by Peat-ML along with PEATMAP (Xu et al., 2018).

Table 3. Peatland extents as estimated by Peat-ML alongside other literature estimates.
Region | Source | Peatland extent (km²)
Global | Peat-ML | 4.04 × 10⁶
Global | PEATMAP | 4.23 × 10⁶
Northern Hemisphere (> 23° N) | Peat-ML | 3.00 × 10⁶
Northern Hemisphere (> 23° N) | Gorham (1991)ᵃ | 3.46 × 10⁶
Northern Hemisphere (> 23° N) | Loisel et al. (2017)ᵇ | 3.0–3.5 × 10⁶
Northern Hemisphere (> 23° N) | PEATMAP | 3.19 × 10⁶
Northern Hemisphere (> 23° N) | Hugelius et al. (2020) | 3.7 ± 0.5 × 10⁶
Tropics (23.5° S–23.5° N) | Peat-ML | 0.96 × 10⁶
Tropics (23.5° S–23.5° N) | PEATMAP | 0.94 × 10⁶
Tropics (23.5° S–23.5° N) | Gumbricht et al. (2017) | 1.70 × 10⁶
Canadian Boreal Plains | Peat-ML | 185 × 10³
Canadian Boreal Plains | DUC | 186 × 10³
Canadian Boreal Plains | PEATMAPᶜ | 185 × 10³
Canadian Boreal Plains | Hugelius et al. (2020) | 164 × 10³
Canadian Boreal Plains | Webster et al. (2018) | 269 × 10³
ᵃ Boreal and subarctic peatlands. ᵇ Suggested best estimate for modern peatland area; includes a summary of other estimates, which range between 2.4 and 4.0 × 10⁶ km². ᶜ Here PEATMAP’s underlying data source is Tarnocai et al. (2011).

3.2.2 Boreal peatlands: Europe and Russia

Figure 5 shows the peatland extent in the WSL, western Russia, and parts of Scandinavia for Peat-ML, its training data, PEATMAP, Hugelius et al. (2020), and the Boreal–Arctic Wetland and Lake Dataset (BAWLD) (Olefeldt et al., 2021). The Hugelius et al. (2020) dataset is derived from the mean of two soil datasets and is only available for the Northern Hemisphere (> 23° N). The BAWLD product is derived from expert assessment, which is then extrapolated across the boreal and Arctic regions using random forest models and geospatial datasets. Its original spatial resolution is relatively coarse at 1° by 1°. For the WSL region, all four products are similar, with only slight differences in the peatland fractional cover (rather than its spatial distribution). Peat-ML shows strong similarity with its training data, as would be expected. PEATMAP stands out from the other maps because of its almost binary peatland coverage: it shows either high values or no peatlands, with little gradation in between.

Compared to Hugelius et al. (2020), Peat-ML shows fewer peatlands on the northern edge of the Northwestern region of Russia but more by the White Sea. Hugelius et al. (2020) and BAWLD show peatlands near the mouth of the Kara River, to the northwest of the terminus of the Ural Mountains, but neither PEATMAP nor Peat-ML does. Conversely, Peat-ML and BAWLD show few peatlands on the Yamal Peninsula, where both PEATMAP and Hugelius et al. (2020) suggest appreciable extents. Generally, Peat-ML is more similar to PEATMAP than to Hugelius et al. (2020) and BAWLD over the western Russian domain.

All maps show relatively similar distributions of peatlands surrounding the Baltic Sea (Fig. 5). PEATMAP shows peatlands by the Caspian Sea that none of the other maps indicate, apart from some small extents (1 %–3 %) that Peat-ML predicts to the northwest of those in PEATMAP. As in Eastern Europe, PEATMAP shows a more binary representation of the peatland extent in Western Europe than the other maps (Fig. A5). Peat-ML and Hugelius et al. (2020) have fairly similar peatland distributions and extents. The main differences are in small pockets of peatlands. For example, eastern Spain has scattered peatlands in Hugelius et al. (2020) that are not found in Peat-ML or PEATMAP. In western Hungary, both Hugelius et al. (2020) and PEATMAP show small peatlands that Peat-ML does not predict to be as extensive.

Figure 5. Maps of eastern European and Russian peatlands including (a) training data used by the ML model; (b) Peat-ML-predicted peatlands; and the peatland coverage from (c) PEATMAP (Xu et al., 2018), (d) Hugelius et al. (2020), and (e) the Boreal–Arctic Wetland and Lake Dataset (BAWLD; Olefeldt et al., 2021).

3.2.3 Boreal peatlands: Canada and Alaska

Peatland extents for the northern contiguous USA, Canada, and Alaska are shown in Fig. 6. Alaskan peatlands predicted by Peat-ML have some similarity to the Hugelius et al. (2020) map and BAWLD, with extensive peatlands in western Alaska (Lower Yukon region). These peatlands are not evident in PEATMAP, which shows less extensive but high-coverage peatlands along the southern and eastern edges of the state. Peat-ML, Hugelius et al. (2020), and BAWLD predict peatlands along the Alaska North Slope that are not evident in PEATMAP. Other reports suggest extensive wetlands in Alaska (e.g. Glass, 1992), but we are not aware of any mapping product detailing peatland-specific coverage.

For Peat-ML, the Canadian peatlands from Tarnocai et al. (2011) and DUC (Smith et al., 2007) are used as training data, which naturally gives good correspondence between Fig. 6a and c. Peat-ML predictions from the BLOO CV simulation are therefore also shown (Fig. 6b). They give a more informative picture of the general model skill for boreal peatland regions, because they indicate predictions made without the benefit of training on a particular region's peatlands. Generally, all datasets shown in Fig. 6 display some strong similarities, with large peatlands shown for the Hudson Bay Lowlands (HBL), the Mackenzie Delta, and across the Boreal Plains, yet important differences are also visible. Webster et al. (2018) show little peatland along the southern edge of Hudson Bay, perhaps because their peatland determination model emphasizes treed peatlands. Webster et al. (2018) also generally show higher peatland coverage than the other datasets where peatlands are present. Hugelius et al. (2020) predict extensive but relatively low-coverage peatlands across much of the Canadian eastern Arctic that are not found in any of the other peatland maps. Of the five peatland maps, PEATMAP, BAWLD, and Peat-ML appear to correspond most closely in peatland extent.

The northern USA has some peatlands around the Great Lakes that are evident in PEATMAP and Hugelius et al. (2020) (∼ 10 %–60 %). Peat-ML also predicts them, but as less extensive (usually ∼ 1 %–15 %). Apart from these coverage differences, the products have a similar spatial extent, although PEATMAP’s peatlands more commonly have higher coverage per identified peatland.

3.2.4 Tropical peatlands: South America and Central America

South American peatlands are shown in Fig. 7. Peat-ML peatland training data for this region (Fig. 7a) are currently limited, encompassing only Peru’s Pastaza-Marañón foreland basin (PMFB) and the Rio Madre de Dios. In early simulations with Peat-ML, and in maps from other modelling processes (e.g. Gumbricht et al., 2017), we noticed predictions of high peatland coverage in areas of South America where peat is not known to occur. These include seasonally flooded savannas, such as the Llanos de Moxos (Beni Savanna) and the Llanos Orientales of Colombia and Venezuela. A recent field expedition searching for peat in the Colombian Llanos failed to discover any peat deposits (Martín-López et al., 2022). This could indicate that these tropical savanna biomes are generally not able to form extensive peat deposits. Additionally, white sand ecosystems are not known to support extensive peatlands. We therefore also excluded the Rio Negro Campinarana ecoregion, which corresponds to white sandy soils (Spodosols/Podzols) rather than Histosols. Without these negative data, we would likely overpredict peat extent in South America rather severely.

Peat-ML predicts extensive peatlands in the PMFB and central Amazonia. The two maps place peatlands in broadly similar regions there, but Peat-ML's extent is lower than PEATMAP's, mainly because of its generally lower extent per grid cell. Both PEATMAP and Peat-ML show peatlands along the northeastern coast of the continent. Peat-ML also predicts smaller peatland extents (generally < 10 %–15 % coverage) that are not evident in PEATMAP, in the Pantanal and along the Paraguay River as it joins the Paraná River down to the Rio de la Plata.

There are some non-peatland river floodplains that Peat-ML characterizes as peatlands, such as Colombia’s Rio Guaviare. This river may be too dynamic to allow extensive peat formation. Its relatively rapid meandering could scour away peat-forming depressions faster than organic matter can accumulate, or else bury potential peat with mineral sediments from the Andes (Junk, 1982). Peat-ML has no predictor for these hydro-geomorphological processes, which operate on decadal to centennial timescales, so it is not surprising that it may overestimate peat extent in these ecosystems. Other areas, like Colombia’s Amazon catchment region, might be susceptible to similar processes, as Ricaurte et al. (2017) suggest these regions are floodplain forests. However, their map is based on the CORINE Land Cover data for Colombia (IDEAM, 2010).

Other areas in Colombia where Peat-ML predicts peatlands include parts of the Orinoco catchment region, where Ricaurte et al. (2017) show flooded grassland savannas and riparian wetlands. They also include the Caribbean catchment region, where CORINE indicates peatlands alongside other wetland types. The CORINE land cover product is based upon remote sensing with little ground truthing. Several of the wetland regions shown in Ricaurte et al. (2017) may therefore actually be peat-forming regions, making it difficult to definitively evaluate Peat-ML against this dataset. Besides the occasional small peatland area (e.g. in the Páramo of Ecuador; Hribljan et al., 2017), there are few sources of high-quality peatland mapping products for South America to evaluate Peat-ML against. While Peru has the
PMFB mapped by Draper et al. (2014) and the Rio Madre de Dios mapped by Householder et al. (2012), and is proposed to have extensive peatlands by Gumbricht et al. (2017) and Peat-ML, there is presently no national peatland inventory (López Gonzales et al., 2020).

Peat-ML predicts more peatland extent than PEATMAP in Central America (Fig. A6). Much of the predicted peatland area is close to coastlines, in particular along the Atlantic coasts of Mexico, Nicaragua, Costa Rica, and Cuba. Peat-ML places more extensive peatlands on the Yucatán Peninsula of Mexico, which is not evident in PEATMAP. Peters and Tegetmeyer (2019) made a desk-based assessment of peatlands using cartographic approaches with solicited expert assessment. It shows similar distributions of peatland extent, but with fewer peatlands in the Yucatán. The Yucatán Peninsula has relatively extensive marsh and mangrove coastal wetlands. However, it is a karstic landscape with a highly permeable carbonate substrate (Adame et al., 2013), suggesting that Peat-ML overestimates peat extent for the inland portions of the peninsula.

Figure 6. Peatland extents for Canada, the northern contiguous USA, and Alaska for (a) Peat-ML, (b) Peat-ML from the BLOO CV, (c) the training data used for the ML model, and four other peatland extent products (d–f).

Figure 7. South American peatland extents. Panel (a) shows Peat-ML training data, panel (b) shows Peat-ML-predicted peatland coverage, and panel (c) shows PEATMAP (Xu et al., 2018), which is taken from Gumbricht et al. (2017) for this region.

3.2.5 Tropical peatlands: Africa and the Indonesian Archipelago

African peatlands (Fig. 8) are also poorly mapped, making it difficult to evaluate the Peat-ML results. There are notable differences between Peat-ML and PEATMAP. PEATMAP shows very few peatlands outside the Congo’s Cuvette Centrale. Peat-ML, by contrast, has relatively extensive peatlands in South Sudan along the border of the Central African Republic and Chad. This is in general agreement with more qualitative African peatland extent estimates (Grundling and Grootjans, 2016). It demonstrates Peat-ML’s ability to reasonably determine peatland extents in regions where reliable, spatially explicit mapping products are absent. Nevertheless, Peat-ML may still be underestimating African peatlands due to a lack of appropriate training data. One example is the newly documented peatlands in the Okavango Delta (Gelinas, 2018), which have a predominantly herbaceous vegetation cover (sedges, papyrus, grasses). Our only training dataset for Africa, by contrast, is a swamp forest (Cuvette Centrale). Future iterations of Peat-ML may profit from active mapping campaigns presently underway in East Africa (Alexandra Barthalmes, personal communication, 2021). These could provide much-needed training data and thereby improve predictions for the peatland regions of Africa. Improving understanding of African peatland extents will, however, likely remain challenging because land use pressures may complicate peatland identification and mapping. As suggested by Grundling and Grootjans (2016), African peatlands are heavily utilized by rural populations that depend on the peatlands’ water and organic soils for crop cultivation.

Much of the Indonesian Archipelago contains training data for the ML algorithm, but the neighbouring states of Papua New Guinea, Brunei, and Malaysia are entirely model-predicted areas (Fig. A7). Malaysian peatlands appear similar between Peat-ML and PEATMAP, whereas Papua New Guinea is quite different. PEATMAP shows extensive peatlands in the central mountainous region of the country, while Peat-ML places the peatlands in the surrounding lowland regions. There is some indication that the mountainous regions should have extensive peatlands (Hope, 2015). These peatland complexes appear to be sufficiently different from the Peat-ML training data that the ML model is unable to predict them.

Figure 8. Peatland extent over central Africa. Panel (a) shows the ML training data, panel (b) shows the Peat-ML-predicted peatland extent, and panel (c) shows the PEATMAP extent from Xu et al. (2018).

3.3 Model quality estimation

Besides the qualitative discussion above, we estimated the quality of our predicted peatland extent through two different approaches. First, we compared our model results against the training data detailed in Sect. 2.3. For this analysis, we performed a BLOO CV as described in Sect. 2.4.2. Peat-ML (CV) had an r² of 0.72, a mean bias error (MBE) of −0.29 %, and an RMSE of 9.11 % (Fig. 9b). The model performance across each of the 12 training blocks can be seen in Figs. A3 and A4. The mean r² across all training blocks was 0.72, but it ranged from a low of 0.20 (predicting for block F in the BLOO CV in Fig. 2) to a high of 0.88 (block E). One caveat of this error estimation is that we compute it from the datasets used for model training. These datasets form the benchmark against which Peat-ML is compared. If they themselves have errors or omissions, as would be expected, this will diminish both the accuracy of our error estimation and the quality of the ML model itself.

Peat-ML likely underestimates peatland coverage, as can be seen from its negative MBE (also visible in the regression line shown in Fig. 9). We hypothesize that this low bias may stem from the use of biomes and ecoregions to denote peatland-free areas. Because these regions are fairly coarsely defined, we may be inadvertently assigning small-scale, niche peatland areas as non-peatlands (although we take measures to avoid this; see Sect. 2.3). If that is the case, we would be training the model to miss the characteristics of these more niche peatland environments and biasing our results.

We use the ecoregions and biomes from Olson et al. (2001) to delineate these non-peatland regions because high-quality peatland datasets are typically created only for peatland-rich regions. Without this peatland-poor training data, we would be providing the algorithm only peatland-rich training data, leaving the model poorly trained for peatland-poor regions. Machine learning algorithms are best suited to interpolation problems (e.g. McCartney et al., 2020), so training data should span the full range of conditions under which the model is expected to produce predictions. Additionally, we found that we were overpredicting the peatland extents of South America relative to expert opinion and field observation, primarily due to the paucity of high-quality peatland maps from the continent. As more high-quality peatland mapping products become available from presently poorly mapped regions, the use of these ecoregions and biomes could be removed or reduced.

The second approach to estimate the quality of our peatland map focuses on the Boreal Plains (BP) region of Canada, where we have several peatland products for comparison (Fig. A8). The DUC remote-sensing-based dataset for this region is uniquely well ground truthed, with over 5000 site visits over its 74.1 × 10⁴ km² area. The DUC dataset has a peatland extent of 186 × 10³ km² for the BP region (Table 3). This is about the same as PEATMAP, whose underlying data source in this region is Tarnocai et al. (2011). Peat-ML (CV) estimates 199 × 10³ km². This value is derived from the BLOO CV simulations to allow a fairer comparison; the full model (i.e. Peat-ML) gives 185 × 10³ km². Hugelius et al. (2020) estimate 164 × 10³ km² and Webster et al. (2018) estimate 269 × 10³ km². Using ±2× the Peat-ML (CV) RMSE as a confidence interval gives a range of 140 × 10³ to 234 × 10³ km². By this measure, only the Webster et al. (2018) extent differs significantly from Peat-ML (CV). Given its quality, we take the DUC dataset as our benchmark and use it to determine the accuracy of Peat-ML and the other products (Table 4).

As expected, Peat-ML compares well with the DUC dataset, as it is trained using that dataset. A more useful comparison is with Peat-ML (CV), where the model is not trained with the DUC dataset. Peat-ML (CV) has the second-best RMSE, mean bias, and explained variance scores, after Tarnocai et al. (2011), in all instances (Table 4). For the DUC region, the Peat-ML (CV) results indicate a higher predictive performance than three other products:

- a peatland mapping product based on soil databases (Hugelius et al., 2020);
- a product based on boosted regression trees using forest structure maps, bioclimatic variables, and surface slopes (Webster et al., 2018);
- a product based upon ML models informed by expert assessment (BAWLD), although BAWLD has the lowest spatial resolution, which may have impeded its performance against the high-resolution DUC dataset.

Peat-ML (CV) is, however, outperformed by a more traditional and labour-intensive product based on air-photo interpretation and soil surveys (Tarnocai et al., 2011), although the performance difference is relatively small (e.g. an RMSE difference of 0.39 %). This indicates that our model, for this region at least, is of similar or higher quality com-
Melton et al.: Machine-learning-based peatland extent 4725 Figure 9. Scatterplots of Peat-ML-predicted peatland cover versus actual peatland cover (from the datasets listed in Table 2) for the full model (a) and as determined by the BLOO CV (b). pared to other peatland mapping products available from a between the vegetation-based predictor and peatland extent. diverse range of methodologies. An additional challenge is the importance of seasonality of covariates (e.g. climate, vegetation indices) that differ signif- 3.4 Limitations of our approach icantly between the tropics and high latitudes based on their local dynamics. This may be addressed in future versions of The purpose of our study is to produce a map of peatland Peat-ML by training separate models for both regions along- distribution for use as an input geophysical field for ESMs side predictors tailored to the dynamics of each region, al- with integrated peatland models. It is tempting to ask whether though that depends on a greatly increasing availability of our technique can give any insights into peat formation or tropical training datasets to ensure well-trained models. the conditions necessary for a peatland to develop and per- In addition, as discussed in Sect. 3.3, it would be benefi- sist. While our approach is not prescriptive like Hugelius cial to include mapping products for regions where peatlands et al. (2020), where peatlands are defined based upon the are relatively sparse. As our peatland sampling strategy was soil carbon at a location, it is challenging to derive causal determined by the availability of high-quality peatland maps, information from our simulations. Many of the top features we were not able to choose systematic (Rocha et al., 2021) determined by the LightGBM algorithm (Fig. 3) are related or feature-based sampling strategies that could be more opti- to geomorphological characteristics, soil carbon, vegetation mal for peatland prediction. 
Our approach would also benefit and soil water status, and climate. However, peatlands them- from greater availability of processed, global-scale products selves will alter the environment they form within (e.g. fill in that should be sensitive to water status below the peat surface depressions with peat or alter the hydrologic balance for the like L-band synthetic aperture radar (e.g. Touzi et al., 2013). vegetation), and thus it is difficult to differentiate cause from effect. A weakness of our approach lies in the availability of train- 4 Conclusions ing data. Our training data for peatland distribution are gen- erally biased towards the high latitudes. While we have good We present a new global peatland fractional coverage map, coverage of peatland presence in Canada and western Siberia Peat-ML, at a scale of 5 arcmin resolution. Peat-ML was (see Sect. 2.2), we presently lack extensive high-quality peat- generated using machine learning techniques drawing upon land distribution maps for much of the Southern Hemisphere drivers of peatland formation that include spatially dis- and tropics. However, we expect new products to become tributed climate, soil, geomorphology, and vegetation data. available over time (e.g. Anda et al., 2021; Bourgeau-Chavez The ML model was trained using maps of peatland frac- et al., 2021). As one of our main predictors is sensitive to tional coverage for 14 relatively extensive regions along with vegetation (SWIR3), there is also the possibility that peat- masks of non-peatland areas. To evaluate Peat-ML, we qual- land types that are not represented in our training data (e.g. itatively compared it to other available peatland maps, and mangroves and marshes in the neotropics or papyrus marshes we also quantified model performance using two approaches. 
of Africa) will be poorly represented by the available train- The first approach is based on a blocked leave-one-out cross- ing data that the ML algorithm uses to derive a relationship validation strategy designed to minimize the influence of https://doi.org/10.5194/gmd-15-4709-2022 Geosci. Model Dev., 15, 4709–4738, 2022 4726 Joe R. Melton et al.: Machine-learning-based peatland extent Table 4. Statistical comparison of peatland map products as evaluated against the DUC dataset (Smith et al., 2007). RMSE is the root-mean- 2 square error. The explained variance score (calculated as 1 σ (y−ŷ)− 2 , where y is the observations, ŷ is the prediction, and σ is the standardσ (y) deviation) has a best possible value of 1.0, with lower scores indicating worse performance. Mapping product RMSE (%) Mean bias (%) Explained variance score (–) Peat-ML 12.60 0.18 0.68 Peat-ML (CV) 17.50 −1.52 0.38 Hugelius et al. (2020) 18.00 2.61 0.35 PEATMAP* 17.11 −0.06 0.40 Webster et al. (2018) 23.25 −9.49 0.07 BAWLD Olefeldt et al. (2021) 22.24 −9.33 0.16 * Tarnocai et al. (2011) is the underlying data source for PEATMAP in the DUC domain spatial autocorrelation. Based upon that approach, Peat-ML has an average r2 of 0.73 with a root-mean-square error and mean bias error of 9.11 % and −0.36 %, respectively, when evaluated against our model training data. Our second model quality estimate was generated by comparing Peat- ML against a high-quality, extensively ground-truthed map for the 74.1 104 km2× Canadian Boreal Plains region. This comparison suggests Peat-ML is of comparable or higher quality than other presently available peatland mapping prod- ucts. Future versions of Peat-ML would benefit from further high-quality and ground-truthed datasets of peatland extent, especially in tropical regions. Appendix A Figure A1. Correlogram showing the spatial correlation be- tween model residuals as a function of distance computed using Moran’s I . Geosci. 
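Figure A1 summarizes the residual spatial autocorrelation with Moran's I evaluated over distance classes. The sketch below shows one way to compute such a correlogram. It makes several assumptions that the paper does not state: binary distance-band weights (w_ij = 1 when the separation of cells i and j falls in the band, otherwise 0), planar coordinates, and illustrative band edges. The function names `morans_i` and `correlogram` are hypothetical. It is not a reproduction of Fig. A1. Libraries such as PySAL's `esda` package provide Moran's I with significance testing for production use.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def morans_i(values: Sequence[float], coords: Sequence[Point],
             d_min: float, d_max: float) -> float:
    """Global Moran's I with binary distance-band weights (assumed scheme).

    w_ij = 1 when d_min < distance(i, j) <= d_max and i != j, otherwise 0.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]      # deviations from the mean residual
    denom = sum(d * d for d in dev)       # sum of squared deviations
    num = 0.0                             # sum of w_ij * dev_i * dev_j
    w_sum = 0.0                           # total weight W
    for i in range(n):
        for j in range(n):
            if i != j and d_min < math.dist(coords[i], coords[j]) <= d_max:
                num += dev[i] * dev[j]
                w_sum += 1.0
    if w_sum == 0.0 or denom == 0.0:
        return float("nan")               # empty band or constant residuals
    return (n / w_sum) * (num / denom)


def correlogram(values: Sequence[float], coords: Sequence[Point],
                band_edges: Sequence[float]) -> List[Tuple[float, float]]:
    """Moran's I for consecutive distance bands, keyed by band midpoint."""
    return [((lo + hi) / 2, morans_i(values, coords, lo, hi))
            for lo, hi in zip(band_edges[:-1], band_edges[1:])]


if __name__ == "__main__":
    # Hypothetical residuals along a transect with unit spacing
    residuals = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
    coords = [(float(x), 0.0) for x in range(6)]
    print(correlogram(residuals, coords, [0.0, 1.0, 2.0, 3.0]))
    # [(0.5, 0.6), (1.5, 0.0), (2.5, -1.0)]
```

Positive values at short distances that decay towards zero indicate residual clustering over short ranges. This is the pattern a spatially blocked cross-validation is meant to guard against.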
Figure A2. Cross-validation scores against the number of features selected by RFECV (see Sect. 2.4.3).

Figure A3. Scatterplots of full-model Peat-ML-predicted peatland extent against peatland extent from the peatland training datasets over the 14 BLOO CV blocks.

Figure A4. Scatterplots of peatland extent predicted in the CV trials (Peat-ML (CV)) against peatland extent from the peatland training datasets over the 14 BLOO CV blocks.

Figure A5. Maps of western European peatlands, including (a) training data used by the ML model; (b) Peat-ML-predicted peatlands; and the peatland coverage from (c) PEATMAP (Xu et al., 2018), (d) Hugelius et al. (2020), and (e) BAWLD (Olefeldt et al., 2021), whose domain only partly extends over the region displayed.

Figure A6. Peatland extent over Central America. Panel (a) shows the ML training data, panel (b) shows the Peat-ML-predicted peatland extent, and panel (c) shows the PEATMAP extent from Xu et al. (2018).

Figure A7. Peatland extent over the Indonesian archipelago. Panel (a) shows the ML training data, panel (b) shows the Peat-ML-predicted peatland extent, and panel (c) shows the PEATMAP extent from Xu et al. (2018).
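Figures A3 and A4 and Sect. 3.3 rest on a blocked leave-one-out cross-validation (BLOO CV). Each spatial block is held out in turn, the model is refit on the remaining blocks, and the held-out predictions are scored. The sketch below illustrates that loop together with the scores used in Sect. 3.3 and Table 4. It makes the following assumptions:

- Every sample already carries a block label.
- r² is taken as the squared Pearson correlation. The paper does not state which definition it uses.
- MBE is the mean of prediction minus observation, so a negative value means underprediction.
- `fit_predict` is a hypothetical callable standing in for training and applying the LightGBM model.

scikit-learn's `LeaveOneGroupOut` splitter yields the same train/test partitions when the block labels are passed as `groups`.

```python
import math
import statistics
from typing import Callable, Dict, Hashable, List, Sequence

# Hypothetical model interface: (X_train, y_train, X_test) -> predictions
FitPredict = Callable[[List[Sequence[float]], List[float], List[Sequence[float]]], List[float]]


def scores(obs: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """r2 (squared Pearson, assumed), RMSE, mean bias error, explained variance score."""
    resid = [p - o for o, p in zip(obs, pred)]          # prediction minus observation
    mbe = statistics.fmean(resid)
    rmse = math.sqrt(statistics.fmean([r * r for r in resid]))
    var_obs = statistics.pvariance(obs)
    # Explained variance score 1 - var(y - y_hat) / var(y); the sign of resid does not matter
    evs = 1.0 - statistics.pvariance(resid) / var_obs if var_obs > 0 else float("nan")
    mo, mp = statistics.fmean(obs), statistics.fmean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    r2 = (cov / (so * sp)) ** 2 if so > 0 and sp > 0 else float("nan")
    return {"r2": r2, "rmse": rmse, "mbe": mbe, "evs": evs}


def bloo_cv(X: Sequence[Sequence[float]], y: Sequence[float],
            blocks: Sequence[Hashable], fit_predict: FitPredict) -> Dict[Hashable, Dict[str, float]]:
    """Hold out each block in turn, refit on the other blocks, score the held-out block."""
    results = {}
    for held_out in sorted(set(blocks)):
        train = [i for i, b in enumerate(blocks) if b != held_out]
        test = [i for i, b in enumerate(blocks) if b == held_out]
        pred = fit_predict([X[i] for i in train], [y[i] for i in train],
                           [X[i] for i in test])
        results[held_out] = scores([y[i] for i in test], pred)
    return results


def toy_linear_fit_predict(X_train, y_train, X_test):
    """Hypothetical stand-in model: least-squares line on the first feature only."""
    xs = [row[0] for row in X_train]
    mx, my = statistics.fmean(xs), statistics.fmean(y_train)
    slope = (sum((x - mx) * (t - my) for x, t in zip(xs, y_train))
             / sum((x - mx) ** 2 for x in xs))
    return [my + slope * (row[0] - mx) for row in X_test]


if __name__ == "__main__":
    X = [[float(x)] for x in range(6)]
    y = [2.0 * row[0] + 1.0 for row in X]
    blocks = ["A", "A", "B", "B", "C", "C"]            # hypothetical spatial blocks
    for block, s in bloo_cv(X, y, blocks, toy_linear_fit_predict).items():
        print(block, {k: round(v, 3) for k, v in s.items()})
```

Per-block scores like these correspond to the spread reported in Sect. 3.3 (r² from 0.20 to 0.88 across blocks). Scoring only held-out blocks keeps spatially autocorrelated neighbours of the test cells out of the training set.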
Figure A8. Maps of peatland extent for the Boreal Plains of Canada.

Table A1. The 30 predictors selected using VIF with a threshold value of 5. The final 16 features that were further selected by the RFECV algorithm for use in the final model are listed in Table 1. See Sect. 2.2.1 for further discussion of the variable processing.

| Category | Short name | Variable | Data source |
|---|---|---|---|
| Climateᵃ | soil_DJF | soil water | TerraClimate (Abatzoglou et al., 2018) |
| | srad_DJF | downward surface shortwave radiation | |
| | swe_DJF | snow water equivalent | |
| | ws_DJF | wind speed | |
| | vap_MAM | vapour pressure | |
| | ro_SON | runoff | |
| | pdsi_SON | Palmer Drought Severity Index | |
| Soils | OLM_soil_organic_carbon_30cm | organic carbon content | Open Land Maps (Hengl, 2018) |
| | OLM_Soil_BulkDensity_30cm | soil bulk density | |
| Vegetation | Dormancy_1 | dormancy | MODIS (MCD12Q2 V6) (Friedl et al., 2019) |
| | Senescence_2 | senescence | |
| | EVI_Amplitude_2 | enhanced vegetation index amplitude | |
| | EVI_Area_1 | sum of EVI1 from greenup to dormancy | |
| | EVI_Area_2 | sum of EVI2 from greenup to dormancy | |
| | minMODIS_NPP | minimum NPP | MOD17A3 V055 (Running et al., 2011) |
| | SWIR3_reflectance_mean_SON | shortwave infrared radiation reflectanceᵇ | S-NPP VIIRS (Didan and Barreto, 2018) |
| Terrain | spi | stream power indexᶜ | Geomorpho90m (Amatulli et al., 2020) |
| | geom | geomorphon | |
| | slope | slope | |
| | tcurve | tangential curvatureᵈ | |
| | rough-scale | scale of terrain roughness | |
| | dev-magnitude, dev-scale | maximum elevation deviation value | |
| | dx | first directional derivative (east–west)ᵉ | |
| | dxy, dyy | second directional derivativeᶠ | |
| | convergence | convergence indexᵍ | |
| | aspect-sine, aspect-cosine | sine (cosine) of aspectʰ | |
| | northness | northnessⁱ | |

ᵃ DJF, MAM, JJA, and SON are means over 3-month periods, each named by the first letters of its months (e.g. DJF is December–February).
ᵇ 2225–2275 nm.
ᶜ Product of the upstream catchment area and the tangent of the local slope angle.
ᵈ Measures the rate of change perpendicular to the slope gradient and is related to the convergence and divergence of flow across a surface.
ᵉ The rate of change of the elevation in a specific direction.
ᶠ The rate of change of the slope in a predetermined direction.
ᵍ Terrain variable that details the convergent areas as channels and divergent areas as ridges. It has a value of −100 for ridges, 0 for planar or flat areas, and up to 100 for sink areas.
ʰ Angular direction that a slope faces.
ⁱ Calculated as the sine of the slope multiplied by the cosine of the aspect. Northness gives a continuous measure of the orientation combined with the slope. In the Northern Hemisphere, a northness approaching 1 indicates a northern exposure on a vertical slope, i.e. a slope exposed to a very low amount of solar radiation. Conversely, a northness of −1 indicates a very steep southern slope that would be highly exposed to solar radiation.

Table A2. LightGBM hyperparameters that underwent Bayesian optimization and their final optimized values. See Sect. 2.2.1 for further discussion of the variable processing. See the documentation of Pedregosa et al. (2011) for further discussion of each hyperparameter.

| Name | Range | Optimized value |
|---|---|---|
| boosting_type | gbdt, dart, goss | dart |
| num_leaves | 10–50 | 30 |
| n_estimators | 50–300 | 250 |
| learning_rate | 0.005–0.4 | 0.18817013045111064 |
| max_bin | 25–300 | 95 |
| max_depth | 1–15 | 6 |
| subsample_for_bin | 20 000–300 000 | 80 000 |
| min_child_samples | 5–60 | 10 |
| reg_alpha | 0–1 | 0.705705986914311 |
| reg_lambda | 0–1 | 0.9086692536858783 |
| colsample_bytree | 0.5–1.0 | 0.8251441062858274 |

Code and data availability. Python code for the statistical modelling is available at https://doi.org/10.5281/zenodo.6345309 (Melton et al., 2022). A netCDF format version of the Peat-ML dataset is available at https://doi.org/10.5281/zenodo.5794336 (Melton et al., 2021).

Author contributions. JRM conceptualized the study. EC, JRM, and MF performed data curation, formal analysis, investigation, and software and methodology development. JMML, RSW, and KM also contributed to methodology development. KM, JMML, HCQ, and DK provided resources. JRM and EC did the visualization. Validation was done by RSW, JMML, HCQ, LVV, DK, EC, and JRM. JRM wrote the original draft of the manuscript. All authors reviewed and edited the final manuscript.

Competing interests. The contact author has declared that neither they nor their co-authors have any competing interests.

Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements. We acknowledge the efforts of Yuanqiao Wu and Diana Verseghy, who led an earlier effort to predict global peatland extents using machine learning approaches. We thank Dirk Pflugmacher, Matt Aitkenhead, Fokke Brouwer, Freddie Draper, Greta Dargie, and Rudiyanto for sharing their peatland mapping products. We also thank Camila Delgado-Montes for processing the Rio Madre de Dios data. We have adopted the colour bar scheme from Hugelius et al. (2020) for our peatland extent plots. Lastly, we thank Michel Bechtold for comments on an earlier study that we used to improve the design of this study.

Review statement. This paper was edited by David Lawrence and reviewed by two anonymous referees.

References

Abatzoglou, J. T., Dobrowski, S. Z., Parks, S. A., and Hegewisch, K. C.: TerraClimate, a high-resolution global dataset of monthly climate and climatic water balance from 1958–2015, Sci. Data, 5, 170191, https://doi.org/10.1038/sdata.2017.191, 2018.
Adame, M. F., Kauffman, J. B., Medina, I., Gamboa, J. N., Torres, O., Caamal, J. P., Reza, M., and Herrera-Silveira, J. A.: Carbon stocks of tropical coastal wetlands within the karstic landscape of the Mexican Caribbean, PLoS One, 8, e56569, https://doi.org/10.1371/journal.pone.0056569, 2013.
Aitkenhead, M. J. and Coull, M. C.: Mapping soil profile depth, bulk density and carbon stock in Scotland using remote sensing and spatial covariates, Eur. J. Soil Sci., https://doi.org/10.1111/ejss.12916, 2019.
Alin, A.: Multicollinearity, Wiley Interdiscip. Rev. Comput. Stat., 2, 370–374, https://doi.org/10.1002/wics.84, 2010.
Amatulli, G., McInerney, D., Sethi, T., Strobl, P., and Domisch, S.: Geomorpho90m, empirical evaluation and accuracy assessment of global high-resolution geomorphometric layers, Sci. Data, 7, 162, https://doi.org/10.1038/s41597-020-0479-6, 2020.
Anda, M., Ritung, S., Suryani, E., Sukarman, Hikmat, M., Yatno, E., Mulyani, A., Subandiono, R. E., Suratman, and Husnain: Revisiting tropical peatlands in Indonesia: Semi-detailed mapping, extent and depth distribution assessment, Geoderma, 402, 115235, https://doi.org/10.1016/j.geoderma.2021.115235, 2021.
Arora, V. K., Melton, J. R., and Plummer, D.: An assessment of natural methane fluxes simulated by the CLASS-CTEM model, Biogeosciences, 15, 4683–4709, https://doi.org/10.5194/bg-15-4683-2018, 2018.
Bechtold, M., De Lannoy, G. J. M., Koster, R. D., Reichle, R. H., Mahanama, S. P., Bleuten, W., Bourgault, M. A., Brümmer, C., Burdun, I., Desai, A. R., Devito, K., Grünwald, T., Grygoruk, M., Humphreys, E. R., Klatt, J., Kurbatova, J., Lohila, A., Munir, T. M., Nilsson, M. B., Price, J. S., Röhl, M., Schneider, A., and Tiemeyer, B.: PEAT–CLSM: A Specific Treatment of Peatland Hydrology in the NASA Catchment Land Surface Model, J. Adv. Model. Earth Sy., 11, 2130–2162, https://doi.org/10.1029/2018MS001574, 2019.
Bergstra, J., Komer, B., Eliasmith, C., Yamins, D., and Cox, D. D.: Hyperopt: a Python library for model selection and hyperparameter optimization, Comput. Sci. Discov., 8, 014008, https://doi.org/10.1088/1749-4699/8/1/014008, 2015.
Beven, K. J. and Kirkby, M. J.: A physically based, variable contributing area model of basin hydrology, Hydrol. Sci. Bull., 24, 43–69, https://doi.org/10.1080/02626667909491834, 1979.
Bohn, T. J., Melton, J. R., Ito, A., Kleinen, T., Spahni, R., Stocker, B. D., Zhang, B., Zhu, X., Schroeder, R., Glagolev, M. V., Maksyutov, S., Brovkin, V., Chen, G., Denisov, S. N., Eliseev, A. V., Gallego-Sala, A., McDonald, K. C., Rawlins, M. A., Riley, W. J., Subin, Z. M., Tian, H., Zhuang, Q., and Kaplan, J. O.: WETCHIMP-WSL: intercomparison of wetland methane emissions models over West Siberia, Biogeosciences, 12, 3321–3349, https://doi.org/10.5194/bg-12-3321-2015, 2015.
Bourgeau-Chavez, L. L., Grelik, S. L., Battaglia, M. J., Leisman, D. J., Chimner, R. A., Hribljan, J. A., Lilleskov, E. A., Draper, F. C., Zutta, B. R., Hergoualc'h, K., Bhomia, R. K., and Lähteenoja, O.: Advances in Amazonian Peatland Discrimination With Multi-Temporal PALSAR Refines Estimates of Peatland Distribution, C Stocks and Deforestation, Front. Earth Sci. Chin., 9, 1019, https://doi.org/10.3389/feart.2021.676748, 2021.
Brouwer, F. and Walvoort, D. J. J.: Basisregistratie Ondergrond (BRO) – actualisatie bodemkaart : Herkartering van de bodem in Eemland, Tech. Rep. 2352-2739, Wettelijke Onderzoekstaken Natuur & Milieu, Wageningen, 2019.
Brouwer, F., Vries, F. D., and Walvoort, D. J. J.: Basisregistratie Ondergrond (BRO) actualisatie bodemkaart : Herkartering van de bodem in Flevoland, Tech. Rep. 2352-2739, Wettelijke Onderzoekstaken Natuur & Milieu, Wageningen, 2018.
Connolly, J. and Holden, N. M.: Mapping peat soils in Ireland: updating the derived Irish peat map, Ir. Geogr., 42, 343–352, https://doi.org/10.1080/00750770903407989, 2009.
Dargie, G. C., Lewis, S. L., Lawson, I. T., Mitchard, E. T. A., Page, S. E., Bocko, Y. E., and Ifo, S. A.: Age, extent and carbon storage of the central Congo Basin peatland complex, Nature, 542, 86–90, https://doi.org/10.1038/nature21048, 2017.
Didan, K. and Barreto, A.: VIIRS/NPP Vegetation Indices 16-Day L3 Global 500m SIN Grid V001, USGS, https://doi.org/10.5067/VIIRS/VNP13A1.001, 2018.
Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J. R. G., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., and Lautenbach, S.: Collinearity: a review of methods to deal with it and a simulation study evaluating their performance, Ecography, 36, 27–46, https://doi.org/10.1111/j.1600-0587.2012.07348.x, 2013.
Draper, F. C., Roucoux, K. H., Lawson, I. T., Mitchard, E. T. A., Coronado, E. N. H., Lähteenoja, O., Montenegro, L. T., Sandoval, E. V., Zaráte, R., and Baker, T. R.: The distribution and amount of carbon in the largest peatland complex in Amazonia, Environ. Res. Lett., 9, 124017, https://doi.org/10.1088/1748-9326/9/12/124017, 2014.
Friedl, M., Gray, J., and Sulla-Menashe, D.: MCD12Q2 MODIS/Terra+Aqua Land Cover Dynamics Yearly L3 Global 500m SIN Grid V006, https://doi.org/10.5067/MODIS/MCD12Q2.006 (last access: 4 September 2020), 2019.
GDAL/OGR contributors: GDAL/OGR Geospatial Data Abstraction software Library, Open Source Geospatial Foundation, https://gdal.org (last access: 28 December 2020), 2021.
Gelinas, N.: Into the Okavango, USA, https://www.nationalgeographic.org/projects/okavango/ (last access: 11 October 2021), 2018.
Geological Survey of Finland: Superficial deposits of Finland 1 : 200000 (sediment polygons) v.10.1, 2018.
Glass, R. L.: Alaska Wetland Resources, Tech. Rep. 2425, U.S. Geological Survey, Water-Supply Paper 2425, 1992.
GLIMS and NSIDC: Global Land Ice Measurements from Space glacier database. Compiled and made available by the international GLIMS community and the National Snow and Ice Data Center, Boulder CO, USA, https://doi.org/10.7265/N5V98602 (last access: 4 March 2021), 2018.
Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., and Moore, R.: Google Earth Engine: Planetary-scale geospatial analysis for everyone, Remote Sens. Environ., 202, 18–27, https://doi.org/10.1016/j.rse.2017.06.031, 2017.
Gorham, E.: Northern Peatlands: Role in the Carbon Cycle and Probable Responses to Climatic Warming, Ecol. Appl., 1, 182–195, https://doi.org/10.2307/1941811, 1991.
Grundling, P. and Grootjans, A. P.: Peatlands of Africa, in: The Wetland Book: II: Distribution, Description and Conservation, edited by: Finlayson, C. M., Milton, G. R., Prentice, R. C., and Davidson, N. C., Springer Netherlands, Dordrecht, 1–10, https://doi.org/10.1007/978-94-007-6173-5_112-1, 2016.
Gumbricht, T., Roman-Cuesta, R. M., Verchot, L., Herold, M., Wittmann, F., Householder, E., Herold, N., and Murdiyarso, D.: An expert system model for mapping tropical wetlands and peatlands reveals South America as the largest contributor, Glob. Chang. Biol., 23, 3581–3599, https://doi.org/10.1111/gcb.13689, 2017.
Harris, I., Osborn, T. J., Jones, P., and Lister, D.: Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset, Sci. Data, 7, 109, https://doi.org/10.1038/s41597-020-0453-3, 2020.
Helbig, M., Waddington, J. M., Alekseychik, P., Amiro, B. D., Aurela, M., Barr, A. G., Black, T. A., Blanken, P. D., Carey, S. K., Chen, J., Chi, J., Desai, A. R., Dunn, A., Euskirchen, E. S., Flanagan, L. B., Forbrich, I., Friborg, T., Grelle, A., Harder, S., Heliasz, M., Humphreys, E. R., Ikawa, H., Isabelle, P.-E., Iwata, H., Jassal, R., Korkiakoski, M., Kurbatova, J., Kutzbach, L., Lindroth, A., Löfvenius, M. O., Lohila, A., Mammarella, I., Marsh, P., Maximov, T., Melton, J. R., Moore, P. A., Nadeau, D. F., Nicholls, E. M., Nilsson, M. B., Ohta, T., Peichl, M., Petrone, R. M., Petrov, R., Prokushkin, A., Quinton, W. L., Reed, D. E., Roulet, N. T., Runkle, B. R. K., Sonnentag, O., Strachan, I. B., Taillardat, P., Tuittila, E.-S., Tuovinen, J.-P., Turner, J., Ueyama, M., Varlagin, A., Wilmking, M., Wofsy, S. C., and Zyrianov, V.: Increasing contribution of peatlands to boreal evapotranspiration in a warming climate, Nat. Clim. Chang., 10, 555–560, https://doi.org/10.1038/s41558-020-0763-7, 2020.
Hengl, T.: Soil property layers from openlandmap.org. All data are available under the Open Data Commons Open Database License (ODbL) and/or Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA), https://doi.org/10.5281/zenodo.2525663 (last access: 4 September 2020), 2018.
Hengl, T. and MacMillan, R. A.: Predictive Soil Mapping with R, Lulu.com, 2019.
Hooker, G., Mentch, L., and Zhou, S.: Unrestricted Permutation forces Extrapolation: Variable Importance Requires at least One More Model, or There Is No Free Variable Importance, arXiv:1905.03151 (stat.ME), 2021.
Hope, G. S.: Peat in the mountains of New Guinea, Mires Peat, 15, 1–21, 2015.
Householder, J. E., Janovec, J. P., Tobler, M. W., Page, S., and Lähteenoja, O.: Peatlands of the Madre de Dios River of Peru: distribution, geomorphology, and habitat diversity, Wetlands, 32, 359–368, 2012.
Hribljan, J. A., Suarez, E., Bourgeau-Chavez, L., Endres, S., Lilleskov, E. A., Chimbolema, S., Wayson, C., Serocki, E., and Chimner, R. A.: Multidate, multisensor remote sensing reveals high density of carbon-rich mountain peatlands in the páramo of Ecuador, Glob. Chang. Biol., 23, 5412–5425, https://doi.org/10.1111/gcb.13807, 2017.
Huete, A., Didan, K., Miura, T., Rodriguez, E. P., Gao, X., and Ferreira, L. G.: Overview of the radiometric and biophysical performance of the MODIS vegetation indices, Remote Sens. Environ., 83, 195–213, https://doi.org/10.1016/S0034-4257(02)00096-2, 2002.
Hugelius, G., Loisel, J., Chadburn, S., Jackson, R. B., Jones, M., MacDonald, G., Marushchak, M., Olefeldt, D., Packalen, M., Siewert, M. B., Treat, C., Turetsky, M., Voigt, C., and Yu, Z.: Large stocks of peatland carbon and nitrogen are vulnerable to permafrost thaw, P. Natl. Acad. Sci. USA, 117, 20438–20446, https://doi.org/10.1073/pnas.1916387117, 2020.
IDEAM: Leyenda nacional de coberturas de la tierra: metodología CORINE Land Cover adaptada para Colombia: Escala 1 : 100000, edited by: Martínez Ardila, N. J. and Murcia García, U. G., Ministerio De Ambiente, Vivienda Y Desarrollo Territorial Instituto De Hidrología, Meteorología Y Estudios Ambientales – IDEAM, ISBN 978-958-806729-2, 2010.
Izumi, Y., Widodo, J., Kausarian, H., Demirci, S., Takahashi, A., Razi, P., Nasucha, M., Yang, H., and Tetuko S. S., J.: Potential of soil moisture retrieval for tropical peatlands in Indonesia using ALOS-2 L-band full-polarimetric SAR data, Int. J. Remote Sens., 40, 5938–5956, https://doi.org/10.1080/01431161.2019.1584927, 2019.
Jackson, R. B., Lajtha, K., Crow, S. E., Hugelius, G., Kramer, M. G., and Piñeiro, G.: The Ecology of Soil Carbon: Pools, Vulnerabilities, and Biotic and Abiotic Controls, Annu. Rev. Ecol. Evol. S., 48, 419–445, https://doi.org/10.1146/annurev-ecolsys-112414-054234, 2017.
Joosten, H. and Clarke, D.: Wise use of mires and peatlands, International Mire Conservation Group and International Peat Society, ISBN 951-97744-8-3, 304, 2002.
Junk, W. J.: Amazonian flood plains: their ecology, present and potential use, Revue d'Hydrobiologie Tropicale, 15, 285–301, 1982.
Kaplan, J. O.: Wetlands at the Last Glacial Maximum: Distribution and methane emissions, Geophys. Res. Lett., 29, 3-1–3-4, https://doi.org/10.1029/2001GL013366, 2002.
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y.: LightGBM: A Highly Efficient Gradient Boosting Decision Tree, in: Advances in Neural Information Processing Systems 30, edited by: Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., 3146–3154, Curran Associates, Inc., 2017.
Kidd, D., Moreton, R., and Brown, G.: Tasmanian Organic Soil Mapping Project, Methods Report. Nature Conservation Report 21/2, unpublished report, 2021.
Kobayashi, S., Ota, Y., Harada, Y., Ebita, A., Moriya, M., Onoda, H., Onogi, K., Kamahori, H., Kobayashi, C., Endo, H., Miyaoka, K., and Takahashi, K.: The JRA-55 Reanalysis: General Specifications and Basic Characteristics, J. Meteorol. Soc. JPN, 93, 5–48, https://doi.org/10.2151/jmsj.2015-001, 2015.
Krankina, O. N., Pflugmacher, D., Friedl, M., Cohen, W. B., Nelson, P., and Baccini, A.: Meeting the challenge of mapping peatlands with remotely sensed data, Biogeosciences, 5, 1809–1820, https://doi.org/10.5194/bg-5-1809-2008, 2008.
Köchy, M., Hiederer, R., and Freibauer, A.: Global distribution of soil organic carbon – Part 1: Masses and frequency distributions of SOC stocks for the tropics, permafrost regions, wetlands, and the world, SOIL, 1, 351–365, https://doi.org/10.5194/soil-1-351-2015, 2015.
Lähteenoja, O. and Roucoux, K.: Inception, history and development of peatlands in the Amazon Basin, PAGES News, 18, 27–28, https://doi.org/10.22498/pages.18.1.27, 2010.
Landcare Research NZ Ltd: Fundamental Soil Layer – New Zealand Soil Classification, https://doi.org/10.7931/L10T0 (last access: 4 January 2020), 2000.
Largeron, C., Krinner, G., Ciais, P., and Brutel-Vuilmet, C.: Implementing northern peatlands in a global land surface model: description and evaluation in the ORCHIDEE high-latitude version model (ORC-HL-PEAT), Geosci. Model Dev., 11, 3279–3297, https://doi.org/10.5194/gmd-11-3279-2018, 2018.
Lehner, B. and Döll, P.: Development and validation of a global database of lakes, reservoirs and wetlands, J. Hydrol., 296, 1–22, https://doi.org/10.1016/j.jhydrol.2004.03.028, 2004.
Leifeld, J. and Menichetti, L.: The underappreciated potential of peatlands in global climate change mitigation strategies, Nat. Commun., 9, 1071, https://doi.org/10.1038/s41467-018-03406-6, 2018.
Limpens, J., Berendse, F., Blodau, C., Canadell, J. G., Freeman, C., Holden, J., Roulet, N., Rydin, H., and Schaepman-Strub, G.: Peatlands and the carbon cycle: from local processes to global implications – a synthesis, Biogeosciences, 5, 1475–1491, https://doi.org/10.5194/bg-5-1475-2008, 2008.
Loisel, J., Yu, Z., Parsekian, A., Nolan, J., and Slater, L.: Quantifying landscape morphology influence on peatland lateral expansion using ground-penetrating radar (GPR) and peat core analysis, J. Geophys. Res.-Biogeo., 118, 373–384, https://doi.org/10.1002/jgrg.20029, 2013.
Loisel, J., van Bellen, S., Pelletier, L., Talbot, J., Hugelius, G., Karran, D., Yu, Z., Nichols, J., and Holmquist, J.: Insights and issues with estimating northern peatland carbon stocks and fluxes since the Last Glacial Maximum, Earth-Sci. Rev., 165, 59–80, https://doi.org/10.1016/j.earscirev.2016.12.001, 2017.
López Gonzales, M., Hergoualc'h, K., Angulo Núñez, Ó., Baker, T., Chimner, R., del Águila Pasquel, J., del Castillo Torres, D., Freitas Alvarado, L., Fuentealba Durand, B., García Gonzales, E., Honorio Coronado, E., Kazuyo, H., Lilleskov, E., Málaga Durán, N., Maldonado Fonkén, M., Martín Brañas, M., Vargas, T. M., Planas Clarke, A. M., Roucoux, K., and Vacalla Ochoa, F.: What do we know about Peruvian peatlands?, Center for International Forestry Research (CIFOR), https://doi.org/10.17528/cifor/007848, 2020.
Martín-López, J. M., Verchot, L., Martius, C., and da Silva, M.: Modeling the spatial distribution of soil organic carbon and carbon stocks for the Casanare flooded Savannas, Colombia, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1840, https://doi.org/10.5194/egusphere-egu22-1840, 2022.
Matthews, E.: Global data bases on distribution, characteristics and methane emission of natural wetlands: Documentation of archived data tape, NASA Goddard Space Flight Center, Greenbelt, MD, USA, 1989.
McBratney, A. B., Mendonça Santos, M. L., and Minasny, B.: On digital soil mapping, Geoderma, 117, 3–52, https://doi.org/10.1016/S0016-7061(03)00223-4, 2003.
McCartney, M., Haeringer, M., and Polifke, W.: Comparison of Machine Learning Algorithms in the Interpolation and Extrapolation of Flame Describing Functions, J. Eng. Gas Turbines Power, 142, 061009, https://doi.org/10.1115/1.4045516, 2020.
Melton, J. R., Wania, R., Hodson, E. L., Poulter, B., Ringeval, B., Spahni, R., Bohn, T., Avis, C. A., Beerling, D. J., Chen, G., Eliseev, A. V., Denisov, S. N., Hopcroft, P. O., Lettenmaier, D. P., Riley, W. J., Singarayer, J. S., Subin, Z. M., Tian, H., Zürcher, S., Brovkin, V., van Bodegom, P. M., Kleinen, T., Yu, Z. C., and Kaplan, J. O.: Present state of global wetland extent and wetland methane modelling: conclusions from a model intercomparison project (WETCHIMP), Biogeosciences, 10, 753–788, https://doi.org/10.5194/bg-10-753-2013, 2013.
Melton, J. R., Chan, E., Millard, K., Fortier, M., Winton, R. S., Martín-López, J. M., Cadillo-Quiroz, H., Kidd, D., and Verchot, L. V.: A map of global peatland extent created using machine learning (Peat-ML), Zenodo [data set], https://doi.org/10.5281/zenodo.5794336, 2021.
Melton, J. R., Chan, E., Millard, K., Fortier, M., Winton, R. S., Martín-López, J. M., Cadillo-Quiroz, H., Kidd, D., and Verchot, L. V.: Code for 'A map of global peatland extent created using machine learning (Peat-ML)' (0.9), Zenodo [code], https://doi.org/10.5281/zenodo.6345309, 2022.
Meyer, H., Reudenbach, C., Wöllauer, S., and Nauss, T.: Importance of spatial predictor variable selection in machine learning applications – Moving from data reproduction to spatial prediction, Ecol.

at the global scale, 199–2004, J. Geophys. Res.-Atmos., 115, https://doi.org/10.1029/2009JD012674, 2010.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E.: Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res., 12, 2825–2830, 2011.
Pekel, J.-F., Cottam, A., Gorelick, N., and Belward, A. S.: High-resolution mapping of global surface water and its long-term changes, Nature, 540, 418–422, https://doi.org/10.1038/nature20584, 2016.
Peters, J. and Tegetmeyer, C.: Inventory of peatlands in the Caribbean and first description of priority areas, Tech. rep., Proceedings of the Greifswald Mire Centre, 2019.
Pflugmacher, D., Krankina, O. N., and Cohen, W. B.: Satellite-based peatland mapping: Potential of the MODIS sensor, Glob. Planet. Change, 56, 248–257, https://doi.org/10.1016/j.gloplacha.2006.07.019, 2007.
Ploton, P., Mortier, F., Réjou-Méchain, M., Barbier, N., Picard, N., Rossi, V., Dormann, C., Cornu, G., Viennois, G., Bayol, N., Lyapustin, A., Gourlet-Fleury, S., and Pélissier, R.: Spatial validation reveals poor predictive performance of large-scale ecological mapping models, Nat. Commun., 11, 4540, https://doi.org/10.1038/s41467-020-18321-y, 2020.
Prigent, C., Papa, F., Aires, F., Rossow, W. B., and Matthews, E.: Global inundation dynamics inferred from multiple satel-
Modell., 411, 108815, lite observations, 1993–2000, J. Geophys. Res.-Atmos., 112, https://doi.org/10.1016/j.ecolmodel.2019.108815, 2019. https://doi.org/10.1029/2006JD007847, 2007. Minasny, B., Berglund, O., Connolly, J., Hedley, C., de Vries Folk- Ricaurte, L. F., Olaya-Rodríguez, M. H., Cepeda-Valencia, J., Lara, ert, Gimona, A., Kempen, B., Kidd, D., Lilja, H., Malone, D., Arroyave-Suárez, J., Max Finlayson, C., and Palomo, I.: B., McBratney, A., Roudier, P., O’Rourke, S., Rudiyanto, Future impacts of drivers of change on wetland ecosystem Padarian, J., Poggio, L., ten Caten, A., Thompson, D., services in Colombia, Glob. Environ. Change, 44, 158–169, Tuve, C., and Widyatmanti, W.: Digital mapping of peat- https://doi.org/10.1016/j.gloenvcha.2017.04.001, 2017. lands – A critical review, Earth-Sci. Rev., 196, 102870, Roberts, D. R., Bahn, V., Ciuti, S., Boyce, M. S., Elith, J., Guillera- https://doi.org/10.1016/j.earscirev.2019.05.014, 2019. Arroita, G., Hauenstein, S., Lahoz-Monfort, J. J., Schröder, B., Olefeldt, D., Hovemyr, M., Kuhn, M. A., Bastviken, D., Bohn, T. Thuiller, W., Warton, D. I., Wintle, B. A., Hartig, F., and Dor- J., Connolly, J., Crill, P., Euskirchen, E. S., Finkelstein, S. A., mann, C. F.: Cross-validation strategies for data with temporal, Genet, H., Grosse, G., Harris, L. I., Heffernan, L., Helbig, M., spatial, hierarchical, or phylogenetic structure, Ecography, 40, Hugelius, G., Hutchins, R., Juutinen, S., Lara, M. J., Malhotra, 913–929, https://doi.org/10.1111/ecog.02881, 2017. A., Manies, K., McGuire, A. D., Natali, S. M., O’Donnell, J. A., Rocha, A. D., Groen, T. A., Skidmore, A. K., and Wille- Parmentier, F.-J. W., Räsänen, A., Schädel, C., Sonnentag, O., men, L.: Role of Sampling Design When Predicting Strack, M., Tank, S. E., Treat, C., Varner, R. K., Virtanen, T., Spatially Dependent Ecological Data With Remote Sens- Warren, R. K., and Watts, J. D.: The Boreal–Arctic Wetland and ing, IEEE Trans. Geosci. 
Remote Sens., 59, 663–674, Lake Dataset (BAWLD), Earth Syst. Sci. Data, 13, 5127–5149, https://doi.org/10.1109/TGRS.2020.2989216, 2021. https://doi.org/10.5194/essd-13-5127-2021, 2021. Running, S., Mu, Q., and Zhao, M.: MOD17A3 Olson, D. M., Dinerstein, E., Wikramanayake, E. D., Burgess, MODIS/Terra Net Primary Production Yearly L4 N. D., Powell, G. V. N., Underwood, E. C., D’amico, Global 1km SIN Grid V055, MODIS [data set], J. A., Itoua, I., Strand, H. E., Morrison, J. C., Loucks, https://doi.org/10.5067/MODIS/MOD17A3.006, 2011. C. J., Allnutt, T. F., Ricketts, T. H., Kura, Y., Lamoreux, Schroeder, R., McDonald, K., Chapman, B., Jensen, K., Podest, J. F., Wettengel, W. W., Hedao, P., and Kassem, K. R.: E., Tessler, Z., Bohn, T., and Zimmermann, R.: Devel- Terrestrial Ecoregions of the World: A New Map of Life opment and Evaluation of a Multi-Year Fractional Sur- on Earth, BioScience, 51, 933, https://doi.org/10.1641/0006- face Water Data Set Derived from Active/Passive Mi- 3568(2001)051[0933:teotwa]2.0.co;2, 2001. crowave Remote Sensing Data, Remote Sens., 7, 16688–16732, Page, S. E., Rieley, J. O., and Banks, C. J.: Global and regional im- https://doi.org/10.3390/rs71215843, 2015. portance of the tropical peatland carbon pool, Glob. Chang. Biol., Schulzweida, U.: CDO User Guide, Zenodo, 17, 798–818, https://doi.org/10.1111/j.1365-2486.2010.02279.x, https://doi.org/10.5281/zenodo.4246983, 2020. 2011. Shimada, M., Itoh, T., Motooka, T., Watanabe, M., Shiraishi, T., Papa, F., Prigent, C., Aires, F., Jimenez, C., Rossow, W. B., and Thapa, R., and Lucas, R.: New global forest/non-forest maps Matthews, E.: Interannual variability of surface water extent https://doi.org/10.5194/gmd-15-4709-2022 Geosci. Model Dev., 15, 4709–4738, 2022 4738 Joe R. Melton et al.: Machine-learning-based peatland extent from ALOS PALSAR data (2007–2010), Remote Sens. Environ., Webster, K. L., Bhatti, J. S., Thompson, D. K., Nelson, S. 
A., Shaw, 155, 13–31, https://doi.org/10.1016/j.rse.2014.04.014, 2014. C. H., Bona, K. A., Hayne, S. L., and Kurz, W. A.: Spatially- Smith, K. B., Smith, C. E., Forest, S. F., and Richard, A. J.: A field integrated estimates of net ecosystem exchange and methane guide to the wetlands of the boreal plains ecozone of Canada, fluxes from Canadian peatlands, Carbon Balance Manag., 13, 16, Tech. rep., Ducks Unlimited Canada, Western Boreal Office: Ed- https://doi.org/10.1186/s13021-018-0105-5, 2018. monton, Alberta, 2007. Wetlands International: Wetlands International Map of Peatland Tarnocai, C., Kettles, I. M., and Lacelle, B.: Peatlands of Canada, Distribution Area and Carbon Content in Sumatera 1990–2002 Tech. Rep. Open File 6551, Geological Survey of Canada, 2011. Wetlands International – Indonesia Programme & Wildlife Habi- Terentieva, I. E., Glagolev, M. V., Lapshina, E. D., Sabrekov, tat Canada, Tech. rep., Wetlands International, Bogor, 2003. A. F., and Maksyutov, S.: Mapping of West Siberian Wetlands International: Wetlands International Map of Peatland taiga wetland complexes using Landsat imagery: implica- Distribution Area and Carbon Content in Kalimantan 2000–2002 tions for methane emissions, Biogeosciences, 13, 4615–4626, Wetlands International – Indonesia Programme & Wildlife Habi- https://doi.org/10.5194/bg-13-4615-2016, 2016. tat Canada, Tech. rep., Wetlands International, Bogor, 2004. Tian, J. and Philpot, W. D.: Relationship between surface soil water Wetlands International: Wetlands International Cadangan Karbon content, evaporation rate, and water absorption band depths in Bawah Permukaan di Papua Wetlands International – Indonesia SWIR reflectance spectra, Remote Sens. Environ., 169, 280–289, Programme & Wildlife Habitat Canada, Tech. rep., Wetlands In- https://doi.org/10.1016/j.rse.2015.08.007, 2015. ternational, Bogor, 2006. Touzi, R., Omari, K., Gosselin, G., and Sleep, B.: Polarimetric L- Xu, J., Morris, P. 
J., Liu, J., and Holden, J.: PEATMAP: band ALOS for peatland subsurface water monitoring, in: Con- Refining estimates of global peatland distribution ference Proceedings of 2013 Asia-Pacific Conference on Syn- based on a meta-analysis, Catena, 160, 134–140, thetic Aperture Radar (APSAR), 53–56, 2013. https://doi.org/10.1016/j.catena.2017.09.010, 2018. Touzi, R., Omari, K., Sleep, B., and Jiao, X.: Scattered and Yamazaki, D., Ikeshima, D., Tawatari, R., Yamaguchi, T., Received Wave Polarization Optimization for Enhanced Peat- O’Loughlin, F., Neal, J. C., Sampson, C. C., Kanae, land Classification and Fire Damage Assessment Using Po- S., and Bates, P. D.: A high-accuracy map of global larimetric PALSAR, IEEE J. Sel. Top. Appl., 11, 4452–4477, terrain elevations, Geophys. Res. Lett., 44, 5844–5853, https://doi.org/10.1109/JSTARS.2018.2873740, 2018. https://doi.org/10.1002/2017gl072874, 2017. Wang, L., Qu, J. J., Hao, X., and Zhu, Q.: Sensitivity studies Yu, Z., Loisel, J., Brosseau, D. P., Beilman, D. W., and Hunt, of the moisture effects on MODIS SWIR reflectance and veg- S. J.: Global peatland dynamics since the Last Glacial Maximum, etation water indices, Int. J. Remote Sens., 29, 7065–7075, Geophys. Res. Lett., 37, https://doi.org/10.1029/2010GL043584, https://doi.org/10.1080/01431160802226034, 2008. 2010. Wania, R., Ross, I., and Prentice, I. C.: Integrating peat- Zender, C. S.: Short communication: Analysis of self- lands and permafrost into a dynamic global vegeta- describing gridded geoscience data with netCDF Op- tion model: 1. Evaluation and sensitivity of physical erators (NCO), Environ. Model. Softw., 23, 1338–1342, land surface processes, Global Biogeochem. Cycles, 23, https://doi.org/10.1016/j.envsoft.2008.03.004, 2008. https://doi.org/10.1029/2008GB003412, 2009. Geosci. Model Dev., 15, 4709–4738, 2022 https://doi.org/10.5194/gmd-15-4709-2022