Geosci. Model Dev., 15, 4709–4738, 2022
https://doi.org/10.5194/gmd-15-4709-2022
© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.
A map of global peatland extent created using
machine learning (Peat-ML)
Joe R. Melton1, Ed Chan2, Koreen Millard3, Matthew Fortier1, R. Scott Winton4,5,6, Javier M. Martín-López7,
Hinsby Cadillo-Quiroz8, Darren Kidd9, and Louis V. Verchot7
1Climate Research Division, Environment and Climate Change Canada, Victoria, BC, Canada
2Climate Research Division, Environment and Climate Change Canada, Toronto, ON, Canada
3Geography and Environmental Studies, Carleton University, Ottawa, ON, Canada
4Institute of Biogeochemistry and Pollutant Dynamics, ETH Zurich, 8092 Zurich, Switzerland
5Department of Surface Waters, Eawag, Swiss Federal Institution of Aquatic Science and Technology,
6047 Kastanienbaum, Switzerland
6Department of Earth System Science, Stanford University, Stanford, CA 94305, USA
7Agroecosystems and Sustainable Landscapes Program, Alliance Bioversity-CIAT, Cali, Colombia
8School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
9Natural Values Science Services, Department of Natural Resources and Environment, Hobart, Tasmania, Australia
Correspondence: Joe R. Melton (joe.melton@ec.gc.ca)
Received: 21 December 2021 – Discussion started: 14 February 2022
Revised: 4 May 2022 – Accepted: 6 May 2022 – Published: 20 June 2022
Abstract. Peatlands store large amounts of soil carbon and freshwater, constituting an important component of the global carbon and hydrologic cycles. Accurate information on the global extent and distribution of peatlands is presently lacking but is needed by Earth system models (ESMs) to simulate the effects of climate change on the global carbon and hydrologic balance. Here, we present Peat-ML, a spatially continuous global map of peatland fractional coverage generated using machine learning (ML) techniques suitable for use as a prescribed geophysical field in an ESM. Inputs to our statistical model follow drivers of peatland formation and include spatially distributed climate, geomorphological and soil data, and remotely sensed vegetation indices. Available maps of peatland fractional coverage for 14 relatively extensive regions were used along with mapped ecoregions of non-peatland areas to train the statistical model. In addition to qualitative comparisons to other maps in the literature, we estimated model error in two ways. The first estimate used the training data in a blocked leave-one-out cross-validation strategy designed to minimize the influence of spatial autocorrelation. That approach yielded an average r² of 0.73 with a root-mean-square error and mean bias error of 9.11 % and −0.36 %, respectively. Our second error estimate was generated by comparing Peat-ML against a high-quality, extensively ground-truthed map generated by Ducks Unlimited Canada for the Canadian Boreal Plains region. This comparison suggests our map to be of comparable quality to mapping products generated through more traditional approaches, at least for boreal peatlands.
Copyright statement. The works published in this journal are distributed under the Creative Commons Attribution 4.0 License. This license does not affect the Crown copyright work, which is re-usable under the Open Government Licence (OGL). The Creative Commons Attribution 4.0 License and the OGL are interoperable and do not conflict with, reduce or limit each other.
© Crown copyright 2022
1 Introduction
Peatlands are estimated to cover about three percent of the land surface but contain approximately a third of the soil carbon and roughly a tenth of surface freshwater (Joosten and Clarke, 2002; Jackson et al., 2017) and are vulnerable
Published by Copernicus Publications on behalf of the European Geosciences Union.
Model description paper
4710 Joe R. Melton et al.: Machine-learning-based peatland extent
to destabilization due to climate change and anthropogenic pressures, including drainage and land use change. Their importance in the carbon and hydrologic cycles motivates their inclusion in Earth system models (ESMs) to better understand their potential impact on the climate system. Since the land surface of ESMs is grid based, a prerequisite for integrating peatlands into these models is to define the location and the fractional cover of peatlands on the model grid. However, peatlands have generally been overlooked in landscape databases and their mapping remains challenging (e.g. Krankina et al., 2008; Minasny et al., 2019).
As peatlands are commonly considered a type of wetland that contains large amounts of organic carbon in the soil, several studies have set peatland distribution based on maps of soil organic matter density (e.g. Wania et al., 2009; Bechtold et al., 2019; Hugelius et al., 2020). However, using soil organic matter databases alone in determining peatland distribution tends to overlook the vegetation and subsurface hydrology, but most importantly they rely heavily on the fidelity of the soil carbon dataset. Another approach has been to use a soil map together with global wetland maps or inundation extent maps (e.g. Köchy et al., 2015). These wetland and inundated area databases have mostly been produced through mapping of shallow surface water based on remote-sensing data, as in the Global Inundation Extent from Multi-Satellites initiative (GIEMS; Prigent et al., 2007; Papa et al., 2010) and the Surface WAter Microwave Product Series (SWAMPS; Schroeder et al., 2015), or land cover mapping using surface observations and moderate-resolution imaging spectroradiometer (MODIS) data, as in the Global Lake and Wetlands Database (GLWD-3; Lehner and Döll, 2004). These wetland mapping products are, however, of limited utility for peatland modelling applications as they generally do not agree well amongst themselves (Melton et al., 2013), which is also the case for peatland mapping products (as is discussed later) and may exhibit biases depending on how they were generated (see discussion in Bohn et al., 2015). In addition, in the boreal zone and some areas of the tropics such as the Amazon (Lähteenoja and Roucoux, 2010), some peatlands are not inundated, and thus using hydrological characteristics alone can underestimate their extent (Matthews, 1989; Prigent et al., 2007). Other studies, such as Largeron et al. (2018) or Leifeld and Menichetti (2018), have used a global peatland distribution map derived from a paleontological perspective (Yu et al., 2010). However, Yu et al. (2010) is an estimated map of binary polygons that does not provide quantitative information on fractional coverage. The most comprehensive global peatland map we are aware of is PEATMAP (Xu et al., 2018), which was generated through a meta-analysis of regional-scale mapping products of varying spatial resolution and provenance (general land cover maps, soil databases, and a hybrid expert system). This dataset is not well suited as a peatland mask for ESM use as the resolution of some of its parent datasets leaves large polygons of complete peatland cover in regions where this is unlikely and it misses peatlands in regions where peatland coverage is known to exist, e.g. the Republic of Sakha (Yakutia, Russia), as it is dependent upon mapping products existing for each region.
In describing their dataset, Yu et al. (2010) state that, “accurate true peatland coverage and distribution is not available for many mapped regions”. Over a decade after the publication of Yu et al. (2010), this statement remains accurate. Peatlands have traditionally been mapped through field surveys and manual inspection of aerial photography (e.g. Tarnocai et al., 2011). These approaches are costly and labour intensive and become impractical as the study region becomes large or remote. As noted by Loisel et al. (2017), it is also difficult to distinguish upland forests from forested peatlands in the boreal region and between (sub)arctic tundra vegetation and peatlands in the higher latitudes using aerial photography. Digital soil mapping (DSM) is an alternative approach to determining global peatland cover. DSM techniques typically combine field surveys with peatland covariates and statistical models to produce maps of predicted peatland area (McBratney et al., 2003). Following Minasny et al. (2019), the peatland covariates useful to DSM can be determined from the drivers of peatland formation, indicators of peat presence, and sensors able to measure the indicators.
The drivers of peatland formation are scale dependent (Limpens et al., 2008) and thus the intended spatial extent and mapping resolution of the DSM product is an important consideration. For DSM on a regional to global scale, as is the case when mapping for ESM use, the principal drivers of peatland formation are climate, vegetation, and terrain. Minasny et al. (2019) suggest, for these drivers at this spatial scale, that the indicators of peatland presence are climate data (primarily temperature and precipitation); land use and land cover information; and elevation, slope, and terrain attributes. Possible sensors for regional- to global-scale mapping include optical and radar imagery, topographic remote-sensing data (digital elevation models, DEMs), and climate datasets. The statistical models used as part of DSM vary, but here we use a machine learning (ML) algorithm to derive a global map of peatland extent intended for use in ESM applications. As field surveys are impractical to conduct on a global scale, we rely upon peatland mapping studies on regional scales to train our ML models and evaluate their results. In Sect. 2 we define peatlands in the context of our mapping approach and describe the datasets used for model training and the ML approach and algorithms used. Section 3 discusses the results of the ML algorithms and our model performance estimation strategy and limitations of our approach. Section 4 presents our overall conclusions.
2 Materials and methods
2.1 Definition of peatlands
As there is no single, universally adopted definition of peatlands, we follow Joosten and Clarke (2002) in defining them as areas with or without vegetation that contain a naturally accumulating peat layer at the surface. While the definition of peat, as defined by the percent dead organic material by dry mass, varies considerably in the literature (e.g. Gumbricht et al., 2017; Page et al., 2011), we choose a more inclusive lower minimum value of 30 % to ensure we can capture the diversity of global peatlands. When using peatland mapping datasets that contain continuous peat depths (Sect. 2.3), we have used a minimum thickness of 30 cm of peat to delineate peatlands, similar to Gumbricht et al. (2017). This depth limit is the most common amongst national datasets (but see discussion on exceptions or the implications of different values in Loisel et al., 2017).
2.2 Data acquisition and preparation
The general process of data preparation, model training, and evaluation is illustrated in Fig. 1. All training (regional peatland and non-peatland mapping products) and predictor (peatland covariates) data were converted from their native format (commonly GeoTiff rasters or vector-based GIS formats such as shapefiles) to netCDF format and remapped onto a common 5 arcmin (ca. 0.0833°, corresponding to 9.26 km at the Equator and 4.63 km at 60° N) grid using climate data operators (CDO; Schulzweida, 2020), a geospatial data abstraction software library (GDAL/OGR; GDAL/OGR contributors, 2021), and/or netCDF Operators (NCO) (Zender, 2008). The original resolutions of the data sources are each listed below. All ML runs and evaluations were performed on the 5 arcmin grid.
2.2.1 Predictors (peatland covariates)
We used a suite of predictors that fell into four main types: climate, soils, vegetation, and terrain (geomorphology). Table 1 lists each predictor grouped by predictor source and type. The climate, vegetation, and soil predictors were extracted from the Google Earth Engine data catalogue (Gorelick et al., 2017). The geomorphological dataset was downloaded directly from its authors’ website (Amatulli et al., 2020, last access: 16 January 2020). Sampling across the different years provided by each dataset is assumed to be relatively unimportant as peatland extent is not highly dynamic across decadal timescales, especially considering the scale of our grid cells (Loisel et al., 2013). An additional predictor was the calculated length of the longest day of the year (hours) for each cell on the 5 arcmin grid. The longest day of the year could be used by the model to determine tropical versus extratropical regions.
The climate predictors were derived from the TerraClimate dataset (Abatzoglou et al., 2018). TerraClimate is available at high spatial resolution (1/24°) and provides monthly climate and climatic water balance variables spanning the 1958 to 2015 period. TerraClimate uses the WorldClim dataset for high spatial resolution climatic normals, which is combined with the time-varying climate of the Climate Research Unit Ts4.0 (CRU Ts4.0; Harris et al., 2020), where the time-varying anomalies of CRU Ts4.0 are interpolated to the high-resolution climatology of WorldClim. The Japanese 55-year reanalysis (JRA55; Kobayashi et al., 2015) is used to fill in where CRU Ts4.0 has no climate stations contributing to its record (such as parts of South America, Africa, and smaller islands) and was the sole data source for solar radiation and wind speeds. Abatzoglou et al. (2018) note that the water balance model, used to generate some of the variables listed in Table 1, is simple and does not account for vegetation heterogeneity or their physiological response under varying environmental conditions. For the climate predictors, we computed seasonal means across the available years, i.e. December–February (DJF), March–May (MAM), June–August (JJA), and September–November (SON). Given that these seasonal means are likely less important in tropical regions, we did investigate using annual minimum and maximum values in place of seasonal ones but did not see a significant impact on predicted peatland fractional cover.
Soil predictors were obtained from the 250 m resolution OpenLandMap (Hengl, 2018) including soil bulk density (kg m⁻³), clay content (%), sand content (%), organic carbon content (%), and soil water content at field capacity (33 kPa). These soil variables are derived from an ensemble of machine learning algorithms trained on a global compilation of soil profiles (Hengl and MacMillan, 2019). We used the 30 cm depth estimate for all soil variables.
Terrain information is provided by the 250 m resolution version of Geomorpho90m (Amatulli et al., 2020) for 17 different geomorphometry variables describing numerous aspects of the land surface (see Table 1). This geomorphology dataset has an original resolution of 90 m, the same resolution as the Multi-Error-Removed Improved Terrain (MERIT) DEM (Yamazaki et al., 2017) from which it was derived. MERIT-DEM is a merged and error-corrected product based on the ALOS World 3D – 30 m (AW3D30) and Shuttle Radar Topography Mission (SRTM3) datasets.
Information about the vegetation state was provided by several datasets. Shimada et al. (2014) created a seamless global mosaic image from the Phased Array type L-band Synthetic Aperture Radar (PALSAR/PALSAR2). This image was created with 25 m grid cells on an annual timescale. In creating the mosaic, at each location within a year the images chosen were those showing minimum response to surface moisture. The images were then ortho-rectified, slope corrected, and had a destriping procedure to equalize differences between strips that could occur due to conditions at time of acquisition. As the dataset’s intended purpose was to
Figure 1. Flow chart of the machine learning procedure.
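The longest-day predictor listed in Sect. 2.2.1 is a purely geometric quantity, and the text does not state how it was calculated. The sketch below shows one plausible way to obtain it, assuming the standard sunrise equation evaluated at the solstice, a fixed obliquity of 23.44°, and no correction for atmospheric refraction. The function name `longest_day_hours` and the grid construction are illustrative and are not taken from the Peat-ML code.

```python
import numpy as np

OBLIQUITY_DEG = 23.44  # assumed constant axial tilt of the Earth


def longest_day_hours(lat_deg):
    """Approximate length (hours) of the longest day of the year.

    Evaluates the sunrise equation cos(w0) = -tan(lat) * tan(decl) at the
    summer solstice of the hemisphere in question (hence the absolute
    latitude). Refraction and the solar disc size are ignored.
    """
    lat = np.radians(np.abs(np.asarray(lat_deg, dtype=float)))
    decl = np.radians(OBLIQUITY_DEG)
    # Clipping yields 24 h poleward of the polar circles (midnight sun).
    cos_w0 = np.clip(-np.tan(lat) * np.tan(decl), -1.0, 1.0)
    return 24.0 / np.pi * np.arccos(cos_w0)


# Cell-centre latitudes of a global 5 arcmin grid (2160 rows, south to north).
res = 5.0 / 60.0
lats = -90.0 + res * (np.arange(2160) + 0.5)
day_len = longest_day_hours(lats)  # one value per grid row
```

Under these assumptions the value is 12 h at the Equator, about 18.5 h at 60° N, and saturates at 24 h poleward of the polar circles, which is consistent with the idea that the variable lets the model separate tropical from extratropical cells.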
provide a global mask of forest cover (Shimada et al., 2014), soil moisture differences were purposefully minimized, and thus this dataset is likely of more limited use to predict peatland extent than would otherwise be expected for an L-band radar product (Izumi et al., 2019; Touzi et al., 2018). However, likely due to the significant computational effort required to produce a global L-band product, we are not aware of another product publicly available.
The MODIS Terra net primary productivity product (MOD17A3.055 NPP) is available annually on a 1 km grid (Running et al., 2011). This version of MODIS NPP (v. 5.5) is corrected for issues relating to cloud-contaminated MODIS leaf area index fraction of photosynthetically active radiation (LAI-FPAR) inputs to the MOD17 algorithm. We averaged the data over the available 2000–2015 period.
Vegetation indices are provided by the Suomi National Polar-Orbiting Partnership (S-NPP) NASA Visible Infrared Imaging Radiometer Suite (VIIRS) product VNP13A1, which is generated by selecting the best pixel at 500 m resolution over a 16 d acquisition period. The VIIRS data are generated for three vegetation indices: the normalized difference vegetation index (NDVI), which uses both red and near-infrared (NIR) bands, and two enhanced vegetation indices, EVI, which also includes the blue band, and EVI2, which is designed for intercomparison with other EVI products that do not use a blue band (Table 1). EVI is more sensitive to canopy cover, while NDVI is more sensitive to chlorophyll (Huete et al., 2002). All snow, cloud, or cloud shadow pixels and any pixels that were not excellent, good, or acceptable quality (according to the dataset’s quality flags) were excluded. Given the original data do not have composite monthly values, the mean, minimum (min), maximum (max), and standard deviation (SD) were all calculated based upon all values within a year and then the average was taken across all years.
Vegetation phenology information is provided by the MCD12Q2 V6 Land Cover Dynamics product (Friedl et al., 2019). The MODIS vegetation phenology product provides phenological information such as the dates of green-up, peak, and senescence along with variables related to the range and summation of the EVI (see Table 1). Since this is an annual product, the mean, min, max, and SD values are calculated across all years.
We also considered the global surface water (GSW) dataset of Pekel et al. (2016) but did not include it as a predictor dataset. We found this dataset to be unsuitable for peatland prediction due to its reliance on Landsat imagery. Treed peatlands, peatlands smaller than 30 m by 30 m, and peatlands where the water table is below the peat surface, such as bogs, would not be well captured by GSW. A visual inspection of GSW over some of our training regions (see Sect. 2.3) showed poor correlation between GSW water presence and mapped peatland area.
2.3 Training data
For training and testing the machine learning model, peatland fractional cover was selected as the target variable. However, accurate estimates of peatland fractional cover are not widely
Table 1. Potential peatland covariates used as predictor variables for the ML algorithms to predict peatland fractional cover. The treatment of variables is discussed in Sect. 2.2.1. The predictor variables in bold were selected for the final model (see Sect. 2.4.3).
Type | Source and resolution (time period) | Predictor
Climate | TerraClimate (Abatzoglou et al., 2018); 1/24° (1985–2015) | Actual evapotranspiration^a, climate water deficit^a, soil water^a, potential evapotranspiration (Penman–Monteith), precipitation accumulated, downward surface shortwave radiation, snow water equivalent^a, runoff^a, Palmer Drought Severity Index (PDSI), minimum temperature, maximum temperature, vapour pressure, vapour pressure deficit, 10 m wind speed
Soils | OpenLandMap (Hengl, 2018); 250 m (–) | Soil bulk density, clay content, sand content, soil water content at field capacity (33 kPa), organic carbon content
Terrain | Geomorpho90m (Amatulli et al., 2020); 250 m (–) | Slope, aspect, eastness, northness, convergence index^b, compound topographic index^c, stream power index^d, first and second directional derivatives (east–west, north–south), profile curvature^e, tangential curvature^f, elevation standard deviation, geomorphology landform^g, roughness indices, topographic position index, maximum elevation deviation
Vegetation | PALSAR/PALSAR2 (Shimada et al., 2014); 25 m (2007–2010) | HH^h and HV^i polarization backscattering coefficient
Vegetation | MOD17A3 V055 (Running et al., 2011); 1 km (2000–2015) | Net primary productivity
Vegetation | S-NPP VIIRS vegetation indices (VNP13A1) (Didan and Barreto, 2018); 500 m (2012–2019) | Enhanced vegetation index (EVI)^j, EVI2^k, near-infrared radiation (NIR), shortwave infrared radiation reflectance (SWIR1^l), SWIR2^m, SWIR3^n, normalized difference vegetation index (NDVI), NIR reflectance^o, green reflectance^p, blue reflectance^q, red reflectance^r
Vegetation | MODIS Global Vegetation Phenology (MCD12Q2 V6 Land Cover Dynamics) (Friedl et al., 2019); 500 m (2001–2018) | Dormancy, EVI_Amplitude, EVI_Area, EVI_Minimum, Greenup, Maturity, MidGreendown, MidGreenup, Peak, Senescence
Geographic | Calculated | Length of the longest day of the year in hours
^a Derived using a one-dimensional soil water balance model.
^b Ranges between 100 for sinks (convergent areas) and −100 for ridges (divergent areas). Flat areas are 0.
^c Also known as topographic wetness index (Beven and Kirkby, 1979).
^d Defined as the product of the tangent of the local slope angle and the upstream catchment area.
^e Measures the rate of change of a slope along a flow line. Convex slopes accelerate water flowing along them while concave slopes decelerate flow.
^f Measures the perpendicular rate of change to the slope gradient. This captures the convergence (concave curvature) and divergence (convex curvature) of flow across a surface.
^g For example, flat, spur, valley, calculated using morphometry techniques based on pattern recognition.
^h Horizontal transmit and horizontal receive.
^i Horizontal transmit and vertical receive.
^j Three-band enhanced vegetation index.
^k Two-band EVI using only red and NIR band.
^l 1230–1250 nm. ^m 1580–1640 nm. ^n 2225–2275 nm. ^o 846–885 nm. ^p 545–656 nm. ^q 478–498 nm. ^r 600–680 nm.
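For reference, the three VIIRS vegetation indices listed in Table 1 follow widely used definitions. The formulas in the sketch below are the standard NDVI, three-band EVI, and two-band EVI2 expressions with their usual coefficients. They are not restated in this paper, so they should be read as an assumption about what the VNP13A1 product implements. The function names and the reflectance values are made up for illustration.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Three-band enhanced vegetation index (uses the blue band)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def evi2(nir, red):
    """Two-band EVI that needs only the red and NIR bands."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)


# Hypothetical surface reflectances (unitless, 0-1) for two pixels.
nir = np.array([0.40, 0.30])
red = np.array([0.05, 0.10])
blue = np.array([0.03, 0.05])

indices = {
    "NDVI": ndvi(nir, red),
    "EVI": evi(nir, red, blue),
    "EVI2": evi2(nir, red),
}
```

Because EVI2 drops the blue term, it can be computed for sensors without a blue band, which is the intercomparison role described for it in Sect. 2.2.1.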
available, as discussed in Sect. 1. Recently, Minasny et al. (2019) reviewed the present state of peatland mapping. They found 90 recent studies mapping peatlands, with many delineating peatland extents using ecological and environmental field studies in combination with land cover from remote sensing; however, the studies seldom conduct validation of their mapping, and uncertainty estimates are rare (e.g. Table 4 in Minasny et al., 2019). Additionally, the definition of peat can vary between countries and studies (e.g. Table 2 in Minasny et al., 2019), making assembling an internally consistent global dataset of peatland extents challenging. In selecting peatland extent estimates for our training data, we have chosen studies that are of sufficiently large spatial extent (tens of thousands of square kilometres, but we allow smaller mapping products if they are located in highly under-represented regions), that have attempted to validate their peatland extents, and which are readily available in digital formats. We have acquired peatland extent estimates for 14 major regions (Fig. 2) including Canada, the taiga zone of the West Siberian Lowlands (WSL), Scotland, the Netherlands, the St. Petersburg region of Russia, New Zealand, Tasmania, the Cuvette Centrale in the Congo, Indonesia, the Pastaza-Marañón foreland basin (PMFB) in northeastern Peru, and the peatlands along the Peruvian Rio Madre de Dios, along with some peatland-free regions.
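Several of the regional products used for training report peat depth or a binary peat class rather than fractional cover, so they have to be converted to the target variable on the 5 arcmin grid. The sketch below illustrates one way this could be done, assuming a fine-resolution depth raster whose dimensions are an exact multiple of the target grid and whose pixels have roughly equal area within each coarse cell: pixels deeper than the 30 cm threshold of Sect. 2.1 count as peatland, and the peatland share of each block gives percent cover. The study itself used CDO, GDAL/OGR, and NCO for remapping, so this NumPy version, including the name `fractional_cover`, is illustrative only.

```python
import numpy as np


def fractional_cover(depth_cm, factor, threshold_cm=30.0):
    """Percent peatland cover on a grid `factor` times coarser than the input.

    depth_cm     : 2-D array of peat depth (cm) on the fine grid
    factor       : number of fine pixels per coarse cell along each axis
    threshold_cm : minimum peat thickness for a pixel to count as peatland
    """
    depth = np.asarray(depth_cm, dtype=float)
    rows, cols = depth.shape
    if rows % factor or cols % factor:
        raise ValueError("raster shape must be a multiple of the factor")
    # Binary map: 1 where peat is thicker than the threshold, 0 elsewhere.
    is_peat = (depth > threshold_cm).astype(float)
    # Group fine pixels into factor x factor blocks and average each block.
    blocks = is_peat.reshape(rows // factor, factor, cols // factor, factor)
    return 100.0 * blocks.mean(axis=(1, 3))


# Hypothetical 4 x 4 depth raster aggregated to a 2 x 2 grid.
depth = np.array([[0, 45, 10, 0],
                  [60, 35, 0, 0],
                  [0, 0, 31, 30],
                  [0, 0, 80, 5]])
cover = fractional_cover(depth, factor=2)
```

In this toy example the upper-left block has three of four pixels above the threshold (75 % cover), while the lower-right block has two (50 %), because a depth of exactly 30 cm does not exceed the threshold. Missing-data handling is omitted for brevity.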
Figure 2. Training data for the LightGBM algorithm. Areas in white have no data. The green letters denote the blocks used for the cross-validation scheme. The training block limits were chosen as described in Sect. 2.4.2.
Peatland coverage data for Canada, which has ca. 13 % of the land surface covered with peatlands, come from Ducks Unlimited Canada (hereafter DUC; Smith et al., 2007) and The Peatlands of Canada database (Tarnocai et al., 2011). Both datasets defined peatlands as wetlands (bogs, fens, swamps, or marshes) with massive deposits of peat at least 40 cm thick, as is the convention in Canada. The Peatlands of Canada database was primarily derived from soil surveys and air photo interpretation. Shapefiles were available with information on bog, fen, and bog–fen features with ≥ 1 % peat coverage (Tarnocai et al., 2011). The DUC dataset covers the 74.1 × 10⁴ km² Boreal Plains region and was derived from a satellite-based remote sensing classification system validated by 5034 field sites (Smith et al., 2007).
The peatlands of the taiga zone of the West Siberian Lowlands (WSL) are estimated by Terentieva et al. (2016) to cover 52.4 × 10⁴ km², or 4 %–12 % of the global wetland area. To conduct this mapping, Terentieva and co-workers used a supervised classification scheme for Landsat imagery that was trained on field data and high-resolution images from 28 test sites. They estimate their accuracy at 79 % based on 1082 10 × 10 pixel size validation polygons.
The St. Petersburg region of Russia was mapped by Pflugmacher et al. (2007) using MODIS Nadir bidirectional reflectance distribution function adjusted reflectance (NBAR). The MODIS-NBAR reflectances were combined with empirical regression models to determine sub-pixel peatland coverage. To fit the models, Pflugmacher et al. (2007) drew upon forest inventory data for observed peatland fractional cover over 1105 MODIS pixels with half used for model fitting and half for validation. Error analysis showed good prediction capability with correlation with observations of r = 0.92 for unmined peatlands. Pflugmacher et al. (2007) found the region to have approximately 10 % peatland cover.
The Finnish Geologic Survey superficial deposits 1 : 200 000 map displays peat deposits at 0–30, 30–60, and > 60 cm depth (Geological Survey of Finland, 2018). The dataset was created through air photo interpretation and field mapping with the smallest polygon size about 6 ha.
A database for the peatlands of Scotland was recently published by Aitkenhead and Coull (2019). Peatland cover was determined using back-propagation neural networks trained with peatland survey, climate, topography, Landsat imagery, geologic, and land cover data. Aitkenhead and Coull (2019) report an r² of 0.67 for peat depth, which we used to determine peatland fractional cover. Peatlands were assumed to have > 30 cm peat, and pixels with peat deeper than that were assigned 100 % peatland cover and 0 % elsewhere.
The Derived Irish Peat Map version 2 (DIPMv2) (Connolly and Holden, 2009) was compiled from the land cover and soil maps of Ireland using a rules-based decision tree methodology. Connolly and Holden (2009) estimate the overall accuracy of DIPMv2 to be 85 %. From the DIPMv2, we included raised bogs, low-level Atlantic blanket bogs, and high-level montane blanket bogs in producing a peatland cover map.
Wageningen Environmental Research recently updated the Soil Map of the Netherlands (1 : 50 000 scale) including peat depth using a combination of boreholes and ordinary kriging (Brouwer et al., 2018; Brouwer and Walvoort, 2019). For each region, a number of boreholes were not used in calibration of the kriging model (roughly 10 %) and retained for evaluation. Based on evaluation against the validation borehole subset, the average peat depth error varied between regions but was commonly between 10 and 20 cm. We used the peat depth to delineate peatland area based on a threshold of 30 cm where thicknesses greater than that were assumed to be 100 % peatland and 0 % elsewhere.
Draper et al. (2014) mapped peatlands for a region of Amazonia in northwestern Peru (the Pastaza-Marañón foreland basin; PMFB). A support vector machine (SVM) classifier was trained with Landsat, ALOS/PALSAR, and Shuttle Radar Topography Mission (SRTM) elevation data. Along with forest census plots and peat thickness measurements, a supervised classification method was used to train the SVM and determine the distribution of peatland vegetation types, as well as above- and below-ground carbon stocks. The three peat-forming vegetation types were pole forest, palm swamp, and open peatlands.
The Cuvette Centrale is located in the central Congo basin. Dargie et al. (2017) used a digital elevation model (DEM) to remove steep slopes and high ground, optical data (Landsat Enhanced Thematic Mapper, ETM+) to identify probable swamp vegetation, which we used as a proxy for peatland fractional coverage, and radar backscatter (L-band synthetic aperture radar; ALOS PALSAR) to identify surface water under forest cover. Together these approaches were used to produce a maximum likelihood tree. They then conducted nine transects of length 2.5 to 20 km to ground truth the data. Most peatlands in this region are located within large interfluvial basins and are largely rain-fed and ombrotrophic. The areal
extent of peat in the Cuvette Centrale was estimated to be 14.6 × 10⁴ km² (Dargie et al., 2017).
Indonesian peatlands have been mapped by Wetlands International in a series of publications (Wetlands International, 2003, 2004, 2006). The maps have been derived from regional-scale maps and project reports, soil maps, Landsat imagery, and ground truthing. This dataset uses a 30 cm threshold of peat thickness to delineate peatlands.
National maps of New Zealand peatlands were derived from the Fundamental Soil Layers (FSL) soil maps published at 1 : 50 000 scale by the New Zealand Land Resource Inventory (NZLRI; Landcare Research NZ Ltd, 2000). The polygons in the FSL maps were manually created from aerial photograph analysis with ground truthing. Peatlands were selected by choosing the organic soils class.
Organic soil and peat mapping was undertaken by the Department of Natural Resources and Environment, Tasmania, to provide decision support for fire management and suppression activities in the Tasmanian Wilderness World Heritage Area (Kidd et al., 2021). A DSM approach was used to predict organic soil and peat areas using new and existing soil site data, intersected with a range of environmental predictor datasets, which included vegetation mapping, legacy soil mapping, wetlands, digital elevation models, terrain derivatives, remote sensing (multispectral green or bare areas, gamma radiometrics, Sentinel RADAR), and climate. A binary “presence–absence” calibration set of site data was used to create a digital map index (0–1). Modelling was undertaken using regression trees with 10-fold cross-validation, where spatial output values closer to “1” were deemed to be meeting the environmental conditions conducive to peat formation. The organic soil extent modelling R² calibration and validation values were 0.77 and 0.70, respectively. Map validation by expert review determined that spatial index values > 0.75 were highly likely to be peat (or organic) soils (Kidd et al., 2021).
Peatlands along the Rio Madre de Dios in Peru were mapped by Householder et al. (2012) using Landsat imagery and field observations. They identified 295 peatlands from
gions in Sect. 3.3. A further region of zero peatland extent was defined according to a map of soil organic carbon for the Casanare flooded savannas of Colombia (Martín-López et al., 2022) and expert opinion based upon field observations. We also set peatland area to zero for any pixels that are ice covered as shown in the Global Land Ice Measurements from Space (GLIMS) dataset (GLIMS and NSIDC, 2018).
2.4 Machine learning approach
2.4.1 LightGBM and hyperparameter optimization
The statistical modelling was conducted in the Python programming language (v. 3.8.3). We use a gradient boosting decision tree (GBDT) algorithm called LightGBM (Ke et al., 2017). Decision tree algorithms make iterative splits to partition data according to different criteria. The decision tree will split each node at the feature with the largest information gain, i.e. the most informative. For GBDTs, the information gain is usually measured by the variance after splitting. To avoid issues with overfitting of a decision tree, GBDT algorithms use the boosting technique, which combines multiple decision trees in series to achieve better predictive power as each tree in the series attempts to minimize the errors in the previous tree. The error minimization steps occur through a form of gradient descent in function space where each tree is trained on a residual vector that measures the magnitude and direction of the true target relative to the previous tree (loss function), which successive iterations minimize.
2.4.2 Cross-validation approach
To provide estimates of the error associated with the LightGBM predictions, we adopted a blocked-leave-one-out (BLOO) strategy, which is recommended for applications where the predictors could be expected to exhibit spatial autocorrelation (Roberts et al., 2017; Meyer et al., 2019; Ploton et al., 2020). BLOO tends to produce estimates of prediction error that are closer to the “true” error (Roberts et al., 2017),
remote-sensing imagery covering 294 km2 and from 0.1 to particularly in cases where the sampling strategy is clustered
35.0 km2 in size. Field verification was performed at 35 peat- (Rocha et al., 2021). We chose to block our cross-validation
lands giving over 800 georeference validation data points. (CV) regions based on longitudinal limits to allow both bo-
To increase the number of cells for model training and also real and tropical peatlands to potentially be represented in
improve representation of peatland-free landscapes, we in- each block. The optimal number of training blocks is an im-
cluded polygons of ecoregions that should contain little to portant determination. Choosing blocks that are too small
no peatlands from Olson et al. (2001), thus all areas in these risks incorrectly increasing our CV-determined model ac-
ecoregions and biomes were considered to have zero peat- curacy due to spatial autocorrelation issues, while choosing
land extent. The ecoregions chosen were the global distribu- overly large blocks will result in information loss and wors-
tion of the Desert and Xeric Shrublands biome, excluding 15 ens our assessed model accuracy unduly. We determine the
ecoregions that had a non-zero peatland extent within at least optimal number of blocks by comparing the length scale of
one grid cell according to PEATMAP. This was to ensure we autocorrelation of the model residuals with our block sizes.
take a conservative approach to the use of these non-peatland Figure A1 shows the autocorrelation tends to zero at a length
masks. Two South American ecoregions (Beni Savanna and scale (sill) of around 500 km. To accommodate this we set a
the Rio Negro Campinarana; Fig. 2) were also included as minimum block size of 10◦ of longitude (which corresponds
peat-free regions. We discuss the inclusion of these ecore- to roughly 500 km at 65◦ latitude). Based on the constraints
4716 Joe R. Melton et al.: Machine-learning-based peatland extent
Table 2. Training data (regional peatland mapping products) for the machine learning model.
Region | Source | Peatland determination technique
Boreal Plains of Canada | Ducks Unlimited Canada (Smith et al., 2007) | Satellite imagery with > 5000 site visits
Rest of Canada | Tarnocai et al. (2011) | Primarily from soil surveys and air photo interpretation
West Siberian Lowlands (taiga zone) | Terentieva et al. (2016) | Supervised classification of Landsat trained on field data
St. Petersburg region (Russia) | Pflugmacher et al. (2007) | Regression models from MODIS-NBAR reflectance
Finland | Geological Survey of Finland (2018) | Field mapping and air photo interpretation
Scotland | Aitkenhead and Coull (2019) | Neural networks trained with survey data and covariates
Ireland | Connolly and Holden (2009) | Rules based decision tree with land cover and soil maps
Netherlands | Brouwer et al. (2018), Brouwer and Walvoort (2019) | Ordinary kriging with boreholes for calibration and evaluation
Amazonia* | Draper et al. (2014) | SVM supervised classification using elevation, optical, and radar remote-sensing data
Congo basin (Cuvette Centrale) | Dargie et al. (2017) | Combination of DEM, Landsat ETM+, and ALOS PALSAR along with ground truthing transects
Indonesia | Wetlands International (2003, 2004, 2006) | Collation of regional maps, soil surveys, Landsat imagery verified by ground truthing
New Zealand | Landcare Research NZ Ltd (2000) | Collation of regional maps and soil surveys
Tasmania | Kidd et al. (2021) | ML with terrain, vegetation mapping, and satellite spectra covariates including seasonal Sentinel-1 coverage
Rio Madre de Dios (Peru) | Householder et al. (2012) | Landsat imagery with field mapping
* Pastaza-Marañón foreland basin (PMFB) in northwestern Peru
of our minimum block size and the need for a roughly even number of training cells in each block, we end up partitioning the globe into 14 blocks as shown in Fig. 2. The CV was performed by holding out one block, training the LightGBM algorithm over the other blocks, and then using that trained model to predict the peatland extent over the held-out block. This was performed for each block in turn and the results averaged to give an estimate of the prediction error.
2.4.3 Predictor selection and model optimization
From the potential peatland covariates listed in Table 1, and discussed in Sect. 2.2.1, we processed 163 global peatland features that could be used by the machine learning model. However, it is likely that many of these predictors will have low predictive power and duplicate information provided by other predictors, leading to over-fitting by the ML algorithm (Dormann et al., 2013). To select only the most relevant features we used both iterative feature removal based on the calculated multicollinearity and recursive feature elimination with cross-validation (RFECV) (Pedregosa et al., 2011), which is a form of backward feature elimination.
Multicollinearity was accounted for by using the calculated variance inflation factor (VIF) to identify and remove highly correlated variables (Alin, 2010). VIF uses ordinary least-squares regression to determine collinearity with the score determined by
$$\mathrm{VIF} = \frac{1}{1 - R_j^2}, \qquad (1)$$
where $R_j^2$ is the multiple coefficient of determination for the feature j on the other features (covariates) defined as the ratio between the sum of squares due to the regression (SSR) and the total sum of squares (SST),
$$R_j^2 = \frac{\mathrm{SSR}}{\mathrm{SST}}. \qquad (2)$$
One approach would be to simply set a threshold VIF value and remove all features with VIF values higher than this threshold in a single step. However, in order to avoid the
elimination of potentially important features, we chose instead to conduct the exclusion process iteratively. First, each feature had its VIF score calculated. Then all features with a VIF value higher than 5 (corresponding to an $R_j^2$ of 0.8) were ranked according to their information gain calculated by LightGBM, and the feature with the lowest gain was removed. The model was then retrained and the VIF value recalculated. If features remained that had a VIF above the threshold value, the same ranking and removal would occur until all remaining features had a VIF value below threshold. This step retained 30 features (listed in Table A1). The VIF value chosen is quite stringent, well below what Dormann et al. (2013) suggest as a critical value (10).
We use RFECV with BLOO CV (using the same blocks as described in Sect. 2.4.2) in an iterative manner to ascertain the optimal number of features. RFECV first trains the LightGBM algorithm on the original number of features (here 30) with the features ranked for their importance, based on information gain, for the model’s root-mean-square error (RMSE) as determined by the BLOO CV. The least important feature is removed and the model is retrained using the new subset of features. By retraining the model after each feature is held out, we avoid the issue of extrapolation that can occur in permutation-based approaches (as described in Hooker et al., 2021). The algorithm can then produce an estimate of model skill as a function of the number of features trained (Fig. A2). The RFECV algorithm will choose an optimal number of features based on the greatest model skill. Based on Fig. A2, 16 features (highlighted in Table 1) were selected as the optimal number to retain for the optimization process and the final model.
GBDT algorithms tend to require hyperparameter tuning to ensure the model is performing optimally. We employed Bayesian optimization on 11 LightGBM hyperparameters (Table A2) using the hyperopt package (Bergstra et al., 2015) over 500 trials. In each trial, the final 16 predictors identified in the steps above were used in the LightGBM model to optimize the model’s calculated RMSE based upon the BLOO CV. The optimized parameters were then used to generate the Peat-ML map.
3 Results and discussion
3.1 Predictor importance
The top 10 predictors based on information gain as determined by the LightGBM algorithm are shown in Fig. 3. Based on the full LightGBM model runs (hereafter Peat-ML), the most informative feature is the geomorphological landform (e.g. flat, spur, valley, peak), which is calculated using morphometry techniques based on pattern recognition (Amatulli et al., 2020). The next most important predictor is terrain slope, defined as the rate of change in elevation along the direction of the water flow line (Amatulli et al., 2020). The third and fourth most important variables are soil organic carbon at 30 cm depth and shortwave infrared radiation reflectance at 2225–2275 nm (SWIR3). The remaining less important features (≲ 5 %) relate to climate (DJF snow water equivalent, MAM vapour pressure, DJF shortwave radiation and wind speed, SON runoff, and TerraClimate-derived DJF soil water).
Minasny et al. (2019) suggest the indicators of peatland presence on a regional to global scale are climate data (primarily temperature and precipitation); land use and land cover information; and elevation, slope, and terrain attributes. Slope has also been used in several terrestrial ecosystem models as a means to predict wetland areas; i.e. the flatter a region, the more likely water will stagnate, allowing wetland formation (e.g. Kaplan, 2002; Arora et al., 2018). Interestingly, the top two predictors are important components of the Kaplan (2002) wetland determination scheme. The geomorphological features appear to provide further information about the land surface characteristics that can allow peatland formation distinct from that of slope alone. The importance of the SOC variables demonstrates the close relationship between SOC and peat soils, as has been exploited for peatland mapping in the past (e.g. Hugelius et al., 2020). The importance of SWIR3 likely reflects its utility in determining wet earth from dry earth and providing information about the vegetation water status. SWIR3 is particularly useful as a feature as it can help delineate fens, as otherwise the ML model lacks a predictor of groundwater contributions to surface water, as well as peatlands from uplands in general, as SWIR reflectance is generally sensitive to soil moisture, soil type, and vegetation leaf area index and water content (Wang et al., 2008; Tian and Philpot, 2015). Of the climate predictors, DJF SWE and DJF shortwave radiation could have been used by the ML model to distinguish boreal from tropical peatlands. Vapour pressure may also have some utility in determining peatlands due to the differing evapotranspiration response of peatlands from upland forests (Helbig et al., 2020). In general, however, all the climate variables were of relatively small importance, with roughly 5 % or less importance as measured by information gain.
Figure 3 also shows the feature importance as found by the BLOO CV for each block (whereby each block in the figure shows the feature importance ranking when that block was not trained upon for the CV). Looking at feature importance broken down in this manner reveals some remarkable consistency in some predictors, e.g. relatively low importance predictors (< 10 %) remain consistently less important. Other features have highly variable importance, principally slope, geomorphon, and SOC at 30 cm. These three variables can switch order of importance when trained to exclude certain training blocks during the BLOO CV. When trained with all training data (full model; black diamonds in Fig. 3), predictor importance is generally close to the middle of the range set by the blocks from the BLOO CV, excluding some
Figure 3. Predictor importance based on percent information gain for the top 10 features as determined by the LightGBM algorithm.
The feature ranking is shown for each of the blocks used during the BLOO CV (coloured dots; see Sect. 2.4.2). The feature importance
from the full model simulation is shown by the black diamonds. SWIR3 is the shortwave infrared radiation reflectance for 2225–2275 nm,
geomorphon is the geomorphological landform, SWE is the snow water equivalent, and SOC is soil organic carbon at 30 cm depth. See
Table 1 and Sect. 2.2.1 for more details.
of the more minor features such as SON runoff or DJF wind speed. This demonstrates that, given there are only 14 blocks, excluding training data as part of the BLOO CV can have relatively large consequences, especially as each peatland region has its own particular characteristics as evidenced by the changing predictor importance. For example, the Cuvette Centrale, western Amazonia, and tropical islands of Asia all appear to differ significantly regarding characteristics such as peat depth, structure, carbon density, etc. (see Table 1 in Dargie et al., 2017).
3.2 Predicted peatland extents
3.2.1 Global
Global peatland extent as predicted by Peat-ML is shown in Fig. 4. When Peat-ML is compared to PEATMAP (Xu et al., 2018), many major peatland regions appear similar including Canada, the WSL, the Cuvette Centrale of the Congo, and parts of Scandinavia. However, the two maps also differ substantially. The regions with the most notable difference between the two products include Alaska, parts of Africa excluding the Congo, and eastern Siberia. There are more intermediate peatland extents predicted by Peat-ML, whereas PEATMAP tends to show more regions of 100 % peatland extent with less gradation between peatlands. Our estimated global peatland extent at 4.04 × 10^6 km^2 is similar to the PEATMAP estimate of 4.23 × 10^6 km^2 (Table 3).
Our Northern Hemisphere (> 23° N) estimate of 3.0 million square kilometres is lower than the other available estimates including PEATMAP (3.2 million square kilometres), the lower bound of Hugelius et al. (2020) (3.2 million square kilometres), and an older estimate of Gorham (1991), but it is at the lower bound suggested by Loisel et al. (2017). In the tropics, our model estimate is roughly the same as PEATMAP but only a little over half of the extent estimated by Gumbricht et al. (2017). The Gumbricht et al. (2017) map was produced through a hybrid approach that uses hydrological modelling, remote-sensing products, hydro-geomorphology from topographic data, and expert assessment. It is only available across the tropics (maximum 40° N).
The spatial distribution of the predicted peatlands will now be examined in detail. We focus on regions that have either multiple other peatland mapping products for comparison or contain large areas of predicted peatlands.
3.2.2 Boreal peatlands: Europe and Russia
Figure 5 shows the peatland extent in the WSL, western Russia, and parts of Scandinavia for Peat-ML, its training data, PEATMAP, Hugelius et al. (2020), and the Boreal–Arctic Wetland and Lake Dataset (BAWLD) (Olefeldt et al., 2021). The Hugelius et al. (2020) dataset is derived from the mean of two soil datasets and is only available for the Northern Hemisphere (> 23° N). The BAWLD product is derived from expert assessment that is then extrapolated through the use of random forest models and geospatial datasets across the boreal and Arctic regions. The original spatial resolution is relatively coarse at 1° by 1°. For the WSL region, all four products are similar, with only slight differences in the peatland fractional cover (rather than its spatial distribution). Peat-ML shows strong similarity with its training data as would be expected. PEATMAP stands out compared to the other maps
Figure 4. Global peatland extent as estimated by Peat-ML along with PEATMAP (Xu et al., 2018).
Table 3. Peatland extents as estimated by Peat-ML alongside other literature estimates.
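Region | Source | Peatland extent (km^2)
Global | Peat-ML | 4.04 × 10^6
Global | PEATMAP | 4.23 × 10^6
Northern Hemisphere (> 23° N) | Peat-ML | 3.00 × 10^6
Northern Hemisphere (> 23° N) | Gorham (1991) a | 3.46 × 10^6
Northern Hemisphere (> 23° N) | Loisel et al. (2017) b | 3.0–3.5 × 10^6
Northern Hemisphere (> 23° N) | PEATMAP | 3.19 × 10^6
Northern Hemisphere (> 23° N) | Hugelius et al. (2020) | 3.7 ± 0.5 × 10^6
Tropics (23.5° S–23.5° N) | Peat-ML | 0.96 × 10^6
Tropics (23.5° S–23.5° N) | PEATMAP | 0.94 × 10^6
Tropics (23.5° S–23.5° N) | Gumbricht et al. (2017) | 1.70 × 10^6
Canadian Boreal Plains | Peat-ML | 185 × 10^3
Canadian Boreal Plains | DUC | 186 × 10^3
Canadian Boreal Plains | PEATMAP c | 185 × 10^3
Canadian Boreal Plains | Hugelius et al. (2020) | 164 × 10^3
Canadian Boreal Plains | Webster et al. (2018) | 269 × 10^3
a Boreal and subarctic peatlands.
b Suggested best estimate for modern peatland area. Includes a summary of other estimates which range between 2.4 and 4.0 × 10^6 km^2.
c Here PEATMAP’s underlying data source is Tarnocai et al. (2011).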
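The comparisons drawn from Table 3 reduce to simple ratios, which can be checked with a short script. The following is a minimal sketch only: the values are transcribed from Table 3 as point estimates (the Loisel et al. (2017) range and the Hugelius et al. (2020) uncertainty are left out), and the dictionary and function names are illustrative assumptions, not part of any Peat-ML code.

```python
# Peatland extents in km^2, transcribed from Table 3 (point estimates only).
TABLE3_KM2 = {
    "Global": {"Peat-ML": 4.04e6, "PEATMAP": 4.23e6},
    "Tropics": {
        "Peat-ML": 0.96e6,
        "PEATMAP": 0.94e6,
        "Gumbricht et al. (2017)": 1.70e6,
    },
    "Canadian Boreal Plains": {
        "Peat-ML": 185e3,
        "DUC": 186e3,
        "PEATMAP": 185e3,
        "Hugelius et al. (2020)": 164e3,
        "Webster et al. (2018)": 269e3,
    },
}


def percent_difference(value, reference):
    """Signed difference of `value` from `reference`, in percent of `reference`."""
    return 100.0 * (value - reference) / reference


def compare_with_peat_ml(region):
    """Percent difference of Peat-ML from every other source listed for `region`."""
    extents = TABLE3_KM2[region]
    peat_ml = extents["Peat-ML"]
    return {
        source: percent_difference(peat_ml, extent)
        for source, extent in extents.items()
        if source != "Peat-ML"
    }


for region in TABLE3_KM2:
    for source, diff in compare_with_peat_ml(region).items():
        print(f"{region}: Peat-ML vs {source}: {diff:+.1f} %")
```

With these numbers the Peat-ML tropical extent is about 56 % of the Gumbricht et al. (2017) estimate, consistent with the "a little over half" description in Sect. 3.2.1.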
Figure 5. Maps of eastern European and Russian peatlands including (a) training data used by the ML model; (b) Peat-ML-predicted
peatlands; and the peatland coverage from (c) PEATMAP (Xu et al., 2018), (d) Hugelius et al. (2020), and (e) the Boreal–Arctic Wetland
and Lake Dataset (BAWLD; Olefeldt et al., 2021).
due to its almost binary peatland coverage showing either high values or no peatlands with little gradation in between. Compared to Hugelius et al. (2020), Peat-ML shows fewer peatlands in the northern edge of the Northwestern region of Russia but more by the White Sea. Neither PEATMAP nor Peat-ML shows peatlands near the mouth of the Kara River to the northwest of the terminus of the Ural Mountains, as evident in Hugelius et al. (2020) and BAWLD, while Peat-ML and BAWLD show few peatlands on the Yamal Peninsula, where both PEATMAP and Hugelius et al. (2020) suggest appreciable extents. Generally, Peat-ML has more similarity to PEATMAP than to Hugelius et al. (2020) and BAWLD over the western Russian domain.
All maps show relatively similar distributions of peatlands surrounding the Baltic Sea (Fig. 5). None of the other maps indicate peatlands by the Caspian Sea as seen in PEATMAP, except some small extents (1 %–3 % predicted by Peat-ML) to the northwest of those depicted in PEATMAP.
As in Eastern Europe, PEATMAP shows a more binary representation of the peatland extent in Western Europe compared to the other maps (Fig. A5). Peat-ML and Hugelius et al. (2020) have fairly similar peatland distributions and extents. The main differences are expressed in small pockets of peatlands, e.g. eastern Spain has scattered peatlands in Hugelius et al. (2020) that are not found in Peat-ML or PEATMAP, whereas in western Hungary both
Hugelius et al. (2020) and PEATMAP show small peatlands not predicted to be as extensive by Peat-ML.
3.2.3 Boreal peatlands: Canada and Alaska
For the northern contiguous USA, for Canada, and for Alaska, peatland extents are shown in Fig. 6. Alaskan peatlands predicted by Peat-ML have some similarity to the Hugelius et al. (2020) map and BAWLD, with extensive peatlands in western Alaska (Lower Yukon region). These peatlands are not evident in PEATMAP, which shows less extensive but high-coverage peatlands along the southern and eastern edges of the state. Peat-ML, Hugelius et al. (2020), and BAWLD predict peatlands along the Alaska North Slope that are not evident with PEATMAP. Other reports suggest extensive wetlands in Alaska (e.g. Glass, 1992), but we are not aware of any mapping product detailing peatland-specific coverage.
For Peat-ML, the Canadian peatlands from Tarnocai et al. (2011) and DUC (Smith et al., 2007) are used as training data, which naturally gives good correspondence between Fig. 6a and c. For a more informative comparison of the general model skill for boreal peatland regions, Peat-ML predictions from the BLOO CV simulation are also shown, as this would give some indication of predictions without the benefit of training upon a particular region’s peatlands (Fig. 6b). Generally, all datasets shown in Fig. 6 display some strong similarities, with large peatlands shown for the Hudson’s Bay Lowlands (HBL), the Mackenzie Delta, and across the Boreal Plains, yet important differences are also visible. Webster et al. (2018) shows little peatland along the southern edge of the Hudson’s Bay, perhaps due to their peatland determination model’s emphasis on treed peatlands. Webster et al. (2018) also show generally higher peatland coverage where peatlands are present than the other datasets. Hugelius et al. (2020) predicts extensive but relatively low coverage across much of the Canadian eastern Arctic that is not found in any of the other peatland maps. Of the five peatland maps, the most closely corresponding peatland extents appear to be between PEATMAP, BAWLD, and Peat-ML.
The northern USA has some peatlands around the Great Lakes evident in PEATMAP and Hugelius et al. (2020) (∼ 10 %–60 %), which are also predicted but appear less extensive in Peat-ML (usually ∼ 1 %–15 %). Besides the coverage differences, the products have a similar spatial extent, although PEATMAP’s peatlands are more commonly higher coverage per identified peatland.
3.2.4 Tropical peatlands: South America and Central America
South American peatlands are shown in Fig. 7. Peat-ML peatland training data for this region (Fig. 7a) are currently limited, encompassing only Peru’s Pastaza-Marañón foreland basin (PMFB) and the Rio Madre de Dios. In early simulations with Peat-ML and maps from other modelling processes (e.g. Gumbricht et al., 2017), we noticed predictions for high peatland coverage in areas of South America where peat is not known to occur. This includes seasonally flooded savannas, such as the Llanos de Moxos (Beni Savanna) and Llanos Orientales of Colombia and Venezuela. A recent field expedition searching for peat in the Colombian Llanos failed to discover any peat deposits (Martín-López et al., 2022), which could indicate that these tropical savanna biomes are generally not able to form extensive peat deposits. Additionally, white sand ecosystems are not known to support extensive peatlands, and thus we also excluded the Rio Negro Campinarana ecoregion that corresponds with white sandy soils (Spodosols/Podzols) and not Histosols. Without these negative data, we would likely overpredict peat extent in South America rather severely.
Peat-ML predicts an extensive peatland in the PMFB and central Amazonia. The extent of peatlands in this region is lower than in PEATMAP, mainly due to the generally lower extent per grid cell, despite being in broadly similar regions. Both PEATMAP and Peat-ML show peatlands along the northeastern coast of the continent. Peat-ML predicts smaller peatland extents (generally < 10 %–15 % coverage) in the Pantanal and along the Paraguay River as it joins the Paraná River down to the Rio de la Plata, which are not evident in PEATMAP.
There are some non-peatland river floodplains that Peat-ML characterizes as peatlands, such as Colombia’s Rio Guaviare. This river may be too dynamic to allow extensive peat formation due to relatively rapid meandering that would scour away peat-forming depressions faster than the organic matter can accumulate or else bury potential peat with mineral sediments from the Andes (Junk, 1982). Given the lack of an appropriate predictor for these hydro-geomorphological processes operating on decadal to centennial timescales, it is not surprising that Peat-ML may overestimate peat extent in these ecosystems. Other areas, like Colombia’s Amazon catchment region, might be susceptible to similar processes as these regions are suggested to be floodplain forests in Ricaurte et al. (2017); however, their map is based on the CORINE Land Cover data for Colombia (IDEAM, 2010). Other areas in Colombia where Peat-ML predicts peatlands include parts of the Orinoco catchment region, where Ricaurte et al. (2017) shows flooded grassland savannas and riparian wetlands, and the Caribbean catchment region, where peatlands are indicated by CORINE with other wetland types. Given that the CORINE land cover product is based upon remote sensing with little ground truthing, it is possible that several of these wetland regions shown in Ricaurte et al. (2017) are actually peat-forming regions, making it difficult to definitively evaluate Peat-ML against this dataset. Besides the occasional small peatland area (e.g. in the Páramo of Ecuador; Hribljan et al., 2017), there are few sources of high-quality peatland mapping products for South America to evaluate Peat-ML against. While Peru has the
Figure 6. Peatland extents for Canada, the northern contiguous USA, and Alaska for (a) Peat-ML, (b) Peat-ML from the BLOO CV, (c) the
training data used for the ML model, and four other peatland extent products (d–f).
PMFB mapped by Draper et al. (2014) and the Rio Madre de Dios by Householder et al. (2012) and is proposed to have extensive peatlands by Gumbricht et al. (2017) and Peat-ML, there is presently no national peatland inventory (López Gonzales et al., 2020).
Peat-ML predicts more peatland extent than PEATMAP in Central America (Fig. A6). Many of the predicted peatlands are close to coastlines, in particular along the Atlantic coasts of Mexico, Nicaragua, Costa Rica, and Cuba. Peat-ML places more extensive peatlands on the Yucatán Peninsula of Mexico, which is not evident in PEATMAP. A desk-based assessment of peatlands based upon cartographic approaches with solicited expert assessment shows similar distributions of peatland extent, but with fewer peatlands in the Yucatán (Peters and Tegetmeyer, 2019). The Yucatán Peninsula has relatively extensive marsh and mangrove coastal wetlands but is a karstic landscape with a highly permeable carbonate substrate (Adame et al., 2013), suggesting Peat-ML is overestimating peat extent for the inland portions of the peninsula.
Figure 7. South American peatland extents. Panel (a) shows Peat-ML training data, panel (b) shows Peat-ML-predicted peatland coverage,
and panel (c) shows PEATMAP (Xu et al., 2018), which is taken from Gumbricht et al. (2017) for this region.
3.2.5 Tropical peatlands: Africa and the Indonesian Papua New Guinea, Brunei, and Malaysia are entirely model-
Archipelago predicted areas (Fig. A7). While Malaysian peatlands ap-
pear similar between Peat-ML and PEATMAP, Papua New
African peatlands (Fig. 8) are also poorly mapped, making it Guinea is quite different. PEATMAP shows extensive peat-
difficult to evaluate the Peat-ML results. There are notable lands in the central mountainous region of the country, while
differences between Peat-ML and PEATMAP. PEATMAP Peat-ML has the peatlands placed in the surrounding lowland
shows very few peatlands outside the Congo’s Cuvette Cen- regions. There is some indication that the mountainous re-
trale, whereas Peat-ML has relatively extensive peatlands in gions should have extensive peatlands (Hope, 2015). These
South Sudan along the border of the Central African Re- peatland complexes appear to be sufficiently different from
public and Chad. This is in general agreement with more the Peat-ML training data that the ML model is unable to
qualitative African peatland extent estimates (Grundling and predict them.
Grootjans, 2016) and demonstrates Peat-ML’s ability to rea-
sonably determine peatland extents in regions where reliable spatially explicit mapping products are absent. Regardless, Peat-ML may still be underestimating African peatlands due to a lack of appropriate training data. An example is the newly documented peatlands in the Okavango Delta (Gelinas, 2018), which have a dominantly herbaceous vegetation cover (sedges, papyrus, grasses), while our only training dataset for Africa is a swamp forest (Cuvette Centrale). Future iterations of Peat-ML may profit from some active mapping campaigns presently underway in East Africa (Alexandra Barthalmes, personal communication, 2021) that could provide much needed training data and thereby improve predictions for the peatland regions of Africa. Improving understanding of African peatland extents will likely remain challenging, however, due to land use pressures that may complicate peatland identification and mapping. As suggested by Grundling and Grootjans (2016), African peatlands are heavily utilized by rural populations that depend on the peatland’s water and organic soils for crop cultivation.

While much of the Indonesian Archipelago contains training data for the ML algorithm, the neighbouring states of

3.3 Model quality estimation

Besides the qualitative discussion above, we estimated the quality of our predicted peatland extent through two different approaches. First, we compared our model results against the training data detailed in Sect. 2.3. For this analysis, we performed a BLOO CV as described in Sect. 2.4.2. Peat-ML (CV) accuracy had an r² of 0.72, a mean bias error (MBE) of −0.29 %, and an RMSE of 9.11 % (Fig. 9b). The model performance across each of the 12 training blocks can be seen in Figs. A3 and A4. While the mean r² across all training blocks was 0.72, it ranged from a low of 0.20 (predicting for block F in the BLOO CV in Fig. 2) to a high of 0.88 (block E). One caveat of our error estimation presented here is that we are computing it based upon the datasets used for model training. If these datasets themselves have errors or omissions, as would be expected, then this will diminish the accuracy of our error estimation, as well as the quality of the ML model itself, since they form the benchmark that Peat-ML is compared against.
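The accuracy metrics quoted in this section (r², MBE, RMSE), together with the explained variance score defined in the caption of Table 4, can be illustrated with a short Python sketch. This is not the authors' code (that is archived on Zenodo). The function name `evaluate` and the toy values are hypothetical, r² is assumed here to be the squared Pearson correlation, and bias is taken as prediction minus observation so that a negative MBE indicates underestimation, as in the text.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _variance(values):
    # Population variance (sigma squared).
    m = _mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def evaluate(observed, predicted):
    """Return r2, MBE, RMSE and explained variance for paired values.

    Assumptions (not taken from the paper): r2 is the squared Pearson
    correlation, and errors are defined as predicted minus observed.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally sized sequences with >= 2 values")
    var_o = _variance(observed)
    var_p = _variance(predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("r2 is undefined for constant input")

    mean_o, mean_p = _mean(observed), _mean(predicted)
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    return {
        "r2": cov ** 2 / (var_o * var_p),
        "mbe": _mean(errors),
        "rmse": math.sqrt(_mean([e ** 2 for e in errors])),
        # 1 - sigma^2(y - y_hat) / sigma^2(y), as in the Table 4 caption.
        "explained_variance": 1.0 - _variance(errors) / var_o,
    }


# Toy fractional peatland cover values in percent (hypothetical).
metrics = evaluate([0.0, 10.0, 20.0, 40.0], [1.0, 8.0, 18.0, 37.0])
```

Because the explained variance score is computed from the variance of the residuals, a constant offset between prediction and observation lowers the MBE and RMSE but leaves that score unchanged, which is why Table 4 reports mean bias separately.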
Figure 8. Peatland extent over central Africa. Panel (a) shows the ML training data, panel (b) shows the Peat-ML-predicted peatland extent, and panel (c) shows the PEATMAP extent from Xu et al. (2018).

Peat-ML likely underestimates peatland coverage, as can be seen from its negative MBE (also visible in the regression line shown in Fig. 9). We hypothesize that this low bias may stem from the use of biomes and ecoregions to denote peatland-free areas. It is possible, since these regions are fairly coarsely defined, that we may be inadvertently assigning small-scale, niche peatland areas as non-peatlands (although we take measures to avoid this; see Sect. 2.3). If that is the case, we would be training the model to miss the characteristics of these more niche peatland environments and biasing our results. We use the ecoregions and biomes from Olson et al. (2001) to delineate these non-peatland regions to counter the fact that high-quality peatland datasets are typically created only for peatland-rich regions. Without inclusion of this peatland-poor training data, we would be providing the algorithm only peatland-rich training data, leaving the model poorly trained for peatland-poor regions. Machine learning algorithms are best suited to interpolation problems (e.g. McCartney et al., 2020), and thus it is best to produce training data that give the full range of conditions under which the model is expected to produce predictions. Additionally, for the peatlands of South America, we found that we were overpredicting peatland extents as determined by expert opinion and field observation, primarily due to the paucity of high-quality peatland maps from the continent. As more high-quality peatland mapping products become available from presently poorly mapped regions, the use of these ecoregions and biomes could be removed or reduced.

The second approach to estimate the quality of our peatland map focuses on the Boreal Plains (BP) region of Canada, where we have several peatland products for comparison (Fig. A8). The DUC remote-sensing-based dataset for this region is uniquely well ground truthed, with over 5000 site visits over its 74.1 × 10⁴ km² area. The DUC dataset has a peatland extent of 186 × 10³ km² (Table 3) for the BP region, which is about the same as PEATMAP (whose underlying data source in this region is Tarnocai et al., 2011). Peat-ML (CV) estimates 199 × 10³ km² (this is derived from the BLOO CV simulations to allow a fairer comparison; it is 185 × 10³ km² when estimated by the full model, i.e. Peat-ML), while Hugelius et al. (2020) estimates 164 × 10³ km² and Webster et al. (2018) estimates 269 × 10³ km². We can estimate a confidence interval using ±2× the Peat-ML (CV) RMSE, which gives a range of 140 × 10³ to 234 × 10³ km². This range suggests that the predicted extent is only significantly different between Peat-ML (CV) and Webster et al. (2018). Given its quality, we take the DUC dataset as our benchmark and use it to determine the accuracy of Peat-ML and other products (Table 4). As expected, Peat-ML compares well with the DUC dataset, as it is trained using that dataset. A more useful comparison is with Peat-ML (CV), where the model is not trained with the DUC dataset. Peat-ML (CV) has the second-best RMSE, mean bias, and explained variance scores, after Tarnocai et al. (2011), in all instances (Table 4). For the DUC region, the Peat-ML (CV) results indicate a higher predictive performance than a peatland mapping product based on soil databases (Hugelius et al., 2020); another based on boosted regression trees using forest structure maps, bioclimatic variables, and surface slopes (Webster et al., 2018); and one based upon ML models informed by expert assessment (BAWLD; Olefeldt et al., 2021), although BAWLD has the lowest spatial resolution, which may have impeded its performance against the high-resolution DUC dataset. Peat-ML (CV) is, however, outperformed by a more traditional and labour-intensive product based on air-photo interpretation and soil surveys (Tarnocai et al., 2011), although the performance difference is relatively small (e.g. RMSE difference of 0.39 %). This indicates that our model, for this region at least, is of similar or higher quality compared to other peatland mapping products available from a diverse range of methodologies.

Figure 9. Scatterplots of Peat-ML-predicted peatland cover versus actual peatland cover (from the datasets listed in Table 2) for the full model (a) and as determined by the BLOO CV (b).

Table 4. Statistical comparison of peatland map products as evaluated against the DUC dataset (Smith et al., 2007). RMSE is the root-mean-square error. The explained variance score (calculated as 1 − σ²(y − ŷ)/σ²(y), where y is the observations, ŷ is the prediction, and σ is the standard deviation) has a best possible value of 1.0, with lower scores indicating worse performance.

Mapping product                   RMSE (%)   Mean bias (%)   Explained variance score (–)
Peat-ML                           12.60       0.18           0.68
Peat-ML (CV)                      17.50      −1.52           0.38
Hugelius et al. (2020)            18.00       2.61           0.35
PEATMAP*                          17.11      −0.06           0.40
Webster et al. (2018)             23.25      −9.49           0.07
BAWLD (Olefeldt et al., 2021)     22.24      −9.33           0.16

* Tarnocai et al. (2011) is the underlying data source for PEATMAP in the DUC domain.

3.4 Limitations of our approach

The purpose of our study is to produce a map of peatland distribution for use as an input geophysical field for ESMs with integrated peatland models. It is tempting to ask whether our technique can give any insights into peat formation or the conditions necessary for a peatland to develop and persist. While our approach is not prescriptive like that of Hugelius et al. (2020), where peatlands are defined based upon the soil carbon at a location, it is challenging to derive causal information from our simulations. Many of the top features determined by the LightGBM algorithm (Fig. 3) are related to geomorphological characteristics, soil carbon, vegetation and soil water status, and climate. However, peatlands themselves will alter the environment they form within (e.g. fill in depressions with peat or alter the hydrologic balance for the vegetation), and thus it is difficult to differentiate cause from effect.

A weakness of our approach lies in the availability of training data. Our training data for peatland distribution are generally biased towards the high latitudes. While we have good coverage of peatland presence in Canada and western Siberia (see Sect. 2.2), we presently lack extensive high-quality peatland distribution maps for much of the Southern Hemisphere and tropics. However, we expect new products to become available over time (e.g. Anda et al., 2021; Bourgeau-Chavez et al., 2021). As one of our main predictors is sensitive to vegetation (SWIR3), there is also the possibility that peatland types that are not represented in our training data (e.g. mangroves and marshes in the neotropics or papyrus marshes of Africa) will be poorly predicted, because the ML algorithm derives the relationship between the vegetation-based predictor and peatland extent only from the available training data. An additional challenge is the importance of seasonality of covariates (e.g. climate, vegetation indices) that differ significantly between the tropics and high latitudes based on their local dynamics. This may be addressed in future versions of Peat-ML by training separate models for both regions alongside predictors tailored to the dynamics of each region, although that depends on a greatly increased availability of tropical training datasets to ensure well-trained models.

In addition, as discussed in Sect. 3.3, it would be beneficial to include mapping products for regions where peatlands are relatively sparse. As our peatland sampling strategy was determined by the availability of high-quality peatland maps, we were not able to choose systematic (Rocha et al., 2021) or feature-based sampling strategies that could be better suited to peatland prediction. Our approach would also benefit from greater availability of processed, global-scale products that should be sensitive to water status below the peat surface, like L-band synthetic aperture radar (e.g. Touzi et al., 2013).

4 Conclusions

We present a new global peatland fractional coverage map, Peat-ML, at 5 arcmin resolution. Peat-ML was generated using machine learning techniques drawing upon drivers of peatland formation that include spatially distributed climate, soil, geomorphology, and vegetation data. The ML model was trained using maps of peatland fractional coverage for 14 relatively extensive regions along with masks of non-peatland areas. To evaluate Peat-ML, we qualitatively compared it to other available peatland maps, and we also quantified model performance using two approaches. The first approach is based on a blocked leave-one-out cross-validation strategy designed to minimize the influence of spatial autocorrelation. Based upon that approach, Peat-ML has an average r² of 0.73 with a root-mean-square error and mean bias error of 9.11 % and −0.36 %, respectively, when evaluated against our model training data. Our second model quality estimate was generated by comparing Peat-ML against a high-quality, extensively ground-truthed map for the 74.1 × 10⁴ km² Canadian Boreal Plains region. This comparison suggests Peat-ML is of comparable or higher quality than other presently available peatland mapping products. Future versions of Peat-ML would benefit from further high-quality and ground-truthed datasets of peatland extent, especially in tropical regions.
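The blocked leave-one-out cross-validation mentioned above can be illustrated with a small, dependency-free Python sketch. It only shows how train and test indices are separated by block so that no sample from the held-out block is seen during training. The function name and the block labels are hypothetical, and the way grid cells are assigned to blocks (Sect. 2.4.2) is not reproduced here.

```python
def blocked_leave_one_out(block_labels):
    """Yield (held_out_block, train_indices, test_indices) per block.

    block_labels[i] is the (hypothetical) spatial block of sample i.
    Each block is held out exactly once; all other blocks form the
    training set for that fold.
    """
    for held_out in sorted(set(block_labels)):
        train = [i for i, b in enumerate(block_labels) if b != held_out]
        test = [i for i, b in enumerate(block_labels) if b == held_out]
        yield held_out, train, test


# Six samples spread over three illustrative blocks.
labels = ["A", "A", "B", "C", "C", "C"]
folds = list(blocked_leave_one_out(labels))
```

In a full workflow, a model would be fitted on the `train` indices of each fold and scored on the `test` indices, and the per-block scores would then be summarized, for example as a mean r² with its range across blocks.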
Appendix A
Figure A1. Correlogram showing the spatial correlation between model residuals as a function of distance computed using Moran’s I.
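A correlogram of this kind reports Moran's I for successive distance classes. The sketch below is a simplified illustration and not the authors' implementation: it assumes binary weights (1 for pairs whose separation falls inside a distance band, 0 otherwise) and planar Euclidean distances, whereas a global analysis would more likely use great-circle distances. The function names and the toy transect are hypothetical.

```python
import math


def morans_i(values, coords, d_min, d_max):
    """Moran's I using binary weights for pairs with d_min < distance <= d_max.

    Returns None if no pair falls in the band or the values are constant.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross_sum = 0.0   # sum of w_ij * dev_i * dev_j
    weight_sum = 0.0  # sum of w_ij
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if d_min < math.dist(coords[i], coords[j]) <= d_max:
                cross_sum += dev[i] * dev[j]
                weight_sum += 1.0
    denom = sum(d * d for d in dev)
    if weight_sum == 0 or denom == 0:
        return None
    return (n / weight_sum) * (cross_sum / denom)


def correlogram(values, coords, band_edges):
    """Moran's I for each consecutive pair of distance band edges."""
    return [
        (lo, hi, morans_i(values, coords, lo, hi))
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    ]


# Toy residuals along a transect with unit spacing (hypothetical values).
points = [(float(i), 0.0) for i in range(6)]
smooth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
result = correlogram(smooth, points, [0.0, 1.0, 2.0, 3.0])
```

Values of I near zero at a given distance suggest that residuals separated by that distance are close to spatially independent, which is the property a blocked cross-validation design relies on.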
Figure A2. Cross-validation scores against the number of features selected by RFECV (see Sect. 2.4.3).
Figure A3. Scatterplots of full model Peat-ML-predicted peatland extent and peatland extent from the peatland training datasets over the 14
BLOO CV blocks.
Figure A4. Scatterplots of the CV trials for Peat-ML (Peat-ML CV)-predicted peatland extent and peatland extent from the peatland training
datasets over the 14 BLOO CV blocks.
Figure A5. Maps of western European peatlands, including (a) training data used by the ML model; (b) Peat-ML-predicted peatlands; and
the peatland coverage from (c) PEATMAP (Xu et al., 2018), (d) Hugelius et al. (2020), and (e) BAWLD (Olefeldt et al., 2021), whose domain
only partly extends over the region displayed.
Figure A6. Peatland extent over Central America. Panel (a) shows the ML training data, panel (b) shows the Peat-ML-predicted peatland
extent, and panel (c) shows the PEATMAP extent from Xu et al. (2018).
Figure A7. Peatland extent over the Indonesian archipelago. Panel (a) shows the ML training data, panel (b) shows the Peat-ML-predicted
peatland extent, and panel (c) shows the PEATMAP extent from Xu et al. (2018).
Figure A8. Maps of peatland extent for the Boreal Plains of Canada.
Table A1. The 30 predictors selected using VIF with a threshold value of 5. The final 16 features that were further selected by the RFECV algorithm for use in the final model are listed in Table 1. See Sect. 2.2.1 for further discussion on the variable processing.

Category      Short name                     Variable                                          Data source
Climate (a)   soil_DJF                       soil water                                        TerraClimate (Abatzoglou et al., 2018)
              srad_DJF                       downward surface shortwave radiation
              swe_DJF                        snow water equivalent
              ws_DJF                         wind speed
              vap_MAM                        vapour pressure
              ro_SON                         runoff
              pdsi_SON                       Palmer Drought Severity Index
Soils         OLM_soil_organic_carbon_30cm   organic carbon content                            Open Land Maps (Hengl, 2018)
              OLM_Soil_BulkDensity_30cm      soil bulk density
Vegetation    Dormancy_1                     dormancy                                          MODIS (MCD12Q2 V6) (Friedl et al., 2019)
              Senescence_2                   senescence
              EVI_Amplitude_2                enhanced vegetation index amplitude
              EVI_Area_1                     sum of EVI1 from greenup to dormancy
              EVI_Area_2                     sum of EVI2 from greenup to dormancy
              minMODIS_NPP                   minimum NPP                                       MOD17A3 V055 (Running et al., 2011)
              SWIR3_reflectance_mean_SON     shortwave infrared radiation reflectance (b)      S-NPP VIIRS (Didan and Barreto, 2018)
Terrain       spi                            stream power index (c)                            Geomorpho90m (Amatulli et al., 2020)
              geom                           geomorphon
              slope                          slope
              tcurve                         tangential curvature (d)
              rough-scale                    scale of terrain roughness
              dev-magnitude, dev-scale       maximum elevation deviation value
              dx                             first directional derivative (east-west) (e)
              dxy, dyy                       second directional derivative (f)
              convergence                    convergence index (g)
              aspect-sine, aspect-cosine     sine (cosine) of aspect (h)
              northness                      northness (i)

(a) The means of DJF, MAM, JJA, and SON refer to the 3-month periods indicated by the first letter of each month, respectively.
(b) 2225–2275 nm.
(c) Product between the upstream catchment area and the tangent of the local slope angle.
(d) Measures the rate of change perpendicular to the slope gradient and is related to the convergence and divergence of flow across a surface.
(e) The rate of change of the elevation in a specific direction.
(f) The rate of change of the slope in a predetermined direction.
(g) Terrain variable that details the convergent areas as channels and divergent areas as ridges. It has a value of −100 for ridges, 0 for planar or flat areas, and up to 100 for sink areas.
(h) Angular direction that a slope faces.
(i) Calculated from sine of the slope multiplied by the cosine. Northness gives a continuous measure of the orientation combined with the slope. For the Northern Hemisphere, a northness approaching 1 gives a northern exposure on a vertical slope (that is, a slope exposed to a very low amount of solar radiation); conversely, a northness of −1 gives a very steep southern slope that would be highly exposed to solar radiation.
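The variance inflation factor (VIF) screening named in the Table A1 caption can be illustrated with a dependency-free Python sketch. It uses the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictor correlation matrix. The stepwise rule shown here (repeatedly drop the predictor with the largest VIF until all values are at or below the threshold) is an assumption for illustration and may differ from the procedure actually used. All function names and the toy columns are hypothetical.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix; assumes no column is constant."""
    n = len(columns[0])
    centred = []
    for col in columns:
        m = sum(col) / n
        centred.append([v - m for v in col])
    norms = [sum(v * v for v in c) ** 0.5 for c in centred]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[i][i] for i in range(len(columns))]


def select_by_vif(named_columns, threshold=5.0):
    """Drop the highest-VIF predictor until all VIFs are <= threshold."""
    remaining = dict(named_columns)
    while len(remaining) > 1:
        names = list(remaining)
        scores = vif([remaining[n] for n in names])
        worst = max(range(len(names)), key=lambda i: scores[i])
        if scores[worst] <= threshold:
            break
        del remaining[names[worst]]
    return list(remaining)


# Toy predictors: x3 is nearly a copy of x1, x2 is largely independent.
example = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    "x3": [1.1, 1.9, 3.2, 3.9, 5.1, 5.8],
}
kept = select_by_vif(example, threshold=5.0)
```

With the toy data, one of the two nearly collinear predictors is removed and the largely independent one is retained, which mirrors the intent of reducing a large candidate set to a less redundant subset before feature selection.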
Table A2. LightGBM hyperparameters that underwent Bayesian optimization and their final optimized values. See Sect. 2.2.1 for further discussion on the variable processing. See Pedregosa et al. (2011) documentation for further discussion about each hyperparameter.

Name                 Range               Optimized value
boosting_type        gbdt, dart, goss    dart
num_leaves           10–50               30
n_estimators         50–300              250
learning_rate        0.005–0.4           0.18817013045111064
max_bin              25–300              95
max_depth            1–15                6
subsample_for_bin    20 000–300 000      80 000
min_child_samples    5–60                10
reg_alpha            0–1                 0.705705986914311
reg_lambda           0–1                 0.9086692536858783
colsample_bytree     0.5–1.0             0.8251441062858274

Code and data availability. Python code for the statistical modelling is available at https://doi.org/10.5281/zenodo.6345309 (Melton et al., 2022). A netCDF format version of the Peat-ML dataset is available at https://doi.org/10.5281/zenodo.5794336 (Melton et al., 2021).

Author contributions. JRM conceptualized the study. EC, JRM, and MF performed data curation, formal analysis, investigation, and software and methodology development. JMML, RSW, and KM also contributed to methodology development. KM, JMML, HCQ, and DK provided resources. JRM and EC did the visualization. Validation was done by RSW, JMML, HCQ, LVV, DK, EC, and JRM. JRM wrote the original draft of the manuscript. All authors reviewed and edited the final manuscript.

Competing interests. The contact author has declared that neither they nor their co-authors have any competing interests.

Disclaimer. Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements. We acknowledge the efforts of Yuanqiao Wu and Diana Verseghy, who led an earlier effort to predict global peatland extents using machine learning approaches. We thank Dirk Pflugmacher, Matt Aitkenhead, Fokke Brouwer, Freddie Draper, Greta Dargie, and Rudiyanto for sharing their peatland mapping products. We also thank Camila Delgado-Montes for processing the Rio Madre de Dios data. We have adopted the colour bar scheme from Hugelius et al. (2020) for our peatland extent plots. Lastly, we thank Michel Bechtold for comments about an earlier study that we used to improve the design of this study.

Review statement. This paper was edited by David Lawrence and reviewed by two anonymous referees.

References

Abatzoglou, J. T., Dobrowski, S. Z., Parks, S. A., and Hegewisch, K. C.: TerraClimate, a high-resolution global dataset of monthly climate and climatic water balance from 1958–2015, Sci. Data, 5, 170191, https://doi.org/10.1038/sdata.2017.191, 2018.
Adame, M. F., Kauffman, J. B., Medina, I., Gamboa, J. N., Torres, O., Caamal, J. P., Reza, M., and Herrera-Silveira, J. A.: Carbon stocks of tropical coastal wetlands within the karstic landscape of the Mexican Caribbean, PLoS One, 8, e56569, https://doi.org/10.1371/journal.pone.0056569, 2013.
Aitkenhead, M. J. and Coull, M. C.: Mapping soil profile depth, bulk density and carbon stock in Scotland using remote sensing and spatial covariates, Eur. J. Soil Sci., https://doi.org/10.1111/ejss.12916, 2019.
Alin, A.: Multicollinearity, Wiley Interdiscip. Rev. Comput. Stat., 2, 370–374, https://doi.org/10.1002/wics.84, 2010.
Amatulli, G., McInerney, D., Sethi, T., Strobl, P., and Domisch, S.: Geomorpho90m, empirical evaluation and accuracy assessment of global high-resolution geomorphometric layers, Sci. Data, 7, 162, https://doi.org/10.1038/s41597-020-0479-6, 2020.
Anda, M., Ritung, S., Suryani, E., Sukarman, Hikmat, M., Yatno, E., Mulyani, A., Subandiono, R. E., Suratman, and Husnain: Revisiting tropical peatlands in Indonesia: Semi-detailed mapping, extent and depth distribution assessment, Geoderma, 402, 115235, https://doi.org/10.1016/j.geoderma.2021.115235, 2021.
Arora, V. K., Melton, J. R., and Plummer, D.: An assessment of natural methane fluxes simulated by the CLASS-CTEM model, Biogeosciences, 15, 4683–4709, https://doi.org/10.5194/bg-15-4683-2018, 2018.
Bechtold, M., De Lannoy, G. J. M., Koster, R. D., Reichle, R. H., Mahanama, S. P., Bleuten, W., Bourgault, M. A., Brümmer, C., Burdun, I., Desai, A. R., Devito, K., Grünwald, T., Grygoruk, M., Humphreys, E. R., Klatt, J., Kurbatova, J., Lohila, A., Munir, T. M., Nilsson, M. B., Price, J. S., Röhl, M., Schneider, A., and Tiemeyer, B.: PEAT–CLSM: A Specific Treatment of Peatland Hydrology in the NASA Catchment Land Surface Model, J. Adv. Model. Earth Sy., 11, 2130–2162, https://doi.org/10.1029/2018MS001574, 2019.
Bergstra, J., Komer, B., Eliasmith, C., Yamins, D., and Cox, D. D.: Hyperopt: a Python library for model selection and hyperparameter optimization, Comput. Sci. Discov., 8, 014008, https://doi.org/10.1088/1749-4699/8/1/014008, 2015.
Beven, K. J. and Kirkby, M. J.: A physically based, variable contributing area model of basin hydrology, Hydrol. Sci. Bull., 24, 43–69, https://doi.org/10.1080/02626667909491834, 1979.
Bohn, T. J., Melton, J. R., Ito, A., Kleinen, T., Spahni, R., Stocker, B. D., Zhang, B., Zhu, X., Schroeder, R., Glagolev, M. V., Maksyutov, S., Brovkin, V., Chen, G., Denisov, S. N., Eliseev, A. V., Gallego-Sala, A., McDonald, K. C., Rawlins, M. A., Riley, W. J., Subin, Z. M., Tian, H., Zhuang, Q., and Kaplan, J. O.: WETCHIMP-WSL: intercomparison of wetland methane emissions models over West Siberia, Biogeosciences, 12, 3321–3349, https://doi.org/10.5194/bg-12-3321-2015, 2015.
Bourgeau-Chavez, L. L., Grelik, S. L., Battaglia, M. J., Leisman, D. J., Chimner, R. A., Hribljan, J. A., Lilleskov, E. A., Draper, F. C., Zutta, B. R., Hergoualc’h, K., Bhomia, R. K., and Lähteenoja, O.: Advances in Amazonian Peatland Discrimination With Multi-Temporal PALSAR Refines Estimates of Peatland Distribution, C Stocks and Deforestation, Front. Earth Sci. Chin., 9, 1019, https://doi.org/10.3389/feart.2021.676748, 2021.
Brouwer, F. and Walvoort, D. J. J.: Basisregistratie Ondergrond (BRO) – actualisatie bodemkaart: Herkartering van de bodem in Eemland, Tech. Rep. 2352-2739, Wettelijke Onderzoekstaken Natuur & Milieu, Wageningen, 2019.
Brouwer, F., Vries, F. D., and Walvoort, D. J. J.: Basisregistratie Ondergrond (BRO) actualisatie bodemkaart: Herkartering van de bodem in Flevoland, Tech. Rep. 2352-2739, Wettelijke Onderzoekstaken Natuur & Milieu, Wageningen, 2018.
Connolly, J. and Holden, N. M.: Mapping peat soils in Ireland: updating the derived Irish peat map, Ir. Geogr., 42, 343–352, https://doi.org/10.1080/00750770903407989, 2009.
Dargie, G. C., Lewis, S. L., Lawson, I. T., Mitchard, E. T. A., Page, S. E., Bocko, Y. E., and Ifo, S. A.: Age, extent and carbon storage of the central Congo Basin peatland complex, Nature, 542, 86–90, https://doi.org/10.1038/nature21048, 2017.
Didan, K. and Barreto, A.: VIIRS/NPP Vegetation Indices 16-Day L3 Global 500m SIN Grid V001, USGS, https://doi.org/10.5067/VIIRS/VNP13A1.001, 2018.
Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J. R. G., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., and Lautenbach, S.: Collinearity: a review of methods to deal with it and a simulation study evaluating their performance, Ecography, 36, 27–46, https://doi.org/10.1111/j.1600-0587.2012.07348.x, 2013.
Draper, F. C., Roucoux, K. H., Lawson, I. T., Mitchard, E. T. A., Coronado, E. N. H., Lähteenoja, O., Montenegro, L. T., Sandoval, E. V., Zaráte, R., and Baker, T. R.: The distribution and amount of carbon in the largest peatland complex in Amazonia, Environ. Res. Lett., 9, 124017, https://doi.org/10.1088/1748-9326/9/12/124017, 2014.
Friedl, M., Gray, J., and Sulla-Menashe, D.: MCD12Q2 MODIS/Terra+Aqua Land Cover Dynamics Yearly L3 Global 500m SIN Grid V006, https://doi.org/10.5067/MODIS/MCD12Q2.006 (last access: 4 September 2020), 2019.
GDAL/OGR contributors: GDAL/OGR Geospatial Data Abstraction software Library, Open Source Geospatial Foundation, https://gdal.org (last access: 28 December 2020), 2021.
Gelinas, N.: Into the Okavango, USA, https://www.nationalgeographic.org/projects/okavango/ (last access: 11 October 2021), 2018.
Geological Survey of Finland: Superficial deposits of Finland 1 : 200000 (sediment polygons) v.10.1, 2018.
Glass, R. L.: Alaska Wetland Resources, Tech. Rep. 2425, U.S. Geological Survey, Water-Supply Paper 2425, 1992.
GLIMS and NSIDC: Global Land Ice Measurements from Space glacier database. Compiled and made available by the international GLIMS community and the National Snow and Ice Data Center, Boulder CO, USA, https://doi.org/10.7265/N5V98602 (last access: 4 March 2021), 2018.
Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., and Moore, R.: Google Earth Engine: Planetary-scale geospatial analysis for everyone, Remote Sens. Environ., 202, 18–27, https://doi.org/10.1016/j.rse.2017.06.031, 2017.
Gorham, E.: Northern Peatlands: Role in the Carbon Cycle and Probable Responses to Climatic Warming, Ecol. Appl., 1, 182–195, https://doi.org/10.2307/1941811, 1991.
Grundling, P. and Grootjans, A. P.: Peatlands of Africa, in: The Wetland Book: II: Distribution, Description and Conservation, edited by: Finlayson, C. M., Milton, G. R., Prentice, R. C., and Davidson, N. C., Springer Netherlands, Dordrecht, 1–10, https://doi.org/10.1007/978-94-007-6173-5_112-1, 2016.
Gumbricht, T., Roman-Cuesta, R. M., Verchot, L., Herold, M., Wittmann, F., Householder, E., Herold, N., and Murdiyarso, D.: An expert system model for mapping tropical wetlands and peatlands reveals South America as the largest contributor, Glob. Chang. Biol., 23, 3581–3599, https://doi.org/10.1111/gcb.13689, 2017.
Harris, I., Osborn, T. J., Jones, P., and Lister, D.: Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset, Sci. Data, 7, 109, https://doi.org/10.1038/s41597-020-0453-3, 2020.
Helbig, M., Waddington, J. M., Alekseychik, P., Amiro, B. D., Aurela, M., Barr, A. G., Black, T. A., Blanken, P. D., Carey, S. K., Chen, J., Chi, J., Desai, A. R., Dunn, A., Euskirchen, E. S., Flanagan, L. B., Forbrich, I., Friborg, T., Grelle, A., Harder, S., Heliasz, M., Humphreys, E. R., Ikawa, H., Isabelle, P.-E., Iwata, H., Jassal, R., Korkiakoski, M., Kurbatova, J., Kutzbach, L., Lindroth, A., Löfvenius, M. O., Lohila, A., Mammarella, I., Marsh, P., Maximov, T., Melton, J. R., Moore, P. A., Nadeau, D. F., Nicholls, E. M., Nilsson, M. B., Ohta, T., Peichl, M., Petrone, R. M., Petrov, R., Prokushkin, A., Quinton, W. L., Reed, D. E., Roulet, N. T., Runkle, B. R. K., Sonnentag, O., Strachan, I. B., Taillardat, P., Tuittila, E.-S., Tuovinen, J.-P., Turner, J., Ueyama, M., Varlagin, A., Wilmking, M., Wofsy, S. C., and Zyrianov, V.: Increasing contribution of peatlands to boreal evapotranspiration in a warming climate, Nat. Clim. Chang., 10, 555–560, https://doi.org/10.1038/s41558-020-0763-7, 2020.
Hengl, T.: Soil property layers from openlandmap.org. All data are available under the Open Data Commons Open Database License (ODbL) and/or Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA), https://doi.org/10.5281/zenodo.2525663 (last access: 4 September 2020), 2018.
Hengl, T. and MacMillan, R. A.: Predictive Soil Mapping with R, Lulu.com, 2019.
Hooker, G., Mentch, L., and Zhou, S.: Unrestricted Permutation forces Extrapolation: Variable Importance Requires at least One More Model, or There Is No Free Variable Importance, arXiv: 1905.03151 (stat.ME), 2021.
Hope, G. S.: Peat in the mountains of New Guinea, Mires Peat, 15, 1–21, 2015.
Householder, J. E., Janovec, J. P., Tobler, M. W., Page, S., and Lähteenoja, O.: Peatlands of the Madre de Dios River of Peru: distribution, geomorphology, and habitat diversity, Wetlands, 32, 359–368, 2012.
Hribljan, J. A., Suarez, E., Bourgeau-Chavez, L., Endres, S., Lilleskov, E. A., Chimbolema, S., Wayson, C., Serocki, E., and Chimner, R. A.: Multidate, multisensor remote sensing reveals high density of carbon-rich mountain peatlands in
the páramo of Ecuador, Glob. Chang. Biol., 23, 5412–5425, https://doi.org/10.1111/gcb.13807, 2017.
Huete, A., Didan, K., Miura, T., Rodriguez, E. P., Gao, X., and Ferreira, L. G.: Overview of the radiometric and biophysical performance of the MODIS vegetation indices, Remote Sens. Environ., 83, 195–213, https://doi.org/10.1016/S0034-4257(02)00096-2, 2002.
Hugelius, G., Loisel, J., Chadburn, S., Jackson, R. B., Jones, M., MacDonald, G., Marushchak, M., Olefeldt, D., Packalen, M., Siewert, M. B., Treat, C., Turetsky, M., Voigt, C., and Yu, Z.: Large stocks of peatland carbon and nitrogen are vulnerable to permafrost thaw, P. Natl. Acad. Sci. USA, 117, 20438–20446, https://doi.org/10.1073/pnas.1916387117, 2020.
IDEAM: Leyenda nacional de coberturas de la tierra: metodología CORINE Land Cover adaptada para Colombia: Escala 1 : 100000, edited by: Martínez Ardila, N. J. and Murcia García, U. G., Ministerio De Ambiente, Vivienda Y Desarrollo Territorial Instituto De Hidrología, Meteorología Y Estudios Ambientales – IDEAM, ISBN 978-958-806729-2, 2010.
Izumi, Y., Widodo, J., Kausarian, H., Demirci, S., Takahashi, A., Razi, P., Nasucha, M., Yang, H., and Tetuko S. S., J.: Potential of soil moisture retrieval for tropical peatlands in Indonesia using ALOS-2 L-band full-polarimetric SAR data, Int. J. Remote Sens., 40, 5938–5956, https://doi.org/10.1080/01431161.2019.1584927, 2019.
Jackson, R. B., Lajtha, K., Crow, S. E., Hugelius, G., Kramer, M. G., and Piñeiro, G.: The Ecology of Soil Carbon: Pools, Vulnerabilities, and Biotic and Abiotic Controls, Annu. Rev. Ecol. Evol. S., 48, 419–445, https://doi.org/10.1146/annurev-ecolsys-112414-054234, 2017.
Joosten, H. and Clarke, D.: Wise use of mires and peatlands, International Mire Conservation Group and International Peat Society, ISBN 951-97744-8-3, 304, 2002.
Junk, W. J.: Amazonian flood plains: their ecology, present and potential use, Revue d’Hydrobiologie Tropicale, 15, 285–301, 1982.
Kaplan, J. O.: Wetlands at the Last Glacial Maximum: Distribution and methane emissions, Geophys. Res. Lett., 29, 3-1–3-4, https://doi.org/10.1029/2001GL013366, 2002.
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y.: LightGBM: A Highly Efficient Gradient Boosting Decision Tree, in: Advances in Neural Information Processing Systems 30, edited by: Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., 3146–3154, Curran Associates, Inc., 2017.
Kidd, D., Moreton, R., and Brown, G.: Tasmanian Organic Soil Mapping Project, Methods Report. Nature Conservation Report 21/2, unpublished report, 2021.
Kobayashi, S., Ota, Y., Harada, Y., Ebita, A., Moriya, M., Onoda, H., Onogi, K., Kamahori, H., Kobayashi, C., Endo, H., Miyaoka, K., and Takahashi, K.: The JRA-55 Reanalysis: General Specifications and Basic Characteristics, J. Meteorol. Soc. JPN, 93, 5–48, https://doi.org/10.2151/jmsj.2015-001, 2015.
of SOC stocks for the tropics, permafrost regions, wetlands, and the world, SOIL, 1, 351–365, https://doi.org/10.5194/soil-1-351-2015, 2015.
Lähteenoja, O. and Roucoux, K.: Inception, history and development of peatlands in the Amazon Basin, PAGES News, 18, 27–28, https://doi.org/10.22498/pages.18.1.27, 2010.
Landcare Research NZ Ltd: Fundamental Soil Layer – New Zealand Soil Classification, https://doi.org/10.7931/L10T0 (last access: 4 January 2020), 2000.
Largeron, C., Krinner, G., Ciais, P., and Brutel-Vuilmet, C.: Implementing northern peatlands in a global land surface model: description and evaluation in the ORCHIDEE high-latitude version model (ORC-HL-PEAT), Geosci. Model Dev., 11, 3279–3297, https://doi.org/10.5194/gmd-11-3279-2018, 2018.
Lehner, B. and Döll, P.: Development and validation of a global database of lakes, reservoirs and wetlands, J. Hydrol., 296, 1–22, https://doi.org/10.1016/j.jhydrol.2004.03.028, 2004.
Leifeld, J. and Menichetti, L.: The underappreciated potential of peatlands in global climate change mitigation strategies, Nat. Commun., 9, 1071, https://doi.org/10.1038/s41467-018-03406-6, 2018.
Limpens, J., Berendse, F., Blodau, C., Canadell, J. G., Freeman, C., Holden, J., Roulet, N., Rydin, H., and Schaepman-Strub, G.: Peatlands and the carbon cycle: from local processes to global implications – a synthesis, Biogeosciences, 5, 1475–1491, https://doi.org/10.5194/bg-5-1475-2008, 2008.
Loisel, J., Yu, Z., Parsekian, A., Nolan, J., and Slater, L.: Quantifying landscape morphology influence on peatland lateral expansion using ground-penetrating radar (GPR) and peat core analysis, J. Geophys. Res.-Biogeo., 118, 373–384, https://doi.org/10.1002/jgrg.20029, 2013.
Loisel, J., van Bellen, S., Pelletier, L., Talbot, J., Hugelius, G., Karran, D., Yu, Z., Nichols, J., and Holmquist, J.: Insights and issues with estimating northern peatland carbon stocks and fluxes since the Last Glacial Maximum, Earth-Sci. Rev., 165, 59–80, https://doi.org/10.1016/j.earscirev.2016.12.001, 2017.
López Gonzales, M., Hergoualc’h, K., Angulo Núñez, Ó., Baker, T., Chimner, R., del Águila Pasquel, J., del Castillo Torres, D., Freitas Alvarado, L., Fuentealba Durand, B., García Gonzales, E., Honorio Coronado, E., Kazuyo, H., Lilleskov, E., Málaga Durán, N., Maldonado Fonkén, M., Martín Brañas, M., Vargas, T. M., Planas Clarke, A. M., Roucoux, K., and Vacalla Ochoa, F.: What do we know about Peruvian peatlands?, Center for International Forestry Research (CIFOR), https://doi.org/10.17528/cifor/007848, 2020.
Martín-López, J. M., Verchot, L., Martius, C., and da Silva, M.: Modeling the spatial distribution of soil organic carbon and carbon stocks for the Casanare flooded Savannas, Colombia, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1840, https://doi.org/10.5194/egusphere-egu22-1840, 2022.
Matthews, E.: Global data bases on distribution, characteristics and methane emission of natural wetlands: Documentation of
Krankina, O. N., Pflugmacher, D., Friedl, M., Cohen, W. B., Nel- archived data tape, NASA Goddard Space Flight Center, Green-
son, P., and Baccini, A.: Meeting the challenge of mapping peat- belt, MD, USA, 1989.
lands with remotely sensed data, Biogeosciences, 5, 1809–1820, McBratney, A. B., Mendonça Santos, M. L., and Minasny,
https://doi.org/10.5194/bg-5-1809-2008, 2008. B.: On digital soil mapping, Geoderma, 117, 3–52,
Köchy, M., Hiederer, R., and Freibauer, A.: Global distribution of https://doi.org/10.1016/S0016-7061(03)00223-4, 2003.
soil organic carbon – Part 1: Masses and frequency distributions
from ALOS PALSAR data (2007–2010), Remote Sens. Environ., Webster, K. L., Bhatti, J. S., Thompson, D. K., Nelson, S. A., Shaw,
155, 13–31, https://doi.org/10.1016/j.rse.2014.04.014, 2014. C. H., Bona, K. A., Hayne, S. L., and Kurz, W. A.: Spatially-
Smith, K. B., Smith, C. E., Forest, S. F., and Richard, A. J.: A field integrated estimates of net ecosystem exchange and methane
guide to the wetlands of the boreal plains ecozone of Canada, fluxes from Canadian peatlands, Carbon Balance Manag., 13, 16,
Tech. rep., Ducks Unlimited Canada, Western Boreal Office: Ed- https://doi.org/10.1186/s13021-018-0105-5, 2018.
monton, Alberta, 2007. Wetlands International: Wetlands International Map of Peatland
Tarnocai, C., Kettles, I. M., and Lacelle, B.: Peatlands of Canada, Distribution Area and Carbon Content in Sumatera 1990–2002
Tech. Rep. Open File 6551, Geological Survey of Canada, 2011. Wetlands International – Indonesia Programme & Wildlife Habi-
Terentieva, I. E., Glagolev, M. V., Lapshina, E. D., Sabrekov, tat Canada, Tech. rep., Wetlands International, Bogor, 2003.
A. F., and Maksyutov, S.: Mapping of West Siberian Wetlands International: Wetlands International Map of Peatland
taiga wetland complexes using Landsat imagery: implica- Distribution Area and Carbon Content in Kalimantan 2000–2002
tions for methane emissions, Biogeosciences, 13, 4615–4626, Wetlands International – Indonesia Programme & Wildlife Habi-
https://doi.org/10.5194/bg-13-4615-2016, 2016. tat Canada, Tech. rep., Wetlands International, Bogor, 2004.
Tian, J. and Philpot, W. D.: Relationship between surface soil water Wetlands International: Wetlands International Cadangan Karbon
content, evaporation rate, and water absorption band depths in Bawah Permukaan di Papua Wetlands International – Indonesia
SWIR reflectance spectra, Remote Sens. Environ., 169, 280–289, Programme & Wildlife Habitat Canada, Tech. rep., Wetlands In-
https://doi.org/10.1016/j.rse.2015.08.007, 2015. ternational, Bogor, 2006.
Touzi, R., Omari, K., Gosselin, G., and Sleep, B.: Polarimetric L- Xu, J., Morris, P. J., Liu, J., and Holden, J.: PEATMAP:
band ALOS for peatland subsurface water monitoring, in: Con- Refining estimates of global peatland distribution
ference Proceedings of 2013 Asia-Pacific Conference on Syn- based on a meta-analysis, Catena, 160, 134–140,
thetic Aperture Radar (APSAR), 53–56, 2013. https://doi.org/10.1016/j.catena.2017.09.010, 2018.
Touzi, R., Omari, K., Sleep, B., and Jiao, X.: Scattered and Yamazaki, D., Ikeshima, D., Tawatari, R., Yamaguchi, T.,
Received Wave Polarization Optimization for Enhanced Peat- O’Loughlin, F., Neal, J. C., Sampson, C. C., Kanae,
land Classification and Fire Damage Assessment Using Po- S., and Bates, P. D.: A high-accuracy map of global
larimetric PALSAR, IEEE J. Sel. Top. Appl., 11, 4452–4477, terrain elevations, Geophys. Res. Lett., 44, 5844–5853,
https://doi.org/10.1109/JSTARS.2018.2873740, 2018. https://doi.org/10.1002/2017gl072874, 2017.
Wang, L., Qu, J. J., Hao, X., and Zhu, Q.: Sensitivity studies Yu, Z., Loisel, J., Brosseau, D. P., Beilman, D. W., and Hunt,
of the moisture effects on MODIS SWIR reflectance and veg- S. J.: Global peatland dynamics since the Last Glacial Maximum,
etation water indices, Int. J. Remote Sens., 29, 7065–7075, Geophys. Res. Lett., 37, https://doi.org/10.1029/2010GL043584,
https://doi.org/10.1080/01431160802226034, 2008. 2010.
Wania, R., Ross, I., and Prentice, I. C.: Integrating peat- Zender, C. S.: Short communication: Analysis of self-
lands and permafrost into a dynamic global vegeta- describing gridded geoscience data with netCDF Op-
tion model: 1. Evaluation and sensitivity of physical erators (NCO), Environ. Model. Softw., 23, 1338–1342,
land surface processes, Global Biogeochem. Cycles, 23, https://doi.org/10.1016/j.envsoft.2008.03.004, 2008.
https://doi.org/10.1029/2008GB003412, 2009.