Zimbabwe Landscape Characterization Technical Report
Telma Sibanda, Bruno Gerard, Frederic Baudron, Vimbayi G.P. Chimonyo
CGIAR
31 December 2025

Contents
1. Introduction
2. Methodology
2.1 Data sources and types
2.2 Image preprocessing
2.3 Image classification
3. Results
4. Next steps
5. Conclusion

1. Introduction

This report documents the supervised land use/land cover (LULC) mapping methodology developed for Mbire Ward 2 using optical satellite imagery and machine learning. The objective was to produce a fit-for-purpose, spatially explicit LULC map to support subsequent landscape characterization and analysis (e.g., composition/configuration metrics) in a heterogeneous and fragmented dryland agroecological context. We benchmarked two satellite datasets (Sentinel-2 and Landsat-8) and multiple predictor feature sets (spectral, indices, texture, and combinations) using a consistent training/validation design and a transparent model selection workflow.

2. Methodology

2.1 Data sources and types

Multispectral satellite imagery was accessed through Google Earth Engine (GEE). Image processing and analysis were conducted in Google Colab, while map production and visualizations were produced in R. The temporal range spanned July 2023 to July 2025 to ensure seasonal representation of land cover conditions for both Sentinel-2 and Landsat-8. To minimize atmospheric noise, only images with less than 20% cloud cover were included in the analysis. A set of spectral indices was computed to enhance land-cover separability, including the Normalized Difference Vegetation Index (NDVI), the Normalized Difference Water Index (NDWI), and the Automated Water Extraction Index (AWEI). In addition, texture metrics derived from NDVI (e.g., GLCM contrast and GLCM angular second moment [ASM]) were calculated to capture spatial heterogeneity and improve discrimination among structurally similar classes.
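The index formulas named above can be sketched with numpy. This is an illustrative sketch, not the GEE implementation; the band inputs and the choice of the non-shadow AWEI variant (AWEI_nsh, Feyisa et al. 2014) are assumptions, as the report does not state which AWEI variant was used.

```python
import numpy as np

def spectral_indices(green, red, nir, swir1, swir2):
    """Compute NDVI, NDWI, and AWEI (non-shadow variant) from
    surface-reflectance bands supplied as float arrays in [0, 1]."""
    ndvi = (nir - red) / (nir + red)        # vegetation greenness
    ndwi = (green - nir) / (green + nir)    # open-water index (McFeeters)
    # AWEI_nsh after Feyisa et al. (2014); which variant the workflow
    # used is an assumption, since the report does not specify it.
    awei = 4.0 * (green - swir1) - (0.25 * nir + 2.75 * swir2)
    return ndvi, ndwi, awei

# Example: a bright-vegetation pixel (reflectance values are illustrative)
ndvi, ndwi, awei = spectral_indices(
    green=np.array([0.05]), red=np.array([0.04]), nir=np.array([0.45]),
    swir1=np.array([0.20]), swir2=np.array([0.10]))
```

Vegetated pixels push NDVI toward 1 and NDWI/AWEI negative, which is the separability the classification relies on.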
Ground-truthing was conducted in October 2025, during which 427 GPS-referenced points were collected. These points were stratified across the major land cover types (cropland = 70, shrubland = 71, grassland = 73, forest = 69, river = 69, and built-up = 75). Land cover classes were labeled using the ESA WorldCover classification system. Table 1 summarizes the data sources, including satellite bands, ancillary datasets (e.g., elevation, soil texture), and spatial resolutions used in the classification.

Table 1: Types of data and data sources used in the study

Source/satellite      Data/band         Description                                     Spatial resolution
Sentinel-2 Level-2A   B2                Blue                                            10 m
                      B3                Green                                           10 m
                      B4                Red                                             10 m
                      B8                NIR                                             10 m
                      B11               SWIR 1                                          20 m
                      B12               SWIR 2                                          20 m
Landsat-8             B2                Blue                                            30 m
                      B3                Green                                           30 m
                      B4                Red                                             30 m
                      B5                NIR                                             30 m
                      B6                SWIR 1                                          30 m
                      B7                SWIR 2                                          30 m
Field/study site      GPS coordinates   427 points representing the land use classes    -
GADM                  Shapefiles        Shapefiles for the Area of Interest (AOI)       -
iSDA                  Soil texture      Soil properties                                 30 m

Figure 1: Distribution of ground truth points across the AOI

2.2 Image preprocessing

Image preprocessing was conducted separately for Sentinel-2 Level-2A and Landsat-8 imagery. All images were first filtered to retain scenes with <20% cloud cover and then clipped to the Area of Interest (AOI), corresponding to Mbire Wards 2 and 3. Sentinel-2 Level-2A imagery, which is already atmospherically and radiometrically corrected, underwent additional cloud and shadow masking using the Scene Classification Layer (SCL) (Louis et al., 2021). For Landsat-8, cloud masking was implemented using the QA_PIXEL band through a bitmasking approach. After masking, the remaining clear observations were used to generate median composites for each sensor to reduce residual noise and produce temporally representative inputs for classification. A set of spectral indices was computed to improve class separability.
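The QA_PIXEL bitmasking described above amounts to testing individual cloud-related bits of the quality band for each pixel. A minimal sketch of that per-pixel test follows; the bit positions match the Landsat 8 Collection 2 Level-2 convention, but the exact set of bits masked in this workflow is an assumption.

```python
# Landsat 8 Collection 2 QA_PIXEL bit positions (per the Level-2 product guide):
# bit 1 = dilated cloud, bit 2 = cirrus, bit 3 = cloud, bit 4 = cloud shadow.
DILATED_CLOUD, CIRRUS, CLOUD, CLOUD_SHADOW = 1, 2, 3, 4

def is_clear(qa_pixel: int) -> bool:
    """Return True when none of the cloud-related QA bits are set,
    i.e. the pixel can contribute to the median composite."""
    mask = sum(1 << b for b in (DILATED_CLOUD, CIRRUS, CLOUD, CLOUD_SHADOW))
    return (qa_pixel & mask) == 0
```

In GEE the same logic is applied image-wide with bitwise operations on the QA_PIXEL band rather than per-pixel Python calls.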
For vegetation condition and greenness, NDVI and EVI were calculated for each sensor. To strengthen discrimination of moisture and water features, NDWI and AWEI were computed. To better represent spatial heterogeneity in fragmented landscapes, texture metrics were derived from the NDVI layer using GLCM summaries, including contrast and angular second moment (ASM), and these texture bands were added to the predictor stack used in the classification. To harmonize spatial resolution across datasets, Sentinel-2 imagery (10 m) was resampled to 30 m using bilinear interpolation to match Landsat-8. All processed spectral bands, indices, and texture layers were stacked into final predictor composites and exported for model training and mapping in the subsequent classification stage.

2.3 Image classification

A supervised classification was implemented using the Random Forest (RF) machine learning algorithm, which builds an ensemble of decision trees using bootstrapped samples and random subsets of predictors at each split. Ground-truth points were compiled for the six target classes (Cropland, Shrubland, Grassland, Forest, River, Built-up) and split into 80% training and 20% independent validation sets. To test the added value of different predictor types, three feature combinations were prepared for each satellite dataset: (i) spectral bands only; (ii) vegetation/water indices and texture variables; and (iii) a combined set of spectral bands + indices + texture metrics. Indices captured vegetation and water signals (e.g., NDVI and water-related indices), while texture metrics were derived from vegetation index layers to better represent spatial heterogeneity (e.g., grey-level co-occurrence matrix features such as NDVI contrast and NDVI ASM), which is particularly important for separating structurally similar classes in mixed landscapes.
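The GLCM summaries used as texture predictors (contrast and ASM) can be illustrated on a small array. This is a self-contained numpy sketch for a single horizontal offset with a symmetric co-occurrence matrix; the actual workflow computed these over the NDVI layer with GEE's texture functions, and the grey-level count here is an assumption.

```python
import numpy as np

def glcm_contrast_asm(img, levels=8, lo=-1.0, hi=1.0):
    """GLCM contrast and angular second moment (ASM) for a horizontal
    neighbour offset, from an index image with values in [lo, hi]."""
    # Quantise the continuous index (e.g. NDVI) into grey levels.
    q = np.clip(((img - lo) / (hi - lo) * levels).astype(int), 0, levels - 1)
    glcm = np.zeros((levels, levels))
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    np.add.at(glcm, (left, right), 1)      # count horizontal co-occurrences
    glcm = glcm + glcm.T                   # make the matrix symmetric
    p = glcm / glcm.sum()                  # co-occurrence probabilities
    i, j = np.indices(p.shape)
    contrast = float((p * (i - j) ** 2).sum())  # high for abrupt transitions
    asm = float((p ** 2).sum())                 # high for uniform regions
    return contrast, asm
```

A uniform patch yields contrast 0 and ASM 1, while a fine-grained mosaic yields high contrast and low ASM, which is why these bands help separate structurally similar classes.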
Hyperparameter tuning was performed during model training to identify the best RF settings for each dataset/feature set. The tuned parameters included the number of trees (Trees), minimum node size (MinLeaf), bag fraction (BagFrac), and number of variables considered at each split (VarPerSplit). Model performance was evaluated using a combination of internal and external validation diagnostics, including Out-of-Bag (OOB) error and independent validation metrics derived from the confusion matrix: Overall Accuracy (OA), Kappa, User's Accuracy (UA), and Producer's Accuracy (PA). Variable importance was assessed using Gini importance to identify the most informative predictors and to support the interpretability of the final model. The final LULC product was generated by applying the best-performing RF model (based on the composite evaluation of OA/Kappa and class-level performance) to the full predictor stack within the Ward 2 boundary.

3. Results

Across all benchmarking runs, the combined feature sets outperformed spectral-only models, confirming the added value of indices and texture for class separability in Ward 2. The best overall model was Landsat-8 combined, achieving OA = 0.869 and Kappa = 0.842, outperforming Sentinel-2 combined (OA = 0.815; Kappa = 0.778). Within Landsat-8, the indices + textures feature set already improved performance (OA = 0.800; Kappa = 0.759) compared to spectral-only (OA = 0.753; Kappa = 0.704), and the combined stack produced the strongest final gains.
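The tuning and evaluation workflow of Section 2.3 can be approximated outside GEE with scikit-learn. The parameter correspondence (Trees → n_estimators, MinLeaf → min_samples_leaf, BagFrac → max_samples, VarPerSplit → max_features) is approximate, and the data below are a synthetic stand-in for the real predictor stack, not the study's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-in: 427 samples (matching the ground-truth count),
# 10 predictors, 6 classes with partly separable means.
X = rng.normal(size=(427, 10))
y = rng.integers(0, 6, size=427)
X += y[:, None] * 1.0

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)  # the report's 80/20 split

# Best L8-combined settings from Table 2, mapped onto scikit-learn arguments.
rf = RandomForestClassifier(n_estimators=163, min_samples_leaf=1,
                            max_samples=0.53, max_features=4,
                            oob_score=True, random_state=0)
rf.fit(X_tr, y_tr)

pred = rf.predict(X_te)
oa = accuracy_score(y_te, pred)            # Overall Accuracy
kappa = cohen_kappa_score(y_te, pred)      # Kappa
cm = confusion_matrix(y_te, pred)          # rows = reference, cols = predicted
ua = np.diag(cm) / cm.sum(axis=0)          # User's Accuracy per class
pa = np.diag(cm) / cm.sum(axis=1)          # Producer's Accuracy per class
```

The internal diagnostic is available as `rf.oob_score_`, and `rf.feature_importances_` gives the Gini importances used for the variable-importance ranking.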
Table 2: Random Forest model results for all feature sets

Dataset/feature set     Overall Accuracy   Kappa Coefficient   Best Parameters
S2 spectral only        0.807              0.769               Trees = 109, MinLeaf = 1, BagFrac = 0.90, VarPerSplit = 1
S2 indices + textures   0.792              0.758               Trees = 100, MinLeaf = 1, BagFrac = 0.76, VarPerSplit = 1
S2 combined             0.815              0.778               Trees = 323, MinLeaf = 2, BagFrac = 0.68, VarPerSplit = 2
L8 spectral only        0.753              0.704               Trees = 406, MinLeaf = 1, BagFrac = 0.51, VarPerSplit = 2
L8 indices + textures   0.800              0.759               Trees = 569, MinLeaf = 1, BagFrac = 0.90, VarPerSplit = 2
L8 combined             0.869              0.842               Trees = 163, MinLeaf = 1, BagFrac = 0.53, VarPerSplit = 4

The confusion matrices indicate that river and grassland were classified with very high reliability (near-perfect separation in both combined models). The most persistent confusion was among structurally similar terrestrial classes. In the L8 combined model, most cropland errors were between cropland and forest, and built-up showed occasional confusion with shrubland and cropland; however, diagonal dominance remained strong across all classes. In the S2 combined model, additional confusion was visible between shrubland and grassland and between forest and grassland, suggesting that in this ward, the Landsat-8 feature space (when enriched with indices and textures) separated these classes more cleanly.

Figure 2: Landsat-8 best feature combination confusion matrix
Figure 3: Sentinel-2 best feature combination confusion matrix

Predictor importance rankings showed that texture and vegetation signals were consistently influential. For the Landsat-8 workflow, top predictors included NDVI contrast (texture), NDVI ASM (texture), and key spectral bands (notably Blue, Red, and SWIR 2).
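All of the reported diagnostics (OA, Kappa, UA, PA) derive from the confusion matrix. A minimal numpy sketch of the formulas follows; the row/column convention used here (rows = reference, columns = map) is an assumption, since the report's figures may use the opposite orientation.

```python
import numpy as np

def accuracy_metrics(cm):
    """OA, Kappa, and per-class UA/PA from a confusion matrix with
    rows = reference (ground truth) and columns = map (predicted)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                                 # Overall Accuracy
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # chance agreement
    kappa = (po - pe) / (1.0 - pe)                        # Kappa
    ua = np.diag(cm) / cm.sum(axis=0)   # User's Accuracy (commission side)
    pa = np.diag(cm) / cm.sum(axis=1)   # Producer's Accuracy (omission side)
    return po, kappa, ua, pa
```

For example, a two-class matrix [[50, 10], [5, 35]] yields OA = 0.85 and Kappa ≈ 0.69, illustrating how Kappa discounts agreement expected by chance.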
For Sentinel-2, top predictors similarly emphasized NDVI texture metrics, with SWIR 2 and water-sensitive indices (e.g., NDWI/AWEI) contributing strongly, consistent with the need to distinguish vegetated and moist/riparian features. The final Mbire Ward 2 LULC map (L8 combined) indicates a landscape dominated by natural/semi-natural vegetation classes. Shrubland, Forest, and Grassland together account for roughly three-quarters of the ward, while Built-up represents 10%, Cropland 5–6%, and River 4% (based on the mapped class area percentages). This composition aligns with the mapped spatial pattern, where cropland occurs in smaller, dispersed patches embedded within a broader shrub/forest/grass matrix, and riverine features form a distinct linear corridor.

Figure 4: Final LULC map for Mbire Ward 2 derived from the best-performing Landsat-8 Random Forest model
Figure 5: LULC composition of Mbire Ward 2 derived from the Landsat-8 Random Forest classification

4. Next steps

The next phase will focus on improving classification accuracy, strengthening training-data coverage, and extending mapping from Mbire Ward 2 to Wards 2, 3, 9, 12, and 17. First, additional pseudo ground-truth points will be extracted from the high-accuracy LULC product developed by Baudron et al. (2022) (reported 95% accuracy). These pseudo-labels will be used to increase the spatial and class-wise representation of training samples, particularly in areas that were undersampled during field campaigns. The expanded training dataset will then be used to retrain the Random Forest model, while the independent field-collected ground-truth points will be retained as the primary validation dataset to maintain an external accuracy check and avoid circular validation. Second, the classification scheme will be refined to better reflect the separability of classes in the local context.
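The class area percentages underlying the composition reported above (and shown in Figure 5) can be reproduced from the classified raster by pixel counting. The sketch below uses hypothetical integer class codes; the actual codes and nodata value in the exported map may differ.

```python
import numpy as np

# Hypothetical integer codes for the six mapped classes.
CLASSES = {0: "Cropland", 1: "Shrubland", 2: "Grassland",
           3: "Forest", 4: "River", 5: "Built-up"}

def class_area_percent(classified, nodata=255):
    """Per-class area shares (%) from a classified raster of integer codes.
    With 30 m Landsat pixels, a pixel count also converts to hectares
    (count * 0.09)."""
    vals = classified[classified != nodata].ravel()
    counts = np.bincount(vals, minlength=len(CLASSES))
    pct = 100.0 * counts / counts.sum()
    return {CLASSES[c]: float(pct[c]) for c in CLASSES}

# Tiny demonstration raster (not the actual map).
shares = class_area_percent(np.array([[1, 1, 3], [2, 4, 1]]))
```

In practice the same counting is done over the exported Ward 2 raster before plotting the composition chart.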
Specifically, classes with weak spectral/structural distinction will be merged (dissolved) where appropriate. For example, grassland will be reconsidered because some mapped grassland areas may represent fallow cropland or may be embedded within shrubland mosaics, leading to inconsistent labeling and confusion during model training. A revised class legend will be defined based on (i) observed confusion patterns, (ii) local knowledge of land use, and (iii) practical relevance for downstream analyses. Third, the extended AOI will be remapped using the improved training dataset and revised class legend, with particular attention to potential underclassification of cropland in the current map. By integrating pseudo ground-truth and strengthening training coverage across the broader landscape, the updated mapping is expected to better capture small and fragmented cropland patches and improve overall thematic consistency.

5. Conclusion

This work developed a supervised LULC mapping workflow for Mbire using cloud-based satellite processing and machine learning, producing a Ward 2 map and an accompanying accuracy assessment. The approach combined Sentinel-2 and Landsat-8 predictor stacks (spectral bands, indices, and texture features) and used Random Forest with an explicit tuning and evaluation framework. Results showed strong performance for the best model and produced a spatially explicit baseline map for subsequent landscape analyses. The next refinement stage will extend coverage to the additional target wards and improve thematic accuracy by augmenting training data with pseudo ground-truth from a previously validated LULC product, while validating improvements against independent field ground-truth points and simplifying classes where separability is limited.