Zimbabwe Landscape Characterization Technical Report
Telma Sibanda, Bruno Gerard, Frederic Baudron, Vimbayi G.P. Chimonyo
CGIAR
31 December 2025

Contents
1. Introduction
2. Methodology
2.1 Data sources and types
2.2 Image preprocessing
2.3 Image classification
3. Results
4. Next steps
5. Conclusion

1. Introduction

This report documents the supervised land use/land cover (LULC) mapping methodology developed for Mbire Ward 2 using optical satellite imagery and machine learning. The objective was to produce a fit-for-purpose, spatially explicit LULC map to support subsequent landscape characterization and analysis (e.g., composition/configuration metrics) in a heterogeneous and fragmented dryland agroecological context. We benchmarked two satellite datasets (Sentinel-2 and Landsat-8) and multiple predictor feature sets (spectral, indices, texture, and combinations) using a consistent training/validation design and a transparent model selection workflow.

2. Methodology

2.1 Data sources and types

Multispectral satellite imagery was accessed through Google Earth Engine (GEE). Image processing and analysis were conducted in Google Colab, while map production and visualizations were produced in R. The temporal range spanned July 2023 to July 2025 to ensure seasonal representation of land cover conditions for both Sentinel-2 and Landsat-8. To minimize atmospheric noise, only images with less than 20% cloud cover were included in the analysis. A set of spectral indices was computed to enhance land-cover separability, including the Normalized Difference Vegetation Index (NDVI), the Normalized Difference Water Index (NDWI), and the Automated Water Extraction Index (AWEI). In addition, texture metrics derived from NDVI (e.g., GLCM contrast and GLCM angular second moment [ASM]) were calculated to capture spatial heterogeneity and improve discrimination among structurally similar classes.
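The index formulas named above can be sketched with numpy. This is an illustrative sketch, not the GEE implementation; the band inputs and the choice of the non-shadow AWEI variant (AWEI_nsh, Feyisa et al. 2014) are assumptions, as the report does not state which AWEI variant was used.

```python
import numpy as np

def spectral_indices(green, red, nir, swir1, swir2):
    """Compute NDVI, NDWI, and AWEI (non-shadow variant) from
    surface-reflectance bands supplied as float arrays in [0, 1]."""
    ndvi = (nir - red) / (nir + red)        # vegetation greenness
    ndwi = (green - nir) / (green + nir)    # open-water index (McFeeters)
    # AWEI_nsh after Feyisa et al. (2014); which variant the workflow
    # used is an assumption, since the report does not specify it.
    awei = 4.0 * (green - swir1) - (0.25 * nir + 2.75 * swir2)
    return ndvi, ndwi, awei

# Example: a bright-vegetation pixel (reflectance values are illustrative)
ndvi, ndwi, awei = spectral_indices(
    green=np.array([0.05]), red=np.array([0.04]), nir=np.array([0.45]),
    swir1=np.array([0.20]), swir2=np.array([0.10]))
```

Vegetated pixels push NDVI toward 1 and NDWI/AWEI negative, which is the separability the classification relies on.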
Ground-truthing was conducted in October 2025, during which 427 GPS-referenced points were collected. These points were stratified across the major land cover types (cropland = 70, shrubland = 71, grassland = 73, forest = 69, river = 69, and built-up = 75). Land cover classes were labeled using the ESA WorldCover classification system. Table 1 summarizes the data sources, including satellite bands, ancillary datasets (e.g., elevation, soil texture), and spatial resolutions used in the classification.

Table 1: Types of data and data sources used in the study

Source/satellite      Data/band         Description                                     Spatial resolution
Sentinel-2 Level-2A   B2                Blue                                            10 m
                      B3                Green                                           10 m
                      B4                Red                                             10 m
                      B8                NIR                                             10 m
                      B11               SWIR 1                                          20 m
                      B12               SWIR 2                                          20 m
Landsat-8             B2                Blue                                            30 m
                      B3                Green                                           30 m
                      B4                Red                                             30 m
                      B5                NIR                                             30 m
                      B6                SWIR 1                                          30 m
                      B7                SWIR 2                                          30 m
Field/study site      GPS coordinates   427 points representing the land use classes    -
GADM                  Shapefiles        Shapefiles for the Area of Interest (AOI)       -
iSDA                  Soil texture      Soil properties                                 30 m

Figure 1: Distribution of ground truth points across the AOI

2.2 Image preprocessing

Image preprocessing was conducted separately for Sentinel-2 Level-2A and Landsat-8 imagery. All images were first filtered to retain scenes with <20% cloud cover and then clipped to the Area of Interest (AOI), corresponding to Mbire Wards 2 and 3. Sentinel-2 Level-2A imagery, which is already atmospherically and radiometrically corrected, underwent additional cloud and shadow masking using the Scene Classification Layer (SCL) (Louis et al., 2021). For Landsat-8, cloud masking was implemented using the QA_PIXEL band through a bitmasking approach. After masking, the remaining clear observations were used to generate median composites for each sensor to reduce residual noise and produce temporally representative inputs for classification. A set of spectral indices was computed to improve class separability.
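The QA_PIXEL bitmasking described above amounts to testing individual cloud-related bits of the quality band for each pixel. A minimal sketch of that per-pixel test follows; the bit positions match the Landsat 8 Collection 2 Level-2 convention, but the exact set of bits masked in this workflow is an assumption.

```python
# Landsat 8 Collection 2 QA_PIXEL bit positions (per the Level-2 product guide):
# bit 1 = dilated cloud, bit 2 = cirrus, bit 3 = cloud, bit 4 = cloud shadow.
DILATED_CLOUD, CIRRUS, CLOUD, CLOUD_SHADOW = 1, 2, 3, 4

def is_clear(qa_pixel: int) -> bool:
    """Return True when none of the cloud-related QA bits are set,
    i.e. the pixel can contribute to the median composite."""
    mask = sum(1 << b for b in (DILATED_CLOUD, CIRRUS, CLOUD, CLOUD_SHADOW))
    return (qa_pixel & mask) == 0
```

In GEE the same logic is applied image-wide with bitwise operations on the QA_PIXEL band rather than per-pixel Python calls.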
For vegetation condition and greenness, NDVI and EVI were calculated for each sensor. To strengthen discrimination of moisture and water features, NDWI and AWEI were computed. To better represent spatial heterogeneity in fragmented landscapes, texture metrics were derived from the NDVI layer using GLCM summaries, including contrast and angular second moment (ASM), and these texture bands were added to the predictor stack used in the classification. To harmonize spatial resolution across datasets, Sentinel-2 imagery (10 m) was resampled to 30 m using bilinear interpolation to match Landsat-8. All processed spectral bands, indices, and texture layers were stacked into final predictor composites and exported for model training and mapping in the subsequent classification stage.

2.3 Image classification

A supervised classification was implemented using the Random Forest (RF) machine learning algorithm, which builds an ensemble of decision trees using bootstrapped samples and random subsets of predictors at each split. Ground-truth points were compiled for the six target classes (Cropland, Shrubland, Grassland, Forest, River, Built-up) and split into 80% training and 20% independent validation sets. To test the added value of different predictor types, three feature combinations were prepared for each satellite dataset: (i) spectral bands only; (ii) vegetation/water indices and texture variables; and (iii) a combined set of spectral bands + indices + texture metrics. Indices captured vegetation and water signals (e.g., NDVI and water-related indices), while texture metrics were derived from vegetation index layers to better represent spatial heterogeneity (e.g., grey-level co-occurrence matrix features such as NDVI contrast and NDVI ASM), which is particularly important for separating structurally similar classes in mixed landscapes.
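The GLCM summaries used as texture predictors (contrast and ASM) can be illustrated on a small array. This is a self-contained numpy sketch for a single horizontal offset with a symmetric co-occurrence matrix; the actual workflow computed these over the NDVI layer with GEE's texture functions, and the grey-level count here is an assumption.

```python
import numpy as np

def glcm_contrast_asm(img, levels=8, lo=-1.0, hi=1.0):
    """GLCM contrast and angular second moment (ASM) for a horizontal
    neighbour offset, from an index image with values in [lo, hi]."""
    # Quantise the continuous index (e.g. NDVI) into grey levels.
    q = np.clip(((img - lo) / (hi - lo) * levels).astype(int), 0, levels - 1)
    glcm = np.zeros((levels, levels))
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    np.add.at(glcm, (left, right), 1)      # count horizontal co-occurrences
    glcm = glcm + glcm.T                   # make the matrix symmetric
    p = glcm / glcm.sum()                  # co-occurrence probabilities
    i, j = np.indices(p.shape)
    contrast = float((p * (i - j) ** 2).sum())  # high for abrupt transitions
    asm = float((p ** 2).sum())                 # high for uniform regions
    return contrast, asm
```

A uniform patch yields contrast 0 and ASM 1, while a fine-grained mosaic yields high contrast and low ASM, which is why these bands help separate structurally similar classes.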
Hyperparameter tuning was performed during model training to identify the best RF settings for each dataset/feature set. The tuned parameters included the number of trees (Trees), minimum node size (MinLeaf), bag fraction (BagFrac), and number of variables considered at each split (VarPerSplit). Model performance was evaluated using a combination of internal and external validation diagnostics, including Out-of-Bag (OOB) error and independent validation metrics derived from the confusion matrix: Overall Accuracy (OA), Kappa, User's Accuracy (UA), and Producer's Accuracy (PA). Variable importance was assessed using Gini importance to identify the most informative predictors and to support the interpretability of the final model. The final LULC product was generated by applying the best-performing RF model (based on the composite evaluation of OA/Kappa and class-level performance) to the full predictor stack within the Ward 2 boundary.

3. Results

Across all benchmarking runs, the combined feature sets outperformed spectral-only models, confirming the added value of indices and texture for class separability in Ward 2. The best overall model was Landsat-8 combined, achieving OA = 0.869 and Kappa = 0.842, outperforming Sentinel-2 combined (OA = 0.815; Kappa = 0.778). Within Landsat-8, the indices + textures feature set already improved performance (OA = 0.800; Kappa = 0.759) compared to spectral-only (OA = 0.753; Kappa = 0.704), and the combined stack produced the strongest final gains.
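The tuning and evaluation workflow of Section 2.3 can be approximated outside GEE with scikit-learn. The parameter correspondence (Trees → n_estimators, MinLeaf → min_samples_leaf, BagFrac → max_samples, VarPerSplit → max_features) is approximate, and the data below are a synthetic stand-in for the real predictor stack, not the study's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-in: 427 samples (matching the ground-truth count),
# 10 predictors, 6 classes with partly separable means.
X = rng.normal(size=(427, 10))
y = rng.integers(0, 6, size=427)
X += y[:, None] * 1.0

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)  # the report's 80/20 split

# Best L8-combined settings from Table 2, mapped onto scikit-learn arguments.
rf = RandomForestClassifier(n_estimators=163, min_samples_leaf=1,
                            max_samples=0.53, max_features=4,
                            oob_score=True, random_state=0)
rf.fit(X_tr, y_tr)

pred = rf.predict(X_te)
oa = accuracy_score(y_te, pred)            # Overall Accuracy
kappa = cohen_kappa_score(y_te, pred)      # Kappa
cm = confusion_matrix(y_te, pred)          # rows = reference, cols = predicted
ua = np.diag(cm) / cm.sum(axis=0)          # User's Accuracy per class
pa = np.diag(cm) / cm.sum(axis=1)          # Producer's Accuracy per class
```

The internal diagnostic is available as `rf.oob_score_`, and `rf.feature_importances_` gives the Gini importances used for the variable-importance ranking.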
Table 2: Random Forest model results for all feature sets

Dataset/feature set     Overall Accuracy   Kappa Coefficient   Best Parameters
S2 spectral only        0.807              0.769               Trees = 109, MinLeaf = 1, BagFrac = 0.90, VarPerSplit = 1
S2 indices + textures   0.792              0.758               Trees = 100, MinLeaf = 1, BagFrac = 0.76, VarPerSplit = 1
S2 combined             0.815              0.778               Trees = 323, MinLeaf = 2, BagFrac = 0.68, VarPerSplit = 2
L8 spectral only        0.753              0.704               Trees = 406, MinLeaf = 1, BagFrac = 0.51, VarPerSplit = 2
L8 indices + textures   0.800              0.759               Trees = 569, MinLeaf = 1, BagFrac = 0.90, VarPerSplit = 2
L8 combined             0.869              0.842               Trees = 163, MinLeaf = 1, BagFrac = 0.53, VarPerSplit = 4

The confusion matrices indicate that river and grassland were classified with very high reliability (near-perfect separation in both combined models). The most persistent confusion was among structurally similar terrestrial classes. In the L8 combined model, most cropland errors were between cropland and forest, and built-up showed occasional confusion with shrubland and cropland; however, diagonal dominance remained strong across all classes. In the S2 combined model, additional confusion was visible between shrubland and grassland and between forest and grassland, suggesting that in this ward, the Landsat-8 feature space (when enriched with indices and textures) separated these classes more cleanly.

Figure 2: Landsat-8 best feature combination confusion matrix
Figure 3: Sentinel-2 best feature combination confusion matrix

Predictor importance rankings showed that texture and vegetation signals were consistently influential. For the Landsat-8 workflow, top predictors included NDVI contrast (texture), NDVI ASM (texture), and key spectral bands (notably Blue, Red, and SWIR 2).
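All of the reported diagnostics (OA, Kappa, UA, PA) derive from the confusion matrix. A minimal numpy sketch of the formulas follows; the row/column convention used here (rows = reference, columns = map) is an assumption, since the report's figures may use the opposite orientation.

```python
import numpy as np

def accuracy_metrics(cm):
    """OA, Kappa, and per-class UA/PA from a confusion matrix with
    rows = reference (ground truth) and columns = map (predicted)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                                 # Overall Accuracy
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # chance agreement
    kappa = (po - pe) / (1.0 - pe)                        # Kappa
    ua = np.diag(cm) / cm.sum(axis=0)   # User's Accuracy (commission side)
    pa = np.diag(cm) / cm.sum(axis=1)   # Producer's Accuracy (omission side)
    return po, kappa, ua, pa
```

For example, a two-class matrix [[50, 10], [5, 35]] yields OA = 0.85 and Kappa ≈ 0.69, illustrating how Kappa discounts agreement expected by chance.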
For Sentinel-2, top predictors similarly emphasized NDVI texture metrics, with SWIR 2 and water-sensitive indices (e.g., NDWI/AWEI) contributing strongly, consistent with the need to distinguish vegetated and moist/riparian features. The final Mbire Ward 2 LULC map (L8 combined) indicates a landscape dominated by natural/semi-natural vegetation classes. Shrubland, Forest, and Grassland together account for roughly three-quarters of the ward, while Built-up represents 10%, Cropland 5–6%, and River 4% (based on the mapped class area percentages). This composition aligns with the mapped spatial pattern, where cropland occurs in smaller, dispersed patches embedded within a broader shrub/forest/grass matrix, and riverine features form a distinct linear corridor.

Figure 4: Final LULC map for Mbire Ward 2 derived from the best-performing Landsat-8 Random Forest model
Figure 5: LULC composition of Mbire Ward 2 derived from the Landsat-8 Random Forest classification

4. Next steps

The next phase will focus on improving classification accuracy, strengthening training-data coverage, and extending mapping from Mbire Ward 2 to Wards 2, 3, 9, 12, and 17. First, additional pseudo ground-truth points will be extracted from the high-accuracy LULC product developed by Baudron et al. (2022) (reported 95% accuracy). These pseudo-labels will be used to increase the spatial and class-wise representation of training samples, particularly in areas that were undersampled during field campaigns. The expanded training dataset will then be used to retrain the Random Forest model, while the independent field-collected ground-truth points will be retained as the primary validation dataset to maintain an external accuracy check and avoid circular validation. Second, the classification scheme will be refined to better reflect the separability of classes in the local context.
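The class area percentages underlying the composition reported above (and shown in Figure 5) can be reproduced from the classified raster by pixel counting. The sketch below uses hypothetical integer class codes; the actual codes and nodata value in the exported map may differ.

```python
import numpy as np

# Hypothetical integer codes for the six mapped classes.
CLASSES = {0: "Cropland", 1: "Shrubland", 2: "Grassland",
           3: "Forest", 4: "River", 5: "Built-up"}

def class_area_percent(classified, nodata=255):
    """Per-class area shares (%) from a classified raster of integer codes.
    With 30 m Landsat pixels, a pixel count also converts to hectares
    (count * 0.09)."""
    vals = classified[classified != nodata].ravel()
    counts = np.bincount(vals, minlength=len(CLASSES))
    pct = 100.0 * counts / counts.sum()
    return {CLASSES[c]: float(pct[c]) for c in CLASSES}

# Tiny demonstration raster (not the actual map).
shares = class_area_percent(np.array([[1, 1, 3], [2, 4, 1]]))
```

In practice the same counting is done over the exported Ward 2 raster before plotting the composition chart.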
Specifically, classes with weak spectral/structural distinction will be merged (dissolved) where appropriate. For example, grassland will be reconsidered because some mapped grassland areas may represent fallow cropland or may be embedded within shrubland mosaics, leading to inconsistent labeling and confusion during model training. A revised class legend will be defined based on (i) observed confusion patterns, (ii) local knowledge of land use, and (iii) practical relevance for downstream analyses. Third, the extended AOI will be remapped using the improved training dataset and revised class legend, with particular attention to potential underclassification of cropland in the current map. By integrating pseudo ground-truth and strengthening training coverage across the broader landscape, the updated mapping is expected to better capture small and fragmented cropland patches and improve overall thematic consistency.

5. Conclusion

This work developed a supervised LULC mapping workflow for Mbire using cloud-based satellite processing and machine learning, producing a Ward 2 map and an accompanying accuracy assessment. The approach combined Sentinel-2 and Landsat-8 predictor stacks (spectral bands, indices, and texture features) and used Random Forest with an explicit tuning and evaluation framework. Results showed strong performance for the best model and produced a spatially explicit baseline map for subsequent landscape analyses. The next refinement stage will extend coverage to the additional target wards and improve thematic accuracy by augmenting training data with pseudo ground-truth from a previously validated LULC product, while validating improvements against independent field ground-truth points and simplifying classes where separability is limited.