Evaluation of data pre-processing and regression models for estimation of soil organic carbon using Vis–NIR spectroscopy in acid soils of Colombian tropical savanna
María Fernanda Rondón Fernández
Tropical Forages

Introduction
• Soil organic carbon (SOC) plays a critical role in terrestrial ecosystem functioning and global climate regulation.
• The SOC pool in agricultural soils maintains fertility, supports sustainable development, and is essential for national food security (Bangelesa et al. 2020).
• Rapid and accurate SOC estimation using near-infrared (NIR) spectroscopy can support sustainable agricultural development and climate strategies, especially in tropical savanna soils.

Objective
• To identify the effects of spectral pretreatment on the correlation between SOC and hyperspectral data.
• To evaluate the impact of different spectral pretreatments on SOC estimation models.
• To develop and compare multivariate models (PLS, SVM, SPA-MLR) optimized for SOC prediction in Colombian tropical savanna soils.

Methodology and Technologies applied
Soil sample collection and source of heterogeneity
Analyzed 629 soil samples previously collected at four depths (0–10, 10–30, 30–50, and 50–100 cm) from native savanna and cultivated pasture in HSJ, Vichada.

Spectral data acquisition using NIR DS3 FOSS (400–2500 nm range)

Spectral data preprocessing
• Applied multiple spectral pretreatment techniques, including smoothing, derivatives, standard normal variate (SNV), detrending, and scattering corrections.
• Selected feature wavelengths with the Successive Projections Algorithm (SPA).
• Calibrated and validated models using a 3:1 calibration-to-validation split; performance was evaluated with the coefficient of determination (R²), root mean square error (RMSE), and ratio of performance to deviation (RPD).

Multivariate statistical analyses
Four algorithms were applied to develop predictive models from Vis-NIR spectral data (SPA selects the wavelengths used by MLR, giving the SPA-MLR model):
1. Successive Projections Algorithm (SPA)
2. Multiple Linear Regression (MLR)
3. Partial Least Squares (PLS)
4. Support Vector Machine (SVM)

Conclusions
• Vis-NIR spectroscopy with proper pre-processing and multivariate modeling enables rapid, non-destructive, and cost-effective SOC assessment, outperforming traditional laboratory methods in speed and cost.
• Vis-NIR spectra are strongly correlated with laboratory-measured SOC; models built on pre-processed spectra (T2-PLS) and on raw spectra (T0-PLS) both achieved high predictive accuracy (RPD > 2.7), indicating reliability for SOC prediction.
• SVM with a specific pre-processing (T4-SVM) performed well in calibration but was less reliable in validation (RPD = 1.88), highlighting the need for cross-validation and periodic re-testing.
• Vis-NIR-based models, with proper calibration and validation, provide a solid basis for SOC inventories and can support carbon credit applications if quality control is maintained.

Lessons learned and next steps
Lessons learned
• Pre-processing has a major impact: careful spectral pretreatment is critical for maximizing SOC prediction accuracy and model robustness.
• Raw spectra can still yield valuable predictions: even without intensive processing, good models (such as T0-PLS) are possible in many cases, but pre-processing further improves results.
• Cross-validation is essential: strong calibration results do not always guarantee reliability; independent validation and sound data-splitting strategies are necessary for credible application.
Next steps
• Expand testing across regions and soil types to verify model portability and robustness for Colombian soils and beyond.
• Integrate Vis-NIR-based SOC monitoring into carbon inventory frameworks and explore applications in carbon credit markets.
• Develop practical guidelines and standard protocols for in-field Vis-NIR measurement, calibration, and regular re-validation.

For more information about the project references, please contact: Jacobo Arango | j.arango@cgiar.org

Preliminary Results
• SOC showed high variability (CV > 64%), ranging from 0 to 2.66%.
• Positive correlation between spectral absorbance and SOC, with key absorption bands at ~1410, 1918, and 2207 nm.
• SPA consistently extracted informative bands in the visible and NIR regions across pretreatments.
• Pretreatments significantly improved SOC–spectra correlations and model accuracy.
• T2-PLS and T0-PLS models showed high confidence (>80%), with validation R² ≥ 0.86 and RPD > 2.7.
• Narrow error margins (±0.10–0.12% SOC) support field applicability.
• The T7-SPA-MLR model demonstrated robust performance, with ~78% confidence and ±0.15% SOC error.
• T4-SVM had lower confidence (~58%) and a wider error (±0.20% SOC), indicating a potential risk of overfitting.
• These findings support combining Vis-NIR spectroscopy with advanced models for rapid, non-destructive SOC monitoring in tropical savanna soils.
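The pretreatments listed under Spectral data preprocessing (smoothing, derivatives, SNV, detrending) can be illustrated with a minimal NumPy sketch. It is not the exact recipe behind the numbered treatments (T0, T2, T4, T7), which the poster does not spell out: the function names, the use of a Savitzky–Golay filter for smoothing and derivatives, the window length, the polynomial order, and the quadratic detrending baseline are all assumptions.

```python
import math

import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and SD."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def detrend(spectra: np.ndarray, wavelengths: np.ndarray, order: int = 2) -> np.ndarray:
    """Subtract a per-spectrum polynomial baseline fitted against wavelength (quadratic by default)."""
    # np.polyfit accepts a 2-D y (one column per spectrum), so all rows are fitted at once.
    coeffs = np.polyfit(wavelengths, spectra.T, order)  # shape (order + 1, n_spectra)
    vander = np.vander(wavelengths, order + 1)  # decreasing powers, same order as polyfit
    baseline = (vander @ coeffs).T  # shape (n_spectra, n_wavelengths)
    return spectra - baseline


def savgol(spectra: np.ndarray, window: int = 11, polyorder: int = 2, deriv: int = 0) -> np.ndarray:
    """Savitzky-Golay smoothing (deriv=0) or derivative (deriv>=1) along each spectrum.

    Derivatives are per sample step; divide by spacing**deriv to express them per nm.
    """
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    if deriv > polyorder:
        raise ValueError("deriv must not exceed polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Least-squares polynomial fit over the window: row `deriv` of the pseudo-inverse holds
    # the weights for that polynomial coefficient; multiplying by deriv! gives the derivative
    # of the fitted polynomial at the window centre.
    vander = np.vander(offsets, polyorder + 1, increasing=True)
    weights = math.factorial(deriv) * np.linalg.pinv(vander)[deriv]
    # Repeat edge values so the output has the same length as the input.
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    return np.array([np.correlate(row, weights, mode="valid") for row in padded])
```

A typical chain would be `savgol(snv(spectra), window=11, polyorder=2, deriv=1)`, where `spectra` is a hypothetical samples × wavelengths absorbance matrix. Near the spectrum ends the edge padding makes the filter output less reliable than in the interior.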
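The first objective, and the result that pretreatments improved SOC–spectra correlations, can be quantified with a per-wavelength Pearson correlogram computed once on raw spectra and once on pretreated spectra. This is an assumed analysis (the poster does not state how correlations were calculated), and the helper names below are hypothetical.

```python
import numpy as np


def correlation_spectrum(spectra: np.ndarray, soc: np.ndarray) -> np.ndarray:
    """Pearson r between SOC and absorbance at each wavelength (one value per column)."""
    centred_x = spectra - spectra.mean(axis=0)
    centred_y = soc - soc.mean()
    return (centred_x.T @ centred_y) / (
        np.linalg.norm(centred_x, axis=0) * np.linalg.norm(centred_y)
    )


def strongest_bands(r: np.ndarray, wavelengths: np.ndarray, k: int = 3) -> np.ndarray:
    """Wavelengths of the k largest |r| values, strongest first."""
    order = np.argsort(-np.abs(r))[:k]
    return wavelengths[order]
```

Because neighboring wavelengths are highly collinear, the top-|r| wavelengths in real spectra tend to cluster around a single peak; picking local maxima of |r| is a better way to locate bands such as those reported near 1410, 1918, and 2207 nm.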
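SPA-MLR can be sketched as two steps: the Successive Projections Algorithm greedily picks wavelengths whose columns are least collinear with those already chosen, and an ordinary least-squares MLR is then fitted on those bands alone. In this sketch the starting wavelength and the number of bands are fixed inputs; full SPA implementations usually tune both by minimizing the validation RMSE of the resulting MLR, a step omitted here. Function names are hypothetical.

```python
import numpy as np


def spa_select(X: np.ndarray, n_bands: int, start: int) -> list[int]:
    """Successive Projections Algorithm: greedy selection of minimally collinear columns of X."""
    n_samples, n_wavelengths = X.shape
    if not 1 <= n_bands <= min(n_samples, n_wavelengths):
        raise ValueError("n_bands must be between 1 and min(n_samples, n_wavelengths)")
    work = X.astype(float).copy()
    selected = [start]
    for _ in range(1, n_bands):
        last = work[:, selected[-1]]
        # Project every column onto the subspace orthogonal to the last selected column;
        # the projected matrix is carried into the next iteration.
        work = work - np.outer(last, last @ work) / (last @ last)
        norms = np.linalg.norm(work, axis=0)
        norms[selected] = -1.0  # never re-select a band
        selected.append(int(np.argmax(norms)))
    return selected


def fit_mlr(x_bands: np.ndarray, y: np.ndarray) -> tuple[float, np.ndarray]:
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    design = np.column_stack([np.ones(len(y)), x_bands])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(beta[0]), beta[1:]


def predict_mlr(intercept: float, coefs: np.ndarray, x_bands: np.ndarray) -> np.ndarray:
    """Apply a fitted MLR to new samples measured at the selected bands."""
    return intercept + x_bands @ coefs
```

Restricting MLR to a few SPA bands keeps the regression well-posed even though a full Vis-NIR spectrum contains many highly collinear wavelengths.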
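The calibration/validation scheme (3:1 split, scored with R², RMSE, and RPD) can be sketched with a small PLS1 regression written in NumPy using the NIPALS algorithm. Assumptions, none of them stated on the poster: samples are assigned to the two sets at random, RPD is the standard deviation of the measured validation SOC divided by the validation RMSE, R² is 1 − SS_res/SS_tot, and the number of latent variables is a placeholder that would normally be chosen by cross-validation on the calibration set. Function names are hypothetical.

```python
import numpy as np


def split_3_to_1(n_samples: int, seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Random 75 % calibration / 25 % validation index split (assumed scheme)."""
    order = np.random.default_rng(seed).permutation(n_samples)
    n_cal = int(round(0.75 * n_samples))
    return order[:n_cal], order[n_cal:]


def pls1_fit(X: np.ndarray, y: np.ndarray, n_components: int) -> tuple[float, np.ndarray]:
    """PLS1 via NIPALS; returns (intercept, coefficient vector) in original units."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)  # weight vector
        t = E @ w  # scores
        tt = t @ t
        p = E.T @ t / tt  # X loadings
        q = f @ t / tt  # y loading
        E = E - np.outer(t, p)  # deflate X
        f = f - q * t  # deflate y
        weights.append(w)
        loadings.append(p)
        y_loadings.append(q)
    W, P, Q = np.array(weights).T, np.array(loadings).T, np.array(y_loadings)
    coefs = W @ np.linalg.solve(P.T @ W, Q)  # B = W (P'W)^-1 q
    return float(y_mean - x_mean @ coefs), coefs


def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict[str, float]:
    """R², RMSE and RPD (SD of reference values / RMSE)."""
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid**2)))
    r2 = float(1.0 - np.sum(resid**2) / np.sum((y_true - y_true.mean()) ** 2))
    rpd = float(np.std(y_true, ddof=1) / rmse)
    return {"R2": r2, "RMSE": rmse, "RPD": rpd}
```

For example, with hypothetical arrays `spectra` and `soc`, `cal, val = split_3_to_1(len(soc))` gives the index sets; fit `pls1_fit(spectra[cal], soc[cal], n_components)` and pass `soc[val]` together with the validation predictions to `evaluate`.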
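The T4-SVM pattern (good calibration, validation RPD = 1.88) is what cross-validation within the calibration set is meant to flag before a model is used. Below is a hedged scikit-learn sketch, assuming an RBF-kernel support vector regression on standardized spectra with placeholder hyperparameters (C, epsilon, and the number of folds are not the poster's settings). It compares the calibration-set RMSE with the K-fold cross-validated RMSE; a large gap suggests overfitting. The helper names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def rmse(y_true, y_pred) -> float:
    """Root mean square error."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff**2)))


def svr_overfit_check(X_cal, y_cal, C=10.0, epsilon=0.05, n_splits=5, seed=0):
    """Return (calibration RMSE, cross-validated RMSE) for an RBF SVR on the calibration set."""
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=C, epsilon=epsilon))
    # Fit on all calibration samples and score on the same samples (optimistic estimate).
    model.fit(X_cal, y_cal)
    rmse_cal = rmse(y_cal, model.predict(X_cal))
    # Each sample is predicted by a clone of the pipeline trained without its fold.
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    y_cv = cross_val_predict(model, X_cal, y_cal, cv=folds)
    return rmse_cal, rmse(y_cal, y_cv)
```

Keeping the scaler inside the pipeline means it is refitted within each fold, so the cross-validated RMSE is not contaminated by statistics from held-out samples.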