Report AI-Assisted River Discharge Measurement Through Citizen Science and Mobile Technology Kayathri Vigneswaran, Hugo Retief, Jai Clifford-Holmes, Hansaka Tennakoon and Mariangel Garcia Andarcia December 2025 Contents | Page 1 of 52 CGIAR Authors Kayathri Vigneswaran1, Hugo Retief2, Jai Clifford-Holmes2, Mariangel Garcia Andarcia1, Hansaka Tennakoon1 1International Water Management Institute (IWMI), Colombo, Sri Lanka 2Association for Water and Rural Development (AWARD), Hoedspruit, South Africa Acknowledgments This work was conducted as part of the CGIAR Accelerator for Digital Transformation. We would like to thank all funders who supported this research through their contributions to the CGIAR Trust Fund (www.cgiar.org/funders). Forming part of the Citizen Science for Water Management in Limpopo River Basin initiative, implemented within the Digital Innovations for Water Secure Africa (DIWASA) project and the CGIAR Accelerator for Digital Transformation, this work was made possible through the financial support of the Belgian development agency (Enabel), the Leona M. and Harry B. Helmsley Charitable Trust, the United Nations Development Programme (UNDP) through the Global Environment Facility (GEF), and the Microsoft Corporation. We extend our appreciation to the Limpopo Watercourse Commission (LIMCOM) for their ongoing partnership in advancing transboundary water resource management across the basin. We also thank our implementing partners, GroundTruth and the Association for Water and Rural Development (AWARD), for their contributions to citizen science capacity building and technical development. CGIAR Accelerator for Digital Transformation Digital Transformation “co-creates inclusive solutions leveraging advancements in AI, machine learning, modeling and big data analytics” to improve decision-making across food, land and water systems. 
It supports responsible, AI-enabled research and digital services that help partners design evidence-based policies, investments and innovations for climate-resilient development. Citation © 2025 International Water Management Institute. Some rights reserved. This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 International License (CC by 4.0). Front cover photo: AI citizen scientists using a WhatsApp-based Smart Gauge to photograph a river gauge plate on a river (composite of IWMI app/AI generated imagery). (graphic: IWMI) Back cover photo: GroundTruth staff Nkosingithandile Sithole, Ayanda Lephane and Nick Pattinson (from left to right) reviewing the MiniSASS application during field training in South Africa. (photo: GroundTruth) Disclaimer This publication has been prepared as an output of the CGIAR Accelerator for Digital Transformation and has not been independently peer reviewed. Responsibility for editing, proofreading, and layout, opinions expressed, and any possible errors lies with the authors and not the institutions involved. Boundaries used in the maps do not imply the expression of any opinion whatsoever on the part of CGIAR concerning the legal status of any country, territory, city, or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. Borders are approximate and cover some areas for which there may not yet be full agreement. Vigneswaran, K.; Retief, H.; Clifford-Holmes, J.; Garcia Andarcia, M.; Tennakoon, H. 2025. AI-assisted river discharge measurement through citizen science and mobile technology. Colombo, Sri Lanka: International Water Management Institute (IWMI). CGIAR Accelerator for Digital Transformation. 53p. 
Contents
Executive Summary
Introduction
Protocol Design Framework
System Architecture
API Endpoint Reference
AI Processing Pipeline
Data Model
WhatsApp Bot Integration Workflow
Security Considerations
Discussion and Conclusions
References

Summary

This technical report documents the development of a novel protocol for measuring river discharge through the integration of artificial intelligence and citizen science participation. The work was undertaken within the Enabel/Wehubit 'Citizen Science for Water Management in Limpopo River Basin' project, which addresses the need for accessible discharge measurement protocols in data-scarce basins and is implemented as part of the broader LIMCOM-UNDP/GEF programme for Integrated Transboundary River Basin Management. The protocol seeks to address a fundamental challenge in hydrological monitoring: basins with the weakest observational infrastructure stand to benefit most from digital decision support systems, yet lack the data required to drive them.

The Vision API serves as the backend infrastructure for a WhatsApp-based citizen science platform, enabling community members to contribute water level readings by photographing gauge plates at monitoring stations across the Limpopo Basin's four riparian countries. A custom WhatsApp Bot Service deployed on AWS ECS manages the conversational interface, guiding users through image submission, station selection, location sharing, and result validation without requiring dedicated mobile app installation. The system uses a two-step AI approach to read gauge-plate photos: it first identifies the waterline and scale on the image, and then converts this into a water-level reading. This design improves reliability under real field conditions and achieved strong accuracy in testing (R² = 0.84; average error 5.43 cm on high-quality images).
Validated water level readings are converted to discharge values using station-specific rating curves accessed through the FlowTracker API.

Key achievements documented in this report include:
• the successful implementation of a three-stage AI processing pipeline that achieves reliable gauge reading extraction across variable field conditions;
• a WhatsApp-first design prioritising digital inclusiveness for users with limited connectivity, storage, or technical familiarity;
• a two-stage validation workflow that ensures data quality while continuously generating ground-truth data for model improvement; and
• secure user access with permissions linked to specific monitoring stations in line with LIMCOM governance, hosted on scalable cloud infrastructure to support the project's expected user uptake.

The protocol demonstrates that citizen science, combined with hybrid artificial intelligence, can meaningfully augment traditional hydrological monitoring in data-scarce transboundary basins while maintaining scientific rigour through embedded quality assurance.

Introduction

Project Context

This technical report documents the development of a novel protocol for measuring river discharge through the integration of mobile applications, artificial intelligence, and citizen science participation. The work forms a core component of the "Citizen Science for Water Management in Limpopo River Basin" project, an 18-month initiative funded by Enabel through the Wehubit programme's call for "Data-driven Digital Social Innovations in Africa" (IWMI 2024). Led by the International Water Management Institute (IWMI) in partnership with GroundTruth and the Association for Water and Rural Development (AWARD), the project aims to develop a citizen science water monitoring prototype that has direct impact on water resource management across the transboundary Limpopo Basin.
The initiative represents part of the broader "Integrated Transboundary River Basin Management for the Sustainable Development of the Limpopo River Basin" programme, implemented by the Limpopo Watercourse Commission (LIMCOM) in partnership with the Global Water Partnership Southern Africa (GWPSA), with support from the United Nations Development Programme (UNDP) through funding from the Global Environment Facility (GEF).

Background and Rationale

Accurate and timely river discharge data are fundamental to effective water resource management, flood forecasting, and drought monitoring (Beven 2012). However, many river basins, particularly in developing regions, suffer from declining hydrometric networks due to infrastructure degradation, limited financial resources, and institutional capacity constraints (Hannah et al 2011). The Limpopo River Basin, Southern Africa's fourth-largest international basin, shared by Botswana, Mozambique, South Africa, and Zimbabwe, exemplifies these challenges with a gradual decline in the number of active stations and data records (Figure 1). Sparse hydrological data, uneven monitoring infrastructure, and limited institutional capacity make collecting data across national borders a persistent challenge, with operational gauging stations providing insufficient spatial and temporal coverage for robust decision-support systems (Figure 2) (LIMCOM 2019).

Figure 1 Temporal coverage of the Department of Water Affairs and Sanitation (DWS) of South Africa hydrological monitoring network in the South African part of the Limpopo River Basin.

Figure 2 Screenshot of the Limpopo Digital Twin showing the locations of river hydrological monitoring stations and citizen science monitoring stations [Source: IWMI]

Traditional river monitoring methods that rely on pressure probes and expensive cabling often incur high installation and maintenance costs and typically require calibration at intervals of up to two weeks.
These constraints are particularly acute in transboundary settings where coordinated infrastructure investment remains challenging. Imagery-based solutions can significantly reduce these costs while providing continuous, verifiable field data and enabling more frequent validation. The emergence of Digital Twin technology offers transformative potential for water resource management by creating virtual representations of physical systems that integrate real-time data streams with simulation models (Rasheed et al 2020). IWMI has developed a Digital Twin for the Limpopo Basin (Garcia et al. 2024 and Afham 2024) that provides an advanced virtual representation enabling water managers to visualize real-time data, model watershed processes, and generate forecasts on water availability and quality. However, the effectiveness of Digital Twins depends critically on data availability and quality. This creates a paradox: the basins most in need of advanced decision-support tools are often those with the weakest monitoring infrastructure. Citizen science presents a promising approach to address this data gap by engaging community members in systematic observation and data collection (Buytaert et al 2014). Within the LIMCOM Digital Twin framework, citizen scientists play a crucial role by contributing local data on water quality, flow levels, and ecosystem indicators that complement official datasets and fill critical gaps. Such community-generated data are stored in shared databases and accessed by the Digital Twin through Application Programming Interfaces (APIs) that enable smooth integration with other monitoring systems. Critically, citizen observations capture localized data on subtle shifts in streams, tributaries, and water use patterns that might otherwise go unnoticed (Langa and Kiala 2025). 
When combined with artificial intelligence for quality control and data extraction, citizen-contributed observations can augment traditional monitoring networks cost-effectively while building local capacity and awareness (See et al 2016). The proliferation of smartphones with high-quality cameras and the near-universal adoption of messaging platforms like WhatsApp create unprecedented opportunities for mobile-based citizen science initiatives in water resource monitoring.

The Citizen Science Water Monitoring Framework

The broader Enabel-funded project deploys three primary citizen science tool groups across the Limpopo Basin:
1. MiniSASS (Mini Stream Assessment Scoring System): Enables communities to monitor river health by sampling aquatic macroinvertebrates, providing biological indicators of water quality. The MiniSASS smartphone application digitizes and streamlines ground surveys, using AI vision recognition to confirm measurements taken by citizen scientists.
2. Clarity Tube and Water Quality Tools: The clarity tube measures water turbidity, providing rapid insights into water quality that complement laboratory analyses. Additional apparatus expands the range of water quality parameters collected.
3. AI-Driven Stream Flow Monitoring: Utilizing AI and photo recognition, this tool monitors stream flows through gauge plate readings, enabling automated extraction of water level data from citizen-submitted photographs. This protocol is the focus of this report.

Photographs provide an auditable record that can be reviewed, moderated, and reprocessed as the models improve, whereas manual entry alone is difficult to verify after the fact. AI assistance also materially reduces interpretation error in practice: gauge plates are not intuitive for occasional users, and even after a brief explanation, workshop exercises have shown wide variation in readings among participants compared with trained technicians.
By providing an on-screen estimate and highlighting the detected waterline, the system guides users toward a correct reading, while still allowing human confirmation or correction as part of quality assurance. Importantly, the same AI approach can be applied beyond WhatsApp submissions to low-cost, fixed camera deployments at stations, offering a scalable alternative to more expensive in-river instrumentation (e.g., pressure probes) where budgets and maintenance capacity are constrained.

These tools are integrated within the EnviroChamps citizen science model, where participants collect water resource data and receive recognition through UNICEF's YOMA platform, a blockchain-based system that records skills and contributions in a digital CV while providing tangible incentives such as training vouchers (IWMI 2025). The project aims to establish a transboundary network of at least 80 active citizen scientists contributing 10,000 data points annually across the basin.

Figure 3 MiniSASS landing page, the gateway to various citizen science-based tools and methods (https://minisass.org/).

Objectives and Scope

This report documents the protocol and system developed for river discharge measurement through citizen science and AI integration. Specific objectives include:
• Design a protocol/application for measurement of discharge using gauge plate recognition
• Develop Mobile/WhatsApp integration enabling citizen science participation
• Support the development and training of AI models for automated gauge reading extraction
• Integrate citizen-contributed data with the broader Digital Twin framework

The protocol includes a functional Mobile/WhatsApp integration with citizen science capabilities, achieved through the development of the Vision API documented in this report.
Approach and Innovation

The protocol developed represents a novel integration of several technological components that responds directly to the project's emphasis on digital inclusiveness and accessibility. Rather than requiring users to install a dedicated mobile application, the system leverages WhatsApp, a platform already familiar to most potential citizen scientists in the region, as the primary user interface. This design decision significantly lowers barriers to participation while ensuring the system functions on basic smartphones with limited storage capacity. As Roman (2025) notes, the approach seeks to "democratise data collection" while "overcoming the digital divide by teaching citizen scientists how to incorporate digital technologies."

The AI processing pipeline implements a hybrid architecture combining computer vision and large language model capabilities through three sequential stages:
• Stage 1 - Waterline Detection (YOLOv8): YOLOv8 is a computer-vision model that detects objects in images. Here, a YOLOv8 object detection model localises the gauge plate within the submitted photograph and identifies the precise point where the water surface intersects the scale. Angular correction using Sobel gradient analysis compensates for camera misalignment, while Canny edge detection refines the waterline position to pixel-level accuracy.
• Stage 2 - Scale Gap Estimation (YOLOv8-Pose): A YOLOv8-Pose model performs keypoint localisation to detect individual scale markers along the gauge face. The median pixel distance between consecutive markers (D_m) and the distance from the nearest marker to the waterline (D_n) are calculated, yielding a scale gap ratio (R = D_n / D_m) that enables sub-interval precision.
• Stage 3 - Reading Extraction (Gemini 2.0 Flash): Google Gemini's multimodal large language model analyses the cropped gauge image to extract visible numbers and their sequence. The final water level reading (W) is calculated algorithmically (W = M − R × 10), combining the LLM-extracted major scale value (M) with the geometric scale gap ratio (R) from Stage 2.

This hybrid approach leverages the geometric precision of computer vision models for spatial measurement while exploiting the contextual reasoning capabilities of large language models for interpreting potentially ambiguous scale markings. Critically, the architecture does not rely solely on end-to-end AI interpretation: the algorithmic combination of outputs from both model types achieves substantially higher accuracy than either approach in isolation. Subsequent conversion of validated gauge readings into volumetric discharge leverages established rating curves maintained through the FlowTracker system, enabling end-to-end automation of river flow estimation.

Quality assurance is embedded throughout the workflow through a two-stage user validation process. Validation is primarily a quality-check step, not a requirement that users can reliably read gauge plates unaided. The AI provides an initial estimate and highlights the detected waterline to guide the user; the confirmation (or correction) prevents misreadings from entering the database, creates labelled examples for ongoing model improvement, and preserves an auditable record that can also support future fixed-camera deployments. Citizens are asked to confirm both the detected waterline position and the extracted numerical reading, so that erroneous readings are flagged before they enter the operational database. This approach acknowledges both the capabilities and limitations of current AI systems while ensuring data quality sufficient for operational use.
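
The arithmetic that links Stage 2 and Stage 3 can be illustrated with a short sketch. This is an illustration under stated assumptions, not the project's implementation: the function names, the use of vertical pixel coordinates, and the example numbers are hypothetical. Only the definitions of D_m, D_n, R = D_n / D_m and W = M − R × 10 are taken from the text above.

```python
from statistics import median


def scale_gap_ratio(marker_ys, waterline_y):
    """Return R = D_n / D_m from vertical pixel positions (hypothetical helper)."""
    ys = sorted(marker_ys)
    if len(ys) < 2:
        raise ValueError("need at least two scale markers")
    # D_m: median pixel distance between consecutive scale markers.
    d_m = median(b - a for a, b in zip(ys, ys[1:]))
    if d_m <= 0:
        raise ValueError("scale markers must be at distinct positions")
    # D_n: pixel distance from the nearest marker to the waterline.
    d_n = min(abs(waterline_y - y) for y in ys)
    return d_n / d_m


def water_level(major_value, ratio, interval=10.0):
    """Apply W = M - R * 10; `interval` defaults to the 10 in the report's formula."""
    return major_value - ratio * interval


# Made-up example: markers every 50 px, waterline 20 px beyond the last marker,
# and an assumed LLM-extracted major scale value M = 120.
ratio = scale_gap_ratio([100, 150, 200, 250], waterline_y=270)
level = water_level(major_value=120, ratio=ratio)
print(f"R = {ratio:.2f}, W = {level:.1f}")
```

Using the median for D_m, as the text describes, keeps a single missed or spurious marker detection from distorting the pixel scale.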
Initial model evaluation demonstrates robust predictive capability: Gemini Stage 2 processing (image with scale gap metadata) achieved a coefficient of determination (R²) of 0.84 with a mean absolute error (MAE) of 5.43 cm on optimal-quality imagery, which represents approximately 91% of the evaluation dataset (Vigneswaran et al 2025). Performance on all images combined yielded R² = 0.58, with accuracy degradation observed under challenging conditions including blur, corrosion, and scale occlusion. Notably, processing with the image alone (Stage 1, without geometric metadata) achieved only R² = 0.16, confirming that the hybrid architecture, which combines YOLOv8-Pose geometric measurements with LLM visual reasoning, is essential for reliable gauge reading extraction. These results indicate the system performs reliably under typical field conditions while highlighting the importance of the user validation workflow for challenging image quality scenarios.

Contributions

This report presents several contributions to the field of citizen science-based hydrological monitoring. First, we introduce the first WhatsApp-native platform for river discharge measurement, eliminating the need for dedicated mobile application installation and significantly lowering barriers to participation in regions with limited digital infrastructure. Second, we develop and evaluate a hybrid AI architecture that combines YOLOv8 computer vision models for geometric measurement with Google Gemini large language model capabilities for numerical interpretation, an approach that substantially outperforms either technique in isolation. Third, we implement a two-stage validation workflow that simultaneously ensures data quality through citizen confirmation while generating continuous ground-truth training data, enabling ongoing model improvement without dedicated annotation campaigns.
Fourth, we demonstrate integration with transboundary governance structures through station-level permissions aligned with LIMCOM institutional arrangements, providing a template for coordinated citizen science deployment across international river basins. Together, these contributions address a fundamental challenge in water resource management: enabling advanced digital decision support in precisely those basins where traditional monitoring infrastructure is weakest.

Related Work

Automated river gauge reading has attracted growing research attention as an alternative to costly sensor-based monitoring infrastructure. Liu and Huang (2024) evaluated deep learning approaches for water level measurement, achieving mean absolute errors of 4–6 cm using convolutional neural networks trained on fixed-camera imagery. However, their approach required consistent camera positioning and lighting conditions rarely achievable in citizen science contexts where image quality varies substantially.

Citizen science platforms for water resource monitoring have proliferated in recent years, though few integrate artificial intelligence for data extraction. The CrowdWater project (Seibert et al 2019) enables community members to contribute stream level observations through a dedicated mobile application, using virtual staff gauges that require users to estimate water levels visually rather than photographing physical gauge plates. FreshWater Watch (Thornhill et al 2016) engages citizens in water quality monitoring across multiple countries, demonstrating the viability of transboundary citizen science networks, though without the AI-assisted measurement capabilities developed here. The MiniSASS platform, which forms part of the broader project context for this work, successfully deploys AI vision recognition for macroinvertebrate identification but addresses water quality rather than discharge measurement.
Large language models with multimodal capabilities represent an emerging approach to visual measurement tasks. Recent work has demonstrated LLM effectiveness for interpreting complex visual scenes (Yang et al 2023), though application to hydrological instrumentation remains largely unexplored. The hybrid architecture presented in this report, which combines geometric precision from purpose-trained computer vision models with contextual reasoning from general-purpose LLMs, addresses limitations inherent to either approach in isolation and, to our knowledge, represents the first application of such an architecture to citizen science-based river monitoring.

Protocol Design Framework

Conceptual Framework

The discharge measurement protocol is grounded in participatory monitoring approaches that recognise community members as capable and valued contributors to scientific data collection (Conrad and Hilchey 2011). Unlike traditional hydrological monitoring that positions communities as passive beneficiaries of technical outputs, this framework actively engages citizens in the observation-validation-action cycle, creating what Buytaert et al. (2014) describe as "democratised hydrology."

The protocol architecture reflects a deliberate separation of concerns: citizens contribute what they do best (regular, localised observations with contextual awareness) while AI systems handle the technical complexity of image interpretation and numerical extraction. This division acknowledges that effective citizen science requires minimising cognitive load on participants while maximising the scientific value of their contributions (Bonney et al 2009).

The workflow is structured around five sequential stages connecting citizen observers with the Digital Twin database (Figure 4).
Each stage addresses specific technical requirements while maintaining simplicity from the user perspective:
• Stage 1 - Image Capture: The citizen photographs the gauge plate via WhatsApp, with the platform automatically capturing essential metadata including GPS coordinates, timestamp, and device information. This approach reduces the need for manual data entry, lowering both user effort and transcription errors. If the image is unclear or metadata cannot be captured (e.g., location services disabled), the bot prompts the user to retake the photo so that a usable record is available for validation and future reprocessing. The use of WhatsApp as the submission channel ensures compatibility with low-end smartphones and leverages existing user familiarity with the platform's camera interface.
• Stage 2 - Object Detection: A YOLOv8 model identifies the gauge plate boundaries within the submitted image and locates the precise point where the water surface intersects the gauge scale. The model draws a visual annotation (red line) at the detected water level, creating an interpretable output that users can verify. This stage transforms an unconstrained field photograph into a spatially referenced measurement zone.
• Stage 3 - Reading Extraction: The cropped gauge region is processed through a dual-AI pipeline. First, a secondary YOLOv8 model detects individual scale markers to establish pixel-to-centimetre calibration. Second, Google Gemini's multimodal large language model analyses the image to identify visible numbers and their sequence. A calculation algorithm combines geometric measurements with LLM-extracted values to compute the precise water level reading in centimetres.
• Stage 4 - User Validation: A two-stage confirmation workflow ensures data quality before operational ingestion. In the first validation, users confirm whether the detected water line position is correct; if incorrect, they provide a manual reading. In the second validation, users confirm the AI-extracted numerical reading.
This approach generates ground-truth data for continuous model improvement while flagging erroneous readings before they enter the database. The validation design reflects the principle that AI should augment rather than replace human judgement in scientific data collection.
• Stage 5 - Discharge Calculation: The validated water level reading is converted to volumetric discharge (m³/s) using station-specific rating curves accessed through the FlowTracker API. Rating curves represent the empirically derived relationship between water stage and discharge for each monitoring station, accounting for local channel geometry and hydraulic characteristics. The final discharge value is stored in the Digital Twin database and made available for basin-wide analysis and visualisation. A key constraint is that discharge accuracy is ultimately limited by how recently each station's rating curve has been validated, particularly after floods or channel changes.

Design Principles

The protocol design adheres to several key principles informed by citizen science best practices (Wiggins and Crowston 2011), human-computer interaction research for development contexts (Dell and Kumar 2016), and the project's commitment to digital social innovation:
• Accessibility: No dedicated application installation is required; WhatsApp integration leverages existing digital literacy and device capabilities. This decision directly addresses the "digital divide" challenge identified in the project's design phase, recognising that potential citizen scientists may have limited smartphone storage, unreliable data connections, or unfamiliarity with app installation procedures. The protocol functions on basic smartphones with cameras, requiring only an active WhatsApp account and intermittent network connectivity for data submission.
• Simplicity: From the user perspective, participation requires a single action: photographing the gauge plate.
All technical processing (object detection, reading extraction, unit conversion, and database integration) occurs transparently in the backend. This design follows the principle of progressive disclosure: users see only the information they need at each step, with complexity hidden unless explicitly requested. The cognitive load on participants is minimised through a guided, photo-first workflow that does not require specialized hydrological training. Participants are asked to perform a simple confirmation step when possible, but submissions are retained as draft records and are only accepted into the validated dataset after review by designated moderators. This ensures data quality while keeping the participation barrier low for occasional or non-specialist users.
• Transparency: Users receive visual feedback at each processing stage, including annotated images showing the detected water line and extracted readings. This transparency serves multiple purposes: it builds trust by demonstrating how AI systems interpret their photographs; it enables informed validation by showing exactly what is being confirmed; and it supports learning by helping users understand what constitutes a good gauge photograph. When AI interpretations are incorrect, users can identify and correct errors, maintaining agency over their contributions.
• Data Quality: Multiple validation checkpoints ensure data quality before observations enter operational systems. The two-stage confirmation workflow captures both geometric accuracy (water line position) and numerical accuracy (extracted reading), addressing the distinct failure modes of each AI component. User corrections are stored alongside AI predictions, creating paired datasets for model retraining. Additionally, metadata validation (timestamp format, coordinate bounds, file type) occurs at submission to reject malformed inputs early in the pipeline.
• Scalability: The architecture supports expansion to additional monitoring stations, river basins, and countries without fundamental redesign. Station-specific parameters (rating curves, gauge configurations) are retrieved dynamically from the FlowTracker API rather than hardcoded, enabling new stations to be onboarded through configuration rather than code changes. The WhatsApp-based interface requires no localised application deployment, and the containerised backend infrastructure can scale horizontally to accommodate increased submission volumes.
• Interoperability: Data collected through the protocol integrate with existing monitoring infrastructure through standardised APIs. The Digital Twin ingests citizen-contributed observations alongside official gauging station data, enabling comparison and gap-filling. All data follow FAIR principles (Findability, Accessibility, Interoperability, Reusability), supporting the project's commitment to open data and cross-border information sharing across LIMCOM member states.

Digital Inclusiveness Considerations

The protocol design incorporates digital inclusiveness principles aligned with the CGIAR's Multidimensional Digital Inclusiveness Index (MDII) framework. Key considerations include:
• Language Accessibility: While the current implementation operates in English, the architecture supports multilingual deployment. Text prompts, validation messages, and feedback can be localised without modifying the core processing pipeline. Future development will prioritise translation into languages prevalent in the Limpopo Basin, including Portuguese, Zulu, and Venda.
• Connectivity Resilience: The WhatsApp-based submission pathway tolerates intermittent connectivity, as messages queue locally until network access is available. Image compression occurs client-side, reducing data transfer requirements.
The validation workflow is designed to complete in a single conversational session, minimising the risk of abandoned submissions due to connection drops. • Device Inclusivity: The protocol functions on entry-level smartphones with basic camera capabilities. No specialised sensors, external hardware, or high-resolution imaging is required. Processing occurs server-side, placing computational demands on cloud infrastructure rather than user devices. • Skill Accessibility: The single-action submission model (photograph and send) requires no specialised technical skills beyond basic WhatsApp usage. Visual feedback and conversational prompts guide users through the validation workflow using familiar interaction patterns. Training materials developed for the broader citizen science programme support onboarding, but the protocol is designed to be usable without formal instruction. Figure 4 Discharge Measurement Protocol - Conceptual Framework [Source: IWMI] CGIAR System Architecture | Page 12 of 52 System Architecture Architectural Design Philosophy The system architecture follows a layered design pattern that separates concerns across presentation, application, and data tiers. This approach provides several advantages for citizen science deployments in resource-constrained environments: independent scaling of components based on demand, isolation of failures to prevent cascade effects, and flexibility to substitute technologies as requirements evolve (Fowler 2002). The architecture prioritizes three key qualities: 1. Accessibility over Complexity: The system interfaces with users through WhatsApp rather than a custom mobile application, eliminating installation barriers and leveraging existing platform capabilities for camera access, GPS capture, and message queuing. 2. 
Resilience over Performance: Components are designed to tolerate intermittent connectivity, partial failures, and variable response times typical of deployments across multiple countries with heterogeneous network infrastructure. 3. Interoperability over Independence: The system integrates with existing Digital Twin infrastructure, authentication services, and external APIs rather than reimplementing functionality, reducing development effort and ensuring consistency across the broader platform. High-Level Architecture The system architecture comprises three interconnected layers that collectively process citizen-submitted gauge photographs from image capture through to discharge calculation (Figure 5): Figure 5 System Architecture - High Level Overview [Source: IWMI] Presentation Layer — WhatsApp Bot Interface The WhatsApp bot serves as the primary user interface, providing a conversational interaction model familiar to target users across the Limpopo Basin. Built on the WhatsApp Business API, the bot orchestrates the complete submission workflow through a sequence of messages, prompts, and responses. Key responsibilities include: • Receiving gauge plate photographs from citizen scientists System Architecture | Page 13 of 52 CGIAR • Extracting and forwarding image metadata (timestamp, GPS coordinates, device information) • Presenting AI-annotated images for user validation • Collecting confirmation responses and manual corrections • Delivering final discharge results to users • Managing conversation state across multi-turn interactions The bot implements a stateful conversation model where each submission progresses through defined stages (upload → detection → extraction → validation → discharge). Session state is maintained externally, enabling the bot to resume interrupted conversations and handle users who submit multiple images concurrently. 
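One way to picture the stateful model is to track each submission's stage separately, keyed by sender and submission identifier, so that an interrupted conversation can be resumed and several images from the same user can progress independently. The sketch below is illustrative only: the class and method names are hypothetical, and an in-memory dictionary stands in for whatever external store the deployment actually uses.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """The five stages named in the text, in order."""
    UPLOAD = 1
    DETECTION = 2
    EXTRACTION = 3
    VALIDATION = 4
    DISCHARGE = 5


@dataclass
class SubmissionState:
    submission_id: int
    stage: Stage = Stage.UPLOAD


class SessionStore:
    """Hypothetical in-memory stand-in for an external session store."""

    def __init__(self) -> None:
        self._by_sender: dict[str, dict[int, SubmissionState]] = {}

    def start(self, sender: str, submission_id: int) -> SubmissionState:
        # Each image gets its own state, so concurrent submissions do not collide.
        state = SubmissionState(submission_id)
        self._by_sender.setdefault(sender, {})[submission_id] = state
        return state

    def advance(self, sender: str, submission_id: int) -> Stage:
        # Move one stage forward; the final stage is terminal.
        state = self._by_sender[sender][submission_id]
        if state.stage is not Stage.DISCHARGE:
            state.stage = Stage(state.stage.value + 1)
        return state.stage

    def pending(self, sender: str) -> dict[int, Stage]:
        # Submissions that can still be resumed after an interruption.
        return {
            sid: s.stage
            for sid, s in self._by_sender.get(sender, {}).items()
            if s.stage is not Stage.DISCHARGE
        }
```

Because the state lives outside the request handler, any bot instance can pick up the next message for a given sender and continue from the recorded stage.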
Application Layer: Vision API (dt-api)
The Vision API provides the backend intelligence for the discharge measurement protocol. Implemented as a Flask-RESTX application, the API exposes RESTful endpoints that encapsulate AI processing, business logic, and external service integration. The API is organised into functional modules:

Image Processing Module: Handles image upload, validation, storage, and retrieval. Validates file types (JPEG/PNG), extracts EXIF metadata where available, and manages temporary file storage during processing.

Object Detection Module: Executes YOLOv8 models for gauge plate localisation and water line detection. Manages model loading, inference execution, result post-processing, and annotated image generation.

Reading Extraction Module: Orchestrates the dual-AI pipeline combining YOLOv8 scale detection with Gemini LLM visual analysis. Implements the reading calculation algorithm and handles edge cases (missing scale gaps, negative readings, scale variations).

Validation Module: Records user confirmations and corrections, updating prediction records with ground-truth data. Supports the two-stage validation workflow with separate endpoints for water line and reading confirmation.

Discharge Calculation Module: Interfaces with the FlowTracker API to convert validated water level readings to discharge values using station-specific rating curves.

Authentication Module: Integrates with Keycloak for OAuth2/OIDC-based identity management, providing user registration, login, token refresh, and session management capabilities.

The API follows RESTful design conventions with consistent endpoint naming, HTTP method semantics, and JSON response formats. Error handling provides meaningful feedback for debugging while avoiding exposure of sensitive implementation details.

Data Layer: MySQL Database
The MySQL database provides persistent storage for all submission data, AI predictions, user validations, and system metadata.
The schema is designed to support both operational workflows and analytical queries for model performance assessment. Primary tables include:

DT_GAUGE_SUBMISSION: Stores initial image submissions including binary image data (BLOB), user identifiers, timestamps, GPS coordinates, station assignments, and processing status flags. Each submission receives a unique identifier used for tracking through subsequent processing stages.

DT_gauge_prediction: Stores AI predictions and user validations linked to submissions via foreign key. Captures AI-generated readings, user corrections (both stages), scale gap measurements, confidence indicators, and prediction timestamps. The dual-column structure for validations (user_validation/user_corrected_reading and user_validation_2/user_corrected_reading_2) supports the two-stage confirmation workflow.

Database connectivity is secured through SSH tunnelling, providing encrypted transport without requiring direct exposure of the database server to public networks. Connection pooling manages concurrent access from multiple API instances.

Data Flow Architecture
The complete data flow for a gauge reading submission proceeds through six phases:

Phase 1: Submission (User → WhatsApp → API → Database)
1. User captures gauge photograph via WhatsApp camera
2. WhatsApp bot receives image and extracts available metadata
3. Bot calls POST /vision/upload_image with image file and metadata
4. API validates inputs and stores submission in DT_GAUGE_SUBMISSION
5. API returns submission_id to bot for subsequent operations

Phase 2: Object Detection (API → AI Models → Database)
1. Bot calls POST /vision/object_detection with submission_id
2. API retrieves image from database
3. YOLOv8 processes image to detect gauge plate and water line
4. API generates annotated image with water line visualisation
5. Results stored; annotated image returned to bot

Phase 3: First Validation (User → WhatsApp → API → Database)
1. Bot displays annotated image to user
2. User confirms water line position or provides correction
3. Bot calls POST /vision/confirm_reading with validation response
4. API records validation in DT_gauge_prediction

Phase 4: Reading Extraction (API → AI Models → Database)
1. Bot calls POST /vision/extract_reading with submission_id
2. API retrieves cropped gauge image from detection stage
3. YOLOv8 "base" model detects scale markers for calibration
4. Gemini LLM analyses image to extract visible numbers
5. Algorithm calculates water level reading from combined inputs
6. Prediction stored in DT_gauge_prediction; reading returned to bot

Phase 5: Second Validation (User → WhatsApp → API → Database)
1. Bot displays AI-extracted reading to user
2. User confirms reading accuracy or provides correction
3. Bot calls POST /vision/confirm_ai_reading with validation response
4. API records second-stage validation

Phase 6: Discharge Calculation (API → External API → User)
1. Bot calls POST /vision/discharge with submission_id
2. API determines final reading (user correction takes priority over AI)
3. API calls FlowTracker API with station ID and water level (converted to metres)
4. FlowTracker returns discharge value from rating curve lookup
5. API returns discharge to bot; bot displays result to user

Technology Stack
The technology stack was selected to balance performance, maintainability, and compatibility with the broader Digital Twin infrastructure. Selection criteria included: maturity and community support, suitability for the specific technical requirements, alignment with team expertise, and licensing compatibility with open-source project goals.

Table 1 Core system components, selected technologies, and rationale for the Vision API and WhatsApp-based discharge measurement workflow.
| Component | Technology | Selection Rationale |
|---|---|---|
| Backend Framework | Flask-RESTX (Python 3.10+) | Lightweight framework with built-in API documentation (Swagger/OpenAPI), strong ecosystem for scientific computing and ML integration, consistent with broader Digital Twin codebase |
| Object Detection | YOLOv8-Pose (Ultralytics) | State-of-the-art real-time object detection with pose estimation capabilities, open-source licensing (AGPL-3.0), active maintenance, and extensive documentation |
| LLM Integration | Google Gemini 2.0 Flash | Multimodal capabilities for combined image-text analysis, competitive performance on visual reasoning tasks, cost-effective API pricing for high-volume inference |
| Database | MySQL 8.0 | Mature relational database with strong BLOB handling for image storage, compatibility with existing Digital Twin infrastructure, robust replication and backup capabilities |
| Authentication | Keycloak OAuth2/OIDC | Open-source identity management with standards-based protocols, group-based access control, federation capabilities for future multi-organisation deployment |
| External APIs | FlowTracker | AWARD-maintained rating curve database covering Limpopo Basin gauging stations, RESTful interface, established operational track record |
| Containerisation | Docker | Consistent deployment across development, staging, and production environments, isolation of dependencies, compatibility with cloud orchestration platforms |
| Image Processing | OpenCV, Pillow | Industry-standard libraries for image manipulation, format conversion, and preprocessing |

Integration Architecture
The Vision API integrates with several external systems to deliver end-to-end functionality:

FlowTracker API Integration
The FlowTracker system, maintained by AWARD, provides access to rating curves for gauging stations across the Limpopo Basin and broader Southern African region.
The integration operates as follows:
• Endpoint: https://inwards.award.org.za/api/flowtracker/fetch_rating
• Parameters: Station identifier, water level (metres)
• Response: Discharge value (m³/s) interpolated from rating curve
• Error Handling: Returns a clear error message and continues the workflow without crashing when the station is not found or the rating curve is unavailable

Rating curves represent empirically derived stage-discharge relationships specific to each monitoring station. The FlowTracker database is maintained through periodic field campaigns that update curves as channel geometry evolves.

Keycloak Authentication Integration
User authentication leverages Keycloak, an open-source identity and access management solution. The integration supports:
• User Registration: New users created in Keycloak realm with automatic group assignment
• Authentication: OAuth2 password grant flow returning JWT access and refresh tokens
• Authorisation: Group-based access control via @kc_require_groups() decorator
• Token Management: Refresh token rotation for session continuity

Two user groups govern API access:
• dt-vision-users: Standard access to vision processing endpoints
• vision-app-admin: Administrative access for system management

Meta WhatsApp Cloud API Integration
The WhatsApp bot leverages the Meta WhatsApp Cloud API for message handling and media management, enabling the conversational interface that connects citizen scientists with the Vision API backend.

API Configuration
The integration requires the following setup steps:
1. Register a developer account on Meta for Developers (https://developers.facebook.com)
2. Create a new application and add the WhatsApp product
3. Configure a webhook URL to receive message and media events
4. Verify the webhook endpoint using a verification token
5. Obtain a permanent access token for production deployment

Access Token Management
• Permanent access tokens are required for production deployment, replacing the temporary tokens issued during development
• Tokens are securely stored in AWS Secrets Manager rather than environment variables
• Token rotation is performed periodically to prevent unauthorised access

Webhook Configuration
The Flask server that provides the API service also exposes a webhook endpoint: a URL that Meta's WhatsApp servers call automatically to deliver incoming messages, so that the service can process each request and return a response. The endpoint handles:
• Webhook verification challenges (GET requests with hub.verify_token)
• Incoming message events (POST requests with message payloads)
• Media download authorisation using the permanent access token

Message and Media Flow
• Endpoint: https://graph.facebook.com/v18.0/{phone_number_id}/messages
• Media retrieval: https://graph.facebook.com/v18.0/{media_id}
• Authentication: Bearer token in Authorization header
• Rate limits: Subject to Meta's standard API rate limiting policies

Digital Twin Integration
Validated discharge measurements are made available to the broader LIMCOM Digital Twin platform through shared database access and API federation.
Citizen-contributed observations complement official gauging station data, enabling:
• Gap-filling for stations with intermittent official monitoring
• Cross-validation of automated sensor readings
• Enhanced spatial coverage in under-monitored tributaries
• Near-real-time situational awareness during flood events

Deployment Architecture
The Vision API is deployed as a containerised application within the Digital Twin infrastructure:

Container Configuration:
• Base image: Python 3.10 slim
• Dependencies managed via requirements.txt
• Environment variables for configuration (database credentials, API keys, Keycloak settings)
• Health check endpoint for orchestration monitoring

Resource Management:
• CPU-bound inference (YOLOv8) with limited thread count for stability
• Memory allocation sized for concurrent image processing
• Temporary file cleanup after processing completion

Network Configuration:
• HTTPS termination at load balancer
• SSH tunnel for database connectivity
• Outbound access to Gemini API and FlowTracker endpoints

Operational Considerations:
• Logging to centralised aggregation service
• Metrics collection for performance monitoring
• Automated restart on failure detection

AI Processing Pipeline

Overview
The AI processing pipeline represents the core technical innovation of the discharge measurement protocol, combining computer vision and large language model capabilities to automate gauge reading extraction from citizen-submitted photographs. The pipeline addresses a fundamental challenge in imagery-based hydrological monitoring: translating variable-quality field photographs into precise numerical water level readings suitable for operational use. The framework integrates vision-based waterline detection, YOLOv8-Pose scale extraction, and multimodal large language models (Gemini 2.0 Flash) for automated river gauge plate reading.
This hybrid architecture combines the geometric precision of object detection models for spatial measurements with the contextual reasoning of multimodal LLMs for numerical interpretation (Figure 6). The pipeline processes each submission through three sequential stages: waterline detection, scale gap ratio estimation, and reading extraction. The complete workflow executes in approximately 3–4 seconds under typical conditions.

Figure 6 AI Processing Pipeline – Technical Architecture [Source: IWMI]

Stage 1: Waterline Detection (YOLOv8)
The first stage localises the gauge plate within the raw citizen photograph and identifies the water line position. This stage transforms an unconstrained field image (potentially containing vegetation, infrastructure, reflections, and other visual noise) into a focused measurement with precise water line positioning.

Detection Methodology
Waterline detection is performed through a multi-step process combining object detection with image processing techniques:
• Object Detection: YOLOv8 identifies the gauge plate region within the submitted image
• Angular Correction: Gradient information is extracted using the Sobel operator to correct for camera angle misalignment
• Coarse Line Positioning: Edge detection using the Canny operator identifies the approximate waterline through horizontal edge intensity profiling
• Fine Line Positioning: Vertical gradient (Sobel Y) computed within a narrow region (±5 rows) around the coarse line identifies the precise water line position

Figure 7 Diagram of template gauge plate used in the field [Source: IWMI]

Model Specifications:

| Parameter | Value | Rationale |
|---|---|---|
| Architecture | YOLOv8 | One-stage detection with CSPDarknet53 backbone |
| Confidence Threshold | 0.20 | Balances detection sensitivity with false positive rejection; 5.76% of images rejected below threshold |
| Validation Zone | ±5 pixel rows | True positive defined as detection within this zone of annotated waterline |
| Input Resolution | 640 × 640 px | Standardised resolution for consistent inference |

Detection Performance
Quantitative assessment on the evaluation dataset demonstrated robust waterline detection:

| Metric | Value |
|---|---|
| Precision | 94.24% |
| F1-Score | 83.64% |
| False Positive Rate | 0% (at confidence >0.20) |

The confidence distribution showed the majority of predictions concentrated in the high-confidence range (0.85–1.0). Lower confidence values (0.30–0.60) were observed in challenging cases caused by scale invisibility due to corrosion, interference from surrounding objects (grass, debris), and image blurring.

Detection Outputs
The waterline detection stage produces four outputs that feed downstream processing:
1. Water Line Y-Position: Vertical pixel coordinate where water intersects the gauge scale
2. Confidence Score: Model confidence (0.0–1.0) displayed to users on annotated images
3. Annotated Image: Original photograph with red horizontal line at detected water level and confidence percentage overlay
4. Cropped Gauge Region: Extracted sub-image containing the gauge plate for Stage 2 processing

Stage 2: Scale Gap Ratio Estimation (YOLOv8-Pose)
The second stage employs YOLOv8-Pose to detect scale markers on the gauge plate through keypoint localisation, enabling pixel-to-centimetre calibration essential for accurate reading calculation.

Model Architecture
The YOLOv8-Pose architecture integrates object detection and pose estimation within a single framework, enabling simultaneous prediction of bounding boxes, object classes, confidence scores, and keypoint coordinates.
The architecture comprises three major components:
• Backbone: Feature extractor with convolutional and C2f units, concluding with Spatial Pyramid Pooling Fast (SPPF) module
• Neck: Fuses feature representations from different scales using concatenation and upsampling layers
• Head: Pose units that localise both objects and structural features, predicting bounding box coordinates, class probabilities, and keypoint positions

Model Specifications:

| Parameter | Value | Rationale |
|---|---|---|
| Architecture | YOLOv8-Pose | Enables combined detection + keypoint localisation |
| Confidence Threshold | 0.20 | Balances detection sensitivity with false positive rejection |
| NMS Method | Custom IoU-based filtering | Handles overlapping detections from multi-scale gauge markings |
| Execution Device | CPU | Ensures deployment stability; GPU optional for higher throughput |
| Thread Limiting | Enabled | Prevents resource contention in containerised environment |

Scale Gap Detection Performance

| Metric | Value |
|---|---|
| Precision | 81% |
| Recall | 87% |
| mAP@0.5 | 89% |
| mAP@0.5-0.95 | 80% |

Geometric Calculations
The scale gap detection outputs enable precise geometric calibration:

Major Scale Gap (D_m): Computed as the median of differences between consecutive major scale keypoints:

$$D_m = \operatorname{median}\left(s_2 - s_1,\; s_3 - s_2,\; s_4 - s_3,\; \ldots,\; s_n - s_{n-1}\right)$$

Distance to Waterline (D_n): The pixel distance from the lowest detected major scale marker to the waterline intersection:

$$D_n = s_1 - s_0$$

Scale Gap Ratio (R): The ratio enabling sub-interval precision:

$$R = \frac{D_n}{D_m}$$

Here s_1, ..., s_n denote the pixel positions of the detected major scale keypoints; from the definition of D_n, s_1 is the lowest detected marker and s_0 is taken to be the waterline position.

These outputs (D_m, D_n, and total pixel height H_y) are passed to the LLM for reading extraction.

Stage 3: Reading Extraction (Gemini 2.0 Flash LLM)
The third stage employs the Gemini 2.0 Flash multimodal large language model to extract numerical values from the cropped gauge image, combining visual perception with contextual reasoning.

Two-Stage LLM Approach
The reading extraction was developed and evaluated using a two-stage approach:
• LLM Stage 1 (Image Only): The model receives only the pre-processed, waterline-detected image, requiring it to infer both the gauge reading and scale spacing directly from visual information.
• LLM Stage 2 (Image + Scale Metadata): The model receives the image plus scale gap ratio metadata (D_m, D_n, R), enabling accurate conversion of pixel distances to real-world measurements.

The operational deployment uses Stage 2, as incorporating scale gap information substantially improves predictive accuracy.

Prompt Engineering
A structured prompt guides the AI model in extracting water-level readings:
• Identify the topmost visible digit on the gauge scale
• Determine the full number sequence from top to bottom
• Detect any partially visible digit at the lowest edge (water line position)

LLM Outputs (Structured JSON):
• top_number: Highest numerical value visible on the gauge scale
• number_seq: Sequence of visible numbers confirming scale direction and interval
• visible_no: Number corresponding to the water line position (major scale reading M)

```json
{
  "top_number": 50,
  "number_seq": [50, 40, 30, 20],
  "visible_no": 47
}
```

Reading Calculation Algorithm
The final water level reading combines the LLM-extracted major scale value with the geometric scale gap ratio:

$$W = M - (R \times 10)$$

Where:
• W = Water level reading (cm)
• M = Major scale reading at waterline (from LLM: visible_no, representing the scale mark just above water)
• R = Scale gap ratio (D_n / D_m from YOLOv8-Pose)

Calculation Example:
Given:
• M = 47 (major scale mark visible just above water line)
• D_m = 45.2 pixels (median major scale gap)
• D_n = 18.7 pixels (distance from "47" marker to water line)
• R = 18.7 / 45.2 = 0.414

$$
\begin{aligned}
W &= M - (R \times 10) \\
  &= 47 - (0.414 \times 10) \\
  &= 47 - 4.14 \\
  &= 42.86\ \text{cm}
\end{aligned}
$$

Note: Actual scale interpretation depends on gauge plate design; this example assumes 10 cm intervals between major markers.

Edge Case Handling

| Edge Case | Detection Method | Handling Strategy |
|---|---|---|
| Missing scale gaps | Fewer than 2 markers detected | Flag for user validation; LLM-only estimation with reduced confidence |
| Angular misalignment | Sobel gradient analysis | Automated rotation correction before processing |
| Corrosion/fading | Low confidence detection | User prompted to confirm or retake photograph |
| Partial occlusion | Incomplete number sequence | LLM interpolation based on visible pattern |
| Poor lighting | High variance in detections | Flag submission for manual review |

Performance Summary

Model Comparison (All Image Categories, After Outlier Removal)

| Model Configuration | Bias (cm) | MAE (cm) | RMSE (cm) | R² |
|---|---|---|---|---|
| GPT-4o Stage 1 | 13.97 | 17.35 | 22.58 | 0.45 |
| GPT-4o Stage 2 | 4.34 | 9.99 | 17.49 | 0.49 |
| Gemini Stage 1 | 7.06 | 10.27 | 14.28 | 0.63 |
| Gemini Stage 2 | 1.98 | 6.97 | 13.68 | 0.58 |

Performance on Optimal Quality Images
When evaluated on images with clear scales, daylight conditions, no blur, no corrosion, and no visual obstacles:

| Model Configuration | Bias (cm) | MAE (cm) | RMSE (cm) | R² |
|---|---|---|---|---|
| GPT-4o Stage 1 | 14.71 | 16.56 | 21.68 | 0.54 |
| GPT-4o Stage 2 | 4.89 | 9.33 | 15.99 | 0.56 |
| Gemini Stage 1 | 7.90 | 9.38 | 11.71 | 0.80 |
| Gemini Stage 2 | 3.04 | 5.43 | 8.58 | 0.84 |

On optimal-quality images, Gemini Stage 2 achieved the best performance across all metrics, with the lowest errors and the highest correlation with ground-truth measurements (R² = 0.84).

Image Quality Sensitivity
The results demonstrate strong sensitivity to image quality:

| Image Category | Proportion | Gemini Stage 2 Performance |
|---|---|---|
| Optimal quality | ~91% | MAE = 5.43 cm, RMSE = 8.58 cm |
| Sub-optimal quality | ~9% | MAE = 11.70 cm, RMSE = 13.53 cm |
| All images | 100% | MAE = 6.97 cm, RMSE = 13.68 cm |

These findings confirm that the two-stage user validation workflow is essential: for challenging images, user corrections ensure data quality despite reduced AI accuracy.
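The Stage 2 geometry and the reading formula W = M − (R × 10) described earlier in this chapter can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the deployed implementation: the function and parameter names are hypothetical, image row coordinates are assumed to increase downward, the absolute pixel distance is used for D_n, and the 10 cm major interval follows the note in the calculation example.

```python
from statistics import median


def scale_gap_ratio(marker_rows: list[float], waterline_row: float) -> tuple[float, float, float]:
    """Return (D_m, D_n, R) from major-marker rows and the waterline row, in pixels."""
    if len(marker_rows) < 2:
        # Mirrors the "fewer than 2 markers detected" edge case: no calibration possible.
        raise ValueError("at least two major scale markers are required")

    rows = sorted(marker_rows)  # top of image first (assumed: rows grow downward)

    # D_m: median gap between consecutive major markers.
    d_m = median(b - a for a, b in zip(rows, rows[1:]))

    # D_n: pixel distance from the lowest detected marker to the waterline.
    d_n = abs(waterline_row - rows[-1])

    return d_m, d_n, d_n / d_m


def water_level_cm(major_reading: float, ratio: float, interval_cm: float = 10.0) -> float:
    """W = M - (R * interval); interval_cm = 10 is the example's assumption."""
    return major_reading - ratio * interval_cm
```

With markers spaced 45.2 pixels apart and a waterline 18.7 pixels below the lowest marker, `scale_gap_ratio` yields a ratio close to 0.414, and `water_level_cm(47, 0.414)` reproduces the 42.86 cm of the worked example. Using the median rather than the mean for D_m keeps a single mis-detected marker from skewing the calibration.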
Core Innovation: Hybrid AI Architecture
The pipeline's hybrid architecture addresses limitations of single-model approaches:

Why Not Pure Computer Vision? Traditional object detection models excel at geometric tasks but struggle with the contextual interpretation required to read gauge scales, particularly when numbers are partially visible, non-standard fonts are used, or scale orientation varies.

Why Not Pure LLM Analysis? Stage 1 results demonstrate this limitation: without scale gap metadata, LLMs achieved weak correlations (R² = 0.13–0.16), as they lacked the geometric precision required for accurate interpolation between scale markers.

The Hybrid Advantage: By separating concerns (geometric measurement via YOLOv8-Pose, numerical interpretation via Gemini, integration via algorithm), the pipeline achieves substantially higher accuracy than either approach in isolation. Stage 2 results show the benefit of combining scale gap ratio metadata with LLM visual reasoning (R² improvement from 0.16 to 0.58 for Gemini). This modular architecture also enables independent improvement of each component as models evolve, and the structured output format (JSON) ensures reliable downstream processing.

API Endpoint Reference
The Vision API exposes six core endpoints implementing the discharge measurement workflow (upload_image, object_detection, extract_reading, confirm_reading, confirm_ai_reading, discharge) plus authentication endpoints for user management. Full API documentation including request parameters, response formats, and error codes is provided in Appendix A.

Data Model
The database schema supports the complete citizen science workflow from user registration through submission, AI processing, validation, and discharge calculation. The schema (Figure 8) comprises ten interconnected tables organised into four functional groups: user management, submission handling, AI prediction, and hydrological reference data.

User Management Tables

app_users: Stores user profile information synchronised from Keycloak authentication, including kc_sub (Keycloak subject identifier), email, display_name, organization, and session timestamps.

kc_users: Maps Keycloak subjects to internal user identifiers for cross-referencing.

user_app_permissions: Controls application-level access including can_submit, can_moderate, can_app_admin flags, and plate_quota_month for submission rate limiting.

user_station_permissions: Manages station-specific permissions, allowing fine-grained control over which users can submit to or moderate specific monitoring stations.

Submission Tables (Two-Stage Workflow)

DT_GAUGE_SUBMISSION: Stores raw submissions as received from the WhatsApp bot, preserving the original image and metadata before any processing or moderation. Key fields include image_data (LONGBLOB), geographic coordinates, timestamps, mobile_number, user_id, and station reference.

plate_submissions: Stores processed and moderated submission records, linking raw submissions to corrected outputs. This table supports the quality assurance workflow with fields for stage_height_corrected, discharge_computed, status (ENUM for workflow state), review_notes, reviewer_kc_sub, and moderation timestamps. Foreign keys link to raw_submission_id and station_id.
AI Prediction Table DT_gauge_prediction: Stores AI processing outputs from the Vision API pipeline and captures two-stage user validation results: Field Group Fields Description Image Reference image_path, submission_id Links to source submission YOLOv8-Pose Outputs scale_gap_pixels (D_m), distance_to_bottom (D_n), total_pixel_height_y Geometric measurements Gemini Outputs top_reading, visible_no (M), number_seq Extracted numerical values Calculated Reading expected_reading, corrected_reading W = M − (R × 10) result First Validation user_validation, user_corrected_reading Citizen confirmation/correction Second Validation user_validation_2, user_corrected_reading_2 Follow-up confirmation Hydrological Reference Tables stations: Monitoring station registry containing code, name, geographic coordinates, and is_active status flag. rating_curves: Stage-discharge relationships for each station, storing curve_params (rating curve coefficients) used by the FlowTracker API to convert validated water level readings to discharge values. Data Flow The submission workflow progresses through the schema as follows: CGIAR Data Model | Page 28 of 52 1. Submission Receipt: WhatsApp bot stores raw image and metadata in DT_GAUGE_SUBMISSION 2. AI Processing: Vision API processes image; results stored in DT_gauge_prediction linked via submission_id 3. User Validation: Two-stage citizen confirmation updates user_validation and user_corrected_reading fields 4. Moderation: Approved submissions promoted to plate_submissions with corrected values and moderator notes 5. 
Discharge Calculation: Validated stage heights passed to FlowTracker API using rating_curves parameters

Figure 8 DT-Vision schema [Source: IWMI]

WhatsApp Bot Integration Workflow

Platform Architecture

The citizen science interface is implemented as a custom conversational bot service deployed on AWS ECS, integrated with the WhatsApp Business API (Cloud API) for message transport. This architecture separates the messaging channel from the application logic, enabling full control over conversation design, state management, and integration with the Vision API backend.

Figure 9 Custom Bot Deployment on AWS ECS with WhatsApp Business API Integration [Source: IWMI]

System Components

| Component | Deployment | Technology | Purpose |
|---|---|---|---|
| WhatsApp Bot Service | AWS ECS (Fargate) | Python/Flask | Conversation logic, state management, API orchestration |
| Session Store | AWS ElastiCache | Redis | Conversation state persistence across messages |
| Vision API | AWS ECS (Fargate) | Flask | AI processing pipeline |
| WhatsApp Cloud API | Meta Platform | Graph API v18.0 | Message transport (send/receive) |
| Media Storage | AWS S3 | - | Temporary image storage during processing |

Architectural Flow

The WhatsApp Cloud API serves purely as a message transport layer: all conversation intelligence, interactive flows, and integration logic reside in the custom bot service.

Bot Service Implementation

Webhook Handler

The bot service exposes a webhook endpoint that receives all incoming WhatsApp events. Each message triggers the conversation engine, which determines the appropriate response based on current session state:

```python
from flask import Flask, request

app = Flask(__name__)

# session_manager, conversation_engine and whatsapp_client are the bot
# service's own components and are assumed to be initialised elsewhere.

@app.route('/webhook', methods=['POST'])
def webhook():
    payload = request.get_json(silent=True) or {}
    for entry in payload.get('entry', []):
        for change in entry.get('changes', []):
            # Status-only events carry no 'messages' list, so they are skipped
            for message in change.get('value', {}).get('messages', []):
                sender = message.get('from')
                # Load or create session
                session = session_manager.get_session(sender)
                # Route to conversation handler
                response = conversation_engine.handle(message, session)
                # Send response via WhatsApp Cloud API
                whatsapp_client.send(sender, response)
    return 'OK', 200
```

Session State Management

Each user's conversation progress is tracked through a session object persisted in Redis. Sessions maintain state across the asynchronous message-based interaction pattern:

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional, Tuple

@dataclass
class UserSession:
    user_id: str
    phone_number: str
    state: "ConversationState"  # one of the states listed in the state machine table
    submission_id: Optional[int]
    station_id: Optional[int]
    location: Optional[Tuple[float, float]]
    image_date: Optional[date]
    ai_reading: Optional[float]
    created_at: datetime
    updated_at: datetime
    expires_at: datetime  # 24-hour TTL
```

Conversation State Machine

The bot implements a finite state machine controlling the conversation flow. Each state defines valid user inputs and corresponding transitions:

| State | Trigger | Bot Action | Next State |
|---|---|---|---|
| IDLE | Any message | Send welcome + "Send a photo of the gauge" | AWAITING_IMAGE |
| AWAITING_IMAGE | Image received | Store image, prompt for station | AWAITING_STATION |
| AWAITING_STATION | Station selected | Store station, prompt for location | AWAITING_LOCATION |
| AWAITING_LOCATION | Location shared | Store coordinates, prompt for date | AWAITING_DATE |
| AWAITING_DATE | Date confirmed/entered | Call Vision API, show annotated image | AWAITING_WATERLINE_CONFIRM |
| AWAITING_WATERLINE_CONFIRM | "Yes" button | Extract AI reading, display result | AWAITING_READING_CONFIRM |
| AWAITING_WATERLINE_CONFIRM | "No" button | Prompt to retake photo | AWAITING_IMAGE |
| AWAITING_READING_CONFIRM | "Yes" button | Calculate discharge, show final result | COMPLETE |
| AWAITING_READING_CONFIRM | "No" button | Prompt for manual reading | AWAITING_MANUAL_READING |
| AWAITING_MANUAL_READING | Number entered | Store correction, calculate discharge | COMPLETE |
| COMPLETE | Any message | Thank user, reset session | IDLE |

Interactive Message Design

The bot constructs rich interactive messages using WhatsApp's supported formats,
creating an intuitive user experience that minimises typing and reduces input errors.

Figure 10 Screenshot of the Vision Bot in action, initiating the process after a triggering "hi" message [Source: IWMI]

Station Selection (Interactive List)

When prompting for the monitoring station, the bot presents an interactive list populated from the stations table:

```json
{
  "type": "interactive",
  "interactive": {
    "type": "list",
    "header": {"type": "text", "text": "Select Station"},
    "body": {"text": "Which monitoring station is this gauge located at?"},
    "action": {
      "button": "View Stations",
      "sections": [{
        "title": "Limpopo Basin Stations",
        "rows": [
          {"id": "stn_001", "title": "Beitbridge", "description": "Limpopo Main - Zimbabwe border"},
          {"id": "stn_002", "title": "Chokwe", "description": "Limpopo Main - Mozambique"},
          {"id": "stn_003", "title": "Pafuri", "description": "Limpopo-Luvuvhu confluence"}
        ]
      }]
    }
  }
}
```

Location Request

The bot requests the user's location using WhatsApp's native location sharing, which provides accurate GPS coordinates:

```json
{
  "type": "interactive",
  "interactive": {
    "type": "location_request_message",
    "body": {
      "text": "Please share your current location so we can verify the gauge position."
    },
    "action": {"name": "send_location"}
  }
}
```

Figure 11 The vision API endpoints require a location to be submitted before any processing begins [Source: IWMI]

Date Confirmation (Quick Reply Buttons)

For date entry, the bot offers the current date as a default with an option to specify a different date:

```json
{
  "type": "interactive",
  "interactive": {
    "type": "button",
    "body": {
      "text": "When was this photo taken?"
    },
    "action": {
      "buttons": [
        {"type": "reply", "reply": {"id": "date_today", "title": "📅 Today"}},
        {"type": "reply", "reply": {"id": "date_yesterday", "title": "📅 Yesterday"}},
        {"type": "reply", "reply": {"id": "date_other", "title": "Enter date..."}}
      ]
    }
  }
}
```

Waterline Confirmation (Image + Buttons)

After processing, the bot displays the annotated image with the detected waterline and requests confirmation:

Figure 12 Once processing is complete, the Vision bot has identified the water line, superimposed it on the image, and returned it to the user for confirmation [Source: IWMI]

```json
{
  "messaging_product": "whatsapp",
  "to": "{{user_phone}}",
  "type": "image",
  "image": {
    "link": "https://api.digitaltwins.iwmi.org/images/annotated_12345.jpg",
    "caption": "Red line shows detected water level. Confidence: 92%"
  }
}
```

Followed immediately by:

```json
{
  "type": "interactive",
  "interactive": {
    "type": "button",
    "body": {"text": "Is the red line correctly positioned at the water level?"},
    "action": {
      "buttons": [
        {"type": "reply", "reply": {"id": "waterline_yes", "title": "✓ Yes, correct"}},
        {"type": "reply", "reply": {"id": "waterline_no", "title": "✗ No, retake"}}
      ]
    }
  }
}
```

Reading Confirmation

The AI-extracted reading is presented with confirmation buttons:

```json
{
  "type": "interactive",
  "interactive": {
    "type": "button",
    "body": {
      "text": "📏 *Water Level Reading*\n\n*42.86 cm*\n\nIs this reading correct?"
    },
    "action": {
      "buttons": [
        {"type": "reply", "reply": {"id": "reading_yes", "title": "✓ Confirm"}},
        {"type": "reply", "reply": {"id": "reading_no", "title": "✗ Correct it"}}
      ]
    }
  }
}
```

Final Results

The complete submission result is presented in a formatted text message:

```json
{
  "type": "text",
  "text": {
    "body": "✅ *Submission Complete*\n\n📍 *Station:* Beitbridge\n📅 *Date:* 10 December 2025\n📏 *Water Level:* 42.86 cm\n💧 *Discharge:* 15.23 m³/s\n\nThank you for your contribution to water monitoring in the Limpopo Basin! 🌊"
  }
}
```

Complete User Journey

The following table summarises the end-to-end interaction flow:

| Step | User Action | Bot Response | Backend Process |
|---|---|---|---|
| 1 | Sends "Hi" | Welcome message + instructions | Create session (IDLE → AWAITING_IMAGE) |
| 2 | Sends gauge photo | "Photo received. Which station?" + station list | Download media, store in S3 |
| 3 | Selects station | "Share your location" + location request | Store station_id in session |
| 4 | Shares location | "When was this taken?" + date buttons | Store coordinates in session |
| 5 | Taps "Today" | "Processing..." then annotated image + confirm buttons | POST /vision/upload_image → POST /vision/object_detection |
| 6 | Taps "✓ Yes, correct" | "Extracting reading..." then reading + confirm buttons | POST /vision/confirm_reading → POST /vision/extract_reading |
| 7 | Taps "✓ Confirm" | Final results message | POST /vision/confirm_ai_reading → POST /vision/discharge |
| 8 | - | Session reset, ready for next submission | Store to DT_gauge_prediction, clear session |

Media Handling

User-submitted images are processed through a secure pipeline:

1. Webhook Receipt: WhatsApp delivers message with media_id reference
2. Media URL Retrieval: Bot calls GET https://graph.facebook.com/v18.0/{media_id} to obtain temporary download URL
3. Image Download: Bot downloads binary from URL (valid ~5 minutes) with OAuth bearer token
4. S3 Storage: Image stored in S3 bucket with submission ID as key
5. Vision API Processing: S3 URL or binary passed to POST /vision/upload_image
6.
Annotated Image Return: Vision API returns annotated image, bot uploads to S3 and sends WhatsApp image message with S3 URL

Error Handling

The bot service implements comprehensive error handling to maintain conversation continuity under adverse conditions:

| Error Condition | Detection | User Message | Recovery |
|---|---|---|---|
| Invalid image format | MIME type ≠ image/jpeg, image/png | "Please send a JPEG or PNG photo of the gauge." | Remain in AWAITING_IMAGE |
| Image too small | Resolution < 640px | "Image too small. Please send a clearer photo." | Remain in AWAITING_IMAGE |
| Gauge not detected | Detection confidence < 0.20 | "Couldn't detect the gauge. Please retake with full gauge visible." | Remain in AWAITING_IMAGE |
| Low reading confidence | AI confidence < threshold | "Reading uncertain: 42.8 cm. Please confirm or enter correct value." | Proceed with manual option |
| Invalid manual entry | Non-numeric or out of range | "Please enter a number between 0 and 200 (e.g., 45.2)" | Remain in AWAITING_MANUAL_READING |
| Station not recognised | station_id not found | "Station not found. Please select from the list." | Show station list |
| Location too far | Distance > 500m from station | "Location doesn't match station. Please verify you're at the correct gauge." | Prompt re-selection |
| Vision API timeout | Response > 30s | "Processing taking longer than expected. Please wait..." | Retry with backoff |
| Vision API error | 5xx response | "Technical issue. Please try again in a few minutes." | Log error, notify admin |
| Session expired | Redis key missing | "Session expired. Please send a new photo to start again." | Reset to IDLE |
| Rate limit exceeded | plate_quota_month reached | "Monthly limit reached (30 submissions). Contact your supervisor." | Block until reset |

Security Considerations

Authentication Architecture

The system implements a dual authentication strategy: Keycloak OAuth2/OIDC for administrative and API access, and WhatsApp's native phone verification for citizen scientists.

Keycloak Integration: Administrative users authenticate via Keycloak, receiving JWT Bearer tokens with 5-minute expiry. Refresh token rotation ensures long-lived sessions remain secure, with tokens invalidated after single use. API endpoints are protected using decorator-based access control that validates JWT claims against required Keycloak groups.

WhatsApp User Verification: Citizen scientists are authenticated through WhatsApp's inherent SIM-based phone verification, providing strong identity assurance without requiring separate credentials. Conversations are bound to verified phone numbers, and first-time users are automatically registered in the system.

Authorisation Model

Role-based access control operates at two levels. Application-level permissions (stored in user_app_permissions) control global capabilities: submission creation, moderation rights, administrative access, and monthly submission quotas. Station-level permissions (stored in user_station_permissions) provide granular control over which users can submit to or moderate specific monitoring stations.

Data Protection

Encryption in Transit

All connections use TLS 1.2+, including client-to-API (HTTPS), WhatsApp webhooks, database connections (TLS over SSH tunnel), and inter-service communication.

Encryption at Rest

Database storage uses AES-256 encryption via AWS KMS. S3 objects are encrypted using server-side encryption. Session data in ElastiCache is encrypted at rest.

Infrastructure Security

The system deploys within an AWS VPC with public subnets (Application Load Balancer only) and private subnets (ECS tasks, RDS, ElastiCache). Security groups restrict database and cache access to application containers only.
Containers run as non-privileged users with resource limits enforced. Credentials are injected via AWS Secrets Manager rather than environment variables.

API Security

All inputs are validated before processing, including MIME type and size limits for images, coordinate range validation, and numeric bounds checking for readings. Rate limiting prevents abuse: 60 image uploads per hour, 10 authentication attempts per minute per IP. WhatsApp webhook requests are validated using HMAC signature verification to prevent spoofing.

Audit and Compliance

Security events (authentication, authorisation denials, submissions, admin actions) are logged to AWS CloudWatch with PII redacted. Data resides in AWS Africa (Cape Town) region where available. Citizen scientists acknowledge data usage terms on first interaction, and location sharing is explicitly requested rather than automatically collected. Third-party data sharing (Meta for WhatsApp, Google for Gemini AI) operates under standard data processing agreements.

Discussion and Conclusions

Achievement of Objectives

The Vision API demonstrates that hybrid AI architectures combining computer vision with large language models can achieve operationally useful accuracy for river gauge reading extraction. Evaluation on 548 images from the Limpopo Basin yielded waterline detection precision of 94.24%, scale gap detection mAP@0.5 of 89%, and reading extraction R² of 0.84 with MAE of 5.43 cm under optimal imaging conditions (Vigneswaran et al. 2025). These results indicate performance comparable to existing automated gauge reading systems (Liu and Huang 2024) while offering greater adaptability across gauge plate designs.

The three-stage AI processing pipeline, combining YOLOv8 for waterline detection, YOLOv8-Pose for scale gap estimation, and Gemini 2.0 Flash for reading extraction, underlies these results. The metrics above, obtained on optimal-quality images, indicate that the system is suitable for operational hydrological monitoring where traditional infrastructure is limited.

The custom WhatsApp Bot Service enables citizen science participation without requiring dedicated app installation, addressing a significant barrier to adoption in resource-constrained contexts. By separating conversation logic from the messaging transport layer, the architecture provides full control over user experience while leveraging WhatsApp's ubiquitous presence across the Limpopo Basin's four riparian countries. The two-stage validation workflow serves a dual purpose: ensuring data quality through citizen confirmation while continuously generating ground-truth training data that enables ongoing model improvement without dedicated annotation campaigns.

Innovation and Contribution

The protocol represents a novel contribution to citizen science approaches for hydrological monitoring, addressing the "data paradox" inherent to Digital Twin development: basins with the weakest monitoring infrastructure stand to benefit most from digital decision support, yet lack the observational data required to drive such systems (Roman 2025).

The hybrid AI architecture combines computer vision for geometric measurement with large language models for numerical extraction, addressing limitations inherent to single-model approaches. The manuscript findings confirm this design rationale: processing with image alone achieved R² = 0.16, while incorporating scale gap metadata improved performance to R² = 0.58–0.84, demonstrating that geometric context from YOLOv8-Pose is essential for accurate reading extraction (Vigneswaran et al. 2025).
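As a rough illustration of how the geometric measurements and the LLM-extracted number combine, the sketch below implements the reading calculation given in Appendix A, corrected_reading = (visible_no × 10) − (10 / scale_gap_pixels × gap_to_water). The function name and the input values are illustrative assumptions, not taken from the Vision API code. The factor of 10 is read here as meaning that adjacent scale markers are 10 cm apart and that visible_no is printed in decimetres, which the formula implies but the report does not state explicitly.

```python
import math


def corrected_reading(visible_no: float, scale_gap_pixels: float, gap_to_water: float) -> float:
    """Water level in cm, following the Appendix A formula.

    visible_no       -- number closest to (but above) the water line, from the LLM
    scale_gap_pixels -- average pixel distance between consecutive scale markers
    gap_to_water     -- pixel distance from the nearest scale marker to the water line
    """
    if scale_gap_pixels <= 0:
        raise ValueError("scale_gap_pixels must be positive")
    # Assumption: one scale gap spans 10 cm, so this converts pixels to centimetres.
    cm_per_pixel = 10.0 / scale_gap_pixels
    return visible_no * 10.0 - cm_per_pixel * gap_to_water


# Illustrative inputs only (not measured data): the water line sits a quarter
# of a scale gap below the marker labelled 5.
reading = corrected_reading(visible_no=5, scale_gap_pixels=28.3, gap_to_water=7.075)
print(round(reading, 2))
```

Without scale_gap_pixels there is no pixel-to-centimetre conversion at all, which is one plausible reading of why accuracy drops so sharply when only the image is supplied.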
The algorithmic combination (W = M − R × 10) leverages each model's strengths rather than relying on end-to-end AI interpretation. The WhatsApp-based interface significantly lowers barriers to participation compared to dedicated mobile applications. This design prioritises digital inclusiveness, recognising that potential citizen scientists may face constraints including limited smartphone storage, unreliable data connections, or unfamiliarity with app installation procedures. The conversational interface with interactive buttons and multi-language support further reduces friction for users with varying digital literacy levels. The protocol's transboundary design, with station-level permissions and integration with the FlowTracker rating curve database, enables coordinated data collection across the basin's four countries under LIMCOM governance structures. Limitations and Future Work Several limitations warrant acknowledgement. The current training dataset comprises 548 images from Limpopo Basin gauge plates; performance on gauge designs from other regions requires validation before broader deployment. Model accuracy degrades under challenging imaging conditions—optimal quality images achieved MAE = 5.43 cm while sub-optimal images (blur, corrosion, poor lighting) showed MAE = 11.70 cm, though the validation workflow mitigates this by enabling user correction. The system's dependence on third-party APIs (Google Gemini, Meta WhatsApp) introduces potential points of failure outside operator control, and the image processing workflow requires sustained connectivity that may challenge users in areas with intermittent coverage. 
Future development will focus on continuous model improvement through retraining on accumulated validation data, offline capability for image capture during connectivity gaps, expanded language support (Setswana, Shona, Xitsonga), image quality scoring to provide immediate user feedback, and integration with the UNICEF YOMA blockchain platform for citizen scientist recognition and credentialing.

Conclusion

The Vision API and associated protocol demonstrate that citizen science, combined with hybrid artificial intelligence, can meaningfully augment traditional hydrological monitoring in data-scarce transboundary basins. The system achieves reading accuracy comparable to previous automated gauge reading methods (Liu and Huang 2024) while offering superior adaptability across gauge plate designs through LLM-based interpretation. By lowering technical barriers through WhatsApp-based interaction, embedding quality assurance through two-stage validation, and integrating with existing hydrological infrastructure through the FlowTracker API, the system enables community members to contribute scientifically valuable discharge observations while building local engagement with water resource management. This work provides a foundation for scaling the approach across the Limpopo Basin's planned network of 80 citizen scientists generating 10,000 annual observations, and potentially to other river systems facing similar monitoring challenges.

References

Afham, A.; Silva, P.; Ghosh, S.; Kiala, Z.; Retief, H.; Dickens, C.; Garcia Andarcia, M. 2024. Limpopo River Basin Digital Twin Open Data Cube Catalog. Colombo, Sri Lanka: International Water Management Institute (IWMI). CGIAR Initiative on Digital Innovation. 22p.

Beven, K.J. 2012. Rainfall-runoff modelling: the primer. 2nd ed. Chichester, UK: Wiley-Blackwell.
Bonney, R.; Cooper, C.B.; Dickinson, J.; Kelling, S.; Phillips, T.; Rosenberg, K.V.; Shirk, J. 2009. Citizen science: a developing tool for expanding science knowledge and scientific literacy. BioScience, 59(11): 977–984. doi:10.1525/bio.2009.59.11.9.

Buytaert, W.; et al. 2014. Citizen science in hydrology and water resources: opportunities for knowledge generation, ecosystem service management, and sustainable development. Frontiers in Earth Science, 2: 26. doi:10.3389/feart.2014.00026.

Conrad, C.C.; Hilchey, K.G. 2011. A review of citizen science and community-based environmental monitoring: issues and opportunities. Environmental Monitoring and Assessment, 176(1–4): 273–291. doi:10.1007/s10661-010-1582-5.

Dell, N.; Kumar, N. 2016. The ins and outs of HCI for development. In: Proceedings of the 2016 CHI conference on human factors in computing systems. San Jose, CA, USA: ACM. pp. 2220–2232. doi:10.1145/2858036.2858081.

Fowler, M. 2002. Patterns of enterprise application architecture. Boston, USA: Addison-Wesley.

Garcia Andarcia, M.; Dickens, C.; Silva, P.; Matheswaran, K.; Koo, J. 2024. Digital Twin for management of water resources in the Limpopo River Basin: a concept. Colombo, Sri Lanka: International Water Management Institute (IWMI). CGIAR Initiative on Digital Innovation. 4p.

Hannah, D.M.; et al. 2011. Large-scale river flow archives: importance, current status and future needs. Hydrological Processes, 25(7): 1191–1200. doi:10.1002/hyp.7794.

International Water Management Institute (IWMI). 2024. Citizen science for water management in Limpopo river basin: project proposal. Colombo, Sri Lanka: International Water Management Institute (IWMI).

Langa, N.; Kiala, Z. 2025. Citizen scientists take the lead in tracking southern Africa's transboundary river basin. IWMI Blog. 11 November 2025. Available at: https://www.iwmi.org/blogs/citizen-scientists-take-the-lead-in-tracking-southern-africas-transboundary-river-basin/.

Limpopo Watercourse Commission (LIMCOM). 2019. Limpopo river basin monograph. Maputo, Mozambique: Limpopo Watercourse Commission (LIMCOM).

Liu, W.-C.; Huang, W.-C. 2024. Evaluation of deep learning computer vision for water level measurements in rivers. Heliyon, 10: e25989. doi:10.1016/j.heliyon.2024.e25989.

Rasheed, A.; San, O.; Kvamsdal, T. 2020. Digital twin: values, challenges and enablers from a modeling perspective. IEEE Access, 8: 21980–22012. doi:10.1109/ACCESS.2020.2970143.

Roman, H. 2025. Quoted in: Storr, S. 2025. A network of citizen scientists to protect freshwater resources in southern Africa. IWMI Blog. 18 September 2025. Available at: https://www.iwmi.org/blogs/a-network-of-citizen-scientists-to-protect-freshwater-resources-in-southern-africa/.

See, L.; et al. 2016. Crowdsourcing, citizen science or volunteered geographic information? The current state of crowdsourced geographic information. ISPRS International Journal of Geo-Information, 5(5): 55. doi:10.3390/ijgi5050055.

Seibert, J.; Strobl, B.; Etter, S.; Hummer, P.; van Meerveld, H.J. 2019. Virtual staff gauges for crowd-based stream level observations. Frontiers in Earth Science, 7: 70. doi:10.3389/feart.2019.00070.

Storr, S. 2025. A network of citizen scientists to protect freshwater resources in southern Africa. IWMI Blog. 18 September 2025. Available at: https://www.iwmi.org/blogs/a-network-of-citizen-scientists-to-protect-freshwater-resources-in-southern-africa/.

Thornhill, I.; Loiselle, S.; Lind, K.; Ophof, D. 2016. The citizen science opportunity for researchers and agencies. BioScience, 66(9): 720–721. doi:10.1093/biosci/biw089.

Vigneswaran, K.; Retief, H.; Clifford-Holmes, J.; Garcia Andarcia, M.; Tennakoon, H. 2025. Hybrid framework for automated river gauge reading: integrating YOLOv8 waterline detection and Gemini 2.0 Flash LLM. Unpublished manuscript. Colombo, Sri Lanka: International Water Management Institute (IWMI).
Wiggins, A.; Crowston, K. 2011. From conservation to crowdsourcing: a typology of citizen science. In: Proceedings of the 44th Hawaii international conference on system sciences. Kauai, HI, USA: IEEE. pp. 1–10. doi:10.1109/HICSS.2011.207.

Yang, Z.; Li, L.; Wang, J.; Lin, K.; Azarnasab, E.; Ahmed, F.; Liu, Z.; Liu, C.; Zeng, M.; Wang, L. 2023. MM-REACT: Prompting ChatGPT for multimodal reasoning and action. arXiv:2303.11381. Available at: https://arxiv.org/abs/2303.11381.

Appendix A: API Endpoint Reference

Overview

The Vision API exposes a RESTful interface for the discharge measurement protocol, organised into two endpoint groups: core vision processing endpoints that implement the five-stage workflow, and authentication endpoints that manage user identity and access control. All endpoints follow consistent conventions for request/response formatting, error handling, and authentication.
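The shared conventions can be sketched as a small request builder. This is a minimal sketch using only the Python standard library: it constructs a form-encoded POST with a JWT Bearer token but does not send it. The base URL, endpoint path, and field names are those documented in this appendix; the helper name build_request, the example field values, and the placeholder token are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.digitaltwins.iwmi.org"  # base URL documented in this appendix


def build_request(path: str, fields: dict, token: Optional[str] = None) -> Request:
    """Build (but do not send) a form-encoded POST for a Vision API endpoint."""
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    if token is not None:
        # Protected endpoints expect a JWT Bearer token in the Authorization header.
        headers["Authorization"] = f"Bearer {token}"
    body = urlencode(fields).encode("utf-8")
    return Request(BASE_URL + path, data=body, headers=headers, method="POST")


req = build_request(
    "/vision/object_detection",
    {"submission_id": 12345, "user_id": "example-user"},  # illustrative values
    token="<jwt-from-auth-endpoint>",  # placeholder, not a real token
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to urllib.request.urlopen would issue the call. Image uploads use multipart/form-data instead, which this helper deliberately does not cover.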
Base URL: https://api.digitaltwins.iwmi.org

Content Types:
• Request: multipart/form-data for image uploads; application/x-www-form-urlencoded for other endpoints
• Response: application/json for data responses; image/jpeg for annotated images

Authentication: Unless otherwise specified, endpoints require a valid JWT Bearer token in the Authorization header, obtained through the authentication endpoints.

Core Vision Processing Endpoints

The six core endpoints implement the discharge measurement workflow sequentially. Each endpoint corresponds to a specific processing stage and must be called in order for a given submission.

POST /vision/upload_image

Purpose: Initial image submission from WhatsApp bot with associated metadata.

Authentication: None required (public endpoint to facilitate WhatsApp bot integration)

Request Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| image | File | Yes | Gauge plate photograph (JPEG or PNG format, max 10MB) |
| user_id | String | Yes | WhatsApp user identifier (phone number hash or unique ID) |
| timestamp | String | Yes | Capture timestamp in format YYYY-MM-DD HH:MM:SS |
| longitude | Float | No | GPS longitude coordinate (decimal degrees, WGS84) |
| latitude | Float | No | GPS latitude coordinate (decimal degrees, WGS84) |
| station | String | No | Gauging station identifier (e.g., B7H026) |
| imageSendDate | String | No | Date image was sent via WhatsApp (YYYY-MM-DD) |
| imageSendTime | String | No | Time image was sent via WhatsApp (HH:MM:SS) |

Processing Logic:
1. Validates presence of required fields (image, user_id, timestamp)
2. Validates timestamp format compliance
3. Validates file type (JPEG/PNG only) and size constraints
4. Generates unique submission_id
5. Inserts metadata record into DT_GAUGE_SUBMISSION table
6. Stores binary image data in database BLOB field
7. Saves temporary file copy for subsequent processing

Response (Success - 201 Created):

```json
{
  "status": "success",
  "submission_id": 12345,
  "message": "Image uploaded successfully"
}
```

Response (Error - 400 Bad Request):

```json
{
  "status": "error",
  "message": "Invalid timestamp format. Expected YYYY-MM-DD HH:MM:SS"
}
```

Error Codes:

| Code | Condition |
|---|---|
| 400 | Missing required field, invalid timestamp format, unsupported file type |
| 413 | Image file exceeds size limit |
| 500 | Database insertion failure |

POST /vision/object_detection

Purpose: Execute YOLOv8 model to detect gauge plate boundaries and water line intersection point.

Authentication: Bearer Token (required groups: dt-vision-users or vision-app-admin)

Request Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| submission_id | Integer | Yes | Submission identifier from upload_image response |
| user_id | String | Yes | WhatsApp user identifier (must match original submission) |

Processing Logic:
1. Retrieves original image from DT_GAUGE_SUBMISSION table
2. Loads YOLOv8 (gauge plate detection variant)
3. Executes inference via user_image_val() function
4. Detects gauge plate bounding box coordinates
5. Identifies water line intersection at bottom edge of detected region
6. Draws red annotation line at water intersection point
7. Overlays confidence score on annotated image
8. Crops detected gauge region for subsequent reading extraction
9. Saves annotated image to temporary storage

Response (Success - 200 OK): Returns annotated JPEG image with Content-Type: image/jpeg

The annotated image displays:
• Original photograph with gauge plate region highlighted
• Red horizontal line indicating detected water level
• Confidence percentage overlaid on image

Response (Error - 404 Not Found):

```json
{
  "status": "error",
  "message": "Submission not found or image not available"
}
```

Response (Error - 422 Unprocessable Entity):

```json
{
  "status": "error",
  "message": "Gauge plate not detected in image. Please retake photograph."
}
```

Error Codes:

| Code | Condition |
|---|---|
| 401 | Missing or invalid authentication token |
| 403 | User not in authorised group |
| 404 | Submission ID not found |
| 422 | Model failed to detect gauge plate (confidence below threshold) |
| 500 | Model inference failure |

POST /vision/extract_reading

Purpose: Extract numerical water level reading from cropped gauge image using dual-AI pipeline (YOLOv8 + Gemini LLM).

Authentication: Bearer Token (required groups: dt-vision-users or vision-app-admin)

Request Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| submission_id | Integer | Yes | Submission identifier |
| user_id | String | Yes | WhatsApp user identifier |

Processing Pipeline:
1. Prerequisite Check: Verifies cropped gauge image exists from prior object_detection call. Returns error if detection stage was skipped or failed.
2. Scale Detection: Executes YOLOv8-Pose model on cropped image to detect individual scale marker positions through keypoint localisation. Calculates scale_gap_pixels as average pixel distance between consecutive detected markers.
3. LLM Vision Analysis: Sends cropped gauge image to Gemini 2.0 Flash with structured prompt requesting:
   a. top_number: Highest visible number on gauge scale
   b. number_seq: Sequence direction (ascending/descending)
   c. visible_no: Number closest to (but above) the water line
4. Reading Calculation: Applies geometric formula:
   corrected_reading = (visible_no × 10) − (10 / scale_gap_pixels × gap_to_water)
   Where gap_to_water is the pixel distance from the nearest scale marker to the detected water line.
5. Database Storage: Inserts prediction record into DT_gauge_prediction table with AI-extracted reading, scale gap measurement, and processing timestamp.

Response (Success - 200 OK):

```json
{
  "status": "success",
  "submission_id": 12345,
  "ai_reading": 47.5,
  "scale_gap_pixels": 28.3,
  "confidence": "high",
  "message": "Reading extracted successfully"
}
```

Response (Error - 400 Bad Request):

```json
{
  "status": "error",
  "message": "Object detection must be completed before reading extraction"
}
```

Error Codes:

| Code | Condition |
|---|---|
| 400 | Prerequisites not met (detection not completed) |
| 401 | Missing or invalid authentication token |
| 422 | Scale markers not detected; LLM failed to extract numbers |
| 502 | Gemini API unavailable or returned error |
| 500 | Reading calculation failure |

Notes:
• The reading value is returned in centimetres
• Confidence level is derived from scale detection quality and LLM response consistency
• If scale gap detection fails, the endpoint returns an error rather than an unreliable reading

POST /vision/confirm_reading

Purpose: First-stage user confirmation of object detection results (water line position accuracy).

Authentication: Bearer Token (required groups: dt-vision-users or vision-app-admin)

Request Parameters: Parameter Type Required Description submission_id Integer Yes Submission identifier user_id String Yes WhatsApp user identifier is_correct String Yes User response: "yes", "no", or "need help" corrected_reading Float