Ask-the-data probe report
Citation
Palacios, P.; Juarez, H.; Mwanzia, L. 2025. Ask-the-data probe report. CGIAR Accelerator on Digital Transformation; International Potato Center.
Abstract/Description
CIP Dataverse is the primary repository for CIP experimental datasets, but it is a non-interactive, download-oriented platform with no built-in analysis or visualization capabilities. Researchers therefore need coding expertise to analyze the data, which limits accessibility and the reproducibility of findings for domain experts who are not data scientists. Ask-the-Data (ATD) is a Shiny for Python application that addresses these challenges by combining interactive data exploration with AI assistance. The tool connects to datasets in CIP Dataverse via their DOI links, lets users select tables and converse in natural language with a pandas DataFrame agent, and generates visualizations and statistical analyses without requiring users to write code. Key results include: (1) deployment of a web-accessible application at https://askthedata.cipotato.org/; (2) integration of LangChain-powered conversational AI with the pandas, scikit-learn, and statsmodels libraries; and (3) the ability to generate comprehensive Markdown reports that include visualizations, data tables, and conversation transcripts. The tool broadens access to data science by enabling researchers to perform advanced analyses through natural language queries.
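The DOI-to-table step described above can be sketched as follows. This is a minimal illustration, not the ATD implementation: it assumes the repository exposes the standard Dataverse native API (dataset lookup by persistent identifier), and the host name, helper names, and the list of tabular file extensions are hypothetical choices made for the example.

```python
from urllib.parse import urlencode

# Hypothetical host; replace with the actual Dataverse installation.
BASE_URL = "https://dataverse.example.org"

# Assumed set of extensions treated as "tables" for this sketch.
TABULAR_EXTENSIONS = (".csv", ".tab", ".tsv", ".xlsx")


def dataset_metadata_url(base_url: str, doi: str) -> str:
    """Build the native-API URL that returns dataset metadata for a DOI.

    Accepts a bare DOI, a 'doi:' identifier, or a https://doi.org/ link.
    """
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    query = urlencode({"persistentId": f"doi:{doi}"})
    return f"{base_url.rstrip('/')}/api/datasets/:persistentId/?{query}"


def list_tabular_files(metadata: dict) -> list:
    """Return (file id, filename) pairs for tabular files in a metadata response.

    Assumes the JSON layout data -> latestVersion -> files -> dataFile.
    """
    files = metadata["data"]["latestVersion"]["files"]
    return [
        (entry["dataFile"]["id"], entry["dataFile"]["filename"])
        for entry in files
        if entry["dataFile"]["filename"].lower().endswith(TABULAR_EXTENSIONS)
    ]


# Usage with a made-up DOI and a hand-written response (no network access).
example_url = dataset_metadata_url(BASE_URL, "https://doi.org/10.1234/ABC")
example_metadata = {
    "data": {
        "latestVersion": {
            "files": [
                {"dataFile": {"id": 1, "filename": "trial_yield.csv"}},
                {"dataFile": {"id": 2, "filename": "readme.pdf"}},
            ]
        }
    }
}
example_tables = list_tabular_files(example_metadata)
```

In a tool of this kind, the selected file would then be downloaded and loaded into a pandas DataFrame before being handed to the conversational agent; that part is omitted here because it depends on the deployed application's configuration.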
Author ORCID identifiers
Henry Juarez https://orcid.org/0000-0002-8535-7089
Leroy Mwanzia https://orcid.org/0000-0002-1107-6110
