RESEARCH Open Access

AgroLD: a knowledge graph for the plant sciences

Larmande Pierre1,3*, Pittolat Bertrand2,3, Tando Ndomassi1,3, Pomie Yann1, Happi Happi Bill Gates1, Guignon Valentin2,3,4 and Ruiz Manuel2,3

*Correspondence: Larmande Pierre, pierre.larmande@ird.fr
1 DIADE, IRD, CIRAD, Univ. Montpellier, Ave Agropolis, Montpellier, France
2 AGAP, CIRAD, INRAE, Univ. Montpellier, Ave Agropolis, Montpellier, France
3 French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
4 Bioversity International, Bioversity, Parc Scientifique Agropolis II, Montpellier, France

Pierre et al. BMC Genomic Data (2025) 26:73 https://doi.org/10.1186/s12863-025-01359-6

Abstract

Background  The demand for food is expected to grow substantially in the coming years. To address this challenge, especially in the context of climate change, a deeper understanding of genotype-phenotype relationships is crucial for improving crop yields. Recent advances in high-throughput technologies have transformed the landscape of plant science research. However, there is an urgent need to integrate and consolidate complementary data to understand the biological system.

Results  We introduce AgroLD, a knowledge graph that uses Semantic Web technologies to seamlessly integrate plant science data. AgroLD is designed to facilitate hypothesis formulation and validation within the scientific community. With approximately 1.08 billion triples, it integrates and annotates data from more than 151 datasets across 19 distinct sources.

Conclusion  The overarching goal is to provide a specialized knowledge platform addressing complex biological questions in the plant sciences, including gene participation in plant disease resistance and adaptive responses to climate change.

Keywords  Knowledge graphs, FAIR, Linked data, Bioinformatics, Plant sciences

© The Author(s) 2025. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Introduction

Agronomic research is witnessing an unprecedented revolution in the acquisition of various data, such as phenotypic and genomic data, as well as data related to the functional characterization of specific genes. Understanding the intricate interactions between genotypes and phenotypes that lead to particular traits is a key focal point within this field. However, these interactions are difficult to identify because they occur at different molecular levels in plants and are strongly influenced by environmental factors (e.g., climate change). The new challenges include identifying these interactions across the diverse molecular entities that contribute to phenotypic expression. This endeavor necessitates a holistic approach, incorporating insights from different data stacks into a comprehensive model to unravel the true functioning of biological systems.

For researchers, navigating through vast amounts of dispersed information across multiple online databases, each with distinct data models, scales, and access modes, is a major challenge. This is particularly evident in genetic association studies such as genome-wide association studies (GWAS), which establish links between genomic regions (loci) and phenotypic traits. GWAS loci often encompass multiple genes, necessitating thorough analysis to identify relevant genes. A similar challenge exists in transcriptomic studies, where researchers must interpret extensive lists of differentially expressed genes. In both cases, researchers must decide which genes warrant further investigation in the laboratory, a decision often based on subjective judgment and incomplete reviews of the data. Today's significant challenges are related to developing methods to integrate these heterogeneous data and enrich biological knowledge. Scientists also need methods to explore this large amount of data and to highlight relevant information that can be used to identify key genes.

The Semantic Web introduces techniques and technologies to transform vast amounts of data into actionable knowledge. It is a fundamental component of the Findable, Accessible, Interoperable, and Re-usable (FAIR) principles [1], as it enhances data interoperability.
This interoperability hinges on establishing standardized vocabularies and ontologies, which systematically capture domain knowledge and translate it into semantic resources, empowering computers to index, search, and reason over data. Notably, the Resource Description Framework (RDF) [2] has been widely used for web-based data publication, leading to the creation of the Web of Data. Recently, numerous initiatives have emerged within the biomedical and bioinformatics domains, each aiming to provide comprehensive platforms for building scientific hypotheses around gene functions, phenotypic expression, and disease emergence. Illustrative examples include Bio2RDF [3], UniProt RDF [4], PubChem [5] and WikiPathways [6]. In the domain of human biology, notable contributions have been made through the establishment of platforms such as the DisGeNET RDF [7] and the Monarch Initiative [8]. Similarly, the field of plant science has yielded KnetMiner [9], a graph database designed to unravel plant molecular networks for analogous objectives. In this context, AgroLD [10, 11] was developed with the ambition of providing the tools and methods needed to exploit the data and knowledge produced within the plant community. AgroLD has been actively developed since then. Currently, AgroLD contains more than 1.08 billion triples, resulting from the integration of approximately 151 datasets gathered in 33 named graphs.

Methods

Information content

AgroLD was designed to accommodate the molecular and phenotypic information available on various plant species, with a strong focus on tropical crops. Since the first release [10], 40 new species (6 since the previous release [11]) have been integrated, including cereals, legumes, and fruit trees. The list of the 51 species is available in Table 1. AgroLD is built incrementally and spans many aspects of plant molecular interactions.
Initially, it integrated information on genes, proteins, metabolic pathways, and genetic studies built from several resources such as Ensembl Plants [12], UniProtKB [4], Gene Ontology Annotation [13], Gramene [14], Oryzabase [15], RAP-DB [16], and MSU [17]. In its current version, AgroLD adds predictions of homologous genes from Ensembl Compara and biological networks from StringDB [18], RiceNetV2 [19], PlantTFDB [20], and PlantRegMap [21]. The size of the knowledge base has expanded by 16% since the last release [11], reaching 1.08 billion triples. The biological community has guided the choice of these sources, as they are widely used and strongly influence users' confidence. We have also integrated resources developed by the local SouthGreen platform [22] such as TropGeneDB [23], a tropical plant genetics database; GreenPhylDB [24], a comparative genomics database for tropical plants; OryzaTagLine [25], a rice phenotype database; and SniPlay [26], a rice genomic variation database. These resources combine experimental data from research groups in Montpellier and southern France. The online documentation describes the integrated data sources [27], and Table 2 summarizes them.

We initially built the conceptual framework of AgroLD on a custom vocabulary, which also included mappings to well-established ontologies and controlled vocabularies in the fields of molecular biology and plant sciences such as Sequence Ontology [28], Gene Ontology [29], Plant Ontology [30] and Plant Trait Ontology [31]. Most of these ontologies are hosted by the Open Bio-Ontologies (OBO) Foundry project [32].
For this updated version, we modified the backbone schema (i.e., its vocabulary) by reusing other existing ontologies such as the Semanticscience Integrated Ontology (SIO) [33], the Feature Annotation Location Description Ontology (FALDO) [34], and the Relation Ontology (RO) [35] to increase its interoperability with other knowledge graphs. Additionally, we included general ontologies such as Resource Description Framework Schema (RDFS), Simple Knowledge Organization System (SKOS), and Dublin Core to describe some properties of the biological entities. The online documentation shows the complete list of the ontologies used. Figure 1 shows a subset of the global schema of AgroLD [36].

Table 1  The 51 plant species integrated in AgroLD
Species name | Common name | Taxon ID
Aegilops tauschii subsp. strangulata | rough-spike hard grass | 200361
Amborella trichopoda | Amborella | 13333
Ananas comosus | pineapple | 4615
Arabidopsis halleri subsp. gemmifera | | 63677
Arabidopsis lyrata subsp. lyrata | Cardaminopsis lyrata | 81972
Arabidopsis thaliana | thale cress | 3702
Beta vulgaris ssp. vulgaris | sugar beet | 3555
Brachypodium distachyon | stiff brome | 15368
Brassica napus | rape | 3708
Brassica oleracea var. oleracea | wild cabbage | 109376
Brassica rapa | field mustard | 3711
Citrus x clementina | clementine | 85681
Coffea canephora | robusta coffee | 49390
Daucus carota subsp. sativus | carrot | 79200
Digitaria exilis | white fonio | 1010633
Glycine max | soybean | 3847
Gossypium raimondii | Peruvian cotton | 29730
Helianthus annuus | domesticated sunflower | 4232
Hordeum vulgare subsp. vulgare | two-rowed barley | 112509
Malus domestica | apple | 3750
Manihot esculenta | cassava | 3983
Musa acuminata subsp. malaccensis | wild Malaysian banana | 214687
Nicotiana attenuata | wild tobacco | 49451
Olea europaea | Mediterranean olive tree | 158383
Oryza barthii | African wild rice | 65489
Oryza brachyantha | malo sina | 4533
Oryza glaberrima | African rice | 4538
Oryza glumipatula | | 40148
Oryza longistaminata | long-staminate rice | 4528
Oryza meridionalis | Australian wild rice | 40149
Oryza nivara | | 4536
Oryza punctata | red rice | 4537
Oryza rufipogon | common wild rice | 4529
Oryza sativa Indica Group | long-grained rice | 39946
Oryza sativa Japonica Group | Japanese rice | 39947
Phaseolus vulgaris | common bean | 3885
Prunus avium | sweet cherry | 42229
Prunus dulcis | almond | 3755
Prunus persica | peach | 3760
Saccharum spontaneum | wild sugarcane | 62335
Setaria italica | foxtail millet | 4555
Solanum lycopersicum | tomato | 4081
Solanum tuberosum | potato | 4113
Sorghum bicolor | sorghum | 4558
Theobroma cacao | cacao | 3641
Triticum aestivum | bread wheat | 4565
Triticum dicoccoides | wild emmer wheat | 85692
Triticum turgidum subsp. durum | durum wheat | 4567
Triticum urartu | red wild einkorn wheat | 4572
Vitis vinifera | wine grape | 29760
Zea mays | maize | 4577

AgroLD integration pipelines

We developed various RDF conversion pipelines for large genomic and agronomic datasets. Although several generic tools exist within the Semantic Web community, such as Tarql [37], RML.io [38] or SPARQL-Generate [39], to name a few, none of them has been adapted to the complexity of data formats in the biological domain (e.g., Variant Call Format (VCF) [40]) or to the complexity of the information they can contain. The Generic Feature Format (GFF) [41], which represents genomic data in a Tab Separated Value (TSV) format, is a simple example of this complexity. One of its columns holds a variable-length list of key=value pairs, and the keys and values it contains differ depending on the data source. In this case, the transformation must be adapted to each data source. Moreover, the large volume of data was a limiting factor for the abovementioned tools.

In this context, we developed RDF conversion tools adapted to various genomics data standards such as GFF, Gene Ontology Annotation File (GAF) [42], and VCF.
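As a minimal sketch of why the GFF attributes column calls for source-aware handling, the Python below splits that column into key=value pairs and emits N-Triples-style lines. It is not the AgroLD ETL code: the predicate names under the vocabulary namespace, the helper names, and the sample record are illustrative assumptions, and only the two namespace URIs follow the patterns given in this article.

```python
from urllib.parse import quote, unquote

RESOURCE_NS = "http://purl.agrold.org/resource/"
# Predicate names built under this namespace are hypothetical, for illustration only.
VOCAB_NS = "http://purl.agrold.org/vocabulary/"


def parse_gff_attributes(column9):
    """Split the GFF3 attributes column into a dict of key -> list of values."""
    attributes = {}
    for field in column9.strip().split(";"):
        key, sep, raw = field.partition("=")
        if not sep or not key.strip():
            continue  # skip empty or malformed fields
        # GFF3 allows several comma-separated, percent-encoded values per key
        attributes[key.strip()] = [unquote(v) for v in raw.split(",")]
    return attributes


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def gff_line_to_triples(line):
    """Convert one GFF3 feature line into a list of N-Triples strings."""
    if line.startswith("#"):
        return []  # directive or comment line
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return []  # not a well-formed feature line
    seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
    parsed = parse_gff_attributes(attrs)
    if "ID" not in parsed:
        return []  # no stable identifier to build a subject URI from
    subject = "<" + RESOURCE_NS + quote(parsed["ID"][0], safe=":._-") + ">"
    fixed = {"featureType": ftype, "seqid": seqid,
             "start": start, "end": end, "strand": strand}
    triples = [
        f'{subject} <{VOCAB_NS}{name}> "{escape_literal(value)}" .'
        for name, value in fixed.items()
    ]
    # Source-dependent keys: every attribute other than ID becomes a predicate
    for key, values in parsed.items():
        if key == "ID":
            continue
        for value in values:
            triples.append(
                f'{subject} <{VOCAB_NS}{quote(key)}> "{escape_literal(value)}" .'
            )
    return triples


# Invented sample record, used only to exercise the functions
sample = "chr01\texample\tgene\t2983\t10815\t.\t+\t.\tID=gene1;Name=GRP2;Dbxref=GO:0003677,GO:0005634"
for triple in gff_line_to_triples(sample):
    print(triple)
```

Because the set of attribute keys varies between providers, a real pipeline would likely map selected keys to curated predicates per source rather than turning every key into a predicate as done here.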
Moreover, we are currently working on packaging these Extract, Transform, and Load (ETL) tools in an Application Programming Interface (API) [43]. The RDF conversion tools are Python-based scripts that can be run independently, either locally or on high-performance computing resources. More than forty scripts are available to process either data standards (e.g., GFF) or database-specific data (e.g., TAIR, RAPDB, and Oryzabase). Some parameters, such as the base Uniform Resource Identifier (URI), local paths, and RDF prefixes, can be defined globally. Parameters specific to a script can be defined at runtime. Each script is documented with a docstring that explains how to run it. Moreover, the GitHub repository provides documentation on how to deploy and use the tools. Table 3 lists all the resources and tools available for AgroLD.

To ensure that AgroLD remains updated with the latest data, the entire knowledge base is reconstructed annually. Additionally, new datasets are incorporated multiple times a year, typically every four months. Regularly updating the data presents challenges, as the original databases often lack automatic tracking of changes between versions. In our experience, completely reconstructing the knowledge base at regular intervals is an effective strategy to bypass the complexities of handling data differences (Fig. 2).

URI design and data linking

In the transformation pipelines, RDF graphs share a common namespace and are named according to the corresponding data sources. Entities in RDF graphs are linked by the common URI principle. We generally build URIs by referring to Identifiers.org [44], which provides design patterns for each registered source. For example, genes integrated from Ensembl Plants use the pattern http://identifiers.org/ensembl.plant/Entity_ID.
When Identifiers.org does not provide a pattern, new URIs are constructed in the form http://purl.agrold.org/resource/Entity_ID. In addition, the properties linking the entities are constructed in the form http://purl.agrold.org/vocabulary/property.

To link identical entities from different data sources, we used an approach based on URI pattern matching. Its principle is to scan the URIs for similar patterns in the terminal part of the URI (i.e., Entity_ID). We also follow the common URI approach, which recommends using the same URI pattern for two identical entities. Together, these allowed us to aggregate information from different RDF graphs for the same entity. In addition, we used cross-reference links by transforming them into URIs and attaching them to the resource with the rdfs:seeAlso predicate. This significantly increases the number of outbound links, to almost 80 million, making AgroLD better integrated with other data sources.
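The URI rules and the terminal-part matching described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline's code: the set of registered prefixes, the `local.source` prefix, the function names, and the sample identifiers are hypothetical, while the two URI patterns and the rdfs:seeAlso predicate come from the text.

```python
import re
from collections import defaultdict

IDENTIFIERS_ORG = "http://identifiers.org/"
AGROLD_RESOURCE = "http://purl.agrold.org/resource/"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Hypothetical stand-in for a lookup of sources registered at Identifiers.org;
# only ensembl.plant is taken from the article.
REGISTERED_PREFIXES = {"ensembl.plant"}


def build_uri(source_prefix, entity_id):
    """Use the Identifiers.org pattern when available, else the AgroLD namespace."""
    if source_prefix in REGISTERED_PREFIXES:
        return f"{IDENTIFIERS_ORG}{source_prefix}/{entity_id}"
    return f"{AGROLD_RESOURCE}{entity_id}"


def terminal_id(uri):
    """Return the terminal part of a URI (the Entity_ID segment)."""
    return re.split(r"[/#]", uri.rstrip("/#"))[-1]


def group_by_terminal_id(uris):
    """Group URIs that share the same terminal part; keep groups of two or more."""
    groups = defaultdict(set)
    for uri in uris:
        groups[terminal_id(uri)].add(uri)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}


def see_also_triple(subject_uri, xref_uri):
    """Express a cross-reference as an rdfs:seeAlso triple in N-Triples syntax."""
    return f"<{subject_uri}> <{RDFS_SEE_ALSO}> <{xref_uri}> ."


# Invented identifiers, used only to exercise the functions
uris = [
    build_uri("ensembl.plant", "Os01g0100100"),
    build_uri("local.source", "Os01g0100100"),
    build_uri("local.source", "Os02g0200200"),
]
print(group_by_terminal_id(uris))
```

Exact string comparison on the terminal part is the simplest possible reading of "similar patterns"; identifiers that differ only in case, version suffix, or prefix would need normalization rules that this sketch does not attempt.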
We plan to implement a similarity-based entity profile approach to identify matches between entities with different URIs.

Table 2  Data sources integrated in AgroLD
Data sources | Nb of datasets | File format | Ontology used | Nb of triples
Oryzabase | 2 | TSV | GO, PO, TO | 347 K
GO Associations | 2 | GAF | GO | 6,440 K
Genome Hub | 7 | GFF | GO, SO | 12,233 K
Gramene | 6 | Custom flat file | All | 159 K
Ensembl | 51 | GFF | All | 838,874 K
UniprotKB | 2 | Uniprot | GO, PO | 60,034 K
Oryza Tag Line | 2 | Custom flat file | PO, TO, CO | 282 K
TropGeneDB | 2 | Custom flat file | PO, TO, CO | 20 K
GreenPhylDB | 2 | Custom flat file | GO, PO | 3,627 K
SNiPlay | 1 | HapMap, VCF | GO | 16,204 K
Q-TARO | 2 | TSV | PO, TO | 20 K
MSU | 2 | Custom flat file | PO, TO | 2,068 K
RiceNetDB | 6 | Custom flat file | PO, TO | 5,879 K
StringDB | 45 | Custom flat file | GO | 131,559 K
RapDB | 3 | GFF | PO, TO | 1,026 K
PlantTFDB | 12 | Custom flat file | PO, TO | 86 K
Interpro | 1 | Custom flat file | PO, TO | 196 K
CEGResources | 2 | GFF | PO, TO | 1,031 K
OBO ontologies | 12 | OWL | | 15,131 K
TOTAL | 151 | | | 1,077,303 K
Ontologies are referenced as GO gene ontology, PO plant ontology, TO plant trait ontology, EO plant environment ontology, SO sequence ontology, CO crop ontology (plant specific traits)

Results

To make AgroLD accessible to a broader user base, we developed a web application with multiple query interfaces. The initial interface facilitates keyword searches across the entire database content, enabling users to navigate the knowledge base. A more advanced search interface allows users to combine free text with filters based on class types, properties, and external web services. This feature supports the aggregation of distributed data.
We introduced a SPARQL Protocol and RDF Query Language (SPARQL) editor to address the challenge of handling SPARQL query language complexities, particularly for bioinformaticians and biologists. This editor provides an interactive tool for query formulation and result manipulation.

Table 3  Links to AgroLD resources and tools
Name of resource or tool and description | URL
Data
AgroLD datasets | https://doi.org/10.5281/zenodo.4694518
List of graphs | http://www.agrold.org/documentation.jsp
List of ontologies | http://www.agrold.org/documentation.jsp
AgroLD vocabulary | https://github.com/SouthGreenPlatform/AgroLD_ETL/tree/master/model
AgroLD SPARQL Endpoint | http://agrold.southgreen.fr/sparql
Example queries | http://www.agrold.org/sparqleditor.jsp
Tools
Web application | https://github.com/SouthGreenPlatform/AgroLD_webapp
RDF conversion pipelines (GFF2RDF, GAF2RDF, VCF2RDF, Datasets) | https://github.com/SouthGreenPlatform/AgroLD_ETL

Fig. 1  The AgroLD schema
Fig. 2  AgroLD ETL pipelines

Fig. 3  Overview of AgroLD Web interfaces. A displays the Faceted search interface. B displays results from the KnetMaps tool [45]. C displays results from the advanced search interface

Consequently, the AgroLD platform offers several entry points:

  • Quick Search: This plugin, powered by Virtuoso, uses faceted search capabilities to enable keyword-based searches and content navigation within AgroLD. Figure 3A illustrates the results of a keyword search, with GRP2 used as an example. The results are ranked by the frequency of occurrence across various entity fields. The Named Graph column indicates the data source, whereas the Title and Entity columns display the entity names and their URIs, respectively. Clicking on a link provides a comprehensive view of the entity, and users can traverse entities via the provided HTTP links.
  • Advanced Search: This interface allows targeted searches based on entity classes and incorporates an aggregation engine for external resources. Built upon a Representational State Transfer (REST) API (described below), the Advanced Search conceals the technical intricacies of SPARQL queries. The integration of the AgroLD API facilitates interactive searches across the knowledge base and external services such as PubMed or EMBL. Figure 3C shows the user interaction: selecting the entity type (e.g., Gene) and providing keywords (e.g., TBP1) yields results presented in a sortable and downloadable table. Each row contains entity attributes, including the ID, data source, and context of the matching keywords. To obtain more details, users click on the display link below the entity ID, which opens a new window (not shown). This window shows the name and description of the biological object in its header, followed by several panels. Each of them shows one feature of the object displayed.
The panels differ according to the type of entity displayed (e.g., Proteins, Pathways, Publications, Terms associated, View as Graph, Expression, and See also panels). Figure 3B shows the View as Graph panel, which was adapted from KnetMaps [45]. It displays a window divided into two parts. On the left, the entity is represented within a graph showing other entities linked to it (in this case, Pathways). On the right, more detailed information is displayed for the entity highlighted in green (not shown). When users select another entity in the graph, this content changes dynamically.
  • AgroLD RESTful API: This programming interface supports interaction with the knowledge graph database. It comprises function calls grouped by entity classes within AgroLD (e.g., Genes, Proteins). For example, under the Gene class, functions exist for obtaining gene lists within genomic regions (genes/byLocus), genes matching specific keywords (genes/byKeyword), and genes encoding specific proteins (genes/encodingProtein).
  • The SPARQL Editor: We developed a SPARQL query editor with an interactive environment, employing the YASQE and YASR tools [46] adapted for our system. The editor features modular and customizable query patterns aligned with user requirements. Figure 4 illustrates the editor's layout, which is divided into three areas. The main area serves as the query field with syntax highlighting, error checking, autocompletion, and editing functions. Users can load and save queries and execute predefined query templates. The results appear beneath the editor, initially as a sortable table. JSON or graphical formats are also available for display and download.

Discussion

The process of creating a knowledge graph is complex and challenging. In this section, we present some of the challenges we had to address, particularly those related to managing the heterogeneity of the datasets and their sizes.
We also discuss the challenges in aligning the entities and assessing the data quality.

With respect to data heterogeneity, the main problem was the variety of data formats, which we addressed by converting every source to RDF as a unified format. We propose several pipelines that can handle this variety and manage the dataset size. Indeed, as discussed in the pipelines section, in most cases we preferred to develop our own solutions rather than use generic tools, to better manage the complexity or size of the datasets. Another problem is the heterogeneity of the genomic coordinates (i.e., different denominations of the chromosome identifier, missing information, etc.). We solve it by choosing a unique representation and transforming all coordinates into URI templates following the FALDO ontology representation [34].

Fig. 4  The SPARQL query editor. The Query patterns frame allows users to select a query from a natural language question. The Query text frame allows the visualization and modification of the SPARQL query. The results frame displays results returned from the query

With respect to the problem of entity linking (i.e., the same entities with different names or identifiers), we have only partially solved this problem, using pattern matching in URIs or database cross-linking to identify matches between entities. When the entities have different namespace URIs (e.g., namespace1:identifier1 and namespace2:identifier1), we look for matching patterns in the URIs and create a new URI to establish the correspondence between them. If the entities have different URIs without matching patterns but with synonymous properties (e.g., skos:altLabel, skos:prefLabel, skos:synonym or specific properties), we look for matches with these properties and the patterns of the URIs. For entities that do not contain the above information, we take a more global approach based on property and value analysis.
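The three-stage cascade described in this paragraph can be illustrated with a short Python sketch. The entity layout (a dict with a `uri` and a `properties` mapping), the Jaccard similarity over property-value pairs, the 0.5 threshold, and the `ex:` property names are all assumptions made for the example; the article does not specify how the property and value analysis is computed.

```python
import re

# Label-bearing properties named in the text that are part of SKOS
LABEL_PROPERTIES = ("skos:prefLabel", "skos:altLabel")


def terminal_id(uri):
    """Return the terminal part of a URI."""
    return re.split(r"[/#]", uri.rstrip("/#"))[-1]


def labels(entity):
    """Collect normalized label values of an entity."""
    found = set()
    for prop in LABEL_PROPERTIES:
        for value in entity["properties"].get(prop, []):
            found.add(value.strip().lower())
    return found


def property_value_similarity(a, b):
    """Jaccard similarity over (property, value) pairs; an assumed measure."""
    pairs_a = {(p, v) for p, values in a["properties"].items() for v in values}
    pairs_b = {(p, v) for p, values in b["properties"].items() for v in values}
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 0.0


def match_entities(a, b, threshold=0.5):
    """Return the first stage of the cascade that links a and b, or None."""
    if terminal_id(a["uri"]) == terminal_id(b["uri"]):
        return "uri_pattern"
    if labels(a) & labels(b):
        return "label"
    if property_value_similarity(a, b) >= threshold:
        return "property_value"
    return None


# Invented entities, used only to exercise each stage
g1 = {"uri": "http://example.org/ns1/G1", "properties": {"skos:prefLabel": ["GRP2"]}}
g2 = {"uri": "http://example.org/ns2/G1", "properties": {}}
g3 = {"uri": "http://example.org/ns2/X9", "properties": {"skos:altLabel": ["grp2 "]}}
q1 = {"uri": "http://example.org/ns1/A", "properties": {"ex:chromosome": ["1"], "ex:start": ["100"]}}
q2 = {"uri": "http://example.org/ns2/B",
      "properties": {"ex:chromosome": ["1"], "ex:start": ["100"], "ex:end": ["200"]}}

print(match_entities(g1, g2), match_entities(g1, g3), match_entities(q1, q2))
```

Ordering the stages from cheapest and most reliable to most permissive keeps weak property-based matches from overriding an identifier or label match; the threshold would need tuning against curated links.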
However, this is an open challenge that is currently being addressed.

With respect to data quality assessment, preprocessing checks such as input file format, raw line, and missing value checking were developed for the resources used by the ETL pipeline. Next, the syntax of the generated triples was validated via built-in libraries (e.g., with RDFlib). Other assessments include counting the number of entities (e.g., genes, proteins, chromosomes, etc.) and checking the presence or absence of properties with sets of SPARQL queries. More complex quality assessments, such as type restrictions on properties, are planned for the future.

Conclusion

Data in the agronomic field are highly heterogeneous, multi-scale, and dispersed. For plant scientists to successfully address the challenges of their daily work, it is essential to integrate information on a global scale. Semantic Web technologies are central to data integration and knowledge management. The biomedical domain offers a good example to follow for capitalizing on previous experiences and considering the lessons learned. We have developed the AgroLD KG to leverage this approach in agronomy. AgroLD exploits the power of seamless data integration offered by RDF. It contains more than 1.08 billion triples, resulting from the integration of approximately 151 datasets gathered in 33 named graphs. The coverage of its species and data sources is expected to expand with subsequent releases. To our knowledge, AgroLD is one of the first initiatives to apply Semantic Web practices to the agronomic domain, playing a complementary role in the integrative approaches adopted by the community.

AgroLD is being actively developed on the basis of feedback from domain experts. Since its beginning in 2015, it has also benefited from the support of the SouthGreen Bioinformatics Platform, which provides IT support and infrastructure to host data and web applications.
SouthGreen is one of the core platforms of the French Elixir-EU node and thus provides long-lasting support for AgroLD. AgroLD is strongly linked to several use-cases of the D2KAB (https://d2kab.mystrikingly.com) and DIG-AI projects (funded by the National Research Agency) to demonstrate the benefits of linked data for discovering gene-phenotype interactions.

With the current phase complete, user feedback has revealed some limitations and challenges in the current version. Thus, several issues are a matter of ongoing or future work. On the one hand, we must extend the KG coverage to more biological entities (e.g., miRNA, lncRNA, transposable elements) and relations (e.g., co-expression, regulation, and interaction networks) to capture a broader view of the molecular interactions. For example, we need to integrate information on gene expression and gene regulatory networks. On the other hand, the ETL process for KG creation is mostly based on domain-specific approaches, which limits its reusability. We will investigate approaches that use declarative functions for its creation.

Knowledge augmentation methods must be applied and adapted to the data. Indeed, we observed that some information, such as biological entities or relationships between them, remains hidden in the literal content of RDF. Moreover, a large amount of related knowledge is available from external sources. We are currently developing methods to extract information embedded in unstructured data, such as KG text fields or external web documents and scientific publications, and bring this information in a structured form to the knowledge base. Finally, we are extending state-of-the-art data-linking techniques by considering the specificity of the biological domain.
Abbreviations
API: Application Programming Interface
ETL: Extract, Transform, and Load
FAIR: Findable, Accessible, Interoperable, and Re-usable
FALDO: Feature Annotation Location Description Ontology
GAF: Gene Ontology Annotation File
GFF: Generic Feature Format
GWAS: Genome-Wide Association Studies
KG: Knowledge Graph
OBO: Open Bio-Ontologies
RDF: Resource Description Framework
RDFS: Resource Description Framework Schema
RO: Relation Ontology
REST: Representational State Transfer
SIO: Semanticscience Integrated Ontology
SKOS: Simple Knowledge Organization System
SPARQL: SPARQL Protocol and RDF Query Language
TSV: Tab Separated Value
URI: Uniform Resource Identifier
VCF: Variant Call Format

Acknowledgements
The authors thank the South Green Bioinformatic Platform and the I-Trop IRD supercomputer for their long-standing support. AgroLD is a selected resource of the ELIXIR-FR IFB (French Institute of Bioinformatics) service delivery plan. This work was granted access to the HPC resources of IDRIS under the allocation 2024-A0160315119 made by GENCI.

About this supplement
This article has been published as part of BMC Genomic Data, Volume 26 Supplement 1, 2025: International SWAT4HCLS Conference – Semantic Web Applications and Tools for Health Care and Life Sciences 2023. The full contents of the supplement are available at https://bmcgenomdata.biomedcentral.com/articles/supplements/volume-26-supplement-1.

Authors' contributions
PL wrote the manuscript and contributed to the construction of AgroLD KG, the development of ETL pipelines, and the maintenance of web applications. NT and BP contributed to the database and web server administration. BGHH, VG, and MR contributed to the data integration. YP contributed to the development of the Web application.
All the authors read and approved the final manuscript. Funding Several research projects have supported the AgroLD platform. ETL pipelines have been endorsed by the D2KAB (ANR-18-CE23-0017) and FOOSIN (ANR- 19-DATA-0019-03) projects. The AgroLD Knowledge Graph (Knowledge Graph (KG)) development has been supported by the IBC (ProjetIA-11-BINF-0002) and DIG-AI (ANR-22-CE23-0012) projects. The Web application has been supported by the IFB project (ProjetIA-11-INBS-0013) and IRD funding support. Data availability The AgroLD datasets can be found at the Zenodo repository ​h​t​t​p​​s​:​/​​/​d​o​i​​.​o​​r​g​/​​ 1​0​.​​5​2​8​1​​/​z​​e​n​o​d​o​.​4​6​9​4​5​1​8. The Web application is available at ​h​t​t​p​​s​:​/​​/​g​i​t​​h​u​​b​.​c​​ o​m​/​​S​o​u​t​​h​G​​r​e​e​​n​P​l​​a​t​f​o​​r​m​​/​A​g​r​o​L​D​_​w​e​b​a​p​p and the RDF conversion pipelines (GFF2RDF, GAF2RDF, VCF2RDF, and datasets) are available at ​h​t​t​p​​s​:​/​​/​g​i​t​​h​u​​b​.​c​​o​ m​/​​S​o​u​t​​h​G​​r​e​e​​n​P​l​​a​t​f​o​​r​m​​/​A​g​r​o​L​D​_​E​T​L. Declarations Ethics approval and consent to participate Not applicable. Consent to publication Not applicable. Competing interests The authors declare that the research was conducted without commercial or financial relationships that could lead to a potential conflicts of interest. Received: 16 August 2023 / Accepted: 18 August 2025 References 1. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR guiding principles for scientific data management and steward- ship. Sci Data. 2016;3(1):160018. ​h​t​t​p​​s​:​/​​/​d​o​i​​.​o​​r​g​/​​1​0​.​​1​0​3​8​​/​s​​d​a​t​a​.​2​0​1​6​.​1​8. 2. W3C. RDF 1.1 Concepts and Abstract Syntax. 2014. Accessed 31 July 2025. ​h​t​t​ p​​s​:​/​​/​w​w​w​​.​w​​3​.​o​​r​g​/​​T​R​/​r​​d​f​​1​1​-​c​o​n​c​e​p​t​s​/ 3. Nolin MA, Corbeil J, Lamontagne L, Dumontier M. Bio2RDF: Convert. Provide And Reuse Nat Precedings. 2010. ​h​t​t​p​​s​:​/​​/​d​o​i​​.​o​​r​g​/​​1​0​.​​1​0​3​8​​/​n​​p​r​e​.​2​0​1​0​.​5​0​6​0​.​1. 4. 
The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2018;47:D506–15. https://doi.org/10.1093/nar/gky1049.
5. Fu G, Batchelor C, Dumontier M, Hastings J, Willighagen E, Bolton E. PubChemRDF: towards the semantic annotation of PubChem compound and substance databases. J Cheminform. 2015. https://doi.org/10.1186/s13321-015-0084-4.
6. Kutmon M, Riutta A, Nunes N, Hanspers K, Willighagen EL, Bohler A, et al. WikiPathways: capturing the full diversity of pathway knowledge. Nucleic Acids Res. 2016. https://doi.org/10.1093/nar/gkv1024.
7. Queralt-Rosinach N, Pinero J, Bravo A, Sanz F, Furlong LI. DisGeNET-RDF: harnessing the innovative power of the semantic web to explore the genetic basis of diseases. Bioinformatics. 2016. https://doi.org/10.1093/bioinformatics/btw214.
8. Shefchek KA, Harris NL, Gargano M, Matentzoglu N, Unni D, Brush M, et al. The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Res. 2020;48:D704–15. https://doi.org/10.1093/nar/gkz997.
9. Hassani-Pak K, Singh A, Brandizi M, Hearnshaw J, Parsons JD, Amberkar S, et al. KnetMiner: a comprehensive approach for supporting evidence-based gene discovery and complex trait analysis across species. Plant Biotechnol J. 2021;19(8):1670–8. https://doi.org/10.1111/pbi.13583.
10. Venkatesan A, Tagny Ngompe G, Hassouni NE, Chentli I, Guignon V, Jonquet C, et al. Agronomic Linked Data (AgroLD): A knowledge-based system to enable integrative biology in agronomy. PLOS ONE. 2018;13(11):1–17. https://doi.org/10.1371/journal.pone.0198270.
11. Larmande P, Todorov K.
AgroLD: a knowledge graph for the plant sciences. In: Hotho A, et al., editors. The Semantic Web – ISWC 2021. Lecture Notes in Computer Science, vol 12922. Cham: Springer; 2021. https://doi.org/10.1007/978-3-030-88361-4_29.
12. Bolser D, Staines DM, Pritchard E, Kersey P. Ensembl Plants: Integrating Tools for Visualizing, Mining, and Analyzing Plant Genomics Data. Methods Mol Biol. 2016;1374:115–40. https://doi.org/10.1007/978-1-4939-3167-5_6.
13. Huntley RP, Sawford T, Mutowo-Meullenet P, Shypitsyna A, Bonilla C, Martin MJ, et al. The GOA database: Gene Ontology annotation updates for 2015. Nucleic Acids Res. 2015;43:D1057–63. https://doi.org/10.1093/nar/gku1113.
14. Tello-Ruiz MK, Naithani S, Stein JC, Gupta P, Campbell M, Olson A, et al. Gramene 2018: unifying comparative genomics and pathway resources for plant research. Nucleic Acids Res. 2018;46:D1181–9. https://doi.org/10.1093/nar/gkx1111.
15. Kurata N, Yamazaki Y. Oryzabase: an integrated biological and genome information database for rice. Plant Physiol. 2006;140(1):12. https://doi.org/10.1104/pp.105.063008.
16. Sakai H, Lee SS, Tanaka T, Numa H, Kim J, Kawahara Y, et al. Rice annotation project database (RAP-DB): an integrative and interactive database for rice genomics. Plant Cell Physiol. 2013;54(2):e6. https://doi.org/10.1093/pcp/pcs183.
17. Kawahara Y, de la Bastide M, Hamilton JP, Kanamori H, McCombie WR, Ouyang S, et al. Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice. 2013;6(1):4. https://doi.org/10.1186/1939-8433-6-4.
18.
Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2022;51(D1):D638–46. https://doi.org/10.1093/nar/gkac1000.
19. Lee T, Oh T, Yang S, Shin J, Hwang S, Kim CY, et al. RiceNet v2: an improved network prioritization server for rice genes. Nucleic Acids Res. 2015;43:W122–7. https://doi.org/10.1093/nar/gkv253.
20. Jin J, Tian F, Yang DC, Meng YQ, Kong L, Luo J, et al. PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic Acids Res. 2017;45(D1):D1040–5. https://doi.org/10.1093/nar/gkw982.
21. Tian F, Yang DC, Meng YQ, Jin J, Gao G. PlantRegMap: charting functional regulatory maps in plants. Nucleic Acids Res. 2020;48(D1):D1104–13. https://doi.org/10.1093/nar/gkz1020.
22. South Green collaborators. The South Green portal: a comprehensive resource for tropical and Mediterranean crop genomics. Curr Plant Biol. 2016;7–8:6–9. https://doi.org/10.1016/j.cpb.2016.12.002.
23. Hamelin C, Sempere G, Jouffe V, Ruiz M. TropGeneDB, the multi-tropical crop information system updated and extended. Nucleic Acids Res. 2013;41. https://doi.org/10.1093/nar/gks1105.
24. Valentin G, Abdel T, Gaëtan D, Jean-François D, Matthieu C, Mathieu R. GreenPhylDB v5: a comparative pangenomic database for plant genomes. Nucleic Acids Res. 2020;49:D1464–71. https://doi.org/10.1093/nar/gkaa1068.
25. Larmande P, Gay C, Lorieux M, Périn C, Bouniol M, Droc G, et al.
Oryza Tag Line, a phenotypic mutant database for the Génoplante rice insertion line library. Nucleic Acids Res. 2008;36:1022–7. https://doi.org/10.1093/nar/gkm762.
26. Dereeper A, Homa F, Andres G, Sempere G, Sarah G, Hueber Y, et al. SNiPlay3: a web-based application for exploration and large scale analyses of genomic variations. Nucleic Acids Res. 2015;43:W295–300. https://doi.org/10.1093/nar/gkv351.
27. The AgroLD online documentation. Accessed 31 July 2025. http://www.agrold.org/documentation.jsp
28. The Sequence Ontology Consortium. Sequence Ontology. 2015. Accessed 31 July 2025. http://www.sequenceontology.org
29. The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. 2019;47:D330–8. https://doi.org/10.1093/nar/gky1055.
30. Walls RL, Cooper L, Elser J, Gandolfo MA, Mungall CJ, Smith B, et al. The Plant Ontology Facilitates Comparisons of Plant Development Stages Across Species. Front Plant Sci. 2019;10. https://doi.org/10.3389/fpls.2019.00631.
31. Cooper L, Meier A, Laporte MA, Elser JL, Mungall C, Sinn BT, et al. The Planteome database: an integrated resource for reference ontologies, plant genomics and phenomics. Nucleic Acids Res. 2018. https://doi.org/10.1093/nar/gkx1152.
32. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 2007;25(11):1251–5. https://doi.org/10.1038/nbt1346.
33. Dumontier M, Baker CJ, Baran J, Callahan A, Chepelev L, Cruz-Toledo J, et al.
The semanticscience integrated ontology (SIO) for biomedical research and knowledge discovery. J Biomed Semant. 2014;5(1):14. https://doi.org/10.1186/2041-1480-5-14.
34. Bolleman JT, Mungall CJ, Strozzi F, Baran J, Dumontier M, Bonnal RJP, et al. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation. J Biomed Semant. 2016;7:39. https://doi.org/10.1186/s13326-016-0067-z.
35. Relation Ontology Consortium. OBO Relation Ontology. 2018. Accessed 31 July 2025. https://oborel.github.io
36. The global schema of AgroLD. Accessed 31 July 2025. https://github.com/SouthGreenPlatform/AgroLD_ETL/tree/master/model
37. Cyganiak R. Tarql: SPARQL for Tables. 2018. Accessed 31 July 2025. https://tarql.github.io
38. Dimou A, Vander Sande M, Colpaert P, Verborgh R, Mannens E, Van de Walle R. RML: A generic language for integrated RDF mappings of heterogeneous data. In: The 7th Workshop on Linked Data on the Web (LDOW2014). CEUR Workshop Proc. 2014. https://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf
39. Lefrançois M, Zimmermann A, Bakerally N. A SPARQL extension for generating RDF from heterogeneous formats. In: Blomqvist E, Maynard D, Gangemi A, Hoekstra R, Hitzler P, Hartig O, editors. The Semantic Web. ESWC 2017. Lecture Notes in Computer Science, vol 10249. Cham: Springer; 2017. https://doi.org/10.1007/978-3-319-58068-5_3.
40. The 1000 Genomes Project Consortium. The Variant Call Format (VCF). 2012. Accessed 31 July 2025. http://samtools.github.io/hts-specs/
41. The Sequence Ontology Consortium. The formal specification of GFF3. 2014. Accessed 31 July 2025. http://www.sequenceontology.org
42. The Gene Ontology Consortium. Gene Annotation File (GAF). 2014. Accessed 31 July 2025. http://geneontology.org/page/go-annotation-file-format-20
43. The ETL API of AgroLD. Accessed 31 July 2025. https://github.com/SouthGreenPlatform/AgroLD_ETL
44. Laibe C, Wimalaratne S, Juty N, Le Novère N, Hermjakob H. Identifiers.org: integration tool for heterogeneous datasets. DILS 2014. 2014;14. https://doi.org/10.6084/m9.figshare.1232122.v1.
45. Singh A, Rawlings CJ, Hassani-Pak K. KnetMaps: a BioJS component to visualize biological knowledge networks. F1000Res. 2018. https://doi.org/10.12688/f1000research.16605.1.
46. Rietveld L, Hoekstra R. The YASGUI family of SPARQL clients. Semant Web J. 2015;30:10127–34.

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
AgroLD: a knowledge graph for the plant sciences Abstract Introduction Methods Information content AgroLD integration pipelines URI design and data linking Results Discussion Conclusion References