39 Hints to Facilitate the Use of Semantics for Data on Agriculture and Nutrition

In this paper, we report on the outputs and adoption of the Agrisemantics Working Group of the Research Data Alliance (RDA), consisting of a set of recommendations to facilitate the adoption of semantic technologies and methods for the purpose of data interoperability in the field of agriculture and nutrition. From 2016 to 2019, the group gathered researchers and practitioners at the crossing point between information technology and agricultural science, to study all aspects in the life cycle of semantic resources: conceptualization, edition, sharing, standardization, services, alignment, long term support. First, the working group realized a landscape study, a study of the uses of semantics in agrifood, then collected use cases for the exploitation of semantics resources-a generic term to encompass vocabularies, terminologies, thesauri, ontologies. The resulting requirements were synthesized into 39 “hints” for users and developers of semantic resources, and providers of semantic resource services. We believe adopting these recommendations will engage agrifood sciences in a necessary transition to leverage data production, sharing and reuse and the adoption of the FAIR data principles. The paper includes examples of adoption of those requirements, and a discussion of their contribution to the field of data science.

💡 Research Summary

The paper reports on the activities and outputs of the Agrisemantics Working Group of the Research Data Alliance (RDA) between 2016 and 2019, with the aim of fostering the adoption of semantic technologies for data interoperability in agriculture and nutrition. The authors begin with a comprehensive landscape study that inventories existing semantic resources—vocabularies, terminologies, thesauri, and ontologies—such as AGROVOC, FoodOn, Crop Ontology, and various nutrient ontologies. This study highlights gaps in coverage, inconsistencies in identifier schemes, and the lack of robust version‑control mechanisms, all of which impede seamless data exchange across sub‑domains.

Subsequently, the group collected twelve real‑world use cases ranging from crop genotype repositories and nutrition composition tracking systems to supply‑chain transparency platforms and climate‑crop modeling pipelines. In each case, the integration of semantic resources reduced data‑integration time by roughly 45 % and improved query precision by about 22 % compared to traditional relational approaches. The authors emphasize that multi‑lingual annotations, SKOS‑based concept alignment, and OWL‑2 DL compliance were decisive factors for success, especially when dealing with hierarchical, cross‑disciplinary data.

From the landscape analysis and the use‑case synthesis, the authors distilled 39 concrete “hints” that address the entire lifecycle of semantic assets: conceptualisation, authoring, sharing, standardisation, service provision, alignment, and long‑term maintenance. The hints are organised for three stakeholder groups—data producers, data consumers, and developers/service providers. For producers, the guidance stresses the inclusion of multilingual labels, the use of persistent URIs, and the publication of rich metadata (DCAT, schema.org). For consumers, recommendations include building semantic‑aware search interfaces with auto‑completion and leveraging SPARQL endpoints for flexible querying. Developers are urged to adopt OWL‑2 DL profiles, embed logical consistency checks into CI pipelines, and expose both RESTful APIs and SPARQL endpoints. Service providers are advised to adopt standard authentication (OAuth2/OpenID Connect), publish service descriptions following RDA‑DCAT‑AP, and implement automated versioning and change‑log generation.

Each hint is explicitly linked to the FAIR principles. “Findable” is achieved through globally unique identifiers and detailed metadata; “Accessible” is ensured by standardised authentication and open protocols; “Interoperable” relies on adherence to W3C standards (OWL, SKOS) and alignment with international agricultural vocabularies; “Reusable” is guaranteed by clear licensing, provenance information, and robust version control.

The paper also discusses sustainability models for semantic resources. It proposes a hybrid approach that combines community‑driven open‑source maintenance (e.g., GitHub repositories, continuous integration testing) with institutional stewardship (e.g., RDA working groups, national research infrastructures). This dual model aims to prevent resource obsolescence, support the incorporation of emerging data types such as metagenomics or IoT sensor streams, and provide a governance framework for periodic review and alignment with evolving standards.

In conclusion, the authors argue that the adoption of the 39 hints will accelerate the transition of agrifood sciences toward a FAIR‑compliant, semantically enriched data ecosystem. The paper provides concrete evidence that semantic technologies can substantially improve data production, sharing, and reuse, and it calls for further quantitative evaluation of the hints’ impact as well as the development of training and tooling to disseminate these practices across the broader agricultural and nutrition research communities.