Research Ideas and Outcomes
Conference Abstract
Corresponding author: Christian Greiner (christian.greiner@kit.edu)
Received: 15 Sep 2022 | Published: 12 Oct 2022
© 2022 Ilia Bagov, Christian Greiner, Nikolay Garabedian
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Bagov I, Greiner C, Garabedian N (2022) Collaborative Metadata Definition using Controlled Vocabularies, and Ontologies. Research Ideas and Outcomes 8: e94931. https://doi.org/10.3897/rio.8.e94931
Data plays a growing role in many technical and research areas. This is evident, for example, in increased investment in the development of data-intensive analytical methods such as artificial intelligence (
The first component of the presented framework is a controlled vocabulary for the domain of the data to be annotated. "Controlled vocabulary" is a collective term for a controlled list of terms, their definitions, and the relations between them. In the framework presented in this contribution, the terms correspond to the metadata fields used in the data annotation process. Formally, the type of controlled vocabulary used in the framework is a thesaurus (
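To illustrate what a thesaurus entry holds, the following Python sketch models terms with definitions and SKOS-style "broader" and "related" links. The class and method names (`Term`, `Thesaurus`, `narrower`) and the example terms are hypothetical and do not reflect VocPopuli's actual data model. Narrower relations are derived from broader ones rather than stored, so the two directions cannot drift apart.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One metadata field of the thesaurus (hypothetical structure)."""
    label: str
    definition: str
    broader: set[str] = field(default_factory=set)  # more general terms
    related: set[str] = field(default_factory=set)  # associated, non-hierarchical terms


class Thesaurus:
    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}

    def add(self, label: str, definition: str, broader: tuple[str, ...] = ()) -> None:
        # Only allow links to terms that already exist in the vocabulary
        for parent in broader:
            if parent not in self.terms:
                raise KeyError(f"unknown broader term: {parent}")
        self.terms[label] = Term(label, definition, set(broader))

    def relate(self, a: str, b: str) -> None:
        # "related" is symmetric, so record it in both directions
        self.terms[a].related.add(b)
        self.terms[b].related.add(a)

    def narrower(self, label: str) -> list[str]:
        # "narrower" is the inverse of "broader", so derive it on demand
        return sorted(t.label for t in self.terms.values() if label in t.broader)


voc = Thesaurus()
voc.add("specimen", "The sample under investigation.")
voc.add("specimen material", "Material the specimen is made of.", broader=("specimen",))
voc.add("specimen geometry", "Shape and dimensions of the specimen.", broader=("specimen",))
voc.add("counter body", "The body in contact with the specimen.")
voc.relate("specimen", "counter body")
print(voc.narrower("specimen"))  # ['specimen geometry', 'specimen material']
```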
Despite their advantages, thesauri are limited in their ability to relate metadata fields to each other in a semantically rich way. This motivated the second component of the framework: ontologies. An ontology can be defined as “a specification of a conceptualization” (
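To make the difference concrete, the sketch below adds what a thesaurus lacks: named relations (properties) whose subject and object are restricted to particular classes, on top of a class hierarchy. It loosely mirrors `rdfs:subClassOf`, `rdfs:domain`, and `rdfs:range`, but all names are hypothetical. It also treats domain and range as constraints, which is closer to SHACL-style validation than to RDFS semantics; an RDFS reasoner would instead infer the types of the subject and object.

```python
from typing import NamedTuple, Optional


class Property(NamedTuple):
    """A named relation restricted to a domain class and a range class."""
    name: str
    domain: str
    range: str


class Ontology:
    def __init__(self) -> None:
        self.parents: dict[str, Optional[str]] = {}  # class -> superclass
        self.properties: dict[str, Property] = {}
        self.instance_of: dict[str, str] = {}        # individual -> class
        self.facts: list[tuple[str, str, str]] = []  # (subject, property, object)

    def add_class(self, name: str, parent: Optional[str] = None) -> None:
        self.parents[name] = parent

    def is_subclass(self, cls: str, ancestor: str) -> bool:
        # Walk up the superclass chain; a class counts as a subclass of itself
        current: Optional[str] = cls
        while current is not None:
            if current == ancestor:
                return True
            current = self.parents.get(current)
        return False

    def add_property(self, name: str, domain: str, range_: str) -> None:
        self.properties[name] = Property(name, domain, range_)

    def add_individual(self, name: str, cls: str) -> None:
        self.instance_of[name] = cls

    def assert_fact(self, subj: str, prop: str, obj: str) -> None:
        p = self.properties[prop]
        # Reject statements that violate the property's domain or range
        if not self.is_subclass(self.instance_of[subj], p.domain):
            raise ValueError(f"{subj} is not a {p.domain}")
        if not self.is_subclass(self.instance_of[obj], p.range):
            raise ValueError(f"{obj} is not a {p.range}")
        self.facts.append((subj, prop, obj))


onto = Ontology()
onto.add_class("Material")
onto.add_class("Metal", parent="Material")
onto.add_class("Specimen")
onto.add_property("madeOf", domain="Specimen", range_="Material")
onto.add_individual("specimen_01", "Specimen")
onto.add_individual("steel_01", "Metal")
onto.assert_fact("specimen_01", "madeOf", "steel_01")  # accepted: a Metal is a Material
```

Swapping subject and object (`steel_01 madeOf specimen_01`) raises `ValueError`, a distinction a plain "related" link in a thesaurus cannot express.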
The components described above are being implemented as several software tools. The first, a controlled vocabulary editor called VocPopuli, is a Python-based web application and the entry point for domain experts who want to develop a metadata vocabulary for their research field or lab. The software, whose first version is already being tested internally, supports the collaborative definition and editing of metadata terms. In addition, it annotates each term, as well as the entire vocabulary, using the PROV Data Model (PROV-DM) (
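The abstract does not detail how VocPopuli applies PROV-DM. One plausible reading, sketched below under that assumption, records every edit of a term as a PROV activity that generates a new term version (an entity), is associated with the editing agent, and derives the new version from the previous one. Only the relation names (`wasGeneratedBy`, `used`, `wasAssociatedWith`, `wasAttributedTo`, `wasDerivedFrom`) come from PROV-DM; the class, identifiers, editor names, and storage format are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TermProvenance:
    """PROV-DM-style history of vocabulary terms (simplified, hypothetical model)."""
    entities: dict[str, dict] = field(default_factory=dict)    # term versions
    activities: dict[str, dict] = field(default_factory=dict)  # edit events
    agents: set[str] = field(default_factory=set)              # editors
    relations: list[tuple[str, str, str]] = field(default_factory=list)
    versions: dict[str, int] = field(default_factory=dict)     # term -> latest version

    def record_edit(self, term: str, definition: str, editor: str) -> str:
        version = self.versions.get(term, 0) + 1
        self.versions[term] = version
        entity = f"{term}/v{version}"
        activity = f"edit/{len(self.activities) + 1}"
        self.entities[entity] = {"definition": definition}
        self.activities[activity] = {"endedAtTime": datetime.now(timezone.utc).isoformat()}
        self.agents.add(editor)
        self.relations += [
            (entity, "wasGeneratedBy", activity),
            (activity, "wasAssociatedWith", editor),
            (entity, "wasAttributedTo", editor),
        ]
        if version > 1:
            # The edit used the previous version, and the new version derives from it
            previous = f"{term}/v{version - 1}"
            self.relations += [
                (activity, "used", previous),
                (entity, "wasDerivedFrom", previous),
            ]
        return entity

    def history(self, entity: str) -> list[str]:
        # Follow wasDerivedFrom links back to the first version
        chain = [entity]
        while True:
            prev = [o for s, p, o in self.relations if s == chain[-1] and p == "wasDerivedFrom"]
            if not prev:
                return chain
            chain.append(prev[0])


prov = TermProvenance()
prov.record_edit("specimen material", "Material of the specimen.", "alice")
latest = prov.record_edit("specimen material", "Material the specimen is made of.", "bob")
print(prov.history(latest))  # ['specimen material/v2', 'specimen material/v1']
```

The same pattern could be applied to the vocabulary as a whole by treating each published vocabulary version as an entity of its own.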
The second software tool will transform the vocabularies developed with VocPopuli into ontologies. It will handle two distinct use cases: converting a vocabulary into an ontology from scratch, and augmenting an existing ontology with the terms of a given thesaurus. Like VocPopuli, the second tool is being developed in Python. Both tools will ultimately be tested by two partially overlapping groups of users from materials science. On the one hand, domain experts will enter, edit, and discuss vocabulary terms in their area of interest, thereby creating vocabularies. On the other hand, vocabulary and ontology administrators will oversee the vocabulary creation and ontology transformation processes in a semi-automatic fashion.
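Both use cases can be sketched as a single function that emits OWL classes in Turtle syntax: with an empty `existing` set it converts the whole vocabulary from scratch, and with the class names of an existing ontology it adds only the missing terms. The mapping used here (term to `owl:Class`, definition to `rdfs:comment`, broader term to `rdfs:subClassOf`) is an assumption, not the tool's documented behavior. Reading "broader" as "subclass of" is a common but lossy heuristic, since thesaurus hierarchies can also express part-whole relations. The base IRI `http://example.org/vocab#`, the input format, and the function names are placeholders, and the augmentation case assumes the existing ontology uses the same namespace.

```python
import re

OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical input shape: term label -> (definition, labels of broader terms)
Vocabulary = dict[str, tuple[str, list[str]]]


def to_local_name(label: str) -> str:
    """'specimen material' -> 'SpecimenMaterial' (CamelCase, IRI-safe)."""
    return "".join(p.capitalize() for p in re.split(r"[^0-9A-Za-z]+", label) if p)


def literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def vocabulary_to_turtle(vocab: Vocabulary, base: str = "http://example.org/vocab#",
                         existing: frozenset[str] = frozenset()) -> str:
    """Emit one owl:Class per term; names in `existing` are skipped (augmentation)."""
    out = [f"@prefix owl: <{OWL}> .", f"@prefix rdfs: <{RDFS}> .", f"@prefix ex: <{base}> .", ""]
    for label, (definition, broader) in sorted(vocab.items()):
        name = to_local_name(label)
        if name in existing:
            continue  # the target ontology already defines this class
        out.append(f"ex:{name} a owl:Class ;")
        out.append(f"    rdfs:label {literal(label)} ;")
        for parent in broader:
            # Heuristic: treat the thesaurus "broader" link as rdfs:subClassOf
            out.append(f"    rdfs:subClassOf ex:{to_local_name(parent)} ;")
        out.append(f"    rdfs:comment {literal(definition)} .")
        out.append("")
    return "\n".join(out)


vocab: Vocabulary = {
    "material": ("Substance a specimen is made of.", []),
    "metal": ("Metallic material.", ["material"]),
}
print(vocabulary_to_turtle(vocab))                                    # from-scratch conversion
print(vocabulary_to_turtle(vocab, existing=frozenset({"Material"})))  # augmentation
```

In the augmentation run, `ex:Metal` still points to `ex:Material` through `rdfs:subClassOf`, so the new class attaches to the class the existing ontology already defines.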
After development is complete, the tools will be used to create controlled vocabularies for various experimental procedures, and to transform them into, or integrate them with, semantically richer ontologies. This will extend our previously published work in this area (
Keywords: research data, thesaurus, ontology engineering, Python
Presenting author: Ilia Bagov
Presented at: First International Conference on FAIR Digital Objects, poster
Helmholtz Metadata Collaboration
European Research Council (ERC) Grant No. 771237 (TriboKey)
Alexander von Humboldt Foundation Postdoctoral Fellowship for Nikolay Garabedian
MetaCook: The Metadata Cookbook