Research Ideas and Outcomes: Conference Abstract
Corresponding author: Daan Broeder (daan.broeder@gmail.com)
Received: 02 Sep 2022 | Published: 12 Oct 2022
© 2022 Daan Broeder, Willem Elbers, Michal Gawor, Cesare Concordia, Nicolas Larrousse, Dieter Van Uytvanck
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Broeder D, Elbers W, Gawor M, Concordia C, Larrousse N, Van Uytvanck D (2022) Towards FAIR Data Access. Research Ideas and Outcomes 8: e94386. https://doi.org/10.3897/rio.8.e94386
Background
In the past decade, many national, EU and global projects have succeeded in raising awareness of Open Science and of the importance of making data findable and accessible, as stated in the FAIR principles (
There have been many advances in the options for discovering data. A multitude of thematic and general catalogues provide faceted browsing interfaces for humans and Application Programming Interfaces (APIs) for machines; similarly, data citations in publications offer references to resources hosted by repositories. However, catalogues and data citations do not guarantee that researchers obtain access to the data itself. In most cases the resource link in the catalogue (and in the metadata) or in the citation points to a “landing page”, a description of the resource meant for human consumption. The landing page may contain instructions on how to access or download the resource itself, but these are usually difficult for machines to parse.
FAIR data access
Thus the approach sketched above does not meet the requirements in scenarios where applications need assured and quick access to data. Also the FAIR principles interpretation from GO FAIR states*
Note that we ignore the need for user authentication and authorization prior to accessing data; here we consider only data that is ‘freely’ accessible.
To improve machine accessibility of data, a number of technologies and approaches that have been discussed in the CLARIN and Social Sciences and Humanities (SSH) infrastructure domain can be useful. We present some of them and comment on their suitability.
Signposting
Signposting*
In the CLARIN community the signposting concept was accepted, but its proposed implementation deviated from van de Sompel and made it less dependent on the HTTP protocol (
CLARIN Digital Object Gateway (DOG)
One approach currently being worked on for the CLARIN research infrastructure is the creation of a DOG library*
DOG works in two steps: first obtaining metadata from the resource PID and secondly extracting resource links from the metadata. Each of the repositories registered within DOG has a minimal configuration specifying how to parse fields of interest from the resource's metadata. For B-type CLARIN centres DOG uses content negotiation as the primary way of obtaining the metadata in CMDI format. For repositories outside the CLARIN infrastructure, DOG primarily relies on the API provided by the repository in order to access metadata and data resources.
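The two steps described above can be illustrated with a minimal Python sketch. It is not taken from the DOG library: the function names, the sample record and the `application/x-cmdi+xml` media type in the `Accept` header are assumptions made for illustration, as is the choice to read data links from CMDI `ResourceProxy` entries of type `Resource`. No network call is made; the second step is shown on an inline sample record.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the media type used to request CMDI through content negotiation.
CMDI_MEDIA_TYPE = "application/x-cmdi+xml"


def build_metadata_request(pid_url: str) -> urllib.request.Request:
    """Step 1: prepare a request that asks for CMDI metadata via the PID URL."""
    return urllib.request.Request(pid_url, headers={"Accept": CMDI_MEDIA_TYPE})


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://...}ResourceRef' from a tag."""
    return tag.rsplit("}", 1)[-1]


def extract_resource_links(cmdi_xml: str) -> list[str]:
    """Step 2: collect ResourceRef values of proxies whose type is 'Resource'."""
    links = []
    for proxy in ET.fromstring(cmdi_xml).iter():
        if local_name(proxy.tag) != "ResourceProxy":
            continue
        resource_type = resource_ref = None
        for child in proxy:
            name = local_name(child.tag)
            if name == "ResourceType":
                resource_type = (child.text or "").strip()
            elif name == "ResourceRef":
                resource_ref = (child.text or "").strip()
        if resource_type == "Resource" and resource_ref:
            links.append(resource_ref)
    return links


# Hypothetical record: one landing page and one data resource.
SAMPLE_CMDI = """<CMD xmlns="http://www.clarin.eu/cmd/1">
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="lp">
        <ResourceType>LandingPage</ResourceType>
        <ResourceRef>https://repo.example.org/item/42</ResourceRef>
      </ResourceProxy>
      <ResourceProxy id="r1">
        <ResourceType mimetype="text/plain">Resource</ResourceType>
        <ResourceRef>https://repo.example.org/item/42/corpus.txt</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
</CMD>"""

if __name__ == "__main__":
    print(extract_resource_links(SAMPLE_CMDI))
```

A per-repository configuration, as mentioned in the text, would replace the hard-coded element names here with the fields of interest for that repository.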
The DOG solution does have scalability problems, but within the limited domain of CLARIN centres, it can offer a solution until a better one becomes available.
Limited PID kernel information
The (limited) PID kernel information approach assumes that for every Digital Object (DO) (
FAIR Digital Objects (FDO)
FDOs*
A rich set of FDO attributes makes it possible to signal to machines processing FDOs where and how to access bitstream data, including, for instance, additional information about supported protocols and APIs.
What to do?
For our community and in our collaboration with others, we need solutions now, but we would prefer not to invest in, and become locked into, technologies that do not scale.
We propose combining the DOG approach with signposting: first test the URIs (obtained by resolving the Handle PID) for the presence of HTTP Link headers; if these are missing, an (extended) DOG could fall back on its idiosyncratic workflow. In the long term we see advantages in the general, scalable and protocol-independent approach that FDOs offer. Hybrid solutions are conceivable in which FDO proxies sit between the FDO machinery and data hosted by signposting-compliant repositories.
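The proposed order of checks can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not an existing implementation: `links_with_rel` and `find_data_links` are hypothetical names, the fallback is passed in as a plain callable standing in for the repository-specific DOG workflow, and the choice of the `item` link relation to mark content resources is an assumption about how a signposting-compliant repository would expose its data. The Link header parser is deliberately simplified and does not cover every form the header syntax permits.

```python
import re
from typing import Callable, Optional

# One link-value: <target> followed by zero or more ;name=value parameters.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^",;\s]+))*)')
# One parameter: ;name="quoted value" or ;name=token
_PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^",;\s]+))')


def links_with_rel(link_header: str, rel: str) -> list[str]:
    """Return the target URIs in a Link header value that carry relation `rel`."""
    targets = []
    for target, params in _LINK.findall(link_header):
        for name, quoted, bare in _PARAM.findall(params):
            # A rel parameter may hold several space-separated relation types.
            if name.lower() == "rel" and rel in (quoted or bare).split():
                targets.append(target)
                break
    return targets


def find_data_links(
    link_header: Optional[str],
    dog_fallback: Callable[[], list[str]],
) -> list[str]:
    """Prefer links found in the Link header; otherwise use the fallback."""
    items = links_with_rel(link_header or "", "item")
    return items if items else dog_fallback()


# Hypothetical header value as a landing page might return it.
SAMPLE_HEADER = (
    '<https://repo.example.org/item/42/corpus.txt>; rel="item"; type="text/plain", '
    '<https://repo.example.org/item/42/cmdi.xml>; rel="describedby"; '
    'type="application/x-cmdi+xml"'
)

if __name__ == "__main__":
    print(find_data_links(SAMPLE_HEADER, lambda: []))
    print(find_data_links(None, lambda: ["https://repo.example.org/via-dog"]))
```

Keeping the fallback as a callable means the metadata-parsing workflow is only invoked when no usable Link header is present, which matches the order of checks proposed above.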
Keywords: data management, metadata, CLARIN, PIDs, repositories, Signposting, FAIR Digital Objects
Presenting author: Daan Broeder
Presented at: 1st International Conference on FAIR Digital Objects, presentation
GO FAIR, FAIR principles, accessed on: 2022-7-6 www.go-fair.org/fair-principles/
GO FAIR, FAIR principles F1: (Meta) data are assigned globally unique and persistent identifiers, accessed on: 2022-7-6 http://www.go-fair.org/fair-principles/f1-meta-data-assigned-globally-unique-persistent-identifiers/
signposting website, accessed on: 2022-7-6 https://signposting.org
RFC5988, accessed on: 2022-7-6 doi:10.17487/RFC5988
CLARIN ERIC GitHub, Digital Object Gate library README, accessed on: 2022-7-6 https://github.com/clarin-eric/DOGlib
CLARIN website, CE-2013-0095: Checklist for CLARIN B Centres, accessed on: 2022-7-6 https://hdl.handle.net/11372/DOC-78
DARIAH-DE repository documentation, Resolving and Persistent Identifiers, accessed on: 2022-7-6 https://repository.de.dariah.eu/doc/services/resolving.html
FAIR Digital Objects Forum website, coordinates the FDO work and keeps track of related publications, accessed on: 2022-7-6 https://fairdo.org/library/