Research Ideas and Outcomes :
Research Article
Corresponding author: Sharif Islam (sharif.islam@naturalis.nl)
Academic editor: Francisco Andres Rivera Quiroz
Received: 29 Jun 2023 | Accepted: 31 Jul 2023 | Published: 12 Sep 2023
© 2023 Sharif Islam, James Beach, Elizabeth R. Ellwood, Jose Fortes, Larry Lannom, Gil Nelson, Beth Plale
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Islam S, Beach J, Ellwood ER, Fortes J, Lannom L, Nelson G, Plale B (2023) Assessing the FAIR Digital Object Framework for Global Biodiversity Research. Research Ideas and Outcomes 9: e108808. https://doi.org/10.3897/rio.9.e108808
In the first decades of the 21st century, there has been a global trend towards digitisation and the mobilisation of data from natural history museums and research institutions. The development of national and international aggregator systems, which focused on data standards, made it possible to access millions of museum specimen records. These records serve as an empirical foundation for research across various fields. In addition, community efforts have expanded the concept of natural history collection specimens to include physical preparations and digital resources, resulting in the Digital Extended Specimen (DES), which also includes derived and related data. Within this context, the paper proposes using the FAIR Digital Object (FDO) framework to accelerate the global vision of the DES, arguing that FDO-enabled infrastructures can reduce barriers to the discovery and access of specimens, help ensure credit back to contributors and increase the amount of research that incorporates biodiversity data.
FAIR data, FDO, natural history museums, natural science collections, biodiversity specimens, interoperability, persistent identifiers, FAIR implementation, extended specimen network
The first decades of the 21st century have seen a massive increase in the digitisation and mobilisation of the data from hundreds of millions of specimens curated in thousands of natural history museums and research centres distributed around the world (
This paper introduces and demonstrates the application of the FAIR Digital Objects (FDO) framework (
The foundational concept of the FDO framework is the digital object. While the term “digital object” can be applied to any digitised piece of information (
The FDO framework does not directly result in biodiversity data integration or immediately resolve interoperability challenges, but provides methods, mechanisms and objects within which those problems can be more readily and robustly solved (see Fig.
A conceptual model of the FDO framework applied to the DES. This figure pinpoints the FDO interface with protocols that is needed for access to the objects. The figure also shows existing digital objects and Collections (as in museum collections) contributing to the digital extended specimen objects through create and update operations and receiving credit and usage information for their contributions.
The authors of this paper bring together perspectives based on their involvement in museum collections, biodiversity aggregators, data fabric infrastructure research and biodiversity data usage. In May 2020, we began a series of discussions to discern the guiding principles by which the FDO vision could accelerate the global realisation of the DES. We think that the FDO framework can reduce barriers to the discovery and access of biodiversity specimens and can facilitate the emergence of digital extended specimens. The FDO vision could increase the volume of research that incorporates biodiversity data while attributing credit to the specimen sources.
This paper provides a conceptual model of the FDO framework applied to the DES, based on the consensus of the authors. In our subsequent work, we aim to detail the challenges of how specific existing DES data storage and management systems, such as relational databases and collections management systems, may or may not seamlessly fit into the FDO framework. This ongoing work will provide an opportunity to share experiences with concrete examples and empirical evidence.
The extended data of a physical specimen is held in diverse information sources, such as the Global Biodiversity Information Facility (GBIF), Geoscience Collections Access Service (GeoCASe), World Register of Marine Species (WoRMS), Barcode of Life Data System (BOLD) and Biodiversity Heritage Library (BHL) emerging from a variety of information models, data formats, application programming interfaces (API) and access controls. The FDO vision abstracts away the discord of multiple systems through a conceptual layer of digital objects that would be standardised everywhere across the Internet. In the simplest design, one FDO exists for each DES (in a 1:1 relation).
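The 1:1 design can be illustrated with a minimal Python sketch. The class names, field names and identifier strings below are hypothetical and chosen only for illustration; they are not taken from any FDO or DES specification, and a real system would mint identifiers through a PID service instead of a local counter.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExtendedSpecimenObject:
    """Hypothetical digital object standing in for one physical specimen."""
    pid: str                     # identifier of the digital object
    physical_specimen_id: str    # identifier of the physical specimen
    # Links to related records, grouped by the source system holding them.
    links: Dict[str, List[str]] = field(default_factory=dict)

    def add_link(self, source: str, record_ref: str) -> None:
        refs = self.links.setdefault(source, [])
        if record_ref not in refs:   # avoid duplicate links
            refs.append(record_ref)


class SpecimenIndex:
    """Enforces the 1:1 relation: one digital object per physical specimen."""

    def __init__(self) -> None:
        self._by_specimen: Dict[str, ExtendedSpecimenObject] = {}

    def get_or_create(self, physical_specimen_id: str) -> ExtendedSpecimenObject:
        obj = self._by_specimen.get(physical_specimen_id)
        if obj is None:
            # Assumption: a local counter stands in for a PID minting service.
            pid = f"example-prefix/{len(self._by_specimen) + 1}"
            obj = ExtendedSpecimenObject(pid, physical_specimen_id)
            self._by_specimen[physical_specimen_id] = obj
        return obj


index = SpecimenIndex()
a = index.get_or_create("MUSEUM-X:12345")
a.add_link("GBIF", "occurrence-record-placeholder")
a.add_link("BOLD", "barcode-record-placeholder")
b = index.get_or_create("MUSEUM-X:12345")   # same specimen, same object
```

Consumers in this sketch see one object per specimen, with links into the source systems kept as references, so no source data is copied.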
The implementation and application of the FDO framework (or any other framework) are closely linked to the schemas that define metadata and data. These schemas are built on rich data models supported by community culture, research-driven processes and agreements (
Networks of data are inherently dynamic and technologies evolve. A key first principle in the FDO framework is global referential integrity, a principle underpinned by the first FAIR principle: F1. (Meta)data are assigned globally unique and persistent identifiers. The topic of persistent identifiers (PIDs) has been well covered in the literature (see
To harmonise information management tasks and ensure consistency across all FDOs, the following key principles are applied:
Every object has a globally unique, persistent and actionable identifier;
Every object is typed i.e. classified against a specific definition of what the object represents and how it is represented;
Every object has tightly associated metadata that describes it;
Every object has a queryable set of operations that can be requested of it, as determined by its type;
Every object can be addressed and accessed via a common protocol, for example, the Digital Object Interface Protocol (DOIP).
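As a rough illustration of how these principles fit together, the following Python sketch models an object that carries an identifier, a type and metadata, with its operations discovered through its type. All class, method and identifier names are hypothetical; the sketch does not implement DOIP or any published FDO specification.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class DigitalObject:
    pid: str        # globally unique, persistent identifier
    type_pid: str   # identifier of the object's type definition
    metadata: Dict[str, Any] = field(default_factory=dict)


Operation = Callable[[DigitalObject], Any]


class TypeRegistry:
    """Maps a type identifier to the operations that type supports."""

    def __init__(self) -> None:
        self._ops: Dict[str, Dict[str, Operation]] = {}

    def register(self, type_pid: str, op_name: str, fn: Operation) -> None:
        self._ops.setdefault(type_pid, {})[op_name] = fn

    def list_operations(self, obj: DigitalObject) -> List[str]:
        # The available operations are determined by the object's type.
        return sorted(self._ops.get(obj.type_pid, {}))

    def invoke(self, obj: DigitalObject, op_name: str) -> Any:
        ops = self._ops.get(obj.type_pid, {})
        if op_name not in ops:
            raise ValueError(f"{op_name!r} is not defined for type {obj.type_pid!r}")
        return ops[op_name](obj)


SPECIMEN_TYPE = "example/type-digital-specimen"   # hypothetical type PID

registry = TypeRegistry()
registry.register(SPECIMEN_TYPE, "get_metadata", lambda o: dict(o.metadata))
registry.register(SPECIMEN_TYPE, "get_pid", lambda o: o.pid)

specimen = DigitalObject(
    pid="example/specimen-0001",
    type_pid=SPECIMEN_TYPE,
    metadata={"scientificName": "Puma concolor"},
)
operations = registry.list_operations(specimen)
```

A client that knows only the object's identifier and type can therefore ask which operations exist before invoking one, which is the behaviour a common access protocol is meant to standardise.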
In the absence of implementation detail, the components and services minimally required to implement the FDO framework include:
Persistent identifiers plus an identifier resolution system;
A minimum set of metadata (known as PID Kernel records or FDO records);
Defined digital object types accessible from a well-known set of type registries;
Digital Object Repositories, aka “Object Servers”, including repositories of metadata, aka “metadata registries”;
Mapping/brokering software and services to map existing data storage and management systems, such as relational databases and collections management systems to the FDO paradigm;
An access protocol, such as DOIP, implemented by FDO repositories and applications.
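How these components might interact can be sketched in a few lines of Python: a resolver returns a minimal record for an identifier, and that record says which repository holds the object. The names `KernelRecord`, `Resolver`, `Repository` and `fetch` are hypothetical, as are the record fields; actual PID Kernel or FDO records and DOIP messages are defined elsewhere and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class KernelRecord:
    """Minimal metadata returned when an identifier is resolved."""
    pid: str
    type_pid: str        # points to a definition in a type registry
    repository_id: str   # which repository holds the object


class Resolver:
    """In-memory stand-in for an identifier resolution system."""

    def __init__(self) -> None:
        self._records: Dict[str, KernelRecord] = {}

    def register(self, record: KernelRecord) -> None:
        if record.pid in self._records:
            raise ValueError(f"identifier already registered: {record.pid}")
        self._records[record.pid] = record

    def resolve(self, pid: str) -> KernelRecord:
        if pid not in self._records:
            raise KeyError(f"unknown identifier: {pid}")
        return self._records[pid]


class Repository:
    """In-memory stand-in for a digital object repository."""

    def __init__(self, repository_id: str) -> None:
        self.repository_id = repository_id
        self._objects: Dict[str, Any] = {}

    def put(self, pid: str, content: Any) -> None:
        self._objects[pid] = content

    def get(self, pid: str) -> Any:
        return self._objects[pid]


def fetch(pid: str, resolver: Resolver, repositories: Dict[str, Repository]) -> Any:
    """Resolve the identifier first, then retrieve from the named repository."""
    record = resolver.resolve(pid)
    return repositories[record.repository_id].get(pid)


resolver = Resolver()
repo = Repository("repo-a")
repo.put("example/specimen-0001", {"scientificName": "Puma concolor"})
resolver.register(
    KernelRecord("example/specimen-0001", "example/type-digital-specimen", "repo-a")
)
content = fetch("example/specimen-0001", resolver, {"repo-a": repo})
```

Because clients hold only the identifier, an object could move to another repository by updating its record, without breaking existing references.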
In a fully built-out FDO framework for the DES, there will exist a fixed set of organisational functions. These responsibilities and control points can be distributed across multiple organisations, centralised in a few or exist in some mix of the two. These functions may consist of existing roles or new ones, with the same set of alternatives applying across consortia and standards bodies. Starting with an assumed set of existing records of some type, the top-level organisational functions can be summarised as follows:
Establishing a standard set of types into which DESs are categorised and differentiated;
Agreeing on PID regime(s) for identifying and resolving FDOs;
Registering type information for new FDO types into a global type registry;
Ensuring machine-actionable access to DES objects, including authorisation and authentication;
Providing backups or dark archives of FDOs for guaranteed persistence.
We acknowledge that the creation of specific FDOs accounts for much of the effort involved in implementing the FDO framework on top of existing practices. However, the prevalence of certain Collection Management Systems (CMSs), such as Specify, and of widely used standards, such as Darwin Core, will allow many of the early efforts to be reused and perhaps even built into subsequent CMS versions and other tooling. Standards will also play a role in the specific schemas used for DES, such as DiSSCo’s proposed openDS. Obtaining agreement on one or a few such schemas will be essential for interoperability amongst different types of digital specimen objects.
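To make the reuse argument concrete, a mapping step from a Darwin Core record to an FDO-style record might look like the Python sketch below. The Darwin Core terms on the left of the mapping are real terms; the target field names, the function name and the identifier values are assumptions made for illustration and are not taken from openDS or any other schema.

```python
from typing import Dict, List

# Darwin Core terms (keys) mapped to the field names of a hypothetical
# FDO record (values). The target names are NOT taken from openDS.
DWC_TO_FDO: Dict[str, str] = {
    "institutionCode": "holdingInstitution",
    "catalogNumber": "physicalSpecimenId",
    "scientificName": "specimenName",
    "basisOfRecord": "materialType",
}


def dwc_to_fdo_record(dwc: Dict[str, str], pid: str, type_pid: str) -> Dict[str, str]:
    """Build a flat FDO-style record from a Darwin Core record.

    The PID is passed in, not minted here, because identifier assignment
    belongs to a separate service. Unmapped source terms are ignored.
    """
    missing: List[str] = [term for term in DWC_TO_FDO if term not in dwc]
    if missing:
        raise ValueError(f"missing Darwin Core terms: {missing}")
    record = {"pid": pid, "type": type_pid}
    for dwc_term, fdo_field in DWC_TO_FDO.items():
        record[fdo_field] = dwc[dwc_term]
    return record


source_row = {
    "institutionCode": "MUSEUM-X",
    "catalogNumber": "12345",
    "scientificName": "Puma concolor",
    "basisOfRecord": "PreservedSpecimen",
    "recordedBy": "unmapped in this sketch",
}
fdo_record = dwc_to_fdo_record(
    source_row,
    pid="example/specimen-0001",
    type_pid="example/type-digital-specimen",
)
```

Since the mapping is a plain table, the same function could serve any system that exports Darwin Core, which is the kind of reuse the paragraph above anticipates.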
Distributed System of Scientific Collections (DiSSCo) is a research infrastructure in preparation for a portfolio of FAIR services along with capacity building and training to unify European natural science collections data. The FDO vision helps DiSSCo to apply the FAIR principles for natural science collections data use cases and services (
The DiSSCo vision builds on the concept of the DES (which the project calls “Digital Specimen”) -- a digital object acting as a digital surrogate on the internet for a specific physical specimen in a collection (
At the time of writing of this paper, the DiSSCo sandbox implementation is using open source components, such as Postgres, Kafka and Kubernetes, to create an agile, modular and scalable implementation (see Fig.
The envisioned DiSSCo data infrastructure and services. For more information, see https://www.dissco.eu/services.
A few notes about the Digital Specimen FDO record:
The structure and content of the DiSSCo FDO profile are evolving with ongoing discussions within DiSSCo and the FDO Forum. This dynamic development process ensures that the FDO profiles are continually refined and optimised to meet the diverse needs of the DiSSCo project and its stakeholders.
We have introduced the FDO framework applied to the DES. Its adoption will have advantages and complications that will require a community effort. We address both here. The FDO framework's overriding advantage is that the existing global heterogeneity of collection information management and data repositories is pushed down a level of abstraction such that their design details need to be known only to those who are maintaining those systems. Additionally, the FDO framework simplifies the biodiversity community’s networked data space in the following ways:
The FDO framework can provide a dynamic representation of a specimen that, over time, accretes links and pointers to new scholarly treatments and research analyses. Seeded by museum collection data, DES digital objects will link to many types of related scientific and societal information not currently modelled or processed in Collections Management Systems (CMS) rooted in cataloguing and curation.
Our discussions also addressed questions about the disruption that the technology might cause. Irrespective of the FDO framework, value-adding changes to digital objects derived from collection catalogue records will occur beyond the scope of CMS and curatorial practice. This is already happening today, as distributed researchers and automated methods update and annotate copies of specimen records in aggregation databases with no synchronisation with source CMSs. A global FDO framework could make it easier for annotated records to be linked back to the source CMS.
A global network of Digital Extended Specimens could attenuate the traditional status of a museum as the “Source of Authority” for information about their holdings. Specimen records in institutional CMS will continue to be the coin of the realm for collection curation and asset management, but, ultimately, many of those records may no longer be the authoritative, complete or up-to-date sources of information for the specimens they proxy.
Although it would be futile to predict the exact consequences of this technological change, the emergence of an FDO-enabled DES network will motivate researchers and their museum stakeholders to re-evaluate their roles and level of participation in the re-architected data community. Natural history museums represent a massive, distributed archive of Earth’s biological and geological diversity. Once specimen data are pulsing through the pipelines and services of a networked DES, will there be incentives for collection-holding institutions to curate and update records in a re-partitioned specimen data space, organised along biogeographic or taxonomic themes, and to let the traditional collection cataloguing, curation and primary publishing paradigm give way? We are excited about exploring these questions with global stakeholders to find a sustainable implementation model.
The application of the FDO framework to the DES cannot solve all challenges in scientific practices and data standardisation. However, similar to the earlier technological interoperability of the Internet, it can provide a needed layer to overcome the obstacles to data stewardship and standardisation of cross-disciplinary data. While our focus in this paper centres on the challenges related to specimens and extended specimens, it is worth noting that the concept of FDOs can be extended to encompass the broader field of biodiversity research.
The FDO framework is agnostic to the social and political challenges involved in global integration for facilitating easier research access. However, given the pressing issues of biodiversity and climate change, we are compelled to explore how data records can be transformed into FAIR Digital Objects, ensuring broader access and seamless interoperability. Implementing FAIR principles and adopting the FDO framework are key to meeting present demands and maximising the potential for impactful research and collaboration across diverse scientific domains.
The authors would like to acknowledge the attendees of the First International Conference on FAIR Digital Objects (2022) for their valuable feedback. We also want to express our gratitude for the feedback provided by the DiSSCo Technical Team. The work on DiSSCo received funding from the European Union under grant agreements no. 871043 (DiSSCo Prepare) and 101007492 (BiCIKL).
A reviewer of this paper is affiliated with the DiSSCo Project.