Research Ideas and Outcomes : Conference Abstract
From data pipelines to FAIR data infrastructures: A vision for the new horizons of bio- and geodiversity data for scientific research
Sharif Islam‡,§, Claus Weiland|, Wouter Addink‡,§
‡ Naturalis Biodiversity Center, Leiden, Netherlands
§ Distributed System of Scientific Collections - DiSSCo, Leiden, Netherlands
| Senckenberg – Leibniz Institution for Biodiversity and Earth System Research, Frankfurt am Main, Germany
Open Access


Natural science collections are vast repositories of bio- and geodiversity specimens. These collections, originating from natural history cabinets or expeditions, are increasingly becoming unparalleled sources of data facilitating multidisciplinary research (Meineke et al. 2018, Heberling et al. 2019, Cook et al. 2020, Thompson et al. 2021). Owing to various global data mobilisation and digitisation efforts (Blagoderov et al. 2012, Nelson and Ellis 2018), the digitised information about specimens now includes database records along with two- and three-dimensional images, sonograms, sound or video recordings, computerised tomography scans, machine-readable texts from labels on the specimens, as well as media items and notes related to the discovery sites and acquisition (Hedrick et al. 2020, Phillipson 2022).

The scope and practice of specimen gathering are also evolving. The term extended specimen was coined to refer to the specimen and associated data extending beyond the singular physical object to other physical or digital entities such as chemical composition, genetic sequence data or species data. Thus the specimen becomes an interconnected network of data resources that have great potential to enhance integrative and data-driven research (Webster 2017, Lendemer et al. 2019, Hardisty et al. 2022). These practices also reflect the role of data and the curatorial data life-cycle, from the initial material sampling process to the downstream analysis. There is also growing acknowledgement that disparate and domain-specific data elements hinder interdisciplinarity, which is crucial for a holistic understanding of the biodiversity and climate crises (Hicks et al. 2010, Craven et al. 2019, Folk and Siniscalchi 2021).

Thus the data elements are not just records or rows in a database, or data pipelines going from one repository to another. They have the potential to become self-describing digital artefacts that can revolutionise how machines interpret and work with specimen data. Within this context, the Distributed System of Scientific Collections (DiSSCo), a new European Research Infrastructure for natural science collections, envisions an infrastructure based on FAIR Digital Objects (FDOs) that can unify more than 170 European natural science collections under common and FAIR-compliant (Findable, Accessible, Interoperable, Reusable; Wilkinson et al. 2016) access and curation policies and practices. DiSSCo’s key element in achieving FAIR is the implementation of the Digital Specimen (a domain-specific FDO), which closely aligns with the extended specimen practices. The idea behind the Digital Specimen – an FDO that acts as a digital surrogate for a specific physical specimen in a natural science collection – was influenced by global conversations around the implementation of the Digital Object Architecture for biodiversity data (De Smedt et al. 2020, Islam et al. 2020, Hardisty et al. 2020).

The main purpose of this talk is to explain the vision of how FAIR and FDOs can create a data infrastructure that not only takes advantage of existing databases and repositories but also supports innovative services such as AI and digital twinning. With scientific use cases in mind, the talk will highlight a few key FAIR and FDO components (persistent identifiers, metadata, ontologies) within the collaborative modelling activity of the Digital Specimen specification. These components provide the template for specifying how a Digital Specimen should look, so that DiSSCo can build a FAIR service ecosystem based on FDOs (Addink et al. 2021). We will also give examples of envisioned services that can help with image feature extraction and model training (Grieb et al. 2021, Hardisty et al. 2022) and with digital twinning (Schultes et al. 2022). We believe this is an exciting new paradigm, powered by FAIR and FDOs, that can help both humans and machines to accelerate the use of specimen data. From physical objects curated over hundreds of years, we have developed data pipelines, aggregators and repositories (Barberousse 2021). Now is the time to look for solutions where these data records can become FAIR Digital Objects to enable wider access and multidisciplinary research.


FAIR data infrastructures, biodiversity data, interdisciplinarity, digital specimen, digital twinning, FAIR Digital Objects, DiSSCo

Presenting author

Sharif Islam

Presented at

First International Conference on FAIR Digital Objects, presentation

Funding program

H2020-INFRADEV-2019-2020 – Grant Agreement No. 871043

Grant title

DiSSCo Prepare