Research Ideas and Outcomes :
Project Report
Corresponding author: Steffen Neumann (sneumann@ipb-halle.de)
Received: 08 Apr 2024 | Published: 04 Jul 2024
© 2024 Steffen Neumann, Ann-Christin Andres, Felix Bach, Theo Bender, Christian Bonatto Minella, Franziska Eberl, Tillmann Fischer, Benjamin Golub, Shashank Harivyasi, Sonja Herres-Pawlis, Pei-Chi Huang, Johannes Hunold, John Jollife, Nicole Jung, Johannes Liermann, Venkata Nainala, Matthias Razum, Oliver Koepler, Christoph Steinbeck
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Neumann S, Andres A-C, Bach F, Bender T, Bonatto Minella C, Eberl F, Fischer TG, Golub B, Harivyasi SS, Herres-Pawlis S, Huang P-C, Hunold J, Jollife JD, Jung N, Liermann JC, Nainala V, Razum M, Koepler O, Steinbeck C (2024) Interim Report NFDI4Chem 2023. Research Ideas and Outcomes 10: e124977. https://doi.org/10.3897/rio.10.e124977
The progress of the DFG-funded NFDI4Chem consortium (NFDI 4/1 - project number 441958208) in data management in chemistry is outlined in our latest report, highlighting the steps we have taken to integrate a data-centric approach within the chemistry community. This interim report offers a comprehensive overview of our data management activities, covering the reporting period from October 2020 to August 2023.
The shift to digital tools in research documentation is driven by our work with Electronic Laboratory Notebooks (ELNs), such as Chemotion ELN, offering systematic data storage for easy retrieval and sharing. Additionally, we focus on developing repositories, such as the Chemotion repository and RADAR4Chem, which meet the need for storage of chemical data. The NFDI4Chem Search Service ensures easy data access from our repositories. Our efforts extend to community engagement through conference visits and online presence, aimed at raising awareness of (digital) research data management and connecting with chemistry students and researchers. Our training programmes have reached over 600 participants to date. Initiatives like the FAIR4Chem award and the Chemistry Data Days promote cultural change towards FAIR data. Our Editors4Chem initiative collaborates with publishers on standardised data management, and the Ontologies4Chem workshops organised by our consortium promote ontology development in the field.
Apart from the consortium's engagement for chemists, NFDI4Chem members played key roles in the development of the NFDI as a whole. Being actively involved in the sections and task forces, NFDI4Chem promotes collaborative solutions across NFDI consortia.
Chemistry, Research Data Management, Electronic Laboratory Notebooks, Repositories, Metadata, NFDI, Research Data Infrastructure, Digitalisation
This interim report of the progress in the DFG-funded project NFDI4Chem (NFDI 4/1 - project number 441958208) covers the reporting period from October 2020 to August 2023.
NFDI4Chem started in October 2020 with 27 partners from 21 organisations. Over time, some participants changed their affiliation and two new partners joined. In July 2022, the University of Stuttgart joined the consortium, bringing expertise in enzymology and biocatalysis and supporting the development of standards and smart laboratories. In June 2023, the Federal Institute for Materials Research and Testing (BAM) joined the consortium, strengthening its international integration and expertise in Research Data Management (RDM) solutions for materials science and chemical analytical techniques. As of August 2023, the NFDI4Chem consortium comprises 22 organisations, of which seven are (co-)applicant institutions and 15 are participating institutions.
From the beginning, NFDI4Chem has made efforts to systematically identify the needs of the community with our survey (
NFDI4Chem has strong links with the major sub-disciplines of chemistry (organic, inorganic, pharmaceutical, physical chemistry) through learned societies as consortium members (GDCh, DPhG, Bunsen Society, FID Pharmazie) and outreach activities. Collaborative development with specialised communities, such as chemical ontologies, NMR, EPR standards, electrochemistry, coordination chemistry, macromolecules and enzymes, includes regular meetings and workshops to gather feedback. Users can provide feedback via the Helpdesk and GitHub repositories. Interaction with the ontology community is particularly strong, as evidenced by the annual Ontologies4Chem workshop with ~ 50 participants from major chemistry ontology projects. NFDI4Chem actively participates in working groups on standards for chemical analytical methods.
The NFDI4Chem training programme is critical in promoting RDM awareness. We have developed a basic two-day interactive course specifically for chemists. It covers the basics of RDM and specific chemistry-related scenarios. The workshop, based on the FD Mentor concept (
To promote RDM awareness, we presented the FAIR4Chem award at the GDCh JungChemikerForum (JCF) in 2022 and 2023. In June 2023, we organised the first Chemistry Data Days, a two-day conference on data management in chemistry for the non-RDM expert, with around 100 participants, inspiring chemists about the potential of research data. We engage with the community through monthly 'Stammtisch' discussions on RDM, ELNs, repositories, ontologies and molecular representations, along with Chemotion Q&A sessions and various virtual and physical RDM and Chemotion workshops.
NFDI4Chem is active on several social media channels. Our X (formerly Twitter) account has 1062 followers (15/09/2023) and has published 412 tweets. On LinkedIn, we have published > 150 articles for 442 followers and we launched an Instagram account in March 2023.
NFDI4Chem members have been active in shaping the NFDI and its initiatives, with our spokesperson leading the Consortium Assembly in 2021/22. NFDI4Chem spokespersons and co-spokespersons played key roles in NFDI strategy workshops (
The work of the sections highlighted the need for collaborative solutions across NFDI consortia. As a result, a collaborative grant proposal called "Base4NFDI" (
NFDI4Chem strongly supports the Code of Conduct adopted by the NFDI (
NFDI4Chem and community members have also contributed to the NFDI infra-talk series and co-organise the Physical Sciences Consortia Joint Colloquium with five other consortia. One of our co-spokespersons leads the ELN for Experimental Sciences interest group with three other natural sciences consortia. The interdisciplinary Joint Terminology Service is led by another NFDI4Chem co-spokesperson and a member of NFDI4Ing, with contributions from NFDI4Cat and NFDI4Culture. In addition, NFDI4Chem is strengthening local links with other consortia, having (co-)organised six local networking events in the past.
A complete list of all collaborations within the NFDI Association, as well as bi- and multilateral collaborations with other consortia, can be found in
While collaborations offer opportunities to exploit synergies and to accelerate or optimise the output for the community, they also create additional workload that was not foreseen in the consortium’s work programme. Nevertheless, we consider the engagement in cross-cutting topics within the NFDI to be valuable and necessary for creating an overarching research data infrastructure across all disciplines.
NFDI4Chem believes that the development of standards and best practices for research data management should be undertaken with a global perspective. Therefore, NFDI4Chem is active in the global community of chemists and research data infrastructure experts. NFDI4Chem has strengthened its collaboration with the International Union of Pure and Applied Chemistry (IUPAC), linking measures from its task areas (TAs) to several IUPAC projects. TA4 and TA6 are collaborating with the FairSpec project (
Additionally, NFDI4Chem has participated in two IUPAC Global Women’s Breakfasts, showcasing the achievements of women in NFDI4Chem. We also recently participated in a panel discussion on critical reflections on RDM at the SDG Graduate Schools Alliance midterm conference, focusing on equal opportunities for researchers in the Global South.
NFDI4Chem has established collaborations and joint activities with international learned societies. An important forum for data management in chemistry is the Division of Chemical Information (CINF) of the American Chemical Society (ACS). During ACS CINF events, NFDI4Chem has strengthened links with international collaborators, including the UK's Physical Sciences Data Infrastructure (PSDI). At the 2023 Fall Meeting of the American Chemical Society, NFDI4Chem co-organised two sessions on "Helping Chemists manage their Data" and "Metadata to Knowledge Graphs".
NFDI4Chem regularly collaborates with the European Chemical Society (EuChemS). During the EuChemS congress 2022 in Lisbon, Chemotion ELN attracted considerable interest. We are also working with IYCN and EYCN, which are key stakeholders for RDM implementation in universities, while GDCh-JCF is acting as a multiplier for NFDI4Chem in the German community. NFDI4Chem collaborates with the Royal Society of Chemistry (RSC) in the development and curation of ontologies.
NFDI4Chem partners are active in ELIXIR, where we contribute to Bioschemas developments and organised projects at the 2022 and 2023 ELIXIR Hackathons. The NFDI4Chem knowledge base (TA5) is modelled on the ELIXIR Converge RDMkit and the metadata standards (TA4) are aligned with Bioschemas.
The European Chemistry Thematic Network (ECTN) includes over 20 members from 30 European countries, promoting science and engineering education across borders and shaping chemistry degrees. The ECTN is revising the recommendations for "Bachelor core chemistry content" to integrate modern RDM into European curricula, with the involvement of NFDI4Chem partners. Embedded in the NFDI activities, NFDI4Chem has presented its work at European Open Science Cloud (EOSC) events, i.e. the presentation of the Terminology Service at the EOSC Symposium 2021 in the session "Metadata and Data Quality".
As we attend more international conferences and become more known internationally, we expect the number of Chemotion installations to increase significantly (Fig.
The NFDI4Chem services are located at reliable institutions supported by robust data centres, including KIT, TIB, FIZ and FSU. These centres adhere to state-of-the-art standards and provide fail-safe operations, data security, fast networks and skilled IT staff to ensure high availability. An uptime tracking service monitors repository and service availability and performance, enabling problem identification and smooth operation. In addition to technical reliability, NFDI4Chem's services prioritise sustainability. A key aspect is the use of open source software with modular code design. This approach facilitates flexible software reuse, customisation, updates, security enhancements and integration with other services and tools (e.g. NMRium in nmrXiv, Chemotion ELN and Chemotion Repo). This sustainability focus extends to the Chemotion ELN, core data repository software, FIZ-OAI provider, ontology and terminology services, search services, research software and all other reusable tools.
Another aspect of sustainability is Docker containerisation, which provides easy deployment and scalability and allows services to be easily moved between data centres and the cloud. Services become adaptive and not tied to specific technical setups. However, moving large amounts of archived data remains a challenge. We are developing exit strategies, including the use of BagIt containers (
Under the current operational and financial model, all NFDI4Chem services (see section on Services provided by NFDI4Chem) are free to users to encourage widespread use regardless of financial constraints. We believe this is currently the only successful model for the efficient adoption of RDM, as RDM is a relatively new requirement for many chemists and free services lower the barriers. Partner contributions have adequately supported the establishment and operation of the services. While free services are an essential part of the overall concept, the infrastructure and human resources required to deliver the services cannot be provided entirely as in-kind contributions and need to be supported by public funding. Currently, the NFDI4Chem operating model is based on a combination of various in-kind contributions of infrastructure and staff, public funding through the NFDI4Chem project and other third party funding. We believe that this operating model is the key to success and must continue at least until FAIR RDM practices are established as a natural part of scientists' work. However, should political decisions require a change in the operational model, NFDI4Chem can draw on concepts and expertise developed with consortium partners at FIZ Karlsruhe. While the RADAR4Chem service is offered as a free service to German scientists, the widely available RADAR service provides repository services to institutions based on a sustainable business model (
NFDI4Chem's vision is that all chemists publish FAIR data. To achieve this, we are developing the infrastructure and services for research data management and training the chemistry community to use them. The use of our services and infrastructure greatly facilitates and accelerates the daily research routine of our community, for example, by making data more findable and reusable. At the same time, the use of our infrastructure improves the quality of data by promoting best practices, standards and adherence to good scientific practice. The actions we will take to achieve our goal are divided into six task areas (TAs), which will work closely together and are described in the following sections.
TA1 (Management), based at FSU as an applicant institution, manages the technical, financial and administrative processes with the support of the partner institutions. The cooperation agreement of November 2020 serves as the legal basis for the cooperation and the transfer of funds. OpenProject, for which NFDI4Chem, together with other consortia, has purchased an enterprise licence, is being used to monitor the progress of work at consortium level, in addition to a meeting and reporting system that ensures project control. TA1 organises two consortium meetings per year. Additionally, smaller retreats within and between TAs have been effective in driving developments. Four Advisory Boards (National, International, Industry, Publishers) have been established to provide advice and feedback at the consortium meetings. Communication within the consortium is based on regular meetings, mailing lists and the chat tool rocket.chat. A strategic communications concept was initiated early on, including a corporate design and the website, which was launched in Q4 2021 and is frequently updated and expanded. It serves as a single point of information for the community and provides an overview of the consortium and all its services.
TA2 (Smart laboratory) is developing open source software to create a digital infrastructure for FAIR data management. This Smart Lab environment includes instrument integration, electronic lab notebooks and additional scientific digitalisation tools. Seamless data transfer is a key focus to ensure interoperability with NFDI4Chem components.
Over 39 local instances (at 37 different locations) of the Chemotion ELN have already been installed with the support of NFDI4Chem (see Fig.
During the early years of NFDI4Chem, the Chemotion ELN underwent continuous and substantial improvements in functionality, development approaches and deployment methods. This facilitated rapid Docker container installations for a diverse chemistry user base (
Chemotion development and feature integration follow a defined workflow: planning, coding, testing, community consultation in TA team meetings and frequent releases (2020: 0, 2021: 4, 2022: 4, 2023: 5 until August). Feedback via helpdesk and GitHub (resolved issues 2020-2023: 359) allows transparent activity tracking, requirement discussions and task allocation amongst the geographically distributed German team (Karlsruhe, Halle, Braunschweig, Aachen, Jena).
TA2 improved device integration into the ELN, enabling remote device control via the user interface (UI) and automated data transfer (
Within TA2, work on a smart lab environment will enable early digitisation and a digital workflow for deposition in NFDI4Chem repositories. Data transfer and publication from Chemotion ELN to Chemotion repository (
TA3 (Repositories) establishes a virtual environment of federated repositories for molecule-related data and ensures the integration of existing repositories selected on the basis of suitability, open source accessibility, adherence to standards and funding requirements.
The repositories that are currently part of the NFDI4Chem federation are listed below in "Services provided by NFDI4Chem for the community". For each of them, detailed information has been gathered through interviews and workshops with providers (
All NFDI4Chem repositories were supported to optimise their operational fitness, interoperability, metadata standards (developed by TA4) and harvesting, dataset landing pages and interfaces to other services such as the NFDI4Chem search service. Concepts have been developed to link the repositories to the scientists' workspace by allowing data to be transferred from the Chemotion ELN to the repositories. The “repotracker” software was developed to record and monitor the various data transfer processes. In collaboration with TA4, we decided on minimum information standards (
The relevance of TA3’s work can be seen in the author instructions of journals, including Angewandte Chemie International Edition, which recommend the NFDI4Chem repositories for data deposition (
TA4 (Metadata, data standards and publication standards) focuses on the development and harmonisation of minimum information (MI) standards and metadata for chemical research data, as well as data standards for molecules and reactions, including experimental and theoretical characterisations (also see the following section). Working closely with TA6 (Terminology Services), TA4 contributes relevant chemistry file formats to the EDAM ontology and adds repositories, data types and formats to the FAIRsharing catalogue.
Internationally, we work with IUPAC and the InChI Trust and with TA5 on the WorldFAIR initiative to develop a FAIR data cookbook for chemists using CODATA. Our involvement in the InChI Trust Organometallics Working Group has led to important developments in the international open molecular representation standard, including a non-disconnection approach. An NFDI4Chem co-spokesperson now leads the unified InChI organometallics/inorganics working group and sits on the InChI Trust Board. In addition, the recently developed TUCAN, a graph-based molecular representation (
TA4 and TA2 are working together to develop documentation and reporting standards at all scientific levels (processes, entities and data files). TA2's LabIMotion extension guides the creation of comprehensive documentation templates through engagement with the chemistry community, both within and beyond NFDI4Chem. These community-driven templates are iteratively refined and made available on GitHub for scientists to use directly in their ELN instances. After refinement and incorporation of feedback, these templates may become documentation standards or serve as the basis for modules in the next ELN release. These ELN standards can also be applied to NFDI4Chem repositories. For the Chemotion repository, processes for the distribution of standardised templates and the management of future versions have already been established in the form of a template hub, built from the versioned templates on GitHub.
Other activities focus on monitoring and improving the FAIRness of data publications. We analysed DataCite "Dataset" records using F-UJI (
The development of MI guidelines is an ongoing process involving discipline-specific workshops, consensus on journal reporting standards and technical interoperability. Workshops on FAIR NMR RDM and MI standards in polymer chemistry discussed specific metadata and reporting rules. Reports of the workshops are available on the NFDI4Chem website. NFDI4Chem assists the chemistry community and partners in preparing standards-compliant data publications. Key datasets from Lead-by-Example initiatives are described in the Knowledge Base and serve as practical test data for repository and service development (
In TA5 (Community involvement and training), we aim to drive cultural change towards digital research data in chemistry. We provide resources, training and support to researchers. A major milestone was the release of the NFDI4Chem Knowledge Base. Our YouTube channel provides training material on general RDM and on Chemotion, from basic to advanced topics. We run regular online and on-site RDM training for chemists and a dedicated Chemotion course. In collaboration with TA2, we organise online Chemotion Q&A sessions with 10-20 participants. Our monthly online "Chemotion/NFDI4Chem Stammtisch" features discussions on RDM and various ELNs, as well as advanced topics, such as molecular machine-learning or recent InChI and SMILES developments, with 25-45 participants.
Outside the consortium, we play a leading role in the NFDI section Training & Education. Our involvement in the BMBF project DALIA (Data Literacy Alliance) aims to develop a semantic platform for NFDI teaching materials. We are working closely with IUPAC on WorldFAIR, a FAIR data cookbook and improving digital molecular representation with InChI. We are working with major chemistry publishers on new publishing standards. Our collaboration with TA4 includes best practice examples from the consortium, showcasing research papers with good data management (
Communication is key to spreading cultural change and TA5 actively communicates with the scientific community to this end. We promote the latest developments through presentations at conferences, institute seminars and booths at national and international chemistry conferences. During the reporting period, we participated in around 30 conferences, including ACS, Analytica, EuChemS, ORCHEM, MACRO, Coordination Chemistry Meetings, IUPAC Conference, Chemistry Teachers' Conference and others. We maintain an active presence on Zenodo and distribute information material about the work of NFDI4Chem. Colleagues often seek our guidance in transforming chemistry departments into FAIR departments and we provide support at all levels.
We believe in introducing RDM to the next generation at an early stage and therefore integrated it into a fifth-semester inorganic laboratory course at RWTH in 2020. More than 350 students learned the basics of chemical RDM and used the Chemotion ELN in their lab work. The digital change was well received and students expressed interest in further studies on digital chemistry (
TA6 (Synergies and Cross-Cutting Topics) aims at a holistic use of the NFDI4Chem infrastructure and services. It facilitates and promotes the harmonisation of existing and new components and adds cross-cutting infrastructure services. TA6 coordinates NFDI4Chem participation and contributions to NFDI measures (see section on NFDI4Chem within the NFDI) and international networking (see section on International networking of NFDI4Chem). TA6 contributes to the process of establishing a NFDI IAM (Identity and Access Management), initially in a NFDI task force. This joint initiative has resulted in the NFDI basic service project IAM4NFDI. TA6 ensures that these developments can be integrated into the NFDI4Chem services. TA6 is also contributing to the discussion and development of legal guidelines for a legally sound RDM, integrating our perspective in the work of the NFDI section ELSA.
TA6 works closely with TA2, TA3 and TA4 to develop ontologies and metadata standards and implement them in services for semantically annotated data. In the Ontologies4Chem activities, TA6 is contributing to the implementation of ontologies in the chemistry community, with a focus on the creation of machine-actionable data. We have conducted a thorough evaluation of ontologies in the chemistry domain and defined selection criteria (
In the development of the Search Service (
NFDI4Chem adopts and develops metadata standards associated with chemical research data, including information about molecules and reactions, as well as data for their experimental and theoretical characterisation. Enriched metadata may be embedded directly in the datasets, made available through APIs (e.g. OAI-PMH), registries (e.g. DataCite Commons) and search engines (e.g. Google Dataset Search or the NFDI4Chem Search Service), or both.
For domain-independent metadata about datasets, such as title, keywords and creators, we use DataCite in the nmrXiv, Chemotion and RADAR4Chem repositories, which also provide DOIs registered through DataCite. For truly rich annotation of chemical information, we use, adapt and develop specific metadata using the JSON-LD-based Schema.org framework. All repositories provide metadata via Schema.org compliant JSON-LD. We are active in the ELIXIR Bioschemas community, where one of our co-spokespersons has become co-leader of the chemicals working group. New Bioschemas profiles, such as reactions, are under discussion to be specified and submitted to become an accepted standard. One NFDI4Chem co-spokesperson is a member of the Technical Specification & Implementation Group of the FAIR Digital Objects Forum.
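To make the combination of domain-independent and chemistry-specific metadata more concrete, the following minimal sketch assembles a Schema.org `Dataset` description in JSON-LD and attaches a `MolecularEntity` carrying InChI and SMILES strings. The helper name `build_dataset_jsonld`, the placeholder DOI and the selection of properties are assumptions made for illustration; they are not taken from any NFDI4Chem repository, whose actual profiles may differ.

```python
import json


def build_dataset_jsonld(doi, title, creators, keywords, inchi, smiles):
    """Return a Schema.org Dataset description as a JSON-LD dictionary.

    Hypothetical helper for illustration only; the property selection is an
    assumption and not the schema of any particular repository.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": f"https://doi.org/{doi}",
        "identifier": f"https://doi.org/{doi}",
        "name": title,
        "keywords": list(keywords),
        "creator": [{"@type": "Person", "name": name} for name in creators],
        # The substance the dataset is about, with chemistry-specific properties.
        "about": {
            "@type": "MolecularEntity",
            "inChI": inchi,
            "smiles": smiles,
        },
    }


record = build_dataset_jsonld(
    doi="10.1234/example",  # placeholder DOI, not a real dataset
    title="1H NMR spectrum of ethanol",
    creators=["Jane Doe"],
    keywords=["NMR", "ethanol"],
    inchi="InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    smiles="CCO",
)
print(json.dumps(record, indent=2))
```

Because the result is an ordinary dictionary, it can be serialised and embedded in a dataset landing page or handed to a harvesting service without further conversion.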
The RADAR4Chem and Chemotion repositories provide an OAI-PMH endpoint for metadata harvesting by other services. In addition, we have extended the XML-based OAI-PMH architecture (using the FIZ-OAI provider), so that JSON-LD data can also be submitted to an OAI-PMH service, where it can be queried both as JSON-LD and via a crosswalk as OAI Dublin Core (
Schema.org allows the use of defined terms from ontologies and we have started to connect the repositories to the NFDI4Chem Terminology Service and embed the terms in the exported metadata. Services established within the NFDI4Chem are further described below in "Services provided by NFDI4Chem for the community".
NFDI4Chem strongly endorses FAIR (meta)data. Several Task Areas (TA2, TA3, TA4, TA6) and service providers are contributing to a consistent implementation across services through the adoption of standards and technical requirements. In parallel, education and training will support these developments and the acceptance and understanding of FAIR data by the community.
To ensure Findability, all our resources provide rich, machine-readable (meta)data linked to domain-specific and cross-domain vocabularies. Our ELN and instrument integration strategy, as outlined in the section Achievements of the Task Areas (TA2) encourages the early collection of rich metadata during data generation. While (meta)data in ELNs are typically private to research groups and not globally accessible, they are prepared for transfer to repositories, making them findable while preserving domain-specific information. Most NFDI4Chem repositories register their datasets with DataCite and assign globally unique and persistent DOIs. This facilitates indexing in our Search Service. Dataset landing pages in repositories are optimised for human use. To improve machine actionability and interpretability, several NFDI4Chem services (including RADAR4Chem, Chemotion repository, MassBank and nmrXiv) adopt unified metadata via JSON-LD, following Schema.org and relevant chemistry types from Bioschemas (
To be Accessible, all data provided by NFDI4Chem services are retrievable by their persistent identifier using HTTPS, a standardised communication protocol that is secure, open, free and universally implementable. All NFDI4Chem components are available under open-access models. For some functions, such as data upload, download and editing, where registration and login are required, NFDI4Chem services facilitate and manage access using established authentication protocols and identity providers, such as OpenID and Shibboleth. NFDI4Chem will use the NFDI-AAI service for authentication and authorisation procedures when available. To allow programmatic access to data and metadata, NFDI4Chem repositories support or will support the OAI-PMH protocol, for which FIZ supplies an OAI-PMH provider. In addition, the components of the infrastructure already provide standardised APIs or will implement them as part of their work programme (see TA3 in section on Achievements of the Task Areas).
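As a rough illustration of programmatic access via OAI-PMH, the sketch below builds a `ListRecords` request URL and extracts identifiers and titles from a Dublin Core response. The base URL `https://repository.example.org/oai` and the sample response are placeholders invented for this example; the real endpoints of the NFDI4Chem repositories are not given here, and the sketch parses an inline string rather than contacting any service.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response, shaped like an OAI-PMH ListRecords reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:dataset-1</identifier>
        <datestamp>2023-08-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example NMR dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        results.append((identifier, title))
    return results


url = list_records_url("https://repository.example.org/oai", from_date="2023-01-01")
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records)
```

A production harvester would additionally need to follow resumption tokens and handle OAI-PMH error responses, both of which are omitted here for brevity.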
To be Interoperable, our metadata and data are stored and made available according to existing standards. We address gaps in standards, routines and representations by developing our own solutions, which are discussed and negotiated with the chemistry community. For example, we are developing MI metadata standards to semantically describe experiments, simulations, molecule characterisations and more. At the same time, NFDI4Chem is promoting open data formats (
To be Reusable, our (meta)data have accurate, relevant attributes that conform to domain-specific community standards. We use established ontologies, such as CHMO and RXNO and develop our own, such as VIBSO, where appropriate, to facilitate understanding by both humans and machines. These ontologies adhere to FAIR principles to ensure that annotated data remains FAIR. We prioritise the use of openly-licensed and well-maintained ontologies. We integrate metadata annotation standards, such as ROR ID (Research Organisation Registry) and GND (Integrated Authority File) into NFDI4Chem services, with plans for their universal adoption. Data are released with clear and accessible usage licences, supported by legal policies and guidelines. NFDI4Chem repositories curate data and metadata to ensure reusability, with the level of curation varying according to their role in the federation. For example, Chemotion is highly curated, while RADAR4Chem is mainly automatically checked. We also facilitate data reuse by providing targeted datasets for machine-learning and other support.
Following the definitions in
The Knowledge Base (N4C-KB), launched in late 2021, involves 21 contributors and offers various entry points based on the viewer's discipline, role or specific interests. It covers a wide range of topics, from basic RDM concepts to more in-depth articles. The N4C-KB assists users in selecting the right data repository for their research data needs. Hosted at JGU and built using the open-source framework Docusaurus, the platform allows all content to be stored in a GitHub repository, using simple Markdown syntax. This approach makes it easy for authors to contribute without requiring web programming skills. The website is automatically updated with every change to the repository. The N4C-KB team actively supports contributors and accepts content in a variety of formats.
The Terminology Service (TS,
NFDI4Chem drives the development and establishment of ELNs as a key requirement to achieve systematic digitalisation. While the developed ELN software is offered to users as source code to be hosted locally, there are three additional ELN-based services within NFDI4Chem:
The federation of core repositories comprises seven German-hosted repositories, each covering essential content in key sub-disciplines of the chemical community. These repositories are developed and provided as individual services, tailored to discipline-specific processes and functionalities driven by their respective communities. Within the NFDI4Chem federation, existing repositories adapt and emerging repositories develop workflows, standards and functionalities to create a harmonised data infrastructure. This infrastructure aims not only at data interoperability, but also at collaborative interaction between repositories and other NFDI4Chem services. The selected core repositories can handle different data types, chemical processes, analytical data and specialised methods, thus supporting the entire data landscape. The roles and requirements of these core repositories vary, with the first funding period focusing on strengthening existing repositories through strategic source code improvements for efficient development and stable hosting.
To date, five NFDI4Chem repositories are in operational use. Of these, we describe RADAR4Chem, the Chemotion Repository, nmrXiv and MassBank EU in more detail, as they are currently relevant to the widest user community and major changes have been released.
RADAR4Chem is a cross-domain repository, launched in March 2022, that provides flexible storage options for a wide range of chemistry-related data, with no restrictions on data types or content. It has been developed by adapting the existing RADAR service. Each registered scientist can publish up to 10 GB of research data by default, with the option to increase storage upon reasonable request. The hosting and integration of RADAR4Chem into the federation were crucial to fill the gap in discipline-specific repositories and ensure data preservation while discipline-specific solutions are still being developed. High priority has also been given to seamless data transfer from the Chemotion ELN to RADAR4Chem, which allows data collected in the ELN to be published directly with just a few clicks (available since ELN version 1.5.0).
The Chemotion Repository deals with data related to chemical reactions and chemical substances and was established at KIT in 2015, initially serving a narrow range of scientific data (
The nmrXiv repository, hosted at FSU, is a new NMR spectroscopy data repository and analysis platform built from the ground up. It builds on the experience of its predecessor, nmrshiftdb2. nmrXiv is open, FAIR and consensus-driven, preserving both raw and processed NMR data. In its pre-release phase in early 2023, it already contained 14 projects, 81 compounds and 490 spectra. It provides DOIs, a web UI and REST APIs (OpenAPI, DataCite, Bioschemas, NMRium). nmrXiv follows the DataCite metadata schema, enhanced with InChI and SMILES. It uses two-factor authentication and single sign-on with popular social network logins, including ORCID. Storage capacity is provided in-kind by FSU.
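To illustrate what a DataCite-style record enhanced with InChI and SMILES could look like, the following is a minimal sketch and not the actual nmrXiv schema. The six mandatory property names follow the DataCite Metadata Schema 4.x; the `chemistry` block, the helper names `build_record` and `missing_fields`, and the placeholder DOI are assumptions made purely for illustration.

```python
import json

# Mandatory properties of the DataCite Metadata Schema 4.x.
DATACITE_REQUIRED = (
    "identifier", "creators", "titles",
    "publisher", "publicationYear", "resourceType",
)


def build_record(doi, creator, title, year, inchi, smiles):
    """Return a DataCite-like dict with a hypothetical chemistry extension."""
    return {
        "identifier": {"identifier": doi, "identifierType": "DOI"},
        "creators": [{"name": creator}],
        "titles": [{"title": title}],
        "publisher": "nmrXiv",
        "publicationYear": year,
        "resourceType": {"resourceTypeGeneral": "Dataset"},
        # Assumption: the real field names and nesting in nmrXiv may differ.
        "chemistry": {"inchi": inchi, "smiles": smiles},
    }


def missing_fields(record):
    """List mandatory DataCite properties that are absent or empty."""
    return [key for key in DATACITE_REQUIRED if not record.get(key)]


record = build_record(
    doi="10.1234/example",  # placeholder DOI, not a real dataset
    creator="Doe, Jane",
    title="1H NMR spectrum of ethanol",
    year=2023,
    inchi="InChI=1S/C2H6O/c1-2-3/h3H,1-2H3",
    smiles="CCO",
)
print(json.dumps(record, indent=2))
print("missing mandatory fields:", missing_fields(record))
```

Keeping the structure identifiers in a separate block leaves the generic DataCite part untouched, so a harvester that only understands DataCite could still read the record.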
MassBank EU, hosted at the UFZ, is the first public repository of mass spectrometry data, facilitating its sharing with the scientific community. Between 2021 and 2023, its compound dataset grew from 14,788 to 15,075 and its spectra from 86,576 to 90,190. MassBank uses GitHub for AAI (open read access, limited write access) and GitHub issues for curation tracking, managed by the MassBank record validator. Spectral data and metadata are stored in a human-readable record format within a revision control system, and continuous integration verifies record integrity with each change. NFDI4Chem funding has enabled a modern software overhaul, with a first development release using a JS-based front-end and a REST-based back-end.
Two other databases, Suprabank and STRENDA DB, are part of the NFDI4Chem repository federation. Both are currently provided as stable services, with only minor adjustments in the production environment.
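The kind of integrity check that a continuous integration step might run on a human-readable record can be sketched as follows. This is a simplified illustration, not the MassBank record validator: the record below uses a reduced subset of MassBank-style tags with invented accession and peak values, and the functions `parse_record` and `validate` are hypothetical.

```python
# Simplified, invented record in a MassBank-like "TAG: value" layout.
RECORD = """\
ACCESSION: MSBNK-Example-000001
RECORD_TITLE: Ethanol; LC-ESI-QTOF; MS2
CH$NAME: Ethanol
CH$FORMULA: C2H6O
CH$SMILES: CCO
PK$NUM_PEAK: 2
PK$PEAK: m/z int. rel.int.
  29.0386 1200 450
  47.0491 2660 999
//
"""


def parse_record(text):
    """Split a record into tagged fields and a list of peak tuples."""
    fields, peaks, in_peaks = {}, [], False
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.strip() == "//":  # end-of-record marker
            break
        if in_peaks and line.startswith("  "):
            mz, intensity, relative = line.split()
            peaks.append((float(mz), float(intensity), int(relative)))
            continue
        in_peaks = False
        tag, _, value = line.partition(": ")
        if tag == "PK$PEAK":
            in_peaks = True  # indented peak lines follow this header
        else:
            fields.setdefault(tag, []).append(value)
    return fields, peaks


def validate(fields, peaks):
    """Return a list of human-readable problems; empty means the record passes."""
    errors = [f"missing {tag}"
              for tag in ("ACCESSION", "RECORD_TITLE", "CH$NAME")
              if tag not in fields]
    declared = fields.get("PK$NUM_PEAK")
    if declared and int(declared[0]) != len(peaks):
        errors.append("PK$NUM_PEAK does not match peak list")
    return errors


fields, peaks = parse_record(RECORD)
print(validate(fields, peaks) or "record OK")
```

Because the records are plain text under revision control, a check of this kind can run on every proposed change and block a merge when the returned list is not empty.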
Suprabank is a specialised database, hosted at KIT since 2019, offering unique data on intermolecular and supramolecular interactions. It primarily addresses supramolecular and physical chemists, as well as biologists in organic chemistry, focusing on binding, assembly and interaction phenomena not found in other repositories.
STRENDA DB, established in 2016 and operated by the Beilstein Institute, is a well-established repository for enzymology data. More than 55 international biochemistry journals have integrated the STRENDA guidelines into their author instructions. The database checks the completeness and validity of enzymology data prior to submission for publication. It primarily contains functional enzymology data, including kinetic and experimental data. STRENDA DB is an in-kind contribution.
In addition to the services available in production mode, the VibSpecDB repository is under development. VibSpecDB will focus on Raman and IR spectra.
The Search Service by TIB, which was released in summer 2022, acts as a central hub for searching the federated repositories of NFDI4Chem. It currently includes 93,935 datasets from the Chemotion Repository, MassBank and RADAR4Chem. The integration of the chemistry sub-collection of DaRUS marks the first integration of datasets from a generic data repository. The service regularly harvests and indexes metadata, handling different metadata models and protocols (Fig.
The NFDI4Chem Helpdesk serves as a central hub for community requests. It provides efficient support for all NFDI4Chem services and RDM topics. Basic issues and common questions are handled by first-level support, while complex cases are handled by specialised second-level teams of the corresponding services. The Helpdesk streamlines communication with our user community, collects common queries in the N4C-KB for proactive solutions and is hosted by TIB, supported by teams from JGU, FSU, TuBr, KIT and RWTH.
The following summary of NFDI4Chem outputs (Fig.
Year | Title | DOI or Link |
---|---|---|
Computer Program | | |
2023 | ChemCLI is a tool to help you manage Chemotion ELN on a machine. | |
2023 | ChemConverter app v0.10.0 | |
2023 | ChemConverter app v1.0.0 (released 03.07.2023) | |
2023 | cheminformatics-python-microservice v1.0.0 | github.com/Steinbeck-Lab/cheminformatics-python-microservice |
2023 | nmrium-react-wrapper v0.1.0 | |
2023 | Chemotion ELN Release v1.5.0 | |
2023 | Chemotion ELN Release v1.6.0 | |
2023 | Chemotion ELN Release v1.7.0 | |
2023 | ChemSpectra: Chem Spectra app (26 releases) | |
2023 | InChI Webdemo | |
2023 | LabIMotion - a Ruby Gem extension to Chemotion ELN | |
2023 | LabIMotion/dataset/cyclic voltammetry | |
2023 | nmrium-react-wrapper v0.1.0 | |
2023 | nmrium-react-wrapper v0.2.0 | |
2023 | nmrium-react-wrapper v0.3.0 | |
2023 | ontology-elements - pre-release | |
2023 | repo-helm-charts | |
2023 | Repository downloader | |
2023 | Repository tracker | |
2023 | Shiny App - Implementation for ELN | |
2023 | SVG composer: software enabling the composition and rendering of reactions SVGs based on molecule SVGs | |
2023 | Vibrational Spectroscopy Ontology | |
2022 | ChemConverter client | |
2022 | Chemotion ELN Release v1.1.0 | |
2022 | Chemotion ELN Release v1.2.0 | |
2022 | Chemotion ELN Release v1.3.0 | |
2022 | Chemotion ELN Release v1.4.0 | |
2022 | ChemSpectra: Chem Spectra Client (17 releases) | |
2022 | ChemSpectra: react spectra editor (22 releases) | |
2022 | nmrXiv - pre-release | |
2022 | TUCAN - a molecular identifier and descriptor for all domains of chemistry | |
2021 | Chemotion ELN Release v1.0.0 | |
Conference Paper | | |
2023 | Digitalizing the Chemical Landscape: A Comprehensive Overview and Progress Report of NFDI4Chem | |
2023 | Finding a Common Ground for NFDI Terminologies: Proposing I-ADOPT as a NFDI Wide Semantic Layer | |
2023 | LabIMotion Electronic Lab Notebook as Research Data Management tool in Catalysis | |
2023 | Leveraging Terminology Services for FAIR Semantic Data Integration across NFDI Domains: How to Integrate Terminology Services Into Other Service Applications | |
2023 | RADAR: building a FAIR and community tailored Research Data Repository | |
2023 | RDM in Chemistry: How to Educate and Train Future Researchers to Manage Their Data | |
2023 | Schema.org as a Lightweight Harmonization Approach for NFDI | |
Conference Poster | | |
2023 | A Practical Guide to FAIR Research Data Management in Medicinal Chemistry | |
2023 | Harmonising, Harvesting, and Searching Metadata across a Repository Federation | |
2023 | nmrXiv: A FAIR and Open, Consensus-Driven NMR Data Repository and Computational Platform | |
2023 | PIDs in the Natural Sciences | |
2023 | www.nmrium.org: Revolutionizing NMR Spectra Processing with a Free Web-Based Application | |
2022 | Metadata, Data Standards and Publication Standards: NFDI4Chem | |
2022 | MIChI Workshop Series | |
Dataset | | |
2023 | Collaborative work in NFDI | |
2023 | Dataset: Chemotion Repository - Data collection: mass spectrometry data | |
2023 | Dataset: The current landscape of author guidelines in chemistry through the lens of research data sharing | |
2021 | Collection of SOPs for extensions of chemotion repository and chemotion ELN | |
Grant Application | | |
2023 | Base4NFDI - Basic Services for NFDI | |
2022 | LabIMotion4Catalysis (KIT RDM grant 2022) | |
2020 | NFDI4Chem - Towards a National Research Data Infrastructure for Chemistry in Germany | |
2020 | SciMotion ELN (KIT RDM grant 2020) | |
Journal Article | | |
2023 | Integrative analysis of multimodal mass spectrometry data in MZmine 3 | |
2022 | Minimum Information Standards in Chemistry: A Call for Better Research Data Management Practices | |
2022 | SELFIES and the future of molecular string representations | |
2022 | Sharing is Caring: Guidelines for Sharing in the Electronic Laboratory Notebook (ELN) Chemotion as applied by a Synthesis-oriented Working Group | |
2022 | Treatment of research data | |
2022 | TUCAN: A molecular identifier and descriptor applicable to the whole periodic table from hydrogen to oganesson | |
2021 | Den Datenschatz endlich heben | |
2021 | FAIR and Open Data in Science: The Opportunity for IUPAC | |
2021 | NFDI4Chem – Fachkonsortium für die Chemie | |
2021 | NFDI4Chem – Infrastruktur für den digitalen Wandel in der Chemischen Forschung | |
2020 | Chemotion Repository, a Curated Repository for Reaction Information and Analytical Data | |
2020 | Comparability of Raman Spectroscopic Configurations: A Large Scale Cross-Laboratory Study | |
2020 | Forschungsdatenmanagement - Zeit für den Abschied vom analogen Laborbuch | |
2020 | Research Data in Chemistry ‐ Results of the first NFDI4Chem Community Survey | |
2020 | The Repository Chemotion: Infrastructure for Sustainable Research in Chemistry | |
Movie | | |
2023 | Chemotion ELN Instruction Videos | |
2022 | Chemotion ELN Erklärvideos | |
Position Paper | | |
2020 | Leipzig-Berlin-Erklärung zu NFDI-Querschnittsthemen der Infrastrukturentwicklung | |
Preprint | | |
2023 | Cheminformatics Python Microservice (CPM): unifying access to open cheminformatics toolkits | |
2023 | Results of a Three-Year Survey on the Implementation of Research Data Management and the Electronic Laboratory Notebook (ELN) Chemotion in an Advanced Inorganic Lab Course | |
2023 | Supporting Sustainability of Chemistry by Linking Research Data with Physically Preserved Research Materials | |
Presentation | | |
2023 | FAIR Research Data Management: Basics for Chemists | |
2023 | HeFDI Data Talk "Chemotion. An Introduction to an Open-Source ELN for FAIR Data" | |
2023 | HeFDI Data Week 2023: Chemotion. An Introduction to an Open-Source ELN for FAIR Data | |
2023 | NFDI4C* Workshop on synergy & cooperation | |
2023 | Overview of Research Data Management in Chemistry | |
2023 | Schema.org as a Lightweight Harmonization Approach for NFDI | |
2023 | Setting up your own ODK ontology repository | |
2023 | NFDI4Chem bei der SaxFDM Digital Kitchen am 11.05.2023 | |
2023 | NFDI4Chem: from chemical research data management to digital chemistry | |
2022 | Breakout Session II: Hands on Data Annotation using Ontologies - Creating a prototype knowledge graph from NMR spectroscopy research data | |
2022 | Chemotion ELN and Chemotion Repository as tools for the digitalization in chemical research within the framework of NFDI4Chem | |
2022 | Chemotion & Research Data Infrastructure NFDI4Chem | |
2022 | NFDI4Chem Knowledge Base | |
2022 | NFDI4Chem Terminology Service: Enabling semantic research data interoperability, discovery and exploitation in chemistry | |
2022 | Ontologies4Chem: Current chemical ontologies 4 research data management | |
2021 | NFDI4Chem - Digitising Research Workflows in Chemistry | |
Report | | |
2023 | 50 Experimental processes and data publications using NFDI4Chem infrastructure | |
2023 | Accessible Documentation on NFDI4Chem portal | |
2023 | Analysis of the Landscape of Repositories for Chemistry in re3data | |
2023 | Continuously updated protocols and minutes of consortium and TA meetings | |
2023 | Gap analysis report for selected repositories | |
2023 | Minutes of Advisory Board meetings published on portal | |
2023 | NMR Task Force Meeting | nfdi4chem.github.io/workshops/docs/workshops/nmr-michi/overview |
2023 | Report of relevant cross-cutting topics for NFDI4Chem | |
2023 | Report on FAIRness of data standards and datasets published by the community | |
2023 | Repos4Chem - criteria for acquisition - for suggestion by NFDI4Chem for data providers | |
2023 | The NFDI4Chem portal | |
2022 | Data Formats | nfdi4chem.github.io/workshops/docs/workshops/standard-formats/overview |
2022 | FAIR NMR Research Data Management | nfdi4chem.github.io/workshops/docs/workshops/fair-nmr/overview |
2022 | Minimum Information Standards in Polymer Chemistry | nfdi4chem.github.io/workshops/docs/workshops/polymer/overview |
2022 | NFDI4Trackact (Organised by Daphne4NFDI) | |
2021 | NFDI Cross-cutting Topics Workshop Report | |
Repository | | |
2023 | Chemotion Repository Release v1.1.0 (released on 12.06.2023) | |
2023 | Chemotion Repository Release v1.2.0 (released on 29.06.2023) | |
2020 | Chemotion repository | |
Review Article | | |
2023 | The current landscape of author guidelines in chemistry through the lens of research data sharing | |
2023 | The Impact of Digitalized Data Management on Material System Workflows | |
2022 | Data format standards in analytical chemistry | |
2022 | Ontologies4Chem: the landscape of ontologies in chemistry | |
Website | | |
2021 | Chemotionsaurus: Dokumentation for Chemotion ELN and Chemotion repository | |
2021 | Knowledge Base | |
2020 | NFDI4Chem website | |
White Paper | | |
2023 | Interim Report Reference | |
2023 | Umgang mit Zielen der BLV als Grundlage für die Strukturevaluation | |
A glossary of abbreviations used in this report is available in Table
ACS | American Chemical Society |
API | Application Programming Interface |
BMBF | Bundesministerium für Bildung und Forschung |
CCDC | Cambridge Crystallographic Data Centre |
CDK | Chemistry Development Kit |
CHEMINF | Chemical Information Ontology |
CHMO | Chemical Methods Ontology |
CINF | Division of Chemical Information |
CODATA | Committee on Data of the International Science Council |
CPU | Central Processing Unit |
CRDIG | Chemistry Research Data Interest Group |
CSD | Cambridge Structural Database |
DALIA | Data Literacy Alliance |
DaRUS | Data Repository of the University of Stuttgart |
DB | Database |
DNS | Domain Name System |
DOI | Digital Object Identifier |
ECTN | European Chemistry Thematic Network |
EDAM | Bioinformatics operations, data types, formats, identifiers and topics |
ELIXIR | European life sciences infrastructure |
ELN | Electronic Laboratory Notebook |
EOSC | European Open Science Cloud |
EPR | Electron Paramagnetic Resonance |
EuChemS | European Chemical Society |
EYCN | European Young Chemists' Network |
FAIR | Findable, Accessible, Interoperable, Reusable |
FID | Fachinformationsdienste |
FTE | Full-Time Equivalent |
GND | Integrated Authority File |
HPC | High-Performance Computing |
HTTPS | Hypertext Transfer Protocol Secure |
IAM | Identity and Access Management |
ICSD | Inorganic Crystal Structure Database |
InChI | International Chemical Identifier |
IR | Infrared |
IUPAC | International Union of Pure and Applied Chemistry |
IYCN | International Younger Chemists Network |
JCF | JungChemikerForum |
JS | JavaScript |
JSON-LD | JavaScript Object Notation for Linked Data |
MI | Minimum Information |
MIChI | Minimum Information for Chemical Investigations |
MOP | Molecular Process Ontology |
N4C-KB | NFDI4Chem Knowledge Base |
NFDI | Nationale Forschungsdateninfrastruktur |
NMR | Nuclear Magnetic Resonance |
OAI-PMH | Open Archives Initiative Protocol for Metadata Harvesting |
OBO | Open Biological and Biomedical Ontologies |
OLS | Ontology Lookup Service |
ORCID | Open Researcher and Contributor Identifier |
OS | Open Source |
PM | Person-month (Personenmonat) |
PSDI | Physical Sciences Data Infrastructure |
Q&A | Questions and Answers |
RDA | Research Data Alliance |
RDM | Research Data Management |
REST | Representational State Transfer |
ROR | Research Organisation Registry |
RSC | Royal Society of Chemistry |
RXNO | Name Reaction Ontology |
SC | Steering Committee |
SMILES | Simplified Molecular-Input Line-Entry System |
STRENDA | Standards for Reporting Enzymology Data |
TA | Task Area |
TS | Terminology Service |
UI | User Interface |
UV-Vis | Ultraviolet-visible |
VIBSO | Vibrational Spectroscopy Ontology |
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under the National Research Data Infrastructure (NFDI4/1).
NFDI4Chem – Chemistry Consortium in the NFDI (Project number 441958208)
Friedrich Schiller University Jena