Research Ideas and Outcomes :
Grant Proposal
Corresponding author: Egon L. Willighagen (egon.willighagen@maastrichtuniversity.nl)
Received: 02 Mar 2022 | Published: 07 Mar 2022
© 2022 Egon Willighagen, Martina Kutmon, Marvin Martens, Denise Slenter
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Willighagen EL, Kutmon M, Martens M, Slenter D (2022) BridgeDb and Wikidata: a powerful combination generating interoperable open research (BridgeDb). Research Ideas and Outcomes 8: e83031. https://doi.org/10.3897/rio.8.e83031
Just as humans have a unique social security number but different phone numbers from various providers, proteins and metabolites have a unique structure but different identifiers in various databases. BridgeDb is an interoperability platform that allows these databases to be combined by matching database-specific identifiers. These matches are called identifier mappings, and they are indispensable when combining experimental (omics) data with knowledge in reference databases. BridgeDb takes care of this interoperability between gene, protein, metabolite, and other databases, thus enabling seamless integration of many knowledge bases and wet-lab results. Since databases are updated continuously, the Open Science BridgeDb project should be too.
BridgeDb, Wikidata, open science, identifie
Just as people have a unique citizen service number (Burgerservicenummer, BSN) and different phone numbers from various telecom providers, proteins and metabolites have a unique structure but different identifiers in different databases. BridgeDb is an interoperability platform that makes it possible to combine databases on the basis of equivalent identifiers. These equivalences are called identifier mappings, and they are essential in the analysis of biological data. BridgeDb ensures that experimental data on genes, proteins, and metabolites can easily be linked to knowledge about biological processes stored in other digital resources. Because these databases change regularly, the Open Science project BridgeDb will do so as well.
Linking any two or more databases always requires linking identical entities described in those databases. Unfortunately, the identifier used for the same entity in one database is often different from the identifiers for the same entity in the other database. BridgeDb was created to make the bridge between databases by providing uniform access to mappings between different database identifiers for the same entities. This is why BridgeDb is a Recommended Interoperability Resource (RIR) of ELIXIR, a collaboration of leading life science organisations, and has been supporting projects like the ELIXIR-NL WikiPathways resource (
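The idea of an identifier mapping can be sketched in a few lines of Java. This is a minimal illustration only, not the BridgeDb API: the class `IdMappingSketch`, its methods, and the database names and identifiers are all hypothetical placeholders. It stores each mapping in both directions and returns the identifiers that a target database uses for the same entity.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: a tiny in-memory identifier mapping table (hypothetical, not the BridgeDb API). */
final class IdMappingSketch {

    // (source database, source identifier) -> target database -> identifiers for the same entity
    private final Map<String, Map<String, Set<String>>> table = new HashMap<>();

    private static String key(String database, String id) {
        return database + "\t" + id;
    }

    /** Record that two database-specific identifiers describe the same entity. */
    public void addMapping(String dbA, String idA, String dbB, String idB) {
        link(dbA, idA, dbB, idB);
        link(dbB, idB, dbA, idA); // mappings can be followed in either direction
    }

    private void link(String fromDb, String fromId, String toDb, String toId) {
        table.computeIfAbsent(key(fromDb, fromId), k -> new HashMap<>())
             .computeIfAbsent(toDb, k -> new TreeSet<>())
             .add(toId);
    }

    /** Return the identifiers the target database uses for the given entity (empty if unknown). */
    public Set<String> map(String sourceDb, String sourceId, String targetDb) {
        Map<String, Set<String>> byDatabase =
            table.getOrDefault(key(sourceDb, sourceId), Collections.emptyMap());
        return Collections.unmodifiableSet(
            byDatabase.getOrDefault(targetDb, Collections.emptySet()));
    }

    public static void main(String[] args) {
        IdMappingSketch mappings = new IdMappingSketch();
        // Placeholder databases and identifiers, made up for this example.
        mappings.addMapping("DatabaseA", "A001", "DatabaseB", "B42");
        System.out.println(mappings.map("DatabaseA", "A001", "DatabaseB"));
    }
}
```

A real mapping resource additionally has to deal with one-to-many mappings across many databases and with identifiers that change between releases, which this sketch does not attempt.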
The vision of this project is to improve the foundation of BridgeDb, to allow us to widen the scope in the future and enhance the support of currently unsupported, but important data sources. This will open up the road to wide adoption in the European Open Science Cloud (EOSC). To reach this vision, we aim to
The first output of this project is an improved BridgeDb Java library (
BridgeDb is already an important project for linking multiple life science databases, e.g. for genes, proteins, and metabolites. With clear open licenses, FAIR approaches (
The project plan is organized in three work packages (WP1, WP2, WP3), following the three output themes. Work package 1 (WP1) intends to upgrade the BridgeDb Java library. First, the main Java library is already built with Apache Maven, but the build system should also be applied to related tools, and we will make extensive use of GitHub Actions for automation. Second, only a subset of library modules is currently available as OSGi bundles, while OSGi support is essential for reuse in various third-party tools, like PathVisio (
Work package 2 (WP2) focuses on the BridgeDb Webservice. This continuously running service is an ELIXIR RIR and supports projects like WikiPathways and Cytoscape on a daily basis in the analysis of omics datasets (transcriptomics, proteomics, metabolomics, etc.). The Webservice will be extended to support Compact Identifiers (
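A compact identifier combines a namespace prefix and a local accession in the form `prefix:accession`. As a hedged sketch of what handling such input involves, the Java snippet below splits a compact identifier at its first colon, so that accessions which themselves contain a colon stay intact. The class `CompactIdentifierSketch` and its `parse` method are hypothetical names chosen for this illustration; the proposal does not specify how the Webservice will implement this, and the identifiers used are examples only.

```java
import java.util.Optional;

/** Illustrative only: splits "prefix:accession" strings (hypothetical, not the BridgeDb Webservice code). */
final class CompactIdentifierSketch {

    /** The two parts of a compact identifier. */
    public static final class Parsed {
        public final String prefix;
        public final String accession;

        Parsed(String prefix, String accession) {
            this.prefix = prefix;
            this.accession = accession;
        }
    }

    /**
     * Split at the first colon. Returns an empty Optional when the input is null,
     * has no colon, has an empty prefix, or has an empty accession.
     */
    public static Optional<Parsed> parse(String compactId) {
        if (compactId == null) {
            return Optional.empty();
        }
        int colon = compactId.indexOf(':');
        if (colon <= 0 || colon == compactId.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(new Parsed(
            compactId.substring(0, colon),
            compactId.substring(colon + 1)));
    }

    public static void main(String[] args) {
        // Example input; the prefix and accession are for illustration only.
        parse("uniprot:P12345").ifPresent(p ->
            System.out.println(p.prefix + " / " + p.accession));
    }
}
```

Splitting at the first colon is a simplification: it does not validate the prefix against a registry and ignores any provider code that may precede the prefix.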
The last work package (WP3) translates the new functionalities into practical use cases. In this WP, existing ID mapping databases will be updated using the new releases of the BridgeDb Java library, and tested in applications that use the new BridgeDb version. We intend to widen the scope of ELIXIR resources supported in the ID mapping databases, to make more resources interoperable (and therefore more FAIR). Here, we will increasingly use Wikidata and its international scientific collaborations (
The funding will be used to employ a scientific programmer. Additionally, the following people from the Dept of Bioinformatics (BiGCaT) will be involved in WP3, testing the upgraded BridgeDb library by creating updated ID mapping databases. Denise Slenter (orcid:0000-0001-8449-1318) will work on the metabolite, disease, and interaction ID mapping databases; Dr Martina Kutmon (orcid:0000-0002-7699-8191; assistant professor) on the gene and protein ID mapping database (with Ensembl as source); and Marvin Martens (orcid:0000-0003-2230-0840) on gene and protein mapping databases for Daphnia magna and Daphnia pulex (relevant model species for toxicology, but currently not in Ensembl). Slenter, Kutmon, and Martens have all previously been involved in BridgeDb through their research projects (e.g. creating the Docker image for BridgeDb and using Wikidata as a source of ID mappings), and are experts in the fields relevant to these mapping databases: chemistry and metabolism (Slenter); systems biology and data analysis (Kutmon); toxicology and Adverse Outcome Pathways (Martens).
Dr Egon Willighagen has been active in Open Science for over 20 years, for example by contributing to projects like JChemPaint (since 1998; doi: 10.3390/50100093) and WikiPathways (since 2011; doi: 10.1093/NAR/GKV1024), (temporarily) leading projects like Jmol, coordinating the science in the EU FP7 project eNanoMapper (doi: 10.3762/BJNANO.6.165), and co-founding the Chemistry Development Kit (in 2000; doi: 10.1021/ci025584y). He has been recognized for his work with the international Blue Obelisk Award (2007) and was a national runner-up for the Open Initiative Trophy (2021). From 2016 to 2021 he has been one of two Editors-in-Chief of the fully CC-BY, highly ranked Journal of Cheminformatics (issn:1758-2946), which promotes Open Science in chemistry. At various National Plan Open Science events and meetings, Willighagen has provided input from a researcher's perspective, and he is a co-founder of the Open Science Community Maastricht. A more complete list of his Open Science work can be found in his publication list: orcid.org/0000-0001-7542-0286.
Yes. Where existing data are reused, they will have an open license or a public domain waiver (like the American public domain or the international CCZero waiver). Any license, including an open license, constrains reuse. License information will be clearly provided, following the FAIR principles.
Yes, reuse is the aim of the BridgeDb project, where downstream users are, for example, WikiPathways, PathVisio, and Cytoscape.
Data will be archived during the project in public repositories, like Figshare and Zenodo, which have committed themselves to availability of 20 years or more. The open licenses allow other repositories to archive a copy of the data.
No restrictions (other than the open license terms) and no embargoes are anticipated.
No: All the necessary resources (financial and time) to store and prepare data for sharing/preservation are or will be available at no extra cost.
Yes.
All BridgeDb software is available under an OSI-approved license on GitHub. This includes the Apache License 2.0-licensed BridgeDb library as well as the existing source code to generate ID mapping databases, available under other open licenses (see Table
The BridgeDb project comprises multiple independent code bases, a few of which are listed here.
Name | Source of mappings (where applicable) | Source Code License | Code repository |
---|---|---|---|
BridgeDb Java Library | | Apache License 2.0 | |
Metabolite ID mapping database | HMDB, ChEBI, Wikidata | Simplified BSD License | |
Interaction ID mapping database | Rhea | Simplified BSD License | |
Disease ID mapping database | Wikidata | Simplified BSD License | |
Gene/Protein ID mapping database | Ensembl | | |
Protein complexes, virus proteins, journal articles | Wikidata | Apache License 2.0 | |
WP1 will improve the maintainability and portability of the software. The main BridgeDb Java library is developed on GitHub and disseminated via Zenodo (using the GitHub-Zenodo integration) and via Maven Central (search.maven.org/search?q=g:org.bridgedb).
The size of the communities is hard to estimate accurately, but with the highly cited WikiPathways (15,000 unique website users per month) and Cytoscape projects as daily users, and with BridgeDb being an ELIXIR Recommended Interoperability Resource, we estimate a few thousand daily users. The gene/protein ID mapping database has been downloaded more than 14 thousand times for local use, and the Bioconductor R package for BridgeDb (doi: 10.18129/B9.bioc.BridgeDbR) is downloaded 50-100 times each month (rank 774 out of 1974).
BridgeDb has been used in EU projects like OpenPHACTS, OpenRiskNet, and NanoSolveIT. A full list of past contributors can be found on GitHub for each of the subprojects, e.g. at github.com/bridgedb/BridgeDb/graphs/contributors.
The main applicant has more than 20 years of experience in the development of open data, open-source, and open standards projects, and the BridgeDb project has existed for over 10 years. As Editor-in-Chief of a journal with strong editorial standards on reuse and Open Science, the main applicant has the required expertise.
No overlapping grant applications.
This project is funded by NWO grant 203.001.121.
Open Science (OS) Fund 2020/2021
BridgeDb and Wikidata: a powerful combination generating interoperable open research (BridgeDb)
Maastricht University