Research Ideas and Outcomes : PhD Project Plan
Received: 11 Jan 2016 | Published: 11 Jan 2016
© 2018 Viktor Senderov, Lyubomir Penev
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation: Senderov V, Penev L (2016) The Open Biodiversity Knowledge Management System in Scholarly Publishing. Research Ideas and Outcomes 2: e7757. https://doi.org/10.3897/rio.2.e7757
This project aims to develop and implement novel ways of publication, visualization, and dissemination of biodiversity and biodiversity-related data and thus bring the Open Biodiversity Knowledge Management System closer to fruition. In order to do so, we will develop new types of Enhanced Publications (EP's), which will allow automated data import into the manuscript and export from the manuscript and provide dynamic visualizations. These EP's will enable biodiversity researchers and taxonomists to streamline their work and publish more data-rich species descriptions.
entomology, systematics, taxonomy, biodiversity informatics, evolutionary informatics, bioinformatics, data visualization, data publishing, semantically enhanced publication
This PhD project plan is a translation and expansion of the original Bulgarian version, titled "Публикуване, визуализация и разпространение на първични и геномни данни за биологичното разнообразие на основата на открита система за управление на информацията" ("Publishing, Visualization, and Dissemination of Primary and Genomic Biodiversity Data Based on the Open Biodiversity Knowledge Management System"), which was officially approved at the Bulgarian Academy of Sciences on 27 October 2015.
Up to the time of writing of this PhD project plan, willingness to create an Open Biodiversity Knowledge Management System (OBKMS) was declared by over ninety institutional and many more individual signatories of the Bouchout Declaration. The goals and purpose of the system were set forth in the project deliverables from the pro-iBiosphere project (
According to
Several tools and systems for integrating biodiversity and biodiversity-related data have been developed by different groups. Among the most important are UBio, the Global Names project, BioGuid, BioNames, the Pensoft Taxon Profile, and the Plazi Treatment Repository.
According to the aforementioned pro-iBiosphere brochure, the OBKMS must be built upon ten principles:
The objective of this dissertation project will therefore be to study, develop and apply new types of enhanced electronic publications that implement the principles of the OBKMS and aid in the publishing, visualization and re-use of research data and its associated narrative.
One of the main challenges of the OBKMS is to develop a system for robust and universal identification of biodiversity and biodiversity-related objects, such as taxon names, taxon name usages, museum specimens, occurrence records, taxon treatments, genomic sequences, organism traits, bibliographic citations, figures, multi-media files, etc. Historically, many such systems have been proposed and utilized. For example, Darwin Core Triplets, the de facto standard for occurrence-type data, are discussed in
We are of the opinion that the OBKMS needs to be addressed from the point of view of open science. According to
One such instrument that we plan to utilize is the Enhanced Publication (EP) (
In other words, publishing in a digital, enhanced format differs from the ground up from paper-based publication. The main difference is that the document can be structured in a format suitable both for machine processing and for the human eye. In the sphere of biodiversity science, journals such as ZooKeys, PhytoKeys, and in particular the Biodiversity Data Journal (BDJ) have already taken first steps in the direction of EP's (
EP's can be connected to one of the main issues facing zoology nowadays, which is the discrepancy between traditional morphologically described species and the growing number of species delimited via genomic technologies (
The large number of dark taxa arises because genomic technologies are highly effective and generate SH or OTU's much faster than taxonomists can publish morphological descriptions and name them (
EP's can also be connected to another interesting issue in bioinformatics - namely the publication, visualization and analysis of genomic data. Recently, interest in data visualization in the genomic and publishing communities has risen sharply: there have been blog posts about visualizing phylogenies (
One example of such a visualization as part of an enhanced publication may be a very large phylogeny (
Another related example is the graphical display of OTU and SH data. Since genomic methods for species delimitation provide different outcomes depending on the selected cut-off value for the similarity (
Yet another example, related to the previous two, is displaying metagenomic data. In metagenomic data, sequence information from the environment coming potentially from many different species is mixed together (
Finally, in order to complete the full life-cycle of the data as described in the OBKMS brochure (
This approach is complementary to the approach taken by ContentMine. While ContentMine uses text-mining to find "facts" within thousands of articles of scientific literature, we will start by taking semantically enriched publications where pieces of data can easily be identified. The goal of both projects is to export the data to the Linked Open Data cloud.
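To illustrate why semantically enriched publications make data identification easy, here is a minimal sketch in Python. It assumes a simplified, hypothetical markup in which taxon names are wrapped in a `taxon-name` element; the tag names, attributes, DOI, and species names are invented for illustration and do not reproduce any actual publishing schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified article markup (an assumption for this sketch);
# real semantically enriched article XML is considerably richer.
ARTICLE_XML = """
<article doi="10.0000/example.1">
  <p>We describe <taxon-name rank="species">Aus bus</taxon-name> from
  <locality>Sofia</locality>, related to
  <taxon-name rank="species">Aus cus</taxon-name>.</p>
</article>
"""


def extract_taxon_names(xml_text):
    """Return (doi, rank, name) tuples for every tagged taxon name."""
    root = ET.fromstring(xml_text)
    doi = root.get("doi")
    return [
        (doi, element.get("rank"), "".join(element.itertext()).strip())
        for element in root.iter("taxon-name")
    ]


names = extract_taxon_names(ARTICLE_XML)
for doi, rank, name in names:
    print(doi, rank, name)
```

Because the names are already tagged, no text-mining heuristics are needed: the extraction reduces to a traversal of the document tree.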
One possible way of implementing this workflow could be the extraction of the data objects from the manuscripts and their storage in a database. In order to capture the complex relationships between the various biodiversity-related objects, a graph data model might be most appropriate (
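As a rough illustration of the graph data model idea, the following Python sketch stores relationships between biodiversity-related objects as subject-predicate-object triples in memory. The `TripleStore` class, the identifiers (`article:1`, `taxon:Aus_bus`, `specimen:42`), and the predicate names are all hypothetical; an actual implementation would more likely rely on a dedicated graph database or RDF store.

```python
from collections import defaultdict


class TripleStore:
    """A tiny in-memory graph of (subject, predicate, object) triples."""

    def __init__(self):
        # Maps each subject to the set of (predicate, object) pairs it has.
        self._by_subject = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].add((predicate, obj))

    def objects(self, subject, predicate):
        """All objects linked from `subject` via `predicate`, sorted."""
        pairs = self._by_subject.get(subject, set())
        return sorted(o for (p, o) in pairs if p == predicate)

    def subjects(self, predicate, obj):
        """All subjects linked to `obj` via `predicate`, sorted."""
        return sorted(
            s for s, pairs in self._by_subject.items() if (predicate, obj) in pairs
        )


# Hypothetical identifiers and predicates, for illustration only.
store = TripleStore()
store.add("article:1", "mentions", "taxon:Aus_bus")
store.add("article:1", "cites", "specimen:42")
store.add("article:2", "mentions", "taxon:Aus_bus")
store.add("specimen:42", "identifiedAs", "taxon:Aus_bus")

# Which articles mention the taxon?
articles = store.subjects("mentions", "taxon:Aus_bus")
print(articles)
```

The same structure answers questions in either direction (from an article to its specimens, or from a taxon back to the articles that mention it), which is the kind of cross-object linkage a tabular layout handles less naturally.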
Through the implementation of the work-tasks described in this section, we would like to build and describe the implementation of a part of the OBKMS in the area of publishing of digitally born biodiversity literature.
Materials will be collected together with other BIG4 project partners on various expeditions. Museum and database records will also be used. For the realization of the technical part, the following internet technologies will be utilized:
For the realization of the scientific part in bioinformatics, the following methods will be utilized:
An array of methods is to be expected as the output of the work itself. We also intend to realize the work as an open thesis. This means that we intend to provide open access to the primary scientific output (scientific papers) written as a result of the effort. We will also try to open as much secondary scientific output as possible, such as lab notebooks and software code. Finally, we will aim to involve the BIG4 community, as well as the wider scientific community, in the discussion by means of a popular blog, which can be found at http://openbkms.blogspot.com.
Work-task 1: Propose, model and realize a software system for universal identification, access and handling of sub-article level data elements such as article metadata, article sections, taxon names, taxon treatments, collection specimens, occurrence records, genomic sequences, species traits, images, tables, and so on.
Sub-tasks:
Work-task 2: Develop, test, and apply new forms of EP's, allowing for automated or semi-automated exchange of data with international biodiversity portals such as GBIF, NCBI, IUCN, ZooBank, UNITE, iDigBio, DataONE, and others.
Sub-tasks:
Work-task 3: Develop new methods for publishing and visualizing genomic and metagenomic data, and integrate them with platforms such as BDJ.
Sub-tasks:
Work-task 4: Apply the novel methods from Work-tasks 1, 2, and 3 to publish one or more pilot publications together with BIG4 project partners.
Educational program:
Courses that will be attended, and deadlines:
Other BIG4 courses which may be attended:
Individual study will be undertaken leading up to an exam at the Bulgarian Academy of Sciences covering the following topics:
Embedding of source code in the EP's will allow readers to test the results by running and modifying data and code. The thesis will be developed in accordance with the open science approach, which assumes that all data and standalone software tools or code developed through the project will be available as open data and open source.
As part of the scientific and methodological results, we expect to develop new approaches, methods and formats for publishing of data and narrative in biodiversity science. We also expect to develop novel methods for information flow between publications and external data repositories, and to illustrate the aforementioned methods in exemplar papers using data gathered in the BIG4 consortium.
We also expect a minimum of two scientific publications in open access journals, where the student will be the first author:
Furthermore, methods papers and other scientific exemplar papers are expected where the student will be a co-author. The student will also give presentations at international symposia, write blog posts, and actively popularize the results on social media. The open science approach to the development of the PhD, which starts with the publication of the present PhD research plan, will be utilized throughout.
We would like to thank our colleagues at Pensoft, in particular Pavel Stoev and Teodor Georgiev - for many helpful working meetings; my colleagues at the Bulgarian Academy of Sciences - for reviewing and approving the Bulgarian version of this work plan; our partners at Plazi, in particular Donat Agosti and Terry Catapano - for contributing ideas and useful discussions; the reviewers, Alexey Solodovnikov, Daniel Mietchen, and Donat Agosti - for insightful comments and suggestions for improvement; and Prof. Rod Page - for the stimulating discussions and advice.
This project has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 642241.
This PhD project is being developed by Mr. Viktor Senderov under the supervision of Prof. Lyubomir Penev at Pensoft Publishers and the Bulgarian Academy of Sciences as part of the larger BIG4 (Biosystematics, Informatics and Genomics of the 4 big insect groups) EU training network. The dissertation will be defended at the Bulgarian Academy of Sciences. In case of successful defense, Mr. Viktor Senderov will be awarded the title of Doctor of Entomology, with Bioinformatics as the field of specialization.
Pensoft Publishers, Bulgarian Academy of Sciences, various BIG4 partners.
Viktor Senderov participated in the concept development of his PhD and wrote the main text of the paper.
Lyubomir Penev developed, together with Viktor Senderov, the overall concept of the PhD plan, edited and revised the manuscript.