Research Ideas and Outcomes : Research Presentation
From Open Access to Open Science from the viewpoint of a scholarly publisher
Lyubomir Penev
‡ Pensoft Publishers & Bulgarian Academy of Sciences, Sofia, Bulgaria
Open Access

Abstract

Background

The open access publishing model led to dramatic changes in the way scientists communicate their results. Open access also challenged the traditional business models of academic publishers that have been maintained for hundreds of years. Open access to article content, however, soon proved insufficient as far as access to the underlying data was concerned. Opening research data came as the logical second stage of this challenge and was soon put on the agenda of scientific communities, funding organisations and governments. Open data, in turn, raised the questions of how we can re-use data and reproduce research results, how transparent peer review is and, more generally, how scientific evaluation is performed. Over time, these and other similar developments morphed into what we now call "open science" or, in more general terms, the transformation of research into a primarily collaborative rather than a primarily competitive endeavour.

New information

The present lecture summarises the key milestones of the movement from open access through open data to open science from the viewpoint of an academic publisher. It is illustrated by the ARPHA Biodiversity Data Publishing and Dissemination Toolbox (ARPHA-BioDiv), a set of standards, guidelines, tools, workflows and journals developed by Pensoft within its ARPHA Journal Publishing Platform. The history of ARPHA-BioDiv largely mirrors the evolution from open access to open science. It started with the pre-publication semantic markup of important domain-specific terms and the relationships between them, implemented in 2010 by the open access journal ZooKeys and then followed by others, for example PhytoKeys, MycoKeys, Nature Conservation, NeoBiota, Journal of Hymenoptera Research, Deutsche Entomologische Zeitschrift, and Zoosystematics and Evolution. The next stage, integrated narrative and open data publishing, was pioneered in 2013 by the Biodiversity Data Journal and its associated authoring tool, the ARPHA Writing Tool (AWT), launched as the first ever journal publishing workflow that supported the full life cycle of a manuscript, from writing through community peer-review, publication and dissemination, within a single, entirely Web- and XML-based, online collaborative platform. The latest stage, open science publishing, is demonstrated by the Research Ideas and Outcomes (RIO) journal, which publishes all outputs of the research cycle – including project proposals, data, methods, workflows, software, project reports and research articles – together on a single collaborative platform, with the most transparent, open and public peer-review process.

Keywords

open access, open data, open science, scholarly publishing, semantic publishing, XML tagging, data re-use, science communication

Introduction

Open access to research articles was a disruptive change in the academic publishing paradigm that began about 20 years ago and is still under way (Suber 2012). While open access was still establishing itself as a publishing model, it soon became clear that opening only the narrative, human-readable content (especially if it is presented only as PDF) is far from sufficient to realise its huge potential for scientific progress.

Open access plus open data were indeed great steps forward in both research and publishing practices. Nonetheless, even before open access and open data succeeded in becoming a significant publishing model, scientists, funders and governments started to realise that many other parts of the whole ecosystem of research production and communication should be handled in a much more open and transparent manner than they are now. This is how we all arrived at the concept of open science (Nielsen 2011, Pontika et al. 2015, see also the TED talk video of Michael Nielsen). Open science refers to a whole range of issues around opening up the research life cycle, the most important of which are: (1) Open access, (2) Open data, (3) Free and Open-source software, (4) Reproducible research, (5) Open peer-review, (6) Open science policies, (7) Open funding, (8) Open science evaluation, (9) Open science tools and (10) Open education. The critical requirements of open science are transparency in methodology, observation and collection of data; open access to, and re-usability of, research objects covering the entire research cycle; public accessibility and transparency of scientific communication, including the open peer-review process; and the use of web-based open tools for scientific collaboration and communication. In brief, open science builds on collaboration rather than competition between researchers (European Commission 2016b).

The transformation of open access into open science academic publishing is the main focus of the current presentation, held within the iDiv Seminar Series at the Biodiversity Informatics Unit of the German Centre for Integrative Biodiversity Research (iDiv), Leipzig, on 15 February 2017. The presentation argues that the way we publish most scientific results nowadays creates bottlenecks that hamper the otherwise extraordinarily rapid progress in science. It illustrates the transition from open access to open science in the field of biodiversity publishing, which is the main area of expertise of the author and of the publishing company he has established, Pensoft Publishers.

Presentation

This presentation consists of four main blocks: (1) Open access, (2) Open data, (3) Open Science and (4) The Future. The first block presents the story of the flagship journal of Pensoft, ZooKeys, established as a conventional open access journal in 2008. Soon thereafter, we realised that the continuing emphasis on formats that make it difficult to extract the content algorithmically, e.g. paper or PDF, was – and still is! – one of the increasingly worrying impediments to data and content (re-)usability (Agosti 2006, Agosti 2016). Compensating for this lack of machine readability requires significant additional effort in post-publication markup and data extraction into a structured form, in order to make publications and data inter-operable and re-usable. One of the solutions to this problem was the pre-publication markup of important domain-specific terms and the relationships between them, which was implemented in ZooKeys in 2010 (Penev et al. 2010) and subsequently in other journals published by Pensoft, e.g. PhytoKeys, MycoKeys, Journal of Hymenoptera Research, Deutsche Entomologische Zeitschrift, and Zoosystematics and Evolution (Penev et al. 2012). For the pre-publication markup, the TaxPub XML extension to the Journal Archival Tag Suite (JATS), developed by Plazi, was used (Catapano 2010).
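To make concrete what pre-publication markup buys a machine reader, the following is a minimal sketch in Python, not the actual ZooKeys or TaxPub production workflow. It assumes a TaxPub-like fragment in which a taxon name is wrapped in `tp:taxon-name` with typed `tp:taxon-name-part` children; the namespace URI, the sample sentence and the placeholder name "Aus bus" are illustrative assumptions, and `extract_taxon_names` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the TaxPub extension (illustrative only).
TP_NS = "http://www.plazi.org/taxpub"

# Hypothetical fragment: a sentence with one tagged taxon name.
SAMPLE = f"""<p xmlns:tp="{TP_NS}">The species
<tp:taxon-name>
<tp:taxon-name-part taxon-name-part-type="genus">Aus</tp:taxon-name-part>
<tp:taxon-name-part taxon-name-part-type="species">bus</tp:taxon-name-part>
</tp:taxon-name> was collected near Sofia.</p>"""


def extract_taxon_names(xml_text):
    """Return one dict per tagged taxon name, mapping part type to text."""
    root = ET.fromstring(xml_text)
    names = []
    for name in root.iter(f"{{{TP_NS}}}taxon-name"):
        parts = {}
        for part in name.iter(f"{{{TP_NS}}}taxon-name-part"):
            part_type = part.get("taxon-name-part-type")
            parts[part_type] = (part.text or "").strip()
        names.append(parts)
    return names


TAXON_NAMES = extract_taxon_names(SAMPLE)
```

Because the name is tagged at source, a harvester needs only a few lines of standard XML parsing to recover it; the same name embedded in a PDF would require post-publication recognition and markup.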

Open access to journal articles gave birth to a quickly growing baby now known as "open data publishing" which normally takes place as: (1) publishing data supplementary files to the article, (2) deposition of data in repositories and linking these to and from the article, (3) stand-alone description of the data as "data papers" or "data notes" and (4) publication of data integrated in the narrative content of the article. This last stage of publication of machine-readable, integrated structured biodiversity data and narrative was piloted by the Biodiversity Data Journal (BDJ) and its associated authoring tool, the ARPHA Writing Tool (AWT), launched within the ViBRANT EU Framework Seven (FP7) project (Smith et al. 2013). The Biodiversity Data Journal realised in practice the first ever journal publishing workflow that supported the full life cycle of a manuscript, from writing through community peer-review, publication and dissemination within a single, entirely Web- and XML-based, online collaborative platform. Over the course of the years since its inception, the BDJ workflow has been continuously improved, e.g. by way of an upgrade to the ARPHA-XML journal publishing workflow as an integral part of the ARPHA Journal Publishing Platform.

Pensoft's response to the open science challenge was the launch of the Research Ideas and Outcomes (RIO) journal that publishes all outputs of the research cycle – including project proposals, data, methods, workflows, software, project reports and research articles – together on a single collaborative platform, with the most transparent, open and public peer-review process (Mietchen et al. 2015, see also the RIO video). The scope of the journal encompasses all areas of academic research, including science, technology, medicine, humanities and the social sciences. A good example of a collection of papers that covers a wide range of research outcomes is the one produced by the EU BON FP7 project: Building the European Biodiversity Observation Network (EU BON) Project Outputs.

What is next? At Pensoft, we believe that academic publishers will soon face another disruptive change in their everyday publishing practices, provoked by the need to handle, publish and export semantically enhanced content as Linked Open Data (LOD). In 2015, Pensoft, together with our partners from Plazi, initiated the Open Biodiversity Knowledge Management System (OBKMS). OBKMS aims to convert and amalgamate RDF data extracted from legacy literature, prospectively published literature and unpublished sources, together with ontologies and vocabularies, into a graph database, in order to ensure cross-domain inter-operability and open new horizons for data re-use in the semantic Web space (pro-iBiosphere 2014, Senderov and Penev 2016).
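As a rough illustration of what exporting content as Linked Open Data means in practice, the sketch below serialises two statements about this presentation as N-Triples using plain Python. It is not the OBKMS data model: the choice of Dublin Core terms as predicates and the helper names `iri`, `literal` and `to_ntriples` are assumptions made for the example; the DOI, title and author are those given in this article.

```python
def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialise (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Subject: the Zenodo record of the presentation (DOI taken from this article).
RECORD = iri("http://doi.org/10.5281/zenodo.291411")

# Predicates: Dublin Core terms, chosen here only for illustration.
TRIPLES = [
    (RECORD, iri("http://purl.org/dc/terms/title"),
     literal("From Open Access to Open Science from the Viewpoint of a Scholarly Publisher")),
    (RECORD, iri("http://purl.org/dc/terms/creator"),
     literal("Lyubomir Penev")),
]

NTRIPLES = to_ntriples(TRIPLES)
```

Each line of the result is a self-contained statement that a graph database can load and merge with statements from other sources, which is the kind of cross-domain inter-operability the paragraph above refers to.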

Data resources

The presentation that is described in the current article is available from Slideshare (http://www.slideshare.net/pensoft/from-open-access-to-open-science-from-the-viewpoint-of-a-scholarly-publisher-72128076), the Biodiversity Literature Repository at Zenodo (DOI: http://doi.org/10.5281/zenodo.291411), and also as a supplementary file to the article (Suppl. material 1).

Conclusions

Academic publishing and science communication in general are experiencing disruptive transformations that can be summarised as follows:

  1. From open access to open science. Open access and open data publishing models are quickly being transformed into open science practices that affect the whole ecosystem of producing, communicating and re-using research results.
  2. From human-readable to machine-readable content. Machine readability of the content is now at least as important as human readability as it facilitates the automated harvesting, text mining and re-use of content.
  3. From open data to data re-use. Publishers should strive to implement technologies that integrate structured data into the narrative to the highest possible degree.
  4. From traditional publishing to technology-driven service. Technological innovations become critical for the proper publishing and dissemination of scientific content, hence for the survival and sustainability of scientific journals and publishers.
  5. From semantic enrichment of content to semantic publishing. Semantic tagging and enrichment of content is seen as a transitional step towards the next stage of transformation of the published content into Linked Open Data (LOD).

Presented at

Lecture held at the Biodiversity Informatics Unit of the German Centre for Integrative Biodiversity Research (iDiv), Leipzig, on 15th of February 2017, within the iDiv Seminar Series.

References

Supplementary material

Suppl. material 1: From Open Access to Open Science from the Viewpoint of a Scholarly Publisher
Authors:  Lyubomir Penev
Data type:  PowerPoint presentation (pptx)
Brief description: A presentation held by Lyubomir Penev in the iDiv Seminar Series at the Biodiversity Informatics Unit of the German Centre for Integrative Biodiversity Research (iDiv), Leipzig, 15 February 2017.