ARPHA-BioDiv: A toolbox for scholarly publication and dissemination of biodiversity data based on the ARPHA Publishing Platform

The ARPHA-BioDiv Toolbox for Scholarly Publishing and Dissemination of Biodiversity Data is a set of standards, guidelines, recommendations, tools, workflows, journals and services, based on the ARPHA Publishing Platform of Pensoft and designed to ease scholarly publishing of biodiversity and biodiversity-related data that are of primary interest to the EU BON and GEO BON networks. ARPHA-BioDiv is based on the infrastructure, knowledge and experience gathered over Pensoft's many years of research, development and publishing activities, upgraded with novel tools and workflows that resulted from the FP7 project EU BON.


What is ARPHA-BioDiv?
The transformation from human- to machine-readability of published content is a key feature of the dramatic changes experienced by academic publishing in the last decade. Non-machine readable PDFs, either digitally born or scanned from paper prints, require [...] BON and GEO BON networks. ARPHA-BioDiv constitutes a key EU BON deliverable (D.8.3).

ARPHA Journal Publishing Platform
The market for online collaborative writing tools has long been dominated by Google Docs. However, being too generic, it has not met the specific demands of academic publishing and, in recent years, several start-ups have developed platforms and services to fill this growing gap in the publishing market. Examples include Overleaf (originally WriteLaTeX), Authorea, ShareLaTeX and others, most of them based on LaTeX but differing in complexity and in their features for manuscript writing. For people unfamiliar with LaTeX, the learning curve is steep, which explains the comparatively restricted usage, mostly centred around the LaTeX community. Currently, none of the above-mentioned tools provides all the components of an end-to-end authoring, peer review and publishing pipeline. For instance, most tools lack a peer review system and rely on integrations with well-established platforms, such as Editorial Manager, ScholarOne, or others.
ARPHA has emerged as the first ever publishing platform to support the full life cycle of a manuscript, from authoring through submission, peer review, publication and dissemination, within a single, fully Web- and XML-based, online collaborative environment. The acronym ARPHA stands for "Authoring, Reviewing, Publishing, Hosting and Archiving", all in one place, for the first time. The most distinctive feature of ARPHA is that it consists of two interconnected but independently functioning journal publishing platforms. It can thus provide journals and publishers with either of the two or a combination of both, enabling a smooth transition from conventional, document-based workflows to fully XML-based publishing (Fig. 2):
1. ARPHA-XML: Entirely XML- and Web-based, collaborative authoring, peer review and publication workflow;
2. ARPHA-DOC: Document-based submission, peer review and publication workflow.
ARPHA-BioDiv is a set of standards, guidelines, tutorials, tools, workflows, journals and services, designed to facilitate the scholarly publication and dissemination of biodiversity data.
The two workflows use a one-stop login interface and a common peer-review and editorial manuscript tracking system. The XML-based workflow in use at Biodiversity Data Journal (BDJ) was the first of its kind when launched in 2013 and has since been continuously refined over more than three years of active use by the biodiversity research community. It is now also used by the Research Ideas and Outcomes (RIO), One Ecosystem and BioDiscovery journals. The second, file-based submission workflow is currently used by ZooKeys, PhytoKeys, MycoKeys, Journal of Hymenoptera Research, Nature Conservation, Deutsche Entomologische Zeitschrift, Zoosystematics and Evolution, NeoBiota and other journals published by Pensoft.
At the core of the ARPHA-XML workflow is the collaborative online manuscript authoring module called ARPHA Writing Tool (AWT). AWT's innovative features allow for upfront markup, automation and structuring of the free-text content during the authoring process, import/download of structured data into/from human-readable text, automated export and dissemination of small data, on-the-fly layout of composite figures and import of literature and data references from online resources. ARPHA-XML is also perhaps the first journal publishing system that allows for submission of complex manuscripts via a dedicated API.
The generic and domain-specific features of ARPHA (used for publication and dissemination of biodiversity data via the ARPHA-BioDiv toolbox) are listed in Table 1 and [...]

Figure 2. ARPHA consists of two independent journal publishing workflows: (1) ARPHA-XML, where the manuscript is written and processed via ARPHA Writing Tool and (2) ARPHA-DOC, where the manuscript is submitted and processed as document file(s).

Table 1. Generic features of the ARPHA Journal Publishing Platform
For the editor's convenience, peer reviews in ARPHA are automatically consolidated into a single online file that makes the editorial process straightforward, easy and comfortable. X
In the ARPHA-XML workflow, authors can publish updated versions of their articles at any time. X

Novel Article Formats
For several centuries, research articles have been the traditional containers for scientific results, and this holds even more for research books. The Internet era brought disruptive changes to academic publishing, one of which was a challenge to the notion of the research article as the only valid output of scientific endeavours. As a result, novel article formats started to proliferate in an attempt to publish additional research objects from across the research cycle, such as methods, data and software. Pensoft pioneered several novel article formats with the launch of the Biodiversity Data Journal. Currently, the ARPHA Writing Tool supports nearly fifty article formats (Fig. 3), used in the Biodiversity Data Journal, Research Ideas and Outcomes, One Ecosystem, and BioDiscovery. The article formats can be generic, i.e. usable within almost any domain (for example, research idea, research article, data management plan and others), or domain-specific, such as the article formats described below.

Data Paper
A data paper is a scholarly journal publication whose primary purpose is to describe a dataset or a group of datasets, rather than report a research investigation. As such, it contains facts about data, rather than hypotheses and arguments in support of those hypotheses based upon data, as found in a conventional research article (for details, see Newman and Corke 2009, Chavan and Penev 2011, Penev et al. 2017).
The Article template is available for Biodiversity Data Journal, One Ecosystem, Research Ideas and Outcomes (RIO) and BioDiscovery.

Software Description
A publication that describes software or an online platform. It contains a link to openly accessible code (for details, see Penev et al. 2017). Examples from: Biodiversity Data Journal.
Customisable templates are available for Biodiversity Data Journal, Research Ideas and Outcomes (RIO), One Ecosystem and BioDiscovery.

R Package
A description of an R package including information on its purpose, installation and usage. The code should be openly available and a link to it should be present in the article.
The Article template is available for Biodiversity Data Journal, One Ecosystem and Research Ideas and Outcomes (RIO).

Monitoring Schema
A brief description of a monitoring schema including information on the monitored system component; its location; indicators used; spatial and temporal scales; purpose of the monitoring programme; and potential application of the resulting data.
The Article template is available for Research Ideas and Outcomes (RIO) and One Ecosystem.

Species Conservation Profile (SCP)
A publication of a single or multiple IUCN species assessment report(s) imported and edited in an IUCN-compliant species template.
Examples from: Biodiversity Data Journal.
The Article template is available for Biodiversity Data Journal.

Alien Species Profile (ASP)
An assessment report of alien or invasive species following an IUCN-compliant species template. After publication, the article can be exported to the Global Invasive Species Database (GISD).
The Article template is available for Biodiversity Data Journal.

Ecosystem Inventory
A brief description of a specific ecosystem type; its structures; processes and functions; abundant species; biodiversity; anthropogenic pressures; and management options. Data could result from, for example, direct observations, monitoring programmes, modelling or literature and database reviews.
The Article template is available for One Ecosystem.

Ecosystem Service Mapping
A brief description of an ecosystem service mapping study or application including information on the purpose of the map; data and methods used (biophysical, economic, social); mapped ecosystem service; mapped beneficiary (ecosystem service potential, flow, demand); spatial and temporal scale and indicators. The resulting maps should be included in the manuscript or uploaded to the ESP Visualisation Tool.
The Article template is available for One Ecosystem.

Ecosystem Service Models
A brief description of an ecosystem service mapping study or application including information on the purpose of the map; data and methods used (biophysical, economic, social); mapped ecosystem service; mapped beneficiary (ecosystem service potential, flow, demand); spatial and temporal scale and indicators. The resulting maps should be included in the manuscript or uploaded to the ESP Visualisation Tool.
The Article template is available for One Ecosystem.

Semantic Tagging of the Article Content
In 2010, ZooKeys published its 50th issue, "Taxonomy shifts up a gear: New publishing tools to accelerate biodiversity research", in a new format based on pre-publication tagging of biodiversity-specific terms in the article XML and semantic enhancements to the published paper (Penev et al. 2010b, Penev et al. 2010a). ZooKeys implemented the TaxPub XML schema, developed by Plazi and later endorsed as an extension of the Journal Archiving Tag Suite (JATS) standard (Catapano 2010). Since then, all life science journals published by Pensoft have used the semantic markup workflow in their everyday editorial work to "atomise" and disseminate the content at sub-article level. A list of tools and features for semantic tagging and enhancements of the article content is available in Table 2; implementation and use cases are reviewed by Penev et al. (2012). Examples of the use of the domain-specific markup are illustrated in Fig. 4.

Integrated Narrative and Data Publishing
"Integrated narrative and data publishing", or "integrated data publishing", is a relatively new approach in which data or code are imported in a structured form into the manuscript text and are downloadable from the published article. In biodiversity science, the term was coined and first demonstrated by the Biodiversity Data Journal (BDJ), developed in the course of the EU-funded project ViBRANT (Smith et al. 2013, see also Fig. 5). Publishing executable code in an article, also known as "literate programming", was proposed back in 1984 (Knuth 1984), but only recently has this practice appeared in journals (Veres and Adolfsson 2011). Another example of integrated data publishing is the linking of a standard article to an external platform that hosts all data associated with the article and provides additional data analysis tools and computing resources; this approach is believed to have been pioneered by GigaDB and the GigaScience journal (Edmunds et al. 2016). The various ways of embedding 3D or other multimedia visualisations in an article can also be considered integrated narrative and data publishing; a good example in the biodiversity domain is the paper of Stoev et al. (2013).

Import of Data into Manuscripts
The ARPHA Writing Tool provides direct online import from external databases using community-accepted standards (within the biodiversity community, these are, for example, Darwin Core, the TaxPub JATS extension and others; see http://www.tdwg.org/standards/). Initially, data were imported from CSV spreadsheets or entered manually via a Darwin Core HTML editor (Penev et al. 2017). A new functionality of the integrated data publishing system in ARPHA is the online import of specimen records from GBIF, Barcode of Life, iDigBio and PlutoF (Fig. 6). The workflow is described in Senderov et al. (2016). Stepwise guidelines on how to use the feature are also available from Penev et al. (2017) and a blog post.
Figure 6. Data and metadata import into manuscripts in ARPHA Writing Tool.
Another example of online import of structured text is the ReFindit tool, which exists both as a stand-alone application and as a plugin in ARPHA Writing Tool. ReFindit locates and imports literature and data references from CrossRef, DataCite, RefBank, Global Names Usage Bank (GNUB) and Mendeley.
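As a rough illustration of what a spreadsheet-based import has to check, the sketch below reads occurrence records from CSV text whose column headers are Darwin Core terms (scientificName, decimalLatitude, decimalLongitude, eventDate). The list of required terms, the function name and the sample records are assumptions made for this example; they are not the validation rules actually applied by ARPHA Writing Tool.

```python
import csv
import io

# Hypothetical minimum set of Darwin Core terms, chosen for this example only.
REQUIRED_TERMS = ("scientificName", "decimalLatitude", "decimalLongitude")


def read_occurrences(csv_text):
    """Parse Darwin Core occurrence rows; return (valid_rows, problems)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    valid, problems = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        missing = [t for t in REQUIRED_TERMS if not (row.get(t) or "").strip()]
        if missing:
            problems.append((line_no, "missing " + ", ".join(missing)))
            continue
        try:
            lat = float(row["decimalLatitude"])
            lon = float(row["decimalLongitude"])
        except ValueError:
            problems.append((line_no, "coordinates are not numeric"))
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append((line_no, "coordinates out of range"))
            continue
        # Keep all original columns, with coordinates converted to floats.
        valid.append({**row, "decimalLatitude": lat, "decimalLongitude": lon})
    return valid, problems


# Invented sample data: one good row, one with a gap, one with a bad latitude.
SAMPLE = """scientificName,decimalLatitude,decimalLongitude,eventDate
Apis mellifera,42.70,23.32,2016-05-14
Bombus terrestris,,23.40,2016-05-15
Vespa crabro,95.0,23.10,2016-05-16
"""

records, issues = read_occurrences(SAMPLE)
print(len(records), "valid record(s);", len(issues), "problem(s)")
```

Reporting problems per line, rather than rejecting the whole file, mirrors the way an author would want to correct individual records before import.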

Content and Data Export from Published Articles
Article content that is tagged and available in TaxPub XML can be harvested by aggregators, which can select sub-article elements such as metadata, taxon treatments, occurrence records, images and others. Several of these aggregators are major players in biodiversity data preservation and management, for example, GBIF, Encyclopedia of Life, Biodiversity Heritage Library, Plazi, Biodiversity Literature Repository at Zenodo, ZooBank, International Plant Names Index, MycoBank, Index Fungorum and many others. In some cases, the data export is provided by a dedicated outbound API. The workflows and aggregators that use the semantically enriched article XMLs are listed in Table 2 and illustrated in part in Fig. 7; the initial core set of features was also reviewed by Penev et al. (2010b) and Penev et al. (2012).
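To make the idea of harvesting sub-article elements more concrete, the following minimal sketch extracts the taxon names that head taxon treatments from a TaxPub-style XML fragment. The namespace URI and the element names (tp:taxon-treatment, tp:nomenclature, tp:taxon-name, tp:taxon-name-part) follow the TaxPub extension as we understand it and should be checked against the schema; the sample article and the function name are invented for illustration and do not represent any aggregator's actual harvester.

```python
import xml.etree.ElementTree as ET

# TaxPub namespace as commonly declared in TaxPub articles (assumed here).
NS = {"tp": "http://www.plazi.org/taxpub"}

# Invented sample; a real article carries far more structure.
SAMPLE_XML = """<article xmlns:tp="http://www.plazi.org/taxpub">
  <body>
    <tp:taxon-treatment>
      <tp:nomenclature>
        <tp:taxon-name>
          <tp:taxon-name-part taxon-name-part-type="genus">Aus</tp:taxon-name-part>
          <tp:taxon-name-part taxon-name-part-type="species">bus</tp:taxon-name-part>
        </tp:taxon-name>
      </tp:nomenclature>
    </tp:taxon-treatment>
  </body>
</article>"""


def harvest_treatment_names(xml_text):
    """Return one {rank: name} dict per taxon treatment in the article."""
    root = ET.fromstring(xml_text)
    names = []
    for treatment in root.iter("{%s}taxon-treatment" % NS["tp"]):
        name = treatment.find("tp:nomenclature/tp:taxon-name", NS)
        if name is None:
            continue  # treatment without a tagged name; skip it
        parts = {
            part.get("taxon-name-part-type"): (part.text or "").strip()
            for part in name.findall("tp:taxon-name-part", NS)
        }
        names.append(parts)
    return names


print(harvest_treatment_names(SAMPLE_XML))
```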
All data published in the Biodiversity Data Journal can be downloaded in tabular format (CSV) straight from the article text and re-used by anyone, provided that the original source is cited (Fig. 8). Upon publication, the primary biodiversity data (for example, species occurrence records, species descriptions and taxon checklists) are also automatically exported into machine-readable Darwin Core Archives and become available for harvesting and indexing by aggregators (Fig. 8). Furthermore, species occurrences are indexed and made available as a separate dataset in GBIF bearing the article's DOI (Fig. 9), which increases the visibility and citation probability of both the article and the underlying data.
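A Darwin Core Archive is essentially a zip file that holds one or more delimited data files together with a meta.xml descriptor mapping each column to a Darwin Core term. The sketch below assembles such an archive in memory for a handful of occurrence records. It is a simplified illustration under stated assumptions: the file names, the column selection and the function names are hypothetical, the eml.xml metadata document that normally accompanies an archive is omitted, and the code is not the export routine used by the journal.

```python
import csv
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"
# Column 0 serves as the record identifier; the rest map to Darwin Core terms.
FIELDS = ["occurrenceID", "scientificName", "decimalLatitude", "decimalLongitude"]


def build_meta_xml(fields):
    """Describe the core data file so that harvesters can map columns to terms."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">',
        '  <core encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"',
        '        fieldsEnclosedBy="&quot;" ignoreHeaderLines="1"',
        '        rowType="%sOccurrence">' % DWC,
        "    <files><location>occurrence.csv</location></files>",
        '    <id index="0"/>',
    ]
    for index, term in enumerate(fields[1:], start=1):
        lines.append('    <field index="%d" term="%s%s"/>' % (index, DWC, term))
    lines += ["  </core>", "</archive>"]
    return "\n".join(lines)


def build_archive(records):
    """Return the bytes of a zip file holding occurrence.csv and meta.xml."""
    table = io.StringIO()
    writer = csv.DictWriter(table, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.csv", table.getvalue())
        archive.writestr("meta.xml", build_meta_xml(FIELDS))
    return buffer.getvalue()


# Invented record used only to exercise the functions above.
archive_bytes = build_archive([
    {"occurrenceID": "occ-1", "scientificName": "Apis mellifera",
     "decimalLatitude": 42.7, "decimalLongitude": 23.32},
])
print(len(archive_bytes), "bytes written")
```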

Submission of Manuscripts through an Application Programming Interface (API)
A distinctive feature of the ARPHA-XML publishing workflow is the possibility to import complex manuscripts, including metadata, text, figures, tables, references, citations and others, via an API available in ARPHA Writing Tool (Fig. 11, documentation at http://arpha.pensoft.net/dev/). A working example of the workflow is described in the next section.

Creation and Publication of Data Papers from Ecological Metadata Language (EML) Metadata
Data papers, often also called "data articles", "data notes", or similar, were first established by the journals Ecological Archives (published by the Ecological Society of America) and Earth System Science Data (ESSD) (published by Copernicus) (see Newman and Corke 2009, Chavan and Penev 2011). According to the definition of Chavan and Penev (2011), data papers are "scholarly publications whose primary purpose is to describe data, rather than report a research investigation. As such, data papers contain facts about data, not hypotheses and arguments in support of those hypotheses based on data, as found in a conventional research article. Their purposes are threefold: to provide a citable journal publication that brings scholarly credit to data publishers; to describe the data in a structured human-readable form; and to bring the existence of the data to the attention of the scholarly community." The data paper should include several important elements (usually called metadata, or "description of data"), for example:
• Title, authors and abstract;
• Project description;
• Methods of data collection;
• Spatial and temporal ranges and geographical coverage;
• Collectors and owners of the data;
• Data usage rights and licences;
• Software used to create or view the data.
These metadata, if available and deliverable in machine-readable form (XML, JSON, etc.), can be used to produce a "data paper manuscript" that can be submitted to a journal for peer review and publication. The ARPHA approach to data paper publishing was first demonstrated in 2010 in a joint project of the Global Biodiversity Information Facility (GBIF) and Pensoft. This partnership created a workflow (Fig. 12) between GBIF's Integrated Publishing Toolkit (IPT) and Pensoft's journals (ZooKeys, PhytoKeys, Nature Conservation and others). A special module of the IPT automatically generates data paper manuscripts as RTF files from the extended metadata descriptions, at the click of a button. Thereafter, manuscripts can be submitted to a journal for peer review and publication. After publication, the data paper's DOI is linked back to the dataset's DOI at the IPT. In less than three years, more than 100 data papers have been published in Pensoft journals this way (for examples, see the Data Paper subsection above).
Figure 12. Creation of data paper manuscripts from Ecological Metadata Language (EML) metadata hosted at the GBIF IPT.
Recently, the workflow was extended with a direct import of EML metadata, downloadable from the GBIF, LTER and DataONE networks, into a data paper manuscript in ARPHA Writing Tool (Senderov et al. 2016, Penev et al. 2017, see also Fig. 13). The workflow has been thoroughly described in a blog post, while stepwise instructions are available via ARPHA's Tips and tricks guidelines.
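The conversion of EML metadata into a manuscript can be pictured as a mapping from EML elements to manuscript sections. The sketch below reads a few standard EML dataset elements (title, creator, abstract, geographicCoverage) and arranges them under section headings. The section names, the function names and the sample metadata are assumptions made for this illustration; the actual mapping implemented in the GBIF IPT and in ARPHA Writing Tool covers many more elements and may differ.

```python
import xml.etree.ElementTree as ET

# Invented sample metadata using standard EML element names.
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1">
  <dataset>
    <title>Bees of an example region</title>
    <creator>
      <individualName><givenName>Ana</givenName><surName>Example</surName></individualName>
    </creator>
    <abstract><para>Occurrence records collected in 2015.</para></abstract>
    <coverage>
      <geographicCoverage>
        <geographicDescription>Example mountain range</geographicDescription>
      </geographicCoverage>
    </coverage>
  </dataset>
</eml:eml>"""


def text_of(node, path):
    """Return the stripped text found at path, or an empty string if absent."""
    found = node.find(path)
    return (found.text or "").strip() if found is not None else ""


def eml_to_manuscript(eml_text):
    """Map a few EML dataset fields onto (hypothetical) data paper sections."""
    dataset = ET.fromstring(eml_text).find("dataset")
    authors = []
    for creator in dataset.findall("creator"):
        given = text_of(creator, "individualName/givenName")
        family = text_of(creator, "individualName/surName")
        authors.append((given + " " + family).strip())
    paragraphs = [(p.text or "").strip() for p in dataset.findall("abstract/para")]
    return {
        "Title": text_of(dataset, "title"),
        "Authors": authors,
        "Abstract": " ".join(paragraphs),
        "Geographic coverage": text_of(
            dataset, "coverage/geographicCoverage/geographicDescription"
        ),
    }


for section, content in eml_to_manuscript(SAMPLE_EML).items():
    print(section + ":", content)
```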

Use Cases
The ARPHA-BioDiv toolbox has been developed over several years and its tools, workflows and journals are used routinely by thousands of authors, reviewers, editors and readers worldwide. It is virtually impossible to list here the numerous use cases and approaches that have been tested and successfully implemented over the years (see Penev 2017 and Penev et al. 2017 for a review). Below we describe three publishing use cases that were elaborated during the EU BON project.

Expert and Data Mobilisation through the Fauna Europaea Special Issue
One of the major data mobilisation initiatives realised by ARPHA and the Biodiversity Data Journal is the publication of data papers on the largest European animal database, Fauna Europaea, within a new series, "Contributions on Fauna Europaea", launched in 2014. This novel publication model aimed to assemble in a single collection data papers on the different higher-rank taxonomic groups covered by the Fauna Europaea project, together with accompanying papers highlighting various aspects of the project (gap analysis, design, taxonomic assessments etc.) (Jong et al. 2014). Altogether, eleven articles have been published so far.

Expert and Data Mobilisation through the LifeWatchGreece Special Issue
The LifeWatchGreece special collection "LifeWatchGreece: Research infrastructure (ESFRI) for biodiversity data and data observatories" was published in the Biodiversity Data Journal and currently contains twenty-three papers organised in four sections: (1) Electronic infrastructure and software applications; (2) Taxonomic checklists; (3) Data papers and (4) Research articles (Arvanitidis et al. 2016). The Biodiversity Data Journal was chosen because it is a "community peer-reviewed, open access, comprehensive online platform for publishing part of the up-to-date outcomes of LifeWatchGreece and enables the publication of a wide variety of papers (e.g. software descriptions, data papers, taxonomic checklists and research articles) along with the accompanying datasets and supporting material" (Arvanitidis et al. 2016).

EU BON Open Science Collection in RIO Journal
The journal Research Ideas and Outcomes (RIO) was designed to publish all outputs of the research cycle, from research ideas and grant proposals to data, software, research articles and research collaterals, such as workshop and project reports, guidelines, policy briefs, Wikipedia articles and others (Mietchen et al. 2015). In the RIO Journal, EU BON realised one of the first ever open science collections of publications, entitled "Building the European Biodiversity Observation Network (EU BON) Project Outcomes". To date, the collection contains 15 publications.

Biodiversity Data Legal Framework and Policies
The legal framework and policies for publishing and re-use of biodiversity data are a subject of primary interest to the biodiversity community and policy-makers. Several EU BON teams and tasks worked on various aspects of the subject, which resulted in the following set of documents:

Licences for publishing and re-use
This section from the paper of Penev et al. (2017) builds on the fundamental principles of open data publishing and re-use, known as the Panton Principles, and their biodiversity-specific interpretation in the Bouchout Declaration for Open Biodiversity Knowledge Management. The document is supported by a wide range of previously published research and review papers, as well as the data publishing practices of Pensoft and other publishers (Penev et al. 2011a, Hagedorn et al. 2011, Egloff et al. 2014).
The recommended data publishing licence used by Pensoft is the Open Data Commons Attribution License (ODC-By), which is a licence agreement intended to allow users to freely share, modify and use the published data(base), provided that the data creators are attributed (cited or acknowledged). This ensures that those who publish their data receive the academic credit that is due.
Alternatively, other licences, namely the Creative Commons CC0 (also cited as "CC-Zero" or "CC-zero") and the Open Data Commons Public Domain Dedication and Licence (PDDL), are also strongly encouraged for use in the Pensoft journals. According to the CC0 licence, "the person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighbouring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission."

Strategies and Guidelines for Scholarly Publishing
The Strategies and Guidelines for Scholarly Publishing of Biodiversity Data (Penev et al. 2017) have been elaborated during the Framework Program 7 EU BON project on the basis of an earlier version published on Pensoft's website in 2011 (Penev et al. 2011a).
The document discusses some general concepts, including a definition of datasets, incentives to publish data and licences for data publishing. Further, it defines and compares several routes for data publishing, namely (1) as supplementary files to research articles, which may be made available directly by the publisher; (2) in a specialised open data repository with a link to it from the research article; (3) as a Data Paper, i.e. a specific, stand-alone publication describing a particular dataset or a collection of datasets; or (4) through integrated data publishing, i.e. online import/download of data into/from manuscripts, as provided by the ARPHA Writing Tool and its associated journals (Biodiversity Data Journal, RIO Journal, One Ecosystem).
The paper also contains detailed instructions on how to prepare and peer review data intended for publication, listed under the Guidelines for Authors and Reviewers, respectively. Special attention is given to existing standards, protocols and tools to facilitate data publishing, such as the GBIF Integrated Publishing Toolkit (IPT) and the Darwin Core Archive (DwC-A).
Here, we include the table of contents of the document, which will give the reader a comprehensive overview of its content (Penev et al. 2017):
The Strategies and Guidelines are referred to in the Author Guidelines of Pensoft's journals and are used in their everyday publishing practices.

Tutorials, Manuals and Supporting documentation
The current article describes the rationale, overall structure and key elements of ARPHA-BioDiv. The various elements of ARPHA-BioDiv have been featured in several papers (cited in the respective sections of the present document), guidelines, blog posts and tutorials. Below, some important supporting documentation is listed to help users find their way around this complex system.

Future of ARPHA-BioDiv
In the future, we want to reimagine and reinvent the academic publishing process. At the dawn of academic publishing, papers were written exclusively for human consumption. The human mind alone was expected to crunch the data. Now humans rely on computers to store and manipulate the data and verify the correctness of numerical algorithms, whereas our minds focus on the big picture and the story behind the data.
With ARPHA-BioDiv, we have already taken the first few steps in creating articles that can be read both by humans and computers, as has been described so far in this article. However, more can be done. One area of innovation in academic publishing lies in creating linked content, that is, embedding machine-readable database records in each publication that are linked to the world-wide network of linked knowledge hubs. To achieve this goal, we are currently working towards exporting semantically enriched content into a knowledge graph called the Open Biodiversity Knowledge Management System, or OpenBioDiv for short (pro-iBiosphere 2014, Senderov and Penev 2016).
This will enable the reader of an article, for example, to connect published occurrence data to portals such as GBIF and geographic repositories such as GeoNames. One illustration of the value of this integration is an accelerated creation of various models, such as species distribution models, based on the article data. Thanks to the linking of the occurrence data in the article to databases, it will be possible to assemble all the elements needed for a species distribution model of the discussed taxon programmatically in an environment such as R. Moreover, the links in themselves are valuable information and can point to "hot" topics, such as "hot" taxa or "hot" figures, having many incoming links to them (Page 2016). Or, the user may choose to investigate the genetics of the taxon, the occurrence of which they have just seen, through a link to GenBank.
We also believe that a large portion of traditional academic publishing, even if enriched with Linked Data, will be supplemented by nano-publications (Groth et al. 2010, Mons et al. 2011, Chichester 2013). More and more academic research reveals stories and data that cannot be published in the traditional seven-figure paper. Imagine that the research team you are leading has just discovered 500,000 gene-disease associations across the genome of an important domestic animal. You want all of these findings to be first-class research objects, with DOIs just like publications, and not to be relegated to database records that can be altered or deleted. Towards this goal, we are working on nano-publications: first-class research objects with DOIs and metadata including author, publisher, etc., which are published as regular publications but formatted primarily as machine-readable facts that can be ingested by a database without any alterations.
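In the nanopublication model described in the cited literature, a single fact (the assertion) is packaged together with its provenance and its publication information, each in its own named graph, and a head graph ties the three together. The sketch below builds such a structure as a plain list of quads. The predicates from the nanopublication schema (hasAssertion, hasProvenance, hasPublicationInfo) are used as we understand them; the example.org identifiers, the class and function names and the sample gene-disease fact are purely hypothetical, and this is not the format in which ARPHA will publish nano-publications.

```python
from dataclasses import dataclass

NP = "http://www.nanopub.org/nschema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Quad:
    """One RDF statement together with the named graph that contains it."""
    subject: str
    predicate: str
    obj: str
    graph: str


def make_nanopub(base, assertion, provenance, pubinfo):
    """Wrap a fact, its provenance and its publication info in four named graphs."""
    head, a, p, i = (base + "#" + s for s in ("Head", "assertion", "provenance", "pubinfo"))
    quads = [
        Quad(base, RDF_TYPE, NP + "Nanopublication", head),
        Quad(base, NP + "hasAssertion", a, head),
        Quad(base, NP + "hasProvenance", p, head),
        Quad(base, NP + "hasPublicationInfo", i, head),
    ]
    quads += [Quad(*triple, a) for triple in assertion]
    quads += [Quad(*triple, p) for triple in provenance]
    quads += [Quad(*triple, i) for triple in pubinfo]
    return quads


# Hypothetical identifiers; only the vocabulary namespaces are real.
BASE = "http://example.org/nanopub/1"
EXAMPLE = make_nanopub(
    BASE,
    assertion=[("http://example.org/gene/G1",
                "http://example.org/vocab/associatedWith",
                "http://example.org/disease/D1")],
    provenance=[(BASE + "#assertion",
                 "http://www.w3.org/ns/prov#wasDerivedFrom",
                 "http://example.org/dataset/42")],
    pubinfo=[(BASE, "http://purl.org/dc/terms/creator",
              "http://example.org/person/author-1")],
)
for quad in EXAMPLE:
    print(quad.graph.rsplit("#", 1)[-1], "|", quad.subject, quad.predicate, quad.obj)
```

Keeping the assertion separate from provenance and publication information is what allows a database to ingest the fact unchanged while still being able to trace who published it and from which data it was derived.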
Finally, we believe that publishers are stewards of the world's scientific information and that there is knowledge in the totality of the published articles that is not part of any single article. We are working on artificial intelligence algorithms, both from the machine logic domain and from the machine learning domain, to discover this hidden knowledge. The authors of tomorrow will have at their disposal not only a tool to format their manuscript, add citations and mark up their data, but also tools that will discover additional information relevant to the authors' ideas and suggest similar research during the authoring phase.
And, if we can dream very big, why not have artificial intelligence algorithms sophisticated enough to act as a research assistant during the authoring phase? What a marvelous thought!
[...] promotion and PR support; SP: web design; PS: editorial supervision and project management.
combination of software platform and a wide range of associated services. X X
ARPHA serves individual journals or multiple journal platforms. X X
Integrated with the industry-leading indexing and archiving platforms (see list) through web services, APIs and data exchange protocols.
customisable by journal. It can be conventional (either single-blind or double-blind), community-sourced, or public. X X
Online collaborative authoring tool (ARPHA Writing Tool, abbreviated AWT, formerly Pensoft Writing Tool, abbreviated PWT), closely integrated with submission, peer review, production and dissemination tools. X
Collaborative work on a manuscript with co-authors; external contributors, such as mentors; pre-submission reviewers; linguistic and copy editors; or colleagues. The external contributors are not listed as co-authors of the manuscript. X
Large set of pre-defined, but flexible article templates covering many types of research outcomes. X
Online search and import of literature or data references; cross-referencing of in-text citations; import of tables; upload of images and multimedia; assembling images for display as composite figures. X
Automated technical validation step (it can be triggered by authors at any time) checks the manuscript for consistency and for compliance with the JATS standard as well as the journal's requirements. X
Human-based, interactive pre-submission technical check and validation tool helps authors to bring their manuscripts to a form almost ready for publication. X
Pre-submission external peer review(s) performed during the authoring process. The pre-submission peer reviews are submitted together with the manuscript to prompt editorial evaluation and publication.

Figure 3. Article formats available in ARPHA Writing Tool.
Figure 4. Examples of use of the domain-specific XML markup in the published articles. a: Interactive mapping of geo-coordinated species occurrences (example from Frolov and Akhmetova 2013). b: Pensoft Taxon Profile (PTP) is created in real time by clicking on any taxon name mentioned in an article (in this case Annoniaceae from Hoekstra et al. 2016). c: Images and pages from historic literature where a taxon name has been mentioned are available from various sources (e.g. Encyclopedia of Life and the Biodiversity Heritage Library) via Pensoft Taxon Profile (PTP) (in this case Annoniaceae from Hoekstra et al. 2016). d: All taxon name usages (TNU) in an article are indexed and matched to their type of use (e.g. citations in the text, heading a taxon treatment, associated with images or present in identification keys; example from Brown et al. 2017).

Figure 5. Integrated data and narrative publishing in the ARPHA-XML journal workflow.

Figure 7. Extraction and delivery of data and content from published articles to aggregators, nomenclators, archives, and indexers.

Figure 8. Export of data from articles published in Biodiversity Data Journal. Species occurrences and other structured data tables can be downloaded in CSV format (green arrow); all species occurrences are also available as Darwin Core Archives and are automatically harvested and indexed by GBIF (red box and arrow).

Figure 9. The occurrence data from articles published in the Biodiversity Data Journal (in this case from the paper of Johnson 2013) are automatically indexed via Darwin Core Archive in the GBIF Integrated Publishing Toolkit.

Figure 10. Data extraction and re-publishing workflow of the Advanced Books platform.

Figure 11.Submission of manuscripts to ARPHA Writing Tool through Application Programming Interface (API).

Figure 13.Conversion of Ecological Metadata Language (EML) metadata into data paper manuscripts in ARPHA Writing Tool.
The last two documents summarise the effort and can serve as guidelines and recommendations in the work of the Group on Earth Observations Biodiversity Observation Network (GEO BON) and beyond.