The project EcoNAOS: vision and practice towards an open approach in the Northern Adriatic Sea ecological observatory

The Open Science approach delineates high and inspiring principles. In order to really root them into the scientific community, a cultural shift must occur that can be fostered and achieved mainly through the development of practical case studies. This is particularly relevant in the field of ecology, a highly multidisciplinary science, where the Open Science issue has become a matter of discussion only in very recent years. In particular, in the context of long-term ecological research, enabling open sharing of in-situ and derived long-term ecological data is required to advance research and education in the European and global networks.



Introduction
A critical global and regional issue is the human-induced change of the marine ecosystem, which threatens its sustainable use by present and future generations. Significant challenges persist in developing and delivering the sound ecological knowledge necessary to assess the ongoing changes, their impact on the sustainable use of the sea, and their consequences for protection strategies. This requires an innovative integration of ecological and oceanographic research with conservation monitoring programmes, across a wide range of temporal and spatial scales. The creation of marine ecological observatories, able to arrange and maintain integrated, harmonized and coherent long-term ecological observations, is stressed as a relevant step at the European level for sustaining European marine policies (see, e.g., the EASAC and JRC (2016) policy report and the recommendations from the G7 expert workshop on the future of the oceans and seas, G7 Science Ministers 2016). Indeed, the most significant issues concerning the sea are fundamentally biological and socio-ecological, since they are related to its role as a source of food, health and human well-being.
The Northern Adriatic Sea (NAS) is a significant geographical zone for the establishment of a marine ecological observatory, due to the concomitant presence of a high degree of biodiversity, sensitive habitats and ecosystems, numerous ongoing monitoring and research activities, as well as heavy and diversified human pressures and economic interests based on the marine resources of the area. It is one of the 25 parent sites belonging to the LTER-Italy (Long-Term Ecosystem Research) network and is composed of 4 LTER research sites (Fig. 1, https://data.lter-europe.net/deims/site/lter_eu_it_012), where meteo-oceanographic and biological data, mainly on plankton, are gathered through both fixed-point observatories and oceanographic cruises. LTER is based on gathering and analysing multi-decadal ecological observations and data to support understanding and managing the environment. It provides a sound basis of data and knowledge that is particularly important in our rapidly changing world, where processes like climate change, land and sea exploitation, and global trade are dramatically affecting the environment, altering the ecosystem structure and functioning that underpin the ecosystem services we depend on (e.g. provision of food and water, air and water quality, and the aesthetic value of a landscape). LTER is organised in networks of sites at the national, regional (LTER-Europe) and global (ILTER) levels. LTER-Italy, a formal member of LTER-Europe and LTER-International since 2006, consists of 79 research sites belonging to terrestrial, freshwater, transitional and marine ecosystems, managed and coordinated by public research and monitoring institutions and universities.
Figure 1. The LTER-Italy parent site "Northern Adriatic Sea". The four research sites that compose it, together with the fixed-point observatories, are highlighted. 1: Gulf of Trieste and Mambo buoy; 2: Gulf of Venice and Acqua Alta Tower; 3: Po Delta and Romagna Coast and S1-GB and E1 buoys; 4: Senigallia-Susak Transect and TeleSenigallia Pylon (Minelli et al. 2018).
The Italian national flagship project RITMARE ("Italian research for the sea"), funded by the Italian Ministry of University and Research, dedicated a Research Line to the establishment of a marine ecological observatory in the NAS: building on the existing facilities, infrastructures and long-term ecological data, it aims at enhancing and aligning the marine observational capacities and at activating synergies among the main conservation management questions and key ecological and oceanographic variables.
Throughout this process, it is crucial to start implementing the Open Access and Open Science principles by creating an open research lifecycle, which involves sharing each step of the process, including not only results (scientific papers) but also data (raw and processed), metadata, methods and software. We introduce here each of the steps planned for "Opening Science" in this context: research ideas statement, raw data collection, data harmonization (structural, syntactic and semantic), ancillary data collection/recovery and metadata creation, data and metadata publication, software publication, publication of results, and data citation. The whole process of applying the Open Science principles to the NAS ecological observatory is referred to hereafter as the project "EcoNAOS" (Ecological Northern Adriatic Open Science Observatory System).
The Open Science concept dates back to the early 1980s, when the need for Open Source code was first expressed and formalized by R. M. Stallman with the GNU project (Stallman 1998) and the Free Software Foundation. More recently (Ross and Krumholz 2013, Tessarolo et al. 2017), a tendency arose to change the common practice of publishing scientific results in closed environments, which is still the dominant one despite the pioneering goal of the World Wide Web of allowing the free circulation of knowledge (Berners-Lee 1989). Although the concept of Open Access has been mentioned and formalized since 1957, with the institution of the World Data System by the International Council for Science (Ruttenberg 1992), its full application has been hindered by the scientists' attitude of protecting their data and associated knowledge as private property, and by the business interests surrounding the publication of scientific articles in a restricted and certified number of scientific journals. Until the early 2000s, most scientific research outputs and data, and more generally global knowledge, were still under the control of a few research institutions, governments and publishers (David 1998, David 2003). Something started to change with the publication of milestone papers and books in the history of Open knowledge, such as "The cathedral and the bazaar" by Raymond (1999). This change was supported in 2002 by the Budapest Open Access Initiative, which formalized the concept of Open Access with respect to the newly available technologies, and in 2003 by the "Berlin Declaration" on Open Access in the sciences and humanities (Various Authors 2003). Broadening the principles defined in the two fundamental pillars of Open Source and Open Access, the concept of Open Science came to light, and was defined by the European-funded project FOSTER as "the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods". From this definition it is possible to argue that Open Data represent a crucial part of Open Science (European Commission 2016a). Klump et al. (2006) affirmed that data should follow Open Access principles and, in recent times, the European Union, with the Horizon 2020 programme, stated that all European research must be "Open by Default". This means that not only the outcomes and results of the research must be accessible to everyone, but also data, code and tools, in order to make science Findable, Accessible, Interoperable and Reusable (F.A.I.R., Wilkinson et al. 2016, Moedas 2016).
Open Science is also one of the key thematic elements of Responsible Research and Innovation (RRI; Owen et al. 2012), which recognizes that making research results more accessible contributes to better and more efficient science and to innovation in the public and private sectors. Open Science is the route towards a democratic way of making research ideas, data, metadata, tools and outcomes freely available to every researcher or stakeholder.
From the researchers' point of view, open practices can bring advantages such as more citations, media attention, potential collaborators, and job and funding opportunities (McKiernan et al. 2016).
In the field of ecology, the Open Science issue has become a matter of interest and discussion only in quite recent years (Reichman et al. 2011). Ecology, being a highly multidisciplinary science, stands to benefit from the Open Science revolution. However, the cultural shift from "data ownership to data stewardship" is not yet widely accomplished, and data sharing standards, from both the technical and the ethical point of view, still have to be established. An overview of Open Science practices in ecology was recently given by Hampton et al. (2015). In the LTER context, enabling open sharing of in-situ and derived long-term ecological data is important in order to advance research and education. LTER data are characterised by long and continuous time series; data consistency and integrity are therefore crucial to enable scientists to identify reliable trends. Since observation technologies, precision and methods change over long periods in LTER networks, the maintenance of these datasets is a challenge that is also addressed at the European and international level. LTER data must be recognized as a research output with formal citation links, to promote data reuse and acknowledgement of originators, in agreement with the F.A.I.R. principles for data management and Open Science (European Commission 2016b).
The Open Science approach delineates high and inspiring principles to be followed. However, in order to really root them into the scientific community, a cultural shift must occur. This can be fostered and achieved mainly through the development of practical case studies and projects, where cultural resistances and uncertainties can be made visible, appreciated instead of neglected, and included as parts of the path towards Open Science. In this article we present plans and ideas for the application of the Open Science principles to the marine domain in LTER-Italy, in the context of the establishment of the NAS marine ecological observatory. The focus will be, as a starting point, on the long-term data on plankton and related abiotic factors produced in 50 years of oceanographic cruises in the NAS, mainly in one of its 4 research sites: the Gulf of Venice (Fig. 1). However, we do not consider only the data, but each of the steps of the research lifecycle (Figs 2, 3) needed for "opening science": research ideas statement, raw data collection, data harmonization (structural, syntactic and semantic), ancillary data collection/recovery and metadata creation, data and metadata publication, software publication, publication of results, and data citation. Our final goal is to demonstrate that a change of vision is possible, leading from "publishing ASAP" to "sharing data and information and collaborating ASAP" (Moedas 2015). From its start, the whole process has involved both LTER and data management researchers in a joint partnership, in order to share and give value to the different points of view and expertise and to include cultural resistances and practical challenges as essential parts and outcomes of the process itself.

Rationale and Data Description
Publishing and sharing the different phases of the research process involves not only results, but also ideas, data collection and elaboration, metadata and the construction of tools. The many possible integrations, reuses and different implementations in the frame of Open Science could lead to growth, improvement and widening of the research process as a whole. Starting from the circular model conceived by Rüegg et al. (2014) to describe the research life cycle, and integrating the key role of metadata and Open Science, it is possible to rethink the life cycle of data as a growing spiral model (Fig. 3). A representation of the open scientific process lifecycle described by Sarretta (2016) is given in Fig. 2.
The spiral model can be applied to many different research topics, because it follows the entire lifecycle of the research, from data collection to the publication of results. We used as input a specific dataset: 50 years of data on abiotic parameters, phytoplankton and zooplankton, gathered from 1965 until 2015 at the LTER-Italy parent site NAS, mainly in the Gulf of Venice, from oceanographic cruises and, more recently, fixed observation points. The dataset has been produced by the Institute of Marine Sciences of the National Research Council of Italy (ISMAR-CNR), located in Venice, using different methodologies, which changed according to the sampling and laboratory instruments and technologies available at different times, or according to research interests. The dataset comprises parameters that are quite heterogeneous in type, number, geographical distribution and semantics. An example is reported in Fig. 4a, b, c. The parameters are both abiotic and biotic: water temperature, transparency (Secchi disk), pH, dissolved oxygen, dissolved macronutrients, phytoplankton chlorophyll-a, and phytoplankton and zooplankton abundance.
The data have been stored in paper registries and in spreadsheets. Old data have been recovered from paper registries and transcribed. Due to its relatively small size (about 60,000 records), the dataset is currently managed as a single spreadsheet. Raw data (coming directly from sensors), semi-elaborated data (first pruned data, e.g. from on-board instruments) and elaborated data (e.g. coming from laboratory analysis) are all available.
The elaboration of the datasets will be carried out by a small group of plankton ecologists and data management experts, with the additional aim of sharing and harmonizing their different experiences, needs and points of view. This collaboration is an important step that helps to overcome the cultural differences, barriers and fragmentation that might represent an obstacle for Open Science. Including the different attitudes and specificities allows flexibility and fosters the joint search for proper solutions and incentives.

Implementation
A participatory process will be set up for the whole duration of EcoNAOS. For the project to be successful, it is fundamental to involve and interest each researcher working with the specific dataset, receiving and accepting feedback, suggestions and evaluations at each step of EcoNAOS; the steps are driven by the spiral model and described in detail in the following paragraphs. The applicability of the concepts and outcomes of EcoNAOS will be tested by involving researchers from the LTER-Italy network, in order to verify, within a wider but related community, the effectiveness of the opening process, its limits and its strengths. It is important to understand perceptions and barriers (technical or cultural) with regard to Open Science, and in particular Open Access applied to data, in order to find the right way to communicate Open Science and try to overcome existing obstacles.
During the whole project, we will collect impressions on the EcoNAOS ideas and development through workshops, involving institutions and research groups dealing with similar data at the LTER research sites of the NAS and at another LTER-Italy research site in the Tyrrhenian Sea (LTER Marechiara site). In these workshops we also plan to propose that our colleagues test the opening process by calculating some relevant environmental quality indicators, which each institution or group could compute using its own LTER dataset.
EcoNAOS will be organized in six main steps:
First step - Harmonization of the database. A complete harmonization, aimed at real interoperability, includes standardisation of the semantics, schemas and structure of a database (Kwok 2010). This harmonization will be carried out automatically as far as possible. This implies the implementation of appropriate tools to clean up, re-order and organize (where necessary) the database. Procedures will be developed to automatically harmonize the semantics and structure of sampling stations. The clean dataset will be used for both research and management purposes, by research institutions and environmental agencies.
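As a rough illustration of the kind of automatic clean-up described in the first step, the sketch below harmonizes column headers, station codes and numeric cells of a small table. Everything specific in it is an assumption made for the example and not part of the EcoNAOS dataset: the alias table, the harmonized column names, the station-code rule, and the semicolon-delimited layout with decimal commas.

```python
import csv
import io
import re

# Hypothetical mapping from legacy column headers to harmonized names.
# A real vocabulary would be agreed on by ecologists and data managers.
COLUMN_ALIASES = {
    "temp": "water_temperature_degC",
    "temperature": "water_temperature_degC",
    "secchi": "secchi_depth_m",
    "station": "station",
    "date": "date",
}


def harmonize_header(name):
    """Map a legacy header to its harmonized name; unknown headers are kept, lower-cased."""
    key = re.sub(r"\s+", " ", name.strip().lower())
    return COLUMN_ALIASES.get(key, key)


def harmonize_station(code):
    """Normalize codes such as ' st. 1a ' or 'ST_01A' to 'ST01A' (illustrative rule only)."""
    cleaned = re.sub(r"[^A-Za-z0-9]", "", code).upper()
    match = re.fullmatch(r"(?:STATION|ST)?0*(\d+)([A-Z]?)", cleaned)
    if match is None:
        return cleaned  # leave unrecognized codes for manual review
    number, suffix = match.groups()
    return f"ST{int(number):02d}{suffix}"


def parse_number(text):
    """Parse a numeric cell, accepting a decimal comma; empty cells become None."""
    text = text.strip()
    if not text:
        return None
    return float(text.replace(",", "."))


def harmonize_table(raw_csv, delimiter=";"):
    """Return a list of records with harmonized headers, station codes and numbers."""
    reader = csv.reader(io.StringIO(raw_csv), delimiter=delimiter)
    header = [harmonize_header(h) for h in next(reader)]
    records = []
    for cells in reader:
        record = {}
        for key, value in zip(header, cells):
            if key == "station":
                record[key] = harmonize_station(value)
            elif key == "date":
                record[key] = value.strip()
            else:
                record[key] = parse_number(value)
        records.append(record)
    return records
```

Codes that do not match the assumed pattern are returned unchanged rather than guessed, so that they can be reviewed by the researchers who know the sampling history.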
Second step - Description of data through the collection of all the available metadata, such as information about the specific sampling cruise, the instruments and probes used, the researchers and operators involved, and so on. These metadata are expected to be found in paper or digital archives. The aim of this task is not only to better describe and enrich the information related to specific data, but also to build fundamental and affordable tools for the subsequent data quality and consistency checks, and for assessing the reliability and actual meaning of each sampled datum.
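The metadata listed in the second step could be held in a small record type that also reports which items are still missing, which may help when recovering information from archives. This is only a sketch: the class name, the field names and the example values are hypothetical and do not reflect any metadata profile adopted by the project.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SamplingMetadata:
    """Hypothetical metadata record attached to a group of samples."""
    cruise: str = ""            # identifier of the sampling cruise
    instrument: str = ""        # instrument or probe used
    operators: list = field(default_factory=list)  # researchers and operators involved
    source_document: str = ""   # paper registry or digital archive the entry came from


def missing_fields(meta):
    """Return the names of the fields that are still empty, in declaration order."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name)]


# Usage with made-up values: two items still have to be recovered from the archives.
example = SamplingMetadata(cruise="cruise-1972-05", instrument="Secchi disk")
still_missing = missing_fields(example)
```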
Third step - Integration of all the data in the well-established technological infrastructure created in the wider RITMARE project (Fugazza et al. 2014). This infrastructure is modelled on standard web services, the cardinal elements of a Spatial Data Infrastructure (SDI), and has been implemented for the exchange and use of marine geospatial data in an interoperable way. Following the Service Oriented Architecture (SOA) adopted and recommended by the INSPIRE Directive (2007/2/EC, Network Services Drafting Team - EU 2007), geospatial web services can share different types of marine geodata through W3C (World Wide Web Consortium), ISO (International Organization for Standardization), OGC (Open Geospatial Consortium) and GEOSS (Global Earth Observation System of Systems) standards. To increase the capacity for preservation, publishing and discovery of data (Fig. 2), the Open Source software suite GET-IT (Geoinformation Enabling ToolkIT starterkit®, Oggioni et al. 2017) and the customizable, template-driven metadata editor EDI (Pavesi et al. 2016, Tagliolato et al. 2016) have been developed in RITMARE. These tools, tested in some LTER-Italy mountain sites (NextData project) and adopted as a data node in the H2020 project "eLTER", have been released as Open Source software and can be used for a wider documentation, publication and sharing of data, which is one of the aims of EcoNAOS. They have already been tested using small portions of oceanographic data coming from the Gulf of Venice (Bastianini et al. 2015) and were reported to perform well in the presentation, update, querying and extraction of data. In our conception, the data life cycle is fully integrated into the research life cycle, and the release of the dataset is foreseen at different research stages (for different dataset versions). At the same time, EcoNAOS must not represent an increase of work for the researcher nor an imposition on data management, so the internal handling of data (tools and infrastructure used to perform research, not aimed at sharing data or results) inevitably remains a prerogative of single researchers and research groups.
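To give an idea of what access through standard web services looks like from the user's side, the sketch below builds a key-value request for an OGC Web Feature Service (WFS 2.0) `GetFeature` operation. The choice of WFS is an assumption made for illustration, since the text only refers to OGC standards in general; the endpoint URL and the layer name are likewise hypothetical and do not point to an actual RITMARE or GET-IT service.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_getfeature_url(endpoint, type_name, count=100):
    """Build a WFS 2.0 GetFeature request URL using key-value parameters."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,  # layer (feature type) to retrieve
        "count": count,          # maximum number of features returned
    }
    return f"{endpoint}?{urlencode(params)}"


def query_parameters(url):
    """Return the decoded query parameters of a request URL, for inspection."""
    return parse_qs(urlsplit(url).query)


# Hypothetical endpoint and layer name, used only to show the shape of the request.
url = build_getfeature_url("https://example.org/ows", "econaos:plankton_samples", count=10)
```

Because the request is just a URL with standard parameters, any client that understands the same standard can retrieve the data without knowing how they are stored internally.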
Fourth step (cross-cutting all the others) - Publication of ideas, results, tools, data and metadata after an accurate Open Science framing. In this phase, it is important to choose the best tools in relation to the principles that guide our work and to the structure of our data. It is primarily important to us that everything can be published as soon as possible and in an Open Access environment (preferably gold road Open Access). Some solutions have already been studied: one example is the choice of the RIO journal to publish these research ideas, a decision we took because it is Open Access and has a fully open and citable review process; another preferred choice is the use of the GitHub platform to share the code as it is written. Some other solutions are still under discussion.
Fifth step - Implementation of a proper data citation system that also deals with the dynamic character of long-term data and with versioning. The issue of citing a dynamic and growing dataset (or portions of it) will soon arise. Querying an oceanographic database temporally, semantically or in relation to one or more numerical parameters is not yet a standard process, and no specific rules or protocols have been created for this purpose (Belter 2014); this reflection opens a wide and interesting perspective for research in the field of oceanographic data science. On more generic scientific data, a remarkable attempt at standardization has been made by the data citation working group of the Research Data Alliance (RDA). We will start, in particular, from the 14 rules defined by RDA to support accurate citation of data subject to change, in order to ensure the efficient processing and permanent linking of data (Rauber et al. 2016).
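One way to picture the citation of a growing dataset discussed in the fifth step is to keep every record with the time it was added, and to store each cited query together with its execution time and a checksum of its result. The sketch below follows that general idea in a simplified form. It is not the RDA specification nor the system planned for EcoNAOS, and the class, the field names and the values are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def _checksum(rows):
    """SHA-256 of the result set, serialized with sorted keys for stability."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class VersionedDataset:
    """Append-only store; updates and deletions are not handled in this sketch."""

    def __init__(self):
        self._records = []     # (inserted_at, record) pairs; nothing is overwritten
        self.query_store = {}  # query identifier -> stored query description

    def add(self, record, inserted_at):
        self._records.append((inserted_at, dict(record)))

    def select(self, parameter, as_of):
        """Return the records for one parameter as they existed at time `as_of`."""
        rows = [r for t, r in self._records
                if t <= as_of and r["parameter"] == parameter]
        # Stable ordering, so the same query always yields the same checksum.
        return sorted(rows, key=lambda r: (r["date"], r["station"], r["depth_m"]))

    def cite(self, parameter, executed_at):
        """Store the query, its timestamp and a result checksum; return its identifier."""
        rows = self.select(parameter, executed_at)
        query_id = f"query-{len(self.query_store) + 1}"
        self.query_store[query_id] = {
            "parameter": parameter,
            "executed_at": executed_at,
            "sha256": _checksum(rows),
        }
        return query_id

    def resolve(self, query_id):
        """Re-run a cited query against the data as of its original timestamp."""
        entry = self.query_store[query_id]
        rows = self.select(entry["parameter"], entry["executed_at"])
        if _checksum(rows) != entry["sha256"]:
            raise ValueError("result set no longer matches the cited checksum")
        return rows


# Usage with made-up values: the citation keeps pointing to the 2016 state.
UTC = timezone.utc
dataset = VersionedDataset()
dataset.add({"parameter": "water_temperature", "station": "ST01", "date": "1972-05-03",
             "depth_m": 0.0, "value": 18.5}, datetime(2016, 1, 1, tzinfo=UTC))
query_id = dataset.cite("water_temperature", datetime(2016, 6, 1, tzinfo=UTC))
dataset.add({"parameter": "water_temperature", "station": "ST01", "date": "1972-06-10",
             "depth_m": 0.0, "value": 21.0}, datetime(2017, 1, 1, tzinfo=UTC))
cited_rows = dataset.resolve(query_id)
```

In a real system the query identifier would be a persistent identifier and the store would also track corrections to existing records; both are left out here to keep the example short.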
Sixth step - Development of guidelines aimed at better explaining the whole opening process. The guidelines can only be developed after all the other steps; they are useful to clarify each step for future applications, to summarize the strengths and weaknesses of EcoNAOS, and to express a final opinion on the efficiency and utility of the whole process.

Expected Impact
EcoNAOS can deliver an effective contribution to the application of Open Science in marine ecology since:
1. It involves scientists from different research areas, fields and institutions, and data management experts, thus representing an example of cooperation under the Open Science principles;
2. It aims to introduce a set of best practices, based on a practical case study, for research and data sharing in marine ecology, again under the Open Science principles;
3. It will allow the release of data, metadata and instruments to the marine scientific community and to environmental managers.
This activity is fully in line with the data management plan of the LTER networks at the national, European and global level, since one of the LTER mandates is to foster open sharing of LTER data in order to advance socio-ecological research and education (Mirtl 2010, Mirtl et al. 2018). The ultimate impact of EcoNAOS on the LTER and on the wider community of marine ecologists will be related to a gradual change in the way the opening of the research process is perceived. We aim to demonstrate that the model of Open Science works for static and dynamic datasets, and that data sharing does not represent a threat to data integrity or attribution but, on the contrary, could lead to qualitative and quantitative integration of the dataset. This supports the view that an Open Access work has greater scientific resonance than a closed-access one (Swan 2010). From this perspective, it is interesting to test the strengths and the limits of the Open Access approach; this is important not only to make data available to everyone, following the Open Science principles, but also to understand which barriers (if any) are present. For example, a crucial point could be the release of sensitive data, or the release of an evolving database. In this respect, a data policy must be conceived and shared among the involved parties at the beginning and during the whole process.
Lastly, this pilot project does not aim to force the complete opening of any ongoing or future research process. Instead, its goal is to provide an overview, as detailed as possible, of the current situation regarding data opening, and to test the feasibility of the process, identifying short- and long-term perspectives, existing obstacles and solutions. EcoNAOS will also offer support to researchers in data management, using an integrated approach and stimulating cooperation among all the researchers involved, collecting their requests, doubts and observations on the wider Open Science principles applied to LTER marine themes.

Figure 3. The spiral model: an open research lifecycle involves sharing each step of the process, including not only scientific papers but also research ideas, data (raw and processed), metadata, methods and software (Minelli et al. 2017).
Figure 4. An extract from the data collected in the NAS from 1965 to today. a: snapshot of the organization of the main data table. b: 2D visualization of custom sampling stations (used from 1965 to 1990, before the advent of GPS) and real sampling points (until 2015). c: 3D visualization of the data from the viewpoint in b; the yellow grid represents the sea level, and each set of vertical points represents multiple samples sharing the same (X, Y) coordinates but taken at different depths (Z).