Research Ideas and Outcomes: Commentary
Corresponding author: Teresa Gomez-Diaz (teresa.gomez-diaz@univ-mlv.fr), Tomas Recio (trecio@nebrija.es)
Received: 02 Feb 2021 | Published: 05 Feb 2021
© 2021 Teresa Gomez-Diaz, Tomas Recio
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Gomez-Diaz T, Recio T (2021) Open comments on the Task Force SIRS report: Scholarly Infrastructures for Research Software (EOSC Executive Board, EOSCArchitecture). Research Ideas and Outcomes 7: e63872. https://doi.org/10.3897/rio.7.e63872
The goal of this document is to openly contribute our comments to the EOSCArchitecture report: Scholarly Infrastructures for Research Software (SIRS), and thus to participate in the design of the European Open Science Cloud (EOSC) architecture.
Keywords: Open Science Infrastructures, Research Software, Open Science
The goal of this document is to openly contribute our comments to the EOSCArchitecture report: Scholarly Infrastructures for Research Software (SIRS). Its draft version was dated October 2020 and was open for consultation as a Google document until November 10th 2020, as announced on the EOSC Symposium web page (19-22 October 2020, Online)*
The first version of our Open comments on the Task Force SIRS report... (
Political evolutions in digital policy have been recently announced by Ursula von der Leyen, President of the European Commission (
In this section we propose a list of comments on the SIRS report. Each comment is associated with one section or subsection of the report.
Please note that some short extracts of the SIRS report have been included here and are, thus, out of context. We recommend consulting the original text, as mistakes or misinterpretations may have been unintentionally introduced in the present text. These extracts now correspond to the official publication, and we have updated our comments accordingly.
Section 2.1 Scope and goals - Research software definition
Section 2.1 of the SIRS report introduces the concept of research software used in the report as follows:
... the term “research software” may carry very different meanings in different research communities: in this report, we will use this term simply to designate software that researchers in any discipline may feel the need to have scholarly infrastructure support for, no matter if it is considered a tool, a result or an object of study.
Please note that this definition does not draw any distinction between the concepts of software and research software: the term research software, as proposed in the SIRS report, could easily include, for instance, any version of the Windows operating system or commercial scientific software such as Matlab, and many other similar products. Moreover, according to this definition, all software developed since 1960 should be considered research software, as a researcher in the history of science (for example) could feel that it should be preserved as an object for future studies.
It seems to us that defining EOSC infrastructures and services on the basis of this view of research software requires some essential clarifications. Otherwise, using strictly the above definition of research software, how could EOSC teams design suitable services? For which target community? In particular, careful attention must be paid when dealing with software produced by private companies. It seems to us that these questions are either not correctly presented or missing in the report.
On the other hand, the European Open Science Cloud (EOSC) is presented in (
The EOSC will be a fundamental enabler of Open Science and of the digital transformation of science, offering every European researcher the possibility to access and reuse all publically funded research data in Europe, across disciplines and borders.
In addition, we have proposed the following definition of Open Science in a recent preprint (
Open Science is defined as the political and legal framework where research outputs are shared and disseminated in order to be rendered visible, accessible, reusable.
In line with this Open Science vision, we propose another definition of research software, as follows:
... research software is a well identified set of code that has been written by a (again, well identified) research team. It is software that has been built and used to produce a result published or disseminated in some article or scientific contribution. Each research software encloses a set (of files) that contains the source code and the compiled code. It can also include other elements as the documentation, specifications, use cases, a test suite, examples of input data and corresponding output data, and even preparatory material.
You can find this definition and all the considerations for its proposition in section 2.1 of (
Thus, the research software definition in (
One of the authors of the present document provides a good example for further study of this research software concept. T. Recio is currently studying the automatic proving of geometric theorems through dynamic geometry software, and comparing current work with earlier work done with very old software, such as the Logo programming language*
The SIRS report should clarify whether historical tools and commercial software should be considered part of the objects for which EOSC should provide infrastructure and services, and how commercial software should be dealt with.
Section 2.2 Infrastructures participating in the Task Force (TF)
This section presents summary sheets introducing the nine infrastructures represented in the SIRS report: three in the Archives category (HAL, Software Heritage, and Zenodo), three in the Publishers category (Dagstuhl, eLife, and IPOL), and three in the Aggregators category (OpenAIRE, ScanR, and swMath).
For all nine infrastructures, we suggest adding the following information, which could be presented in a homogeneous way:
This Section 2.2 also indicates that:
In the context of this report we use the term [...] ‘Publishers’ are organizations that prepare submitted research texts, possibly with associated source code and data, to produce a publication and manage the dissemination, promotion, and archival process. Software and data can be part of the main publication, or assets given as supplementary materials depending on the policy of the journal. In addition, publishers implement a process for ensuring the quality of the accepted research material (usually peer Review), which is carried out by a subject-specific community of experts.
The report could clarify whether these publications include data papers and software papers, and whether the software mentioned in this paragraph corresponds to the research software defined in Section 2.1 of the SIRS report (see our comment on that section above). For readers unfamiliar with data or software papers, examples of scientific journals publishing biodiversity-related data papers can be found on the Global Biodiversity Information Facility (GBIF) website*
Section 3.1 Survey on Related Initiatives and Related Works
The SIRS report indicates that:
... it seems that general awareness about the importance of software as a research output has started growing only very recently, around 2010, in particular as a byproduct of the reproducibility crisis (Barnes, 2010; Borgman et al., 2012; Colom et al., 2015; Konrad Hinsen, 2013; Rougier et al., 2017; Stodden et al., 2012).
Please note that, at least in France, there were earlier initiatives. The PLUME Project (2006-2013), launched by the UREC CNRS unit, studied research software and the conditions of its dissemination. It also published research software descriptions and validated software descriptions (
At the time, the term research software did not really exist, or at least was not widely used, so the PLUME Project used the term logiciel d’un laboratoire, that is, software produced in a French research lab, and the term dév. Ens Sup - Recherche or dév. ESR, short for développements de l’enseignement supérieur et la recherche, that is, developments carried out in the Higher Education and Research community.
This project was well known in France, as members of the PLUME team gave many conference talks and wrote many publications presenting the project (
Section 3.1.3 Aggregators
The SIRS report indicates that:
Another remarkable example is the catalog built by the Plume project in order to collect information about software that is useful for research activities (Plume, 2013): it maintains a collection of over 400 entries manually curated about software projects that are successfully deployed and in use in at least three different research laboratories.
Please note that the PLUME Project has published 406 validated software descriptions*
Section 3.2.2 Publishers
The SIRS report mentions that:
Over the past few years several publishers have led the effort in the transition towards open access as the predominant model of publication for scholarly outputs. This also paves a path for fair and affordable conditions from the start for the dissemination of software, but support for software outside of specialist journals is still limited.
As presented in (
The SIRS report could clarify what is meant by software support provided by publishers.
Please also note that Section 2.4 of (
Section 3.2.3 Aggregators
The SIRS report mentions here the OpenAIRE Research Graph. Please note that there is also the Software Heritage Graph Dataset presented in (
The OpenAIRE Research Graph was featured in the OpenAIRE end-of-year 2020 newsletter*
The SIRS report could provide further insight into how these two graphs compare and how the two efforts could interact.
Section 3.3 Best Practices and Open Problems
This section includes three tables that are surely the result of extended discussions within the working groups. Please note that, for readers who have not been involved in the debates, these tables (and many of the terms and expressions used in them) need further explanation. For example, the table in Section 3.3.1 Best Practice Principles for Archives includes a last column entitled Priorities, indicating levels of Development, Adoption, Research, and Harmonization. These terms remain unclear and could be placed in a more precise context.
Section 3.3.1 Best Practice Principles for Archives
This section indicates that:
... one does not need to reinvent the wheel, the archival community should agree on an overall architecture to integrate existing infrastructures.
We would like to express our agreement with this idea, which we have found very inspiring. We are experts neither in archiving services nor in data processing or management, but it seems to us that archiving software source code may share many issues with data archiving, and thus both services could be compared and put into perspective. The SIRS TF team could consult with members of the EOSC-HUB project*
In this section we can also find:
Last, the ideal architecture interconnecting a variety of infrastructures for research software needs inclusiveness of archives for both open software, as well as non-open software, and the ability to ensure the universal archival and reference of the source code of all software, not just research software.
As already mentioned in our comment on Section 2.1, the definition of research software used here does not, in our view, make any distinction from the concept of software. In particular, it seems difficult to design and build EOSC infrastructures and services in such a way that there is no difference between software produced by private companies and the research software produced in our university labs. Should EOSC infrastructures ask private companies to produce references, metadata, and links to related research articles and research data for the software they produce?
Section 3.4.2 Identifiers
This section discusses the need for proper identification of software artifacts and presents the SoftWare Heritage persistent identifiers (SWHIDs) as a candidate solution.
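For readers unfamiliar with SWHIDs, the following minimal Python sketch, which is our own illustration and not part of the SIRS report, shows the shape of a core SWHID (swh:1:<object type>:<hash>) and how one could be computed for a single file. It assumes the published SWHID specification, in which the hash of a content object (type cnt) coincides with Git's SHA-1 blob hash. The function names are hypothetical, and the identifiers of directories, revisions, releases and snapshots (types dir, rev, rel, snp) follow more elaborate rules that are not shown here.

```python
import hashlib
import re

# Core SWHID syntax (assumed from the SWHID specification):
# "swh:1:<object type>:<40 lowercase hex digits>", where the object type is
# one of content, directory, revision, release or snapshot.
CORE_SWHID = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})")


def content_swhid(data: bytes) -> str:
    """Return the core SWHID of a file's raw bytes (hypothetical helper).

    For content objects the hash is Git's blob hash: SHA-1 over the
    header b"blob <length>" plus a NUL byte, followed by the file bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def parse_core_swhid(swhid: str) -> tuple[str, str]:
    """Split a core SWHID into (object type, hash); raise ValueError if malformed."""
    match = CORE_SWHID.fullmatch(swhid)
    if match is None:
        raise ValueError(f"not a core SWHID: {swhid!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    source = b'print("hello")\n'
    swhid = content_swhid(source)
    print(swhid)                    # swh:1:cnt:<40 hex digits>
    print(parse_core_swhid(swhid))  # ('cnt', '<40 hex digits>')
```

Because a content SWHID is derived from the file bytes alone, anyone holding the file can recompute and verify it independently, without querying a registry.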
Please note that there are other EOSC teams working on persistent identifiers (PIDs) issues (
The solution proposed in the SIRS report could be better connected with these other ongoing EC-funded efforts, and collaborative work should be carried out in order to propose and adopt consensual solutions.
Section 3.4.3 Quality and Curation
The table mentions the Evaluation of source code.
Please note that the CDUR procedure for evaluating research software was proposed in (
The SIRS report could consider including the CDUR procedure in the list of methods for assessing research software quality, as CDUR includes the evaluation of the research software source code in its Use step.
Sections 3.4.4 Metrics, 3.4.5 Guidelines, 3.4.6 Tools and Workflows
The SIRS report could provide further details on the proposed guidelines, metrics, tools and workflows.
Section 4.1.1 Archive
This section mentions:
1. Universal archive specifically designed for software source code
— proactive archival of all software source code (including all dependencies of research software) [...]
2. Scholarly repositories
— explicit deposit by identified individuals ...
Here we find, again, the problem of the definition of research software (see our comments above on Sections 2.1 and 3.3.1), and the importance of having well-defined objects in order to design sound services and infrastructures to deal with them.
The first item refers to the archiving of all existing software, while in the second one we find a more “usual” research software object, as identified individuals deposit their own production in the scholarly repositories. But how should scholarly repositories deal with the dependencies of the deposited software? Or should this be managed by the identified individuals, who are mostly researchers?
Section 4.1.4 Cite/Credit
Citing research software and giving credit to its producers is a real issue at stake; we could not agree more. This is why the CDUR research software evaluation procedure dedicates a whole step, the Citation step, to this issue (
Moreover, in order for research software to be cited, its producers should establish a reference or citation form. A description of the PLUME/RELIER software reference cards can be found in the previously mentioned presentation of PLUME at fOSSa 2009*
Section 5.1.1 Interactions
The SIRS report mentions that:
... it is important to ensure a vertical interconnection between an universal software archive and scholarly repositories, for the latter to feed the universal archive (see Figure 5). This requires engineering and funding for the development of proper adaptors.
The SIRS report could provide further insight into the goals of the Scholarly Infrastructures for Research Software studied in the report regarding their contribution to feeding universal archives.
Section 5.3.1 Advanced Technology Development
The SIRS report mentions the need to develop an advanced search engine for software source code. We think that a Universal Software Archive like the one studied in the SIRS report should equally provide good, sound search interfaces designed to help researchers find and retrieve research software.
Indeed, research software is known to be difficult to find (
Finally, we conclude this short report with some further questions to be considered by EOSC decision makers, as we think that some of the issues raised by the SIRS report need extended and in-depth reflection.
Definition
The concept of research software is essential for a sound design of the infrastructures and services that will deal with this research output.
Software Management Plan
It is now widely accepted that a Data Management Plan (DMP) is an important tool when dealing with research data, and DMPs are usually required by funders. Tools for Software Management Plans are also available, see for example (
Services
The SIRS report considers three kinds of existing infrastructures: Archives, Publishers, and Aggregators. The issues studied for software and/or for research software are Archive, Reference, Describe, and Credit, as mentioned in Section 2.1.1. On the other hand, if EOSC’s goals are to render research software visible, accessible, and reusable, there is also a need for services such as search, testing, and retrieval interfaces. EOSC services should be designed, built and provided in order to answer researchers’ needs in the Open Science context (
User-centric EOSC
Beyond the services that will be provided, there is the question of interactions with prospective users. Relevant actors in the construction of EOSC have signed a joint statement*
Architecture
EOSC is already a complex system with several key actors of distinct natures. Interactions and collaborations among all the different components should be designed and developed in order to make EOSC easier for users to approach.
Ethical issues
Organisational EOSC Ethics are mentioned in the EOSC Pilot report (
... insisting on transparency, with strategy and decisions documented and public. It means honesty, including disclosure on financial issues and data usage, so that there is no suspicion of hiding possible conflicts of interest. [...] It should also mean, as discussed below, putting into place systems that support and incentivise the research integrity of individual researchers, and demonstrating a commitment to periodic ethical inspection and oversight by an independent body of experts, acting as an advisory board.
It also mentions that research integrity has been described as concerned with:
Co-Funding
As mentioned in (
Reliable digital infrastructure and services are critical in today’s society, as the coronavirus crisis has highlighted. A range of initiatives have been proposed or are already under discussion at EU level to accelerate the digitalisation process and enhance Europe’s strategic autonomy in the digital field.
In this context, EOSC decision makers should consider, in a transparent way, the co-funding of infrastructures already funded by non-EU tech companies.
As far as we understand, there are few differences between the SIRS draft report open for consultation until November 2020 and the official publication that followed in December 2020. The latter states on p. 6 that:
The consultation period ran from October 21 until November 10. All comments received were considered.
Perhaps some of our comments in Open comments on the Task Force SIRS report... (
As we can find in the EOSC Secretariat news*
We would like to thank the SIRS TF team for such inspiring work.
We acknowledge the funding provided by the Laboratoire d’informatique Gaspard-Monge (LIGM) at the University Gustave Eiffel (East of Paris).
Both authors have the following roles: Conceptualization, Formal Analysis, Investigation, Methodology, Project Administration, Supervision, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing.
T. Gomez-Diaz also participated in Funding Acquisition.
No competing interests have been detected.
https://www.eoscsecretariat.eu/eosc-symposium-2020, recording available at https://www.youtube.com/watch?v=U_sxfV0kjEg.
https://projet-plume.org/patrimoine-logiciel-laboratoire. Note that patrimoine logiciel translates to the English term software heritage.
See the fOSSa 2009 archives available at http://fossa2010.inrialpes.fr/, https://www.slideshare.net/fossaconference/presentations, and https://projet-plume.org/ressource/fossa-2009-free-open-source-software-academia-conference-2009-presentations. The PLUME presentation given by T. Gomez-Diaz is available at https://www.slideshare.net/fossaconference/plume-project-4734924.
https://www.openaire.eu/eosc-a-tool-for-enabling-open-science-in-europe. See the statement on the EOSC Secretariat website at https://www.eoscsecretariat.eu/eosc-liaison-platform/post/research-oriented-services-trust-collaboration-sustainability-key.