Research Ideas and Outcomes : Project Report
Corresponding author: John A Borghi (john.borghi@ucop.edu)
Received: 04 May 2018 | Published: 09 May 2018
© 2018 John Borghi, Stephen Abrams, Daniella Lowenberg, Stephanie Simms, John Chodacki
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation: Borghi J, Abrams S, Lowenberg D, Simms S, Chodacki J (2018) Support Your Data: A Research Data Management Guide for Researchers. Research Ideas and Outcomes 4: e26439. https://doi.org/10.3897/rio.4.e26439
Researchers are faced with rapidly evolving expectations about how they should manage and share their data, code, and other research materials. To help them meet these expectations and generally manage and share their data more effectively, we are developing a suite of tools which we are currently referring to as "Support Your Data". These tools, which include a rubric designed to enable researchers to self-assess their current data management practices and a series of short guides which provide actionable information about how to advance practices as necessary or desired, are intended to be easily customizable to meet the needs of researchers working in a variety of institutional and disciplinary contexts.
Research Data Management, RDM, Data Sharing, Open Data, Open Science
Research data management (RDM), a term that encompasses activities related to the storage, organization, documentation, and dissemination of data*
As demonstrated by visualizations such as the research data lifecycle (
Complementing calls for improved data management and more widespread data sharing by transparency and reproducibility-related initiatives within the research community (
There are several existing tools that bring together the perspectives of data service providers and researchers to evaluate RDM practices. However, because these tools are often oriented towards data service providers, they have not seen widespread adoption by researchers who may have minimal contact with library-based RDM programs. For example, the Data Curation Profiles toolkit, which consists of a structured interview designed to elucidate data-related practices and needs in different academic disciplines, was designed to launch discussions between librarians and researchers and facilitate the development of data services that address the needs of researchers (
This brief review of the current RDM landscape highlights several significant trends:
Researchers face an evolving array of expectations related to how they manage and share data. Unfortunately, there is a significant communication gap between researchers and library-based data service providers.
Overcoming this communication gap requires placing RDM in the context of a researcher’s day-to-day work with data and overcoming differences in language, terminology, and priorities between and within different research communities.
There is currently no user-friendly guide that allows researchers to assess and advance their own data management practices.
The intention of the Support Your Data project is to address these trends by developing materials that frame activities related to research data management so that they can be easily understood and acted upon by researchers. At present, these materials consist of a rubric designed to allow researchers to self-assess their own RDM practices over the course of a research project and a complementary set of guides that direct researchers towards RDM-related services at their institution and provide actionable information about how to advance their practices as necessary or desired. To meet the needs of researchers in different institutional and disciplinary contexts, all of these materials have been designed to be easily customizable.
The development process for the Support Your Data project drew upon a large number of sources. An initial point of inspiration was the “HowOpenIsIt?” guide developed by SPARC, PLOS, and the Open Access Scholarly Publishers Association (OASPA) (
A literature search and analysis of existing RDM evaluation tools revealed that the majority were either designed to benchmark RDM services at the institutional level (e.g.
One major early difficulty was determining how to describe the research process. While we wanted to draw from the workflow-based organization of visualizations such as the research data lifecycle, we also wanted to avoid presenting the progression of a research project using models or terminology that would be unfamiliar or unappealing to researchers. After conducting an informal survey of what words researchers associate with given activities (e.g. “What term(s) do you use to describe the stage of your research that involves acquiring, accumulating, or measuring data?”) and examining related work on the topic (e.g.
As with other RDM evaluation tools, we adopted elements of the capability maturity model framework to describe different data management-related activities on a continuum from “ad hoc” to “refined and optimized”. This early conception of an “RDM Maturity Guide” was described in early blog posts intended to elicit feedback from members of the data services and research communities. However, as the project progressed, we moved away from explicitly referencing the concept of practice maturity. Informal feedback received during the development of a parallel project, in which researchers were asked to provide quantitative RDM maturity ratings for themselves and their field as a whole (
The general structure of what would become the Support Your Data rubric was therefore refined to include a series of RDM-related activities described at different levels of definition and optimization. Because the rubric was to be designed to allow researchers to self-assess the current state of their RDM practices, we quickly decided that the rubric should be complemented by a series of short guides designed to provide information about how to advance practices as necessary or desired. In a series of biweekly meetings, we then set out to draft content for these materials. Feedback from the broader community was sought throughout this process through additional blog posts and presentations at research data-focused conferences (e.g.
Initially, development of the content for the rubric and the guides progressed in parallel. Informed by informal surveys of researchers and data service providers (e.g. “What activities do you consider part of ‘planning for data’?”), we reviewed draft materials, worked to clarify language, and added relevant information as necessary. Though the activities described in the rows of the rubric (and expanded upon further in the guides) remained largely consistent throughout the development process, the earliest iterations of the rubric did not use set labels to describe a researcher’s practices related to each activity. This was intentional, as we wanted to resist quantification of a researcher’s practices into a score of their RDM maturity. However, after an initial round of revisions, we determined that the rubric was becoming unbalanced. The lack of labels meant that different activities were being described at different levels of specificity, which made interpretation difficult, thus defeating the entire purpose of the project.
In response, we refined the structure of the rubric further so that a researcher’s RDM-related activities were described using one of four labels (see next section). After taking care that these labels were descriptive and not evaluative, we then completed a draft version of the entire rubric. We decided to use declarative statements to describe each RDM-related activity under each label in order to maximize the degree to which a researcher would identify a description with their own practices. We then proceeded to refine the content and structure of the guides. The materials presented in the next section are the result of this most recent round of revision.
At present, the Support Your Data materials consist of a rubric designed to allow researchers to self-assess their own RDM practices and a complementary series of one-page guides intended to provide researchers access to RDM-related expertise (including local RDM-related resources) and advance practices as necessary or desired. All of these materials are intended to be customizable in order to meet the needs of researchers in different institutional or disciplinary contexts.
The aim of the Support Your Data project is to be descriptive rather than prescriptive. Neither the rubric nor the guides assumes that every researcher will want, need, or be able to achieve the same level of data management practices. Rather, the intent of these materials is to help researchers understand where they are with regard to RDM and, when appropriate, how to get to where they want or need to be.
A schematic version of the RDM rubric is shown in Table
The Support Your Data RDM rubric. The language used throughout the rubric is intended to describe RDM-related activities such as data management planning, organizing data, saving data, preparing data, analyzing data, and sharing data in a researcher-friendly fashion. A formatted version is available as Suppl. material
| Activity | Ad Hoc | One-Time | Active and Informative | Optimized for Re-Use |
|---|---|---|---|---|
| Planning your project | When it comes to my data, I have a "way of doing things" but no standard or documented plans. | I create some formal plans about how I will manage my data at the start of a project, but I generally don't refer back to them. | I develop detailed plans about how I will manage my data that I actively revisit and revise over the course of a project. | I have created plans for managing my data that are designed to streamline its future use by myself or others. |
| Organizing your data | I don’t follow a consistent approach for keeping my data organized, so it often takes time to find things. | I have an approach for organizing my data, but I only put it into action after my project is complete. | I have an approach for organizing my data that I implement prospectively, but it is not necessarily standardized. | I organize my data so that others can navigate, understand, and use it without me being present. |
| Saving and backing up your data | I decide what data is important while I am working on it and typically save it in a single location. | I know what data needs to be saved and I back it up after I'm done working on it to reduce the risk of loss. | I have a system for regularly saving important data while I am working on it. I have multiple backups. | I save my data in a manner and location designed to maximize opportunities for re-use by myself and others. |
| Getting your data ready for analysis | I don't have a standardized or well documented process for preparing my data for analysis. | I have thought about how I will need to prepare my data, but I handle each case in a different manner. | My process for preparing data is standardized and well documented. | I prepare my data in such a way as to facilitate use by both myself and others in the future. |
| Analyzing your data and handling the outputs | I often have to redo my analyses or examine their products to determine what procedures or parameters were applied. | After I finish my analysis, I document the specific parameters, procedures, and protocols applied. | I regularly document the specifics of both my analysis workflow and decision making process while I am analyzing my data. | I have ensured that the specifics of my analysis workflow and decision making process can be understood and put into action by others. |
| Sharing and publishing your data | I share the results of my research, but generally I do not share the underlying data. | I share my data only when I'm required to do so or in response to direct requests from other researchers. | I regularly share the data that underlies my results and conclusions in a form that enables use by others. | Because of my excellent data management practices, I am able to efficiently share my data whenever I need to with whomever I need to. |
Proceeding left to right, a series of declarative statements describes each activity in terms of how well the associated practices are designed to foster access to and use of data in the future. The four levels, “ad hoc”, “one-time”, “active and informative”, and “optimized for re-use”, are intended to be descriptive, not prescriptive.
Ad hoc - Refers to circumstances in which practices are neither standardized nor documented. Every time a researcher has to manage their data they have to design new practices and procedures from scratch.
One-time - Refers to circumstances in which data management occurs only when it is necessary, such as in direct response to a mandate from a funder or publisher. Practices or procedures implemented at one phase of a project are not designed with later phases in mind.
Active and informative - Refers to circumstances in which data management is a regular part of the research process. Practices and procedures are standardized, well documented, and well integrated with those implemented at other phases.
Optimized for re-use - Refers to circumstances in which all data management activities are designed to facilitate the re-use of data in the future.
It should be noted that “re-use” in the context of the Support Your Data project is not necessarily meant as an endorsement of data sharing or other open science practices but is representative of the close link between effective sharing and effective research data management. It is very likely that the person who will need to examine or re-use a given dataset will be the researcher who collected or analyzed it in the first place.
Preliminary versions of the guides associated with each row of the RDM rubric are available as Suppl. materials
Abstract - A brief summary of the contents of the guide.
What does it mean? - Provides an operational definition of the activity covered by the guide. For some guides (Planning, Preparing), this consists of a sentence or two describing the activity. For others (e.g. Saving, Preparing, Analyzing, Sharing) this involves a more detailed breakdown of what each activity involves in practice.
Requirements and how to meet them - Provides a brief summary of how to meet expectations or mandates related to each activity. Because data-related requirements and services are highly discipline- and institution-specific, the contents of these sections are designed to be easily customizable.
Things to think about - Contains notes and recommendations that do not fit into the other sections.
Both the rubric and the guides are intended for easy customization to reflect the terminology, tools, best practices, and services specific to different disciplinary and institutional communities. In the template guides, some suggested points of customization are highlighted in yellow (discipline-specific) and red (institution-specific). Discipline-specific versions may incorporate the jargon, workflow, standards, and priorities of researchers working in a particular domain (e.g.
We envision several use cases for the Support Your Data materials. The most likely is one in which these materials are used to facilitate discussion between an individual researcher or research group and a data service provider. In such a case, the researcher or research group can use the RDM rubric to identify the difference between where they are with regard to RDM and where they want or need to be, and then a data service provider can use the guides, customized to highlight available services and tools, to provide information about how to move forward. Another probable use case is one in which a particular research community uses these materials as part of a broader effort to improve practices related to data management (including data sharing). In this case, the organization and content of both the RDM rubric and the guides can be customized, with the assistance of data service providers, to include community-specific activities, requirements, and terminology. Though we were careful to ensure that our materials are merely descriptive, such customized versions could be more prescriptive in adhering to institutional or discipline-specific norms or policies.
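As a purely illustrative aid, the first use case could be sketched in code roughly as follows. This is not part of the Support Your Data materials: the `Level` enum, the `find_gaps` helper, and the sample self-assessment are all hypothetical, and the sketch assumes only that the four labels can be ordered left to right as they appear in the rubric. It lists the activities where a researcher wants or needs to be further along than they currently are, and deliberately does not collapse the result into a score.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four rubric labels, ordered left to right as in the rubric."""
    AD_HOC = 1
    ONE_TIME = 2
    ACTIVE_AND_INFORMATIVE = 3
    OPTIMIZED_FOR_REUSE = 4


# Row labels copied from the rubric.
ACTIVITIES = [
    "Planning your project",
    "Organizing your data",
    "Saving and backing up your data",
    "Getting your data ready for analysis",
    "Analyzing your data and handling the outputs",
    "Sharing and publishing your data",
]


def find_gaps(current, desired):
    """Return (activity, current level, desired level) for each activity
    where the desired level lies to the right of the current one.
    Activities missing from `desired` are treated as having no gap."""
    gaps = []
    for activity in ACTIVITIES:
        now = current[activity]
        goal = desired.get(activity, now)
        if goal > now:
            gaps.append((activity, now, goal))
    return gaps


# Hypothetical self-assessment, for illustration only.
current = {activity: Level.ONE_TIME for activity in ACTIVITIES}
current["Planning your project"] = Level.AD_HOC
desired = {
    "Planning your project": Level.ACTIVE_AND_INFORMATIVE,
    "Sharing and publishing your data": Level.ONE_TIME,
}

for activity, now, goal in find_gaps(current, desired):
    print(f"{activity}: {now.name} -> {goal.name}")
```

In this sketch, a data service provider would take the listed activities as the starting point for a conversation and pull the corresponding guides. The desired level is set by the researcher, which keeps the exercise descriptive rather than prescriptive.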
Though helping researchers respond to evolving expectations related to the management and sharing of their data was a major driving force behind the project, the Support Your Data materials, at least in their current iteration, are not designed to increase compliance with specific policies or requirements. For example, though a researcher using these materials would be directed to local RDM services and tools (e.g. a local DMPTool instance) related to the creation of data management plans (DMPs), neither the rubric nor the “planning for data” guide gives specific guidance on how to comply with the DMP requirements of different funding agencies. However, in helping researchers assess and advance their data management practices, the Support Your Data materials may indirectly help them comply more effectively with data-related requirements throughout the lifecycle of a research project.
Now that we have a complete set of draft materials, the next step of the Support Your Data project is to focus on design and adoption. Moving forward, we will work with internal and external partners on the visual presentation of the materials and to develop pamphlets, postcards, and other collateral. As has been the case throughout the project, we will also continue to invite feedback and explore partnerships with stakeholders interested in developing customized materials.
UC Curation Center, California Digital Library
JB drafted the manuscript and led the development of the materials. SA, DL, SS, and JC co-developed the materials and reviewed the manuscript.
The authors declare no conflicts of interest.
A formatted version of the Support Your Data RDM rubric.
A draft guide that corresponds with the "Planning your project" row of the RDM rubric. Suggested points of customization are highlighted in yellow (discipline-specific) and red (institution-specific).
A draft guide that corresponds with the "Organizing your data" row of the RDM rubric. Suggested points of customization are highlighted in yellow (discipline-specific) and red (institution-specific).
A draft guide that corresponds with the "Saving and backing up your data" row of the RDM rubric. Suggested points of customization are highlighted in yellow (discipline-specific) and red (institution-specific).
A draft guide that corresponds with the "Getting your data ready for analysis" row of the RDM rubric. Suggested points of customization are highlighted in yellow (discipline-specific) and red (institution-specific).
A draft guide that corresponds with the "Analyzing your data and handling the outputs" row of the RDM rubric. Suggested points of customization are highlighted in yellow (discipline-specific) and red (institution-specific).
A draft guide that corresponds with the "Sharing and publishing your data" row of the RDM rubric. Suggested points of customization are highlighted in yellow (discipline-specific) and red (institution-specific).
For the purposes of this report we are using the term “data” broadly to refer to the inputs or outputs required to evaluate, reproduce, or build upon the analyses or conclusions of a given research project. This includes, but is not limited to, raw data, processed data, research-related code, and documentation pertaining to study parameters and procedures.