Research Ideas and Outcomes : Project Report
Support Your Data: A Research Data Management Guide for Researchers
John A Borghi, Stephen Abrams, Daniella Lowenberg, Stephanie Simms, John Chodacki
‡ California Digital Library, University of California Curation Center, Oakland CA, United States of America
Open Access

Abstract

Researchers are faced with rapidly evolving expectations about how they should manage and share their data, code, and other research materials. To help them meet these expectations and generally manage and share their data more effectively, we are developing a suite of tools which we are currently referring to as "Support Your Data". These tools include a rubric designed to enable researchers to self-assess their current data management practices and a series of short guides which provide actionable information about how to advance practices as necessary or desired. They are intended to be easily customizable to meet the needs of researchers working in a variety of institutional and disciplinary contexts.

Keywords

Research Data Management, RDM, Data Sharing, Open Data, Open Science

Introduction

Research data management (RDM), a term that encompasses activities related to the storage, organization, documentation, and dissemination of data*1, is central to efforts aimed at maximizing the value of scientific investment (e.g. Holdren 2013) and addressing concerns related to the integrity of the research process (e.g. Collins and Tabak 2014). Unfortunately, when surveyed directly, researchers often acknowledge that they lack the skills and experience needed to manage and share their data effectively (Barone et al. 2017, Federer et al. 2015, Tenopir et al. 2016). This disconnect demonstrates the need for tools that bridge the communication gap that exists between the research community, data service providers, and other local, national, and international data stakeholder groups. The development of one such tool, which we are tentatively referring to as “Support Your Data”, is the subject of this project report.

As demonstrated by visualizations such as the research data lifecycle (Carlson 2014, Cox and Ting Tam 2018), RDM is continuous, iterative, and embedded throughout the course of a research project. Well-thought-out RDM practices make the research process more efficient, facilitate collaboration, and help prevent the loss of data (see Lowndes et al. 2017). Effective RDM is also crucial to establishing the accessibility of data after a project’s conclusion, which is increasingly required by data stakeholders including research funding agencies and scholarly publishers. Steps must be taken early in the research process to ensure that data can be shared later. For example, the sharing of data from human participants must be approved by an institutional review board (IRB) and described in informed consent documents before any data are collected (Meyer 2018). More generally, data that are made available are only useful if formatted, documented, and organized in a manner that enables examination and reuse by others. Related guidance (e.g. Goodman et al. 2014) and standards (e.g. FAIR, Wilkinson et al. 2016) highlight that proper data management is a key factor in enabling effective data sharing, which is itself a key factor in establishing research transparency and reproducibility.

Complementing calls for improved data management and more widespread data sharing by transparency- and reproducibility-related initiatives within the research community (Ioannidis 2014, Munafò et al. 2017), RDM has increasingly become a focus for academic libraries. Though offerings vary considerably between institutions, library RDM programs generally emphasize skills training and assisting researchers in complying with data-related policies and mandates (Cox et al. 2017, Flores et al. 2015, Tenopir et al. 2014). Guidance provided to researchers by library-based data service providers often focuses on topics such as data management planning, metadata and documentation, data organization, storage and backup procedures, and long-term preservation. Though “best practice” documents written by researchers often cover similar topics, they generally do not reference the work of data service providers. A recent effort to bridge these two perspectives through a survey of data management practices in the field of human brain imaging (neuroimaging) demonstrates that many researchers are unaware of or do not make use of library-based RDM resources. Furthermore, their RDM practices are highly variable, often described using hypothesis- or workflow-specific terminology, and rooted in immediate and practical concerns (e.g. “I want to prevent the loss of data.”) (Borghi and Van Gulick 2018). Therefore, for data service providers, crossing this communication gap and effectively engaging with researchers on the topic of RDM requires not only overcoming differences in language, terminology, and priorities between and within different research areas, but also placing related concepts within the context of a researcher’s day-to-day work with data.

There are several existing tools that bring together the perspectives of data service providers and researchers to evaluate RDM practices. However, because these tools are often oriented towards data service providers, they have not seen widespread adoption by researchers who may have minimal contact with library-based RDM programs. For example, the Data Curation Profiles toolkit, which consists of a structured interview designed to elucidate data-related practices and needs in different academic disciplines, was designed to launch discussions between librarians and researchers and facilitate the development of data services that address the needs of researchers (Witt et al. 2009). Other RDM assessment tools draw heavily from the capability maturity model (CMM) framework, which describes practices based on their degree of formality and optimization (Paulk et al. 1993). A maturity model specific to the management of scientific data characterizes research groups on the basis of how well their procedures related to data acquisition, description, dissemination, and preservation are defined, documented, and generalized (Crowston and Qin 2011). The DMVitals tool (Sallans and Lake 2014) combines elements of the Data Curation Profiles and maturity-based tools to systematically assess a researcher’s data management practices and generate customized and actionable recommendations based on institutional and domain standards.

This brief review of the current RDM landscape highlights several significant trends:

  1. Researchers face an evolving array of expectations related to how they manage and share data. Unfortunately, there is a significant communication gap between researchers and library-based data service providers.

  2. Overcoming this communication gap requires placing RDM in the context of a researcher’s day-to-day work with data and overcoming differences in language, terminology, and priorities between and within different research communities.

  3. There is currently no user-friendly guide that allows researchers to assess and advance their own data management practices.

The intention of the Support Your Data project is to address these trends by developing materials that frame activities related to research data management so that they can be easily understood and acted upon by researchers. At present, these materials consist of a rubric designed to allow researchers to self-assess their own RDM practices over the course of a research project and a complementary set of guides that direct researchers towards RDM-related services at their institution and provide actionable information about how to advance their practices as necessary or desired. To meet the needs of researchers in different institutional and disciplinary contexts, all of these materials have been designed to be easily customizable.

Project Development

The development process for the Support Your Data project drew upon a large number of sources. An initial point of inspiration was the “HowOpenIsIt?” guide developed by SPARC, PLOS, and the Open Access Scholarly Publishers Association (OASPA) (SPARC 2013). The format of this guide, in which a number of topics (e.g. author posting rights, reuse rights) are described on a spectrum from closed to open access, allows for a number of complex and interrelated issues to be presented in a relatively simple and easy to understand manner. This prompted us to consider how to present research data management, a topic sufficiently complex as to be labelled a “wicked problem” (Awre et al. 2015), in a similar manner.

A literature search and analysis of existing RDM evaluation tools revealed that the majority were either designed to benchmark RDM services at the institutional level (e.g. Australian National Data Service 2011, Digital Curation Center 2013) or intended to foster communication between researchers and library-based data service providers (Sallans and Lake 2014, Witt et al. 2009). For this reason, we decided that our as-yet-unnamed project should focus on developing materials for researchers. Working under the assumption that researchers in different institutional and disciplinary contexts might have a range of RDM-related priorities and access to different levels of RDM-related services, we decided at the outset of the development process that our materials should be developed with an eye towards customization.

One major early difficulty was determining how to describe the research process. While we wanted to draw from the workflow-based organization of visualizations such as the research data lifecycle, we also wanted to avoid presenting the progression of a research project using models or terminology that would be unfamiliar or unappealing to researchers. After conducting an informal survey of what words researchers associate with given activities (e.g. “What term(s) do you use to describe the stage of your research that involves acquiring, accumulating, or measuring data?”) and examining related work on the topic (e.g. Mattern et al. 2015), we decided to focus on describing RDM-related practices rather than project stages. Even so, terminology proved to be a significant problem, as we quickly determined that phrases such as “data management planning” and “data sharing” had significantly different meanings to different audiences. Our efforts to reduce jargon would continue throughout the development process.

As with other RDM evaluation tools, we adopted elements of the capability maturity model framework to describe different data management-related activities on a continuum from “ad hoc” to “refined and optimized”. This early conception of an “RDM Maturity Guide” was described in early blog posts intended to elicit feedback from members of the data services and research communities. However, as the project progressed, we moved away from explicitly referencing the concept of practice maturity. Informal feedback received during the development of a parallel project, in which researchers were asked to provide quantitative RDM maturity ratings for themselves and their field as a whole (Borghi and Van Gulick 2018), revealed that the concept needed constant clarification and that researchers were resistant to the connotation that their practices could be considered “immature.”

The general structure of what would become the Support Your Data rubric was therefore refined to include a series of RDM-related activities described at different levels of definition and optimization. Because the rubric was to be designed to allow researchers to self-assess the current state of their RDM practices, we quickly decided that the rubric should be complemented by a series of short guides designed to provide information about how to advance practices as necessary or desired. In a series of biweekly meetings, we then set out to draft content for these materials. Feedback from the broader community was sought throughout this process through additional blog posts and presentations at research data-focused conferences (e.g. Borghi et al. 2017, Borghi et al. 2018).

Initially, development of the content for the rubric and the guides progressed in parallel. Informed by informal surveys of researchers and data service providers (e.g. “What activities do you consider part of ‘planning for data’?”), we reviewed draft materials, worked to clarify language, and added relevant information as necessary. Though the activities described in the rows of the rubric (and expanded upon further in the guides) remained largely consistent throughout the development process, the earliest iterations of the rubric did not use set labels to describe a researcher’s practices related to each activity. This was intentional, as we wanted to resist quantification of a researcher’s practices into a score of their RDM maturity. However, after an initial round of revisions, we determined that the rubric was becoming unbalanced. The lack of labels meant that different activities were being described at different levels of specificity, which made interpretation difficult and thus defeated the entire purpose of the project.

In response, we refined the structure of the rubric further so that a researcher’s RDM-related activities were described using one of four labels (see next section). After taking care that these labels were descriptive and not evaluative, we then completed a draft version of the entire rubric. We decided to use declarative statements to describe each RDM-related activity under each label in order to maximize the degree to which a researcher would identify a description with their own practices. We then proceeded to refine the content and structure of the guides. The materials presented in the next section are the result of this most recent round of revision.

The Support Your Data Materials

At present, the Support Your Data materials consist of a rubric designed to allow researchers to self-assess their own RDM practices and a complementary series of one-page guides intended to provide researchers access to RDM-related expertise (including local RDM-related resources) and advance practices as necessary or desired. All of these materials are intended to be customizable in order to meet the needs of researchers in different institutional or disciplinary contexts.

The aim of the Support Your Data project is to be descriptive rather than prescriptive. Neither the rubric nor the guides assume that every researcher will want, need, or be able to achieve the same level of data management practices. Rather, the intent of these materials is to help researchers understand where they are with regard to RDM and, when appropriate, how to get to where they want or need to be.

RDM Rubric

A schematic version of the RDM rubric is shown in Table 1. Different RDM-related activities occurring over the course of a research project are represented in separate rows. Though the order from top to bottom loosely follows the progression of a research project, it is very likely that these activities will occur in a different order or simultaneously in a researcher's day-to-day work with data. The six activities described in the rubric (planning, organizing, saving, preparing, analyzing, sharing) are intentionally general in order to make the rubric applicable to as wide a population as possible. Future versions of the rubric, adapted to specific disciplinary or institutional contexts, could incorporate more, fewer, or altogether different activities.

Table 1. The Support Your Data RDM rubric. The language used throughout the rubric is intended to describe RDM-related activities such as data management planning, organizing data, saving data, preparing data, analyzing data, and sharing data in a researcher-friendly fashion. A formatted version is available as Suppl. material 1.

| Activity | Ad Hoc | One-Time | Active and Informative | Optimized for Re-Use |
| --- | --- | --- | --- | --- |
| Planning your project | When it comes to my data, I have a "way of doing things" but no standard or documented plans. | I create some formal plans about how I will manage my data at the start of a project, but I generally don't refer back to them. | I develop detailed plans about how I will manage my data that I actively revisit and revise over the course of a project. | I have created plans for managing my data that are designed to streamline its future use by myself or others. |
| Organizing your data | I don’t follow a consistent approach for keeping my data organized, so it often takes time to find things. | I have an approach for organizing my data, but I only put it into action after my project is complete. | I have an approach for organizing my data that I implement prospectively, but it is not necessarily standardized. | I organize my data so that others can navigate, understand, and use it without me being present. |
| Saving and backing up your data | I decide what data is important while I am working on it and typically save it in a single location. | I know what data needs to be saved and I back it up after I'm done working on it to reduce the risk of loss. | I have a system for regularly saving important data while I am working on it. I have multiple backups. | I save my data in a manner and location designed to maximize opportunities for re-use by myself and others. |
| Getting your data ready for analysis | I don't have a standardized or well documented process for preparing my data for analysis. | I have thought about how I will need to prepare my data, but I handle each case in a different manner. | My process for preparing data is standardized and well documented. | I prepare my data in such a way as to facilitate use by both myself and others in the future. |
| Analyzing your data and handling the outputs | I often have to redo my analyses or examine their products to determine what procedures or parameters were applied. | After I finish my analysis, I document the specific parameters, procedures, and protocols applied. | I regularly document the specifics of both my analysis workflow and decision making process while I am analyzing my data. | I have ensured that the specifics of my analysis workflow and decision making process can be understood and put into action by others. |
| Sharing and publishing your data | I share the results of my research, but generally I do not share the underlying data. | I share my data only when I'm required to do so or in response to direct requests from other researchers. | I regularly share the data that underlies my results and conclusions in a form that enables use by others. | Because of my excellent data management practices, I am able to efficiently share my data whenever I need to with whomever I need to. |

Proceeding left to right, a series of declarative statements describes each activity in terms of how well the associated practices are designed to foster access to and use of data in the future. The four levels, “ad hoc”, “one-time”, “active and informative”, and “optimized for re-use”, are intended to be descriptive, not prescriptive.

  • Ad hoc - Refers to circumstances in which practices are neither standardized nor documented. Every time a researcher has to manage their data they have to design new practices and procedures from scratch.

  • One-time - Refers to circumstances in which data management occurs only when it is necessary, such as in direct response to a mandate from a funder or publisher. Practices or procedures implemented at one phase of a project are not designed with later phases in mind.

  • Active and informative - Refers to circumstances in which data management is a regular part of the research process. Practices and procedures are standardized, well documented, and well integrated with those implemented at other phases.

  • Optimized for re-use - Refers to circumstances in which all data management activities are designed to facilitate the re-use of data in the future.

It should be noted that “re-use” in the context of the Support Your Data project is not necessarily meant as an endorsement of data sharing or other open science practices but is representative of the close link between effective sharing and effective research data management. It is very likely that the person who will need to examine or re-use a given dataset will be the researcher who collected or analyzed it in the first place.
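To make the structure of the rubric concrete, the following is a minimal Python sketch of how the six activities and four levels could be represented and used to compare a researcher's current practices with where they want or need to be. It is an illustration only and not part of the Support Your Data materials: the names `Level`, `ACTIVITIES`, and `find_gaps` are hypothetical, and the numeric ordering of the levels is an assumption made purely so that two positions can be compared. It is not intended as a maturity score.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four rubric columns, ordered left to right (ordering is an assumption)."""
    AD_HOC = 1
    ONE_TIME = 2
    ACTIVE_AND_INFORMATIVE = 3
    OPTIMIZED_FOR_REUSE = 4


# The six rubric rows, in the order they appear in the rubric.
ACTIVITIES = (
    "Planning your project",
    "Organizing your data",
    "Saving and backing up your data",
    "Getting your data ready for analysis",
    "Analyzing your data and handling the outputs",
    "Sharing and publishing your data",
)


def find_gaps(current, target):
    """Return the activities where the current level is below the target.

    Both arguments map an activity name to a Level. An activity missing
    from either mapping is skipped, so a researcher does not need to set
    a goal for every row. Results are (activity, current, target) tuples
    in rubric order.
    """
    unknown = (set(current) | set(target)) - set(ACTIVITIES)
    if unknown:
        raise ValueError(f"Not a rubric activity: {sorted(unknown)}")
    gaps = []
    for activity in ACTIVITIES:
        if activity in current and activity in target:
            if current[activity] < target[activity]:
                gaps.append((activity, current[activity], target[activity]))
    return gaps


# Hypothetical self-assessment for one researcher (illustrative values only).
current = {
    "Planning your project": Level.ONE_TIME,
    "Organizing your data": Level.ACTIVE_AND_INFORMATIVE,
    "Sharing and publishing your data": Level.AD_HOC,
}
target = {
    "Planning your project": Level.ACTIVE_AND_INFORMATIVE,
    "Organizing your data": Level.ACTIVE_AND_INFORMATIVE,
    "Sharing and publishing your data": Level.ONE_TIME,
}

for activity, now, goal in find_gaps(current, target):
    print(f"{activity}: {now.name} -> {goal.name}")
```

In this sketch the target is chosen by the researcher for each activity separately, which is consistent with the descriptive intent of the rubric: nothing in the code assumes that "optimized for re-use" is the goal for every row.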

One Page Guides

Preliminary versions of the guides associated with each row of the RDM rubric are available as Suppl. materials 2, 3, 4, 5, 6, 7. Designed to be easily customizable to fit the terminology, practices, and services associated with different disciplinary and institutional communities, the guides all follow a similar structure.

  • Abstract - A brief summary of the contents of the guide.

  • What does it mean? - Provides an operational definition of the activity covered by the guide. For some guides (Planning, Preparing), this consists of a sentence or two describing the activity. For others (e.g. Saving, Preparing, Analyzing, Sharing) this involves a more detailed breakdown of what each activity involves in practice.

  • Requirements and how to meet them - Provides a brief summary of how to meet expectations or mandates related to each activity. Because data-related requirements and services are highly discipline- and institution-specific, the contents of these sections are designed to be easily customizable.

  • Things to think about - Contains notes and recommendations that do not fit into the other sections.

Both the rubric and the guides are intended for easy customization to reflect the terminology, tools, best practices, and services specific to different disciplinary and institutional communities. In the template guides, some suggested points of customization are highlighted in yellow (discipline-specific) and red (institution-specific). Discipline-specific versions may incorporate the jargon, workflow, standards, and priorities of researchers working in a particular domain (e.g. Nichols et al. 2017). Institution-specific versions may also incorporate links to available data management, curation, and preservation tools and services.
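The two kinds of customization points described above can be sketched as placeholders in a text template. The following Python example is a hypothetical illustration, not the format of the actual guides (which are distributed as OpenDocument Text files): the section wording, the placeholder names, and the `customize` helper are all assumptions introduced here.

```python
from string import Template

# Hypothetical excerpt of a guide section; the wording is illustrative only.
SECTION = Template(
    "Requirements and how to meet them\n"
    "Common expectations in $discipline: $discipline_requirement\n"
    "Local support: $institution_service"
)


def customize(template, discipline_values, institution_values):
    """Fill in discipline-specific and institution-specific placeholders.

    safe_substitute leaves any placeholder without a value untouched, so a
    partially customized guide still shows what remains to be filled in.
    """
    values = {**discipline_values, **institution_values}
    return template.safe_substitute(values)


# Hypothetical discipline-level customization; institution details are left open.
discipline_only = customize(
    SECTION,
    {
        "discipline": "neuroimaging",
        "discipline_requirement": "data organized according to a community standard",
    },
    {},
)
print(discipline_only)
```

Keeping the two sets of values separate mirrors the yellow and red highlighting in the template guides: a research community could supply the discipline-specific values once, and each institution could then add its own services and contacts.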

Using the Support Your Data Materials

We envision several use cases for the Support Your Data materials. The most likely is one in which these materials are used to facilitate discussion between an individual researcher or research group and a data service provider. In such a case, the researcher or research group can use the RDM rubric to identify the difference between where they are with regard to RDM and where they want or need to be, and a data service provider can then use the guides, customized to highlight available services and tools, to provide information about how to move forward. Another probable use case is one in which a particular research community uses these materials as part of a broader effort to improve practices related to data management (including data sharing). In this case, the organization and content of both the RDM rubric and the guides can be customized, with the assistance of data service providers, to include community-specific activities, requirements, and terminology. Though we were careful to ensure that our materials are merely descriptive, such customized versions could be more prescriptive in adhering to institutional or discipline-specific norms or policies.

Though helping researchers respond to evolving expectations related to the management and sharing of their data was a major driving force behind the project, the Support Your Data materials, at least in their current iteration, are not designed to increase compliance with specific policies or requirements. For example, though a researcher using these materials would be directed to local RDM services and tools (e.g. a local DMPTool instance) related to the creation of data management plans (DMPs), neither the rubric nor the “planning for data” guide gives specific guidance on how to comply with the DMP requirements of different funding agencies. However, in helping researchers assess and advance their data management practices, the Support Your Data materials may indirectly help them comply more effectively with data-related requirements throughout the lifecycle of a research project.

Next Steps

Now that we have a complete set of draft materials, the next step of the Support Your Data project is to focus on design and adoption. Moving forward, we will work with internal and external partners on the visual presentation of the materials and to develop pamphlets, postcards, and other collateral. As has been the case throughout the project, we will also continue to invite feedback and explore partnerships with stakeholders interested in developing customized materials.

Hosting institution

UC Curation Center, California Digital Library

Author contributions

JB drafted the manuscript and led the development of the materials. SA, DL, SS, and JC co-developed the materials and reviewed the manuscript.

Conflicts of interest

The authors declare no conflicts of interest.

Supplementary materials

Suppl. material 1: Formatted RDM Rubric
Authors: John Borghi
Data type: OpenDocument Presentation (.odp) file
Brief description: A formatted version of the Support Your Data RDM rubric.

Suppl. material 2: Draft Guide - Planning
Authors: John Borghi
Data type: OpenDocument Text (.odt) file
Brief description: A draft guide that corresponds with the "Planning your project" row of the RDM rubric. Suggested points of customization are highlighted in yellow (discipline-specific) and red (institution-specific).

Suppl. material 3: Draft Guide - Organizing
Authors: John Borghi
Data type: OpenDocument Text (.odt) file
Brief description: A draft guide that corresponds with the "Organizing your data" row of the RDM rubric. Suggested points of customization are highlighted in yellow (discipline-specific) and red (institution-specific).

Suppl. material 4: Draft Guide - Saving
Authors: John Borghi
Data type: OpenDocument Text (.odt) file
Brief description: A draft guide that corresponds with the "Saving and backing up your data" row of the RDM rubric. Suggested points of customization are highlighted in yellow (discipline-specific) and red (institution-specific).

Suppl. material 5: Draft Guide - Preparing
Authors: John Borghi
Data type: OpenDocument Text (.odt) file
Brief description: A draft guide that corresponds with the "Getting your data ready for analysis" row of the RDM rubric. Suggested points of customization are highlighted in yellow (discipline-specific) and red (institution-specific).

Suppl. material 6: Draft Guide - Analyzing
Authors: John Borghi
Data type: OpenDocument Text (.odt) file
Brief description: A draft guide that corresponds with the "Analyzing your data and handling the outputs" row of the RDM rubric. Suggested points of customization are highlighted in yellow (discipline-specific) and red (institution-specific).

Suppl. material 7: Draft Guide - Sharing
Authors: John Borghi
Data type: OpenDocument Text (.odt) file
Brief description: A draft guide that corresponds with the "Sharing and publishing your data" row of the RDM rubric. Suggested points of customization are highlighted in yellow (discipline-specific) and red (institution-specific).

Endnotes
*1

For the purposes of this report we are using the term “data” broadly to refer to the inputs or outputs required to evaluate, reproduce, or build upon the analyses or conclusions of a given research project. This includes, but is not limited to, raw data, processed data, research-related code, and documentation pertaining to study parameters and procedures.
