Research Ideas and Outcomes:
Review Article
Corresponding author: Abigail Goben (agoben@uic.edu)
Academic editor: Editorial Secretary
Received: 03 Apr 2025 | Accepted: 28 Apr 2025 | Published: 14 May 2025
© 2025 Abigail Goben, Kristin Briney
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Goben A, Briney KA (2025) Data Management Books for Researchers - An Annotated Bibliography. Research Ideas and Outcomes 11: e154845. https://doi.org/10.3897/rio.11.e154845
While funders and publishers continue to expand requirements for data management planning and sharing, few books have been written to help academic researchers and research trainees understand both introductory and discipline-specific data management concepts and practices. In this annotated bibliography, we review currently available English-language data management books and identify limitations and opportunities for future publications.
data management, research data management, bibliography, data information literacy, data literacy, data lifecycle
Despite established funder and journal obligations for data preservation and sharing, data management is still taught somewhat haphazardly: through self-education, through mentoring, in one-time workshops offered by an academic library, or as a single lecture in a responsible conduct of research course or a methodology course. This was demonstrated by
Over the past fifteen years, educational materials have been created primarily to support graduate students and faculty researchers. Over the same period, more than thirty books have been published for academic librarians on how to teach research data management and build related services; these are out of scope for this article. However, as data management has shifted from a novel area for funding and educational development to an expected and established part of training programmes, much of the early momentum behind data management education has been lost. For example, the curricula developed to meet training needs after early policy changes in the UK and US in 2010-2012 are no longer being updated or supported and, thus, do not meet current researcher or trainee needs. The original NECDMC, MANTRA and DataONE curricula are all over a decade old (
In addition to mentoring or single lectures, current data management education for researchers takes the form of online courses, webinars, websites, articles and, occasionally, books. Such books fill an important need for individuals and classes seeking to learn about data management. Frequently, though, data management is not the entire topic of a book; instead, it is integrated with other methodological material or mentioned only tangentially amongst software-specific guidance. This creates barriers for learners seeking topical or discipline-focused texts and for educators looking for a full-semester course textbook.
There is an ongoing need for longer-form explorations of data management that reflect changes in funder, policy and disciplinary requirements and that treat data management in more depth, either for specific disciplines or broadly for groups such as STEM graduate students. Understanding these gaps requires documenting the book-length options aimed at researchers or students (as opposed to librarians). In this annotated bibliography, we explore the landscape of books addressing data management for researchers in order to evaluate the available materials and identify areas where updated book-length resources might be beneficial.
We started the bibliography with a list of data management books already known to us, as both authors regularly teach on the topic and reference such material. To supplement this list, in Summer 2023 we searched our respective academic library catalogues, Amazon.com, Bookshop.org and Google Play for books on data management using terms including “data management”, “data curation” and “data literacy”, and we also followed subject headings and the bibliographies of identified books. We also noted relevant recommended books on individual book pages on the commercial websites. These searches were repeated and updated in May 2024.
From the initial list of books returned by our search terms, we narrowed the selection to English-language books on data management written for researchers and research trainees. Books on data management for information technology professionals were excluded. Similarly, books about library data services or about librarians teaching data management were excluded. We also excluded manuals that focused solely on a specific tool or software package rather than on data management principles. Once the list was narrowed to books for researchers, we included books regardless of whether their research discipline was broad (e.g. social sciences) or narrow (e.g. linguistics).
We borrowed each book on this shortlist from our libraries or via interlibrary loan or, where available, viewed an open access copy. We recorded metadata about each book and checked its ebook availability from GOBI Library Solutions (GOBI) (a book and ebook vendor commonly used by academic libraries), Amazon and Google Play. Cost information was collected from GOBI in May 2024; where a book was available in both paperback and hardback, only the less expensive paperback price was recorded. After calibrating our coding scheme on a chosen text, the authors divided the final list of books between them. For each book, the assigned author reviewed the content, assessed the intended audience and coded the book by the data management topics it covered. Codes were based on stages of the data management lifecycle from the UK Data Archive’s Data Lifecycle (
Code definitions are provided in Appendix A. At this point in the coding process, we excluded additional books that covered data management only tangentially, that is, in less than about 20% of their content and in less than one whole chapter. For the final list of included books, see Table 1; the list of excluded titles is provided in Appendix B.
After coding, the authors summarised each book’s coverage of data management and how it relates to the larger bibliography. Summaries were later normalised across both authors for consistency in coverage and analysis.
Table 1. List of books on data management for researchers included in this bibliography.
Year | Authors | Title | Publisher |
1986 | Michener | Research data management in the ecological sciences | University of South Carolina Press |
1994 | Michener, Brunt, & Stafford | Environmental information management and analysis: Ecosystem to global scales | Taylor & Francis |
2000 | Michener & Brunt | Ecological data: Design, management and processing | Blackwell Science |
2012 | Pryor | Managing research data | Facet Publishing |
2015 | Baykoucheva | Managing scientific information and research data | Chandos Publishing |
2015 | Briney | Data management for researchers: Organize, maintain and share your data for research success | Pelagic Publishing |
2016 | Cooper | Ethical choices in research: Managing data, writing reports and publishing results in the social sciences | American Psychological Association |
2016 | Herzog | Data literacy: A user's guide | SAGE Publications |
2017 | Hoffmann | Principles of data management and presentation | University of California Press |
2017 | Smalheiser | Data literacy: How to make your experiments robust and reproducible | Elsevier Academic Press |
2017 | Zozus | The data book: Collection and management of research data | CRC Press |
2018 | Berenson | Managing your research data and documentation | American Psychological Association |
2018 | Sibinga | Ensuring research integrity and the ethical management of data | IGI Global, Information Science Reference |
2019 | Corti, Van den Eynden, Bishop, & Woollard | Managing and sharing research data: A guide to good practice (2nd edition) | SAGE Publications |
2021 | Berez-Kroeker, McDonnell, Koller, & Collister | The open handbook of linguistic data management | MIT Press |
2021 | Paulus & Lester | Doing qualitative research in a digital world | SAGE Publications |
2023 | Weidmann | Data management for social scientists: From files to databases | Cambridge University Press |
Notable amongst the publications is the diversity of publishing houses and academic presses producing these titles. Most presses have just one book on this subject. SAGE Publications is the main exception: it has three books in this bibliography (the American Psychological Association has two), as well as a number of books on the excluded titles list (see Appendix B), demonstrating significant interest in research data management publications.
Availability of each of the 17 books is listed in Table
Citation | GOBI Print Price (USD) | GOBI eBook Available | Kindle eBook Available | Google Play eBook Available | OCLC Number | Open Access DOI |
Michener (1986) | $42.95 | No | No | No | 889519240 | |
Michener, Brunt, & Stafford (1994) | $290.00 | Yes | No | Yes | 29703875 | |
Michener & Brunt (2000) | $139.95 | Yes | No | Yes | 42296795 | |
Pryor (2012) | $105.00 | Yes | No | No | 702873233 | |
Baykoucheva (2015) | $78.95 | Yes | Yes | Yes | 914463642 | |
Briney (2015) | $42.00 | Yes | Yes | Yes | 927940305 | |
Cooper (2016) | $41.99 | Yes | Yes | No | 945827430 | |
Herzog (2016) | $68.00 | Yes | Yes | No | 884817437 | |
Hoffmann (2017) | $34.95 | Yes | Yes | Yes | 996528474 | |
Smalheiser (2017) | $89.95 | Yes | Yes | Yes | 1012406563 | |
Zozus (2017) | $61.99 | Yes | Yes | Yes | 1232123759 | |
Berenson (2018) | $35.99 | Yes | Yes | No | 1001457099 | |
Sibinga (2018) | $275.00 | Yes | No | Yes | 1029852989 | |
Corti, Van den Eynden, Bishop, & Woollard (2019) | $56.00 | No | Yes | Yes | 1239746995 | |
Berez-Kroeker, McDonnell, Koller, & Collister (2021) | $250.00 | Yes | Yes | No | 1242017899 | 10.7551/mitpress/12200.001.0001 |
Paulus & Lester (2021) | $95.00 | No | Yes | Yes | 1240261864 | |
Weidmann (2023) | $34.99 | No | Yes | Yes | 1302577289 | 10.1017/9781108990424 |
The cost of the books varies widely. Five of the print books (29%) cost over $100, and three of these (18% of all 17) cost over $200. Only six of the 17 books (35%) cost less than $50 in print. Two of the publications (12%) have open access versions. While libraries may be able to afford more expensive editions, cost is likely a factor for individuals looking to purchase a data management book for their own collection.
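The cost shares above can be rechecked directly from the GOBI print prices in the availability table. The following is a minimal sketch, assuming that “over $100”, “over $200” and “less than $50” are strict thresholds (no listed price falls exactly on a boundary) and that each percentage is a share of all 17 books rounded to the nearest whole number. The variable and function names are illustrative only.

```python
# GOBI print prices (USD), copied from the availability table in row order.
prices = [42.95, 290.00, 139.95, 105.00, 78.95, 42.00, 41.99, 68.00, 34.95,
          89.95, 61.99, 35.99, 275.00, 56.00, 250.00, 95.00, 34.99]
# Number of books with an open access DOI listed in the same table.
open_access_count = 2


def share(count: int, total: int) -> int:
    """Return count as a percentage of total, rounded to a whole number."""
    return round(100 * count / total)


total = len(prices)
over_100 = sum(p > 100 for p in prices)  # strict threshold (assumption)
over_200 = sum(p > 200 for p in prices)
under_50 = sum(p < 50 for p in prices)

# Each entry pairs a raw count with its share of all books.
summary = {
    "total": total,
    "over_100": (over_100, share(over_100, total)),
    "over_200": (over_200, share(over_200, total)),
    "under_50": (under_50, share(under_50, total)),
    "open_access": (open_access_count, share(open_access_count, total)),
}
print(summary)
```

Running the sketch reproduces the reported counts and rounded percentages (5 and 29%, 3 and 18%, 6 and 35%, 2 and 12%).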
Table
Citation | Why Manage Data | The Data Lifecycle | Creating Data | Processing Data | Analysing Data | Data Documentation | Data Storage | Preserving Data | Giving Access to Data | Reusing Data | DMPs | Data Policies |
|
x | x | x | x | x | x | x | |||||
|
x | x | x | x | ||||||||
|
x | x | x | x | x | x | x | x | x | x | ||
|
x | x | x | x | x | x | ||||||
|
x | x | x | x | x | |||||||
|
x | x | x | x | x | x | x | x | x | x | x | |
|
x | x | x | x | ||||||||
|
x | x | x | |||||||||
|
x | x | x | x | x | |||||||
|
x | x | x | x | ||||||||
|
x | x | x | x | x | x | x | x | x | |||
|
x | x | x | x | x | |||||||
|
x | x | x | x | x | x | ||||||
|
x | x | x | x | x | x | x | x | x | |||
|
x | x | x | x | x | x | x | x | x | x | x | x |
|
x | x | x | x | x | |||||||
|
x | x | x | x | x |
The most commonly covered topics were Processing Data (14 of 17, 82%) and Giving Access to Data (13 of 17, 76%). The least commonly covered topics were DMPs (6 of 17, 35%) and Data Policies (6 of 17, 35%), which are not stages of the data lifecycle but are increasingly a required condition of grant-funded research. Several books addressed other related topics, such as data visualisation, managing sensitive data and data ethics, which were not coded here.
The following annotated bibliography includes the citation, target audience of the text and a summary of each book.
Baykoucheva, S. (2015). Managing scientific information and research data. Chandos Publishing.
Berenson, K. R. (2018). Managing your research data and documentation. American Psychological Association.
Berez-Kroeker, A. L., McDonnell, B. J., Koller, E., & Collister, L. B. (Eds.). (2021). The open handbook of linguistic data management. MIT Press.
Briney, K. (2015). Data management for researchers: Organize, maintain and share your data for research success. Pelagic Publishing.
Cooper, H. M. (2016). Ethical choices in research: Managing data, writing reports and publishing results in the social sciences. American Psychological Association.
Corti, L., Van den Eynden, V., Bishop, L., & Woollard, M. (2020). Managing and sharing research data: A guide to good practice (Second edition). SAGE Publications.
Herzog, D. (2016). Data literacy: A user’s guide. SAGE Publications.
Hoffmann, J. P. (2017). Principles of data management and presentation. University of California Press.
Michener, W. K. (1986). Research data management in the ecological sciences. University of South Carolina Press.
Michener, W. K., & Brunt, J. W. (2000). Ecological data: Design, management, and processing. Blackwell Science.
Michener, W. K., Brunt, J. W., & Stafford, S. G. (1994). Environmental information management and analysis: Ecosystem to global scales. Taylor & Francis.
Paulus, T. M., & Lester, J. N. (2021). Doing qualitative research in a digital world. SAGE Publications.
Pryor, G. (2018). Managing research data. Facet Publishing.
Sibinga, C. T. S. (2018). Ensuring research integrity and the ethical management of data. IGI Global, Information Science Reference.
Smalheiser, N. R. (2017). Data literacy: How to make your experiments robust and reproducible. Elsevier Academic Press.
Weidmann, N. B. (2023). Data management for social scientists: From files to databases. Cambridge University Press.
Zozus, M. (2017). The data book: Collection and management of research data. CRC Press.
Although data management has become a critical part of grant applications and obligations, we could identify only 17 books that met our criteria for inclusion in this bibliography. By comparison, in library and information science alone, we are aware of nearly double that number of data management-in-libraries titles published within the past decade. Further, even amongst the included books, data management is the central focus of only 10 titles. Books for researchers and research trainees frequently treat “data management” as mostly about data analysis rather than the other activities of the data lifecycle. This is consistent with our finding that creating, processing, analysing, documenting and providing access to data were the most commonly addressed topics in these books (in addition to why researchers should manage data in the first place). This limits the usefulness of these books for teaching good data management practices or for helping researchers and research trainees identify discipline-specific best practices.
Another concern is that only four of the 17 books reviewed were published in the past five years, even though there have been significant changes in funder and data policy, and policy is likely to evolve further as interest grows in datasets that underlie data science, machine learning and other algorithmic research techniques. These changes drive a need for books about data management for researchers and research trainees that address foundational issues comprehensively and outlast resources such as courses, website lists and other online materials, which are much more subject to change. The current trend towards developing Open Educational Resource-style books might offer a way to combine the comprehensiveness and narrative of a book with the opportunity for continual updates, but the sustainability of this model has not been investigated.
There is great variety in the disciplinary coverage of these books. Several books have a narrow disciplinary focus and are of great use in their individual subject areas; for example, William Michener wrote comprehensively about data management in the ecological sciences. While a number of books focus on data management in the social sciences, the same is not true for the life sciences, the basic sciences or engineering. Additionally, as various humanities disciplines continue to play catch-up in data management, books should be published for these researchers and trainees. Future data management books for researchers should take these disciplinary gaps into account.
There is also an educational need for updated cross-disciplinary material to serve as textbooks and introductions to data management. The most recent generalist book is the second edition of Managing and sharing research data by
Finally, cost is a significant issue that future authors and publishers must take into account. Most books in this bibliography are cost-prohibitive for an individual buyer. To counter this, we hope to see the recent trend of offering open access versions of data management books continue.
A review of the available books that address research data management reveals a limited number of books aimed at an academic researcher and research trainee audience, and fewer than a dozen that focus on the topic. There are significant opportunities to create discipline-specific introductory texts that would support both classroom instruction and individual learners in understanding the history and foundations of data management in their respective fields. Additionally, introductory textbooks about data management are needed to provide up-to-date information about this continually evolving field.
Why Manage Data: The book addressed why data management is an important topic and why researchers should perform data management activities.
The Data Lifecycle: The book defined a data lifecycle (i.e. a model, often shown as a diagram, that demonstrates how data move through multiple stages over the course of their lifetime); lifecycles could be cyclical or linear, and no specific categories were required to appear in a lifecycle.
Creating Data: The book discussed discipline- or methodology-specific data management activities used while collecting data for the research project.
Processing Data: The book included information on how to properly handle data while they were being cleaned and prepared for analysis.
Analysing Data: The book discussed strategies for analysing data to answer a specific research question. This might include statistical analysis, qualitative coding or other techniques.
Data Documentation: The book addressed how information about the data, and about their collection and analysis, should be recorded to support transparency and reproducibility of the research process.
Data Storage: The book gave guidance on how data should be stored physically or electronically and backed up, potentially including information on how to prevent loss or unauthorised access to the data.
Preserving Data: The book included content on how to maintain data after the end of the active research project, with preservation being conducted either by the original researchers or by another entity.
Giving Access to Data: The book described how to properly share data with others, including informal sharing with other research teams up through sharing data with the public.
Reusing Data: The book covered how to find data generated by other research teams, use such data and/or give credit for such reuse.
Data Management Plans (DMPs): The book defined “data management plans”, described their place within the data lifecycle and/or gave guidance on how to write a plan.
Data Policies: The book outlined policies that can apply to research data, such as funder policy, institutional policy and/or journal policy.
Bazeley, P., & Jackson, K. (2013). Qualitative data analysis with NVivo (Second edition). SAGE Publications.
Coffey, A., & Atkinson, P. (1996). Making sense of qualitative data: Complementary research strategies. SAGE Publications.
El-Mazny, A. (2014). Biomedical statistics: Research methods and data management. Createspace Independent P.
Fleming, G., & Bruce, P. C. (2021). Responsible data science: Transparency and fairness in algorithms. John Wiley & Sons, Incorporated.
Fogarty, B. J. (2023). Quantitative social science data with R: An introduction (Second edition). SAGE Publications.
Friese, S. (2019). Qualitative data analysis with ATLAS.ti (Third edition). SAGE Publications.
Morrow, J. (2021). Be data literate: The data literacy skills everyone needs to succeed. Kogan Page Limited.
Perry, S. M. (Ed.). (2018). Maximizing social science research through publicly accessible data sets. Information Science Reference.
Rensi, G., & Claxton, H. D. (1972). A data collection and processing procedure for evaluating a research program. Pacific Southwest Forest and Range Experiment Station, Forest Service, U.S. Dept. of Agriculture.
Richards, L. (2021). Handling qualitative data: A practical guide (Fourth edition). SAGE Publications.
Salmona, M., Kaczynski, D., & Lieber, E. (2020). Qualitative and mixed methods data analysis using Dedoose: A practical approach for research across the social sciences. SAGE Publications.
Sommer, R., & Sommer, B. B. (2002). A practical guide to behavioral research: Tools and techniques (Fifth edition). Oxford University Press.
Thomson, R. E., & Emery, W. J. (2014). Data analysis methods in physical oceanography (Third edition). Elsevier.
Kristin Briney is the author of one of the books in the bibliography. She did not code or review her own book for this article.