Research Ideas and Outcomes :
Workshop Report
Corresponding author: Erica Krimmel (ekrimmel@gmail.com)
Received: 15 Aug 2024 | Published: 28 Aug 2024
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation:
Karim T, Krimmel E, Little H, Walker LJ (2024) Community-driven enhancement of information ecosystems for the discovery and use of paleontological specimen data: Stakeholder engagement workshop. Research Ideas and Outcomes 10: e134840. https://doi.org/10.3897/rio.10.e134840
A stakeholder engagement workshop was held in May 2024 as part of the "Community-driven enhancement of information ecosystems for the discovery and use of paleontological specimen data" project, which is funded under the United States National Science Foundation (NSF) Geosciences Open Science Ecosystem (GEO OSE) program. This report describes the activities and outcomes of the workshop.
paleontology, palaeontology, fossil, geology, biodiversity, collection, natural history collection, specimen
This workshop took place on 14-15 May 2024 at the University of Colorado Boulder (CU) and was hosted by the CU Museum of Natural History.
The twenty-four participants of this workshop (Table
List of on-site workshop participants. Workshop organizers are indicated with an asterisk (*).
Name | Institutional affiliation | Role or title |
---|---|---|
Alex Cano | Smithsonian National Museum of Natural History | Data Specialist |
Amanda Millhouse | Smithsonian National Museum of Natural History | Deputy Collections Manager of Vertebrate Paleontology |
Carl Simpson | University of Colorado Boulder, Museum of Natural History | Assistant Professor of Geological Sciences; Curator of Invertebrate Paleontology |
Casey Thater | University of Colorado Boulder, Museum of Natural History | Graduate Student |
Chrissy Garcia | Stanford University | Geoscience Specimen Collection Curator and Manager |
Corinne Myers | University of New Mexico | Associate Professor of Earth and Planetary Sciences |
Ellen Currano | University of Wyoming | Professor of Paleobotany |
Erica Krimmel* | independent | Information Scientist |
Holly Little* | Smithsonian National Museum of Natural History | Informatics Manager of Paleobiology |
Jacob Van Veldhuizen | University of Colorado Boulder, Museum of Natural History | Collections Manager of Vertebrate Paleontology |
Jaelyn Eberle | University of Colorado Boulder, Museum of Natural History | Professor of Geological Sciences; Curator of Vertebrate Paleontology |
Kit Lewers | University of Colorado Boulder | Graduate Student |
Lindsay Walker* | Arizona State University | Symbiota Support Hub Community Manager |
Melanie Hopkins | American Museum of Natural History | Curator of Invertebrate Paleontology |
Nancy Stevens | University of Colorado Boulder, Museum of Natural History | Director; Professor of Anthropology |
Natalia Lopez-Carranza | University of Kansas, Biodiversity Institute | Collection Manager – Invertebrate Paleontology |
Nicole McGee | University of Colorado Boulder, Museum of Natural History | Graduate Student |
Pat O'Connor | Ohio University | Professor of Anatomy and Neuroscience |
Pedro Monarrez | University of California Los Angeles | Recruitment, Outreach, Diversity, Equity, and Inclusion Coordinator for the Department of Earth, Planetary, and Space Sciences |
Sarah Leventhal | University of Colorado Boulder, Museum of Natural History | Graduate Student |
Simon Goring | University of Wisconsin – Madison | Assistant Scientist |
Stewart Edie | Smithsonian National Museum of Natural History | Research Geologist; Curator of Fossil Bivalvia |
Talia Karim* | University of Colorado Boulder, Museum of Natural History | Collections Manager of Invertebrate Paleontology and Paleobotany |
Will Taylor | University of Colorado Boulder, Museum of Natural History | Assistant Professor of Anthropology; Curator of Archaeology |
This workshop is part of the "Community-driven enhancement of information ecosystems for the discovery and use of paleontological specimen data" project, which is funded under the United States National Science Foundation (NSF) Geosciences Open Science Ecosystem (GEO OSE) program. The goal of the project is to support transformational and translational research in the geo- and biosciences by driving development in the open data landscape and by improving the discoverability and use of paleontological specimen data through community engagement and collaboration. Project personnel are actively coordinating with partners throughout the larger data ecosystem, including via two in-person workshops, of which this is the first.
At the intersection of geo- and bioscience, paleontology is an inherently interdisciplinary field and one with impactful research. The ever-growing climate crisis, for one example, highlights a need to understand how taxa reacted to changes in Earth’s history, and underscores the importance of examining patterns from deep time into the modern. Over the last decade, the United States paleontology collections community has invested heavily in the digitization of primary specimen data, including over $10 million funded through the NSF Advancing Digitization of Biodiversity Collections (ADBC) program*
The desired outcomes of this stakeholder engagement workshop were:
By engaging a broad spectrum of individuals who interact with paleontological collections data in different ways, workshop organizers hoped to build a shared understanding of needs. As a component of the overarching project, these outcomes form the basis for advanced investigations into cyberinfrastructure needs and potential solutions that will be explored during the latter part of 2024 and into 2025.
After a welcome from Nancy Stevens, director of the CU Museum of Natural History Collections, the workshop kicked off with icebreaker activities designed to set an active and productive, yet informal, tone. Breakout group introduction discussions built trust and encouraged participants to learn about each other as individuals by posing questions like: What is something (non-work) that you have accomplished recently and are proud of? What keeps you up at night, good or bad (work-wise)? What are your hopes and dreams for paleo data?
Throughout the two days, most workshop participants briefly shared their work via an activity we called a "Data Use Spotlight." Participants were asked to prepare one slide illustrating how they use fossil data in their work (Fig.
Example "Data Use Spotlight" slides.
Workshop organizers presented an overview of their vision for a paleo data ecosystem map, contextualized as the universe of resources we use to do our work and how these resources interact with each other. Creating this map will involve modeling the existing information and systems landscape by characterizing various resources (concepts, systems, platforms, mechanisms, drivers, tools, documentation, standards, etc.), and specifically addressing their use for fossil data. The resulting map will be a tool with entry points for multiple audiences, including new members to the community, members working in specific sectors, and members working to integrate initiatives and systems.
With the context provided by this overview, participants worked collaboratively to list, on giant sticky notes, the resources they use. Essential resources were then flagged with pink sticky notes, and additional sticky note colors were added to capture comments about how participants use each resource and how it might be tagged in the envisioned ecosystem map (Fig.
This activity began with an overview of paleo collections as core research infrastructure (
Considering both physical (in person or via loan) and digital (data and/or media) access, this project is building a conceptual data model for paleo specimens in which we can classify different types of data and describe the attributes of, and relationships between, classes. We expect that this model may be useful for tasks such as:
In small groups, workshop participants sketched out connections to the data they might need in order to use a given specimen for research (Fig.
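To make the idea of classes, attributes, and relationships concrete, the following is a minimal sketch only. It is not the project's data model: every class name, attribute, and value here (`Specimen`, `CollectingEvent`, `Identification`, `media_uris`, the placeholder strings) is a hypothetical illustration chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CollectingEvent:
    """Hypothetical class: where and by whom a specimen was collected."""
    locality: str
    collector: str


@dataclass
class Identification:
    """Hypothetical class: one taxonomic determination of a specimen."""
    taxon_name: str
    identified_by: str


@dataclass
class Specimen:
    """Hypothetical class: a physical specimen linked to its related data."""
    catalog_number: str
    collecting_event: CollectingEvent
    # One specimen may accumulate several identifications over time.
    identifications: List[Identification] = field(default_factory=list)
    # Links to digital media (images, scans) derived from the specimen.
    media_uris: List[str] = field(default_factory=list)

    def latest_identification(self) -> Optional[Identification]:
        """Return the most recently added identification, if any."""
        return self.identifications[-1] if self.identifications else None


# Placeholder values only; these do not describe a real specimen.
example = Specimen(
    catalog_number="EXAMPLE-0001",
    collecting_event=CollectingEvent(locality="example locality",
                                     collector="example collector"),
)
example.identifications.append(
    Identification(taxon_name="Example taxon", identified_by="example person")
)
```

In a sketch like this, the relationships (one specimen to one collecting event, one specimen to many identifications and media items) are the kind of connections that participants drew by hand in this activity.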
The final workshop activity focused on developing a better understanding of the research data pipeline. Workshop organizers asked participants to write down research questions (old, new, previously examined, or unsolved) on sticky notes. These were grouped thematically into three large clusters, and one question from each cluster was chosen as an exemplar research question to explore. Participants were asked to map out all the steps they would take to answer their exemplar question and to note where resource gaps or challenges could inhibit the research process. This allowed the group to better understand how fossils and associated data are accessed and used as part of the research data pipeline.
Group A focused on "How do we find new fossils?" and identified the key data points needed to answer various iterations of this question (Fig.
Group B focused on biogeography, comparing niche dimensions with phylogeny (Fig.
Group C focused on trait data, specifically, looking at trait selectivity to predict extinction risk across a geologic time boundary, for example, the Cretaceous-Paleogene Boundary (Fig.
As with previous activities, this one provided workshop organizers with invaluable perspective about how researchers perceive and use fossil specimen data, both digital and analog. The diagrams resulting from this activity will inform future work on the overarching project.
All participants were asked to provide anonymous feedback on the workshop via a brief survey, which was separate from a demographics survey. Eleven people responded, representing slightly over half of the 20 workshop participants (workshop organizers did not participate in this survey). Feedback provided in the survey was overwhelmingly positive (Fig.
Throughout the workshop, participants highlighted critical themes that align with the big-picture objectives of this project.
Fitness-for-use of specimen-based data available on aggregators (e.g. GBIF, iDigBio) was one such recurrent theme. Discussions touched on use of specimen images for diverse research purposes, digitization of data "on demand," the necessity of species-level taxonomic identifications, and duplication of occurrence records. Participants were particularly interested in considering the “why” of collecting, as knowing why something was collected could inform its fitness-for-use in other applications.
Fitness-for-use ties directly into another theme, data availability. Workshop participants had many discussions focused on what data are available, what data are not, and (if not) why not? On a specimen level, participants discussed availability of digitized trait data, which are typically not stored with the specimen record itself, or shared on data aggregators for fossil specimens. On a broader level, participants explored the idea of sharing minimal data to improve discoverability of larger collections where specimen-level digitization is an unreasonable target (e.g. an institution might share inventory data via the Latimer Core standard to let researchers know about all brachiopods collected by a particular person). Such minimal data might be the entry point for digitization "on demand" of data at the specimen-level. On a human level, participants discussed how much institutional knowledge is held by collection staff, and how best to capture that before individuals retire or move on.
Finally, the capacity to make collections data fit-for-use and available came up constantly. Several participants shared that they were the only people at their institutions managing those collections. For others, the scope of digitizing legacy data in their collections is so vast that multiple additional trained staff would be needed to address the issue. Everyone was concerned about how we might try to future-proof existing research datasets and databases. Who is going to maintain these key resources in the future when we barely have the capacity and funding to do it now?
All three of these themes emphasize that humans are at the center of research and collections. In planning this workshop, we attempted to be people-centric. Built-in flexibility in the agenda allowed participants to have time for discussions when a topic emerged that sparked group interest. Similarly, providing longer lunch and coffee breaks facilitated unstructured discussion and allowed people to think, chat, and explore ideas organically. Concrete results from the workshop activities are valuable to the overarching project, but equally so was laying the groundwork for continuing to have productive and collaborative conversations with the group of people who participated. Building a shared understanding of the needs of research and collections communities related to fossil data is an ongoing process, and one that is essential to envisioning solutions.
To conclude, this stakeholder engagement workshop brought together a group of professionals with varied skillsets, perspectives, and end-use goals for digitized fossil collections data. In two days, the group provided critical feedback toward defining the essential elements of the vast landscape (or ecosystem) of research resources available to the paleontological community, modeled data pipelines based on real-life questions in paleontological research, and became better acquainted with the data needs, uses, and workflows of colleagues working in other sectors of the paleontological domain. While much work remains, the outcomes of this workshop underscore the need for paleontological research, collections, and informatics specialists to collaboratively define solutions for data pipelines through people-centric initiatives.
Thank you sincerely to all those who participated in this workshop and made it the success that it was! Extra thanks to Kit Lewers, Alex Cano, Nicole McGee, Jerah Brewster, and Sam Eads for their help with lunchtime logistics.
University of Colorado Boulder
NSF-funded "Thematic Collections Networks" including fossil specimens: Fossil Insect Collaborative (2013-2020), PaleoNICHES (2012-2015), EPICC (2015-2020), Cretaceous World (2016-2023), Pteridological Collections Consortium (2018-2023). Monetary amount acquired from NSF’s Award Search.
As of this writing (2024-06-04), there are 8,917,071 occurrence records in the GBIF data portal where basisOfRecord = “FossilSpecimen”.
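A count like the one above can, in principle, be retrieved programmatically. The sketch below assumes the public GBIF occurrence search API at `https://api.gbif.org/v1/occurrence/search`, and assumes that the portal's "FossilSpecimen" value corresponds to the API enumeration value `FOSSIL_SPECIMEN`. The function names are hypothetical, and the report does not state how its figure was obtained.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the public GBIF occurrence search API.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_count_url(basis_of_record: str = "FOSSIL_SPECIMEN") -> str:
    """Build a search URL that requests zero records, only the total count."""
    query = urlencode({"basisOfRecord": basis_of_record, "limit": 0})
    return f"{GBIF_OCCURRENCE_SEARCH}?{query}"


def extract_count(response_text: str) -> int:
    """Read the total number of matching records from a JSON search response."""
    return int(json.loads(response_text)["count"])


def fetch_count(basis_of_record: str = "FOSSIL_SPECIMEN", timeout: float = 30.0) -> int:
    """Query the live API (requires network access) and return the record count."""
    with urlopen(build_count_url(basis_of_record), timeout=timeout) as response:
        return extract_count(response.read().decode("utf-8"))
```

Calling `fetch_count()` queries the live index, so the number returned will likely differ from the figure recorded on 2024-06-04.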