Research Ideas and Outcomes :
Methods
Corresponding author: Matt von Konrat (mvonkonrat@fieldmuseum.org)
Academic editor: Editorial Secretary
Received: 12 Jun 2024 | Accepted: 02 Oct 2024 | Published: 14 Oct 2024
© 2024 Zoe Ryan, Emily Clark, Beatrice Cundiff, Joslyn Nichols, Maya Mahoney, Nkosi Evans, Thomas Campbell, Danny Kreider, Matt von Konrat
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Ryan Z, Clark E, Cundiff B, Nichols J, Mahoney M, Evans N, Campbell T, Kreider D, von Konrat M (2024) Open-source software integration: A tutorial on species distribution mapping and ecological niche modelling. Research Ideas and Outcomes 10: e129578. https://doi.org/10.3897/rio.10.e129578
Over the last decade, access to global data has become increasingly critical for research, allowing insights into diverse biological, environmental and societal questions at a macro scale. Digitisation has greatly enhanced the use of herbarium data in the analysis of species distributions and ecological niche modelling. Yet, beginner-oriented resources on modelling and mapping methodology using open-source software are greatly lacking. We have created a replicable and thorough tutorial for visualising and exploring species occurrence data, developed by undergraduates with broad backgrounds and levels of experience. This tutorial integrates the open-source programmes QGIS, MaxEnt and R to develop distribution maps, using bryophytes as a case study, to promote the accessibility of open-source software and remote access learning. This tutorial has already set the foundation for further research into distribution modelling of rare Illinois bryophytes to better understand the potential impact of climate change.
Bryophytes, QGIS, MaxEnt, R, RStudio, tutorial, open-source, ecological niche modelling, species distribution maps
Digitisation and Natural History Collections
Globally, thousands of institutions house nearly three billion scientific collections containing multiple layers of associated metadata (
Despite the scientific, educational and societal relevance (
Modelling and Mapping
The geographic range in which a species can exist is defined by three boundaries: biotic, abiotic and mobility factors, also called the BAM framework (
This paper outlines a beginner-friendly SDM/ENM modelling tutorial using Illinois bryophytes as a case study. Bryophytes lend themselves well to such methods due to their broad distribution and important ecological role, but this tutorial can be followed with any organism. The project aims to explore the potential range of bryophytes based on climate variables, while utilising open-source data and programmes. A more detailed set of instructions can be found in the supplementary materials.
Bryophytes and Ecological Significance
Bryophytes, including mosses, liverworts and hornworts, are the second largest group of land plants after flowering plants and are pivotal in our understanding of early land plant evolution (
A, D) The liverwort, Frullania. A) Growing on bark, D) Ventral view of the stem under the microscope; B, E) Plagiomnium. B) Mat with sporophytes, inset, magnification on peristome teeth, E) Shoot with leaves illustrating a mid-rib; C, F) Polytrichum commune. C) Erect stems with sporophytes, F) Close up of stem showing the spiral leaf arrangement (courtesy of Jerry Jenkins). Scale bars: A, B, C = 1 cm (B inset = 20 µm); D, E = 1 mm; F = 500 mm.
Aims
Data Resources
The process of producing maps and modelling requires many resources, for example, online data aggregators (e.g. Consortium of Bryophyte Herbaria (
Distribution Data
Occurrence datasets of Entodon seductrix, Dicranella heteromalla, Plagiomnium cuspidatum, Frullania eboracensis and Polytrichum commune were downloaded from CNABH and used to construct the models outlined below.
QGIS
QGIS is open-source Geographic Information System (GIS) software used to visualise occurrence and environmental data. It is a user-friendly tool for mapping points with longitude and latitude, shapefiles and raster data. QGIS can combine these data types as layers in a single, cohesive visualisation.
R and RStudio
RStudio is an integrated development environment (IDE) for the programming language R, which is commonly used for statistical analysis and bioinformatics. We used a brief script to remove occurrences lacking longitude and latitude from our data and to generate a new CSV file containing only the georeferenced occurrences.
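The cleaning step described above can be illustrated with a minimal sketch. This is not the authors' R script (which is provided in the supplementary materials); it is a Python illustration of the same idea using only the standard library. The column names `decimalLatitude` and `decimalLongitude` are assumptions based on Darwin Core-style exports, and the coordinate range check is an extra safeguard that the text does not mention.

```python
import csv
import io

LAT_COL = "decimalLatitude"    # assumed column name (Darwin Core style)
LON_COL = "decimalLongitude"   # assumed column name (Darwin Core style)


def is_georeferenced(row):
    """Return True when both coordinates parse as numbers within valid ranges."""
    try:
        lat = float(row.get(LAT_COL) or "")
        lon = float(row.get(LON_COL) or "")
    except ValueError:
        # Empty or non-numeric coordinates count as "not georeferenced".
        return False
    # Range check is an addition for illustration; it also rejects NaN values.
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def clean_occurrences(src, dst):
    """Copy georeferenced rows from src to dst; return (kept, dropped) counts."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                            extrasaction="ignore")
    writer.writeheader()
    kept = dropped = 0
    for row in reader:
        if is_georeferenced(row):
            writer.writerow(row)
            kept += 1
        else:
            dropped += 1
    return kept, dropped


# Small in-memory demonstration with made-up records.
demo_in = io.StringIO(
    "scientificName,decimalLatitude,decimalLongitude\n"
    "Polytrichum commune,41.87,-87.62\n"
    "Frullania eboracensis,,\n"
)
demo_out = io.StringIO()
print(clean_occurrences(demo_in, demo_out))
```

For real files, the same function could be called with two file handles opened with `newline=""`, one for the downloaded occurrence CSV and one for the cleaned output; the file names would depend on the user's own download.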
MaxEnt
MaxEnt is open-source software that creates a heat map of potential habitat for the modelled species within the study area. These areas are assigned varying probabilities based on correlations between occurrence points and environmental data. The heat map generated by MaxEnt can be imported into QGIS for further visualisation. MaxEnt also generates model statistics, including a jackknife plot that indicates the relevance of each environmental variable to the model.
A brief summary of the methodology is provided below as an introduction to distribution modelling. The supplementary materials provide detailed, step-by-step instructions.
Suppl. material
Occurrence Map
The occurrence data from CNABH are imported into QGIS to create a distribution map of the species. For Illinois-focused maps, a shapefile was freely downloaded from
Environmental Layers
Bioclim historical climate variables were downloaded from WorldClim to act as the environmental layers for the model (
Using R and MaxEnt
The R script was run to remove bryophyte occurrences lacking georeference data. The cleaned CSV file was loaded into MaxEnt as the “Samples” and the clipped WorldClim data as the “Environmental Layers”. Running MaxEnt produces a predicted distribution map based on the chosen environmental variables and the occurrence data. MaxEnt results can be imported into QGIS, which allows us to visualise where a species has been recorded in relation to where it is predicted to occur. These distributions can be seen in Figs
Predicted distribution of Polytrichum commune in Illinois. The occurrences are represented by white points, with the yellow end of the gradient representing locations where the bryophyte is likely to be found. The colour gradient can be altered to be colourblind-friendly, for example, by using the IBM palette as depicted here.
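As an illustration of preparing the “Samples” input, the sketch below reshapes a cleaned occurrence table into a three-column layout of species, longitude and latitude, which is the layout we understand MaxEnt to expect for its samples file. It is written in Python rather than the R used in the tutorial. The input column names are assumptions based on Darwin Core-style exports, and replacing spaces in species names with underscores is a cautious choice for illustration, not a documented requirement.

```python
import csv
import io
import math


def to_maxent_samples(src, dst,
                      species_col="scientificName",    # assumed column name
                      lat_col="decimalLatitude",       # assumed column name
                      lon_col="decimalLongitude"):     # assumed column name
    """Write species, longitude, latitude rows to dst; return rows written."""
    reader = csv.DictReader(src)
    writer = csv.writer(dst)
    writer.writerow(["species", "longitude", "latitude"])
    written = 0
    for row in reader:
        name = (row.get(species_col) or "").strip()
        try:
            lon = float(row.get(lon_col) or "")
            lat = float(row.get(lat_col) or "")
        except ValueError:
            continue  # skip records without usable coordinates
        if not name or not (math.isfinite(lon) and math.isfinite(lat)):
            continue
        # Longitude comes before latitude in this layout.
        writer.writerow([name.replace(" ", "_"), lon, lat])
        written += 1
    return written


# Small in-memory demonstration with made-up records.
demo_in = io.StringIO(
    "scientificName,decimalLatitude,decimalLongitude\n"
    "Polytrichum commune,41.87,-87.62\n"
)
demo_out = io.StringIO()
to_maxent_samples(demo_in, demo_out)
print(demo_out.getvalue())
```

Note that the column order places longitude before latitude, the reverse of many occurrence exports, so it is worth confirming the order before loading the file.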
Troubleshooting
Some common issues:
A detailed step-by-step guide to developing species distribution maps and preliminary ecological niche models using the open-source software QGIS, RStudio and MaxEnt is provided. The tutorial uses selected moss and liverwort species as a case study, mapping occurrences in Illinois against climatic variables and demonstrating the potential capacity of mosses as climate change indicators (e.g.
Starting with open-source materials and simple methodology provides the foundation for building more complex and accurate models. All of the data editing and modelling completed in this tutorial can be done entirely in R for a more seamless and customisable experience. Fig.
Creating accurate and useful distribution and niche models is challenging, and beginners can be overwhelmed when deciding where to start. We hope this tutorial demystifies modelling methodology for researchers, students and citizen scientists and offers a cost-free starting point for learning what makes a useful model.
This project would not have been possible without the hard work of many people. We would like to acknowledge Don De Alban from the National University of Singapore, who made the first tutorial for using these three programmes to create distribution maps, which we then expanded upon to develop our own case study. The Student Center for Science Engagement at Northeastern Illinois University, the Dean's Undergraduate Fellowship, College of Science and Health, DePaul University, the Field Museum Women's Board and the NSF-funded iDigBio Natural History Collections Summer Internship Program (iDB-SIP) helped provide funding for student interns. Financial support was provided by the National Science Foundation (Award Nos. 0949136, 1145898, 1458300, 1541545, 2001509) and the Friends of Nachusa Grasslands Scientific Research Grant; we especially thank Research Scientist Elizabeth Bach for her valuable time and support. We also thank Jerry Jenkins for permission to use his images and Daniel Le and Chris Maves for their macro-photography. We are grateful for the investment in infrastructure provided by the Grainger Bioinformatics Center, Field Museum. The senior author thanks Dr. John Dean, Dr. Jalene LaMontagne and Dr. Ken Cameron for their role as advisors. The senior author is also grateful to the University of Wisconsin-Madison Botany Department for funding the presentation of this work at the Ecological Society of America (ESA) 2024.
We are also deeply grateful to Dr. Wes Testo and Dr. Anna Ferretto for their valuable assistance and time.
The Field Museum of Natural History