Research Ideas and Outcomes: Methods
Open-source software integration: A tutorial on species distribution mapping and ecological niche modelling
Zoe Anne Ryan‡,§,|, Emily Kathleen Clark§, Beatrice Cundiff, Joslyn Althea Nichols#, Maya Mahoney|, Nkosi Michael Evans§,¤, Thomas Campbell¤, Danny Kreider«, Matt von Konrat§
‡ University of Wisconsin - Madison, Madison, United States of America
§ Field Museum, Gantz Family Collections Center, Chicago, United States of America
| DePaul University, Chicago, United States of America
¶ University of California, Berkeley, Berkeley, United States of America
# Arizona State University, Sussex, United States of America
¤ Northeastern Illinois University, Chicago, United States of America
« University of Illinois, Chicago, Chicago, United States of America
Open Access

Abstract

Over the last decade, access to global data has become increasingly critical for research, allowing insights into diverse biological, environmental and societal questions at a macro scale. Digitisation has greatly enhanced the use of herbarium data in the analysis of species distributions and ecological niche modelling. Yet, resources on modelling and mapping methodology using open-source software are greatly lacking for beginners. We have created a replicable and thorough tutorial, developed by undergraduates with broad backgrounds and levels of experience, for visualising species occurrence data and conducting exploratory analysis. This tutorial integrates the open-source programmes QGIS, MaxEnt and R to develop distribution maps, using bryophytes as a case study, to promote the accessibility of open-source software and remote access learning. This tutorial has already set the foundation for further research into distribution modelling of rare Illinois bryophytes to better understand the potential impact of climate change.

Keywords

Bryophytes, QGIS, MaxEnt, R, Rstudio, tutorial, open-source, ecological niche modelling, species distribution maps

Introduction

Digitisation and Natural History Collections

Globally, thousands of institutions house nearly three billion scientific collections containing multiple layers of associated metadata (Holmes et al. 2016, Sweeney et al. 2018). Extensive, professionally managed natural history collections, with their broad taxonomic, geographic and temporal scope, offer unparalleled resources that contribute to science, society and their communities (e.g. Saxena and Harinder (2004), Berendsohn and Seltmann (2010), Hedrick et al. (2019)). Digitisation has greatly enhanced the use of herbarium data in scientific research, affecting diverse research areas ranging from biodiversity informatics and conservation biology to global change biology (Bebber et al. 2010, Soltis et al. 2018). Nualart et al. (2017) reviewed the potential uses of herbarium specimens and categorised them based on:

  1. occurrence data, such as studies about plant extinction or introduction or those focused on modelling their ecological niche;
  2. the specimens themselves, such as morphological or phenological studies to evaluate the impact of climate change;
  3. genetic data, such as phylogeographic or taxonomic studies; and
  4. other applied studies.

Despite their scientific, educational and societal relevance (Heberling et al. 2019), museum specimens remain underused, particularly in ecological studies. Yet, the wealth of hidden biodiversity they hold may reveal global patterns that are not observable from other data sources (Meineke et al. 2018). Herbarium specimens, in particular, offer a great resource for research projects that involve the public, high school interns and university students, allowing the entire community to contribute to scientific discovery (Pivarski et al. 2022).

Modelling and Mapping

The geographic range in which a species can exist is defined by three boundaries: biotic, abiotic and mobility factors, together known as the BAM framework (Peterson and Soberón 2012, Melo-Merino et al. 2020). Models can be built from any number of these factors to describe species distributions (species distribution models [SDMs]) or niches (ecological niche models [ENMs]). A model can be an SDM and/or an ENM depending on the research goal, which should guide a researcher's prioritisation and selection of BAM factor data (Barve et al. 2011, Melo-Merino et al. 2020).
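In the BAM literature cited above, the three factors are commonly written as regions of geographic space: A, where abiotic conditions are suitable; B, where biotic interactions allow the species to persist; and M, the area the species can reach. Using that common notation (the symbols G_P and G_O are assumed shorthand, not used elsewhere in this tutorial), the potential and occupied distributional areas can be summarised as:

```latex
G_P = A \cap B, \qquad G_O = A \cap B \cap M
```

A model built only from climate layers, as in this tutorial, mainly approximates the abiotic component A.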

This paper outlines a beginner-friendly SDM/ENM modelling tutorial using Illinois bryophytes as a case study. Bryophytes lend themselves well to such methods due to their broad distribution and important ecological role, but this tutorial can be followed with any organism. The project aims to explore the potential range of bryophytes based on climate variables, while utilising open-source data and programmes. A more detailed set of instructions can be found in the supplementary materials.

Bryophytes and Ecological Significance

Bryophytes, including mosses, liverworts and hornworts, are the second largest group of land plants after flowering plants and are pivotal to our understanding of early land plant evolution (Leebens-Mack et al. 2019, Zhang et al. 2020). Fig. 1 and Fig. 2 depict a few common bryophytes that show the salient morphological characters used to distinguish between them: the liverwort Frullania (Fig. 1a, d) and the mosses Plagiomnium (Fig. 1b, e), Polytrichum (Fig. 1c, f) and Entodon (Fig. 2). Bryophytes are of great ecological and environmental significance, playing an important role in nutrient cycling and water retention, which reduces soil nutrient loss and flooding risk (Rieley et al. 1979, Anderson et al. 2010). Bryophytes have also been explored as possible indicators of climate change (e.g. Lindo and Gonzalez (2010), Ruklani et al. (2021)) and air pollution (Zechmeister et al. 2003, Dymytrova 2009) because of their small size, broad distribution and environmental sensitivity (Zanatta et al. 2020, Hespanhol et al. 2022, Mallen‐Cooper et al. 2022).

Figure 1.  

A, D) The liverwort Frullania. A) Growing on bark; D) Ventral view of the stem under the microscope. B, E) Plagiomnium. B) Mat with sporophytes (inset: magnified peristome teeth); E) Shoot with leaves showing a midrib. C, F) Polytrichum commune. C) Erect stems with sporophytes; F) Close-up of stem showing the spiral leaf arrangement (courtesy of Jerry Jenkins). Scale bars: A, B, C = 1 cm (B inset = 20 µm); D, E = 1 mm; F = 500 µm.

Figure 2.  

Entodon seductrix. A) Growing on bark, showing mat-like growth; B) Close-up with sporophytes; C) Whole leaf; D) Leaf cells. Scale bars: A, B = 1 cm; C = 100 µm; D = 20 µm.

Aims

Alban (2017) created a short tutorial using QGIS, R and MaxEnt. It provided a foundation for creating a project in these programmes, but it lacked the detail a beginner would need on the MaxEnt modelling process, MaxEnt statistics and map formatting options in QGIS. Our tutorial also explains how to collect distribution data and highlights widely available climate layers. We therefore developed a comprehensive tutorial and supplementary guides designed to be beginner-friendly and accessible through the use of free or open-source software. The tutorial is designed to be easily replicated in studies of other organisms and other distribution models. We demonstrate it using bryophytes as case studies, investigating distribution patterns regionally in Illinois and at a broader continental scale across the United States.

Description

Data Resources

Producing maps and models draws on several resources: online data aggregators and data sources, such as the Consortium of Bryophyte Herbaria (Consortium of Bryophyte Herbaria 2024), GADM (GADM 2018) and WorldClim (Fick and Hijmans 2017), and software, such as QGIS (QGIS.org 2021), R (R Core Team 2022) and MaxEnt (Phillips et al. 2024). All of these resources are open access.

Distribution Data

Occurrence datasets for Entodon seductrix, Dicranella heteromalla, Plagiomnium cuspidatum, Frullania eboracensis and Polytrichum commune were downloaded from the Consortium of North American Bryophyte Herbaria (CNABH) and used to construct the models outlined below.

QGIS

QGIS is open-source geographic information system (GIS) software used to visualise occurrence and environmental data. It is a user-friendly tool for mapping point data with longitude and latitude coordinates, vector shapefiles and pixel-based raster data, and it can combine these data types into a cohesive visualisation.
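For readers who prefer scripting, the same point layer can be created from the QGIS Python console. The sketch below only builds the connection string for QGIS's delimited-text provider; the file name and the Darwin Core column names `decimalLongitude`/`decimalLatitude` are assumptions about the downloaded CSV, and WGS 84 (EPSG:4326) is assumed as the coordinate reference system.

```python
from pathlib import Path


def delimited_text_uri(csv_path, x_field="decimalLongitude",
                       y_field="decimalLatitude", crs="EPSG:4326"):
    """Build a data-source URI for QGIS's "delimitedtext" provider.

    The default column names are Darwin Core terms, assumed here to match
    the occurrence export; change them if your CSV uses other headers.
    """
    # QGIS expects an absolute file:// URI followed by query-style options
    file_uri = Path(csv_path).resolve().as_uri()
    return f"{file_uri}?delimiter=,&xField={x_field}&yField={y_field}&crs={crs}"


if __name__ == "__main__":
    print(delimited_text_uri("occurrences.csv"))
```

Inside QGIS, the string can then be passed to `QgsVectorLayer(uri, "occurrences", "delimitedtext")` and the layer added with `QgsProject.instance().addMapLayer(layer)`. Those calls exist only within QGIS, so they are left out of the runnable block.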

R and Rstudio

Rstudio is an integrated development environment (IDE) for the programming language R, which is commonly used for statistical analysis and bioinformatics. We used a brief R script to remove occurrences lacking longitude and latitude; the script generates a new CSV file containing only the georeferenced occurrences.
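The tutorial's cleaning step is written in R and is not reproduced here. As a rough equivalent for readers working in another language, the Python sketch below keeps only rows whose latitude and longitude parse as numbers. It assumes a comma-separated export with Darwin Core column names. The range check (latitude within ±90°, longitude within ±180°) is an extra safeguard that the original script may not include.

```python
import csv


def is_georeferenced(row, lat_col="decimalLatitude", lon_col="decimalLongitude"):
    """Return True if the row has numeric, in-range coordinates."""
    try:
        lat = float(row.get(lat_col) or "")
        lon = float(row.get(lon_col) or "")
    except ValueError:  # empty, missing or non-numeric coordinate fields
        return False
    # NaN fails both comparisons, so it is rejected here as well
    return -90 <= lat <= 90 and -180 <= lon <= 180


def filter_georeferenced(rows, **cols):
    """Keep only the rows (dicts) that carry usable coordinates."""
    return [row for row in rows if is_georeferenced(row, **cols)]


def clean_occurrence_file(in_path, out_path, **cols):
    """Write a new CSV containing only georeferenced occurrences."""
    with open(in_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        kept = filter_georeferenced(reader, **cols)
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)  # number of occurrences retained
```

Usage would be, for example, `clean_occurrence_file("cnabh_export.csv", "occurrences_clean.csv")`, where both file names are hypothetical.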

MaxEnt

MaxEnt is open-source software that creates a heat map of potential habitat for the modelled species within the study area. Areas are assigned varying probabilities of suitability based on the relationships between occurrence points and environmental data. The heat map generated by MaxEnt can be imported into QGIS for further visualisation. MaxEnt also generates model statistics, including a jackknife plot that indicates the relevance of each environmental variable to the model.
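Internally, MaxEnt fits a maximum-entropy (Gibbs) distribution over the cells of the study area. Each cell x receives a raw value proportional to exp(λ·f(x)), where f(x) are features derived from the environmental layers and λ are fitted weights. The Python sketch below is a deliberately simplified illustration with hand-supplied weights. It omits MaxEnt's fitting, regularisation and feature classes (linear, quadratic, hinge and so on), and the function names are hypothetical. It also computes the "gain" that MaxEnt reports: the mean log raw value at presence cells plus ln N for N background cells, so that a uniform model scores zero. MaxEnt's displayed maps use a transformation of this raw output (cloglog by default in recent versions, logistic in older ones).

```python
import math


def gibbs_distribution(features, weights):
    """Raw MaxEnt-style output: pi(x) = exp(w . f(x)) / Z over all cells.

    features: one list of feature values per background cell.
    weights: one weight per feature (given here; fitted in real MaxEnt).
    """
    scores = [sum(w * f for w, f in zip(weights, fx)) for fx in features]
    top = max(scores)  # subtract the maximum to avoid overflow in exp
    expd = [math.exp(s - top) for s in scores]
    z = sum(expd)  # normalising constant
    return [e / z for e in expd]


def gain(raw, presence_cells):
    """Mean log raw probability at presence cells, shifted so uniform = 0."""
    mean_log = sum(math.log(raw[i]) for i in presence_cells) / len(presence_cells)
    return mean_log + math.log(len(raw))


# Toy study area: four cells with one feature (e.g. standardised temperature)
cells = [[0.0], [1.0], [2.0], [3.0]]
uniform = gibbs_distribution(cells, [0.0])
tuned = gibbs_distribution(cells, [1.0])
print(round(gain(uniform, [3]), 6), round(gain(tuned, [3]), 3))
```

A positive gain means the model concentrates probability on the presence cells more than a uniform guess would.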

Implementation

A brief summary of the methodology is provided below as an introduction to distribution modelling. The supplementary materials provide detailed, step-by-step instructions and information.

Suppl. material 1: A full tutorial with reference images and text-focused steps of QGIS and MaxEnt.

Suppl. material 2: A tutorial for more advanced users with background knowledge of QGIS and MaxEnt (a condensed version of Suppl. material 1).

Suppl. material 3: A tutorial on how to collect and plot occurrence data in QGIS.

Suppl. material 4: A tutorial for more advanced users of QGIS (a condensed version of Suppl. material 3).

Suppl. material 5: A downloadable template for QGIS map outputs and associated information.

Methodology

Occurrence Map

Occurrence data from CNABH were imported into QGIS to create a distribution map for each species. For Illinois-focused maps, a boundary shapefile was downloaded free of charge from GADM (2018) and used to remove occurrences outside the state. Following the first tutorial (Suppl. material 3 and Suppl. material 4), we produced species distribution maps showing occurrences (Figs 3, 4, 5).
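In QGIS, this step is done with the built-in vector tools (for example, Clip or Select by Location against the GADM boundary layer). To show what such a clip does, the Python sketch below applies the classic ray-casting point-in-polygon test: a point is inside if a ray cast from it crosses the boundary an odd number of times. The rectangular study area is a hypothetical toy polygon, not the GADM Illinois boundary. The sketch handles a single ring only, whereas real state boundaries may be multipart polygons.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: toggle 'inside' each time an eastward ray crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the point's latitude can be crossed
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge meets the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def clip_points(points, polygon):
    """Keep (lon, lat) points that fall inside the polygon."""
    return [(lon, lat) for lon, lat in points if point_in_polygon(lon, lat, polygon)]


# Hypothetical rectangular study area in (longitude, latitude) degrees
study_area = [(-91.0, 37.0), (-87.5, 37.0), (-87.5, 42.5), (-91.0, 42.5)]
records = [(-89.0, 40.0), (-85.0, 40.0), (-90.5, 43.0)]
print(clip_points(records, study_area))
```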

Figure 3.  

Eastern United States map of Entodon seductrix, with green points representing each occurrence.

Figure 4.  

United States distribution of Plagiomnium cuspidatum.

Figure 5.  

Eastern US distribution of Frullania eboracensis.

Figure 6.  

Predictive distribution of Plagiomnium cuspidatum in Illinois, with occurrences represented by tan points. The white end of the predictive gradient indicates a location where the bryophyte is likely to be found.

Environmental Layers

Bioclim historical climate variables were downloaded from WorldClim to serve as the environmental layers for the model (Fick and Hijmans 2017). These data are free to download and provide worldwide climate information as TIFF files. To restrict the environmental data to the study area, each variable layer was clipped to the Illinois boundary (mask layer) with the Clip Raster by Mask Layer tool (Raster > Extraction in QGIS). Other environmental layers, such as those provided by the US National Land Cover Database (Dewitz and U.S. Geological Survey 2021), can be explored as additional modelling parameters (e.g. Figs 7, 8).
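Conceptually, clipping a raster by a mask keeps the cell values inside the study area and sets every other cell to a "no data" value. MaxEnt accepts environmental layers in ESRI ASCII grid (.asc) format, and all layers must share the same extent and cell size. The Python sketch below illustrates both ideas on a tiny grid. The grid values, mask, corner coordinates and the NoData value of -9999 are hypothetical; in practice, QGIS or R would handle the clipping and export.

```python
NODATA = -9999  # hypothetical NoData value


def apply_mask(grid, mask, nodata=NODATA):
    """Keep values where mask is True; set all other cells to nodata."""
    return [[value if keep else nodata for value, keep in zip(grow, mrow)]
            for grow, mrow in zip(grid, mask)]


def to_esri_ascii(grid, xllcorner, yllcorner, cellsize, nodata=NODATA):
    """Serialise a grid as ESRI ASCII text (first row = northernmost row)."""
    header = [
        f"ncols {len(grid[0])}",
        f"nrows {len(grid)}",
        f"xllcorner {xllcorner}",  # longitude of the lower-left corner
        f"yllcorner {yllcorner}",  # latitude of the lower-left corner
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    rows = [" ".join(str(v) for v in row) for row in grid]
    return "\n".join(header + rows) + "\n"


# Toy 2 x 3 temperature grid and a mask marking cells inside the study area
bio1 = [[10.5, 11.0, 11.2],
        [12.1, 12.4, 12.9]]
inside = [[False, True, True],
          [True, True, False]]
print(to_esri_ascii(apply_mask(bio1, inside), -91.0, 37.0, 0.5))
```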

Figure 7.  

Entodon seductrix occurrences overlaid on the 2001 U.S. National Land Cover Data layer. Prepared with the supplementary template.

Figure 8.  

Entodon seductrix occurrences overlaid on the 2019 U.S. National Land Cover Data layer. Prepared with the supplementary template.

Using R and MaxEnt

The R script was run to remove bryophyte occurrences lacking georeference data. The cleaned CSV file was loaded into MaxEnt as the “Samples” file and the clipped WorldClim data as the “Environmental layers”. Running MaxEnt produces a predictive distribution map based on the chosen environmental variables and the occurrence data. The MaxEnt output can then be imported into QGIS to visualise where a species has been recorded in relation to where it is predicted to occur. These distributions are shown in Figs 6, 9, 10. The visualisation was then placed into a template (Suppl. material 5) for a cohesive presentation of results; examples of maps displayed using this template are shown in Figs 11, 12.
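MaxEnt's samples file is a CSV whose first three columns are the species name, longitude and latitude. The Python sketch below converts cleaned occurrence rows into that layout and assembles a command line for running MaxEnt without clicking through the interface. The input column names are assumed Darwin Core terms and the file and folder names are hypothetical. The MaxEnt options used (`samplesfile`, `environmentallayers`, `outputdirectory`, `autorun`) should be checked against the help text of the installed MaxEnt version.

```python
import csv


def to_samples(rows, species_col="scientificName",
               lon_col="decimalLongitude", lat_col="decimalLatitude"):
    """Return MaxEnt sample rows: header, then [species, longitude, latitude]."""
    table = [["species", "longitude", "latitude"]]
    for row in rows:
        table.append([row[species_col], row[lon_col], row[lat_col]])
    return table


def write_samples(rows, out_path, **cols):
    """Write the samples table to a CSV file for MaxEnt's Samples input."""
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(to_samples(rows, **cols))


def maxent_command(jar, samples, layers_dir, out_dir, memory_mb=1024):
    """Argument list for a command-line MaxEnt run (not executed here)."""
    return [
        "java", f"-mx{memory_mb}m", "-jar", jar,
        f"samplesfile={samples}",
        f"environmentallayers={layers_dir}",
        f"outputdirectory={out_dir}",
        "autorun",  # start the run without waiting for the Run button
    ]
```

The returned list can be passed to Python's `subprocess.run(cmd, check=True)`, assuming Java and `maxent.jar` are installed.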

Figure 9.  

Entodon seductrix occurrences in Illinois, represented by red points. The light end of the gradient represents a location where the species is likely to be found.

Figure 10.  

Predictive distribution of Polytrichum commune in Illinois. The occurrences are represented by white points, with the yellow end of the gradient representing a location where the bryophyte is likely to be found. The colour gradient can be changed to a colourblind-friendly palette, such as the IBM palette used here.

Figure 11.  

Predictive distribution of Frullania eboracensis, using the created template for presentation. The white points represent the occurrences of the bryophyte, with the red end of the gradient representing an area in which the species is likely to be found.

Figure 12.  

Predictive distribution of Entodon seductrix, using template for presentation. The white points represent occurrences of the bryophyte and the red end of the spectrum indicates a location in which the species is likely to be found.

Troubleshooting

Some common issues:

  • The MaxEnt programme can have difficulty running on macOS 11.4 (Big Sur). The MaxEnt team recommend using macOS 10 to resolve this. The stand-alone MaxEnt programme has not been tested on more recent versions of macOS.
  • If QGIS stops working after a long period of use, saving your work and restarting the programme often resolves the problem.
  • Run the R code from a script in Rstudio rather than directly in the console; otherwise, editing the code after running it can be difficult. This extra step makes the data cleaning process run more smoothly.

Conclusions

This paper provides a detailed step-by-step guide to developing species distribution maps and preliminary ecological niche models using the open-source software QGIS, Rstudio and MaxEnt. The tutorial uses selected moss and liverwort species as a case study, mapping occurrences in Illinois against climatic variables and illustrating the potential of mosses as climate change indicators (e.g. Gignac (2001)). The tutorial has been designed to be versatile and adaptable to other projects that use QGIS for distribution mapping. Significantly, the guide highlights the importance of open-source resources in accelerating our understanding of biodiversity patterns and in making this field more accessible and equitable in terms of both location and cost. The guide and tutorial are aimed especially at beginners and first-time investigators navigating mapping in QGIS, MaxEnt and Rstudio.

Next Steps: Increasing Model Precision, Accuracy and Specific Niche Analysis

Starting with open-source materials and a simple methodology provides the foundation for building more complex and accurate models. All of the data editing and modelling completed in this tutorial can also be done entirely in R for a more seamless and customisable workflow. Fig. 13 shows the predictive distribution for E. seductrix based on non-correlated Bioclim variables and National Land Cover Database (NLCD) variables for land cover and percentage tree canopy cover. The accompanying jackknife plot indicates the importance of each variable to the model, which helps to describe the species' niche and compare it with those of other bryophyte species.

Figure 13.  

Predictive distribution model of Entodon seductrix and jackknife plot of environmental variables.
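A jackknife plot like the one in Fig. 13 follows a simple recipe. For each variable, MaxEnt trains one model using only that variable and one using all variables except it, then compares their gains with that of the full model. The Python sketch below reproduces only that bookkeeping. `fit_gain` is a hypothetical stand-in for training a model and returning its gain, and the additive toy scores are invented for illustration.

```python
def jackknife(variables, fit_gain):
    """Gain with each variable alone and without it, plus the full-model gain.

    fit_gain: hypothetical callable that trains a model on a list of
    variable names and returns its gain.
    """
    full = fit_gain(list(variables))
    table = {}
    for v in variables:
        table[v] = {
            "only": fit_gain([v]),  # information the variable carries alone
            "without": fit_gain([u for u in variables if u != v]),  # drop = unique information
        }
    return full, table


# Toy stand-in: each variable adds a fixed amount of gain
toy_scores = {"bio1": 5, "bio12": 2, "canopy": 3}
full, table = jackknife(list(toy_scores), lambda vs: sum(toy_scores[v] for v in vs))
print(full, table["bio1"])
```

Read the output as MaxEnt's plot is usually read. A high "only" gain marks a variable that is informative by itself. The variable whose removal lowers the gain most, relative to the full model, carries the most information not present in the other variables.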

Creating accurate and useful distribution and niche models is challenging, and beginners can find it overwhelming to know where to start. We hope this tutorial demystifies modelling methodology for researchers, students and citizen scientists and offers a cost-free starting point for learning what makes a useful model.

Acknowledgements

This project would not have been possible without the hard work of many people. We would like to acknowledge Don De Alban from the National University of Singapore, who made the first tutorial for using these three programmes to create distribution maps, which we then expanded upon to develop our own case study. The Student Center for Science Engagement at Northeastern Illinois University, the Dean's Undergraduate Fellowship, College of Science and Health, DePaul University, the Field Museum Women's Board and the NSF-funded iDigBio Natural History Collections Summer Internship Program (iDB-SIP) helped provide funding for student interns. Financial support was provided by the National Science Foundation (Award No. 0949136, 1145898, 1458300, 1541545, 2001509) and the Friends of Nachusa Grasslands Scientific Research Grant; we especially thank Research Scientist Elizabeth Bach for her valuable time and support. We also thank Jerry Jenkins for permission to use his images and Daniel Le and Chris Maves for their macro-photography. We are grateful for the investment in infrastructure provided by the Grainger Bioinformatics Center, Field Museum. The senior author thanks Dr. John Dean, Dr. Jalene LaMontagne and Dr. Ken Cameron for their roles as advisors. The senior author is also grateful to the University of Wisconsin-Madison Botany Department for funding the presentation of this work at the Ecological Society of America (ESA) 2024.

We are also deeply grateful to Dr. Wes Testo and Dr. Anna Ferretto for their valuable assistance and time.

Hosting institution

The Field Museum of Natural History

Conflicts of interest

The authors have declared that no competing interests exist.

References

Supplementary materials

Suppl. material 1: Full QGIS MaxEnt Tutorial 
Authors:  Emily Clark and Zoe Ryan
Data type:  pdf
Suppl. material 2: Quick Step QGIS MaxEnt Tutorial 
Authors:  Emily Clark
Data type:  pdf
Suppl. material 3: QGIS Intro and Instructions for Mapping Species Occurrences 
Authors:  Zoe Ryan
Data type:  pdf
Suppl. material 4: Quick Guide to Mapping Occurrences in QGIS 
Authors:  Zoe Ryan
Data type:  pdf
Suppl. material 5: QGIS Map Output Template (requires QGIS to open) 
Authors:  Emily Clark
Data type:  qpt