Research Ideas and Outcomes : Research Article
People-Powered Research and Experiential Learning: Unravelling Hidden Biodiversity
Melanie Pivarski, Matt von Konrat§, Thomas Campbell|, Ayesha T. Qazi-Lampert§, Laura Trouille, Heaven Wade§, Aimee Davis#, Selma Aburahmeh|, Joseph Aguilar, Cosmin Alb, Ken Alferes, Ella Barker, Karl Bitikofer#, Kelli J. Boulware, Carla Bruton#, Sicong Cao, Arturo Corona Jr., Christine Christian, Kaltra Demiri, Daniel Evans, Nkosi Michael Evans§, Connor Flavin, Jasmine Gillis, Victoria Gogol, Elizabeth Heublein, Edward Huang¤, Jake Hutchinson, Cyrus Jackson§, Odaliz Rubee Jackson, Lauren Johnson§, Michi Kirihara, Henry Kivarkis, Annette Kowalczyk, Alex Labontu, Briajia Levi¤, Ian Lyu, Sylvie Martin-Eberhardt«, Gaby Mata#, Joann Lacey Martinec§, Beth McDonald», Mariola Mira, Minh Nguyen, Pansy Nguyen˄, Sarah Nolimal˅, Victoria Reese#, Will Ritchie, Joannie Rodriguez, Yarency Rodriguez§, Jacob Shuler¦, Jasmine Silvestre, Glenn Simpson|, Gabriel Somarribaˀ, Rogers Ssozi, Tomomi Suwaˁ, Cheyenne Syring, Nidhi Thirthamattur#, Keith Thompson¦, Caitlin Vaughn§, Mario R Viramontes, Chak Shing Wong, Lauren Wszolek
‡ Roosevelt University, Chicago, United States of America
§ Field Museum, Gantz Family Collections Center, Chicago, United States of America
| Northeastern Illinois University, Chicago, United States of America
¶ The Adler Planetarium, Chicago, United States of America
# Field Museum, Learning Center, Chicago, United States of America
¤ University of Illinois at Urbana-Champaign, Champaign County, United States of America
« Michigan State University, East Lansing, United States of America
» Field Museum, Grainger Bioinformatics Center, Chicago, United States of America
˄ Connecticut College, New London, United States of America
˅ DePaul University, Chicago, United States of America
¦ Field Museum, Visitor Services & Analytics, Chicago, United States of America
ˀ University of Florida, Gainesville, United States of America
ˁ Field Museum, Keller Science Action Center, Chicago, United States of America
Open Access


Globally, thousands of institutions house nearly three billion scientific collections offering unparalleled resources that contribute to both science and society. Over 3,000 herbaria worldwide (facilities housing dried plant collections) hold an estimated 350 million specimens collected over the past four centuries. Digitisation has greatly enhanced the use of herbarium data in scientific research, impacting diverse research areas, including biodiversity informatics, global climate change, analyses using next-generation sequencing technologies and many others. Despite the entrance of herbaria into a new era with enhanced scientific, educational and societal relevance, museum specimens remain underused. Natural history museums can enhance learning and engagement in science, particularly for school-age and undergraduate students. Here, we outline a novel approach of a natural history museum using touchscreen technology that formed part of an interactive kiosk in a temporary museum exhibit on biological specimens. We provide some preliminary analysis investigating the efficacy of the tool, based on the Zooniverse platform, in an exhibit environment to engage patrons in the collection of biological data. We conclude there is great potential in using crowd‐sourced science, coupled with online technology, to unlock data and information from digital images of natural history specimens themselves. Sixty percent of the records generated by community scientists (citizen scientists) were of high enough quality to be utilised by researchers. All age groups, including children (10 and under), teens and adults, produced valid, high-quality data that could be used by researchers.
Significantly, the paper outlines the implementation of experiential learning through an undergraduate mathematics course that focuses on projects with actual data to gain a deep, practical knowledge of the subject, including observations, the collection of data, analysis and problem solving. We here promote an intergenerational model including children, high school students, undergraduate students, early career scientists and senior scientists, combining experiential learning, museum patrons, researchers and data derived from natural history collections. Natural history museums with their dual remit of education and collections-based research can play a significant role in the field of community engagement and people-powered research. There also remains much to investigate on the use of interactive displays to help learners interpret and appreciate authentic research. We conclude with a brief insight into the next phase of our ongoing people-powered research activities developed and designed by high school students using the Zooniverse platform.


analysis, biodiversity, bryophytes, citizen science, crowd-sourced science, community science, experiential learning, families, interdisciplinary research, intergenerational participation, K-12, museum, people-powered research, taxonomy, undergraduate students


Globally, thousands of institutions house nearly three billion scientific collections, each of which can have multiple layers of associated metadata (Holmes et al. 2016, Sweeney et al. 2018). Extensive, professionally-managed natural history collections, with their broad taxonomic, geographic and temporal scope, offer unparalleled resources that contribute to both science and society (e.g. Graham et al. 2004, Berendsohn and Seltmann 2010, Hedrick et al. 2019). Digitisation has greatly enhanced the use of herbarium data in scientific research, impacting diverse research areas, including biodiversity informatics, global change biology, analyses using next-generation sequencing technologies and many others (Bebber et al. 2010, Heberling and Isaac 2017, James et al. 2018, Soltis et al. 2018, Lang et al. 2019). Digitisation of specimen records and their associated data will provide unparalleled educational resources that can be tailored to diverse audiences, e.g. professional scientists, students (both biology majors and non-majors) and the general public (Cook et al. 2014). Natural history museums can enhance learning and engagement in science, particularly for school-age students. Recently, we reported the use of an online web-based tool using a crowd-sourced model that produces quality taxonomic datasets and enriches engagement through real contributions to science (von Konrat et al. 2018). Here, we outline a novel approach of a natural history museum using touchscreen technology that formed part of an interactive kiosk in the temporary museum exhibit Specimens: Unlocking the Secrets of Life (Field Museum 2016). Participation in authentic research experiences is an important component in moving youth towards engaging in meaningful scientific thinking and preparing them to enter a modern workforce where STEM plays a central role (National Research Council 2010).
We provide some preliminary analysis investigating the efficacy of the tool in an exhibit environment to engage patrons in the collection of biological data. Significantly, we demonstrate the collaborative role that experiential learning played in this process for university students. From 2020 through to 2021, a group of mathematics, actuarial science, data analytics and computer science students from Roosevelt University, Chicago, U.S.A. worked on data cleaning, processing and analysis for this project. This was completed during the course, Industrial Applications of Mathematics at Roosevelt University; this course was developed in conjunction with the Preparing for Industrial Careers in Math (PICMath) programme (MAA 2021). Both this course and the PICMath programme seek to give students an authentic experiential learning experience involving real problems in order to prepare them for future careers in mathematics, actuarial science and data analysis. Experiential learning consists of contextually rich concrete experience, critical reflective observation, contextual-specific abstract conceptualisation and pragmatic active experimentation (Morris 2019). Working with real data introduces a variety of technical challenges; overcoming these provides a firm grounding for students in their future careers (Dorff and Weekes 2019). Much like a direct experience, analysing real specimens gives an enhanced learning experience to museum patrons and experiential learning courses involving community partners are a high impact educational practice (Kuh 2008). By engaging the public in measuring the specimens, having biology interns observe and report on the measuring process and having university students perform the data analysis on these, the project ceases to be simply a scientific analysis and instead becomes a true community endeavour. There was broad participation across a range of ages, career stages/paths and disciplines. 
We conclude there is great potential in using crowd‐sourced science, coupled with online technology to unlock data and information from digital images of natural history specimens themselves. Significantly, this provides an ongoing opportunity for student growth through experiential learning. Throughout this project, we recognise the valuable contributions of all participants to both the data and the analysis. Large projects that involve crowd-sourced science or participatory science are often referred to as 'citizen science' (Eitzel et al. 2017). Recently there has been a lot of discussion and debate that this term is not inclusive (e.g. Heigl et al. (2019), Auerbach et al. (2019)). Throughout the manuscript we use the words 'people-powered research' and 'community science' interchangeably in order to emphasise the key role of the community - including students, museum patrons, faculty and scientists. We also use the phrases 'community scientist' and 'participants' to refer to the museum patrons who generated data. Defining these terms and indicating alternative terms to 'citizen science' follow recommendations and strategies promoted by Eitzel et al. (2017) in order to avoid confusion. We acknowledge that this differs from many uses of the phrase 'community science' which instead emphasise scientific questions which originate in the community itself (Pandya 2019).


We wanted to apply and develop the statistical and computational expertise of college students to determine the scientific quality of data generated by museum patrons.

Specifically, our objectives are to determine the following:

  1. Are the general public able to provide usable real data?

  2. To what scale is real usable data generated (e.g. over 70%)?

  3. To what scale can different age groups (demographics) generate usable data?

  4. Is there a difference in the setting of the kiosk - i.e. one where a facilitator is available (in the Science Hub) versus non-facilitated (in the Specimens exhibit)?

  5. How effective is public participation in a museum setting?

We also seek to demonstrate how this exercise was driven by student work in a formal class setting. Although this was part of an industrial mathematics course at Roosevelt University, this type of university and museum collaboration could be replicated by partnerships between educators and museums seeking to work with community-generated data on a large scale. An underlying goal, as demonstrated below, was for students to develop a series of publicly accessible computational tools for data curation, validation and analysis.

History of the collaboration

Herbaria are reservoirs of both well‐documented specimens and undescribed diversity (Bebber et al. 2010). New species are described each year from specimens that have been housed in collections for decades, if not centuries. However, the pace of such discovery is slow, especially for non‐angiosperms and accelerating the process of discovery is expensive (Soltis et al. 2018). In order to help overcome this, Field Museum began partnering with educational institutions in the greater Chicago area in 2012, developing the MicroPlant project where students and the general public would generate data as community scientists aiding taxonomists (von Konrat 2012). There were many benefits, especially connecting natural history collections to education. Students, in particular, would have a hands-on experience where they could contribute to scientific discovery in a way that could be used in introductory biology courses as well as K-12 settings. Teachers who were not experts in botany or the life sciences could easily incorporate this into their courses. Our group provided an outline describing a model of a crowd-sourced data collection project that produces quality taxonomic datasets and empowers community scientists through real contributions to science (von Konrat et al. 2018). The project is an ongoing collaboration amongst taxonomists, community science experts, teachers and students from both universities and K–12. Scientists, who have more specimens than taxonomists can measure or observe, in this case, could use these student measurements to accelerate the pace of discovery. As a result of meetings between Field Museum scientists, faculties from many institutions, students and community scientist experts, an online web-based version was developed for the Zooniverse platform (Zooniverse 2021b). Classes could meet in a computer lab at their home institutions and generate measurement data contributing to authentic research. 
The project became surprisingly popular receiving media attention (e.g. Cimons 2018, Ruppenthal 2018) and, in 2017, had over 11,000 participants who generated almost 100,000 measurements. As data were collected, it became clear that due to the size and complexity of a real unculled dataset, it would need to be cleaned and analysed using more advanced techniques and semi-automated tools and so collaborations with the mathematics faculty occurred. An analysis of the web data showed excellent results (von Konrat et al. 2018) and an updated version of the project involving new images was created for a touchscreen kiosk for the Field Museum. This kiosk differed from the classroom setting not only because there was no instructor assistance for data collection, only an explanatory walk through via an optional onscreen tutorial, but also in its physical setting. The question of whether this new interface would lead to a usable dataset remained.

Materials and methods

Outlined below are the process and methodology including development of the kiosk, data collection, data analysis and data validation. All data were taken in units of pixels and, for simplicity of presentation, we present results in units of pixels; however, the images were consistently scaled so that pixels can be converted into microns via the conversion 1 pixel = 1.05 microns.

Experiential learning and interdisciplinary science

Experiential learning foregrounds the crucial role experience takes in the learning process (Kolb et al. 2014). According to Kolb (1984), learning involves four cyclical stages: concrete experience, reflective observation, abstract conceptualisation and active experimentation. A strong experiential learning experience in data analysis involves a set of real data, questions of interest to the partner and domain knowledge for the data. All of these elements require ongoing communication between the mathematics students and scientists. Interactions between biologists and maths students occurred in the 2020 and 2021 Industrial Applications of Mathematics class at Roosevelt University. Biologists visited the class, at the start of the semester in person and regularly on Zoom, to present the overarching project, raw datasets of both measurements and demographics, the discoveries so far and background information. Students problem-solved to determine how to process the data, validate and improve upon initial processing and cleaning and analyse the data. The biologists joined the class every few weeks to both pose and answer questions about the data, kiosk set up and the biological underpinnings. Complementing the experiential learning model, the students engaged in an interdisciplinary, collaborative community science experience in which they worked alongside the scientists to unpack raw data, reflect on the data, think about the data and act on how to apply these data in a meaningful way for stakeholders. For over a decade, the need to accelerate the adoption of interdisciplinary approaches has been recognised in an era of vast datasets (Derrick et al. 2011). The biologists were also able to help focus the research in a direction that was most meaningful for their needs.
In a typical semester, students would visit the museum, interact with scientists and gain exposure to scientific natural history collections in order to put the science and industrial questions into context. However, due to the global pandemic, this was achieved remotely and virtually, helping the students to understand the experimental set up and to pose relevant data questions. At the end of the spring 2020 section of the course, many questions about selected images were answered, but others about the large-scale dataset remained. There was some student work on the project over the summer and then in spring 2021, the second group of mathematics students worked intensively with this dataset and biologists to expand upon the previous student work and complete the data analysis.

Research organism and biological context

The MicroPlant project focuses on a group of early land plants often referred to as bryophytes. Bryophytes, including mosses, liverworts and hornworts, are the second largest group of land plants after flowering plants and are pivotal in our understanding of early land plant evolution (e.g. Ligrone et al. 2012, Zhang et al. 2020). Bryophytes play a significant ecological role including CO2 exchange (DeLucia et al. 2003), plant succession (Cremer and Mount 1965), production and phytomass (Frahm 2008), nutrient cycling (Coxson 1991) and water retention (Pócs 1980). Bryophytes, together with lichens, serve as the “macrophytes,” providing a matrix where many microscopic organisms live, including tardigrades, mites, rotifers, micro-molluscs, microalgae, microfungi and prokaryotes (Gerson 1982, Huttunen et al. 2017). For the MicroPlants project, we focused on the liverwort genus Frullania (Fig. 1). This genus has a worldwide distribution and is one of the largest and taxonomically most complex genera of leafy liverworts with more than 2,000 published names (Hentschel et al. 2015). Specifically, participants were asked to measure a modified leaf or lobule (Fig. 1), from digitally rendered images.

Figure 1.  

An example of the liverwort genus Frullania a) Growing on bark; b) Ventral view of stem indicating modified leaves or lobules (L) that participants are asked to measure.

Development and description of the kiosk

Given the success of the web-based MicroPlant project on Zooniverse, the Field Museum adapted the measurement tool to a touchscreen kiosk which was first used in the exhibit Specimens: Unlocking the Secrets of Life of the Field Museum. This was a special exhibit which ran from 10 March 2017, through to 7 January 2018 showcasing the museum's collection of over 30 million specimens and their ongoing scientific potential. Museum patrons, who had just viewed many of the typically hidden scientific specimens, were able to interact with digitised versions of liverwort specimens on the kiosk so that they too could contribute to scientific discovery (Fig. 2).

Figure 2.  

The online platform, based on Zooniverse, developed into touchscreen technology as part of an interactive kiosk in a high-profile exhibit at Field Museum: a) Instructions were mounted as well as available using the touchscreen; b) Students from Roosevelt University testing the platform; c) Depicting details of the interactive, including the workspace for measuring, instructions on what a pair of perpendicular lines looks like, map indicating the geographic locality and number of measurements.

After the Specimens exhibit closed, the kiosk was updated to include a survey question about the participant's age group. The kiosk then became one of the rotating exhibits in the Grainger Science Hub (Field Museum 2021) where it was used in 2018. After it was retired from the Science Hub, it made appearances at Field Museum Member Nights and at ad hoc events.

Data collection from kiosk

Initially, the kiosk was located in the Specimens: Unlocking the Secrets of Life exhibit at the Field Museum. For this phase of data collection, the general public interacted with the kiosk, first viewing a brief instructional demonstration programmed into the kiosk. This animation showed them how to measure the length and width of lobules and explained that the two line segments need to be perpendicular. They then viewed a randomly displayed MicroPlant image. The image contained a stem of the plant with a varying number of lobules, typically between 1 and 10 (Fig. 1), which participants would measure using the touchscreen (Fig. 2). During the course of 24 days in 2017, staff, students and volunteers unobtrusively observed and collected demographics on who was using the touchscreen. Later, these were matched with the corresponding kiosk measurements in the data processing phase.

As these demographic observations were labour intensive, there was a desire for participants to self-classify their demographics; this would also increase accuracy. When the kiosk moved to the Science Hub in 2018, it included a brief demographic survey for participants to fill out. This survey allowed participants to pick one category or to skip the question. However, the survey did not automatically reset between each participant. Demographic information was collected over a 48-day period in 2018. The demographic results were saved separately from the kiosk lobule measurements and these files were matched in the data processing phase (Table 1).

Table 1.

Demographics and the number of participants using the kiosk in each category in the dataset used in this analysis. This does not include all of the people who interacted with the exhibit; rather, it includes only those individuals for whom we have demographic data.

2017 Specimens exhibit demographics | No. of participants | 2018 Science Hub demographics | No. of participants
 | 230 | 10 and under | 319
Teen | 107 | 11 to 17 | 220
Adult | 243 | 18+ | 324
Family (group with multiple age groups) | 6 | Skip (unanswered question) | 151

Initial data processing

Students in the 2020 and 2021 Industrial Applications of Mathematics course at Roosevelt University were tasked to clean and analyse data generated by museum participants and demographic data collected by interns (Fig. 3).

Figure 3.  

Data generation and processing. Blue round-edged rectangles indicate data from the public. Green round-edged rectangles indicate data processed by hand by students. Yellow rectangles indicate automated data processing. Red rectangles indicate data which have been filtered out.

Pre-processing of crowd-sourced data

Any time a person presses the submit button on the kiosk, the kiosk records the data and transfers the image measurements to a comma-separated values (CSV) file. Each image was labelled with a unique subject identification number for ease of analysis. In the CSV file, each submission receives its own line whether it has measurements or not. The CSV file includes the image ID, x and y coordinates, timestamp, degree of angles and other information, in JSON format. There will be multiple measurements for each image because the kiosk displays each image on multiple occasions. Each time a measurement on an image is completed, a new data line is added to the CSV file. For example, if a particular image was measured ten times, the CSV file will contain ten separate lines for it, one per measurement. Using the scripting language PowerShell, we performed an initial extraction and cleaning of the data which removed unnecessary notation, such as added parentheses or instructional notes, and made a clean CSV file which could be used for data analysis.
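To make the structure of such a file concrete, the following is a minimal sketch in Python (the project itself used PowerShell for this step). The column names `subject_id`, `timestamp` and `annotations`, and the layout of the JSON field, are hypothetical and chosen only for illustration; the kiosk's actual export schema is not described here.

```python
import csv
import io
import json

# Hypothetical export: the column names and the JSON layout are assumptions
# made for illustration, not the kiosk's actual schema.
SAMPLE_CSV = '''subject_id,timestamp,annotations
8735482,2017-03-10T10:15:00,"[{""x1"": 10, ""y1"": 50, ""x2"": 110, ""y2"": 50}, {""x1"": 60, ""y1"": 20, ""x2"": 60, ""y2"": 80}]"
8735482,2017-03-10T10:17:30,"[]"
'''


def read_submissions(text):
    """Return one dict per submission, with the JSON field decoded into a list of lines."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "subject_id": record["subject_id"],
            "timestamp": record["timestamp"],
            # An empty list means the submit button was pressed without measuring.
            "lines": json.loads(record["annotations"]),
        })
    return rows


submissions = read_submissions(SAMPLE_CSV)
for row in submissions:
    print(row["subject_id"], row["timestamp"], len(row["lines"]), "line(s)")
```

Every submission keeps its own row, including the empty one, which mirrors the statement above that each press of the submit button produces a line whether or not it contains measurements.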

There was a second set of demographic information for data collected in the Science Hub in 2018 and data collected in the Specimens exhibit in 2017. For the 2018 data, these demographic records were recorded on the kiosk itself by museum patrons. For the 2017 data, these records were recorded by a set of interns who observed museum patrons using the kiosk. Each of these was matched to the line segment data from the kiosk using timestamps via Excel VBA. This matching process was verified for samples of the data to guarantee computational accuracy. The initial matched dataset, without cleaning, will be considered all data. All data include anything that was submitted through the kiosk by pressing the submit button along with the associated demographic information.
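The timestamp matching can be sketched as follows. This is an illustration in Python rather than the Excel VBA used in the project, and the matching rule is an assumption: each measurement is paired with the demographic record nearest in time, provided it falls within a hypothetical tolerance of two minutes. The actual rule used by the students is not specified in the text.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def match_by_timestamp(measurements, demographics, tolerance=timedelta(minutes=2)):
    """Attach to each measurement the age group of the nearest-in-time demographic record.

    measurements: list of (timestamp, subject_id)
    demographics: list of (timestamp, age_group), sorted by timestamp
    Measurements with no demographic record within `tolerance` get None.
    The nearest-record rule and the tolerance are illustrative assumptions.
    """
    demo_times = [t for t, _ in demographics]
    matched = []
    for when, subject_id in measurements:
        i = bisect_left(demo_times, when)
        # Candidates: the demographic records just before and just after `when`.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(demographics)]
        best = min(candidates, key=lambda j: abs(demo_times[j] - when), default=None)
        if best is not None and abs(demo_times[best] - when) <= tolerance:
            matched.append((when, subject_id, demographics[best][1]))
        else:
            matched.append((when, subject_id, None))
    return matched


demographics = [
    (datetime(2018, 6, 1, 10, 0, 0), "10 and under"),
    (datetime(2018, 6, 1, 10, 30, 0), "18+"),
]
measurements = [
    (datetime(2018, 6, 1, 10, 0, 40), "8735482"),
    (datetime(2018, 6, 1, 10, 29, 30), "25352420"),
    (datetime(2018, 6, 1, 11, 15, 0), "8735482"),
]
matched = match_by_timestamp(measurements, demographics)
```

Unmatched measurements are kept with an age group of `None` rather than dropped, so that the matched file can still serve as the "all data" set.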

Data cleaning

Once the measurement and demographic data were aligned into a single file, it was important to extract the measurements involving intersecting pairs of line segments. Missing data are submissions lacking one or both of the two measurement lines. For example, someone could have pressed the submit button without taking any measurements. In another scenario, someone could have tapped the screen, leaving only one line as a measurement, and then pressed the submit button. Invalid data are submissions containing two or more lines that do not intersect. Potentially valid data are any data with an intersecting pair of lines regardless of location on screen. Once this initial data parsing process was done and the data were cleaned so that all invalid data were removed, we performed some manual checks to verify the accuracy of the initial cleaning process. This stage of the data processing is objective; no judgements needed to be made about the quality. It resulted in an Excel file where each row corresponded to a pair of intersecting line measurements along with the corresponding image and demographic information. We also kept a record of the submissions that had only invalid measurements in order to determine the percentages of high-quality data that came from each demographic category.
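The three categories can be expressed as a short classification routine. The sketch below is one possible implementation in Python, not the code used in the project. It represents each line as a pair of `(x, y)` endpoints and uses a standard orientation test; treating lines that merely touch at an endpoint as non-intersecting is an assumption of this sketch.

```python
from itertools import combinations


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a, b):
    """True if segments a and b properly cross (each straddles the other's line)."""
    (p1, p2), (p3, p4) = a, b
    return (_orient(p1, p2, p3) * _orient(p1, p2, p4) < 0
            and _orient(p3, p4, p1) * _orient(p3, p4, p2) < 0)


def classify(lines):
    """Return (category, intersecting_pairs) for one kiosk submission."""
    if len(lines) < 2:
        return "missing", []
    pairs = [(a, b) for a, b in combinations(lines, 2) if segments_cross(a, b)]
    if pairs:
        return "potentially valid", pairs
    return "invalid", []


length_line = ((10, 50), (110, 50))
width_line = ((60, 20), (60, 80))
parallel_line = ((10, 10), (110, 10))

print(classify([]))                              # no lines drawn
print(classify([length_line]))                   # a single line
print(classify([length_line, parallel_line]))    # two lines that never meet
print(classify([length_line, width_line])[0])    # an intersecting pair
```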

Advanced data cleaning

Once the data were split into a CSV file where each row corresponded to a unique pair of intersecting lines, it was necessary to determine which data were of sufficiently high quality to use. As the kiosk specified that pairs of line segments should intersect at 90 degree angles (see Fig. 2), an initial cut was made to all data based on the angle measured. Good data are any data with an intersecting pair of lines that have an angle of 80 degrees or above. Note that the kiosk records the smaller angle that occurs between the line segments; thus, if a pair of line segments intersect with smaller angles of 85 degrees and larger angles of 95 degrees, the dataset only records the 85 degree angles. This means that the maximum angle measurement possible is 90 degrees. We analysed one image in detail, comparing measurements created by the public to a set of measurements created by an expert. This led to a second set of cuts using the interquartile range (IQR) independently for each axis to remove outliers (Zwillinger and Kokoska 2000). To calculate the IQR cut for each axis, we first split the dataset into quartiles (Q0 = min, Q1, Q2 = median, Q3, Q4 = max) and set IQR = Q3 - Q1. We keep all data that are in the range from Q1 - 1.5*IQR to Q3 + 1.5*IQR (Table 2).
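The angle cut can be illustrated with a short sketch. The kiosk records the angle itself, so the function below is only an assumed reconstruction of how the smaller angle between two segments can be computed from their endpoints; the names `smaller_angle_degrees` and `is_good_pair` are hypothetical.

```python
import math


def smaller_angle_degrees(a, b):
    """Smaller angle (0 to 90 degrees) between the directions of segments a and b."""
    (x1, y1), (x2, y2) = a
    (x3, y3), (x4, y4) = b
    ux, uy = x2 - x1, y2 - y1
    vx, vy = x4 - x3, y4 - y3
    cross = ux * vy - uy * vx
    dot = ux * vx + uy * vy
    # Taking absolute values folds the angle into [0, 90], so an 85/95 degree
    # crossing is reported as 85 regardless of the direction each line was drawn.
    return math.degrees(math.atan2(abs(cross), abs(dot)))


def is_good_pair(a, b, min_angle=80.0):
    """Angle cut described in the text: keep pairs whose smaller angle is at least 80 degrees."""
    return smaller_angle_degrees(a, b) >= min_angle


length_line = ((10, 50), (110, 50))
width_line = ((60, 20), (60, 80))
skewed_line = ((0, 0), (math.cos(math.radians(147)), math.sin(math.radians(147))))

print(smaller_angle_degrees(length_line, width_line))
print(is_good_pair(length_line, width_line))
print(is_good_pair(length_line, skewed_line))
```

Here `skewed_line` crosses the horizontal direction at 147 degrees, so the smaller angle is 33 degrees and the pair fails the cut.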

Table 2.

Categories of data used in analysis.

All data

Data collected any time a person pushed the submit button on the kiosk, regardless of quality or even existence of measurements.

Good data

A set of pairs of line segments that intersected for an image and whose smaller angles of intersection are at least 80 degrees.

IQR cut data

A subset of good data for a particular image where the IQR (interquartile ranges) for both the length and the width are calculated and an image-dependent cut is made, based on both of these.

Determining an appropriate cutting schema using expert measurements

The images were measured by experts in order to test the validity and quality of data. In order to determine the best way to find accurate lobule measurements from the data, we plotted data gathered from the public from one of these images (ID No. 8735482) along with expert measurements of the same image to see how accurate the angle cuts were. We used this first image to guide our cleaning process and then, after we determined the process, we verified it with a second image. Our goal is to have a set of cuts that leaves us with public data that give the same axis lengths as do the expert data. We assumed that images that contained multiple lobules from the same specimen would have near identical sizes for those lobules; this meant we would be able to use the specimen's lobules' average measurement.

Comparing the expert data with the public data from the first image, we can say that (17/119) = 14.3% of all good data had both small and large axis lengths within the expert’s min and max measurements. Similarly, (47/119) = 39.5% of all good data had small and large axes within 10 pixels of the expert’s average measurements. There were many outliers. The angle cuts alone did not remove data where people measured background leaves or partial lobules and the background leaves and partial lobules had a noticeably different size than the intended lobules. The outliers from the background leaves skewed the averages; when using just the angle cuts, the resulting averages differed from the expert by a large amount and the standard deviations in the public measurements were large. As we wanted a way to cut outliers that did not rely on the expert data, we decided to use IQR cuts to remove these outliers (Table 3) and we plotted the result for the individual endpoints of the line segments that were measured by the public. When these points are far from the lobule's edge, it indicates an inaccurate measurement. Although this technique kept one set of measurements from a partially obscured lobule, it produced a more accurate set of measurements which was strongly clustered on the image, as well as a more accurate pair of averages (Fig. 4 and Fig. 5).

Table 3.

Comparison of measurements done by an expert with those done by the public after cutting based on angles (above 80 degrees) and by IQR. Comparison of public and expert measurements for image ID. No. 8735482.

Columns: Major axis length (pixels); Minor axis length (pixels). Rows: Public with angle cuts; Public with IQR cuts. (Cell values were not preserved.)


Figure 4.  

Full data (without any cuts) for image (ID. No. 8735482). Despite the high contrast between the lobule and background, there are measurements which are far from the actual lobule.

Figure 5.  

IQR cuts for image (ID. No. 8735482). The data remaining are on the correct lobule, with the exception of one set on the portion of the lobule on the top left.

Validation via a second expert image measurement

In order to verify that our process of cleaning and cutting the data leads to measurements which are close to expert measurements, we applied the process to a second image. This image (ID. No. 25352420) had not been used to determine the data cleaning procedure, so it is a useful way to check that our procedure was not biased by the image used to create it. For this image, the expert found the smaller axis length to be 96.49 pixels with a standard deviation of 3.99 and the larger axis to be 193.79 pixels with a standard deviation of 2.92. When the general public measured it, they found (after cutting for angle and outliers using IQR) the smaller axis length to be 97.34 pixels with a standard deviation of 8.34 and the larger axis to be 187.83 pixels with a standard deviation of 6.17. These are statistically consistent: on both axes, the public mean differs from the expert mean by less than one standard deviation of the public measurements. This is evidence that removing faulty data using the IQR bounds leads to a dataset which can produce a good measurement (Table 4).
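The arithmetic behind this comparison can be reproduced with the values quoted above. The sketch below is not a formal significance test and is not necessarily the comparison the authors performed; it simply reports the difference between the means on each axis, in pixels and in microns (using the stated conversion of 1 pixel = 1.05 microns), and checks it against the public standard deviation.

```python
PIXEL_TO_MICRON = 1.05  # conversion stated in Materials and methods

# (mean, standard deviation) in pixels, copied from the text for image 25352420
expert = {"minor": (96.49, 3.99), "major": (193.79, 2.92)}
public = {"minor": (97.34, 8.34), "major": (187.83, 6.17)}


def compare(expert, public):
    """Summarise the gap between public and expert means on each axis."""
    report = {}
    for axis in expert:
        diff = abs(public[axis][0] - expert[axis][0])
        report[axis] = {
            "difference_px": round(diff, 2),
            "difference_um": round(diff * PIXEL_TO_MICRON, 2),
            # Illustrative criterion only: is the gap within one public SD?
            "within_one_public_sd": diff <= public[axis][1],
        }
    return report


report = compare(expert, public)
for axis, summary in report.items():
    print(axis, summary)
```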

Table 4.

A comparison of expert measurements and public measurements after an IQR cut for a second MicroPlant image (ID. No. 25352420).

Columns: Major axis length (pixels); Minor axis length (pixels)
Rows: Public with IQR cuts


Processing the IQR range for the full dataset

For each image, an image-specific interquartile range (IQR) was found for both the major and minor axes in the remaining good data. Statistical outliers can be identified as data lying more than 1.5*IQR below the first quartile or more than 1.5*IQR above the third quartile. This was used to remove data that were outliers for one or both axes. For the two sample images, we computed these bounds manually in Excel. To extend this to all of the images, the calculations were done both in Python and using Excel pivot tables. By comparing the two automated solutions with the manual ones, we were able to verify their correctness. Note that, unlike an angle cut, this type of cut depends on all of the data that have been collected for an image. As a result, whether a particular set of measurements is cut may change when new data are added to the analysis.
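A per-image IQR cut of this kind could be sketched in Python roughly as follows. This is a minimal illustration and not the project's actual script: the record layout `(image_id, major_px, minor_px)`, the function names and the sample values are all assumptions, and the "inclusive" quartile method was chosen only because it matches Excel's QUARTILE.INC convention.

```python
import statistics
from collections import defaultdict


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_cut(records):
    """Keep records whose major AND minor axes fall inside their image's fences.

    `records` is a list of (image_id, major_px, minor_px) tuples
    (a hypothetical layout chosen for this sketch).
    """
    by_image = defaultdict(list)
    for rec in records:
        by_image[rec[0]].append(rec)

    kept = []
    for recs in by_image.values():
        if len(recs) < 2:  # quartiles need at least two measurements
            kept.extend(recs)
            continue
        lo_major, hi_major = iqr_bounds([r[1] for r in recs])
        lo_minor, hi_minor = iqr_bounds([r[2] for r in recs])
        kept.extend(r for r in recs
                    if lo_major <= r[1] <= hi_major
                    and lo_minor <= r[2] <= hi_minor)
    return kept


# Made-up measurements for one image; the last one mimics a background leaf.
sample = [("A", 190, 95), ("A", 192, 97), ("A", 188, 96),
          ("A", 195, 98), ("A", 191, 94), ("A", 60, 30)]
kept = iqr_cut(sample)
print(len(kept), "of", len(sample), "measurements kept")
```

Because the fences are recomputed from whatever measurements are present for an image, rerunning `iqr_cut` after adding new records can change which older records survive, which is the dataset dependence noted above.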

Examples of measurements on the kiosk

Fig. 6 is an example of a pair of lines that do not intersect, which we call invalid data. Such an entry does not meet our qualifications for good data or IQR range data. However, it is still included in the all data category, which accepts any data entry regardless of quality.

Figure 6.  

A lobule with a pair of non-intersecting measurements.

Fig. 7 passes the qualifications for potentially valid data: a pair of intersecting lines. In this example, the lines intersect at an angle of 33 degrees. Note that the smaller angle is recorded rather than the larger 147 degree angle. This entry does not meet our qualifications for good data or IQR range data, but it is included in the all data and valid data categories.

Figure 7.  

A lobule with a pair of line segments which intersect, but which do so at a small angle.

Fig. 8 passes the qualifications for good data: a pair of lines that intersect and form an angle of at least 80 degrees. In this example, the lines intersect at an angle of 88.6 degrees. Such an entry has the potential to pass the qualifications for IQR range data. It is included in the all data and good data categories and potentially in the IQR range data category.

Figure 8.  

A lobule with a pair of nearly perpendicular intersecting line segments whose smaller angle is above 80 degrees.
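The three cases illustrated in Figs 6-8 can be expressed as a small classification routine. The sketch below assumes each measurement is stored as two line segments given by pixel endpoints; the function names, the endpoint format and the treatment of segments that merely touch (counted as non-intersecting) are assumptions made for illustration, not details taken from the project's scripts.

```python
import math


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); its sign gives the turn direction."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def segments_intersect(p1, p2, q1, q2):
    """True if segment p1-p2 properly crosses segment q1-q2."""
    d1 = _cross(q1, q2, p1)
    d2 = _cross(q1, q2, p2)
    d3 = _cross(p1, p2, q1)
    d4 = _cross(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def smaller_angle_deg(p1, p2, q1, q2):
    """Smaller angle between the two lines, in degrees (0 to 90)."""
    a1 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    a2 = math.atan2(q2[1] - q1[1], q2[0] - q1[0])
    diff = abs(a1 - a2) % math.pi  # lines have no direction: work modulo 180 degrees
    return math.degrees(min(diff, math.pi - diff))


def classify(p1, p2, q1, q2, min_angle=80.0):
    """Return 'invalid' (no intersection), 'valid' (angle below cut) or 'good'."""
    if not segments_intersect(p1, p2, q1, q2):
        return "invalid"
    angle = smaller_angle_deg(p1, p2, q1, q2)
    return "good" if angle >= min_angle else "valid"


print(classify((0, 0), (10, 0), (0, 5), (10, 6)))    # segments never cross
print(classify((0, 0), (10, 0), (0, -3), (10, 3)))   # cross at a shallow angle
print(classify((0, 0), (10, 0), (5, -5), (5, 5)))    # cross at a right angle
```

Taking the angle modulo 180 degrees and then the minimum with its supplement is what makes a 147 degree crossing report as 33 degrees, as in the Fig. 7 example.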


Overall, measurements were of high quality. Significantly, all age groups, including children (10 and under), teens and adults, produced data that could be used by researchers. The clustering of measurements obtained by these groups with the expert measurements can be visualised in Fig. 5. Regarding measurement retention, our initial predictions were that around 50% of measurements would make it through the IQR cut process and that the older the age group, the more measurements would be retained. It was thought that children 10 and under would have more inconsistent measurements than the other age groups simply due to their young age. After our statistical data cleaning, 60% of the initial measurements were retained, higher than originally anticipated (Fig. 9, Fig. 10 and Table 5). The most notable result was that, in the Science Hub, the youngest age group (10 and under) had approximately 50% measurement retention and, in the Specimens exhibit, children (who were not being helped by older friends or relatives) had 41% measurement retention, both higher than we had expected. This means that children did a remarkable job following instructions and taking the MicroPlant measurements seriously. Unsurprisingly, the lowest retention amongst the Science Hub groups, at 43%, was for those who skipped giving their age; this was the case for 627 measurements. A general assumption is that this group took the kiosk experience less seriously or were pressed to engage in other museum activities.

Table 5.

Demographic breakdown of totals and IQR pass work in 2017 and 2018. In this, each number represents the total number of lobules measured, rather than the number of individuals doing the measuring or the number of images used. A kiosk session where no valid measurements were submitted is counted as 1 in the data collected category.

Demographic Data collected Number passing IQR Percent passing IQR
Child (in 2017) 503 207 41%
Teen (in 2017) 414 197 48%
Adult (in 2017) 448 356 79%
Family (in 2017) 215 131 61%
Total 2017 (Specimens) 1,580 891 56%
10 and under (in 2018) 1,562 775 50%
11-17 (in 2018) 1,224 782 64%
18+ (in 2018) 1,690 1,298 77%
Skip (in 2018) 627 270 43%
Total 2018 (Science Hub) 5,103 3,125 61%
Overall Total 6,683 4,017 60%
Figure 9.  

The majority (60%) of the over 6,000 lobule measurements were of generally high quality, with only a small number of non-intersecting or missing measurements. The majority passed IQR cuts; the most common reason to filter out data was when the angle was under 80 degrees.

Figure 10.  

The total number of lobule measurements and total number that passed IQR cuts broken down by demographic grouping.
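The retention percentages in Table 5 follow directly from the counts. A short check, using the numbers from the table, is sketched below; the dictionary layout and function name are illustrative only.

```python
# (data collected, number passing IQR), copied from Table 5.
table5 = {
    "Child (2017)": (503, 207),
    "Teen (2017)": (414, 197),
    "Adult (2017)": (448, 356),
    "Family (2017)": (215, 131),
    "10 and under (2018)": (1562, 775),
    "11-17 (2018)": (1224, 782),
    "18+ (2018)": (1690, 1298),
    "Skip (2018)": (627, 270),
}


def percent_passing(collected, passed):
    """Retention as a whole-number percentage."""
    return round(100 * passed / collected)


retention = {group: percent_passing(*counts) for group, counts in table5.items()}
for group, pct in retention.items():
    print(f"{group}: {pct}%")

# Yearly totals, summed from the rows above.
total_2017 = [sum(v[i] for g, v in table5.items() if "2017" in g) for i in (0, 1)]
total_2018 = [sum(v[i] for g, v in table5.items() if "2018" in g) for i in (0, 1)]
print("2017:", total_2017, f"{percent_passing(*total_2017)}%")
print("2018:", total_2018, f"{percent_passing(*total_2018)}%")
```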

Comparing the Science Hub data with the Specimens exhibit data

While the initial data analysis was performed on a 2018 dataset collected from the Science Hub, a secondary analysis was done on a dataset collected in the Specimens exhibit. This exhibit was focused on the large collection of scientific specimens at the museum and so the kiosk was only a small part of the larger exhibit. This differed from the Science Hub, which is a dedicated space where visitors can interact with scientists, as well as specimens from the Museum's collection. Because interns collected demographic data from the Specimens exhibit over a shorter timeframe (24 versus 48 days), the amount of data collected from the Specimens exhibit was smaller. As some of the images had a very small amount of data associated with them, we combined the two datasets to perform the IQR cuts. Note that because the IQR cuts depend on the specific dataset used, the results may change when additional data are added. This happened here: with the 2018 data alone, 3,125 measurements passed the IQR cuts; when we added in the 2017 data, 3,126 measurements from the 2018 set were within the IQR ranges. As only one measurement changed status, the cuts appear robust to combining the two datasets. When these combined cuts were used to examine the 2017 exhibit data, we found that, although a smaller amount of 2017 data was collected, the quality was similar and the majority of data collected was usable, based on completeness, angle and IQR cuts. As the quality of data was good, the majority of images in the 2017 dataset had sufficient data to determine the lobule lengths and widths (Table 6).

Table 6.

Overall data passing IQR cuts. In this, each number represents the total number of lobules measured, rather than the number of individuals doing the measuring or the number of images used. A kiosk session where no valid measurements were submitted is counted as 1 in the data collected category.

Dataset Data collected Data which passed combined IQR cuts Percent which passed combined IQR cuts
2017 Specimens exhibit data 1,580 891 56%
2018 Science Hub data 5,103 3,126 61%
Combined data 6,683 4,017 60%

Image clarity for observers

To gain more insight into the image measurement data, an analysis was conducted on the images themselves. The classification of the MicroPlant images was based on the number of lobules present, complexity and clarity of the image. Standard deviations of the axis length measurements were used to determine the clustering of the axis measurements. The lower the standard deviation, the closer together the measurements are clustered. The image classifications were then compared to the standard deviations of the axis measurements, post IQR cut for each image. There were no notable trends present between the complexity of the images and the standard deviations of the measurements.

We then looked at images with large standard deviations, meaning the measurements were not closely clustered. Out of the 78 distinct subjects, we found only three images with very large standard deviations, one of them from the 2017 dataset. This was only 4% of the total images displayed on the kiosk. One similarity between these three images is the small number of observations, with each having between 6 and 14 observations in total. The limited amount of data is one possible explanation for the low quality of the measurements for these images. For the image with only six measurements, there may have been confusion in distinguishing the lobule's edge from the leaf behind the lobule. The shading of the pictures can leave room for confusion as well, making it unclear what is considered part of the lobule and what is not.

Comparison of participant and expert measurements

With the goal of comparing the accuracy of participant measurements with that of expert scientist measurements, students conducted the same statistical test used in von Konrat et al. (2018) on the IQR cut data for image ID. No. 8735435. This was the third distinct image which had expert measurements associated with it. The statistical test used was a t-test, which assesses whether the means of two datasets are significantly different. After performing the angle and IQR cuts, 45% of the original 855 participant measurements remained. After calculating the t-values for the minor and major axis data at a 95% confidence level, we were able to conclude that the participant measurements after IQR cuts were not significantly different from expert measurements.
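For readers who want to reproduce this kind of comparison outside Excel, a two-sample t statistic can be computed with the Python standard library. The source does not state which variant of the t-test was used, so the sketch below assumes Welch's unequal-variance form; the function name and the sample values are invented for illustration and are not project data.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test."""
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    vb = statistics.variance(sample_b)
    se2 = va / na + vb / nb             # squared standard error of the difference
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical axis lengths in pixels.
expert_lengths = [193, 196, 191, 195, 194]
public_lengths = [188, 201, 185, 197, 190, 199, 183, 195]

t_value, dof = welch_t(expert_lengths, public_lengths)
print(f"t = {t_value:.3f}, df = {dof:.1f}")
```

The difference is judged not significant at the 95% level when |t| is below the two-sided critical value of the t distribution for the computed degrees of freedom, which can be read from a table or obtained from a statistics library (for example, `scipy.stats.ttest_ind` with `equal_var=False` returns the corresponding p-value).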

Community engagement with kiosk

Though hundreds of citizen or community science platforms exist, to our knowledge, this was one of the first to be featured in a live interactive museum exhibit. Commonly, people-powered research projects engage participants online via platforms like Zooniverse (Zooniverse 2022) or by targeting specific interests of their users, as with iNaturalist (California Academy of Sciences and National Geographic Society 2022) or WeDigBio (WeDigBio 2022). We wanted to know what impact placing an exploratory and unguided community science platform within a museum setting would have. What percentage of people who pass through the exhibit would stop to engage with the kiosk platform (Table 7)? Rough calculations, based on our observations, ticketing information and dataset timestamps, tell us that about 14-20% of individuals who passed through the Specimens exhibit interacted with the kiosk in some manner. We were able to tabulate kiosk interactions by occasionally placing interns who, following standard protocols for observing people in exhibitions, observed and recorded from a distance who was interacting with the exhibit. These observers noted the approximate age and perceived gender of participants, as well as engagement levels.

Table 7.

Visitors to kiosk exhibit in Field Museum's Specimens exhibit, summer 2017 (June-August).

Total interactive hours observed 44.5
Approximate number of people engaged with exhibit during observation hours 580
Number of hours people spent engaged with exhibit 12
Approximate percentage of exhibit patrons who interacted with the Specimen exhibit kiosk 14-20%

The Science Hub is designed for hands-on interactions and discussion with scientists. During the time that the kiosk was present in the Science Hub, 23,549 people visited the Science Hub, with 1,014 interactions with the kiosk. Based on this, we estimate that between 4.3% and 12% of the visitors to the Science Hub interacted with the kiosk; if the group sizes were similar to those directly observed in the Specimens Exhibit, approximately 8% of the Science Hub patrons interacted with the kiosk. As the kiosk was a stand-alone exhibit in the Science Hub, it is likely that museum patrons were more inclined to interact with the scientists present rather than a stand-alone computer exhibit.
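The range quoted above depends on how many people are assumed to take part in each recorded interaction. The arithmetic can be sketched as follows; the visitor and interaction counts come from the text, while the group sizes printed by the script are back-calculated from the quoted percentages and are not values reported in the source.

```python
VISITORS = 23_549      # Science Hub visitors while the kiosk was present (from the text)
INTERACTIONS = 1_014   # recorded kiosk interactions (from the text)


def share_of_visitors(people_per_interaction):
    """Fraction of visitors who used the kiosk, for an assumed average group size."""
    return INTERACTIONS * people_per_interaction / VISITORS


def implied_group_size(share):
    """Average group size needed for the kiosk to reach a given share of visitors."""
    return share * VISITORS / INTERACTIONS


print(f"one person per interaction: {share_of_visitors(1):.1%}")
for quoted in (0.08, 0.12):
    print(f"{quoted:.0%} of visitors implies about "
          f"{implied_group_size(quoted):.2f} people per interaction")
```

The lower bound of 4.3% corresponds to exactly one person per interaction; the higher figures correspond to small groups sharing the kiosk.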

Data resources

GitHub houses all of our data and scripts (Labontu 2022). Scripts were used to take raw data (Raw 2018 Data, 2017 data with intern observations) and parse it into the "All Data" and "Good Data" formats.

Initial IQR cuts and comparisons to expert data were done by hand. Systematic IQR cuts were performed in Excel pivot tables. Results from IQR cuts in Python are in Processed 2017 and 2018 data, which includes totals of the 2017 and 2018 data broken down by demographics, average lengths and standard deviations for each image after IQR cuts and also the All Data sets for both 2017 and 2018. For the third image with expert measurements, t-tests were performed in Excel.


All age groups, including children (10 and under), teens and adults, produced valid, high quality data that could be used by researchers. Significantly, the paper outlines the implementation of experiential learning through an undergraduate mathematics course that focuses on projects with actual data to gain a deep, practical knowledge of the subject, including observations, the collection of data, analysis and problem solving. We are promoting an intergenerational model including children, high school students, undergraduate students, early career scientists and senior scientists, combining experiential learning, people-powered research and data derived from natural history collections.

Data precision

This study shows that the public is capable of producing a usable set of measurements. However, there are two limitations to the precision of these measurements. One is the touchscreen technology. As the smallest unit was a pixel, a difference of just a couple of pixels in the measurement corresponded to a 1% difference in length. This level of precision could not be improved upon with the technology used. The second is the variation in public measurements. Scientists who want to distinguish between species whose size differences are large (such as a 10% difference) would be able to use work from the public; however, if the difference is very subtle (such as a 1% difference in length), this would not be possible.
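The pixel-level limit can be illustrated with the expert axis lengths reported for image ID. No. 25352420 (193.79 and 96.49 pixels). The sketch below simply converts a two-pixel discrepancy into a percentage of each length; the choice of two pixels as "a couple" and the function name are assumptions.

```python
def relative_error_percent(pixel_error, length_px):
    """Size of a pixel discrepancy as a percentage of the measured length."""
    return 100 * pixel_error / length_px


major_px, minor_px = 193.79, 96.49  # expert means for image ID. No. 25352420
for label, length in (("major", major_px), ("minor", minor_px)):
    print(f"{label} axis: 2 px is {relative_error_percent(2, length):.1f}% of {length} px")
```

On the longer axis, two pixels is about 1% of the length, in line with the figure quoted above; on the shorter axis, the same two pixels is a proportionally larger error.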

For cases where multiple types of species may exist in an individual image, there could be multiple sizes of measurements in an individual image. This would make the IQR analysis ineffective at finding outliers and so more subtle methods would need to be employed. However, if one found clusters of different sizes, it may be possible to create a machine-learning algorithm to use for the data processing portion.
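As a purely hypothetical illustration of the clustering idea, measurements from an image containing two size classes could first be split into two groups, with outlier cuts then applied within each group. The sketch below uses a basic one-dimensional k-means with two centres; it is not a method used in this study, the function name is invented and the sample lengths are made up.

```python
import statistics


def two_means_1d(values, iterations=20):
    """Split 1-D values into a low and a high group with a basic k-means (k=2) loop."""
    low_c, high_c = min(values), max(values)  # start the centres at the extremes
    low, high = list(values), []
    for _ in range(iterations):
        low = [v for v in values if abs(v - low_c) <= abs(v - high_c)]
        high = [v for v in values if abs(v - low_c) > abs(v - high_c)]
        if not high:  # every value is identical, so there is nothing to split
            break
        low_c, high_c = statistics.mean(low), statistics.mean(high)
    return low, high


# Made-up major axis lengths (pixels) from an image with two lobule sizes.
lengths = [98, 101, 99, 102, 100, 151, 149, 150, 152, 148]
small, large = two_means_1d(lengths)
print("smaller group:", sorted(small))
print("larger group:", sorted(large))
```

A real dataset would need a way to decide whether one or two size classes are present at all, which this sketch does not attempt.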

Experiential learning and interdisciplinary science

Interdisciplinary science entails the collaboration of scientists with largely non-overlapping training and core expertise to solve a problem that lies outside the grasp of the individual scientists (Cech and Rubin 2004). Yet interdisciplinary research (IDR) is more than collaboration: it is also applying concepts or methods from other fields or writing to make your research accessible to other types of scientists (Brigandt 2013). IDR is better suited for addressing critical “big picture problems” such as sustainability and conservation (Palmer 2001, Carayol 2005, Campbell 2005). Early IDR exposure aids cross-discipline communication (Bridle et al. 2013) and makes students more likely to pursue STEM careers (Daugherty and Carter 2017).

This was an authentic interdisciplinary experience in experiential learning. Students from various backgrounds, specialities and ages were involved in all aspects of this project from inception to completion. While this report focuses on the data analysis, prior collaborations with both college and high school students led to the development of the project. Thus, all involved students participated in a rich, real world learning experience to generate and later analyse a real and meaningful dataset answering questions that were previously unanswered. Students involved in the data analysis found the skills that they acquired through the project to be highly applicable to their post-graduation jobs, with comments such as: "I’ve used a lot of the VBA (Visual Basic for Applications) skills from your classes with the Field Museum". This indicates that this project was good preparation for both research work and industry jobs. As this course was originally developed as part of the PICMath programme, it is a way to both fulfil the goals of the programme and to increase scientific knowledge. Future endeavours could implement student evaluation prior to, during and post course, as student feedback on their learning journey is effective in improving both student satisfaction and learning (Mandal 2018).

Despite the critical role of experiential learning in building student research skills and capacity, few have explored social interaction mechanisms used to facilitate student experiential learning in an interdisciplinary research team (Ryser et al. 2008). This could be investigated in future iterations of the project.

Exploring motivation and testing between audiences

There remain many interesting education and learning questions that could be investigated using the current dataset, as well as future studies embarking on similar large scale projects. For example: How many measures must be taken by each kind of user group? Are there significant differences in measurement facility amongst children, adolescents and adults? Are there significant differences between a facilitated audience and a purely online audience? Limited work has investigated the arc of engagement from secondary to post-secondary education and into adulthood. Examining a cross-sectional population set will allow us to study reasons and motivations of learner engagement moving from a formal to an informal setting. The potential also exists to use this project to explore how authentic research experiences can both develop student interest in STEM and STEM careers (Boyer 2017), as well as promote learning of biodiversity concepts (Gunckel et al. 2012).

Although it is possible to use this dataset to compare kiosk locations, in the future, having a consistent set of self-described demographic categories would allow for a consistent comparison of how different demographics interact in the different kiosk locations. However, given that the desire to collect demographic information was realised after some data were collected, the addition of a brief demographic survey to the kiosk was a natural course correction. In addition, collection of demographic information before starting the activity interrupts the activity and presents other challenges when switching between participants.

Our experience with this project yielded insight into how to plan for future projects. For other taxonomic projects, the most robust measurements will involve images where the public can easily identify the object to be measured. As noted in the conclusions, the distinction between smaller lobules and the larger underlying leaves led to a number of inaccurate measurements. As this is a large, multi-year project, students who are involved in it will graduate before it is completed. Faculty and museum leaders are the ones maintaining continuity of the project; they need to make sure that the student data work is documented and stored in a way that is both carefully labelled and accessible. This allows multiple years of students to collaborate and make significant continual progress in a robust manner. In terms of project management, the most challenging aspect was maintaining contact and retaining connections between student cohorts. It is critical to plan for this from the very start of development when pursuing such projects.

Community Science in a museum setting

Though we are able to tabulate a certain interaction level by patrons with the kiosk (Table 7), it is worth noting that we also have sets of observational notes about how individuals and groups interacted with the kiosk display. As we did not have questionnaires and surveys always linked to the activity, engagement levels and observations were noted by onlookers for a portion of the time that the kiosk was in the Specimens exhibit. From observer notes, we were able to compile a word map demonstrating the inclusive nature of the activity and summarise interactions with the kiosk (Fig. 11). As museums are often visited by families and groups, the kiosk was a place where people gathered to interact and engage not only in science, but with each other. We were able to note many examples of parents, children and peers working together in a truly collaborative manner, as is core to community science. Our anecdotal observations of patrons interacting with the kiosk are consistent with the growing body of evidence suggesting that such digital technology can create engaging learning opportunities in museums (Roberts et al. 2018). There is great potential in implementing community science activities in a natural history museum environment using digital technology to help foster curiosity and engagement with scientific collections. Unobtrusive video recording and patron surveys would be invaluable in providing deeper insight.

Figure 11.  

Word cloud of observer notes of kiosk participants who were recorded from an unobtrusive distance.


A project which involves the public, high school interns and university students can allow the entire community to create scientific discoveries. It allows scientists to analyse large collections of specimens and it helps to give students an in-depth experience of what it is like to be a professional scientist or data analyst. This occurred at all levels. Given a sufficient number of community members measuring leaves, we were able to obtain high quality measurements, which were comparable to expert measurements, using methods that can be automated. This bodes well for crowd-sourcing taxonomic data collection from images. Mathematics alumni reported that the process of developing and creating these automated data processors was educationally beneficial for them, as they were able to apply their skill-set to internships and post-graduation jobs working with data. Students from some cohorts of the Industrial Applications of Mathematics course and some museum interns have gone on to pursue graduate degrees.

Next Phase: Unfolding of MicroPlant Mysteries

In today's society, K-12 students' interest in technology platforms, such as digital making (Lewin and Charania 2018), can enhance their informal learning experiences (Lai et al. 2013). The MicroPlant kiosk created an informal experience where participants of all ages leveraged their digital curiosity to learn more about biodiversity in under-represented plants. The MicroPlant project's success in the in-person, online and kiosk formats generated an avenue for this theme to be explored at a greater scope. That project focused on participants' ability to produce high quality data, based on measurements of a specific morphological character set. The Unfolding of MicroPlant Mysteries project (Zooniverse 2021a), initiated and designed by two high school interns, focused on participants' capacity to classify and identify morphological features, such as branching patterns and sexual phenotypes (Fig. 12). Data from the project would then be analysed by undergraduates and Master's students. This expands data generation beyond measurements and will greatly aid in accelerating biodiversity discovery and documentation of these organisms. To date, this project has met with enormous success, logging over 60,000 classifications from over 800 users, with early results showing over a 70% accuracy rate for the reproductive structure task (Gillis et al. 2022). Supervised learning algorithms have the potential to use multiple crowdsourced inputs to generate a response whose reliability is similar to an expert's (Li et al. 2015). Here, we performed a preliminary and exploratory data analysis using sequential neural networks. Clusters of boxes that users drew around reproductive structures were used along with expert boxes as training data for a neural network; this neural network was then validated using a second set of user drawn clusters and expert data, resulting in over 90% accuracy in classification for the reproductive structure task.
The class was also able to make several recommendations to improve the project, such as the addition of a no-branching option, changes to the tutorial and data management suggestions. This content is also currently being mapped on to Next Generation Science Standards (NGSS) (NGSS Lead States 2013) and is being used to explore the utility of a new Zooniverse classroom. The research team is also exploring the potential application of machine learning, as recent studies have shown that combining human and machine classifications can efficiently produce results superior to those of either one alone (Trouille et al. 2019). With the aim of making science accessible to a large and diverse demographic of learners, we hope to continue to emphasise the important connections and potential collaborations between universities, museums, students, researchers and the general public.

Figure 12.  

A plant stem of the liverwort genus Frullania. Participants identified and outlined male parts in blue and female parts in red and green. The dots in the centre represent the centroids of the boxes; these were used for data clustering purposes.


The Student Center for Science Engagement at Northeastern Illinois University helped provide funding for student interns. Financial support was provided by the National Science Foundation (Award No. 0949136, 1145898, 1458300, 1541545, 2001509 to M.v.K.). We especially acknowledge the support under the Research Experience for Post-Baccalaureate Students (REPS) in the Biological Sciences Supplemental Funding Opportunity from the National Science Foundation (Award No. 2001509), from the Negaunee Fund for Science, Field Museum, and from the Grainger Bioinformatics Center, Field Museum. CME Group Foundation and Field Museum Women's Board funded the Digital Learning Programs at Field Museum. Support is also acknowledged from C. Moreau (PI) and T. Lumbsch (co-PI) - Research Experience for Undergraduates (REU) Site: Evolution of Biodiversity across the Tree of Life (Award No. 1559779). This publication uses data generated via the Zooniverse.org platform, development of which is funded by generous support, including a Global Impact Award from Google and by a grant from the Alfred P. Sloan Foundation. We would like to thank the following individuals for their support with imaging specimens: Jonathan Scheffel and Steve Schulze (Field Museum), Anthony Carmona and Stephanie Maxwell (Northeastern Illinois University) and Ed Gluzman and Vishal Patel (DePaul University, Chicago, Illinois). We are grateful to Brendon Reidy, Xenia Alava, Dara Arabsheibani, Alex Vizzone and Zak Zillen (all Northeastern Illinois University); Kavita Elliott (The Ohio State University); and Pedro Rebollar (DePaul University) for their support with guest interactions. Ramsey Millison and Ariel Wagner (both DePaul University) provided imaging and guest interaction support.

We thank the following individuals for technical and administrative support, consultancy and general assistance throughout the project: Lauren Hasan and Beth Crownover (both Field Museum); Kristina Lugo (Roosevelt University and Field Museum); Anthony Flores, Maha Khan, Alexandra Lopez, Lisa Murata and Oana Vadineanu (all Northeastern Illinois University); Gabriel Somarriba (University of Florida); Mariam Nasser (Elgin Academy, Elgin, Illinois); Joey T Rene Shelley (University of Illinois, Chicago); Audrey Aronowsky (Department of Ecology and Evolution, University of Illinois at Chicago); Beth Sanzenbacher (Bernard Zell Anshe Emet Day School, Chicago, Illinois); Jennifer Campagna (Blaine Elementary School, Chicago, Illinois); Christine LaPointe (Hillcrest Elementary School) (Green Ambassador, Field Museum); Laura Briscoe (New York Botanical Garden); Jordan Newson (Albion College); and Taylor Walker (Hollins University). We acknowledge Jose Hernandez Lopez (Northeastern Illinois University) and Allison Chen (University of California) for their early efforts investigating Google Analytics as part of the demographic survey.

We would like to acknowledge previous cohorts of Roosevelt University students whose quantitative work was influential for this analysis. The following performed analyses on subsets of the kiosk data under the direction of Prof. Steve Cohen: Jonathan Aird, Estevan Carrillo, Esther Fiala, Ian Fluhler, Nathan Gregory, Martin Hayford, Breanna Ivery, Jonathan Kasongo, Brian Keyser, Andrew Moskwa, Vingie Ng, Mary Strickler, Eric Synajie and Haki Wright. Thanks also to Cuong Pham (pictured) for demonstrating the kiosk at the museum.

The following Roosevelt University students performed large scale processing of data from the web platform; this type of parsing and processing work was essential for the current authors to learn how to work with a similarly formatted dataset: Julia Buczek, Matt Caraher, James Crigler (pictured), Davaadulam Ganzorig, Kyler Gillespie, Robert Hennecy, Thomas Hill, Chengjian Li, Ashlyn Liu, Marcos Mercado, Ali Myhelic, Nupur Patel, Rebecca Plata, Jacob Rubinstein, Michael Sassaman, Luke Swanson and Joshua Torres (pictured).

Finally, we would like to give a huge thank you to all of the children, families, teens and adults who visited the museum and took time to engage with the kiosk. Without them, none of this would have been possible.

Author contributions

Melanie Pivarski - managed data analysis, methodology, student work, created diagrams, writing, formal analysis, visualisation; Matt von Konrat - project conception, writing, data analysis, supervision, funding acquisition, project administration, resources; Thomas Campbell - funding acquisition, project conception, writing, diagrams, data analysis, supervision, visualisation; Ayesha T. Qazi-Lampert - observations, investigation, writing, student work, supervision; Laura Trouille - software, resources, validation, writing; Heaven Wade - writing, student work, supervision; Aimee Davis - funding acquisition, project conception, student work, supervision; Selma Aburahmeh - data collection, observations, investigation; Joseph Aguilar - writing/data analysis confirming IQR; Cosmin Alb - writing/data analysis initial IQR work; Ken Alferes - writing/data analysis initial IQR work; Ella Barker - writing/data analysis initial IQR work and cleaning, diagrams, visualisation; Karl Bitikofer - student work, supervision, investigation; Kelli J. Boulware - writing/data analysis of outliers in measurements, IQR confirmation; Carla Bruton - student work, supervision, investigation; Sicong Cao - writing, some background of experiments, confirming IQR; Christine Christian - investigation, writing; Arturo Corona - writing, automating IQR in Excel Pivot tables, data analysis; Kaltra Demiri - writing/data analysis of outliers in measurements, IQR confirmation, diagrams, visualisation; Daniel Evans - data analysis confirming expert/synching specimens data with interns; Nkosi Evans - data collection, digital imaging; Connor Flavin - writing/synching Science Hub datasets/initial IQR work; Jasmine Gillis - future work-Zooniverse improvement suggestions; Victoria Gogol - writing/data analysis initial IQR work; Elizabeth Heublein - writing/data analysis of outliers in measurements, IQR confirmation; Edward Huang - data collection, observations, investigation; Jake Hutchinson - writing/data analysis initial IQR work, diagrams, visualisation; Cyrus Jackson - data collection, observations; Odaliz Rubee Jackson - writing/data analysis initial IQR work, diagrams; Lauren Johnson - data collection, digital imaging; Michi Kirihara - writing, automating IQR in Python, data analysis; Henry Kivarkis - future work-Zooniverse improvement suggestions; Annette Kowalczyk - writing/data analysis initial IQR work, diagrams, visualisation; Alex Labontu - data cleaning in PowerShell, writing, GitHub documentation; Briajia Levi - data collection, observations; Ian Lyu - future work-Zooniverse improvement suggestions; Sylvie Martin-Eberhardt - data collection, observations; Gaby Mata - conceptualisation, data collection, data curation, observations; Joann Lacey Martinec - data collection, observations; Beth McDonald - data curation, student work; Mariola Mira - future work-Zooniverse improvement suggestions; Minh Nguyen - future work-data cleaning, Python, estimates of accuracy, neural nets; Pansy Nguyen - data collection, observations; Sarah Nolimal - data collection, observations; Victoria Reese - methodology, validation; Will Ritchie - future work-Zooniverse improvement suggestions; Joannie Rodriguez - writing/data analysis initial IQR work, diagrams, visualisation; Yarency Rodriguez - observations, student work, supervision; Jacob Shuler - data collection; Jasmine Silvestre - writing/data analysis confirming IQR/diagrams of experimental set up; Glenn Simpson - data collection, observations; Gabriel Somarriba - data collection, observations; Rogers Ssozi - writing/data analysis initial IQR work, diagrams, visualisation; Tomomi Suwa - data collection, observations; Cheyenne Syring - data analysis confirming expert; Nidhi Thirthamattur - conceptualisation, data collection, data curation, observations; Keith Thompson - data collection; Caitlin Vaughn - data collection, observations; Mario R Viramontes - data analysis confirming expert; Chak Shing Wong - writing/data analysis initial IQR work, time stamp synch check diagrams; Lauren Wszolek - future work-data cleaning, Python, estimates of accuracy.