Research Ideas and Outcomes :
Research Article
Corresponding author: Matt von Konrat (mvonkonrat@fieldmuseum.org)
Academic editor: Editorial Secretary
Received: 15 Mar 2022 | Accepted: 25 May 2022 | Published: 27 Jun 2022
© 2022 Melanie Pivarski, Matt von Konrat, Thomas Campbell, Ayesha Qazi-Lampert, Laura Trouille, Heaven Wade, Aimee Davis, Selma Aburahmeh, Joseph Aguilar, Cosmin Alb, Ken Alferes, Ella Barker, Karl Bitikofer, Kelli Boulware, Carla Bruton, Sicong Cao, Arturo Corona Jr., Christine Christian, Kaltra Demiri, Daniel Evans, Nkosi Evans, Connor Flavin, Jasmine Gillis, Victoria Gogol, Elizabeth Heublein, Edward Huang, Jake Hutchinson, Cyrus Jackson, Odaliz Jackson, Lauren Johnson, Michi Kirihara, Henry Kivarkis, Annette Kowalczyk, Alex Labontu, Briajia Levi, Ian Lyu, Sylvie Martin-Eberhardt, Gaby Mata, Joann Martinec, Beth McDonald, Mariola Mira, Minh Nguyen, Pansy Nguyen, Sarah Nolimal, Victoria Reese, Will Ritchie, Joannie Rodriguez, Yarency Rodriguez, Jacob Shuler, Jasmine Silvestre, Glenn Simpson, Gabriel Somarriba, Rogers Ssozi, Tomomi Suwa, Cheyenne Syring, Nidhi Thirthamattur, Keith Thompson, Caitlin Vaughn, Mario Viramontes, Chak Shing Wong, Lauren Wszolek
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Pivarski M, von Konrat M, Campbell T, Qazi-Lampert AT, Trouille L, Wade H, Davis A, Aburahmeh S, Aguilar J, Alb C, Alferes K, Barker E, Bitikofer K, Boulware KJ, Bruton C, Cao S, Corona Jr. A, Christian C, Demiri K, Evans D, Evans NM, Flavin C, Gillis J, Gogol V, Heublein E, Huang E, Hutchinson J, Jackson C, Jackson OR, Johnson L, Kirihara M, Kivarkis H, Kowalczyk A, Labontu A, Levi B, Lyu I, Martin-Eberhardt S, Mata G, Martinec JL, McDonald B, Mira M, Nguyen M, Nguyen P, Nolimal S, Reese V, Ritchie W, Rodriguez J, Rodriguez Y, Shuler J, Silvestre J, Simpson G, Somarriba G, Ssozi R, Suwa T, Syring C, Thirthamattur N, Thompson K, Vaughn C, Viramontes MR, Wong CS, Wszolek L (2022) People-Powered Research and Experiential Learning: Unravelling Hidden Biodiversity. Research Ideas and Outcomes 8: e83853. https://doi.org/10.3897/rio.8.e83853
Globally, thousands of institutions house nearly three billion scientific collections offering unparalleled resources that contribute to both science and society. For herbaria alone - facilities housing dried plant collections - there are over 3,000 herbaria worldwide with an estimated 350 million specimens that have been collected over the past four centuries. Digitisation has greatly enhanced the use of herbarium data in scientific research, impacting diverse research areas, including biodiversity informatics, global climate change, analyses using next-generation sequencing technologies and many others. Despite the entrance of herbaria into a new era with enhanced scientific, educational and societal relevance, museum specimens remain underused. Natural history museums can enhance learning and engagement in science, particularly for school-age and undergraduate students. Here, we outline a novel approach of a natural history museum using touchscreen technology that formed part of an interactive kiosk in a temporary museum exhibit on biological specimens. We provide some preliminary analysis investigating the efficacy of the tool, based on the Zooniverse platform, in an exhibit environment to engage patrons in the collection of biological data. We conclude there is great potential in using crowd‐sourced science, coupled with online technology, to unlock data and information from digital images of natural history specimens themselves. Sixty percent of the records generated by community scientists (citizen scientists) were of high enough quality to be utilised by researchers. All age groups produced valid, high quality data that could be used by researchers, including children (10 and under), teens and adults.
Significantly, the paper outlines the implementation of experiential learning through an undergraduate mathematics course that focuses on projects with actual data to gain a deep, practical knowledge of the subject, including observations, the collection of data, analysis and problem solving. We here promote an intergenerational model including children, high school students, undergraduate students, early career scientists and senior scientists, combining experiential learning, museum patrons, researchers and data derived from natural history collections. Natural history museums with their dual remit of education and collections-based research can play a significant role in the field of community engagement and people-powered research. There also remains much to investigate on the use of interactive displays to help learners interpret and appreciate authentic research. We conclude with a brief insight into the next phase of our ongoing people-powered research activities developed and designed by high school students using the Zooniverse platform.
analysis, biodiversity, bryophytes, citizen science, crowd-sourced science, community science, experiential learning, families, interdisciplinary research, intergenerational participation, K-12, museum, people-powered research, taxonomy, undergraduate students
Globally, thousands of institutions house nearly three billion scientific collections, each of which can have multiple layers of associated metadata (
We wanted to apply and develop the statistical and computational expertise of college students to determine the scientific quality of data generated by museum patrons.
Specifically, our objectives are to determine the following:
Is the general public able to provide usable real data?
At what scale are usable real data generated (e.g. over 70%)?
At what scale can different age groups (demographics) generate usable data?
Is there a difference in the setting of the kiosk - i.e. one where a facilitator is available (in the Science Hub) versus non-facilitated (in the Specimens exhibit)?
How effective is public participation in a museum setting?
We also seek to demonstrate how this exercise was driven by student work in a formal class setting. Although this was part of an industrial mathematics course at Roosevelt University, this type of university and museum collaboration could be replicated by partnerships between educators and museums seeking to work with community-generated data on a large scale. An underlying goal, as demonstrated below, was for students to develop a series of publicly accessible computational tools for data curation, validation and analysis.
Herbaria are reservoirs of both well‐documented specimens and undescribed diversity (
Outlined below are the process and methodology including development of the kiosk, data collection, data analysis and data validation. All data were taken in units of pixels and, for simplicity of presentation, we present results in units of pixels; however, the images were consistently scaled so that pixels can be converted into microns via the conversion 1 pixel = 1.05 microns.
Experiential learning foregrounds the crucial role experience takes in the learning process (
The MicroPlant project focuses on a group of early land plants often referred to as bryophytes. Bryophytes, including mosses, liverworts and hornworts, are the second largest group of land plants after flowering plants and are pivotal in our understanding of early land plant evolution (e.g.
Given the success of the web-based MicroPlant project on Zooniverse, the Field Museum adapted the measurement tool to a touchscreen kiosk which was first used in the exhibit Specimens: Unlocking the Secrets of Life of the Field Museum. This was a special exhibit which ran from 10 March 2017, through to 7 January 2018 showcasing the museum's collection of over 30 million specimens and their ongoing scientific potential. Museum patrons, who had just viewed many of the typically hidden scientific specimens, were able to interact with digitised versions of liverwort specimens on the kiosk so that they too could contribute to scientific discovery (Fig.
The online platform, based on Zooniverse, developed into touchscreen technology as part of an interactive kiosk in a high-profile exhibit at Field Museum: a) Instructions were mounted as well as available using the touchscreen; b) Students from Roosevelt University testing the platform; c) Depicting details of the interactive, including the workspace for measuring, instructions on what a pair of perpendicular lines looks like, map indicating the geographic locality and number of measurements.
After the Specimens exhibit closed, the kiosk was updated to include a survey question about the participant's age group. The kiosk then became one of the rotating exhibits in the Grainger Science Hub (
Initially, the kiosk was located in the Specimens: Unlocking the Secrets of Life exhibit at the Field Museum. For this phase of data collection, the general public interacted with the kiosk, first viewing a brief instructional demonstration programmed into the kiosk. The demonstration showed an animation of how to measure the length and width of lobules and explained the need for these line segments to be perpendicular. They then viewed a randomly displayed MicroPlant image. The image contained a stem of the plant with a varying number of lobules, typically between 1 and 10 lobules (Fig.
As these demographic observations were labour intensive, there was a desire for participants to self-classify their demographics; this would also increase accuracy. When the kiosk moved to the Science Hub in 2018, it included a brief demographic survey for participants to fill out. This survey allowed participants to pick one category or to skip the question. However, the survey did not automatically reset between each participant. Demographic information was collected over a 48-day period in 2018. The demographic results were saved separately from the kiosk lobule measurements and these files were matched in the data processing phase (Table
Demographics and the number of participants using the kiosk in each category in the dataset used in this analysis. This does not include all of the people who interacted with the exhibit; rather, it includes only those individuals for whom we have demographic data.
2017 Specimens exhibit demographics | No. of participants | 2018 Science Hub demographics | No. of participants |
Child | 230 | 10 and under | 319 |
Teen | 107 | 11 to 17 | 220 |
Adult | 243 | 18+ | 324 |
Family (group with multiple age groups) | 6 | Skip (unanswered question) | 151 |
Students in the 2020 and 2021 Industrial Applications of Mathematics course at Roosevelt University were tasked to clean and analyse data generated by museum participants and demographic data collected by interns (Fig.
Any time a person presses the submit button on the kiosk, the kiosk records the data and transfers the image measurements to a comma-separated values (CSV) file. Each image was labelled with a unique subject identification number for ease of analysis. In the CSV file, each submission receives its own line whether it has measurements or not. The CSV file includes the image ID, x and y coordinates, timestamp, degree of angles and other information, in JSON format. There will be multiple measurements for each image because the kiosk displays each image on multiple occasions. Each time a measurement on an image is completed, a new line is added to the CSV file. For example, if a particular image was measured ten times, the CSV file will contain ten separate lines for it. Using the scripting language PowerShell, we performed an initial extraction and cleaning of the data, which removed unnecessary notation, such as added parentheses or instructional notes, and produced a clean CSV file which could be used for data analysis.
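As a rough illustration of this extraction step, the sketch below reads a kiosk-style export in Python rather than PowerShell. The column names (`subject_id`, `timestamp`, `annotations`) and the layout of the JSON field are assumptions made for the example; the actual kiosk export format is not reproduced here.

```python
import csv
import io
import json

# Hypothetical export: one row per press of the submit button.
# Column names and JSON layout are assumed for illustration only.
SAMPLE = '''subject_id,timestamp,annotations
8735482,2018-03-01T10:15:00,"[{""x1"": 10, ""y1"": 20, ""x2"": 110, ""y2"": 20}, {""x1"": 60, ""y1"": 0, ""x2"": 60, ""y2"": 50}]"
8735482,2018-03-01T10:17:30,"[]"
'''


def read_submissions(text):
    """Return one dict per submission, decoding the JSON field into line segments."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # An empty field is treated the same as an empty list of segments.
        segments = json.loads(row["annotations"] or "[]")
        rows.append({
            "subject_id": row["subject_id"],
            "timestamp": row["timestamp"],
            "segments": [(s["x1"], s["y1"], s["x2"], s["y2"]) for s in segments],
        })
    return rows


submissions = read_submissions(SAMPLE)
```

Each submission keeps its own entry even when it carries no measurements, mirroring the one-line-per-submission layout described above.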
There was a second set of demographic information for data collected in the Science Hub in 2018 and data collected in the Specimens exhibit in 2017. For the 2018 data, these demographic records were recorded on the kiosk itself by museum patrons. For the 2017 data, these records were recorded by a set of interns who observed museum patrons using the kiosk. Each of these was matched to the line segment data from the kiosk using timestamps via Excel VBA. This matching process was verified for samples of the data to guarantee computational accuracy. The initial matched dataset, without cleaning, will be considered all data. All data include anything that was submitted through the kiosk by pressing the submit button along with the associated demographic information.
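The timestamp matching was done in Excel VBA; the Python sketch below only illustrates the general idea. It pairs each measurement with the demographic record closest in time, provided the two lie within a tolerance. The nearest-record rule, the two-minute tolerance and the tuple layouts are all assumptions for the example, not the rule used in the study.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def match_by_timestamp(measurements, demographics, tolerance=timedelta(minutes=2)):
    """Attach the nearest-in-time demographic label to each measurement.

    measurements: list of (timestamp, subject_id)
    demographics: list of (timestamp, label)
    Returns (timestamp, subject_id, label or None) tuples.
    """
    demo_sorted = sorted(demographics, key=lambda d: d[0])
    demo_times = [d[0] for d in demo_sorted]
    matched = []
    for when, subject_id in measurements:
        i = bisect_right(demo_times, when)
        # Candidates: the record just before and the record just after.
        candidates = demo_sorted[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda d: abs(d[0] - when), default=None)
        if best is not None and abs(best[0] - when) <= tolerance:
            matched.append((when, subject_id, best[1]))
        else:
            matched.append((when, subject_id, None))
    return matched


# Made-up records for illustration.
demographics = [
    (datetime(2018, 3, 1, 10, 14), "18+"),
    (datetime(2018, 3, 1, 10, 30), "11 to 17"),
]
measurements = [
    (datetime(2018, 3, 1, 10, 15), "8735482"),
    (datetime(2018, 3, 1, 10, 31), "25352420"),
    (datetime(2018, 3, 1, 12, 0), "8735482"),
]
matched = match_by_timestamp(measurements, demographics)
```

Measurements with no demographic record inside the tolerance are kept with a label of `None`, so they still count towards all data.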
Once the measurement and demographic data were aligned into a single file, it was important to extract the measurements involving intersecting pairs of line segments. Missing data are data lacking one or both of the two measurement lines. For example, someone could have pressed the submit button without taking any measurements. In another scenario, someone could have tapped the screen, leaving only one line as a measurement, and then pressed the submit button. Invalid data are data with any measurements of two or more lines that do not intersect. Potentially valid data are any data with an intersecting pair of lines regardless of location on screen. Once this initial data parsing process was done and the data were cleaned so that all invalid data were removed, we performed some manual checks to verify the accuracy of the initial cleaning process. This stage of the data processing is objective; no judgements about quality needed to be made. It resulted in an Excel file where each row corresponded to a pair of intersecting line measurements along with the corresponding image and demographic information. We also kept a record of the submissions that had only invalid measurements in order to determine the percentages of high quality data that came from each demographic category.
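The three categories above can be sketched with a standard orientation-based test for crossing line segments. This is a simplified illustration, not the authors' script: it labels a whole submission rather than individual pairs, and it assumes that segments which merely touch or are collinear do not count as intersecting.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def segments_intersect(a, b):
    """True when segments a and b properly cross each other.

    Each segment is ((x1, y1), (x2, y2)). Touching or collinear
    segments are treated as non-crossing (an assumption of this sketch).
    """
    p1, p2 = a
    p3, p4 = b
    return (_orient(p1, p2, p3) * _orient(p1, p2, p4) < 0
            and _orient(p3, p4, p1) * _orient(p3, p4, p2) < 0)


def classify(segments):
    """Label one submission as 'missing', 'invalid' or 'potentially valid'."""
    if len(segments) < 2:
        return "missing"
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if segments_intersect(segments[i], segments[j]):
                return "potentially valid"
    return "invalid"


crossing = [((0, 0), (10, 0)), ((5, -5), (5, 5))]
parallel = [((0, 0), (10, 0)), ((0, 5), (10, 5))]
labels = [classify([]), classify(crossing[:1]), classify(parallel), classify(crossing)]
```

Because the test uses only the segment coordinates, it involves no judgement about measurement quality, in keeping with the objective nature of this stage.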
Once the data were split into a CSV file where each row corresponded to a unique pair of intersecting lines, it was necessary to determine which data were of sufficiently high quality to use. As the kiosk specified that pairs of line segments should intersect at 90-degree angles (see Fig.
All data | Data collected any time a person pushed the submit button on the kiosk, regardless of quality or even existence of measurements. |
Good data | A set of pairs of line segments that intersected for an image and whose smaller angles of intersection are at least 80 degrees. |
IQR cut data | A subset of good data for a particular image where the IQR (interquartile ranges) for both the length and the width are calculated and an image-dependent cut is made, based on both of these. |
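The "good data" criterion can be written as a short calculation. The sketch below computes the smaller angle between two measured segments from their direction vectors and keeps the pair when that angle is at least 80 degrees. The function names and the tuple layout are hypothetical; only the 80-degree threshold comes from the definition above.

```python
import math


def intersection_angle(a, b):
    """Smaller angle in degrees (0 to 90) between the lines through a and b.

    Each segment is ((x1, y1), (x2, y2)).
    """
    (x1, y1), (x2, y2) = a
    (x3, y3), (x4, y4) = b
    ux, uy = x2 - x1, y2 - y1
    vx, vy = x4 - x3, y4 - y3
    cross = ux * vy - uy * vx
    dot = ux * vx + uy * vy
    # Taking absolute values folds the angle into the range 0 to 90 degrees.
    return math.degrees(math.atan2(abs(cross), abs(dot)))


def passes_angle_cut(a, b, min_degrees=80.0):
    """True when the pair meets the 'good data' angle threshold."""
    return intersection_angle(a, b) >= min_degrees


horizontal = ((0, 0), (10, 0))
perpendicular = passes_angle_cut(horizontal, ((5, -5), (5, 5)))  # 90 degrees
diagonal = passes_angle_cut(horizontal, ((0, 0), (10, 10)))      # 45 degrees
```

The angle depends only on the directions of the two segments, so the check is independent of where on the image the pair was drawn.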
The images were measured by experts in order to test the validity and quality of data. In order to determine the best way to find accurate lobule measurements from the data, we plotted data gathered from the public from one of these images (ID No. 8735482) along with expert measurements of the same image to see how accurate the angle cuts were. We used this first image to guide our cleaning process and then, after we determined the process, we verified it with a second image. Our goal is to have a set of cuts that leaves us with public data that give the same axis lengths as do the expert data. We assumed that images that contained multiple lobules from the same specimen would have near identical sizes for those lobules; this meant we would be able to use the specimen's lobules' average measurement.
Comparing the expert data with the public data from the first image, we can say that (17/119) = 14.3% of all good data had both small and large axis lengths within the expert’s min and max measurements. Similarly, (47/119) = 39.5% of all good data had small and large axes within 10 pixels of the expert’s average measurements. There were many outliers. The angle cuts alone did not remove data where people measured background leaves or partial lobules and the background leaves and partial lobules had a noticeably different size than the intended lobules. The outliers from the background leaves skewed the averages; when using just the angle cuts, the resulting averages differed from the expert by a large amount and the standard deviations in the public measurements were large. As we wanted a way to cut outliers that did not rely on the expert data, we decided to use IQR cuts to remove these outliers (Table
Comparison of measurements done by an expert with those done by the public after cutting based on angles (above 80 degrees) and by IQR. Comparison of public and expert measurements for image ID. No. 8735482.
Source | Major axis length (pixels) | Stdev | Minor axis length (pixels) | Stdev |
Expert | 142.65 | 3.26 | 93.68 | 3.00 |
Public with angle cuts | 159.23 | 84.44 | 117.22 | 69.04 |
Public with IQR cuts | 135.89 | 15.13 | 96.20 | 8.56 |
In order to verify that our process of cleaning and cutting the data leads to measurements which are close to expert measurements, we applied the process to a second image. This image (ID. No. 25352420) had not been used to determine the data cleaning procedure, so it is a useful way to check that our procedure was not biased by the image used to create it. For this image, the expert measured the smaller axis length to be 96.49 pixels with a standard deviation of 3.99 and the larger axis to be 193.79 pixels with a standard deviation of 2.92. When the general public measured it, they found (after cutting for angle and outliers using IQR) the smaller axis length to be 97.34 pixels with a standard deviation of 8.34 and the larger axis to be 187.83 pixels with a standard deviation of 6.17. The two sets of measurements agree: each expert mean lies within one standard deviation of the corresponding public mean. This is evidence that removing faulty data using the IQR bounds leads to a dataset which can produce a good measurement (Table
A comparison of expert measurements and public measurements after an IQR cut for a second MicroPlant image. Second comparison between public and expert measurements for image 25352420.
For each image, an image-specific interquartile range (IQR) was found for both the major and minor axes in the remaining good data. One can determine statistical outliers by considering only data that are within 1.5*IQR of the middle quartiles. This was used to remove data that were outliers for one or both axes. For the two sample images, we computed these manually in Excel. In order to extend this to all of the different images, these calculations were done both in Python and using Excel pivot tables. By comparing the two programming solutions with the manual ones, we were able to verify their correctness. Note that, unlike an angle cut, this type of cut depends on all of the data that have been collected for an image. As a result, there may be variations in whether a particular set of measurements is cut when new data are added to the analysis.
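A minimal sketch of such a cut for a single image is given below, assuming the common convention of keeping values between Q1 - 1.5*IQR and Q3 + 1.5*IQR on both axes. The quartile method (`"inclusive"`), the handling of images with fewer than two pairs and the sample values are assumptions for illustration; the quartile convention used in the original Excel and Python calculations is not stated here.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_cut(pairs, k=1.5):
    """Keep (major, minor) pairs that fall inside the fences on both axes.

    pairs: all good-data measurements for ONE image, in pixels.
    """
    if len(pairs) < 2:
        # Too few points to estimate quartiles; leave the data unchanged.
        return list(pairs)
    lo_major, hi_major = iqr_bounds([p[0] for p in pairs], k)
    lo_minor, hi_minor = iqr_bounds([p[1] for p in pairs], k)
    return [p for p in pairs
            if lo_major <= p[0] <= hi_major and lo_minor <= p[1] <= hi_minor]


# Made-up measurements: five lobule-sized pairs and one background leaf.
pairs = [(140, 92), (143, 95), (145, 94), (138, 90), (141, 96), (320, 210)]
kept = iqr_cut(pairs)
```

Because the fences are recomputed from every pair recorded for the image, rerunning `iqr_cut` after new measurements arrive can change which pairs are kept, which is the dataset dependence noted above.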
Overall, measurements were of high quality. Significantly, all age groups, including children (10 and under), teens and adults, produced data that could be used by researchers. The clustering of measurements obtained by these groups with the expert measurements can be visualised in Fig.
Demographic breakdown of totals and IQR pass work in 2017 and 2018. In this, each number represents the total number of lobules measured, rather than the number of individuals doing the measuring or the number of images used. A kiosk session where no valid measurements were submitted is counted as 1 in the data collected category.
Demographic | Data collected | Number passing IQR | Percent passing IQR |
Child (in 2017) | 503 | 207 | 41% |
Teen (in 2017) | 414 | 197 | 48% |
Adult (in 2017) | 448 | 356 | 79% |
Family (in 2017) | 215 | 131 | 61% |
Total 2017 (Specimens) | 1,580 | 891 | 56% |
10 and under (in 2018) | 1,562 | 775 | 50% |
11-17 (in 2018) | 1,224 | 782 | 64% |
18+ (in 2018) | 1,690 | 1,298 | 77% |
Skip (in 2018) | 627 | 270 | 43% |
Total 2018 (Science Hub) | 5,103 | 3,125 | 61% |
Overall Total | 6,683 | 4,017 | 60% |
While the initial data analysis was performed on a 2018 dataset collected from the Science Hub, a secondary analysis was done on a dataset collected in the Specimens exhibit. This exhibit was focused on the large collection of scientific specimens at the museum and so the kiosk was only a small part of the larger exhibit. This differed from the Science Hub, which is a dedicated space where visitors can interact with scientists, as well as specimens from the Museum's collection. The fact that there was a smaller timeframe where interns collected demographic data from the Specimens exhibit (24 versus 48 days), meant that the amount of data collected from the Specimens exhibit was smaller. As some of the images had a very small amount of data associated with them, we combined the two datasets to perform the IQR cuts. Note that because the IQR cuts depend on the specific dataset used, the results may change when additional data are added in. This happened here; the IQR cut for the 2018 data alone had 3,125 pass the cuts. When we added in the 2017 data, there were 3,126 data out of the 2018 set within the IQR ranges. This suggests that combining the two is robust. When these combined cuts were used to examine the 2017 exhibits data, we found that, although there was a smaller amount of 2017 data collected, the quality was similar and the majority of data collected was usable, based on completeness, angle and IQR cuts. As the quality of data was good, the majority of images in the 2017 dataset had sufficient data to determine the lobule lengths and widths (Table
Overall data passing IQR cuts. In this, each number represents the total number of lobules measured, rather than the number of individuals doing the measuring or the number of images used. A kiosk session where no valid measurements were submitted is counted as 1 in the data collected category.
Dataset | Data collected | Data which passed combined IQR cuts | Percent which passed combined IQR cuts |
2017 Specimens exhibit data | 1,580 | 891 | 56% |
2018 Science Hub data | 5,103 | 3,126 | 61% |
Combined data | 6,683 | 4,017 | 60% |
To gain more insight into the image measurement data, an analysis was conducted on the images themselves. The classification of the MicroPlant images was based on the number of lobules present, complexity and clarity of the image. Standard deviations of the axis length measurements were used to determine the clustering of the axis measurements. The lower the standard deviation, the closer together the measurements are clustered. The image classifications were then compared to the standard deviations of the axis measurements, post IQR cut for each image. There were no notable trends present between the complexity of the images and the standard deviations of the measurements.
We then looked at images with large standard deviations, meaning the measurements were not tightly clustered. Out of the 78 distinct subjects, we found only three images with very large standard deviations, one of them from the 2017 dataset. This was only 4% of the total images displayed on the kiosk. One similarity between these three images is the small number of observations, with each having between 6 and 14 observations in total. This limited amount of data is one possible explanation for the low quality of the measurements. For the image with only six measurements, there may have been confusion in distinguishing the lobule's edge from the leaf behind the lobule. The shading of the pictures can leave room for confusion as well, making it unclear what is considered part of the lobule and what is not.
With the goal of comparing the accuracy between participant measurements and expert scientist measurements, students conducted the same statistical test used in
Though literally hundreds of citizen or community science platforms exist, to our knowledge, this was one of the first to be featured in a live interactive museum exhibit. Commonly, people-powered research projects engage participants online via platforms like Zooniverse (
Visitors to kiosk exhibit in Field Museum's Specimens exhibit, summer 2017 (June-August).
Total interactive hours observed | 44.5 |
Approximate number of people engaged with exhibit during observation hours | 580 |
Number of hours people spent engaged with exhibit | 12 |
Approximate percentage of exhibit patrons who interacted with the Specimen exhibit kiosk | 14-20% |
The Science Hub is designed for hands-on interactions and discussion with scientists. During the time that the kiosk was present in the Science Hub, 23,549 people visited the Science Hub, with 1,014 interactions with the kiosk. Based on this, we estimate that between 4.3% and 12% of the visitors to the Science Hub interacted with the kiosk; if the group sizes were similar to those directly observed in the Specimens Exhibit, approximately 8% of the Science Hub patrons interacted with the kiosk. As the kiosk was a stand-alone exhibit in the Science Hub, it is likely that museum patrons were more inclined to interact with the scientists present rather than a stand-alone computer exhibit.
GitHub houses all of our data and scripts (
Initial IQR cuts and comparisons to expert data were done by hand. Systematic IQR cuts were performed in Excel Pivot Tables for IQR. Results from IQR cuts in Python are in Processed 2017 and 2018 data, which includes totals of the 2017 and 2018 data broken down by demographics, average lengths and standard deviation for each image after IQR cuts and also the All Data sets for both 2017 and 2018. For the third image with expert measurements, t-tests were performed in Excel.
All age groups, including children (10 and under), teens and adults, produced valid, high quality data that could be used by researchers. Significantly, the paper outlines the implementation of experiential learning through an undergraduate mathematics course that focuses on projects with actual data to gain a deep, practical knowledge of the subject, including observations, the collection of data, analysis and problem solving. We are promoting an intergenerational model including children, high school students, undergraduate students, early career scientists and senior scientists, combining experiential learning, people-powered research and data derived from natural history collections.
This study shows that the public is capable of producing a usable set of measurements. However, there are two limitations to the precision of these measurements. One is the touchscreen technology. As the smallest unit was a pixel, the difference of just a couple of pixels in the measurement corresponded to a 1% difference in length. This level of precision could not be improved upon with the technology used. The second is the variation in public measurements. Scientists who want to distinguish between species whose size differences are large (such as 10% difference) would be able to use work from the public; however, if the difference is very subtle (such as a 1% difference in length), it would not be possible.
For cases where multiple types of species may exist in an individual image, there could be multiple sizes of measurements in an individual image. This would make the IQR analysis ineffective at finding outliers and so more subtle methods would need to be employed. However, if one found clusters of different sizes, it may be possible to create a machine-learning algorithm to use for the data processing portion.
Interdisciplinary science entails the collaboration of scientists with largely non-overlapping training and core expertise to solve a problem that lies outside the grasp of the individual scientists (
This was an authentic interdisciplinary experience in experiential learning. Students from various backgrounds, specialities and ages were involved in all aspects of this project from inception to completion. While this report focuses on the data analysis, prior collaborations with both college and high school students led to the development of the project. Thus, all involved students participated in a rich, real world learning experience to generate and later analyse a real and meaningful dataset answering questions that were previously unanswered. Students involved in the data analysis found the skills that they acquired through the project to be highly applicable to their post-graduation jobs, with comments such as: "I’ve used a lot of the VBA (Visual Basic for Applications) skills from your classes with the Field Museum". This indicates that this project was good preparation for both research work and industry jobs. As this course was originally developed as part of the PICMath programme, it is a way to both fulfil the goals of the programme and to increase scientific knowledge. Future endeavours could implement student evaluation prior to, during and post course, as student feedback on their learning journey is effective in improving both student satisfaction and learning (
Despite the critical role of experiential learning in building student research skills and capacity, few have explored social interaction mechanisms used to facilitate student experiential learning in an interdisciplinary research team (
There remain many interesting education and learning questions that could be investigated using the current dataset, as well as future studies embarking on similar large scale projects. For example: How many measures must be taken by each kind of user group? Are there significant differences in measurement facility amongst children, adolescents and adults? Are there significant differences between a facilitated audience and a purely online audience? Limited work has investigated the arc of engagement from secondary to post-secondary education and into adulthood. Examining a cross-sectional population set will allow us to study reasons and motivations of learner engagement moving from a formal to an informal setting. The potential also exists to use this project to explore how authentic research experiences can both develop student interest in STEM and STEM careers (
Although it is possible to use this dataset to compare kiosk locations, in the future, having a consistent set of self-described demographic categories would allow for a consistent comparison of how different demographics interact in the different kiosk locations. However, given that the desire to collect demographic information was realised after some data were collected, the addition of a brief demographic survey to the kiosk was a natural course correction. In addition, collection of demographic information before starting the activity interrupts the activity and presents other challenges when switching between participants.
Our experience with this project yielded insight into how to plan for future projects. For other taxonomic projects, the most robust measurements will involve images where the public can easily identify the object to be measured. As noted in the conclusions, the distinction between smaller lobules and the larger underlying leaves led to a number of inaccurate measurements. As this is a large, multi-year project, students who are involved in it will graduate before it is completed. Faculty and museum leaders are the ones maintaining continuity of the project; they need to make sure that the student data work is documented and stored in a way that is both carefully labelled and accessible. This allows multiple years of students to collaborate and make significant continual progress in a robust manner. In terms of project management, the most challenging aspect was maintaining contact and retaining connections between student cohorts. It is critical to plan for this from the very start of development when pursuing such projects.
Though we are able to tabulate a certain interaction level by patrons with the kiosk (Table
A project that involves the public, high school interns and university students can allow the entire community to make scientific discoveries. It allows scientists to analyse large collections of specimens and gives students an in-depth experience of what it is like to be a professional scientist or data analyst. This occurred at all levels. Given a sufficient number of community members measuring leaves, we were able to obtain high-quality measurements, comparable to expert measurements, using methods that can be automated. This bodes well for crowd-sourcing taxonomic data collection from images. Mathematics alumni reported that developing and creating these automated data processors was educationally beneficial, as they were able to apply their skill-set to internships and post-graduation jobs working with data. Students from some cohorts of the Industrial Applications of Mathematics course, as well as some museum interns, have gone on to pursue graduate degrees.
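As an illustration of the kind of automated processing referred to above, the following is a minimal sketch of interquartile range (IQR) outlier filtering applied to repeated crowd measurements of a single specimen. It is not the project's actual pipeline: the function names `iqr_filter` and `consensus`, the conventional fence multiplier of 1.5, the inclusive quartile method and the use of the median as the summary value are all assumptions made for this example.

```python
from statistics import median, quantiles


def iqr_filter(measurements, k=1.5):
    """Keep only values within [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: k=1.5 is the conventional fence multiplier,
    assumed here rather than taken from the project.
    """
    values = list(measurements)
    if len(values) < 4:
        # Too few values to estimate quartiles meaningfully; keep them all.
        return values
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]


def consensus(measurements, k=1.5):
    """Median of the measurements that survive the IQR filter (assumed summary)."""
    return median(iqr_filter(measurements, k))
```

For example, given several made-up width measurements of one lobule that include an accidental click and a measurement of the wrong structure, `iqr_filter` would drop the extreme values and `consensus` would summarise the rest, which could then be compared against an expert measurement of the same image.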
In today’s society, K-12 students' technological interests with platforms, such as digital making (
The Student Center for Science Engagement at Northeastern Illinois University helped provide funding for student interns. Financial support was provided by the National Science Foundation (Award Nos. 0949136, 1145898, 1458300, 1541545, 2001509 to M.v.K.). We especially acknowledge the support under the Research Experience for Post-Baccalaureate Students (REPS) in the Biological Sciences Supplemental Funding Opportunity from the National Science Foundation (Award No. 2001509), from the Negaunee Fund for Science, Field Museum, and from the Grainger Bioinformatics Center, Field Museum. CME Group Foundation and Field Museum Women's Board funded the Digital Learning Programs at Field Museum. Support is also acknowledged from C. Moreau (PI) and T. Lumbsch (co-PI) - Research Experience for Undergraduates (REU) Site: Evolution of Biodiversity across the Tree of Life (Award No. 1559779). This publication uses data generated via the Zooniverse.org platform, development of which is funded by generous support, including a Global Impact Award from Google and by a grant from the Alfred P. Sloan Foundation. We would like to thank the following individuals for their support with imaging specimens: Jonathan Scheffel and Steve Schulze (Field Museum), Anthony Carmona and Stephanie Maxwell (Northeastern Illinois University) and Ed Gluzman and Vishal Patel (DePaul University, Chicago, Illinois). We are grateful to Brendon Reidy, Xenia Alava, Dara Arabsheibani, Alex Vizzone and Zak Zillen (all Northeastern Illinois University); Kavita Elliott (The Ohio State University); and Pedro Rebollar (DePaul University) for their support with guest interactions. Ramsey Millison and Ariel Wagner (both DePaul University) provided imaging and guest interaction support.
We thank the following individuals for technical and administrative support, consultancy and general assistance throughout the project: Lauren Hasan and Beth Crownover (both Field Museum); Kristina Lugo (Roosevelt University and Field Museum); Anthony Flores, Maha Khan, Alexandra Lopez, Lisa Murata and Oana Vadineanu (all Northeastern Illinois University); Gabriel Somarriba (University of Florida); Mariam Nasser (Elgin Academy, Elgin, Illinois); Joey T Rene Shelley (University of Illinois, Chicago); Audrey Aronowsky (Department of Ecology and Evolution, University of Illinois at Chicago); Beth Sanzenbacher (Bernard Zell Anshe Emet Day School, Chicago, Illinois); Jennifer Campagna (Blaine Elementary School, Chicago, Illinois); Christine LaPointe (Hillcrest Elementary School) (Green Ambassador, Field Museum); Laura Briscoe (New York Botanical Garden); Jordan Newson (Albion College); and Taylor Walker (Hollins University). We acknowledge Jose Hernandez Lopez (Northeastern Illinois University) and Allison Chen (University of California) for their early efforts investigating Google Analytics as part of the demographic survey.
We would like to acknowledge previous cohorts of Roosevelt University students whose quantitative work was influential for this analysis. The following performed analyses on subsets of the kiosk data under the direction of Prof. Steve Cohen: Jonathan Aird, Estevan Carrillo, Esther Fiala, Ian Fluhler, Nathan Gregory, Martin Hayford, Breanna Ivery, Jonathan Kasongo, Brian Keyser, Andrew Moskwa, Vingie Ng, Mary Strickler, Eric Synajie and Haki Wright. Thanks also to Cuong Pham (pictured) for demonstrating the kiosk at the museum.
The following Roosevelt University students performed large scale processing of data from the web platform; this type of parsing and processing work was essential for the current authors to learn how to work with a similarly formatted dataset: Julia Buczek, Matt Caraher, James Crigler (pictured), Davaadulam Ganzorig, Kyler Gillespie, Robert Hennecy, Thomas Hill, Chengjian Li, Ashlyn Liu, Marcos Mercado, Ali Myhelic, Nupur Patel, Rebecca Plata, Jacob Rubinstein, Michael Sassaman, Luke Swanson and Joshua Torres (pictured).
Finally, we would like to give a huge thank you to all of the children, families, teens and adults who visited the museum and took time to engage with the kiosk. Without them, none of this would have been possible.
Melanie Pivarski - managed data analysis, methodology, student work, created diagrams, writing, formal analysis, visualisation; Matt von Konrat - project conception, writing, data analysis, supervision, funding acquisition, project administration, resources; Thomas Campbell - funding acquisition, project conception, writing, diagrams, data analysis, supervision, visualisation; Ayesha T. Qazi-Lampert - observations, investigation, writing, student work, supervision; Laura Trouille - software, resources, validation, writing; Heaven Wade - writing, student work, supervision; Aimee Davis - funding acquisition, project conception, student work, supervision; Selma Aburahmeh - data collection, observations, investigation; Joseph Aguilar - writing/data analysis confirming IQR; Cosmin Alb - writing/data analysis initial IQR work; Ken Alferes - writing/data analysis initial IQR work; Ella Barker - writing/data analysis initial IQR work and cleaning, diagrams, visualisation; Karl Bitikofer - student work, supervision, investigation; Kelli J. 
Boulware - writing/data analysis of outliers in measurements, IQR confirmation; Carla Bruton - student work, supervision, investigation; Sicong Cao - writing, some background of experiments, confirming IQR; Christine Christian - investigation, writing; Arturo Corona - writing, automating IQR in Excel Pivot tables, data analysis; Kaltra Demiri - writing/data analysis of outliers in measurements, IQR confirmation, diagrams, visualisation; Daniel Evans - data analysis confirming expert/synching specimens data with interns; Nkosi Evans - data collection, digital imaging; Connor Flavin - writing/synching Science Hub datasets/initial IQR work; Jasmine Gillis - future work-Zooniverse improvement suggestions; Victoria Gogol - writing/data analysis initial IQR work; Elizabeth Heublein - writing/data analysis of outliers in measurements, IQR confirmation; Edward Huang - data collection, observations, investigation; Jake Hutchinson - writing/data analysis initial IQR work, diagrams, visualisation; Cyrus Jackson - data collection, observations; Odaliz Rubee Jackson - writing/data analysis initial IQR work, diagrams; Lauren Johnson - data collection, digital imaging; Michi Kirihara - writing, automating IQR in Python, data analysis; Henry Kivarkis - future work-Zooniverse improvement suggestions; Annette Kowalczyk - writing/data analysis initial IQR work, diagrams, visualisation; Alex Labontu - data cleaning in PowerShell, writing, GitHub documentation; Briajia Levi - data collection, observations; Ian Lyu - future work-Zooniverse improvement suggestions; Sylvie Martin-Eberhardt - data collection, observations; Gaby Mata - conceptualisation, data collection, data curation, observations; Joann Lacey Martinec - data collection, observations; Beth McDonald - data curation, student work; Mariola Mira - future work-Zooniverse improvement suggestions; Minh Nguyen - future work-data cleaning, Python, estimates of accuracy, neural nets; Pansy Nguyen - data collection, observations; 
Sarah Nolimal - data collection, observations; Victoria Reese - methodology, validation; Will Ritchie - future work-Zooniverse improvement suggestions; Joannie Rodriguez - writing/data analysis initial IQR work, diagrams, visualisation; Yarency Rodriguez - observations, student work, supervision; Jacob Shuler - data collection; Jasmine Silvestre - writing/data analysis confirming IQR/diagrams of experimental set up; Glenn Simpson - data collection, observations; Gabriel Somarriba - data collection, observations; Rogers Ssozi - writing/data analysis initial IQR work, diagrams, visualisation; Tomomi Suwa - data collection, observations; Cheyenne Syring - data analysis confirming expert; Nidhi Thirthamattur - conceptualisation, data collection, data curation, observations; Keith Thompson - data collection; Caitlin Vaughn - data collection, observations; Mario R Viramontes - data analysis confirming expert; Chak Shing Wong - writing/data analysis initial IQR work, time stamp synch check diagrams; Lauren Wszolek - future work-data cleaning, Python, estimates of accuracy.