Research Ideas and Outcomes: Data Management Plan (NSF Generic)
Corresponding author: Suzanne Anderson (suzanne.anderson@colorado.edu)
Received: 02 Jun 2016 | Published: 06 Jun 2016
© 2016 Jeri Fey, Suzanne Anderson.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation: Fey J, Anderson S (2016) Boulder Creek Critical Zone Observatory Data Management Plan. Research Ideas and Outcomes 2: e9419. doi: 10.3897/rio.2.e9419
This Data Management Plan (DMP) was created using the DMPTool. It describes all data collected as part of the Boulder Creek Critical Zone Observatory (CZO) project, which focuses on research in the Boulder Creek watershed. The project is hosted at the Institute of Arctic and Alpine Research (INSTAAR), University of Colorado at Boulder, USA.
The goal of the Boulder Creek CZO is to collect meaningful research data on the Earth’s critical zone and make these diverse data available to the public as soon as they are available, while also providing access to other CZO data sets for related research on the weathered, hydrologically active near-surface environment.
Keywords: data management plan, hydrology, earth sciences, critical zone, geomorphology, meteorology
The Boulder Creek Critical Zone Observatory (CZO) focuses on research in the Boulder Creek watershed, which includes the Green Lakes Valley, Gordon Gulch, and Betasso sites and covers 1158 km² at elevations of 1480-4120 m. Two groups of data are collected. The first is ongoing data collection, begun in 2008, consisting of manual sample collection, manual measurements, and data loggers. The second is completed project data.
Ongoing data collections consist of:
Completed project data collections consist of:
Completed project data sets are typically collected by graduate students investigating a specific topic, and they result in a published paper. Ongoing data sets are collected by the field manager, the lab manager, and trained students to build a historical record that can support any research topic related to the Earth’s critical zone. Data collected in situ and sample data analyzed in the lab undergo quality assurance and quality control (QA/QC) before being submitted to the Boulder Creek CZO website for public access.
All data must be submitted in comma-separated values (.csv) format with an accompanying metadata file in text (.txt) format. The metadata files are currently being converted to .csv files in accordance with the ISO 19115 geographic metadata standard. The metadata are modeled on "A Model Information Management System for Ecological Research" (Rick C. Ingersoll, Tim R. Seastedt, and Michael Hartman, BioScience Vol. 47, No. 5, May 1997, pp. 310-316), a design its creators have since expanded and built upon.
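The submission rule (a .csv data file plus a .txt metadata file) lends itself to an automated intake check. The following is a minimal Python sketch, not the CZO's actual tooling; it assumes a hypothetical convention in which each metadata file shares its data file's name (for example `soil_moisture_2015.csv` with `soil_moisture_2015.txt`), and `check_submission` is an illustrative name.

```python
from pathlib import PurePath


def check_submission(filenames: list[str]) -> list[str]:
    """Check that every .csv data file has a matching .txt metadata file.

    Hypothetical convention: the metadata file shares the data file's stem,
    e.g. 'soil_moisture_2015.csv' is described by 'soil_moisture_2015.txt'.
    Returns a list of problems; an empty list means the submission passes.
    """
    paths = [PurePath(name) for name in filenames]

    # Group file stems by (lower-cased) extension.
    stems_by_suffix: dict[str, set[str]] = {}
    for p in paths:
        stems_by_suffix.setdefault(p.suffix.lower(), set()).add(p.stem)

    csv_stems = stems_by_suffix.get(".csv", set())
    txt_stems = stems_by_suffix.get(".txt", set())

    problems = []
    if not csv_stems:
        problems.append("no .csv data file found")
    # Every data file needs its own metadata file.
    for stem in sorted(csv_stems - txt_stems):
        problems.append(f"{stem}.csv has no {stem}.txt metadata file")
    # Anything other than .csv or .txt is outside the submission format.
    for p in paths:
        if p.suffix.lower() not in (".csv", ".txt"):
            problems.append(f"unsupported file type: {p.name}")
    return problems


print(check_submission(["soil_moisture_2015.csv", "soil_moisture_2015.txt"]))
```

An empty list means the submission passes. Once the metadata files move to .csv, as the plan describes, the pairing rule would need a different naming convention to tell data from metadata.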
Metadata must have the following values:
When a new data set is submitted, its metadata are first used to determine the field location, topic, and discipline under which the data set should be filed. When new data arrive for an existing data set, the log files are updated according to the field manager’s notes.
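The intake decision (a new data set versus new data for an existing one) can be sketched as a small routing step. This is a minimal Python illustration, not the CZO's actual system: the `Catalog` class, its in-memory storage, and the metadata keys `title`, `field_location`, `topic`, and `discipline` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical in-memory catalog keyed by data set title."""

    # title -> (field location, topic, discipline)
    datasets: dict[str, tuple[str, str, str]] = field(default_factory=dict)
    # One entry per update to an existing data set.
    update_log: list[str] = field(default_factory=list)

    def ingest(self, metadata: dict[str, str], field_note: str = "") -> str:
        title = metadata["title"]
        if title not in self.datasets:
            # New data set: the metadata decide where it is filed.
            placement = (
                metadata["field_location"],
                metadata["topic"],
                metadata["discipline"],
            )
            self.datasets[title] = placement
            return "filed under " + " / ".join(placement)
        # Existing data set: record the update from the field manager's notes.
        self.update_log.append(f"{title}: {field_note}")
        return "log updated"


catalog = Catalog()
print(catalog.ingest({"title": "Gordon Gulch soil moisture",
                      "field_location": "Gordon Gulch",
                      "topic": "Soil", "discipline": "Hydrology"}))
print(catalog.ingest({"title": "Gordon Gulch soil moisture"}, "sensor 3 replaced"))
```

Keying on the title is a simplification chosen for brevity; a real catalog would more likely use a stable identifier.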
Each data set has its own web page, with searchable metadata listed on the page and also available for download in .csv format. For completed projects, each page links directly to a .csv file of the data. For ongoing data collections, the data are inserted into an Oracle relational database that can be queried from the website for specific variables and date ranges.
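A query "for specific variables and date ranges" amounts to a parameterized filter on a variable name and a timestamp interval. The sketch below swaps in Python's built-in `sqlite3` for Oracle so it runs anywhere; the `observations` table, its columns, and the sample values are made up for illustration and do not reflect the CZO's actual schema or data.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in table (hypothetical schema, made-up values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations (variable TEXT, observed_at TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO observations VALUES (?, ?, ?)",
        [
            ("air_temp_C", "2015-01-01T00:00", -8.5),
            ("air_temp_C", "2015-01-02T00:00", -6.0),
            ("snow_depth_cm", "2015-01-01T00:00", 42.0),
            ("air_temp_C", "2015-02-01T00:00", -3.5),
        ],
    )
    return conn


def query_observations(conn, variable, start, end):
    """Return (timestamp, value) rows for one variable within [start, end].

    Timestamps are ISO 8601 strings, so string comparison orders them
    chronologically. Placeholders (?) keep user input out of the SQL text.
    """
    cur = conn.execute(
        "SELECT observed_at, value FROM observations "
        "WHERE variable = ? AND observed_at BETWEEN ? AND ? "
        "ORDER BY observed_at",
        (variable, start, end),
    )
    return cur.fetchall()


conn = build_demo_db()
print(query_observations(conn, "air_temp_C", "2015-01-01T00:00", "2015-01-31T23:59"))
```

With the demo rows, the January query returns the two January air temperature readings and skips both the snow depth row and the February reading. Oracle uses different bind-variable syntax, so the placeholders would change in a real deployment.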
The database and web server are hosted on a server supported and backed up by the data manager and CU’s managed services group, which is a division of the Office of Information Technology at the University of Colorado at Boulder.
Every data set is accessible from the http://czo.colorado.edu website and can be searched by title, field location, topic, or discipline. Data sets can also be located with the interactive GIS map at http://czo.colorado.edu/geGIS/gmGIS.shtml
The metadata are what give the Boulder Creek CZO its search capability. The same search capability is carried over to the national CZO site, where all Boulder Creek CZO data sets are available alongside data sets from nine other CZOs. Every CZO uses the same metadata format so that its data sets are searchable at the national level at http://search.criticalzone.org/.
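Searching across observatories works because every CZO describes its data sets with the same metadata fields. The following Python sketch assumes hypothetical field names (`czo`, `title`, `field_location`, `topic`, `discipline`) and illustrative records, including a placeholder "Example CZO"; it is not how the national portal is actually implemented.

```python
# Illustrative records only; the field names are hypothetical but shared by every CZO.
RECORDS = [
    {"czo": "Boulder Creek", "title": "Green Lakes Valley snow depth",
     "field_location": "Green Lakes Valley", "topic": "Snow", "discipline": "Hydrology"},
    {"czo": "Boulder Creek", "title": "Gordon Gulch soil moisture",
     "field_location": "Gordon Gulch", "topic": "Soil", "discipline": "Hydrology"},
    {"czo": "Example CZO", "title": "Hillslope regolith chemistry",
     "field_location": "Example Site", "topic": "Regolith", "discipline": "Geochemistry"},
]


def search(records: list[dict[str, str]], **criteria: str) -> list[dict[str, str]]:
    """Return records matching every criterion (case-insensitive substring match).

    Because all records share one schema, the same query works across CZOs.
    A field missing from a record is treated as empty and therefore never matches
    a non-empty criterion.
    """
    return [
        rec for rec in records
        if all(value.lower() in rec.get(key, "").lower()
               for key, value in criteria.items())
    ]


print([r["title"] for r in search(RECORDS, discipline="hydrology")])
print([r["title"] for r in search(RECORDS, czo="boulder", topic="soil")])
```

Substring matching keeps the sketch short; a production catalog would more likely use a full-text index.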
Each web page provides a description, keywords, and a citation that can be used to search for or report on the data set. Every data set page also posts a data use policy that explains how the data may be used or reused, in accordance with NSF’s policy on dissemination.
Data Use Policy:
*CZO Data Products. Defined as data collected with any monetary or logistical support from a CZO.
**Private. Most private data will be released to the public within 1-2 years, with some exceptionally challenging datasets up to 4 years. To inquire about potential earlier use, please contact us.
Data for ongoing research data sets are updated monthly for all data loggers, only during fall and winter for snow data sets, and annually in summer for time-lapse and surface chemistry data. Typically, the field manager collects the data, performs QA/QC, and posts them online for public access within about 2-3 months.
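The 2-3 month lag between collection and public posting can be turned into an expected availability window. This is a minimal Python sketch, assuming whole-month offsets measured from the collection date (the plan gives only an approximate range); `add_months` and `expected_posting_window` are hypothetical helper names.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = d.month - 1 + months          # zero-based month count from year d.year
    year, month = d.year + index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def expected_posting_window(collected: date,
                            min_months: int = 2,
                            max_months: int = 3) -> tuple[date, date]:
    """Approximate (earliest, latest) dates by which QA/QC'd data should be online."""
    return add_months(collected, min_months), add_months(collected, max_months)


# Data collected on 30 Nov 2015 would be expected online between 30 Jan and 29 Feb 2016.
print(expected_posting_window(date(2015, 11, 30)))
```

Clamping the day (30 November plus 3 months becomes 29 February in a leap year) avoids invalid dates at month ends.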
For completed or original data sets, the data owner has some time to work with the data before submission is required. The Data Sharing Policy below is posted on every data set web page and adheres to NSF’s policy on data sharing.
Data Sharing Policy:
* CZO Data Products. Defined as data collected with any monetary or logistical support from a CZO. Logistical support includes the use of any CZO sensors, sampling infrastructure, equipment, vehicles, or labor from a supported investigator, student or staff person. CZO Data Products can acknowledge multiple additional sources of support.
** Private CZO Repository. Defined as a password-protected directory on each CZO’s data server. Files will be accessible by all investigators and collaborators within the given CZO and logins will be maintained by that local CZO’s data managers. Although data values will not be accessible by the public or ingested into any central data system (i.e. CUAHSI HIS), metadata will be fully discoverable by the public. This provides the dual benefit of giving attribution and credit to dataset creators and the CZO in general, while maintaining protection of intellectual property while publications are pending.
† Dataset Creators. Defined as the people who are responsible for designing, collecting, analyzing and providing quality assurance for a dataset. The creators of a dataset are analogous to the authors of a publication, and datasets should be cited in an analogous manner following the emerging international guidelines described at http://www.datacite.org/whycitedata.
For short-term archiving, the data are backed up nightly and the backups are retained for 30 days. In addition, because the UNIX server running the Oracle database and Apache Tomcat web server stores its data as flat files, the CZO creates full backups quarterly and saves them to external hard drives.
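A 30-day retention rule for nightly backups reduces to selecting backups older than a cutoff date. The Python sketch below assumes backups are identified by their dates and that a backup exactly 30 days old is still kept; the function name is hypothetical, and the quarterly full backups on external drives are outside its scope.

```python
from datetime import date, timedelta


def nightly_backups_to_delete(backup_dates: list[date], today: date,
                              keep_days: int = 30) -> list[date]:
    """Return nightly backup dates older than the retention window, oldest first.

    A backup exactly keep_days old is still kept. Quarterly full backups on
    external drives are managed separately and are not considered here.
    """
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in backup_dates if d < cutoff)


today = date(2016, 6, 6)
backups = [today - timedelta(days=n) for n in range(40)]  # 40 consecutive nights
expired = nightly_backups_to_delete(backups, today)
print(len(expired), expired[0], expired[-1])
```

With 40 consecutive nightly backups, the nine that are 31-39 days old are selected for deletion, and 31 backups (ages 0-30 days) remain.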
For long-term archiving, two options are in place. The project is currently funded and ongoing, which keeps the data available in the near term. The data are also hosted on the National CZO website for the foreseeable future.
Time series data are formatted so that they can be ingested into the CZO Central Data Portal (Zaslavsky et al., 2011), which forms the center of the CZO Integrated Data Management plan (NSF 1153164 to Aufdenkampe). The National CZO website provides access to the Central Data Portal.
The goal of the Boulder Creek CZO is to create and collect meaningful research on the Earth’s critical zone and to make these diverse data available to the public as soon as they are available, while also providing access to other CZOs’ data sets for related research on the weathered, hydrologically active near-surface environment.
National Science Foundation (NSF) Earth Sciences Template
Public Data Management Plan created with the DMPTool: https://dmptool.org/plans/16058.pdf
Boulder Creek Critical Zone Observatory
Institute of Arctic and Alpine Research (INSTAAR)
University of Colorado at Boulder