Research Ideas and Outcomes : Grant Proposal
Unicorn–Open science for assessing environmental state, human health and regional economy
Pekka Neittaanmäki‡, Timo Huttula§, Juha Karvanen‡, Tom Frisk|, Jouni Tuomisto¶, Antti Simola#, Tero Tuovinen‡, Janne Ropponen§
‡ University of Jyväskylä, Jyväskylä, Finland
§ Finnish Environment Institute, Jyväskylä, Finland
| Pirkanmaa Centre for Economic Development, Transport and the Environment, Tampere, Finland
¶ National Institute for Health and Welfare, Kuopio, Finland
# VATT Institute for Economic Research, Helsinki, Finland
Open Access

Executive summary

Open data and models are becoming increasingly available, but there are not yet good methods and platforms for turning them into systematic, evidence-based decision support. Unicorn will produce such an environment based on existing theoretical and practical knowledge about decision support and models. This consortium possesses the necessary models, data, and skills to set up the environment and to demonstrate its functionality and usefulness with several case studies related to environmental issues, human health, and the economy. The Unicorn environment will be built in a generic and systematic way so that it could even become an international standard for evidence-based decision support.

Developing a technical environment or standard is not enough. Using the Unicorn environment represents a major cultural change for both researchers and decision makers, as current decision support practices do not reflect the principles of openness, criticism, or reuse. This cultural change must therefore be promoted by training people to use the environment, by informing society about its possibilities, and by solving a number of practical and technical problems related to current practices in research institutes, ministries, and municipalities. We acknowledge these problems and offer solutions to them with an extensive interaction plan.

Keywords

Open data, open models, environment, human health, regional economy

List of participants

Consortium leader (PI):

  • Prof. Pekka Neittaanmäki, Dept. of Mathematical Information Technology, JYU.

Group leaders:

  • Prof. Juha Karvanen, Dept. of Mathematics and Statistics, JYU (Secondary PI)
  • Prof. Tom Frisk, Pirkanmaa Centre for Economic Development, Transport and the Environment
  • Adj. prof. Jouni Tuomisto, Dept. of Health Protection, National Institute for Health and Welfare
  • Adj. prof. Juha Honkatukia, VATT Institute for Economic Research
  • Adj. prof. Timo Huttula, Finnish Environment Institute (SYKE)

Third parties involved in the project

Associated partners:

  • Anne Mäkynen, Council of Tampere Region
  • Laura Höijer, Ministry of the Environment
  • Jyrki Huikari, Ministry of Social Affairs and Health
  • Jaana Husu-Kallio, Ministry of Agriculture and Forestry
  • Peter Tattersall, Hahmota Oy
  • Antti Poikola, Open Knowledge Finland Ry

State of the art and preliminary work

This research project will be a game-changer in political and governmental decision making by enabling the use of scientific information that is accessed and analyzed in a unique framework combining multimodal data from different sources with the most recent data analysis tools for interpreting it. The aim of the project is to provide efficient tools that work for both decision-making processes (Reichert et al. 2015) and research problems. The tools enable data mining from various sources, data analysis and processing, and comparisons and simulations using the extracted data.

We are in a situation where a lot of good-quality scientific data is openly available in national and international databases (avoindata.fi, www.data.gov, thegovlab.org). There is also a clear societal drive to provide and use open-source models and evidence-based decision making, and to support their open use and development.

A major problem is that there is no single physical or virtual place where all this data can easily be combined, preprocessed, analyzed and visualized by researchers for decision makers. Data preprocessing routines are laborious, and typically every researcher and software provider develops their own unique routines to facilitate their own work. However, the most important societal needs require collaborative, multidisciplinary attempts to solve relevant problems or questions. To support these studies effectively and efficiently, the processes must be reliable, reproducible, and transparent. Shared practices, tools, data, working environments and concerted actions are the way forward to improve science and decision support. This is true in all areas, but in this project we will start from the environment, human health, and the regional economy, as they are complex and challenging enough to offer a good test bed for general development.

It is not enough that experts push data to politicians. There must be practices for mutual communication: experts must answer policy questions in a defensible and useful way; decision makers must explain their views more clearly using evidence; and there must be ICT tools to support this exchange. The focus is on end users. This consortium has already developed and tested prototypes of such practices and tools in several projects, and is now ready to apply them in society on a large scale.

The partners of Unicorn are highly competent in their respective fields in handling data at every stage of a decision process and in producing useful, timely, and accurate information for decision makers. The results of such work have to be understandable and traceable back to the original data: no black-box solutions. Unicorn can provide this, together with critical comments and recommendations during the work.

The consortium relies on the latest methods and tools from statistics, data processing and simulation. Some researchers will focus on the methodology of the Unicorn environment. Mathematical and statistical methods will be tailored and tested inside the environment. Methods for processing information and big data will be considered from both theoretical and practical points of view. Skilled people will construct the interfaces between, e.g., databases, preprocessing tools, mathematical models, and post-processing tools. The utility of the environment will be demonstrated with case studies. The environment will not be a single product in the traditional sense, but a systematic and coherent set of tools and practices, useful for everyone in different ways.

In this project, we have chosen open policy practice as the basis for the work. Open policy practice is an existing method for decision support (Pohjola 2011). It has been used, e.g., at THL in several cases, has proven to be a flexible method, and implements many important properties of good scientific and policy processes.

The Unicorn environment builds upon existing projects, models and case studies at national and international level, complementing and improving on the knowledge they have generated. Most of these projects involve several members of this consortium, which will make it easier to establish interactions between projects. For example, Opasnet, the web workspace for performing open assessments and facilitating open policy practice, will be used. The present project also has a close connection with the ongoing CONPAT project, funded by the Academy of Finland AKVA research program, in which the investigation of health effects and economic implications of microbial and chemical contaminants in the Kokemäenjoki River watershed downstream from the City of Tampere is already underway. Other existing tools, such as www.jarviwiki.fi and www.vesinetti.fi, will also be utilized in this project.

In summary, the proposed project produces a new, open virtual work and modeling environment that combines open information from multiple databases and builds tools for efficient policy studies. This will improve decision-making processes in Finland and other countries and greatly streamline the workflow in multidisciplinary projects. Previously, similar projects have been carried out in more closed and less reusable settings. Georeferenced data, in particular, will be utilized more efficiently in several case studies.

Open data and models are a megatrend and will change the world. Unicorn directs this trend onto the paths that are most beneficial for societal decision making by providing quick, reliable and efficient decision support. This improves the Finnish political and hence economic infrastructure. Significant savings of resources will result from improved data collection, analyses and modeling. In addition, the quality and number of assessments that can be done to support work, e.g. in municipalities, will increase significantly.

Objectives, Concept and Approach

The objectives of the project are:

  • To develop a versatile open web environment for decision support and related data storage and modeling.
  • To develop interfaces between existing data sources and models and the Unicorn environment to facilitate their universal use.
  • To provide and implement tools for producing new open datasets and models to complement and expand the system.
  • To develop a user forum and community of researchers, developers, policy makers, and stakeholders for creating shared understanding about policy-related science.
  • To modernize the working practices of researchers and decision makers, e.g. by improving their use of existing data and models.
  • To test, implement, and demonstrate large, multidisciplinary studies utilizing novel, efficient practices and environments.

We hypothesize that the major challenges related to evidence-based decision making are actually about changing the practices of researchers and decision makers. However, the change is slow and discussion feeble until there are practical tools (such as the Unicorn environment) to support and demonstrate the new practices.

Scientific breakthroughs and progress seen

Unicorn has top modelers and researchers from several different fields of research. In addition, they already possess unique expertise and functionalities that can be used directly for developing the Unicorn environment. Within this consortium, there is high potential for synergy by combining different data sources, models and platforms, and decision support practices.

The environment will be adaptable to other disciplines, methods will be flexible, and assessments can be checked and reused anywhere in the world, thus increasing the reliability and penetration of knowledge. It has the potential to become a standard for expert work in decision support. Government research institutes such as SYKE, THL and VATT have a specific interest in the long-term maintenance of an environment that supports their main objectives. For the publication plan, see the Interaction plan.

Exporting of knowledge of best practices

Finland is famous for producing many crucial information-related products that are now used worldwide as de facto industry standards: the Linux operating system, the SSH encryption protocol, the MySQL database, and SMS text messaging. All these breakthroughs were developed in Finland and distributed freely for anyone to use. These products have changed the global market in their respective fields. They have been game changers that forced everyone to rethink what an operating system or a database means. Rather than producing direct sales income, they have created a rich ecosystem of service providers on top of the key innovation.

Our aim is similar. Revenue will come from selling expertise or knowledge services enabled by the Unicorn environment or its derivatives. Free distribution of the main innovation is actually crucial for spreading the idea to worldwide use and for creating global demand for the product. If the Finnish private sector is ready, it will gain the first share of the new market. However, it should be remembered that the societal benefits of implementing open knowledge practices are larger than the cash flow they stimulate, and they materialize as quicker and more justifiable societal actions.

Data management and sharing of the products of research

Introduction

We will develop the Unicorn environment using a flexible exploratory methodology and design science. During the six years of the project, at least two development stages are needed. The architecture must be modular to provide flexibility across interface generations. The first stage focuses on reusability of components, while the second focuses on efficiency.

An open map service will be a key functionality for using location-based information and showing results on a map. The project will complement existing data with focused experiments. One geographical pilot will be the waters downstream of Tampere, where we focus on water quality, accidental releases, their health effects and risk management, and the regional economy and air quality. The background studies have been carried out in the CONPAT project funded by the Academy of Finland.

Other pilots are in the Kuopio region in Northern Savo, where we look at the exploitation of natural resources and its unforeseen risks, and in the Kainuu region near the Talvivaara site, where SYKE and JYU have existing aquatic projects (e.g. MINEVIEW). VATT has produced long-run structural economic projections for each NUTS 3 level region of Finland. (NUTS, the Nomenclature of Territorial Units for Statistics, is the EU standard for referencing the subdivisions of countries for statistical purposes; there are currently 19 NUTS 3 level regions, maakunnat, in Finland.) The work is implemented with regional computable general equilibrium (CGE) modeling techniques and uses dialogue with regional experts to exploit their tacit knowledge, e.g. about working population trends.

Research materials and their management plan

Managing research materials during and after the project will be a core issue. The Unicorn environment has to provide robust and efficient queries across many databases, while at the same time there are restrictions and rules on the data available. Meanwhile, all studies should be reproducible. Moreover, even though the aim is an open approach, privacy settings and authentication are needed during the research phase. We aim to tackle these difficulties when the requirement specification is drawn up.
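The reproducibility and access requirements above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that every dataset used in an analysis is recorded with its source, version, access level and a SHA-256 checksum, and that access rights can be expressed as ordered levels. The names DatasetRecord, may_access and analysis_manifest, the three access levels and the example dataset names are hypothetical and not part of any existing Unicorn, Opasnet or database API.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

# Hypothetical access levels, ordered from least to most restricted.
ACCESS_ORDER = {"open": 0, "registered": 1, "restricted": 2}


@dataclass(frozen=True)
class DatasetRecord:
    source: str   # originating database (hypothetical names below)
    version: str  # release or extraction date of the data used
    access: str   # one of the keys of ACCESS_ORDER
    sha256: str   # checksum of the exact bytes used in the analysis


def checksum(data: bytes) -> str:
    """SHA-256 hex digest of the raw data."""
    return hashlib.sha256(data).hexdigest()


def may_access(clearance: str, record: DatasetRecord) -> bool:
    """A user may read a dataset if their clearance is at least its access level."""
    return ACCESS_ORDER[clearance] >= ACCESS_ORDER[record.access]


def analysis_manifest(records, code_version: str) -> str:
    """Fingerprint of an analysis: the code version plus all input records.

    Inputs are sorted, so the same inputs in any order give the same fingerprint.
    """
    payload = {
        "code_version": code_version,
        "inputs": sorted((asdict(r) for r in records),
                         key=lambda d: (d["source"], d["version"])),
    }
    return checksum(json.dumps(payload, sort_keys=True).encode("utf-8"))


# Hypothetical example inputs.
water = DatasetRecord("water_quality", "2016-01", "open", checksum(b"example rows"))
health = DatasetRecord("health_extract", "2016-01", "restricted", checksum(b"other rows"))

print(may_access("registered", water))   # True
print(may_access("registered", health))  # False
print(analysis_manifest([water, health], "v1") == analysis_manifest([health, water], "v1"))  # True
```

Because the fingerprint depends only on the recorded inputs and the code version, two runs with the same fingerprint used the same data; restricted inputs remain listed in the manifest even when their contents cannot be published.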

We will utilize several existing databases and platforms. The Opasnet workspace, designed for decision support, has open models on, e.g., burden of disease, air pollution, and contaminants in food. These will be utilized and further developed in real policy cases (see e.g. WP5). Other cases will also be implemented; tentative topics include radon and dioxins, for which data are already available. An overall objective is to produce a holistic burden-of-disease model covering all major environmental and lifestyle factors in Finland; however, only part of this will materialize within this project.

RYMY is a national water- and foodborne outbreak notification and reporting system maintained by the National Food Safety Authority Evira and THL. Drinking water and bathing water monitoring data comprising about 120 000 data points per year will be collected via this reporting system, tentatively beginning in 2016. We will assess the quality of drinking water together with other relevant environment and health data, and we will utilize surveillance and quality data for management actions and decision making at different administrative levels. Data on environmental health exposure are made available through the YHTI database. The VAHTI database is an emissions control and monitoring database of the Finnish Environmental Administration. We will use EU INSPIRE Directive compliant nationwide open spatial data sets, such as the Finnish Meteorological Institute's open meteorological data, which are already being utilized by the SYKE group participating in this project. The National Land Survey's relevant data products, such as elevation data, are also included, as well as SYKE's lake depth data products.

The environmental databases at SYKE contain nationwide time series of hydrological observations, including surface waters and representative sites for groundwaters. Similarly, SYKE's Hertta system contains hydrochemical and hydrobiological data. The interfaces to this data warehouse are under construction and will be completed at the beginning of 2016. Another important data source maintained by SYKE is the operational WSFS-Vemala model (Huttunen et al. 2015), which produces hydrological and water quality forecasts operationally. The system has been used extensively for flood warning and water resource simulations. It has also been a key tool in forecasting the fate and transport of recent accidental pollution releases from the Talvivaara Mine and the Nordisk Nickel Mine in Finland. Interfaces also exist for this system, and it will be connected to Unicorn as a data source. A further important data source is the ENVIBASE platform.

Health impacts are like a pyramid: rare severe cases at the top and milder common symptoms at the bottom; severe cases are more likely to be recorded. At THL the surveillance pyramid consists of (i) deaths registered by Statistics Finland, (ii) hospitalizations registered by THL in the Hilmo register, (iii) microbiologically confirmed cases registered by THL in the National Infectious Disease Register (TTR), and (iv) visits to a doctor in primary health care, registered by THL in the AvoHilmo register.
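The pyramid can be made concrete with a small Python sketch that relates adjacent levels and extrapolates below the lowest recorded level. The counts and the community-case multiplier are hypothetical illustration values, not THL data, and reading the pyramid through level-to-level ratios is a generic approach assumed here, not a method stated in this proposal.

```python
from typing import NamedTuple


class Level(NamedTuple):
    name: str
    register: str
    count: int


# Ordered from the top (most severe) to the bottom (mildest).
# Counts are hypothetical illustration values, not register data.
pyramid = [
    Level("deaths", "Statistics Finland", 2),
    Level("hospitalizations", "Hilmo", 40),
    Level("confirmed cases", "TTR", 300),
    Level("primary care visits", "AvoHilmo", 1500),
]


def level_ratios(levels):
    """Count of each level divided by the count of the level directly above it.

    Assumes every count is positive.
    """
    return {
        f"{lower.name}/{upper.name}": lower.count / upper.count
        for upper, lower in zip(levels, levels[1:])
    }


def estimate_community_cases(levels, cases_per_visit):
    """Extrapolate cases below the bottom recorded level.

    cases_per_visit is an assumed multiplier (community cases per doctor
    visit); it is not recorded in any register and would have to come
    from a separate study.
    """
    return levels[-1].count * cases_per_visit


print(level_ratios(pyramid))
print(estimate_community_cases(pyramid, 4.0))  # 6000.0
```

The ratios show how many milder recorded cases lie under each more severe one; only the final extrapolation needs information from outside the registers.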

VATT models are based on detailed national input-output tables (Honkatukia 2013, Honkatukia and Simola 2011) and on national and regional accounts time series starting from 1975, provided by Statistics Finland. Official regional input-output tables have not been produced since 2002, but the VATT modeling unit creates them with a gravity-based method for all NUTS 3 level regions. The regional input-output tables created are distributed to other research organizations at the aggregation level determined by Statistics Finland. The model parameterization draws on several sources, such as the Finnish Longitudinal Employer-Employee Data (FLEED), Structures of Earnings, Population Structure, Population Projection, and Statistics on the Finances of Agricultural and Forestry Enterprises (MMYTT). Additionally, the model baselines draw on the Ministry of Finance's long-term economic predictions and regional-level expert assessments. The model results are used for anticipating long-term occupational needs in order to allocate educational resources better.
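The gravity-based method is not described in detail here. As a hedged illustration only, the Python sketch below implements a textbook destination-constrained gravity model, in which the demand of region j for a commodity is split among supplying regions i in proportion to supply_i × distance_ij^(-β). The three regions, their supply, demand and distances, and β = 1 are hypothetical, and VATT's actual procedure may differ.

```python
def gravity_flows(supply, demand, distance, beta=1.0):
    """Destination-constrained gravity model (generic textbook form).

    supply[i]      output of one commodity in region i
    demand[j]      use of the same commodity in region j
    distance[i][j] distance between regions i and j (must be > 0, incl. i == j)
    Returns flows[i][j], the estimated delivery from region i to region j;
    every column j sums to demand[j].
    """
    n = len(supply)
    flows = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Attractiveness of each supplying region i for destination j.
        attraction = [supply[i] * distance[i][j] ** (-beta) for i in range(n)]
        total = sum(attraction)
        for i in range(n):
            flows[i][j] = demand[j] * attraction[i] / total
    return flows


# Hypothetical three-region example (not VATT data).
supply = [100, 50, 50]
demand = [80, 60, 60]
distance = [[1, 2, 4],
            [2, 1, 2],
            [4, 2, 1]]

flows = gravity_flows(supply, demand, distance)
print(flows[0][1])  # 24.0
print([round(sum(flows[i][j] for i in range(3)), 6) for j in range(3)])  # [80.0, 60.0, 60.0]
```

Each column of flows sums to the demand of the receiving region, so the estimated interregional trade stays consistent with the regional totals; β controls how quickly trade falls off with distance.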

Impact

In this project, we are developing the Unicorn work environment and practices as a more efficient way to process large amounts of varying data using the latest methods and tools. The aim is to exploit the current data and technology revolution to its full potential. Transparent decision making and openness are a megatrend. Data and mathematical methods should be open and reusable by researchers and policy experts. This will be a key means for governments to discuss and construct trustworthy opinions.

The technology revolution of open data and open models will completely change the way we think about evidence-based decision making (Reichert et al. 2015). Instead of a traditional long chain of static information products, such as scientific articles, reviews, expert reports, policy papers, and finally decision recommendations, we can rethink this process as collaborative information collection. The objective of the process, such as a law on a particular topic, acts as the starting point and is published in a collaborative environment. The environment is then used as a forum for discussion, assessments, comparing modeled impacts, developing solutions, and finally writing the content of the law on this basis. Others can later revisit the proposed law and its rationale.

This process is much more complex than a typical collaborative writing task. Indeed, it is based on large assessments of the impacts of policy options and on thorough expertise on the underlying questions, both scientific and value-based. Still, we have been able to identify critical rules and practices for such a process, and to develop tools to support the practices. We have also developed tools to collect estimates from experts for computational models. Although the initial development of models may be laborious, they are designed to be reusable within the environment in other, similar cases. As this practice becomes more prevalent, we expect that society will start demanding better evidence to back up a decision before it is socially accepted. In other words, poor policy initiatives will be less likely to survive.

Concrete manifestations of the technology revolution and the benefit to Finland

Open data and models will change the world, and Unicorn directs this trend onto the paths that are most beneficial for societal decision making by providing quick, reliable and efficient decision support. Government research institutes spend hundreds of person-years on old-fashioned data analyses and modeling efforts in decision support. Significant savings of resources will result from improved data collection, analyses and modeling. In addition, the quality and number of assessments that can be done to support work in, e.g., municipalities will increase significantly.

The virtual environment will initially be set up with ecological, health, and economic aspects (including complex interactions and spatial data). However, the system will be applicable to other sectors, and we expect its use to expand. The virtual environment has great potential in Finland and also offers export value: economically, we see a market for expert-based consulting and modeling in all societies utilizing technological breakthroughs. National databases make it possible to create an assessment and simulation tool that can be used to answer local questions, such as mercury in a local lake (see WP5). This would improve aquatic modeling and also the current general-level recommendations on fish intake, which may partly lead to unnecessary avoidance of fish in the diet or to anxiety.

Human activities, institutions, and behavioral changes that are needed to exploit the technology revolution

Unicorn partners already have practical experience in virtual environments and online modeling. There are technical challenges, such as user and data exchange interfaces (see WP3 for our solutions). However, our experience has been that cultural and learning challenges are clearly larger: e.g. experts often do not accept the principles of openness of data and criticism by non-experts; there are worries about merit accumulation; and open programming languages are not familiar. Constant communication, positive examples, technical support, systematic small-scale testing and incremental improvements, and political support from employers and research funders can overcome these challenges. The policy sector also has its challenges: unfamiliarity with assessments and scientific data, and lack of resources and time in individual policy processes. The methods mentioned above apply here too, but in addition, experts have to listen carefully to the information needs of decision makers. See WP2 for our solutions.

Public measures to best support the process of change so that the transformation proceeds in a controlled manner and Finland benefits from the technology revolution

The consortium will work closely with governmental stakeholders in developing and utilizing the Unicorn environment and the open developer forum. The Pirkanmaa Centre for Economic Development, Transport and the Environment, a key actor in its region, will coordinate collaboration with stakeholders. There are also committed associated partners collaborating with the consortium. They include the Ministry of the Environment (YM), the Ministry of Social Affairs and Health (STM), the Ministry of Agriculture and Forestry (MMM), the National Land Survey of Finland, the National Supervisory Authority for Welfare and Health (Valvira), the Council of Tampere Region, Open Knowledge Finland Ry and Hahmota Oy. We are also actively collaborating with Kuntaliitto (association of municipalities) in this area. We will actively spread information and recruit new associated partners during the course of the project. We are particularly interested in finding them in other geographical regions of Finland.

The Unicorn consortium will utilize data resources generated during the project through routine, legally required environmental monitoring and public health response. For example, general practitioners in all municipalities of the country report citizens' health information to the AvoHilmo database. In the near future, local health protection authorities or local laboratories will also submit their data to the YHTI environmental health monitoring database. The central government currently has a number of projects under way to develop the sharing of information resources. The project will support the coordination of this development and aims to help various institutions use their resources effectively. Promoting the use of models brings significant advantages in the production and utilization of information.

Science is powerful in rejecting ideas that are not consistent with observations. Therefore, science should be actively used to estimate what impacts the actions under consideration could or could not have. Ineffective actions are rejected; uncertain actions are tested on a small scale, after which poor actions can be rejected. Thus, the role of experts is to reject poor ideas, and the role of decision makers is to choose among the remaining good ones.

Equal adaptation of capabilities and human resources at the individual, group and institutional levels for the system reform

Experts' know-how will be transferred as far as possible into automated tools, so that time and resources can be better targeted at making reliable estimates instead of mechanical data processing. People feel that they have a right to be heard and to have their opinions and concerns acknowledged. Conflicts in a society often occur because some group thinks it was not heard and cannot influence its own case; it then loses trust. Shared understanding is a systematic method that has been developed and implemented at THL. It offers a channel for citizens to be heard in this project as well.

Implementation

The project implementation is conducted in seven work packages (Fig. 1).

Figure 1. Links and interactions between the work packages

WP1 State-of-the-art review and inventory of available resources

Responsible group leader: Prof. Pekka Neittaanmäki (Dep. of Math. Inf. Tech. JYU)

Research Group: Ph.D. Tero Tuovinen, Ph.D. Annemari Soranto

Schedule: M1-M5, M37-M41, (5+5)

Linking to: WP2

Description: In this work package we will identify the best practices of available open databases and models. Publicly available projects, codes, interfaces and modeling frameworks that can help in the realization of Unicorn will be listed and studied. These include domestic databases and modeling environments, such as THL's Opasnet, and international codes, such as the Open Modeling Interface, OpenDA data assimilation tools, OpenEarth tools and the Earth System Modeling Framework. Several large-scale domestic ICT networks and projects, such as VALTORI and ENVIBASE, are under way. They will be identified, and collaboration will be established where needed.

We will look for the latest technological advances, especially in big data and data analysis, and integrate them into a system that combines the latest knowledge of tools and data with easy-to-use user interfaces. The work begins by defining the user requirements for the system (Unicorn) and planning the implementation phase. The required human resources will then be chosen based on the knowledge that is needed. Human interaction with the system will receive particular attention, because the idea is to spread the solution to a large number of researchers and decision makers. An extensive mapping of open data sources will also be done. It will cover all state research institutes in order to identify the state of the art.

Tasks:

  • Review of the state of the art.
  • In-depth familiarization with the requirements field.
  • Inventory of the databases available.
  • Preliminary selection of the methods and tools.

Deliverables:

  • A review of the methods and technologies that are openly available.
  • A review of integration interfaces of the databases.
  • A review of model–data interface standards.
  • Plan for the next steps.

WP2 Stakeholder contributions and dissemination

Responsible group leader: Research Prof. Tom Frisk (ELY Centre Pirkanmaa)

Research Group: Prof. Pekka Neittaanmäki/JY, M.Sc. Ämer Bilaletdin/ELY, MSc Matti Saura/ELY, Ph.D. Tero Tuovinen, Ph.D. Annemari Soranto/JY, Saija Koljonen/SYKE, M.Sc. Antti Simola/VATT, Kati Valpe/JY

Schedule: M1-M72 (27+36) + (9+12)

Linking to: All workpackages and partners.

Description: Unicorn will develop a unique environment for decision knowledge support. Anyone (company, research institute, university, or parliament) can use it or set up their own instance for their own purposes or for distributing their own knowledge to others. This will produce global demand for expertise about the Unicorn environment and practices, and thus dissemination needs.

In this work package the consortium partners will provide a synthesis of the overall achievements of their multidisciplinary research and update it during the course of the project. The research outcomes will be used to support the decision making of the stakeholders. Our main role is to identify the benefits of involving stakeholders, to identify appropriate stakeholders and ways to work with them, to inform them about the scope of the research and share information, and to choose the best techniques for stakeholder engagement. The first, and perhaps most critical, step in the stakeholder engagement process is to identify why the engagement activity is necessary, what outcomes are aimed at, and the scope and context of the engagement.

Dissemination activities will focus on two major aspects of the project: first, dissemination through events organized by the consortium, such as seminars, workshops and information days; and second, sharing of project outcomes. Both aspects will be implemented through a variety of means during the project. One dissemination channel will be the project websites, where we will collect all the necessary material about efficient use of the Unicorn environment. The websites will archive externally and internally accessible material, such as presentations, minutes from network meetings and workshops, and progress reports. Scientific and technical results will be disseminated to the wider scientific audience throughout the entire duration of the project and beyond by presenting relevant results in scientific journals and at conferences. The main results of the project will also be disseminated by incorporating them into regular university courses.

Furthermore, we aim to organize several events that support gaining and sharing information. We will organize small workshops to which we invite 2-5 specialists on the topic in focus to present and discuss the latest advances and possibilities. Moreover, we will organize a yearly seminar that brings the collaborative network together and updates it on the status of the project and on overall developments. Finally, we will organize a final symposium and provide a common design framework including all the models developed in the network.

Training events will be another means of dissemination during the project. We are interested in the user experience and usability of the environment, and the training events will give us feedback and first-hand information about the usability of our solutions. They will also increase the number of end users and the interest in the system.

Tasks:

  • To identify the benefits of involving stakeholders
  • To identify appropriate stakeholders
  • To identify ways to work with stakeholders
  • To share information with stakeholders
  • To produce learning material and training for open policy practice
  • To facilitate public decision processes, e.g. by organizing and synthesizing public discussions
  • To produce the project websites
  • To organize events

Deliverables:

  • Dissemination plan
  • Project websites
  • The seminars, workshops and final symposium
  • Sharing information with stakeholders

WP3 Implementation of the Unicorn environment

Responsible group leader: Prof. Pekka Neittaanmäki (Dep. of Math. Inf. Tech. JYU)

Research Group: T. Tuovinen, A. Soranto, P. Korhonen/SYKE, M.Sc. A. Simola/VATT, J. Tuomisto/THL

Schedule: M6-M36, M37-M72, (31+36)(36+36) (144 mmonth) Linking to: WP1, WP2, all other WPs.

Description: This work package is the practical and concrete core of the project. In this package, which spans nearly the whole project duration, we will implement and program the Unicorn environment. We will define the overall approaches for the implementation (tools, methods, targets) based on feedback from the other work packages, especially WP1 and WP2, and we will derive the system requirements from the overall objectives through open discussion. The key activities in this package are planning and designing concrete components and structures, building the basis, programming the routines, implementing the functionalities, verifying the results and solutions, and finally optimizing the Unicorn environment for research use. We will use existing open source solutions when available. The implemented solution will be assessed by its reliability, usability and efficiency. Recommendations for post-project development will be described and documented, and all solutions used will be openly documented.

In this package we will implement interfaces to open databases and links between the Unicorn environment and several model codes. Moreover, we will build a test bed and challenge the environment with several Big Data analyzers developed in the project. The results will be analyzed and documented for later use. In the project's first phase, the aim is to build a demonstrator-level environment. In the second phase, we will focus on the efficiency and usability of Unicorn. Our target is not a fully commercial software package, because the production phase would be too time and resource consuming. However, because of our open approach, companies can easily use major parts of the implementation after the project. Moreover, specific problems and their solutions will open markets for businesses.
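One way to structure the links between the environment and several model codes is a common adapter interface that every model wrapper implements, so that models can be fetched, fed and chained by name. The sketch below is a hypothetical illustration: the names `ModelAdapter`, `ModelRegistry`, `prepare_inputs` and `run` are assumptions, not part of any existing Unicorn code, and the toy model only stands in for a real model code.

```python
from typing import Any, Dict, Protocol


class ModelAdapter(Protocol):
    """Hypothetical interface that each wrapped model code would implement."""

    name: str

    def prepare_inputs(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """Convert data fetched from open databases into model-specific input."""
        ...

    def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Run the model and return its outputs as plain Python values."""
        ...


class ModelRegistry:
    """Keeps track of available adapters so that models can be called by name."""

    def __init__(self) -> None:
        self._adapters: Dict[str, ModelAdapter] = {}

    def register(self, adapter: ModelAdapter) -> None:
        self._adapters[adapter.name] = adapter

    def run(self, name: str, data: Dict[str, Any]) -> Dict[str, Any]:
        adapter = self._adapters[name]
        return adapter.run(adapter.prepare_inputs(data))


class MeanDischargeModel:
    """Toy stand-in for a real model code: averages a discharge series."""

    name = "mean_discharge"

    def prepare_inputs(self, data: Dict[str, Any]) -> Dict[str, Any]:
        return {"q": [float(v) for v in data["discharge_m3s"]]}

    def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        q = inputs["q"]
        return {"mean_q": sum(q) / len(q)}


registry = ModelRegistry()
registry.register(MeanDischargeModel())
result = registry.run("mean_discharge", {"discharge_m3s": [10, 20, 30]})
```

A wrapper for a real model code such as SOBEK or COHERENS would write that model's input files in `prepare_inputs` and parse its output files in `run`; keeping these details behind one interface is what would allow the environment to chain models without knowing their internals.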

Tasks:

  • To define the requirements
  • To design the structure of the Unicorn environment
  • To implement the required components
  • To build a user interface
  • To build interfaces for models and analyzers
  • To integrate Unicorn with the databases
  • To validate and verify the system
  • To write documentation

Deliverables:

  • Requirement definition.
  • Design document of the system.
  • Implementation plan.
  • Integrated environment.
  • Validation and verification report.
  • Documentation

WP4 Unicorn pilot case: Demonstration related to the environment

Responsible group leader: Adjunct Prof. Timo Huttula

Research Group: PhD. Janne Juntunen, Dr. Tech. Olli Malve, M.Sc. Janne Ropponen, PhD. Saija Koljonen, M.Sc Niina Kotamäki, M.Sc. Esa Hirvonen, M.Sc. Päivi Korhonen, Ph.D. Antti Simola/VATT

Schedule: M1-M36, M37-M72, (100+80)

Linking to: WP3, WP5

Description: There is strong pressure to move from environmental monitoring towards environmental modeling, because resources must be used more efficiently and the results must be more relevant and nationwide. The goals of this work package are:

  1. To evaluate the applicability of environmental databases to real-world aquatic applications, to test the analysis tools developed in WP1 for extracting environmental information, and to correlate that information with information extracted from other databases.
  2. To assess the functionality, accuracy and usability of aquatic models built within the Unicorn environment compared with manually built models. We will compare the solutions provided by Unicorn with those already obtained in the traditional way in the Academy of Finland funded CONPAT (http://en.opasnet.org/w/CONPAT) project, which focused on Lake Pyhäjärvi and River Kokemäenjoki. This study will show how well the parameterization and all necessary input information can be created automatically for the computational model. Moreover, it will show the potential time savings gained by using the Unicorn environment compared with the traditional approach.
  3. To demonstrate the usability of Unicorn in new, previously unmodelled cases. These are:
  • Accidental release of a harmful substance at the Kallavesi bridges, where risks to the raw water supply of the City of Kuopio and economic losses due to reduced fishing capacity are studied;
  • Fire in a chemical plant in the City of Tampere, with atmospheric deposition and its risks to human health and economic effects in the region;
  • Effluent transport in the recipient waters of the Talvivaara mine, and its health and economic effects;
  • Piloting regional applications of lake-specific models to support environmental management planning and the prioritization of restoration or other measures.

We will show that by using Unicorn we can build a usable hydrological or water quality model, or a chain of models, with first-guess parameterizations and produce reasonable results without extensive model tuning. This enables us to study rapidly developing situations in previously unmodelled areas. The models used range from a box model (LLR, Kotamäki et al. 2015) and a 1-D river model (e.g. SOBEK, Ropponen and Huttula 2014) to a 3-D transport model (COHERENS, Luyten 2011). We will also evaluate the need to implement a process-based catchment model (e.g. INCA, Granlund et al. 2004; SWAT, Tattari et al. 2009) within Unicorn to solve hydrological scenarios with high spatial accuracy.
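The simplest end of this model range can be illustrated with a completely mixed one-box lake, written as the mass balance V dC/dt = W - QC - kVC, where W is the external load (g/day), Q the outflow (m3/day), V the lake volume (m3) and k a first-order loss rate (1/day). The sketch below integrates it with explicit Euler steps from first-guess parameters and compares the end point with the analytical equilibrium C* = W/(Q + kV). It is a generic textbook box model, not the LLR model, and all parameter values are hypothetical.

```python
def simulate_box_lake(load_g_per_day, outflow_m3_per_day, volume_m3,
                      loss_rate_per_day, c0_g_per_m3=0.0, days=3650, dt_days=1.0):
    """Integrate V dC/dt = W - Q*C - k*V*C with explicit Euler steps.

    Returns the concentration (g/m3) at every step, including the initial value.
    """
    c = c0_g_per_m3
    series = [c]
    for _ in range(int(days / dt_days)):
        dc_dt = (load_g_per_day
                 - outflow_m3_per_day * c
                 - loss_rate_per_day * volume_m3 * c) / volume_m3
        c += dt_days * dc_dt
        series.append(c)
    return series


def steady_state_concentration(load_g_per_day, outflow_m3_per_day, volume_m3,
                               loss_rate_per_day):
    """Analytical equilibrium C* = W / (Q + k*V)."""
    return load_g_per_day / (outflow_m3_per_day + loss_rate_per_day * volume_m3)


# Hypothetical first-guess parameters (not data from any real lake).
params = dict(load_g_per_day=5e5, outflow_m3_per_day=1e6,
              volume_m3=1e8, loss_rate_per_day=0.01)
series = simulate_box_lake(**params)
print(series[-1], steady_state_concentration(**params))  # both approx. 0.25 g/m3
```

With these values the effective removal rate is Q/V + k = 0.02 per day, so the lake approaches equilibrium on a time scale of about 50 days; the explicit Euler step of one day is stable because dt(Q/V + k) is well below 1.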

For flow and transport model input we will use bathymetry, a DEM (digital elevation model), hydrological and meteorological data, as well as data on the simulation state variables that are already available in some form from various sources. For example, the operational hydrological system WSFS provides simulated hydrological data and forecasts (water levels and discharges) for all river and lake systems in the country, which can be used as input for transport modeling when observational data are unavailable.

Reproducing existing, manually crafted models within Unicorn does not show the true potential of the system, since prior knowledge of the challenges encountered during model development would be taken into account when setting up the model. Therefore, we need complementary demonstrations to assess the capability of the Unicorn environment. We will use Unicorn to model a new area, the lakes in the Talvivaara region. Another pilot site will be Lake Kallavesi, which is used as the raw water source for producing artificial groundwater for the City of Kuopio. We have previously shown that even a modestly calibrated lake model combined with a simple data-assimilation scheme clearly improves prediction capability compared with using the model or the data alone (Mano et al. 2015). The Talvivaara mine complex faces great challenges concerning its water balance, and accidental releases of wastewater are possible, as has happened in the past.
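Why combining a model with data can beat either one alone is easiest to see in the scalar case: an analysis step that weights the model forecast and the observation by their error variances yields an error variance smaller than either input, provided the two errors are unbiased and independent. The sketch below is a generic one-variable Kalman update, not the specific scheme used by Mano et al. (2015); the input values are hypothetical.

```python
def assimilate(forecast, forecast_var, observation, observation_var):
    """One scalar Kalman analysis step.

    Blends a model forecast and an observation in proportion to their
    error variances and returns (analysis, analysis_variance).
    """
    gain = forecast_var / (forecast_var + observation_var)
    analysis = forecast + gain * (observation - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var


# Hypothetical example: modelled vs. measured concentration with equal error variance.
value, variance = assimilate(forecast=2.0, forecast_var=1.0,
                             observation=4.0, observation_var=1.0)
print(value, variance)  # 3.0 0.5
```

The analysis variance equals forecast_var * observation_var / (forecast_var + observation_var), which is always below the smaller of the two, so even a modestly calibrated model adds information to sparse observations and vice versa.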

Furthermore, we can use long-term monitoring data for both chemical and biological scenarios based on the hydrodynamic models. This will enable us to respond swiftly to diverse environmental challenges (chemical fate models), and it will also serve as a tool for directing environmental measures (e.g. prioritizing restorations).

Tasks:

  • Demonstration and applicability tests of Unicorn in assessing the environmental state.
  • Transfer of environmental modeling know-how. Analyzing the existing model applications in order to transfer the modeling know-how in the form of automated tools and model applications to Unicorn.
  • Demonstration cases. Produce a model application in the CONPAT region and two other regions.
  • Evaluation of the case studies. Comparison of the results produced within Unicorn with existing models and data.
  • Optimal utilization of environmental data. To find the optimal way for the models to utilize environmental data provided by Unicorn from different sources. To identify and solve issues arising from differing spatial and temporal resolutions of data and models.

Deliverables:

  • Model realizations. Using Unicorn to build the case studies.
  • Simulation results.
  • Scientific publications related to manual and automatic modeling results.

WP5: Unicorn pilot case: Demonstration related to human health

Responsible group leader: Adjunct Prof. Jouni Tuomisto (THL)

Research Group: Jukka Jokinen/THL, Arja Asikainen/THL, Tarja Pitkänen/THL, Sari Ung-Lanki/THL, Mikko Virtanen/THL, Saija Koljonen/SYKE.

Schedule: M1-M36, M37-M72, (91+44 mmonth) Linking to: WP3, WP4, WP6

Description: The aim of this work package is to demonstrate and evaluate the functionality, accuracy and usability of models built within the Unicorn environment in pilot cases related to environment and health. We will also develop and implement practical end-user interfaces for citizens and municipalities. We have chosen three cases, in which we will combine environmental exposure data and citizen health data into a total burden of disease model that can be used as a basis for further assessments and tools.
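A common building block for attributing part of a total disease burden to an environmental exposure is the population attributable fraction given by Levin's formula, PAF = p(RR - 1) / (p(RR - 1) + 1), where p is the exposed share of the population and RR the relative risk. Multiplying the PAF by the total disability-adjusted life years (DALYs) of the outcome gives the attributable burden. The sketch assumes a single dichotomous exposure and hypothetical input values; the burden of disease model built in Unicorn may well be structured differently.

```python
def attributable_fraction(exposed_share, relative_risk):
    """Levin's population attributable fraction for one dichotomous exposure."""
    excess = exposed_share * (relative_risk - 1.0)
    return excess / (excess + 1.0)


def attributable_dalys(total_dalys, exposed_share, relative_risk):
    """Part of an outcome's total burden (DALYs) attributable to the exposure."""
    return total_dalys * attributable_fraction(exposed_share, relative_risk)


# Hypothetical inputs: 20 % exposed, relative risk 1.5, 1100 DALYs in total.
print(round(attributable_dalys(1100, 0.2, 1.5), 6))  # 100.0
```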

Methylmercury is a persistent environmental pollutant and neurotoxin that originates from both natural and industrial sources and contaminates fish in lakes. Spatial differences are large, and therefore customized recommendations are valuable for health protection authorities and for people who eat fish. In Unicorn, we will open the large methylmercury dataset measured by SYKE over the last decades and develop an open online model for fishing recommendations. Economic impacts will also be explored, as the results may affect the reputation of some summer cottage lakes. Thus, this work will be done in close collaboration between THL, SYKE, VATT, local authorities and stakeholder groups.
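The core of a lake-specific fishing recommendation can be sketched as a comparison between the methylmercury intake from fish and a tolerable weekly intake (TWI). In the sketch, the default TWI of 1.3 µg per kg body weight per week is an assumption corresponding to the value set by EFSA in 2012 and should be checked against current guidance; the lake concentrations, body weight and portion size are hypothetical, and a real recommendation model would also account for the variation of concentrations between fish species and sizes.

```python
import math


def max_portions_per_week(mehg_mg_per_kg_fish, body_weight_kg,
                          portion_g=100.0, twi_ug_per_kg_bw=1.3):
    """Whole fish portions per week that keep methylmercury intake below the TWI.

    mg/kg in fish equals ug/g, so one portion contains conc * portion_g micrograms.
    The default TWI is an assumed value, to be verified against current guidance.
    """
    weekly_limit_ug = twi_ug_per_kg_bw * body_weight_kg
    per_portion_ug = mehg_mg_per_kg_fish * portion_g
    return math.floor(weekly_limit_ug / per_portion_ug)


# Hypothetical lake-specific concentrations (mg/kg) for a 70 kg adult.
for lake, conc in {"Lake A": 0.1, "Lake B": 0.5}.items():
    print(lake, max_portions_per_week(conc, body_weight_kg=70))
```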

Indoor environment quality is an important health issue, as a large fraction of Finnish people suffer from indoor air problems. It is also a major economic issue, as exemplified by the 30-50 billion euro "renovation debt" in the Finnish housing stock due to moisture damage. In schools, the renovation debt is 3.7 billion euro, which calls for action in municipalities. THL will produce an online indoor air questionnaire for schools and day care centres, including questions about students' health and indoor environment quality. Guidance for interpreting the results will be produced based on existing reference material. Data will be collected with the online questionnaire and gathered nationally into the Opasnet database. An automated reporting system for individual schools and municipalities will be developed. Indoor environment quality is monitored regularly in schools by the health protection authorities. Unicorn will produce an online data collection interface that can combine monitoring and questionnaire data, and possibly air pollution data measured by the municipality, for school-level decision support. This requires collaboration between THL and municipalities, and with VATT for economic evaluation.
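The automated school-level reporting could, for example, compare the share of pupils reporting a given symptom with a reference prevalence taken from existing reference material. The sketch below computes a Wilson score interval for the school's prevalence and flags the school when the whole interval lies above the reference. The function names, the 95 % level and the reference values are illustrative assumptions, not THL's interpretation rules.

```python
import math


def wilson_interval(cases, respondents, z=1.96):
    """Wilson score interval for a proportion (approx. 95 % with z = 1.96)."""
    p = cases / respondents
    denom = 1.0 + z**2 / respondents
    centre = (p + z**2 / (2 * respondents)) / denom
    half = z * math.sqrt(p * (1 - p) / respondents
                         + z**2 / (4 * respondents**2)) / denom
    return centre - half, centre + half


def flag_school(cases, respondents, reference_prevalence):
    """True if the school's prevalence is clearly above the reference value."""
    lower, _ = wilson_interval(cases, respondents)
    return lower > reference_prevalence


# Hypothetical school: 20 of 100 pupils report weekly symptoms.
print(wilson_interval(20, 100))                         # approx. (0.133, 0.289)
print(flag_school(20, 100, reference_prevalence=0.10))  # True
```

The Wilson interval is used instead of the simple normal approximation because it behaves sensibly for small schools and for prevalences near zero.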

Drinking water causes 4-5 waterborne outbreaks annually (according to the national outbreak notification and reporting system RYMY), 20-30 cases of drinking water quality deterioration, and sporadic waterborne illnesses caused by Campylobacter, Giardia and Legionella. Municipal waterworks perform more than 100 000 chemical and microbiological analyses of drinking water annually, but the data are underused both in municipalities and nationally. The national YHTI database (containing municipal data on health protection) is actively developing the management of these data. Unicorn will take these data into its modelling system using ReplicaX (see WP7) and analyse them against health data available at THL. The aim is to increase awareness and capabilities of statistical analysis and to offer decision support to municipalities on preventive management. This is a close collaboration between THL, Valvira (National Supervisory Authority on Health and Welfare), municipalities, JyU and VATT (for economic impacts in WP6).

Tasks:

  • To open the large methylmercury dataset measured by SYKE
  • To develop an open online model for fishing recommendations
  • To produce an online indoor air questionnaire
  • To increase awareness and capabilities of statistical analysis and offer decision support to municipalities on preventive management

Deliverables:

  • Policy assessment and open model about methylmercury (M12)
  • Online indoor air questionnaire and database for schools (M30)
  • An open platform for making online health assessments (M36)
  • Drinking water database and open model for waterworks (M40)
  • An open total burden of disease model based on the open platform (M60)

WP6 Unicorn pilot case: Demonstration related to national and regional economy

Responsible group leader: Research Director Juha Honkatukia (VATT)

Research Group: M.Sc. Antti Simola, N.N.

Schedule: M1-M36, M37-M72

Linking to: WP3, WP4, WP5

Description: We will demonstrate the applicability of Unicorn in assessing the linkages between environmental state and regional economy in the CONPAT project region and two other regions. We will use existing VATT models and possibly open source statistical and CGE models, yet to be specified, in the Unicorn environment. The VATT models rely on a detailed database that is unfortunately not open. An open but less detailed version of the data will be applied in open source models where suitable, in order to make CGE techniques more available and open to decision makers.

As background to these studies, the development of VATT models started in the 1990s and has aimed at wide applicability in decision making. In particular, the growing demand for quantitative policy analysis has ensured that VATT models have fulfilled this aim. Policy issues that affect several sectors or have opposing impacts are very often analytically intractable, leaving computational analysis as the main way to analyse them.

The VATT models are computable general equilibrium (CGE) models of the Finnish economy. The single-country model VATTAGE (Honkatukia 2013) is based on the MONASH model (Dixon and Rimmer 2001), and the regional model VERM also draws on the TERM (Horridge et al. 2005) and MMRF (Adams et al. 2003) models. These parent models were developed by the Centre of Policy Studies at Victoria University, Australia, and are widely applied internationally. Model development at VATT has concentrated especially on special characteristics of the Finnish economy, such as public sector functions and population age structure, which are depicted in detail.

One attractive feature of CGE models is that they conform to national accounting systems. Thus the model results are interpretable in that context, and the effects can be expressed as changes in economic indicators. Furthermore, the underlying input-output structure allows the economic analysis to be extended consistently to material flow accounting. Consequently, one of the main application areas has been interdisciplinary research with various collaborators. For instance, VERM applications include extreme weather events (Virta et al. 2011), regional wood supply (Honkatukia and Simola 2011), regionalization (Honkatukia 2013) and energy efficiency improvements (Airaksinen et al. 2015). In the CONPAT consortium, VERM is applied to socio-economic analysis of water-related contaminants and pathogens.
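The input-output accounting structure that CGE models share with national accounts, and its extension to material flows, can be sketched with the Leontief quantity model x = (I - A)^-1 f, where A holds the technical coefficients and f the final demand; multiplying gross output x by material intensities gives the material use embodied in final demand. The two-sector numbers below are hypothetical, and a CGE model such as VERM adds prices, substitution and market clearing on top of this accounting core.

```python
import numpy as np

# Hypothetical two-sector economy (e.g. primary production, services).
# A[i, j] = input from sector i needed per unit of output of sector j.
A = np.array([[0.2, 0.3],
              [0.1, 0.4]])
final_demand = np.array([100.0, 50.0])

# Gross output needed to satisfy final demand: x = (I - A)^-1 f.
# Solving the linear system avoids forming the inverse explicitly.
output = np.linalg.solve(np.eye(2) - A, final_demand)

# Environmental extension: material use per unit of gross output.
material_intensity = np.array([0.5, 2.0])
material_use = material_intensity @ output

print(output, material_use)  # approx. [166.67 111.11] 305.56
```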

This experience in interdisciplinary research is a good starting point for a more general approach based on an automatically generated modeling tool. Besides equilibrium modeling, VATT researchers also have experience in econometric methods, which are frequently used in model parameterization.

Private sector use of open data is already extensive. The public sector lags behind mainly because it faces more complex problems: the required information does not concentrate on the market segment of a single commodity but covers a whole mix of industries in the economy, its demography, long-term investments in infrastructure and land use planning, policies countering externalities, and distributional issues. National accounting systems were created to convey consistent information on national economies. They serve as a natural starting point for CGE models and for organizing open data that would benefit public decision making.

Drinking water management.

We examine the CONPAT study area with respect to the management of waterborne outbreaks and the regional economy, in collaboration with SYKE (WP4) and THL (WP5). The economic effects include direct effects on labor productivity and the regional trade balance, and indirect effects on consumer behavior. The former derive straightforwardly from production theory, and the latter from prospect theory. CGE modeling is a consistent way to evaluate both direct and indirect effects simultaneously.

Regional economic consequences of Talvivaara mining operations.

For this task we construct a simple regional CGE model that can feasibly be solved many times in a Monte Carlo manner in order to account for uncertainty in economic and environmental outcomes. We use VERM or an open source alternative. With this ex post analysis we can demonstrate how an equivalent ex ante analysis would contribute to decision making by providing more balanced information on risks and unwanted consequences. The theoretical focus is on the political economy of regional development. This is a close collaboration with SYKE (WP4) and local authorities.
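The Monte Carlo idea is to solve the model repeatedly with uncertain inputs drawn from probability distributions and to summarize the spread of the outcomes. In the sketch, `toy_regional_effect` is a hypothetical one-line stand-in for a regional CGE solve, and all distributions and parameter values (including the 10 % release probability) are illustrative assumptions.

```python
import random
import statistics


def toy_regional_effect(value_added_meur, release_happens, cleanup_cost_meur):
    """Hypothetical stand-in for one CGE solve: net regional effect in MEUR."""
    return value_added_meur - (cleanup_cost_meur if release_happens else 0.0)


def monte_carlo(n_runs=10_000, seed=42):
    """Draw uncertain inputs, solve the stand-in model, and collect the outcomes."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        value_added = rng.gauss(100.0, 15.0)      # uncertain economic benefit
        release = rng.random() < 0.1              # assumed 10 % release probability
        cleanup = rng.lognormvariate(4.0, 0.5)    # skewed environmental cost
        results.append(toy_regional_effect(value_added, release, cleanup))
    return results


outcomes = monte_carlo()
cuts = statistics.quantiles(outcomes, n=20)  # 19 cut points: 5 %, 10 %, ..., 95 %
print(f"5 %: {cuts[0]:.1f}  median: {statistics.median(outcomes):.1f}  95 %: {cuts[-1]:.1f}")
```

Reporting percentiles rather than a single mean is what lets the rare but costly release scenarios show up in the decision information; this is also why the model used in the real analysis must be cheap enough to solve thousands of times.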

School renovations.

We use VATTAGE to assess the short- and long-term consequences of neglecting the renovation of schools with indoor air problems. With correct timing, renovation investments could serve as a stimulus. They also have potential long-term productivity effects that markets alone do not optimize. Thus our approach yields valuable information for public decision makers. In the short-run analysis we assess the stimulatory effects of renovations. In the long-run analysis we apply a recent demographic extension of VATTAGE to evaluate the long-term economic costs of short-sighted decision making. This is a close collaboration with THL (WP5).

Tasks:

  • Evaluation of open source CGE models
  • Management of waterborne outbreaks in the CONPAT regions
  • Talvivaara case study – regional economic interests vs. environmental risks
  • Long and short term effects of renovation debt – what is good policy?

Deliverables:

  • Unicorn model realizations in case studies
  • Simulation results
  • Scientific articles

WP7 Statistical methods, models and big data

Responsible group leader: Prof. Juha Karvanen (Dep. of Math. and Stat., Univ. of Jyväskylä)

Research Group: Jouni Helske, N.N.

Schedule: M1-M72 (124 mmonth)

Linking to: WP4, WP5, WP6

Description: Governmental institutes should publish their data as open data whenever this is not prohibited by confidentiality requirements. In practice, many datasets collected e.g. by THL contain sensitive information such as personal-level health data. Naive anonymization, i.e. removal of names, addresses and personal identity numbers, is not sufficient to make the data publishable, because it is often possible to deduce identity from multivariate data using e.g. age, place of residence, profession, language or medical history.
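A quick way to see why removing direct identifiers is not enough is to count how many records are unique on a handful of quasi-identifiers: anyone who knows those attributes about a person can potentially link that person to a unique record. This is the idea behind k-anonymity, where a dataset is k-anonymous if every quasi-identifier combination occurs at least k times. The sketch and its example records are hypothetical.

```python
from collections import Counter


def unique_record_share(records, quasi_identifiers):
    """Share of records whose quasi-identifier combination occurs only once.

    Such records are at high re-identification risk even without names or IDs.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical "anonymized" records: names and identity numbers already removed.
records = [
    {"age": 34, "municipality": "Kuopio", "profession": "teacher", "diagnosis": "A"},
    {"age": 34, "municipality": "Kuopio", "profession": "teacher", "diagnosis": "B"},
    {"age": 71, "municipality": "Kuopio", "profession": "farmer", "diagnosis": "C"},
    {"age": 52, "municipality": "Sotkamo", "profession": "miner", "diagnosis": "D"},
]
print(unique_record_share(records, ["age", "municipality", "profession"]))  # 0.5
```

Adding more columns can only split groups further, so the share of unique records grows as the data become more multivariate, which is exactly the problem described above.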

Synthetic data are offered as a solution to this openness-confidentiality dilemma. Synthetic data, or a data replica, are created by simulation so that the statistical properties of the replica closely resemble those of the original data. The individuals in the original data cannot be identified from the replica, and therefore the replica can be published as open data.
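The simplest possible replica keeps only the means and the covariance matrix of the original data and samples new rows from a multivariate normal distribution. This is a deliberately crude, hypothetical sketch, not the method implemented in ReplicaX: it reproduces linear correlations but not skewness, categorical variables or non-linear dependence.

```python
import numpy as np


def gaussian_replica(original, n_rows=None, seed=0):
    """Draw synthetic rows from a multivariate normal fitted to the original data.

    Only the column means and the covariance matrix are carried over; no
    original row is copied into the replica.
    """
    original = np.asarray(original, dtype=float)
    n_rows = original.shape[0] if n_rows is None else n_rows
    mean = original.mean(axis=0)
    cov = np.cov(original, rowvar=False)
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_rows)


# Hypothetical "confidential" data: age and a correlated health indicator.
rng = np.random.default_rng(1)
age = rng.normal(50, 10, 20_000)
indicator = 20 + 0.1 * age + rng.normal(0, 2, 20_000)
original = np.column_stack([age, indicator])

replica = gaussian_replica(original)
print(np.corrcoef(original, rowvar=False)[0, 1],
      np.corrcoef(replica, rowvar=False)[0, 1])  # both approx. 0.45
```

Because an analysis script written against `replica` runs unchanged on `original`, the data owner can rerun any externally developed analysis on the confidential data and compare the results.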

Synthetic data offer new possibilities for citizen science. Program code developed for the replica can be applied to the original data without any changes, so the publisher of the data can easily verify the analysis results with the original data. This enables an operating model in which some parts of the data analysis are carried out by enthusiastic citizens (e.g. university students) while the employees of the governmental institute coordinate the work.

The concept has already been piloted: the R implementation ReplicaX by Juha Karvanen won the challenge “Utilization of health data” in the Apps4Finland 2013 competition. As part of the project, ReplicaX will be developed further, tested extensively with real data and put into full-scale production use.

To combine multiple databases and models efficiently, state-of-the-art statistical methods are needed. As databases contain data with varying levels of uncertainty (stemming from data collection strategies, modeling choices and sampling variation, among others), different sources of information must be weighted accordingly. For assessing these uncertainties, a Bayesian framework can be used to combine expert opinions, multiple data sources and models in a way that gives easily interpretable results in the form of probability distributions. This enables decision makers to make sophisticated forecasts under alternative scenarios.
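In the simplest conjugate case, where the expert opinion is a normal prior and each data source gives an unbiased estimate with a known standard error, the posterior is again normal and each source is weighted by its precision (inverse variance). The sketch illustrates this weighting under exactly those assumptions; realistic Unicorn cases would likely need a fuller model, e.g. with unknown variances or source-specific biases.

```python
def combine_normal(prior_mean, prior_sd, estimates):
    """Posterior of a normal mean from a normal prior and independent normal estimates.

    `estimates` is a list of (value, standard_error) pairs, e.g. from different
    databases or models. Each source is weighted by its precision 1/se**2.
    Returns (posterior_mean, posterior_sd).
    """
    precision = 1.0 / prior_sd**2
    weighted_sum = prior_mean * precision
    for value, se in estimates:
        w = 1.0 / se**2
        precision += w
        weighted_sum += value * w
    return weighted_sum / precision, precision ** -0.5


# Hypothetical: expert opinion N(0, 1) and one measurement 2.0 with SE 1.0.
print(combine_normal(0.0, 1.0, [(2.0, 1.0)]))  # mean 1.0, sd approx. 0.707
```

A precise source (small standard error) pulls the posterior strongly towards its value, while a vague expert prior has little influence; the posterior standard deviation expresses the remaining uncertainty in the form decision makers can use directly.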

A high proportion of the big data stored today are inherently time series. When building generic models, taking the time dependency of the data into account is crucial for making proper inferences. For example, in (Helske 2013) the yearly nutrient fluxes of four Finnish rivers were estimated using a state space modeling approach that efficiently modeled both the cross-sectional and the time dependencies of the data.
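The simplest state space model is the local level model, in which an unobserved level follows a random walk and each observation is the level plus noise. The Kalman filter below estimates the level recursively; a missing value simply skips the update step, so irregular sampling needs no special treatment. This is a generic frequentist sketch with known variances, not the model of (Helske 2013) or the planned BayesTime software, and the example series is hypothetical.

```python
def local_level_filter(observations, a0=0.0, p0=1e6, q=1.0, r=1.0):
    """Kalman filter for the local level model.

    State:       mu_t = mu_{t-1} + eta_t,  eta_t ~ N(0, q)
    Observation: y_t  = mu_t + eps_t,      eps_t ~ N(0, r)
    Missing observations (None) skip the update step.
    Returns lists of filtered means and variances.
    """
    a, p = a0, p0
    means, variances = [], []
    for y in observations:
        p = p + q                      # predict: the level may have drifted
        if y is not None:
            k = p / (p + r)            # Kalman gain
            a = a + k * (y - a)        # update with the new observation
            p = (1.0 - k) * p
        means.append(a)
        variances.append(p)
    return means, variances


# Hypothetical monthly concentrations with gaps in sampling.
means, variances = local_level_filter([5.1, None, None, 4.8, 5.6, None, 6.0],
                                      q=0.1, r=0.5)
```

During sampling gaps the filtered variance grows by q per step, so the uncertainty of the estimate is reported honestly where data are sparse.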

Analyzing data with complex time and cross-sectional dependencies and varying sampling frequencies requires flexible models that are robust yet still give meaningful results in realistic computational time. General-purpose Bayesian modeling software such as OpenBUGS often requires considerable tuning of the estimation procedures, which are not well suited to time series data because of the autocorrelation structure of the simulations used in model estimation. It is therefore important to build reliable and easy-to-use tools for analyzing various types of data from open databases. Similar but more restricted methods for forecasting univariate time series in a frequentist framework were presented in (Hyndman 2008) and (Hyndman 2015), and in a broader scope in (Hyndman 2015, Durbin and Koopman 2012) and (Helske 2015).
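The cost of autocorrelated simulations can be made concrete with the effective sample size (ESS): for a chain whose autocorrelation decays like an AR(1) process with lag-1 correlation rho, n draws carry roughly as much information as n(1 - rho)/(1 + rho) independent ones. The sketch simulates such a chain; it uses this AR(1) approximation rather than the more general estimators found in MCMC diagnostics software.

```python
import numpy as np


def ar1_chain(n, rho, seed=0):
    """Simulate a stationary AR(1) chain, mimicking poorly mixing MCMC output."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = rng.normal()
    noise = rng.normal(scale=np.sqrt(1 - rho**2), size=n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + noise[t]
    return x


def ess_ar1(chain):
    """Effective sample size under an AR(1) approximation: n(1 - rho)/(1 + rho)."""
    centred = chain - chain.mean()
    rho = np.dot(centred[:-1], centred[1:]) / np.dot(centred, centred)
    return len(chain) * (1 - rho) / (1 + rho)


chain = ar1_chain(10_000, rho=0.9)
print(round(ess_ar1(chain)))  # roughly 500 of 10 000 draws
```

With rho = 0.9, about 19 draws are needed per independent draw's worth of information, which is why untuned general-purpose samplers can become slow for strongly autocorrelated time series models.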

Without proper software, analysts are forced to use their old methods whether or not these suit the problem. The aim is to build an efficient and robust Bayesian modeling framework for the open source software R that can be used to model multivariate time series data with complex patterns and varying sampling frequencies, taking into account multiple sources of information and the uncertainties related to data and model structure.

Tasks:

  • Development of ReplicaX
  • Extensive testing of ReplicaX in real use cases
  • Deployment of ReplicaX for production usage and integration with Unicorn
  • Development of BayesTime
  • Testing BayesTime in pilot cases and with synthetic data from ReplicaX
  • Integration of BayesTime with Unicorn
  • Collaboration with work packages 4-6 by providing statistical support for pilot cases

Deliverables:

  • Open source program code of ReplicaX
  • Scientific articles on ReplicaX and its performance with real datasets
  • Integrated production version of ReplicaX
  • Open source code for BayesTime
  • Scientific articles on theory and usage of BayesTime with datasets related to Unicorn

Budget

The annual costs are presented in Table 1. At this point, the costs are not broken down by institution.

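Table 1. Estimated UNICORN project budget based on Finnish unit costs.

| Cost item | Year 1 | Year 2 | Year 3 | Year 4 | Year 5 | Year 6 | Year 7 | Total costs (euro) |
|---|---|---|---|---|---|---|---|---|
| Working time (m/m) | 39 | 144 | 137 | 136 | 110 | 102 | 49 | 6976590 |
| Travel (euro) | 20000 | 38000 | 34000 | 38000 | 25000 | 35000 | 14000 | 204000 |
| Material (euro) | 2000 | 4000 | 3500 | 500 | 500 | 500 | 0 | 11000 |
| Machines (euro) | 25000 | 22000 | 20000 | 20000 | 17000 | 17000 | 17000 | 138000 |
| Services (euro) | 40000 | 37000 | 47000 | 54000 | 42000 | 32000 | 31000 | 283000 |
| Other costs (euro) | 10500 | 63000 | 15000 | 15500 | 13000 | 15000 | 13000 | 145000 |
| Total (euro) | 97500 | 164000 | 119500 | 128000 | 97500 | 99500 | 75000 | 7757590 |

Working time is given in person-months per year (717 in total), and its total cost in euro is shown in the last column. The annual totals include only travel, material, machines, services and other costs, whereas the overall total of 7757590 euro also includes the cost of working time.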
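The totals in Table 1 can be reproduced with a few lines of arithmetic; the figures below are copied from the table.

```python
# Non-personnel costs from Table 1 (euro), years 1-7.
costs = {
    "Travel":   [20000, 38000, 34000, 38000, 25000, 35000, 14000],
    "Material": [2000, 4000, 3500, 500, 500, 500, 0],
    "Machines": [25000, 22000, 20000, 20000, 17000, 17000, 17000],
    "Services": [40000, 37000, 47000, 54000, 42000, 32000, 31000],
    "Other":    [10500, 63000, 15000, 15500, 13000, 15000, 13000],
}
working_time_mm = [39, 144, 137, 136, 110, 102, 49]
working_time_eur = 6976590

# Sum the five cost items year by year.
annual_totals = [sum(year) for year in zip(*costs.values())]
non_personnel_total = sum(annual_totals)

print(annual_totals)                           # matches the "Total (euro)" row
print(sum(working_time_mm))                    # 717 person-months
print(non_personnel_total + working_time_eur)  # 7757590
```

Summing the five cost items per year reproduces the annual totals exactly (781000 euro over seven years), and adding the working-time cost gives the overall total.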

Timeline

Main actions during the project years

The main actions of the project in each year are as follows. A more detailed timeline at the task level is presented in Fig. 2.

Figure 2. Timeline of the tasks in the UNICORN project.

Year 1: Identification of different groups of stakeholders; Inquiries and questionnaires; Negotiations within the consortium; Invitations to the kick-off seminar; The large kick-off seminar in Tampere, December.

Year 2: Meetings concerning specific themes of the project; Inquiries and questionnaires; Clarifying the role of the different stakeholder groups; Information about the first results of the project to stakeholders; Annual seminar; Circular to the stakeholders including the main contents of the annual progress report.

Year 3: Meetings concerning specific themes of the project; Clarifying the role of the different stakeholder groups; Information about the results of the project to stakeholders; Annual seminar; Practical demonstrations; Circular to the stakeholders including the main contents of the annual progress report.

Year 4: Meetings concerning specific themes of the project; Information about the results of the project to stakeholders; Annual seminar; Practical demonstrations; Circular to the stakeholders including the main contents of the interim report.

Year 5: Revising the role of the different stakeholders; Meetings concerning specific themes of the project; Information about the results of the project to stakeholders; Annual seminar; Practical demonstrations.

Year 6: Meetings concerning specific themes of the project; Information about the final results of the project to stakeholders; Practical demonstrations.

Year 7: Final seminar; Publications.

Call

The proposal was submitted in 2015 to the Strategic Funds of the Academy of Finland. It was rejected as too ambitious and as having low commercial potential. We strongly believe that the proposed Unicorn environment and its growing community of developers can achieve abundant commercial success. The authors are open to any further funding suggestions and to forming new consortia.

Hosting institution

University of Jyväskylä

References