Quantifying the Impact of Data Sharing on Outbreak Dynamics (QIDSOD)

In this project, we will explore the range of data-related decisions made during public health emergencies like the ongoing COVID-19 pandemic and analyze the flow of information, data, and metadata within networks of such decisions. Data sharing is now considered a key component of addressing present, future, and even past public health emergencies, from local to global levels. Researchers, research institutions, journals and others have taken steps towards increasing the sharing of data around the ongoing COVID-19 pandemic and in preparation for future pandemics. We will quantify the effects of data flow modifications to identify parameter sets under which specific modes of sharing or withholding information have the largest effects on outbreak dynamics. For these high-impact parameter sets, we will then assess the current and past availability of corresponding data, metadata, and misinformation, and estimate the effects on outbreak mitigation and preparedness efforts.


Significance of the research question to global infectious diseases
Public health emergencies require profound and swift action at scale with limited resources, often on the basis of incomplete information and frequently under rapidly evolving circumstances. The sharing of data and associated metadata is a relatively new flavor under this broader theme, but one that has been receiving steadily growing attention over the last few years, especially in the context of Public Health Emergencies of International Concern like the Zika outbreak. By now, we have reached a point where data sharing must be considered a key component of addressing present, future and even past public health emergencies, from local to global levels. In response, researchers, research institutions, journals, funders, and others have taken steps towards increasing the sharing of data around ongoing public health emergencies and in preparation for future ones. These measures range from the adoption of open lab notebooks to modifications of policies and funding lines, and they include conversations around infrastructure, cultural change, misinformation or data ethics (e.g. Ekins et al. 2016).
In this project, we will explore the range of data-related decisions made during example outbreaks and analyze the flow of information, data and metadata through the decision network with respect to different types of modifications of such flows. On that basis, we will quantify the effects of such flow modifications on outcome measures relevant to various stakeholders from local to global levels, so as to identify parameter sets under which specific modes of sharing or withholding information have the largest effects. For these high-impact parameter sets, we will then assess the current and past availability of corresponding data, metadata and misinformation, estimate the effects on outbreak mitigation and preparedness efforts and explore mechanisms through which that availability could be optimized.

Approach
The proposed research aims to bridge the knowledge gap between what we have (i.e., different formats of data sharing/withholding decisions and various stakeholders involved during an infectious disease outbreak) and what we need (i.e., a quantification of the impact of different data-related decisions during outbreaks). To this end, the project addresses the following three research aims.

Research Aim 1: Model the flow of data/information through a decision network
Disease outbreaks and other public health emergencies involve a potentially wide range of different stakeholders. These stakeholders are typically interconnected by various types of interactions within and between themselves as well as with the pathogen and the respective social, natural and built environments. We propose to model these interactions as a heterogeneous decision network in which nodes denote stakeholders and edges denote the different types of interactions among them. Each interaction can be characterized by the availability and sharing of data pertaining to it, e.g. whether the pathogen is known, how it can be transmitted, or whether vaccination or treatment is available and how much it costs. We refer to the sharing of data or metadata as data flow, which can be modulated in several ways as one interaction triggers the next. For instance, a student might decide to change their behavior towards others based on the outcome of a diagnostic test, and authorities might decide whether or not to temporarily close the affected school.
For various reasons (e.g., societal, political, technical, or ethical), much of the data relevant for a complete path through such a decision network may not be directly accessible, posing challenges to investigating its potential and real decision chains. For example, outbreak propagation could be better characterized, understood, predicted and communicated if precise demographic information and migration information were available for individuals near the epicenter (McDonald et al. 1992, Pastor-Satorras and Vespignani 2001). The existence of such data, e.g. in government databases, does not generally imply its availability to other stakeholders, though sometimes, similar insights can be gleaned from other sources, e.g. mobility data from providers of transport services or fitness apps.
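As a minimal sketch of such a decision network, the following Python snippet represents stakeholders as nodes and interactions as directed edges that carry a flag for whether data is shared along them. It then checks which stakeholders a piece of data can reach. The stakeholder names, interaction kinds and sharing flags are hypothetical placeholders chosen for illustration; they are not taken from the project's actual model.

```python
from collections import deque

# Hypothetical stakeholders (nodes) and interactions (edges).
# Each edge: (source, target, kind of interaction, whether data is shared).
EDGES = [
    ("testing_lab", "student", "diagnostic_result", True),
    ("student", "school", "self_report", True),
    ("school", "health_authority", "case_notification", False),  # withheld
    ("health_authority", "public", "advisory", True),
]


def build_network(edges):
    """Return an adjacency map: source -> list of (target, kind, shared)."""
    network = {}
    for source, target, kind, shared in edges:
        network.setdefault(source, []).append((target, kind, shared))
        network.setdefault(target, [])  # make sure sink nodes are present
    return network


def reachable_by_data_flow(network, origin):
    """Breadth-first search that follows only edges on which data is shared."""
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for target, _kind, shared in network[node]:
            if shared and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Scenario A: the school withholds its case notification.
informed_withheld = reachable_by_data_flow(build_network(EDGES), "student")

# Scenario B: the same network with every interaction sharing its data.
open_edges = [(s, t, kind, True) for s, t, kind, _shared in EDGES]
informed_open = reachable_by_data_flow(build_network(open_edges), "student")

print(sorted(informed_withheld))
print(sorted(informed_open))
```

Comparing the two scenarios shows how a single withheld edge cuts downstream stakeholders off from the data flow. A fuller model would attach further attributes to nodes and edges, such as data quality or delay.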
The modeling of the flow of data, metadata and related information through such a decision network enables us to create hypotheses to investigate the impact of the availability and quality of data on specific kinds of decisions, or with respect to specific stakeholders, locations or timing. In designing the model, we will initially focus on data from Public Health Emergencies of International Concern as well as the WHO's R&D Blueprint, keeping in mind the applicability to other epidemiological contexts such as seasonal flu or the opioid crisis.

Research Aim 2: Quantify the impact of data sharing for decision making
Once the structure of the decision network and the nature of potential data flows have been captured in the model, we can study causal relations between individual or aggregated decisions (such as the closure of one school or multiple schools), the associated data flows, and the spread of the pathogen and the disease. Such decision-making processes are often based on randomized controlled experiments (Guo et al. 2020, Kallus and Zhou 2019). However, such experiments can be expensive in multiple dimensions, including time and financial resources, which makes them challenging to perform in outbreak contexts. One way to address this is to use observational data from different stakeholders to infer causal effects between a specific decision (e.g., closure of a school) and an important outcome (e.g. spread or containment of disease; Pearl 2009, Rubin 1978).
Learning such treatment effects from observational data as in the mobility example above requires us to handle confounding bias, which arises from variables, often unobserved, that influence both the treatment and the outcome. For example, an individual's poor socioeconomic status can affect their living conditions and may increase their chances to be infected, treated or cured. In addition to the observational data, we can include measures of the information flow between stakeholders (e.g. hygienic advice, or rumors) to infer the existence of hidden confounders (Hill 2011, Guo et al. 2019). To be able to adapt the decision network model to the nature of a given real or hypothetical outbreak scenario, data availability can be modeled generically as a systemic property, more granularly at the level of classes of stakeholders or decisions, and in a yet more fine-grained manner at the level of individual stakeholders or interactions if suitable data are available or can be inferred.
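To illustrate why confounding matters, the sketch below simulates observational data in which socioeconomic status influences both exposure to an intervention and the risk of infection. It compares a naive risk difference with one that is adjusted by stratifying on the confounder. This is a simplified illustration under stated assumptions: the confounder is observed here (the hidden-confounder case discussed above is harder and not implemented), and all probabilities, including the assumed true effect of -0.10, are hypothetical values chosen for the example.

```python
import random

TRUE_EFFECT = -0.10  # assumed change in infection risk due to the intervention


def simulate(n=20000, seed=0):
    """Generate hypothetical records of (low_ses, treated, infected)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        low_ses = rng.random() < 0.5
        # The confounder affects who receives the intervention ...
        treated = rng.random() < (0.2 if low_ses else 0.8)
        # ... and also the baseline infection risk.
        p_infect = (0.5 if low_ses else 0.3) + (TRUE_EFFECT if treated else 0.0)
        infected = rng.random() < p_infect
        rows.append((low_ses, treated, infected))
    return rows


def risk(rows, treated, low_ses=None):
    """Observed infection risk for a treatment arm, optionally within a stratum."""
    selected = [
        inf for ses, trt, inf in rows
        if trt == treated and (low_ses is None or ses == low_ses)
    ]
    return sum(selected) / len(selected)


def naive_effect(rows):
    """Risk difference that ignores the confounder."""
    return risk(rows, True) - risk(rows, False)


def adjusted_effect(rows):
    """Risk difference averaged over confounder strata, weighted by stratum size."""
    effect = 0.0
    for stratum in (True, False):
        weight = sum(1 for ses, _, _ in rows if ses == stratum) / len(rows)
        effect += weight * (risk(rows, True, stratum) - risk(rows, False, stratum))
    return effect


rows = simulate()
naive = naive_effect(rows)
adjusted = adjusted_effect(rows)
print(f"naive: {naive:.3f}  adjusted: {adjusted:.3f}  assumed truth: {TRUE_EFFECT:.3f}")
```

In this setup the naive estimate overstates the benefit of the intervention, because individuals with higher socioeconomic status are both more likely to be treated and less likely to be infected. Stratification recovers an estimate close to the assumed effect.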

Research Aim 3: Leveraging the information for current and future outbreak management
For a given outbreak management context, we can fine-tune the model parameters for that context in order to use the model to address questions that might arise during outbreak management. While classical epidemiological modeling provides information on expected outbreak dynamics and recommendations on outbreak management like how much of what to stockpile, our model would provide additional insights into whether, how and when details about preparedness and response or associated research should be shared and with whom, or what data quality requirements should be aimed for at which junctions in the network.
These details might range in scope from individual patients to triage protocols used by health workers to institutional or international policies about sharing diagnostic kits, material samples or computational pipelines. In short, the decision network model would behave much like a machine-actionable data management plan for the outbreak in question, which would also make it possible, for instance, to notify specific stakeholders of data-related outbreak developments relevant to them (Miksa et al. 2019).

Potential impact of the expected outcomes of the research
The research is expected to yield best practice recommendations in terms of the sharing of data and metadata in the context of specific outbreak-related decisions by stakeholders ranging from individuals to groups and institutions to governments and international bodies. Besides identifying recommendable data sharing scenarios, we will also consider the effects of delays in data sharing, partial sharing as well as the spread of misinformation.
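The effect of delays in data sharing can be illustrated with a toy calculation. The sketch below uses a discrete-time SIR model in which the transmission rate drops once outbreak data have reached decision makers, and compares outcomes for different sharing delays. It is an illustrative assumption rather than the project's model: the function name, the parameter values (transmission and recovery rates, detection day, size of the reduction) and the single-step intervention are all hypothetical.

```python
def simulate_outbreak(sharing_delay_days, days=200, population=10000.0,
                      beta=0.3, gamma=0.1, reduction=0.6, detection_day=20):
    """Discrete-time SIR model with an intervention gated by data sharing.

    The transmission rate `beta` is reduced by the fraction `reduction` once
    `detection_day + sharing_delay_days` have passed, i.e. once the relevant
    data have reached those who decide on the intervention.
    """
    s, i, r = population - 1.0, 1.0, 0.0
    peak = i
    for day in range(days):
        informed = day >= detection_day + sharing_delay_days
        rate = beta * (1.0 - reduction) if informed else beta
        new_infections = rate * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return {"peak_infected": peak, "total_infected": population - s}


# Compare immediate sharing with delays of two and four weeks.
results = {delay: simulate_outbreak(delay) for delay in (0, 14, 28)}
for delay, outcome in results.items():
    print(delay, round(outcome["peak_infected"]), round(outcome["total_infected"]))
```

Under these assumptions, longer delays lead to a higher peak and more infections overall. Sweeping such parameters is one simple way to look for regimes in which the timing of data sharing matters most.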

Availability of data and code
To the extent possible, we will follow best practices in sharing our code and data as well as associated documentation, as laid out by Barton et al. 2020. To this end, we have set up a GitHub organization at https://github.com/QIDSOD, which will be our default mode for sharing non-confidential aspects of the project. We welcome and encourage community participation throughout the project.

Funding program
The project is funded by a COVID-19 Rapid Response grant jointly provided by the Global Infectious Diseases Institute (GIDI) at the University of Virginia, in partnership with the Office of the Vice-President for Research of the University of Virginia. It is based on a proposal originally submitted to GIDI's Collaborative Seed Grants program on 2 March 2020.

Grant title
Quantifying the Impact of Data Sharing on Outbreak Dynamics (QIDSOD)

Hosting institution
The School of Engineering and Applied Sciences and the School of Data Science at the University of Virginia.

Ethics and security
At the time of writing, the project has not been assessed externally in terms of its ethics and security implications, but the ethical and security aspects of data-related decision-making during public health emergencies are within the scope of the project, and we will document them as well as our interactions with relevant oversight bodies as the project progresses.