Data Management Plan: Opening access to economic data to prevent tobacco-related diseases in Africa

The purpose of this project is to demonstrate that tobacco-related data from selected African countries can be collected and distributed from an Open Data platform. The platform and data will improve the capacity for tobacco control research in key sub-Saharan African countries, and help develop a continent-wide research approach to tobacco control.


How will the data be collected or created?
Data collection methods

Desk-based search of official websites of project countries
This will involve searching the websites of government departments for data on:
1. Prevalence of tobacco-related diseases, and tobacco-related morbidity and mortality (Departments of Health)
2. Tobacco products manufacturing and tobacco imports/exports (Departments of Trade and Industry)
3. Tobacco usage, from surveys by National Statistics Agencies (NSAs)
In South Africa, unit record administrative data from government departments, repackaged as research datasets, are also shared by the NSA. If the data collection instruments (administrative forms) used to collect the data are available on these sites, they can provide useful information on the data.

Desk-based search of websites of International Development Organisations
This desk-based study will allow us to discover tobacco data on project countries that has been collected by international organisations. These will include the websites of the international DHS Program and UN bodies such as the World Health Organisation (WHO). From this, a "question bank" will be created of useful variables and the datasets where these can be found.
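The "question bank" described above is essentially an index from variables to the datasets that contain them. A minimal sketch of one possible structure follows. The dataset names, variable names and question wordings are hypothetical placeholders, not items identified by the Project.

```python
from collections import defaultdict

# Hypothetical records: (dataset name, variable name, question text).
# None of these names come from the Project; they are placeholders.
RECORDS = [
    ("survey_a_2014", "smokes_daily", "Do you currently smoke tobacco every day?"),
    ("survey_b_2015", "smokes_daily", "Do you smoke tobacco daily?"),
    ("survey_b_2015", "cig_price", "How much did you pay for your last pack?"),
]


def build_question_bank(records):
    """Map each variable to the datasets (and question wording) where it appears."""
    bank = defaultdict(list)
    for dataset, variable, question in records:
        bank[variable].append({"dataset": dataset, "question": question})
    return dict(bank)


def datasets_with(bank, variable):
    """Return the sorted names of the datasets that contain the given variable."""
    return sorted(entry["dataset"] for entry in bank.get(variable, []))


question_bank = build_question_bank(RECORDS)
```

Calling `datasets_with(question_bank, "smokes_daily")` would then list every dataset in which that variable can be found.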

Desk-based search of industry websites
The third component of our desk-based research will involve examining online records of the tobacco industry. From these we hope to obtain data on: cost of tobacco production and profits in the industry; prices of raw tobacco and tobacco products; salaries; capital and foreign investment; mergers and acquisitions; advertising spend; and regulations in the industry.

Approaches to data holders
We will create metadata on our platform for surveys already shared by others online. Our desk search may also reveal the existence of datasets with a tobacco data component that are not in the public domain. In these cases we will approach the relevant research projects in the project countries and ask them to release this data and allow the Project to host it on its Open Data portal. This may be a fraught process, but any challenges and successes can be written up to inform our future work.

Own surveys
The Project has already crowd-sourced data on current prices of tobacco products in two project countries. This may be expanded during the course of the Project to all project countries. We will upload and share the metadata and data from these surveys.

How will you manage copyright and Intellectual Property Rights (IPR) issues?
The government data we will collect is not subject to IPR. The tobacco industry data we will collect will be public records. We will therefore not publish information that would compromise any IP rights. However, we will check each tranche of data we obtain to ensure we have permission to pass the data on to third parties.

Storage and Backup
How will the data be stored and backed up during the research?
The data will be stored on a server managed and backed up by the University of Cape Town's Commerce IT Department. Curation of the data will be the responsibility of DataFirst's Research Data Service. DataFirst is a technical partner on the Project. Each preservation dataset will consist of data files, document files, metadata files, and any programme files used in creating the data files. Data Service staff will be responsible for adding data updates to datasets. We will also handle version control to ensure the most recent and accurate data files are published, and provide tombstone citations to earlier versions for verification or replication of research which may cite these superseded versions of the data.
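The version control approach described above, in which the latest version is published and earlier versions keep a tombstone citation, could be sketched roughly as follows. This is an illustration only: the class names, the version numbering and the citation strings are assumptions, not DataFirst's actual system.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetVersion:
    version: str              # e.g. "1.0"; the numbering scheme is an assumption
    citation: str
    superseded: bool = False


@dataclass
class DatasetRecord:
    title: str
    versions: list = field(default_factory=list)

    def publish(self, version, citation):
        """Add a new version and mark every earlier one as superseded."""
        for old in self.versions:
            old.superseded = True
        self.versions.append(DatasetVersion(version, citation))

    def current(self):
        """Return the most recent version, or None if nothing is published."""
        return self.versions[-1] if self.versions else None

    def tombstones(self):
        """Return citations of superseded versions, kept for replication."""
        return [v.citation for v in self.versions if v.superseded]


# Hypothetical dataset used only to illustrate the workflow.
record = DatasetRecord("Hypothetical Tobacco Price Survey")
record.publish("1.0", "Hypothetical Tobacco Price Survey, version 1.0")
record.publish("1.1", "Hypothetical Tobacco Price Survey, version 1.1")
```

After the second call to `publish`, `record.current()` refers to version 1.1, while `record.tombstones()` still returns the citation for version 1.0 so that research citing it can be verified.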

How will you manage access and security?
Access to the server hosting the preservation datasets will be password controlled. Passwords will be allocated by the Commerce IT manager only to Data Service staff. Server software will monitor data security and integrity.
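The plan does not say which server software will monitor integrity. One common way to check it is a checksum manifest, sketched below on the assumption that each preservation dataset sits in its own directory. The function names are illustrative and do not describe the tooling actually used.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to the directory) to its checksum."""
    root = Path(directory)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(directory, manifest):
    """List files that were added, removed or altered since the manifest was built."""
    current = build_manifest(directory)
    return sorted(
        name
        for name in set(manifest) | set(current)
        if manifest.get(name) != current.get(name)
    )
```

A manifest built when a dataset is deposited can be compared against the stored files at intervals. An empty result from `changed_files` indicates that no file has changed.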

Selection and Preservation
Which data are of long-term value and should be retained, shared, and/or preserved?
Criteria for preservation will be:
1. Data is tobacco-related
2. Data covers project countries
3. Data is accurate and reliable (we will undertake quality audits to determine this)
4. Data is unit record data (not aggregated but available at the level at which it was collected)
5. Data is not readily available from another repository
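The five criteria above can be read as a checklist applied to each candidate dataset. A minimal sketch follows. The field names and country codes are placeholders, and the quality audit is reduced to a simple flag, whereas in practice it would be a manual review.

```python
from dataclasses import dataclass

# Placeholder codes; the real list of project countries is not given here.
PROJECT_COUNTRIES = {"AAA", "BBB"}


@dataclass
class Candidate:
    name: str
    tobacco_related: bool
    country: str
    passed_quality_audit: bool
    unit_record: bool
    held_elsewhere: bool


def failed_criteria(c):
    """Return the numbers (1 to 5) of the criteria the candidate fails."""
    checks = {
        1: c.tobacco_related,
        2: c.country in PROJECT_COUNTRIES,
        3: c.passed_quality_audit,
        4: c.unit_record,
        5: not c.held_elsewhere,
    }
    return [number for number, ok in checks.items() if not ok]


def should_preserve(c):
    """A dataset is preserved only if it meets all five criteria."""
    return not failed_criteria(c)


# Hypothetical candidates used only to illustrate the checks.
good = Candidate("price_survey", True, "AAA", True, True, False)
aggregated = Candidate("summary_tables", True, "AAA", True, False, True)
```

Returning the numbers of the failed criteria, rather than only a yes or no, makes it easier to record why a dataset was not preserved.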

Retention:
It is difficult to predict what data has long-term value. Our policy will be to store unit record tobacco data indefinitely. As these datasets grow, so will their value over time. Time-series data will continue to be useful for economic and health policy research in the long term.

Sharing:
Because we aim to establish an Open Data portal, all data retained/preserved will also be shared. The Project's policy is aligned to DataFirst's policy: We do not archive data which cannot be shared with researchers in some form and at some access level.

What is the long-term preservation plan for the dataset?
There will be numerous datasets. Our long-term preservation plan for the Project's data holdings depends on the sustainability of DataFirst's Research Data Service. The service was established in 2001 and is a unit at the University of Cape Town, a well-funded and well-established university in South Africa. Our sustainability prospects are therefore good.

Figure 1. Tobacco Data in Africa Project Data Inventory 2016. Original data available as Suppl. material 1.