Research Ideas and Outcomes : Software Description
PDF
Software Description
WarenstromInfo: a tool for the easy extraction and visualisation of trade flow data
expand article infoOctavio Mesa-Varona, Carolina Plaza-Rodríguez, Lars Valentin, Matthias Filter
‡ German Federal Institute for Risk Assessment, Berlin, Germany
Open Access

Abstract

Background

Epidemiological outbreak investigations often prove to be lengthy and inconclusive due to the time-consuming nature of the currently-used approaches. An alternative approach to address these challenges could involve the application of algorithms to support authorities and food business operators by providing timely, relevant and reliable information. Algorithms, such as gravity models, could be applied as commodity trade models, but they require a large amount of reliable and consistent data on trade for generating projections at international, national or even regional level. Several trade databases, such as UN COMTRADE, EUROSTAT, BACI, CHELEM and GTAP, provide information on trace, albeit with variations in the provided information and in the structure. However, it is worth noting that not all of these databases are freely accessible and data management can pose challenges, hampering the access to the trade data. WarenstromInfo (WI) was created as a software solution that allows easy trade data extraction and visualisation for application in different areas, such as outbreak investigations.

New information

WarenstromInfo (WI) is an application tool that automatically extracts, decodes and visually displays trade flow data from EUROSTAT "EU trade since 2002 by statistical procedure, by HS2-4-6 and CN8 (DS-059322)" (hereinafter referred to EUROSTAT) and the BACI databases, based on user input.

WI was developed by using the open-source desktop software KNIME Analytics Platform. WI offers the flexibility to be executed either as a web service on a KNIME Web Server infrastructure or as a local resource.

To integrate the BACI database into WI, the database is annually downloaded as csv files, rebuilt as a SQLite database and hosted locally into the KNIME Web Server Infrastructure. In order to optimise storage space on the KNIME server, this SQLite database specifically includes only agrifood data, reflecting the tool´s focus. However, if new objectives are established, this database can be expanded. Further, data of the SQLite database can be customised by executing the WI workflow locally, enabling the user to expand the database at any time.

In contrast to BACI, trade data extraction from the EUROSTAT database is performed via the EUROSTAT’s API (Application Programming Interface) applying GET requests and XML data management.

WI displays four User-friendly Interfaces (UIs) designed with interactive KNIME nodes that facilitate the input of variables. The extracted trade flow data are shown through interactive tables directly within the UIs. This feature enables users to easily explore the data in a structured and user-friendly manner. Additionally, WI incorporates the extracted trade flow data into maps. These maps provide a visual representation of the data, allowing users to gain insights and identify patterns and trends geographically. Both, the data table and the maps, can be downloaded as a single Excel file (containing multiple preformatted tables) and as png files, respectively.

Keywords

export-import data, EUROSTAT DS-059313, BACI, KNIME, data extraction tool, foodborne outbreak investigation

Introduction

Foodborne outbreaks have a major impact causing significant public health and economic losses (Perrot et al. 2011). This leads to 1 out of 10 people becoming ill from microbially or chemically contaminated food in the world, causing 600 million illnesses and 420,000 deaths every year (Havelaar et al. 2015, Lee and Yoon 2021). Further, investigations and epidemiological analyses of foodborne outbreaks are often long, being not feasible in many cases to identify the outbreak source (Ravel et al. 2009,National Center for Emerging Zoonotic Infectious Diseases 2017. Therefore, alternative and novel approaches are required to support authorities and food business operators (FBO) in sufficient time, by providing proper, relevant and reliable information to clarify foodborne outbreak investigations (Salamon et al. 2019, Balster and Friedrich 2019,Schlaich et al. 2020).

One potential approach involves the use of commodity trade models. These models can create future flow projections, based on different scenarios for countries, regions and even FBO (Balster and Friedrich 2019, Horn and Friedrich 2019). These models can support traditional outbreak investigation systems by providing projections on relevant information, such as the identification of the most commonly traded commodities per region or per FBO in the period when the outbreak occurred. However, these models require a large amount of historical trade data. Although there are several trade databases available, such as UN COMTRADE, EUROSTAT, BACI, CHELEM and GTAP (United Nations 2003, Saint Vaulry 2008, Gaulier and Zignago 2010, United Nations Department of Economic and Social Affairs 2022), only some of these databases are freely available (Gaulier and Zignago 2010) and data extraction and decoding are often laborious.

Here, we present a new web-based solution called WarenstromInfo (WI) that enables users to easily extract and visualise trade flow data from the EUROSTAT and BACI databases. To the best of our knowledge, there is currently no open-source tool specifically designed to easily extract data from both Eurostat and BACI databases without requiring familiarity with programming software tools such as R and Python and that allows easy extension of the service by incorporating any other database through API integration, such as UN COMTRADE (United Nations 2003).

The integration of the BACI database into the WI tool, along with performance improvements, updates of the API service and improvements of the UIs of the EUROSTAT database, were performed under the FoodDecide research project. FoodDecide aims to develop software solutions for Montenegrin food companies and government agencies, that could enhance food safety and expedite investigations into foodborne disease outbreaks. Previous versions of the WI tool were developed under DEMETER (https://h2020-demeter.eu/) and ZooGlow (https://www.sifo.de/sifo/de/projekte/schutz-kritischer-infrastrukturen/sicherung-der-lebensmittel-und-lebensmittelwarenketten/zooglow/zooglow_node.html) projects.

Project description

Title: 

WarenstromInfo: a tool for the easy extraction and visualisation of trade flow data.

Design description: 

WI is an easy-to-use web-based tool that extracts and visualises trade flow data from the EUROSTAT and BACI databases.

EUROSTAT (https://ec.europa.eu/eurostat/databrowser/view/DS-059313/default/table?lang=en) is an online monthly updated database that provides data on commodities with two-month delay via REST API. Commodity market values in euros (€) and quantities in kilograms are provided as official data reported between European Union (EU) countries and the entire world. EUROSTAT is based on the Common Nomenclature (CN) that is, in turn, based on the Harmonised Commodity Description and Coding System (managed by the World Customs Organisation (WCO)) (EUROSTAT. 2009c).

In contrast, BACI (http://www.cepii.fr/CEPII/en/bdd_modele/bdd_modele_item.asp?id=37) is an online yearly updated database that provides encrypted data on commodities with two years of delay. BACI commodity market values, provided in thousands of dollars ($) and quantities in metric tonnes, are the result of multiple analysis of the UN COMTRADE data amongst all countries in the world. These analyses try to estimate more accurate trade results (Gaulier and Zignago 2010). As there are six versions of the UN COMTRADE based on the Harmonised Standard (HS) (i.e. HS92, HS96, HS02, HS07, HS12 and HS17), the BACI database is also based on the same standard and standard versions.

While WI requests all commodity trade data from EUROSTAT on a monthly basis via the REST API, only agrifood trade data from BACI are accessible on a yearly basis via the WI web service. This is because the BACI database has to be stored locally on the KNIME Server. To optimise storage space on the KNIME Web Server infrastructure and meet the specific interest and purpose of WI, a BACI dataset containing only agrifood commodities is built. However, this dataset could be expanded according to new objectives. Further, WI can also be run locally on a local PC by downloading the WI ZIP folder (Mesa Varona et al. 2023) that comprises the following components:

  1. WarenstromInfo tool workflow: This workflow streamlines the extraction and visualisation of trade flow data.
  2. BACI SQLite database builder workflow: This workflow concentrates on constructing a BACI SQLite database.
  3. A BACI SQlite database including only agrifood data.
  4. An example Excel file and accompanying images: These files serve as illustrative samples for demonstrating the functionality and potential usage of the workflows.
  5. A readme file: The file provides an overview and essential information on how to install KNIME, execute the WI tool and set up and replace the default BACI SQLite Database locally.

The local approach allows us to customise the SQLite BACI database and, therefore, to access any trade data beyond the agrifood data. This procedure requires the installation of the KNIME Analytics Platform (https://www.knime.com/knime-analytics-platform) on a local PC, along with the requisite proficiency in operating the tool. The readme file, provided in the WI ZIP folder, serves as a guide for users to install and execute the WI service locally. Note that KNIME is compatible with Windows, Linux and Mac operating systems.

A step-by-step description of the WI operation is provided in Figures 1-4. The web portal header (upper horizontal grey-coloured section of the UIs; Figs 1, 2, 3, 4), also called banner, provides relevant information of the request in question, including, for example, the name of the selected database and an UI flowchart. The UI flowchart goes along with the data extraction through the UI2, UI3 and UI4. By changing the colours, the UI flowchart shows the status of the execution at any given moment (Fig. 3): Left: UI already filled in – yellow; middle: UI being executed – dark grey framed in yellow; right: UI not yet executed - pale grey.

Figure 1.  

WI User Interfaces: UI1. In the UI1, users can select the desired database (EUROSTAT/BACI) (A). The database selection is concluded by clicking “Next” (B), which will lead the user to the following UI.

Figure 2.  

User Interfaces: UI2. In the UI2, the following input options can be selected: standard selection (C), time range of the search (D), the email option (E), the pre-filter option (F) and the table with the embedded live filter (G). Input data from UI2 are processed by clicking “Next” (H).

Figure 3.  

WI User Interfaces: UI3. In the UI3, countries involved in the query can be selected (I and J). Input data from UI3 are included in the query by clicking “Next” (K).

Figure 4.  

WI User Interfaces: UI4. The UI4 provides an overview of the final WI outputs, displaying a list with the initial input variables (L), three maps displaying trade flows (a world map, a world map focused on Europe and an European map) (M), download options (N), slide filter bars for value and weight preview (O) and trade flow data in the preview table (P).

WI provides four different UIs with a similar appearance for both databases, enabling input variables to be entered (UI1, UI2 and UI3) and the visualisation of the outputs in place (UI4) or via email (Figs 1, 2, 3, 4). In addition, warning messages are displayed in case mandatory fields are not filled in or no trade flow has been found for the selected input variables. In case no trade flow data are found in the EUROSTAT database, the specific reason of the error will be indicated according to the information provided by EUROSTAT (EUROSTAT. 2009a).

  • UI1 Selection of the input variables I (Fig. 1): UI1 displays a single selection option for the two databases (i.e. EUROSTAT and BACI) together with a specific text output to the selected database providing relevant information about it.
  • UI2 Selection of the input variables II (Fig. 2): UI2 provides different options for each selected database. It allows the selection of the desired output value type (i.e. the amount of imported/exported commodity as market value or weight; only for EUROSTAT), flow selection (i.e. export or import flow; only for EUROSTAT), standard version of the database (only for BACI), the time range (monthly selection for EUROSTAT and year selection for BACI), commodities selection and the option to send final WI outputs via email.

For the BACI dataset extraction in the UI2, the HS version selection and year range selection options are linked. The HS system has been revised six times and consequently six versions of the same standard are available. The selection of an HS version has an immediate reaction in the year range selection options displayed. Thus, if the HS95 (i.e. 1995) is selected, all the years from 1995 are offered, while if the HS07 (i.e. 2007) is selected, only those from 2007 are offered. This feature allows users to use harmonised trade data from different sources. For instance, the Food and Agriculture Organisation Corporate Statistical Database (FAOSTAT) website disseminates statistical data collected and maintained by the Food and Agriculture Organisation (FAO). These data apply the FAO standard and FAOSTAT provides a list that shows the correspondence of a FAO product/item with two HS versions (i.e. HS 07 and HS12) (Food and Agriculture Organization of the United Nations 2023).

For the extraction of the EUROSTAT database in the UI2, flow selection features allow selecting the type of trade flow requested (i.e. import or export). In contrast to the BACI database that only provides a single market value and weight for commodities in a trade flow, EUROSTAT reports official import and export market values and weights, based on the EU reporting country (defined as Declarant in the tool). There might be a high variability on market values and weights according to the EU reporting country. This variability is clearly grounded and defined as bilateral asymmetries (UN Comtrade 2023).

In order to facilitate the commodity selection, two filters are displayed in the product table. The first one is called the “pre-filter” option and enables the user to filter for a commodity term (e.g. milk), initial product code (e.g. 01 or 011), the whole commodity code or even show all commodities (i.e. *) by clicking on the “upload table below” button. The second one, a “live-filter” option assembled in the upper part, enables the same search functionalities as the “pre-filter” option, but interactively. This table also offers the individual commodity selection and the “all commodity selection” options via check boxes.

Besides, a further option gives the opportunity to send the output dataset and visualisation maps via email after executing the UI3. In this way, users can save time waiting for extensive data requests. The email address only needs to be entered once, as it is stored together with the username, leading to the automatic subsequent display of it. Note that the email service exclusively operates within the KNIME server.

  • UI3 Selection of the countries involved in the query (Fig. 3): In the UI3, the countries involved in the query are selected in different forms for each database. For the BACI database, the countries are defined as importers and exporters and include, in both cases, all countries in the world. For the EUROSTAT database, they are defined as partners (including all countries in the world) and declarants (including just EU countries). Two tables with similar live embedded filters and functionalities to those from the product table described above, are used to display the countries involved in the search of trade flows.

Once the UI3 is executed, trade flow data extracted via SQL (BACI) or API (EUROSTAT) are displayed (UI4).

  • UI4 Visualisation of the final WI outputs (Fig. 4): UI4 shows the preview of the extracted data only in case the email option is disabled. The UI4 is divided into four sections:

a. List of initial input variables: The first table enables the visualisation of all initial input variables requested by the user, supporting the identification of potential lacking trade flows in the final WI outputs.

If the search includes more than 10 countries and/or commodities, the list of initial input variables will not display all the selected countries or commodities, but instead, the following messages will appear: “More than 10 products selected”, “All countries selected” or “More than 10 countries selected”.

b. Trade flows maps: Visualisation and download options are available for each of the three available map images in png format (i.e. World map, World map zoomed in Europe and a Europe map). Maps display trade flows through arrows facilitating the understanding of trade flow data. These arrows show different thicknesses indicating the magnitude of the commodity trade flow (the higher the trade flow value, the thicker the arrow). In addition, large commodity flows, compared to other flows in the dataset, are highlighted with red arrows. Note that visualisation images are provided in the WI ZIP folder for illustrative purposes. Further, the enhancement of the user experience through visualisation improvements could potentially occur in the future.

c. Excel file: A preformatted Excel file is generated to effectively display professional-looking data by colouring, aligning and providing filtering options to the column titles. A sample Excel file can be found within the WI ZIP folder for reference. This Excel file eases the user work by offering the following pre-configured data management tabs:

·Trade flow data: It displays all trade data extracted with the highest granularity showing different columns for: countries involved (importers and exporters for BACI and declarant and partners for EUROSTAT), EUROSTAT database identification number (only for EUROSTAT), commodity, year (in case of EUROSTAT, month of the data extraction), market value, product weight (either market value or weight for EUROSTAT) and HS version (only for BACI).

·Prod_flow_selected_period: This table displays trade data grouped by countries involved and commodity.

·Total_sum: This table displays trade data grouped only by countries involved.

·Product_per_year: This table displays data grouped by declarant, partner, commodity and year (only for EUROSTAT).

·Initial_variables: All initial variables are displayed.

d. Trade flow data in a preview table: Two sliding filter bars enable the user to define the range of the shown market values and weights of the commodities in the preview table. In case of EUROSTAT, only one sliding filter bar is shown, being in line with the desired output value type selected (i.e. market value or weight of the commodities). Slide bars are disabled if a single trade flow datum is shown.

In this preview table, the user can set up the number of entries per table and page, sort columns in ascending or descending order and filter any term in the table. This table is designed solely to facilitate data evaluation and comprehension.

WI is accessible via the KNIME Server WebPortal upon request to 4sz@bfr.bund.de or downloading the WI ZIP folder (Mesa Varona et al. 2023) containing the workflows for the local execution of the service. A brief description on the WI software tool is shown on the foodrisklabs website (https://foodrisklabs.bfr.bund.de/warenstrom-info-en/).

Funding: 

The research leading to these results received funding from:

  • The Federal Ministry of Education and Research (BMBF), Funding codes from 13N12697 to 13N12701 - ZooGloW.
  • The European Food Safety Authority (EFSA), under Grant Agreement No. GA/EFSA/AFSCO/2016/01-01 -DEMETER.
  • The German Aerospace Centre (DLR), under Grant Agreement No. 60-0103-05.P245-FoodDecide.

Web location (URIs)

Technical specification

Platform: 
Web browsers
Operational system: 
Windows; Linux; Mac

Usage licence

Usage licence: 
Other
IP rights notes: 

CC BY 4.0

Implementation

Implements specification

There are a multitude of tools that allow or facilitate the extraction and visualisation of trade data, some of them being open-source. This is the case for R (Eurostat - e.g. restatapi (Mészáros and Weinand 2023) and eurostat (Lahti et al. 2023); BACI - e.g. baci (Veiga et al. 2023)) and Python (Eurostat - e.g. easy_eurostat (Meulen 2022) and eurostat (Cazzaniga 2023); BACI - e.g. baci (Reyes 2023)) libraries. However, they often entail a deep learning curve. WI was developed for the easy extraction and visualisation of trade data from Eurostat and BACI databases without the need for prior software knowledge. This, together with its great adaptability, which allows us to incorporate new databases in a simple way via API, makes this tool unique.

We recognise that there may be a possible concern regarding the annual update frequency for BACI data and its reflection of the dynamic global sourcing industry. To address this, our approach integrates monthly extractions from the Eurostat API, providing a more granular and up-to-date perspective on economic activities. While BACI data offers a comprehensive annual overview, the monthly Eurostat updates contribute real-time insights, ensuring our tool captures both the long-term trends and the ongoing dynamics within the industry. This dual-frequency approach strikes a balance between data availability and the need for timely, detailed information, enhancing the tool's efficacy in supporting informed decision-making.

This versatile tool finds utility across diverse professional domains that demand the extraction of trade data. In the field of food safety, such trade data could be of great importance not only for outbreak investigation, but also for other areas, such as risk assessment and predictive analysis and forecasting.

For the creation of the WI infrastructure, the user-friendly graphical workbench for the data science KNIME Analytics Platform (Berthold et al. 2009) was used. This open-source software enables the creation of data pipelines, called workflows, in a transparent manner and by connecting KNIME nodes. These nodes are modules that execute specific data processes, such as data reading/writing, data manipulation or data visualisation. They are sequentially connected, with the output of one node serving as the input for the next node in the sequence.

Each executed KNIME node stores all its specific data outputs. Therefore, an executed KNIME workflow that encloses a variety of different connected nodes may store the whole data analysis process including all intermediate results. Workflows can be extended, adapted, manipulated and implemented with new functionalities because of the KNIME modular principle. For the WI KNIME workflow creation, the subscription-based KNIME Web Server infrastructure was used that allows access to different versions of the workflow showing the progression in the acquisition of new workflow functionalities.

This workflow consists of nodes, meta-nodes and component nodes. Although meta-nodes and component nodes group nodes, the component nodes also enable the creation of web-based UIs by using KNIME widget nodes. Each UI is made up of component nodes within other component nodes that are placed for the composite view creating the final UI layout.

Differences on both databases have a great impact on the WI workflow implementation.

BACI database is freely available as downloadable ZIP files (http://www.cepii.fr/CEPII/en/bdd_modele/bdd_modele_item.asp?id=37). Each of the six compressed files consists of three types of csv files:

  • a file for decoding product codes;
  • a file for decoding country codes and
  • files containing coded data on trade flows (each csv file contains the coded data of all trade flows in the world for all products in one year).

In order to build the BACI dataset for WI, ZIP files are annually downloaded (http://www.cepii.fr/CEPII/en/bdd_modele/bdd_modele_item.asp?id=37) and decompressed. The BACI SQLite database builder workflow was created to read each csv file, classify it according to its content, filter agrifood data and store it as a SQLite database. The latter database consists of three datasets that are in line with the three types of csv files available (i.e. dataset enclosing (a) product codes, (b) country codes and (c) coded data on trade flows). This BACI SQLite database is hosted in the internal and external KNIME Web Server infrastructure and is used by WI providing trade flow data quickly and efficiently. Just in case this tool is accessed locally, the user has to create his/her own BACI SQLite database by executing the BACI SQLite database builder workflow. This workflow also allows the user to customise the BACI SQLite database (Fig. 5).

Figure 5.  

A screenshot of the BACI SQLite database builder workflow.

The workflow presented here is organised into two main sections, with interconnected KNIME nodes displayed in each section. The yellow-boxed nodes are responsible for creating and loading the database where the data are stored. The green-boxed section shows:

  1. Locate BACI csv files (BACI files must have been previously downloaded and stored in a folder that is pointed in the “List Files/Folders” KNIME node).
  2. Split the files according to the HS data provided and
  3. Load csv files, filtering and storage in SQLite database (carried out in each consecutive metanode). In this last section, the user can decide not to filter the data or customise the filter criteria of the BACI database, including more types of data apart from those agrifood data that are included in the default configuration of the workflow.

Variable flow connections (red lines) can be removed in the second green-boxed section, if an HPC (High Performance Computing) service is available. This will allow the workflow to run much faster as the nodes will not be executed one after the other, but all at the same time.

The WI infrastructure was established through the utilisation of the KNIME Analytics Platform (version 4.5.1), complemented by the integration of the following dependencies:

  • Continental Nodes for KNIME extension (version: 1.3.0.v202111081724): The pre-formatted features shown in the excel output file are provided by XLS Formatter nodes in this extension under KNIME Community Extensions - Other. XLS Formatter nodes make it possible to add formatting and advance settings to XLS files.
  • GIS and Graph Visualisation extension (version: 1.1.44.K4-5-202302080207): "This extension is part of the BfROpenLab extension group (The Federal Institute for Risk Assessment in Germany (BfR) 2013). Maps and maps functionalities displaying trade flows are provided by the KNIME Nodes from this extension that is available in the following site: https://gitlab.bfr.berlin/binrepos/fcl/raw/4.5. Please note that the latter source is exclusively accessible under KNIME Analytics Platform. It is worth mentioning that the last two digits of the extension site (i.e. 4.5) correspond to the version of KNIME being used (i.e. KNIME 4.5.x). If you are using the latest version of KNIME, please visit the following URL to verify the latest extension site: https://foodrisklabs.bfr.bund.de/installation/. It is important to note that each extension site is associated with a version of KNIME.
  • World Borders Dataset from Thematic Mapping open data platform: This dataset includes detailed geographic and demographic attributes for each country, such as ISO country codes and United Nations region and sub-region codes (Sandvik 2009).
  • Nomenclature of Territorial Units for Statistics (NUTS) dataset: Eurostat, the statistical office of the European Union, provides a data shape dataset for EU countries (EUROSTAT. 2009b).
  • BACI Database (http://www.cepii.fr/CEPII/en/bdd_modele/bdd_modele_item.asp?id=37).
  • "EU trade since 2002 by statistical procedure, by HS2-4-6 and CN8 (DS-059322)" EUROSTAT Database (https://ec.europa.eu/eurostat/comext/newxtweb/).

Acknowledgements

This work was supported by funding from:

  • The Federal Ministry of Education and Research (BMBF), Funding codes from 13N12697 to 13N12701 – ZooGloW.
  • The European Food Safety Authority (EFSA), under Grant Agreement No. GA/EFSA/AFSCO/2016/01-01-DEMETER.
  • The German Aerospace Centre (DLR), under Grant Agreement No. 60-0103-05.P245-FoodDecide.

Author contributions

Octavio Mesa-Varona - Writing - original draft, Software

Carolina Plaza-Rodríguez - Writing - review and editing

Lars Valentin - Writing - review and editing, Software

Matthias Filter - Funding acquisition, Project administration, Writing - review and editing

References

login to comment