Research Ideas and Outcomes :
Software Description
|
Corresponding author: Octavio Mesa-Varona (octavio.mesa-varona@bfr.bund.de)
Academic editor: Editorial Secretary
Received: 05 Sep 2023 | Accepted: 19 Dec 2023 | Published: 07 Feb 2024
© 2024 Octavio Mesa-Varona, Carolina Plaza-Rodríguez, Lars Valentin, Matthias Filter
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Mesa-Varona O, Plaza-Rodríguez C, Valentin L, Filter M (2024) WarenstromInfo: a tool for the easy extraction and visualisation of trade flow data. Research Ideas and Outcomes 10: e112227. https://doi.org/10.3897/rio.10.e112227
|
|
Epidemiological outbreak investigations often prove to be lengthy and inconclusive due to the time-consuming nature of the currently-used approaches. An alternative approach to address these challenges could involve the application of algorithms to support authorities and food business operators by providing timely, relevant and reliable information. Algorithms, such as gravity models, could be applied as commodity trade models, but they require a large amount of reliable and consistent data on trade for generating projections at international, national or even regional level. Several trade databases, such as UN COMTRADE, EUROSTAT, BACI, CHELEM and GTAP, provide information on trace, albeit with variations in the provided information and in the structure. However, it is worth noting that not all of these databases are freely accessible and data management can pose challenges, hampering the access to the trade data. WarenstromInfo (WI) was created as a software solution that allows easy trade data extraction and visualisation for application in different areas, such as outbreak investigations.
WarenstromInfo (WI) is an application tool that automatically extracts, decodes and visually displays trade flow data from EUROSTAT "EU trade since 2002 by statistical procedure, by HS2-4-6 and CN8 (DS-059322)" (hereinafter referred to EUROSTAT) and the BACI databases, based on user input.
WI was developed by using the open-source desktop software KNIME Analytics Platform. WI offers the flexibility to be executed either as a web service on a KNIME Web Server infrastructure or as a local resource.
To integrate the BACI database into WI, the database is annually downloaded as csv files, rebuilt as a SQLite database and hosted locally into the KNIME Web Server Infrastructure. In order to optimise storage space on the KNIME server, this SQLite database specifically includes only agrifood data, reflecting the tool´s focus. However, if new objectives are established, this database can be expanded. Further, data of the SQLite database can be customised by executing the WI workflow locally, enabling the user to expand the database at any time.
In contrast to BACI, trade data extraction from the EUROSTAT database is performed via the EUROSTAT’s API (Application Programming Interface) applying GET requests and XML data management.
WI displays four User-friendly Interfaces (UIs) designed with interactive KNIME nodes that facilitate the input of variables. The extracted trade flow data are shown through interactive tables directly within the UIs. This feature enables users to easily explore the data in a structured and user-friendly manner. Additionally, WI incorporates the extracted trade flow data into maps. These maps provide a visual representation of the data, allowing users to gain insights and identify patterns and trends geographically. Both, the data table and the maps, can be downloaded as a single Excel file (containing multiple preformatted tables) and as png files, respectively.
export-import data, EUROSTAT DS-059313, BACI, KNIME, data extraction tool, foodborne outbreak investigation
Foodborne outbreaks have a major impact causing significant public health and economic losses (
One potential approach involves the use of commodity trade models. These models can create future flow projections, based on different scenarios for countries, regions and even FBO (
Here, we present a new web-based solution called WarenstromInfo (WI) that enables users to easily extract and visualise trade flow data from the EUROSTAT and BACI databases. To the best of our knowledge, there is currently no open-source tool specifically designed to easily extract data from both Eurostat and BACI databases without requiring familiarity with programming software tools such as R and Python and that allows easy extension of the service by incorporating any other database through API integration, such as UN COMTRADE (
The integration of the BACI database into the WI tool, along with performance improvements, updates of the API service and improvements of the UIs of the EUROSTAT database, were performed under the FoodDecide research project. FoodDecide aims to develop software solutions for Montenegrin food companies and government agencies, that could enhance food safety and expedite investigations into foodborne disease outbreaks. Previous versions of the WI tool were developed under DEMETER (https://h2020-demeter.eu/) and ZooGlow (https://www.sifo.de/sifo/de/projekte/schutz-kritischer-infrastrukturen/sicherung-der-lebensmittel-und-lebensmittelwarenketten/zooglow/zooglow_node.html) projects.
WarenstromInfo: a tool for the easy extraction and visualisation of trade flow data.
WI is an easy-to-use web-based tool that extracts and visualises trade flow data from the EUROSTAT and BACI databases.
EUROSTAT (https://ec.europa.eu/eurostat/databrowser/view/DS-059313/default/table?lang=en) is an online monthly updated database that provides data on commodities with two-month delay via REST API. Commodity market values in euros (€) and quantities in kilograms are provided as official data reported between European Union (EU) countries and the entire world. EUROSTAT is based on the Common Nomenclature (CN) that is, in turn, based on the Harmonised Commodity Description and Coding System (managed by the World Customs Organisation (WCO)) (
In contrast, BACI (http://www.cepii.fr/CEPII/en/bdd_modele/bdd_modele_item.asp?id=37) is an online yearly updated database that provides encrypted data on commodities with two years of delay. BACI commodity market values, provided in thousands of dollars ($) and quantities in metric tonnes, are the result of multiple analysis of the UN COMTRADE data amongst all countries in the world. These analyses try to estimate more accurate trade results (
While WI requests all commodity trade data from EUROSTAT on a monthly basis via the REST API, only agrifood trade data from BACI are accessible on a yearly basis via the WI web service. This is because the BACI database has to be stored locally on the KNIME Server. To optimise storage space on the KNIME Web Server infrastructure and meet the specific interest and purpose of WI, a BACI dataset containing only agrifood commodities is built. However, this dataset could be expanded according to new objectives. Further, WI can also be run locally on a local PC by downloading the WI ZIP folder (
The local approach allows us to customise the SQLite BACI database and, therefore, to access any trade data beyond the agrifood data. This procedure requires the installation of the KNIME Analytics Platform (https://www.knime.com/knime-analytics-platform) on a local PC, along with the requisite proficiency in operating the tool. The readme file, provided in the WI ZIP folder, serves as a guide for users to install and execute the WI service locally. Note that KNIME is compatible with Windows, Linux and Mac operating systems.
A step-by-step description of the WI operation is provided in Figures 1-4. The web portal header (upper horizontal grey-coloured section of the UIs; Figs
WI User Interfaces: UI4. The UI4 provides an overview of the final WI outputs, displaying a list with the initial input variables (L), three maps displaying trade flows (a world map, a world map focused on Europe and an European map) (M), download options (N), slide filter bars for value and weight preview (O) and trade flow data in the preview table (P).
WI provides four different UIs with a similar appearance for both databases, enabling input variables to be entered (UI1, UI2 and UI3) and the visualisation of the outputs in place (UI4) or via email (Figs
For the BACI dataset extraction in the UI2, the HS version selection and year range selection options are linked. The HS system has been revised six times and consequently six versions of the same standard are available. The selection of an HS version has an immediate reaction in the year range selection options displayed. Thus, if the HS95 (i.e. 1995) is selected, all the years from 1995 are offered, while if the HS07 (i.e. 2007) is selected, only those from 2007 are offered. This feature allows users to use harmonised trade data from different sources. For instance, the Food and Agriculture Organisation Corporate Statistical Database (FAOSTAT) website disseminates statistical data collected and maintained by the Food and Agriculture Organisation (FAO). These data apply the FAO standard and FAOSTAT provides a list that shows the correspondence of a FAO product/item with two HS versions (i.e. HS 07 and HS12) (
For the extraction of the EUROSTAT database in the UI2, flow selection features allow selecting the type of trade flow requested (i.e. import or export). In contrast to the BACI database that only provides a single market value and weight for commodities in a trade flow, EUROSTAT reports official import and export market values and weights, based on the EU reporting country (defined as Declarant in the tool). There might be a high variability on market values and weights according to the EU reporting country. This variability is clearly grounded and defined as bilateral asymmetries (
In order to facilitate the commodity selection, two filters are displayed in the product table. The first one is called the “pre-filter” option and enables the user to filter for a commodity term (e.g. milk), initial product code (e.g. 01 or 011), the whole commodity code or even show all commodities (i.e. *) by clicking on the “upload table below” button. The second one, a “live-filter” option assembled in the upper part, enables the same search functionalities as the “pre-filter” option, but interactively. This table also offers the individual commodity selection and the “all commodity selection” options via check boxes.
Besides, a further option gives the opportunity to send the output dataset and visualisation maps via email after executing the UI3. In this way, users can save time waiting for extensive data requests. The email address only needs to be entered once, as it is stored together with the username, leading to the automatic subsequent display of it. Note that the email service exclusively operates within the KNIME server.
Once the UI3 is executed, trade flow data extracted via SQL (BACI) or API (EUROSTAT) are displayed (UI4).
a. List of initial input variables: The first table enables the visualisation of all initial input variables requested by the user, supporting the identification of potential lacking trade flows in the final WI outputs.
If the search includes more than 10 countries and/or commodities, the list of initial input variables will not display all the selected countries or commodities, but instead, the following messages will appear: “More than 10 products selected”, “All countries selected” or “More than 10 countries selected”.
b. Trade flows maps: Visualisation and download options are available for each of the three available map images in png format (i.e. World map, World map zoomed in Europe and a Europe map). Maps display trade flows through arrows facilitating the understanding of trade flow data. These arrows show different thicknesses indicating the magnitude of the commodity trade flow (the higher the trade flow value, the thicker the arrow). In addition, large commodity flows, compared to other flows in the dataset, are highlighted with red arrows. Note that visualisation images are provided in the WI ZIP folder for illustrative purposes. Further, the enhancement of the user experience through visualisation improvements could potentially occur in the future.
c. Excel file: A preformatted Excel file is generated to effectively display professional-looking data by colouring, aligning and providing filtering options to the column titles. A sample Excel file can be found within the WI ZIP folder for reference. This Excel file eases the user work by offering the following pre-configured data management tabs:
·Trade flow data: It displays all trade data extracted with the highest granularity showing different columns for: countries involved (importers and exporters for BACI and declarant and partners for EUROSTAT), EUROSTAT database identification number (only for EUROSTAT), commodity, year (in case of EUROSTAT, month of the data extraction), market value, product weight (either market value or weight for EUROSTAT) and HS version (only for BACI).
·Prod_flow_selected_period: This table displays trade data grouped by countries involved and commodity.
·Total_sum: This table displays trade data grouped only by countries involved.
·Product_per_year: This table displays data grouped by declarant, partner, commodity and year (only for EUROSTAT).
·Initial_variables: All initial variables are displayed.
d. Trade flow data in a preview table: Two sliding filter bars enable the user to define the range of the shown market values and weights of the commodities in the preview table. In case of EUROSTAT, only one sliding filter bar is shown, being in line with the desired output value type selected (i.e. market value or weight of the commodities). Slide bars are disabled if a single trade flow datum is shown.
In this preview table, the user can set up the number of entries per table and page, sort columns in ascending or descending order and filter any term in the table. This table is designed solely to facilitate data evaluation and comprehension.
WI is accessible via the KNIME Server WebPortal upon request to 4sz@bfr.bund.de or downloading the WI ZIP folder (
The research leading to these results received funding from:
CC BY 4.0
There are a multitude of tools that allow or facilitate the extraction and visualisation of trade data, some of them being open-source. This is the case for R (Eurostat - e.g. restatapi (
We recognise that there may be a possible concern regarding the annual update frequency for BACI data and its reflection of the dynamic global sourcing industry. To address this, our approach integrates monthly extractions from the Eurostat API, providing a more granular and up-to-date perspective on economic activities. While BACI data offers a comprehensive annual overview, the monthly Eurostat updates contribute real-time insights, ensuring our tool captures both the long-term trends and the ongoing dynamics within the industry. This dual-frequency approach strikes a balance between data availability and the need for timely, detailed information, enhancing the tool's efficacy in supporting informed decision-making.
This versatile tool finds utility across diverse professional domains that demand the extraction of trade data. In the field of food safety, such trade data could be of great importance not only for outbreak investigation, but also for other areas, such as risk assessment and predictive analysis and forecasting.
For the creation of the WI infrastructure, the user-friendly graphical workbench for the data science KNIME Analytics Platform (
Each executed KNIME node stores all its specific data outputs. Therefore, an executed KNIME workflow that encloses a variety of different connected nodes may store the whole data analysis process including all intermediate results. Workflows can be extended, adapted, manipulated and implemented with new functionalities because of the KNIME modular principle. For the WI KNIME workflow creation, the subscription-based KNIME Web Server infrastructure was used that allows access to different versions of the workflow showing the progression in the acquisition of new workflow functionalities.
This workflow consists of nodes, meta-nodes and component nodes. Although meta-nodes and component nodes group nodes, the component nodes also enable the creation of web-based UIs by using KNIME widget nodes. Each UI is made up of component nodes within other component nodes that are placed for the composite view creating the final UI layout.
Differences on both databases have a great impact on the WI workflow implementation.
BACI database is freely available as downloadable ZIP files (http://www.cepii.fr/CEPII/en/bdd_modele/bdd_modele_item.asp?id=37). Each of the six compressed files consists of three types of csv files:
In order to build the BACI dataset for WI, ZIP files are annually downloaded (http://www.cepii.fr/CEPII/en/bdd_modele/bdd_modele_item.asp?id=37) and decompressed. The BACI SQLite database builder workflow was created to read each csv file, classify it according to its content, filter agrifood data and store it as a SQLite database. The latter database consists of three datasets that are in line with the three types of csv files available (i.e. dataset enclosing (a) product codes, (b) country codes and (c) coded data on trade flows). This BACI SQLite database is hosted in the internal and external KNIME Web Server infrastructure and is used by WI providing trade flow data quickly and efficiently. Just in case this tool is accessed locally, the user has to create his/her own BACI SQLite database by executing the BACI SQLite database builder workflow. This workflow also allows the user to customise the BACI SQLite database (Fig.
A screenshot of the BACI SQLite database builder workflow.
The workflow presented here is organised into two main sections, with interconnected KNIME nodes displayed in each section. The yellow-boxed nodes are responsible for creating and loading the database where the data are stored. The green-boxed section shows:
Variable flow connections (red lines) can be removed in the second green-boxed section, if an HPC (High Performance Computing) service is available. This will allow the workflow to run much faster as the nodes will not be executed one after the other, but all at the same time.
The WI infrastructure was established through the utilisation of the KNIME Analytics Platform (version 4.5.1), complemented by the integration of the following dependencies:
This work was supported by funding from:
Octavio Mesa-Varona - Writing - original draft, Software
Carolina Plaza-Rodríguez - Writing - review and editing
Lars Valentin - Writing - review and editing, Software
Matthias Filter - Funding acquisition, Project administration, Writing - review and editing