<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//TaxonX//DTD Taxonomic Treatment Publishing DTD v0 20100105//EN" "../../nlm/tax-treatment-NS0.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tp="http://www.plazi.org/taxpub" article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">17</journal-id>
      <journal-id journal-id-type="index">urn:lsid:arphahub.com:pub:8E638694-B4E0-570A-856A-746FF325BF6B</journal-id>
      <journal-title-group>
        <journal-title xml:lang="en">Research Ideas and Outcomes</journal-title>
        <abbrev-journal-title xml:lang="en">RIO</abbrev-journal-title>
      </journal-title-group>
      <issn pub-type="epub">2367-7163</issn>
      <publisher>
        <publisher-name>Pensoft Publishers</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3897/rio.8.e94485</article-id>
      <article-id pub-id-type="publisher-id">94485</article-id>
      <article-id pub-id-type="manuscript">20485</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Conference Abstract</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>FAIR Digital Objects in Official Statistics</article-title>
      </title-group>
      <contrib-group content-type="authors">
        <contrib contrib-type="author" corresp="yes">
          <name name-style="western">
            <surname>ten Bosch</surname>
            <given-names>Olav</given-names>
          </name>
          <email xlink:type="simple">o.tenbosch@cbs.nl</email>
          <uri content-type="orcid">https://orcid.org/0000-0002-1943-7558</uri>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>de Jonge</surname>
            <given-names>Edwin</given-names>
          </name>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>Laloli</surname>
            <given-names>Henk</given-names>
          </name>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>Laaboudi-Spoiden</surname>
            <given-names>Christine</given-names>
          </name>
          <xref ref-type="aff" rid="A2">2</xref>
        </contrib>
      </contrib-group>
      <aff id="A1">
        <label>1</label>
        <addr-line content-type="verbatim">Statistics Netherlands, The Hague, Netherlands</addr-line>
        <institution>Statistics Netherlands</institution>
        <addr-line content-type="city">The Hague</addr-line>
        <country>Netherlands</country>
      </aff>
      <aff id="A2">
        <label>2</label>
        <addr-line content-type="verbatim">Eurostat, Luxembourg, Luxembourg</addr-line>
        <institution>Eurostat</institution>
        <addr-line content-type="city">Luxembourg</addr-line>
        <country>Luxembourg</country>
      </aff>
      <author-notes>
        <fn fn-type="corresp">
          <p>Corresponding author: Olav ten Bosch (<email xlink:type="simple">o.tenbosch@cbs.nl</email>).</p>
        </fn>
        <fn fn-type="edited-by">
          <p>Academic editor: </p>
        </fn>
      </author-notes>
      <pub-date pub-type="collection">
        <year>2022</year>
      </pub-date>
      <pub-date pub-type="epub">
        <day>12</day>
        <month>10</month>
        <year>2022</year>
      </pub-date>
      <volume>8</volume>
      <elocation-id>e94485</elocation-id>
      <uri content-type="arpha" xlink:href="http://openbiodiv.net/23BC3DFF-410C-5130-A88C-740BD1EEB0AF">23BC3DFF-410C-5130-A88C-740BD1EEB0AF</uri>
      <uri content-type="zenodo_dep_id" xlink:href="https://zenodo.org/record/0">0</uri>
      <permissions>
        <copyright-statement>Olav ten Bosch, Edwin de Jonge, Henk Laloli, Christine Laaboudi-Spoiden</copyright-statement>
        <license license-type="creative-commons-attribution" xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">
          <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
        </license>
      </permissions>
      <abstract>
        <label>Abstract</label>
        <p>
          <bold>Introduction*<xref ref-type="fn" rid="FN8007675">1</xref></bold>
        </p>
        <p>Statistical offices at national and international level provide statistics on demography, labour, income, society, economy, environment and other domains. Their collective output is usually referred to as ‘<ext-link ext-link-type="uri" xlink:href="https://ec.europa.eu/eurostat/statistics-explained">Official Statistics</ext-link>’. These offices have a long tradition of publishing data fairly and openly, which is often part of their mission statement. For decades they have been providing websites with articles, press releases, graphs and tables of data for free, for research, for policy-making, and for common understanding. However, users often find it hard to locate the data they need, to (re-)use it in data-driven work, or to refer to the right (sub)set of data in a sustainable way. Therefore, in this article we take a closer look at Official Statistics from a findability, accessibility, interoperability, and reusability (<ext-link ext-link-type="uri" xlink:href="https://www.go-fair.org/">FAIR</ext-link>) perspective.</p>
        <p>
          <bold>Digital Objects in Statistics</bold>
        </p>
        <p>Digital objects in official statistics can be identified on multiple levels. The core concept is the <italic>statistical fact</italic>: a number describing a certain estimate of a certain phenomenon in a certain population over a certain period of time. For example, the estimated number of elderly inhabitants in the province of Friesland (the Netherlands) on Jan 1, 2020, and the inflation in Belgium for fruits in 2021 are both statistical facts. Each of these statistical facts is uniquely defined and published as a digital object in the online statistical databases of <ext-link ext-link-type="uri" xlink:href="https://opendata.cbs.nl/statline/#/CBS/en/dataset/37259eng/table?dl=6B38E">Statistics Netherlands</ext-link> and <ext-link ext-link-type="uri" xlink:href="https://ec.europa.eu/eurostat/databrowser/view/PRC_HICP_AIND/default/table">Eurostat</ext-link> respectively. Statistical facts may have a production status (provisional, final, revised) and are typically visualized as a number in a table cell or in a chart.</p>
        <p>Data without metadata are without meaning. A statistical fact refers to metadata (region, time, subject, population, uncertainty, quality etc.) which are essential to understand the context of the fact. We make a distinction here between <italic>structural</italic> or <italic>conceptual</italic> <italic>metadata</italic>, i.e. the structure and definitions of concepts, dimensions and types of data used, and <italic>referential metadata</italic>, i.e. descriptive information on the dataset. The metadata are of utmost importance to the data consumer to understand the data. Metadata have their own dynamics, e.g. classifications change over time. They are published as digital objects too, for example the statistical classification of economic activities (<ext-link ext-link-type="uri" xlink:href="https://ec.europa.eu/eurostat/web/nace-rev2">NACE</ext-link>).</p>
        <p>Statistical facts and their metadata form the foundation for higher-level statistics products. News releases and thematic articles that explain statistics in a broader context are examples. This higher-level content can be seen as digital objects too, as it is usually the main entry point for the general public and search engines and thus makes the underlying statistics findable and accessible.</p>
        <p>
          <bold>Standards and FAIR</bold>
        </p>
        <p>Each digital object in official statistics has its own structure, dynamics, dissemination channels and standards. This can sometimes make it hard to work with data from official statistics.</p>
        <p>Statistical databases differ among statistical organizations, both technically and in their metadata and the APIs that they offer for automated access. The main standards in this field are the Statistical Data and Metadata eXchange (<ext-link ext-link-type="uri" xlink:href="https://www.sdmx.org/">SDMX</ext-link>), <ext-link ext-link-type="uri" xlink:href="https://json-stat.org/">JSON-stat</ext-link>, <ext-link ext-link-type="uri" xlink:href="https://www.odata.org/">OData</ext-link>, and simple formats such as <ext-link ext-link-type="uri" xlink:href="https://www.rfc-editor.org/info/rfc4180">CSV</ext-link>. Commonly agreed structural metadata is organized into SDMX registries (<ext-link ext-link-type="uri" xlink:href="https://registry.sdmx.org/">global registry</ext-link>, <ext-link ext-link-type="uri" xlink:href="https://webgate.ec.europa.eu/sdmxregistry/">Eurostat registry</ext-link>), which provide automated access to statistical metadata and thereby improve accessibility.</p>
        <p>The SDMX standard is targeted specifically at statistical and financial data, which may hinder wider reusability. Therefore some statistical offices are moving to semantic standards. One example is the set of <ext-link ext-link-type="uri" xlink:href="https://vocabs.cbs.nl/en">vocabularies and classifications</ext-link> published as linked open data by Statistics Netherlands. Publishing metadata this way makes it possible to reuse and link data across organizations and gives semantic structure that is machine readable. Another example comes from the statistical office of the European Union, Eurostat, which is converting the statistical classifications and correspondence tables from their current <ext-link ext-link-type="uri" xlink:href="https://europa.eu/!dC33xR">metadata system</ext-link> into Linked Open Data on the <ext-link ext-link-type="uri" xlink:href="https://op.europa.eu/en/web/eu-vocabularies/business-collections">EU Vocabularies website</ext-link>. The representation is based on <ext-link ext-link-type="uri" xlink:href="https://ddialliance.org/Specification/RDF/XKOS">XKOS</ext-link>, an ontology for modelling statistical classifications, offering machine-readable access for reusing objects as well as facilitating linking among classifications at national, EU or international level. Yet another initiative comes from the United Nations Economic Commission for Europe (UNECE), where statistical organizations collectively develop a <ext-link ext-link-type="uri" xlink:href="https://linked-statistics.github.io/COOS/coos.html">Core Ontology for Official Statistics</ext-link> (COOS) describing the statistical production process. All in all, for structural metadata, statistical organizations are increasingly moving towards linked data standards to align better with non-statistical communities.</p>
        <p>In the field of referential metadata the Single Integrated Metadata Structure (<ext-link ext-link-type="uri" xlink:href="https://ec.europa.eu/eurostat/data/metadata/metadata-structure">SIMS</ext-link>) is used. It offers machine-readable descriptive metadata such as unit of measure, reference period, confidentiality, quality, accuracy etc. Some of the elements are also covered in the widely used RDF-based Data Catalog Vocabulary (<ext-link ext-link-type="uri" xlink:href="https://www.w3.org/TR/vocab-dcat-2/">DCAT</ext-link>) and its statistical variant (<ext-link ext-link-type="uri" xlink:href="https://unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.58/2017/mtg3/2017-UNECE-topic-i-EC-StatDCAT-ap-paper__1_.pdf">StatDCAT-AP</ext-link>), which raises the question whether further integration of these could improve the FAIRness of statistical referential metadata.</p>
        <p>With respect to higher-level digital objects, such as statistical articles, semantic web ontologies such as <ext-link ext-link-type="uri" xlink:href="https://schema.org/">schema.org</ext-link> and <ext-link ext-link-type="uri" xlink:href="https://www.dublincore.org/">Dublin Core</ext-link> are increasingly being used to annotate statistical output in common terms. The use of <ext-link ext-link-type="uri" xlink:href="https://www.doi.org/">Digital Object Identifiers</ext-link> (DOIs) where applicable makes it easier to refer to statistical output.</p>
        <p>From the above we can see that the use of different standards at different levels creates various ways to identify statistical content, such as Uniform Resource Names (<ext-link ext-link-type="uri" xlink:href="https://www.rfc-editor.org/info/rfc8141">URN</ext-link>s), SDMX identifiers, Digital Object Identifiers (<ext-link ext-link-type="uri" xlink:href="https://www.doi.org/">DOI</ext-link>s), Uniform Resource Identifiers (<ext-link ext-link-type="uri" xlink:href="https://datatracker.ietf.org/doc/html/rfc3986">URI</ext-link>s) or organization-specific identifiers. Although they probably all satisfy <ext-link ext-link-type="uri" xlink:href="https://www.go-fair.org/fair-principles/metadata-retrievable-identifier-standardised-communication-protocol/">FAIR principle A1</ext-link>, from a user perspective it would be good to minimize variety here.</p>
        <p>
          <bold>Wrap-up</bold>
        </p>
        <p>Although official statistics have a long tradition and experience in publishing open data, the FAIR principles are an excellent vehicle to further improve findability and enable data-driven work. Openness is not enough: the facts, the structural and referential metadata, and the higher-level statistical digital objects should ideally all be optimized from a FAIR point of view. The mix of standards being used at various levels and the distributed statistical system in official statistics may hinder reusability. The move to semantic interoperability via generally accepted linked data standards is ongoing and promises to increase the reusability of statistics in a broader web of (meta)data. This makes trusted statistics more FAIR, that is, more searchable, findable and interpretable, which is necessary for further integration of official statistics into wider communities.</p>
      </abstract>
      <kwd-group>
        <label>Keywords</label>
        <kwd>statistical (meta)data</kwd>
        <kwd>semantic web</kwd>
        <kwd>findability</kwd>
        <kwd>interoperability</kwd>
        <kwd>SDMX</kwd>
        <kwd>ontologies</kwd>
        <kwd>classifications</kwd>
      </kwd-group>
      <counts>
        <fig-count count="0"/>
        <table-count count="0"/>
        <ref-count count="0"/>
      </counts>
    </article-meta>
    <notes>
      <sec sec-type="Presenting author">
        <title>Presenting author</title>
        <p>Olav ten Bosch</p>
      </sec>
      <sec sec-type="Presented at">
        <title>Presented at</title>
        <p>First International Conference on FAIR Digital Objects, FDO2022, presentation</p>
      </sec>
    </notes>
  </front>
  <back>
    <fn-group>
      <fn id="FN8007675">
        <label>*1</label>
        <p>The views expressed in this paper are those of the authors and do not necessarily reflect the policies of their institutes.</p>
      </fn>
    </fn-group>
  </back>
</article>
