Research Ideas and Outcomes : Data Paper (Generic)
Groundwater quality dataset of Semarang area, Indonesia
Dasapta Erwin Irawan‡, Thomas Triadi Putranto§, Achmad Darul|
‡ Institut Teknologi Bandung, Bandung, Indonesia
§ Universitas Diponegoro, Semarang, Indonesia
| Institut Teknologi dan Sains Bandung, Bekasi, Indonesia
Open Access

Abstract


Regional environmental changes are affecting groundwater ecosystems in the Semarang area. The development of new settlements, industrial complexes, and trade centers has degraded the groundwater setting of the city, which serves as the capital of Central Java Province. This has led us to compile several groundwater quality datasets collected from 1992 to 2007. Our original motivation is to provide an open dataset that can be used as a baseline for groundwater monitoring.

The dataset consists of 58 samples taken in 1992, 1993, 2003, 2006, and 2007, using well point data from several reports by the Ministry of Energy and Mineral Resources, engineering consultants, and researchers from Universitas Diponegoro and Institut Teknologi Bandung. Each site has a set of 20 physical and chemical variables.

Keywords


groundwater management, groundwater quality, Semarang, Central Java, Indonesia

Introduction


Groundwater is an important component of the hydrologic cycle. Not only has it been recognized as a fundamental geologic agent in natural processes, it is now also an object of scientific, practical, and economic interest (Toth 2009).

This paper briefly describes the dataset from our project "Hydrochemical assessment of Semarang Groundwater Quality", which took place in Semarang City, Indonesia. The aim of the project is to understand the classification and distribution of water quality in the Semarang area and to explain the underlying processes. This analysis is important given the rapid development of infrastructure (Putranto and Rüde 2016) and urban settlement in the coastal area, and the rate of salinity encroachment (Rahmawati and Marfai 2013).

This paper serves as the starting point of the project. It gives readers a first impression of our data: how the samples were taken, how the data were prepared, and the preliminary characteristics of the data according to basic statistics.

Data documentation

The dataset was gathered from several reports from the Ministry of Energy and Mineral Resources (MEMR), Universitas Diponegoro, and Institut Teknologi Bandung. It was previously offline, available only in print, written in Bahasa Indonesia (the Indonesian language), and without a standard research data management plan. This dataset can therefore be considered one of the first to be published as a data paper in a peer-reviewed journal. The following section contains the documentation of the dataset.

All 58 samples were taken in 1992, 1993, 2003, 2006, and 2007. We do not have complete annual measurements because of a lack of funding. The dataset relied mainly on the limited budget of the MEMR, so data availability was closely tied to shifting funding priorities, as the government has to monitor a total of 421 groundwater basins across Indonesia. More participation from stakeholders is needed.

The dataset reflects changes in government policy on water management, specifically groundwater. The 1992 and 1993 data capture the impact of centralized national policy on groundwater management, while the 2003-2007 data capture the impact of the regional autonomy era. This policy shift has led to increased groundwater pumping since the early 2000s. The compilation process refers to the Water Management Framework Directive (Hatvani et al. 2014).

Each observation has 20 variables, which are explained in more detail in the metadata section: sample id, coord X, coord Y, well depth, water level, water elevation, TDS, pH, EC, K, Ca, Na, Mg, Cl, SO4, HCO3, year, ion balance, screen location, and chemical facies. The chemical composition was tested in the Water Quality Laboratory of MEMR using the SNI (Indonesian National Standard) methods for water and waste water quality testing. The original dataset is available in the Pangaea Repository (Irawan and Putranto 2016). We have omitted data points with only a street address and no fixed coordinates (see Fig. 1).
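The ion balance variable listed above can be illustrated with a minimal sketch. The source does not state which formula was used for the `ionbal` column, so the conventional charge balance error is assumed here: concentrations in ppm are treated as mg/L, converted to meq/L with standard equivalent weights, and compared as 100 × (cations − anions) / (cations + anions). The function names and the sample values are hypothetical and are not taken from the dataset.

```python
# Equivalent weights in g/eq: molar mass divided by ionic charge.
EQ_WEIGHT = {
    "ca": 40.08 / 2,    # Ca2+
    "mg": 24.305 / 2,   # Mg2+
    "na": 22.99,        # Na+
    "k": 39.10,         # K+
    "cl": 35.45,        # Cl-
    "so4": 96.06 / 2,   # SO4 2-
    "hco3": 61.02,      # HCO3-
}
CATIONS = ("ca", "mg", "na", "k")
ANIONS = ("cl", "so4", "hco3")


def to_meq(sample):
    """Convert ion concentrations from ppm (assumed equal to mg/L) to meq/L."""
    return {ion: sample[ion] / EQ_WEIGHT[ion] for ion in EQ_WEIGHT}


def ion_balance(sample):
    """Return the charge balance error in percent.

    Assumed formula: 100 * (cations - anions) / (cations + anions),
    with both sums expressed in meq/L.
    """
    meq = to_meq(sample)
    cations = sum(meq[ion] for ion in CATIONS)
    anions = sum(meq[ion] for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)


# Hypothetical sample for illustration only; not a row from the dataset.
sample = {"ca": 40.0, "mg": 12.0, "na": 23.0, "k": 4.0,
          "cl": 35.0, "so4": 48.0, "hco3": 122.0}
print(round(ion_balance(sample), 2))
```

A positive result indicates an excess of cations and a negative result an excess of anions. The acceptance threshold is a choice left to the analyst and is not specified in the source.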

Figure 1.

The map of sampling locations

a: Map of Indonesia. The red box marks the Semarang area.
b: Sampling locations in the Semarang area.

The data can be used for univariate or multivariate statistical analysis (Irawan et al. 2009). In our publication roadmap, we will publish a multivariate analysis, carried out in R, to understand the nature of the groundwater and hydrogeological system in the area. The dataset is also useful to support geophysical prospecting (Dahrin 2007) and isotopic analysis (Harnandi and Susana 2008) in the area.
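As an illustration of the univariate use mentioned above, the following minimal sketch groups one variable by sampling year and reports basic statistics. It is written in Python with the standard library only, not in R as planned by the authors. The function name `summarize_by_year` and the example rows are hypothetical, and the rows are assumed to be dictionaries keyed by the variable names given in the metadata.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_year(rows, variable):
    """Group one numeric variable by sampling year and compute basic statistics.

    Rows where the variable is missing (None) are skipped.
    """
    groups = defaultdict(list)
    for row in rows:
        value = row.get(variable)
        if value is not None:
            groups[int(row["year"])].append(value)
    return {
        year: {
            "n": len(values),
            "min": min(values),
            "median": median(values),
            "mean": mean(values),
            "max": max(values),
        }
        for year, values in sorted(groups.items())
    }


# Hypothetical rows for illustration only; not values from the dataset.
rows = [
    {"year": 1992, "tds": 400.0},
    {"year": 1992, "tds": 600.0},
    {"year": 2007, "tds": 900.0},
    {"year": 2007, "tds": None},
]
for year, stats in summarize_by_year(rows, "tds").items():
    print(year, stats)
```

Grouping by year keeps the 1992-1993 and 2003-2007 sampling periods separate, so the two policy eras described earlier can be compared directly.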

The dataset is a table of groundwater quality data. It consists of 58 rows (observations) and 20 columns (variables). Each variable is described in the metadata section. We used the following equipment during data collection:

  1. Coordinates were determined using a portable Garmin eTrex 10
  2. Physical parameters (temperature, TDS, EC, and pH) were measured using a portable Lutron YK-22CT
  3. Water levels were measured using a Solinst Model 101B


The following is the metadata of the dataset:

  • Title of the dataset: Dataset: hydrochemical assessment of Semarang area, Indonesia
  • Creator:
    • 1st creator: Irawan, Dasapta Erwin
    • 2nd creator: Putranto, Thomas Triadi
  • Publisher:
  • Date of publication: 2016-07-07
  • Resource type: dataset
  • Contributor: Geological Agency of the Ministry of Energy and Mineral Resources
  • Location: Semarang area
  • Related journal article: Putranto and Rüde 2016
  • License/rights: CC-BY International 4.0
  • Funding:
  • Technical metadata:
    • Dataset size: 8.5 KB
    • File format: csv
    • File name: data_smg.csv
  • Variable names, type, units, explanation:
    • no, numeric, none, consecutive number
    • location, text, none, local area name
    • id, numeric, none, sample identity
    • depth, numeric, meter, depth of well
    • wat_level, numeric, meter, elevation of water level
    • elev, numeric, meter, elevation
    • tds, numeric, ppm, total dissolved solids
    • ph, numeric, none, pH
    • ec, numeric, unit not reported, electrical conductivity
    • k, numeric, ppm, potassium
    • ca, numeric, ppm, calcium
    • mg, numeric, ppm, magnesium
    • na, numeric, ppm, sodium
    • so4, numeric, ppm, sulphate
    • cl, numeric, ppm, chloride
    • hco3, numeric, ppm, bicarbonate
    • year, numeric, none, year taken
    • ionbal, numeric, percentage, ion balance
    • coord_x, numeric, none, x coordinate
    • coord_y, numeric, none, y coordinate
    • screen, numeric, meter, depth of well screen
    • aquifer facies, categorical, none, type of aquifer hydrochemical facies
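The metadata above can be used to read the file with typed columns. The following is a minimal sketch using Python's standard `csv` module. It assumes that the header row of `data_smg.csv` uses the variable names exactly as listed, which has not been verified against the file. The function `parse_rows` and the two-row excerpt are hypothetical and serve only as an illustration.

```python
import csv
import io

# Columns documented as text or categorical; all others are treated as numeric.
TEXT_COLUMNS = {"location", "aquifer facies"}


def parse_rows(handle):
    """Read rows from a CSV file object and convert numeric columns to float.

    Empty cells become None so that missing measurements stay visible.
    """
    rows = []
    for raw in csv.DictReader(handle):
        row = {}
        for name, value in raw.items():
            if name is None:
                # Extra cells beyond the header row are ignored.
                continue
            value = (value or "").strip()
            if name in TEXT_COLUMNS:
                row[name] = value
            else:
                row[name] = float(value) if value else None
        rows.append(row)
    return rows


# Hypothetical excerpt with a subset of columns; values are illustrative only.
demo = io.StringIO(
    "no,location,depth,tds,ph,year\n"
    "1,Site A,60,450,7.1,1992\n"
    "2,Site B,85,,6.8,2003\n"
)
rows = parse_rows(demo)
print(len(rows), rows[1]["tds"])
```

To read the published file instead of the excerpt, the same function can be given a handle opened with `open("data_smg.csv", newline="")`, provided the header assumption holds.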

Data access and intellectual property

The dataset can be freely accessed from the Pangaea Repository (Irawan and Putranto 2016). The intellectual property holders are the creators of the dataset: Putranto, Thomas Triadi; Rüde, Thomas; and Irawan, Dasapta Erwin.

Data sharing and re-use

The dataset is shared openly through the Pangaea Repository (Irawan and Putranto 2016) and can be formally cited, distributed, and re-used under the CC-BY-4.0 license.

Data preservation and archiving

The data are preserved and archived in the Pangaea Repository in accordance with the repository's terms and conditions.

Acknowledgements


We thank the Office for Energy and Mineral Resources of Central Java, representing the Ministry of Energy and Mineral Resources, for permission to publish the data.

Funding program

Universitas Diponegoro Research Grant, Institut Teknologi Bandung Research Grant

Grant title

Universitas Diponegoro Research Grant, Institut Teknologi Bandung Research Grant

Author contributions

All authors contributed equally to this data paper.

Conflicts of interest

The authors declare no conflicts of interest regarding the publication of this data paper.

