Corresponding author: Álvaro Briz-Redón (

Academic editor:

Spatial statistics is an important field of data science with applications in very different areas of study, such as epidemiology, criminology, seismology, astronomy and econometrics, among others. In particular, spatial statistics has frequently been used to analyze traffic accident datasets with explanatory and preventive objectives. Traditionally, these studies have employed spatial statistics techniques at some level of areal aggregation, usually related to administrative units. However, the last decade has brought an increasing number of works on the spatial incidence and distribution of traffic accidents at the road level by means of the spatial structure known as a linear network. This change seems positive because it could provide deeper and more accurate investigations than previous studies based on areal spatial units. Working at the road level entails some technical difficulties due to the high complexity of these structures, especially in terms of manipulation and rectification. The R Shiny app SpNetPrep, which is available online and via an R package of the same name, aims to provide functionalities that could be useful for users interested in performing a spatial analysis over a road network structure.

Spatial statistics studies have commonly been based on geographic structures made of polygons that represent administrative or political divisions of different order, depending on the size of the region being analyzed and on the specific interest of the researchers. More specifically, the basic spatial units in these studies have ranged from larger (countries or counties) to smaller (cities, boroughs, census tracts, etc.), which allows the use of the information that is usually available for this kind of population unit.

However, recent years have brought a growing number of spatial analyses defined over network structures, which allow a better understanding of some spatial point patterns of great interest. Basically, the use of spatial networks has become quite frequent when the events under study actually take place on roads, streets, highways, etc., which makes it necessary to discard most of the area of the zone of analysis if an accurate investigation is intended. Therefore, linear networks are particularly well suited to analyzing the spatial distribution of traffic accidents (

Let us now review some basic terminology about linear networks in the context of spatial statistics. A planar linear network,

A point process

The

The main feature provided by the

Given the technical difficulties that the development of a spatial analysis over a linear network implies, the

Users can obtain a road network of interest via the OpenStreetMap (OSM) platform (

When the user has a road network in a suitable R format (these formats will be described later), the

The manual editing (or curation) of a linear network representing a road structure is an important step that must be taken in order to correct possible mistakes (outdated road configurations), remove undesired parts (pedestrian or secondary roads, depending on the application) and simplify zones of the network whose complexity could obscure the analysis being performed (which is often particularly noticeable at roundabouts or complex intersections).

Furthermore, in view of the difficulties that can sometimes arise when trying to obtain a road network structure, the "Network Edition" section of the

Another important question to take into consideration when working with a linear network structure is its directionality. Depending on the kind of dataset being treated, network direction could be of no interest, but this should not be the case when analyzing traffic-related data. In fact, traffic flow could strongly influence some classical spatial analyses that arise from this kind of data. For example, in order to fit a spatial model to a collection of accident counts at the road segment level (for instance, with the

Again, it is not at all easy to find the information required to endow a linear network based on a road structure with a directionality. The network structures available in OSM contain some information regarding the direction of the streets, and some cartographic platforms include the direction of traffic (measured in angles) at some points of the structure, but, in general, it can be really hard to obtain such information for a road network of interest. For this reason, the "Network Direction" section of the

Once the network structure is properly curated and endowed with a direction (if necessary), a point pattern can be formed along the network structure from a dataset containing geocoded information. If the location of each event is given in the form of a postal address, the R package

Then, once the coordinates of the events of interest are available, regardless of the way they have been obtained, it is time to project them onto the linear network. This step can be achieved directly by using the (shortest) orthogonal projection of each pair of coordinates onto the linear network, for example with the

As a summary, Fig.

The present section includes some notes on certain technical aspects that need to be known in order to benefit from

There are two main classes coexisting in R that represent what a linear network is:

Coordinate Reference Systems (CRS) are essential to locate entities in space. Concretely, each CRS defines a specific coordinate system (often a map projection) that unambiguously determines the location of every point on the Earth, which makes it impossible to work simultaneously with two geographic objects described in different reference systems. The usual longitude and latitude coordinates, which range from -180° to 180° and from -90° to 90°, respectively, correspond to the WGS84 (World Geodetic System 1984) geographic coordinate system. One important characteristic of WGS84 is that it treats the whole world as a single zone, that is, a longitude-latitude pair in this CRS determines only one point on the Earth. However, this does not hold for the Universal Transverse Mercator (UTM) projection system, another well-known CRS that divides the world into 60 zones and whose coordinates are called easting and northing, in analogy with longitude and latitude (respectively). The UTM system is more convenient for performing statistical analysis given its higher level of accuracy (especially when working with small areas) and also because its coordinates are expressed in meters, which makes it very easy to compute distances. The
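As a small illustration of how the 60 UTM zones relate to longitude, the following package-independent sketch in plain Python computes the standard zone number for a WGS84 longitude-latitude pair, together with the EPSG code of the corresponding WGS84/UTM reference system. The function names `utm_zone` and `wgs84_utm_epsg` are hypothetical and introduced only for this example. The sketch ignores the special zone exceptions around Norway and Svalbard and does not perform the projection itself, which is better left to a dedicated library.

```python
import math


def utm_zone(longitude, latitude):
    """Return (zone_number, hemisphere) for a WGS84 longitude/latitude pair.

    Standard 6-degree zones only; the Norway/Svalbard exceptions are ignored.
    """
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must lie in [-180, 180]")
    if not -80.0 <= latitude <= 84.0:
        raise ValueError("UTM is only defined between 80 S and 84 N")
    # Zone 1 starts at longitude -180 and every zone spans 6 degrees.
    zone = int(math.floor((longitude + 180.0) / 6.0)) + 1
    # Assumption of this sketch: longitude 180 is assigned to zone 60.
    zone = min(zone, 60)
    hemisphere = "N" if latitude >= 0 else "S"
    return zone, hemisphere


def wgs84_utm_epsg(longitude, latitude):
    """EPSG code of the WGS84 / UTM system covering the given location."""
    zone, hemisphere = utm_zone(longitude, latitude)
    return (32600 if hemisphere == "N" else 32700) + zone
```

For instance, `utm_zone(-0.38, 39.47)` returns `(30, "N")`, and `wgs84_utm_epsg(-0.38, 39.47)` returns `32630`.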


Format. RDS has been chosen for all the files possibly involved during the use of the

Manual editing of the geometry of a linear network is one of the main purposes of the

There are four basic actions that can be performed to edit the linear network manually: "Join vertex", "Remove edge", "Add point (+edge)" and "Add two points (+edge)". The user only has to select the most convenient option and proceed intuitively. If "Remove edge" is selected, a click on an edge of the network (anywhere along its length) marks the edge in red, indicating that it is set for removal. Conversely, after choosing any of the options "Join vertex", "Add point (+edge)" or "Add two points (+edge)", the user needs to click on two points of the map according to the option selected. For the "Join vertex" option, two vertices must be clicked, whereas for "Add two points (+edge)" two points of the map that are not vertices have to be clicked. Finally, "Add point (+edge)" requires the user to click on a point and on a vertex of the network (in this order). The edges (and possibly vertices) added to the road network through these three options are marked in green. Clicking the button "Rebuild linear network" makes these manual edits effective and, when the map refreshes, the new (edited) road network is available to the user (it can be downloaded by clicking on the button at the bottom of the application). Now, let us see an example of use of the "Network Edition" section of the application (Fig.

The

For practical reasons, the use of a combined condition for the

The following lines include an application of the

Adding a direction to the road network according to traffic flow may be of interest in some situations. The

The "Network Direction" section of the application allows the user to endow the network with a direction according to traffic flow, which is facilitated by the presence of arrows indicating this information in the OSM layers. The option "Add flow" enables users to define a flow along the network by simply clicking on the two connected vertices that form the edge they want to give direction to (first click on the origin, second on the end, according to traffic flow). Analogously, "Remove flow" performs the opposite action by removing a previously defined direction, which requires selecting the two vertices that form the road segment whose direction is being eliminated (the order of the selections is not important). The function
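The behaviour of "Add flow" and "Remove flow" described above can be pictured with a small, package-independent sketch in plain Python. It is not the application's code: the class `DirectedNetwork`, its method names, and the representation of flows as a set of (origin, end) vertex pairs are assumptions made only for illustration.

```python
class DirectedNetwork:
    """Toy road network: undirected edges plus a set of directed flows."""

    def __init__(self, edges):
        # Each undirected edge is stored as a frozenset of its two vertex ids.
        self.edges = {frozenset(edge) for edge in edges}
        # Each flow is an ordered pair (origin, end) along an existing edge.
        self.flows = set()

    def add_flow(self, origin, end):
        """Give direction origin -> end to the edge joining both vertices."""
        if frozenset((origin, end)) not in self.edges:
            raise ValueError("the two vertices do not form an edge")
        self.flows.add((origin, end))

    def remove_flow(self, u, v):
        """Remove any direction on the edge {u, v}; the order is irrelevant."""
        self.flows.discard((u, v))
        self.flows.discard((v, u))

    def downstream_flows(self, origin, end):
        """Flows that start where origin -> end finishes (U-turns excluded)."""
        return sorted(
            (a, b) for (a, b) in self.flows if a == end and b != origin
        )
```

In this sketch a two-way street is represented by adding both orientations of the same edge. A helper such as `downstream_flows` hints at how neighbouring relationships that respect traffic flow could be derived once every segment has a direction.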

Even though the "Add flow" and "Remove flow" options are sufficient to give direction to the whole network, "Add long flow" and "Remove long flow" aim to save the user some time. These functions take advantage of the

The directionality of the linear network is stored in the form of a

Taking this information into account, users can establish neighbouring relationships between the road segments of their networks that respect traffic flow (employing functions from the

Finally, the

A point pattern that lies on a linear network can be created in R with the

As mentioned in the overview section of the paper, the automatic creation of a point pattern that lies on a linear network in R implies the orthogonal projection of a collection of geocoded events onto the network. This operation generally leads to an accurate representation of the observations, but it can produce some misplaced events along the road network. As suggested in Fig.

This paper has presented the main functionalities and purposes of the

The use of linear networks has recently become popular as a way to provide more realistic investigations of many events of interest that take place along road structures. However, dealing with linear networks can be considerably more complicated than using other typical spatial structures, which in some extreme cases could even lead researchers to discard them.

The

Another important step to perform before carrying out a spatial analysis over a linear network is the review of the point pattern being employed. Point patterns on linear networks are commonly built by orthogonally projecting a set of coordinates onto the linear network. Even though this works well most of the time, excessive simplification of the road structure or inaccuracies derived from the geocoding of the events could cause serious alterations in the pattern. The section "Point Pattern Edition" allows users to inspect and correct a point pattern that lies on a network.
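To make the orthogonal projection step concrete, the following package-independent sketch in plain Python snaps a point to the nearest location on a network given as a list of straight segments. The function names `project_onto_segment` and `snap_to_network` and the tuple-based representation are assumptions for illustration, not the interface of any R package. Coordinates are assumed to be planar (for example, UTM meters).

```python
import math


def project_onto_segment(p, a, b):
    """Closest point to p on the segment from a to b.

    Returns (projected_point, distance, t), where t in [0, 1] is the relative
    position of the projected point along the segment.
    """
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Orthogonal projection onto the supporting line, clamped to the segment.
        t = ((px - ax) * dx + (py - ay) * dy) / length_sq
        t = max(0.0, min(1.0, t))
    qx, qy = ax + t * dx, ay + t * dy
    return (qx, qy), math.hypot(px - qx, py - qy), t


def snap_to_network(p, segments):
    """Return (segment_index, projected_point, distance, t) for the nearest segment."""
    if not segments:
        raise ValueError("the network has no segments")
    best = None
    for index, (a, b) in enumerate(segments):
        q, dist, t = project_onto_segment(p, a, b)
        if best is None or dist < best[2]:
            best = (index, q, dist, t)
    return best
```

The distance returned for each event could, for example, be used to flag observations that lie unusually far from the network as candidates for manual inspection, although the threshold for doing so would depend on the dataset.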

Finally, the

The author wishes to thank Mrs Daymé González-Rodríguez, Dr Francisco Martínez-Ruiz and Dr Francisco Montes for providing helpful suggestions in order to improve the

Workflow that describes all the steps that could be carried out in order to perform a spatial analysis on a point pattern that lies on a linear network. Some of the steps leading to the final statistical analysis may be skipped but, at least, all of them should be considered. The blocks indicating the steps of the process list some of the R packages that can be used to carry out each of them.

"Network Edition" features.

Overview of the "Network Edition" section of the

Example of a road network uploaded into the application

"Network Edition" example of use (I).

Use of the "Join vertex" (in green), "Remove edge" (in red) and "Add point" options (in green) in the

Network resulting from clicking on "Rebuild linear network" in the situation of

"Network Edition" example of use (II).

Another use of the "Join vertex" (in green) option of the "Network Edition" section

Network resulting from clicking on "Rebuild linear network" in the situation of

Example of use of the

A road network introduced as input in which there is an excess of road segments and vertices

Simplified version of the network in

"Network Direction" features.

A zone of a road network introduced as an input in the "Network Direction" section of the

Manual addition of traffic flow to the network by using the options "Add flow" and "Add long flow"

Example of a linear road network following usual notation for the edges (

"Point Pattern Edition" features.

An example of a point pattern that lies on a road network as it can be visualized in

Information that is displayed (marks of the point pattern, if available, as defined by the user) when an event is clicked