3. Introduction to spatial data in R

Foreword on working directory, data and packages

The data for this tutorial are provided via GitHub. To reproduce the code with your own data, replace the URL with your local file path.

If you work locally, you can define the working directory:

setwd("YOUR/FILEPATH") 
# placeholder: this fails as written; replace it with a folder on your computer

This should be the folder where your data and other folders are stored. More information can be found here.

Code chunks for saving data and results are disabled in this tutorial (commented out with #). Please replace “YOUR/FILEPATH” with your own path.

The code is written so that packages are automatically installed (if not already installed) and loaded. Packages are stored in the default library location on your computer.

Intro

The sp package (spatial) provides classes and methods for spatial (vector) data; the classes document where the spatial location information resides, for 2D or 3D data. Utility functions are provided, e.g. for plotting data as maps and for spatial selection, as well as methods for retrieving coordinates, subsetting, print, summary, etc.

The sf package (simple features = points, lines, polygons and their respective ‘multi’ versions) is the new kid on the block, with further functions to work with simple features, a standardized way to encode spatial vector data. It binds to the libraries GDAL for reading and writing data, GEOS for geometrical operations, and PROJ for projection conversions and datum transformations.

For the time being, it is best to know and use both the sp and the sf packages, as discussed in this post. However, we focus on the sf package for the following reasons:

  • sf ensures fast reading and writing of data
  • sf provides enhanced plotting performance
  • sf objects can be treated as data frames in most operations
  • sf functions can be combined using the %>% operator and work well with the tidyverse collection of R packages.
  • sf function names are relatively consistent and intuitive (all begin with st_)
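To illustrate the last two points, here is a small sketch that chains several st_* functions. It assumes R >= 4.1 so that the native pipe |> can stand in for %>%, and it uses the nc.shp demo file that ships with sf; the projected CRS EPSG:32119 (NAD83 / North Carolina, metres) is our own choice for the example.

```r
library(sf)

# Demo data shipped with the sf package (assumption: any polygon layer would do)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Each step passes its result to the next st_* function.
# The native pipe |> (R >= 4.1) is used here in place of %>%.
nc_area_km2 <- nc |>
  st_transform(32119) |>      # assumed CRS: EPSG:32119, NAD83 / North Carolina (metres)
  st_area() |>                # area per county, in m^2
  units::set_units(km^2)      # convert to km^2 with the units package

head(nc_area_km2)
```

The same chain works with %>% once magrittr or dplyr is loaded.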

However, in some cases we need to convert sf objects to sp objects or vice versa. In that case, a simple conversion to the desired class is all that is needed:

To sp: object_sp <- as(object_sf, Class = "Spatial")

To sf: object_sf <- st_as_sf(object_sp)
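A minimal round trip between the two classes might look like this. It assumes both sf and sp are installed and again uses the nc.shp file bundled with sf rather than the tutorial's own data.

```r
library(sf)
library(sp)

# Demo data shipped with sf (assumption: stands in for your own sf object)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# sf -> sp: multipolygons become a SpatialPolygonsDataFrame
nc_sp <- as(nc, Class = "Spatial")
class(nc_sp)

# sp -> sf: back to a simple feature collection
nc_sf_again <- st_as_sf(nc_sp)
class(nc_sf_again)
```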

A word of advice: be flexible in your use of sf and sp. Sometimes it may be hard to explain why functions work for one data type and not for the other. But since conversion is quite easy, time is better spent analyzing your data than wondering why operations do not work.

Apply a free interpretation of Paul Feyerabend’s “anything goes” argument and use the sf and sp packages as you like, and as they work for you.

Now we will begin by exploring simple features.

Load the required sf and sp packages. Packages can be loaded with the library() function in R.

if (!require(sp)){install.packages("sp"); library(sp)}
if (!require(sf)){install.packages("sf"); library(sf)}
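If the list of packages grows, the repeated if (!require(...)) lines can be wrapped in a small function. The helper below, load_or_install(), is hypothetical (it is not part of any package) and is only a sketch of the same install-then-load pattern.

```r
# Hypothetical helper (not from any package): install missing packages,
# then attach all of them, mirroring the if (!require(...)) pattern above
load_or_install <- function(pkgs) {
  for (p in pkgs) {
    if (!requireNamespace(p, quietly = TRUE)) {
      install.packages(p)
    }
    library(p, character.only = TRUE)
  }
  invisible(pkgs)
}

# In this tutorial the call would be load_or_install(c("sp", "sf"));
# base packages are used here so the example runs anywhere
load_or_install(c("stats", "utils"))
```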

Treating spatial data like data frames and plotting them

Most of the time, we already have spatial data and want to explore them. With the sf package, this can be done similarly to working with data frames in base R.

First we will use the world dataset provided by the spData package. spData offers a large variety of spatial datasets for demonstrating, benchmarking and teaching spatial data analysis. It includes R data of class sf (defined by the package ‘sf’), Spatial (‘sp’), and nb (‘spdep’). Unlike other spatial data packages such as ‘rnaturalearth’ and ‘maps’, it also contains data stored in a range of file formats, including GeoJSON, ESRI Shapefile and GeoPackage. Some of the datasets are designed to illustrate specific analysis techniques; the datasets cycle_hire and cycle_hire_osm, for example, are designed to illustrate point pattern analysis techniques. To see all available datasets, use ls("package:spData").

# load spData to get the data set
if (!require(spData)){install.packages("spData"); library(spData)}
# lwgeom: used by (older versions of) sf for geodetic calculations on lon/lat data
if (!require(lwgeom)){install.packages("lwgeom"); library(lwgeom)}

We can now perform exploratory operations on this sf object as we would on a regular data frame:

names(world)
##  [1] "iso_a2"    "name_long" "continent" "region_un" "subregion" "type"     
##  [7] "area_km2"  "pop"       "lifeExp"   "gdpPercap" "geom"
plot(world)
## Warning: plotting the first 9 out of 10 attributes; use max.plot = 10 to plot
## all

world[1:10,2] # first 10 rows of the 2nd column (the geometry column is kept automatically)
## Simple feature collection with 10 features and 1 field
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -55.25 xmax: 180 ymax: 83.23324
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##           name_long                           geom
## 1              Fiji MULTIPOLYGON (((180 -16.067...
## 2          Tanzania MULTIPOLYGON (((33.90371 -0...
## 3    Western Sahara MULTIPOLYGON (((-8.66559 27...
## 4            Canada MULTIPOLYGON (((-122.84 49,...
## 5     United States MULTIPOLYGON (((-122.84 49,...
## 6        Kazakhstan MULTIPOLYGON (((87.35997 49...
## 7        Uzbekistan MULTIPOLYGON (((55.96819 41...
## 8  Papua New Guinea MULTIPOLYGON (((141.0002 -2...
## 9         Indonesia MULTIPOLYGON (((141.0002 -2...
## 10        Argentina MULTIPOLYGON (((-68.63401 -...
# summary statistics for the attribute (column) lifeExp, selected by name
summary(world["lifeExp"])
##     lifeExp                 geom    
##  Min.   :50.62   MULTIPOLYGON :177  
##  1st Qu.:64.96   epsg:4326    :  0  
##  Median :72.87   +proj=long...:  0  
##  Mean   :70.85                      
##  3rd Qu.:76.78                      
##  Max.   :83.59                      
##  NA's   :10
# more subsetting: plot columns 3 to 6
plot(world[3:6])

plot(world["pop"])

# isolate asia and then plot it
world_asia <- world[world$continent == "Asia", ]
plot(world_asia)
## Warning: plotting the first 9 out of 10 attributes; use max.plot = 10 to plot
## all
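To push the data-frame analogy a little further, the sketch below combines row and column subsetting and then uses st_drop_geometry() to obtain a plain (non-spatial) data frame. It assumes the world data from spData as loaded above; the chosen columns are only illustrative.

```r
library(sf)
library(spData)

# Attribute subsetting: rows by condition, columns by name (geometry stays attached)
asia <- world[world$continent == "Asia", c("name_long", "pop")]
class(asia)

# Drop the geometry to get a plain data frame for non-spatial work
asia_df <- st_drop_geometry(asia)
class(asia_df)

# Ordinary data frame operations: the three most populous countries
head(asia_df[order(asia_df$pop, decreasing = TRUE), ], 3)
```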

For more advanced map making, dedicated visualization packages such as tmap are recommended. We will do that at a later stage. For now it is sufficient to know that basic map visualizations are possible with the sf package.

Understanding sf objects

Simple features, in the most basic definition, consist of spatial and non-spatial attributes.

class(world)
## [1] "sf"         "tbl_df"     "tbl"        "data.frame"
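The spatial part of an sf object lives in a single geometry column. The following sketch shows how to find and extract it; it assumes the spData version used here names that column "geom", as the names(world) output above suggests.

```r
library(sf)
library(spData)

# Name of the column that holds the geometries
attr(world, "sf_column")

# The geometry column on its own: a simple feature column (sfc)
geom <- st_geometry(world)
class(geom)

# A single geometry (sfg) taken from that column
class(geom[[1]])
```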

Let's take a look at the object:

world[1:10, ]
## Simple feature collection with 10 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -55.25 xmax: 180 ymax: 83.23324
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##    iso_a2        name_long     continent region_un          subregion
## 1      FJ             Fiji       Oceania   Oceania          Melanesia
## 2      TZ         Tanzania        Africa    Africa     Eastern Africa
## 3      EH   Western Sahara        Africa    Africa    Northern Africa
## 4      CA           Canada North America  Americas   Northern America
## 5      US    United States North America  Americas   Northern America
## 6      KZ       Kazakhstan          Asia      Asia       Central Asia
## 7      UZ       Uzbekistan          Asia      Asia       Central Asia
## 8      PG Papua New Guinea       Oceania   Oceania          Melanesia
## 9      ID        Indonesia          Asia      Asia South-Eastern Asia
## 10     AR        Argentina South America  Americas      South America
##                 type    area_km2       pop  lifeExp gdpPercap
## 1  Sovereign country    19289.97    885806 69.96000  8222.254
## 2  Sovereign country   932745.79  52234869 64.16300  2402.099
## 3      Indeterminate    96270.60        NA       NA        NA
## 4  Sovereign country 10036042.98  35535348 81.95305 43079.143
## 5            Country  9510743.74 318622525 78.84146 51921.985
## 6  Sovereign country  2729810.51  17288285 71.62000 23587.338
## 7  Sovereign country   461410.26  30757700 71.03900  5370.866
## 8  Sovereign country   464520.07   7755785 65.23000  3709.082
## 9  Sovereign country  1819251.33 255131116 68.85600 10003.089
## 10 Sovereign country  2784468.59  42981515 76.25200 18797.548
##                              geom
## 1  MULTIPOLYGON (((180 -16.067...
## 2  MULTIPOLYGON (((33.90371 -0...
## 3  MULTIPOLYGON (((-8.66559 27...
## 4  MULTIPOLYGON (((-122.84 49,...
## 5  MULTIPOLYGON (((-122.84 49,...
## 6  MULTIPOLYGON (((87.35997 49...
## 7  MULTIPOLYGON (((55.96819 41...
## 8  MULTIPOLYGON (((141.0002 -2...
## 9  MULTIPOLYGON (((141.0002 -2...
## 10 MULTIPOLYGON (((-68.63401 -...

The header above gives us a lot of information on the simple feature collection. On the non-spatial side, the full world object has 177 features (rows; only the first 10 are printed here) and 10 fields (attribute columns). Think of it as a usual spreadsheet or, to stay in GIS terms, an attribute table.

On the spatial side, we get information on how the data relate to locations in the “real world”, namely the fields geometry type, dimension, bbox and the CRS information, epsg (SRID) and proj4string. They are briefly introduced below:
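Each of these header fields can also be queried on its own. A minimal sketch, assuming the world object from spData as loaded above:

```r
library(sf)
library(spData)

nrow(world)                                    # number of features
ncol(st_drop_geometry(world))                  # number of non-spatial fields
unique(as.character(st_geometry_type(world)))  # geometry type(s)
st_bbox(world)                                 # bounding box
st_crs(world)$epsg                             # EPSG code of the CRS
```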

Geometry types

The point as the carrier of geometric information

All geometries boil down to one or more points. The most common geometry types are points, lines, polygons and their respective ‘multi’ versions.

The geometry type of the world object is MULTIPOLYGON.
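To see that all geometries are built from points, the sketch below creates a few geometries from coordinate pairs with sf's constructor functions. The coordinates are arbitrary example values.

```r
library(sf)

pt   <- st_point(c(1, 2))
ln   <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 0)))
poly <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))  # closed ring
mpt  <- st_multipoint(rbind(c(0, 0), c(3, 3)))

# The second element of the class vector names the geometry type
types <- vapply(list(pt, ln, poly, mpt), function(g) class(g)[2], character(1))
types

# Underneath, the line is just a set of points (one row per vertex)
st_coordinates(st_sfc(ln))
```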

Dimension

Dimension refers to the coordinate dimensions in which the data are stored: either two (XY) or three (XYZ or, more rarely, XYM, where M holds an additional measure such as time).
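A quick sketch of the three cases with single points; the values are arbitrary, and the third value of the XYM point is treated as a measure rather than a height.

```r
library(sf)

p_xy  <- st_point(c(1, 2))
p_xyz <- st_point(c(1, 2, 3))                 # a third value defaults to Z
p_xym <- st_point(c(1, 2, 10), dim = "XYM")   # third value is a measure (M), e.g. time

# The first element of the class vector records the coordinate dimension
dims <- c(class(p_xy)[1], class(p_xyz)[1], class(p_xym)[1])
dims
```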

Bounding box (bbox)

The bounding box is the smallest axis-aligned rectangle that encloses a point, line or polygon, or their respective ‘multi’ versions. It defines the extent of a geometry by xmin, ymin, xmax and ymax; depending on the coordinate reference system, the bbox values are either geographic (lon/lat degrees) or projected, i.e. planar Cartesian coordinates.
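As a small worked case, the bounding box of a line is simply the minima and maxima of its x and y coordinates. The line below is an arbitrary example.

```r
library(sf)

ln <- st_linestring(rbind(c(0, 0), c(2, 3), c(5, 1)))

# Bounding box: min/max of x and y over all vertices
bb <- st_bbox(ln)
bb

# The box can be turned into a polygon geometry, e.g. for plotting
bb_poly <- st_as_sfc(bb)
```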

Figure: Examples of bounding boxes for different geometries

The bounding box of an object can be retrieved separately:

st_bbox(world)
##       xmin       ymin       xmax       ymax 
## -180.00000  -90.00000  180.00000   83.64513

Coordinate reference systems (CRS)

In geography, a coordinate system is a reference system that enables every location on Earth to be specified by a set of numbers, letters or symbols. Each CRS is defined by:

  • Its measurement framework, which is either geographic (in which spherical coordinates are measured from the earth’s center) or projected, i.e. planimetric (in which the earth’s coordinates are projected onto a two-dimensional planar surface)
  • Its units of measurement (feet/meters for projected, degrees of longitude/latitude for geographic)
  • For projected coordinate systems a definition of the map projection
  • Other measurement system properties such as a spheroid of reference, a datum, one or more standard parallels, a central meridian, and possible shifts in the x- and y-directions
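The difference between a geographic and a projected CRS can be checked directly. This sketch uses the nc.shp file bundled with sf (stored in NAD27 lon/lat) and, as an assumption for the example, reprojects it to EPSG:32119 (NAD83 / North Carolina, metres).

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Geographic CRS: coordinates are longitude/latitude in degrees
st_is_longlat(nc)

# Projected CRS (assumed for this example: EPSG:32119, units in metres)
nc_proj <- st_transform(nc, 32119)
st_is_longlat(nc_proj)

# The same county, described in degrees and in metres
st_bbox(nc[1, ])
st_bbox(nc_proj[1, ])
```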

A brief summary of what coordinate systems are is given here.

In R, there are two ways to describe a CRS: either by its EPSG code or by its proj4string definition.

EPSG (SRID)

The EPSG dataset, originally compiled by the European Petroleum Survey Group (EPSG), is best known for its system of spatial reference IDs for coordinate systems. An EPSG code is a unique ID that offers a simple way to identify a CRS.

Use st_crs to extract coordinate system (CS) information from an sf object. This gives us the EPSG code and the PROJ4 projection string.

st_crs(world)
## Coordinate Reference System:
##   EPSG: 4326 
##   proj4string: "+proj=longlat +datum=WGS84 +no_defs"

A list of epsg codes can be found here.

The EPSG code may not always be available for a particular coordinate system, but if a spatial object has a defined coordinate system, it will always have a PROJ4 projection string. Its multi-parameter syntax is briefly discussed next.

proj4string

PROJ is a generic coordinate transformation software that transforms geospatial coordinates from one coordinate reference system (CRS) to another. The PROJ4 syntax consists of a list of parameters used to define a coordinate system; each parameter is prefixed with a + character, and all parameters are combined in a single string. The most common parameters are:

Parameter   Description
+proj       Projection name
+ellps      Ellipsoid name
+datum      Datum name
+units      Units: meters, US survey feet, etc.
+x_0        False easting
+lat_0      Latitude of origin
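To make the "+key=value" structure concrete, the sketch below splits a proj4string into named parameters using base R only. The function parse_proj4() is a hypothetical helper written for illustration, not part of sf or PROJ; the input is the EPSG:3347 string printed later in this tutorial.

```r
# Hypothetical helper (illustration only): split a proj4string into named parameters
parse_proj4 <- function(p4) {
  tokens <- strsplit(trimws(p4), "\\s+")[[1]]
  tokens <- sub("^\\+", "", tokens)              # drop the leading '+'
  kv <- strsplit(tokens, "=", fixed = TRUE)       # "key=value" -> c(key, value)
  keys <- vapply(kv, function(x) x[1], character(1))
  vals <- vapply(kv, function(x) if (length(x) > 1) x[2] else NA_character_,
                 character(1))                    # flags such as +no_defs have no value
  setNames(vals, keys)
}

params <- parse_proj4(
  "+proj=lcc +lat_1=49 +lat_2=77 +lat_0=63.390675 +lon_0=-91.86666666666666 +x_0=6200000 +y_0=3000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
)
params[c("proj", "ellps", "units", "x_0", "lat_0")]
```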

Further information can be found on the PROJ4 website. If you know the SRID of your crs, you can copy the PROJ4 syntax from this website.

Assigning and transforming crs

Assigning and transforming the CRS of an sf object is relatively straightforward. To demonstrate, we subset the world data to Canada.

canada <- world[world$name_long == "Canada", ]

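When a coordinate reference system (CRS) is missing or the wrong CRS is set, it can be assigned with st_set_crs() or with the replacement form st_crs(x) <- value used below. Assigning only changes the CRS label, not the coordinates. st_transform(), in contrast, converts the coordinates to a different CRS. Both are demonstrated below with the canada object.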

st_crs(canada)
## Coordinate Reference System:
##   EPSG: 4326 
##   proj4string: "+proj=longlat +datum=WGS84 +no_defs"
# set crs to NA
st_crs(canada) <- NA
st_crs(canada)
## Coordinate Reference System: NA
# assign crs back with epsg code
st_crs(canada) <- 4326
st_crs(canada)
## Coordinate Reference System:
##   EPSG: 4326 
##   proj4string: "+proj=longlat +datum=WGS84 +no_defs"
# alternatively use proj4 string
st_crs(canada) <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs" 
st_crs(canada)
## Coordinate Reference System:
##   EPSG: 4326 
##   proj4string: "+proj=longlat +datum=WGS84 +no_defs"

Now we transform the canada object to a new crs: EPSG:3347 NAD83 / Statistics Canada Lambert:

st_crs(canada)
## Coordinate Reference System:
##   EPSG: 4326 
##   proj4string: "+proj=longlat +datum=WGS84 +no_defs"
#create/copy new object and transform 
canada_transform <- st_transform(canada, 3347)
#test if transformation was successful
st_crs(canada_transform)
## Coordinate Reference System:
##   EPSG: 3347 
##   proj4string: "+proj=lcc +lat_1=49 +lat_2=77 +lat_0=63.390675 +lon_0=-91.86666666666666 +x_0=6200000 +y_0=3000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"

Now we can compare the same geometry in two different coordinate systems:

par(mfrow=c(1,2)) # both plots together
plot(st_geometry(canada_transform), main = "EPSG: 3347")
plot(st_geometry(canada), main = "EPSG: 4326")
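The difference between assigning and transforming a CRS becomes clear when looking at the raw coordinates. In this sketch, a single point at roughly Ottawa's location (an arbitrary example) is once reprojected with st_transform() and once merely relabelled with st_set_crs().

```r
library(sf)

# A point at roughly Ottawa's location, in lon/lat degrees (EPSG:4326)
pt <- st_sfc(st_point(c(-75.7, 45.4)), crs = 4326)

# st_transform(): reprojects, so the coordinate values change (now metres)
pt_moved <- st_transform(pt, 3347)

# st_set_crs(): only relabels; the numbers stay -75.7, 45.4,
# which is wrong here because they are not EPSG:3347 metres
pt_relabelled <- st_set_crs(st_sfc(st_point(c(-75.7, 45.4))), 3347)

st_coordinates(pt_moved)
st_coordinates(pt_relabelled)
```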

CRS and the area of geometries

st_area() returns the area of a geometry in the units of the coordinate reference system used, so we need to know the units of the CRS. If the CRS uses degrees of longitude/latitude, the area is instead calculated on the ellipsoid with st_geod_area (from the lwgeom package) and returned in m^2.

area1 <- st_area(st_geometry(canada))
area1
## 1.003604e+13 [m^2]
# now transform the data to a CRS with feet as units and see what happens
area2 <- st_area(st_transform(st_geometry(canada), 2256)) #NAD83 / Montana (ft)
area2
## 1.175294e+14 [ft^2]

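How do we convert between units? Instead of calculating manually, we can use the units package. Note that the area itself also depends on the CRS: NAD83 / Montana is a projection designed for Montana, so its distortion over the whole of Canada inflates the result (about 10.9 instead of 10.0 million km^2).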

if (!require(units)){install.packages("units"); library(units)}
set_units(area1, km^2)
## 10036043 [km^2]
set_units(area2, km^2)
## 10918834 [km^2]
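A simple way to convince yourself of how st_area() and the units package interact is a square of known size. This sketch assumes a projected CRS with metre units (EPSG:32119 is chosen arbitrarily); the square is 1000 m by 1000 m, so the expected results are 1e6 m^2, 1 km^2 and 100 ha.

```r
library(sf)
library(units)

# A 1000 m x 1000 m square in a projected CRS with metre units
# (assumed for this example: EPSG:32119, NAD83 / North Carolina)
sq <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(1000, 0), c(1000, 1000), c(0, 1000), c(0, 0)))),
  crs = 32119
)

a_m2  <- st_area(sq)              # area in the CRS units: m^2
a_km2 <- set_units(a_m2, km^2)    # converted with the units package
a_ha  <- set_units(a_m2, ha)      # hectares
a_m2; a_km2; a_ha
```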

Reading and writing of data

Normally, the data we use are not stored within a package but in a folder or database. In that case we use st_read() to read the file. In our case, however, we first download the data from GitHub.

# download the zipped shapefile from GitHub into the working directory
# (mode = "wb" ensures a binary transfer on Windows)
download.file("https://raw.githubusercontent.com/RafHo/teaching/master/angewandte_geodatenverarbeitung/datasource/nc.zip", destfile = "nc.zip", mode = "wb")

# unzip shapefile
unzip("nc.zip")

nc_sf <- st_read("nc.shp")
## Reading layer `nc' from data source `C:\Users\Nils Riach\Github\spatial_thinking_website\blogdown\content\courses\angewandte_geodatenverarbeitung\nc.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## epsg (SRID):    4267
## proj4string:    +proj=longlat +datum=NAD27 +no_defs

In order to load a file from your local computer use the function like this: nc_sf <- st_read("YOUR/PATH/nc.shp").
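If the download is not possible, the sf package ships its own copy of the North Carolina shapefile, which can be read the same way. The sketch assumes this bundled file is an acceptable stand-in for the downloaded nc.shp.

```r
library(sf)

# Path to the copy of nc.shp that ships with sf
nc_path <- system.file("shape/nc.shp", package = "sf")

# quiet = TRUE suppresses the summary that st_read() prints
nc_local <- st_read(nc_path, quiet = TRUE)
dim(nc_local)
```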

In order to write a simple features object to a file, we need the sf object, the destination (dsn, here the output file path) and the driver (e.g. ESRI Shapefile, CSV):

st_write(nc_sf, dsn = "YOUR/FILEPATH/nc_sf_test.shp", driver = "ESRI Shapefile", delete_layer = TRUE)
# delete_layer = TRUE overwrites an existing layer of the same name
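To try writing without choosing a folder, one option is a temporary file. This sketch writes the bundled nc data to a GeoPackage (a format choice made for the example) and reads it back to check that all features arrived.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Write to a temporary GeoPackage instead of a fixed folder
out_file <- tempfile(fileext = ".gpkg")
st_write(nc, dsn = out_file, driver = "GPKG", quiet = TRUE)

# Read it back and compare the number of features
nc_roundtrip <- st_read(out_file, quiet = TRUE)
nrow(nc_roundtrip) == nrow(nc)
```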

Alternatively, you can use the rgdal package to read spatial data into R as sp objects (the Spatial* family of classes). In that case we need to load that library. Note that rgdal was retired and archived on CRAN in 2023, so it may no longer install on current R versions.

if (!require(rgdal)){install.packages("rgdal"); library(rgdal)}
nc_sp <- readOGR("nc.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Nils Riach\Github\spatial_thinking_website\blogdown\content\courses\angewandte_geodatenverarbeitung\nc.shp", layer: "nc"
## with 100 features
## It has 14 fields

In order to use the sf functions, we need to convert the data to sf objects. Otherwise, functions may not work, or may work differently, as the plot() examples demonstrate.

nc_sp_to_sf <- st_as_sf(nc_sp)
plot(nc_sf) # sf object
## Warning: plotting the first 10 out of 14 attributes; use max.plot = 14 to plot
## all

plot(nc_sp) # sp object

With rgdal, writing data is quite similar to reading:

writeOGR(nc_sp, dsn="YOUR/FILEPATH", layer = "nc_sp", driver = "ESRI Shapefile", overwrite=TRUE)

But why use rgdal?

GDAL supports over 200 raster and vector formats (for raster data, the raster package is recommended). Use ogrDrivers() and gdalDrivers() (without arguments) to find out which formats your rgdal installation can handle. In fact, st_read() and st_write() both rely on GDAL as well. Although the sf package is faster than rgdal at loading shapefiles, rgdal is still used quite often (probably because the sf package is still quite young). Working with rgdal is not pretty, but it is a powerful and important tool for reading vector data. Knowing its quirks and creating a cheat sheet for yourself will save a lot of hand-wringing and allow you to start having fun with spatial analysis in R.
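Since sf uses GDAL too, it offers its own way to list the available formats, comparable to ogrDrivers(). A short sketch; the exact list depends on the GDAL build your sf installation is linked against.

```r
library(sf)

# Vector drivers available in the GDAL build that sf is linked against
drv <- st_drivers(what = "vector")
head(drv[, c("name", "long_name", "write")])

# Check for specific formats
c("ESRI Shapefile", "GPKG") %in% drv$name
```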
