7. Standard tools for spatial data analysis II
Foreword on working directory, data and packages
The Data for this tutorial are provided via Github. In order to reproudce the code with your own data, replace the url with your local filepath
If you work locally, you may define the working directory
setwd("YOUR/FILEPATH")
# this will not work. if you want to use it, define a path on your computer
This should be the place where your data and other folders are stored. more Information can be found here
Code chunks for saving data and results are disabeled in this tutorial (using a #). Please replace (“YOUR/FILEPATH”) with your own path.
The code is written in such a way, that packages will be automatically installed (if not already installed) and loaded. Packages will be stored in the default location of your computer.
Intro
Simple features, i.e. records that come with a geometry, can be manipulated or changed. for the case where these manipulations involve geometries, different functions than with data frames are necessary. This is the content of this tutorial.
We will look at different ways to check and change the geometries of sf
objects. Additionally, we will create new geometries
We will work with the sf package
. The Tutorial is inspired by the sf package vignettes and this tutorial by Edzer Pebesma
if (!require(sf)){install.packages("sf"); library(sf)}
## Lade nötiges Paket: sf
## Warning: Paket 'sf' wurde unter R Version 4.1.3 erstellt
## Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 7.2.1; sf_use_s2() is TRUE
Methods for simple feature objects
A complete list of methods for a particular class can be obtained with methods(class = "sf")
. To see al objects in the package use ls("package:sf")
and for all functions us lsf("package:sf")
methods(class = "sf")
## [1] $<- [
## [3] [[<- aggregate
## [5] as.data.frame cbind
## [7] coerce dbDataType
## [9] dbWriteTable filter
## [11] identify initialize
## [13] merge plot
## [15] print rbind
## [17] show slotsFromS3
## [19] st_agr st_agr<-
## [21] st_area st_as_s2
## [23] st_as_sf st_as_sfc
## [25] st_bbox st_boundary
## [27] st_buffer st_cast
## [29] st_centroid st_collection_extract
## [31] st_convex_hull st_coordinates
## [33] st_crop st_crs
## [35] st_crs<- st_difference
## [37] st_drop_geometry st_filter
## [39] st_geometry st_geometry<-
## [41] st_inscribed_circle st_interpolate_aw
## [43] st_intersection st_intersects
## [45] st_is st_is_valid
## [47] st_join st_line_merge
## [49] st_m_range st_make_valid
## [51] st_minimum_rotated_rectangle st_nearest_points
## [53] st_node st_normalize
## [55] st_point_on_surface st_polygonize
## [57] st_precision st_reverse
## [59] st_sample st_segmentize
## [61] st_set_precision st_shift_longitude
## [63] st_simplify st_snap
## [65] st_sym_difference st_transform
## [67] st_triangulate st_union
## [69] st_voronoi st_wrap_dateline
## [71] st_write st_z_range
## [73] st_zm transform
## see '?methods' for accessing help and source code
We will use the nc
data set from Cressie (1991)1, which is stored in datasource
. The data set contains data of sudden infant Death Syndrom (SIDS) from North Carolina’s counties between 1974-78 and 1979-84). For more information about the contents of the collumns refer to this page.
# load from Github
download.file("https://raw.githubusercontent.com/RafHo/teaching/master/angewandte_geodatenverarbeitung/datasource/nc.zip", destfile = "nc.zip")
# unzip shapefile
unzip("nc.zip")
nc_sf <- st_read("nc.shp")
## Reading layer `nc' from data source
## `V:\github\nils_website\blogdown\content\courses\angewandte_geodatenverarbeitung\nc.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS: NAD27
Recap: using the right crs
The crs of nc_sf
is EPSG: 4267 which has its units in Lon/Lat. We will transform it to “NAD83 / North Carolina” which has its units in metres. This is done for more convenient handling of the data.
st_crs(nc_sf)
## Coordinate Reference System:
## User input: NAD27
## wkt:
## GEOGCRS["NAD27",
## DATUM["North American Datum 1927",
## ELLIPSOID["Clarke 1866",6378206.4,294.978698213898,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4267]]
nc_sf <- st_transform(nc_sf, 32119) # NC state plane (a2), m
st_crs(nc_sf )
## Coordinate Reference System:
## User input: EPSG:32119
## wkt:
## PROJCRS["NAD83 / North Carolina",
## BASEGEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]],
## CONVERSION["SPCS83 North Carolina zone (meters)",
## METHOD["Lambert Conic Conformal (2SP)",
## ID["EPSG",9802]],
## PARAMETER["Latitude of false origin",33.75,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8821]],
## PARAMETER["Longitude of false origin",-79,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8822]],
## PARAMETER["Latitude of 1st standard parallel",36.1666666666667,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8823]],
## PARAMETER["Latitude of 2nd standard parallel",34.3333333333333,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8824]],
## PARAMETER["Easting at false origin",609601.22,
## LENGTHUNIT["metre",1],
## ID["EPSG",8826]],
## PARAMETER["Northing at false origin",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8827]]],
## CS[Cartesian,2],
## AXIS["easting (X)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["northing (Y)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["Engineering survey, topographic mapping."],
## AREA["United States (USA) - North Carolina - counties of Alamance; Alexander; Alleghany; Anson; Ashe; Avery; Beaufort; Bertie; Bladen; Brunswick; Buncombe; Burke; Cabarrus; Caldwell; Camden; Carteret; Caswell; Catawba; Chatham; Cherokee; Chowan; Clay; Cleveland; Columbus; Craven; Cumberland; Currituck; Dare; Davidson; Davie; Duplin; Durham; Edgecombe; Forsyth; Franklin; Gaston; Gates; Graham; Granville; Greene; Guilford; Halifax; Harnett; Haywood; Henderson; Hertford; Hoke; Hyde; Iredell; Jackson; Johnston; Jones; Lee; Lenoir; Lincoln; Macon; Madison; Martin; McDowell; Mecklenburg; Mitchell; Montgomery; Moore; Nash; New Hanover; Northampton; Onslow; Orange; Pamlico; Pasquotank; Pender; Perquimans; Person; Pitt; Polk; Randolph; Richmond; Robeson; Rockingham; Rowan; Rutherford; Sampson; Scotland; Stanly; Stokes; Surry; Swain; Transylvania; Tyrrell; Union; Vance; Wake; Warren; Washington; Watauga; Wayne; Wilkes; Wilson; Yadkin; Yancey."],
## BBOX[33.83,-84.33,36.59,-75.38]],
## ID["EPSG",32119]]
#in order to find out the units of the crs
st_crs(32119)$units
## [1] "m"
Geometric confirmation or logical queries
Tests for geometric characteristics e.g. topology.
- st_intersects: touch or overlap
- st_disjoint: !intersects
- st_touches: touch
- st_crosses: cross (do not touch)
- st_within: within
- st_contains: contains
- st_overlaps: overlaps
- st_covers: cover
- st_covered_by: covered by
- st_equals: equals
- st_equals_exact: equals, with some fuzz
All these functions return a sparse matrix with matching (TRUE) indexes, or a full logical matrix. This can be specified in the functions. To illustrate this, we will check if two geometries intersect. First we reduce the object to only five geometries to make it more understandable and take a look at thier geometries:
nc1 <- nc_sf[1:5,] # only use rows 1 to 5 of the original data set
plot(st_geometry(nc1))
# default usage of the function
# sparse index list is returned
st_intersects(nc1, nc1)
## Sparse geometry binary predicate list of length 5, where the predicate
## was `intersects'
## 1: 1, 2
## 2: 1, 2, 3
## 3: 2, 3
## 4: 4
## 5: 5
#dense logical matrix with element i,j TRUE
st_intersects(nc1, nc1, sparse = FALSE)
## [,1] [,2] [,3] [,4] [,5]
## [1,] TRUE TRUE FALSE FALSE FALSE
## [2,] TRUE TRUE TRUE FALSE FALSE
## [3,] FALSE TRUE TRUE FALSE FALSE
## [4,] FALSE FALSE FALSE TRUE FALSE
## [5,] FALSE FALSE FALSE FALSE TRUE
Check out the other functions and their functionalities!
Geometric operations
While the functions above return matrixes, the following functions return new geometries
- st_boundary: return the boundary of a geometry
- st_buffer: compute a buffer around this geometry/each geometry
- st_centroid: gives centroid of geometry
- st_convex_hull: creates convex hull of set of points
- st_line_merge: merges lines
- st_polygonize: creates polygon from lines
- st_segmentize: adds points to straight lines
- st_simplify: simplifies lines by removing vertices
- st_union: creates one geometry from several geometries
- st_difference: difference between pairs of geometries that do not intersect
- st_interpolate: Areal-weighted interpolation of polygon data
Let’s take a look at the more basic geometric operations.
par(mfrow=c(1,2)) # both plots together
ncg <- st_geometry(nc_sf[1:3,]) # Get, set, or replace geometry from an sf object- here the first three objects
plot(ncg, col = sf.colors(3, categorical = TRUE))
u <- st_union(ncg) # unionize the three objects to one
plot(u, lwd = 2)
#check where the first and second object overlap or touch
plot(st_intersection(ncg[1], ncg[2]), col = 'red', add = TRUE)
# set buffers inside and outside object
# dist/units depend on crs! Here it is
plot(st_buffer(u, 10000), border = 'blue', add = TRUE)
# st_buffer(u, units::set_unit(10, km)) # with sf devel
plot(st_buffer(u, -5000), border = 'yellow', add = TRUE)
Geometric creation
- st_triangulate: triangulates set of points (not constrained)
- st_voronoi: creates voronoi tesselation
- st_point: creates point geometry from XYZ numeric data.
- st_ …. functions for creating geometries lines, polygons and their respective ‘multi’ versions
More advanced: aggregate, interpolate, relationships and join
Aggregate
Aggregation means condensating data. Aggregated data show some statistics about a variable (typically average or total) in relation to some kind of grouping variable.
In the following example, we will divide the counties into four regions and dissolve them. Assignment to the region is performed by extracting the longtitudinal coordinates of the centroids of each county.
After the counties are dissolved, the area of the new regions need to be calculated.
Meanwhile, the values of "SID79"
are summarized and then the density of SID for the new regions is calculated.
cent <- st_centroid(nc_sf$geometry) # get centroids of counties
coords <- st_coordinates(cent) # get coordinates of centoids
nc_new <- nc_sf
nc_new$longitude <- coords[,1] #get x values which are the longtitudes
nc_new$groups <- cut(nc_new$longitude, 4, labels = c("coast","center_east","center_west", "west")) # name the groups
#library(tidyverse, lib.loc= "T:/AG2019/R/win-library/3.5")
#nc_regions <- nc_new %>%
# dplyr::group_by(groups) %>%
#dplyr::summarize() %>%
#dplyr::mutate(area = as.numeric(st_area(.)))# as numeric damit m^2 weggeht
nc_regions <- nc_new
nc_regions <- as.numeric(st_area(nc_regions))
nc_regions <- nc_new %>% dplyr::group_by(groups) %>% dplyr::summarize()
# uses dplyr package
nc_avg_area <- aggregate(nc_sf["SID79"], nc_regions , FUN = mean)
plot(nc_avg_area)
Interpolation
Create a grid over the bounding box of nc with 20 columns and 10 rows
g <- st_make_grid(nc_sf, n = c(20,10))
plot(g)
perform Areal-weighted interpolation of births in 1974 on to the grid
a1 <- st_interpolate_aw(nc_sf["BIR74"], g, extensive = FALSE)
## Warning in st_interpolate_aw.sf(nc_sf["BIR74"], g, extensive = FALSE):
## st_interpolate_aw assumes attributes are constant or uniform over areas of x
From ?st_interpolate_aw
we can see that extensive =
logical; if TRUE, the attribute variables are assumed to be spatially extensive (like population) and the sum is preserved, otherwise, spatially intensive (like population density) and the mean is preserved.
To learn more about spatially extensive vs spatially intensive variables check this page
Intensive Data are independent of the spatial units (e.g., population density, percentages); data that has been normalized in some fashion.
Extensive Data are dependent on the spatial units (e.g., population totals)
To demonstrate the difference, a2 is interpolated with “extensive=TRUE”
a2 <- st_interpolate_aw(nc_sf["BIR74"], g, extensive = TRUE)
## Warning in st_interpolate_aw.sf(nc_sf["BIR74"], g, extensive = TRUE):
## st_interpolate_aw assumes attributes are constant or uniform over areas of x
## Warning in st_interpolate_aw(nc["BIR74"], g, extensive = TRUE):
## st_interpolate_aw assumes attributes are constant over areas of x
Differences in interpolation are demonstrated by dividing a1 and a2 through nc and mapping the result
sum(a1$BIR74) / sum(nc_sf$BIR74) #not close to one: property is assumed spatially intensive
## [1] 1.430389
sum(a2$BIR74) / sum(nc_sf$BIR74)
## [1] 1
#a1$intensive = a1$BIR74
#a1$extensive = a2$BIR74
#plot(a1[c("intensive", "extensive")])
a1$what = "intensive"
a2$what = "extensive"
if (!require(ggplot2)){install.packages("ggplot2"); library(ggplot2)}
## Lade nötiges Paket: ggplot2
## Warning: Paket 'ggplot2' wurde unter R Version 4.1.3 erstellt
l = st_cast(nc_sf, "MULTILINESTRING") # simplify the polygon to multilinestring
ggplot() + geom_sf(data = rbind(a1,a2), aes(fill = BIR74)) +
geom_sf(data = l, col = 'lightgray') + facet_grid(what~.) +
scale_fill_gradientn(colors = sf.colors(10))
Joins
Joining different data to one another is one of the most needed operations in (geographic) data analysis! There are spatial joins, where the common denominator is space or joins based on attributes.
Spatial joins
Spatial joins. First we need to find counties matching a coordinate (in map units). Create point, make it spatial and join it to nc_sf.
pt <- st_point(c(472038, 210385))
pt_sfc <- st_sfc(pt, crs = st_crs(nc_sf)) # create simple feature geometry list column
pt_sf <- st_sf(a = 1, geom = pt_sfc) #transform to sf object
st_join(pt_sf, nc_sf)
## Simple feature collection with 1 feature and 15 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 472038 ymin: 210385 xmax: 472038 ymax: 210385
## Projected CRS: NAD83 / North Carolina
## a AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74
## 1 1 0.134 1.59 1980 1980 Rowan 37159 37159 80 4606 3
## NWBIR74 BIR79 SID79 NWBIR79 geom
## 1 1057 6427 8 1504 POINT (472038 210385)
st_join(nc_sf, pt_sf, left = FALSE)
## Simple feature collection with 1 feature and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 449766.7 ymin: 194343.4 xmax: 502408.9 ymax: 234475.9
## Projected CRS: NAD83 / North Carolina
## AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74
## 50 0.134 1.59 1980 1980 Rowan 37159 37159 80 4606 3
## NWBIR74 BIR79 SID79 NWBIR79 a geometry
## 50 1057 6427 8 1504 1 MULTIPOLYGON (((491841.7 19...
The first time the join is performed on the points, the second time on the Multipolygon.
Why left = TRUE? Because the MUltipoligon on which we joun has more than one row. So we need to specify how the join is performed.
The difference of “left” being TRUE or False is that either a left join or else an inner join is performed.
Left Join returns all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combination of the matches are returned.
Inner Join returns all rows from y, and all columns from x and y. Rows in y with no match in x will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.
Join with self:
nrow(nc_sf)
## [1] 100
x <- st_join(nc_sf,nc_sf)
nrow(x)
## [1] 590
Since we didn’t specify, a left join is performed; this means that all combination of the matches are returned, leading to 590 rows.
If we only want neigbours excluding nc_sf
itself. we can specify the argument “join” within the st_join
function. Values for argument join are the geometric confirmation or logical queries from above:
st_disjoint st_touches st_crosses st_within st_contains st_overlaps st_covers st_covered_by st_equals or st_equals_exact, or user-defined functions of the same profile.
We will use st_touches:
x <- st_join(nc_sf, nc_sf, join = st_touches)
nrow(x)
## [1] 490
Now only 490 rows are returned
Now we want to join only neighnours that touch over a line, not in a point.
We have to define a function which we can retrieve from the explanation of the st_relate function. It is called st_rook, which refers to the so called DE9-IM relationship. Further info can be found here. The short version of the patterns can be found by using ?st_relate()
.
st_rook <- function(a, b, prepared) st_relate(a, b, pattern = "F***1****") # define function
# pattern means that relations touch over a line, not in a point
x <- st_join(nc_sf, nc_sf, join = st_rook) # use it on the nc data set
nrow(x)
## [1] 462
The number of rows is reduced again
If we want to find out which counties touch another county in a point only, we simply have to change the pattern in the function. After that the result is mapped:
sel <- unlist(st_relate(nc_sf, nc_sf, pattern = "F***0****")) #simplifies it to produce a vector which contains all the atomic components which occur in nc_sf e.g.that match the pattern
# The atomic modes are "logical", "integer", "numeric" (synonym "double"), "complex", "character" and "raw"
plot(st_geometry(nc_sf)) # st_geometry: set, or retrieve sfc from an sf object
plot(st_geometry(nc_sf[sel,]), add = TRUE, col = 'red')
Joins based on attributes
It is even more common, and way more simple, to join features based on their attributes. Here, geometries are ignored. If geometries do not match, an empty geometry will be returned. Normally, a data frame
is joined on to a sf
object. The base R function merge
or the dplyr functions left_join
etc. work here:
We will use the nc1
object which contains the first 5 rows of the nc_sf
object.
nc1 <- nc_sf[1:5,] # only use rows 1 to 5 of the original data set
names(nc1)
## [1] "AREA" "PERIMETER" "CNTY_" "CNTY_ID" "NAME" "FIPS"
## [7] "FIPSNO" "CRESS_ID" "BIR74" "SID74" "NWBIR74" "BIR79"
## [13] "SID79" "NWBIR79" "geometry"
we also create a data frame from nc1. Then we add a column with random letters.
nc_data_frame <- as.data.frame(nc1$CNTY_ID)
View(nc_data_frame)
nc_data_frame$random_numbers <- c(10,20,30,40,50)
View(nc_data_frame)
Now we can merge the two :
nc_merge <- merge(nc1, nc_data_frame) # sf object comes first
nc_merge
## Simple feature collection with 25 features and 16 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 363713.3 ymin: 262338.8 xmax: 899686.5 ymax: 318255.5
## Projected CRS: NAD83 / North Carolina
## First 10 features:
## AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74
## 1 0.114 1.442 1825 1825 Ashe 37009 37009 5 1091 1
## 2 0.061 1.231 1827 1827 Alleghany 37005 37005 3 487 0
## 3 0.143 1.630 1828 1828 Surry 37171 37171 86 3188 5
## 4 0.070 2.968 1831 1831 Currituck 37053 37053 27 508 1
## 5 0.153 2.206 1832 1832 Northampton 37131 37131 66 1421 9
## 6 0.114 1.442 1825 1825 Ashe 37009 37009 5 1091 1
## 7 0.061 1.231 1827 1827 Alleghany 37005 37005 3 487 0
## 8 0.143 1.630 1828 1828 Surry 37171 37171 86 3188 5
## 9 0.070 2.968 1831 1831 Currituck 37053 37053 27 508 1
## 10 0.153 2.206 1832 1832 Northampton 37131 37131 66 1421 9
## NWBIR74 BIR79 SID79 NWBIR79 nc1$CNTY_ID random_numbers
## 1 10 1364 0 19 1825 10
## 2 10 542 3 12 1825 10
## 3 208 3616 6 260 1825 10
## 4 123 830 2 145 1825 10
## 5 1066 1606 3 1197 1825 10
## 6 10 1364 0 19 1827 20
## 7 10 542 3 12 1827 20
## 8 208 3616 6 260 1827 20
## 9 123 830 2 145 1827 20
## 10 1066 1606 3 1197 1827 20
## geometry
## 1 MULTIPOLYGON (((387344.7 27...
## 2 MULTIPOLYGON (((408601.4 29...
## 3 MULTIPOLYGON (((478715.7 27...
## 4 MULTIPOLYGON (((878193.4 28...
## 5 MULTIPOLYGON (((769834.9 27...
## 6 MULTIPOLYGON (((387344.7 27...
## 7 MULTIPOLYGON (((408601.4 29...
## 8 MULTIPOLYGON (((478715.7 27...
## 9 MULTIPOLYGON (((878193.4 28...
## 10 MULTIPOLYGON (((769834.9 27...
nc_merge
suddenly has 25 rows instead of the original 5.
We can specify the join further by using the arguments available in the merge function. By specifying the common denominator; the county id; only one match is possible so we keep 5 rows.
nc_merge <- merge(nc1, nc_data_frame, by.x="CNTY_ID", by.y="nc1$CNTY_ID") # specify the columns by which to join
nc_merge
## Simple feature collection with 5 features and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 363713.3 ymin: 262338.8 xmax: 899686.5 ymax: 318255.5
## Projected CRS: NAD83 / North Carolina
## CNTY_ID AREA PERIMETER CNTY_ NAME FIPS FIPSNO CRESS_ID BIR74 SID74
## 1 1825 0.114 1.442 1825 Ashe 37009 37009 5 1091 1
## 2 1827 0.061 1.231 1827 Alleghany 37005 37005 3 487 0
## 3 1828 0.143 1.630 1828 Surry 37171 37171 86 3188 5
## 4 1831 0.070 2.968 1831 Currituck 37053 37053 27 508 1
## 5 1832 0.153 2.206 1832 Northampton 37131 37131 66 1421 9
## NWBIR74 BIR79 SID79 NWBIR79 random_numbers geometry
## 1 10 1364 0 19 10 MULTIPOLYGON (((387344.7 27...
## 2 10 542 3 12 20 MULTIPOLYGON (((408601.4 29...
## 3 208 3616 6 260 30 MULTIPOLYGON (((478715.7 27...
## 4 123 830 2 145 40 MULTIPOLYGON (((878193.4 28...
## 5 1066 1606 3 1197 50 MULTIPOLYGON (((769834.9 27...
——-End——-
Cressie, N (1991), Statistics for spatial data. New York: Wiley, pp. 386–389; Cressie, N, Chan NH (1989) Spatial modelling of regional variables. Journal of the American Statistical Association, 84, 393–401; Cressie, N, Read, TRC (1985) Do sudden infant deaths come in clusters? Statistics and Decisions Supplement Issue 2, 333–349; http://sal.agecon.uiuc.edu/datasets/sids.zip↩︎