Spatial Data Visualization

Tao Tao

12/9/2019

Announcement

Final report

Why spatial visualization with R?

  1. Powerful and professional
  2. Good at interactive operation and systematic data management
  3. Not good at statistical computing
  4. The work is not easy to reproduce
  1. Many good packages (e.g., sf, ggplot)
  2. Powerful statistical learning
  3. The work is easy to reproduce
  4. Not good at interactive operation

When spatial visualization with R?

Point, Line, and Polygon

geometry column

Coordinate Reference System (CRS)

Coordinate Reference System (CRS)

Preparation before start

We need to install the package.

install.packages('sf')

And import the package.

library(sf)
## Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3

Read files

The file format to store the spatial data is shapefile. We could import the shapefile with st_read() function into R.

map <- st_read('Census2010RealignTract.shp')
## Reading layer `Census2010RealignTract' from data source `C:\UMN\Course\R_course\PA5928-Data-management-and-visualization-with-R\Spatial_data_visualization_slides\Census2010RealignTract.shp' using driver `ESRI Shapefile'
## Simple feature collection with 708 features and 10 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 419967.5 ymin: 4924224 xmax: 521254.7 ymax: 5029010
## epsg (SRID):    26915
## proj4string:    +proj=utm +zone=15 +datum=NAD83 +units=m +no_defs

head(map, 3)
## Simple feature collection with 3 features and 10 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 448700 ymin: 4924224 xmax: 516478.2 ymax: 4942115
## epsg (SRID):    26915
## proj4string:    +proj=utm +zone=15 +datum=NAD83 +units=m +no_defs
##   TRACT STATE  CO TRACTCE10       GEOID   ALAND10 AWATER10    Acres
## 1 61502    27 037    061502 27037061502 149642919  4668565 38134.50
## 2 61402    27 037    061402 27037061402 211963640  3786753 53258.86
## 3 81200    27 139    081200 27139081200  90667239  3288582 23205.13
##   Shape_Leng Shape_Area                       geometry
## 1   54449.32  154324840 POLYGON ((477636.4 4932330,...
## 2   66036.81  215530975 POLYGON ((496883.5 4929115,...
## 3   38782.40   93907812 POLYGON ((448700 4932487, 4...

CRS of spatial data

We could use st_crs() to check the CRS of the variable.

st_crs(map)
## Coordinate Reference System:
##   EPSG: 26915 
##   proj4string: "+proj=utm +zone=15 +datum=NAD83 +units=m +no_defs"

Luckily, this spatial dataset contains the right CRS. For the maps in the area of Minnesota, the CRS is the one showing above, 26915 for EPSG.

Assign CRS to spatial data

However, sometimes, the file does not have CRS information. You have to check the original source of the file and assign the right CRS to it.

st_crs(map) <- 26915
## Warning: st_crs<- : replacing crs does not reproject data; use st_transform
## for that

By doing this, there is a message telling that this function will not do the transformation for us. Sometimes, even if the file contains the right CRS, its CRS might be different from other files.

If you want to map several spatial datasets in the same page or want to do geospatial data analysis among them, you have to transfer the CRSs of them to the same one. You could do this by using st_transform().

Simple spatial visualization with ggplot2

We could use ggplot2 to visualize the map. For simply mapping the polygons, we do not need to speficify aes(). And we use geom_sf() to plot map.

library(ggplot2)
ggplot(map) + geom_sf()

Census system

Census system

Census system

Census system

The map shows all the census tracts in the Twin Cities metro areas.

Deal with R spatial data as a data frame

class(map)
## [1] "sf"         "data.frame"

library(dplyr)
new_map <- map %>%
  select(GEOID)
head(new_map, 3)
## Simple feature collection with 3 features and 1 field
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 448700 ymin: 4924224 xmax: 516478.2 ymax: 4942115
## epsg (SRID):    26915
## proj4string:    +proj=utm +zone=15 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
##         GEOID                       geometry
## 1 27037061502 POLYGON ((477636.4 4932330,...
## 2 27037061402 POLYGON ((496883.5 4929115,...
## 3 27139081200 POLYGON ((448700 4932487, 4...

We could join other data to the current geospatial variable by the join functions in dplyr. Firstly, we import the data.

library(readr)
data <- read_csv('data.csv')
Variable Descriptions
GEOID Unique identifier used by Census FactFinder website
POPTOTAL Total population
HHTOTAL Total households
AGEUNDER18 Population age under 18
AGE18_39 Population in this range
AGE40_64 Population in this range
AGE65UP Populationage 65+

Join the the demographic information to the map

Then, we do the join operation.

new_map <- new_map %>%
  left_join(data, by = 'GEOID')
head(new_map, 2)
## Simple feature collection with 2 features and 7 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 477595.8 ymin: 4924224 xmax: 516478.2 ymax: 4941888
## epsg (SRID):    26915
## proj4string:    +proj=utm +zone=15 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
##         GEOID POPTOTAL HHTOTAL AGEUNDER18 AGE18_39 AGE40_64 AGE65UP
## 1 27037061502     2912    1068        726      549     1094     543
## 2 27037061402     3756    1388        899      937     1444     476
##                         geometry
## 1 POLYGON ((477636.4 4932330,...
## 2 POLYGON ((496883.5 4929115,...

Data management

We calculate the percentage of old people in each census tract.

new_map <- new_map %>%
  mutate(old_percent = AGE65UP/POPTOTAL)
head(new_map, 3)
## Simple feature collection with 3 features and 8 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 448700 ymin: 4924224 xmax: 516478.2 ymax: 4942115
## epsg (SRID):    26915
## proj4string:    +proj=utm +zone=15 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
##         GEOID POPTOTAL HHTOTAL AGEUNDER18 AGE18_39 AGE40_64 AGE65UP
## 1 27037061502     2912    1068        726      549     1094     543
## 2 27037061402     3756    1388        899      937     1444     476
## 3 27139081200     6297    2193       1737     1571     2190     799
##                         geometry old_percent
## 1 POLYGON ((477636.4 4932330,...   0.1864698
## 2 POLYGON ((496883.5 4929115,...   0.1267306
## 3 POLYGON ((448700 4932487, 4...   0.1268858

Spatial data visualization

p <- ggplot(new_map, aes(fill = old_percent)) + # use fill to indicate the variable you want to visualize
  geom_sf()
p

Change the color

The color is not good, we could add scale_fill_gradient() to specify the colours we want.

p + scale_fill_gradient(low = 'white', high = 'red')

Add title and theme

Again, we could add more information to make the figure better and more readable, and change the theme a little bit.

p + scale_fill_gradient(low = 'white', high = 'red') +
  labs(title = 'Distribution of old population in the Twin Cities area',
       fill = 'Percentage') +
  theme_bw()

Save the plot with ggsave()

Finally, we could use ggsave() to save the plot we want after we run the ggplot() function.

ggsave('fig1.jpg', width = 10, height = 10)

Add other layers in the current plot

transit_route <- st_read('TransitRoutes.shp')
## Reading layer `TransitRoutes' from data source `C:\UMN\Course\R_course\PA5928-Data-management-and-visualization-with-R\Spatial_data_visualization_slides\TransitRoutes.shp' using driver `ESRI Shapefile'
## Simple feature collection with 227 features and 32 fields
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 409591 ymin: 4947433 xmax: 515317.7 ymax: 5046090
## epsg (SRID):    26915
## proj4string:    +proj=utm +zone=15 +datum=NAD83 +units=m +no_defs

ggplot(new_map) +
  geom_sf() +
  geom_sf(data = transit_route)

ggplot(new_map) +
  geom_sf(fill = NA, colour = 'Grey') +
  geom_sf(data = transit_route, colour = 'Blue') +
  labs(title = 'Transit route in the Twin Cities Metro area') +
  theme_bw()