Typically we want to follow a general pipeline for getting data, cleaning it up, summarizing it, and then communicating that data as a story in some way. For more details, the excellent R for Data Science open source and free-online book by Garrett Grolemund & Hadley Wickham gives a nice run down of the various components of this pipeline.
One of the first good habits to get into is always load the functions or libraries you’ll be using throughout your script, in the very beginning. It makes it easier for other folks who may want to run your code, or if folks need to go install some packages, they can do that first before getting half way through without realizing why the code is breaking.
There are many spatial/mapping libraries in R, too many to even cover. Be sure to check out the Resources page for some additional suggestions. However, these are the main spatial libraries I use all the time. I’m going to try to stick to for this demo, plus a few additional ones useful for getting data with geospatial components (hydrology, geography, weather, etc).
## TOOLS library(sf) # reading/writing/analysis of spatial data library(dplyr) # wrangling and summarizing data library(viridis) # a nice color palette library(measurements) # for converting measurements (DMS to DD or UTM) library(mapview) # interactive maps library(tmap) # good mapping package mostly for static chloropleth style maps library(ggspatial) # for adding scale bar/north arrow and basemaps library(ggrepel) # labeling ggplot figures library(ggforce) # amazing cool package for labeling and fussing library(cowplot) ## MOOOOOOO!! for plotting multiple plots ## DATA library(tigris) # ROAD/CENSU DATA, see also the "tidycensus" package library(USAboundaries) # great package for getting state/county boundaries library(sharpshootR) # CDEC.snow.courses, CDECquery, CDECsnowQuery
Next, we want to read the data we’ll be using for our project. For this example, we are going to use the
sharpshootR package in R to load all CDEC Snow Course station locations, and plot them on a map.
# load the snow data from sharpshootR package data(CDEC.snow.courses) # data is a base function in R that loads data
Ok, so we have some data…but it needs some tidying. When we talk about “tidy” data, we mean one observation per row, and one variable per column (see here for more info on tidy data). Good news, there are some great tools in R we can use, including
tidyr. We can also check for missingness in our data using the
Lets do a few things (the bolded words are
summarizedata (tabulate or count over our rows/cols)
Let’s see how many missing values exist for each column in our data frame.
# view missingness library(naniar) # load the functions in this library # with the data we loaded earlier, check for missing data graphically gg_miss_var(CDEC.snow.courses,show_pct = T)
# summarize summary(CDEC.snow.courses) # tabulate each column
## course_number name id elev_feet latitude longitude april.1.Avg.inches agency ## Min. : 1.0 Length:259 Length:259 Min. : 4350 Min. :35.87 Min. :-123.2 Min. : 0.0 Length:259 ## 1st Qu.: 98.0 Class :character Class :character 1st Qu.: 6500 1st Qu.:37.29 1st Qu.:-120.6 1st Qu.:18.5 Class :character ## Median :196.0 Mode :character Mode :character Median : 7250 Median :38.42 Median :-119.9 Median :27.4 Mode :character ## Mean :199.3 Mean : 7583 Mean :38.61 Mean :-120.0 Mean :26.7 ## 3rd Qu.:285.5 3rd Qu.: 8700 3rd Qu.:39.73 3rd Qu.:-119.0 3rd Qu.:34.9 ## Max. :441.0 Max. :11450 Max. :41.81 Max. :-118.2 Max. :77.6 ## watershed ## Length:259 ## Class :character ## Mode :character ## ## ##
Now we can do a few things, including formatting our data/columns to different data types, renaming column names, and filtering/subsetting data.
# Make into a new dataframe CDEC.snow.courses snw <- # fix data formats...character/factor/numeric str(snw)
## 'data.frame': 259 obs. of 9 variables: ## $ course_number : int 1 2 3 5 417 311 4 298 285 9 ... ## $ name : chr "PARKS CREEK" "LITTLE SHASTA" "SWEETWATER" "MIDDLE BOULDER 1" ... ## $ id : chr "PRK" "LSH" "SWT" "MBL" ... ## $ elev_feet : num 6700 6200 5850 6600 6450 6200 5900 5700 5500 7200 ... ## $ latitude : num 41.4 41.8 41.4 41.2 41.6 ... ## $ longitude : num -123 -122 -123 -123 -123 ... ## $ april.1.Avg.inches: num 35.1 16.6 13.1 31.5 35.2 27.4 24.5 17.6 29.2 33.1 ... ## $ agency : chr "Mount Shasta Ranger District" "Goosenest Ranger District" "Mount Shasta Ranger District" "Salmon/Scott River Ranger District" ... ## $ watershed : chr "SHASTA R" "SHASTA R" "SHASTA R" "SCOTT R" ...
$course_number <- as.factor(snw$course_number) # switch from numeric to factor for plotting snw # rename a col: snw %>% # create a "pipe" with (Ctrl or Cmd + Shift + M). snw <- dplyr::rename(apr1avg_in=april.1.Avg.inches) # rename a column (new name = old name) # summary of data: summary(snw)
## course_number name id elev_feet latitude longitude apr1avg_in agency watershed ## 1 : 1 Length:259 Length:259 Min. : 4350 Min. :35.87 Min. :-123.2 Min. : 0.0 Length:259 Length:259 ## 2 : 1 Class :character Class :character 1st Qu.: 6500 1st Qu.:37.29 1st Qu.:-120.6 1st Qu.:18.5 Class :character Class :character ## 3 : 1 Mode :character Mode :character Median : 7250 Median :38.42 Median :-119.9 Median :27.4 Mode :character Mode :character ## 4 : 1 Mean : 7583 Mean :38.61 Mean :-120.0 Mean :26.7 ## 5 : 1 3rd Qu.: 8700 3rd Qu.:39.73 3rd Qu.:-119.0 3rd Qu.:34.9 ## 9 : 1 Max. :11450 Max. :41.81 Max. :-118.2 Max. :77.6 ## (Other):253
dim(snw) # how many rows and cols
##  259 9
# select/filter out the NA's from our dataset dplyr::filter(snw, !is.na(apr1avg_in)) %>% snw <- dplyr::select(-agency) # drop a column (agency) dim(snw)
##  259 8
sf package is amazing…it is becoming the workhorse for all things spatial. In this case, we’ll use the
sf package to quickly convert our point (X/Y) data into a spatial dataset we can work with (for more vignettes on the
sf package, see here). Note here we specify column numbers, but could use column names as well to designate our X/Y information. Also, I already know this data is in WGS84 and is lat/lon in decimal degrees, so using the
crs/EPSG=4326 but you may need to look this up for your data if it’s not part of the data. Most shapefiles already contain this information.
# make spatial st_as_sf(snw, snw_sf <-coords = c(6, 5), # can use col names here too, i.e., c("longitude","latitude") remove = F, # don't remove these lat/lon cols from df crs = 4326) # add projection
Great, now let’s save it out! Good news, we can save/write/read a whole bunch of different file types with
sf. For now, let’s save this as a shapefile for later use. Checkout saving to kml, geopackage, geojson, etc on the sf website.
We can also download a shapefile or set of zipped files directly from a web URL. See example code below…
# write shapefile out for use later st_write(snw_sf, "data_output/all_cdec_snow_stations.shp", delete_dsn = TRUE) st_write(snw_sf, "data_output/all_cdec_snow_stations.geojson", delete_dsn = TRUE) # download a file from a url: # download.file("https://github.com/ryanpeek/spatial_mapping_demo/blob/master/data/shps/cdec_snow_stations.zip?raw=true", destfile = "data/cdec_snow_stations.zip") # unzip the file into your "data" folder # unzip(zipfile = "data/cdec_snow_stations.zip", exdir = "data")
sf has it’s on base plotting function, but we need to give it the
geometry column so it knows what to plot. Remember since
sf are simple data frames, this is easy to do with the
# plain sf map plot(snw_sf$geometry)
# slightly fancier plot(snw_sf["elev_feet"], graticule = TRUE, axes = TRUE, main="Snow Station Elevations (ft)")
tmap package is great and has some nice plotting options. Check out the introduction vignette here.
library(tmap) tmap_mode("plot") tm_shape(snw_sf) + tm_layout(title = "Snow Station Elevations (ft)", frame=TRUE, inner.margin=0.1) + tm_compass(type = "8star", position = c(0.8, 0.8), size = 2) + tm_scale_bar(breaks = c(0, 50, 100), position = c("left", "bottom"), text.size = 0.7)+ tm_symbols(col="elev_feet",title.col = "Elev (m)", n = 5, palette = viridis(n=5, direction = -1), size = .7)
Before we make a nice interactive map, let’s manipulate the data a bit more. For this example, let’s grab some county boundaries and then select snow stations from only those counties, using the
st_intersection command. For those familiar with ArcMap, this is essentially a clip or crop of an existing shapefile using a different shapefile.
We are pulling county data here from the
USAboundaries package, and then loading some pre-cleaned/downloaded lake and stream shapefiles from the HydroSHEDS database. Finally, we crop these data to our selected counties.
Get the county, watershed, rivers, lakes data.
# use library(USAboundaries) to get county data us_counties(resolution = "low", states="CA") %>% counties_spec <- filter(name=="El Dorado" | name=="Placer") # get counties only named El Dorado and Placer
Haven’t mentioned it much yet, but using the
.gpkg format is a must! It’s awesome, it is tidy, and it does away with so much of the pain of dealing with all the millions of different shapefile bits that are always hard to manage. Even better, it’s open source and readable by ArcGIS, QGIS, etc., or as a straight SQLite database. Look for a separate lesson on this site just about geopackage. In the meantime, let’s read in more data from a geopackage (on the github repo for this site).
# read layers from a geopackage st_layers("data/mapping_in_R_data.gpkg")
## Driver: GPKG ## Available layers: ## layer_name geometry_type features fields ## 1 rivs_CA_OR_hydroshed Multi Line String 21976 26 ## 2 cdec_snow_stations Point 261 8 ## 3 h8_tahoe Polygon 1 5 ## 4 lakes_CA_OR_hydroshed Multi Polygon 2042 45 ## 5 h8_centroids Point 5718 20 ## 6 ca_coastline Line String 28807 7 ## 7 usgs_gages_clean 3D Point 2239 2 ## 8 lighthouses Point 54 2 ## 9 oceantrash Point 7063 62 ## 10 piers Point 200 4 ## 11 ports Point 97 17
# using read_sf here for quiet defaults, but can use st_read as well... # HUC8 watershed for Tahoe read_sf("data/mapping_in_R_data.gpkg", layer="h8_tahoe") h8_tahoe <- # Lakes for all of CA/OR from HydroSHEDS read_sf("data/mapping_in_R_data.gpkg", layer="lakes_CA_OR_hydroshed") lakes <- # Rivers for all of CA/OR from HydroSHEDS read_sf("data/mapping_in_R_data.gpkg", layer="rivs_CA_OR_hydroshed")rivers <-
To crop or clip our data, we use
st_intersection and specify the data we want to crop first then the boundary we want to use to crop/clip by second. Both should be in the same projection, and note a warning will occur for lon/lat data types, but it will still work.
# crop the snow stations to the counties of interest st_intersection(snw_sf, counties_spec) # intersect these so we only keep snow stations in these selected counties snw_crop <- # crop rivers and lakes st_intersection(rivers, counties_spec) # crop to our selected counties rivers_crop <- st_intersection(lakes, counties_spec) # crop to our selected countieslakes_crop <-
This is the really fun part…we can quickly and easily plot any spatial data (in sf format) as an interactive map using the
mapview package. For advanced mapview options, check out this page.
mapview(rivers_crop, color="steelblue", lwd=1.2, legend=FALSE) + mapview::mapview(lakes_crop, legend=FALSE) + mapview(snw_crop, col.regions="maroon", layer.name="CDEC SNOW STATIONS") # make a map!