Part III: Snapping Points to NHD Flowline Data and Calculating River Distance with riverdist
Though it’s a bit late in getting posted, this is part III in a spatial series that is becoming a quasi-workshop in the variety of ways we can work with spatial river & stream data in R. This post will continue using the sf
package, and introduce the riverdist
package, but we’ll still be playing with data on rivers and streams.
To recap, in the first few posts we made a leaflet map, scraped data from tables on a webpage, demonstrated how to calculate distance matrices between point data, and showed how to use functions written by folks at USGS (OWI) as well as the Hydrosheds data to download riverline data. That’s a fair amount to chew on I realize, but I’d like to demo a few additional things that might be useful for folks working with river data.
This post will (try to) show you:
- How to use the
riverdist
package to process a riverline network - How to snap spatial point data to the processed riverline data (which we downloaded in the previous post)
- Finally, how to calculate the river distance between sites (so along the river network)
- and a quick
ggmap
example
I think in a future post I’d like to demo how to implement a wavelet analysis of these flow data, as well as show an example of the very cool geoknife
package to assess patterns of river impairment. Onward!
So, hopefully these posts are useful and it doesn’t feel like this:
Load the Packages
The main packages I’m going to use in this post:
# load libraries
suppressMessages({
library(dplyr); # data munging and piping
library(purrr); # for list functions
library(ggplot2); # plotting
library(ggrepel) # for labeling
library(sf); # spatial simple features
library(USAboundaries); # state/county data
library(Imap); # nice mapping/color functions
#library(geoknife); # USGS tool set (Next post)
#library(dataRetrieval); # USGS tool set (next post)
#library(httr) # scraping webdata (to download flowlines)
})
# Note, to plot `sf` objects with `ggplot`, you need to have the most recent version (>2.2.1). A quick and easy check you can add to your code is an `if` statement: `if (utils::packageVersion("ggplot2") > "2.2.1"))`
Calculating River Distances
To use the riverdist
package, we need to load a shapefile with a river network of some type. This example uses NHD Streamline data, cropped to the North Fork American watershed. I downloaded the flowline data for the North Fork American watershed using the get_flowline
function from our previous post, at Stream Order 1, 2, 3, and 4. The function was from an excellent gist by Laura DeCicco (see here), and I modified it slightly to work with the sf
package.
To save time, I’ve put these data on github here for download. Let’s use HUC8 data from the previous post as well. Download it to a “data
” folder and the code below should work.
Load Flowline & HUC Data
Let’s take a look at the Streamline data for Stream Order=2
. Note the river network was downloaded using a bounding box or extent (max and min lat and lon values for a given shapefile), so it extends outside of the specific HUC8 watershed boundary we’re interested in.
# load the streamline data
load("data/nhd_rivs_ord_1_4.rda")
# load the hucs
huc8 <- read_sf(unzip("data/h8_AMR_BEA_YUB.zip"), quiet = F) %>%
st_transform(crs=4326) #%>%
## Reading layer `h8_AMR_BEA_YUB' from data source `/Users/ryanpeek/Documents/github/ryanpeek.github.io/h8_AMR_BEA_YUB.dbf' using driver `ESRI Shapefile'
## Simple feature collection with 4 features and 5 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -121.5976 ymin: 38.61455 xmax: -119.9829 ymax: 39.77696
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
# warning message here is ok!
# then remove raw files since file is added in memory
file.remove(list.files(pattern = "h8_AMR_BEA_YUB*",recursive = F))
## [1] TRUE TRUE TRUE TRUE
# filter to the NF American only
h8 <- huc8 %>% filter(HU_8_NAME=="North Fork American")
# plot
plot(st_geometry(h8), axes=F, lwd = 3)
plot(st_geometry(rivers2), col="blue", add=T)
title(main = "NF American Watershed & NHD Flowline (Stream order=2)", family="Roboto Condensed")
Crop and Save as Shapefile
In order to use the riverdist
package, we are going to want to trim our flowline data to a single watershed, and get rid of the streams outside of that boundary. Let’s crop the flowline data to the NF American Watershed using sf::st_intersection
. Then we can write the data as a shapefile.
# crop river data by huc8 data with sf function
rivs2 <- st_intersection(rivers2, h8)
# plot to check:
plot(st_geometry(h8), axes=F, lwd=3)
plot(st_geometry(rivs2), col="blue", add=T)
title(main = "NF American Watershed & NHD Flowline (Stream order=2)", family="Roboto Condensed")
# okay looks good lets save:
st_write(rivs2, "data/nfa_rivers2.shp", delete_dsn = T)
## Deleting source `/Users/ryanpeek/Documents/github/ryanpeek.github.io/data/nfa_rivers2.shp' using driver `ESRI Shapefile'
## Writing layer `nfa_rivers2' to data source `/Users/ryanpeek/Documents/github/ryanpeek.github.io/data/nfa_rivers2.shp' using driver `ESRI Shapefile'
## features: 654
## fields: 95
## geometry type: Line String
# "delete_dsn=T" will allow you to overwrite existing file so you don't get an error like this:
### Error in CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options), :
### Dataset already exists.
You might see a warning message about attribute field names, that’s okay, it’s just making field names unique. The file should be where you saved it, and have 4 separate files (.dbf
, .prj
, .shp
, .shx
). There can be more associated with a shapefile, but these 4 are the core set that are required.
Clean River Network Topology
Now that we have a river network, we need to clean it and check for errors so that when riverdist
is routing things along the network, distances will be accurate.
The main steps are:
- Load data and transform into a projected coordinate system
- Plot and Check Topology
- Clean the network topology so we can snap points, save data
- Calculate distances along the river network between sites
Load Data and Project
First we need to load the package and project our data. riverdist
won’t work unless data are in a projected coordinate system. If you aren’t sure what the difference between projected (PCS) and geographic coordinate systems (GCS) are, here’s the quickest summary I can provide.
- GCS are based on a spherical earth model (3-dimensional), so use coordinates like latitude & longitude (a datum) in a 3-dimensional space to define locations based on a spheroid model (of which there are several). The datum is used to link the spheroid model to the earth’s surface.
- PCS project locations onto a 2-dimensional Cartesian coordinate plane. That’s why we refer to these as map projections. Different projections have different ways of stretching the 3-d earth’s surface onto a flat 2-d plane. But by putting things onto a flat 2-d surface, calculations (like calculating distances along a river network) are usually easier to do.
# load the package
library(riverdist)
# riverdist only works w/ projected coordinate systems...a few options:
## EPSG:102003 USA_Contiguous_Albers_Equal_Area_Conic
## +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs
## EPSG:102004 USA_Contiguous_Lambert_Conformal_Conic
## EPSG:102005 USA_Contiguous_Equidistant_Conic
# read in the network and project with Albers Equal Area
rivs <- line2network(path="data/", layer="nfa_rivers2", reproject = "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs")
##
## Units: m
##
## Removed 36 segments with lengths shorter than the connectivity tolerance.
# should see message about "Removed 36 Segments w short lengths"
# plot of all the segments
plot(x=rivs)
Plot and Check Topology
Ok, so we should see a whole bunch of rainbow colored numbered segments. If we want to look at topology, we can plot the nodes (confluences). Green dots represent connections or confluences between segments, red dots are “headwaters” or isolated branches. Notice there are some red dots in some odd locations.
# check topology (nodes)
topologydots(rivers=rivs)
Cleanup: Insert Vertices and Identify River Mouth
The riverdist
tutorial/vignette is pretty good and will walk folks through the main steps. For larger river networks and the finer scale data you have, the more likely you can end up with weird and even persistent things you need to clean (stream braiding, unconnected segments, etc). Be aware this cleaning of topology is a stepwise process and it can take some time, but once you have everything cleaned up, you can save the data and re-use network in the future.
I’ll try to screen shot this process below. First let’s run the cleanup()
function.
# do a clean up:
rivs_fixed <- cleanup(rivers = rivs)
Initially the cleanup will identify all the segments and attempt to remove duplicate lines and dissolve segments that are split (uncessarily). When it asks whether to Dissolve
or not, select yes.
Dissolving...
Simplified from 615 to 243 segments.
Checking for segments with length less than the connectivity tolerance...
0 microsegments identified.
Checking if splitting segments is needed...
Checking between-vertex lengths...
Maximum distance between vertices is 772
Insert vertices to reduce distances between vertices and increase point snapping precision? (y/n) ?
The Insert vertices
question makes a more evenly distributed network so when we try snap points, there will be more vertices available in the network. The histogram provides a sense of the distribution of the segement sizes. I try to select something near the peak so most segments are a consistent size.
step1
Select “y
”, and pick a number…I picked 100 meters for this example. The output will then say it’s inserting vertices, followed by:
Please identify segment number of river mouth:
step2
Here I’ve selected segment 192, and then selected the mouth as 1. Once the river mouth has been identified, the program will ask if you want to remove any additional segments. I select n here, but many options.
Cleanup: Remove Additional Segments/Braiding
The cleanup function will then continue and ask if you want to remove additional segments, and check for braiding. Just walk through and determine what you want to keep or not. If there are braided segment, you’ll need to pick out the components you want to keep and which to delete. This can be a bit tedious (especially with very fine-scale datasets), but once you walk through, you’ll be able to save your updated network. Here I remove as many of the braided segments as I can (207, 213, 205, 204, 214, 138, 10, 11, 232, 230
).
Once done with removing braided segments, riverdist
will ask if you want to build segment routes. I select y here.
Save the Cleaned Data
According to riverdist
, we should save our cleaned topology file as a Rdata
file. Let’s save our file.
save(rivs_fixed, file = "data/nfa_rivs2_fixed.rda")
Plot Cleaned Topology
Let’s take a look at what we did, and why it matters, by plotting the original river network topology vs. the cleaned version.
load("data/nfa_rivs2_fixed.rda")
# now plot the old vs. the new
par(mfrow=c(1,2))
topologydots(rivers=rivs)
graphics::title("Raw River Topology", family="Roboto Condensed")
topologydots(rivs_fixed)
graphics::title("Clean River Topology", family="Roboto Condensed")
We can see that the Clean river topology has fewer nodes and is a cleaner network, which means it will be more accurate and faster for calculating network distances.
## null device
## 1
Snapping Points to Lines
Now that we have a river network, we can snap points to the nearest line, and use that to calculate distances between sites along the river network. First let’s grab some points! For this example let’s go back and grab all CDEC stations from the NF American watershed. To do that I searched for any/all stations in the American River basin using this interface. The resulting page I ended up with was this. Then I used the web-scraping technique I showed in the first post in this series) to pull that data in.
Get Point Data (for Snapping)
library(rvest)
url <- "https://cdec.water.ca.gov/cgi-progs/staSearch?sta=&sensor=211&dur=&active=&lon1=&lon2=&lat1=&lat2=&elev1=-5&elev2=99000&nearby=&basin_chk=on&basin=AMERICAN+R&hydro=CENTRAL+COAST&county=ALAMEDA&operator=%2B&display=sta"
df <- url %>%
read_html() %>%
html_nodes(xpath='//*[@id="main_content"]/div/div[1]/table') %>% # inspect and xpath with chrome
html_table()
df_locs <- df[[1]] # reads in as a list, so we are just eliminating the list and making this a dataframe
head(df_locs)
## ID Station Name River Basin County Longitude
## 1 ABN LAKE AUDRAIN AMERICAN R EL DORADO -120.0370
## 2 ACK ARCADE CK NR DEL PASO HTS AMERICAN R SACRAMENTO -121.3820
## 3 ADR AUBURN DAM RIDGE AMERICAN R PLACER -121.0450
## 4 AFD AMERICAN R BELOW FOLSOM DAM AMERICAN R SACRAMENTO -121.1667
## 5 AFO AMERICAN RIVER AT FAIR OAKS AMERICAN R SACRAMENTO -121.2277
## 6 AHZ AMERICAN R AT HAZEL AVE BRIDGE AMERICAN R SACRAMENTO -121.2240
## Latitude Elevation Feet Operator Map
## 1 38.82000 7300   Placerville Ranger District
## 2 38.63400 50   US Geological Survey
## 3 38.88200 1200   US Bureau of Reclamation
## 4 38.68830 220   US Geological Survey
## 5 38.63546 72   US Geological Survey
## 6 38.63600 100   US Geological Survey
dim(df_locs)
## [1] 139 9
# cleanup data
df_locs <- df_locs %>% rename(station="Station Name", river_basin="River Basin",
lon=Longitude, lat=Latitude, elev_ft=`Elevation Feet`) %>%
select(-Map)
# remove and filter out the weird UNIX code " "
df_locs$elev_ft <- as.numeric(gsub(pattern = " ", replacement = "", df_locs$elev_ft))
# make CDEC data sf object (spatial):
df_locs <- st_as_sf(df_locs,
coords = c("lon", "lat"), # for point data
remove = F, # don't remove these lat/lon cols from df
crs = 4326) # add projection (this is WGS84)