## Part II: Download NHD Flowline Data and Use `sf` Functions

So last post (Part I), I showed how to calculate distance matrices and find out the nearest points between two sets of point data. Now that we know how to identify the nearest neighbor for one set of points to another, let’s look at how we can grab some line data (in this blogpost, I’m using river networks) and crop it based on our point data. I’ll show how to go download a few different sets of river line data, how to buffer, and how to crop/intersect data based on other polygons or point data. The next post (Part III) will show how you can snap the spatial point data to the river network (line) data using the `riverdist` package, and calculate the river distance between sites (so along the river network).

The main packages I’m going to use in this post:

``````# load libraries
suppressMessages({
library(dplyr); # data munging and piping
library(purrr); # for list functions
library(ggplot2); # plotting
library(ggrepel) # for labeling
library(sf); # spatial simple features
library(USAboundaries); # state/county data
library(Imap); # nice mapping/color functions
#library(geoknife); # USGS tool set (Next post)
#library(dataRetrieval); # USGS tool set (next post)
})``````

Note, to plot `sf` objects with `ggplot`, you need to have the most recent version (>2.2.1). A quick and easy check you can add to your code is an `if` statement:

`if (utils::packageVersion("ggplot2") > "2.2.1"))`

I’m going to demo two different sets of watershed/river data which are openly accessible, and I think represent very good data for these types of analyses.

• US: If you work in the US, the go-to data is typically the National Hydrography Dataset. This dataset has myriad of different components, and is linked to the USGS NWIS. These are really amazing data that are publically available, so I’ll walk through just a few of the things you can do.
• Global: There’s a cool dataset with global coverage which includes both stream and lakes called HydroSHEDS. It’s based on high-resolution elevation data obtained during Space shuttle flight as part of NASA’s Shuttle Radar Topography Mission (SRTM). To access the data, you’ll need to create a free account (just an email req’d), but once that’s done you can download any piece of these data. I’ll show the same example with these data as well, and provide a snippet folks can download if need be.

## Using NHD Data

There are many components of the NHD dataset, but thankfully folks over at Office of Water Information at USGS have built some great tools and code to work with NHD data. I’ll show how to download the NHD River data at different scales, and how to crop it to an area of interest using another polygon, or a set of points. If you want to see a few cool posts that utilize the USGS tools and data, I definitely recommend checking out some of these:

First we are going to load a few libraries and functions which are required to download the NHD data. This function is from the blogs linked above, and requires specification of `streamorder` and `mapRange`. Note the default downloads stream layers with a `stream order` = 1, so the finest scale possible (headwater streams are 1, and larger rivers are usually > 4–5).

I’m not going to go into much detail on this function, but the original function comes from a gist by Laura DeCicco (see here), and I modified it slightly to work with the `sf` package.

``````library(httr)

get_flowlines <- function(streamorder, mapRange){
postURL <- "https://cida.usgs.gov/nwc/geoserver/nhdplus/ows"

filterXML <- paste0('<?xml version="1.0"?>',
'<wfs:GetFeature xmlns:wfs="http://www.opengis.net/wfs" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gml="http://www.opengis.net/gml" service="WFS" version="1.1.0" outputFormat="shape-zip" xsi:schemaLocation="http://www.opengis.net/wfs http://schemas.opengis.net/wfs/1.1.0/wfs.xsd">',
'<wfs:Query xmlns:feature="https://gov.usgs.cida/nhdplus" typeName="feature:nhdflowline_network" srsName="EPSG:4326">',
'<ogc:Filter xmlns:ogc="http://www.opengis.net/ogc">',
'<ogc:And>',
'<ogc:PropertyIsGreaterThan>',
'<ogc:PropertyName>streamorde</ogc:PropertyName>',
'<ogc:Literal>',streamorder-1,'</ogc:Literal>',
'</ogc:PropertyIsGreaterThan>',
'<ogc:BBOX>',
'<ogc:PropertyName>the_geom</ogc:PropertyName>',
'<gml:Envelope>',
'<gml:lowerCorner>',mapRange," ",mapRange,'</gml:lowerCorner>',
'<gml:upperCorner>',mapRange," ",mapRange,'</gml:upperCorner>',
'</gml:Envelope>',
'</ogc:BBOX>',
'</ogc:And>',
'</ogc:Filter>',
'</wfs:Query>',
'</wfs:GetFeature>')

destination = file.path(tempdir(),"nhdflowline_network.zip")
file <- POST(postURL, body = filterXML, write_disk(destination, overwrite=T))

filePath <- tempdir()
print("unzipping...")
unzip(destination, exdir = filePath)

flowLines <- st_read(filePath, layer = 'nhdflowline_network')

return(flowLines)
}``````

Now we have a function to download river network or flowline data. But it’s probably more useful to specify a boundary or location which we can use to subset our flowline data. For this example I’ll show how to get data by county/state with the `USAboundaries` package, or by using a watershed boundary.

#### State & County Data

``````library(USAboundaries) # STATES/counties data
library(ggrepel) # for labeling

# set the state and county names of interest
state_names <- c("california")

# get STATE data
CA<-us_states(resolution = "high", states = state_names) %>%
st_transform(crs = 4326)

# get COUNTY data for a given state
counties_spec <- us_counties(resolution = "low", states=state_names) %>% # use list of state(s) here
filter(name %in% co_names) %>% # filter to just the counties we want
mutate(lon=map_dbl(geometry, ~st_centroid(.x)[]), # add centroid values for labels
lat=map_dbl(geometry, ~st_centroid(.x)[])) # add centroid values for labels

# get range of lat/longs from counties for mapping and river function
mapRange1 <- c(range(st_coordinates(counties_spec)[,1]),range(st_coordinates(counties_spec)[,2]))

# Make a quick Map:

ggplot() +
geom_sf(data=CA, color = "gray30", lwd=2, fill=NA) +
geom_sf(data=counties_spec, fill = NA, show.legend = F, color="gray50", lwd=0.4) +
geom_label_repel(data=counties_spec, aes(x=lon, y=lat, label=name)) +
coord_sf(xlim = mapRange1[c(1:2)], ylim = mapRange1[c(3:4)]) +
theme_bw()`````` #### HUC Data

For watershed boundaries, I’m using hydrologic unit code (HUC) boundaries, specifically HUC8, because it’s a decent size watershed and makes a good example. If you want to download HUC data on your own (there are many other sets of data that you may download as well), you can visit the NRCS website:

• Selecting your area of interest
• Accept the area,
• Then select the data you want to download for that area (e.g., Hydrologic Units: 8 digit Watershed Boundary Dataset).
• Select your output (e.g. shapefile) and a projection (e.g., and WGS84
• Enter an email, and you are good to go.

If you want to grab the data I’m using for this example, you can download it here on github.

I showed this in Part I, but I’m just unzipping the file, reading it, and then removing the unzipped files so that I have the `sf` object in my environment. I’m also filtering to a single HUC8 to make this example a bit simpler.

``````huc8 <- read_sf(unzip("data/h8_AMR_BEA_YUB.zip"), quiet = F) %>%
st_transform(crs=4326) #%>% ``````
``````## Reading layer `h8_AMR_BEA_YUB' from data source `/Users/ryanpeek/Documents/github/ryanpeek.github.io/h8_AMR_BEA_YUB.dbf' using driver `ESRI Shapefile'
## Simple feature collection with 4 features and 5 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -121.5976 ymin: 38.61455 xmax: -119.9829 ymax: 39.77696
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs``````
``````# then remove raw files since file is added in memory
file.remove(list.files(pattern = "h8_AMR_BEA_YUB*",recursive = F))``````
``##  TRUE TRUE TRUE TRUE``
``````h8 <- huc8 # rename for now

# get map range from this layer for flowlines call
mapRange <- c(range(st_coordinates(h8)[,1]),range(st_coordinates(h8)[,2]))``````

Great! Now we have some boundaries to work with. Let’s take a look at these pieces (the State/counties, and the HUC8) on a `ggplot2` map. Notice there is a `geom_sf` call for `sf` data. For this example, I’m showing both a map where I don’t specify an extent (so it defaults to the first layer, which is CA), and a map with `coord_sf` and the `mapRange` which is the extent (or maximum/minimum bounding box) of the counties. Notice the difference.

``````# defaults to extent of first layer
ggplot() +
geom_sf(data=CA, color = "gray30", lwd=2, fill=NA) +
geom_sf(data=counties_spec, fill = NA, show.legend = F, color="gray50", lwd=0.4) +
geom_label_repel(data=counties_spec, aes(x=lon, y=lat, label=name)) +
geom_sf(data=h8, aes(fill=HU_8_NAME), alpha=0.3, color="maroon")+
labs(title="CA with selected counties & HUC8s", y="Latitude", x="Longitude") +
theme_bw(base_family = "Roboto Condensed")``````