There are many different ways to save/store spatial data. Some formats changed or been introduced through time, others are simply still used because “that’s the way it’s always been”. One such example is the shapefile, which while useful in the context of ArcGIS, using these file types with anything else is a pain…(and it’s sort of a pain even with ArcGIS). The main issue is every shapefile is made up of at least 4 different files all with different file endings like
.dbf, etc.). When reading data in, these files all need to be in the same folder, with the same filepath/name. Only problem is when you have lots of these files, file & folder management becomes really exciting1. Particularly because most folks probably didn’t follow Jenny Bryan’s excellent advice on file naming conventions. Leading to things like this2.
After a recent twitter exchange with a friend and colleague of mine who expressed frustrated shapefiles for additional reasons (truncated field names), but was also was very happy with using the geopackage (
.gpkg) format, I realized it was probably worth typing this up and getting it out there.
The point of all this is to demonstrate one great way to store any/all the different spatial data types in one single file
.gpkg or “geopackage”, which can be easily shared across platforms, operating systems, programs, etc. It’s open-source, plays well with R, Arc, QGIS, and honestly I haven’t found a reason not to use it yet.
Geopackage files are SQLite databases that contain the spatial components. We can store a “shapefile” (made up of multiple files) as a single
gpkg file, or we can actually store multiple files in single
gpkg file! The wonderful thing about this is we can store both the spatial data, the projection/CRS information, along with any additional/associated tables, and all these data are contained in one single file. There’s a nice post about using the
gpkg format here in conjunction with the
sp package…but I’m going give a quick run down on using these file types with the
With the advent of the
sf package, it has become so much easier to read/write in just about any vector-based spatial data file type into R. A few of the ones listed here are just some of the more common forms I run across frequently, but this is by no means an exhaustive list. These are all vector based, and I’m not going to talk about rasters, but
gpkg works for imagery/raster type files too.
.gpx(from GPS devices)
So, these all can turn into one file type, which can be easily shared or queried using SQL based tools, or the
gpkgExample in R
So let’s walk through a quick example where we read in 3 different file types, save them all into the same
gpkg file/database, and read them back in to make sure they are the same (and nothing broke!).
The packages I’ll use for this example are as follows:
If you haven’t used the
here package, check it out…it’s very handy in conjunction with RStudio projects, and dealing with pathnames.
For this example let’s use a few different data sets:
kmlof USGS gages of California (source: USGS)
ocean <- read_csv(paste0(here(), "/data/coastal_cleanup_detailed_summary-CA-USA_2019March.csv")) %>% separate(col=GPS, into = c("lat","lon"), sep = ",", remove = T) %>% mutate(lat=as.numeric(lat), lon=as.numeric(lon)) %>% st_as_sf(coords=c("lon", "lat"), crs=4326, remove=FALSE) st_write(ocean, "data/coastal_cleanup_detailed_summ_CA_2019Mar.geojson")
All this data is available as a zipped folder here:
Quick note, if you have a
kmz file, you’ll need to unzip it first with your computers zip/unzip program (or check out something like 7Zip for Windows/PC or Keka for MacOSX). Once unzipped you should have a
kml which you can use. Ultimately it doesn’t matter much what the data type is/was once it’s been read into R using
sf, because it becomes a spatial data frame which can be exported as whatever suits the user best.
Let’s get all these layers into R using
sf so we can save them into one single
sf commands are fairly simple, and to read things in, we only need
sf_read. All my files live in a
data folder in an RStudio project, so the
here() essentially refers to
# read in shps lighthouses <- st_read(paste0(here(),"/data/CUL_CA_Lighthouses.shp")) ports <- st_read(paste0(here(),"/data/CUL_CA_Ports.shp")) piers <- st_read(paste0(here(), "/data/FishingPiers.shp")) # read in kml gages <- st_read(paste0(here(), "/data/streamgages_clean.kml")) # read in geojson oceantrash <- st_read(paste0(here(),"/data/coastal_cleanup_detailed_summ_CA_2019Mar.geojson"))
Before we do anything else, we need to make sure everything is in the same projection. Typically this is a good idea regardless of what you are doing, so calculations and analyses all use the same coordinate reference system (CRS). This isn’t a requirement for a
gpkg file, but I recommend doing it. For this example, let’s shift everything into California Albers (EPSG 3310).
# check CRS: st_crs(lighthouses) # change/set new CRS: lighthouses <- st_transform(lighthouses, crs = 3310) # check new CRS is set st_crs(lighthouses) # apply to everything else ports <- st_transform(ports, crs=3310) piers <- st_transform(piers, crs=3310) gages <- st_transform(gages, crs=3310) oceantrash <- st_transform(oceantrash, crs=3310)
Let’s quickly plot this and make sure it data makes sense…this map shows Commercial Ports, Piers, and Operational Lighthouses for California.
library(mapview) # filter to commercial ports and active lighthouses mapview(ports[ports$COMMERCIAL==1,], col.regions="darkblue", cex=5, layer.name="Commercial Ports", pch=22) + mapview(piers, col.regions="maroon", cex=4, layer.name="Piers") + mapview(lighthouses[lighthouses$OPERATIONA=="Yes",], col.regions="yellow2", layer.name="Operational Lighthouses")
Now we have all the pieces we need, let’s create a single
gpkg file to hold these. Take a quick note of the file sizes here, the
geojson is ~13 MB, the
shps aren’t big (< 1 MB), and the `kml is about 1.1 MB. So in raw form, total file size ~14.5 MB.
Just as we have been using
st_ commands, there’s nothing special about writing to a
gpkg file. The only added bit we need is to specify the same
gpkg file each time, and change the
st_write(gages, dsn="data/gpkg_in_R_example.gpkg", layer='usgs_gages_clean') st_write(lighthouses, dsn="data/gpkg_in_R_example.gpkg", layer='lighthouses') st_write(oceantrash, dsn="data/gpkg_in_R_example.gpkg", layer='oceantrash') st_write(piers, dsn="data/gpkg_in_R_example.gpkg", layer='piers') # If an existing layer already exists, to overwrite the LAYER add: ( layer_options = "OVERWRITE=YES" ) st_write(ports, dsn="data/gpkg_in_R_example.gpkg", layer='ports', layer_options = "OVERWRITE=YES" ) # Caution/Note, if you want to overwrite the entire gpkg database file, use "delete_dsn = TRUE". This replaces the file and everything that may have been in it!
That’s it! Even better, the file size for the total
gpkg which now has all these files in it, is 2.7 MB!. Overall, I find it a much cleaner and nicer option.
Ok, and finally, how do we get it back in? Well we can check what layers exist in a given
gpkg file with the
st_layers function. Then we read things in a very similar fashion to what we’ve already used with st_read().
## Driver: GPKG ## Available layers: ## layer_name geometry_type features fields ## 1 usgs_gages_clean 3D Point 2239 2 ## 2 lighthouses Point 54 2 ## 3 oceantrash Point 7063 62 ## 4 piers Point 200 4 ## 5 ports Point 97 17
## Reading layer `usgs_gages_clean' from data source `/Users/ryanpeek/Documents/github/WEBSITES/mapping-in-R-workshop/data/gpkg_in_R_example.gpkg' using driver `GPKG' ## Simple feature collection with 2239 features and 2 fields ## geometry type: POINT ## dimension: XYZ ## bbox: xmin: -364760.2 ymin: -602451.8 xmax: 538984.1 ymax: 447251.2 ## z_range: zmin: 0 zmax: 0 ## proj4string: +proj=aea +lat_1=34 +lat_2=40.5 +lat_0=0 +lon_0=-120 +x_0=0 +y_0=-4000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
lighthouses <- st_read(dsn = paste0(here::here(),"/data/gpkg_in_R_example.gpkg"), layer='lighthouses', quiet = TRUE) oceantrash <- st_read(paste0(here::here(),"/data/gpkg_in_R_example.gpkg"), layer='oceantrash', quiet=TRUE) piers <- st_read(dsn=paste0(here::here(),"/data/gpkg_in_R_example.gpkg"), layer='piers', quiet=TRUE) ports <- st_read(dsn=paste0(here::here(),"/data/gpkg_in_R_example.gpkg"), layer='ports', quiet=TRUE)