Misadventures in Open Source GIS

When most ecologists conceptualize “GIS,” we think of a desktop GIS program, most likely ArcGIS. When we don’t want to use ArcGIS (or can’t), we most likely ask, “is there an R package for that?” In many cases there is, and the repository keeps growing! The ever-growing number of R spatial packages reflects the general trend in GIS right now: it’s booming in lateral growth. Everywhere you turn there’s something new, and honestly my motivation for writing this post is just to try to keep up.

Going back “old school,” perhaps the most famous and long-running (i.e. older than I am) open source program is GRASS GIS. Here’s where the open source GIS movement fails, though: the point-and-click friendliness of interfacing with the program stops before you even get to the gate. If I go to the Ubuntu Software Center, it points me to an outdated version of the program that won’t even launch, so I had to remove it before installing the newer version (where the version number is part of the command name). Luckily, though, it points me to a broader repository whereby I can access the “Ubuntu GIS” suite.

# Add Ubuntu Unstable PPA when running LTS Ubuntu release
sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable

The real problem I’ve had with GRASS from the beginning, though, is the beginning: you can’t just open the program and pull in some spatial data like you can in ArcGIS. You have to start by selecting a home folder, defining a “location” (which must have a uniform spatial reference), and then creating a mapset within it. That’s where I admittedly gave up on GRASS long ago, in favor of the ease of just launching ArcGIS and adding a shape file.

QGIS solves more of this problem, but there are again some hang-ups with the install: it’s not in the Ubuntu Software Center, and installing it “the old-fashioned way” seems to get you an older version of the program (the terminal errors will tell you so if you launch it from the terminal).

I’m thinking about finally trying to learn PostgreSQL in conjunction with PostGIS. Retracing my thoughts on that to last year, I think it was because I found out that SQL was a solid way to build my own GIS tools. I remember concluding that PostGIS was probably my best bet for learning open source GIS, and coincidentally, I just saw a post-doc advertisement that specifically requested PostGIS knowledge.

Want to keep up? I compiled a certainly-not-exhaustive-but-pretty-comprehensive list of people I could find tweeting about open source GIS.

Let me know if you should be on that list!

Extracting NetCDF Values to a Shape File

Here’s a script that loops over the climate NetCDF files in a folder and, for each file, extracts the values of every layer in the brick, in this case averaged over the polygons in a shape file.

rm(list = ls())
library(raster)
library(rgdal)
library(ncdf4)
library(reshape2)
library(stringr)
library(data.table)
setwd("where your climate files are")
your_shapefile <- readOGR("path to your shapefile", "the layer name for your shapefile")

climate <- rbindlist(lapply(list.files(pattern = "nc"), function(climate_file)
{
  # in my case, the weather variable is in the filename
  climate_var <- ifelse(grepl("pr", climate_file), "pr", ifelse(grepl("tmax", climate_file), "tmax", "tmin"))
  print(climate_var)
  climate_variable <- brick(climate_file, varname = climate_var, stopIfNotEqualSpaced = FALSE)
  # I needed to shift the x-axis to be on a -180 to 180 scale
  extent(climate_variable) <- c(xmin(climate_variable) - 360, xmax(climate_variable) - 360, ymin(climate_variable), ymax(climate_variable))
  # project the polygons to the raster's coordinate system
  blocks <- spTransform(your_shapefile, crs(climate_variable))
  # average the raster values over each polygon, for every layer in the brick
  r.vals <- extract(climate_variable, blocks, fun = mean, na.rm = TRUE, df = TRUE, layer = 1, nl = 1680)
  r.vals <- melt(r.vals, id.vars = c("ID"), variable.name = "date")
  r.vals$climate <- climate_var
  # depending on the filename convention of your files, get a variable for your climate model and name it "modelname"
  r.vals$model <- modelname
  # return the long-form table for this file; rbindlist() stacks them all
  r.vals
}))

At this point, you have a table called “climate” (a data.table) with all your data in long form. You can cast this as you see appropriate depending on your needs!
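
For example, one way to go back to a wide format with data.table’s dcast (just a sketch; the column names come from the script above, and your own cast formula may differ):

climate_wide <- dcast(climate, ID + model + date ~ climate, value.var = "value")  # one column per climate variable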

How This Problematic NetCDF File Was Fixed

I can’t be thankful enough for my godsend Twitter friend/NetCDF guru Michael Sumner for coming in clutch to rectify this problematic file! As I take my baby steps in learning how to deal with NetCDF files (~3 weeks after being thrown into the “deep end” and trying to tread water), I realized that one of our files had some sort of problem. Relevant to this issue: NetCDF files are often (not always) multidimensional arrays, where two of the dimensions are coordinates, and the variables are defined in relation to those dimensions. In my case, temperature is described over space and time (a 3-dimensional array of longitude, latitude and time), so each temperature value “goes along with” a set of coordinates and a time.

When I imported the NetCDF file as a raster brick (which means I put the array data into a stack of grids), the metadata seemed to show a larger longitudinal “step” than latitudinal; the latitude was at the correct resolution. Note that this means, as is common, I took what is really a point (a temperature value corresponding to a latitude, longitude and time) and turned it into a grid, where the original point is the center of the cell that gets the value. I tried plotting it to see what it looked like.

[Figure: Rplot01] This first plot hints at the real problem: it appears the image has been grabbed at either end and stretched too wide.

I tried cropping the image to just the extent I needed, but it was still messed up.

[Figure] I think those are the Great Lakes, erroneously stretched into the study area.

I took this at face value, that perhaps the grid was somehow “wrong,” but my new friend showed me how to look more deeply into the problem! He opened the file in R, extracted the coordinate values, and plotted them for a visual diagnosis.

library(ncdf4)
con <- ncdf4::nc_open("file.nc")
lon <- ncdf4::ncvar_get(con, "lon")
lat <- ncdf4::ncvar_get(con, "lat")
ncdf4::nc_close(con)

So, my friend suggested plotting the longitude values stored in this dimension of the array.

[Figure: Rplot — the longitude values stored in the file, plotted in order]

Herein we start to see a problem! Something is off about the longitudes stored in the array.

head(lon)
# [1] 244.0625 244.1875 244.3125 244.4375 244.5625 244.6875
tail(lon)
# [1] 281.8125 281.9375 282.0625 282.1875 3.5200 3.5100
f <- "file.nc"
r <- raster::brick(f, stopIfNotEqualSpaced = FALSE)
extent(r)
# class : Extent
# xmin : 3.056128
# xmax : 282.6414
# ymin : 40
# ymax : 52.875

The first longitude values are the correct lowest values for the raster. For whatever reason, the last two longitude values got reassigned to the (very wrong) smallest values in the grid, when they should be the largest. For comparison and clarification, I also posted the extent of the raster that I created: the extent describes the grid that I assigned the points to when I imported the data into R, so the stored coordinates and the raster extent are “2 different things.” Assuming that the NetCDF stores the center point of each cell, the latitude extent (which is correct) adds a half-cell-width buffer around the actual data points, correctly placing them at the centers of their cells.

So, to correct it, he reset the raster’s extent, with the following note…
## let's treat it as if the start is correct,
## the final two columns are incorrect
## we hard-code the half-cell left and right cell edge
## (but note the "centre" might not
## be the right interpretation from this model, if we're being exacting)
valid_lon2 <- c(lon[1] - 0.125/2) + c(0, length(lon)) * 0.125
valid_lat <- range(lat) + c(-1, 1) * 0.125/2
r2 <- setExtent(r, extent(valid_lon2, valid_lat))
r_final <- raster::shift(r2, x = -360)

Now that the extent is rectified, he shifted it to our desired reference, from 0-360 longitude to -180/180.
Now the Great Lakes are…great!
Voila! Thank you very much, Michael!

Working with Irregularly Spaced NetCDF

I’m in the process of trying to average raster cells within a shape file of polygons in R. Today I got this error message while trying to import a NetCDF as a raster brick…

“cells are not equally spaced; you should extract values as points.” – brick() function warning message from raster package

When I suppressed this warning with the stopIfNotEqualSpaced=FALSE option, I was able to look at the metadata and see that, indeed, the grid is irregular. The latitude and longitude resolutions differ, presumably making rectangular cells instead of square ones. Predictably, the number of columns, and thus the number of cells, differs from the other files. Interestingly, the extent is way different too. So I plotted it to take a look.

[Figure: irregular] My NetCDF file, where the longitudinal resolution is coarser than the latitudinal.

I first followed the suggestion of my colleague Andy Allstadt to project my shape file to the raster’s projection, instead of projecting them both. Using the rasterToPoints() function to convert the raster to a point layer gives me a wide-format attribute table: the first columns are the coordinates, and the remaining columns are the layers in the brick (in my case, the monthly averages for all the years in my raster stack). The problem is that the grid cells in this irregular file are much larger than they should be longitudinally, so my blocks do not necessarily overlap the points at the centers of the cells, which is how I think the raster was converted to points. So, now what? It looks like maybe I’ll have to re-grid the raster layer?
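
For reference, that first step looks roughly like this (a sketch with placeholder names: "irregular_file.nc", the "tmin" varname, and the "blocks" polygon layer are stand-ins for my actual files):

library(raster)
library(rgdal)
irregular_brick <- brick("irregular_file.nc", varname = "tmin", stopIfNotEqualSpaced = FALSE)
blocks <- readOGR("path to shapefile", "layer name")
blocks <- spTransform(blocks, crs(irregular_brick))  # project the polygons to the raster, not the other way around
pts <- rasterToPoints(irregular_brick)  # matrix: x, y, then one column per layer in the brick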

How I’ve Been Processing Climate Data

From a THREDDS server, I’m using the NetcdfSubset portal to obtain a spatial subset of my climate data set of interest. Since the files I need are too big to download over HTTP, the portal instead gives me an OPeNDAP URI with the spatial subset info, which I then pass to nccopy from the NetCDF library.

Then, I installed CDO and wrote a script to compute the monthly means for all the files: it collapses a daily (or hourly) time-step file into a monthly-averaged (or whatever statistic you choose) file.

cdo monmean foo_hourly_or_daily.nc  foo_monthly_mean.nc

Then, I installed NCO to reorder the dimensions so that latitude comes before longitude.

ncpdq -a lat,lon in.nc out.nc

Then, I imported the correctly-oriented files into R.

rm(list = ls())
library(raster)
library(rgdal)
library(ncdf4)
setwd("wherever your files are")
climate <- brick("filename.nc", varname = "whatever your climate variable is")

In my case, as I think is common for climate files, longitude was on a 0-360 scale, instead of -180/180.
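
The way I ended up handling this appears in the posts above (adjusting the extent by 360, or using raster::shift); as a rough one-liner on the “climate” brick from above, assuming the data don’t wrap across the dateline:

climate <- shift(climate, x = -360)  # move the brick from 0-360 longitudes onto the -180/180 scale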

Projecting NetCDF Files as Raster Bricks

I need to change the projections of the NetCDF files, so I have read one into R and have been trying to manipulate it there.

f <- file.path("C:/nco/out.nc")
tmin <- brick(f, varname = "tmin")
Albers <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs"
proj1 <- projectRaster(tmin, crs = Albers)

Invariably, this leads to a blank raster where all the values have been replaced with NA. So something is wrong: maybe I’m trying to put the data into a space where it doesn’t line up? A few things also make me wonder what’s wrong…

  • it can’t rotate (raster::rotate gives me an error saying it doesn’t “look like an appropriate object for this function”)
  • a call like raster(inputfile, varname="lat") doesn’t work; when I try that, it says that “lat” is not a variable (though it is a dimension of the actual variable of interest, “tmin” in this case, which seems to work for other intents and purposes)

Ideas…

  • use gdalwarp instead to project the NetCDF?
  • after projecting, make the shape file line up to 0-360 coordinates?
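
Related to both ideas, one rough sketch (not something I’ve tested on these files yet) would be to shift the raster onto -180/180 longitudes before projecting, in case the NA output comes from the 0-360 coordinates not lining up with the Albers domain:

tmin180 <- raster::shift(tmin, x = -360)  # move the 0-360 longitudes onto the -180/180 scale first
proj2 <- projectRaster(tmin180, crs = Albers)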

Spatially Manipulating NetCDF Files

I ended up getting a bunch of climate NetCDF files from a colleague for each combination of climate model, climate change scenario and variable. So, what I have is a list of 3-D files consisting of observations/predictions of a given weather variable over latitude, longitude and time (you can picture them as cubes, if you like). I will need to spatially adjust the files, and then subset the data.

ncpdq -a lat,lon in.nc out.nc

I need to…

  • keep only the extent within a shape file we’re using
  • figure out how to summarize it by the polygons in the shape file (a rough sketch of both options appears at the end of this post)
    • average?
    • keep only the center point?

Some of my closest colleagues, Brooke Bateman and Andy Allstadt, had some good advice for how to work with the netCDF files in R once I get them:

  • ncdf4 package
  • raster package: can open netCDF either as…
    • single layer (with the raster function)
    • the entire thing (brick function)
  • SDMTools
    • sample x,y points from raster using extract data function
    • convert the raster to ASCII

You can load those libraries, and then do a few things to take a look.

library(raster)
print(raster("afile.nc"))  ## same as 'ncdump -h afile.nc'
b <- brick("afile.nc", varname = "pr")  # this is the variable name internal to the file
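
And once a file is in as a brick, the crop-and-summarize step from the list above might look something like this (a rough sketch with hypothetical names; "polys" stands in for the polygon shape file, and both summary options from the list are shown):

library(rgdal)
polys <- readOGR("path to shapefile", "layer name")  # hypothetical polygon layer
polys <- spTransform(polys, crs(b))  # match the brick's coordinate system
b_crop <- crop(b, extent(polys))  # keep only the shape file's extent
poly_means <- extract(b_crop, polys, fun = mean, na.rm = TRUE, df = TRUE)  # option 1: average per polygon
poly_centers <- extract(b_crop, coordinates(polys))  # option 2: value at each polygon's center point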

Working in Linux, in a Windows World

If you’re interested in making the switch, there are several things you should know. I use the most popular flavor of Linux, Ubuntu (currently 16.04, Xenial Xerus), so my advice will be tailored to that distribution, though much of it generalizes across distributions. Interestingly, nowadays it seems all the popular operating systems have “gone back to” the Linux model of getting programs (i.e. through some kind of repository software). So you’re probably familiar with using an “App Store” or something of the like to get your programs now, even on a computer. This actually used to be one of my criticisms of the “feel” of using an OS like Ubuntu, but now it’s pretty standard. Anyway, the easiest way to get new programs for Ubuntu is to search in “Ubuntu Software,” which is basically their “App Store” program. If what you’re looking for isn’t in that main repository, you can download *.deb files from websites, which are basically like Windows setup files.

For stuff that’s more focused on my work as an ecologist, I often use R. A curiosity of sorts is that the install messages for packages are much more verbose, since packages are typically compiled from source on Linux rather than installed as pre-built binaries. In any case, be prepared to track down system dependencies (for example, the GDAL and PROJ libraries for the spatial packages) that are often automatically taken care of for you in Windows.