What good fortune that I looked up advances in Bayesian methods the day a new task view was published! If you’re just starting with Bayes, they even have packages listed here designed to help you learn. I’ve been using R-INLA and for a few reasons I became curious what else was out there. I’m still assuming Laplace estimations are faster than Markov chain Monte Carlo, so there are more on the list that might be good to check out if I abandon that method altogether.
LaPlace’s Demon, iter-lap: This one appealed to me as an alternative to R-INLA
arm, RSGHB: might be worth checking out for hierarchical models
Bayes images: image analysis
Bayes meta, bspmma: meta-analysis
Bayes tree,tgp: Bayesian additive regression trees (BART)
We’ve been playing with ways to construct networks according to graph theory, ultimately using the R package igraph to investigate wetland connectivity. I can’t reveal too much here because it’s my current research in progress! 🙂 The part of the process I’m currently trying to make more efficient is how to calculate distances for each wetland, to each wetland within a certain distance threshold. Ways I can go…
Make the distance matrix a smaller object. Instead of storing floating point numbers, I can store integers representing our “distance bins” of interest. This might keep it from crashing, but the downside is it will probably still take the same (actually more) time. The time could be sped up with more memory free in the workspace, but it still seems a waste to calculate & consider all these distances that we don’t necessarily need.
Create new spatial objects that represent minimum polygons around my networks, do distance analysis on these. My concern is that the “new shape” will be too imprecise, and it will skew our results (or I’ll have to use the computational power anyway to confirm a distance calculation/figure out what the nearest wetland in the shape is). I’m wondering if this will create new borders that are within 5k.
Create an adjacency matrix for the polygons, and then exclude neighbors from consideration as they’re lumped into networks
I have GDAL installed in Linux (i.e. the easiest way to install/implement it) so the following examples represent command line usage. I have used a smattering of different GDAL utilities, and the links in the descriptions go to the manual page for the utility in each example. I have incorporated these example commands into various bash scripts if I need to implement them in an iteration over multiple files.
This is an example of re-sampling a raster to larger pixel sizes (in this case to a lower resolution of 0.09 degree from the original 0.0002 degree [30m LANDSAT pixels]) by taking the mean of the pixels encompassed in the coarser output.
gdalwarp -tr 0.09 0.09 -r average -ot Float32 file.tif fileout.tif
If you’re going to, say, turn your raster into polygons, you can get rid of clumps below a certain threshold before doing so (in this case, I’m getting rid of single pixels in my unary raster when using an 8-neighbor clumping rule).
gdal_sieve.py -st 2 -8 file.tif
Then, I can make my polygon layer, simplified. In the 2nd line, I project the shapefile to Albers Equal Area.
When most ecologists conceptualize “GIS” we often think of a desktop GIS program, most likely ArcGIS. When we want to not use ArcGIS (or can’t), we most likely say “is there an R package for that?” In many cases, there is, and the repository keeps growing! The ever-growing number of R spatial packages reflects the general trend in GIS right now: it’s booming in lateral growth. Everywhere you turn, there’s something new, and actually my motivation for writing this post is just to try to keep up.
Going back “old school,” perhaps the most famous and long-running (i.e. older than I am) open source program is GRASS GIS. Here’s where the open source movement GIS movement fails, though: the point-and-click functionality and friendliness of interfacing with the program stops before you even get to the gate. If I go to the Ubuntu software center, it points me to an outdated version of the program that won’t even launch, so I had to remove it before installing the newer version (where the version number is in the command name). Luckily, though, it points me to a broader repository whereby I can access the “Ubuntu GIS” suite.
The real problem I’ve had with GRASS from the beginning, though, is the beginning: you can’t just open the program and try to open some spatial data like you can in ArcGIS. You have to start by selecting a home folder, defining a “location” which must have uniform spatial reference, and then making a map set with spatial reference info. It’s where I admittedly gave up trying with GRASS long ago, in favor of the ease of just launching ArcGIS and adding a shape file.
QGIS solves more of this problem, but there are again some hangups on the install: it’s not in the Ubuntu software center, and installing it “the old fashioned way” seems to get an older version of the program (and the terminal errors will tell you so if you launch it from the terminal).
I’m thinking about finally trying to learn PostgreSQL in conjunction with PostGIS. I’m trying to retrace my thoughts on that to last year, and I think it was because I found out that SQL was a solid way to build my own GIS tools. I remember settling on that PostGIS was probably my best bet to learn open source GIS, and ironically I just saw a post-doc advertisement that specifically requested PostGIS knowledge.
Here’s a script that loops over climate NetCDF bricks in a folder and extracts the values for each layer in the brick of each file, in this case averaged over polygons in a shape file.
rm(list = ls())
setwd("where your climate files are")
your_shapefile <- readOGR("path to your shapefile","the layer name for your shapefile")
climate <- rbindlist(lapply(list.files(pattern="nc"), function(climate_file)
#in my case, the weather variable is in the filename
climate_var = ifelse(grepl("pr",climate_file),"pr",ifelse(grepl("tmax",climate_file),"tmax","tmin"))
climate_variable <- brick(climate_file,varname=climate_var,stopIfNotEqualSpaced=FALSE)
#I needed to shift the x-axis to be on a -180 to 180 scale
extent(climate_variable)=c(xmin(climate_variable)-360, xmax(climate_variable)-360, ymin(climate_variable), ymax(climate_variable))
blocks <- spTransform(blocks,projection(climate_variable))
r.vals <- extract(climate_variable, blocks, fun=mean,na.rm=TRUE,df=TRUE,layer=1,nl=1680)
r.vals <- melt(r.vals, id.vars = c("ID"),variable.name = "date")
r.vals$climate <- climate_var
#depending on the filename convention of your files, get a variable for your climate model and name it "modelname"
r.vals$model <- modelname
climate_variables[[length(climate_variables)+1]] <- r.vals
climate <- rbindlist(climate_variables)
At this point, you have a data frame called “climate” that has all your data long-form. You can cast this as you see appropriate depending on your needs!
I can’t be thankful enough for my godsend Twitter friend/NetCDF guru Michael Sumner for coming in clutch to rectify this problematic file! As I take my baby steps in learning how to deal with NetCDF files (~3 weeks after being thrown into the “deep end” and trying to tread water), I realized that one of our files had some sort of problem. Relevant to this issue, NetCDF files are often (not always) multidimensional arrays, where 2 of the dimensions are coordinates. The variables can be defined in relation to the dimensions of the file. So for instance, in my case, temperature is described over space and time (4-dimensional array), and each temperature value corresponds to a location and time. So, each temperature value “goes along with” a set of coordinates and a time. When I imported the NetCDF file as a raster brick (which means I put the array data into a stack of grids), the metadata seemed to show a larger longitudinal “step” than latitudinal, the latter of which was at the correct resolution. Note that this means, as is common, I took what is really a point (a temperature value corresponding to a latitude, longitude and time) and turned it into a grid, where the original point is the center of the resulting cell that gets the value. I tried plotting it to see what it looked like.
I tried to crop the image to only the extent I needed but it still was messed up.
I took this at face value, that perhaps the grid was somehow “wrong,” but my new friend showed me how to look more deeply into the problem! He opened the file in R and extracted the coordinate values, plotting for a visual diagnosis of the problem.
library(ncdf4) con <- ncdf4::nc_open("file.nc")
lon <- ncdf4::ncvar_get(con, "lon")
lat <- ncdf4::ncvar_get(con, "lat")
So, my friend suggested plotting the longitude values stored in this dimension of the array.
Herein we start to see a problem! Something is off about the longitudes stored in the array.
The first longitude values are the correct lowest values for the raster. For whatever reason, the last 2 longitude values got reassigned to the (very wrong) smallest values in the grid, though they should be the largest values. The extent describes the grid that I assigned the points to when I imported the data into R. For comparison and clarification, I also posted the results of looking at the extent of the raster that I created , to show that they’re “2 different things.” Assuming that the NetCDF stores the center point of each cell, the latitude extents (which are correct) provide a half-cell-width buffer to the actual data point, correctly placing them in the center of the cell.
So to correct, he loaded the raster with the following note…
## let's treat it as if the start is correct,
## the final two columns are incorrect
## we hard-code the half-cell left and right cell edge
## (but note the "centre" might not
## be the right interpretation from this model, if we're being exacting)
valid_lon2 <- c(lon - 0.125/2) + c(0, length(lon)) * 0.125
valid_lat <- range(lat) + c(-1, 1) * 0.125/2
r2 <- setExtent(r, extent(valid_lon2, valid_lat))
r_final <- raster::shift(r2, x = -360)
Now that the extent is rectified he shifted it to our desired reference, namely out of 0-360 longitude to -180/180.
Another problem I ran into: apparently Nov-Dec. 1999 monthly averages for HADCM3 B1 precipitation, where I flipped the axes in NCO, are blank.
problem <- brick("hadcm3.b1.pr.NAm.grid_monthly.nc_out.nc")
The above returns the correct axes, but blank plots. The problem is present before I flipped the axes, so I have to go back to the original files before I did the averaging to find out what’s wrong. I wonder if there were blank layers anywhere in the Nov-Dec daily data?
From a THREDDS server, I’m using the NetcdfSubset portal to obtain a spatial subset of my climate data set of interest. Since the files I need are too big to download from their HTTP server, they instead give me an OPeNDAP URI with the spatial subset info. I then pass that to nccopy from the NetCDF library.
Then, I installed CDO and wrote a script to do the monthly means for all the files: it averages a daily time step file into a monthly averaged (or whatever metric you choose) file.