Bayesian Hierarchical Model

These days, the Bayesian hierarchical modeling framework is more-or-less standard for analyzing BBS data. I wrote a collection of posts about gathering BBS and remote sensing data, summarizing and importing into R, and thereby getting the data frame in shape for an analysis. From there, I would clean up the work space such that only the data frame for the analysis is left.

rm(list= ls()[!(ls() %in% c('vars'))])

Here’s an  example of a formula (“test”), and how to fit it using the package R-INLA. In this model, in keeping with the example that I linked to that got us here, “MN” (mean % tree cover) is our variable of interest, and the rest of the variables are nuisance effects. In this model, year and rteobs are random effects (hyper-parameters), while yearfix, firstyear and MN are fixed effects (parameters).

test = abundance ~ f(Year,model="iid") + yearfix + f(rteobs, model="iid") + firstyear + MN
library(INLA)
model = inla(formula=test, data=vars, family="gpoisson", control.compute = list(waic=TRUE), verbose=TRUE)

The generalized Poisson distribution is used to incorporate over-dispersion, and I chose to calculate Watanabe-Akaike Information Criterion (WAIC) in case I was to compare this to another model. To examine the output once the model has stopped running…

summary(model)
names(model)

The latter shows all that was generated in this model run. One I find useful is summary.fitted.values, which are the modeled values of the response variable. If you want to predict values, you can add or replace response variable values in your data frame with NA.

Master Post: BBS 10-Stop Data

This one may not be as step-by-step, because the GIS layer of BBS routes I have may not have been published yet. Nonetheless, here’s what I did.

  1. Cleaning up the spatial data and splitting routes
  2. Assigning segment order once the routes are split
  3. Summarizing spatial covariates by segment
  4. Determining which segments can be considered the same
  5. Importing BBS and spatial data into R at the segment level

The final step looks similar to the route-level tutorial, but with pieceID being the variable to match on. Set “bird” to the AOU code of your species of interest.

dataperspecies <- subset(BBS[which(BBS$Aou == bird),]) 
vars$abundance <- dataperspecies$count[match(vars$pieceID, dataperspecies$pieceID)]
vars[is.na(vars)] <- 0

assess <- data.table(vars)
whattodrop <- assess[,list(abundance=sum(abundance)),by='rteno']
droprtes <- whattodrop$rteno[whattodrop$abundance==0]
perspp <- vars[!(vars$rteno %in% droprtes),]
perspp$Year <- as.integer(perspp$Year)
perspp$yearfix <- perspp$Year
perspp <- perspp[perspp$RunType > 0,]
#vars <- vars[vars$RPID < 102]
perspp <- perspp[c("Year","MN","firstyear","rteno","ObsN","abundance")]

Here is the full script: bbs_10stop_species

A Thought on Downscaled BBS

The BBS was designed to capture avian populations in natural areas, so if the land surrounding a portion of the route was developed, historically the route path was altered and given a new number. The problem is, that new numerical designation was not standardized (though often, it’s 100 + the original route number).

In wanting to maintain spatio-temporal continuity, I thought that the portions of the routes that were still spatially congruent should be considered the same segment. This serves my random effect of location, such that an observation isn’t considered to come from a different location just because the route number is different (if the “old” and “new” segments spatially coincide). I use “coincide” loosely, in that I considered original and +100 route segments to be the same if they intersected. You can measure the degree of overlap and incorporate that if you choose. I made a shape file of intersecting duplicated segments, and imported it into R to adjust the location designations.

repeats <- read.dbf("intersecting_duplicated_segment.dbf")
repeats$new_rteno <- repeats$rteno - 100
repeats$new_route <- paste(repeats$new_rteno,repeats$segment,sep="_")

Now, the newer segments have the same numbering as the original routes. I match the new segments with the original route numbers to the BBS data, and then replace the new route segment numbers with the original numbering.

BBS$new_route<-repeats[match(BBS$route_piece, repeats$route),"new_route"]
BBS$new_rteno<-repeats[match(BBS$route_piece, repeats$route),"new_rteno"]
BBS[!is.na(new_route)]$route_piece <- BBS[!is.na(new_route)]$new_route
BBS[!is.na(new_route)]$rteno <- as.integer(BBS[!is.na(new_route)]$new_rteno)

The new route segments that intersect the original routes now have the old route numbering, and thus counts from these segments appear in the code to to have come from the same routes as the original.

Here’s the full script: bbs_10stop

Down-scaling BBS

As my several recent posts have mentioned, I’m working with BBS spatial data (i.e. route paths and start points). I wanted to “downscale” by splitting routes into smaller segments and looking at data sub-route level. Specifically, I needed to split routes into fifths. I needed to know what order the segments were in from the start of the route to match the finer-scale data.

Here’s how to do it…

  1. Break up the accounted for routes (or maybe even raw data set?) into “good” and “bad” routes depending on if the start is nearest a segment for a matching route
  2. Work with “good starts” and “good routes” (the easiest!)
    1. identify the segments closest to the start point as the first segment of the route
      1. run near on the start points to find out which segment in the master file is closest
      2. use the near_FID to join with the FID of the route segment layer
      3. export the resultant start segments into their own layer (number them 1)
      4. remove join
    2. delete the start segments from the master file of all the broken up segments of the route (select by location: identical to)
    3. use select by location (touches the boundary of) to find the next closest line segment to the start in the master file, and number them 2
    4. repeat 2-4 for the remaining segments, and number them accordingly
  3. Work with bad starts: once the good routes are out of the data set, is the bad point now closest to a remaining matching segment?
    1. break up those newly matched lines
    2. put leftover points into yet another file
  4. Look at remaining points to see if any more lines can be put to use

Given the numbering scheme recommended in the steps above, your segments should be numbered 1-5 on each route. I buffered those segments and averaged tree cover within  each segment buffer. In order to match my segment numbering with the rest of my R code, I renamed the segments.

vcf_subroute <- read.dbf("landcover/bbs_segment_buffers.dbf")
vcf_subroute$interval <- paste("count",vcf_subroute$segment,"0",sep="")

The remainder of the code is similar to that described in the route-level MODIS post, just with the added level of the interval.

vcf_subroute <- vcf_subroute[,c("rteno","interval",grep("[[:digit:]]", names(vcf_subroute), perl=TRUE, value=TRUE))]
vcf_subroute <- melt(vcf_subroute,id=c("rteno","interval"))
vcf_subroute$variable <- as.character(vcf_subroute$variable)
vcf_subroute$Year <- str_extract(vcf_subroute$variable,"[[:digit:]]+")
vcf_subroute$var <- sapply(strsplit(vcf_subroute$variable, "[[:digit:]]+"),"[[",2)
vcf_subroute$variable <- NULL
vcf_subroute <- dcast(vcf_subroute,rteno + interval ~ var,fun.aggregate=mean)
vcf_subroute$rteno <- as.character(vcf_subroute$rteno)

The full script, to this point: bbs10vcf

Cleaning Up BBS Routes

When I downloaded the BBS route shape file (apologies for not linking, I’m not sure if it’s publicly available) I had some work to do. I wrote down the steps I used in cleaning up the layer, and then splitting into fifths.

  1. One line per route. Great! set aside. (Still a note of caution, some BBS routes are too long. When I asked about this, I was told they were digitization errors but weren’t immediately fixable.)
  2. More than one line per route: I needed to only be sure to dissolve segments with matching route identifiers that touched. This is where iterative spatial joins and dissolves came in…
    1. spatial join: what is touching each segment?
    2. do the route identifiers match? if so, keep those…get rid of the ones that don’t
    3. join field to get the target ID’s, which end up making “connectivity groups,” to dissolve by
  3. repeat steps in #2 until there’s only one poly-line per connectable (zero distance) route parts
  4. this leaves disconnected fragments of things that don’t touch (decide what to do with those on a case-by-case basis)

Now, you might want to look at the lengths of the resulting routes and examine ones that are suspiciously long, and decide what to do with them in your analysis.

  1. depending on the geometry of the lines, the nice “split line” tool may not work as expected. I made a topology to figure out where splits had gaps. ones that had gaps were removed from the set
  2. once split, was there mysteriously more or less than 5 segments per route? if so, toss
  3. for the good splits, I matched available good starts to determine order of the segments
  4. I salvaged the remaining “split-able” lines
Adventures in fixing the rest: I tried to be ambitious and work with the problem lines, to little avail. I used “trim lines” to get rid of the short spurs, and that seemed to have worked. Yet, it just didn’t fix all of them.

“Master Post” of BBS Tutorial

I made a series of posts of how to get BBS data and import it into R, as well as an example of predictor data for a model, and how to get that into R. Here I’ll collect it all into a step-by-step by linking to those posts.

  1. How to get BBS data
  2. Importing BBS data into R
  3. Getting predictor data (in my case, remote sensing imagery)
  4. Summarizing images for BBS routes
  5. Making a data frame for a single species analysis (zero-filling)

Here’s the total script at the end of all the posts: bbs_species

To get started, pick a Bird Conservation Region of interest, and a species. Take the BCR specification line out of comment (remove the #) and specify the number(s) of the BCR(s) within the empty vector. Set the “bird” variable to be the AOU code of your chosen species from the species list (if you choose from  that link, remember to remove the leading zero). For instance, to look at eastern wood-pewees in the Blue Ridge…

BBS <- BBS[BBS$BCR %in% c(28), ]
bird <- 4610

Another thought is how to handle nuisance effects. I consider route and observer to be linked, so that I concatenate them for a random effect.

vars$rteobs <- paste(vars$rteno,vars$ObsN,sep="_")

You now have a data frame of BBS data for a single species matched up with remotely sensed data (i.e. ready for an analysis). Half the battle is the data acquisition, cleaning and shaping! 🙂

Now try an example analysis.

BBS Data for a Single Species

BBS data is essentially aggregated point count data, that in its raw form has a row for each species observed at each route in each year. This means that if a species is not recorded in a  given year, it just doesn’t have a record. I need to add those zeros back in, such that if a species is missing from a route, it needs a row with a zero. For my purposes, it makes the most sense to consider routes only where a given species has been recorded at some point in the history of the route (because I’m supposing its presence/absence depends on some annual variation in a predictor variable).

Leading up to this post, we’ve been shaping and reshaping BBS data and predictor variables, and the last post left off with matching up some remote sensing data to BBS data. The variable “bird” is to be set to an AOU number corresponding to the species list we imported prior.

dataperspecies <- subset(BBS[which(BBS$Aou == bird),]) 
vars$abundance <- dataperspecies$count[match(vars$rteno, dataperspecies$rteno)]
vars[is.na(vars)] <- 0

This is to determine total counts of the species on routes, and to toss routes where the species has never occurred.

assess <- data.table(vars)
whattodrop <- assess[,list(abundance=sum(abundance)),by='rteno']
droprtes <- whattodrop$rteno[whattodrop$abundance==0]
vars <- vars[!(vars$rteno %in% droprtes),]
vars <- vars[vars$RunType > 0,]
vars <- vars[vars$RPID < 102]

Last but not least, keep only the columns that will be used in your analysis to minimize space used in the R work space. For my models, in addition to the response and predictor variables, I keep the route number, observer, and whether or not it’s the observer’s first year for consideration as nuisance effects.

Here’s the full script: bbs_species

MODIS and BBS

I have used remote sensing products in my BBS analyses in various capacities, but it all boils down essentially to the same spatial statistics. I wrote a post about how to download and process MODIS VCF images. At the end of that post, you have a folder of TIFF images of your variable of interest at your time step of interest (in that example, one VCF image for each year). From here, I’ve become accustomed to using the Geospatial Modelling Environment (GME), but it’s not ideal because of its dependence on Arc in my book. Anyway, it’s what I know and thus what I can crank out the fastest, though I’m dedicated to finding or making a way for open source solutions in the future. I wrote a post about how to use the GME to do spatial statistics for rasters with your shape file.

In my case, my rasters were named by year and I used that as the identifier for GME, so I ended up with an attribute table with column names with the year and GME variable (ex. “2000MN”). The attribute table is in the DBF associated with the shape file, and can be read into R with the “foreign” package.

vcf <- read.dbf("landcover/bbs_buffers.dbf")

I wanted to just keep the identifier and then the columns that contained my variables of interest (i.e. those with years in the column names).

vcf <- vcf[,c("rteno",grep("[[:digit:]]", names(vcf), perl=TRUE, value=TRUE))]

Then, I wanted to split the year and variable so I could reshape and aggregate the data as needed. In my case, I need to average the values of the variables over years.

vcf <- melt(vcf,id=c("rteno","interval"))
vcf$variable <- as.character(vcf$variable)
vcf$Year <- str_extract(vcf$variable,"[[:digit:]]+")
vcf$var <- sapply(strsplit(vcf$variable, "[[:digit:]]+"),"[[",2)
vcf$variable <- NULL
vcf <- dcast(vcf,rteno ~ var,fun.aggregate=mean)

Then, to link it to the rest of the BBS info from the base script

vcf$rteno <- as.character(vcf$rteno)
vars <- merge(vcf,routeinfo,by="rteno")

Here’s the entire script for importing BBS data and MODIS data, and merging the data sets (insofar as I’ve blogged about): bbs_modis

Basic Script

Until I figure out how to pin posts in a tag(?) here’s where my posts have gotten us so far with a starter script for BBS data. Start with the data tutorial before getting to this step to download North American BBS data. This script imports your data into R…

rm(list = ls()) 
setwd("")
library(foreign) 
library(data.table)
library(reshape2)
library(stringr)
library(ggplot2)

BBS <- rbindlist(lapply(list.files("",full.names=TRUE),read.csv)) 
BBS$rteno <- paste(BBS$statenum,formatC(BBS$Route, width=3, flag="0"),sep="") 
speciesnames <- read.csv("SpeciesList.csv") 

routeinfo <- read.csv("weather.csv") 
routeinfo$rteno <- paste(routeinfo$statenum,formatC(routeinfo$Route, width=3, flag="0"),sep="") 
routeinfo <- routeinfo[c("RouteDataId","Year","ObsN","rteno","EndWind","StartWind","RunType","RPID")]
routeinfo$firstyear <- c(1L,routeinfo$ObsN[-1]!=routeinfo$ObsN[-nrow(routeinfo)])

BBS$countrynum <- NULL
BBS$rteno <- paste(BBS$statenum,formatC(BBS$Route, width=3, flag="0"),sep="") 
BBS$StopTotal <- NULL
BBS <- merge(BBS,routeinfo,by=c("rteno","Year"))

route_data <- read.csv("routes.csv")
route_data$rteno <- paste(route_data$statenum,formatC(route_data$Route, width=3, flag="0"),sep="") 
route_data <- route_data[c("BCR","rteno")]

BBS <- merge(BBS,route_data,by="rteno")
BBS <- BBS[BBS$BCR %in% c(), ]

Next, how about some co-variates? I have spatially related remote sensing data to BBS data.

Accounting for Spatial Autocorrelation

I’m finally biting the bullet and trying to do some basic spatial modeling with my analyses. My 1st test case is a single species (bobolink) as it occurs in my Ch 2 study region (Badlands and Prairies, Potholes and Hardwood Transition). I have created points in the middle of each of my BBS route segments and first off, I need to read them into R.

library(rgdal)
library(spdep)
library(maptools)
library(INLA)

ch2points <- readOGR("CH2POINTS.shp",layer="CH2POINTS")

Earlier in the code, I made a subset of the data frame for my species of interest, and I only want to keep points where they occur.

species_subset <- ch2points[ch2points@data$route %in% vars$route_piece,]

This code will use an adjacency matrix which I’ll generate by tessellating polygons around each point. First, I need to make an identifier to match up the points with the polygons.

species_subset@data$ID <- 1:dim(species_subset@data)[1]

A nice function to create Voronoi polygons is already out there! So, I copied and pasted into my code and it works great. Once the polygons were created (v), I created the adjacency matrix, and generated the identifier to match up the point and polygon layers (but since the order was the same, I ultimately just copied the identifier column I wanted to the polygon layer). Then I translated the adjacency file to an INLA format.

ch2_adj <- poly2nb(v)
v$ID <- 1:length(ch2_adj)
v$route_piece <- species_subset$route
nb2INLA("ch2.adj", ch2_adj)

In order to link the polygons to the original data frame, I matched them by identifiers.

vars$ID <- species_subset$ID[match(vars$route_piece,species_subset$route)]

Now to the real heart of the matter…

model = abundance ~ f(ID,model="besag",graph="ch2.adj") + f(Year,model="rw2") + yearfix + f(rteobs, model="iid") + firstyear

To make the work space as clean as possible, I removed everything but what the model needs…

rm(list= ls()[!(ls() %in% c('vars','model'))])

…before running it! These models are substantially more intense than my previous models, which is to be expected. Still, this model took over 2 hours to run on my laptop!