GDAL Warp Adventure

I’m working with a colleague on processing a raster, in order to define patches by connectivity rules. The input raster was a habitat suitability model, where…

  • 1 = suitable habitat
  • 0 = unsuitable habitat
  • 255 = background “no data” value

So, we needed to ultimately make the “1s” into patches. To do this, we needed to set all of the other values to “no data.” It looked something like this…

gdalwarp -srcnodata 255 -dstnodata 0 input.tif output.tif
gdal_translate -a_nodata 0 output.tif unary.tif

In the first line, we changed all the 255s to 0s. In the second line, we flagged 0 as the “no data” value, so the only data value left in the file was 1.
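A quick sanity check (in R, since the raster package comes up again later): read the result back, confirm the no-data flag with NAvalue(), and tabulate the remaining cell values with freq(). This is just a spot check on the unary.tif produced above, not part of the processing chain.

library(raster)

r <- raster("unary.tif")
NAvalue(r)  # should report 0, the assigned no data value
freq(r)     # should show only the value 1, plus NA cells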

GDAL Polygonize in Action!

#!/bin/bash
# For each exported raster: flag 0 as no data, sieve out tiny patches,
# polygonize, and reproject the result to EPSG:5070.
for VARIABLE in ./test_export/*.tif; do
   BASE=$(basename "$VARIABLE" .tif)
   # set 0 as the no data value
   gdal_translate -of GTiff -a_nodata 0 "$VARIABLE" "PPR_no/$BASE.tif"
   # remove patches smaller than 2 pixels (8-connected), modifying the file in place
   gdal_sieve.py -st 2 -8 "PPR_no/$BASE.tif"
   # turn the remaining patches into polygons (8-connected)
   gdal_polygonize.py -8 "PPR_no/$BASE.tif" -f "ESRI Shapefile" "shapefile/$BASE.shp"
   rm "PPR_no/$BASE.tif"
   # reproject the polygons to EPSG:5070 (CONUS Albers)
   ogr2ogr -f "ESRI Shapefile" -progress "shapefiles/$BASE.shp" "shapefile/$BASE.shp" -t_srs "EPSG:5070"
done

# unique day prefixes: the part of each shapefile name before the first "-"
listofmodisdays=$(ls shapefiles/*.shp | cut -d/ -f2 | cut -d- -f1 | uniq)

# Merge all shapefiles that share a day prefix into a single shapefile per day
for i in $listofmodisdays; do
   file="./final/$i.shp"
   for y in shapefiles/"$i"*.shp; do
      if [ -f "$file" ]; then
         # the per-day shapefile already exists, so append to its layer
         ogr2ogr -f "ESRI Shapefile" -update -append "$file" "$y" -nln "$i"
      else
         # first file for this day: create the per-day shapefile
         ogr2ogr -f "ESRI Shapefile" "$file" "$y"
      fi
   done
done
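To confirm that a merge worked, the per-day output can be read back with sf to check the feature count and projection. This is just a spot check; the “2003161” day prefix below is a hypothetical example, so substitute a real name from ./final.

library(sf)

# "2003161" is a hypothetical day prefix; use an actual file name from ./final
day <- st_read("./final/2003161.shp")
nrow(day)    # total number of patch polygons merged for that day
st_crs(day)  # should correspond to EPSG:5070 (CONUS Albers)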

Parallel Raster Extraction

This is my job submission script for the raster extraction.

#!/bin/bash -l
#PBS -l walltime=48:00:00,nodes=1:ppn=24,mem=61gb
#PBS -m abe
#PBS -M myaddress@email.com

# libraries needed by rgdal, sf, and Rmpi
module load gdal/2.0.1
module load proj
module load R
module load ompi
module load udunits
module load geos

# keep each R process single-threaded; the parallelism comes from the MPI workers
export OMP_NUM_THREADS=1
# launch a single master R process; the workers are spawned from inside R
mpirun -v -hostfile $PBS_NODEFILE -np 1 R --slave CMD BATCH ./extract.R /dev/stdout

This is the R script.

library(raster)
library(rgdal)
library(Rmpi)
library(sf)

# multi-layer raster stack to extract values from
all <- stack("all.tif")
# forest polygons that define the extraction zones
forest <- st_read(dsn = "./forest", layer = "forest_simple")

# start 23 MPI workers via snow/Rmpi
beginCluster(n = 23, type = "MPI")
raster_cells <- extract(all, forest)
endCluster()

save(raster_cells, file = "raster_vals.RData")
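As a sketch of how I check the output afterwards: extract() on a stack with polygons returns a list with one matrix per polygon (rows are cells, columns are layers), so loading the saved file back into an interactive session is enough to see whether the extraction returned anything sensible.

load("raster_vals.RData")
length(raster_cells)     # one list element per polygon in forest_simple
head(raster_cells[[1]])  # raster values extracted for the first polygon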


Super-computing?

Here’s an interesting note: just to try a parallel run with multiple nodes, I requested 9 nodes x 24 cores for 48 hours and got it (maybe because everyone is clearing out their jobs before the shutdown). However, I am just running a small subset; it was scheduled last night and is still going, even though the same subset finished faster on a single core. So, this might be a window into whatever is happening with the parallel runs.

Then, even running in parallel within a single node, I’m getting that MPI “job killed” error again, but interestingly only for 2 of the 3 jobs…once again, this points to finicky hardware or connections. My colleague was able to do the same task without getting any errors.

Super-computing

So apparently…

“…Bootstrapping and Markov Chain Monte Carlo, can be [sped] up significantly by using several connected computers in parallel.” – snow simplified

Looks like I missed the boat on model fitting. Currently, I’m trying to use an MPI cluster implemented via snow (technically, clusterR). Here’s my latest error:

Child job 2 terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
mpirun noticed that process rank 21 with PID 0 on node cn0195 exited on signal 9 (Killed).

IMO this points back to my prime suspect: a memory error.
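For context, the general shape of the clusterR pattern I’m running is below. This is a minimal sketch only: the toy calc() call stands in for the real workload, and the all.tif stack is reused here purely as an example input.

library(raster)

all <- stack("all.tif")

beginCluster(n = 23, type = "MPI")  # MPI workers via snow/Rmpi
# clusterR sends chunks of the raster out to the workers; calc() here is a stand-in
out <- clusterR(all, calc, args = list(fun = function(x) x * 2))
endCluster()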

Predicting Big Raster Stacks w/HPC

Here’s the latest of what I’m working on…

  1. Fitting models to predict FIA metrics from summarized LiDAR variables
    1. separate models for each NPC system
    2. boral
  2. Validating those models
  3. Using those models to predict the values of FIA metrics across the state

The last objective, given the fine scale of the summarized LiDAR data and the broad extent of the analysis, is going to be computationally intensive. So, I’m trying to make that work as best I can by using the HPC system.
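One way to structure that prediction step is the clusterR() + predict() pattern from the raster package, which splits the raster into chunks and farms them out to the cluster. This is only a sketch with hypothetical names: lidar_stack.tif stands in for the stacked LiDAR summaries and model_fit.rds for a saved fitted model (a placeholder for the real boral models), since that step is still being worked out.

library(raster)

# hypothetical inputs: stacked LiDAR summary layers and a previously fitted model
lidar <- stack("lidar_stack.tif")
fit <- readRDS("model_fit.rds")

beginCluster(n = 23, type = "MPI")
# predict across the whole stack, writing the output raster as it goes
pred <- clusterR(lidar, predict, args = list(model = fit),
                 filename = "fia_metric_predicted.tif")
endCluster()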