The best solution I found was to chop up the state by county, crop the raster brick to each county's extent, and import the cropped brick as a velox object. Then I used velox's extract method to get the values.
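In rough outline, the loop looked something like this (a sketch only; the file names are placeholders, not my actual paths):

library(raster)
library(sf)
library(velox)

# Placeholder inputs -- swap in the real brick and county layer
b <- brick("brick.tif")
counties <- st_read("counties.shp")

vals <- lapply(seq_len(nrow(counties)), function(i) {
  cty <- as(counties[i, ], "Spatial")
  # Crop the brick to this county, then load the cropped piece as a velox object
  vx <- velox(crop(b, cty))
  # velox's extract returns the cell values falling inside the polygon
  vx$extract(sp = cty)
})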
GDAL Warp Adventure
I’m working with a colleague on processing a raster, in order to define patches by connectivity rules. The input raster was a habitat suitability model, where…
- 1 = suitable habitat
- 0 = unsuitable habitat
- 255 = background “no data” value
So, we needed to ultimately make the “1s” into patches. To do this, we needed to set all of the other values to “no data.” It looked something like this…
gdalwarp -srcnodata 255 -dstnodata 0 input.tif output.tif
gdal_translate -a_nodata 0 output.tif unary.tif
In the first command, we changed all the 255s to 0s. In the second, we flagged 0 as the "no data" value. So the only data value left in the file was 1.
GDAL Polygonize in Action!
#!/bin/bash

# For each exported raster: flag 0 as nodata, sieve out patches smaller than
# 2 pixels (8-connected), polygonize, and reproject the result to EPSG:5070
for VARIABLE in ./test_export/*.tif; do
    gdal_translate -of GTiff -a_nodata 0 "$VARIABLE" "PPR_no/$(basename "$VARIABLE" .tif).tif"
    gdal_sieve.py -st 2 -8 "PPR_no/$(basename "$VARIABLE" .tif).tif"
    gdal_polygonize.py -8 "PPR_no/$(basename "$VARIABLE" .tif).tif" -f "ESRI Shapefile" "shapefile/$(basename "$VARIABLE" .tif).shp"
    rm "PPR_no/$(basename "$VARIABLE" .tif).tif"
    ogr2ogr -f "ESRI Shapefile" -progress "shapefiles/$(basename "$VARIABLE" .tif).shp" "shapefile/$(basename "$VARIABLE" .tif).shp" -t_srs "EPSG:5070"
done

# Merge the per-tile shapefiles into a single shapefile per MODIS day
listofmodisdays=`ls shapefiles/*.shp | cut -d/ -f2 | cut -d- -f1 | uniq`
for i in $listofmodisdays; do
    file="./final/$i.shp"
    for y in $(ls shapefiles/$i*.shp); do
        if [ -f "$file" ]; then
            ogr2ogr -f "ESRI Shapefile" -update -append $file $y -nln $i
        else
            ogr2ogr -f "ESRI Shapefile" $file $y
        fi
    done
done
Parallel Raster Extraction
This is my job submission script for the raster extraction.
#!/bin/bash -l
#PBS -l walltime=48:00:00,nodes=1:ppn=24,mem=61gb
#PBS -m abe
#PBS -M myaddress@email.com

module load gdal/2.0.1
module load proj
module load R
module load ompi
module load udunits
module load geos

export OMP_NUM_THREADS=1

mpirun -v -hostfile $PBS_NODEFILE -np 1 R --slave CMD BATCH ./extract.R /dev/stdout
This is the R script.
library(raster)
library(rgdal)
library(Rmpi)
library(sf)

all <- stack("all.tif")
forest <- st_read(dsn = "./forest", layer = "forest_simple")

beginCluster(n = 23, type = "MPI")
raster_cells <- extract(all, forest)
endCluster()

save(raster_cells, file = "raster_vals.RData")
R Markdown
https://bookdown.org/yihui/rmarkdown/word-document.html
Writing an MS Word document using R (with as little overhead as possible)
https://rdrr.io/github/crsh/papaja/man/apa_table.html
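The bare-bones version of that workflow, as I understand it (report.Rmd is a hypothetical file, and the apa_table() call would live inside an Rmd chunk):

library(rmarkdown)

# Render a hypothetical report.Rmd straight to .docx
render("report.Rmd", output_format = word_document())

# Inside an Rmd chunk, papaja::apa_table() turns a data frame into an
# APA-formatted table in the Word output, e.g.:
# papaja::apa_table(head(mtcars), caption = "An example table")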
Efficient Zonal Statistics
https://dl.acm.org/citation.cfm?id=3274985
https://www.perrygeo.com/
postgis_zonal.pdf
zonalstat_tr.pdf
http://www.guru-gis.net/efficient-zonal-statistics-using-r-and-gdal/
https://robbymarrotte.weebly.com/blog/faster-zonal-statistics
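For reference, the baseline these links are trying to beat is something like the raster package's zonal() (the file names here are placeholders):

library(raster)

# Placeholder rasters: a value layer and an integer zone layer on the same grid
vals  <- raster("values.tif")
zones <- raster("zones.tif")

# Mean of the value raster within each zone
zstats <- zonal(vals, zones, fun = "mean")
head(zstats)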
Super-computing?
Here’s an interesting note: just to try a parallel run with multiple nodes, I requested 9 nodes x 24 cores for 48 hours and got it (maybe because everyone is clearing out their jobs before the shutdown). However, I’m only running a small subset; it was scheduled last night and is still running, even though the same subset finished faster on a single core. So this might be a window into whatever is happening with the parallel runs.
Then, even running in parallel within a single node, I’m getting that MPI job-killed error again, but interestingly only for 2 of the 3 jobs…which once again points to finicky hardware or connections. My colleague was able to do the same task without getting any errors.
How do I Classify This?
I’m going to have an output raster brick, and I have to classify it into 2 classes based on thresholds for multiple criteria.
https://pro.arcgis.com/en/pro-app/help/mapping/layer-properties/data-classification-methods.htm
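One way I could do this in R (a sketch only: the layer names, thresholds, and the rule that both criteria must be met are hypothetical, since I haven't settled on the criteria yet):

library(raster)

# Hypothetical inputs: a two-layer output brick and per-criterion thresholds
out <- brick("output.tif")
a_thresh <- 0.5
b_thresh <- 10

# overlay() applies the rule cell by cell: class 1 where both criteria
# pass their thresholds, class 2 otherwise
classed <- overlay(out[[1]], out[[2]], fun = function(a, b) {
  ifelse(a > a_thresh & b > b_thresh, 1, 2)
})

writeRaster(classed, "classified.tif", overwrite = TRUE)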
Super-computing
So apparently…
“…Bootstrapping and Markov Chain Monte Carlo, can be [sped] up significantly by using several connected computers in parallel.” – snow simplified
Looks like I missed the boat on model fitting. Currently, I’m trying to use an MPI cluster implemented via snow (technically, clusterR). Here’s my latest error:
Child job 2 terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
mpirun noticed that process rank 21 with PID 0 on node cn0195 exited on signal 9 (Killed).
IMO this points back to my prime suspect: a memory error. Signal 9 is typically what a process gets when it is killed for exceeding its memory allocation.
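For context, here's the shape of what I'm running (the worker count and the calc() function are placeholders; beginCluster(type = "MPI") is what hands clusterR() a snow/Rmpi cluster):

library(raster)

s <- stack("all.tif")

# beginCluster(type = "MPI") starts a snow cluster over Rmpi;
# clusterR() then sends the calc() out to the workers chunk by chunk
beginCluster(n = 23, type = "MPI")
result <- clusterR(s, calc, args = list(fun = function(x) x * 2))
endCluster()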
Predicting Big Raster Stacks w/HPC
Here’s the latest of what I’m working on…
- Fitting models to predict FIA metrics from summarized LiDAR variables
  - separate models for each NPC system
  - fit with boral
- Validating those models
- Using those models to predict the values of FIA metrics across the state
The last objective is going to be computationally intensive, given the fine scale of the summarized LiDAR data and the statewide extent of the analysis, so I'm trying to make it work as well as I can on the HPC system.
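Here's a rough sketch of the prediction step I have in mind, with toy stand-ins for the real pieces (the layer names are hypothetical, and a plain lm() stands in for the per-system models); clusterR() plus predict() chunks the stack across a node's cores and writes the result to disk as it goes:

library(raster)

# Toy stand-in for the statewide stack of summarized LiDAR predictors
r <- raster(nrows = 100, ncols = 100, vals = runif(10000))
lidar <- stack(r, r * 0.5, sqrt(r))
names(lidar) <- c("ht_mean", "ht_sd", "cover")   # hypothetical predictor names

# Toy stand-in for a fitted model (the real ones are fit per NPC system)
train <- data.frame(ht_mean = runif(50), ht_sd = runif(50), cover = runif(50))
train$metric <- 2 * train$ht_mean + rnorm(50, sd = 0.1)
fit <- lm(metric ~ ht_mean + ht_sd + cover, data = train)

# Spread the prediction over the cores of one node, writing to disk as it goes
beginCluster(n = 23)
pred <- clusterR(lidar, predict, args = list(model = fit),
                 filename = "fia_prediction.tif")
endCluster()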