My Annoying Climate Download Process

Here’s a summary of my week: I needed a massive amount of data from the NetcdfSubset portal, but the request was too large for the HTTP server to handle (there is a cap) by simply selecting the products and spatial extent to download. Instead, the portal returned a URI that had to be passed to an external program, nccopy, to download the data. I wrote a script that split the download by model and scenario, automating the process so that each combination of model, scenario, and variable is saved to its own NetCDF file.
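
For anyone repeating this, the basic shape of that step looks something like the following; the URI and output file name are placeholders here, since the portal generates the real URI for your specific request:

URI="https://server.example/path/to/dataset"   # placeholder; the portal supplies the real one
nccopy "$URI" tas_model_scenario.nc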

The next problem was that the download was really slow, owing to traffic on my work network. Since I was given no file size estimate, I assumed the files might be huge, so I applied some internal compression to make them download faster, at the expense of read speed for the files. Once I realized file size wasn’t the issue, I redid the request without chunking. I then had to kill that request in order to tailor the data acquisition to our needs, so I finally got all the temperature files today.
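
For reference, the nccopy options I mean are -d (deflate/compression level, 0–9) and -c (chunk shape); the level and chunk sizes below are illustrative only, reusing the placeholder $URI from above:

nccopy -d 5 -c time/365,lat/50,lon/50 "$URI" tas_compressed.nc   # compressed, chunked copy
nccopy "$URI" tas_plain.nc                                       # plain copy, no compression or custom chunking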

Then, I installed CDO and wrote a script to compute the monthly means for all the files: it collapses a file of daily time steps into a file of monthly means (or whatever monthly statistic you choose). First, I got a list of the base file names like so:

cut -d . -f1 climate_structure > climate_models
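
(As a hypothetical illustration: a line like tas_ACCESS1-0_rcp45.nc in climate_structure becomes tas_ACCESS1-0_rcp45 in climate_models. One caveat: cut -d . -f1 keeps only the text before the first dot, so this assumes the base names themselves contain no dots.)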

Then I wrote this simple bash script to loop over them all and write out the monthly averages:

#!/bin/bash
# Compute a monthly-mean file for each base name listed in climate_models.
while read -r climate_model; do
 echo "Now averaging $climate_model"
 cdo monmean "${climate_model}.nc" "${climate_model}_monthly.nc"
done < climate_models
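
Since monmean is just one of CDO’s family of monthly operators, the same loop handles other statistics by swapping the operator; for example, monthly maxima (the _monthly_max suffix is just my naming choice):

cdo monmax "${climate_model}.nc" "${climate_model}_monthly_max.nc"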
