Using NCO utilities for statistical analysis of netCDF files

Post by yiboj » Mon Apr 04, 2016 10:02 am

The NCO (netCDF Operators) utilities are a set of powerful command-line tools for manipulating and analyzing netCDF (including HDF5) files. They can be used to subset, aggregate, and modify data and metadata stored in netCDF files. If the NCO utilities are not installed on your computer, they can be found here: http://nco.sourceforge.net/


Two useful groups of tools are the concatenation commands (ncrcat and ncecat) and the averaging commands (ncra and nces). The difference between the ncrXXX and nceXXX commands is that the ncr commands require the netCDF files to have a record dimension, while the nce commands do not. Here we will use the nce commands because the netCDF files we downloaded from PODAAC do not come with a preset record dimension. The downside of the nce commands is that, according to the NCO website (http://nco.sourceforge.net/), they take more memory.


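To set a record dimension for a file, use this command:

ncks --mk_rec_dmn <dimension> <inputfile> <outputfile>
Example: ncks --mk_rec_dmn time GHRSST_DATA_SET.nc GHRSST_DATA_SET_rec.nc

The last argument is the output file (the name used here is only an example). Without it, ncks prints the file contents to the screen instead of writing a new file.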
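
To convert every file in a directory in one pass, a loop along the following lines could be used. This is only a minimal sketch: the helper name rec_output_name and the "_rec.nc" output suffix are illustrative choices, and it assumes that every file has a dimension called time and that ncks is on the PATH.

```shell
#!/bin/sh
# Sketch: give every netCDF file in the current directory a "time" record dimension.
# Assumptions (not from the original post): the dimension is called "time" and
# converted copies are written next to the originals with a "_rec.nc" suffix.

# Hypothetical helper: derive the output name, e.g. a.nc -> a_rec.nc
rec_output_name() {
    printf '%s_rec.nc\n' "${1%.nc}"
}

if command -v ncks >/dev/null 2>&1; then
    for f in *.nc; do
        [ -e "$f" ] || continue                    # no .nc files: the pattern stayed literal
        case "$f" in *_rec.nc) continue ;; esac    # skip files that were already converted
        # -O overwrites an existing output file without prompting
        ncks -O --mk_rec_dmn time "$f" "$(rec_output_name "$f")"
    done
else
    echo "ncks not found; install NCO first" >&2
fi
```

Afterwards, running ncdump -h on one of the converted files should list the dimension as "time = UNLIMITED", which confirms that it is now the record dimension.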

To identify and download the files that you want to process with the NCO utilities, see these two other forum posts: 1) "How to batch download data files in a given time range" and 2) "Using Python to subset large gridded datasets".

Once you have all of your files, make sure they are all in the same directory.
Using the file naming convention, one can easily average many files together.
We used GHRSST (Group for High Resolution Sea Surface Temperature) data; here is an example of a granule name:

19981231120000-CMC-L4_GHRSST-SSTfnd-CMC0.2deg-GLOB-v02.0-fv02.0_subset.nc

The first set of numbers is the date/time of observation, and the rest is metadata (data that describes the data set), e.g., processing level (L4), platform (GHRSST), measurement parameter (sea surface temperature, or SST), etc.
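
Because the date comes first in the name, it can be split off with ordinary shell tools. The following is a minimal sketch; the variable names are illustrative, and it assumes the first 14 characters are always laid out as YYYYMMDDhhmmss, as in the granule name above.

```shell
#!/bin/sh
# Sketch: pull the date fields out of a granule name.
# Assumes the leading field is YYYYMMDDhhmmss, as in the example granule.
granule="19981231120000-CMC-L4_GHRSST-SSTfnd-CMC0.2deg-GLOB-v02.0-fv02.0_subset.nc"

stamp="${granule%%-*}"                      # everything before the first "-"
year=$(printf '%s' "$stamp" | cut -c1-4)    # characters 1-4
month=$(printf '%s' "$stamp" | cut -c5-6)   # characters 5-6
day=$(printf '%s' "$stamp" | cut -c7-8)     # characters 7-8

echo "year=$year month=$month day=$day"     # prints: year=1998 month=12 day=31
```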

For this example, all of the files are from the years 1997 and 1998. The goal was to average all the 1998 files together and all the 1997 files together. The command for 1998 is:

nces 1998*.nc ncesAvg1998.nc

nces is the command from the NCO utilities.
1998*.nc is a shell wildcard (glob) pattern, not a regular expression. It matches all file names that start with 1998 and end with .nc; the star means that any characters can appear between the strings 1998 and .nc.

ncesAvg1998.nc is what we want the output file to be called.

After this command there will be a file called ncesAvg1998.nc in the directory.
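
The same command can be repeated for each year with a small loop. The following is a minimal sketch: average_year and DRY_RUN are illustrative names, not part of NCO. Setting DRY_RUN=1 only prints the nces command, which is a convenient way to check which files the wildcard picks up before running the average.

```shell
#!/bin/sh
# Sketch: average each year's granules into its own file.
# average_year and DRY_RUN are illustrative names, not part of NCO.

average_year() {
    year="$1"
    set -- "$year"*.nc        # expand the wildcard into the positional parameters
    if [ ! -e "$1" ]; then    # pattern matched nothing
        echo "no files for $year" >&2
        return 0
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo nces "$@" "ncesAvg${year}.nc"    # only show the command
    else
        nces "$@" "ncesAvg${year}.nc"
    fi
}

for y in 1997 1998; do
    average_year "$y"
done
```

The output names start with "nces" rather than with the year, so they are not matched by the input pattern if the loop is run a second time.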
To use ncecat, do the same thing but with the command ncecat in place of nces.
The general form of the ncecat and nces commands is:

<command> <inputfile> <inputfile> <inputfile> ... <outputfile>

So, for example, if we want to average ncesAvg1998.nc and ncesAvg1997.nc (our two average files for the years 1997 and 1998), we can enter either of the following commands:

nces ncesAvg1998.nc ncesAvg1997.nc output.nc
nces ncesAvg*.nc output.nc


Either of the above two commands will work and produce the same result.

To concatenate the two average files, one would enter the following command:

ncecat ncesAvg1998.nc ncesAvg1997.nc output.nc

This will produce a single file that contains all of the data for both of these files. This may make the data easier to go through because you can just open one file instead of 2 or 365. This may also be useful for creating graphs of specified time series.
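
One way to check the concatenated file is to compare the number of input files with the length of the new record dimension. The following is a minimal sketch: count_files is an illustrative helper, and it relies on ncecat stacking the inputs along a new dimension that it names "record" by default, and on ncdump (from the netCDF tools) being installed.

```shell
#!/bin/sh
# Sketch: concatenate the yearly averages and confirm that the result holds
# one record per input file. count_files is an illustrative helper.

count_files() {
    n=0
    for f in "$@"; do
        if [ -e "$f" ]; then      # ignore a pattern that matched nothing
            n=$((n + 1))
        fi
    done
    echo "$n"
}

expected=$(count_files ncesAvg*.nc)
echo "expecting $expected records"

if [ "$expected" -gt 0 ] && command -v ncecat >/dev/null 2>&1; then
    ncecat -O ncesAvg*.nc output.nc
    # The header line for the new dimension shows how many records it holds
    ncdump -h output.nc | grep 'record = '
fi
```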

The image below was created from the averaged data for the year 1998 from the data files.

Figure (sst_average.jpg): Averaged SST of Year 1998