Using Python and OPeNDAP to subset large gridded datasets

Open-source Project for a Network Data Access Protocol

Postby edward.m.armstrong » Mon Oct 06, 2014 4:44 pm

*** Please note: an updated version of this Python script is available in the following data recipe: [Using Python to subset large gridded datasets (v2)] ***

Dealing with large gridded datasets is challenging from the perspective of storage and access. Instead of downloading and storing large datasets locally, it is often preferable to spatially subset them on the server side and store the smaller pieces, or even use the pieces dynamically in an analysis and then discard them. OPeNDAP is a popular service for extracting subsets from data stored in netCDF and HDF formats.

Many GHRSST Level 4 datasets fit this profile of large data volumes, and the PO.DAAC OPeNDAP server (https://podaac-opendap.jpl.nasa.gov/opendap/allData/) can be used in a subsetting framework to extract subsets from these granules as needed. But OPeNDAP requires a granule (file) specific URL service call that includes the variables and their proper grid index ranges to correctly perform the subsetting operation. Constructing these by hand can be tedious; an example of what such a request looks like is sketched below.
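For reference, a raw OPeNDAP subset request encodes the variable and its grid index ranges directly in the URL as a constraint expression. The sketch below only illustrates the form of such a request; the granule path and index values are placeholders, not a working MUR URL.

Code:
# Illustration of the form of a granule-specific OPeNDAP constraint expression.
# The granule path and index ranges below are placeholders, not real values;
# they depend on the dataset's directory layout and grid resolution.
base = 'https://podaac-opendap.jpl.nasa.gov/opendap/allData/'
granule = '.../20100101-JPL-L4UHfnd-GLOB-v01-fv04-MUR.nc'      # placeholder path

# variable[time_start:stride:time_end][lat_start:stride:lat_end][lon_start:stride:lon_end]
constraint = 'analysed_sst[0:1:0][9000:1:12999][4000:1:6999]'  # placeholder indices

url = base + granule + '?' + constraint
print(url)  # a URL like this can be fetched with curl/wget or opened with the netCDF library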

The Python script opendapL4.py (https://podaac-tools.jpl.nasa.gov/drive ... endapL4.py) is a wrapper for OPeNDAP subsetting requests for GHRSST Level 4 datasets. In its current iteration it is “tuned” to the MUR dataset (https://podaac.jpl.nasa.gov/dataset/JPL ... d-GLOB-MUR), but it can easily be modified for others. For example, to extract MUR data from 2010-01-01 to 2010-01-05 for a region bounded by -140 to -110 degrees longitude and 10 to 50 degrees latitude, the following request can be made on the UNIX command line:

% opendapL4.py --start 2010 01 01 --finish 2010 01 05 --region -140 -110 10 50

The program will first determine whether it can find granules for the chosen temporal period and report the dimensions of the subsetted output grid. It will then ask for download confirmation:

~~~~ example of screen output ~~~~~~~

First file: 20100101-JPL-L4UHfnd-GLOB-v01-fv04-MUR.nc
Last file: 20100105-JPL-L4UHfnd-GLOB-v01-fv04-MUR.nc
files obtained at 1-day interval

Longitude range: -140.000000 to -110.000000
Latitude range: 10.000000 to 50.000000
every 1 pixel(s) is obtained

grid dimensions will be ( 2731 x 3641 )

OK to download? [yes or no]: yes

~~~~~~ end example ~~~~~~~~

Once the request is accepted, the subsetted granules are downloaded to the local directory via a curl or wget system call. The output format is netCDF.
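The download itself boils down to one system call per granule. A hedged sketch of that pattern (not the script's actual code; the URL and output filename are placeholders) is:

Code:
# Hypothetical sketch of the per-granule download step, not the script's actual code.
# 'subset_url' stands for whatever subsetting URL the script has built for a granule;
# the output filename is arbitrary.
import subprocess

subset_url = 'https://podaac-opendap.jpl.nasa.gov/opendap/allData/.../granule.nc?analysed_sst[0:1:0][0:1:100][0:1:100]'
outfile = 'subset_20100101.nc'

# -g (--globoff) keeps curl from interpreting the [] brackets in the constraint expression
subprocess.call(['curl', '-g', '-o', outfile, subset_url])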

The program depends on the Python netCDF library and on the curl and wget system commands mentioned above.

To modify this program for other datasets, including non-GHRSST datasets, pay careful attention to the parameters() subroutine and to the root path defined by the variable ‘L4URL’.
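To give a feel for the kind of dataset-specific information such a parameters() routine has to encode, the sketch below (hypothetical names and values, not taken from the script) shows how a latitude/longitude bounding box maps to grid index ranges on a regular global grid:

Code:
# Hypothetical illustration of what dataset-specific parameters must capture:
# the grid origin and resolution are what turn lat/lon bounds into index ranges.
# The values below are assumptions for a regular ~0.01-degree global grid; the
# actual MUR grid spacing differs slightly.
lat0, lon0 = -89.99, -179.99   # coordinates of grid cell [0, 0] (assumed)
dlat, dlon = 0.01, 0.01        # grid spacing in degrees (assumed)

def bbox_to_indices(lon_min, lon_max, lat_min, lat_max):
    """Convert a geographic bounding box to (start, end) grid index ranges."""
    i0 = int(round((lat_min - lat0) / dlat))
    i1 = int(round((lat_max - lat0) / dlat))
    j0 = int(round((lon_min - lon0) / dlon))
    j1 = int(round((lon_max - lon0) / dlon))
    return (i0, i1), (j0, j1)

# Example: the bounding box used earlier in this post
print(bbox_to_indices(-140, -110, 10, 50))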
Last edited by edward.m.armstrong on Mon Dec 15, 2014 11:26 am, edited 1 time in total.

Re: Using Python and OPeNDAP to subset large gridded dataset

Postby rleal » Mon Dec 08, 2014 3:03 pm

Hi:

How could this script be tuned to also accept HDF data (and perhaps output netCDF subsets)? For instance, the files under https://podaac-opendap.jpl.nasa.gov/ope ... /chlA/4km/ are stored in HDF format, and I believe the wget/cURL subsetting syntax must be quite different.

Thanks for the post.

Ricardo

Re: Using Python and OPeNDAP to subset large gridded dataset

Postby edward.m.armstrong » Wed May 13, 2015 10:26 am

Hi Ricardo,

It's not currently usable for HDF4 data such as the MODIS chlorophyll-a data you would like to subset; the netCDF and HDF libraries are different.

Best to use OPeNDAP or our THREDDS servers to access and subset these.
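Because the DAP protocol hides the underlying storage format, granules served from HDF4 files can still be read and subset through an OPeNDAP server with the netCDF4-python client. A minimal sketch, assuming a hypothetical granule URL and variable name:

Code:
# Minimal sketch of subsetting an HDF4-backed granule through an OPeNDAP server.
# The DAP protocol hides the on-disk format, so the netCDF4-python client can
# read it as if it were netCDF. The URL and variable name are placeholders.
import netCDF4

url = 'https://podaac-opendap.jpl.nasa.gov/opendap/allData/.../some_chlA_granule.hdf'
nc = netCDF4.Dataset(url)
print(nc.variables.keys())          # inspect what the server exposes

# Slicing transfers only the requested region from the server
chl = nc.variables['chlor_a'][1000:1500, 2000:2500]
nc.close()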

Re: Using Python and OPeNDAP to subset large gridded dataset

Postby mariakatosvich » Sun Jun 26, 2016 10:39 pm

In this example I will show how to benefit from OPeNDAP subsetting. We will request subsets from worldwide digital elevation models (DEMs) that are too big to request as a whole: roughly 500 MB (over 10,000 x 20,000 points) for a 1-minute grid, or 2 GB (over 20,000 x 40,000 points) for a 30-second grid.

Code:
#!/usr/bin/env python
 
#OPENDAP_SUBSETTING_WITH_PYTHON_TUTORIAL how to benefit from OPeNDAP subsetting in python
#
# This tutorial is also available for Matlab
#
#See also: OPeNDAP_access_with_Matlab_tutorial.py
 
# $Id:  $
# $Date:  $
# $Author:  $
# $Revision:  $
# $HeadURL: https://repos.deltares.nl/repos/OpenEarthTools/trunk/python/io/opendap/OPeNDAP_subsetting_with_python_tutorial.py $
# $Keywords: $
 
# This document is also posted on a wiki: http://public.deltares.nl/display/OET/OPeNDAP+access+with+python
 
# Read data from an opendap server
import urllib
import numpy as np
import netCDF4
import pydap
import matplotlib
import matplotlib.pyplot as plt
import pylab
 
# from opendap import opendap # OpenEarthTools module, see above, that makes pydap quack like netCDF4
 
# Define data on an opendap server
# for converting non-open access GEBCO grids to the same netCDF structure: see https://repos.deltares.nl/repos/OpenEarthRawData/trunk/gebco/
gridurls =  ['http://geoport.whoi.edu/thredds/dodsC/bathy/etopo1_bed_g2',
             'http://geoport.whoi.edu/thredds/dodsC/bathy/etopo2_v2c.nc',
             'http://geoport.whoi.edu/thredds/dodsC/bathy/srtm30plus_v1.nc',
             'http://geoport.whoi.edu/thredds/dodsC/bathy/srtm30plus_v6',
             'http://geoport.whoi.edu/thredds/dodsC/bathy/smith_sandwell_v9.1.nc',
             'http://geoport.whoi.edu/thredds/dodsC/bathy/smith_sandwell_v11',
             r'F:\checkouts\OpenEarthRawData\gebco\raw\gebco_30sec.nc',
             r'F:\checkouts\OpenEarthRawData\gebco\raw\gebco_1min.nc']
 
lineurl  = 'http://opendap.deltares.nl/thredds/dodsC/opendap/noaa/gshhs/gshhs_i.nc';
 
# Get line data: 1D vectors are small, so we can get all data
linedata = netCDF4.Dataset(lineurl) # opendap(url_line) # when netCDF4 was not compiled with OPeNDAP
 
lines = dict(
   lon=linedata.variables['lon'][:],
   lat=linedata.variables['lat'][:]
   )
linedata.close()
 
# Define bounding box
BB = dict(
   lon=[ 0, 10],
   lat=[50, 55]
   )
 
for i,ncfile in enumerate(gridurls):
 
   print(ncfile)
 
   # Get grid data
   grid   = netCDF4.Dataset(ncfile) # opendap(ncfile) # when netCDF4 was not compiled with OPeNDAP
 
   # Get all data from lat and lon, but just a pointer to the topography
   # Because getting it all takes a bit too long
   lat = grid.variables['lat'][:]
   lon = grid.variables['lon'][:]
 
   # nonzero returns a tuple of idx per dimension
   # we're unpacking the tuple here so we can lookup max and min
   (latidx,) = np.logical_and(lat >= BB['lat'][0], lat < BB['lat'][1]).nonzero()
   (lonidx,) = np.logical_and(lon >= BB['lon'][0], lon < BB['lon'][1]).nonzero()
 
   #
   assert lat.shape[0] == grid.variables['topo'].shape[0], 'We assume first dim is lat here'
   assert lon.shape[0] == grid.variables['topo'].shape[1], 'We assume 2nd dim is lon here'
 
   # get rid of the non used lat/lon now
   lat = lat[latidx]
   lon = lon[lonidx]
   # Get the topography data with the new indices
   topo = grid.variables['topo'][latidx.min():latidx.max(),
                                 lonidx.min():lonidx.max()]
   title = grid.title # remember so we can close the file before plotting
   grid.close()
 
   Lon,Lat = np.meshgrid(lon, lat)
   # make a pcolormesh (the fastest way to plot simple grids)
   mesh = plt.pcolormesh(Lon,Lat,topo)
   plt.plot(lines['lon'],lines['lat'],'k')
   # some customizations (called on the axes and figure)
   mesh.axes.set_title(title)
   mesh.axes.set_aspect(1/np.cos(np.pi*np.mean(BB['lat'])/180.)) # set aspect to match meters
   plt.xlim(BB['lon'][0],BB['lon'][1])
   plt.ylim(BB['lat'][0],BB['lat'][1])
   mesh.axes.set_xlabel('lon [deg]')
   mesh.axes.set_ylabel('lat [deg]')
   plt.clim(-50,150)
   mesh.figure.colorbar(mesh) # use the mesh to make a colorbar
   # Save the figure
   mesh.figure.savefig('%s.png' % title)
   mesh.figure.clf()

Thanks
Maria.K

Re: Using Python and OPeNDAP to subset large gridded dataset

Postby yiboj » Tue Jun 28, 2016 1:24 pm

Hi Maria,

Thanks for sharing the code with us. I have removed the lines with local directory paths from your code and reposted it here, so that other users can run it directly without any modifications.

Code:
#OPENDAP_SUBSETTING_WITH_PYTHON_TUTORIAL how to benefit from OPeNDAP subsetting in python
#
# This tutorial is also available for Matlab
#
#See also: OPeNDAP_access_with_Matlab_tutorial.py
 
# $Id:  $
# $Date:  $
# $Author:  $
# $Revision:  $
# $HeadURL: https://repos.deltares.nl/repos/OpenEarthTools/trunk/python/io/opendap/OPeNDAP_subsetting_with_python_tutorial.py $
# $Keywords: $
 
# This document is also posted on a wiki: http://public.deltares.nl/display/OET/OPeNDAP+access+with+python
 
# Read data from an opendap server
import urllib
import numpy as np
import netCDF4
import pydap
import matplotlib
import matplotlib.pyplot as plt
import pylab
 
# from opendap import opendap # OpenEarthTools module, see above, that makes pydap quack like netCDF4
 
# Define data on an opendap server
# for converting non-open access GEBCO grids to the same netCDF structure: see https://repos.deltares.nl/repos/OpenEarthRawData/trunk/gebco/
gridurls =  ['http://geoport.whoi.edu/thredds/dodsC/bathy/etopo1_bed_g2',
             'http://geoport.whoi.edu/thredds/dodsC/bathy/etopo2_v2c.nc',
             'http://geoport.whoi.edu/thredds/dodsC/bathy/srtm30plus_v1.nc',
             'http://geoport.whoi.edu/thredds/dodsC/bathy/srtm30plus_v6',
             'http://geoport.whoi.edu/thredds/dodsC/bathy/smith_sandwell_v9.1.nc',
             'http://geoport.whoi.edu/thredds/dodsC/bathy/smith_sandwell_v11']
 
lineurl  = 'http://opendap.deltares.nl/thredds/dodsC/opendap/noaa/gshhs/gshhs_i.nc';
 
# Get line data: 1D vectors are small, so we can get all data
linedata = netCDF4.Dataset(lineurl) # opendap(url_line) # when netCDF4 was not compiled with OPeNDAP
 
lines = dict(
   lon=linedata.variables['lon'][:],
   lat=linedata.variables['lat'][:]
   )
linedata.close()
 
# Define bounding box
BB = dict(
   lon=[ 0, 10],
   lat=[50, 55]
   )
 
for i,ncfile in enumerate(gridurls):
 
   print(ncfile)
 
   # Get grid data
   grid   = netCDF4.Dataset(ncfile) # opendap(ncfile) # when netCDF4 was not compiled with OPeNDAP
 
   # Get all data from lat and lon, but just a pointer to the topography
   # Because getting it all takes a bit too long
   lat = grid.variables['lat'][:]
   lon = grid.variables['lon'][:]
 
   # nonzero returns a tuple of idx per dimension
   # we're unpacking the tuple here so we can lookup max and min
   (latidx,) = np.logical_and(lat >= BB['lat'][0], lat < BB['lat'][1]).nonzero()
   (lonidx,) = np.logical_and(lon >= BB['lon'][0], lon < BB['lon'][1]).nonzero()
 
   #
   assert lat.shape[0] == grid.variables['topo'].shape[0], 'We assume first dim is lat here'
   assert lon.shape[0] == grid.variables['topo'].shape[1], 'We assume 2nd dim is lon here'
 
   # get rid of the non used lat/lon now
   lat = lat[latidx]
   lon = lon[lonidx]
   # Get the topography data with the new indices
   topo = grid.variables['topo'][latidx.min():latidx.max(),
                                 lonidx.min():lonidx.max()]
   title = grid.title # remember so we can close the file before plotting
   grid.close()
 
   Lon,Lat = np.meshgrid(lon, lat)
   # make a pcolormesh (the fastest way to plot simple grids)
   mesh = plt.pcolormesh(Lon,Lat,topo)
   plt.plot(lines['lon'],lines['lat'],'k')
   # some customizations (called on the axes and figure)
   mesh.axes.set_title(title)
   mesh.axes.set_aspect(1/np.cos(np.pi*np.mean(BB['lat'])/180.)) # set aspect to match meters
   plt.xlim(BB['lon'][0],BB['lon'][1])
   plt.ylim(BB['lat'][0],BB['lat'][1])
   mesh.axes.set_xlabel('lon [deg]')
   mesh.axes.set_ylabel('lat [deg]')
   plt.clim(-50,150)
   mesh.figure.colorbar(mesh) # use the mesh to make a colorbar
   # Save the figure
   mesh.figure.savefig('%s.png' % title)
   mesh.figure.clf()

Re: Using Python and OPeNDAP to subset large gridded dataset

Postby mariakatosvich » Fri Jul 01, 2016 11:27 pm

Thanks & welcome yiboj, for modifying the code for making it more recheable..

