Page 1 of 1

Using Matlab and OPeNDAP to Subset Level 2 Datasets

PostPosted: Wed Jun 22, 2016 10:10 am
by yiboj
Dealing with large datasets is challenging from the perspective of storage and access. Instead of downloading and storing large datasets locally it is often preferable to spatially subset them on a server side and store smaller pieces, or even use the pieces dynamically for an analysis and discard them. OPeNDAP is a popular service to extract subsets from data in netCDF and HDF.

Dr. John Wilkin (Rutgers University) has kindly provided the matlab codes here to download and subset level 2 datasets. The subroutines and main file are listed below. The AMSR2 dataset is used here as an example and please feel free to modify the code.

get_amsr2sst
Code: Select all
function [data,geo] = get_amsr2sst(dataurl,latmin,latmax,lonmin,lonmax,varname)
% Read AMSR2 data from PO-DAAC opendap
%
% See get_amsr2sst_filelist.m which returns information required to
% build a set of data urls for any given day
%
%
% Optional inputs:
%     POLYBOX:  Matrix with columns of lon,lat defining a box
%     in which to extrct the AMSR2 data (e.g. created with
%     ginput)
%     VARNAME variable to extract - default sea_surface_temperature
%
% John Wilkin
% Feb 13, 2015

if nargin < 6
  varname = 'sea_surface_temperature';
end

%lon = nc_varget(dataurl,'lon');
lon = ncread(dataurl,'lon');
lat = ncread(dataurl,'lat');
if nargin > 4
  in = find((lon >= lonmin & lon <= lonmax) & (lat >= latmin & lat <= latmax));
else
  in = ~isnan(lon);
end

if ~isempty(in)
  lon = lon(in);
  lat = lat(in);
  data = ncread(dataurl,varname);
  data = data(in);
  time = ncread(dataurl,'time')/86400+datenum(1981,1,1);
  geo.lon = lon;
  geo.lat = lat;
  geo.time = time;
else
  warning('No data found in the granule')
  data = [];
  geo = [];
end


get_amsr2sst_filelist
Code: Select all
function [opendap,flist] = get_amsr2sst_filelist(day,shortName, version)
% [opendap,flist] = get_amsr2sst_filelist(day,version)
% Get list of Level-2 pass files (FLIST) for AMSR2 microwave SST data
% and the path to the files in the PO-DAAC opendap catalog (OPENDAP)
%
% The data url for an opendap query is the string [opendap flist(k,:)]
%
% Optional input VERSION ('final' or 'realtime','nearrealtime','nrt')
%
% See also get_amsr2sst.m
%
% John Wilkin
% Feb 13, 2015

if nargin<2
  version = 'final';
end
switch version
  case 'final'
    grepstr = ' grep FTP | grep v07.2 | cut -c100-180 ';
  case {'realtime','nearrealtime','nrt'}
    grepstr = ' grep FTP | grep rt | cut -c100-177 ';
end

%
[year,~] = datevec(day);
yday = floor(day-datenum(year,1,1))+1;
startTime = datestr(floor(day),30);
endTime = datestr(1+floor(day),30);

% query the PO-DAAC granule data base for names of the L2 files
query = ['curl "http://podaac.jpl.nasa.gov/ws/search/granule',...
  '/?shortName=' shortName,...
  '&startTime=' startTime,...
  '&endTime=' endTime,...
  '&itemsPerPage=100" | ' grepstr ' | sort'];
[~,flist] = unix(query);

% parse the result of the curl query by finding the file names (which all
% start with the year string)
k = strfind(flist,int2str(year));
if isempty(k)
  disp(['No ' upper(version) ' files found for day ' startTime])
  opendap = [];
  flist = [];
  return
end

% trim off the preamble text not excluded by grep
flist = flist(k(1):end);

% reshape
N = length(k);
flist = reshape(flist,[length(flist)/N N])';

% remove the trailing linefeed
flist(:,end) = [];

% PO-DAAC acknowledge that for some reason the database gives one file at
% the beginning of the list that belongs in the preceding day -
% remove it. I'm not if this means we miss a pass in the successive days.
flist(1,:) = [];

% String giving the path in the opendap catalog to files for this
% day to be used to build the fulll data url in function get_amsr_sst.m
opendap = ['http://podaac-opendap.jpl.nasa.gov:80/opendap/allData/',...
  'ghrsst/data/GDS2/L2P/AMSR2/REMSS/v7.2/',...
  int2str(year) '/' sprintf('%03d',yday) '/'];


Main Routine
Code: Select all
clear
format compact
format long

%*** Input Parametr

shortName='AMSR2-REMSS-L2P-v7.2';

%Read/Write data****************************************************************
datanum = datenum(2014,10,1);
[opendap,flist] = get_amsr2sst_filelist(datanum, shortName, 'final');
[nfile lz]=size(flist);
for i=1:nfile
 [data,geo] = get_amsr2sst([opendap flist(i,:)], -20, 20, -60, 60);
 if (length(data) > 1)
   disp(sprintf('\n*** Writing File %s ***', flist(i,:)));
   nccreate( flist(i,:), 'sea_surface_temperature', 'Dimensions',{'r' length(data)}, 'Datatype','double');
   ncwrite ( flist(i,:), 'sea_surface_temperature', data);
   nccreate ( flist(i,:), 'time');
   ncwrite ( flist(i,:), 'time', geo.time);
   nccreate ( flist(i,:), 'lon', 'Dimensions',{'r' length(geo.lon)}, 'Datatype','double');
   ncwrite ( flist(i,:), 'lon', geo.lon);
   nccreate ( flist(i,:), 'lat', 'Dimensions',{'r' length(geo.lat)}, 'Datatype','double');
   ncwrite ( flist(i,:), 'lat', geo.lat);
 end
end
%End****************************************************************