THREDDS: Web-based data subsetting How-To

Thematic Realtime Environmental Distributed Data Services

Post by vtsontos » Fri Oct 03, 2014 8:23 pm

The THREDDS service is a useful web-based capability that lets users subset gridded (L3, L4) satellite data maintained at the PO.DAAC in space and time, extract time series, and save the result as a downloadable netCDF output file. The example workflows here illustrate its use with Pathfinder and MUR SST data: THREDDS is used to extract a continuous series of temperature observations for a user-defined area centered on 17.3 S, 119.4 E around the reefs of the Rowley Shoals (north-western Australia), and for a portion of the Gulf Stream in the North Atlantic (LAT: 60 to 32, LON: -41.43 to -66.55). The steps and screenshots that follow will guide you through using THREDDS to subset SST data for your own area and time period of interest. The first part (I) illustrates end-to-end use of the THREDDS web form for interactive spatio-temporal subsetting of a selected dataset via a graphical user interface. The second part (II) outlines a complementary approach: constructing a THREDDS URL with extended parameters, which supports automated, machine-to-machine subsetting and download calls.

I. Interactive Usage of THREDDS for Your Dataset of Interest (e.g. Pathfinder SST)
1) Search for all Pathfinder datasets at PO.DAAC and then select the specific dataset of interest

[Figure 1.png: Catalog search for Pathfinder data on the PO.DAAC portal]

- On the main menu of the PO.DAAC website click "Data Discovery", then "Collections", then select "pathfinder". This takes you to the page at https://podaac.jpl.nasa.gov/datasetlist?ids=Collections&values=AVHRR_PATHFINDER_L3_SST&view=list
- Alternatively, in the Search field on the PO.DAAC homepage (top right), enter the term “Pathfinder”.
- Click on the specific dataset of interest. For this example, choose the 7-day SST product.

2) On the dataset description page for the selected dataset, click on the “Data Access” tab. Note the THREDDS link (which indicates that this dataset is available in this tool), and click on it to call THREDDS.

[Figure 2.png: Accessing THREDDS via a dataset's Data Access tab]

3) The associated THREDDS catalog page exposes aggregations of several variables for this dataset.

[Figure 3.png: Selecting the dataset variable of interest to subset via THREDDS]

Click on the "SST" link: this is the average SST over the 7-day interval. Note, however, that other variables could also be useful, such as SDEV and NUM, which give the standard deviation and count of SST observations for each pixel.

4) On the resulting THREDDS dataset variable page for SST, select the "NetCDF Subset" option.

[Figure 4.png: Using the THREDDS NCSS web form to apply spatio-temporal query filters]

- This brings up a web-form within which you can specify the area and time period of interest, plus which of the available variables to include (include SST and Lat/Lon).
- Note: make sure to select the Bounding Box and Time Range radio buttons and not the default ALL selection; otherwise a data request for global data of the full time series may be too large to process.
- Click Submit for your THREDDS request to execute.

5) Once the request is processed, you will get a popup prompt informing you that a netCDF file with your data is ready for download. Click SAVE.
- The resulting NetCDF output file aggregate__avhrr_AVHRR_PATHFINDER_L3_SST_7DAY_NIGHTTIME_V5.ncml.nc will be saved locally onto your computer.
- You can use MATLAB, IDL, R or the analysis tool of your choice to view and plot netCDF data. The output shown here is the data displayed in a free, generic netCDF reader called Panoply (http://www.giss.nasa.gov/tools/panoply/); this is a very useful tool for inspecting the structure and contents of any netCDF data file, and even for producing basic plots.

[Figure 5.png: Panoply mapped outputs of the THREDDS extracted Pathfinder SST Australian reef data]

- THREDDS and/or Panoply apply the scaling relationship (slope and intercept terms) documented in the file metadata to convert the source file data values from kelvin to degrees Celsius. Values for the given area in December appear to be around 31-34 °C.

[Figure 6.png: Pathfinder scaling relationship metadata for the THREDDS NetCDF output file shown in Panoply]
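For readers who want to apply a slope/intercept relationship themselves, here is a minimal Python sketch. It assumes the usual netCDF packing rule (value = raw * slope + intercept) followed by a kelvin to Celsius shift. The slope, intercept and fill value below are hypothetical placeholders, not the actual Pathfinder attributes; read the real ones from the file metadata (for example in Panoply).

```python
# Hypothetical packing attributes for illustration only; take the real
# values from the file metadata (e.g. scale_factor / add_offset).
SLOPE = 0.01          # assumed slope
INTERCEPT = 273.15    # assumed intercept, giving kelvin
FILL_VALUE = -32768   # assumed missing-data marker


def unpack(raw, slope=SLOPE, intercept=INTERCEPT, fill_value=FILL_VALUE):
    """Convert one stored integer to its physical value, or None if missing."""
    if raw == fill_value:
        return None
    return raw * slope + intercept


def kelvin_to_celsius(kelvin):
    """Shift a temperature from kelvin to degrees Celsius."""
    return kelvin - 273.15


def raw_to_celsius(raw):
    """Unpack a stored value and express it in degrees Celsius."""
    kelvin = unpack(raw)
    return None if kelvin is None else kelvin_to_celsius(kelvin)


if __name__ == "__main__":
    for raw in (3000, 3250, FILL_VALUE):
        print(raw, raw_to_celsius(raw))
```

If the tool you use already applies the scaling (as THREDDS and/or Panoply do here), do not apply it a second time.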


II. Specifying a THREDDS URL to Access and Subset Your Dataset of Interest (e.g. MUR-SST)
An alternative approach to subsetting data dynamically via THREDDS, particularly useful for programmatic, machine-to-machine requests, is to construct and submit an HTTP call to the THREDDS server via an extended URL with well-defined parameters. Documentation on the precise structure of such a THREDDS URL and the range of possible parameters and arguments, together with examples, is available at:
https://www.unidata.ucar.edu/software/thredds/current/tds/.

Here we illustrate the structure of a THREDDS URL for dynamically subsetting the L4 MUR SST high-resolution dataset available at PO.DAAC (see https://podaac.jpl.nasa.gov/dataset/MUR-JPL-L4-GLOB-v4.1). MUR highlights the utility of THREDDS for accessing and downloading only the spatio-temporal subsets of interest, given the large file sizes (>225 MB compressed) characteristic of this global, 1 km resolution dataset. MUR also illustrates some current limits of THREDDS when extracting longer time series of high-resolution data for larger regions, along with effective workarounds.
The basic structure of the THREDDS request for MUR is as follows:
https://thredds.jpl.nasa.gov/thredds/ncss/OceanTemperature/MUR-JPL-L4-GLOB-v4.1.nc?var=analysed_sst&north=60.5&west=-66.55&east=-41.43&south=32.0&disableProjSubset=on&horizStride=1&time_start=2011-09-15T09%3A00%3A00Z&time_end=2011-09-15T23%3A59%3A59Z&timeStride=1&addLatLon=true
The first portion of the URL, before the ?, varies according to how the data provider catalogs the data, but everything after it is standard and largely self-explanatory. var names the output variable (a comma-separated list when several variables are requested). north, south, east and west define the extents of the spatial bounding box. The date range for the query is given by time_start and time_end, with entries in the standard date/time format shown (the colons are URL-encoded as %3A). Here the query is for a single day's worth of MUR SST data for a region of the North Atlantic/Gulf Stream. The default output file is in NetCDF format, but other formats can be specified as parameters in the THREDDS URL (see the UNIDATA link above for comprehensive documentation). In this case, the output file size for a day's worth of 1 km SST data for this region is 23 MB, with a graphical plot of these data given below.
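As a rough illustration of how the request above could be assembled in a script, the following Python sketch builds the same query string from named parameters using only the standard library. The function name build_ncss_url is made up for this example; the endpoint and parameter values are copied from the URL shown above.

```python
from urllib.parse import urlencode

# Endpoint copied from the example request; it differs between providers.
BASE_URL = ("https://thredds.jpl.nasa.gov/thredds/ncss/OceanTemperature/"
            "MUR-JPL-L4-GLOB-v4.1.nc")


def build_ncss_url(base_url, variables, north, south, east, west,
                   time_start, time_end):
    """Assemble a subset request URL (hypothetical helper for illustration)."""
    params = [
        ("var", ",".join(variables)),   # comma-separated variable list
        ("north", north),
        ("west", west),
        ("east", east),
        ("south", south),
        ("disableProjSubset", "on"),
        ("horizStride", 1),
        ("time_start", time_start),
        ("time_end", time_end),
        ("timeStride", 1),
        ("addLatLon", "true"),
    ]
    # urlencode percent-encodes reserved characters such as ":" (-> %3A).
    return base_url + "?" + urlencode(params)


url = build_ncss_url(
    BASE_URL, ["analysed_sst"],
    north=60.5, south=32.0, east=-41.43, west=-66.55,
    time_start="2011-09-15T09:00:00Z",
    time_end="2011-09-15T23:59:59Z",
)
print(url)
```

Keeping the parameters in a list rather than a hand-typed string makes it easier to change the bounding box or dates without introducing encoding mistakes.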

[Figure 7.png: Panoply mapped outputs of the THREDDS extracted MUR-SST Gulf Stream data]

This is still quite a lot of data for just one day, but subsetting via THREDDS saves the effort of downloading the entire global file (225 MB) and then subsetting it in tools such as MATLAB, IDL or a programming language. One can see that it is simple to copy and paste the above URL into a web browser to issue the THREDDS request and get back, in short order, the output file with the subsetted data. It is also clear how this could be used programmatically, for example to issue THREDDS queries for a series of dates by updating the start and end dates in successive URL submissions. It is of course possible to specify THREDDS requests for larger regions and longer periods, but there are limits, and THREDDS will return an error if it receives an unwieldy request. In the case of MUR for the region here, a single request for more than 2 months of daily data cannot be handled effectively; the solution is to issue a series of requests, one per permissible time window. A request for a 2-month series for this area results in a 2.82 GB output file being processed and delivered in about 6.5 minutes over a fast network connection.
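The iterative approach described above might be sketched in Python as follows. This only generates one request URL per time window; the 30-day window length, the helper names and the midnight-to-midnight time stamps are assumptions made for illustration, so adjust them to what the server accepts for your region.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

BASE_URL = ("https://thredds.jpl.nasa.gov/thredds/ncss/OceanTemperature/"
            "MUR-JPL-L4-GLOB-v4.1.nc")

# Variable and bounding box taken from the example request above.
FIXED_PARAMS = {
    "var": "analysed_sst",
    "north": 60.5, "west": -66.55, "east": -41.43, "south": 32.0,
    "addLatLon": "true",
}


def time_windows(first_day, last_day, days_per_request):
    """Yield (start, end) dates that cover first_day..last_day inclusive."""
    start = first_day
    while start <= last_day:
        end = min(start + timedelta(days=days_per_request - 1), last_day)
        yield start, end
        start = end + timedelta(days=1)


def window_urls(base_url, fixed_params, first_day, last_day, days_per_request):
    """Build one request URL per time window (hypothetical helper)."""
    urls = []
    for start, end in time_windows(first_day, last_day, days_per_request):
        params = dict(fixed_params)
        params["time_start"] = f"{start.isoformat()}T00:00:00Z"
        params["time_end"] = f"{end.isoformat()}T23:59:59Z"
        urls.append(base_url + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    # Assumed example period: two months split into 30-day requests.
    for u in window_urls(BASE_URL, FIXED_PARAMS,
                         date(2011, 9, 1), date(2011, 10, 31), 30):
        print(u)
```

Each generated URL can then be fetched in turn, for instance with urllib.request.urlretrieve(url, filename) from the Python standard library, writing one output file per window.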

Finally, note that depending on the THREDDS server being accessed and the organization of the source data files at the data provider’s site, some differences in the first portion of the URL construct will likely be observed. Compare, for example, the aforementioned URL for a request to the THREDDS server at PO.DAAC with that below for the same dataset accessible via THREDDS at NOAA/Coastwatch (examine the portion before the ? in particular).
http://oceanwatch.pfeg.noaa.gov/thredds/ncss/grid/satellite/MUR/ssta/1day?var=analysed_sst&north=60.5&south=32&east=-41.43&west=-66.55&time_start=2011-09-15T09:00:00Z&time_end=2011-09-15T23:59:59Z
General documentation on the meaning of these URL elements is accessible at the UNIDATA link above, and additional details may be available from the data provider's THREDDS catalog information pages.
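To make the comparison between the two requests concrete, this short Python sketch splits each URL into its endpoint (the part before the ?) and its query parameters using the standard library. The helper name split_request is made up for this example; both URLs are the ones quoted above.

```python
from urllib.parse import parse_qs, urlsplit

PODAAC_URL = (
    "https://thredds.jpl.nasa.gov/thredds/ncss/OceanTemperature/"
    "MUR-JPL-L4-GLOB-v4.1.nc?var=analysed_sst&north=60.5&west=-66.55"
    "&east=-41.43&south=32.0&disableProjSubset=on&horizStride=1"
    "&time_start=2011-09-15T09%3A00%3A00Z&time_end=2011-09-15T23%3A59%3A59Z"
    "&timeStride=1&addLatLon=true"
)
NOAA_URL = (
    "http://oceanwatch.pfeg.noaa.gov/thredds/ncss/grid/satellite/MUR/ssta/1day"
    "?var=analysed_sst&north=60.5&south=32&east=-41.43&west=-66.55"
    "&time_start=2011-09-15T09:00:00Z&time_end=2011-09-15T23:59:59Z"
)


def split_request(url):
    """Return (endpoint, params) for a request URL; hypothetical helper."""
    parts = urlsplit(url)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qs decodes %3A back to ":" and returns a list per key.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return endpoint, params


podaac_endpoint, podaac_params = split_request(PODAAC_URL)
noaa_endpoint, noaa_params = split_request(NOAA_URL)

shared = sorted(set(podaac_params) & set(noaa_params))
podaac_only = sorted(set(podaac_params) - set(noaa_params))

print("PO.DAAC endpoint:", podaac_endpoint)
print("NOAA endpoint:   ", noaa_endpoint)
print("Shared parameters:", shared)
print("Only in the PO.DAAC request:", podaac_only)
```

The endpoints differ because each provider organizes its catalog differently, while the parameter names after the ? follow the same scheme in both requests.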