Download Multiple Data Files from PODAAC Drive Using wget

Post by yiboj » Thu Dec 01, 2016 10:30 am

This data recipe shows how to download multiple data files from PODAAC using the GNU wget command-line utility. GNU Wget is a free utility for non-interactive download of files from the Web. It supports the HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. It is a Unix-based command-line tool, but it is also available for other operating systems, such as Windows and Mac OS X.

[b][color=#FF0000]1. wget Command Options[/color][/b]

Here are a few key options that are frequently used:

[b]-nd[/b]
--no-directories
Do not create a hierarchy of directories when retrieving recursively. With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the filenames will get extensions '.n').


[b]-x[/b]
--force-directories
The opposite of '-nd': create a hierarchy of directories, even if one would not have been created otherwise. E.g. "wget -x http://podaac.jpl.nasa.gov/robots.txt" will save the downloaded file to podaac.jpl.nasa.gov/robots.txt.

[b]-nH[/b]
--no-host-directories
Disable generation of host-prefixed directories. By default, invoking Wget with "-r http://podaac.jpl.nasa.gov/" will create a structure of directories beginning with podaac.jpl.nasa.gov/. This option disables such behavior.

[b]-r[/b]
--recursive
Turn on recursive retrieving. The default maximum depth is 5.

[b]-l depth[/b]
--level=depth
Specify the maximum recursion depth.

[i]Try to specify the criteria that match the kind of download you are trying to achieve. If you want to download only one page, use '--page-requisites' without any additional recursion. If you want to download things under one directory, use '-np' to avoid downloading things from other directories. If you want to download all the files from one directory, use '-l 1' to make sure the recursion depth never exceeds one.[/i]
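As a sketch of how these options fit together, the shell function below wraps them for the common case of grabbing every .nc file that sits directly under one remote directory. The name fetch_nc_flat is hypothetical and not part of wget; the option set is one reasonable choice that follows the advice above to use '-np' and '-l 1'.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wget): download every *.nc file that
# sits directly under one remote directory into the current directory.
fetch_nc_flat() {
  local url="$1"
  # -r    recursive retrieval, needed to follow the directory listing
  # -l 1  recursion depth of one: do not descend into subdirectories
  # -np   never ascend into the parent directory
  # -nH   no host-named top-level directory
  # -nd   no directory hierarchy: save everything in the current directory
  # -nc   no-clobber: skip files that already exist locally
  # -A    accept only file names matching the pattern
  wget -r -l 1 -np -nH -nd -nc -A "*.nc" "$url"
}
```

For example, fetch_nc_flat "ftp://podaac-ftp.jpl.nasa.gov/allData/ghrsst/data/GDS2/L2P/AMSRE/REMSS/v7/2011/001/" would save that day's files into the current directory.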

[b][color=#FF0000]2. Download multiple files from PODAAC FTP site[/color][/b]

Let's take the GHRSST SST Level 2 dataset from REMSS as an example. The dataset landing page is [url=https://podaac.jpl.nasa.gov/dataset/AMSRE-REMSS-L2P-v7a]https://podaac.jpl.nasa.gov/dataset/AMSRE-REMSS-L2P-v7a[/url]. The FTP link for this dataset is indicated by the red circle in Figure 1.

[attachment=1]amsr-e_ftp.png[/attachment]

* To download one day of data files into the current directory
[code]
% wget -r -nc -np -nH -nd -A "*.nc" "ftp://podaac-ftp.jpl.nasa.gov/allData/ghrsst/data/GDS2/L2P/AMSRE/REMSS/v7/2011/001"
[/code]

* To download one year of data files into subdirectories (without -nd, wget recreates the remote directory structure, one subdirectory per day)

[code]
% wget -r -nc -np -nH -A "*.nc" "ftp://podaac-ftp.jpl.nasa.gov/allData/ghrsst/data/GDS2/L2P/AMSRE/REMSS/v7/2011/"
[/code]
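If you need more than one day but less than a full year, a loop over the day-of-year directories is one option. This minimal sketch assumes the YEAR/DDD layout visible in the URLs above (for example 2011/001); the helper names day_url and fetch_day_range and the BASE_URL variable are hypothetical. Days are passed as plain numbers and zero-padded to three digits.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, assuming the YEAR/DAY-OF-YEAR directory layout
# seen in the URLs above (e.g. .../v7/2011/001/).
BASE_URL="ftp://podaac-ftp.jpl.nasa.gov/allData/ghrsst/data/GDS2/L2P/AMSRE/REMSS/v7"

# Print the directory URL for a year and a day of year (1-366).
# 10# forces base 10, so an input like 008 is not read as octal.
day_url() {
  local year="$1" doy="$2"
  printf '%s/%d/%03d/\n' "$BASE_URL" "$year" "$((10#$doy))"
}

# Download days FIRST..LAST of YEAR, one wget call per day directory.
# Without -nd, each day's files keep their own local subdirectory.
fetch_day_range() {
  local year="$1" first="$2" last="$3" doy
  for ((doy = 10#$first; doy <= 10#$last; doy++)); do
    wget -r -l 1 -nc -np -nH -A "*.nc" "$(day_url "$year" "$doy")"
  done
}
```

For example, fetch_day_range 2011 1 10 would fetch days 001 through 010 of 2011.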

[b][color=#FF0000]3. Download multiple files from PODAAC Drive[/color][/b]

To access PODAAC Drive, all users must be registered with the NASA Earthdata system. Users can log in to PODAAC Drive at [url=https://podaac-uat.jpl.nasa.gov/drive/]https://podaac-uat.jpl.nasa.gov/drive/[/url]. Figure 2 shows the WebDAV/Programmatic API credentials, which are used later to access the files with the wget command. Please note that this password is encrypted and is different from your Earthdata URS password.

[attachment=0]podaac_drive.png[/attachment]

Again, we take the GHRSST SST Level 2 dataset from REMSS as an example.

* To download one day of data files into the current directory
[code]
% wget --user=LOGIN --password=PASSWORD -r -nc -np -nH -nd -A "*.nc" "https://podaac-uat.jpl.nasa.gov/drive/files/OceanTemperature/ghrsst/data/GDS2/L2P/AMSRE/REMSS/v7/2011/001/"
[/code]

* To download one year of data files into subdirectories (without -nd, wget recreates the remote directory structure, one subdirectory per day)
[code]
% wget --user=LOGIN --password=PASSWORD -r -nc -np -nH -A "*.nc" "https://podaac-uat.jpl.nasa.gov/drive/files/OceanTemperature/ghrsst/data/GDS2/L2P/AMSRE/REMSS/v7/2011/"
[/code]
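Passing --password=PASSWORD on the command line leaves the password in your shell history and visible in the process list. A minimal alternative, assuming your wget build supports --ask-password, is to let wget prompt for it. In the sketch below, drive_fetch, DRIVE_ROOT and PODAAC_USER are hypothetical names; the root URL is taken from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: DRIVE_ROOT comes from the URLs above, and
# PODAAC_USER is an illustrative variable name for the WebDAV username.
DRIVE_ROOT="https://podaac-uat.jpl.nasa.gov/drive/files"

# Download all *.nc files under one path below DRIVE_ROOT. wget prompts
# for the password (--ask-password) instead of taking it from
# --password, so it is not stored in shell history or shown by ps.
drive_fetch() {
  local rel="$1"
  if [ -z "${PODAAC_USER:-}" ]; then
    echo "drive_fetch: set PODAAC_USER to your PODAAC Drive username" >&2
    return 1
  fi
  wget --user="$PODAAC_USER" --ask-password \
       -r -nc -np -nH -A "*.nc" "$DRIVE_ROOT/$rel"
}
```

For example, after export PODAAC_USER=LOGIN, running drive_fetch OceanTemperature/ghrsst/data/GDS2/L2P/AMSRE/REMSS/v7/2011/001/ asks for the password and then fetches that day.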

Please refer to the following links for more detailed information:
[url=https://www.gnu.org/software/wget/]Download and Install wget[/url]
[url=https://www.gnu.org/software/wget/manual/wget.pdf]wget Manual in PDF Format[/url]
Attachments
Figure 1: FTP Link of Dataset (amsr-e_ftp.png)
Figure 2: PODAAC Drive Login Credential Screen (podaac_drive.png)

Re: Download Multiple Data Files from PODAAC Drive Using wget

Post by mgangl » Wed Jan 11, 2017 7:21 am

Another option people may be interested in is the -N option for wget:

[code]
-N,  --timestamping              don't re-retrieve files unless newer than local.
[/code]

With this option, you can run the same command over and over on a top-level directory (say, a year or the dataset's top-level directory) and download only new or updated files. This is a common use case for many users, and we have other ways of addressing it as well (using rsync and WebDAV).

A quick change to the command might look like this (I'm using ASCAT data in this example). Note that -N takes the place of -nc, since wget refuses to combine timestamping with no-clobber:

[code]
wget --user=USER --password=PASSWORD -r -N -np -nH -A "*.nc.gz" https://podaac-uat.jpl.nasa.gov/drive/files/allData/ascat/preview/L2/metop_a/coastal_opt/2017/011/
[/code]

This downloads the matching .nc.gz files in the 2017/011 directory. If you keep running the command, you won't get any new files. But if we 'fake out' the server by setting the time of one of the downloaded files to a time before the file was created on the server, we can show how wget downloads the newer data:

[code]
touch -t 201501010000 ascat_20170111_110000_metopa_53088_eps_o_coa_2401_ovw.l2.nc.gz
[/code]

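The above command sets the modification time of the given file to January 1st, 2015, 00:00. Run it in the directory that holds the downloaded file: because the wget command above does not use -nd, the file is saved under drive/files/allData/ascat/preview/L2/metop_a/coastal_opt/2017/011/ rather than in the current directory.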
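To confirm that the timestamp really changed before re-running wget, you can print the file's modification time, which is what -N compares with the server copy. This sketch assumes GNU coreutils, whose date -r FILE prints a file's last modification time; show_mtime is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a file's modification time in local time.
# Assumes GNU coreutils, where 'date -r FILE' reads FILE's mtime.
show_mtime() {
  date -r "$1" '+%Y-%m-%d %H:%M'
}
```

Right after the touch command above, show_mtime on that file should print 2015-01-01 00:00.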

When we run the wget command again, you can see that it re-downloads that file, since the server copy is now newer, but skips the existing, matching files.
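For the repeated runs described above (re-running the same command on a year or the whole dataset directory), a small wrapper keeps the options and the download location fixed between runs, so -N always compares against the same local files. This is a minimal sketch: sync_dir is a hypothetical name, the *.nc.gz pattern comes from the ASCAT example, and credentials are passed through as extra arguments.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for repeated incremental runs of the -N command
# above. Each run fetches only files that are missing locally or whose
# server copy is newer than the local one.
sync_dir() {
  local url="$1" dest="$2"
  shift 2
  # Remaining arguments (e.g. --user=USER --password=PASSWORD) are
  # passed straight through to wget.
  # -N replaces -nc: wget will not accept both options together.
  # -P saves under DEST, so every run compares against the same files.
  wget "$@" -r -N -np -nH -P "$dest" -A "*.nc.gz" "$url"
}
```

For example: sync_dir "https://podaac-uat.jpl.nasa.gov/drive/files/allData/ascat/preview/L2/metop_a/coastal_opt/2017/" ascat --user=USER --password=PASSWORD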


Return to Data Access and Services

cron