Download Multiple Data Files from PO.DAAC Drive Using wget

Postby yiboj » Thu Dec 01, 2016 10:30 am

PO.DAAC Drive can be accessed with either the wget or the curl command on a Linux system. The recipe "Download Multiple Data Files from PO.DAAC Drive Using curl" shows how to use curl; this recipe focuses on wget. The major difference between the two is that wget can download files recursively, while curl can also upload files to a server.

This data recipe shows how to download multiple data files from PO.DAAC using the GNU Wget utility. GNU Wget is a free utility for non-interactive download of files from the Web. It supports the HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. It is a Unix-based command-line tool, but it is also available for other operating systems, such as Windows and Mac OS X.


1. wget Command Options


Here are a few frequently used key options:

-nd
--no-directories
Do not create a hierarchy of directories when retrieving recursively. With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the filenames will get extensions '.n').


-x
--force-directories
The opposite of '-nd': create a hierarchy of directories, even if one would not have been created otherwise. E.g. 'wget -x https://podaac.jpl.nasa.gov/robots.txt' will save the downloaded file to podaac.jpl.nasa.gov/robots.txt.

-nH
--no-host-directories
Disable generation of host-prefixed directories. By default, invoking Wget with '-r https://podaac.jpl.nasa.gov/' will create a structure of directories beginning with podaac.jpl.nasa.gov/. This option disables such behavior.

-r
--recursive
Turn on recursive retrieving. The default maximum depth is 5.

-l depth
--level=depth
Specify recursion maximum depth level depth.

Try to specify the criteria that match the kind of download you are trying to achieve. If you want to download only one page, use '--page-requisites' without any additional recursion. If you want to download things under one directory, use '-np' to avoid downloading things from other directories. If you want to download all the files from one directory, use '-l 1' to make sure the recursion depth never exceeds one.
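As a rough sketch of how these options combine, the helper below assembles an option string for two layouts: everything in the current directory, or a preserved directory tree. The function name build_wget_opts and the "flat"/"tree" labels are made up for illustration and are not part of wget; the script only prints the command so it can be inspected before anything is downloaded.

```shell
#!/bin/sh
# build_wget_opts is a hypothetical helper (not part of wget).
# $1: "flat" (save everything in the current directory) or
#     "tree" (keep the remote directory hierarchy)
# $2: maximum recursion depth, passed to -l
build_wget_opts() {
    layout=$1
    depth=$2
    # -r: recurse, -np: never ascend to the parent directory,
    # -nH: do not create a host-named top-level directory
    opts="-r -np -nH -l $depth"
    case "$layout" in
        flat) opts="$opts -nd" ;;   # no directories at all
        tree) opts="$opts -x" ;;    # force the directory hierarchy
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
    echo "$opts"
}

# Print the command instead of running it; URL is a placeholder.
echo "wget $(build_wget_opts flat 1) URL"
```

Swapping "flat" for "tree" in the last line shows the variant that keeps sub-directories.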

2. Download multiple files from PO.DAAC Drive

To access PO.DAAC Drive, all users must be registered with NASA Earthdata Login. Users can log in to PO.DAAC Drive at https://podaac-tools.jpl.nasa.gov/drive/. Figure 1 shows the WebDAV/Programmatic API credentials that will be used later to access the files with the wget command. Please note that the password shown there is encrypted and is different from the Earthdata Login password.
Figure 1: PO.DAAC Drive Login Credential Screen
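Passing --password on the command line leaves the credential in the shell history. One possible alternative, sketched here on the assumption that your wget build reads a netrc file for HTTP authentication, is to store the Drive credentials in ~/.netrc. The helper name write_netrc_entry is hypothetical, and LOGIN and PASSWORD are placeholders for the WebDAV/Programmatic API credentials, not the Earthdata Login password.

```shell
#!/bin/sh
# write_netrc_entry is a hypothetical helper: it appends a PO.DAAC Drive
# entry to a netrc-style file and restricts the file to its owner.
# $1: path of the netrc file, $2: Drive login, $3: Drive password
write_netrc_entry() {
    netrc_file=$1
    login=$2
    password=$3
    # Make sure the file exists and is owner-only before writing the secret.
    ( umask 077; : >> "$netrc_file" )
    chmod 600 "$netrc_file"
    printf 'machine podaac-tools.jpl.nasa.gov login %s password %s\n' \
        "$login" "$password" >> "$netrc_file"
}

# Example usage (not executed here):
#   write_netrc_entry "$HOME/.netrc" LOGIN PASSWORD
#   wget -r -nc -np -nH -nd -A "*.nc" "https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L2P/AMSRE/REMSS/v7/2011/001/"
```

If the server still answers 401 with this setup, falling back to --user and --password is the simplest way to rule out the netrc file as the cause.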


We take the GHRSST SST Level 2 AMSRE dataset from REMSS as an example (Figure 2).
Figure 2: PO.DAAC AMSRE L2 v7 Dataset


* To download one day of data files
Code:
% wget --user=LOGIN --password=PASSWORD -r -nc -np -nH -nd -A "*.nc" "https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L2P/AMSRE/REMSS/v7/2011/001/"


* To download one year of data files and create sub-directories
Code:
% wget --user=LOGIN --password=PASSWORD -r -nc -np -nH -x -A "*.nc" "https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L2P/AMSRE/REMSS/v7/2011/"
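For anything between one day and a whole year, a small loop over day-of-year directories may be handy. The sketch below only generates the URLs; it assumes the three-digit day directories (001, 002, ...) seen in the example above, and the function name day_urls is made up for illustration.

```shell
#!/bin/sh
# Base URL of the AMSRE dataset used in this recipe.
BASE="https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L2P/AMSRE/REMSS/v7"

# day_urls is a hypothetical helper: print one directory URL per day.
# $1: year, $2: first day of year, $3: last day of year
# Pass the days without leading zeros (9, not 009).
day_urls() {
    year=$1
    day=$2
    last=$3
    while [ "$day" -le "$last" ]; do
        # %03d pads the day of year to three digits, e.g. 9 -> 009
        printf '%s/%s/%03d/\n' "$BASE" "$year" "$day"
        day=$((day + 1))
    done
}

# Print the URLs for the first three days of 2011.
day_urls 2011 1 3

# To download instead of print (LOGIN and PASSWORD are placeholders):
#   day_urls 2011 1 3 | while read -r url; do
#       wget --user=LOGIN --password=PASSWORD -r -nc -np -nH -nd -A "*.nc" "$url"
#   done
```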


Please refer to the following links for more detailed information:
Download and Install wget
wget Manual in PDF Format

Re: Download Multiple Data Files from PODAAC Drive Using wge

Postby mgangl » Wed Jan 11, 2017 7:21 am

Another option people may be interested in is the -N option for wget:

Code:
-N,  --timestamping              don't re-retrieve files unless newer than local.


With this, you can run the same command over and over on a top level directory (say a year or the entire dataset top level directory) and only download the newest files. This is a common case for many users and we have other ways of addressing this same use case (using rsync and WebDAV).

So a quick change to the command may look like this (I'm using ASCAT data in this example):

Code:
wget --user=USER --password=PASSWORD -r -N -np -nH -d -A "*.nc.gz" https://podaac-tools.jpl.nasa.gov/drive/files/allData/ascat/preview/L2/metop_a/coastal_opt/2017/011/


This downloads a bunch of files in the 2017/011 directory. Keep running the command and you won't get any new files. But if we 'fake out' the server by setting the time of one of the downloaded files to a time before the file was created on the server, we can show how the command will download new data:

Code:
touch -t 201501010000 ascat_20170111_110000_metopa_53088_eps_o_coa_2401_ovw.l2.nc.gz


The above command sets the timestamp of the given file to January 1st, 2015.

When we run the wget command again, you can see that it downloads the newer files from the server, but not the existing, matching files.
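The decision being demonstrated can be tried out locally without touching the server. This is only a sketch of the timestamp comparison: two local files stand in for the server copy and the downloaded copy, the helper name remote_is_newer is made up, and wget's real -N check is understood to look at file size as well.

```shell
#!/bin/sh
# remote_is_newer is a hypothetical helper: succeed if the file given as $2
# has a later modification time than the file given as $1.
remote_is_newer() {
    local_file=$1
    remote_file=$2
    # find prints the path only when it is newer than the reference file.
    [ -n "$(find "$remote_file" -newer "$local_file")" ]
}

demo_dir=$(mktemp -d)
# Stand-ins: both copies start with the same timestamp (11 Jan 2017).
touch -t 201701110000 "$demo_dir/server_copy"
touch -t 201701110000 "$demo_dir/local_copy"

if remote_is_newer "$demo_dir/local_copy" "$demo_dir/server_copy"; then
    echo "would download"
else
    echo "would skip"        # printed: timestamps match
fi

# Backdate the local copy, as in the touch command above.
touch -t 201501010000 "$demo_dir/local_copy"

if remote_is_newer "$demo_dir/local_copy" "$demo_dir/server_copy"; then
    echo "would download"    # printed: local copy now looks older
else
    echo "would skip"
fi

rm -rf "$demo_dir"
```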

Re: Download Multiple Data Files from PO.DAAC Drive Using wg

Postby besthiker » Wed Jul 24, 2019 6:53 am

I have the correct command and a valid login/password and user profile, but I get a 401 error. Help!

wget -np --user=LOGINNAME --password=PASSWORD https://podaac-tools.jpl.nasa.gov/drive ... TIA.nc.bz2

error:
wget -np --user=<login-name> --password=<pwd-string> https://podaac-tools.jpl.nasa.gov/drive ... TIA.nc.bz2
--2019-07-24 08:46:06-- https://podaac-tools.jpl.nasa.gov/drive ... TIA.nc.bz2
Loaded CA certificate '/opt/local/share/curl/curl-ca-bundle.crt'
Resolving podaac-tools.jpl.nasa.gov...
Connecting to podaac-tools.jpl.nasa.gov ... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Authentication selected: Basic realm="PODAAC_Drive"
Reusing existing connection to podaac-tools.jpl.nasa.gov
HTTP request sent, awaiting response... 401 Unauthorized

Re: Download Multiple Data Files from PO.DAAC Drive Using wg

Postby yiboj » Thu Jul 25, 2019 11:44 am

Hi,

It seems you have entered the wrong credentials; please double-check them and try again.
Regards,

PODAAC

Re: Download Multiple Data Files from PO.DAAC Drive Using wg

Postby nimsocean » Wed Jul 31, 2019 10:17 pm

Hi,
I am trying to download data with wget command, but I got the following message.
Could you tell me which version of TLS I should use?
Thank you in advance.

>> wget --user=*** --password=*** -r -nc -np -nH -nd -A "*.nc" "https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L2P/AMSR2/REMSS/v8a/2019/189/"
--2019-07-15 16:44:09-- https://podaac-tools.jpl.nasa.gov/drive ... /2019/189/
Resolving podaac-tools.jpl.nasa.gov... 137.78.248.120
Connecting to podaac-tools.jpl.nasa.gov|137.78.248.120|:443... connected.
OpenSSL: error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
Unable to establish SSL connection.

Re: Download Multiple Data Files from PO.DAAC Drive Using wg

Postby yiboj » Thu Aug 01, 2019 12:32 pm

Hi.

Thanks for the inquiry.
You can check your wget version with the following command:
Code:
wget -V

We suggest installing a more recent wget (newer than v1.14) for OpenSSL support.
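The version check can also be scripted. The following is a minimal sketch that assumes the first line of the wget -V output has the form "GNU Wget X.Y.Z ..." and compares up to three numeric version fields; version_gt is a hypothetical helper, not a wget feature.

```shell
#!/bin/sh
# version_gt is a hypothetical helper: succeed if version $1 is strictly
# greater than version $2, comparing up to three dot-separated numbers.
version_gt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" \
        | sort -t. -k1,1n -k2,2n -k3,3n | tail -n 1)" = "$1" ]
}

MIN_VERSION="1.14"

# Third word of the first line of 'wget -V' (assumed to be the version).
installed=$(wget -V 2>/dev/null | awk 'NR==1 {print $3}')

if [ -z "$installed" ]; then
    echo "wget not found on PATH"
elif version_gt "$installed" "$MIN_VERSION"; then
    echo "wget $installed is newer than $MIN_VERSION"
else
    echo "wget $installed is not newer than $MIN_VERSION; consider upgrading"
fi
```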
Hope this helps.

PODAAC DE

Re: Download Multiple Data Files from PO.DAAC Drive Using wg

Postby nimsocean » Mon Aug 05, 2019 5:54 pm

Thank you for the reply.

I installed a recent version of wget (ver. 1.20.3) and tried again.
However, I got the same error messages.

Should I also consider the OpenSSL version?
Currently, I use OpenSSL 1.0.0-fips 29 Mar 2010.

Re: Download Multiple Data Files from PO.DAAC Drive Using wg

Postby yiboj » Tue Aug 06, 2019 1:10 pm

Hi,

Please check your login credentials and make sure they are correct.
Thanks,

-PODAAC DE