Copying 'latest' files from PO.DAAC via WebDAV
This post assumes you have already set up WebDAV with PO.DAAC Drive. For more information on that, please see the Drive help pages.
In this example, we assume the PO.DAAC Drive WebDAV share is mounted at '/Volumes/files'.
A common use case is to pull data from PO.DAAC to a local disk for further processing. Users don't want to download all of the data each time they make a request; they only want to fetch data that's new or has changed. The command line tool 'rsync' does exactly this.
Once Drive is mounted via WebDAV, we use the rsync command to copy new or updated files to our local disk:
Code:
rsync -avzh SOURCE_DIR DESTINATION_DIR
rsync -avzh /Volumes/files/allData/ostm/preview/L2/GPS-OGDR/c311 /data/archive/ostm/preview/L2/GPS-OGDR/
The above code will 'copy' the files from the source directory (/Volumes/files/allData/ostm/preview/L2/GPS-OGDR/c311) to the destination directory (/data/archive...). This example uses just one subdirectory, c311. The more subdirectories we drop from the end of the source path, the larger our rsync job becomes. For example, if we ran the following:
Code:
rsync -avzh /Volumes/files/allData/ostm/preview/L2/GPS-OGDR/ /data/archive/ostm/preview/L2/GPS-OGDR
We would rsync ALL the cycle subdirectories. This would, in effect, copy all existing mission data to your destination directory.
So we've now copied all existing data for a cycle (or the dataset, if you executed the second rsync). But this dataset is still ongoing. How do we capture the newly created data hours, days, or weeks later? Simple: we run the same exact command again!
Code:
rsync -avzh /Volumes/files/allData/ostm/preview/L2/GPS-OGDR/ /data/archive/ostm/preview/L2/GPS-OGDR
This command checks the source directory against the local directory and downloads any new or changed files! To automate the entire process, we can create a cron job that runs on whatever schedule we'd like. For this example, we will run the rsync job every night at 4am:
Code:
crontab -e
0 4 * * * rsync -avzh /Volumes/files/allData/ostm/preview/L2/GPS-OGDR/c311 /data/archive/ostm/preview/L2/GPS-OGDR/ > /data/archive/rsync.log 2>&1
crontab: installing new crontab
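This tells cron to run at minute 0 of hour 4, every day of the month, every month, every day of the week. We also write the rsync output to a log file (rsync.log) so we can investigate any issues that may arise. Note that '>' overwrites the log on every run; use '>>' instead if you want to keep a history.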