The Earthdata Cloud Primer has a downloadable PDF titled 'Why Use the Cloud? An Earthdata Vision'.
The introduction of large remote sensing missions, whose data are archived by the EOSDIS (Earth Observing System Data and Information System) DAACs (Distributed Active Archive Centers), is challenging the end user’s traditional download-and-access paradigm. While the core functionality of the DAACs (ingest and archive, cataloging, and data access) will always be available, the Earth Science Data and Information System (ESDIS) project office is also exploring new ways to enable science and the transformation of data into knowledge and information. The cloud offers a scalable and effective way to address storage, network, and data movement concerns while offering a tremendous amount of flexibility to the user. The new big missions, Surface Water and Ocean Topography (SWOT) and NASA-ISRO SAR (NISAR), will push EOSDIS into its next phase: co-located, multidisciplinary, cloud-based archive and distribution centers that enable analysis next to the data. Large multiyear global datasets from other ongoing missions will also become available in cloud environments, scheduled for cloud availability when use cases suggest research will benefit or be enabled. Users should look to the DAACs for information about which datasets are currently hosted in the cloud and which are targeted for cloud hosting.
The Earthdata Cloud Primer has a downloadable PDF titled 'Bring your own script to the cloud'.
This tutorial explains how you can create and execute a custom script using AWS. AWS supports multiple ways to achieve this. One option, the AWS Lambda compute service, allows you to run code without provisioning or managing servers. It executes your code only when needed and scales automatically, from a few requests per day to thousands per second. You pay only for the compute time you consume - there is no charge when your code is not running. With AWS Lambda, you can run code for virtually any type of application or backend service - all with zero administration. AWS Lambda runs your code on a high-availability compute infrastructure and performs all of the administration of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling, code monitoring and logging ...
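As a minimal sketch of what “bring your own script” can look like on Lambda: the handler signature `(event, context)` is what Lambda requires, but the event field and the processing step below are illustrative assumptions, not part of any PO.DAAC or AWS-provided interface.

```python
# Minimal AWS Lambda handler sketch. The "granule" event field and the
# processing step are illustrative assumptions; Lambda itself only requires
# a handler function taking (event, context).
import json

def lambda_handler(event, context):
    """Entry point Lambda invokes per request; Lambda scales this for you."""
    granule = event.get("granule", "unknown")
    # ... your per-request processing would go here ...
    result = {"processed": granule}
    # An API Gateway-style response: status code plus a JSON body
    return {"statusCode": 200, "body": json.dumps(result)}
```

Because the handler is a plain function, you can invoke it locally with a sample event while developing, before uploading it to Lambda.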
Yes! While the move to the cloud is intended to enable big data (think petabytes of data) analytics near the data, PO.DAAC understands that the use of the data, in any fashion, is paramount.
Data will always be available through traditional mechanisms, such as downloading entire files, at no charge to the end user (see “How much will it cost me?” below). PO.DAAC will also continue to add value-added services to data, which can be used within or outside of the cloud. These services, such as subsetting and regridding, will minimize the amount of data that needs to be transferred and reduce the work required for users to integrate PO.DAAC data into their use cases.
PO.DAAC will house all of its data holdings in Amazon Web Services’ Oregon (us-west-2) region. All NASA DAACs will be adding their data to this region over time. This matters because, to maximize many of the benefits of cloud computing, the processing should be near the data. This means that if you create an account and spin up compute resources (EC2 instances, Lambda functions, etc.), they should also reside in us-west-2, which will maximize transfer speeds from data storage to your compute instance. While you can work in other regions and even other clouds, transferring large amounts of data out of us-west-2 can carry a performance penalty, though it is still technically possible.
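One way to sanity-check co-location is to compare your instance’s region against us-west-2. This sketch uses the EC2 instance metadata endpoint, which is only reachable from inside an EC2 instance (and on instances enforcing IMDSv2, this simple unauthenticated GET will fail); everything else here is plain comparison logic.

```python
# Sketch: confirm your compute is co-located with PO.DAAC data.
# The metadata endpoint is a real EC2 facility but is only reachable from
# within an instance; instances enforcing IMDSv2 require a session token,
# which this simplified GET omits.
import urllib.request

PODAAC_REGION = "us-west-2"  # where PO.DAAC hosts its cloud data

def instance_region(timeout=1):
    """Return the AWS region this EC2 instance runs in, or None off-cloud."""
    url = "http://169.254.169.254/latest/meta-data/placement/region"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode()
    except OSError:
        return None  # not on EC2, or metadata access is restricted

def colocated_with_data(region):
    """True when compute and PO.DAAC data share a region (fast S3 access)."""
    return region == PODAAC_REGION
```

If `colocated_with_data(instance_region())` is False, expect slower transfers and, for direct S3 access (see below), denied requests.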
Some features, such as direct S3 access to PO.DAAC data (see below), will only be enabled for users within the AWS us-west-2 cloud region in order to avoid regulatory issues with budgets and oversight. The data can still be downloaded (at no cost to end users) via HTTPS when outside of the cloud.
There is absolutely no charge to download data from PO.DAAC. If you download data to your laptop, that’s all there is to it. We do recommend investigating processing data in the cloud, as the increasing volumes of data, and the proximity to other data of interest in the cloud, might yield faster or more timely performance, but we understand that this decision has to be made on a case-by-case basis. If you’d like to do your analysis next to the data, in the Amazon Web Services (AWS) cloud, please see the next question for the costs associated with that.
While downloading data from PO.DAAC is free, the usage of cloud compute resources to subsequently do analysis on that data is something the end user is expected to pay for. Determining this cost can be challenging in this new cloud paradigm, but there are resources available. One resource is the Earthdata Cloud Primer, which contains a section on estimating and understanding cloud costs. It is usually best to start small and use more and larger resources once you understand the impacts of the decisions you make.
PO.DAAC is also looking into partnering with other organizations to procure resources or “credits” on behalf of our users. We will share information relating to credits as it becomes available.
There are a number of ways to get started using PO.DAAC cloud data. The best way is to see what data is available in the cloud.
If you’re interested in PO.DAAC data, visit the PO.DAAC Cloud Earthdata Search Portal.
We have added a ‘MIGRATION’ section to these Cloud Data Pages to better prepare our users for moving to the cloud. In addition, please explore:
Please contact firstname.lastname@example.org with your request for Early Access to cloud data. Please include your Earthdata Login username. Once you’ve been added to the early access list, you can then see the available collections after logging into the PO.DAAC Cloud Earthdata Search Portal.
Note: Once added to the early adopter list, you might see duplicate collections in Earthdata Search and the Common Metadata Repository (CMR). This is because we are currently in a transition phase from the on-premise archive to the cloud archive. For example, you might see two instances of ‘GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis (v4.1)’ in search results or collection listings. This will remain the case until the cloud dataset is no longer ‘early access’ and is made publicly available.
In addition to the early adopter datasets, we have a number of ECCO datasets publicly available in the cloud. No special permissions are required to access these datasets.
Please check back to the PO.DAAC cloud data page and sign up for our PO.DAAC Mailing List for major announcements and information.
Direct S3 access is a way of accessing AWS cloud data from S3 storage using cloud-native tools. The advantage of this approach is being able to use Python and other language libraries to directly access files (or portions of files) in the PO.DAAC archive, list the contents of a “bucket” (a bucket is like a giant bin of files, called objects), and interrogate and interact with cloud-native formats. Direct S3 access is only available to users in the same cloud region as the PO.DAAC data (AWS us-west-2), so to use it, one needs resources such as an EC2 instance (i.e., a virtual machine or compute space in the AWS cloud) running in that region.
To get your credentials, which expire after one hour of use, you can access the credential endpoint here.
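The one-hour credential workflow above can be sketched as follows. The `CREDENTIALS_URL` is a placeholder for the endpoint linked in the text (which requires an Earthdata Login session), and the credential field names passed to `s3fs` (`accessKeyId`, `secretAccessKey`, `sessionToken`) are assumptions about the response shape, so treat this as a pattern rather than a working client.

```python
# Sketch of the one-hour temporary-credential workflow for direct S3 access.
# CREDENTIALS_URL is a placeholder for the endpoint linked in the text;
# the credential field names are assumptions about its response shape.
import json
import time
import urllib.request

CREDENTIALS_URL = "https://example.invalid/s3credentials"  # placeholder
TTL_SECONDS = 3600  # the text notes credentials expire after one hour

def fetch_credentials(url=CREDENTIALS_URL):
    """Fetch temporary S3 credentials and stamp them with an issue time."""
    with urllib.request.urlopen(url) as resp:
        creds = json.load(resp)
    creds["_issued"] = time.time()
    return creds

def expired(creds, now=None, ttl=TTL_SECONDS):
    """True once the one-hour window has elapsed; re-fetch before S3 calls."""
    now = time.time() if now is None else now
    return now - creds["_issued"] >= ttl

def open_s3_object(creds, s3_path):
    """Open an object in-region with s3fs (import deferred so the sketch
    is readable without s3fs installed). Only works from us-west-2."""
    import s3fs
    fs = s3fs.S3FileSystem(key=creds["accessKeyId"],
                           secret=creds["secretAccessKey"],
                           token=creds["sessionToken"])
    return fs.open(s3_path, "rb")
```

In a long-running notebook or script, check `expired(creds)` before each batch of S3 calls and re-fetch when needed, since requests made with stale credentials will be rejected.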
Other services, such as Harmony services, will stage data after transformation in S3 for direct, in-cloud access as well. An example Python Notebook showcasing Harmony and direct S3 access can be found on our PO.DAAC GitHub Tutorials page.
See the Earthdata Cloud Primer downloadable PDF titled 'Glossary and Acronyms Explained'.