Introduction: Access to PO.DAAC datasets in the cloud

PO.DAAC is in the process of moving its data holdings to the cloud. The Cloud Data page at PO.DAAC offers access to cloud-based datasets, as well as resources to help guide data users in discovering, accessing, and using cloud data.

The Cloud Datasets section provides a listing page for cloud-archived datasets, with closer integration of tools and services. The Resources section shares information, updates, data recipes, and other materials that support users in discovering, accessing, and using datasets from and within the Earthdata Cloud. The Migration section offers information on the transition timeline and the datasets involved, what to expect, and migration-specific FAQs and tutorials. For questions on what this transition means, please see the FAQ section.

During this transition to the cloud, this Cloud Data page will evolve and be continuously updated with new content and data. Please check back regularly.

 

What does the new cloud paradigm look like?


In the new paradigm, the data storage and the DAAC-provided tools and services built on top of the data are co-located in the Earthdata Cloud (hosted in the AWS cloud). So what does this mean for you, the user of the data?

  • PO.DAAC will provide the same level of service to users while handling large volumes of data, by leveraging the scalability of the cloud.

  • PO.DAAC will provide services that are co-located with the data in the cloud to minimize the amount of data downloaded. These services allow you to select and access only the data you are interested in, making the data more analysis-ready, whether the next step in your workflow is to download the data and work locally or to continue working in the cloud.

  • Users are not required to move their workflows to the cloud in order to access PO.DAAC data hosted in the Earthdata Cloud (in AWS). However, users do have to update any existing PO.DAAC data access end-points to point to the new Earthdata Cloud end-points in order to access the data.

    • The Earthdata Cloud end-points can be found on the respective cloud-enabled dataset landing page, under Data Access

    • Traditional end-points include whole-file download, OPeNDAP, and virtual directory browsing

  • Data download will continue to be freely available to users, from the Earthdata Cloud archive

  • While data download from the Earthdata Cloud archive continues to be freely available, in some cases it may be beneficial for users to move their science and application workflows to the cloud. With the Big Data era upon us, the cloud offers a scalable and effective way to address storage, network, and data movement concerns while offering the user a great deal of flexibility. Particularly when working with large data volumes (big data), data access and processing can be more efficient if workflows run in the cloud, "next to the data", which avoids having to download large data volumes.

Three example pathways for interacting with and accessing data (and services) from and within the NASA Earthdata Cloud are illustrated in the diagram below:

  • Working locally, after downloading data to your local machine, servers, or cluster (green arrows and icons)

  • Within the Cloud: Set up your own AWS EC2 cloud instance, or virtual machine, in the cloud next to the data* (orange arrows and icons)

  • Within the Cloud: Through shareable cloud environments, such as Binder or JupyterHub, set up in an AWS cloud region* (blue arrows and icons)

Note that each of these may have a range of cost models.
*PO.DAAC and other EOSDIS data are being stored in the “us-west-2” region of AWS cloud.

Regardless of the access pathway, services and systems like Earthdata Harmony, the Earthdata Common Metadata Repository (CMR), and Earthdata Search can be powerful resources for data search, discovery, transformation, and access, whether the data is used from or within the cloud.

 

Earthdata Harmony

Earthdata Harmony allows you to seamlessly analyze Earth observation data from different NASA data centers, and features:

  • Consistent access patterns to EOSDIS holdings make cross-data center data access easier

  • Data reduction services allow users to request only the data they want, in the format and projection they want

  • Analysis Ready Data and cloud access will help reduce time-to-science

  • Community Development helps reduce the barriers for re-use of code and sharing of domain knowledge
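As a rough illustration of the data reduction idea above, the following sketch builds (but does not send) a spatial and temporal subsetting request URL. It assumes Harmony's OGC API Coverages style of endpoint at `harmony.earthdata.nasa.gov`; the collection ID, the ranges, and the helper name `build_harmony_subset_url` are hypothetical placeholders, and the exact URL pattern should be checked against the Harmony documentation before use.

```python
from urllib.parse import urlencode

# Assumed Harmony root URL; not stated in the text above.
HARMONY_ROOT = "https://harmony.earthdata.nasa.gov"


def build_harmony_subset_url(collection_id, lat_range, lon_range, time_range=None):
    """Build a subsetting request URL (hypothetical helper, no network call).

    lat_range and lon_range are (min, max) tuples in degrees.
    time_range is an optional (start, end) tuple of ISO 8601 strings.
    """
    # The "subset" parameter is repeated once per dimension.
    subsets = [
        ("subset", f"lat({lat_range[0]}:{lat_range[1]})"),
        ("subset", f"lon({lon_range[0]}:{lon_range[1]})"),
    ]
    if time_range is not None:
        subsets.append(("subset", f'time("{time_range[0]}":"{time_range[1]}")'))

    # Assumed OGC API Coverages path layout.
    path = (
        f"{HARMONY_ROOT}/{collection_id}"
        "/ogc-api-coverages/1.0.0/collections/all/coverage/rangeset"
    )
    return f"{path}?{urlencode(subsets)}"


# Placeholder collection ID; a real one is listed with each dataset.
harmony_url = build_harmony_subset_url(
    "C0000000000-EXAMPLE",
    lat_range=(30, 45),
    lon_range=(-130, -115),
    time_range=("2021-01-01T00:00:00Z", "2021-01-02T00:00:00Z"),
)
print(harmony_url)
```

Submitting such a request generally requires Earthdata Login credentials, so the sketch stops at constructing the URL.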

Earthdata CMR

The Earthdata Common Metadata Repository (CMR) is a high-performance, high-quality, continuously evolving metadata system that catalogs Earth Science data and associated service metadata records. These metadata records are registered, modified, discovered, and accessed through programmatic interfaces that use standard protocols and APIs. Use CMR to search for and narrow down the data you need by spatial and temporal extent, platform, provider, collection ID, and much more. For details, check out the CMR API documentation.
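To make the kind of search described above more concrete, here is a minimal sketch that assembles a CMR granule search URL filtered by time range, bounding box, and provider. It assumes the public CMR search endpoint and its `short_name`, `temporal`, `bounding_box`, `provider`, and `page_size` query parameters; the dataset short name, the `POCLOUD` provider value, and the helper name `build_cmr_granule_query` are illustrative assumptions, not taken from the text above.

```python
from urllib.parse import urlencode

# Assumed public CMR granule search endpoint (JSON response format).
CMR_GRANULE_SEARCH = "https://cmr.earthdata.nasa.gov/search/granules.json"


def build_cmr_granule_query(short_name, start, end, bbox, provider=None, page_size=10):
    """Build a CMR granule search URL (hypothetical helper, no network call).

    start and end are ISO 8601 timestamps.
    bbox is (west, south, east, north) in degrees.
    """
    params = {
        "short_name": short_name,
        "temporal": f"{start},{end}",
        "bounding_box": ",".join(str(v) for v in bbox),
        "page_size": page_size,
    }
    if provider:
        params["provider"] = provider
    return f"{CMR_GRANULE_SEARCH}?{urlencode(params)}"


# Placeholder short name; substitute the one shown on the dataset landing page.
cmr_url = build_cmr_granule_query(
    "EXAMPLE_SHORT_NAME",
    "2021-01-01T00:00:00Z",
    "2021-01-31T23:59:59Z",
    bbox=(-130, 30, -115, 45),
    provider="POCLOUD",  # assumed provider ID for PO.DAAC cloud holdings
)
print(cmr_url)
```

The resulting URL can be opened in a browser or passed to any HTTP client; the response lists matching granules along with their access links.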

Earthdata Search

Earthdata Search lets users search for and find data of interest based on several different parameters. The tool is built on top of the Common Metadata Repository (CMR) and provides a more intuitive, point-and-click interface to data search. While the Earthdata Search client (EDSC) allows for some high-level dataset discovery, it specializes in space/time searches for specific data across all of the DAAC holdings. In the era of big data, being able to down-select data to regions and times of interest makes better use of end-user time and resources.