Wednesday, December 15, 2021

The 2021 Cloud Hackathon

In mid-November 2021, the NASA Physical Oceanography Distributed Active Archive Center (PO.DAAC), the National Snow and Ice Data Center (NSIDC) DAAC, and the Land Processes (LP) DAAC hosted the Cloud Hackathon: Transitioning Earthdata Workflows to the Cloud, with support from the Atmospheric Science Data Center (ASDC), the Goddard Earth Sciences Data and Information Services Center (GES DISC), the Interagency Implementation and Advanced Concepts Team (IMPACT), and Openscapes.

The Cloud Hackathon was a virtual, 5-day, collaborative open science learning experience and the first of its kind. It aimed to help science and applications researchers who use NASA Earthdata transition to using NASA Earthdata in the cloud. Nearly 50 scientists, DAAC mentors, and other NASA/DAAC staff participated in the hackathon, which took place November 15 to 19. The hackathon’s first two goals were for participants to explore NASA Earthdata cloud-based data products, tools, and services, and to engage hands-on with workflows in the cloud while strengthening their use of cloud data, tools, and services, open science, and open data. Participants connected synchronously through sessions on the Zoom virtual meeting platform, with hands-on Python coding in a cloud-based JupyterHub for following tutorials and hacking in breakout rooms, and asynchronously via Slack, GitHub, and Google Drive. The JupyterHub was set up in partnership with 2i2c in the AWS cloud, where the NASA Earthdata Cloud is also located, with funding support through the NASA Openscapes initiative. All hackathon information and materials are available at: https://nasa-openscapes.github.io/2021-Cloud-Hackathon.

The hackathon was designed to combine skill building, in which participants followed along with DAAC mentors hands-on in their own cloud instance, with “hack time,” in which participants learned and worked on projects together in small groups, with support from DAAC mentors where needed. Ten NASA DAAC mentors have worked together since April 2021 as part of the NASA Openscapes project, where they identified common opportunities across DAACs to support researchers and developed shared workflows to create and review learning resources. As part of this collaboration, the mentors created 10 tutorials for the Cloud Hackathon, taught in an interactive, participatory style on topics including data discovery, access, subsetting, and interactive plotting, all from a cloud workspace. In addition to the main event, the mentors designed and hosted a two-hour Pre-Hack Clinic the week prior that onboarded brand-new users to JupyterHub and Git/GitHub workflows, with 40 participants in attendance. The clinic had the added benefits of working out some early bugs and strengthening the sense of community among the participants, DAAC staff, and mentors.

Beyond providing a hands-on introduction to NASA Earthdata tools and services in the cloud, the hackathon aimed to support teams as they experimented with enabling science workflows and to better understand scientists' needs. On the first day of the event, participants pitched 11 projects, which they then worked on throughout the week. On the final day they presented their progress on accessing, wrangling, and plotting data for oceanography, hydrology, wildfire, and other research topics. In addition to the participant projects, Dr. Marisol Garcia Reyes of the Farallon Institute, a 2021 Better Scientific Software (BSSw) Fellow, joined the group as an invited speaker. Dr. Garcia Reyes shared her journey from the more traditional “download model” to the cloud and offered a Jupyter notebook that participants could also experiment with. The discussions that followed focused both on collaborating around scientific research and on developing data tools to streamline data access and processing. It was exciting to see community building around open science and open data in the cloud.

Our final goal for the hackathon was to foster community engagement. The hackathon was intentionally limited to 50 people to ensure that there would be adequate support and help, and we saw very little attrition over the week. Several project teams used two hours each Tuesday, Wednesday, and Thursday to stay online, work together, and engage with DAAC staff, who were also in the Zoom room to troubleshoot and continue teaching.

In the post-event survey, participants were asked to assess the effectiveness of the hackathon for their purposes. All 20 responses (100%) ranked it as Very or Extremely Effective (scores of 4 or 5 on a scale where 5 = extremely effective). The survey also showed clear progress in participants' comfort level working with NASA Earthdata data and services in the cloud. Prior to the Cloud Hackathon, 65% ranked their comfort level as “1 - Little to no experience using the cloud - not comfortable”, 30% as “2 - Beginner cloud experience - somewhat comfortable”, and 5% as “3 - Intermediate cloud experience - comfortable”. After the hackathon week, those rankings shifted to 15% as “2 - Low: starting to gain some confidence/understanding - somewhat comfortable”, 70% as “3 - Average: feeling confident in doing basic data access/work in the cloud - comfortable”, and 15% as “4 - High: I can easily navigate cloud data access/use in the cloud - very comfortable”.
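As a rough illustration of the shift in the survey numbers above, the short Python sketch below converts the reported percentages into respondent counts and a weighted average comfort level. It assumes that all 20 respondents answered both comfort questions and that the 1 to 5 levels can be averaged even though the before and after labels are worded differently; the variable and function names are ours, not part of the survey.

```python
# Survey shares reported in the post, as whole percentages keyed by comfort level.
# Assumption: all 20 respondents answered both the "before" and "after" question.
RESPONSES = 20
before = {1: 65, 2: 30, 3: 5}
after = {2: 15, 3: 70, 4: 15}


def respondent_counts(shares, total):
    """Convert percentage shares into whole-number respondent counts."""
    return {level: pct * total // 100 for level, pct in shares.items()}


def mean_level(shares):
    """Weighted average comfort level for a set of percentage shares."""
    return sum(level * pct for level, pct in shares.items()) / 100


before_counts = respondent_counts(before, RESPONSES)
after_counts = respondent_counts(after, RESPONSES)
before_mean = mean_level(before)
after_mean = mean_level(after)

print("Respondents per level before:", before_counts)
print("Respondents per level after: ", after_counts)
print(f"Average comfort before: {before_mean:.1f}")
print(f"Average comfort after:  {after_mean:.1f}")
```

Under these assumptions, the average comfort level moves from about 1.4 before the hackathon to about 3.0 afterward, and the share of respondents at level 3 or higher goes from 1 person to 17.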

 

What’s next

We recognize the transition to the cloud does not happen overnight or even in a week. Through events like this one, we are building relationships with NASA Earth observations data users and across the DAACs, identifying both the common cloud-specific aspects of data accessibility and use, and also where the DAACs would benefit from having unique materials to explain these concepts to their users.

Continued hacking on the cloud - next 3 months. Hackathon participants will continue to have access to the 2i2c JupyterHub in AWS for three months following the Cloud Hackathon as they continue to experiment, and we all continue learning more about what is involved with migrating data access and science workflows to the cloud. This cloud compute environment is supported by the NASA Openscapes project.

AGU Fall Meeting Workshop - December 12, 2021. The DAAC mentors held a short, half-day, virtual cloud workshop at AGU on Sunday, Dec 12. 

NASA’s Earth Science Data Systems (ESDS) Program promotes open data access for open science. The workshop was designed to highlight novel applications that use EOSDIS data in the cloud and to give participants a better understanding of how NASA Earthdata Cloud data and services can best be leveraged and integrated within their work across a variety of disciplines and data types. Details: https://agu.confex.com/agu/fm21/meetingapp.cgi/Session/124026

AGU Open Science in Action session - December 17, 2021. Talks and tutorials by Hackathon Mentors, among other leaders in open science. Details: https://agu.confex.com/agu/fm21/meetingapp.cgi/Session/122142 

NASA Openscapes Champions Cohort - March-April 2022. Openscapes will lead a NASA Champions Cohort for 7 research teams. This is a professional development and leadership opportunity for scientists who use data from NASA DAACs and are interested in collaborative open data science practices and migrating their workflows to the cloud. Nominate your team by February 1, 2022.

 

Thanks for reading! 

 

Authors: Aaron Friesz, Amy Steiker, Andy Barrett, Catalina Oaida, Jack McNelis, Luis Lopez, Makhan Virdi, Erin Robinson, Julie Lowndes