West Hub Supports Workshop Showcasing the Use of the Commercial Cloud for Research

Organized by the CloudBank project partners at the University of Washington (UW), UC Berkeley and the University of California San Diego, and co-hosted by the West Big Data Innovation Hub, the inaugural Research Running on Cloud Compute and Emerging Technologies workshop (RRoCCET21) recently brought together participants from universities, commercial cloud providers, federal agencies and the private sector.

Shava Smallen (SDSC) demonstrates the account management dashboard of the CloudBank portal during the opening session of RRoCCET21. Credit: UC San Diego

“Cloud computing is a cross-cutting focus area of the West Hub,” said West Hub Co-Principal Investigator Sarah Stone. “Partnering with CloudBank on this workshop was a unique opportunity to share cloud use cases from researchers using a breadth of technologies across many application spaces.”

RRoCCET21 was intended for researchers seeking access to the expanded research capabilities provided by CloudBank, an NSF-funded cloud access initiative. The methods-focused event featured solution paths, including tractable implementation plans, informed by case studies ranging from wildfire maps to COVID-19 and social media information.

“Our goal was to inspire the community to use the cloud for research and education, and to show them how,” said Ed Lazowska, West Hub principal investigator, CloudBank co-principal investigator and RRoCCET21 co-organizer, who is a professor in UW’s Paul G. Allen School of Computer Science and Engineering. “Presenters shared how cloud adoption in their domain enabled them to move their research computing ahead – providing a glimpse of what is possible with cloud computing.”

The workshop started with an introduction to CloudBank by Rob Fatland of UW and included a CloudBank portal demonstration by co-principal investigator Shava Smallen of SDSC, an overview of CloudBank training and user support by Naomi Alterman of UW and an overview of running classes in the cloud using the Berkeley Data Stack by Eric Van Dusen of UC Berkeley. Participants learned how CloudBank can reduce the many pain points of using the cloud through efficient multi-cloud account provisioning, usage monitoring, spending alerts and other functions. With these barriers removed, researchers and educators can focus on their science and teaching rather than on cloud administration.

“RRoCCET21 was a first-of-its-kind in bringing together researchers, cloud providers and research computing facilitators to discuss best practices regarding public cloud usage across a wide range of domains,” said Alterman. “The workshop was a great success, with a wide variety of participants who all brought such enthusiasm and experience to the table, and to amplify their voices we’ve made recordings of all the talks publicly available on CloudBank’s website.”

The workshop featured speakers from a wide variety of institutions, from research universities like MIT, to industry research teams at companies like Google, to international science organizations like CERN. For instance, Raghu Kancherla and Fahad Khan of the University of Central Florida discussed the ground-breaking work they are doing modeling the physics of supercritical carbon dioxide, which can be used by power plants to more efficiently capture energy from turbines. Their work involves running large computational fluid dynamics (CFD) simulations, which can be sped up when run in the cloud rather than on the compute resources available locally. Their full talk is here: www.youtube.com/watch?v=yPY-p8wK7EA.

CloudBank user Vanessa Frias-Martinez (University of Maryland) described her group’s public transit monitoring system, BALTO, which uses cloud infrastructure and community outreach to understand and improve the quality of transit solutions in Baltimore. This project takes the same technical infrastructure used to build private transit products like Lyft and Waze and uses it to improve the quality of public transit for everyone. Frias-Martinez’s full talk is here: www.youtube.com/watch?v=MZFXKc8CIsE.

Niema Moshiri (UC San Diego) presented his group’s use of the cloud to sequence the genomes of SARS-CoV-2, the virus that causes COVID-19, as samples come in from virus testing centers. The scalable nature of the cloud allows them to cost-effectively track the spread of viral mutations in real time, providing on-the-ground information about emerging developments such as the Delta variant. His full talk is here: www.youtube.com/watch?v=vS7DziyUzFM.

Another presentation featured Satra Ghosh (MIT, Harvard Medical School), who discussed using the cloud to store a massive hub of data on the human brain in his team’s DANDI archive. Ghosh described how the DANDI archive takes advantage of the inherently distributed and fluid nature of cloud resources to facilitate the sharing of data among neuroscientists across institutions around the world. His full talk is here: www.youtube.com/watch?v=9Q2GnTocjJo.

“This year’s inaugural RRoCCET21 workshop was absolutely a ton of fun, for several reasons. First, we showed the value of the commercial cloud as a research computing platform. We got to enjoy learning about the presenters’ projects – which were amazing – and we coordinated with them over the value of their talks to the conference attendees,” said Rob Fatland, Director of Cloud and Data Solutions at UW and a member of the RRoCCET21 organizing committee. “The underlying message is simple enough: successful migration to the cloud means having a good sense of what you are getting yourself into. This means understanding the benefits of tapping into a cloud provider’s unlimited computing resources, as well as understanding the necessary time investment to learn to use those resources effectively. And now, based on the positive feedback, we’re looking forward to hosting RRoCCET22 next year with an in-person component that will facilitate networking, collaboration, and knowledge sharing.”

CloudBank is supported by the National Science Foundation (award no. 1925001). 

About the West Big Data Innovation Hub: 

The West Big Data Innovation Hub is one of four regional hubs funded by the National Science Foundation (NSF) to build and strengthen strategic partnerships across industry, academia, nonprofits, and government. The West Hub community aims to catalyze and scale data science for societal needs – connecting research, education, and practice in thematic areas such as natural resources and hazards, metro data science, health, and data-enabled discovery and learning. Coordinated by UC Berkeley’s Division of Computing, Data Science, and Society, the San Diego Supercomputer Center, and the University of Washington, the West Hub region includes contributors and data enthusiasts from Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, Wyoming, and a global network of partners.