West Big Data Innovation Hub
Key impacts of a decade-long project to accelerate societally impactful data innovation
For a decade, the West Hub served as a regional nexus for the art—and science—of using data to explore and address societal needs.
We supported projects and partnerships as diverse as our geographic reach, encompassing mountains and coastlines, rivers and deserts, rural and urban environments. We nurtured collaboration across academia, industry, nonprofits, and government agencies. We connected with colleges and institutions that serve and support diverse learners and communities. We offered a platform and data science skills to students, teachers, and community stakeholders throughout the region.
As one of four regional big data hubs funded by the National Science Foundation, we contributed to the formation of a transformative national big data innovation ecosystem that is responsive to regional needs. We laid a foundation of relationships and infrastructure that will accelerate data-driven innovation for decades to come.
Read more about our lasting impacts below.
Key impacts
1
Broadened awareness among state and local entities of the NSF’s work and impact
By facilitating three data challenges centered on water, transportation, and population statistics, we helped state and local partners understand NSF's reach, resources, and impacts. Many of the regional entities with which we partnered were only vaguely aware of NSF prior to the engagement.
The insights gained in the California Water Data Challenge (a collaboration among the West Hub, the California State Water Resources Control Board, and numerous other partners) have changed how the state approaches water data issues. Challenge results contributed to the passage of the 10-year, $1.3 billion Safe & Affordable Drinking Water Fund. The challenge, an ongoing annual endeavor, has generated entries that engaged hundreds of cross-sector partners. Outcomes of the work in California in part inspired open water data legislation in the state of New Mexico. Events like Our Water Data Future brought forward a broader range of perspectives on new and emerging water data needs, with an emphasis on water workforce development.
The 2023 Our Water Data Future event at UC San Diego, top left, attendees of a 2018 California Safe Drinking Water event, top right, and an artist’s rendering of a 2019 challenge event’s discussions, bottom right.
Leveraging data from more than 35 million vehicle miles and hours of video from on-board cameras, the West Hub launched a series of National Transportation Data Challenges in 2019. In partnership with federal and local stakeholders, we organized community problem-solving sessions, roundtables, and technology demonstrations. The result is new data-sharing commitments and data analyses focused on making transportation safer for us all.
The Let’s Make it Count National Census Data Competition brought 2020 Census data to life for high school students and their communities. Our purpose was to ensure that a student’s ZIP code doesn’t determine their access to STEM skills development opportunities. We improved access to data literacy, particularly in hard-to-reach communities, by producing podcasts, videos, tutorials, and other resources to engage students in hands-on data science education.
The National Transportation Data Challenge kick-off event in 2019, left, and a poster presentation about the challenge, right, by West Hub Executive Director Meredith Lee and Northeast Hub Program Manager KJ Naum.
2
Accelerated a national movement to advance the culture and practice of doing data science for social good
In 2015, we formed a network of data science for social good (DSSG) programs and people to integrate data science into solving social problems. The West Hub and the University of Washington’s eScience Institute:
Created a Data for Good Growth Map to help universities and other stakeholders implement DSSG programs. It highlighted key decision points at each stage of planning and executing a program. The map was foundational to the growth and evolution of the DSSG movement and continues to serve as a resource.
Photos from the Learning and Doing Data for Good (LDDG) conference poster session, left, and in-person audience, right. Photo credit: Louisa Gaylord.
Launched the ongoing DSSG summer program, in which multidisciplinary student teams and advisors connected with industry, nonprofit, and government partners to address real-world challenges with data science. So far, more than 700 students have worked on projects ranging from the identification of voter dilution in New York, to the adoption of the algorithmic equity tool by the Seattle ACLU, to the use of data collection to strengthen food security in Ghana and Uganda.
Organized Learning and Doing Data for Good events to showcase student work and enable dialogue and networking among students and professionals. The 2023 event took place at the Academic Data Science Alliance Annual Meeting, where students engaged with leaders in data science and shared their own work focused on broad social impact.
LDDG conference organizer and DSSG fellow Juandalyn Burke with DSSG fellows Ari Decter-Frain and Pratik Sachdeva, top left. LDDG Exploring Career Paths panel, top right. DSSG Fellows at the UW eScience Institute, bottom left.
3
Fostered collaborations among the West Hub’s leading institutions that enabled data-intensive projects across research communities
West Hub teams at UC Berkeley, UC San Diego, and the University of Washington laid foundations that simplified access to cloud computing, promoted cybersecurity, and streamlined access to data storage and transfer through NSF-funded projects.
Collaborations supported by the West Hub developed CloudBank. This suite of services simplifies public cloud access for computer science researchers and educators, making it easier for them to manage costs, upgrade research computing, and learn cloud-based technologies. Its educational resources include access to on-demand classes, best practices, and tutorials.
The Trusted CI Working Group, NSF’s cybersecurity infrastructure center of excellence, encompassed all of the Big Data Innovation Hubs. As a member of its Trustworthy Data Group, the West Hub partnered on an effort to survey and report on data security and to measure scientists’ assessments of data trustworthiness. This research guided the creation of community standards for trustworthy scientific data and improved cybersecurity processes for science teams across domains.
OSN racks at SDSC, left, and a presentation at the 2018 Open Storage Network Architecture Workshop, right.
The West Hub assisted the Open Storage Network in delivering and refining its storage platform by identifying projects that featured multi-sector collaborations and the participation of underrepresented communities. By 2023, the OSN was hosting 4.5 petabytes of data for 130 projects at organizations like the American Museum of Natural History, NASA, and the USGS. Its storage-as-a-service model makes managing and sharing datasets within and among organizations easier, more cost-effective, and more secure. The West Hub improved data access, security, and management by fostering collaborations across the public and private sectors.
4
Expanded regional and national capacity for data science training
We created accessible spaces for people in various domains to hone their skills and understand how data science can meet the needs of their communities by partnering with organizations like:
The Carpentries on a series of train-the-trainer workshops. Educators, researchers, and community members from across the U.S. gained insights into teaching others how to use state-of-the-art data science skills and tools. For example, a SolarSPELL team is now better prepared to deliver solar-powered, offline technology to communities that remain unconnected.
Participants at an HSI-STEM Hub co-hosted workshop, held at the UW eScience Institute in 2019.
The Hispanic-Serving Institution (HSI) STEM Hub to design and deliver data science training with the goal of integrating up-to-date research tools and methods into existing curricula. Scores of participants have enriched their home institutions using their new skills, tools, and standards.
The LA Spokes Project, part of the NSF Spokes effort, to respond to regional data literacy needs. We worked with Cal State University Los Angeles and the City of Los Angeles Data Team to support data literacy training and internships. Paid student interns contributed to plans for improved street safety and tree planting for improved neighborhood health.
5
Built partnerships that engaged students and communities with data science training and resources they otherwise would not have had access to
In 2018, the West Hub helped conceive and launch the first Women in Data Science (WiDS) Datathon. Our participation continued in subsequent years and included co-leading the introduction of a research-focused component. By 2024, more than 4,000 Datathon participants had used their skills on topics like global health care, climate change, and deforestation.
Participants at a Datathon Collaboration Day held during the WiDS 2019 Conference in Berkeley, CA.
In 2020, data challenges organized by the West Hub, the Border Solutions Alliance, and UC San Diego supported internships and distributed prizes and stipends to teams of students, researchers, and community members on both sides of the U.S.-Mexico border. Thirty-three teams used data-driven decision-making to assess Covid-19 risk levels related to daily activities.
In 2022, as part of a long-term partnership, the West Hub science writer mentored a group of teens from the Pala Band of Mission Indians as they collected and analyzed data on the pH levels of the San Luis Rey River. Their presentation earned them the “best new team” award at the DataJam. Building on that success, the Pala team took flight in 2023 with an analysis of the local bluebird population, as noted by the California Bluebird Recovery Program.
A DataJam student taking water samples in the San Luis Rey River in 2022.
6
Formed global partnerships to expedite the development and implementation of data-driven solutions to regional challenges
The West Hub:
Joined the global movement around making data findable, accessible, interoperable, and reusable (FAIR) by establishing the first GO FAIR office in the U.S. in 2019. Our coordination with GO FAIR offices in the Netherlands, France, Germany, and Brazil created a global data commons for research and innovation. At workshops and forums, we emphasized the importance of FAIR data principles to a broad network of information, data, and domain scientists in the machine learning and artificial intelligence communities to help them build best practices. Our efforts live on through an NSF-funded Research Coordination Network.
The Road to FAIR and Equitable Science (10 Years of FAIR) Workshop, Jan. 22-26, 2024.
Shared resources and data through VODAN (Virus Outbreak Data Network). We collaborated with CODATA, RDA, and WDS organizations to ensure Covid-19 data sets—and those of future infectious disease outbreaks—would be collected, shared, and integrated across multiple sources, regions, and borders.
Partnered with New Mexico State University to create the Transboundary Groundwater Resilience (TGR) project. This NSF-funded international network of networks united water, social, data, and systems science to catalyze a novel approach to aquifer system research and management. Outreach included hosting events for exchange at the UN Water Conference and World Water Week.