2023 National Workshop on Data Science Education
The sixth annual National Workshop on Data Science Education took place virtually and in person at UC Berkeley from June 20-23, 2023. UC Berkeley's Division of Computing, Data Science, and Society led the event with support from Microsoft and the West Hub. The workshop was for educators at all levels interested in data science education.
Last year, educators from over 100 institutions came together and shared insights on creating a cohesive data science educational ecosystem for undergraduate students. The 2022 workshop featured two panels hosted by the Hubs.
A diverse range of academic institutions from around the nation were represented again at the 2023 workshop, including four-year universities and community colleges. This year, the West Hub organized two sessions. Both were held on the morning of June 22.
Recordings of the events are available.
WEST HUB PANELS
Project-Based Experiential Learning about Data Science
This session (9:00-10:30am PT) provided an in-depth look at project-based experiential learning programs running at the University of Pittsburgh and UC San Diego (DataJam), UC Berkeley (Data Science Discovery Program), and the University of Washington (Data Science for Social Good).
Session Introduction
Ashley Atkins, West Hub
An overview of the morning sessions and an introduction to the value of experiential data learning, as well as common characteristics between the Discovery Program, Data Science for Social Good, and DataJam.
Discovery Program
Anthony Suen, UC Berkeley
The Data Science Discovery Program incubates and accelerates data science research by connecting UC Berkeley students to high-impact academic, government, non-profit, and industry projects across the globe. Founded in 2015, Discovery has incubated many collaborative projects between passionate practitioners and highly trained students, with over 2000 student researchers and 800+ research projects. Projects have tackled everything from climate change and social justice to public health and digital humanities. You can check out many of the projects on our project page. The Discovery Program provides projects with technical support from its Data Science Discovery Consultant Program along with cloud computing resources. The Discovery Program allows students to engage in project research as early as their freshman year and continue through their entire undergraduate experience.
Data Science for Social Good
Sarah Stone, University of Washington
Launched in 2015, the University of Washington’s Data Science for Social Good (DSSG) summer research and education program partners Student Fellows with Data Scientists from the eScience Institute and Project Leads from academia, government, and the private sector to find data-intensive solutions to pressing societal challenges. Previous projects have involved applying methods such as machine learning to socially imperative topics including public health, homelessness, disaster response and transportation. Keystones of the DSSG program include project-based discussions and training around data science ethics, human-centered design and stakeholder analysis, and partner collaboration. DSSG programs can effectively impact social good, develop productive cross-sector relationships, and provide “real world” data science training for students from diverse disciplinary backgrounds.
DataJam: A mentored data science learning activity and competition that runs throughout each academic year
Judy Cameron, University of Pittsburgh
DataJam is a data science learning activity and competition that runs throughout each academic year to introduce, encourage and engage young people in data science. To date, DataJam has focused on high school-age youth, but plans are underway to expand to community colleges. DataJam is coordinated by Pittsburgh DataWorks, an educational 501(c)(3), and was started in 2013 in Pittsburgh, PA. With support from the NSF Northeast and West Big Data Innovation Hubs, it expanded nationally in 2021. Several factors contribute to the popularity of this program and its potential for widespread dissemination. First, the youth themselves choose the topic of their project, so they can learn data science through a project that most interests them and their community. Second, university students from across the country are trained as DataJam mentors and are available by videoconference to mentor teams. Third, a large repository of resources and a centralized website with information about the DataJam are freely available at pghdataworks.org, along with a monthly newsletter about the DataJam that keeps all participants up to date and coordinated nationally. DataJam mentors receive formal mentoring training, and this has been expanded to provide training on how to work in diverse communities, including low-income, urban and rural communities, immigrant communities, Native American reservations and the unhoused community. Since its inception, DataJam has been supported by businesses and industry partners, who provide financial support and whose data scientists serve as advisors and judges at the annual finale, when all DataJam projects are presented online. Students benefit from first-hand knowledge of how impactful data science is in a wide variety of fields, and businesses benefit from attracting the attention of youth with strong interests in data science.
DataJam: Perspectives on What it Takes to Institute a National Program Locally
Salvatore Ferraro, Caldwell University
To develop a New Jersey hub for the DataJam, efforts have been undertaken to train mentors, recruit schools and develop business partnerships in New Jersey. Caldwell University became involved in training mentors and now offers a DataJam mentor course that coordinates with the original University of Pittsburgh course, thereby alleviating the need to develop a new curriculum. Caldwell University also runs an annual STEM Teacher conference for K-12 teachers, and DataJam has been advertised through this mechanism. The healthcare and pharmaceutical industries are well represented in New Jersey, and several strategies have been used to interest them in supporting DataJam.
Data Science Experiential Pathways (DSXP)
Presentations + Interactive Session
This session (11:00am-12:30pm PT) began with a keynote presentation about national trends from the National Science Foundation's perspective, followed by brief descriptions of DSXP and related DSSG, Discovery, and DataJam program expansion. It concluded with an interactive discussion about experiential data learning opportunities.
Exploring Opportunities for Data Science Experiential Pathways
Ashley Atkins, West Hub
This presentation will discuss the importance of data-driven experiential learning for students within the context of workforce development. These opportunities equip students not only with data-driven skills but also with complementary experience in navigating data ethics and translational data communication. Additionally, the presentation will explore the possibility of a multi-institutional pipeline that would create new pathways for data-driven workforce development to meet pressing local and national needs. Pilot efforts are underway at UC Berkeley, UC San Diego, and the University of Washington.
KEYNOTE SESSION PRESENTATION: NSF Perspective on Project-Based Experiential Learning about Data Science Programs
Jennifer Noll, National Science Foundation (NSF)
This presentation will share insights from an analysis of data science education awards across NSF programs. Trends in the portfolio will be shared that provide insight into the directions the field of data science education research has taken. The presentation will discuss examples of NSF-funded projects that highlight project-based learning, mentoring, career pathways, and broadening participation through engagement with communities that are underserved in STEM, as well as how these projects are situated within the larger landscape of data science education projects. Through a better understanding of current trends and potential gaps in the portfolio, the discussion will also pinpoint opportunities to grow the community, scale up successful projects, and identify potential new directions for the field.
DataJam: A National Model with Local Hubs
Catherine Cramer, San Diego Supercomputer Center, UC San Diego
Judy Cameron, Pittsburgh DataWorks
DataJam started as a local data science learning activity and competition in Western Pennsylvania, but during the COVID-19 pandemic it expanded nationally, responding to the interest at high schools across the country in providing enrichment activities in data science for their students. The expansion was feasible because all of the resources for DataJam were available on a centralized website, communication with teams was by email, and mentoring of teams was easily transitioned to an online platform. However, it soon became clear that it would be ideal to train mentors and develop business partners in relatively close proximity to teams, and for this a national program structure with local hubs was developed. To date, hubs have been developed in Southern California and New Jersey.
Integrating Real World Data Science at Community Colleges, UCs and Beyond
Anthony Suen, UC Berkeley
The Discovery Program at UC Berkeley will be a catalyst to expand workforce training to community colleges across the State of California. The Discovery Program will work with UW and UCSD by scaling the DataJam model for community college students, creating translational opportunities between the Discovery Program and students in regional institutions, and setting up a project pipeline to and from DSSG. Already, we have examples of graduate mentors who have contributed to the pipeline, from supporting transfer students in data science, to supporting long-term research projects with the Discovery Program, and finally to supporting advanced Data Science for Social Good projects over the summer.
DSSG Expansion: Increasing the Project Pipeline
Sarah Stone, University of Washington
In collaboration with the West Hub, we convened a Data for Good Organizers Network consisting of leaders from programs similar to the UW Data Science for Social Good program. This network produced a “Growth Map” white paper for other organizations interested in developing these types of university-hosted summer programs. Each of our programs experiences a huge level of interest from students wanting to use their data science skills on impactful projects. We are interested in working with new partners to develop sister programs. These programs need projects that are at a level of maturity where multiple students can engage full-time for 10-14 weeks to move the work forward. The pipeline model will allow for movement of both projects and students from one program to the next. For example, the network of universities running Data Science for Social Good programs can increase their pipeline of well-designed social impact projects by sourcing projects through programs like Discovery. DSSG projects also have the potential to continue long-term development by cycling the project through Discovery for the academic year.