Data Science for Public Good
by Joel Thurston, Ph.D. and Cesar Montalvo, Ph.D.
Social and Decision Analytics Division, Biocomplexity Institute at the University of Virginia
In this Methodspace interview I asked Joel and Cesar to tell us about how data science can be used for social good, and how their program is cultivating a next generation of data scientists. This summer we will welcome a series of posts from this year’s cohorts to discuss their research projects. - Janet Salmons, Research Community Manager for Sage Methodspace
JS. Tell us about Data Science for the Public Good. What do you do, how, and why is it important?
JT & CM. The Biocomplexity Institute's Social and Decision Analytics Division’s Data Science for the Public Good (DSPG) program is an experiential learning program that brings together students from across the country and engages them on research projects that address local, state, and federal government challenges around critical social issues. DSPG Young Scholars are undergraduate and graduate students conducting applied research at the intersection of statistics, computation, engineering, and the social sciences. Participants work in collaborative teams, vertically and horizontally integrated with our postdoctoral scholars and research faculty from the Social and Decision Analytics Division, as well as our external project stakeholders.
In the past, participants have worked on an array of topics: developing tools and building the capacity of rural Virginia coastal communities to deal with the impact of climate change, assessing the impact of broadband development on rural property values in Iowa, and constructing a data commons, or open knowledge repository, to curate data insights for decision makers across the National Capital Region.
The DSPG program is all about preparing the next generation of data scientists. We live in an increasingly complex and interconnected world, facing challenges that often extend beyond what we think of as traditional community borders. As researchers, we firmly believe that data is one of the essential tools for solving these challenges, and one of the most critical skills a person can possess is data literacy. DSPG teaches people not just how to find and analyze data but how to use it to solve real problems by applying our Data Science Framework.
This framework details our research process from concrete problem identification to data discovery, acquisition, ingestion, and wrangling to the statistical modeling and analysis that produces the actionable insights that ultimately lead to the public good. Throughout the process, we emphasize communication and dissemination as we work with stakeholders and sponsors to clarify and refine our research questions, tweak our modeling approaches, and ultimately frame our results in a language our target audiences will understand. Equally important and ubiquitous to our framework is a heavy emphasis on ethics.
Whether DSPG participants take their next step forward as data scientists, policy makers, entrepreneurs, or follow a career path that hasn’t even been thought of yet, our goal is to give them the tools to succeed in whatever career path they choose.
JS. How do we define working towards the “public good” and why is it important for data scientists to operate in this space?
JT & CM. At the University of Virginia Biocomplexity Institute's Social and Decision Analytics Division we think about data science for the public good as turning data into action to benefit communities. For us, “action” often comes in the form of informing policy decisions by local, state, or federal decision makers. People benefit, for example, when we use data to help emergency services identify areas with a high risk of losing access to staple food items in the event of a natural disaster. Or when we use data to help county officials identify underserved families facing food insecurity in areas generally considered to be higher-income neighborhoods. But actions can be anything that generates, maintains, or improves the welfare of individuals and communities. Data science for the public good could involve developing tools to understand complex problems such as climate change, combatting misinformation, building trust in science, or working with public and private partners to ensure that data are used in an ethical and responsible manner.
Learn about recent research projects here: https://biocomplexity.virginia.edu/our-research/research-projects.
JS. What are common obstacles data scientists face when trying to serve the public good?
JT & CM. Providing evidence-based insights to policymakers has its share of obstacles, such as the lack of timely data access, non-standardized data, incomplete information, and limited geography reporting. There are also privacy concerns beyond the higher profile issues of security breaches and cyberattacks. We are very careful to ensure that malicious actors cannot leverage the insights and data we provide to identify specific individuals or groups of people. Data scientists also face technical challenges such as the need for large-scale and costly computing infrastructure. Our work often involves complex data curation and maintenance systems that require extensive amounts of time, money, and human talent. We seek to overcome these obstacles through collaborative and multidisciplinary efforts.
One way in which researchers at the Social and Decision Analytics Division address some of these challenges is by employing our Community Learning through Data-Driven Discovery process. By following this community-focused approach, we ensure that the individuals most impacted by our work are involved in each stage of the research process.
More information about our CLD3 approach can be found online at https://datascienceforthepublicgood.org/economic-mobility/research-framework.
JS. What specific knowledge and skills do data scientists need to keep this focus?
JT & CM. Working in a transdisciplinary research environment like the Biocomplexity Institute, you realize that a range of skills are necessary to be a successful data scientist and serve the public good. There are technical skills necessary to identify, acquire, and manipulate data (e.g., learning programming languages). There are also statistical theory, research, and critical thinking skills (e.g., measurement theory and hypothesis testing). In some cases, you may also need a level of subject matter expertise in a particular topic (e.g., population dynamics, social determinants of health). Furthermore, especially when working in domains relevant to the public good, you will also need to engage in stakeholder management and science communication, since you may be working with groups of people (e.g., policymakers, community leaders, members of the public) who are new to using large amounts of data or highly technical analysis.
If it is not already obvious, since it is nearly impossible for anyone to master all these skills, arguably your most important ability to be a successful data scientist for the public good is the ability to work on a team!
Meet the Data Science for the Public Good Co-Coordinators
Cesar Montalvo is a Research Assistant Professor with the Social and Decision Analytics Division of the Biocomplexity Institute and Initiative. He works at the interface of economics, statistics, mathematical models and public policy. Cesar is an economist who graduated from the University San Francisco de Quito and holds a Master’s degree in Economics from Iowa State University. He received his Ph.D. in Applied Mathematics for Life and Social Sciences from Arizona State University. His dissertation focused on dynamical systems related to social mobility and education.
He has worked on projects regarding the skilled technical workforce and social mobility at the community level. He is currently leading efforts to develop a new method for calculating food insecurity by developing a comprehensive cost-of-living calculator for communities in the National Capital Region.
Cesar is driven by a strong desire to carry out research and practice that contribute to reducing poverty and inequality in our communities.
Joel Thurston is a Senior Scientist with the UVA Biocomplexity Institute Social and Decision Analytics division. Joel received his Ph.D. in Social Psychology from the University of California Santa Barbara (UCSB). Prior to joining the Biocomplexity Institute, he worked for the UCSB Center for Evolutionary Psychology and the U.S. Army Research Institute.
Joel has a long-standing interest in the interface of group perception and group dynamics, conceptualizing and measuring emergent group properties, and the science of team science. He seeks to apply measurement theory and social science research methodologies to develop analytic techniques for administrative data, addressing topics such as how intragroup processes contribute to performance for the U.S. Army. He is currently leading efforts to combine natural language processing techniques with qualitative analysis to identify characteristics of U.S. Army Soldiers that predict individual and unit performance.
Joel and Cesar are part of a team developing an equity-focused data commons to be used by local, state, and regional government stakeholders to address social issues across the National Capital Region.
To learn more about our Data Science Framework please see https://biocomplexity.virginia.edu/data-science-framework and https://hdsr.mitpress.mit.edu/pub/hnptx6lq/release/10.