
DataSURF Projects: Summer 2026

PI/Faculty Mentor: Nidhi Bhutani (Orthopedic Surgery)

Description of Topic or Project

Osteoarthritis (OA) is an unmet medical need with no disease-modifying drug available to ameliorate the disease. A major bottleneck is late detection of the disease, when it is difficult to reverse. Our lab is working towards developing cellular biomarkers in easily accessible blood or joint synovial fluid for early detection of OA. This project will involve bioinformatic analyses of high-resolution single-cell data, integration of multi-omic datasets, and ML- and AI-based predictive modeling.

UG Student Researcher’s Responsibilities

This project will involve bioinformatic analyses of high-resolution single-cell data, integration of multi-omic datasets, and ML- and AI-based predictive modeling.

Prerequisites 

Introductory courses in CS and Stats, and an interest in biology and medicine.

PI/Faculty Mentor: Jen Burney (Environmental Social Sciences, Earth System Science)

Description of Topic or Project

Tracking Air Pollution and its Impacts.

We are seeking student researchers to join our efforts to better understand air pollution and its impacts. This includes three active areas of research: (a) using satellite- and ground-based observations of ozone and its precursors (NOx and VOCs) to understand how much these pollutants are harming food crops in major growing regions around the world; (b) using back-trajectory models to understand where aerosol particulate matter is being transported from, and linking that information with station observations and human health information to understand the relationship between aerosols and different diseases; (c) using satellite observations, trajectory modeling, and station data to understand the abundance and impacts of heavy metals and other toxic compounds in aerosol particulate matter pollution across the country.

UG Student Researcher’s Responsibilities

Students will gather data, conduct statistical analyses, produce maps, and analyze impacts using empirical statistical models and machine learning as appropriate.

Prerequisites 

Students should be comfortable with basic programming in Python or R, have had some experience using spatial data in one of those platforms, and have an interest in environmental pollution and human wellbeing questions.

PI/Faculty Mentor: Giulio De Leo & Andrew Chamberlin (Oceans)

Description of Topic or Project

This project aims to develop an interactive data visualization dashboard that integrates disease ecology data with remote sensing information to model environmental health risks. The student will build a web-based tool (using Python frameworks like Shiny or Streamlit) that connects with Google Earth Engine's API to access and process satellite imagery and environmental data. The dashboard will allow users to visualize spatiotemporal patterns in disease transmission risk factors, perform basic spatial analyses, and generate standardized inputs for existing natural capital modeling tools. This work addresses a critical bottleneck in landscape-level risk assessment: the difficulty of efficiently accessing, processing, and visualizing the right data for decision support. The project sits at the intersection of data science, environmental modeling, and public health, offering experience in full-stack development, geospatial analysis, and user-centered design. The student will work remotely with our team that includes collaborators from multiple institutions focused on using data science to address global health challenges.

UG Student Researcher’s Responsibilities

The student will be responsible for designing and implementing the dashboard architecture, writing code to interface with Google Earth Engine’s API, developing data processing pipelines, creating interactive visualizations, and testing the tool with real-world datasets. They will participate in regular meetings to discuss progress and design decisions, document their code and development process, and prepare a final presentation of their work. The student will also have opportunities to contribute to discussions about how the tool can be integrated with existing modeling frameworks and decision-support systems used by our partners.

Prerequisites 

Required: Programming experience in Python (e.g., CS 106A or equivalent), statistics fundamentals (e.g., Stats 60), data visualization experience.

Preferred but not required: Experience with Shiny for Python or other dashboard frameworks, familiarity with geospatial data analysis (e.g., Earth Systems 144), web development basics (HTML/CSS/JavaScript), experience with APIs or cloud computing services.

PI/Faculty Mentor: Judy Fan (Psychology)

Description of Topic or Project

Data visualization literacy plays a pivotal role in effectively communicating patterns in quantitative data, making it a cornerstone of STEM education. However, the landscape of test-based measures for assessing these skills is fragmented, without clear agreement on how to measure the key components of data visualization literacy. Furthermore, there might also be important aspects of data visualization literacy that are not well captured by existing measures.

For instance, designing effective visualizations remains challenging for beginners, requiring skills typically acquired through years of practice. Critical bottlenecks in the design process—particularly in generating and evaluating alternatives—limit beginners’ ability to communicate about data in clear and compelling ways. Our project seeks to overcome these limitations by developing design-based measures of visualization literacy and an AI-augmented learning environment that guides students in introductory statistics courses through an iterative visualization design process.

This project would be a good fit for a student with interests at the intersection of psychometrics, education, and data visualization.

UG Student Researcher’s Responsibilities

The student researcher would be expected to work closely with 1-2 postdoctoral mentors to develop the skills to contribute to all aspects of this project, as well as participate fully in the intellectual/social life of the Cognitive Tools Lab.

Prerequisites

Basic computer programming ability; familiarity with linear algebra, probability, and statistics; some familiarity with web programming; strong written/oral communication skills; strong organizational skills.

PI/Faculty Mentor: David Grusky (Center on Poverty and Inequality, Sociology)

Description of Topic or Project

We are building a new infrastructure for public-use qualitative research that exploits the American Voices Project (AVP) data, the country's first large nationally representative dataset of immersive interviews about the "story of one's life." We are looking for assistance in (a) building a new platform for analyzing these data using human-in-the-loop LLM tools for coding, (b) carrying out real-time analyses of AVP data as a new form of nationally representative journalism, and (c) exploring the extent to which immersive data are social "dark matter" that make for powerful predictive models of addiction, homelessness, and other social harms. We are also working on a new AVP fielding that will allow us to monitor ongoing crises in real time. Because we have needs in all of these areas (and more!), interested students can be matched to zones that interest them and for which they have the skills to contribute.

UG Student Researcher’s Responsibilities

The responsibilities will depend on the task of interest.

Prerequisites

Depend on task of interest. If you're interested in working on coding tools or on AI interviewing, then experience with LLMs will of course help. If you're interested in analyzing the AVP data, then econometric, NLP, and LLM skills will help. If you're interested in helping with the next fielding, then skills in interview design and sampling will help.

PI/Faculty Mentor: David Lobell (Earth System Science)

Description of Topic or Project

Evaluating climate adaptation strategies in agriculture.

Ongoing climate changes are a big challenge to farmers around the world, who are constantly looking for ways to raise productivity in the face of mounting stressors. But as in any system with many interacting variables, the effects of any change can be hard to predict, especially with the small sample sizes that most farmers and farm advisors have observed. This project is focused on using novel approaches to evaluate how effective different agricultural practices are for adapting to climate change. Our lab is building a collection of datasets and approaches to do this in several regions around the world, and the interested student will work on building out a specific application in one or more regions. Some possible practices to focus on include planting trees along field borders, growing cover crops, diversifying crop rotations, converting to organic, and reducing tillage. The project will likely involve comparing multiple methods of causal inference, including difference-in-differences estimators, causal forests, and synthetic control.
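As a rough illustration of the simplest of the causal inference methods named above, the sketch below computes a two-group, two-period difference-in-differences estimate on synthetic data. Everything in it is an assumption made for illustration (the function names `did_estimate` and `simulate`, the yield values, and the effect size); it does not describe the lab's data or pipeline.

```python
import random
from statistics import mean


def did_estimate(records):
    """Two-by-two difference-in-differences.

    records: iterable of (treated: bool, post: bool, outcome: float).
    Returns (treated post - treated pre) - (control post - control pre).
    """
    cells = {(t, p): [] for t in (False, True) for p in (False, True)}
    for treated, post, outcome in records:
        cells[(treated, post)].append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    return (m[(True, True)] - m[(True, False)]) - (m[(False, True)] - m[(False, False)])


def simulate(n_fields=2000, effect=0.5, seed=0):
    """Hypothetical synthetic yields; all numbers are made up for illustration."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_fields):
        treated = rng.random() < 0.5
        # Treated fields differ at baseline, so a naive post-period comparison is biased.
        base = 3.0 + (1.0 if treated else 0.0)
        for post in (False, True):
            trend = 0.3 if post else 0.0  # time trend shared by both groups
            gain = effect if (treated and post) else 0.0
            records.append((treated, post, base + trend + gain + rng.gauss(0, 0.2)))
    return records


estimate = did_estimate(simulate())
print(f"DiD estimate of the practice effect: {estimate:.3f}")
```

The estimate is only meaningful under the parallel-trends assumption, which the simulation builds in by giving both groups the same time trend; real field data would need that assumption checked.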

UG Student Researcher’s Responsibilities

Data processing and visualization. Regression analysis. Causal inference.

Prerequisites

Required: STATS 117. Experience with Python or R and some basic statistics and machine learning.

Preferred: DATASCI 161 or ECON 102C or MS&E 226, familiarity with causal inference, interest in and familiarity with geospatial data.

PI/Faculty Mentor: Benjamin Nachman (Particle Physics and Astrophysics)

Description of Topic or Project 

This project will use AI tools and simulations for the statistical analysis of fundamental physics data (particle, nuclear, astrophysics). In particular, we will develop and deploy simulation-based / likelihood-free inference methods where likelihoods are not known explicitly, but we have access to simulations that emulate the data given parameters. These tools allow us to estimate bounds on the sensitivity of a particular experiment to a given set of parameters such as particle masses and interaction strengths.
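To make the idea of likelihood-free inference concrete, here is a minimal sketch of rejection approximate Bayesian computation (ABC), one of the simplest simulation-based methods. It is not the neural-network approach the project describes; the toy Gaussian simulator, the uniform prior, the tolerance, and all names are assumptions chosen only for illustration.

```python
import random
from statistics import mean


def simulator(theta, n, rng):
    """Toy stand-in for a physics simulation: n noisy measurements of theta."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws=10000, tolerance=0.05, seed=0):
    """Keep prior draws whose simulated summary statistic lands near the observed one.

    The likelihood is never evaluated; only the simulator is called.
    """
    rng = random.Random(seed)
    obs_summary = mean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw from an assumed flat prior
        sim_summary = mean(simulator(theta, len(observed), rng))
        if abs(sim_summary - obs_summary) < tolerance:
            accepted.append(theta)
    return accepted


# "Observed" data generated with a hidden true parameter of 2.0.
observed = simulator(2.0, 50, random.Random(1))
posterior = abc_rejection(observed)
print(f"accepted {len(posterior)} draws, posterior mean {mean(posterior):.2f}")
```

The accepted draws approximate the posterior over the parameter. Rejection sampling of this kind scales poorly as the data or parameter dimension grows, which is one motivation for neural approaches.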

UG Student Researcher’s Responsibilities

Learn about the data, train neural networks, perform statistical inference.

Prerequisites

Python programming and basic familiarity with neural network training, interest in using AI for precision science.

PI/Faculty Mentor: Jennifer Pan (Communication)

Description of Project

Social media platforms in China have served as critical spaces for public discourse and political expression, but the effects of censorship practices on these platforms remain poorly understood. This project examines patterns of online political expression and censorship during periods of civic mobilization by analyzing large-scale social media data from China's Weibo platform. We aim to analyze posting behavior, user engagement (shares, comments, likes), and content removal patterns for ordinary users. Our high-frequency data collection allows us to measure precisely when and how content disappears from the platform, and whether censorship affects users' subsequent communication behavior. Students working on this project will gain experience with computational social science methods including natural language processing for Chinese text and statistical analysis of social media dynamics. This work requires comfort with large-scale textual datasets, willingness to work with Chinese-language text data, and an interest in the intersection of technology, politics, and communication.

UG Student Researcher’s Responsibilities

Analysis of textual data.

Prerequisites

Required: CS 106A

Preferred: CS 124 or CS 224; Chinese-language skills or knowledge of China.

PI/Faculty Mentor: Grant Parker (African and African American Studies, Classics)

Description of Project

The Anglo-Boer War (aka South African War, 1899-1902) was a transformative event in the making of modern South Africa. Though the Boers (white Afrikaners) eventually lost the war to Britain, they proceeded to gain political supremacy within a newly unified and independent Union of South Africa (1910). The racial segregation that was codified at the end of the war would endure most of the 20th century. Our project seeks to produce nuanced, archivally based understandings of the war, its aftermaths and its impacts on South Africans of all races. An enormous amount of granular data is available for us to create a composite historical resource: title deeds dating from the period, including diagrams, documenting sales over several decades; historic maps of the region, including wartime maps from both sides; post-war reconstruction-era claims made by Boer farmers; and census records. Such documents have already been harvested for selected areas, generating a wealth of big data that awaits coding. This project is undertaken in conjunction with South African-based partners. The final result will be an unmatched tool for researchers in the field.

UG Student Researcher’s Responsibilities

Integrate different kinds of data from spreadsheets; create maps.

Prerequisites

CS 106A. An interest in humanistic (historical) uses of data; an interest in Africa (or other colonial history) would be a plus.

PI/Faculty Mentor: Noah Rosenberg (Biology)

Description of Topic or Project

This project concerns mathematical evolutionary modeling, in the areas of mathematical phylogenetics and population genetics.

(1) Mathematical phylogenetics. An evolutionary tree that describes the relationships among a set of organisms can be characterized as a mathematical structure. The area of mathematical phylogenetics studies the sets of possible evolutionary trees, the probabilities of different tree structures under assumptions about evolutionary processes, and the properties of algorithms for inference of evolutionary trees. 

(2) Population genetics. The area of population genetics studies genetic variation in populations and species. Mathematical features of the statistics used in population genetics affect the biological interpretations of population-genetic data, and population genetics is advanced by the study of mathematical problems associated with these statistics. 

In this project, students will complete activities such as mathematical proofs, stochastic simulations, and bioinformatics analysis in the area of mathematical phylogenetics or population genetics.
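As a small example of treating the set of possible evolutionary trees as a mathematical object, the sketch below counts rooted binary tree topologies with labeled leaves, using the standard result that n leaves admit (2n - 3)!! such trees. The function name is hypothetical and the example is illustrative only, not part of the project's materials.

```python
def count_rooted_binary_trees(n_leaves):
    """Number of rooted binary tree topologies on n labeled leaves: (2n - 3)!!."""
    if n_leaves < 1:
        raise ValueError("need at least one leaf")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3; the product is empty for n <= 2.
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count


for n in (2, 3, 4, 5, 10):
    print(n, count_rooted_binary_trees(n))
```

The count grows faster than exponentially in the number of leaves, which is why exhaustive search over trees quickly becomes infeasible.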

UG Student Researcher’s Responsibilities

Students will engage in one or more of the following activities: solving mathematical problems, proving theorems, coding statistical methods, and performing data analysis.

Prerequisites

Math 50 or 60 series

PI/Faculty Mentor: Elaine Treharne (English)

Description of Topic or Project

HANDMADE is a project about how we can best evaluate, describe, and access the history of handwriting, document engineering, and script design. Using computational and digital tools to read and translate early handwriting successfully (whether Babylonian cuneiform or Gothic bookhand) is still a very long way off. HANDMADE aims to test and refine current Handwriting Recognition technologies in Generative AI and proprietary platforms; to investigate handwriting at scale; and to finesse available models the better to analyze particular forms of manuscript production over two millennia.

UG Student Researcher’s Responsibilities

In this project, students will collect training data, test and adapt current computer vision tools for historical materials (for which we have vast data), fine-tune several foundation models with added metadata constraints, and move towards building an ensemble to see what improvements we can make.

Prerequisites

At least CS 106A. Some Humanities courses would be an advantage. 

Historical awareness; an appreciation for the indeterminacy of human craft; good imagination.

PI/Faculty Mentor: Data Science for Social Good (May not be offered Summer 2026)

Description of Project

The Data Science for Social Good summer program trains aspiring researchers to work on data science projects with social impact. Working closely with governments and nonprofits, participants take on real-world problems in education, health, energy, public safety, transportation, economic development, international development, and more. Participants include a diverse and inclusive cohort of students who spend ten weeks of the summer working with Stanford researchers and technical mentors, learning insights from data that benefit society. 

Students interested in Data Science for Social Good must complete an additional application. Access the application, learn more about this opportunity, and see example projects.

Prerequisites

Students must be proficient in a programming language such as Python or R. Fellows also must be hungry to learn and grow in the following areas: team data science, working with a project partner, statistics, reproducibility, and programming for data science. In addition, they must be committed to being present at all meetings during the entire duration of the summer program (40 hrs/week).