CHAI Internship Mentor Profiles

This is a tentative list of our nine mentors who will be accepting interns. Please review the list to get a sense of our research interests; you will have the option to list mentors you’d be especially excited to work with in the application form.

Cameron Allen: I am interested in the foundations of intelligence—the computations that enable agency, learning, planning, abstraction, and interaction. My research focuses on questions like the following. What techniques enable agents to quickly learn abstract models of the world? And how can agents flexibly apply what they’ve learned to new problems? How can agents identify the important aspects of their decision problem? When should agents remember something, and what should they remember? How can we design agents so they’ll reliably do what we want, while avoiding unwanted side effects? I build AI systems as a way to study these core aspects of intelligence, in the hope that doing so will allow us to better understand ourselves and our world, and ultimately to improve both.

Ben Plaut: I mostly study generalization: how a model handles unfamiliar inputs. I think many types of safety and alignment failures can be framed as misgeneralization (e.g., a robot has never seen a vase before, so it doesn’t know whether breaking vases is positive, negative, or neutral). I’m specifically interested in training agents to recognize when they’re out-of-distribution/uncertain and then act cautiously (e.g., ask for help or abstain from action). I think this idea is applicable to many different ML regimes, but I mainly work on LLMs and RL. I do a mix of theoretical and empirical work with the goal of designing methods that can potentially scale to very advanced AI but are testable on systems we have today.

Julian Yocum: My core research interest is the question: what is (deep) learning? This question has two parts. 1) What is learned (interpretability)? 2) How is learning possible (developmental interpretability)? In biology these questions would be called “doing neuroscience,” but for some reason in AI we reserve for them the special name “interpretability.” I am especially interested in the geometrical and topological aspects of learning. My approach is to take inspiration from other fields, like physics, philosophy, and neuroscience (especially the work at the Redwood Center for Theoretical Neuroscience). Related questions are: what are features; what is emergence; what are causal explanations; what are abstractions; and are abstractions/representations/learning natural/universal/platonic?

Sidhika Balachandar: I work on problems at the intersection of machine learning, fairness, and healthcare. I am interested in creating models in settings where outcome data is missing or biased. In particular, my research has focused on the following questions: What are the challenges of using outcome data collected from human decision makers or crowdsourced platforms? What identification approaches can we use to create models in these settings? Can we leverage external, domain-specific information or data?

Jonathan Stray: I’m a Senior Scientist working on the question of how AI systems select information to show us, and what effects that has on individuals and society. These systems include search engines, chatbots, and recommender systems — the algorithms that select and rank content across social media, news apps, streaming music and video, and online shopping. I study how the information that machines show affects well-being, polarization, what citizens know, and other outcomes. I do theory and experiments, talk to people in industry, and try to design systems that are better for people. I’m especially interested in the effects of AI on conflict. If the machines encourage our worst impulses, we will end up destroying each other.

Mark Bedaywi: How do we build machines that are mathematically guaranteed to effectively cooperate with humans and each other? How can agents, through repeated interaction, learn to achieve a collective goal together? What are the practical barriers to building agents that work to satisfy a user’s preferences when they are initially uncertain of those preferences? Can we find and solve technical barriers to effective AI regulation? These are the sorts of questions I work to answer.

Aly Lidayan: I’m a PhD student interested in intrinsically motivated open-ended learning, meta-learning, resource-rationality and goals in humans and AI agents. I’m most interested in advising on ML and/or computational cognitive science projects relating to these topics that involve a mixture of theory (RL theory or mathematical cognitive modeling) and experiments (with humans or ML models).

Karim Abdel Sadek: I work on topics at the intersection of reinforcement learning, multi-agent systems, and theoretical CS. My research has spanned more classical theoretical topics, such as using predictions to improve online algorithms, as well as more applied topics relevant to AI safety, such as mitigating misgeneralization in RL via environment design. My research style is a mix of theory and practice, with the goal of designing methods that are both well-founded and have the potential to scale empirically. Recently, I have been most excited about designing better ways to do inverse RL and preference learning, designing protocols with the right incentives for human-AI collaboration, and more foundational topics in RL theory and game theory. I’d be excited to supervise projects related to these directions, or different ones you think I could potentially be interested in.

Raj Movva: I’m a fourth-year PhD student with Emma Pierson. I am interested in using foundation models and interpretability methods to build new tools for scientific research, especially involving language models, social science, and healthcare. One direction is building tools that help researchers better understand large datasets: for example, we built a method to attribute preferences to specific examples in RLHF datasets, enabling interpretable data curation to remove undesirable behaviors and improve personalization. Another direction is scientific hypothesis generation: foundation models often outperform human experts in high-stakes prediction tasks, so how can we use these models to advance human knowledge? We’ve made initial progress on this question with sparse autoencoders (SAEs), and we’re now improving and applying these methods to produce new insights on high-dimensional medical datasets (X-rays, biopsies, clinical notes, etc.).