CHAI Internship Mentor Profiles
This is a tentative list of mentors who will be accepting interns. Please review the list to get a sense of our research interests; you will have the option to list mentors you’d be especially excited to work with in the application form.
Michael K. Cohen: Michael’s main research interest is solving the control problem by designing agents that avoid the assumptions of this paper; some such designs are described in short blog posts here. Another research interest, also aimed at solving the control problem, is the Eliciting Latent Knowledge research agenda described here.
Cameron Allen: I am interested in the foundations of intelligence—the computations that enable agency, learning, planning, abstraction, and interaction. My research is focused on trying to answer some of the following questions. What techniques enable agents to quickly learn abstract models of the world? And how can agents flexibly apply what they’ve learned to new problems? How can agents identify the important aspects of their decision problem? When should agents remember something, and what should they remember? How can we design agents so they’ll reliably do what we want, while avoiding unwanted side-effects? I build AI systems as a way to study these core aspects of intelligence, in the hopes that they will allow us to better understand ourselves and our world, and ultimately to improve both.
Scott Emmons: I am interested in both the theory and practice of AI alignment. I have helped characterize how RLHF can lead to deception when the AI sees more than the human, develop multimodal attacks and benchmarks for open-ended agents, and use mechanistic interpretability to find evidence of learned look-ahead in a chess-playing neural network. I am planning to advise internships on similar topics, including follow-ups to these projects and new, related directions.
Erik Jenner: I am interested in using internal representations of neural networks to monitor them and flag bad behavior at runtime. I believe this is one of the more promising approaches to ultimately deal with risks from deceptive alignment/scheming or measurement tampering (and could help against trojans or jailbreaks in the shorter term). More concretely, I am excited about creating proxy tasks for these future risks and empirically testing different monitoring methods (such as mechanistic anomaly detection or coup probes). Projects will likely be on these topics, potentially also with connections to interpretability, AI control, and other directions. See this post for example projects, but note that these will likely change over time.
Micah Carroll: I’m interested in changes in human preferences, values, and beliefs, and how they may be affected by interactions with AI systems. I’ve studied how unintended AI manipulation, deception, and preference-changing behavior may emerge from seemingly reasonable choices of optimization algorithms in RL, both conceptually and in practice (in the context of LLMs and recommender systems). More recently, I’ve become more excited about understanding reward hacking of LLM reward models and approaches to scalable oversight. I expect to be most interested in advising projects on these or related topics.
Ben Plaut: I mostly study generalization: how a model handles unfamiliar inputs. I think many types of safety and alignment failures can be framed as misgeneralization (e.g., a robot has never seen a vase before so it doesn’t know whether breaking vases is positive, negative, or neutral). I’m specifically interested in training agents to recognize when they’re in unfamiliar situations and then act cautiously (e.g., ask for help). I think this idea is applicable to many different ML regimes, but I mainly work on LLMs and RL. I do a mix of theoretical and empirical work with the goal of designing methods which can potentially scale to very advanced AI but are testable on systems we have today.
Bhaskar Mishra: My research focuses on designing AI systems capable of forming and reasoning about well-defined probabilistic beliefs. To this end, I am interested in a variety of approaches to designing scalable probabilistic inference algorithms, including alternative representations, evidence sub-sampling, reinforcement learning for meta-level control, and more. One direction I’m exploring frames inference in large Bayesian networks as a meta-level control problem, using reinforcement learning to learn a policy for efficient evidence sub-sampling that approximates inference by prioritizing more relevant evidence over less relevant evidence. Additionally, I’m designing approximate inference algorithms for Probabilistic Dependency Graphs, an alternative representation that, unlike Bayesian networks, allows an agent to hold intersecting, potentially incompatible beliefs. Interns will have the opportunity to contribute to ongoing projects or pursue new directions related to improving scalability in probabilistic reasoning or exploring applications of these approaches.
Justin Svegliato: My goal is to build AI agents that complete tasks in the world for us reliably and safely using both large language models and reinforcement learning. To do this, I’ve worked on both theoretical and empirical approaches to AI agents, focusing on chat assistants, autonomous vehicles, and humanoid robots. Recently, I’ve been involved in a number of research projects, some of which include safeguarding LLM agents against prompt attacks and jailbreaks [1, 2], embedding ethical theories into agents [3, 4, 5], enabling agents to use deep metareasoning to control their reasoning process [6, 7, 8] or learn their reasoning model [9, 10], and allowing agents to recover from exceptions while maintaining safety [11, 12, 13, 14, 15]. This summer, I plan to work with several interns on new research that focuses on using large language models and reinforcement learning to build AI agents that can more effectively reason and make decisions.
Jiahai Feng: I’m interested in understanding empirical deep learning phenomena in order to predict, monitor, or control the behavior of deep learning systems. In the past, I have studied how language models bind entities in context and used the resulting insights to build probes that decode language model beliefs as logical propositions. More recently, I have been thinking about training dynamics and about how language models are able to learn and reason out of context.
Shreyas Kapur: How can we build AI systems that learn rich, interpretable world models as quickly as humans do? And how can we build systems that use these models to plan future actions? I use techniques from probabilistic programming, Bayesian inference, programming languages, search, and neuro-symbolic AI to build systems that can reason and plan in interpretable ways from the ground up.
Cassidy Laidlaw: I’m interested in understanding and improving reinforcement learning, using a combination of theory and empirical experimentation to explain why and when deep RL works; e.g., see these papers. I also work on human-AI coordination, developing tools like assistance games and better human models that account for how people provide unreliable feedback, with the goal of improving on current alignment techniques like RLHF. Finally, I’m interested in improving uncertainty quantification and OOD detection in deep learning, particularly for learning human values and preferences.
Aly Lidayan: I am interested in advancing scientific understanding of intelligent agents, in particular intrinsically motivated, open-ended and continual/lifelong learning agents. Recently I’ve been studying intrinsic motivation and exploration (theoretically and empirically), and for future work I’d be interested in advising intern projects on topics including meta-RL and evolutionary perspectives, abstraction discovery and goal setting, causal learning, decision-making based on internal as well as external state, and/or any of the above themes in combination with large language models.
Niklas Lauffer: I’m interested in multiagent learning, cooperative AI, and human value learning in both RL and LLM settings. My research usually involves exploring a foundational concept that can then be scaled to empirical settings. One of my primary lines of research explores the idea that in most multiagent interactions, only some pieces of information are strategically relevant. Recently, I’ve been exploring how these concepts can be used to decompose multiagent games, better coordinate with and learn from humans, and train more robust and adaptable agents. I’m also interested in designing benchmarks for evaluating agents in multiagent interactions and in incorporating formal structure into learning. In the past, I’ve worked on online learning in games and formal methods.