Research
CHAI aims to reorient the foundations of AI research toward the development of provably beneficial systems. Currently, it is not possible to specify a formula for human values in any form that we know would provably benefit humanity if that formula were installed as the objective of a powerful AI system. In short, any initial formal specification of human values is bound to be wrong in important ways. This means we need some way to represent uncertainty in the objectives of AI systems. This way of formulating objectives stands in contrast to the standard model for AI, in which the AI system's objective is assumed to be known completely and correctly.
Therefore, much of CHAI's research to date has focused on developing and communicating a new model of AI development, in which AI systems should be uncertain of their objectives and deferential to humans in light of that uncertainty; a toy illustration of this idea appears below. However, our interests extend to a variety of other problems in the development of provably beneficial AI systems. Our areas of greatest focus so far have been the foundations of rational agency and causality, value alignment and inverse reinforcement learning, human-robot cooperation, multi-agent perspectives and applications, and models of bounded or imperfect rationality. Other areas of interest to our mission include adversarial training and testing for ML systems, various AI capabilities, topics in cognitive science, ethics for AI and AI development, robust inference and planning, security problems and solutions, and transparency and interpretability methods.
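To make the contrast with the standard model concrete, here is a minimal, hypothetical sketch of an agent that holds a belief over candidate objectives and compares acting unilaterally against deferring to a human. The payoffs, probabilities, and names below are invented purely for illustration; this is not a reproduction of any specific CHAI formulation (such as the off-switch game or cooperative inverse reinforcement learning listed below).

```python
# Minimal, hypothetical sketch: an agent uncertain about which candidate
# objective is the human's true objective. All numbers are made up.

candidate_payoffs = {
    "objective_A": {"act": 1.0, "defer": 0.6},   # acting is good if A is true
    "objective_B": {"act": -2.0, "defer": 0.6},  # acting is harmful if B is true
}
belief = {"objective_A": 0.7, "objective_B": 0.3}  # agent's uncertainty

def expected_value(action: str) -> float:
    """Expected payoff of an action under the agent's belief over objectives."""
    return sum(belief[obj] * candidate_payoffs[obj][action] for obj in belief)

if __name__ == "__main__":
    for action in ("act", "defer"):
        print(f"{action}: {expected_value(action):.2f}")
    # act:   0.7 * 1.0 + 0.3 * (-2.0) = 0.10
    # defer: 0.6 regardless of which objective is true
    # The uncertain agent defers to the human; a standard-model agent that
    # assumed objective_A with certainty would simply act.
```

Under this toy belief, deferring has higher expected value than acting, whereas an agent certain of a single fixed objective would act regardless of how wrong that objective might be.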
In addition to purely academic work, CHAI strives to produce intellectual outputs for general audiences. We also advise governments and international organizations on policies relevant to ensuring that AI technologies benefit society, and offer insight on a variety of individual-scale and societal-scale risks from AI, such as those pertaining to autonomous weapons, the future of employment, and public health and safety.
Below is a list of CHAI's publications since we began operating in 2016. Many of our publications are collaborations with other AI research groups; we view collaborations as key to integrating our perspectives into mainstream AI research.
1. Overviews
1.1. Books
- Stuart Russell. 2021. Human-Compatible Artificial Intelligence. Human-Like Machine Intelligence
- Stuart Russell. 2020. Artificial Intelligence: A Modern Approach (Textbook, 4th Edition). Pearson
- Stuart Russell. 2019. Human Compatible: Artificial Intelligence and The Problem of Control. Penguin Random House
- Joseph Y. Halpern. 2016. Actual Causality (Book). MIT Press
1.2. Overviews of societal-scale risks from AI
- McKane Andrus, Sarah Dean, Thomas Krendl Gilbert, Nathan Lambert, Tom Zick. 2021. AI Development for the Public Interest: From Abstraction Traps to Sociotechnical Risks.
- Simon Zhuang, Dylan Hadfield-Menell. 2021. Consequences of Misaligned AI. NeurIPS 2020
- Raja Chatila, Virginia Dignum, Michael Fisher, Fosca Giannotti, Katharina Morik, Stuart Russell, Karen Yeung. 2021. Trustworthy AI. Reflections on Artificial Intelligence for Humanity
- Stuart Russell. 2021. The history and future of AI. Oxford Review of Economic Policy
- Jonathan Stray. 2021. Beyond Engagement: Aligning Algorithmic Recommendations With Prosocial Goals.
- Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt. 2021. Unsolved Problems in ML Safety.
- Andrew Critch, David Krueger. 2020. AI Research Considerations for Human Existential Safety (ARCHES). (Preprint)
- Olaf Groth, Mark Nitzberg. 2018. Solomon’s Code: Humanity in a World of Thinking Machines. Pegasus Books
- Stuart Russell. 2018. The new weapons of mass destruction?. The Security Times
- Stuart Russell. Artificial Intelligence and the Problem of Control. Perspectives on Digital Humanism
1.3. Overviews of beneficial AI applications
- Jocelyn Maclure, Stuart Russell. 2021. AI for Humanity: The Global Challenges. Reflections on Artificial Intelligence for Humanity
2. Core topics
2.1. Foundations of rational agency & causality
- Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell. 2021. Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism.
- David Silver, Satinder Singh, Doina Precup, and Richard Sutton. 2021. Reward is Enough. Artificial Intelligence 2021
- David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, and Satinder Singh. 2021. On the Expressivity of Markov Reward. NeurIPS 2021
- Christopher Grimm, Andre Barreto, Gregory Farquhar, David Silver, and Satinder Singh. 2021. Proper Value Equivalence.
- Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh. 2021. Reward is Enough for Convex MDPs.
- Theodore R. Sumers, Robert D. Hawkins, Mark K. Ho, Thomas L. Griffiths. 2021. Extending rational models of communication from beliefs to actions.
- Smitha Milli, Luca Belli, Moritz Hardt. 2021. Causal Inference Struggles with Agency on Online Platforms.
- Sander Beckers, Frederick Eberhardt, Joseph Y Halpern. 2020. Approximate Causal Abstractions. PMLR
- Dalal Alrajeh, Hana Chockler, Joseph Y Halpern. 2020. Combining experts’ causal judgments. AAAI; Elsevier
- Andrew Critch. 2019. A Parametric, Resource-Bounded Generalization of Löb’s Theorem, and a Robust Cooperation Criterion for Open-Source Game Theory. The Journal of Symbolic Logic, Cambridge University Press
- Joseph Y. Halpern, Evan Piermont. 2019. Partial Awareness. AAAI 2019
- Joseph Y. Halpern, Rafael Pass. 2019. A Conceptually Well-Founded Characterization of Iterated Admissibility Using an “All I Know” Operator. TARK 2019
- Sander Beckers, Frederick Eberhardt, Joseph Y. Halpern. 2019. Approximate Causal Abstraction. UAI 2019
- Sander Beckers, Joseph Y. Halpern. 2019. Abstracting causal models. AAAI 2019
- Joseph Y. Halpern. 2018. A Note on the Existence of Ratifiable Acts. Review of Symbolic Logic
- Meir Friedenberg, Joseph Y. Halpern. 2018. Combining the Causal Judgments of Experts with Possibly Different Focus Areas. International Conference on Principles of Knowledge Representation and Reasoning
- Gadi Aleksandrowicz, Hana Chockler, Joseph Y. Halpern, Alexander Ivrii. 2017. The Computational Complexity of Structure-Based Causality. JAIR
- Joseph Y. Halpern. 2016. Sufficient Conditions for Causality to be Transitive. Philosophy of Science, 83, 213–226
- Tom Everitt, Daniel Filan, Mayank Daswani, Marcus Hutter. 2016. Self-Modification of Policy and Utility Function in Rational Agents. AGI 2016
2.2. Value alignment and inverse reinforcement learning
- Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike. 2021. Quantifying Differences in Reward Functions. ICLR 2021
- Smitha Milli, Luca Belli, Moritz Hardt. 2021. From Optimizing Engagement to Measuring Value. FAccT 2021
- Vael Gates, Frederick Callaway, Mark K Ho, Tom Griffiths. 2021. A rational model of people’s inferences about others’ preferences based on response times.
- David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan. 2021. Learning What To Do by Simulating the Past. ICLR 2021
- Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, Jacob Steinhardt. 2021. Agnostic Learning with Unknown Utilities. ITCS 2021
- Cassidy Laidlaw, Stuart Russell. 2021. Uncertain Decisions Facilitate Better Preference Learning. NeurIPS 2021
- Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan. 2021. The MineRL BASALT Competition on Learning from Human Feedback. NeurIPS 2021
- Kimin Lee, Laura Smith, Pieter Abbeel. 2021. PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training. ICML 2021
- Xiaofei Wang, Kimin Lee, Kourosh Hakhamaneshi, Pieter Abbeel, Michael Laskin. 2021. Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback.
- Olivia Watkins, Abhishek Gupta, Trevor Darrell, Pieter Abbeel, Jacob Andreas. 2021. Teachable Reinforcement Learning via Advice Distillation. NeurIPS 2021
- Kimin Lee, Laura Smith, Anca Dragan, Pieter Abbeel. 2021. B-Pref: Benchmarking Preference-Based Reinforcement Learning. NeurIPS 2021
- Dylan P. Losey, Andrea Bajcsy, Marcia K. O’Malley, Anca D. Dragan. 2021. Physical interaction as communication: Learning robot objectives online from human corrections.
- Daniel S. Brown, Jordan Schneider, Anca D. Dragan, Scott Niekum. 2021. Value Alignment Verification. ICML 2021
- Avik Jain, Lawrence Chan, Daniel S. Brown, Anca D. Dragan. 2021. Optimal Cost Design for Model Predictive Control. L4DC 2021
- Andreea Bobu, Marius Wiggert, Claire Tomlin, Anca D. Dragan. 2021. Feature Expansive Reward Learning: Rethinking Human Input.
- Micah Carroll, Dylan Hadfield-Menell, Stuart Russell, Anca Dragan. 2021. Estimating and Penalizing Preference Shift in Recommender Systems. RecSys 2021
- Arnaud Fickinger, Samuel Cohen, Stuart Russell, Brandon Amos. 2021. Cross-Domain Imitation Learning via Optimal Transport.
- Dan Hendrycks, Mantas Mazeika, Andy Zou, Sahil Patel, Christine Zhu, Jesus Navarro, Dawn Song, Bo Li, Jacob Steinhardt. 2021. What Would Jiminy Cricket Do? Towards Agents That Behave Morally. NeurIPS 2021
- Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah. 2021. An Empirical Investigation of Representation Learning for Imitation. NeurIPS 2021
- Justin Svegliato, Samer B Nashed, Shlomo Zilberstein. 2021. Ethically compliant sequential decision making.
- Samer B Nashed, Justin Svegliato, Shlomo Zilberstein. 2021. Ethically compliant planning within moral communities.
- Justin Svegliato. 2021. Building efficient, reliable, and ethical autonomous systems.
- Zhao Mandi, Fangchen Liu, Kimin Lee, Pieter Abbeel. 2021. Towards More Generalizable One-shot Visual Imitation Learning.
- Sam Toyer, Rohin Shah, Andrew Critch, Stuart Russell. 2020. The MAGICAL Benchmark for Robust Imitation. NeurIPS 2020
- Alexander Matt Turner, Dylan Hadfield-Menell, Prasad Tadepalli. 2020. Conservative agency via attainable utility preservation. AIES 2020
- Andreea Bobu, Dexter R.R. Scobee, Jaime F. Fisac, S. Shankar Sastry, Anca D. Dragan. 2020. LESS is More: Rethinking Probabilistic Models of Human Behavior. HRI 2020
- Dylan Hadfield-Menell, Gillian K. Hadfield. 2020. Incomplete Contracting and AI Alignment. AIES 2020
- Gokul Swamy, Siddharth Reddy, Sergey Levine, Anca D. Dragan. 2020. Scaled Autonomy: Enabling Human Operators to Control Robot Fleets. ICRA 2020
- Rachel Freedman, Jana Schaich Borg, Walter Sinnott-Armstrong, John P. Dickerson, Vincent Conitzer. 2020. Adapting a kidney exchange algorithm to align with human values. Artificial Intelligence, 283
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2020. SQIL: Imitation Learning via Regularized Behavioral Cloning. ICLR 2020
- Smitha Milli, Pieter Abbeel, Igor Mordatch. 2020. Interpretable and Pedagogical Examples. (Preprint)
- Eric J. Michaud, Adam Gleave, Stuart Russell. 2020. Understanding Learned Reward Functions. Deep RL Workshop, NeurIPS 2020
- Pedro Freire, Adam Gleave, Sam Toyer, Stuart Russell. 2020. DERAIL: Diagnostic Environments for Reward And Imitation Learning. Deep RL Workshop, NeurIPS 2020
- Rachel Freedman, Rohin Shah, Anca Dragan. 2020. Choice Set Misspecification in Reward Inference. IJCAI-PRICAI-20 Workshop on Artificial Intelligence Safety
- Rachel Freedman. 2020. Aligning with Heterogeneous Preferences for Kidney Exchange. IJCAI-PRICAI-20 Workshop on Artificial Intelligence Safety
- Michele Fedrizzi, Nino Civolani, Andrew Critch. 2020. Inconsistency evaluation in pairwise comparison using norm-based distances. Decisions in Economics and Finance
- Kush Bhatia, Ashwin Pananjady, Peter L. Bartlett, Anca D. Dragan, Martin J. Wainwright. 2020. Preference learning along multiple criteria: A game-theoretic perspective. NeurIPS 2020
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2020. SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards. ICLR 2020
- Anna N. Rafferty, Rachel Jansen, Thomas L. Griffiths. 2020. Assessing Mathematics Misunderstandings via Bayesian Inverse Planning. Cognitive Science
- Jonathan Stray, Steven Adler, Dylan Hadfield-Menell. 2020. What are you optimizing for? Aligning Recommender Systems with Human Values. ICML 2020
- Theodore R. Sumers, Mark K. Ho, Robert D. Hawkins, Karthik Narasimhan, Thomas L. Griffiths. 2020. Learning Rewards from Linguistic Feedback.
- Andreea Bobu, Andrea Bajcsy, Jaime F. Fisac, Sampada Deglurkar, Anca D. Dragan. 2019. Quantifying Hypothesis Space Misspecification in Learning from Human-Robot Demonstrations and Physical Corrections. IEEE Transactions on Robotics
- Dmitrii Krasheninnikov, Rohin Shah, Herke van Hoof. 2019. Combining reward information from multiple sources. NeurIPS 2019 Learning with Rich Experience Workshop
- Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto. 2019. Hierarchically Decoupled Imitation for Morphological Transfer. (Preprint)
- Dylan Hadfield-Menell, McKane Andrus, Gillian Hadfield. 2019. Legible Normativity for AI Alignment: The Value of Silly Rules. AIES 2019
- Hong Jun Jeon, Smitha Milli, Anca D. Dragan. 2019. Reward-rational (implicit) choice: A unifying formalism for reward learning. (Preprint)
- Jason Y. Zhang, Anca D. Dragan. 2019. Learning from Extrapolated Corrections. ICRA 2019
- Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, Chelsea Finn. 2019. Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. ICML 2019
- Lawrence Chan, Dylan Hadfield-Menell, Siddhartha Srinivasa, Anca Dragan. 2019. The Assistive Multi-Armed Bandit. HRI 2019
- Matthew Rahtz, James Fang, Anca D. Dragan, Dylan Hadfield-Menell. 2019. An Extensible Interactive Interface for Agent Design. ICML 2019 Human-in-the-Loop Learning Workshop
- Mayank Agrawal, Joshua C. Peterson, Thomas L. Griffiths. 2019. Using Machine Learning to Guide Cognitive Modeling: A Case Study in Moral Reasoning. CogSci 2019
- Ori Plonsky, Reut Apel, Eyal Ert, Moshe Tennenholtz, David Bourgin, Joshua C. Peterson, Daniel Reichman, Thomas L. Griffiths, Stuart J. Russell, Evan C. Carter, James F. Cavanagh, Ido Erev. 2019. Predicting human decisions with behavioral theories and machine learning. (Preprint)
- Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan. 2019. Preferences Implicit in the State of the World. ICLR 2019
- Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca D. Dragan. 2019. On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference. ICML 2019
- Sandy H. Huang, Isabella Huang, Ravi Pandya, Anca D. Dragan. 2019. Nonverbal Robot Feedback for Human Teachers. CoRL 2019
- Siddharth Reddy, Anca D. Dragan, Sergey Levine, Shane Legg, Jan Leike. 2019. Learning Human Objectives by Evaluating Hypothetical Behavior. (Preprint)
- Smitha Milli, Anca D. Dragan. 2019. Literal or Pedagogic Human? Analyzing Human Model Misspecification in Objective Learning. UAI 2019
- Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, Sergey Levine. 2019. Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow. ICLR 2019
- Aaron Tucker, Adam Gleave, Stuart Russell. 2018. Inverse reinforcement learning for video games. NeurIPS 2018 Deep RL Workshop
- Adam Gleave, Oliver Habryka. 2018. Multi-task Maximum Entropy Inverse Reinforcement Learning. ICML 2018 Goals RL Workshop
- Chandrayee Basu, Mukesh Singhal, Anca D. Dragan. 2018. Learning from Richer Human Guidance: Augmenting Comparison-Based Learning with Feature Queries. HRI 2018
- Chris Cundy, Daniel Filan. 2018. Exploring Hierarchy-Aware Inverse Reinforcement Learning. Unpublished (ICML 2018 Goals RL Workshop)
- Dhruv Malik, Malayandi Palaniappan, Jaime F. Fisac, Dylan Hadfield-Menell, Stuart Russell, Anca D. Dragan. 2018. An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning. ICML 2018
- Ellis Ratner, Dylan Hadfield-Menell, Anca D. Dragan. 2018. Simplifying Reward Design through Divide-and-Conquer. RSS 2018
- Nicholas C. Landolfi, Anca D. Dragan. 2018. Social Cohesion in Autonomous Driving. IROS 2018
- Sören Mindermann, Rohin Shah, Adam Gleave, Dylan Hadfield-Menell. 2018. Active Inverse Reward Design. ICML 2018 Goals RL Workshop
- Zeyu Zheng, Junhyuk Oh, Satinder Singh. 2018. On Learning Intrinsic Rewards for Policy Gradient Methods. NeurIPS 2018
- Dorsa Sadigh, Anca Dragan, S. Shankar Sastry, Sanjit Seshia. 2017. Active Preference-Based Learning of Reward Functions. RSS 2017
- Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan. 2017. Inverse Reward Design. NeurIPS 2017
- Jaime F. Fisac, Monica A. Gates, Jessica B. Hamrick, Chang Liu, Dylan Hadfield-Menell, Malayandi Palaniappan, Dhruv Malik, S. Shankar Sastry, Thomas L. Griffiths, Anca D. Dragan. 2017. Pragmatic-Pedagogic Value Alignment. ISRR 2017
- Kareem Amin, Nan Jiang, Satinder Singh. 2017. Repeated Inverse Reinforcement Learning. NeurIPS 2017
- Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell. 2016. Cooperative Inverse Reinforcement Learning. NeurIPS 2016
2.3. Human-robot cooperation
- Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, Anca D. Dragan, Rohin Shah. 2021. Evaluating the Robustness of Collaborative Agents.
- Andrea Bajcsy, Somil Bansal, Ellis Ratner, Claire J. Tomlin, Anca D. Dragan. 2021. A Robust Control Framework for Human Motion Prediction. IEEE Robotics and Automation Letters
- Siddharth Srivastava. 2021. Unifying Principles and Metrics for Safe and Assistive AI. AAAI 2021
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2021. Pragmatic Image Compression for Human-in-the-Loop Decision-Making.
- Liting Sun, Xiaogang Jia, Anca D. Dragan. 2021. On complementing end-to-end human behavior predictors with planning.
- Andrea Bajcsy, Anand Siththaranjan, Claire J. Tomlin, Anca D. Dragan. 2021. Analyzing Human Models that Adapt Online.
- Arjun Sripathy, Andreea Bobu, Daniel S. Brown, Anca D. Dragan. 2021. Dynamically Switching Human Prediction Models for Efficient Planning. ICRA 2021
- Matthew Zurek, Andreea Bobu, Daniel S. Brown, Anca D. Dragan. 2021. Situational Confidence Assistance for Lifelong Shared Autonomy. ICRA 2021
- Jensen Gao, Siddharth Reddy, Glen Berseth, Nicholas Hardy, Nikhilesh Natraj, Karunesh Ganguly, Anca Dragan, Sergey Levine. 2021. X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback. ICLR 2021
- Arnaud Fickinger, Simon Zhuang, Dylan Hadfield-Menell, Stuart Russell. 2020. Multi-Principal Assistance Games.
- David Fridovich-Keil, Ellis Ratner, Lasse Peters, Anca D. Dragan, Claire J. Tomlin. 2020. Efficient Iterative Linear-Quadratic Approximations for Nonlinear Multi-Player General-Sum Differential Games. ICRA 2020
- Somil Bansal, Andrea Bajcsy, Ellis Ratner, Anca D. Dragan, Claire J. Tomlin. 2020. A Hamilton-Jacobi Reachability-Based Framework for Predicting and Analyzing Human Motion for Safe Planning. ICRA 2020
- Vael Gates, Thomas L. Griffiths, Anca D. Dragan. 2020. How to Be Helpful to Multiple People at Once. Cognitive Science 44(6)
- Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell. 2020. Benefits of Assistance over Reward Learning. NeurIPS 2020 Workshop on Cooperative AI
- Arnaud Fickinger, Simon Zhuang, Andrew Critch, Dylan Hadfield-Menell, Stuart Russell. 2020. Multi-Principal Assistance Games: Definition and Collegial Mechanisms. Cooperative AI Workshop, NeurIPS 2020
- Yuqing Du, Stas Tiomkin, Emre Kiciman, Daniel Polani, Pieter Abbeel, Anca D. Dragan. 2020. AvE: Assistance via Empowerment. NeurIPS 2020
- Andrew Critch, Stuart Russell. 2019. Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making. AIES 2019
- Elis Stefansson, Jaime F. Fisac, Dorsa Sadigh, S. Shankar Sastry, Karl H. Johansson. 2019. Human-robot interaction for truck platooning using hierarchical dynamic games. European Control Conference 2019
- Micah Carroll, Rohin Shah, Mark Ho, Thomas Griffiths, Sanjit Seshia, Pieter Abbeel, Anca Dragan. 2019. On the Utility of Learning about Humans for Human-AI Coordination. NeurIPS 2019
- Rohan Choudhury, Gokul Swamy, Dylan Hadfield-Menell, Anca D. Dragan. 2019. On the Utility of Model Learning in HRI. HRI 2019
- Sarath Sreedharan, Siddharth Srivastava, David Smith, Subbarao Kambhampati. 2019. Why Can’t You Do That, HAL? Explaining Unsolvability of Planning Tasks. IJCAI 2019
- Shihui Li, Yi Wu, Xinyue Cui, Honghua Dong, Fei Fang, Stuart Russell. 2019. Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient. AAAI 2019
- Aaron Bestick, Ravi Pandya, Ruzena Bajcsy, Anca D. Dragan. 2018. Learning Human Ergonomic Preferences for Handovers. ICRA 2018
- Allan Zhou, Anca D. Dragan. 2018. Cost Functions for Robot Motion Style. IROS 2018
- Andrea Bajcsy, Dylan P. Losey, Marcia K. O'Malley, Anca D. Dragan. 2018. Learning from Physical Human Corrections, One Feature at a Time. HRI 2018
- David Fridovich-Keil, Andrea Bajcsy, Jaime F. Fisac, Sylvia L. Herbert, Steven Wang, Anca D. Dragan, Claire J. Tomlin. 2018. Confidence-aware motion prediction for real-time collision avoidance. International Journal of Robotics Research
- Dorsa Sadigh, Nick Landolfi, Shankar S. Sastry, Sanjit A. Seshia, Anca D. Dragan. 2018. Planning for cars that coordinate with people: leveraging effects on human actions for planning and active information gathering over human internal state. Autonomous Robots
- Jaime F. Fisac, Andrea Bajcsy, Sylvia L. Herbert, David Fridovich-Keil, Steven Wang, Claire J. Tomlin, Anca D. Dragan. 2018. Probabilistically Safe Robot Planning with Confidence-Based Human Predictions. RSS 2018
- Liting Sun, Wei Zhan, Masayoshi Tomizuka, Anca D. Dragan. 2018. Courteous Autonomous Cars. IROS 2018
- Minae Kwon, Sandy H. Huang, Anca D. Dragan. 2018. Expressing Robot Incapability. HRI 2018
- Sandy H. Huang, Kush Bhatia, Pieter Abbeel, Anca D. Dragan. 2018. Establishing Appropriate Trust via Critical States. IROS 2018
- Shun Zhang, Edmund H. Durfee, Satinder P. Singh. 2018. Minimax-regret querying on side effects for safe optimality in factored Markov decision processes. IJCAI 2018
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2018. Shared Autonomy via Deep Reinforcement Learning. RSS 2018
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2018. Where Do You Think You’re Going?: Inferring Beliefs about Dynamics from Behavior. NeurIPS 2018
- Allan Zhou, Dylan Hadfield-Menell, Anusha Nagabandi, Anca Dragan. 2017. Expressive Robot Motion Timing. HRI 2017
- Chandrayee Basu, Qian Yang, David Hungerman, Mukesh Singhal, Anca Dragan. 2017. Do You Want Your Autonomous Car to Drive Like You?. HRI 2017
- Chang Liu, Jessica B. Hamrick, Jaime F. Fisac, Anca D. Dragan, J. Karl Hedrick, S. Shankar Sastry, Thomas L. Griffiths. 2017. Goal Inference Improves Objective and Perceived Performance in Human-Robot Collaboration. AAMAS 2017
- Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell. 2017. The Off-Switch Game. IJCAI 2017
- Michael Laskey, Caleb Chuck, Jonathan Lee, Jeffrey Mahler, Sanjay Krishnan, Kevin Jamieson, Anca Dragan, Ken Goldberg. 2017. Comparing Human-Centric and Robot-Centric Sampling for Robot Deep Learning from Demonstrations. ICRA 2017
- Sandy H. Huang, David Held, Pieter Abbeel, Anca Dragan. 2017. Enabling Robots to Communicate their Objectives. RSS 2017
- Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, Stuart Russell. 2017. Should Robots be Obedient?. IJCAI 2017
- Aaron Bestick, Ruzena Bajcsy, Anca Dragan. 2016. Implicitly Assisting Humans to Choose Good Grasps in Robot to Human Handovers. 2016 International Symposium on Experimental Robotics
- Dorsa Sadigh, S. Shankar Sastry, Sanjit A. Seshia, Anca Dragan. 2016. Information Gathering Actions Over Human Internal State. IROS 2016
- Dorsa Sadigh, Shankar Sastry, Sanjit Seshia, Anca Dragan. 2016. Planning for Autonomous Cars that Leverage Effects on Human Actions. RSS 2016
- Jaime F. Fisac, Anayo K. Akametalu, Melanie N. Zeilinger, Shahab Kaynama, Jeremy Gillula, Claire J. Tomlin. 2016. A General Safety Framework for Learning-Based Control in Uncertain Robotic Systems. IEEE Transactions on Automatic Control
- Jaime F. Fisac, Chang Liu, Jessica B. Hamrick, S. Shankar Sastry, J. Karl Hedrick, Thomas L. Griffiths, Anca D. Dragan. 2016. Generating Plans that Predict Themselves. CDC 2016
- Negar Mehr, Roberto Horowitz, Anca Dragan. 2016. Inferring and Assisting with Constraints in Shared Autonomy. CDC 2016
2.4. Multi-agent perspectives and applications
- Xintong Wang, David M Pennock, Nikhil R Devanur, David M Rothschild, Biaoshuai Tao, Michael P Wellman. 2021. Designing a Combinatorial Financial Options Market.
- Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell. 2021. Accumulating Risk Capital Through Investing in Cooperation. AAMAS 2021
- Scott Emmons, Caspar Oesterheld, Andrew Critch, Vince Conitzer, Stuart Russell. 2021. Symmetry, Equilibria, and Robustness in Common-Payoff Games. GAIW 2021
- Jonathan Stray. 2021. Designing Recommender Systems to Depolarize.
- Katherine Mayo, Shaily Fozdar, Michael P. Wellman. 2021. An Agent-Based Model of Strategic Adoption of Real-Time Payments.
- Max Olan Smith, Thomas Anthony, Michael P Wellman. 2021. Iterative Empirical Game Solving via Single Policy Best Response. ICLR 2021
- Xintong Wang, Christopher Hoang, Yevgeniy Vorobeychik, Michael P Wellman. 2021. Spoofing the Limit Order Book: A Strategic Agent-Based Analysis. Games 2021
- Yongzhao Wang, Qiurui Ma, Michael P Wellman. 2021. Evaluating Strategy Exploration in Empirical Game-Theoretic Analysis.
- Zun Li, Michael P Wellman. 2021. Evolution Strategies for Approximate Solution of Bayesian Games. AAAI 2021
- Katherine Mayo, Michael P Wellman. 2021. A Strategic Analysis of Portfolio Compression. AAMAS 2021
- Megan Shearer, David Byrd, Tucker Hybinette Balch, Michael P Wellman. 2021. Stability Effects of Arbitrage in Exchange Traded Funds: An Agent-Based Model. ICAIF 2021
- Stephen McAleer, John Lanier, Michael Dennis, Pierre Baldi, Roy Fox. 2021. Improving Social Welfare While Preserving Autonomy via a Pareto Mediator.
- Johannes Treutlein, Michael Dennis, Caspar Oesterheld, Jakob Foerster. 2021. A New Formalism, Method and Open Issues for Zero-Shot Coordination. PMLR 2021
- Jialu Bao, Kun He, Xiaodong Xin, Bart Selman, John E. Hopcroft. 2020. Hidden Community Detection on Two-layer Stochastic Models: a Theoretical Perspective. (Preprint, submitted to TAMC 2020)
- Raphael Köster, Dylan Hadfield-Menell, Gillian K. Hadfield, Joel Z. Leibo. 2020. Silly rules improve the capacity of agents to learn stable enforcement and compliance behaviors. AAMAS 2020
- Robert D. Hawkins, Noah D. Goodman, Adele E. Goldberg, Thomas L. Griffiths. 2020. Generalizing meanings from partners to populations: Hierarchical inference supports convention formation on networks. CogSci 2020
- Stefano V. Albrecht, Peter Stone, Michael P. Wellman. 2020. Special issue on autonomous agents modelling other agents: Guest editorial. Artificial Intelligence 285
- Valerio Capraro, Joseph Y Halpern. 2020. Translucent players: Explaining cooperative behavior in social dilemmas. Rationality and Society 31(4), 371-408
- Zun Li, Michael P. Wellman. 2020. Structure Learning for Approximate Solution of Many-Player Games. AAAI 2020
- Max Olan Smith, Thomas Anthony, Yongzhao Wang, Michael P Wellman. 2020. Learning to play against any mixture of opponents.
- Michael Chang, Sid Kaushik, S. Matthew Weinberg, Tom Griffiths, Sergey Levine. 2020. Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions. ICML 2020
- Qi Zhang, Edmund H. Durfee, Satinder Singh. 2020. Efficient Querying for Cooperative Probabilistic Commitments.
- Anagha Kulkarni, Siddharth Srivastava, Subbarao Kambhampati. 2019. A unified framework for planning in adversarial and cooperative environments. AAAI 2019
- Arunesh Sinha, Michael P. Wellman. 2019. Incentivizing Collaboration in a Competition. AAMAS 2019
- Ittai Abraham, Danny Dolev, Ivan Geffner, Joseph Y. Halpern. 2019. Implementing Mediators with Asynchronous Cheap Talk. PODC 2019
- Ittai Abraham, Danny Dolev, Joseph Y. Halpern. 2019. Distributed Protocols for Leader Election: A Game-Theoretic Perspective. ACM Transactions on Economics and Computation 7(1)
- Joseph Y. Halpern, Rafael Pass. 2019. Sequential equilibrium in computational games. ACM Transactions on Economics and Computation
- Joseph Y. Halpern, Rafael Pass, Daniel Reichman. 2019. On the Existence of Nash Equilibrium in Games with Resource-Bounded Players. SAGT 2019
- Joseph Y. Halpern, Rafael Pass, Lior Seeman. 2019. The truth behind the myth of the folk theorem. Games and Economic Behavior, 117
- Mark K. Ho, Joanna Korman, Thomas L. Griffiths. 2019. The Computational Structure of Unintentional Meaning. CogSci 2019
- Megan Shearer, Gabriel Rauterberg, Michael P. Wellman. 2019. An Agent-Based Model of Financial Benchmark Manipulation. ICML 2019
- Meir Friedenberg, Joseph Y. Halpern. 2019. Blameworthiness in Multi-Agent Settings. AAAI 2019
- Thanh H. Nguyen, Yongzhao Wang, Arunesh Sinha, Michael P. Wellman. 2019. Deception in finitely repeated security games. AAAI 2019
- Xintong Wang, Chris Hoang, Michael P. Wellman. 2019. Learning-Based Trading Strategies in the Face of Market Manipulation. ICML 2019 Workshop on AI in Finance
- Andrew Whalen, Thomas L. Griffiths, Daphna Buchsbaum. 2018. Sensitivity to Shared Information in Social Learning. Cognitive Science
- Bryce Wiedenbeck, Fengjun Yang, Michael P. Wellman. 2018. A Regression Approach for Modeling Games with Many Symmetric Players. AAAI 2018
- Joseph Y. Halpern, Rafael Pass. 2018. Game Theory with Translucent Players. International Journal of Game Theory
- Mason Wright and Michael P. Wellman. 2018. Evaluating the Stability of Non-Adaptive Trading in Continuous Double Auctions. AAMAS 2018
- Natasha Alechina, Joseph Y. Halpern, Ian A. Kash, Brian Logan. 2018. Incentive-Compatible Mechanisms for Norm Monitoring in Open Multi-Agent Systems. JAIR
- Nishant Desai, Andrew Critch, Stuart J. Russell. 2018. Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making. NeurIPS 2018
- Adam Bjorndahl, Joseph Y. Halpern, Rafael Pass. 2017. Reasoning about Rationality. Games and Economic Behavior 104, 146-164
- Joseph Y. Halpern, Rafael Pass, Lior Seeman. 2017. Computational Extensive-Form Games. EC 2016
- Michael Wellman, Eric Sodomka, Amy Greenwald. 2017. Self-confirming price-prediction strategies for simultaneous one-shot auctions. Games and Economic Behavior, 102, 339–372
- Natasha Alechina, Joseph Y. Halpern, Brian Logan. 2017. Causality, Responsibility and Blame in Team Plans. AAMAS 2017
- Joseph Y. Halpern, Xavier Vilaca. 2016. Rational Consensus (extended abstract). 2016 ACM Symposium on Principles of Distributed Computing
2.5. Models of bounded or imperfect rationality
- Bill Thompson and Thomas L. Griffiths. 2021. Human biases limit cumulative innovation.
- Ruairidh M. Battleday, Joshua C. Peterson, and Thomas L. Griffiths. 2021. From convolutional neural networks to models of higher-level cognition (and back again).
- Thomas A. Langlois, Nori Jacoby, Jordan W. Suchow, and Thomas L. Griffiths. 2021. Serial reproduction reveals the geometry of visuospatial representations. PNAS 2021
- Samarie Wilson, Somya Arora, Qiong Zhang, Thomas L. Griffiths. 2021. A Rational Account of Anchor Effects in Hindsight Bias.
- Sreejan Kumar, Ishita Dasgupta, Jonathan D. Cohen, Nathaniel D. Daw, and Thomas L. Griffiths. 2021. Meta-Learning of Structured Task Distributions in Humans and Machines. ICLR 2021
- Frederick Callaway, Antonio Rangel, Thomas L. Griffiths. 2021. Fixation patterns in simple choice reflect optimal information sampling.
- Falk Lieder, Owen X. Chen, Paul M. Krueger, Thomas L. Griffiths. 2020. Cognitive prostheses for goal achievement. Nature Human Behaviour 3:1096–1106
- Falk Lieder, Thomas L. Griffiths. 2020. Advancing rational analysis to the algorithmic level. Behavioral and Brain Sciences, 43, E27
- Frederick Callaway, Antonio Rangel, Tom Griffiths. 2020. Fixation patterns in simple choice are consistent with optimal use of cognitive resources. (Preprint)
- Mark K. Ho, David Abel, Jonathan D. Cohen, Michael L. Littman, Thomas L. Griffiths. 2020. The Efficiency of Human Cognition Reflects Planned Information Processing. AAAI 2020
- Smitha Milli, Falk Lieder, Tom Griffiths. 2020. A Rational Reinterpretation of Dual-Process Theories. UAI 2020
- Joseph Y Halpern, Evan Piermont. 2020. Dynamic Awareness.
- Xinming Liu, Joseph Halpern. 2020. Bounded Rationality in Las Vegas: Probabilistic Finite Automata Play Multi-Armed Bandits. PMLR
- Ida Momennejad, Jarrod Lewis-Peacock, Kenneth A Norman, Jonathan D Cohen, Satinder Singh, Richard L Lewis. 2020. Rational use of episodic and working memory: A normative account of prospective memory. Neuropsychologia
- Qiong Zhang, Kenneth A. Norman, Tom Griffiths. 2020. The method of loci is an optimal policy for memory search. CogSci 2020
- Rachel Jansen, Anna N. Rafferty, Tom Griffiths. 2020. A rational model of sequential self-assessment. CogSci 2020
- Carlos G. Correa, Mark K. Ho, Frederick Callaway, Tom Griffiths. 2020. Resource-rational Task Decomposition to Minimize Planning Costs. CogSci 2020
- Mark K. Ho, David Abel, Jonathan D. Cohen, Michael L. Littman, Thomas L. Griffiths. 2020. People Do Not Just Plan, They Plan to Plan. AAAI 2020
- Falk Lieder, Thomas L. Griffiths. 2019. Resource-rational analysis: understanding human cognition as the optimal use of limited computational resources. Behavioral and Brain Sciences, 43, E1
- Frederick Callaway, Tom Griffiths. 2019. Attention in value-based choice as optimal sequential sampling. (Preprint)
- Joshua Peterson, David Bourgin, Daniel Reichman, Thomas Griffiths, Stuart Russell. 2019. Cognitive model priors for predicting human decisions. ICML 2019
- Mark K. Ho, David Abel, Tom Griffiths, Michael L. Littman. 2019. The Value of Abstraction. Current Opinion in Behavioral Sciences, 29:111-116
- Ruairidh M. Battleday, Joshua C. Peterson, Thomas L. Griffiths. 2019. Capturing human categorization of natural images at scale by combining deep networks and cognitive models. (Preprint)
- Thomas L. Griffiths, Frederick Callaway, Michael B. Chang, Erin Grant, Paul M. Krueger, Falk Lieder. 2019. Doing more with less: meta-reasoning and meta-learning in humans and machines. Current Opinion in Behavioral Sciences 29: 24-30
- Falk Lieder, Amitai Shenhav, Sebastian Musslick, Thomas L. Griffiths. 2018. Rational metareasoning and the plasticity of cognitive control. PLoS Comp. Biol.
- Falk Lieder, Thomas L. Griffiths, Ming Hsu. 2018. Overrepresentation of extreme events in decision making reflects rational use of cognitive resources. Psychological Review
- Falk Lieder, Thomas L. Griffiths, Quentin J. M. Huys, Noah D. Goodman. 2018. Empirical evidence for resource-rational anchoring and adjustment. Psychonomic Bulletin & Review
- Falk Lieder, Thomas L. Griffiths, Quentin J. M. Huys, Noah D. Goodman. 2018. The anchoring bias reflects rational use of cognitive resources. Psychonomic Bulletin & Review
- Joseph Y. Halpern, Lior Seeman. 2018. Is state-dependent valuation more adaptive than simpler rules?. Behavioural Processes
- Amitai Shenhav, Sebastian Musslick, Falk Lieder, Wouter Kool, Thomas L Griffiths, Jonathan D Cohen, Matthew M Botvinick. 2017. Toward a Rational and Mechanistic Account of Mental Effort. Annual Review of Neuroscience, 40, 99-124
- Falk Lieder, Paul Krueger, Tom Griffiths. 2017. An automatic method for discovering rational heuristics for risky choice. CogSci 2017
- Smitha Milli, Falk Lieder, Tom Griffiths. 2017. When Does Bounded-Optimal Metareasoning Favor Few Cognitive Systems?. AAAI 2017
- Owain Evans, Andreas Stuhlmüller, John Salvatier, Daniel Filan. 2017. Modeling Agents with Probabilistic Programs.
- Nan Rong, Joseph Y. Halpern, Ashutosh Saxena. 2016. MDPs with Unawareness in Robotics. UAI 2016
3. Other topics
3.1. Adversarial training and testing
- Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, Angjoo Kanazawa. 2021. AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control. ACM Transactions on Graphics
- Cassidy Laidlaw, Sahil Singla, Soheil Feizi. 2021. Perceptual Adversarial Robustness: Defense Against Unseen Threat Models. ICLR 2021
- Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell. 2020. Adversarial Policies: Attacking Deep Reinforcement Learning. ICLR 2020
- Albert Zhan, Stas Tiomkin, Pieter Abbeel. 2020. Preventing Imitation Learning with Adversarial Policy Ensembles. ICLR 2020
- Marc Khoury, Dylan Hadfield-Menell. 2020. On the Geometry of Adversarial Examples. (Preprint)
- Xintong Wang, Michael P Wellman. 2020. Market Manipulation: An Adversarial Learning Framework for Detection and Evasion. 29th International Joint Conference on Artificial Intelligence
- Marc Khoury, Dylan Hadfield-Menell. 2019. Adversarial Training with Voronoi Constraints. (Preprint)
- Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, Dawn Song. 2019. Natural Adversarial Examples. CVPR 2021
3.2. AI capabilities, uncategorized
- Dan Hendrycks, Collin Burns, Anya Chen, Spencer Ball. 2021. CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review.
- Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt. 2021. Measuring Mathematical Problem Solving with the MATH Dataset.
- George Matheos, Alexander K. Lew, Matin Ghavamizadeh, Stuart Russell, Marco Cusumano-Towner, Vikash K. Mansinghka. 2021. Transforming Worlds: Automated Involutive MCMC for Open-Universe Probabilistic Models. Proc. 3rd Symposium on Advances in Approximate Bayesian Inference (AABI)
- Feiran Jia, Aditya Mate, Zun Li, Shahin Jabbari, Mithun Chakraborty, Milind Tambe, Michael Wellman, Yevgeniy Vorobeychik. 2021. A Game-Theoretic Approach for Hierarchical Policy-Making.
- Arnaud Fickinger, Hengyuan Hu, Brandon Amos, Stuart Russell, Noam Brown. 2021. Scalable Online Planning via Reinforcement Learning Fine-Tuning. NeurIPS 2021
- Hao Liu, Pieter Abbeel. 2021. Behavior From the Void: Unsupervised Active Pre-Training. NeurIPS 2021
- Adam Stooke, Kimin Lee, Pieter Abbeel, Michael Laskin. 2021. Decoupling Representation Learning from Reinforcement Learning. Proceedings of the 38th International Conference on Machine Learning
- Younggyo Seo, Lili Chen, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee. 2021. State Entropy Maximization with Random Encoders for Efficient Exploration. ICML 2021
- Roshan Rao, Jason Liu, Robert Verkuil, Joshua Meier, John F. Canny, Pieter Abbeel, Tom Sercu, Alexander Rives. 2021. MSA Transformer. bioRxiv
- Hao Liu, Pieter Abbeel. 2021. APS: Active Pretraining with Successor Features. ICML 2021
- Boyuan Chen, Pieter Abbeel, Deepak Pathak. 2021. Unsupervised Learning of Visual 3D Keypoints for Control. ICML 2021
- Ajay Jain, Matthew Tancik, Pieter Abbeel. 2021. Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis.
- Paras Jain, Ajay Jain, Tianjun Zhang, Pieter Abbeel, Joseph E. Gonzalez, Ion Stoica. 2021. Contrastive Code Representation Learning.
- Seunghyun Lee, Younggyo Seo, Kimin Lee, Pieter Abbeel, Jinwoo Shin. 2021. Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble. CoRL 2021
- Wenling Shang, Xiaofei Wang, Aravind Srinivas, Aravind Rajeswaran, Yang Gao, Pieter Abbeel, Michael Laskin. 2021. Reinforcement Learning with Latent Flow.
- Lili Chen, Kimin Lee, Aravind Srinivas, Pieter Abbeel. 2021. Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings.
- Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch. 2021. Decision Transformer: Reinforcement Learning via Sequence Modeling.
- Charles Packer, Pieter Abbeel, Joseph E. Gonzalez. 2021. Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL. NeurIPS 2021
- Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao. 2021. Mastering Atari Games with Limited Data. NeurIPS 2021
- Michael Laskin, Denis Yarats, Hao Liu, Kimin Lee, Albert Zhan, Kevin Lu, Catherine Cang, Lerrel Pinto, Pieter Abbeel. 2021. URLB: Unsupervised Reinforcement Learning Benchmark.
- Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch. 2021. Pretrained Transformers as Universal Computation Engines.
- Abdus Salam Azad, Edward Kim, Qiancheng Wu, Kimin Lee, Ion Stoica, Pieter Abbeel, Sanjit A. Seshia. 2021. Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments.
- Ellis Ratner, Andrea Bajcsy, Terrence Fong, Claire J. Tomlin, Anca D. Dragan. 2021. Efficient Dynamics Estimation With Adaptive Model Sets. IEEE Robotics and Automation Letters
- Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh. 2021. Discovery of Options via Meta-Learned Subgoals.
- Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard Lewis, Satinder Singh. 2021. Learning State Representations from Random Deep Action-Conditional Predictions. NeurIPS 2021
- Jonathan Stray. 2021. Making Algorithms Work for Reporting.
- Nemanja Djuric, Henggang Cui, Zhaoen Su, Shangxuan Wu, Huahua Wang, Fang-Chieh Chou, Luisa San Martin, Song Feng, Rui Hu, Yang Xu, Alyssa Dayan, Sidney Zhang, Brian C Becker, Gregory P Meyer, Carlos Vallespi-Gonzalez, Carl K Wellington. 2021. Multixnet: Multiclass multistage multimodal motion prediction.
- Arnaud Fickinger, Natasha Jaques, Samyak Parajuli, Michael Chang, Nicholas Rhinehart, Glen Berseth, Stuart Russell, Sergey Levine. 2021. Explore and Control with Adversarial Surprise.
- Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, Jacob Steinhardt. 2021. Measuring Coding Challenge Competence With APPS. NeurIPS 2021
- Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel. 2021. Replay-Guided Adversarial Environment Design. NeurIPS 2021
- Abhinav Bhatia, Justin Svegliato, Shlomo Zilberstein. 2021. On the benefits of randomly adjusting anytime weighted A*.
- Shane Parr, Ishan Khatri, Justin Svegliato, Shlomo Zilberstein. 2021. Agent-aware state estimation for autonomous vehicles.
- Connor Basich, Justin Svegliato, Allyson Beach, Kyle H. Wray, Stefan Witwicki, Shlomo Zilberstein. 2021. Improving Competence via Iterative State Space Refinement. IROS 2021
- Abhinav Bhatia, Justin Svegliato, Shlomo Zilberstein. 2021. Tuning the hyperparameters of anytime planning: A deep reinforcement learning approach.
- Hankook Lee, Kibok Lee, Kimin Lee, Honglak Lee, Jinwoo Shin. 2021. Improving Transferability of Representations via Augmentation-Aware Self-Supervision. NeurIPS 2021
- Paria Rashidinejad, Xiao Hu, Stuart Russell. 2020. Patient-adaptable intracranial pressure morphology analysis using a probabilistic model-based approach. Physiological Measurement
- Sam Toyer, Felipe Trevizan, Sylvie Thiebaux, Lexing Xie. 2020. ASNets: Deep Learning for Generalised Planning. JAIR
- Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt. 2020. Measuring Massive Multitask Language Understanding. ICLR 2021
- Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine. 2020. Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design. NeurIPS 2020
- Scott Emmons, Ajay Jain, Michael Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak. 2020. Sparse Graphical Memory for Robust Planning. NeurIPS 2020
- Thomas Krendl Gilbert, Andrew Loveridge. 2020. Subjectifying objectivity: Delineating tastes in theoretical quantum gravity research. Social Studies of Science
- Oliver Richardson, Joseph Y Halpern. 2020. Probabilistic Dependency Graphs. AAAI 2021
- Janarthanan Rajendran, Richard Lewis, Vivek Veeriah, Honglak Lee, Satinder Singh. 2020. How Should an Agent Practice?. AAAI 2020
- Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado Van Hasselt, David Silver, Satinder Singh. 2020. What Can Learned Intrinsic Rewards Capture?. ICML 2020
- Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian. 2019. Bayesian Relational Memory for Semantic Visual Navigation. ICCV 2019
- Prasad Tadepalli, Cameron Barrie, Stuart J. Russell. 2019. Learning Causal Trees with Latent Variables via Controlled Experimentation. AAAI 2019
- Junhyuk Oh, Yijie Guo, Satinder Singh, Honglak Lee. 2018. Self-Imitation Learning. ICML 2018
- Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel. 2018. Learning Plannable Representations with Causal InfoGAN. ICML 2018 Workshop on Planning and Learning
- Vivek Veeriah, Junhyuk Oh, Satinder Singh. 2018. Many-Goals Reinforcement Learning. (Preprint)
- Yi Wu, Siddharth Srivastava, Nicholas Hay, Simon Du, Stuart Russell. 2018. Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms. ICML 2018
- Han-Chin Shing, Suraj Nair, Ayah Zirikly, Meir Friedenberg, Hal Daumé III, Philip Resnik. 2018. Expert, Crowdsourced, and Machine Assessment of Suicide Risk via Online Postings. Workshop on Computational Linguistics and Clinical Psychology 2018
- Paul Krueger, Falk Lieder, Tom Griffiths. 2017. Enhancing metacognitive reinforcement learning using reward structures and feedback. CogSci 2017
3.3. Cognitive science, uncategorized
- Ethan A. Brooks, Janarthanan Rajendran, Richard L. Lewis, Satinder Singh. 2021. Reinforcement Learning of Implicit and Explicit Control Flow Instructions.
- Thomas A. Langlois, H. Charles Zhao, Erin Grant, Ishita Dasgupta, Thomas L. Griffiths, and Nori Jacoby. 2021. Passive Attention in Artificial Neural Networks Predicts Human Visual Selectivity. NeurIPS 2021
- Stephan C. Meylan, Sathvik Nair, Thomas L. Griffiths. 2021. Evaluating models of robust word recognition with serial reproduction. Cognition 2021
- Casey Lewry, Kaley Curtis, Nadya Vasilyeva, Fei Xu, Thomas L. Griffiths. 2021. Intuitions about magic track the development of intuitive physics. Cognition 2021
- Arjun Devraj, Qiong Zhang, Thomas L. Griffiths. 2021. The Dynamics of Exemplar and Prototype Representations Depend on Environmental Statistics.
- Ni Ji, Gurrein K Madan, Guadalupe I Fabre, Alyssa Dayan, Casey M Baker, Talya S Kramer, Ijeoma Nwabudike, Steven W Flavell. 2021. A neural circuit for flexible control of persistent behavioral states. eLife 2021
- Aditi Jha, Joshua Peterson, Thomas L. Griffiths. 2020. Extracting low-dimensional psychological representations from convolutional neural networks. CogSci 2020
- Alexander Todorov, Stefan Uddenberg, Joshua Peterson, Thomas Griffiths, Jordan Suchow. 2020. Data-Driven, Photorealistic Social Face-Trait Encoding, Prediction, and Manipulation Using Deep Neural Networks. Patent application
- Antonia Langenhoff, Alex Wiegmann, Joseph Y. Halpern, Joshua B. Tenenbaum, Tobias Gerstenberg. 2020. Predicting responsibility judgments from dispositional inferences and causal attributions. (Preprint)
- Mayank Agrawal, Joshua C. Peterson, Thomas L. Griffiths. 2020. Scaling up psychology via Scientific Regret Minimization. PNAS 2020
- R. Dubey, T. L. Griffiths. 2020. Reconciling novelty and complexity through a rational analysis of curiosity. Psychological Review, 127(3), 455–476
- Sophia Sanborn, Michael Chang, Sergey Levine, Thomas Griffiths. 2020. Sparse Skill Coding: Learning Behavioral Hierarchies with Sparse Codes. ICLR 2020 submission
- Thomas J. H. Morgan, Jordan W. Suchow, Thomas L. Griffiths. 2020. What the Baldwin Effect affects depends on the nature of plasticity. Cognition, 197
- Max Kleiman-Weiner, Felix Sosa, Bill Thompson, Sebastiaan van Opheusden, Tom Griffiths, Samuel Gershman, Fiery Cushman. 2020. Downloading Culture.zip: Social learning by program induction. CogSci 2020
- Anne S. Hsu, Jay B. Martin, Adam N. Sanborn, Thomas L. Griffiths. 2019. Identifying category representations for complex stimuli using discrete Markov chain Monte Carlo with people. Behavior Research Methods 51:1706–1716
- Mathew Hardy, Tom Griffiths. 2019. Demonstrating the Impact of Prior Knowledge in Risky Choice. (Preprint)
- Arnon Lotem, Joseph Y. Halpern, Shimon Edelman, Oren Kolodny. 2017. The evolution of cognitive mechanisms in response to cultural innovations. PNAS
- David Bourgin, Falk Lieder, Daniel Reichman, Nimrod Talmon, Tom Griffiths. 2017. The Structure of Goal Systems Predicts Human Performance. CogSci 2017
3.4. Ethics for AI and AI development
- Thomas Krendl Gilbert. 2021. Mapping the Political Economy of Reinforcement Learning Systems: The Case of Autonomous Vehicles. Simons Institute Newsletter
- Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask, Emma Bluemke, Jonathan Lebensold, Cullen O'Keefe, Mark Koren, Théo Ryffel, JB Rubinovitz, Tamay Besiroglu, Federica Carugati, Jack Clark, Peter Eckersley, Sarah de Haas, Maritza Johnson, Ben Laurie, Alex Ingerman, Igor Krawczuk, Amanda Askell, Rosario Cammarota, Andrew Lohn, David Krueger, Charlotte Stix, Peter Henderson, Logan Graham, Carina Prunkl, Bianca Martin, Elizabeth Seger, Noa Zilberman, Seán Ó hÉigeartaigh, Frens Kroeger, Girish Sastry, Rebecca Kagan, Adrian Weller, Brian Tse, Elizabeth Barnes, Allan Dafoe, Paul Scharre, Ariel Herbert-Voss, Martijn Rasser, Shagun Sodhani, Carrick Flynn, Thomas Krendl Gilbert, Lisa Dyer, Saif Khan, Yoshua Bengio, Markus Anderljung. 2020. Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims. (Preprint)
- Ravit Dotan, Smitha Milli. 2020. Value-laden Disciplinary Shifts in Machine Learning. (Preprint)
- Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt. 2020. Aligning AI With Shared Human Values. ICLR 2021
- John Miller, Smitha Milli, Moritz Hardt. 2019. Strategic Classification is Causal Modeling in Disguise. FAT* 2019
- McKane Andrus, Thomas Krendl Gilbert. 2019. Towards a Just Theory of Measurement: A Principled Social Measurement Assurance Program for Machine Learning. AIES 2019
- Roel Dobbe, Thomas Krendl Gilbert, Yonatan Mintz. 2019. Hard Choices in Artificial Intelligence: Addressing Normative Uncertainty through Sociotechnical Commitments. NeurIPS 2019
- Smitha Milli, John Miller, Anca D. Dragan, Moritz Hardt. 2019. The Social Cost of Strategic Classification. FAT* 2019
- Thomas Krendl Gilbert, Yonatan Mintz. 2019. Epistemic Therapy for Bias in Automated Decision-Making. AIES 2019
- Roel Dobbe, Sarah Dean, Thomas Gilbert, Nitin Kohli. 2018. A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics. FAT/ML 2018
3.5. Robust inference, learning, and planning
- Zaynah Javed, Daniel S. Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca D. Dragan, Ken Goldberg. 2021. Policy Gradient Bayesian Robust Optimization for Imitation Learning. ICML 2021
- Justin Svegliato, Connor Basich, Sandhya Saisubramanian and Shlomo Zilberstein. 2021. Using metareasoning to maintain and restore safety for reliable autonomy.
- Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, Justin Gilmer. 2020. The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization.
- Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan, Dawn Song. 2020. Pretrained Transformers Improve Out-of-Distribution Robustness. Association for Computational Linguistics (ACL)
- Dan Hendrycks, Norman Mu, Ekin D. Cubuk, Barret Zoph, Justin Gilmer, Balaji Lakshminarayanan. 2020. AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty. ICLR 2020
- Paria Rashidinejad, Jiantao Jiao, Stuart Russell. 2020. SLIP: Learning to predict in unknown dynamical systems with long-term memory.
- Dieqiao Feng, Carla P Gomes, Bart Selman. 2020. Solving hard AI planning instances using curriculum-driven deep reinforcement learning.
- Adam Stooke, Joshua Achiam, Pieter Abbeel. 2020. Responsive Safety in Reinforcement Learning by PID Lagrangian Methods. ICML 2020
- Jaime F. Fisac, Neil F. Lugovoy, Vicenç Rubies-Royo, Shromona Ghosh, Claire J. Tomlin. 2019. Bridging Hamilton-Jacobi Safety Analysis and Reinforcement Learning. ICRA 2019
- Karthika Mohan, Judea Pearl. 2019. Graphical Models for Processing Missing Data. JASA
- Kush Bhatia, Yi-An Ma, Anca D. Dragan, Peter L. Bartlett, Michael I. Jordan. 2019. Bayesian Robustness: A Nonasymptotic Viewpoint. (Preprint)
- Margaret P. Chapman, Jonathan Lacotte, Aviv Tamar, Donggun Lee, Kevin M. Smith, Victoria Cheng, Jaime F. Fisac, Susmit Jha, Marco Pavone, Claire J. Tomlin. 2019. A Risk-Sensitive Finite-Time Reachability Approach for Safety of Stochastic Dynamic Systems. American Control Conference (ACC) 2019
- Ruibo Tu, Cheng Zhang, Paul Ackermann, Karthika Mohan, Hedvig Kjellström, Kun Zhang. 2019. Causal Discovery in the Presence of Missing Data. AISTATS 2019
- Dan Hendrycks, Steven Basart, Mantas Mazeika, Mohammadreza Mostajabi, Jacob Steinhardt, Dawn Song. 2019. Scaling Out-of-Distribution Detection for Real-World Settings.
- Daniel Kang, Yi Sun, Dan Hendrycks, Tom Brown, Jacob Steinhardt. 2019. Testing robustness against unforeseen adversaries.
- Dan Hendrycks, Mantas Mazeika, Saurav Kadavath, Dawn Song. 2019. Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty. NeurIPS 2019
- Dan Hendrycks, Kimin Lee, Mantas Mazeika. 2019. Using Pre-Training Can Improve Model Robustness and Uncertainty. ICML 2019
- Dan Hendrycks, Thomas Dietterich. 2019. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. ICLR 2019
- Dan Hendrycks, Mantas Mazeika, Thomas Dietterich. 2019. Deep Anomaly Detection with Outlier Exposure. ICLR 2019
- Karthika Mohan. 2018. On Handling Self-masking and Other Hard Missing Data Problems. AAAI 2018
- Karthika Mohan, Felix Thoemmes, Judea Pearl. 2018. Estimation with Incomplete Data: The Linear Case. IJCAI 2018
- Si Liu, Risheek Garrepalli, Thomas G Dietterich, Alan Fern, Dan Hendrycks. 2018. Open Category Detection with PAC Guarantees. ICML 2018
- Dan Hendrycks, Mantas Mazeika, Duncan Wilson, Kevin Gimpel. 2018. Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise. NeurIPS 2018
3.6. Security problems and solutions
- Sushil Jajodia, George Cybenko, V. S. Subrahmanian, Vipin Swarup, Cliff Wang, Michael Wellman. 2020. Adaptive Autonomous Secure Cyber Systems. Springer/Nature Books
- Ivan Geffner, Joseph Y. Halpern. 2019. Security in Asynchronous Interactive Systems. (Preprint)
- Xinlei Pan, Weiyao Wang, Xiaoshuai Zhang, Bo Li, Jinfeng Yi, Dawn Song. 2019. How You Act Tells a Lot: Privacy-Leaking Attack on Deep Reinforcement Learning. AAMAS 2019
- Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, Michael P. Wellman. 2018. SoK: Security and Privacy in Machine Learning. IEEE European Symposium on Security and Privacy
3.7. Transparency & interpretability
- Daniel Filan, Stephen Casper, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell. 2021. Clusterability in Neural Networks.
- Pulkit Verma, Shashank Rao Marpally, Siddharth Srivastava. 2021. Asking the Right Questions: Learning Interpretable Action Models Through Query Answering.
- Olivia Watkins, Sandy Huang, Julius Frost, Kush Bhatia, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko, Anca Dragan. 2021. Explaining robot policies.
- Jonathan Stray. 2021. Show me the algorithm: Transparency in recommendation systems.
- Shlomi Hod, Stephen Casper, Daniel Filan, Cody Wild, Andrew Critch, Stuart Russell. 2021. Detecting Modularity in Deep Neural Networks.
- Daniel Filan, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell. 2019. Pruned Neural Networks are Surprisingly Modular. (Preprint, under review NeurIPS 2020)
- Smitha Milli, Ludwig Schmidt, Anca D. Dragan, Moritz Hardt. 2019. Model Reconstruction from Model Explanations. FAT* 2019
- Jacob Andreas, Anca Dragan, Dan Klein. 2017. Translating Neuralese. ACL 2017