Research
CHAI aims to reorient the foundations of AI research toward the development of provably beneficial systems. Currently, it is not possible to specify a formula for human values in any form that we know would provably benefit humanity, if that formula were instated as the objective of a powerful AI system. In short, any initial formal specification of human values is bound to be wrong in important ways. This means we need to somehow represent uncertainty in the objectives of AI systems. This way of formulating objectives stands in contrast to the standard model for AI, in which the AI system's objective is assumed to be known completely and correctly.
Therefore, much of CHAI's research effort to date has focussed on developing and communicating a new model of AI development, in which AI systems should be uncertain of their objectives, and should be deferential to humans in light of that uncertainty. However, our interests extend to a variety of other problems in the development of provably beneficial AI systems. Our areas of greatest focus so far have been the foundations of rational agency and causality, value alignment and inverse reinforcement learning, human-robot cooperation, multi-agent perspectives and applications, and models of bounded or imperfect rationality. Other areas of interest to our mission include adversarial training and testing for ML systems, various AI capabilities, topics in cognitive science, ethics for AI and AI development, robust inference and planning, security problems and solutions, and transparency and interpretability methods.
In addition to purely academic work, CHAI strives to produce intellectual outputs for general audiences. We also advise governments and international organizations on policies relevant to ensuring AI technologies will benefit society, and offer insight on a variety of individual-scale and societal-scale risks from AI, such as those pertaining to autonomous weapons, the future of employment, and public health and safety.
Below is a list of CHAI's publications since we began operating in 2016. Many of our publications are collaborations with other AI research groups; we view collaborations as key to integrating our perspectives into mainstream AI research.
1. Overviews
1.1. Books
- Joseph Y. Halpern. 2016. Actual Causality (Book). MIT Press
- Stuart Russell. 2019. Human Compatible: Artificial Intelligence and The Problem of Control. Penguin Random House
- Stuart Russell. 2020. Artificial Intelligence: A Modern Approach (Textbook, 4th Edition). Pearson
1.2. Overviews of societal-scale risks from AI
- Andrew Critch, David Krueger. 2020. AI Research Considerations for Human Existential Safety (ARCHES). (Preprint)
- Olaf Groth, Mark Nitzberg. 2018. Solomon’s Code: Humanity in a World with Thinking Machines. Pegasus Books
- Stuart Russell. 2018. The new weapons of mass destruction?. The Security Times
2. Core topics
2.1. Foundations of rational agency & causality
- Andrew Critch. 2019. A Parametric, Resource-Bounded Generalization of Löb’s Theorem, and a Robust Cooperation Criterion for Open-Source Game Theory. The Journal of Symbolic Logic, Cambridge University Press
- Gadi Aleksandrowicz, Hana Chockler, Joseph Y. Halpern, Alexander Ivrii. 2017. The Computational Complexity of Structure-Based Causality. JAIR
- Joseph Y. Halpern. 2016. Sufficient Conditions for Causality to be Transitive. Philosophy of Science, 83, 213-226
- Joseph Y. Halpern. 2018. A Note on the Existence of Ratifiable Acts. Review of Symbolic Logic
- Joseph Y. Halpern, Evan Piermont. 2019. Partial Awareness. AAAI 2019
- Joseph Y. Halpern, Rafael Pass. 2019. A Conceptually Well-Founded Characterization of Iterated Admissibility Using an “All I Know” Operator. TARK 2019
- Sander Beckers, Frederick Eberhardt, Joseph Y. Halpern. 2019. Approximate Causal Abstraction. UAI 2019
- Sander Beckers, Joseph Y. Halpern. 2019. Abstracting causal models. AAAI 2019
- Tom Everitt, Daniel Filan, Mayank Daswani, Marcus Hutter. 2016. Self-Modification of Policy and Utility Function in Rational Agents. AGI 2016
2.2. Value alignment and inverse reinforcement learning
- Aaron Tucker, Adam Gleave, Stuart Russell. 2018. Inverse reinforcement learning for video games. NeurIPS 2018 Deep RL Workshop
- Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike. 2020. Quantifying Differences in Reward Functions. (Preprint, under review NeurIPS 2020)
- Adam Gleave, Oliver Habryka. 2018. Multi-task Maximum Entropy Inverse Reinforcement Learning. ICML 2018 Goals RL Workshop
- Alexander Matt Turner, Dylan Hadfield-Menell, Prasad Tadepalli. 2020. Conservative agency via attainable utility preservation. AIES 2020
- Andreea Bobu, Andrea Bajcsy, Jaime F. Fisac, Sampada Deglurkar, Anca D. Dragan. 2019. Quantifying Hypothesis Space Misspecification in Learning from Human-Robot Demonstrations and Physical Corrections. IEEE Transactions on Robotics
- Andreea Bobu, Dexter R.R. Scobee, Jaime F. Fisac, S. Shankar Sastry, Anca D. Dragan. 2020. LESS is More: Rethinking Probabilistic Models of Human Behavior. HRI 2020
- Chandrayee Basu, Mukesh Singhal, Anca D. Dragan. 2018. Learning from Richer Human Guidance: Augmenting Comparison-Based Learning with Feature Queries. HRI 2018
- Chris Cundy, Daniel Filan. 2018. Exploring Hierarchy-Aware Inverse Reinforcement Learning. Unpublished (ICML 2018 Goals RL Workshop)
- Dhruv Malik, Malayandi Palaniappan, Jaime F. Fisac, Dylan Hadfield-Menell, Stuart Russell, Anca D. Dragan. 2018. An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning. ICML 2018
- Dmitrii Krasheninnikov, Rohin Shah, Herke van Hoof. 2019. Combining reward information from multiple sources. NeurIPS 2019 Learning with Rich Experience Workshop
- Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto. 2019. Hierarchically Decoupled Imitation for Morphological Transfer. (Preprint)
- Dorsa Sadigh, Anca Dragan, S. Shankar Sastry, Sanjit Seshia. 2017. Active Preference-Based Learning of Reward Functions. RSS 2017
- Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell. 2016. Cooperative Inverse Reinforcement Learning. NeurIPS 2016
- Dylan Hadfield-Menell, Gillian K. Hadfield. 2020. Incomplete Contracting and AI Alignment. AIES 2020
- Dylan Hadfield-Menell, McKane Andrus, Gillian Hadfield. 2019. Legible Normativity for AI Alignment: The Value of Silly Rules. AIES 2019
- Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan. 2017. Inverse Reward Design. NeurIPS 2017
- Ellis Ratner, Dylan Hadfield-Menell, Anca D. Dragan. 2018. Simplifying Reward Design through Divide-and-Conquer. RSS 2018
- Gokul Swamy, Siddharth Reddy, Sergey Levine, Anca D. Dragan. 2020. Scaled Autonomy: Enabling Human Operators to Control Robot Fleets. ICRA 2020
- Hong Jun Jeon, Smitha Milli, Anca D. Dragan. 2019. Reward-rational (implicit) choice: A unifying formalism for reward learning. (Preprint)
- Jaime F. Fisac, Monica A. Gates, Jessica B. Hamrick, Chang Liu, Dylan Hadfield-Menell, Malayandi Palaniappan, Dhruv Malik, S. Shankar Sastry, Thomas L. Griffiths, Anca D. Dragan. 2017. Pragmatic-Pedagogic Value Alignment. ISRR 2017
- Jason Y. Zhang, Anca D. Dragan. 2019. Learning from Extrapolated Corrections. ICRA 2019
- Kareem Amin, Nan Jiang, Satinder Singh. 2017. Repeated Inverse Reinforcement Learning. NeurIPS 2017
- Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, Chelsea Finn. 2019. Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. ICML 2019
- Lawrence Chan, Dylan Hadfield-Menell, Siddhartha Srinivasa, Anca Dragan. 2019. The Assistive Multi-Armed Bandit. HRI 2019
- Matthew Rahtz, James Fang, Anca D. Dragan, Dylan Hadfield-Menell. 2019. An Extensible Interactive Interface for Agent Design. ICML 2019 Human-in-the-Loop Learning Workshop
- Mayank Agrawal, Joshua C. Peterson, Thomas L. Griffiths. 2019. Using Machine Learning to Guide Cognitive Modeling: A Case Study in Moral Reasoning. CogSci 2019
- Nicholas C. Landolfi, Anca D. Dragan. 2018. Social Cohesion in Autonomous Driving. IROS 2018
- Ori Plonsky, Reut Apel, Eyal Ert, Moshe Tennenholtz, David Bourgin, Joshua C. Peterson, Daniel Reichman, Thomas L. Griffiths, Stuart J. Russell, Evan C. Carter, James F. Cavanagh, Ido Erev. 2019. Predicting human decisions with behavioral theories and machine learning. (Preprint)
- Rachel Freedman, Jana Schaich Borg, Walter Sinnott-Armstrong, John P. Dickerson, Vincent Conitzer. 2020. Adapting a kidney exchange algorithm to align with human values. Artificial Intelligence, 283
- Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan. 2019. Preferences Implicit in the State of the World. ICLR 2019
- Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca D. Dragan. 2019. On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference. ICML 2019
- Sandy H. Huang, Isabella Huang, Ravi Pandya, Anca D. Dragan. 2019. Nonverbal Robot Feedback for Human Teachers. CoRL 2019
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2020. SQIL: Imitation Learning via Regularized Behavioral Cloning. ICLR 2020
- Siddharth Reddy, Anca D. Dragan, Sergey Levine, Shane Legg, Jan Leike. 2019. Learning Human Objectives by Evaluating Hypothetical Behavior. (Preprint)
- Smitha Milli, Anca D. Dragan. 2019. Literal or Pedagogic Human? Analyzing Human Model Misspecification in Objective Learning. UAI 2019
- Smitha Milli, Pieter Abbeel, Igor Mordatch. 2020. Interpretable and Pedagogical Examples. (Preprint)
- Sören Mindermann, Rohin Shah, Adam Gleave, Dylan Hadfield-Menell. 2018. Active Inverse Reward Design. ICML 2018 Goals RL Workshop
- Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, Sergey Levine. 2019. Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow. ICLR 2019
- Zeyu Zheng, Junhyuk Oh, Satinder Singh. 2018. On Learning Intrinsic Rewards for Policy Gradient Methods. NeurIPS 2018
2.3. Human-robot cooperation
- Aaron Bestick, Ravi Pandya, Ruzena Bajcsy, Anca D. Dragan. 2018. Learning Human Ergonomic Preferences for Handovers. ICRA 2018
- Aaron Bestick, Ruzena Bajcsy, Anca Dragan. 2016. Implicitly Assisting Humans to Choose Good Grasps in Robot to Human Handovers. 2016 International Symposium on Experimental Robotics
- Allan Zhou, Anca D. Dragan. 2018. Cost Functions for Robot Motion Style. IROS 2018
- Allan Zhou, Dylan Hadfield-Menell, Anusha Nagabandi, Anca Dragan. 2017. Expressive Robot Motion Timing. HRI 2017
- Andrea Bajcsy, Dylan P. Losey, Marcia K. O'Malley, Anca D. Dragan. 2018. Learning from Physical Human Corrections, One Feature at a Time. HRI 2018
- Andrew Critch, Stuart Russell. 2019. Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making. AIES 2019
- Chandrayee Basu, Qian Yang, David Hungerman, Mukesh Singhal, Anca Dragan. 2017. Do You Want Your Autonomous Car to Drive Like You?. HRI 2017
- Chang Liu, Jessica B. Hamrick, Jaime F. Fisac, Anca D. Dragan, J. Karl Hedrick, S. Shankar Sastry, Thomas L. Griffiths. 2017. Goal Inference Improves Objective and Perceived Performance in Human-Robot Collaboration. AAMAS 2017
- David Fridovich-Keil, Andrea Bajcsy, Jaime F. Fisac, Sylvia L. Herbert, Steven Wang, Anca D. Dragan, Claire J. Tomlin. 2018. Confidence-aware motion prediction for real-time collision avoidance. International Journal of Robotics Research
- David Fridovich-Keil, Ellis Ratner, Lasse Peters, Anca D. Dragan, Claire J. Tomlin. 2020. Efficient Iterative Linear-Quadratic Approximations for Nonlinear Multi-Player General-Sum Differential Games. ICRA 2020
- Dorsa Sadigh, Nick Landolfi, S. Shankar Sastry, Sanjit A. Seshia, Anca D. Dragan. 2018. Planning for cars that coordinate with people: leveraging effects on human actions for planning and active information gathering over human internal state. Autonomous Robots
- Dorsa Sadigh, S. Shankar Sastry, Sanjit A. Seshia, Anca Dragan. 2016. Information Gathering Actions Over Human Internal State. IROS 2016
- Dorsa Sadigh, Shankar Sastry, Sanjit Seshia, Anca Dragan. 2016. Planning for Autonomous Cars that Leverage Effects on Human Actions. RSS 2016
- Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell. 2017. The Off-Switch Game. IJCAI 2017
- Elis Stefansson, Jaime F. Fisac, Dorsa Sadigh, S. Shankar Sastry, Karl H. Johansson. 2019. Human-robot interaction for truck platooning using hierarchical dynamic games. European Control Conference 2019
- Jaime F. Fisac, Anayo K. Akametalu, Melanie N. Zeilinger, Shahab Kaynama, Jeremy Gillula, Claire J. Tomlin. 2016. A General Safety Framework for Learning-Based Control in Uncertain Robotic Systems. IEEE Transactions on Automatic Control
- Jaime F. Fisac, Andrea Bajcsy, Sylvia L. Herbert, David Fridovich-Keil, Steven Wang, Claire J. Tomlin, Anca D. Dragan. 2018. Probabilistically Safe Robot Planning with Confidence-Based Human Predictions. RSS 2018
- Jaime F. Fisac, Chang Liu, Jessica B. Hamrick, S. Shankar Sastry, J. Karl Hedrick, Thomas L. Griffiths, Anca D. Dragan. 2016. Generating Plans that Predict Themselves. CDC 2016
- Liting Sun, Wei Zhan, Masayoshi Tomizuka, Anca D. Dragan. 2018. Courteous Autonomous Cars. IROS 2018
- Micah Carroll, Rohin Shah, Mark Ho, Thomas Griffiths, Sanjit Seshia, Pieter Abbeel, Anca Dragan. 2019. On the Utility of Learning about Humans for Human-AI Coordination. NeurIPS 2019
- Michael Laskey, Caleb Chuck, Jonathan Lee, Jeffrey Mahler, Sanjay Krishnan, Kevin Jamieson, Anca Dragan, Ken Goldberg. 2017. Comparing Human-Centric and Robot-Centric Sampling for Robot Deep Learning from Demonstrations. ICRA 2017
- Minae Kwon, Sandy H. Huang, Anca D. Dragan. 2018. Expressing Robot Incapability. HRI 2018
- Negar Mehr, Roberto Horowitz, Anca Dragan. 2016. Inferring and Assisting with Constraints in Shared Autonomy. CDC 2016
- Rohan Choudhury, Gokul Swamy, Dylan Hadfield-Menell, Anca D. Dragan. 2019. On the Utility of Model Learning in HRI. HRI 2019
- Sandy H. Huang, David Held, Pieter Abbeel, Anca Dragan. 2017. Enabling Robots to Communicate their Objectives. RSS 2017
- Sandy H. Huang, Kush Bhatia, Pieter Abbeel, Anca D. Dragan. 2018. Establishing Appropriate Trust via Critical States. IROS 2018
- Sarath Sreedharan, Siddharth Srivastava, David Smith, Subbarao Kambhampati. 2019. Why Can’t You Do That, HAL? Explaining Unsolvability of Planning Tasks. IJCAI 2019
- Shihui Li, Yi Wu, Xinyue Cui, Honghua Dong, Fei Fang, Stuart Russell. 2019. Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient. AAAI 2019
- Shun Zhang, Edmund H. Durfee, Satinder P. Singh. 2018. Minimax-regret querying on side effects for safe optimality in factored Markov decision processes. IJCAI 2018
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2018. Shared Autonomy via Deep Reinforcement Learning. RSS 2018
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2018. Where Do You Think You’re Going?: Inferring Beliefs about Dynamics from Behavior. NeurIPS 2018
- Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, Stuart Russell. 2017. Should Robots be Obedient?. IJCAI 2017
- Somil Bansal, Andrea Bajcsy, Ellis Ratner, Anca D. Dragan, Claire J. Tomlin. 2020. A Hamilton-Jacobi Reachability-Based Framework for Predicting and Analyzing Human Motion for Safe Planning. ICRA 2020
- Vael Gates, Thomas L. Griffiths, Anca D. Dragan. 2020. How to Be Helpful to Multiple People at Once. Cognitive Science 44(6)
2.4. Multi-agent perspectives and applications
- Adam Bjorndahl, Joseph Y. Halpern, Rafael Pass. 2017. Reasoning about Rationality. Games and Economic Behavior 104, 146-164
- Anagha Kulkarni, Siddharth Srivastava, Subbarao Kambhampati. 2019. A unified framework for planning in adversarial and cooperative environments. AAAI 2019
- Andrew Whalen, Thomas L. Griffiths, Daphna Buchsbaum. 2018. Sensitivity to Shared Information in Social Learning. Cognitive Science
- Arunesh Sinha, Michael P. Wellman. 2019. Incentivizing Collaboration in a Competition. AAMAS 2019
- Bryce Wiedenbeck, Fengjun Yang, Michael P. Wellman. 2018. A Regression Approach for Modeling Games with Many Symmetric Players. AAAI 2018
- Ittai Abraham, Danny Dolev, Ivan Geffner, Joseph Y. Halpern. 2019. Implementing Mediators with Asynchronous Cheap Talk. PODC 2019
- Ittai Abraham, Danny Dolev, Joseph Y. Halpern. 2019. Distributed Protocols for Leader Election: A Game-Theoretic Perspective. ACM Transactions on Economics and Computation 7(1)
- Jialu Bao, Kun He, Xiaodong Xin, Bart Selman, John E. Hopcroft. 2020. Hidden Community Detection on Two-layer Stochastic Models: a Theoretical Perspective. (Preprint, submitted to TAMC 2020)
- Joseph Y. Halpern, Rafael Pass. 2018. Game Theory with Translucent Players. International Journal of Game Theory
- Joseph Y. Halpern, Rafael Pass. 2019. Sequential equilibrium in computational games. ACM Transactions on Economics and Computation
- Joseph Y. Halpern, Rafael Pass, Daniel Reichman. 2019. On the Existence of Nash Equilibrium in Games with Resource-Bounded Players. SAGT 2019
- Joseph Y. Halpern, Rafael Pass, Lior Seeman. 2017. Computational Extensive-Form Games. EC 2016
- Joseph Y. Halpern, Rafael Pass, Lior Seeman. 2019. The truth behind the myth of the folk theorem. Games and Economic Behavior, 117
- Joseph Y. Halpern, Xavier Vilaca. 2016. Rational Consensus (extended abstract). 2016 ACM Symposium on Principles of Distributed Computing
- Mark K. Ho, Joanna Korman, Thomas L. Griffiths. 2019. The Computational Structure of Unintentional Meaning. CogSci 2019
- Mason Wright, Michael P. Wellman. 2018. Evaluating the Stability of Non-Adaptive Trading in Continuous Double Auctions. AAMAS 2018
- Megan Shearer, Gabriel Rauterberg, Michael P. Wellman. 2019. An Agent-Based Model of Financial Benchmark Manipulation. ICML 2019
- Meir Friedenberg, Joseph Y. Halpern. 2019. Blameworthiness in Multi-Agent Settings. AAAI 2019
- Michael Wellman, Eric Sodomka, Amy Greenwald. 2017. Self-confirming price-prediction strategies for simultaneous one-shot auctions. Games and Economic Behavior, 102, 339–372
- Natasha Alechina, Joseph Y. Halpern, Brian Logan. 2017. Causality, Responsibility and Blame in Team Plans. AAMAS 2017
- Natasha Alechina, Joseph Y. Halpern, Ian A. Kash, Brian Logan. 2018. Incentive-Compatible Mechanisms for Norm Monitoring in Open Multi-Agent Systems. JAIR
- Nishant Desai, Andrew Critch, Stuart J. Russell. 2018. Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making. NeurIPS 2018
- Raphael Köster, Dylan Hadfield-Menell, Gillian K. Hadfield, Joel Z. Leibo. 2020. Silly rules improve the capacity of agents to learn stable enforcement and compliance behaviors. AAMAS 2020
- Robert D. Hawkins, Noah D. Goodman, Adele E. Goldberg, Thomas L. Griffiths. 2020. Generalizing meanings from partners to populations: Hierarchical inference supports convention formation on networks. CogSci 2020
- Stefano V. Albrecht, Peter Stone, Michael P. Wellman. 2020. Special issue on autonomous agents modelling other agents: Guest editorial. Artificial Intelligence 285
- Thanh H. Nguyen, Yongzhao Wang, Arunesh Sinha, Michael P. Wellman. 2019. Deception in finitely repeated security games. AAAI 2019
- Valerio Capraro, Joseph Y. Halpern. 2020. Translucent players: Explaining cooperative behavior in social dilemmas. Rationality and Society 31(4), 371-408
- Xintong Wang, Chris Hoang, Michael P. Wellman. 2019. Learning-Based Trading Strategies in the Face of Market Manipulation. ICML 2019 Workshop on AI in Finance
- Zun Li, Michael P. Wellman. 2020. Structure Learning for Approximate Solution of Many-Player Games. AAAI 2020
2.5. Models of bounded or imperfect rationality
- Amitai Shenhav, Sebastian Musslick, Falk Lieder, Wouter Kool, Thomas L. Griffiths, Jonathan D. Cohen, Matthew M. Botvinick. 2017. Toward a Rational and Mechanistic Account of Mental Effort. Annual Review of Neuroscience, 40, 99-124
- Falk Lieder, Amitai Shenhav, Sebastian Musslick, Thomas L. Griffiths. 2018. Rational metareasoning and the plasticity of cognitive control. PLoS Comp. Biol.
- Falk Lieder, Owen X. Chen, Paul M. Krueger, Thomas L. Griffiths. 2020. Cognitive prostheses for goal achievement. Nature Human Behaviour 3:1096–1106
- Falk Lieder, Paul Krueger, Tom Griffiths. 2017. An automatic method for discovering rational heuristics for risky choice. CogSci 2017
- Falk Lieder, Thomas L. Griffiths. 2019. Resource-rational analysis: understanding human cognition as the optimal use of limited computational resources. Behavioral and Brain Sciences, 43, E1
- Falk Lieder, Thomas L. Griffiths. 2020. Advancing rational analysis to the algorithmic level. Behavioral and Brain Sciences, 43, E27
- Falk Lieder, Thomas L. Griffiths, Ming Hsu. 2018. Overrepresentation of extreme events in decision making reflects rational use of cognitive resources. Psychological Review
- Falk Lieder, Thomas L. Griffiths, Quentin J. M. Huys, Noah D. Goodman. 2018. Empirical evidence for resource-rational anchoring and adjustment. Psychonomic Bulletin & Review
- Falk Lieder, Thomas L. Griffiths, Quentin J. M. Huys, Noah D. Goodman. 2018. The anchoring bias reflects rational use of cognitive resources. Psychonomic Bulletin & Review
- Frederick Callaway, Antonio Rangel, Tom Griffiths. 2020. Fixation patterns in simple choice are consistent with optimal use of cognitive resources. (Preprint)
- Frederick Callaway, Tom Griffiths. 2019. Attention in value-based choice as optimal sequential sampling. (Preprint)
- Joseph Y. Halpern, Lior Seeman. 2018. Is state-dependent valuation more adaptive than simpler rules?. Behavioural Processes
- Joshua Peterson, David Bourgin, Daniel Reichman, Thomas Griffiths, Stuart Russell. 2019. Cognitive model priors for predicting human decisions. ICML 2019
- Mark K. Ho, David Abel, Jonathan D. Cohen, Michael L. Littman, Thomas L. Griffiths. 2020. The Efficiency of Human Cognition Reflects Planned Information Processing. AAAI 2020
- Mark K. Ho, David Abel, Tom Griffiths, Michael L. Littman. 2019. The Value of Abstraction. Current Opinion in Behavioral Sciences, 29:111-116
- Nan Rong, Joseph Y. Halpern, Ashutosh Saxena. 2016. MDPs with Unawareness in Robotics. UAI 2016
- Ruairidh M. Battleday, Joshua C. Peterson, Thomas L. Griffiths. 2019. Capturing human categorization of natural images at scale by combining deep networks and cognitive models. (Preprint)
- Smitha Milli, Falk Lieder, Tom Griffiths. 2017. When Does Bounded-Optimal Metareasoning Favor Few Cognitive Systems?. AAAI 2017
- Smitha Milli, Falk Lieder, Tom Griffiths. 2020. A Rational Reinterpretation of Dual-Process Theories. UAI 2020
- Thomas L. Griffiths, Frederick Callaway, Michael B. Chang, Erin Grant, Paul M. Krueger, Falk Lieder. 2019. Doing more with less: meta-reasoning and meta-learning in humans and machines. Current Opinion in Behavioral Sciences 29: 24-30
3. Other topics
3.1. Adversarial training and testing
- Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell. 2020. Adversarial Policies: Attacking Deep Reinforcement Learning. ICLR 2020
- Albert Zhan, Stas Tiomkin, Pieter Abbeel. 2020. Preventing Imitation Learning with Adversarial Policy Ensembles. ICLR 2020
- Marc Khoury, Dylan Hadfield-Menell. 2019. Adversarial Training with Voronoi Constraints. (Preprint)
- Marc Khoury, Dylan Hadfield-Menell. 2020. On the Geometry of Adversarial Examples. (Preprint)
3.2. AI capabilities, uncategorized
- Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian. 2019. Bayesian Relational Memory for Semantic Visual Navigation. ICCV 2019
- Junhyuk Oh, Yijie Guo, Satinder Singh, Honglak Lee. 2018. Self-Imitation Learning. ICML 2018
- Paul Krueger, Falk Lieder, Tom Griffiths. 2017. Enhancing metacognitive reinforcement learning using reward structures and feedback. CogSci 2017
- Prasad Tadepalli, Cameron Barrie, Stuart J. Russell. 2019. Learning Causal Trees with Latent Variables via Controlled Experimentation. AAAI 2019
- Sam Toyer, Felipe Trevizan, Sylvie Thiebaux, Lexing Xie. 2020. ASNets: Deep Learning for Generalised Planning. JAIR
- Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel. 2018. Learning Plannable Representations with Causal InfoGAN. ICML 2018 Workshop on Planning and Learning
- Vivek Veeriah, Junhyuk Oh, Satinder Singh. 2018. Many-Goals Reinforcement Learning. (Preprint)
- Yi Wu, Siddharth Srivastava, Nicholas Hay, Simon Du, Stuart Russell. 2018. Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms. ICML 2018
3.3. Cognitive science, uncategorized
- Aditi Jha, Joshua Peterson, Thomas L. Griffiths. 2020. Extracting low-dimensional psychological representations from convolutional neural networks. CogSci 2020
- Alexander Todorov, Stefan Uddenberg, Joshua Peterson, Thomas Griffiths, Jordan Suchow. 2020. Data-Driven, Photorealistic Social Face-Trait Encoding, Prediction, and Manipulation Using Deep Neural Networks. Patent application
- Anne S. Hsu, Jay B. Martin, Adam N. Sanborn, Thomas L. Griffiths. 2019. Identifying category representations for complex stimuli using discrete Markov chain Monte Carlo with people. Behavior Research Methods 51:1706–1716
- Antonia Langenhoff, Alex Wiegmann, Joseph Y. Halpern, Joshua B. Tenenbaum, Tobias Gerstenberg. 2020. Predicting responsibility judgments from dispositional inferences and causal attributions. (Preprint)
- Arnon Lotem, Joseph Y. Halpern, Shimon Edelman, Oren Kolodny. 2017. The evolution of cognitive mechanisms in response to cultural innovations. PNAS
- David Bourgin, Falk Lieder, Daniel Reichman, Nimrod Talmon, Tom Griffiths. 2017. The Structure of Goal Systems Predicts Human Performance. CogSci 2017
- Mathew Hardy, Tom Griffiths. 2019. Demonstrating the Impact of Prior Knowledge in Risky Choice. (Preprint)
- Mayank Agrawal, Joshua C. Peterson, Thomas L. Griffiths. 2020. Scaling up psychology via Scientific Regret Minimization. PNAS 2020
- R. Dubey, T. L. Griffiths. 2020. Reconciling novelty and complexity through a rational analysis of curiosity. Psychological Review, 127(3), 455–476
- Sophia Sanborn, Michael Chang, Sergey Levine, Thomas Griffiths. 2020. Sparse Skill Coding: Learning Behavioral Hierarchies with Sparse Codes. ICLR 2020 submission
- Thomas J. H. Morgan, Jordan W. Suchow, Thomas L. Griffiths. 2020. What the Baldwin Effect affects depends on the nature of plasticity. Cognition, 197
3.4. Ethics for AI and AI development
- John Miller, Smitha Milli, Moritz Hardt. 2019. Strategic Classification is Causal Modeling in Disguise. FAT* 2019
- McKane Andrus, Thomas Krendl Gilbert. 2019. Towards a Just Theory of Measurement: A Principled Social Measurement Assurance Program for Machine Learning. AIES 2019
- Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask, Emma Bluemke, Jonathan Lebensold, Cullen O'Keefe, Mark Koren, Théo Ryffel, JB Rubinovitz, Tamay Besiroglu, Federica Carugati, Jack Clark, Peter Eckersley, Sarah de Haas, Maritza Johnson, Ben Laurie, Alex Ingerman, Igor Krawczuk, Amanda Askell, Rosario Cammarota, Andrew Lohn, David Krueger, Charlotte Stix, Peter Henderson, Logan Graham, Carina Prunkl, Bianca Martin, Elizabeth Seger, Noa Zilberman, Seán Ó hÉigeartaigh, Frens Kroeger, Girish Sastry, Rebecca Kagan, Adrian Weller, Brian Tse, Elizabeth Barnes, Allan Dafoe, Paul Scharre, Ariel Herbert-Voss, Martijn Rasser, Shagun Sodhani, Carrick Flynn, Thomas Krendl Gilbert, Lisa Dyer, Saif Khan, Yoshua Bengio, Markus Anderljung. 2020. Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims. (Preprint)
- Ravit Dotan, Smitha Milli. 2020. Value-laden Disciplinary Shifts in Machine Learning. (Preprint)
- Roel Dobbe, Sarah Dean, Thomas Gilbert, Nitin Kohli. 2018. A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics. FAT/ML 2018
- Roel Dobbe, Thomas Krendl Gilbert, Yonatan Mintz. 2019. Hard Choices in Artificial Intelligence: Addressing Normative Uncertainty through Sociotechnical Commitments. NeurIPS 2019
- Smitha Milli, John Miller, Anca D. Dragan, Moritz Hardt. 2019. The Social Cost of Strategic Classification. FAT* 2019
- Thomas Krendl Gilbert, Yonatan Mintz. 2019. Epistemic Therapy for Bias in Automated Decision-Making. AIES 2019
3.5. Robust inference, learning, and planning
- Jaime F. Fisac, Neil F. Lugovoy, Vicenç Rubies-Royo, Shromona Ghosh, Claire J. Tomlin. 2019. Bridging Hamilton-Jacobi Safety Analysis and Reinforcement Learning. IEEE 2019
- Karthika Mohan. 2018. On Handling Self-masking and Other Hard Missing Data Problems. AAAI 2018
- Karthika Mohan, Felix Thoemmes, Judea Pearl. 2018. Estimation with Incomplete Data: The Linear Case. IJCAI 2018
- Karthika Mohan, Judea Pearl. 2019. Graphical Models for Processing Missing Data. JASA
- Kush Bhatia, Yi-An Ma, Anca D. Dragan, Peter L. Bartlett, Michael I. Jordan. 2019. Bayesian Robustness: A Nonasymptotic Viewpoint. (Preprint)
- Margaret P. Chapman, Jonathan Lacotte, Aviv Tamar, Donggun Lee, Kevin M. Smith, Victoria Cheng, Jaime F. Fisac, Susmit Jha, Marco Pavone, Claire J. Tomlin. 2019. A Risk-Sensitive Finite-Time Reachability Approach for Safety of Stochastic Dynamic Systems. American Control Conference (ACC) 2019
- Ruibo Tu, Cheng Zhang, Paul Ackermann, Karthika Mohan, Hedvig Kjellström, Kun Zhang. 2019. Causal Discovery in the Presence of Missing Data. AISTATS 2019
3.6. Security problems and solutions
- Ivan Geffner, Joseph Y. Halpern. 2019. Security in Asynchronous Interactive Systems. (Preprint)
- Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, Michael P. Wellman. 2018. SoK: Security and Privacy in Machine Learning. IEEE European Symposium on Security and Privacy
- Sushil Jajodia, George Cybenko, V. S. Subrahmanian, Vipin Swarup, Cliff Wang, Michael Wellman. 2020. Adaptive Autonomous Secure Cyber Systems. Springer/Nature Books
- Xinlei Pan, Weiyao Wang, Xiaoshuai Zhang, Bo Li, Jinfeng Yi, Dawn Song. 2019. How You Act Tells a Lot: Privacy-Leaking Attack on Deep Reinforcement Learning. AAMAS 2019
3.7. Transparency & interpretability
- Daniel Filan, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell. 2019. Pruned Neural Networks are Surprisingly Modular. (Preprint, under review NeurIPS 2020)
- Jacob Andreas, Anca Dragan, Dan Klein. 2017. Translating Neuralese. ACL 2017
- Smitha Milli, Ludwig Schmidt, Anca D. Dragan, Moritz Hardt. 2019. Model Reconstruction from Model Explanations. FAT* 2019