Research
CHAI aims to reorient the foundations of AI research toward the development of provably beneficial systems. Currently, it is not possible to specify a formula for human values in any form that we know would provably benefit humanity if that formula were installed as the objective of a powerful AI system. In short, any initial formal specification of human values is bound to be wrong in important ways. This means we need some way to represent uncertainty in the objectives of AI systems. This way of formulating objectives stands in contrast to the standard model for AI, in which the AI system's objective is assumed to be known completely and correctly.
Therefore, much of CHAI's research to date has focused on developing and communicating a new model of AI development, in which AI systems should be uncertain of their objectives and deferential to humans in light of that uncertainty; a toy illustration of this idea appears below. However, our interests extend to a variety of other problems in the development of provably beneficial AI systems. Our areas of greatest focus so far have been the foundations of rational agency and causality, value alignment and inverse reinforcement learning, human-robot cooperation, multi-agent perspectives and applications, and models of bounded or imperfect rationality. Other areas of interest to our mission include adversarial training and testing for ML systems, various AI capabilities, topics in cognitive science, ethics for AI and AI development, robust inference and planning, security problems and solutions, and transparency and interpretability methods.
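To make the contrast with the standard model concrete, here is a minimal, hypothetical sketch of an agent that holds a belief over candidate objectives and compares acting unilaterally against deferring to a human. The payoffs, probabilities, and names below are invented purely for illustration; this is not a reproduction of any specific CHAI formulation (such as the off-switch game or cooperative inverse reinforcement learning listed below).

```python
# Minimal, hypothetical sketch: an agent uncertain about which candidate
# objective is the human's true objective. All numbers are made up.

candidate_payoffs = {
    "objective_A": {"act": 1.0, "defer": 0.6},   # acting is good if A is true
    "objective_B": {"act": -2.0, "defer": 0.6},  # acting is harmful if B is true
}
belief = {"objective_A": 0.7, "objective_B": 0.3}  # agent's uncertainty

def expected_value(action: str) -> float:
    """Expected payoff of an action under the agent's belief over objectives."""
    return sum(belief[obj] * candidate_payoffs[obj][action] for obj in belief)

if __name__ == "__main__":
    for action in ("act", "defer"):
        print(f"{action}: {expected_value(action):.2f}")
    # act:   0.7 * 1.0 + 0.3 * (-2.0) = 0.10
    # defer: 0.6 regardless of which objective is true
    # The uncertain agent defers to the human; a standard-model agent that
    # assumed objective_A with certainty would simply act.
```

Under this toy belief, deferring has higher expected value than acting, whereas an agent certain of a single fixed objective would act regardless of how wrong that objective might be.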
In addition to purely academic work, CHAI strives to produce intellectual outputs for general audiences. We also advise governments and international organizations on policies relevant to ensuring that AI technologies benefit society, and offer insight on a variety of individual-scale and societal-scale risks from AI, such as those pertaining to autonomous weapons, the future of employment, and public health and safety.
Below is a list of CHAI's publications since we began operating in 2016. Many of our publications are collaborations with other AI research groups; we view collaborations as key to integrating our perspectives into mainstream AI research.
1. Overviews
1.1. Books
- Stuart Russell. 2021. Human-Compatible Artificial Intelligence. Human-Like Machine Intelligence
- Stuart Russell. 2020. Artificial Intelligence: A Modern Approach (Textbook, 4th Edition). Pearson
- Stuart Russell. 2019. Human Compatible: Artificial Intelligence and The Problem of Control. Penguin Random House
- Joseph Y. Halpern. 2016. Actual Causality (Book). MIT Press
1.2. Overviews of societal-scale risks from AI
- McKane Andrus, Sarah Dean, Thomas Krendl Gilbert, Nathan Lambert, Tom Zick. 2021. AI Development for the Public Interest: From Abstraction Traps to Sociotechnical Risks.
- Simon Zhuang, Dylan Hadfield-Menell. 2021. Consequences of Misaligned AI. NeurIPS 2020
- Raja Chatila, Virginia Dignum, Michael Fisher, Fosca Giannotti, Katharina Morik, Stuart Russell, Karen Yeung. 2021. Trustworthy AI. Reflections on Artificial Intelligence for Humanity
- Stuart Russell. 2021. The history and future of AI. Oxford Review of Economic Policy
- Jonathan Stray. 2021. Beyond Engagement: Aligning Algorithmic Recommendations With Prosocial Goals.
- Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt. 2021. Unsolved Problems in ML Safety.
- Andrew Critch, David Krueger. 2020. AI Research Considerations for Human Existential Safety (ARCHES). (Preprint)
- Olaf Groth, Mark Nitzberg. 2018. Solomon’s Code: Humanity in a World of Thinking Machines. Pegasus Books
- Stuart Russell. 2018. The new weapons of mass destruction?. The Security Times
- Stuart Russell. Artificial Intelligence and the Problem of Control. Perspectives on Digital Humanism
1.3. Overviews of beneficial AI applications
- Jocelyn Maclure, Stuart Russell. 2021. AI for Humanity: The Global Challenges. Reflections on Artificial Intelligence for Humanity
2. Core topics
2.1. Foundations of rational agency & causality
- Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell. 2021. Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism.
- David Silver, Satinder Singh, Doina Precup, and Richard Sutton. 2021. Reward is Enough. Artificial Intelligence 2021
- David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, and Satinder Singh. 2021. On the Expressivity of Markov Reward. NeurIPS 2021
- Christopher Grimm, Andre Barreto, Gregory Farquhar, David Silver, and Satinder Singh. 2021. Proper Value Equivalence.
- Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh. 2021. Reward is Enough for Convex MDPs.
- Theodore R. Sumers, Robert D. Hawkins, Mark K. Ho, Thomas L. Griffiths. 2021. Extending rational models of communication from beliefs to actions.
- Smitha Milli, Luca Belli, Moritz Hardt. 2021. Causal Inference Struggles with Agency on Online Platforms.
- Sander Beckers, Frederick Eberhardt, Joseph Y Halpern. 2020. Approximate Causal Abstractions. PMLR
- Dalal Alrajeh, Hana Chockler, Joseph Y Halpern. 2020. Combining experts’ causal judgments. AAAI; Elsevier
- Andrew Critch. 2019. A Parametric, Resource-Bounded Generalization of Löb’s Theorem, and a Robust Cooperation Criterion for Open-Source Game Theory. The Journal of Symbolic Logic, Cambridge University Press
- Joseph Y. Halpern, Evan Piermont. 2019. Partial Awareness. AAAI 2019
- Joseph Y. Halpern, Rafael Pass. 2019. A Conceptually Well-Founded Characterization of Iterated Admissibility Using an “All I Know” Operator. TARK 2019
- Sander Beckers, Frederick Eberhardt, Joseph Y. Halpern. 2019. Approximate Causal Abstraction. UAI 2019
- Sander Beckers, Joseph Y. Halpern. 2019. Abstracting causal models. AAAI 2019
- Joseph Y. Halpern. 2018. A Note on the Existence of Ratifiable Acts. Review of Symbolic Logic
- Meir Friedenberg, Joseph Y. Halpern. 2018. Combining the Causal Judgments of Experts with Possibly Different Focus Areas. International Conference on Principles of Knowledge Representation and Reasoning
- Gadi Aleksandrowicz, Hana Chockler, Joseph Y. Halpern, Alexander Ivrii. 2017. The Computational Complexity of Structure-Based Causality. JAIR
- Joseph Y. Halpern. 2016. Sufficient Conditions for Causality to be Transitive. Philosophy of Science, 83, 213–226
- Tom Everitt, Daniel Filan, Mayank Daswani, Marcus Hutter. 2016. Self-Modification of Policy and Utility Function in Rational Agents. AGI 2016
2.2. Value alignment and inverse reinforcement learning
- Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike. 2021. Quantifying Differences in Reward Functions. ICLR 2021
- Smitha Milli, Luca Belli, Moritz Hardt. 2021. From Optimizing Engagement to Measuring Value. FAccT 2021
- Vael Gates, Frederick Callaway, Mark K Ho, Tom Griffiths. 2021. A rational model of people’s inferences about others’ preferences based on response times.
- David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan. 2021. Learning What To Do by Simulating the Past. ICLR 2021
- Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, Jacob Steinhardt. 2021. Agnostic Learning with Unknown Utilities. ITCS 2021
- Cassidy Laidlaw, Stuart Russell. 2021. Uncertain Decisions Facilitate Better Preference Learning. NeurIPS 2021
- Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan. 2021. The MineRL BASALT Competition on Learning from Human Feedback. NeurIPS 2021
- Kimin Lee, Laura Smith, Pieter Abbeel. 2021. PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training. ICML 2021
- Xiaofei Wang, Kimin Lee, Kourosh Hakhamaneshi, Pieter Abbeel, Michael Laskin. 2021. Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback.
- Olivia Watkins, Abhishek Gupta, Trevor Darrell, Pieter Abbeel, Jacob Andreas. 2021. Teachable Reinforcement Learning via Advice Distillation. NeurIPS 2021
- Kimin Lee, Laura Smith, Anca Dragan, Pieter Abbeel. 2021. B-Pref: Benchmarking Preference-Based Reinforcement Learning. NeurIPS 2021
- Dylan P. Losey, Andrea Bajcsy, Marcia K. O’Malley, Anca D. Dragan. 2021. Physical interaction as communication: Learning robot objectives online from human corrections.
- Daniel S. Brown, Jordan Schneider, Anca D. Dragan, Scott Niekum. 2021. Value Alignment Verification. ICML 2021
- Avik Jain, Lawrence Chan, Daniel S. Brown, Anca D. Dragan. 2021. Optimal Cost Design for Model Predictive Control. L4DC 2021
- Andreea Bobu, Marius Wiggert, Claire Tomlin, Anca D. Dragan. 2021. Feature Expansive Reward Learning: Rethinking Human Input.
- Micah Carroll, Dylan Hadfield-Menell, Stuart Russell, Anca Dragan. 2021. Estimating and Penalizing Preference Shift in Recommender Systems. RecSys 2021
- Arnaud Fickinger, Samuel Cohen, Stuart Russell, Brandon Amos. 2021. Cross-Domain Imitation Learning via Optimal Transport.
- Dan Hendrycks, Mantas Mazeika, Andy Zou, Sahil Patel, Christine Zhu, Jesus Navarro, Dawn Song, Bo Li, Jacob Steinhardt. 2021. What Would Jiminy Cricket Do? Towards Agents That Behave Morally. NeurIPS 2021
- Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah. 2021. An Empirical Investigation of Representation Learning for Imitation. NeurIPS 2021
- Justin Svegliato, Samer B Nashed, Shlomo Zilberstein. 2021. Ethically compliant sequential decision making.
- Samer B Nashed, Justin Svegliato, Shlomo Zilberstein. 2021. Ethically compliant planning within moral communities.
- Justin Svegliato. 2021. Building efficient, reliable, and ethical autonomous systems.
- Zhao Mandi, Fangchen Liu, Kimin Lee, Pieter Abbeel. 2021. Towards More Generalizable One-shot Visual Imitation Learning.
- Sam Toyer, Rohin Shah, Andrew Critch, Stuart Russell. 2020. The MAGICAL Benchmark for Robust Imitation. NeurIPS 2020
- Alexander Matt Turner, Dylan Hadfield-Menell, Prasad Tadepalli. 2020. Conservative agency via attainable utility preservation. AIES 2020
- Andreea Bobu, Dexter R.R. Scobee, Jaime F. Fisac, S. Shankar Sastry, Anca D. Dragan. 2020. LESS is More: Rethinking Probabilistic Models of Human Behavior. HRI 2020
- Dylan Hadfield-Menell, Gillian K. Hadfield. 2020. Incomplete Contracting and AI Alignment. AIES 2020
- Gokul Swamy, Siddharth Reddy, Sergey Levine, Anca D. Dragan. 2020. Scaled Autonomy: Enabling Human Operators to Control Robot Fleets. ICRA 2020
- Rachel Freedman, Jana Schaich Borg, Walter Sinnott-Armstrong, John P. Dickerson, Vincent Conitzer. 2020. Adapting a kidney exchange algorithm to align with human values. Artificial Intelligence, 283
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2020. SQIL: Imitation Learning via Regularized Behavioral Cloning. ICLR 2020
- Smitha Milli, Pieter Abbeel, Igor Mordatch. 2020. Interpretable and Pedagogical Examples. (Preprint)
- Eric J. Michaud, Adam Gleave, Stuart Russell. 2020. Understanding Learned Reward Functions. Deep RL Workshop, NeurIPS 2020
- Pedro Freire, Adam Gleave, Sam Toyer, Stuart Russell. 2020. DERAIL: Diagnostic Environments for Reward And Imitation Learning. Deep RL Workshop, NeurIPS 2020
- Rachel Freedman, Rohin Shah, Anca Dragan. 2020. Choice Set Misspecification in Reward Inference. IJCAI-PRICAI-20 Workshop on Artificial Intelligence Safety
- Rachel Freedman. 2020. Aligning with Heterogeneous Preferences for Kidney Exchange. IJCAI-PRICAI-20 Workshop on Artificial Intelligence Safety
- Michele Fedrizzi, Nino Civolani, Andrew Critch. 2020. Inconsistency evaluation in pairwise comparison using norm-based distances. Decisions in Economics and Finance
- Kush Bhatia, Ashwin Pananjady, Peter L. Bartlett, Anca D. Dragan, Martin J. Wainwright. 2020. Preference learning along multiple criteria: A game-theoretic perspective. NeurIPS 2020
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2020. SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards. ICLR 2020
- Anna N. Rafferty, Rachel Jansen, Thomas L. Griffiths. 2020. Assessing Mathematics Misunderstandings via Bayesian Inverse Planning. Cognitive Science
- Jonathan Stray, Steven Adler, Dylan Hadfield-Menell. 2020. What are you optimizing for? Aligning Recommender Systems with Human Values. ICML 2020
- Theodore R. Sumers, Mark K. Ho, Robert D. Hawkins, Karthik Narasimhan, Thomas L. Griffiths. 2020. Learning Rewards from Linguistic Feedback.
- Andreea Bobu, Andrea Bajcsy, Jaime F. Fisac, Sampada Deglurkar, Anca D. Dragan. 2019. Quantifying Hypothesis Space Misspecification in Learning from Human-Robot Demonstrations and Physical Corrections. IEEE Transactions on Robotics
- Dmitrii Krasheninnikov, Rohin Shah, Herke van Hoof. 2019. Combining reward information from multiple sources. NeurIPS 2019 Learning with Rich Experience Workshop
- Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto. 2019. Hierarchically Decoupled Imitation for Morphological Transfer. (Preprint)
- Dylan Hadfield-Menell, McKane Andrus, Gillian Hadfield. 2019. Legible Normativity for AI Alignment: The Value of Silly Rules. AIES 2019
- Hong Jun Jeon, Smitha Milli, Anca D. Dragan. 2019. Reward-rational (implicit) choice: A unifying formalism for reward learning. (Preprint)
- Jason Y. Zhang, Anca D. Dragan. 2019. Learning from Extrapolated Corrections. ICRA 2019
- Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, Chelsea Finn. 2019. Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. ICML 2019
- Lawrence Chan, Dylan Hadfield-Menell, Siddhartha Srinivasa, Anca Dragan. 2019. The Assistive Multi-Armed Bandit. HRI 2019
- Matthew Rahtz, James Fang, Anca D. Dragan, Dylan Hadfield-Menell. 2019. An Extensible Interactive Interface for Agent Design. ICML 2019 Human-in-the-Loop Learning Workshop
- Mayank Agrawal, Joshua C. Peterson, Thomas L. Griffiths. 2019. Using Machine Learning to Guide Cognitive Modeling: A Case Study in Moral Reasoning. CogSci 2019
- Ori Plonsky, Reut Apel, Eyal Ert, Moshe Tennenholtz, David Bourgin, Joshua C. Peterson, Daniel Reichman, Thomas L. Griffiths, Stuart J. Russell, Evan C. Carter, James F. Cavanagh, Ido Erev. 2019. Predicting human decisions with behavioral theories and machine learning. (Preprint)
- Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan. 2019. Preferences Implicit in the State of the World. ICLR 2019
- Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca D. Dragan. 2019. On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference. ICML 2019
- Sandy H. Huang, Isabella Huang, Ravi Pandya, Anca D. Dragan. 2019. Nonverbal Robot Feedback for Human Teachers. CoRL 2019
- Siddharth Reddy, Anca D. Dragan, Sergey Levine, Shane Legg, Jan Leike. 2019. Learning Human Objectives by Evaluating Hypothetical Behavior. (Preprint)
- Smitha Milli, Anca D. Dragan. 2019. Literal or Pedagogic Human? Analyzing Human Model Misspecification in Objective Learning. UAI 2019
- Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, Sergey Levine. 2019. Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow. ICLR 2019
- Aaron Tucker, Adam Gleave, Stuart Russell. 2018. Inverse reinforcement learning for video games. NeurIPS 2018 Deep RL Workshop
- Adam Gleave, Oliver Habryka. 2018. Multi-task Maximum Entropy Inverse Reinforcement Learning. ICML 2018 Goals RL Workshop
- Chandrayee Basu, Mukesh Singhal, Anca D. Dragan. 2018. Learning from Richer Human Guidance: Augmenting Comparison-Based Learning with Feature Queries. HRI 2018
- Chris Cundy, Daniel Filan. 2018. Exploring Hierarchy-Aware Inverse Reinforcement Learning. Unpublished (ICML 2018 Goals RL Workshop)
- Dhruv Malik, Malayandi Palaniappan, Jaime F. Fisac, Dylan Hadfield-Menell, Stuart Russell, Anca D. Dragan. 2018. An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning. ICML 2018
- Ellis Ratner, Dylan Hadfield-Menell, Anca D. Dragan. 2018. Simplifying Reward Design through Divide-and-Conquer. RSS 2018
- Nicholas C. Landolfi, Anca D. Dragan. 2018. Social Cohesion in Autonomous Driving. IROS 2018
- Sören Mindermann, Rohin Shah, Adam Gleave, Dylan Hadfield-Menell. 2018. Active Inverse Reward Design. ICML 2018 Goals RL Workshop
- Zeyu Zheng, Junhyuk Oh, Satinder Singh. 2018. On Learning Intrinsic Rewards for Policy Gradient Methods. NeurIPS 2018
- Dorsa Sadigh, Anca Dragan, S. Shankar Sastry, Sanjit Seshia. 2017. Active Preference-Based Learning of Reward Functions. RSS 2017
- Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan. 2017. Inverse Reward Design. NeurIPS 2017
- Jaime F. Fisac, Monica A. Gates, Jessica B. Hamrick, Chang Liu, Dylan Hadfield-Menell, Malayandi Palaniappan, Dhruv Malik, S. Shankar Sastry, Thomas L. Griffiths, Anca D. Dragan. 2017. Pragmatic-Pedagogic Value Alignment. ISRR 2017
- Kareem Amin, Nan Jiang, Satinder Singh. 2017. Repeated Inverse Reinforcement Learning. NeurIPS 2017
- Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell. 2016. Cooperative Inverse Reinforcement Learning. NeurIPS 2016
2.3. Human-robot cooperation
- Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, Anca D. Dragan, Rohin Shah. 2021. Evaluating the Robustness of Collaborative Agents.
- Andrea Bajcsy, Somil Bansal, Ellis Ratner, Claire J. Tomlin, Anca D. Dragan. 2021. A Robust Control Framework for Human Motion Prediction. IEEE Robotics and Automation Letters
- Siddharth Srivastava. 2021. Unifying Principles and Metrics for Safe and Assistive AI. AAAI 2021
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2021. Pragmatic Image Compression for Human-in-the-Loop Decision-Making.
- Liting Sun, Xiaogang Jia, Anca D. Dragan. 2021. On complementing end-to-end human behavior predictors with planning.
- Andrea Bajcsy, Anand Siththaranjan, Claire J. Tomlin, Anca D. Dragan. 2021. Analyzing Human Models that Adapt Online.
- Arjun Sripathy, Andreea Bobu, Daniel S. Brown, Anca D. Dragan. 2021. Dynamically Switching Human Prediction Models for Efficient Planning. ICRA 2021
- Matthew Zurek, Andreea Bobu, Daniel S. Brown, Anca D. Dragan. 2021. Situational Confidence Assistance for Lifelong Shared Autonomy. ICRA 2021
- Jensen Gao, Siddharth Reddy, Glen Berseth, Nicholas Hardy, Nikhilesh Natraj, Karunesh Ganguly, Anca Dragan, Sergey Levine. 2021. X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback. ICLR 2021
- Arnaud Fickinger, Simon Zhuang, Dylan Hadfield-Menell, Stuart Russell. 2020. Multi-Principal Assistance Games.
- David Fridovich-Keil, Ellis Ratner, Lasse Peters, Anca D. Dragan, Claire J. Tomlin. 2020. Efficient Iterative Linear-Quadratic Approximations for Nonlinear Multi-Player General-Sum Differential Games. ICRA 2020
- Somil Bansal, Andrea Bajcsy, Ellis Ratner, Anca D. Dragan, Claire J. Tomlin. 2020. A Hamilton-Jacobi Reachability-Based Framework for Predicting and Analyzing Human Motion for Safe Planning. ICRA 2020
- Vael Gates, Thomas L. Griffiths, Anca D. Dragan. 2020. How to Be Helpful to Multiple People at Once. Cognitive Science 44(6)
- Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell. 2020. Benefits of Assistance over Reward Learning. NeurIPS 2020 Workshop on Cooperative AI
- Arnaud Fickinger, Simon Zhuang, Andrew Critch, Dylan Hadfield-Menell, Stuart Russell. 2020. Multi-Principal Assistance Games: Definition and Collegial Mechanisms. Cooperative AI Workshop, NeurIPS 2020
- Yuqing Du, Stas Tiomkin, Emre Kiciman, Daniel Polani, Pieter Abbeel, Anca D. Dragan. 2020. AvE: Assistance via Empowerment. NeurIPS 2020
- Andrew Critch, Stuart Russell. 2019. Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making. AIES 2019
- Elis Stefansson, Jaime F. Fisac, Dorsa Sadigh, S. Shankar Sastry, Karl H. Johansson. 2019. Human-robot interaction for truck platooning using hierarchical dynamic games. European Control Conference 2019
- Micah Carroll, Rohin Shah, Mark Ho, Thomas Griffiths, Sanjit Seshia, Pieter Abbeel, Anca Dragan. 2019. On the Utility of Learning about Humans for Human-AI Coordination. NeurIPS 2019
- Rohan Choudhury, Gokul Swamy, Dylan Hadfield-Menell, Anca D. Dragan. 2019. On the Utility of Model Learning in HRI. HRI 2019
- Sarath Sreedharan, Siddharth Srivastava, David Smith, Subbarao Kambhampati. 2019. Why Can’t You Do That, HAL? Explaining Unsolvability of Planning Tasks. IJCAI 2019
- Shihui Li, Yi Wu, Xinyue Cui, Honghua Dong, Fei Fang, Stuart Russell. 2019. Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient. AAAI 2019
- Aaron Bestick, Ravi Pandya, Ruzena Bajcsy, Anca D. Dragan. 2018. Learning Human Ergonomic Preferences for Handovers. ICRA 2018
- Allan Zhou, Anca D. Dragan. 2018. Cost Functions for Robot Motion Style. IROS 2018
- Andrea Bajcsy, Dylan P. Losey, Marcia K. O'Malley, Anca D. Dragan. 2018. Learning from Physical Human Corrections, One Feature at a Time. HRI 2018
- David Fridovich-Keil, Andrea Bajcsy, Jaime F. Fisac, Sylvia L. Herbert, Steven Wang, Anca D. Dragan, Claire J. Tomlin. 2018. Confidence-aware motion prediction for real-time collision avoidance. International Journal of Robotics Research
- Dorsa Sadigh, Nick Landolfi, Shankar S. Sastry, Sanjit A. Seshia, Anca D. Dragan. 2018. Planning for cars that coordinate with people: leveraging effects on human actions for planning and active information gathering over human internal state. Autonomous Robots
- Jaime F. Fisac, Andrea Bajcsy, Sylvia L. Herbert, David Fridovich-Keil, Steven Wang, Claire J. Tomlin, Anca D. Dragan. 2018. Probabilistically Safe Robot Planning with Confidence-Based Human Predictions. RSS 2018
- Liting Sun, Wei Zhan, Masayoshi Tomizuka, Anca D. Dragan. 2018. Courteous Autonomous Cars. IROS 2018
- Minae Kwon, Sandy H. Huang, Anca D. Dragan. 2018. Expressing Robot Incapability. HRI 2018
- Sandy H. Huang, Kush Bhatia, Pieter Abbeel, Anca D. Dragan. 2018. Establishing Appropriate Trust via Critical States. IROS 2018
- Shun Zhang, Edmund H. Durfee, Satinder P. Singh. 2018. Minimax-regret querying on side effects for safe optimality in factored Markov decision processes. IJCAI 2018
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2018. Shared Autonomy via Deep Reinforcement Learning. RSS 2018
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2018. Where Do You Think You’re Going?: Inferring Beliefs about Dynamics from Behavior. NeurIPS 2018
- Allan Zhou, Dylan Hadfield-Menell, Anusha Nagabandi, Anca Dragan. 2017. Expressive Robot Motion Timing. HRI 2017
- Chandrayee Basu, Qian Yang, David Hungerman, Mukesh Singhal, Anca Dragan. 2017. Do You Want Your Autonomous Car to Drive Like You?. HRI 2017
- Chang Liu, Jessica B. Hamrick, Jaime F. Fisac, Anca D. Dragan, J. Karl Hedrick, S. Shankar Sastry, Thomas L. Griffiths. 2017. Goal Inference Improves Objective and Perceived Performance in Human-Robot Collaboration. AAMAS 2017
- Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell. 2017. The Off-Switch Game. IJCAI 2017
- Michael Laskey, Caleb Chuck, Jonathan Lee, Jeffrey Mahler, Sanjay Krishnan, Kevin Jamieson, Anca Dragan, Ken Goldberg. 2017. Comparing Human-Centric and Robot-Centric Sampling for Robot Deep Learning from Demonstrations. ICRA 2017
- Sandy H. Huang, David Held, Pieter Abbeel, Anca Dragan. 2017. Enabling Robots to Communicate their Objectives. RSS 2017
- Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, Stuart Russell. 2017. Should Robots be Obedient?. IJCAI 2017
- Aaron Bestick, Ruzena Bajcsy, Anca Dragan. 2016. Implicitly Assisting Humans to Choose Good Grasps in Robot to Human Handovers. 2016 International Symposium on Experimental Robotics
- Dorsa Sadigh, S. Shankar Sastry, Sanjit A. Seshia, Anca Dragan. 2016. Information Gathering Actions Over Human Internal State. IROS 2016
- Dorsa Sadigh, Shankar Sastry, Sanjit Seshia, Anca Dragan. 2016. Planning for Autonomous Cars that Leverage Effects on Human Actions. RSS 2016
- Jaime F. Fisac, Anayo K. Akametalu, Melanie N. Zeilinger, Shahab Kaynama, Jeremy Gillula, Claire J. Tomlin. 2016. A General Safety Framework for Learning-Based Control in Uncertain Robotic Systems. IEEE Transactions on Automatic Control
- Jaime F. Fisac, Chang Liu, Jessica B. Hamrick, S. Shankar Sastry, J. Karl Hedrick, Thomas L. Griffiths, Anca D. Dragan. 2016. Generating Plans that Predict Themselves. CDC 2016
- Negar Mehr, Roberto Horowitz, Anca Dragan. 2016. Inferring and Assisting with Constraints in Shared Autonomy. CDC 2016
2.4. Multi-agent perspectives and applications
- Xintong Wang, David M Pennock, Nikhil R Devanur, David M Rothschild, Biaoshuai Tao, Michael P Wellman. 2021. Designing a Combinatorial Financial Options Market.
- Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell. 2021. Accumulating Risk Capital Through Investing in Cooperation. AAMAS 2021
- Scott Emmons, Caspar Oesterheld, Andrew Critch, Vince Conitzer, Stuart Russell. 2021. Symmetry, Equilibria, and Robustness in Common-Payoff Games. GAIW 2021
- Jonathan Stray. 2021. Designing Recommender Systems to Depolarize.
- Katherine Mayo, Shaily Fozdar, Michael P. Wellman. 2021. An Agent-Based Model of Strategic Adoption of Real-Time Payments.
- Max Olan Smith, Thomas Anthony, Michael P Wellman. 2021. Iterative Empirical Game Solving via Single Policy Best Response. ICLR 2021
- Xintong Wang, Christopher Hoang, Yevgeniy Vorobeychik, Michael P Wellman. 2021. Spoofing the Limit Order Book: A Strategic Agent-Based Analysis. Games 2021
- Yongzhao Wang, Qiurui Ma, Michael P Wellman. 2021. Evaluating Strategy Exploration in Empirical Game-Theoretic Analysis.
- Zun Li, Michael P Wellman. 2021. Evolution Strategies for Approximate Solution of Bayesian Games. AAAI 2021
- Katherine Mayo, Michael P Wellman. 2021. A Strategic Analysis of Portfolio Compression. AAMAS 2021
- Megan Shearer, David Byrd, Tucker Hybinette Balch, Michael P Wellman. 2021. Stability Effects of Arbitrage in Exchange Traded Funds: An Agent-Based Model. ICAIF 2021
- Stephen McAleer, John Lanier, Michael Dennis, Pierre Baldi, Roy Fox. 2021. Improving Social Welfare While Preserving Autonomy via a Pareto Mediator.
- Johannes Treutlein, Michael Dennis, Caspar Oesterheld, Jakob Foerster. 2021. A New Formalism, Method and Open Issues for Zero-Shot Coordination. PMLR 2021
- Jialu Bao, Kun He, Xiaodong Xin, Bart Selman, John E. Hopcroft. 2020. Hidden Community Detection on Two-layer Stochastic Models: a Theoretical Perspective. (Preprint, submitted to TAMC 2020)
- Raphael Köster, Dylan Hadfield-Menell, Gillian K. Hadfield, Joel Z. Leibo. 2020. Silly rules improve the capacity of agents to learn stable enforcement and compliance behaviors. AAMAS 2020
- Robert D. Hawkins, Noah D. Goodman, Adele E. Goldberg, Thomas L. Griffiths. 2020. Generalizing meanings from partners to populations: Hierarchical inference supports convention formation on networks. CogSci 2020
- Stefano V. Albrecht, Peter Stone, Michael P. Wellman. 2020. Special issue on autonomous agents modelling other agents: Guest editorial. Artificial Intelligence 285
- Valerio Capraro, Joseph Y Halpern. 2020. Translucent players: Explaining cooperative behavior in social dilemmas. Rationality and Society 31(4), 371-408
- Zun Li, Michael P. Wellman. 2020. Structure Learning for Approximate Solution of Many-Player Games. AAAI 2020
- Max Olan Smith, Thomas Anthony, Yongzhao Wang, Michael P Wellman. 2020. Learning to play against any mixture of opponents.
- Michael Chang, Sid Kaushik, S. Matthew Weinberg, Tom Griffiths, Sergey Levine. 2020. Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions. ICML 2020
- Qi Zhang, Edmund H. Durfee, Satinder Singh. 2020. Efficient Querying for Cooperative Probabilistic Commitments.
- Anagha Kulkarni, Siddharth Srivastava, Subbarao Kambhampati. 2019. A unified framework for planning in adversarial and cooperative environments. AAAI 2019
- Arunesh Sinha, Michael P. Wellman. 2019. Incentivizing Collaboration in a Competition. AAMAS 2019
- Ittai Abraham, Danny Dolev, Ivan Geffner, Joseph Y. Halpern. 2019. Implementing Mediators with Asynchronous Cheap Talk. PODC 2019
- Ittai Abraham, Danny Dolev, Joseph Y. Halpern. 2019. Distributed Protocols for Leader Election: A Game-Theoretic Perspective. ACM Transactions on Economics and Computation 7(1)
- Joseph Y. Halpern, Rafael Pass. 2019. Sequential equilibrium in computational games. ACM Transactions on Economics and Computation
- Joseph Y. Halpern, Rafael Pass, Daniel Reichman. 2019. On the Existence of Nash Equilibrium in Games with Resource-Bounded Players. SAGT 2019
- Joseph Y. Halpern, Rafael Pass, Lior Seeman. 2019. The truth behind the myth of the folk theorem. Games and Economic Behavior, 117
- Mark K. Ho, Joanna Korman, Thomas L. Griffiths. 2019. The Computational Structure of Unintentional Meaning. CogSci 2019
- Megan Shearer, Gabriel Rauterberg, Michael P. Wellman. 2019. An Agent-Based Model of Financial Benchmark Manipulation. ICML 2019
- Meir Friedenberg, Joseph Y. Halpern. 2019. Blameworthiness in Multi-Agent Settings. AAAI 2019
- Thanh H. Nguyen, Yongzhao Wang, Arunesh Sinha, Michael P. Wellman. 2019. Deception in finitely repeated security games. AAAI 2019
- Xintong Wang, Chris Hoang, Michael P. Wellman. 2019. Learning-Based Trading Strategies in the Face of Market Manipulation. ICML 2019 Workshop on AI in Finance
- Andrew Whalen, Thomas L. Griffiths, Daphna Buchsbaum. 2018. Sensitivity to Shared Information in Social Learning. Cognitive Science
- Bryce Wiedenbeck, Fengjun Yang, Michael P. Wellman. 2018. A Regression Approach for Modeling Games with Many Symmetric Players. AAAI 2018
- Joseph Y. Halpern, Rafael Pass. 2018. Game Theory with Translucent Players. International Journal of Game Theory
- Mason Wright and Michael P. Wellman. 2018. Evaluating the Stability of Non-Adaptive Trading in Continuous Double Auctions. AAMAS 2018
- Natasha Alechina, Joseph Y. Halpern, Ian A. Kash, Brian Logan. 2018. Incentive-Compatible Mechanisms for Norm Monitoring in Open Multi-Agent Systems. JAIR
- Nishant Desai, Andrew Critch, Stuart J. Russell. 2018. Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making. NeurIPS 2018
- Adam Bjorndahl, Joseph Y. Halpern, Rafael Pass. 2017. Reasoning about Rationality. Games and Economic Behavior 104, 146-164
- Joseph Y. Halpern, Rafael Pass, Lior Seeman. 2017. Computational Extensive-Form Games. EC 2016
- Michael Wellman, Eric Sodomka, Amy Greenwald. 2017. Self-confirming price-prediction strategies for simultaneous one-shot auctions. Games and Economic Behavior, 102, 339–372
- Natasha Alechina, Joseph Y. Halpern, Brian Logan. 2017. Causality, Responsibility and Blame in Team Plans. AAMAS 2017
- Joseph Y. Halpern, Xavier Vilaca. 2016. Rational Consensus (extended abstract). 2016 ACM Symposium on Principles of Distributed Computing
2.5. Models of bounded or imperfect rationality
- Bill Thompson and Thomas L. Griffiths. 2021. Human biases limit cumulative innovation.
- Ruairidh M. Battleday, Joshua C. Peterson, and Thomas L. Griffiths. 2021. From convolutional neural networks to models of higher-level cognition (and back again).
- Thomas A. Langlois, Nori Jacoby, Jordan W. Suchow, and Thomas L. Griffiths. 2021. Serial reproduction reveals the geometry of visuospatial representations. PNAS 2021
- Samarie Wilson, Somya Arora, Qiong Zhang, Thomas L. Griffiths. 2021. A Rational Account of Anchor Effects in Hindsight Bias.
- Sreejan Kumar, Ishita Dasgupta, Jonathan D. Cohen, Nathaniel D. Daw, and Thomas L. Griffiths. 2021. Meta-Learning of Structured Task Distributions in Humans and Machines. ICLR 2021
- Frederick Callaway, Antonio Rangel, Thomas L. Griffiths. 2021. Fixation patterns in simple choice reflect optimal information sampling.
- Falk Lieder, Owen X. Chen, Paul M. Krueger, Thomas L. Griffiths. 2020. Cognitive prostheses for goal achievement. Nature Human Behaviour 3:1096–1106
- Falk Lieder, Thomas L. Griffiths. 2020. Advancing rational analysis to the algorithmic level. Behavioral and Brain Sciences, 43, E27
- Frederick Callaway, Antonio Rangel, Tom Griffiths. 2020. Fixation patterns in simple choice are consistent with optimal use of cognitive resources. (Preprint)
- Mark K. Ho, David Abel, Jonathan D. Cohen, Michael L. Littman, Thomas L. Griffiths. 2020. The Efficiency of Human Cognition Reflects Planned Information Processing. AAAI 2020
- Smitha Milli, Falk Lieder, Tom Griffiths. 2020. A Rational Reinterpretation of Dual-Process Theories. UAI 2020
- Joseph Y Halpern, Evan Piermont. 2020. Dynamic Awareness.
- Xinming Liu, Joseph Halpern. 2020. Bounded Rationality in Las Vegas: Probabilistic Finite Automata Play Multi-Armed Bandits. PMLR
- Ida Momennejad, Jarrod Lewis-Peacock, Kenneth A Norman, Jonathan D Cohen, Satinder Singh, Richard L Lewis. 2020. Rational use of episodic and working memory: A normative account of prospective memory. Neuropsychologia
- Qiong Zhang, Kenneth A. Norman, Tom Griffiths. 2020. The method of loci is an optimal policy for memory search. CogSci 2020
- Rachel Jansen, Anna N. Rafferty, Tom Griffiths. 2020. A rational model of sequential self-assessment. CogSci 2020
- Carlos G. Correa, Mark K. Ho, Frederick Callaway, Tom Griffiths. 2020. Resource-rational Task Decomposition to Minimize Planning Costs. CogSci 2020
- Mark K. Ho, David Abel, Jonathan D. Cohen, Michael L. Littman, Thomas L. Griffiths. 2020. People Do Not Just Plan, They Plan to Plan. AAAI 2020
- Falk Lieder, Thomas L. Griffiths. 2019. Resource-rational analysis: understanding human cognition as the optimal use of limited computational resources. Behavioral and Brain Sciences, 43, E1
- Frederick Callaway, Tom Griffiths. 2019. Attention in value-based choice as optimal sequential sampling. (Preprint)
- Joshua Peterson, David Bourgin, Daniel Reichman, Thomas Griffiths, Stuart Russell. 2019. Cognitive model priors for predicting human decisions. ICML 2019
- Mark K. Ho, David Abel, Tom Griffiths, Michael L. Littman. 2019. The Value of Abstraction. Current Opinion in Behavioral Sciences, 29:111-116
- Ruairidh M. Battleday, Joshua C. Peterson, Thomas L. Griffiths. 2019. Capturing human categorization of natural images at scale by combining deep networks and cognitive models. (Preprint)
- Thomas L. Griffiths, Frederick Callaway, Michael B. Chang, Erin Grant, Paul M. Krueger, Falk Lieder. 2019. Doing more with less: meta-reasoning and meta-learning in humans and machines. Current Opinion in Behavioral Sciences 29: 24-30
- Falk Lieder, Amitai Shenhav, Sebastian Musslick, Thomas L. Griffiths. 2018. Rational metareasoning and the plasticity of cognitive control. PLoS Comp. Biol.
- Falk Lieder, Thomas L. Griffiths, Ming Hsu. 2018. Overrepresentation of extreme events in decision making reflects rational use of cognitive resources. Psychological Review
- Falk Lieder, Thomas L. Griffiths, Quentin J. M. Huys, Noah D. Goodman. 2018. Empirical evidence for resource-rational anchoring and adjustment. Psychonomic Bulletin & Review
- Falk Lieder, Thomas L. Griffiths, Quentin J. M. Huys, Noah D. Goodman. 2018. The anchoring bias reflects rational use of cognitive resources. Psychonomic Bulletin & Review
- Joseph Y. Halpern, Lior Seeman. 2018. Is state-dependent valuation more adaptive than simpler rules?. Behavioural Processes
- Amitai Shenhav, Sebastian Musslick, Falk Lieder, Wouter Kool, Thomas L Griffiths, Jonathan D Cohen, Matthew M Botvinick. 2017. Toward a Rational and Mechanistic Account of Mental Effort. Annual Review of Neuroscience, 40, 99-124
- Falk Lieder, Paul Krueger, Tom Griffiths. 2017. An automatic method for discovering rational heuristics for risky choice. CogSci 2017
- Smitha Milli, Falk Lieder, Tom Griffiths. 2017. When Does Bounded-Optimal Metareasoning Favor Few Cognitive Systems?. AAAI 2017
- Owain Evans, Andreas Stuhlmüller, John Salvatier, Daniel Filan. 2017. Modeling Agents with Probabilistic Programs.
- Nan Rong, Joseph Y. Halpern, Ashutosh Saxena. 2016. MDPs with Unawareness in Robotics. UAI 2016
3. Other topics
3.1. Adversarial training and testing
- Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, Angjoo Kanazawa. 2021. AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control. ACM Transactions on Graphics
- Cassidy Laidlaw, Sahil Singla, Soheil Feizi. 2021. Perceptual Adversarial Robustness: Defense Against Unseen Threat Models. ICLR 2021
- Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell. 2020. Adversarial Policies: Attacking Deep Reinforcement Learning. ICLR 2020
- Albert Zhan, Stas Tiomkin, Pieter Abbeel. 2020. Preventing Imitation Learning with Adversarial Policy Ensembles. ICLR 2020
- Marc Khoury, Dylan Hadfield-Menell. 2020. On the Geometry of Adversarial Examples. (Preprint)
- Xintong Wang, Michael P Wellman. 2020. Market Manipulation: An Adversarial Learning Framework for Detection and Evasion. 29th International Joint Conference on Artificial Intelligence
- Marc Khoury, Dylan Hadfield-Menell. 2019. Adversarial Training with Voronoi Constraints. (Preprint)
- Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, Dawn Song. 2019. Natural Adversarial Examples. CVPR 2021
3.2. AI capabilities, uncategorized
- Dan Hendrycks, Collin Burns, Anya Chen, Spencer Ball. 2021. CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review.
- Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt. 2021. Measuring Mathematical Problem Solving with the MATH Dataset.
- George Matheos, Alexander K. Lew, Matin Ghavamizadeh, Stuart Russell, Marco Cusumano-Towner, Vikash K. Mansinghka. 2021. Transforming Worlds: Automated Involutive MCMC for Open-Universe Probabilistic Models. Proc. 3rd Symposium on Advances in Approximate Bayesian Inference (AABI)
- Feiran Jia, Aditya Mate, Zun Li, Shahin Jabbari, Mithun Chakraborty, Milind Tambe, Michael Wellman, Yevgeniy Vorobeychik. 2021. A Game-Theoretic Approach for Hierarchical Policy-Making.
- Arnaud Fickinger, Hengyuan Hu, Brandon Amos, Stuart Russell, Noam Brown. 2021. Scalable Online Planning via Reinforcement Learning Fine-Tuning. NeurIPS 2021
- Hao Liu, Pieter Abbeel. 2021. Behavior From the Void: Unsupervised Active Pre-Training. NeurIPS 2021
- Adam Stooke, Kimin Lee, Pieter Abbeel, Michael Laskin. 2021. Decoupling Representation Learning from Reinforcement Learning. Proceedings of the 38th International Conference on Machine Learning
- Younggyo Seo, Lili Chen, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee. 2021. State Entropy Maximization with Random Encoders for Efficient Exploration. ICML 2021
- Roshan Rao, Jason Liu, Robert Verkuil, Joshua Meier, John F. Canny, Pieter Abbeel, Tom Sercu, Alexander Rives. 2021. MSA Transformer. bioRxiv
- Hao Liu, Pieter Abbeel. 2021. APS: Active Pretraining with Successor Features. ICML 2021
- Boyuan Chen, Pieter Abbeel, Deepak Pathak. 2021. Unsupervised Learning of Visual 3D Keypoints for Control. ICML 2021
- Ajay Jain, Matthew Tancik, Pieter Abbeel. 2021. Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis.
- Paras Jain, Ajay Jain, Tianjun Zhang, Pieter Abbeel, Joseph E. Gonzalez, Ion Stoica. 2021. Contrastive Code Representation Learning.
- Seunghyun Lee, Younggyo Seo, Kimin Lee, Pieter Abbeel, Jinwoo Shin. 2021. Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble. CoRL 2021
- Wenling Shang, Xiaofei Wang, Aravind Srinivas, Aravind Rajeswaran, Yang Gao, Pieter Abbeel, Michael Laskin. 2021. Reinforcement Learning with Latent Flow.
- Lili Chen, Kimin Lee, Aravind Srinivas, Pieter Abbeel. 2021. Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings.
- Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch. 2021. Decision Transformer: Reinforcement Learning via Sequence Modeling.
- Charles Packer, Pieter Abbeel, Joseph E. Gonzalez. 2021. Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL. NeurIPS 2021
- Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao. 2021. Mastering Atari Games with Limited Data. NeurIPS 2021
- Michael Laskin, Denis Yarats, Hao Liu, Kimin Lee, Albert Zhan, Kevin Lu, Catherine Cang, Lerrel Pinto, Pieter Abbeel. 2021. URLB: Unsupervised Reinforcement Learning Benchmark.
- Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch. 2021. Pretrained Transformers as Universal Computation Engines.
- Abdus Salam Azad, Edward Kim, Qiancheng Wu, Kimin Lee, Ion Stoica, Pieter Abbeel, Sanjit A. Seshia. 2021. Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments.
- Ellis Ratner, Andrea Bajcsy, Terrence Fong, Claire J. Tomlin, Anca D. Dragan. 2021. Efficient Dynamics Estimation With Adaptive Model Sets. IEEE Robotics and Automation Letters
- Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh. 2021. Discovery of Options via Meta-Learned Subgoals.
- Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard Lewis, Satinder Singh. 2021. Learning State Representations from Random Deep Action-Conditional Predictions. NeurIPS 2021
- Jonathan Stray. 2021. Making Algorithms Work for Reporting.
- Nemanja Djuric, Henggang Cui, Zhaoen Su, Shangxuan Wu, Huahua Wang, Fang-Chieh Chou, Luisa San Martin, Song Feng, Rui Hu, Yang Xu, Alyssa Dayan, Sidney Zhang, Brian C Becker, Gregory P Meyer, Carlos Vallespi-Gonzalez, Carl K Wellington. 2021. Multixnet: Multiclass multistage multimodal motion prediction.
- Arnaud Fickinger, Natasha Jaques, Samyak Parajuli, Michael Chang, Nicholas Rhinehart, Glen Berseth, Stuart Russell, Sergey Levine. 2021. Explore and Control with Adversarial Surprise.
- Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, Jacob Steinhardt. 2021. Measuring Coding Challenge Competence With APPS. NeurIPS 2021
- Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel. 2021. Replay-Guided Adversarial Environment Design. NeurIPS 2021
- Abhinav Bhatia, Justin Svegliato, Shlomo Zilberstein. 2021. On the benefits of randomly adjusting anytime weighted A*.
- Shane Parr, Ishan Khatri, Justin Svegliato, Shlomo Zilberstein. 2021. Agent-aware state estimation for autonomous vehicles.
- Connor Basich, Justin Svegliato, Allyson Beach, Kyle H. Wray, Stefan Witwicki, Shlomo Zilberstein. 2021. Improving Competence via Iterative State Space Refinement. IROS 2021
- Abhinav Bhatia, Justin Svegliato, Shlomo Zilberstein. 2021. Tuning the hyperparameters of anytime planning: A deep reinforcement learning approach.
- Hankook Lee, Kibok Lee, Kimin Lee, Honglak Lee, Jinwoo Shin. 2021. Improving Transferability of Representations via Augmentation-Aware Self-Supervision. NeurIPS 2021
- Paria Rashidinejad, Xiao Hu, Stuart Russell. 2020. Patient-adaptable intracranial pressure morphology analysis using a probabilistic model-based approach. Physiological Measurement
- Sam Toyer, Felipe Trevizan, Sylvie Thiebaux, Lexing Xie. 2020. ASNets: Deep Learning for Generalised Planning. JAIR
- Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt. 2020. Measuring Massive Multitask Language Understanding. ICLR 2021
- Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine. 2020. Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design. NeurIPS 2020
- Scott Emmons, Ajay Jain, Michael Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak. 2020. Sparse Graphical Memory for Robust Planning. NeurIPS 2020
- Thomas Krendl Gilbert, Andrew Loveridge. 2020. Subjectifying objectivity: Delineating tastes in theoretical quantum gravity research. Social Studies of Science
- Oliver Richardson, Joseph Y Halpern. 2020. Probabilistic Dependency Graphs. AAAI 2021
- Janarthanan Rajendran, Richard Lewis, Vivek Veeriah, Honglak Lee, Satinder Singh. 2020. How Should an Agent Practice?. AAAI 2020
- Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado Van Hasselt, David Silver, Satinder Singh. 2020. What Can Learned Intrinsic Rewards Capture?. ICML 2020
- Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian. 2019. Bayesian Relational Memory for Semantic Visual Navigation. ICCV 2019
- Prasad Tadepalli, Cameron Barrie, Stuart J. Russell. 2019. Learning Causal Trees with Latent Variables via Controlled Experimentation. AAAI 2019
- Junhyuk Oh, Yijie Guo, Satinder Singh, Honglak Lee. 2018. Self-Imitation Learning. ICML 2018
- Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel. 2018. Learning Plannable Representations with Causal InfoGAN. ICML 2018 Workshop on Planning and Learning
- Vivek Veeriah, Junhyuk Oh, Satinder Singh. 2018. Many-Goals Reinforcement Learning. (Preprint)
- Yi Wu, Siddharth Srivastava, Nicholas Hay, Simon Du, Stuart Russell. 2018. Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms. ICML 2018
- Han-Chin Shing, Suraj Nair, Ayah Zirikly, Meir Friedenberg, Hal Daumé III, Philip Resnik. 2018. Expert, Crowdsourced, and Machine Assessment of Suicide Risk via Online Postings. Workshop on Computational Linguistics and Clinical Psychology 2018
- Paul Krueger, Falk Lieder, Tom Griffiths. 2017. Enhancing metacognitive reinforcement learning using reward structures and feedback. CogSci 2017
3.3. Cognitive science, uncategorized
- Ethan A. Brooks, Janarthanan Rajendran, Richard L. Lewis, Satinder Singh. 2021. Reinforcement Learning of Implicit and Explicit Control Flow Instructions.
- Thomas A. Langlois, H. Charles Zhao, Erin Grant, Ishita Dasgupta, Thomas L. Griffiths, and Nori Jacoby. 2021. Passive Attention in Artificial Neural Networks Predicts Human Visual Selectivity. NeurIPS 2021
- Stephan C. Meylan, Sathvik Nair, Thomas L. Griffiths. 2021. Evaluating models of robust word recognition with serial reproduction. Cognition 2021
- Casey Lewry, Kaley Curtis, Nadya Vasilyeva, Fei Xu, Thomas L. Griffiths. 2021. Intuitions about magic track the development of intuitive physics. Cognition 2021
- Arjun Devraj, Qiong Zhang, Thomas L. Griffiths. 2021. The Dynamics of Exemplar and Prototype Representations Depend on Environmental Statistics.
- Ni Ji, Gurrein K Madan, Guadalupe I Fabre, Alyssa Dayan, Casey M Baker, Talya S Kramer, Ijeoma Nwabudike, Steven W Flavell. 2021. A neural circuit for flexible control of persistent behavioral states. eLife 2021
- Aditi Jha, Joshua Peterson, Thomas L. Griffiths. 2020. Extracting low-dimensional psychological representations from convolutional neural networks. CogSci 2020
- Alexander Todorov, Stefan Uddenberg, Joshua Peterson, Thomas Griffiths, Jordan Suchow. 2020. Data-Driven, Photorealistic Social Face-Trait Encoding, Prediction, and Manipulation Using Deep Neural Networks. Patent application
- Antonia Langenhoff, Alex Wiegmann, Joseph Y. Halpern, Joshua B. Tenenbaum, Tobias Gerstenberg. 2020. Predicting responsibility judgments from dispositional inferences and causal attributions. (Preprint)
- Mayank Agrawal, Joshua C. Peterson, Thomas L. Griffiths. 2020. Scaling up psychology via Scientific Regret Minimization. PNAS 2020
- R. Dubey, T. L. Griffiths. 2020. Reconciling novelty and complexity through a rational analysis of curiosity. Psychological Review, 127(3), 455–476
- Sophia Sanborn, Michael Chang, Sergey Levine, Thomas Griffiths. 2020. Sparse Skill Coding: Learning Behavioral Hierarchies with Sparse Codes. ICLR 2020 submission
- Thomas J. H. Morgan, Jordan W. Suchow, Thomas L. Griffiths. 2020. What the Baldwin Effect affects depends on the nature of plasticity. Cognition, 197
- Max Kleiman-Weiner, Felix Sosa, Bill Thompson, Sebastiaan van Opheusden, Tom Griffiths, Samuel Gershman, Fiery Cushman. 2020. Downloading Culture.zip: Social learning by program induction. CogSci 2020
- Anne S. Hsu, Jay B. Martin, Adam N. Sanborn, Thomas L. Griffiths. 2019. Identifying category representations for complex stimuli using discrete Markov chain Monte Carlo with people. Behavior Research Methods 51:1706–1716
- Mathew Hardy, Tom Griffiths. 2019. Demonstrating the Impact of Prior Knowledge in Risky Choice. (Preprint)
- Arnon Lotem, Joseph Y. Halpern, Shimon Edelman, Oren Kolodny. 2017. The evolution of cognitive mechanisms in response to cultural innovations. PNAS
- David Bourgin, Falk Lieder, Daniel Reichman, Nimrod Talmon, Tom Griffiths. 2017. The Structure of Goal Systems Predicts Human Performance. CogSci 2017
3.4. Ethics for AI and AI development
- Thomas Krendl Gilbert. 2021. Mapping the Political Economy of Reinforcement Learning Systems: The Case of Autonomous Vehicles. Simons Institute Newsletter
- Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask, Emma Bluemke, Jonathan Lebensold, Cullen O'Keefe, Mark Koren, Théo Ryffel, JB Rubinovitz, Tamay Besiroglu, Federica Carugati, Jack Clark, Peter Eckersley, Sarah de Haas, Maritza Johnson, Ben Laurie, Alex Ingerman, Igor Krawczuk, Amanda Askell, Rosario Cammarota, Andrew Lohn, David Krueger, Charlotte Stix, Peter Henderson, Logan Graham, Carina Prunkl, Bianca Martin, Elizabeth Seger, Noa Zilberman, Seán Ó hÉigeartaigh, Frens Kroeger, Girish Sastry, Rebecca Kagan, Adrian Weller, Brian Tse, Elizabeth Barnes, Allan Dafoe, Paul Scharre, Ariel Herbert-Voss, Martijn Rasser, Shagun Sodhani, Carrick Flynn, Thomas Krendl Gilbert, Lisa Dyer, Saif Khan, Yoshua Bengio, Markus Anderljung. 2020. Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims. (Preprint)
- Ravit Dotan, Smitha Milli. 2020. Value-laden Disciplinary Shifts in Machine Learning. (Preprint)
- Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt. 2020. Aligning AI With Shared Human Values. ICLR 2021
- John Miller, Smitha Milli, Moritz Hardt. 2019. Strategic Classification is Causal Modeling in Disguise. FAT* 2019
- McKane Andrus, Thomas Krendl Gilbert. 2019. Towards a Just Theory of Measurement: A Principled Social Measurement Assurance Program for Machine Learning. AIES 2019
- Roel Dobbe, Thomas Krendl Gilbert, Yonatan Mintz. 2019. Hard Choices in Artificial Intelligence: Addressing Normative Uncertainty through Sociotechnical Commitments. NeurIPS 2019
- Smitha Milli, John Miller, Anca D. Dragan, Moritz Hardt. 2019. The Social Cost of Strategic Classification. FAT* 2019
- Thomas Krendl Gilbert, Yonatan Mintz. 2019. Epistemic Therapy for Bias in Automated Decision-Making. AIES 2019
- Roel Dobbe, Sarah Dean, Thomas Gilbert, Nitin Kohli. 2018. A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics. FAT/ML 2018
3.5. Robust inference, learning, and planning
- Zaynah Javed, Daniel S. Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca D. Dragan, Ken Goldberg. 2021. Policy Gradient Bayesian Robust Optimization for Imitation Learning. ICML 2021
- Justin Svegliato, Connor Basich, Sandhya Saisubramanian and Shlomo Zilberstein. 2021. Using metareasoning to maintain and restore safety for reliable autonomy.
- Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, Justin Gilmer. 2020. The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization.
- Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan, Dawn Song. 2020. Pretrained Transformers Improve Out-of-Distribution Robustness. Association for Computational Linguistics (ACL)
- Dan Hendrycks, Norman Mu, Ekin D. Cubuk, Barret Zoph, Justin Gilmer, Balaji Lakshminarayanan. 2020. AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty. ICLR 2020
- Paria Rashidinejad, Jiantao Jiao, Stuart Russell. 2020. SLIP: Learning to predict in unknown dynamical systems with long-term memory.
- Dieqiao Feng, Carla P Gomes, Bart Selman. 2020. Solving hard AI planning instances using curriculum-driven deep reinforcement learning.
- Adam Stooke, Joshua Achiam, Pieter Abbeel. 2020. Responsive Safety in Reinforcement Learning by PID Lagrangian Methods. ICML 2020
- Jaime F. Fisac, Neil F. Lugovoy, Vicenç Rubies-Royo, Shromona Ghosh, Claire J. Tomlin. 2019. Bridging Hamilton-Jacobi Safety Analysis and Reinforcement Learning. ICRA 2019
- Karthika Mohan, Judea Pearl. 2019. Graphical Models for Processing Missing Data. JASA
- Kush Bhatia, Yi-An Ma, Anca D. Dragan, Peter L. Bartlett, Michael I. Jordan. 2019. Bayesian Robustness: A Nonasymptotic Viewpoint. (Preprint)
- Margaret P. Chapman, Jonathan Lacotte, Aviv Tamar, Donggun Lee, Kevin M. Smith, Victoria Cheng, Jaime F. Fisac, Susmit Jha, Marco Pavone, Claire J. Tomlin. 2019. A Risk-Sensitive Finite-Time Reachability Approach for Safety of Stochastic Dynamic Systems. American Control Conference (ACC) 2019
- Ruibo Tu, Cheng Zhang, Paul Ackermann, Karthika Mohan, Hedvig Kjellström, Kun Zhang. 2019. Causal Discovery in the Presence of Missing Data. AISTATS 2019
- Dan Hendrycks, Steven Basart, Mantas Mazeika, Mohammadreza Mostajabi, Jacob Steinhardt, Dawn Song. 2019. Scaling Out-of-Distribution Detection for Real-World Settings.
- Daniel Kang, Yi Sun, Dan Hendrycks, Tom Brown, Jacob Steinhardt. 2019. Testing robustness against unforeseen adversaries.
- Dan Hendrycks, Mantas Mazeika, Saurav Kadavath, Dawn Song. 2019. Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty. NeurIPS 2019
- Dan Hendrycks, Kimin Lee, Mantas Mazeika. 2019. Using Pre-Training Can Improve Model Robustness and Uncertainty. ICML 2019
- Dan Hendrycks, Thomas Dietterich. 2019. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. ICLR 2019
- Dan Hendrycks, Mantas Mazeika, Thomas Dietterich. 2019. Deep Anomaly Detection with Outlier Exposure. ICLR 2019
- Karthika Mohan. 2018. On Handling Self-masking and Other Hard Missing Data Problems. AAAI 2018
- Karthika Mohan, Felix Thoemmes, Judea Pearl. 2018. Estimation with Incomplete Data: The Linear Case. IJCAI 2018
- Si Liu, Risheek Garrepalli, Thomas G Dietterich, Alan Fern, Dan Hendrycks. 2018. Open Category Detection with PAC Guarantees. ICML 2018
- Dan Hendrycks, Mantas Mazeika, Duncan Wilson, Kevin Gimpel. 2018. Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise. NeurIPS 2018
3.6. Security problems and solutions
- Sushil Jajodia, George Cybenko, V. S. Subrahmanian, Vipin Swarup, Cliff Wang, Michael Wellman. 2020. Adaptive Autonomous Secure Cyber Systems. Springer/Nature Books
- Ivan Geffner, Joseph Y. Halpern. 2019. Security in Asynchronous Interactive Systems. (Preprint)
- Xinlei Pan, Weiyao Wang, Xiaoshuai Zhang, Bo Li, Jinfeng Yi, Dawn Song. 2019. How You Act Tells a Lot: Privacy-Leaking Attack on Deep Reinforcement Learning. AAMAS 2019
- Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, Michael P. Wellman. 2018. SoK: Security and Privacy in Machine Learning. IEEE European Symposium on Security and Privacy
3.7. Transparency & interpretability
- Daniel Filan, Stephen Casper, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell. 2021. Clusterability in Neural Networks.
- Pulkit Verma, Shashank Rao Marpally, Siddharth Srivastava. 2021. Asking the Right Questions: Learning Interpretable Action Models Through Query Answering.
- Olivia Watkins, Sandy Huang, Julius Frost, Kush Bhatia, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko, Anca Dragan. 2021. Explaining robot policies.
- Jonathan Stray. 2021. Show me the algorithm: Transparency in recommendation systems.
- Shlomi Hod, Stephen Casper, Daniel Filan, Cody Wild, Andrew Critch, Stuart Russell. 2021. Detecting Modularity in Deep Neural Networks.
- Daniel Filan, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell. 2019. Pruned Neural Networks are Surprisingly Modular. (Preprint, under review NeurIPS 2020)
- Smitha Milli, Ludwig Schmidt, Anca D. Dragan, Moritz Hardt. 2019. Model Reconstruction from Model Explanations. FAT* 2019
- Jacob Andreas, Anca Dragan, Dan Klein. 2017. Translating Neuralese. ACL 2017