Shutdown-seeking AI


Title	Shutdown-seeking AI
Authors	Goldstein, Simon; Robinson, Pamela
Keywords	AI safety; Instrumental convergence; Reward misspecification
Issue Date	6-Jun-2024
Publisher	Springer
Citation	Philosophical Studies, 2024. DOI: http://dx.doi.org/10.1007/s11098-024-02099-6
Abstract	We propose developing AIs whose only final goal is being shut down. We argue that this approach to AI safety has three benefits: (i) it could potentially be implemented in reinforcement learning, (ii) it avoids some dangerous instrumental convergence dynamics, and (iii) it creates trip wires for monitoring dangerous capabilities. We also argue that the proposal can overcome a key challenge raised by Soares et al. (2015), that shutdown-seeking AIs will manipulate humans into shutting them down. We conclude by comparing our approach with Soares et al.'s corrigibility framework.
Persistent Identifier	http://hdl.handle.net/10722/348744
ISSN	0031-8116
2023 Impact Factor	1.1
2023 SCImago Journal Rankings	1.203

DC Field	Value	Language
dc.contributor.author	Goldstein, Simon	-
dc.contributor.author	Robinson, Pamela	-
dc.date.accessioned	2024-10-15T00:30:33Z	-
dc.date.available	2024-10-15T00:30:33Z	-
dc.date.issued	2024-06-06	-
dc.identifier.citation	Philosophical Studies, 2024	-
dc.identifier.issn	0031-8116	-
dc.identifier.uri	http://hdl.handle.net/10722/348744	-
dc.language	eng	-
dc.publisher	Springer	-
dc.relation.ispartof	Philosophical Studies	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject	AI safety	-
dc.subject	Instrumental convergence	-
dc.subject	Reward misspecification	-
dc.title	Shutdown-seeking AI	-
dc.type	Article	-
dc.identifier.doi	10.1007/s11098-024-02099-6	-
dc.identifier.scopus	eid_2-s2.0-85195287747	-
dc.identifier.eissn	1573-0883	-
dc.identifier.issnl	0031-8116	-