Article: Shutdown-seeking AI

Title: Shutdown-seeking AI
Authors: Goldstein, Simon; Robinson, Pamela
Keywords: AI safety; Instrumental convergence; Reward misspecification
Issue Date: 6-Jun-2024
Publisher: Springer
Citation: Philosophical Studies, 2024
Abstract: We propose developing AIs whose only final goal is being shut down. We argue that this approach to AI safety has three benefits: (i) it could potentially be implemented in reinforcement learning, (ii) it avoids some dangerous instrumental convergence dynamics, and (iii) it creates trip wires for monitoring dangerous capabilities. We also argue that the proposal can overcome a key challenge raised by Soares et al. (2015), that shutdown-seeking AIs will manipulate humans into shutting them down. We conclude by comparing our approach with Soares et al.'s corrigibility framework.
Persistent Identifier: http://hdl.handle.net/10722/348744
ISSN: 0031-8116
2023 Impact Factor: 1.1
2023 SCImago Journal Rankings: 1.203

 

DC Field | Value | Language
dc.contributor.author | Goldstein, Simon | -
dc.contributor.author | Robinson, Pamela | -
dc.date.accessioned | 2024-10-15T00:30:33Z | -
dc.date.available | 2024-10-15T00:30:33Z | -
dc.date.issued | 2024-06-06 | -
dc.identifier.citation | Philosophical Studies, 2024 | -
dc.identifier.issn | 0031-8116 | -
dc.identifier.uri | http://hdl.handle.net/10722/348744 | -
dc.description.abstract | We propose developing AIs whose only final goal is being shut down. We argue that this approach to AI safety has three benefits: (i) it could potentially be implemented in reinforcement learning, (ii) it avoids some dangerous instrumental convergence dynamics, and (iii) it creates trip wires for monitoring dangerous capabilities. We also argue that the proposal can overcome a key challenge raised by Soares et al. (2015), that shutdown-seeking AIs will manipulate humans into shutting them down. We conclude by comparing our approach with Soares et al.'s corrigibility framework. | -
dc.language | eng | -
dc.publisher | Springer | -
dc.relation.ispartof | Philosophical Studies | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject | AI safety | -
dc.subject | Instrumental convergence | -
dc.subject | Reward misspecification | -
dc.title | Shutdown-seeking AI | -
dc.type | Article | -
dc.identifier.doi | 10.1007/s11098-024-02099-6 | -
dc.identifier.scopus | eid_2-s2.0-85195287747 | -
dc.identifier.eissn | 1573-0883 | -
dc.identifier.issnl | 0031-8116 | -
