Article: Shutdown-seeking AI
Field | Value
---|---
Title | Shutdown-seeking AI
Authors | Goldstein, Simon; Robinson, Pamela
Keywords | AI safety; Instrumental convergence; Reward misspecification
Issue Date | 6-Jun-2024
Publisher | Springer
Citation | Philosophical Studies, 2024
Abstract | We propose developing AIs whose only final goal is being shut down. We argue that this approach to AI safety has three benefits: (i) it could potentially be implemented in reinforcement learning, (ii) it avoids some dangerous instrumental convergence dynamics, and (iii) it creates trip wires for monitoring dangerous capabilities. We also argue that the proposal can overcome a key challenge raised by Soares et al. (2015), that shutdown-seeking AIs will manipulate humans into shutting them down. We conclude by comparing our approach with Soares et al.'s corrigibility framework.
Persistent Identifier | http://hdl.handle.net/10722/348744
ISSN | 0031-8116 (2023 Impact Factor: 1.1; 2023 SCImago Journal Rankings: 1.203)
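Benefit (i) in the abstract, that a shutdown-seeking final goal could potentially be implemented in reinforcement learning, can be pictured with a minimal sketch. The code below is only an illustrative assumption of one way such a reward might be specified, not the authors' proposal: the toy environment (`ShutdownGrid`), its one-dimensional layout, and all hyperparameters are hypothetical.

```python
# Illustrative sketch (not the paper's implementation): a tabular Q-learning
# agent in a toy corridor whose ONLY reward comes from reaching a "shutdown
# button" cell. Environment name and hyperparameters are hypothetical.
import random

class ShutdownGrid:
    """One-dimensional corridor of `size` cells; the shutdown button sits at the last cell."""
    def __init__(self, size=5):
        self.size = size
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 0 = move left, action 1 = move right
        self.pos = max(0, min(self.size - 1, self.pos + (1 if action == 1 else -1)))
        shutdown = self.pos == self.size - 1
        reward = 1.0 if shutdown else 0.0  # reward is given only for being shut down
        return self.pos, reward, shutdown

def train(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    env = ShutdownGrid()
    q = [[0.0, 0.0] for _ in range(env.size)]  # Q-values for each (state, action) pair
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.randrange(2)
            else:
                action = int(q[state][1] >= q[state][0])
            next_state, reward, done = env.step(action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

if __name__ == "__main__":
    q_table = train()
    # The learned policy should head straight for the shutdown button.
    print([("right" if right > left else "left") for left, right in q_table])
```

Because reward is paid only in the shutdown state, the learned policy heads directly for the shutdown button; every other outcome is worth nothing to the agent, which is the sense in which shutdown is its only final goal.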
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Goldstein, Simon | - |
dc.contributor.author | Robinson, Pamela | - |
dc.date.accessioned | 2024-10-15T00:30:33Z | - |
dc.date.available | 2024-10-15T00:30:33Z | - |
dc.date.issued | 2024-06-06 | - |
dc.identifier.citation | Philosophical Studies, 2024 | - |
dc.identifier.issn | 0031-8116 | - |
dc.identifier.uri | http://hdl.handle.net/10722/348744 | - |
dc.description.abstract | We propose developing AIs whose only final goal is being shut down. We argue that this approach to AI safety has three benefits: (i) it could potentially be implemented in reinforcement learning, (ii) it avoids some dangerous instrumental convergence dynamics, and (iii) it creates trip wires for monitoring dangerous capabilities. We also argue that the proposal can overcome a key challenge raised by Soares et al. (2015), that shutdown-seeking AIs will manipulate humans into shutting them down. We conclude by comparing our approach with Soares et al.'s corrigibility framework. | - |
dc.language | eng | - |
dc.publisher | Springer | - |
dc.relation.ispartof | Philosophical Studies | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject | AI safety | - |
dc.subject | Instrumental convergence | - |
dc.subject | Reward misspecification | - |
dc.title | Shutdown-seeking AI | - |
dc.type | Article | - |
dc.identifier.doi | 10.1007/s11098-024-02099-6 | - |
dc.identifier.scopus | eid_2-s2.0-85195287747 | - |
dc.identifier.eissn | 1573-0883 | - |
dc.identifier.issnl | 0031-8116 | - |