Article: AI safety: a climb to Armageddon?

Title: AI safety: a climb to Armageddon?
Authors: Cappelen, Herman; Dever, Josh; Hawthorne, John
Keywords: AI safety; Existential risk; Holism; Mitigation; Optimism
Issue Date: 1-Jul-2025
Publisher: Springer
Citation: Philosophical Studies, 2025, v. 182, p. 1933-1950
Abstract: This paper presents an argument that certain AI safety measures, rather than mitigating existential risk, may instead exacerbate it. Under certain key assumptions - the inevitability of AI failure, the expected correlation between an AI system's power at the point of failure and the severity of the resulting harm, and the tendency of safety measures to enable AI systems to become more powerful before failing - safety efforts have negative expected utility. The paper examines three response strategies: Optimism, Mitigation, and Holism. Each faces challenges stemming from intrinsic features of the AI safety landscape that we term Bottlenecking, the Perfection Barrier, and Equilibrium Fluctuation. The surprising robustness of the argument forces a re-examination of core assumptions around AI safety and points to several avenues for further research.
Persistent Identifier: http://hdl.handle.net/10722/366347
ISSN: 0031-8116
2023 Impact Factor: 1.1
2023 SCImago Journal Rankings: 1.203

DC Field | Value
dc.contributor.author | Cappelen, Herman
dc.contributor.author | Dever, Josh
dc.contributor.author | Hawthorne, John
dc.date.accessioned | 2025-11-25T04:18:52Z
dc.date.available | 2025-11-25T04:18:52Z
dc.date.issued | 2025-07-01
dc.identifier.citation | Philosophical Studies, 2025, v. 182, p. 1933-1950
dc.identifier.issn | 0031-8116
dc.identifier.uri | http://hdl.handle.net/10722/366347
dc.description.abstract | This paper presents an argument that certain AI safety measures, rather than mitigating existential risk, may instead exacerbate it. Under certain key assumptions - the inevitability of AI failure, the expected correlation between an AI system's power at the point of failure and the severity of the resulting harm, and the tendency of safety measures to enable AI systems to become more powerful before failing - safety efforts have negative expected utility. The paper examines three response strategies: Optimism, Mitigation, and Holism. Each faces challenges stemming from intrinsic features of the AI safety landscape that we term Bottlenecking, the Perfection Barrier, and Equilibrium Fluctuation. The surprising robustness of the argument forces a re-examination of core assumptions around AI safety and points to several avenues for further research.
dc.language | eng
dc.publisher | Springer
dc.relation.ispartof | Philosophical Studies
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject | AI safety
dc.subject | Existential risk
dc.subject | Holism
dc.subject | Mitigation
dc.subject | Optimism
dc.title | AI safety: a climb to Armageddon?
dc.type | Article
dc.identifier.doi | 10.1007/s11098-025-02297-w
dc.identifier.scopus | eid_2-s2.0-86000291642
dc.identifier.volume | 182
dc.identifier.spage | 1933
dc.identifier.epage | 1950
dc.identifier.eissn | 1573-0883
dc.identifier.issnl | 0031-8116

This record can be exported via the repository's OAI-PMH interface in XML formats, or in other non-XML formats; a sketch follows below.
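
For illustration only, here is a minimal Python sketch of harvesting this record's Dublin Core metadata over OAI-PMH. The endpoint URL and OAI identifier below are assumptions inferred from the persistent identifier (handle 10722/366347), not confirmed values; the verb, metadata prefix, and XML namespace are the standard OAI-PMH and Dublin Core ones.

# Minimal OAI-PMH GetRecord sketch (Python 3, standard library only).
# ASSUMPTIONS: BASE_URL and IDENTIFIER are hypothetical, inferred from the
# handle 10722/366347; substitute the repository's real endpoint and id.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://hub.hku.hk/oai/request"  # assumed endpoint, not verified
IDENTIFIER = "oai:hub.hku.hk:10722/366347"   # assumed OAI identifier

# GetRecord with the oai_dc prefix returns the unqualified Dublin Core
# record, i.e. the same field/value pairs shown in the DC table above.
params = urllib.parse.urlencode({
    "verb": "GetRecord",
    "metadataPrefix": "oai_dc",
    "identifier": IDENTIFIER,
})
with urllib.request.urlopen(BASE_URL + "?" + params) as resp:
    root = ET.fromstring(resp.read())

# Dublin Core elements all live in this standard namespace.
DC_NS = "{http://purl.org/dc/elements/1.1/}"
for elem in root.iter():
    if elem.tag.startswith(DC_NS):
        print("dc." + elem.tag[len(DC_NS):] + ":", elem.text)

Against a live endpoint, this would print the title, contributors, abstract, and identifiers listed in the DC table above, one dc.* line per field.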