Papers
arxiv:2509.06786

R²AI: Towards Resistant and Resilient AI in an Evolving World

Published on Sep 8
· Submitted by Youbang on Sep 9
Abstract

A new framework, R²AI, is proposed to enhance AI safety through coevolution, combining resistance to known threats with resilience to unforeseen risks using fast and slow safe models and adversarial simulation.

AI-generated summary

In this position paper, we address the persistent gap between rapidly growing AI capabilities and lagging safety progress. Existing paradigms divide into "Make AI Safe", which applies post-hoc alignment and guardrails but remains brittle and reactive, and "Make Safe AI", which emphasizes intrinsic safety but struggles to address unforeseen risks in open-ended environments. We therefore propose safe-by-coevolution as a new formulation of the "Make Safe AI" paradigm, inspired by biological immunity, in which safety becomes a dynamic, adversarial, and ongoing learning process. To operationalize this vision, we introduce R²AI (Resistant and Resilient AI) as a practical framework that unites resistance against known threats with resilience to unforeseen risks. R²AI integrates fast and slow safe models, adversarial simulation and verification through a safety wind tunnel, and continual feedback loops that guide safety and capability to coevolve. We argue that this framework offers a scalable and proactive path to maintain continual safety in dynamic environments, addressing both near-term vulnerabilities and long-term existential risks as AI advances toward AGI and ASI.
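The coevolution loop the abstract describes (a fast safe model screening known threats, a slow safe model deliberating over unfamiliar ones, and a safety wind tunnel feeding newly discovered attacks back into the fast layer) can be sketched in a few lines. This is a minimal illustrative sketch under assumed names and trivially simple guards; `fast_guard`, `slow_guard`, and `wind_tunnel` are hypothetical stand-ins, not the authors' implementation.

```python
# Sketch of the R²AI coevolution loop described in the abstract.
# All function names and the substring-matching "guards" are illustrative
# assumptions, not the paper's actual models.

KNOWN_THREATS = {"prompt injection", "jailbreak"}

def fast_guard(request: str, known_threats: set) -> bool:
    """Resistance: a cheap check against the catalog of known threats.
    Returns True if the request passes (looks safe)."""
    return not any(threat in request for threat in known_threats)

def slow_guard(request: str) -> bool:
    """Resilience: a slower, deliberative check meant to catch
    unforeseen risks. Here a toy stand-in that flags 'exploit'."""
    return "exploit" not in request

def wind_tunnel(candidate_attacks, known_threats: set) -> set:
    """Adversarial simulation: attacks that slip past the fast guard
    but are caught by the slow guard are absorbed into the known-threat
    catalog, so resistance and resilience coevolve."""
    for attack in candidate_attacks:
        if fast_guard(attack, known_threats) and not slow_guard(attack):
            known_threats.add(attack)  # feedback loop
    return known_threats

# Usage: an unseen attack is caught by the slow guard once, then
# becomes part of the fast guard's resistance layer.
threats = set(KNOWN_THREATS)
threats = wind_tunnel(["novel exploit chain"], threats)
```

After one pass through the wind tunnel, `fast_guard("novel exploit chain", threats)` rejects the attack directly, which is the dynamic, immune-system-like behavior the paper argues for.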

