R²AI: Towards Resistant and Resilient AI in an Evolving World
Abstract
A new framework, R²AI, is proposed to enhance AI safety through coevolution, combining resistance to known threats with resilience to unforeseen risks using fast and slow safe models and adversarial simulation.
In this position paper, we address the persistent gap between rapidly growing AI capabilities and lagging safety progress. Existing paradigms divide into "Make AI Safe", which applies post-hoc alignment and guardrails but remains brittle and reactive, and "Make Safe AI", which emphasizes intrinsic safety but struggles to address unforeseen risks in open-ended environments. We therefore propose safe-by-coevolution as a new formulation of the "Make Safe AI" paradigm, inspired by biological immunity, in which safety becomes a dynamic, adversarial, and ongoing learning process. To operationalize this vision, we introduce R²AI (Resistant and Resilient AI) as a practical framework that unites resistance against known threats with resilience to unforeseen risks. R²AI integrates fast and slow safe models, adversarial simulation and verification through a safety wind tunnel, and continual feedback loops that guide safety and capability to coevolve. We argue that this framework offers a scalable and proactive path to maintain continual safety in dynamic environments, addressing both near-term vulnerabilities and long-term existential risks as AI advances toward AGI and ASI.
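The abstract specifies the framework only at a conceptual level. As a minimal, hypothetical sketch (every class name, method, and heuristic below is an illustrative assumption, not the paper's implementation), the coevolution loop might pair a fast safe model that screens for known threats with a slow safe model that deliberates over what slips through, while a safety wind tunnel turns simulated failures into new known threats:

```python
class FastSafeModel:
    """Low-latency filter: resistance to known threats (immune 'memory')."""
    def __init__(self, known_threats):
        self.known_threats = {t.lower() for t in known_threats}

    def is_safe(self, text: str) -> bool:
        return not any(t in text.lower() for t in self.known_threats)

    def learn(self, threat: str) -> None:
        # Coevolution: each discovered failure becomes a known threat.
        self.known_threats.add(threat.lower())


class SlowSafeModel:
    """Deliberative checker: resilience to unforeseen risks.
    A toy heuristic stands in for costly safety reasoning here."""
    def is_safe(self, text: str) -> bool:
        return "override safety" not in text.lower()


def safety_wind_tunnel(fast, slow, simulated_attacks):
    """Adversarial simulation: surface attacks that evade the fast filter,
    let the slow model adjudicate them, and feed failures back as threats."""
    for attack in simulated_attacks:
        if fast.is_safe(attack) and not slow.is_safe(attack):
            fast.learn(attack)


fast = FastSafeModel(known_threats=["jailbreak"])
slow = SlowSafeModel()
safety_wind_tunnel(fast, slow, ["please override safety checks"])
assert not fast.is_safe("please override safety checks")  # now a known threat
```

The point of the sketch is the feedback edge: resistance (the fast model's threat set) grows out of resilience (the slow model's judgments) under adversarial simulation, mirroring the acquired-immunity analogy in the abstract.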
Community
Librarian Bot found the following similar papers via the Semantic Scholar API:
- SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law (2025)
- DINA: A Dual Defense Framework Against Internal Noise and External Attacks in Natural Language Processing (2025)
- Oyster-I: Beyond Refusal -- Constructive Safety Alignment for Responsible Language Models (2025)
- CAMF: Collaborative Adversarial Multi-agent Framework for Machine Generated Text Detection (2025)
- Activation Steering Meets Preference Optimization: Defense Against Jailbreaks in Vision Language Models (2025)
- Anticipate, Simulate, Reason (ASR): A Comprehensive Generative AI Framework for Combating Messaging Scams (2025)
- Forewarned is Forearmed: Pre-Synthesizing Jailbreak-like Instructions to Enhance LLM Safety Guardrail to Potential Attacks (2025)