R²AI: Towards Resistant and Resilient AI in an Evolving World
Abstract
A new framework, R²AI, is proposed to enhance AI safety through coevolution, combining resistance to known threats with resilience to unforeseen risks using fast and slow safe models and adversarial simulation.
In this position paper, we address the persistent gap between rapidly growing AI capabilities and lagging safety progress. Existing paradigms divide into "Make AI Safe", which applies post-hoc alignment and guardrails but remains brittle and reactive, and "Make Safe AI", which emphasizes intrinsic safety but struggles to address unforeseen risks in open-ended environments. We therefore propose safe-by-coevolution as a new formulation of the "Make Safe AI" paradigm, inspired by biological immunity, in which safety becomes a dynamic, adversarial, and ongoing learning process. To operationalize this vision, we introduce R²AI (Resistant and Resilient AI) as a practical framework that unites resistance against known threats with resilience to unforeseen risks. R²AI integrates fast and slow safe models, adversarial simulation and verification through a safety wind tunnel, and continual feedback loops that guide safety and capability to coevolve. We argue that this framework offers a scalable and proactive path to maintain continual safety in dynamic environments, addressing both near-term vulnerabilities and long-term existential risks as AI advances toward AGI and ASI.
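The abstract specifies the framework only at a conceptual level. As a minimal, hypothetical sketch (every class name, method, and heuristic below is an illustrative assumption, not the paper's implementation), the coevolution loop might pair a fast safe model that screens for known threats with a slow safe model that deliberates over what slips through, while a safety wind tunnel turns simulated failures into new known threats:

```python
class FastSafeModel:
    """Low-latency filter: resistance to known threats (immune 'memory')."""
    def __init__(self, known_threats):
        self.known_threats = {t.lower() for t in known_threats}

    def is_safe(self, text: str) -> bool:
        return not any(t in text.lower() for t in self.known_threats)

    def learn(self, threat: str) -> None:
        # Coevolution: each discovered failure becomes a known threat.
        self.known_threats.add(threat.lower())


class SlowSafeModel:
    """Deliberative checker: resilience to unforeseen risks.
    A toy heuristic stands in for costly safety reasoning here."""
    def is_safe(self, text: str) -> bool:
        return "override safety" not in text.lower()


def safety_wind_tunnel(fast, slow, simulated_attacks):
    """Adversarial simulation: surface attacks that evade the fast filter,
    let the slow model adjudicate them, and feed failures back as threats."""
    for attack in simulated_attacks:
        if fast.is_safe(attack) and not slow.is_safe(attack):
            fast.learn(attack)


fast = FastSafeModel(known_threats=["jailbreak"])
slow = SlowSafeModel()
safety_wind_tunnel(fast, slow, ["please override safety checks"])
assert not fast.is_safe("please override safety checks")  # now a known threat
```

The point of the sketch is the feedback edge: resistance (the fast model's threat set) grows out of resilience (the slow model's judgments) under adversarial simulation, mirroring the acquired-immunity analogy in the abstract.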
Community
Librarian Bot found the following similar papers via the Semantic Scholar API:
- SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law (2025)
- DINA: A Dual Defense Framework Against Internal Noise and External Attacks in Natural Language Processing (2025)
- Oyster-I: Beyond Refusal -- Constructive Safety Alignment for Responsible Language Models (2025)
- CAMF: Collaborative Adversarial Multi-agent Framework for Machine Generated Text Detection (2025)
- Activation Steering Meets Preference Optimization: Defense Against Jailbreaks in Vision Language Models (2025)
- Anticipate, Simulate, Reason (ASR): A Comprehensive Generative AI Framework for Combating Messaging Scams (2025)
- Forewarned is Forearmed: Pre-Synthesizing Jailbreak-like Instructions to Enhance LLM Safety Guardrail to Potential Attacks (2025)