Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection (Paper 2508.20766, published 9 days ago)
An Embarrassingly Simple Defense Against LLM Abliteration Attacks (Paper 2505.19056, published May 25)