Tag: AI Safety

20 Mar 2026
Corrigibility and Interruptibility: Building Agents That Accept Human Override
How to design AI agents that accept correction, allow safe interruption, and remain under human control even as their capabilities grow
05 Oct 2025
Ensuring Agent Safety with Constitutional AI Guardrails
Concept Introduction: **Constitutional AI (CAI)** is a method for training an AI to supervise itself. Rather than re...
05 Oct 2025
Self-Improvement and Autoformalization in AI Agents That Build Themselves
Concept Introduction: We have explored how agents can reason, plan, remember, act, and collaborate. The final, ultim...

Engineering Notes