How agents learn faster and more robustly by training on the right task at the right time — from hand-crafted curricula to adversarial environment generation
How mean field theory lets you solve game-theoretic problems with millions of agents by replacing individual interactions with a statistical summary of the crowd
How to specify complex, multi-step tasks for AI agents using finite-state automata called reward machines, enabling non-Markovian rewards and compositional task structure
Understand successor representations, the elegant middle ground between model-free and model-based RL that enables fast adaptation and transfer across tasks
Learn how goal-conditioned RL and Hindsight Experience Replay allow agents to master hard tasks with sparse rewards by treating every failure as a lesson toward a different goal
Move beyond expected values: learn why modeling the full distribution of returns unlocks risk-aware agents, better exploration, and state-of-the-art performance
Learn how agents can master complex tasks from pre-collected experience logs without ever touching a live environment, using conservative Q-learning, implicit Q-learning, and the Decision Transformer
Explore multi-agent reinforcement learning: how multiple RL agents learn simultaneously, coordinate under uncertainty, and produce emergent strategies in cooperative, competitive, and mixed-motive settings
Understand how AI agents escape the curse of shortsightedness by learning reusable subgoals and temporally extended actions through the Options Framework
Learn how inverse reinforcement learning lets AI agents discover hidden reward functions by observing expert behavior, and why it matters for agent alignment and autonomous systems
Master the art and science of designing reward functions and solving the credit assignment problem: the key to training agents that learn efficiently and align with human intentions
Discover how curiosity-driven learning enables AI agents to explore, learn, and adapt in sparse-reward environments through intrinsic motivation mechanisms
Master the fundamental problem of sequential decision-making under uncertainty and learn how AI agents balance trying new actions versus exploiting known rewards
Master contextual bandits, the framework behind personalized recommendations, A/B testing, and adaptive agents, and learn to balance exploration and exploitation in real-time decision-making
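The exploration-versus-exploitation tradeoff running through the last two entries can be seen in a few lines of code. Below is a minimal sketch (not taken from any of the articles above) of an epsilon-greedy agent on a stationary multi-armed bandit; the arm means, the epsilon value, and the function name are all illustrative choices:

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Illustrative epsilon-greedy agent on a stationary multi-armed bandit.

    With probability epsilon the agent explores (random arm); otherwise it
    exploits the arm with the highest current reward estimate.
    """
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n        # number of pulls per arm
    estimates = [0.0] * n   # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                           # explore
        else:
            arm = max(range(n), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)             # noisy reward
        counts[arm] += 1
        # incremental update of the running mean for the pulled arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

After a few thousand steps the agent concentrates its pulls on the best arm while the occasional random pull keeps its estimates of the other arms from going stale; contextual bandits extend this idea by conditioning the arm choice on observed features.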