Learn how cooperative game theory and Shapley values provide a mathematically principled way to assign credit among collaborating agents, with practical Python implementations and connections to modern LLM agent teams.
Learn how to specify complex, multi-step tasks for AI agents using finite-state automata called reward machines, enabling non-Markovian rewards and compositional task structure.
Understand successor representations — the elegant middle ground between model-free and model-based RL that enables fast adaptation and transfer across tasks.
Learn how goal-conditioned RL and Hindsight Experience Replay allow agents to master hard tasks with sparse rewards by treating every failure as a lesson toward a different goal.
Master reinforcement learning from human feedback — the algorithm behind ChatGPT and modern aligned agents — from reward modeling and PPO to Direct Preference Optimization.
A plugin-based platform architecture where each AI agent system is an independently subscribable capability. One interface, one orchestrator, unlimited agents — added without touching existing code.
Move beyond expected returns — learn why modeling the full distribution of rewards unlocks risk-aware agents, better exploration, and state-of-the-art performance.