Learn how goal-conditioned RL and Hindsight Experience Replay allow agents to master hard tasks with sparse rewards by treating every failure as a lesson toward a different goal.
Master reinforcement learning from human feedback — the training technique behind ChatGPT and modern aligned agents — from reward modeling and PPO to Direct Preference Optimization.
A plugin-based platform architecture where each AI agent system is an independently subscribable capability. One interface, one orchestrator, unlimited agents — added without touching existing code.
Move beyond expected returns — learn why modeling the full distribution of rewards unlocks risk-aware agents, better exploration, and state-of-the-art performance.
Learn how agents can master complex tasks from pre-collected experience logs without ever touching a live environment, using conservative Q-learning, implicit Q-learning, and the Decision Transformer.
Explore Karl Friston's Free Energy Principle: a unified theory where agents minimize surprise through belief updating and action, offering an alternative foundation to reward-based reinforcement learning.
How AI agents generate, execute, and refine code as a reasoning medium, from classical program synthesis to modern REPL-based agent loops and SWE-bench architectures.
How AI agents can learn continuously across tasks and environments without overwriting what they already know — the science and practice of lifelong machine learning.