Engineering Notes

Thoughts and Ideas on AI by Muthukrishnan
07 Mar 2026

Cooperative Game Theory and Shapley Values for Fair Credit Assignment in Multi-Agent Systems

Learn how cooperative game theory and Shapley values provide a mathematically principled way to assign credit among collaborating agents, with practical Python implementations and connections to modern LLM agent teams.
06 Mar 2026

Reward Machines and Automata-Based Task Specification for AI Agents

How to specify complex, multi-step tasks for AI agents using finite-state automata called reward machines, enabling non-Markovian rewards and compositional task structure.
05 Mar 2026

Safe Reinforcement Learning Teaches Agents to Optimize Without Violating Constraints

How constrained MDPs, Lagrangian methods, and safety critics enable agents to maximize reward while staying within hard operational boundaries.
04 Mar 2026

Model-Based Reinforcement Learning: How Agents Simulate Experience to Learn Faster

Explore how agents that build internal environment models can plan, simulate, and learn orders of magnitude faster than model-free approaches.
03 Mar 2026

Successor Representations: A Map of Where You Are Going

Understand successor representations — the elegant middle ground between model-free and model-based RL that enables fast adaptation and transfer across tasks.
02 Mar 2026

Goal-Conditioned Reinforcement Learning and Hindsight Experience Replay Turn Failures Into Training Opportunities

Learn how goal-conditioned RL and Hindsight Experience Replay allow agents to master hard tasks with sparse rewards by treating every failure as a lesson toward a different goal.
02 Mar 2026

Spec-Driven Development with spec-kit: Stop Vibe Coding, Start Specifying

A complete tutorial on using GitHub's spec-kit to bring structure to AI-assisted development — from install to your first specification.
01 Mar 2026

RLHF and Preference Learning: Teaching Agents What Humans Actually Want

Master reinforcement learning from human feedback — the algorithm behind ChatGPT and modern aligned agents — from reward modeling and PPO to Direct Preference Optimization.
28 Feb 2026

Conceptual AI Agents Universe — System Design Document

A plugin-based platform architecture where each AI agent system is an independently subscribable capability. One interface, one orchestrator, unlimited agents — added without touching existing code.
28 Feb 2026

Distributional Reinforcement Learning and Learning the Full Return Distribution

Move beyond expected returns: learn why modeling the full distribution of returns unlocks risk-aware agents, better exploration, and state-of-the-art performance.