Engineering Notes
Thoughts and Ideas on AI by Muthukrishnan
Tag: Reward-Modeling
01 Mar 2026
RLHF and Preference Learning: Teaching Agents What Humans Actually Want
Master reinforcement learning from human feedback, the training approach behind ChatGPT and modern aligned agents, from reward modeling and PPO to Direct Preference Optimization.