Engineering Notes

Thoughts and Ideas on AI by Muthukrishnan

Small Language Models are the Future of Agentic AI

08 Jul 2025

I recently came across a paper titled “Small Language Models are the Future of Agentic AI,” and it got me thinking. The message is simple but powerful: bigger isn’t always better.

In the current AI landscape, we often assume that more power equals more performance. But this paper challenges that assumption. Instead, it offers a smarter and more strategic view of how AI can scale without scaling costs.

Let’s break it down.


The Expert vs. The Intern

Imagine you bring in a world-class expert to run your operations. This person, representing a large language model (LLM), is capable of nearly anything: writing code, summarizing documents, or drafting strategies. Now imagine you ask them to handle every little task, like sorting emails, generating reports, or formatting presentations.

It’s overkill.

Now picture a different setup. You still have your expert, but you also train a team of capable interns, or Small Language Models (SLMs), to handle the routine work. They’re not as brilliant as the expert, but they’re fast, focused, and cost-efficient. Over time, they learn their tasks really well. The expert is only brought in when absolutely necessary.

That’s the core idea: build a team, not a titan.


Why Smaller Models Make More Sense

The paper makes a compelling case that these small models are often good enough, and in many situations, better: they are faster, far cheaper to serve, easier to fine-tune and deploy, and well matched to the narrow, repetitive tasks that agents spend most of their time on.

It’s not about size. It’s about fit.


Building a Smarter AI System (Step-by-Step)

The authors propose a five-step process to shift from a single LLM to a modular, efficient hybrid.

  1. Start with the expert: use a general-purpose LLM to handle all tasks initially. Let it operate in real-world conditions.

  2. Collect the data: log everything the system does, including the tools it uses, the prompts it responds to, and the tasks it repeats most often.

  3. Find the patterns: analyze the logs to identify high-frequency tasks. These are ideal candidates for automation.

  4. Train the interns: take those frequent tasks and fine-tune SLMs such as Phi-2 or TinyLlama to specialize in them.

  5. Replace and refine: swap out the LLM for SLMs where appropriate, monitor how they perform, and continue improving them with new data.
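The five steps above can be sketched as a simple router: known high-frequency tasks go to specialized SLMs, everything else falls back to the general LLM, and every call is logged so the next round of specialization candidates falls out of the data. This is a minimal illustration of the idea, not the paper's implementation; `HybridRouter`, `call_slm`, and `call_llm` are hypothetical stand-ins for real model endpoints.

```python
from collections import Counter

# Stand-in model calls; in a real system these would hit fine-tuned SLM
# endpoints (e.g. Phi-2, TinyLlama) and a frontier LLM API, respectively.
def call_slm(task_type: str, prompt: str) -> str:
    return f"[SLM:{task_type}] {prompt[:40]}"

def call_llm(prompt: str) -> str:
    return f"[LLM] {prompt[:40]}"

class HybridRouter:
    """Routes known high-frequency task types to specialized SLMs and
    falls back to the general-purpose LLM for everything else (step 5),
    logging every call to mine future specialization targets (steps 2-3)."""

    def __init__(self, slm_tasks: set[str]):
        self.slm_tasks = slm_tasks          # tasks already handed to "interns"
        self.usage_log: list[dict] = []     # raw material for steps 2-4

    def handle(self, task_type: str, prompt: str) -> str:
        self.usage_log.append({"task": task_type, "prompt": prompt})
        if task_type in self.slm_tasks:
            return call_slm(task_type, prompt)
        return call_llm(prompt)

    def specialization_candidates(self, min_count: int = 3) -> list[str]:
        """High-frequency tasks still going to the LLM: fine-tuning candidates."""
        counts = Counter(entry["task"] for entry in self.usage_log)
        return [task for task, n in counts.items()
                if n >= min_count and task not in self.slm_tasks]

router = HybridRouter(slm_tasks={"summarize_email"})
router.handle("summarize_email", "Summarize this thread for me...")
for _ in range(3):
    router.handle("format_report", "Format the Q3 numbers...")
print(router.specialization_candidates())  # prints ['format_report']
```

The usage log doubles as training data: once a task like `format_report` crosses the frequency threshold, its logged prompts and LLM responses become the fine-tuning set for the next intern.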

Over time, the system becomes faster, cheaper, and more effective without losing capability where it matters most.


A Shift in Strategy

This isn’t just a technical change; it’s a shift in how we think about AI systems.

We often equate progress with building bigger and more powerful tools. But in many cases, the better approach is to build a system: a network of smaller, focused agents working together with clear roles and purpose.

It’s the same principle behind high-performance teams: specialization, delegation, and continuous improvement.

The future of AI may not depend on one giant brain. It may be built on smarter teams of smaller ones.

Read the paper here: “Small Language Models are the Future of Agentic AI”