Why agents love grep and hate vector search

May 11, 2026 · Sacha Ichbiah

TL;DR: What makes a good toolbox for agents? You shouldn't own the intelligence. A good toolbox should be boring and predictable: a set of primitives the agent composes, not a brain itself.

You won't own the UI, and you won't own the intelligence either

Codex and Claude Code are the operating systems of tomorrow, and the future primary users of all our products. In the near future, agents will interact with your product using APIs, MCP servers and CLIs, and will report to humans in a chat interface.

So what does designing for agents mean?

Two things:

  • Let the agent operate autonomously: Your toolbox should be feature complete. An agent should be able to do everything a user can do (minus a few exceptions).
  • Let its intelligence shine: Your toolbox should be easy to understand, have a low learning curve, and not own intelligence in the layer the agent reasons through.

Until you understand that you won't own the UI, you will always be frustrated by the agent. And you will try to create powerful tools using LLMs to get exactly the result you want. To own intelligence with subagents that you control.

The temptation is real.

But that's where things get circular: a subagent is an agent. Why are you building another agent? You already have one. You should focus on making your agent perform well instead of building another one inside your tool.

Legibility

The intelligence comes from the agent. If your tools are themselves "smart" in the layer the agent reasons through, they lose predictability and hamper the agent. Don't try to do clever things — the agent will. Don't try to build shortcuts — the agent doesn't need them. The role of your toolbox is the exact opposite. It should be designed to be boring, predictable. Legible.

The surface the agent calls should be deterministic and inspectable. The agent should be able to tell what its call did, and the next call should be a refinement, not a guess.

You should own rigid workflows that are always the same. Functions. And factor them into a small set of well-crafted, powerful building blocks, so that the agent's intelligence does the magic by composing them. Primitives.

Let's study a simple example: grep vs. vector search.

Grep vs. vector search

People believed Claude Code used grep instead of vector search because of implementation constraints: with vector search, you need to embed everything and put it into an index before you can query it.

But there's more to it.

Claude Code uses grep because it is genuinely better for what it wants. The real difference is legibility.

With grep, the agent knows exactly why a result matched: the regex hit, or it didn't. With embeddings, a match is a distance between two vectors coming out of a neural network. You can't reverse-engineer it. You can't sharpen it. You can only re-query and hope.
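To make "the agent knows exactly why a result matched" concrete, here is a minimal sketch of a grep-like primitive in Python. It is not how Claude Code or grep itself is implemented; the `Match` record, the `grep` function, and the in-memory `FILES` dictionary are hypothetical names made up for illustration. The point is only that every result carries the exact characters the regex hit.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    path: str              # file the line came from
    line_no: int           # 1-based line number
    line: str              # full text of the matching line
    span: tuple[int, int]  # (start, end) of the characters the regex hit


def grep(pattern: str, files: dict[str, str]) -> list[Match]:
    """Return every line matching `pattern`, with the exact span that matched."""
    regex = re.compile(pattern)
    matches = []
    for path in sorted(files):  # sorted so the output order is deterministic
        for line_no, line in enumerate(files[path].splitlines(), start=1):
            hit = regex.search(line)
            if hit:
                matches.append(Match(path, line_no, line, hit.span()))
    return matches


# Hypothetical in-memory "repository", used only for illustration.
FILES = {
    "auth/session.js": (
        "function validateSession(session) {\n"
        "  return session.expiresAt > Date.now();\n"
        "}\n"
    ),
    "ui/login.js": "function renderLogin() {\n  return form;\n}\n",
}

misses = grep(r"session expired", FILES)  # the literal string is nowhere
hits = grep(r"expiresAt", FILES)          # one line, with the span that hit
```

An empty result is itself information: the string does not exist in the searched files. A non-empty result says which file, which line, and which characters matched.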

That gap matters because the agent is reasoning iteratively. Imagine debugging a production issue:

"Users are occasionally getting logged out after refreshing the dashboard."

The agent starts somewhere:

grep -R "session expired" .

It finds nothing useful. Maybe the product does not literally say "session expired". Maybe the log says "auth invalid". Maybe the user is describing the symptom, not the string.

But the miss is legible. The agent can see that the words did not exist. So it reformulates:

grep -R "auth invalid" .

Still too broad. It tries the object that probably owns the state:

grep -R "expiresAt" .

Now it finds a validateSession() function. Greps for that. Finds the expiration check. Eventually it lands on:

session.expiresAt = token.ttl

instead of:

session.expiresAt = Date.now() + token.ttl

Grep did not find the bug immediately. But every miss was interpretable. The agent knew what happened, so it knew what to try next.
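Why the first assignment is a bug can be checked with a few lines of arithmetic. The sketch below is in Python rather than the JavaScript of the snippet, and it rests on assumptions the example above does not state: that `token.ttl` is a duration in milliseconds, and that the expiration check compares `expiresAt` with the current time. The `is_expired` helper and the concrete numbers are hypothetical.

```python
def is_expired(expires_at_ms: int, now_ms: int) -> bool:
    """Assumed expiration check: a session is expired once now reaches expiresAt."""
    return now_ms >= expires_at_ms


# Fixed "current time" in milliseconds since the epoch, so the example is deterministic.
now_ms = 1_700_000_000_000
ttl_ms = 3_600_000  # assumed time-to-live of one hour, as a duration

buggy_expires_at = ttl_ms           # a duration stored where a timestamp belongs
fixed_expires_at = now_ms + ttl_ms  # a timestamp one hour in the future

buggy_result = is_expired(buggy_expires_at, now_ms)
fixed_result = is_expired(fixed_expires_at, now_ms)
fixed_result_later = is_expired(fixed_expires_at, now_ms + ttl_ms)
```

Under these assumptions the buggy value is a timestamp from early 1970, so any check against the current time treats the session as already expired. The fixed value expires only once the full lifetime has elapsed.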

With vector search, a query like "code related to session expiration bugs" might return auth middleware, the login UI, timeout logic, JWT utilities, even analytics code. Some of it is relevant. Some of it isn't. The agent can't tell from the results why anything came back, so the next query is a guess instead of a refinement.

This isn't an argument against embeddings everywhere. The strongest setups often combine the two — embeddings to find the rough neighborhood, grep to navigate inside it. The point is narrower: the layer the agent navigates through should be legible.
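A rough sketch of that two-stage setup follows, under loud assumptions: `embed` is a toy word-count stand-in for a real embedding model, and `neighborhood`, `grep_in`, and the file contents are hypothetical. A real system would use learned embeddings and a proper index; only the shape of the pipeline is the point.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def neighborhood(query: str, files: dict[str, str], k: int) -> list[str]:
    """Stage 1 (fuzzy): the k paths whose contents score closest to the query."""
    q = embed(query)
    ranked = sorted(files, key=lambda p: (-cosine(q, embed(files[p])), p))
    return ranked[:k]


def grep_in(pattern: str, files: dict[str, str], paths: list[str]):
    """Stage 2 (legible): exact regex matches, restricted to the chosen paths."""
    regex = re.compile(pattern)
    return [
        (path, line_no, line)
        for path in paths
        for line_no, line in enumerate(files[path].splitlines(), start=1)
        if regex.search(line)
    ]


# Hypothetical files, for illustration only.
FILES = {
    "auth/session.js": "// session expiration check\nsession.expiresAt = token.ttl\n",
    "auth/middleware.js": "// reject requests when the session token is invalid\nrequireAuth(request)\n",
    "analytics/events.js": "// track page views\ntrack(pageView)\n",
}

near = neighborhood("session expiration", FILES, k=2)
found = grep_in(r"expiresAt\s*=", FILES, near)
```

The first stage only narrows where to look, and its scores stay opaque. The second stage is the layer the agent navigates, and every line it returns matched for a reason the agent can read off the pattern.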

Primitives

A primitive is an elementary building block you compose to reach higher-level goals. Primitives are loosely defined: there's no list telling you what the primitives for a domain should be. What are the right primitives for an Excel agent? For a coding agent? For a legal research agent? You have to find out.

Building the right toolbox is an information compression exercise. You give as much capability as possible in as few core abilities as possible. A frontend agent and a debugging agent share most of their core, but some niche capability of one is a core feature of the other.

The rules:

  • Limit the number of tools as much as possible.
  • There should be one way and only one way to do each thing.
  • Clearly identify the boundary of your toolbox. The agent has other tools available, and yours should not conflict with them.
  • Don't reinvent the wheel. Don't spend years rediscovering from first principles solutions to problems that have been solved for decades.
  • Learn from others. OpenAI, Anthropic, Stripe, Shopify, Ramp.
  • Your agent should achieve the common use cases in few calls, and the long-tail tasks in more calls. Not the other way around.

These rules conflict. "Feature complete" pulls against "few tools". You can't fully resolve it — but here is how I think about it.

Take an Excel agent. Feature-completeness says: expose every function, every formatting option, every chart type. That's hundreds of tools. The agent drowns.

The compression move is to find, for each family of tools, the one operation that 80% of the others are special cases of. apply_formula(range, formula) collapses most of the math tools. set_format(range, format) collapses most of the formatting tools. read_range(range) collapses most of the inspection tools. Three primitives, and the agent's own intelligence fills in the rest by composing them.

You keep "few tools" by trusting the agent to compose. You keep "feature complete" by making sure the composition actually reaches every corner. The places where it doesn't are where you add a focused tool — not before.
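Here is a minimal sketch of what those three primitives could look like, in Python. The names `apply_formula`, `set_format`, and `read_range` come from the text above; everything else is an assumption for illustration: a toy in-memory `Sheet`, single-letter columns, formats as plain strings, and a formula evaluator that only understands numeric literals, `SUM`, and `AVERAGE`.

```python
import re

CELL = re.compile(r"([A-Z])([1-9][0-9]*)")


def parse_cell(name: str) -> tuple[int, int]:
    """Split 'B7' into (column code point, row number)."""
    m = CELL.fullmatch(name)
    if m is None:
        raise ValueError(f"bad cell reference: {name!r}")
    return ord(m.group(1)), int(m.group(2))


def expand(rng: str) -> list[str]:
    """Expand 'A1:B2' (or a single cell 'A1') into cell names, row by row."""
    start, _, end = rng.partition(":")
    c1, r1 = parse_cell(start)
    c2, r2 = parse_cell(end or start)
    return [f"{chr(c)}{r}" for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)]


class Sheet:
    """Toy in-memory sheet exposing exactly three primitives."""

    def __init__(self):
        self.values = {}   # cell name -> number
        self.formats = {}  # cell name -> format label (assumed to be a string)

    def read_range(self, rng):
        """Inspection: (cell, value, format) for every cell in the range."""
        return [(c, self.values.get(c), self.formats.get(c)) for c in expand(rng)]

    def set_format(self, rng, fmt):
        """Formatting: one call covers bold, colors, number formats, and so on."""
        for cell in expand(rng):
            self.formats[cell] = fmt

    def apply_formula(self, rng, formula):
        """Math: '=10', '=SUM(A1:A3)' or '=AVERAGE(A1:A3)' (toy subset only)."""
        body = formula.removeprefix("=")
        call = re.fullmatch(r"(SUM|AVERAGE)\(([A-Z0-9:]+)\)", body)
        if call:
            nums = [self.values[c] for c in expand(call.group(2)) if c in self.values]
            if not nums:
                raise ValueError("formula refers to an empty range")
            result = sum(nums) if call.group(1) == "SUM" else sum(nums) / len(nums)
        else:
            result = float(body)  # raises ValueError for anything unsupported
        for cell in expand(rng):
            self.values[cell] = result


sheet = Sheet()
for cell, literal in [("A1", "=10"), ("A2", "=20"), ("A3", "=30")]:
    sheet.apply_formula(cell, literal)
sheet.apply_formula("A4", "=SUM(A1:A3)")
sheet.set_format("A4", "bold")
summary = sheet.read_range("A3:A4")
```

Writing a constant, computing a total, and checking the result all go through the same three calls. The unsupported formulas are exactly the corners where you would find out whether the composition reaches far enough.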

This is ultimately about taste. And there is only a single way to develop taste: practice.

Which means the most important advice for building an agent toolbox is the same as always:

  • Use your own product. Or rather: make your agent use your product.

Taste

A good UI has some kind of visual beauty.

A good toolbox does too. You make the tradeoffs, use the thing, watch where the agent struggles, and adjust. Eventually the set starts to feel right. Nothing more is needed, nothing can be removed. You've reached an optimum.

That's the goal. Build that, and the agents will use your product, love it, and may even chat about it.

Exactly as humans used to.