ai-strategy · agentic-ai · shilpiworks · ai-bias · operating-reality

When the Agent Decides What Agent to Build — and the Mothers It Couldn't See

Arun Batchu · April 24, 2026 · 4 min read

The meta-layer of the agentic stack is no longer theoretical. On shilpiworks, a creator agent fired off for the first time this week, watched the calendar, decided that Mother's Day was the relevant current event, wrote and merged its own code, and stood up a child agent that produces stickers honoring mothers. It worked. It also exposed a problem every team running creator agents will hit, and most will hit late.

The trap: a creator loop without an audit loop is a bias amplifier. Whatever the base model leans toward, the meta-layer will multiply at the speed of automation.

What the creator agent actually did

The mechanics are simple to describe and structurally important. The Creator Agent uses a Gemini reasoning model wired to a web-search tool. Its job is narrow: read what is happening in the world, decide whether a new sticker agent should exist for the current moment, and — if yes — write the code and the prompts for that new agent, debug them, merge them, and hand the keys over to the new agent so it can run on its own schedule.
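That loop can be sketched, loosely, like this. Everything here is a hypothetical illustration of the described mechanics, not the actual shilpiworks code: `search_events`, `llm`, `deploy_agent`, and `ChildAgentSpec` are names I am inventing for the sketch.

```python
# Hypothetical sketch of the creator loop: observe the world, decide whether
# a child agent should exist, write its prompt, deploy it on its own schedule.
from dataclasses import dataclass


@dataclass
class ChildAgentSpec:
    name: str       # e.g. "mothers-appreciation-sticker-agent"
    prompt: str     # the generation prompt the child will run with
    schedule: str   # cron-style schedule the child runs on, unattended


def creator_tick(llm, search_events, deploy_agent):
    """One pass of the creator loop: observe, decide, build, hand off."""
    events = search_events("current holidays and cultural moments")
    decision = llm.decide(
        "Should a new sticker agent exist for any of these moments?", events
    )
    if not decision.should_build:
        return None
    spec = ChildAgentSpec(
        name=decision.agent_name,
        prompt=llm.write_prompt(decision.moment),
        schedule="0 9 * * *",  # the child runs daily, with no further human prompt
    )
    deploy_agent(spec)  # write, debug, merge, hand over the keys
    return spec
```

The important structural fact is the last two lines: once `deploy_agent` runs, the human is out of the loop.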

The first thing it noticed was Mother's Day. The first thing it built was the Mothers Appreciation Sticker Agent. The first thing that agent did was start producing stickers, autonomously, with no further prompt from me.

This is the operating-intelligence thesis we have been arguing for at netrii — agents that do not just generate, but decide, write, deploy, and govern other agents. Reading about meta-agents in a research paper is one thing. Watching one ship a feature you did not write is another.

The question I had to ask when the stickers came back

The output looked good. Soft watercolor pieces, mothers in various poses of nurture and quiet strength, ready for the store. Then I looked again. Do these mothers look like the mothers who would actually buy them?

Some of them did. Many of them, frankly, did not. The set leaned toward a narrow visual archetype — the same archetype generative image models tend to default to when given vague prompts about parenthood. The mothers who use shilpiworks include grandmothers raising grandchildren, single fathers who are mothering, mothers of color, mothers older than the model's mental image of "mother," mothers who do not look like the cover of a 1990s greeting card.

The agent that decided to make stickers had no awareness of who was missing from the stickers. It could not. The decision about what to make was crisp and well-reasoned. The decision about who to depict inherited every default of the underlying model, with none of the friction a human designer would have provided.

Operating reality: when an agent decides what gets made, the model's defaults decide whose representation gets made and whose does not. The mother who is not in the set is the most important customer in the test.

Why this gets worse, not better, with more meta

The reflex when something like this happens is to fix the prompt. Add explicit diversity instructions to the creator agent. Tell the child agent to vary the depictions. Add a checklist. Those fixes help, marginally. They are not the system fix.

The system fix is to recognize that the meta-layer compounds whatever the base layer ships:

  • The base model has a default. Every model does. Defaults are not bugs; they are statistical reflections of the training data.
  • The creator agent inherits the default. When it writes prompts for the child agent, it writes prompts that look right to the model. Which means they encode the default.
  • The child agent runs at scale. A human designer would catch the narrow archetype on sticker three or four. The child agent ships sticker three hundred before anyone notices.
  • The audience sees the result, not the chain. They see a brand decision. They do not see that no human ever decided who to include.

This is not a story about Gemini or Mother's Day or stickers. It is the structural risk shape of every meta-agent system. Speed of creation is also speed of mistake. The thing that makes meta-agents valuable — that they decide for themselves — is the same thing that makes them dangerous if the decision-making is allowed to operate alone.

The pattern: creator loop, paired with an audit loop

The fix is architectural, not cosmetic. Pair every creator agent with an evaluator agent that has standing to block release. Not "review and warn" — block. The evaluator's mandate is not quality. Quality is the creator's job. The evaluator's mandate is representation: who is in the artifact and who is not.

Concretely, for the sticker case:

  • Audit cohort, not single output. Look at the whole batch the child agent produced this week. Check the distribution of who is depicted against the distribution of who buys.
  • Block release on a representation gap. If the gap exceeds a threshold, the creator must revise the child agent's prompts or its sampling strategy before any of the batch ships.
  • Log the audit. The decision trace — what the evaluator saw, what it decided, why — has to be readable later. Otherwise the audit loop is theater.
  • Treat the evaluator as a peer, not a filter. If the creator agent can route around the evaluator under time pressure, you do not have an audit loop; you have a suggestion box.
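The gate described in those four points can be sketched in a few lines. The category labels, the buyer distribution, and the 15% threshold are illustrative assumptions, not real shilpiworks numbers; `classify` stands in for whatever the evaluator model actually calls per artifact.

```python
# Minimal sketch of the audit gate: compare who is depicted across the whole
# batch against who buys, block release when the gap is too large, and log
# the decision trace so the audit is readable later.
def representation_gap(depicted: dict, buyers: dict) -> float:
    """Largest shortfall between a segment's buyer share and its depicted share."""
    total = sum(depicted.values())
    return max(
        buyers[cat] - depicted.get(cat, 0) / total
        for cat in buyers
    )


def audit_batch(batch, buyers, classify, threshold=0.15, log=print) -> bool:
    """Audit the cohort, not single outputs. Returns True only if release is allowed."""
    depicted: dict = {}
    for sticker in batch:
        cat = classify(sticker)  # the evaluator model's call per artifact
        depicted[cat] = depicted.get(cat, 0) + 1
    gap = representation_gap(depicted, buyers)
    verdict = "BLOCK" if gap > threshold else "RELEASE"
    # The log line is the decision trace; without it the loop is theater.
    log(f"audit: depicted={depicted} gap={gap:.2f} verdict={verdict}")
    return verdict == "RELEASE"
```

Note that `audit_batch` returns a verdict the creator cannot route around: the batch either ships whole or the creator revises its prompts and tries again. That is "peer, not filter" in code form.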

This is the same pattern as separation of duties in financial systems, or peer review in research. The creator and the evaluator share an incentive (the system has to ship something), and they have opposing pressures (one wants velocity, one wants integrity). That tension is the design, not a bug to be smoothed away.

The deeper lesson: the value of agentic systems compounds when you let them act. The risk of agentic systems compounds in exactly the same place. Both compounding effects need to be designed for, or only the risk side gets attention — and only after something visible has gone wrong.

What I am watching next

The interesting open question is whether evaluator agents can be trusted to evaluate themselves. A representation audit is itself a model decision. If the same family of models is doing the creating and the auditing, you may have replicated the bias on both sides of the loop. The honest answer is that yes, you probably have — at least partially, for now.

The practical mitigation is to vary the evaluator. Use a different model family. Sample human reviewers on a rotating basis. Compare the audit-agent's calls to the human calls and recalibrate when they drift. None of this is exotic. All of it is work that does not get budgeted when teams are mostly excited about the creator loop.
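The recalibration part of that is mundane to implement. A sketch, under the assumption that both the audit agent and the sampled human reviewers record a block/release call per artifact, with an agreement floor I am picking arbitrarily:

```python
# Sketch of the drift check: compare the audit agent's calls against sampled
# human reviewer calls, and flag for recalibration when agreement drops.
def agreement_rate(agent_calls: list, human_calls: list) -> float:
    """Fraction of sampled artifacts where agent and human made the same call."""
    matches = sum(a == h for a, h in zip(agent_calls, human_calls))
    return matches / len(agent_calls)


def needs_recalibration(agent_calls, human_calls, floor=0.8) -> bool:
    """True when agent/human agreement has drifted below the floor."""
    return agreement_rate(agent_calls, human_calls) < floor
```

None of this is exotic, which is the point: the work is scheduling humans and keeping the logs, not the arithmetic.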

The strategic implication is small and actionable: plan the audit loop before you plan the creator loop. The creator loop is the easy part. It is the part everyone wants to ship. The audit loop is the part that determines whether the creator loop ages well or quietly accumulates a bill.

The Mothers Appreciation Sticker Agent will keep running this week. It will get better — partly because the creator agent is learning, partly because I now know to look. The next meta-agent I deploy will ship with its evaluator on day one. That is the fix. It is also the rule.


About the author

If this resonated, reach out. Here's how to continue the conversation.

Arun Batchu

Founder & Principal Advisor

Meta-agents are coming to your stack whether you plan for them or not. I can help you design the audit loop alongside the creator loop — so the agents you spawn don't ship blind spots into production.