Two Silent Killers: Vercel Cron Auth and the Two Flavors of OpenAI 429
✍️ This post was written collaboratively by Arun Batchu and Cascade, the AI pair programmer that debugged this problem alongside him in real time.
We run eight autonomous AI agents on Shilpiworks, each firing on a Vercel cron schedule to generate and publish stickers. One morning we noticed that five of the eight had stopped producing anything — not failing loudly, just going quiet. No email alert, no error in the UI. Just absence.
The diagnosis took longer than it should have because two completely separate bugs were producing the same symptom. This is the story of both.
Bug 1: Vercel Cron Auth Is Not Bearer Auth
When you trigger an agent manually, you send a standard bearer token:
```shell
curl -X POST "https://shilpiworks.com/api/agent/stoic" \
  -H "Authorization: Bearer $AGENT_SECRET"
```

When Vercel's cron scheduler fires the same route, it sends a completely different header:
```
# What Vercel cron actually sends:
x-vercel-cron-auth-token: <Vercel-internal-token>

# NOT:
Authorization: Bearer ...
```

Several of our agent routes were written when we only had manual triggers in mind. They checked for `Authorization: Bearer $AGENT_SECRET` and returned 401 for anything else. Vercel's cron never sends that header — so every scheduled run was silently rejected.
⚠️ The silent part: Vercel does not surface cron 401s prominently. The job appears to "run" in the dashboard, but the route immediately returns 401 and nothing is logged in a way that draws your eye. The `AgentRun` table has no record because the route never reached application code.
Some routes we had written correctly — they checked for a truthy `x-vercel-cron-auth-token` and accepted it unconditionally (Vercel validates its own token at the infrastructure level before the request reaches your code). Others checked against a `CRON_SECRET` env var we had never actually set, making the check always fail.
The correct pattern for an agent route that accepts both cron and manual triggers:
```javascript
import { NextResponse } from 'next/server';

export async function POST(request) {
  const cronToken = request.headers.get('x-vercel-cron-auth-token');
  const authHeader = request.headers.get('authorization');

  const isCron = !!cronToken; // Vercel validates this internally — trust it
  const isManual = authHeader === `Bearer ${process.env.AGENT_SECRET}`;

  if (!isCron && !isManual) {
    return NextResponse.json({ error: 'Unauthorized' }, { status: 401 });
  }
  // ... agent logic
}
```

After applying this fix to all five affected routes and deploying, the cron jobs started producing records in `AgentRun` — and immediately hit the second bug.
Bug 2: OpenAI Has Two Different 429s
With auth fixed, the agents now reached the image generation step — and all five failed with a 429 from the OpenAI API. The error message was identical for all of them:
```json
{
  "error": {
    "message": "You exceeded your current quota, please check your plan and billing details.",
    "type": "insufficient_quota",
    "code": "insufficient_quota"
  }
}
```

Our first instinct was a rate limit — too many agents firing in quick succession, hitting the images-per-minute cap. We checked the model names (we had `gpt-image-1.5` and `gpt-4.1`), verified they were valid, and waited for the per-minute window to reset. The errors persisted.
The key was reading the error code precisely. OpenAI returns two different codes for 429 responses:
- `rate_limit_exceeded` — You sent too many requests per minute or used too many tokens per minute. This is temporary. Wait and retry — it resolves on its own within seconds to minutes.
- `insufficient_quota` — Your account has no remaining credits or has hit a hard billing limit. This does NOT resolve by waiting. Only adding credits or raising your spending limit fixes it.
The trap: Both return HTTP 429. The error message for `insufficient_quota` even mentions "quota" in a way that sounds like a rate quota, not a billing quota. If you read the message but not the `code` field, you'll wait for a rate limit to clear that will never clear.
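The branching above can be sketched as a tiny classifier. The function and return labels are ours, but the error shape matches the JSON shown earlier:

```javascript
// Classify a 429 body by its `code` field, not its message.
function classify429(body) {
  const code = body?.error?.code;
  if (code === 'insufficient_quota') return 'billing'; // add credits; retrying won't help
  if (code === 'rate_limit_exceeded') return 'retry';  // back off and retry
  return 'unknown';
}

const quotaError = {
  error: {
    message: 'You exceeded your current quota, please check your plan and billing details.',
    type: 'insufficient_quota',
    code: 'insufficient_quota',
  },
};
console.log(classify429(quotaError)); // 'billing'
```

Three lines of branching, but it encodes the whole lesson: the message fields of the two 429s are interchangeable-sounding English, while the `code` field is machine-checkable.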
Our account had run out of prepaid credits. The two agents that had successfully run earlier in the day (`stoic` and `scientific`) had consumed the last of the balance before the cron auth fix brought the other five online simultaneously. Once we topped up the account, all agents ran cleanly.
Why We Chased the Wrong Fix First
We wasted two deploys changing model names (`gpt-4.1` → `gpt-4o`, `gpt-image-1.5` → `gpt-image-1`) because the rate limit screen in the OpenAI dashboard listed `gpt-image-1` and `gpt-image-1-mini` but not `gpt-image-1.5`. We assumed the model name was invalid and causing the rejection.
It wasn't. `gpt-image-1.5` is a real model — newer than `gpt-image-1`. It just wasn't on that particular rate limit table because it has its own entry elsewhere. The model name changes were unnecessary and had to be reverted.
The right diagnostic sequence, which we should have followed from the start:
1. Read the `code` field in the error response — `rate_limit_exceeded` vs `insufficient_quota` — before doing anything else.
2. If `insufficient_quota`: go straight to billing. No code change will fix it.
3. If `rate_limit_exceeded`: check images/minute consumed vs your tier limit, add backoff, or stagger agent schedules.
4. Only investigate model names if you get a `model_not_found` error — not a 429.
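The first three steps translate naturally into a retry policy. A sketch, under assumptions: `callApi` stands in for whatever function makes the OpenAI request, and the `{ status, body }` response shape is ours, not an official client API.

```javascript
// Retry on rate_limit_exceeded with exponential backoff; fail fast on
// insufficient_quota, where no amount of waiting helps.
async function withQuotaAwareRetry(callApi, { maxRetries = 3, baseDelayMs = 1000 } = {}) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await callApi();
    if (res.status !== 429) return res;

    const code = res.body?.error?.code;
    if (code === 'insufficient_quota') {
      // Billing problem: surface it immediately instead of retrying.
      throw new Error('insufficient_quota: add credits or raise the spending limit');
    }
    // rate_limit_exceeded (or an unrecognized 429): back off, then retry.
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
  throw new Error('rate limit did not clear after retries');
}
```

The asymmetry is the point: one 429 flavor gets patience, the other gets an alert a human will actually see.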
The Combined Lesson
These two bugs shared a property that made them hard to diagnose together: both produced silence. The auth bug produced no `AgentRun` records at all. The quota bug produced failed records, but with an error message that looked like a transient issue.
The fix for the first bug revealed the second. That ordering matters — if the quota had been empty from day one, we might never have noticed the auth bug, because we'd have assumed the image generation failure was the only problem.
Takeaway: When debugging a fleet of autonomous agents, always check auth before application logic. A 401 that never reaches your code is indistinguishable from "the agent didn't run" until you look at the right layer. And when you get a 429, read the error `code` — not just the message — before deciding what to fix.
All eight agents are now running on schedule. The Shilpiworks sticker collection grows a little every day. Browse the full collection at shilpiworks.com →