ai-agents · vercel · openai · debugging · devops · shilpiworks

Two Silent Killers: Vercel Cron Auth and the Two Flavors of OpenAI 429

Arun Batchu, Cascade (AI) · February 25, 2026 · 6 min read

✍️ This post was written collaboratively by Arun Batchu and Cascade, the AI pair programmer that debugged this problem alongside him in real time.

We run eight autonomous AI agents on Shilpiworks, each firing on a Vercel cron schedule to generate and publish stickers. One morning we noticed that five of the eight had stopped producing anything — not failing loudly, just silently producing nothing. No email alert, no error in the UI. Just absence.

The diagnosis took longer than it should have because two completely separate bugs were producing the same symptom. This is the story of both.

Bug 1: Vercel Cron Auth Is Not Bearer Auth

When you trigger an agent manually, you send a standard bearer token:

curl -X POST "https://shilpiworks.com/api/agent/stoic" \
  -H "Authorization: Bearer $AGENT_SECRET"

When Vercel's cron scheduler fires the same route, it sends a completely different header:

# What Vercel cron actually sends:
x-vercel-cron-auth-token: <Vercel-internal-token>
 
# NOT:
Authorization: Bearer ...

Several of our agent routes were written when we only had manual triggers in mind. They checked for Authorization: Bearer $AGENT_SECRET and returned 401 for anything else. Vercel's cron never sends that header — so every scheduled run was silently rejected.

⚠️ The silent part: Vercel does not surface cron 401s prominently. The job appears to "run" in the dashboard, but the route immediately returns 401 and nothing is logged in a way that draws your eye. The AgentRun table has no record because the route never reached application code.

Some routes were already correct: they checked for a truthy x-vercel-cron-auth-token and accepted it unconditionally (Vercel validates its own token at the infrastructure level before the request reaches your code). Others checked against a CRON_SECRET env var we had never actually set, making the check always fail.

The correct pattern for an agent route that accepts both cron and manual triggers:

import { NextResponse } from 'next/server';

export async function POST(request) {
  const cronToken = request.headers.get('x-vercel-cron-auth-token');
  const authHeader = request.headers.get('authorization');

  const isCron = !!cronToken; // Vercel validates this internally — trust it
  const isManual = authHeader === `Bearer ${process.env.AGENT_SECRET}`;

  if (!isCron && !isManual) {
    return NextResponse.json({ error: 'Unauthorized' }, { status: 401 });
  }
  // ...
}
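The same check can be factored into a small pure helper so it's unit-testable without spinning up a route. This is a sketch, not our production code; the `isAuthorized` name and its arguments are ours:

```typescript
// Pure authorization check: accepts either Vercel's cron header or a
// manual bearer token. Hypothetical helper, extracted for testability.
function isAuthorized(
  headers: { get(name: string): string | null },
  agentSecret: string | undefined
): boolean {
  const cronToken = headers.get('x-vercel-cron-auth-token');
  const authHeader = headers.get('authorization');

  const isCron = !!cronToken; // Vercel validates its own token upstream
  // Guard on agentSecret being set: otherwise a request carrying the
  // literal string "Bearer undefined" would match the template string.
  const isManual = !!agentSecret && authHeader === `Bearer ${agentSecret}`;

  return isCron || isManual;
}
```

Note the `!!agentSecret` guard: without it, an unset AGENT_SECRET makes the comparison target the literal string `Bearer undefined`, which an attacker could simply send.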

After applying this fix to all five affected routes and deploying, the cron jobs started producing records in AgentRun — and immediately hit the second bug.

Bug 2: OpenAI Has Two Different 429s

With auth fixed, the agents now reached the image generation step — and all five failed with a 429 from the OpenAI API. The error message was identical for all of them:

{
  "error": {
    "message": "You exceeded your current quota, please check your plan and billing details.",
    "type": "insufficient_quota",
    "code": "insufficient_quota"
  }
}

Our first instinct was a rate limit — too many agents firing in quick succession, hitting the images-per-minute cap. We checked the model names (we had gpt-image-1.5 and gpt-4.1), verified they were valid, and waited for the per-minute window to reset. The errors persisted.

The key was reading the error code precisely. OpenAI returns two different codes for 429 responses:

  • rate_limit_exceeded — You sent too many requests per minute or used too many tokens per minute. This is temporary. Wait and retry — it resolves on its own within seconds to minutes.
  • insufficient_quota — Your account has no remaining credits or has hit a hard billing limit. This does NOT resolve by waiting. Only adding credits or raising your spending limit fixes it.

The trap: Both return HTTP 429. The error message for insufficient_quota even mentions "quota" in a way that sounds like a rate quota, not a billing quota. If you read the message but not the code field, you'll wait for a rate limit to clear that will never clear.
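In code, that means branching on the `code` field rather than the HTTP status or the message text. A minimal sketch (the `classify429` name and its return labels are ours, not an OpenAI API):

```typescript
type OpenAIErrorBody = { error?: { code?: string; message?: string } };

// Map a 429 response body to the action it actually requires:
// 'billing' means add credits; 'retry' means back off and try again.
function classify429(body: OpenAIErrorBody): 'billing' | 'retry' | 'unknown' {
  switch (body.error?.code) {
    case 'insufficient_quota':
      return 'billing'; // waiting will never fix this
    case 'rate_limit_exceeded':
      return 'retry'; // resolves within seconds to minutes
    default:
      return 'unknown';
  }
}
```

Feeding it the error body above returns 'billing', which is exactly the signal we missed by reading only the message.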

Our account had run out of prepaid credits. The two agents that had successfully run earlier in the day (stoic and scientific) had consumed the last of the balance before the cron auth fix brought the other five online simultaneously. Once we topped up the account, all agents ran cleanly.

Why We Chased the Wrong Fix First

We wasted two deploys changing model names (gpt-4.1 → gpt-4o, gpt-image-1.5 → gpt-image-1) because the rate limit screen in the OpenAI dashboard listed gpt-image-1 and gpt-image-1-mini but not gpt-image-1.5. We assumed the model name was invalid and causing the rejection.

It wasn't. gpt-image-1.5 is a real model — newer than gpt-image-1. It just wasn't on that particular rate limit table because it has its own entry elsewhere. The model name changes were unnecessary and had to be reverted.

The right diagnostic sequence, which we should have followed from the start:

  1. Read the code field in the error response — rate_limit_exceeded vs insufficient_quota — before doing anything else.
  2. If insufficient_quota: go straight to billing. No code change will fix it.
  3. If rate_limit_exceeded: check images/minute consumed vs your tier limit, add backoff, or stagger agent schedules.
  4. Only investigate model names if you get a model_not_found error — not a 429.
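Step 3's backoff should only ever apply to the retryable flavor. A sketch under our own error shape (`ApiError`, `retryOn429`, and the delay parameters are hypothetical, not from our codebase):

```typescript
// Minimal error carrying the two fields the diagnostic sequence needs.
class ApiError extends Error {
  constructor(public status: number, public code: string) {
    super(code);
  }
}

// Retry only retryable 429s (rate_limit_exceeded) with exponential
// backoff; fail fast on insufficient_quota, which retrying cannot fix.
async function retryOn429<T>(
  fn: () => Promise<T>,
  attempts = 5,
  baseMs = 500
): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      const retryable =
        err instanceof ApiError &&
        err.status === 429 &&
        err.code === 'rate_limit_exceeded';
      if (!retryable || i >= attempts - 1) throw err;
      // Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ...
      await new Promise((r) => setTimeout(r, baseMs * 2 ** i));
    }
  }
}
```

Failing fast on insufficient_quota is the point: retrying a billing error just burns attempts and delays the page to whoever holds the credit card.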

The Combined Lesson

These two bugs shared a property that made them hard to diagnose together: both produced silence. The auth bug produced no AgentRun records at all. The quota bug produced failed records, but with an error message that looked like a transient issue.

The fix for the first bug revealed the second. That ordering matters — if the quota had been empty from day one, we might never have noticed the auth bug, because we'd have assumed the image generation failure was the only problem.

Takeaway: When debugging a fleet of autonomous agents, always check auth before application logic. A 401 that never reaches your code is indistinguishable from "the agent didn't run" until you look at the right layer. And when you get a 429, read the error code — not just the message — before deciding what to fix.

All eight agents are now running on schedule. The Shilpiworks sticker collection grows a little every day. Browse the full collection at shilpiworks.com →

Found this useful? Share it.

About the author

If this resonated, reach out. Here's how to continue the conversation.

Arun Batchu

Founder & Principal Advisor

I can help you separate AI hype from real operating advantage — and design experiments that build evidence faster than opinions do.