mastra · ai-agents · typescript · debugging

Debugging Mastra: Why Our AI Workflow Silently Ate Errors

Arun Batchu & Cascade (AI) · February 18, 2026 · 8 min read

✍️ This post was written collaboratively by Arun Batchu and Cascade, the AI pair programmer that debugged this problem alongside him in real time. The system, the decisions, and the debugging were a joint effort.

Shilpiworks is a small e-commerce shop that sells handmade-style stickers — except they're generated by AI agents running 24/7 on Vercel. We built six specialized agents: one for Stoic philosophy quotes, one for women's wisdom, one for Indigenous proverbs, one for Marshall Goldsmith leadership quotes, one for Scientific Wonder, and a general-purpose agent. Each agent is a Mastra workflow.

Everything worked in development. In production, the agents would run, generate images, and then… nothing. No product published. No error. Just silence.

What We Built

Each agent follows the same Mastra workflow pipeline: pick a quote → build an image generation prompt → generate the image (with OCR validation and smart retry) → publish to Vercel Blob + Postgres. The workflow is typed end-to-end using Zod schemas, and steps are chained with `.then()` and `.map()` for data threading.

```typescript
export const scientificWorkflow = createWorkflow({
  id: 'scientific-sticker',
  inputSchema: z.object({ recentQuotes: z.array(z.string()).default([]) }),
  outputSchema: z.object({ productId: z.number(), imageUrl: z.string() }),
})
  .then(scientificQuoteStep)
  .map(async ({ inputData }) => ({ ...buildStyleDirectives(inputData) }))
  .then(buildPromptStep)
  .map(async ({ inputData }) => ({
    designPrompt: inputData.designPrompt,
    context: inputData.context,
  }))
  .then(generateImageStep)
  .map(async ({ inputData, getStepResult }) => ({
    base64: inputData.base64,
    analysis: inputData.analysis,
    author: getStepResult('scientific-quote')?.author,
    // ...
  }))
  .then(publishStep)
  .commit()
```

The agent wrapper calls `run.start()`, then reads `result.steps['publish'].output` to get the `productId`. Clean, typed, composable.
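The read looked roughly like this (a reconstruction for illustration rather than the exact code; the error string is the one we later found logged in our database):

```typescript
// Reconstructed sketch of the original wrapper read; `run` is a workflow
// run created via Mastra's run API (creation elided here).
const result = await run.start({ inputData: { recentQuotes } })

const published = result.steps?.['publish']?.output
if (!published?.productId) {
  throw new Error(
    'Publish step did not return a productId. ' +
      `Steps completed: ${Object.keys(result.steps ?? {}).join(', ')}`
  )
}
```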

The Symptom

After deploying, we started seeing this error in our `AgentRun` database table:

```text
Publish step did not return a productId.
Steps completed: input, scientific-quote, mapping_b79c34e3, build-prompt, mapping_402de7ca, generate-image
```

The `generate-image` step was in the completed steps list — but the `.map()` after it and `publishStep` were nowhere to be found. The workflow was stopping silently after image generation, every single time.

Checking the database confirmed the problem was systemic, not limited to the scientific agent. The Women, Stoic, Goldsmith, and Indigenous agents all showed the same pattern: `status: "success"` but `productId: null`. The agents were "succeeding" without publishing anything.
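The check itself was a one-liner against our run log (a sketch; the table and column names are illustrative of our `AgentRun` schema, queried via `@vercel/postgres`):

```typescript
import { sql } from '@vercel/postgres'

// Sketch of the systemic check; table/column names are illustrative.
const { rows } = await sql`
  SELECT agent_type, status, product_id
  FROM agent_runs
  ORDER BY created_at DESC
  LIMIT 20
`
// Every recent row looked the same: status = 'success', product_id = NULL
```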

The Debugging Expedition

Our first hypothesis: Mastra doesn't store step output for steps followed by a `.map()`. We tried reading from the mapping step instead:

```typescript
// Attempt 1: read from the generate-image step directly
const imageResult = result.steps?.['generate-image']?.output
// → undefined

// Attempt 2: scan all step outputs for a base64 payload
const allStepOutputs = Object.values(result.steps || {}).map(s => s.output).filter(Boolean)
const imageResult = allStepOutputs.find(o => o.base64)
// → undefined

// Attempt 3: read from result.result (the final workflow output)
const imageResult = result.result
// → null
```

All three approaches returned nothing. The step was listed as completed, but its output was inaccessible from every angle we tried.

The Root Cause

After adding verbose debug logging, we found the real culprit. Our `generateImageStep` had a retry loop with OCR and transparency validation. When all three attempts failed validation, the step threw an error:

```typescript
// Inside generateImageStep — the original code
for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
  const result = await callOpenAIImage(prompt)
  try {
    const analysis = await analyzeImageInternal(result.base64, context)
    return { base64: result.base64, analysis, attempt }
  } catch (err) {
    if (isRetriableError(err) && attempt < MAX_RETRIES) continue
    throw err  // ← THIS was the problem
  }
}
throw new Error('Image generation failed after all retries')  // ← AND THIS
```

⚠️ When a Mastra step throws, the workflow silently marks it as failed and returns nothing: no error in `result.result`, no indication in `result.steps`, no exception propagated to the caller. The workflow appears to "complete" successfully.

This is the core Mastra gotcha: a throwing step is a silent failure. The framework swallows the exception, the workflow run finishes with a success status, and your caller gets back an empty result. There's no way to distinguish "workflow completed normally" from "workflow completed because a step threw".
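A toy repro makes the gotcha concrete (a minimal sketch using the same `createStep` / `createWorkflow` pattern as above; exact result shapes may vary across Mastra versions):

```typescript
// Minimal repro sketch: a step that always throws.
const boomStep = createStep({
  id: 'boom',
  inputSchema: z.object({}),
  outputSchema: z.object({ ok: z.boolean() }),
  execute: async () => {
    throw new Error('kaboom')
  },
})

const reproWorkflow = createWorkflow({
  id: 'repro',
  inputSchema: z.object({}),
  outputSchema: z.object({ ok: z.boolean() }),
})
  .then(boomStep)
  .commit()

// `run` is created from reproWorkflow via Mastra's run API (elided, as above).
// What we observed: start() resolves without throwing, result.result is
// empty, and 'boom' never shows up in the completed steps.
const result = await run.start({ inputData: {} })
console.log(result.result) // → null, with no error surfaced anywhere
```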

The Fix

We made three changes:

1. `generateImageStep` never throws

Instead of throwing after validation failures, the step now returns the best available image with a `textError` field noting the issue. The product still gets published — a slightly imperfect sticker is better than no sticker.

```typescript
// After: always return something. Excerpt of the catch block inside
// the retry loop shown above.
} catch (err) {
  if (isRetriableError(err) && attempt < MAX_RETRIES) continue
  // Non-retriable or final attempt — return best result with error noted
  console.warn(`Returning best available image after: ${err.message}`)
  return {
    base64: result.base64,
    analysis: fallbackAnalysis, // permissive default (defined elsewhere in the step)
    textError: err.message,
    attempt,
  }
}
```

2. Workflows end at `generateImageStep`

We removed the trailing `.map()` + `publishStep` from all six workflows. The workflow output schema now returns `{ base64, analysis }` directly. This eliminates the problematic step chain after image generation.

```typescript
// Before: workflow chains into publish
.then(generateImageStep)
.map(async ({ inputData, getStepResult }) => ({ ... }))
.then(publishStep)
.commit()

// After: workflow ends at generateImageStep
.then(generateImageStep)
.commit()
```
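Concretely, the workflow's `outputSchema` now mirrors what `generateImageStep` returns (a sketch; the exact `analysis` shape is whatever the step already produced):

```typescript
// Sketch of the revised output schema; analysis is left loose here
// because its exact shape belongs to generateImageStep.
outputSchema: z.object({
  base64: z.string(),
  analysis: z.any(),
  textError: z.string().optional(), // set when validation never passed
  attempt: z.number(),
}),
```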

3. Agent wrappers publish directly via `result.result`

Each agent wrapper now reads `result.result` (the final step's output) and calls a shared `publishProduct()` helper directly. Publishing is no longer inside Mastra — it's owned by the caller.

```typescript
const result = await run.start({ inputData: { recentQuotes } })

// result.result = generateImageStep output = { base64, analysis, ... }
const imageResult = result.result
if (!imageResult?.base64) throw new Error('Image generation failed')

const published = await publishProduct({
  base64: imageResult.base64,
  analysis: imageResult.analysis,
  type: 'scientific',
  theme: 'Scientific Wonder',
  author: result.steps?.['scientific-quote']?.output?.author,
})
```
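For context, `publishProduct()` is a plain async helper along these lines (a sketch rather than the real implementation; it assumes `@vercel/blob`'s `put()` and a Postgres client such as `@vercel/postgres`, and the table layout is illustrative):

```typescript
import { put } from '@vercel/blob'
import { sql } from '@vercel/postgres'

// Sketch of the shared helper: it owns all side effects
// (blob upload + Postgres insert) outside any Mastra workflow.
async function publishProduct(input: {
  base64: string
  analysis: unknown
  type: string
  theme: string
  author?: string
}): Promise<{ productId: number; imageUrl: string }> {
  // Upload the PNG to Vercel Blob
  const buffer = Buffer.from(input.base64, 'base64')
  const blob = await put(`stickers/${input.type}-${Date.now()}.png`, buffer, {
    access: 'public',
  })

  // Insert the product row (table/column names illustrative)
  const { rows } = await sql`
    INSERT INTO products (type, theme, author, image_url)
    VALUES (${input.type}, ${input.theme}, ${input.author ?? null}, ${blob.url})
    RETURNING id
  `
  return { productId: rows[0].id, imageUrl: blob.url }
}
```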

Three Rules for Mastra Users

1. Never throw from a step. Return a result with an error field instead; a throwing step silently kills the workflow with no observable error (see the sketch after this list).
2. Don't rely on `result.steps['step-id'].output` for steps followed by `.map()`. The output may be empty or inaccessible. Use `result.result` for the final step's output.
3. Keep side effects (DB writes, blob uploads, API calls) outside Mastra. Put them in your caller after the workflow completes. This makes them easier to debug, retry, and reason about.
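Rule 1 is easy to enforce mechanically. One option is a small wrapper that converts exceptions into data (a hedged sketch of our own; Mastra provides no such helper, and our actual fix simply inlined the fields as shown earlier):

```typescript
// A sketch: wrap a step's execute function so it can only ever return.
// This is our own pattern, not a Mastra API.
type SafeResult<T> = { ok: true; value: T } | { ok: false; error: string }

function neverThrow<In, Out>(
  execute: (input: In) => Promise<Out>
): (input: In) => Promise<SafeResult<Out>> {
  return async (input) => {
    try {
      return { ok: true, value: await execute(input) }
    } catch (err) {
      return { ok: false, error: err instanceof Error ? err.message : String(err) }
    }
  }
}

// Usage inside createStep:
//   execute: neverThrow(async ({ inputData }) => generateImage(inputData))
// Downstream steps then branch on `ok` instead of relying on exceptions.
```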

Mastra vs LangChain vs CrewAI — An Honest Take

We chose Mastra because it's TypeScript-native and runs cleanly in Next.js serverless functions. LangChain's JS port lags the Python version significantly, and CrewAI is Python-only. For a Next.js shop, Mastra was the pragmatic choice.

That said, Mastra is roughly a year old and the rough edges show. LangChain and CrewAI have years of battle-testing, large communities, and much better observability tooling (LangSmith is excellent). Mastra's error handling in particular needs work — silent step failures are a serious DX problem.

The framework's *concept* is solid: typed steps, composable workflows, clean `.then()` / `.map()` / `.parallel()` API. When it works, it's elegant. The fix we landed on — ending workflows early and publishing in the wrapper — is actually a cleaner architecture regardless of Mastra's bugs. Keeping side effects outside the workflow makes the system easier to test and reason about.

The Result

After the fix, the first test run returned:

```json
{
  "status": "success",
  "type": "scientific",
  "theme": "Scientific Wonder",
  "diecutShape": "leaf",
  "keyword": "reciprocity",
  "quote": "Restoration is an act of reciprocity.",
  "author": "Robin Wall Kimmerer",
  "source": "Braiding Sweetgrass",
  "productId": 3568,
  "imageUrl": "https://..."
}
```

Robin Wall Kimmerer, leaf die-cut, cosmic constellation style. Published. All six agents now reliably publish on every run.

If you're building on Mastra and hitting similar silent failures, I hope this saves you the debugging expedition we went through. The framework has real promise — it just needs a few more miles on it.

Building with AI?

netrii helps ambitious SMBs navigate AI and emerging technology — strategy, experiments, and hands-on practice.

Schedule a Conversation