Inngest for Long-Running AI Pipelines
Why a background job runner became the backbone of DebaterX.
An AI-generated debate takes about four minutes end-to-end. Script generation: 20 seconds. Image generation for both mascots: 45 seconds. Video synthesis: 90-120 seconds. Audio: 20 seconds. Composition and Mux upload: 30 seconds.
Four minutes is fine for the user, who's doing other things. Four minutes is not fine for a Vercel serverless function, which times out at 300 seconds on the Pro plan and way sooner on Hobby.
The naive approach — fire the whole pipeline from an API route and wait for it to finish — was always going to fail. I needed a job runner. I chose Inngest. Here's why.
The problem with serverless for long jobs
Serverless functions are optimized for short, stateless requests. HTTP in, compute briefly, HTTP out. Long-running anything fights the platform.
You can work around this with polling, with WebSockets, with queues — all of which introduce infrastructure you have to maintain. At some point, maintaining the workaround is more expensive than just using a job runner designed for this.
Why Inngest specifically
I evaluated a few options:
- Inngest. Step-based workflows. Durable execution. Each step is retryable independently. Built-in observability for failed runs. TypeScript-native.
- Temporal. Older, more feature-rich, used at large scale. Overkill for a product my size.
- Self-hosted queues (BullMQ, RabbitMQ). Works fine, but requires operational expertise I'd rather not spend time on.
- Vercel's own job system. Didn't exist when I started; when it launched, it didn't support the step-retry model I needed.
Inngest won because the steps model fit my pipeline exactly. Each stage of the debate generation is a natural step. Each step can retry independently if it fails. That's a huge operational win — if the video synthesis step fails, I don't re-run the script generation.
The pipeline
A debate job in DebaterX has these steps:
1. Generate script (Gemini call, ~20s)
2. Generate brand A image (Fal call, ~15s)
3. Generate brand B image (Fal call, ~15s, parallel with step 2)
4. Generate video (Fal call, ~90s)
5. Generate audio (ElevenLabs call, ~20s)
6. Composite (FFmpeg in a cloud function, ~15s)
7. Upload to Mux (~10s)
8. Update debate record (Supabase call, ~1s)
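Steps 2 and 3 are the one place the pipeline fans out. With a step runner, parallelism is just starting both steps before awaiting either. A sketch of that shape, using a pass-through stand-in for Inngest's `step.run` and an illustrative image helper (neither is DebaterX's real code):

```typescript
// Pass-through stand-in for Inngest's step.run; the real one also
// persists the result and retries the step independently on failure.
async function runStep<T>(id: string, fn: () => Promise<T>): Promise<T> {
  return fn();
}

// Illustrative stand-in for the Fal image call (~15s each in production).
async function generateBrandImage(brand: string): Promise<string> {
  return `image-url:${brand}`;
}

// Starting both promises before awaiting lets the two ~15s image
// generations overlap: ~15s of wall-clock time instead of ~30s.
async function generateImages(): Promise<[string, string]> {
  return Promise.all([
    runStep("generate-brand-a-image", () => generateBrandImage("brand-a")),
    runStep("generate-brand-b-image", () => generateBrandImage("brand-b")),
  ]);
}
```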
Each step is an Inngest step. Each step has its own retry policy. If step 4 fails, I retry step 4 without losing the work done in steps 1-3. If the whole job fails permanently, I can see exactly which step failed and why, in the Inngest dashboard.
The developer experience
Writing an Inngest function looks like writing normal code, with one wrapper:
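A sketch of what that looks like for the debate pipeline. To keep the example self-contained, `createFunction` and the provider helpers below are minimal stand-ins: in the real app the wrapper is `inngest.createFunction` from the SDK, which is what adds durability, per-step memoization, and retries. Event names, step IDs, and helper signatures are all illustrative.

```typescript
// Minimal stand-ins so this sketch runs without the Inngest SDK.
type DebateRequested = { topic: string; debateId: string };
type Step = { run<T>(id: string, fn: () => Promise<T>): Promise<T> };

function createFunction<R>(
  _opts: { id: string },
  _trigger: { event: string },
  handler: (ctx: { event: { data: DebateRequested }; step: Step }) => Promise<R>,
): (data: DebateRequested) => Promise<R> {
  const step: Step = { run: (_id, fn) => fn() }; // pass-through; no durability here
  return (data) => handler({ event: { data }, step });
}

// Illustrative provider calls (stand-ins for Gemini, Fal, ElevenLabs, Mux, Supabase).
const generateScript = async (topic: string) =>
  ({ text: `debate: ${topic}`, brandA: "brand-a", brandB: "brand-b" });
const generateImage = async (brand: string) => `image:${brand}`;
const generateVideo = async (a: string, b: string) => `video:${a}+${b}`;
const generateAudio = async (script: { text: string }) => `audio:${script.text}`;
const composite = async (video: string, audio: string) => `final:${video}|${audio}`;
const uploadToMux = async (file: string) => ({ playbackId: `mux-${file.length}` });
const updateDebate = async (id: string, playbackId: string) => undefined;

// The pipeline reads top to bottom, one step.run per stage.
const generateDebate = createFunction(
  { id: "generate-debate" },
  { event: "debate/requested" },
  async ({ event, step }) => {
    const script = await step.run("generate-script", () =>
      generateScript(event.data.topic));

    // Steps 2 and 3: both mascot images in parallel.
    const [imageA, imageB] = await Promise.all([
      step.run("generate-image-a", () => generateImage(script.brandA)),
      step.run("generate-image-b", () => generateImage(script.brandB)),
    ]);

    const video = await step.run("generate-video", () => generateVideo(imageA, imageB));
    const audio = await step.run("generate-audio", () => generateAudio(script));
    const finalVideo = await step.run("composite", () => composite(video, audio));
    const asset = await step.run("upload-to-mux", () => uploadToMux(finalVideo));
    await step.run("update-debate-record", () =>
      updateDebate(event.data.debateId, asset.playbackId));

    return asset;
  },
);
```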
The step functions are declared as async callbacks. The orchestrator handles retries, backoff, and state persistence. My application code doesn't need to know any of that — it just calls steps in order.
This is much simpler than managing a queue manually. I write the pipeline the way I think about it. Inngest handles the durability.
The failure modes I've handled
Rate limits. When a model provider rate-limits me, the step retries with exponential backoff. The user sees a slight delay; I don't see any failures.
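Inngest manages the retry schedule itself; conceptually, exponential backoff looks like the sketch below. The base delay and cap here are illustrative numbers, not Inngest's actual policy.

```typescript
// Illustrative exponential backoff: the wait doubles with each failed
// attempt, capped so a retry never waits more than a minute.
// These constants are examples, not Inngest's real retry parameters.
function backoffMs(attempt: number, baseMs = 1_000, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// attempt 0 → 1s, 1 → 2s, 2 → 4s, 3 → 8s, ... capped at 60s
```

Real schedulers usually add jitter on top of this so retries from many concurrent jobs don't all land on the rate-limited provider at the same instant.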
Provider outages. When Fal has a bad hour (it happens), jobs wait in Inngest's retry queue. When Fal recovers, jobs resume automatically. No manual intervention.
Partial failures. If step 4 succeeds but step 7 fails, Inngest only re-runs step 7. Step 4's output is preserved. That saves me real money in GPU costs: a failed Mux upload doesn't force me to regenerate the video.
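The mechanism behind this is step memoization: each completed step's result is persisted, and a retried run replays stored results instead of re-executing the work. A stripped-down sketch of the idea (this is the concept, not Inngest's implementation; a Map stands in for durable storage):

```typescript
// Completed step results are persisted, so a retried run replays them
// instead of re-executing the step.
const completed = new Map<string, unknown>(); // stands in for durable storage

let videoCalls = 0; // counts how often the expensive video step really runs

async function runDurableStep<T>(id: string, fn: () => Promise<T>): Promise<T> {
  if (completed.has(id)) return completed.get(id) as T; // replay, don't re-run
  const result = await fn();
  completed.set(id, result); // persist only after the step succeeds
  return result;
}

// One end-to-end attempt: generate the video, then upload it.
async function attempt(failUpload: boolean): Promise<string> {
  const video = await runDurableStep("generate-video", async () => {
    videoCalls++; // the expensive GPU work happens here
    return "video.mp4";
  });
  return runDurableStep("upload-to-mux", async () => {
    if (failUpload) throw new Error("upload failed");
    return `uploaded:${video}`;
  });
}

// If the first attempt fails at the upload, the retry replays the stored
// video result, so videoCalls ends at 1 even though the job ran twice.
```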
The operational benefit
I spend almost zero time on pipeline reliability. The Inngest dashboard shows me which jobs are running, which have failed, and why. When I deploy a change, I can test it against a dead-letter queue of previously-failed jobs. The platform handles what I would otherwise be building myself.
For an AI video product with long pipelines, Inngest is the right tool. If I were starting this build two years later, I'd probably still choose it. It's one of those infra decisions that was right on day one and keeps being right.