Mux and Fal Together: Two Video Stacks, Two Jobs
Why DebaterX uses one service to generate video and another to serve it — and why combining them would be a mistake.
There's a tempting architectural shortcut for video-generating apps: use one vendor for everything. Pick a platform that handles generation and delivery. One bill, one integration, one dashboard.
I considered this for DebaterX. I decided against it. Here's why, and what we ended up doing instead.
The two jobs are opposite
Generation is bursty and GPU-bound. A single video generation takes 30-90 seconds of dedicated GPU time. It happens relatively rarely (maybe a few hundred times a day for a small product), but each run is expensive and slow.
Delivery is steady and bandwidth-bound. Once a video exists, it gets watched thousands of times — some of them on slow mobile connections, some on desktop browsers, all of them expecting instant playback. Delivery is a CDN problem, not a GPU problem.
Trying to optimize for both in one service means compromising on both. The vendors who try (usually to capture "end-to-end" workflows) tend to be mediocre at each half.
Fal for generation
Fal.ai specializes in hosted inference for generative models. Their infrastructure is built around fast cold starts, GPU pooling, and efficient routing across model providers. For my use case, I'm running diffusion models and video synthesis — both extremely GPU-hungry.
Fal's cold start times are the best I've tested. That matters because generation jobs are bursty — I might run zero for an hour and then fifty in five minutes. Services that can't spin up fast would leave users waiting.
Pricing is per-inference, which matches how I charge. Users pay for debates; I pay for generations. Clean pass-through.
Mux for delivery
Mux is HLS-first, mobile-optimized, and has a 9:16 workflow that treats vertical video as a first-class shape. Most video platforms treat vertical as an afterthought, which shows up in weird ways — thumbnails that crop the top, players that add black bars, transcodes that don't optimize for vertical.
Mux also does the signed URL and access control work that would otherwise eat engineering hours. Public-share pages get signed URLs. Private workspace pages get token-gated access. All handled.
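To make the signed-URL idea concrete, here is a minimal sketch of the claims a signed playback token carries. It assumes the claim names from Mux's signed playback documentation as I understand them (`sub` for the playback ID, `aud` for the asset type, `exp` for expiry, `kid` for the signing key ID) and the `stream.mux.com` HLS URL shape. The function names and default TTL are hypothetical, and the actual RS256 signing step is left to a JWT library.

```python
import time

# Audience codes for signed playback tokens (assumed from Mux's docs):
# v = video, t = thumbnail, g = animated GIF, s = storyboard.
AUDIENCE = {"video": "v", "thumbnail": "t", "gif": "g", "storyboard": "s"}


def build_playback_claims(playback_id, signing_key_id, kind="video",
                          ttl_seconds=3600, now=None):
    """Return the JWT claims for a signed playback token (hypothetical helper)."""
    issued_at = int(time.time()) if now is None else int(now)
    return {
        "sub": playback_id,               # which playback ID the token unlocks
        "aud": AUDIENCE[kind],            # what kind of resource it is valid for
        "exp": issued_at + ttl_seconds,   # expiry as a Unix timestamp
        "kid": signing_key_id,            # which signing key was used
    }


def signed_playback_url(playback_id, token):
    """Attach an already-signed token to the HLS playback URL."""
    return f"https://stream.mux.com/{playback_id}.m3u8?token={token}"


# The claims would then be signed with the private half of the signing key
# using RS256 in a JWT library of your choice; that step is omitted here.
```

A public-share page might use a short TTL so that a leaked link stops working quickly, while a workspace page would mint a fresh token on each authenticated request.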
The handoff
When a Fal generation completes, the webhook fires. My backend downloads the video, uploads it to Mux as a new asset, and updates the debate record in Supabase with the Mux playback ID. From that point on, Fal is out of the loop — the video is entirely a Mux asset.
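The handoff can be sketched as a single function with its three side effects injected. This is an illustration, not the production code: the payload field names (`debate_id`, `video_url`) and the three callables are hypothetical stand-ins for the real Fal webhook body, the Mux upload, and the Supabase update.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HandoffDeps:
    """Side effects of the handoff, injected so each can be faked or swapped."""
    download: Callable[[str], bytes]              # fetch the finished video bytes
    upload_to_mux: Callable[[bytes], str]         # returns the Mux playback ID
    save_playback_id: Callable[[str, str], None]  # (debate_id, playback_id)


def handle_generation_complete(payload: dict, deps: HandoffDeps) -> str:
    """Run the generation-to-delivery handoff for one completed job."""
    debate_id = payload["debate_id"]   # hypothetical field name
    video_url = payload["video_url"]   # hypothetical field name

    video_bytes = deps.download(video_url)            # 1. pull the video from the generator
    playback_id = deps.upload_to_mux(video_bytes)     # 2. create the delivery asset
    deps.save_playback_id(debate_id, playback_id)     # 3. record it on the debate

    # From here on, only the playback ID matters; the generator is out of the loop.
    return playback_id
```

Keeping the three steps behind plain callables means the handler can be exercised with in-memory fakes, and a failure in any one step is easy to retry in isolation.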
The handoff takes about 15 seconds on average. The user sees "generating" during the Fal portion and "processing" during the Mux portion, but the perceived wait is roughly the generation time rather than the sum of the two, because Mux's processing runs in parallel with the user's post-generation UI interactions.
Why one-vendor solutions fail
The platforms that try to do both generation and delivery usually end up weak at both. Their generation is slower than Fal's because their infrastructure has to make compromises to support delivery as well. Their delivery is worse than Mux's because they lack the depth Mux has in CDN-backed streaming.
More importantly, locking into an all-in-one platform means you can't swap either half independently. If Fal adds a better model next quarter, I can switch generation to it in an afternoon without touching delivery. If Mux releases better analytics next quarter, delivery can upgrade without touching generation. Coupled vendors don't let you do this.
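One way to keep the two halves swappable is to have the application depend on two small interfaces rather than on either vendor's SDK. The sketch below is an assumption about how that could look; `VideoGenerator`, `VideoHost`, and `DebatePipeline` are hypothetical names, and the real adapters for Fal and Mux would live behind them.

```python
from typing import Protocol


class VideoGenerator(Protocol):
    """Capability 1: turn a prompt into a finished video file."""
    def generate(self, prompt: str) -> str:
        """Return a URL where the generated video can be fetched."""
        ...


class VideoHost(Protocol):
    """Capability 2: take a video file and make it streamable."""
    def ingest(self, video_url: str) -> str:
        """Return a playback ID for the hosted video."""
        ...


class DebatePipeline:
    """Depends only on the two capabilities, never on a specific vendor."""

    def __init__(self, generator: VideoGenerator, host: VideoHost) -> None:
        self.generator = generator
        self.host = host

    def run(self, prompt: str) -> str:
        video_url = self.generator.generate(prompt)
        return self.host.ingest(video_url)
```

Swapping generation then means writing one new class with a `generate` method and passing it to `DebatePipeline`; the delivery adapter and everything downstream stay untouched.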
The lesson
Split your stack along capability boundaries, not vendor boundaries. Generation is one capability. Delivery is another. They should be separate services regardless of how many vendors pitch you on an integrated solution.
Most startups benefit from specialist vendors over generalists, especially for any function that's on the hot path of the user experience. The specialist is better because that's what they do. The generalist is worse because that's one of many things they do.
Pick specialists. Accept the integration work. Stay swappable.