DebaterX

Rendering 9:16 Without Squashing Your Mascots

Aspect ratio is the silent killer of AI-generated video. Here's the fix I wish I'd known on day one.

3 min read

Most video generation models are trained on 16:9 source footage. Movies. TV. YouTube. The whole training corpus is horizontal. Ask for 9:16 output and the model, doing its best with unfamiliar territory, produces subtly wrong results.

The wrongness is specific: faces get slightly elongated, limbs look stretched, characters feel "off" in ways viewers notice without being able to name. It's a consistent failure mode across every model I've tested.

Here's how I work around it.

What actually happens

When you request 9:16 from a model trained on 16:9, the model doesn't rotate the camera — it re-composes. It tries to fit a horizontal scene into a vertical frame, and the fit requires squishing the subject. A face that was rendered in a 16:9 frame gets compressed horizontally. Shoulders get narrowed. Proportions come out wrong.

Ronald McDonald generated in 16:9 looks like Ronald McDonald. Ronald McDonald generated directly in 9:16 looks like Ronald McDonald's slightly-off cousin.

The workaround: generate wide, crop vertical

My current pipeline generates every video at the model's native ratio — which is 16:9 for most models — and then crops to 9:16 as a post-processing step.

The cropping is center-biased. I prompt the model to compose the scene with "subject centered, significant negative space on both sides." When the crop happens, the negative space gets removed, and the subject lands centered in the vertical frame at correct proportions.

The result: mascots that look like themselves. No squishing. No weird head shapes. Full fidelity.

The composition prompt

This is the exact language I use in image/video briefs now:

"Compose the subject in the center third of the frame. Leave significant negative space on the left and right thirds. The scene must read correctly even when cropped to a vertical 9:16 aspect ratio."

That language tells the model to produce a shot that will survive the crop. Without it, the model will fill the entire 16:9 frame with important content, and the crop will chop parts off.
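The arithmetic behind that prompt language: a full-height 9:16 crop of a 16:9 frame keeps just under a third of the frame's width, which is why anything important has to live in the center third. A quick check (my own numbers, not from any model spec):

```python
# Fraction of a 16:9 frame's width that survives a full-height 9:16 crop.
src_w, src_h = 16, 9
kept_w = src_h * 9 / 16            # crop width at full source height
print(kept_w / src_w)              # 0.31640625 — slightly less than the center third
```

So a subject that drifts even a little outside the center third gets clipped by the crop.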

The cropping automation

Post-generation, I pipe the video through a short FFmpeg script:

  1. Detect the video's native aspect ratio.
  2. Calculate the center vertical crop at 9:16.
  3. Apply the crop, maintaining original resolution.
  4. Pass to Mux.

It's about 20 lines of code. Runs in 2-3 seconds per video. The overhead is invisible to users.
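The steps above can be sketched roughly like this. This is my own illustration, not the author's actual script: the function names are invented, it assumes `ffprobe` and `ffmpeg` are on the PATH, and the final Mux upload step is omitted.

```python
import subprocess

def crop_geometry(width: int, height: int, target_w: int = 9, target_h: int = 16):
    """Centered crop (crop_w, crop_h, x, y) to a target_w:target_h ratio.

    Keeps the full source height and trims the sides -- the
    "generate wide, crop vertical" step described above.
    """
    crop_h = height
    crop_w = height * target_w // target_h
    if crop_w > width:                      # source already narrower than target:
        crop_w = width                      # trim top/bottom instead
        crop_h = width * target_h // target_w
    crop_w -= crop_w % 2                    # even dimensions keep encoders happy
    crop_h -= crop_h % 2
    return crop_w, crop_h, (width - crop_w) // 2, (height - crop_h) // 2

def crop_to_vertical(src: str, dst: str) -> None:
    # 1. Detect the video's native dimensions with ffprobe.
    probe = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height", "-of", "csv=p=0", src],
        capture_output=True, text=True, check=True)
    w, h = (int(v) for v in probe.stdout.strip().split(","))
    # 2-3. Compute and apply the centered 9:16 crop at original height.
    cw, ch, x, y = crop_geometry(w, h)
    subprocess.run(["ffmpeg", "-y", "-i", src,
                    "-vf", f"crop={cw}:{ch}:{x}:{y}",
                    "-c:a", "copy", dst], check=True)
```

For a 1920x1080 source this crops a 606x1080 window centered at x=657, i.e. a full-height vertical slice out of the middle of the wide frame.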

What doesn't work

I tried two other approaches before landing on this one.

Approach one: ask for 9:16 directly. Subject gets squished. Every time. Not usable.

Approach two: generate 1:1 square and then crop/pad to 9:16. Slightly better than direct 9:16, but the composition still suffers because 1:1 compositions don't translate cleanly to vertical.

Wide-and-crop beats both. It has been the industry-standard workaround since short-form video became mainstream; it's not my discovery, just what every experienced video producer does.

The aspect-ratio trap

If you're building a video-generation product, never trust the model's native output to match your target aspect ratio. The model was trained on one aspect ratio. Every other ratio is a workaround. Test every aspect ratio your product supports and budget post-processing time for corrections.

This is also true for square output (for Instagram feed), portrait 4:5 (for Facebook), and cinematic 2.35:1 (which almost nothing supports natively). Each target aspect ratio needs its own composition prompt and its own crop logic.
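One parameterized helper can cover the crop-logic side of that (the composition prompts still need to be written per ratio). This is my own sketch, not the author's script; 2.35:1 is approximated as 47:20 to stay in integer ratios.

```python
def center_crop_filter(src_w: int, src_h: int, ratio_w: int, ratio_h: int) -> str:
    """Build an FFmpeg crop filter for a centered ratio_w:ratio_h crop."""
    if src_w * ratio_h >= src_h * ratio_w:       # source wider than target: trim sides
        ch, cw = src_h, src_h * ratio_w // ratio_h
    else:                                        # source taller than target: trim top/bottom
        cw, ch = src_w, src_w * ratio_h // ratio_w
    cw, ch = cw - cw % 2, ch - ch % 2            # even dimensions for common encoders
    return f"crop={cw}:{ch}:{(src_w - cw) // 2}:{(src_h - ch) // 2}"

# Crop filters for each target ratio from a 1920x1080 source.
for ratio in [(9, 16), (1, 1), (4, 5), (47, 20)]:
    print(ratio, center_crop_filter(1920, 1080, *ratio))
```

Note that 9:16, 1:1, and 4:5 all trim the sides of a 16:9 source, while 2.35:1 trims the top and bottom — so the "negative space" instruction in the composition prompt has to change direction accordingly.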

Aspect ratio is one of those infrastructure details that doesn't seem like a big deal until you ship and viewers tell you "your mascots look weird." By then the damage is done. Get ahead of it.
