DebaterX

Why AI Video Still Can't Do Slapstick

Pratfalls require physics. Models don't respect physics. Here's how to route around it.

4 min read

Slapstick is a specific kind of physical comedy: characters falling, tripping, colliding, getting hit, dropping things. It's one of the oldest and most universal forms of humor, and it requires something AI video models can't yet do reliably: inevitable physics.

A pratfall only works because the audience knows exactly what's going to happen. Object at height + loss of support = fall. The comedy is in the inevitability. Current video models don't have reliable physics, so their pratfalls feel arbitrary and unfunny.

Here's what breaks and how to work around it.

What models get wrong

When a video model generates a character falling, the fall usually has some mix of these problems: a limb that stays planted while the body drops, a fall speed physics doesn't support, a brief mid-air freeze before impact, a landing pose that's implausibly clean.

Each of these is a small error. Cumulatively, they're the difference between "that was funny" and "that was weird."

Slapstick requires conviction. AI video doesn't have conviction about physics yet. It interpolates between plausible states without understanding the forces connecting them.

What works instead

Setup in AI. Payoff in stock or post.

Generate the lead-up to the pratfall with AI. The moment of tension before the fall. Cut. For the payoff, either use stock footage (someone falling convincingly in real video) or a sound cue with a simple cutaway (audio of a crash, visual of bystanders reacting).

The viewer's brain fills in the physics. They never see the bad AI fall. They experience the setup and the aftermath, and their brain constructs a convincing middle.
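As a sketch, the setup/cut/payoff split can be expressed as a shot plan that never asks the model to render the fall itself. Everything here (function and field names, prompts) is a hypothetical illustration, not an actual DebaterX API:

```python
# Sketch of the setup-in-AI / payoff-in-post split.
# All names and prompts are hypothetical illustrations.

def plan_pratfall_beat(setup_prompt: str) -> list[dict]:
    """Split a pratfall into segments; the fall itself is never AI-generated."""
    return [
        # 1. AI generates only the moment of tension before the fall.
        {"source": "ai", "kind": "setup", "prompt": setup_prompt},
        # 2. Hard cut: the fall happens in stock footage or off-screen.
        {"source": "stock_or_audio", "kind": "payoff",
         "note": "real stock fall, or crash SFX over a cutaway"},
        # 3. Aftermath: bystander reactions sell the impact.
        {"source": "ai", "kind": "reaction",
         "prompt": "bystanders wince and look off-camera"},
    ]

plan = plan_pratfall_beat("waiter balances a tray, steps toward a wet floor")
# No segment ever asks the model to render the fall.
assert all(seg["kind"] != "fall" for seg in plan)
```

The design point: the "fall" is a gap in the edit, not a shot in the plan, so there is nothing for the model to get wrong.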

Off-screen slapstick.

The pratfall happens off-screen. Viewers hear it. They see a reaction shot. This is a classic stage-play technique for when the production can't stage real stunts. It works perfectly for AI limitations.

"Did he just..." [crash sound offscreen] "...yep."

Stylized slapstick.

Commit to cartoon physics. Generate slapstick that's obviously physically wrong — exaggerated, animated, clearly unreal. When the stylization is the point, imperfect physics stop being imperfect and start being stylized.

This works for animation-style mascots (Tony the Tiger, the Trix Rabbit) where viewers expect cartoon physics anyway. Doesn't work for realistic mascots where the expectation is plausibility.

The specific failures to avoid

The three-limb drop. Character loses balance, falls backward, but one arm or leg stays planted as if glued to the floor. Extremely common AI failure. Looks broken.

The wrong-speed pratfall. Character falls at a rate physics doesn't support. Too slow reads as dream sequence; too fast reads as glitch.

The mid-air pose. Character falls, then appears to freeze mid-air for half a frame before impact. This happens when the model interpolates between a "standing" keyframe and a "fallen" keyframe without understanding the trajectory.

The clean landing. Character falls, hits ground, and assumes a clean pose like they were posed by a toy company. Real falls produce awkward, asymmetric landings. AI falls produce art-direction-approved landings.

Spotting these failure modes lets you identify bad slapstick generations and reject them.
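The four failure modes above can double as a review checklist. A minimal sketch, with hypothetical names (this is an illustration, not a claim that these can be detected automatically — the spotting is still manual):

```python
# Hypothetical QC checklist built from the four failure modes above.
FAILURE_MODES = {
    "three_limb_drop": "a limb stays planted as if glued during the fall",
    "wrong_speed": "fall rate physics doesn't support",
    "mid_air_pose": "character freezes mid-air before impact",
    "clean_landing": "landing pose too tidy and symmetric",
}

def review_generation(observed: set[str]) -> dict:
    """Reject a slapstick clip if any known failure mode was spotted."""
    hits = sorted(observed & FAILURE_MODES.keys())
    return {"accept": not hits, "reasons": [FAILURE_MODES[h] for h in hits]}

verdict = review_generation({"mid_air_pose"})
assert verdict["accept"] is False
```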

When it'll get better

Physics in video models will improve fast. Within 12-24 months I expect models to handle pratfalls convincingly, at least in simple cases.

When that happens, slapstick becomes available as a format. Until then, route around it.

The product implication

For DebaterX, I currently flag slapstick as a high-risk shot type. Users who request it get a warning and suggested alternatives (verbal comedy, reaction comedy, visual comedy that doesn't require physics).
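A sketch of that flag-and-suggest behavior (hypothetical names and messages, not the actual DebaterX code):

```python
# Hypothetical warn-and-suggest check for high-risk shot requests.
HIGH_RISK_SHOTS = {"slapstick"}
ALTERNATIVES = ["verbal comedy", "reaction comedy",
                "visual comedy that doesn't require physics"]

def check_shot_request(shot_type: str) -> dict:
    """Warn on high-risk shot types and suggest safer alternatives."""
    if shot_type in HIGH_RISK_SHOTS:
        return {"warn": True,
                "message": "High-risk shot type for current video models.",
                "alternatives": ALTERNATIVES}
    return {"warn": False, "alternatives": []}

assert check_shot_request("slapstick")["warn"] is True
```

Because the flag is policy rather than capability, removing it later is a one-line change to `HIGH_RISK_SHOTS`.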

Once the models improve, I'll remove the warning. The flag is temporary. The craft principle — generate what the model does well, route around what it doesn't — is permanent.

The broader lesson

AI video is not a uniform capability. It does some things better than others. Slapstick is hard. Static scenes are easy. Talking heads are medium. Action sequences with weight and impact are hard. Cartoon expressiveness is medium.
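Those tiers amount to a routing table. A sketch under the assumptions above (the tiers are this post's estimates, not benchmarks):

```python
# The capability tiers above as a simple routing table (illustrative only).
CAPABILITY = {
    "static scene": "easy",
    "talking head": "medium",
    "cartoon expressiveness": "medium",
    "action with weight and impact": "hard",
    "slapstick": "hard",
}

def route(shot: str) -> str:
    """Generate easy/medium shots directly; route hard shots around the model."""
    tier = CAPABILITY.get(shot, "unknown")
    return "generate" if tier in ("easy", "medium") else "work around"

assert route("slapstick") == "work around"
assert route("talking head") == "generate"
```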

Building a production pipeline means knowing where the capability boundary is and routing work accordingly. Don't fight the model's weakness. Build around it until the weakness disappears.

The best creators on AI-generated platforms aren't the ones with the fanciest prompts. They're the ones who know what the model won't do and have content that doesn't require it.
