The Problem With 'And They Learned Something'
AI wants a moral. Mascot debates don't have morals. Cut the lesson.
Every time I write a debate prompt without explicit instructions to avoid resolution, the model lands the ending on a "lesson." The characters, having argued for six beats, find common ground. Both walk away wiser. The audience is supposed to be moved.
None of this is what a mascot debate is supposed to do. And yet LLMs default to it, every time, across every model I've tested. Here's the trap and the fix.
The RLHF origin
Model training data is biased toward narratives with resolution. Most fiction ends with characters learning something. Most nonfiction ends with a takeaway. Most conversations end with agreement or at least mutual understanding.
RLHF reinforces this. Human raters prefer responses that feel complete. Completeness, in prose, usually means resolution. The model learns: end conversations with a lesson, get high ratings.
This works great for most tasks. It fails for mascot debates, where the whole point is unresolved conflict that makes the audience pick a side.
The failure mode in practice
Classic output from a default model on a Ronald vs. King brief:
Ronald: "My burgers are for families." King: "Mine are flame-grilled." Ronald: "Yours are greasy." King: "Yours are fake." Ronald: "Look, maybe we both have something to offer." King: "You're right. Different customers like different things." Ronald: "Thank you for this conversation."
Notice the shift at line five. The model, sensing the conversation is ending, starts reaching for resolution. By line seven, Ronald is thanking the King for the conversation. The debate has collapsed into a Hallmark card.
The fix: explicit anti-resolution
Add a specific rule in the brief: "The debate must not conclude with any form of agreement, mutual understanding, or shared lesson. Both characters must end the scene in the same position they started. Neither character should acknowledge the other's point as valid."
This is aggressive language. You have to be aggressive, because the model's training is pulling hard in the other direction. Soft instructions ("try to keep the tension") don't work. Hard prohibitions ("must not") do.
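If it helps to see where the rule sits, here's a minimal sketch, assuming a standard system/user chat-message format. The brief text and `generate` helper are placeholders for whatever completion call you already use, not a specific vendor API:

```python
# Hard anti-resolution constraint, kept as a constant so it travels
# with every debate brief rather than being retyped (and softened) each time.
ANTI_RESOLUTION_RULE = (
    "The debate must not conclude with any form of agreement, "
    "mutual understanding, or shared lesson. Both characters must end "
    "the scene in the same position they started. Neither character "
    "should acknowledge the other's point as valid."
)

def build_debate_messages(brief: str) -> list[dict]:
    """Attach the rule as a hard prohibition, not a suggestion."""
    return [
        {"role": "system", "content": brief + "\n\nHARD RULE: " + ANTI_RESOLUTION_RULE},
        {"role": "user", "content": "Write the debate. End the scene mid-conflict."},
    ]

# messages = build_debate_messages(ronald_vs_king_brief)  # brief text up to you
# script = generate(messages)                              # your own model call
```

The wording is the same "must not" prohibition from above; the only point of the code is that the rule rides along with every debate prompt instead of being rephrased into something softer.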
With this rule, output shifts:
Ronald: "My burgers are for families." King: "Mine are flame-grilled." Ronald: "Yours are greasy." King: "Yours are fake." Ronald: "Enjoy your... whatever that is." King: [slides a Whopper across the counter, silent, still smiling] [cut]
No resolution. No lesson. Both characters remain exactly themselves. The debate ends in stalemate, which is what a debate should do.
Why unresolved endings work better
Resolved endings tell the audience what to think. The characters agreed; the correct response is to feel satisfied. There's nothing left to process.
Unresolved endings force the audience to complete the scene in their head. They're implicitly asked to pick a side. "Who's right?" "Who would you side with?" The question lingers.
This lingering is engagement. Engagement is what short-form video rewards. The comment section fills with people taking positions. The comment section is the third act of the ad.
Resolved ads don't get comments like these. They get "great ad!" or "love it!" — polite, brief, low-signal. Unresolved ads get debates about the debate, which is ten times more valuable.
The test
Generate a debate. Read the last two lines.
Do the characters still disagree? Are they still themselves? Does the scene end mid-conflict?
If yes to all three: good. Ship.
If the characters have softened, found common ground, or expressed appreciation for each other: the anti-resolution rule isn't strong enough. Rewrite with stronger negative constraints.
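The first two questions are judgment calls, but the last-lines check can be roughed out in code. A minimal sketch; the marker list is a guess at the usual softening phrases, not a canonical list:

```python
# Phrases that signal the ending has started reaching for resolution.
# Heuristic only; expect to extend this per campaign and per model.
RESOLUTION_MARKERS = [
    "you're right", "you are right", "common ground", "we both",
    "thank you for this", "different customers", "learned something",
    "at the end of the day",
]

def still_in_conflict(script: str, tail_lines: int = 2) -> bool:
    """Return True if the last few lines show no sign of softening."""
    lines = [line.strip() for line in script.splitlines() if line.strip()]
    tail = " ".join(lines[-tail_lines:]).lower()
    return not any(marker in tail for marker in RESOLUTION_MARKERS)
```

If the check fails, don't hand-edit the ending. Regenerate with stronger negative constraints, because the softening tends to start a few beats before the last line, as in the Ronald example above.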
The broader AI lesson
Models have invisible preferences. Resolution is one. Harmony is another. Safety-seeking is a third. Every creative task has to account for these and actively counter them.
You can't always know which preferences will bite your specific task until you've watched the model fail. When you see a consistent failure — every output landing on a moral, every ending feeling tidy — the model has found one of its preferences and is expressing it. Add an explicit rule against that preference.
Debating is a genre the model is bad at by default. It gets good at it when you tell it, clearly and repeatedly: don't do the thing you want to do.