DebaterX

The Prompt That Clicked: From Nonsense to Actual Debates

For weeks the model kept writing polite conversations. Then I added one rule.


For about three weeks in early development, every generated debate failed the same way. The mascots would argue for one turn, then agree. Agree. Agree. Agree. Polite compromise. Both characters finding shared ground. Ad becomes motivational poster. Ad dies on the feed.

I couldn't figure out why. I was prompting for conflict. I was specifying rivalry. I was putting "debate" in every sentence of the system message. The models — Gemini, GPT, Claude, all of them — ignored me and wrote group therapy.

Then I added one sentence to the prompt and everything changed. Here's what it was.

The sentence

"Neither character may agree with the other for the full video. Disagreement is the point."

That's it. One sentence, explicit, negative. The models immediately started producing real debates. The mascots stopped reaching for consensus. The dialogue stayed in conflict.
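A minimal sketch of how a hard negative rule like this can be appended to a system prompt. The base prompt text, the rule-list framing, and the helper function are all illustrative assumptions, not the production prompt:

```python
# Illustrative sketch only: BASE_PROMPT and the "Hard rules" framing
# are assumptions, not the actual DebaterX production prompt.
BASE_PROMPT = (
    "You write a short debate between two brand mascots. "
    "They are rivals. The tone is combative."
)

# The one-sentence negative rule that changed the output.
NO_AGREEMENT_RULE = (
    "Neither character may agree with the other for the full video. "
    "Disagreement is the point."
)

def build_system_prompt(base: str, rules: list[str]) -> str:
    """Append explicit negative rules after the base instructions."""
    rule_lines = "\n".join(f"- {r}" for r in rules)
    return f"{base}\n\nHard rules (never break these):\n{rule_lines}"

system_prompt = build_system_prompt(BASE_PROMPT, [NO_AGREEMENT_RULE])
```

Keeping the negative rules in a separate list makes it easy to add a new one each time a fresh failure mode shows up, without touching the base instructions.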

Why it worked

LLMs are trained on human feedback that rewards harmony. "Be helpful" translates, somewhere deep in the weights, to "steer conversations toward agreement." When you ask the model to write a debate, it's operating against its own training — and it needs explicit permission to stay in conflict.

"Disagreement is the point" is that permission. The model reads it as a rule that overrides its trained instinct. Conflict becomes allowed. Harmony becomes forbidden. Output shifts.

The broader lesson

I learned three things from this one rule.

Lesson one: negative prompts outperform positive prompts. Telling the model what not to do narrows the output space more sharply than telling the model what to do. "Don't agree" is a tighter constraint than "disagree."

Lesson two: LLMs have training biases that work against specific creative tasks. Debate writing is one. So is writing flawed characters, writing villains with real grievances, and writing endings that don't resolve. All of these fight against RLHF defaults. Explicit anti-RLHF prompting is required.

Lesson three: you'll discover the right prompt by staring at failures. I didn't figure out "disagreement is the point" by thinking about it. I figured it out by reading 200 bad outputs and noticing the failure pattern. The failure pattern told me what to forbid.
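Reading 200 outputs by hand is what surfaced the pattern, but once you know what the failure looks like, you can flag it automatically. A sketch under the assumption that consensus shows up as a handful of stock phrases (the marker list below is a guess; yours would come from your own bad outputs):

```python
import re

# Hypothetical markers of the consensus failure mode; the real list
# would be harvested from reading your own failed generations.
AGREEMENT_MARKERS = [
    r"\byou're right\b",
    r"\bi agree\b",
    r"\bcommon ground\b",
    r"\bfair point\b",
    r"\bwe both\b",
]

def flag_consensus(transcript: str) -> list[str]:
    """Return the agreement patterns found in a generated debate."""
    lowered = transcript.lower()
    return [p for p in AGREEMENT_MARKERS if re.search(p, lowered)]

bad = "Mascot A: Fair point. I agree, we both want the same thing."
good = "Mascot A: No. Your cereal is sugar dust with a logo."
```

Running a check like this over a batch of generations turns "stare at failures" into a number you can watch move when you change the prompt.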

The prompt now

The current system prompt has seventeen rules like this. Each one blocks a specific failure mode I discovered through trial and error. Some examples:

"No character may acknowledge that they are a fictional mascot."

"No character may explain the product they represent; they may only imply it."

"No character may reference the current year or any specific date."

"No character may use words longer than four syllables unless the word is a brand name."

Each rule adds tightness to the output. Each rule was earned by watching the model fail before it was added.
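Some of these rules are mechanically checkable, which means violations can be caught before a video ships. A sketch of a linter for the four-syllable rule, assuming a crude vowel-group syllable count and a hypothetical brand-name allow-list:

```python
import re

def estimate_syllables(word: str) -> int:
    """Rough syllable count via vowel groups; crude but fine for a lint."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

BRAND_NAMES = {"coca-cola"}  # hypothetical allow-list, per the rule's exception

def violates_syllable_rule(line: str) -> list[str]:
    """Return words longer than four syllables that aren't brand names."""
    words = re.findall(r"[A-Za-z'-]+", line)
    return [
        w for w in words
        if w.lower() not in BRAND_NAMES and estimate_syllables(w) > 4
    ]
```

Vowel-group counting gets English syllables wrong at the margins, but for rejecting obvious offenders in generated dialogue it's usually close enough.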

What the prompt doesn't do

The prompt doesn't make the debates good. Good requires creative input from the brief — the matchup, the topic, the setup. The prompt just prevents the ten most common failure modes. Prevention isn't generation.

You still need to pick interesting mascots. You still need a premise that doesn't feel forced. You still need to write captions. The prompt is the floor. The ceiling is still on you.

The takeaway for anyone building creative AI tools

When your output is failing in a consistent way, the fix isn't more positive instruction. It's a specific negative rule that blocks the failure.

This is unintuitive. Most builders want to tell the model what they want. What you want is usually what the model is already trying to do, so telling it again doesn't help. What you need to do is tell the model what it shouldn't do, because the shouldn't is where the failures hide.

Think like a lawyer, not a coach. Legal language is mostly about what you can't do. Good system prompts work the same way.
