Role Prompts vs. System Prompts for Mascot Debates

Where you put the character rules changes whether the model stays in character across long exchanges.

February 5, 2025·4 min read

Most LLM API calls have two kinds of messages: system prompts and user/role prompts. For character-driven tasks, where you put the character rules changes everything.

The conventional wisdom is to put character rules in the system prompt. It's clean, it's centralized, it feels organized. It's also often the worst choice.

Here's why role prompts outperform system prompts for character work, and when to use each.

The strength-decay pattern

In long conversations, every model shows the same pattern: rules in the system prompt lose strength over time. By turn six or seven, the model starts "forgetting" them — not literally, but in practice. The character's voice drifts. The constraints relax. The rules fade.

Rules in role prompts decay slower. They're closer to the current generation in the context window, so they have more gravitational pull on the next response.

If your debate runs beyond three turns, this matters. System-prompt characters start losing their voice around turn four. Role-prompt characters hold further, often all the way through the conversation.

The old way (system prompt)

System prompt: "You are writing a debate between Ronald McDonald and the Burger King. Ronald is cheerful but awkward. The King is silent and menacing. Neither agrees with the other."

User prompt: "Generate turn 1."

Response: Good.

User prompt: "Generate turn 2."

Response: Still good.

User prompt: "Generate turn 3."

Response: Starts softening.

User prompt: "Generate turn 6."

Response: Both characters are now mildly friendly. Rules forgotten.

The new way (role prompt)

System prompt: "You are writing a debate. Follow the character rules provided in each turn's request."

User prompt (turn 1): "[RONALD: cheerful but awkward, never acknowledges the King's weirdness] [KING: silent, only gestures, never agrees] Generate turn 1."

User prompt (turn 2): "[RONALD: cheerful but awkward, never acknowledges the King's weirdness] [KING: silent, only gestures, never agrees] Previous turn: [...]. Generate turn 2."

And so on. Every turn re-asserts the rules in the role prompt. The model gets a fresh reminder each time.

Why the role-prompt approach works

The model's attention, in practice, weighs recent context more heavily than distant context. System prompts are always at the beginning of the conversation, which means they're always the most distant message. Role prompts can be current.

By re-injecting the rules at every turn, you're placing them in the model's most-attended-to position. The rules stay vivid. The characters stay in voice.

When to use system prompts anyway

System prompts still win for non-character-specific tasks:

Format rules. "Respond in JSON with the following schema." This belongs in the system prompt. It's not character-specific and you don't need to reinforce it per turn.
Safety rules. "Never discuss topics X, Y, Z." Character-neutral. Stays effective from the system prompt.
Output style. "Write in markdown with headers." Again, non-character. Fine at the top level.

The rule of thumb: if the constraint describes the task shape, use system prompts. If it describes who the characters are, use role prompts with re-injection.

The engineering cost

Re-injecting character rules makes your prompts longer. Every turn adds roughly the character block's worth of tokens. For a six-turn debate with two characters whose rules take 100 tokens each, that's 1,200 extra tokens across the conversation.

At current pricing, that's negligible for most use cases. The output quality improvement justifies the cost.

If you're running at extreme scale, you can use prompt caching to minimize the per-turn cost. Cache the character block once; only the turn-specific content is new. Both Anthropic's and OpenAI's APIs support this.

The takeaway

System prompts are for task invariants. Role prompts are for task variables. Character rules are variables — they need to be recent and loud, not distant and fading.

Move your character rules out of the system prompt. Re-inject them per turn. Your characters will hold voice longer, and your debates will stay sharp from turn one to turn six.

This is the single prompting change that improved DebaterX's output quality the most. Not a model upgrade. Not a temperature tune. A structural change to where the rules live. Try it.