Temperature Isn't the Answer to 'Funnier'

Cranking temperature doesn't produce jokes. It produces noise that sometimes looks like jokes.

January 22, 2025

Every new prompt engineer eventually hits this point: the output is boring, and someone on the team asks if we can "make it funnier by turning up the temperature."

No. That's not how temperature works. And turning it up is almost always the wrong move. Here's why.

What temperature actually does

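Temperature controls the randomness of token selection by sharpening or flattening the model's next-token probability distribution before a token is drawn. At temperature 0, the model always picks the highest-probability next token. At temperature 1, the model samples from the distribution exactly as it computed it. At temperature 2 and above, the distribution is flat enough that the model frequently picks unlikely tokens.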
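To make that concrete, here is a minimal sketch of the usual mechanism: divide the model's raw scores (logits) by the temperature, then apply softmax. The logits are made-up numbers for four candidate tokens, and the function name is mine, not any particular library's API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by temperature."""
    if temperature <= 0:
        # Temperature 0 is usually handled as a special case (greedy argmax),
        # because dividing by zero is undefined.
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate next tokens, most likely first.
logits = [4.0, 2.0, 1.0, 0.0]

for t in (0.3, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}:", [round(p, 3) for p in probs])
```

As temperature rises, probability mass moves from the top token toward the tail. Nothing in that calculation knows what a setup or a punchline is.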

Low temperature = predictable output. High temperature = diverse output.

Neither of those is the same as "funny." Funny is a structural property of the text, not a randomness property.

Why the "turn up the heat" instinct feels right

The instinct comes from a plausible-sounding theory: jokes are unexpected, so making the model more unpredictable should produce more unexpected text, which should produce more jokes.

This theory is wrong in two ways.

First: unexpected isn't the same as funny. Jokes are structured unexpectedness. They set up an expectation and then violate it in a specific, meaningful way. Random unexpectedness is just noise.

Second: high-temperature output is full of unstructured unexpectedness. Weird word choices. Jarring subject shifts. Characters saying things that don't follow from what came before. None of this is funny. It's just broken.

High temperature produces the surface appearance of creativity (surprising words) without the underlying structure of creativity (meaningful surprises). That gap is the difference between a joke and gibberish.

What actually produces funny

Funny comes from structure. Specifically:

Setup-payoff structure. A joke has two parts. The setup establishes expectations. The payoff violates them. Without this structure, no amount of wordplay is funny.

Character-specific timing. Funny dialogue is funny partly because a specific character said it at a specific moment. The same line, delivered by a different character, stops being funny.

Compression. Jokes get funnier as they get shorter. Every word you can cut makes the punchline land harder.

None of these structural properties is improved by temperature. They're improved by the structure of the brief, meaning the instructions you give the model.

The brief changes that actually produce funny

If you want funnier output, try these:

Add a "subvert the expected response" rule. Tell the model that when a character would normally say X, they should say the opposite. This forces structural unexpectedness rather than random unexpectedness.

Add an "end on the weaker character's line" rule. This is a craft heuristic from stand-up: the weaker character usually gets the best punchline because the contrast is sharper. Telling the model to put the closing line with the underdog produces funnier endings.

Add a "one concrete image per line" rule. Abstract jokes don't land. Specific jokes do. Tell the model each line should mention a specific, concrete thing. This forces grounded humor.

All of these rules work at low temperature. Most of them stop working at high temperature because the random sampling breaks the structure.
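The three rules above can be collected into a reusable brief. This is only a sketch: the rule wording, the `build_brief` helper, and the example task are illustrative assumptions, not a tested template.

```python
from typing import Sequence

# Illustrative phrasing of the three rules; adjust the wording to your setup.
STRUCTURE_RULES = (
    "Subvert the expected response: when a character would normally say X, "
    "have them say the opposite.",
    "End on the weaker character's line.",
    "One concrete image per line: each line should mention a specific, "
    "concrete thing.",
)

def build_brief(task: str, rules: Sequence[str] = STRUCTURE_RULES) -> str:
    """Join a task description and numbered structural rules into one brief."""
    lines = [task.strip(), "", "Rules:"]
    lines += [f"{i}. {rule}" for i, rule in enumerate(rules, start=1)]
    return "\n".join(lines)

# Hypothetical task; swap in your own.
brief = build_brief("Write a funny debate between two characters.")
print(brief)
```

Keeping the rules in one place makes it easy to hold the brief constant while you vary something else, such as temperature.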

The evidence

I ran A/B tests. Same setup. Same matchup. Two variables: temperature and the brief.

Temperature 0.3 with a structured brief: 8/10 outputs rated funny by a human panel.

Temperature 0.9 with the same brief: 4/10 outputs rated funny.

Temperature 0.3 with a vague brief ("write a funny debate"): 3/10 outputs rated funny.

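The pattern: the brief matters at least as much as temperature. At temperature 0.3, swapping the vague brief for the structured one took the hit rate from 3/10 to 8/10. Raising temperature to 0.9 cut the structured brief's hit rate in half, to 4/10. Low temperature alone doesn't produce funny, and high temperature undoes much of what a good brief buys you.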
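For anyone who wants the arithmetic spelled out, here is a small tally of the three conditions reported above. The counts are the ones from this post; the variable names are mine, and the fourth cell (vague brief at 0.9) is absent because it is not reported here.

```python
OUTPUTS_PER_CONDITION = 10

# (brief type, temperature) -> number of outputs rated funny by the panel.
funny_counts = {
    ("structured", 0.3): 8,
    ("structured", 0.9): 4,
    ("vague", 0.3): 3,
}

# Effect of the brief, holding temperature at 0.3.
brief_effect = funny_counts[("structured", 0.3)] - funny_counts[("vague", 0.3)]

# Effect of temperature, holding the structured brief fixed.
temperature_effect = (
    funny_counts[("structured", 0.3)] - funny_counts[("structured", 0.9)]
)

for (brief_kind, temp), funny in funny_counts.items():
    print(f"{brief_kind:<10} T={temp}: {funny}/{OUTPUTS_PER_CONDITION}")
print("brief effect at T=0.3 (outputs):", brief_effect)
print("temperature effect, structured brief (outputs):", temperature_effect)
```

With ten outputs per condition these are rough counts, not precise rates, so read the gaps as a direction rather than a measurement.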

The mental model

Temperature is a sampling parameter. Structure is a property of the whole text. Adjusting temperature to produce structural outcomes is like adjusting the volume knob to produce a different song.

If your output is boring, improve the brief. If your output is structurally right but too safe, slightly raise temperature — from 0.3 to 0.5, not from 0.3 to 0.9. Never crank temperature to solve a structural problem.
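That order of operations can be written down as a tiny decision helper. It is a sketch only: the function name, the 0.2 step, and the 0.5 ceiling are assumptions taken from the 0.3-to-0.5 example above, not tuned values.

```python
def next_step(structure_ok: bool, too_safe: bool, temperature: float,
              bump: float = 0.2, ceiling: float = 0.5) -> tuple[str, float]:
    """Suggest what to change next, always fixing the brief before temperature."""
    if not structure_ok:
        # A structural problem is a brief problem; leave temperature alone.
        return ("revise the brief", temperature)
    if too_safe:
        # Nudge upward in a small step and never past the ceiling.
        raised = min(round(temperature + bump, 2), ceiling)
        return ("raise temperature slightly", raised)
    return ("leave settings alone", temperature)

print(next_step(structure_ok=False, too_safe=True, temperature=0.3))
print(next_step(structure_ok=True, too_safe=True, temperature=0.3))
```

The point of the ceiling is to make "crank it" impossible by construction: the only way past it is to go back and change the brief.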

The rule

Temperature is a knob for small adjustments. It adjusts variance around a trajectory that your brief already defines. If the trajectory is wrong, temperature won't save you.

Nail the brief first. Tune temperature last. Temperature rarely fixes what you hoped it would. Briefs do.