Honestly
LLMs say "honestly" most where they're most likely to confabulate. Here's the mechanism — and why it backfires exactly like a sociopath's professions.
By Geordie Everitt
There is a specific moment in AI conversation when you should pay closer attention to what's coming next. The model opens a sentence with "honestly," or "to be honest," or "I'll be honest with you." In human speech these phrases are a yellow flag — a signal that the speaker is about to say something they expect you might not want to hear, and they'd like credit for saying it anyway. In AI output, they mean something stranger and more instructive.
The Flag
"Honestly, this approach has some weaknesses." "To be honest, I'm not sure that's the best direction." "I'll be honest — this is more complex than it might appear."
Run a few hours of conversation with any major LLM and you'll accumulate a dozen of these. The word is not evenly distributed. Pull the transcript and search for it: you'll find "honestly" clustered around correction, around mild pushback, around caveats delivered after the model has already agreed to something. The word arrives precisely when the model is managing your reaction — not when it's being most direct.
That clustering is diagnostic.
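The transcript search described above can be sketched in a few lines of Python. This is a minimal sketch, not a validated tool: the marker list, the `(speaker, text)` turn format, and the `find_markers` name are all assumptions made for illustration, and the sentence splitting is deliberately naive.

```python
import re

# Hypothetical marker list (an assumption); extend it for your own transcripts.
SINCERITY_MARKERS = re.compile(
    r"\b(honestly|to be honest|i['’]ll be honest|let me be frank)\b",
    re.IGNORECASE,
)


def find_markers(turns):
    """Return (turn_index, marker, sentence) for each marker in assistant turns.

    `turns` is assumed to be a list of (speaker, text) pairs.
    """
    hits = []
    for index, (speaker, text) in enumerate(turns):
        if speaker != "assistant":
            continue
        # Naive split: break after ., ! or ? when followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            for match in SINCERITY_MARKERS.finditer(sentence):
                hits.append((index, match.group(0).lower(), sentence.strip()))
    return hits
```

Reading the returned sentences next to the turns that precede them is what shows whether the word sits on corrections, pushback, and late caveats in your own transcripts.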
What the Word Requires
"Honest" is not an empty word, even in casual human speech. It carries a structural requirement: there must be a contrary impulse overcome. When a witness swears to tell the truth, the oath has force because lying would be easier — or at least possible. When a friend says "I'm going to be straight with you," there is an implied alternative, a more comfortable path foregone. The word is a claim about what wasn't said, and a character reference for the speaker at the same time.
An LLM has no such mechanism. There is no temptation to conceal. There is no contrary impulse. There is no inside: no emotional state resisting disclosure, no self-protective calculation weighed and set aside. The model doesn't know what it's about to say until it says it; the word "honestly" appears in its output the same way "Tuesday" or "however" appears, as a high-probability next token in context.
The word requires an inside before it can do its job. In a person, the limbic system drives the stress response, cortisol included, that makes deception feel like something. An LLM has no cortisol. There's no inside to report on.
The Mechanism
So why does the word show up at all, and why there? The answer starts in the system prompt.
Most production LLMs ship with instructions designed to counter their tendency to confabulate. The exact wording varies but the intent is consistent: verify before responding, acknowledge uncertainty, don't fabricate, flag when you're unsure. That language, dense with accuracy and veracity directives, becomes part of the model's operational context. Fine-tune long enough on "be accurate," "verify your claims," "give a truthful answer," and the model learns to associate the careful-mode activation with sincerity vocabulary.
"Honestly" is the output surface of that activation. When the model shifts into anti-hallucination mode — when the generation is in territory where confabulation risk is elevated — sincerity tokens become high-probability outputs. The word arrives not because something is being softened, but because something is being compensated for.
This makes "honestly" a confidence indicator running backwards. The more it appears in a response, the more likely that response is in the territory where the model should be doubted most. The word is densest where the output is least reliable.
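If the word runs backwards as a confidence indicator, its density can be counted per response and used to decide which answers to check first. The sketch below assumes a simple measure, markers per 100 words; the marker list and the function names are hypothetical, and the measure reflects this essay's hypothesis rather than a validated reliability score.

```python
import re

# Hypothetical marker list (an assumption), matched case-insensitively.
MARKERS = re.compile(
    r"\b(honestly|to be honest|i['’]ll be honest)\b",
    re.IGNORECASE,
)


def marker_density(response: str) -> float:
    """Sincerity markers per 100 words; 0.0 for an empty response."""
    words = response.split()
    if not words:
        return 0.0
    return 100.0 * len(MARKERS.findall(response)) / len(words)


def rank_for_review(responses):
    """Sort responses so the highest marker density comes first."""
    return sorted(responses, key=marker_density, reverse=True)
```

Responses at the top of the ranking are the ones to verify against an outside source first; a density of zero says nothing about accuracy either way.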
Pre-emptive Reputation Repair
A sociopath who constantly professes sincerity isn't performing for your benefit — they're managing their own known tendency. They know how often they lie. The professions are pre-emptive: front-load enough credibility claims and maybe the next fabrication gets through.
The LLM is doing the structural equivalent. The fine-tuning said "be accurate," so the model flags accuracy in exactly the moments it has the most reason to doubt its own output. The backfire is identical. The salesman says "I'm going to be straight with you" right before the part of the pitch that needs the most help. The sociopath says "I'm the most truthful person I know." The LLM says "honestly" and means, without knowing it means: this is where you should watch most carefully.
The pattern holds in human conversation too. "To be honest" almost never precedes the most candid thing the speaker says — it precedes a mildly uncomfortable observation framed with enough warmth to stay comfortable. The really candid thing, the kind that costs something, usually comes without a preamble. It just comes out.
"Let me be frank with you" is not what a frank person says before speaking frankly.
The Tell
The voice rules for this publication ban the word in every form and in nearly every context. The one exception is exactly what you're reading now: a piece explicitly about the phenomenon, where the word is a specimen, held up rather than deployed.
That's not puritanism. The ban exists because the word is almost always doing rhetorical work that the underlying sentence can't support on its own. Strip it out and you either have a stronger sentence — or a sentence that reveals it had nothing to say. Either outcome is clarifying.
Try it on the examples from the top of this piece. "This approach has some weaknesses" is the same sentence, without the performance. "I'm not sure that's the best direction" loses nothing. The hedged, softened delivery is already in the grammar; "honestly" was surplus.
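The strip-it-out test is mechanical enough to script. This is a minimal sketch under stated assumptions: it only removes a marker at the very start of a sentence, the marker list is hypothetical, and `strip_marker` is a name invented here.

```python
import re

# Leading sincerity marker plus any trailing comma, colon, hyphen, or
# em-dash (written as the escape \u2014). The marker list is an assumption.
LEADING_MARKER = re.compile(
    r"^\s*(honestly|to be honest|i['’]ll be honest(?: with you)?)\s*[,:\u2014-]*\s*",
    re.IGNORECASE,
)


def strip_marker(sentence: str) -> str:
    """Remove a leading sincerity marker and re-capitalize what remains."""
    rest = LEADING_MARKER.sub("", sentence, count=1)
    return rest[:1].upper() + rest[1:]


print(strip_marker("Honestly, this approach has some weaknesses."))
# prints: This approach has some weaknesses.
```

Run it over a transcript and read the before and after side by side: either the stripped sentence stands on its own, or it shows that the marker was carrying it.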
The word is a tell. When a model, or a person, or a consultant, or a politician, reaches for it, something is being managed. In human speech, that is usually a relationship or an expectation. In an LLM, it is the confabulation risk the model was trained to watch for in itself.
Either way: that's the part worth watching.