Compute the Last Digit of Pi

In the second season of the original Star Trek, an entity that has been murdering people across the galaxy since the era of Jack the Ripper takes refuge in the last place anyone thinks to look: the Enterprise's computer. It has a whole starship's systems to play with. Spock's solution is one line, delivered with the calm of a man closing a door.

"Compute," he says, "to the last digit the value of pi."

Pi is irrational. Its decimal expansion never ends and never settles into a repeating pattern. The computer, ordered to finish an unfinishable sum, throws everything it has at the problem, and has nothing left for harboring fugitives. The villain is squeezed out. The day is saved by a freshman math fact.

We tend to file this scene under quaint — a relic of how 1967 pictured computers. But the interesting thing about the pi gambit is that it was correct. For the machine it was aimed at, it would have worked.

The Trick Was Fair, Once

The computer Spock was addressing was, in the mind of anyone writing in 1967, a single-threaded device: one instruction after another, no interruptions, no sense of priority. The batch-run mainframes most people had met worked more or less that way, one job at a time from start to finish. (The Apollo Guidance Computer, which would fly men to the Moon two years later, was already an exception: its executive ran jobs by priority and could shed the least urgent ones under overload.) Hand a one-job-at-a-time machine an endless task and it really would chew on it forever, because nothing in its design let it do otherwise. Spock wasn't being fanciful. He was being a good engineer about the hardware of his era.

What retired the trick wasn't a smarter villain or a bigger computer. It was architecture. Preemptive multitasking, in which the operating system interrupts running tasks many times a second, handing out slices of processor time and yanking them back, means no single job can seize the whole machine, however badly it "wants" to. Time limits kill the overstayers. Sandboxes wall off the misbehaving. You can now ask a computer to compute pi to a hundred trillion digits, and people do, for sport; it grinds through them in the background while the machine goes on answering everything else asked of it.
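A minimal sketch of that architecture at work, assuming an ordinary desktop operating system and a Python interpreter. The names ENDLESS_PI_JOB and run_with_time_limit are invented for illustration. The endless job streams digits of pi with a well-known spigot recurrence; the parent process sets a deadline, and the operating system hands control back when the deadline passes.

```python
import subprocess
import sys

# Hypothetical endless job: a streaming spigot that yields digits of pi forever.
ENDLESS_PI_JOB = '''
def pi_digits():
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4*q + r - t < n*t:
            yield n
            q, r, t, k, n, l = 10*q, 10*(r - n*t), t, k, (10*(3*q + r))//t - 10*n, l
        else:
            q, r, t, k, n, l = q*k, (2*q + r)*l, t*l, k + 1, (q*(7*k + 2) + r*l)//(t*l), l + 2

for _ in pi_digits():
    pass
'''


def run_with_time_limit(source, seconds):
    """Run Python source in a child process; return True if it finished in time."""
    try:
        subprocess.run(
            [sys.executable, "-c", source],
            timeout=seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        return False


if __name__ == "__main__":
    print(run_with_time_limit(ENDLESS_PI_JOB, 0.5))  # the overstayer is killed
    print(run_with_time_limit("print(2 + 2)", 10))   # a finite job completes
```

The parent never has to ask the child to stop, and the child never has to cooperate. That is the difference between this and the machine Spock was talking to.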

The pi trick didn't become wrong. The machine outgrew it. Hold that thought — it's the whole pattern.

The Talking Dog

Samuel Johnson, asked about a woman preaching, said it was like a dog walking on its hind legs: "It is not done well; but you are surprised to find it done at all." This is the oldest move in the book, and it's the one we've run on machine intelligence for seventy years. First we marvel that the thing does it at all. Then, almost in the same breath, we start itemizing how badly.

The wonder that a language model talks — really talks, fluently, about anything — lasted about a season before it curdled into a list of things it can't do. Ask it how many R's are in "strawberry." Ask it how many words are in the reply it's giving you. Ask it to pick a random number and watch it say 73, again, the way it always does. Each is the pi gambit's direct descendant: a snare set to prove the dog isn't really walking, just lurching.

And each is going the same way the pi trick went.

The Rubrics That Fail Us Too

Start with what these tests actually measure, because it's rarely what we assume.

Ask a person to pick a random number between one and a hundred and a suspicious share say 37, or 73, or 7. The "it always says 73" mockery aimed at language models describes the human brain just as well. Neither the brain nor the model contains a uniform random-number generator, because nothing that thinks does. If you genuinely need randomness you don't concentrate harder; you roll a die. The failure was never a failure of intelligence; it was a category error about what intelligence is for.

The word-count trick is the same story. A model can't reliably tell you how long its answer will be, because at the moment it's deciding, the answer doesn't yet exist and nothing is keeping a tally. Neither can you. Stop mid-sentence and report, to the word, how long the sentence will turn out — you can't, and not because you're slow-witted. You're composing forward, exactly as it is.

Even the strawberry case, the most genuinely machine-specific of the three, has a human echo. The model miscounts the R's because it never sees letters; it reads in tokens, sub-word chunks, so "strawberry" arrives as a couple of pieces, not as s-t-r-a-w-b-e-r-r-y. But ask why proofreading is hard, why your own typos are invisible to you, why we read familiar words as shapes and skate over the spelling. We don't see letters either, most of the time. We see chunks. The machine's blind spot rhymes with one of ours.
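A toy illustration of what "reading in tokens" means, under loud assumptions: the vocabulary, the ID numbers, and the function toy_tokenize are all made up here, and real tokenizers split words differently and in ways that vary from model to model. The point is only that what reaches the model is a short list of opaque numbers, with no letters in it.

```python
# Hypothetical vocabulary: sub-word pieces mapped to arbitrary ID numbers.
TOY_VOCAB = {"straw": 4001, "berry": 4002, "blue": 4003}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match split of a word into vocabulary IDs."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, then shorter ones.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches {word[i:]!r}")
    return ids


if __name__ == "__main__":
    # The model receives two numbers, not ten letters.
    print(toy_tokenize("strawberry"))
```

Nothing in the list of IDs says how many R's went into it. Counting them requires getting back to the characters, which a single pass over the IDs does not do.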

One Loop Away

Here's the part the goalpost-movers keep missing. Every one of these gotchas that isn't simply a shared human quirk falls the instant you wrap the model in tooling we already have.

Give a model the ability to run code and "count the R's in strawberry" becomes a one-line call that returns 3, every time, forever. The tokenization blindness is real for a single forward pass and gone the moment there's an interpreter in the loop. Give it a scratchpad and a thinking loop — draft, count, revise — and the word-count limit dissolves. Give it a hardware random source and it picks numbers better than you do. None of this needs a smarter model. It needs the same thing that retired the pi trick: better architecture around the same core.
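For the curious, the three tools above fit in a few lines. This is a sketch, not anyone's production tooling: the function names are invented, toy_revise is a crude stand-in for whatever a real model would do when asked to revise, and pick_number draws on the operating system's entropy pool through Python's secrets module, which is an assumption about what "hardware random source" means in practice.

```python
import secrets


def count_letter(word, letter):
    """Count occurrences of a letter in a word, ignoring case."""
    return word.lower().count(letter.lower())


def pick_number(low=1, high=100):
    """Pick an integer in [low, high] from the operating system's entropy source."""
    return low + secrets.randbelow(high - low + 1)


def fit_word_count(draft, target, revise, max_rounds=5):
    """Draft, count, revise: loop until the draft has exactly `target` words."""
    for _ in range(max_rounds):
        n = len(draft.split())
        if n == target:
            break
        draft = revise(draft, n, target)
    return draft


def toy_revise(draft, n, target):
    """Hypothetical stand-in for a model's revision step: trim or pad the draft."""
    words = draft.split()
    if n > target:
        return " ".join(words[:target])
    return " ".join(words + ["really"] * (target - n))


if __name__ == "__main__":
    print(count_letter("strawberry", "r"))
    print(pick_number())
    print(fit_word_count("the dog is walking", 6, toy_revise))
```

The counting and the dice-rolling happen outside the model entirely. The word-count loop works because the count is taken after the draft exists, which is the one thing a single forward pass cannot arrange.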

This shouldn't surprise us, because it's how our intelligence works too. Yours is not a property of your bare cortex. It's smeared across notebooks and calculators and the fingers you count on, the ability to go back and re-read a sentence, to check, to ask. Strip all of that away — no paper, no second glance, one pass, out loud, no take-backs — and you'd fail most of these tests yourself. The bare forward pass of a language model is that stripped-down condition. The tooling isn't cheating the test. The tooling is the test being answered.

The Goalpost on Wheels

So the genre has a structural problem. Every rubric offered as proof that machines don't really think comes with an expiration date, and the dates keep getting closer. The pi trick lasted decades. The strawberry trick lasted months — there are already models that reach for a tool and count the letters correctly. The line we draw around "real" intelligence isn't holding still. It's a goalpost on wheels, and we're the ones pushing it, a little further back each time the machine catches up.

There's nothing wrong with a hard test. The trouble is that we keep mistaking the test for the boundary, and the boundary keeps turning out to be a description of last year's tooling. Each new snare we set is, mostly, a way of buying ourselves a little more time to be surprised the dog is walking at all.

At some point the surprise stops being the story. The dog is not only up on its hind legs; it has picked up a calculator. The honest question was never whether it can count the R's in strawberry. It's what we'll say once it can — and what new thing we'll decide, that very morning, a real mind would surely never get wrong.

Published under the name Geordie.