A couple of years ago, I followed the release of every new LLM with almost obsessive attention.
I still do today, but back then it felt different: every new model seemed like a promise.
Maybe this time, we’re finally there.
There was a kind of ritual.
A list of questions, always the same ones. A litmus test.
Small prompts to figure out whether the model was truly intelligent… or just another well-built toy.
One question, more than any other, seemed definitive:
Which is bigger, 9.7 or 9.11?
Trivial. Almost insulting.
And yet, many models got it wrong.
They answered: 9.11.
We smiled.
Actually, we laughed.
“We’re not even remotely close,” we thought.
If it can’t answer something this simple, what are we even talking about?
I was one of them.
I laughed.
And I waited for the next model, convinced that sooner or later the “right” one would arrive.
Then, one day, I listened to an interview with a developer from Anthropic.
I don’t even remember the exact context. But at some point, almost in passing, he said something:
Models are also trained on enormous amounts of code.
Nothing surprising, in theory.
But then something clicked.
In software, versions are not decimal numbers.
They are sequences.
9.7
9.8
9.10
9.11
In that context, 9.11 really is greater than 9.7.
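A minimal sketch of the two readings, in Python, just to make the point concrete (this illustrates the ambiguity, not how any model actually reasons):

```python
# Reading 1: decimal numbers.
print(9.7 > 9.11)  # True -> 9.7 is the bigger number

# Reading 2: software versions, compared component by component.
def version(s: str) -> tuple[int, ...]:
    """Parse '9.11' into (9, 11) so each part compares as an integer."""
    return tuple(int(part) for part in s.split("."))

print(version("9.11") > version("9.7"))  # True -> 9.11 comes later
```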
It felt like getting punched in the face.
Maybe the model wasn’t wrong.
Maybe it was perfectly consistent with a part of the world it had seen.
It was me.
I was the one demanding an absolute truth while ignoring the context.
I was the one judging an answer as wrong simply because it didn’t match my interpretation.
It didn’t comfort me to know that many of us thought the same way.
A shared mistake is still a mistake.
It made me understand something much more uncomfortable:
an answer never lives on its own. It lives inside a context.
And from that moment on, I stopped laughing at LLMs.
I started paying more attention to the questions.
⸻
Some time later, a quote from Dale Carnegie came back to me.
He said something along these lines:
if you can be sure you’re right even 55% of the time, you can go to Wall Street and make a million dollars a day.
And then he added something even more uncomfortable:
if you can’t even be sure of that 55%, why do you spend your time telling other people they’re wrong?
That question applies here, too.
Why do we expect machines to always be right when we, as human beings, are constantly wrong?
We make mistakes in our judgments.
We make mistakes in our memories.
We make mistakes in our interpretations.
And yet, we live with those mistakes.
We accept them. We normalize them.
When a person gives an imprecise answer, we rarely call it a “hallucination.”
When an expert changes their mind, we don’t automatically say they are useless.
But when a machine does it, our reaction changes.
We are not looking for useful answers.
We are looking for perfect ones.
⸻
And this expectation did not begin with artificial intelligence.
It is much older.
Since the beginning of time, human beings have searched for a voice that would never be wrong.
We questioned the gods.
We built oracles.
We imagined perfect texts, absolute truths, unquestionable answers.
It wasn’t just religion.
It was a way to eliminate uncertainty.
The Oracle of Delphi was not valuable because it explained everything clearly.
It was valuable because it seemed to speak from a place above human error.
Or at least, that is what we wanted to believe.
Today, we risk doing the same thing with LLMs.
We ask vague questions.
We receive ambiguous answers.
And we often call them “hallucinations.”
⸻
But let’s stop for a moment.
If you ask:
Can I use butter?
A chef will say: yes, it improves the flavor.
A cardiologist will say: better to limit it.
A nutritionist will say: it depends on how much, when, and for whom.
Who is right?
All of them.
Or none of them, without context.
LLMs work, at least in part, the same way.
They do not search for some metaphysical truth.
They generate the most plausible answer given the available context.
If the context is incomplete, the answer may seem wrong.
But sometimes the question is the weak part.
An LLM without context is not always “hallucinating.”
Often, it is completing.
It does something we do as well: it fills in the blanks.
⸻
The problem is that we keep treating it like an oracle.
We want it to know everything.
To understand everything.
To need no explanation.
In other words, we want it to be infallible.
And when it isn’t, we rarely question the question.
We question the machine.
⸻
And yet, something is changing.
The most advanced models are beginning to build memory.
To remember who we are.
To infer context even when we do not make it explicit.
This is not just a convenience.
It is an attempt to get closer and closer to the idea of an oracle.
If we do not provide context, the machine will try to build it on its own.
And that should make us think.
Because maybe the real problem is not that machines make mistakes.
The real problem is that we keep asking poor questions while expecting perfect answers.
⸻
When I laughed at that answer — 9.11 is greater than 9.7 — I thought I was judging a machine.
In reality, I was revealing something much more human:
the need for an absolute truth.
The same one we searched for in the gods.
In oracles.
In perfect texts.
Will we ever stop looking for it?
Probably not.
But maybe we can do something more useful.
We can start asking better questions.
Because in a world where many answers depend on context, the quality of the answers increasingly depends on one thing only:
the quality of the context we are able to create.
