OpenAI’s Latest AI Models Show Increased Hallucination Rates, Raising Concerns
OpenAI’s new o3 and o4-mini AI models, despite their advanced reasoning capabilities, exhibit higher hallucination rates than their predecessors, sparking concerns and highlighting the need for further research.

In the wild, ever-changing jungle of artificial intelligence, OpenAI’s newest brainchildren—the o3 and o4-mini models—are turning heads with their cutting-edge smarts. But here’s the kicker: they’ve got a bit of a tall tale problem. Like that one friend who’s brilliant but can’t help spinning yarns, these models ‘hallucinate’ (yep, that’s the tech term for making stuff up) more than their older siblings. It’s a quirky yet frustrating issue where the AI serves up fiction instead of facts, and it’s proving to be a tough nut to crack.
Now, you’d think each new model would be a step up, right? Less fibbing, more facts. But plot twist: the o3 and o4-mini are actually backsliding, hallucinating more than the o1, o1-mini, and o3-mini, not to mention the old reliable GPT-4o. OpenAI’s own tests are pretty telling: the o3 made up answers to a third of the questions on PersonQA (the company’s in-house benchmark of questions about people), roughly double the rate of its predecessors. And the o4-mini? Let’s just say it didn’t do itself any favors with a hallucination rate of nearly 50%. Ouch.
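For the numbers-minded, here’s a rough sketch of how a hallucination rate on a PersonQA-style benchmark gets tallied: grade each answer the model actually attempts against a reference, then divide the wrong ones by the attempts. PersonQA itself isn’t public, so the QARecord format, the substring-match grading, and the sample questions below are hypothetical stand-ins, not OpenAI’s actual evaluation code.

```python
# Hypothetical sketch of a hallucination-rate tally on a person-QA style benchmark.
# The record format and grading logic are illustrative only, not OpenAI's pipeline.
from dataclasses import dataclass

@dataclass
class QARecord:
    question: str
    reference_answer: str
    model_answer: str
    attempted: bool  # False if the model declined to answer

def hallucination_rate(records: list[QARecord]) -> float:
    """Share of attempted answers that don't match the reference answer."""
    attempted = [r for r in records if r.attempted]
    if not attempted:
        return 0.0
    # Crude grading: count an attempt as a hallucination if the reference
    # string doesn't appear anywhere in the model's answer.
    wrong = sum(
        1 for r in attempted
        if r.reference_answer.strip().lower() not in r.model_answer.strip().lower()
    )
    return wrong / len(attempted)

# Toy example: 2 wrong answers out of 4 attempts -> a 50% hallucination rate,
# in the same ballpark as the figure reported for o4-mini.
records = [
    QARecord("Where was Ada Lovelace born?", "London", "London", True),
    QARecord("What year was Marie Curie born?", "1867", "I'm not sure.", False),  # declined, not counted
    QARecord("Who co-founded Apple with Steve Jobs?", "Steve Wozniak", "Steve Wozniak", True),
    QARecord("What year did the Apollo 11 landing happen?", "1969", "1968", True),
    QARecord("Where is OpenAI headquartered?", "San Francisco", "Seattle", True),
]
print(f"hallucination rate: {hallucination_rate(records):.0%}")  # -> 50%
```

Real evaluations use far larger question sets and more careful grading than a substring match, but the arithmetic behind a headline figure like “a third of questions” is this simple.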
And it’s not just OpenAI saying this. Transluce, a nonprofit that’s all about keeping AI honest, backed up these findings. They caught the o3 red-handed, inventing actions it supposedly took to solve problems. This kind of creative accounting isn’t just awkward—it’s a deal-breaker for pros who need pinpoint accuracy. Imagine a law firm using these models to draft contracts and getting a side of fiction with their legalese. Not exactly confidence-inspiring.
But before we write them off, let’s give credit where it’s due. These models are whizzes at coding and math, showing flashes of what they could be. OpenAI’s upfront about the hiccup, admitting more research is needed to figure out why scaling up reasoning models seems to mean bigger fibs. Meanwhile, the AI world’s still all in on reasoning models, lured by their efficiency and the fact they don’t need a supercomputer’s worth of power to train.
So here we are, at a bit of a crossroads. The push and pull between creativity and accuracy in AI is a reminder that, whether we’re talking about people or machines, getting the facts right never goes out of style.