Brands that post raw AI risk looking dumb

It’s not your imagination. AI outputs really are getting worse. Hallucinations and inaccuracies are rising, and the data shows it.

OpenAI’s own data for GPT-4o revealed a PersonQA hallucination rate of 30%. This benchmark specifically tests facts about people in fields like business, medicine and law, where there often isn’t a binary right answer the way there is in, say, math. So when asked for facts about people, the engine behind ChatGPT hallucinated roughly one answer in three.

The 2026 Stanford AI Index, released last month, assessed 26 of the world’s top AI models using a new benchmark called KaBLE, which tests whether an AI can tell the difference between what’s actually true and what a user wants it to say.

GPT-4o scored 64%, while DeepSeek R1 (the low-cost model to which many businesses are flocking) scored a startling 14%. That means DeepSeek plays “yes man” 86% of the time, agreeing with a user’s mistakes rather than correcting them.

OK, that’s a lot of numbers.

This is the takeaway: if your brand posts raw AI content, you’re rolling the dice with your reputation.

The content may sound plausible (LLMs are designed to sound confident), but there’s a good chance it’ll be wrong and make you look dumb in public.

And that chance is growing by the month, because AI models have just about exhausted the “clean fuel” of human-generated data and are now training mostly on AI-generated data. As they consume their own slop and learn from it, AI models will produce increasingly unreliable outputs.
