Claude Opus 4.8 prioritises honesty over overconfidence, says Anthropic

New Delhi, May 29, 2026 07:41 AM IST
Large language models (LLMs) often make claims they cannot support. Regardless of their size and capability, LLMs are prone to stating things with complete confidence even when they are wrong. This has been a persistent problem, and AI companies have been working to reduce such instances.
As part of this effort, frontier AI lab Anthropic on Thursday, May 28, introduced its latest model, Claude Opus 4.8, which it claims is more honest than earlier Claude models. The company said the model is more honest even when it comes to telling users what it does not understand.
The successor to Claude Opus 4.7, Opus 4.8 is now Anthropic's most powerful generally available model. While the improvements appear incremental, early testers reported that the model is more likely to flag uncertainty about its work and less likely to make unsupported claims.
The company said its evaluations showed that Opus 4.8 is around four times less likely than Opus 4.7 to let flaws in code it has written pass unremarked.

Before release, Anthropic conducted a comprehensive alignment and safety evaluation of Opus 4.8, which found that the model performed better than earlier versions. According to the company, it supported user autonomy and acted in the user's best interests. The model also showed considerably lower rates of harmful behaviours, such as deception or assisting misuse, than Claude Opus 4.7.
Its alignment levels were also reportedly comparable to those of the company's best-aligned model, Claude Mythos Preview, a frontier model so powerful that Anthropic has given access to it only to a limited group of trusted partners.
“The assessment also showed Opus 4.8 to have rates of misaligned behaviour (such as deception or cooperation with misuse) that are substantially lower than Opus 4.7 and similar to our best-aligned model, Claude Mythos Preview. The full alignment assessment, accompanied by a suite of pre-deployment safety tests, is reported in the Claude Opus 4.8 System Card,” the company said in its blog.
On benchmarks, Anthropic said that Opus 4.8 achieved the highest score on Harvey's Legal Agent Benchmark, which evaluates legal reasoning, becoming the first model to score above 10 per cent overall on the benchmark. On computer-use and browser-agent tasks, the model reportedly scored 84 per cent on Online-Mind2Web. The company also reported improvements in enterprise work and agentic reasoning.
Anthropic emphasised fewer unsupported claims and better reporting of uncertainty. These scores, however, were shared by the company itself; a thorough review by third-party testers may offer more objective results.