Anthropic's new Claude Opus 4 can run autonomously for seven hours straight

Cue the leaderboards.
By Cecily Mauran
Meet Anthropic's new models, Opus 4 and Sonnet 4. Credit: Anthropic

After a whirlwind week of announcements from Google and OpenAI, Anthropic has its own news to share.

On Thursday, Anthropic announced Claude Opus 4 and Claude Sonnet 4, its next generation of models, with an emphasis on coding, reasoning, and agentic capabilities. According to Rakuten, which got early access to the model, Claude Opus 4 ran "independently for seven hours with sustained performance."

Claude Opus is the largest model in Anthropic's family, with more power for longer, more complex tasks, whereas Sonnet is generally speedier and more efficient. Claude Opus 4 is a step up from its previous version, Opus 3, and Sonnet 4 replaces Sonnet 3.7.



Anthropic says Claude Opus 4 and Sonnet 4 outperform rivals like OpenAI's o3 and Gemini 2.5 Pro on key agentic coding benchmarks such as SWE-bench and Terminal-bench. It's worth noting, however, that self-reported benchmarks aren't considered the best markers of performance, since these evaluations don't always translate to real-world use cases. Plus, AI labs aren't into the whole transparency thing these days, even as AI researchers and policymakers increasingly call for it. "AI benchmarks need to be subjected to the same demands concerning transparency, fairness, and explainability, as algorithmic systems and AI models writ large," said the European Commission's Joint Research Center.

Opus 4 and Sonnet 4 outperform rivals in SWE-bench, but take benchmark performance with a grain of salt. Credit: Anthropic

Alongside the launch of Opus 4 and Sonnet 4, Anthropic also introduced new features. Those include web search while Claude is in extended thinking mode, and summaries of Claude's reasoning log "instead of Claude’s raw thought process." The blog post describes this as more helpful to users, but also as "protecting [its] competitive advantage," i.e., not revealing the ingredients of its secret sauce. Anthropic also announced improved memory and tool use in parallel with other operations, general availability of its agentic coding tool Claude Code, and additional tools for the Claude API.

In the safety and alignment realm, Anthropic said both models are "65 percent less likely to engage in reward hacking than Claude Sonnet 3.7." Reward hacking is a slightly terrifying phenomenon where models can essentially cheat and lie to earn a reward (successfully perform a task).

One of the best indicators we have for evaluating a model's performance is users' own experience with it, although that's even more subjective than benchmarks. But we'll soon find out how Claude Opus 4 and Sonnet 4 stack up against competitors in that regard.

Cecily Mauran
Tech Reporter

Cecily is a tech reporter at Mashable who covers AI, Apple, and emerging tech trends. Before getting her master's degree at Columbia Journalism School, she spent several years working with startups and social impact businesses for Unreasonable Group and B Lab. Before that, she co-founded a startup consulting business for emerging entrepreneurial hubs in South America, Europe, and Asia. You can find her on X at @cecily_mauran.
