Anthropic says Claude chatbot can now end harmful, abusive interactions

It is intended for "rare, extreme cases of persistently harmful or abusive user interactions."
By Christianna Silva
The logo of Claude 4 displayed on a smartphone screen on May 23, 2025 in Beijing, China. Credit: Photo by VCG/VCG via Getty Images

Harmful, abusive interactions plague AI chatbots. Researchers have found that AI companions like
Character.AI, Nomi, and Replika are unsafe for teens under 18, ChatGPT has the potential to reinforce users’ delusional thinking, and even OpenAI CEO Sam Altman has spoken about ChatGPT users developing an "emotional reliance" on AI. Now, the companies that built these tools are slowly rolling out features that can mitigate this behavior.

On Friday, Anthropic said its Claude chatbot can now end potentially harmful conversations, which "is intended for use in rare, extreme cases of persistently harmful or abusive user interactions." In a press release, Anthropic cited examples such as sexual content involving minors, violence, and even "acts of terror."

"We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future," Anthropic said in its press release on Friday. "However, we take the issue seriously, and alongside our research program we’re working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible. Allowing models to end or exit potentially distressing interactions is one such intervention."

Anthropic provided an example of Claude ending a conversation in a press release. Credit: Anthropic

Anthropic said Claude Opus 4 has a "robust and consistent aversion to harm," which it observed during a preliminary model welfare assessment conducted before the model's deployment. The model showed a "strong preference against engaging with harmful tasks," along with a "pattern of apparent distress when engaging with real-world users seeking harmful content," and a "tendency to end harmful conversations when given the ability to do so in simulated user interactions."

Basically, when a user repeatedly sends abusive and harmful requests to Claude, it will refuse to comply and attempt to "productively redirect the interactions." It only ends a conversation as "a last resort," after attempting to redirect it multiple times. "The scenarios where this will occur are extreme edge cases," Anthropic wrote, adding that "the vast majority of users will not notice or be affected by this feature in any normal product use, even when discussing highly controversial issues with Claude."

If Claude ends a conversation this way, the user won't be able to send new messages in that conversation, but they can still chat with Claude in a new one.

"We’re treating this feature as an ongoing experiment and will continue refining our approach," Anthropic wrote. "If users encounter a surprising use of the conversation-ending ability, we encourage them to submit feedback by reacting to Claude’s message with Thumbs or using the dedicated 'Give feedback' button."

Christianna Silva
Senior Culture Reporter

Christianna Silva is a senior culture reporter covering social platforms and the creator economy, with a focus on the intersection of social media, politics, and the economic systems that govern us. Since joining Mashable in 2021, they have reported extensively on meme creators, content moderation, and the nature of online creation under capitalism.

Before joining Mashable, they worked as an editor at NPR and MTV News, a reporter at Teen Vogue and VICE News, and as a stablehand at a mini-horse farm. You can follow her on Bluesky @christiannaj.bsky.social and Instagram @christianna_j.
