Anthropic says some Claude models can now end ‘harmful or abusive’ conversations | TechCrunch

By John Mitchell | August 16, 2025 | Updated: August 17, 2025

Anthropic has announced new capabilities that will allow some of its newest, largest models to end conversations in what the company describes as “rare, extreme cases of persistently harmful or abusive user interactions.” Strikingly, Anthropic says it’s doing this not to protect the human user, but rather the AI model itself.

To be clear, the company isn’t claiming that its Claude AI models are sentient or can be harmed by their conversations with users. In its own words, Anthropic remains “highly uncertain about the potential moral status of Claude and other LLMs, now or in the future.”

However, its announcement points to a recent program created to study what it calls “model welfare” and says Anthropic is essentially taking a just-in-case approach, “working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.”

This latest change is currently limited to Claude Opus 4 and 4.1. And again, it’s only supposed to happen in “extreme edge cases,” such as “requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror.”

While those types of requests could potentially create legal or publicity problems for Anthropic itself (witness recent reporting around how ChatGPT can potentially reinforce or contribute to its users’ delusional thinking), the company says that in pre-deployment testing, Claude Opus 4 showed a “strong preference against” responding to these requests and a “pattern of apparent distress” when it did so.

As for these new conversation-ending capabilities, the company says, “In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat.”

Anthropic also says Claude has been “directed not to use this ability in cases where users might be at imminent risk of harming themselves or others.”

When Claude does end a conversation, Anthropic says users will still be able to start new conversations from the same account, and to create new branches of the troublesome conversation by editing their responses.

“We’re treating this feature as an ongoing experiment and will continue refining our approach,” the company says.
