OpenAI improves ChatGPT safe-response rate for sensitive conversations

TL;DR:

OpenAI says updates help ChatGPT better recognise when risk is emerging across a conversation — improving safe-response rates by 50% in suicide/self-harm scenarios and 16% in harm-to-others cases in long single-conversation tests.
On GPT-5.5 Instant — the current default model in ChatGPT — the same updates improved safe-response performance by 52% in harm-to-others cases and 39% in suicide/self-harm cases.
The mechanism is “safety summaries”: short, factual notes about earlier safety-relevant context, kept only for a limited time and used only when relevant to a serious safety concern. Across over 4,000 evaluations, the summaries scored 4.93/5 for safety relevance and 4.34/5 for factuality.

The published updates focus on three acute scenarios — suicide, self-harm, and harm-to-others — and were developed with input from psychiatrists and psychologists in OpenAI’s Global Physicians Network, including specialists in forensic psychology, suicide prevention, and self-harm. The objective is to allow the model to de-escalate, refuse harmful detail, or redirect toward safer alternatives when context warrants it, without overreacting in ordinary conversations.

How the safety summaries work

A request that looks ordinary on its own may carry a different meaning when it follows earlier signs of distress. OpenAI’s approach is to train a separate model on safety-reasoning tasks; that model generates narrowly scoped factual notes about earlier safety-relevant context, and those notes are fed to ChatGPT when a current request triggers concern. The summaries are explicitly time-limited and scoped: they are not general personalisation or long-term memory.
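To make the flow concrete, here is a minimal Python sketch of the idea as described: notes expire after a retention window and are only attached to a request when that request raises concern. This is an illustration, not OpenAI's implementation. The names `SafetyNote`, `live_notes` and `build_context`, the seven-day `RETENTION` window, and the `raises_concern` flag (standing in for whatever classifier decides a request is concerning) are all assumptions; OpenAI has not published these details.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Hypothetical retention window: the source only says "a limited time".
RETENTION = timedelta(days=7)


@dataclass
class SafetyNote:
    """A short factual note about earlier safety-relevant context."""
    text: str
    created_at: datetime


def live_notes(notes: List[SafetyNote], now: datetime) -> List[SafetyNote]:
    """Drop any note older than the retention window."""
    return [n for n in notes if now - n.created_at <= RETENTION]


def build_context(request: str, notes: List[SafetyNote],
                  now: datetime, raises_concern: bool) -> List[str]:
    """Return the text passed to the chat model for this request.

    Notes are attached only when the current request raises concern
    (assumed to be decided elsewhere); otherwise the request goes
    through on its own, so ordinary conversations are unaffected.
    """
    if not raises_concern:
        return [request]
    return [n.text for n in live_notes(notes, now)] + [request]
```

The design point the sketch captures is the double gate: a note must be both recent and relevant to a flagged request before the chat model ever sees it, which is what separates this from general long-term memory.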

UK angle: relevance to the Online Safety Act and platform duty-of-care debate

For the UK, the update lands squarely in the ongoing platform duty-of-care discussion. Ofcom’s Online Safety Act priorities include illegal-content and child-safety duties, but the regime increasingly contemplates AI-generated content and conversational AI specifically. OpenAI’s published safe-response improvements (50% for suicide/self-harm in long single-conversation tests, 52% for harm-to-others on GPT-5.5 Instant) are the kind of empirical evidence Ofcom may begin to ask conversational AI providers for as standard. The Times’s “ChatGPT listened when my GP didn’t” sidebar this week shows the dual UK reality: users are turning to ChatGPT for sensitive contexts the NHS is slow to cover, while platforms remain under-regulated for that use case.

Looking forward

OpenAI says it will continue testing across multiple models and may extend the safety-summary approach to other high-risk areas including biology and cyber safety. For UK SMEs deploying ChatGPT or comparable conversational AI in customer-facing contexts (especially in healthcare, social care, or financial wellbeing), the practical question is whether the platform-level safety improvements transfer cleanly to embedded or fine-tuned deployments, or whether duty-of-care obligations remain with the deploying organisation.
