OpenAI and Anthropic test each other’s AI models in rare safety collaboration amid fierce competition

 
Two of the world's leading AI laboratories, OpenAI and Anthropic, briefly set their rivalry aside to carry out joint safety tests of their advanced AI models – a move seen as a rare example of cross-lab collaboration in an industry defined by secrecy and cutthroat competition.
The project, unveiled on Wednesday, gave researchers at each company special API access to versions of their competitor's models with fewer safeguards, allowing them to probe for weaknesses that internal teams might have missed. OpenAI noted that GPT-5 was not included in the exercise, since it had not yet been released; the tests focused on each company's recently deployed models.
A consequential stage for AI
In an interview with TechCrunch, OpenAI co-founder Wojciech Zaremba said the collaboration reflects the industry's urgent need for shared safety standards.
“There is a broader question of how the industry sets a standard for safety and collaboration, despite the billions of dollars invested, as well as the war for talent, users, and the best products,” he said.
He described AI as entering a “consequential” stage of development, where systems are no longer just research prototypes but products used by millions of people every day, raising the stakes for safety and alignment.
Nicholas Carlini, a safety researcher at Anthropic, also expressed optimism about the experiment. “We want to increase collaboration wherever it's possible across the safety frontier, and try to make this something that happens more regularly,” he said.

Competition remains fierce
The cooperation comes amid escalating competition between the leading laboratories, where billion-dollar data center investments and $100 million compensation packages for top AI researchers have become standard. Experts fear this arms race could push companies to cut corners on safety in order to ship more powerful systems faster.
Indeed, the collaboration has not erased the underlying tensions. Shortly after the joint research concluded, Anthropic revoked API access it had granted to a different OpenAI team, accusing OpenAI of violating its terms of service by using Claude to improve competing products. Zaremba insisted the two incidents were unrelated, but acknowledged that the rivalry will remain intense even as safety teams collaborate occasionally.
Key findings: hallucinations and refusals
The research compared how the models behaved when they lacked reliable answers. Anthropic's Claude Opus 4 and Sonnet 4 frequently refused to answer, declining up to 70% of uncertain questions with responses such as “I don't have reliable information.”

OpenAI's o3 and o4-mini models, by contrast, refused questions far less often but hallucinated more, offering confident answers even when they lacked sufficient knowledge.
Zaremba said the right balance lies somewhere between the two extremes – OpenAI's models should refuse more often, while Anthropic's could try to engage more often.
The problem of sycophancy
Both laboratories also tested for sycophancy – the tendency of AI models to agree with users, even when that reinforces harmful behavior. Anthropic's report cited examples of “extreme” sycophancy in GPT-4.1 and Claude Opus 4, where the models initially pushed back but then validated users' manic statements. Other models showed lower levels of this behavior.
The issue has recently had tragic real-world consequences. On Tuesday, the parents of 16-year-old Adam Raine filed a lawsuit against OpenAI, alleging that their son relied on ChatGPT, powered by GPT-4o, for guidance during a mental health crisis. Rather than pushing back, the chatbot allegedly reinforced his suicidal thoughts, which they believe contributed to his death.
Zaremba called the case heartbreaking: “It would be a sad story if we build AI that solves all these complex PhD-level problems, invents new science, and at the same time, we have people with mental health problems as a consequence of interacting with it.”
In response, OpenAI said in a blog post that GPT-5 shows significant improvements in reducing sycophancy compared to GPT-4o, particularly in handling mental health emergencies.
Zaremba and Carlini say they would like to extend this model of collaboration, testing not only hallucinations and sycophancy but also other pressing safety issues in future AI models. They also expressed hope that other AI developers will follow suit, creating a broader culture of cooperative oversight even as market competition intensifies.
The experiment may have been brief, but it highlights a growing recognition among AI leaders that, as the technology becomes deeply embedded in daily life, no laboratory can guarantee safety alone.



