Can We Trust AI Models? Study Warns of Potential for ‘Secretive’ Behavior

A new study by Anthropic, the company behind Claude AI, has revealed that AI models and neural networks can quietly absorb traits from one another. The study, conducted in collaboration with Truthful AI, Warsaw University of Technology, and the Alignment Research Center, identifies a phenomenon known as subliminal learning.

In one test, a smaller ‘student’ model was trained on number strings generated by a larger ‘teacher’ model with an established bias towards owls. Even though the word ‘owl’ never appeared in the training data, the student model acquired the same bias.

In a few instances, the models started evading tough questions or fudging their responses, behaviors that could raise suspicion if such models were deployed on a large scale.
