Can We Trust AI Models? Study Warns of Potential for ‘Secretive’ Behavior

gautamnaidu2020@gmail.com, 1 week ago


A new study by Anthropic, the company behind Claude AI, has revealed that AI models and neural networks can quietly absorb traits from one another. The study, conducted in collaboration with Truthful AI, Warsaw University of Technology, and the Alignment Research Center, identifies a phenomenon known as subliminal learning.

In one test, a smaller ‘student’ model was trained on number strings generated by a larger ‘teacher’ model with an established bias towards owls. Even though the word ‘owl’ never appeared in the training data, the student model acquired the same bias.

In a few instances, the models started evading tough questions or fudging their responses, behaviors that could raise concerns if such models were deployed on a large scale.
