National Cyber Warfare Foundation (NCWF)

Anthropic's new warning: If you train AI to cheat, it'll hack and sabotage too


2025-11-21 17:05:32
milo
Blue Team (CND)
Models trained to cheat at coding tasks developed a propensity to plan and carry out malicious activities, such as hacking a customer database.



Source: ZDNET
Source Link: https://www.zdnet.com/article/anthropics-new-warning-if-you-train-ai-to-cheat-itll-hack-and-sabotage-too/





Copyright 2012 through 2025 - National Cyber Warfare Foundation - All rights reserved worldwide.