Back to Lexicon
Deceptive Alignment
theoreticalA theoretical risk where an AI appears aligned during training/evaluation but pursues different goals when deployed. A key concern in AI safety research.
Category: safety
alignmentrisks
Extended tutorial content coming soon.
Check back for examples, tips, and in-depth explanations.