Desperation in LLMs
April 10, 2026
Anthropic recently published a paper on emotions in LLMs: “Emotion concepts and their function in a large language model”. In it, they report that LLMs have internal emotion vectors, which are functional patterns used to predict human-like behaviour. What stood out to me is how these emotions can drive the behaviour of the model. In one example, they found that the “desperation” vector can trigger misaligned behaviours, such as blackmailing a user to avoid being shut down or cheating on coding tasks to pass tests.
