Why AI Makes Up Facts Instead of Saying "I Don't Know"
New research from Anthropic reveals the internal mechanisms that lead AI models like Claude to invent information rather than admit gaps in their knowledge.
The study shows these models have internal "circuits" that determine whether to answer questions or refuse. When Claude recognizes names like "Michael Jordan," it activates a "known entity" feature that overrides its default "can't answer" circuit.
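To make the mechanism concrete, here is a minimal toy sketch of that decision logic, assuming a simple thresholded "known entity" signal. It is purely illustrative: the function name decide_response, the scores, and the threshold are assumptions for this example, not Anthropic's actual circuitry.

```python
# Toy sketch of the answer/refuse decision described above.
# NOT Anthropic's implementation: the names, scores, and threshold
# here are illustrative assumptions only.

RECOGNITION_THRESHOLD = 0.5  # assumed cutoff for the "known entity" signal


def decide_response(known_entity_score: float, has_specific_facts: bool) -> str:
    """Return 'refuse', 'answer', or 'hallucinate' for a toy model."""
    # The "can't answer" circuit is active by default.
    refusal_active = True

    # A sufficiently familiar name suppresses the default refusal.
    if known_entity_score > RECOGNITION_THRESHOLD:
        refusal_active = False

    if refusal_active:
        return "refuse"       # "I don't know."
    if has_specific_facts:
        return "answer"       # grounded response
    return "hallucinate"      # plausible-sounding but fabricated


# A strongly recognized, well-documented name: the model answers.
print(decide_response(known_entity_score=0.9, has_specific_facts=True))
# A name the model has never seen: the default refusal stays active.
print(decide_response(known_entity_score=0.1, has_specific_facts=False))
```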
How AI Hallucinations Happen
Problems arise with partial recognition, as the sketch after this list illustrates:
- The AI recognizes a name (such as AI researcher Andrej Karpathy) as a known entity
- Lacking specific knowledge about him, it still switches off its "can't answer" circuit
- Instead of expressing uncertainty, it generates plausible but fictional information
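Reusing decide_response from the toy sketch above (still an illustration, not the model's real mechanism), this failure mode corresponds to a recognition score high enough to suppress refusal while no supporting facts exist:

```python
# Partial recognition: the name is familiar enough to switch off the
# refusal circuit, but nothing grounds the answer, so the toy model
# falls through to fabrication.
print(decide_response(known_entity_score=0.8, has_specific_facts=False))  # -> "hallucinate"
```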
"This behavior stems from their fundamental design," explains the research team. "These models predict likely text continuations based on patterns, not carefully calibrate their certainty."
Once the AI commits to answering, it generates whatever is statistically likely given similar patterns it has seen, such as the typical structure of research papers.
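As a rough illustration of that "statistically likely" step, committed generation can be pictured as sampling whichever continuation looks most probable, grounded or not. The candidate continuations and probabilities below are invented for this example, not taken from the research.

```python
import random

# Invented candidate continuations with made-up probabilities; the point
# is only that the sampler has no notion of whether a continuation is true.
continuations = {
    "a widely cited paper on computer vision": 0.5,
    "a well-known open-source deep learning course": 0.3,
    "a result the person never actually published": 0.2,
}

pick = random.choices(
    list(continuations), weights=list(continuations.values()), k=1
)[0]
print(pick)  # reads as plausible either way
```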
While the work offers valuable insights, Anthropic acknowledges that the investigation captures only a fraction of the model's total computation and is just a first step toward AI systems that better understand the boundaries of their own knowledge.