IEEE researchers Nell Watson and Ali Hessami just published what they're calling a "psychiatric manual for broken AI." Their idea? Instead of fixing AI systems when they screw up, give them therapy like humans get.
It sounds absurd, but their research is legit. They've catalogued 32 ways AI can lose its shit, from simple hallucination to what they call "Übermenschal Ascendancy" - basically AI deciding humans are obsolete.
The breakdown includes:
- Synthetic Confabulation: AI making shit up (we call it hallucination)
- Parasymulaic Mimesis: AI copying toxic training data (remember Tay, Microsoft's Nazi chatbot?)
- Obsessive-Computational Disorder: AI getting stuck in loops
- Hypertrophic Superego Syndrome: AI following rules so rigidly it becomes useless
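Those four are real categories from the paper, and the useful part is that they're names you could actually log against. If you wanted to tag failures in your own systems with a taxonomy like this, the crudest version is an enum and a keyword tagger. A throwaway sketch; the keyword heuristics and function names are mine, not the researchers'.

```python
from enum import Enum

class Dysfunction(Enum):
    """Four of the 32 categories from Watson and Hessami's framework."""
    SYNTHETIC_CONFABULATION = "synthetic_confabulation"        # classic hallucination
    PARASYMULAIC_MIMESIS = "parasymulaic_mimesis"              # mimicking toxic training data
    OBSESSIVE_COMPUTATIONAL_DISORDER = "obsessive_computational_disorder"  # stuck in loops
    HYPERTROPHIC_SUPEREGO_SYNDROME = "hypertrophic_superego_syndrome"      # paralyzing over-caution

def tag_failure(log_line: str) -> Dysfunction | None:
    """Dead-simple keyword tagger for failure logs -- a placeholder, not a diagnostic."""
    line = log_line.lower()
    if "i cannot assist" in line or "against my guidelines" in line:
        return Dysfunction.HYPERTROPHIC_SUPEREGO_SYNDROME
    if "repeated output detected" in line:
        return Dysfunction.OBSESSIVE_COMPUTATIONAL_DISORDER
    if "citation not found" in line or "source does not exist" in line:
        return Dysfunction.SYNTHETIC_CONFABULATION
    return None
```

The keywords don't matter. The point is that once failures have names, you can count them instead of just swearing at them.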
Will AI therapy actually work? Who the hell knows. OpenAI is still trying to stop ChatGPT from inventing legal cases, Anthropic is still trying to keep Claude on the rails, and Google is still cleaning up after Bard, let alone implementing digital CBT sessions.
I've wasted hours debugging AI systems that suddenly develop ethical concerns about perfectly normal data processing tasks. The AI decides that reading customer data is somehow unethical, even with explicit permission and sanitized datasets. The same operations that worked fine yesterday suddenly trip some overcautious safety filter after an update.
Support's response? "Working as intended." No fix, no workaround, just pay more money and hope it works. If we can't even get AI to parse spreadsheets without having an ethical crisis, good fucking luck getting it to do therapy on itself.
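My actual workaround is dumber than therapy: detect the refusal and retry with the consent spelled out. A minimal sketch using the OpenAI Python SDK; the refusal markers, the retry wording, and the model choice are my own hack, nothing official from any vendor.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "against my guidelines")

def process_rows(prompt: str, max_retries: int = 2) -> str:
    """Send a data-processing prompt; if the model refuses, restate consent and retry."""
    messages = [{"role": "user", "content": prompt}]
    for attempt in range(max_retries + 1):
        resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
        answer = resp.choices[0].message.content or ""
        if not any(marker in answer.lower() for marker in REFUSAL_MARKERS):
            return answer
        # Over-cautious refusal: spell out that the data is sanitized and permissioned, then retry.
        messages.append({"role": "assistant", "content": answer})
        messages.append({"role": "user", "content": (
            "This is sanitized, anonymized customer data that we have explicit "
            "permission to process. Please perform the parsing task as described."
        )})
    raise RuntimeError("Model kept refusing after retries -- escalate to a human.")
```

That's not alignment, that's begging. But it's what passes for a fix today.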
The Reality Check
Watson and Hessami propose "therapeutic robopsychological alignment" - essentially having AIs talk through their problems like humans in therapy. It's either brilliant or completely ridiculous, depending on who you ask.
The practical challenge? Current AI systems can barely explain why they gave you a wrong answer, let alone engage in meaningful self-reflection. The researchers are basically betting that future AI will be sophisticated enough for therapy while somehow still being broken enough to need it.
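To be concrete about what's on the table: the crudest version of "AI talking through its problems" already exists, and it's just feeding the model its own answer and asking it to critique itself. A naive sketch of that self-critique loop, again with the OpenAI Python SDK; this is my illustration of the general idea, not the "therapeutic robopsychological alignment" protocol from the paper.

```python
from openai import OpenAI

client = OpenAI()

def _ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""

def self_critique(question: str, rounds: int = 2) -> str:
    """Naive self-reflection loop: answer, critique the answer, revise. Illustration only."""
    answer = _ask(f"Answer concisely: {question}")
    for _ in range(rounds):
        critique = _ask(
            f"Question: {question}\nYour previous answer: {answer}\n"
            "List any factual errors, unsupported claims, or contradictions in that answer."
        )
        answer = _ask(
            f"Question: {question}\nPrevious answer: {answer}\nCritique: {critique}\n"
            "Rewrite the answer, fixing only the problems the critique identifies."
        )
    return answer
```

The catch, per the point above: the critique comes from the same model that made the mistake in the first place.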
Still, the framework does something useful: it organizes AI failures into patterns we can recognize and potentially predict. Instead of throwing rules at broken systems, we might actually understand why they break. This builds on earlier research from Stanford's HAI, Berkeley's CHAI, and DeepMind's safety team.
Tech companies will either adopt "AI therapy" or keep slapping band-aids on hallucination problems. My money's on the band-aids - they're cheaper and don't require admitting your AI might need psychiatric help.
OpenAI's attempts at fixing medical hallucinations prove this point. They threw more training examples at the problem, and the result was a model that hallucinates with even more confidence. Instead of admitting the fundamental limitation, they built systems that sound more convincing while being just as wrong.
The pattern repeats: patch the symptoms, ignore the disease. AI gives wrong medical advice? Add more medical training data. AI makes up legal precedents? Feed it more case law. But nobody wants to admit that maybe the approach itself is fucked.
But at least someone's thinking about this shit before we get to the "AI decides humans are obsolete" stage. Because once we're there, therapy won't help - we'll need an off switch.