AI can write your login screen in seconds. It might also misdiagnose your rare disease.
A primer: LLMs work by predicting what comes next in a sequence. Applied to models trained on billions of data points, this method has proven more effective than we could have imagined.
And in code? Say you’re the 10,000th person today who wants to add email authentication to your app. Predicting the next line of code is often pretty simple. Coding is full of repeatable patterns and predictable completions; that's why AI is so good at generating working code.
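To make "predicting what comes next" concrete, here is a toy sketch. It is not how a real LLM works internally; it only counts which token most often follows another in a tiny, made-up corpus of login boilerplate. The corpus and the function names `train_bigrams` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: the same login boilerplate, written by many people.
corpus = [
    "user = authenticate ( email , password )",
    "user = authenticate ( email , password )",
    "user = authenticate ( email , token )",
]

def train_bigrams(lines):
    """Count, for each token, which tokens follow it."""
    following = defaultdict(Counter)
    for line in lines:
        tokens = line.split()
        for current, nxt in zip(tokens, tokens[1:]):
            following[current][nxt] += 1
    return following

def predict_next(following, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    if token not in following:
        return None
    return following[token].most_common(1)[0][0]

model = train_bigrams(corpus)
print(predict_next(model, ","))       # the pattern seen most often wins
print(predict_next(model, "logout"))  # never seen, so there is nothing to predict
```

The more often a pattern repeats, the easier it is to complete, which is roughly why boilerplate is such friendly territory.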
But “likely” is not the same as “correct.” And “working” is not the same as safe, secure, or clinically sound.
So where do LLMs fail?
When results aren’t normal. When edge cases aren’t easily “predictable.”
In healthcare, a prediction built on typical cases may serve the average patient well.
But for marginalized communities? For people with rare diseases? Standard prediction can fail in non-standard cases.
If a case is under-represented in the training data—however large that data set may be—the correct answer may be less likely to show up in LLM results.
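A minimal sketch of that effect, assuming a deliberately simplified predictor that always returns the most frequent answer it has seen. The symptom, the labels, and the 990-to-10 split are invented for illustration and are not real clinical data.

```python
from collections import Counter

# Hypothetical training set: 990 typical cases and 10 rare ones sharing one symptom.
training_cases = (
    [("fatigue", "common_condition")] * 990
    + [("fatigue", "rare_condition")] * 10
)

def most_likely_label(cases, symptom):
    """Return the most frequent label for `symptom` and its share of matching cases."""
    counts = Counter(label for s, label in cases if s == symptom)
    label, count = counts.most_common(1)[0]
    return label, count / sum(counts.values())

label, share = most_likely_label(training_cases, "fatigue")
print(label, share)
```

Every patient with that symptom gets the majority answer, including the ten for whom it is wrong. Adding more data of the same shape would not change the outcome; only better representation of the rare case would.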
It’s one of the many reasons we can be in awe of what LLMs accomplish today while staying well aware of exactly where they fall short.
As with all things tech, there’s a social justice component to this, too.




