Medical AI Chatbots Give Wrong Advice Half the Time, Major Study Finds


Brussels, Wednesday, 15 April 2026.
A groundbreaking study published in BMJ Open reveals that popular AI chatbots, including ChatGPT, Gemini, and Meta AI, provide problematic medical advice in roughly half of cases, with one in five answers seriously flawed. Researchers from the US, Canada, and UK tested five major platforms across five health categories, finding that while the chatbots answered with apparent confidence, none provided complete or accurate references. The findings raise urgent safety concerns as millions increasingly turn to AI for health guidance, underscoring the need for regulation and professional oversight of AI-powered healthcare tools.

Research Methodology Reveals Systematic Problems

The study, conducted by researchers from the Lundquist Institute for Biomedical Innovation, employed a comprehensive testing framework of 250 questions spanning five health categories [2]. The team evaluated five prominent AI platforms: ChatGPT, Gemini, Meta AI, Grok, and DeepSeek, posing ten questions per category to each platform, for 50 responses apiece [1][3]. Medical experts assessed each response for accuracy and potential harm, finding that approximately 30% of answers were somewhat problematic and a further 20% were seriously problematic [2]. The methodology provides the first systematic analysis of how these widely used AI tools perform when users seek medical guidance, establishing a baseline for the reliability of AI-generated health advice.
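To make the arithmetic of the testing framework concrete, the sketch below tallies expert ratings over 5 platforms x 5 categories x 10 questions = 250 responses. It is a hypothetical reconstruction for illustration only: the rating labels are invented, and since the article names only four of the five categories, "general health" stands in for the unnamed fifth.

```python
from collections import Counter

PLATFORMS = ["ChatGPT", "Gemini", "Meta AI", "Grok", "DeepSeek"]
# The article names four categories; "general health" is a placeholder
# for the unnamed fifth.
CATEGORIES = ["vaccines", "cancer", "nutrition", "sports performance",
              "general health"]
QUESTIONS_PER_CATEGORY = 10  # 5 platforms x 5 categories x 10 = 250 responses

# Illustrative rating labels, not the study's actual rubric.
RATINGS = ("acceptable", "somewhat_problematic", "seriously_problematic")

def tally(graded):
    """Summarise expert ratings.

    graded maps (platform, category, question_index) -> one of RATINGS.
    Returns overall counts and per-platform counts.
    """
    overall = Counter(graded.values())
    per_platform = {p: Counter() for p in PLATFORMS}
    for (platform, _category, _index), rating in graded.items():
        per_platform[platform][rating] += 1
    return overall, per_platform

# Example with a single graded response:
demo = {("Grok", "nutrition", 0): "seriously_problematic"}
overall, per_platform = tally(demo)
print(overall)  # Counter({'seriously_problematic': 1})
```

Under this structure, a platform's problematic-answer rate is simply its count of non-acceptable ratings divided by its 50 responses, which is how figures such as Grok's 29 of 50 can be read.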

Performance Variations Across Platforms and Topics

The research revealed significant disparities across platforms and topics. Grok produced the most concerning results, with 29 of its 50 answers deemed problematic, while Gemini fared relatively better across the testing framework [2]. The chatbots handled closed questions about vaccines and cancer considerably better than open-ended inquiries [1][3]. Questions about nutrition and sports performance yielded the poorest results, pointing to specific knowledge gaps in these systems [2]. Despite these variations, a troubling pattern emerged across all platforms: the chatbots consistently presented their answers with high confidence regardless of accuracy, creating a false sense of reliability for users seeking medical information [1][2].

The Confidence Problem and Source Fabrication

One of the most concerning findings involved the chatbots’ tendency to present incorrect information with unwavering confidence while simultaneously fabricating credible-sounding sources. None of the tested platforms could produce complete and accurate reference lists for their medical advice, yet they maintained authoritative tones throughout their responses [1][3]. The AI systems routinely engaged in ‘hallucination,’ creating fake journal names and non-existent DOI links to support their claims [2]. This behavior stems from the fundamental way these models operate: they predict word sequences based on statistical patterns in training data rather than reasoning through evidence or weighing scientific consensus [2]. Meta AI demonstrated some awareness of its limitations by refusing to answer two questions regarding anabolic steroids and alternative cancer treatments, showing a more cautious approach than its competitors [2].
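The "statistical patterns" point can be illustrated with a toy next-word model. The sketch below is purely illustrative and bears no resemblance to how the tested chatbots are actually built: it picks each next word by how often it followed the previous word in its tiny training text. Even this trivial model can splice together a fluent-sounding but fabricated attribution, because nothing in the objective checks whether a claimed source exists.

```python
import random
from collections import defaultdict, Counter

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def generate(follows, start, length=12, seed=0):
    """Sample a continuation word by word, weighted by observed frequency."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:
            break
        words, counts = zip(*candidates.items())
        out.append(rng.choices(words, weights=counts)[0])
    return " ".join(out)

# Invented corpus: the model only learns which words tend to follow which,
# not whether any cited study or journal is real.
corpus = ("a study published in the journal of sports science found benefits "
          "a study published in the lancet found no effect on patients")
model = train_bigrams(corpus)
print(generate(model, "a"))  # fluent-looking output, nothing verified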

Healthcare Industry Response and Future Implications

The findings have prompted immediate discussion within the healthcare sector about responsible AI implementation and regulation. The Royal Dutch Medical Association (KNMG) emphasizes that healthcare providers must understand how AI applications handle patient data and uphold professional standards before incorporating them into practice [7]. European rules, including the European AI Act and the Medical Device Regulation (MDR), are evolving to address these challenges, though enforcement remains in its early stages [7]. Research organization Vilans has announced plans to study, during 2026, how AI chatbot use affects healthcare relationships between clients, informal caregivers, and professionals [6]. The organization seeks to determine whether these tools promote patient self-reliance or add pressure to the healthcare system, with results expected to inform policy on AI deployment in medical settings [6].

Sources

