Eight-Year-Olds Consistently Outperform ChatGPT in Critical Thinking Tests
Amsterdam, Monday, 29 December 2025.
Amsterdam researchers discovered that children aged 7-9 dramatically outshine AI models in reasoning tasks requiring flexible thinking. When faced with pattern recognition puzzles using unfamiliar symbols, children scored 67% while some AI models plummeted below 20%. The University of Amsterdam study reveals AI’s fundamental weakness: while excelling at memorized patterns, these systems lack the abstract understanding that allows even young children to apply logical rules across new contexts. This finding challenges assumptions about AI superiority and highlights human cognitive flexibility that remains unmatched by current artificial intelligence technology.
The Amsterdam Experiment: Testing Human vs. Artificial Reasoning
Researchers from the University of Amsterdam, in collaboration with the Santa Fe Institute, conducted their groundbreaking study during the summer of 2025 at Amsterdam’s NEMO science museum [1]. Lead researcher Claire Stevenson designed a series of analogical reasoning puzzles that would challenge both human participants and AI systems on equal footing [2]. The experiment involved children aged 7-9, adults, and four major AI models including ChatGPT, all tackling the same letter sequence prediction tasks [2]. The choice of venue proved strategic - conducting the research at NEMO allowed researchers to observe children’s immediate reactions when they discovered their superior performance compared to AI systems [1].
How the Cognitive Challenge Worked
The test centered on analogical reasoning puzzles using letter sequences that followed specific transformation rules [1][2]. Participants received prompts such as: if ‘ab’ becomes ‘ac’, what should happen to ‘gh’? [1][2] The elegance of this approach lay in its simplicity - requiring minimal specialized knowledge while effectively measuring abstract reasoning capabilities [2]. Stevenson deliberately selected text-based puzzles because ‘language models still have significant difficulty understanding visual puzzles,’ and the tasks needed to avoid complex vocabulary that might disadvantage younger participants [2]. The experiment then escalated the difficulty by requiring participants to apply the same logical rules across three different alphabetical systems: the familiar Latin alphabet, Greek letters, and entirely unknown symbolic characters [1][2].
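The transformation rule described above can be sketched in code. This is a minimal illustration of the task type, not the study's actual materials: the function names, the symbol strings, and the "advance the last character" rule are our own stand-ins chosen to mirror the 'ab' → 'ac' example. The point it demonstrates is the one the researchers tested: once an ordering is known, the same abstract rule transfers unchanged across Latin, Greek, or arbitrary symbol systems.

```python
# A toy sketch (assumed, not from the study) of a letter-string analogy:
# infer "advance the last symbol one step" from the example 'ab' -> 'ac',
# then apply it over any ordered symbol system.

LATIN = "abcdefghijklmnopqrstuvwxyz"
GREEK = "αβγδεζηθικλμνξοπρστυφχψω"
SYMBOLS = "☀☁☂☃☄★☆☇☈☉"  # stand-in for the study's unfamiliar characters

def successor(s: str, alphabet: str) -> str:
    """Advance the last symbol of s one step along the given ordering."""
    *head, last = s
    return "".join(head) + alphabet[alphabet.index(last) + 1]

# The abstract rule is independent of which characters fill the ordering:
print(successor("gh", LATIN))    # -> "gi"
print(successor("βγ", GREEK))    # -> "βδ"
print(successor("☀☁", SYMBOLS))  # -> "☀☂"
```

The design choice mirrors the study's finding: for a solver that represents the alphabet explicitly as an ordered sequence, nothing changes when the symbols do, whereas a system that has only memorized patterns over familiar letters has no such structure to fall back on.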
Dramatic Performance Gaps Emerge Across Alphabets
The results revealed striking disparities in cognitive flexibility between human and artificial intelligence systems. While AI models performed competently with the standard Latin alphabet, their capabilities deteriorated sharply when faced with unfamiliar symbolic systems [1][2]. Children maintained consistent performance across all three alphabetical contexts, averaging 67% accuracy even with completely unknown symbols [1][2]. In stark contrast, some AI models collapsed to below 20% accuracy when confronted with the symbolic alphabet, despite their strong showing with familiar letters [1]. This drop - from competent performance to near-random guessing - highlighted fundamental limitations in how current AI systems process and apply abstract concepts [1][2].
The Cognitive Science Behind Human Superiority
According to Stevenson’s analysis, the performance gap stems from fundamentally different approaches to pattern recognition and abstract reasoning [1][2]. ‘Even young children intuitively understand that an alphabet is an ordered sequence,’ Stevenson explained to Trouw in December 2025 [1]. AI models, by contrast, ‘lack that abstract insight: they primarily recognize patterns in situations they already know. When the context changes, they cannot transfer the underlying structure’ [1][2]. This limitation becomes apparent when AI encounters novel contexts: while children from age four can complete letter and symbol sequences regardless of the specific characters involved, computer systems become confused by unfamiliar symbols [1]. The research demonstrates that current AI excels at factual recall - potentially dominating quiz shows like ‘De Slimste Mens’ - but struggles to acquire new knowledge without contextual understanding [1].
Implications for AI Development and Educational Technology
The study’s findings, scheduled for publication in January 2026 in the journal Transactions of the Association for Computational Linguistics, carry significant implications for both AI advancement and educational policy [3]. Children emerging from the NEMO experiments frequently exclaimed that they found ‘the robot quite stupid’ and recognized their own superior capabilities - a confidence boost that Stevenson hopes will discourage excessive reliance on AI for homework assistance [1]. The research reveals that even highly sophisticated AI models remain unready to replicate human intelligence, particularly in areas requiring flexible application of abstract principles [1]. As Mark Dingemanse, Professor of AI at Radboud University, notes: ‘With these LLMs, it’s not about thinking, but about brute force - they complete their tasks in fundamentally different ways’ [3]. The study underscores that applying knowledge flexibly across new contexts remains a distinctly human cognitive advantage that current artificial intelligence has yet to master [3].