The artificial intelligence chatbot ChatGPT-4 appeared to think on its feet, and to make human-like errors, when researchers challenged it with a 2,400-year-old mathematical problem recorded in Plato’s dialogue Meno.
Cambridge University education researchers expected the AI system to simply recall the famous solution to Socrates’ “doubling the square” puzzle from its training data. Instead, ChatGPT seemed to develop its own approach and made distinctly student-like mistakes.
The experiment recreated the scene from Plato’s dialogue in which Socrates leads an uneducated boy to double a square’s area. The boy’s first instinct is to double each side length, which in fact quadruples the area; the correct solution is to build the new square on the original square’s diagonal.
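For readers who want to check the mathematics, here is a minimal worked sketch of why the diagonal construction doubles the area. It is standard geometry rather than anything taken from the study itself:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Let the original square have side $s$, so its area is $s^2$.
By the Pythagorean theorem its diagonal has length
\[
d = \sqrt{s^2 + s^2} = s\sqrt{2},
\]
so the square built on that diagonal has area
\[
d^2 = \bigl(s\sqrt{2}\bigr)^2 = 2s^2,
\]
exactly double the original. Doubling the side instead gives
$(2s)^2 = 4s^2$, four times the area: the boy's initial mistake
in the dialogue.
\end{document}
```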
Dr Nadav Marco of the Hebrew University of Jerusalem and Professor Andreas Stylianides of the University of Cambridge deliberately introduced errors into the problem and posed variants of it, to test whether ChatGPT would retrieve pre-existing knowledge or generate novel solutions.
“When we face a new problem, our instinct is often to try things out based on our past experience,” said Marco. “In our experiment, ChatGPT seemed to do something similar. Like a learner or scholar, it appeared to come up with its own hypotheses and solutions.”
ChatGPT initially chose algebraic methods unknown in Plato’s era, resisting attempts to guide it towards geometric reasoning. Only when researchers expressed disappointment at its approach did the system provide the classical geometric solution.
The AI demonstrated complete knowledge of Plato’s work when asked directly about it, suggesting the improvised responses weren’t due to missing information.
“If it had only been recalling from memory, it would almost certainly have referenced the classical solution of building a new square on the original square’s diagonal straight away,” said Stylianides. “Instead, it seemed to take its own approach.”
When tested with variations involving rectangles and triangles, ChatGPT falsely claimed that no geometric solution existed for doubling a rectangle’s area, even though such methods do exist.
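To see why that claim is false, here is one illustrative construction. The study does not say which method ChatGPT overlooked, so this is only an assumption about what such a method might look like:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% One possible geometric doubling, not necessarily the researchers' example.
A rectangle with sides $a$ and $b$ has area $ab$. Scaling both sides
by $\sqrt{2}$ gives a rectangle of area
\[
(a\sqrt{2})(b\sqrt{2}) = 2ab,
\]
double the original, and a length of $a\sqrt{2}$ is constructible by
compass and straightedge as the diagonal of a square of side $a$.
A purely geometric doubling is therefore possible.
\end{document}
```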
The researchers propose that AI systems may have a “zone of proximal development” similar to that of human learners: a space where they cannot solve a problem immediately but can develop a solution through guidance and prompting.
“Unlike proofs found in reputable textbooks, students cannot assume that ChatGPT’s proofs are valid,” said Stylianides, highlighting the need for critical evaluation skills in mathematics education.