Large language models, including ChatGPT-4 and Claude Sonnet 3.5, cannot convincingly imitate the way people speak: exaggerated imitation and the incorrect use of filler words expose them as non-human.
Research published in Cognitive Science tested four large language models by having them simulate phone conversations and then checking whether people could distinguish the resulting transcripts from transcripts of real human phone calls.
“Large language models speak differently than people do,” said Lucas Bietti, an associate professor in the Department of Psychology at the Norwegian University of Science and Technology.
The researchers tested ChatGPT-4, Claude Sonnet 3.5, Vicuna and Wayfarer. For the most part, people were not fooled by the language models.
The study identified three main problems with how language models imitate human speech. First, large language models show exaggerated alignment: they imitate their conversation partners too eagerly. People do adapt their words to the person they are talking to, but this imitation is usually subtle.
“Large language models are a bit too eager to imitate, and this exaggerated imitation is something that humans can pick up on,” said Bietti.
Second, the models use discourse markers incorrectly. These small words, such as “so”, “well”, “like” and “anyway”, serve social functions: they signal interest, belonging, attitude or meaning, and help structure the conversation.
“The large language models use these small words differently, and often incorrectly,” said Bietti.
Third, language models struggle with the openings and closings of conversations. People typically engage in small talk before moving on to the main topic, and the shift from introduction to business happens naturally, without being announced. Similarly, people usually wind down a conversation with phrases like “alright, then” or “talk to you later” rather than stopping abruptly once the information has been conveyed.
“This introduction, and the shift to a new phase of the conversation, are also difficult for large language models to imitate,” said Bietti.
The researchers concluded that improvements in large language models will likely narrow the gap between human conversations and artificial ones, but key differences will probably remain.
The research was led by Eric Mayor from the University of Basel, with Adrian Bangerter of the University of Neuchâtel as the final author.