Artificial intelligence voice synthesis has reached a threshold where listeners cannot reliably distinguish between AI-generated voice clones and authentic human recordings, according to new research that highlights growing security and misinformation risks.
Researchers at Queen Mary University of London tested 150 participants across multiple experiments, comparing real human voices against two types of AI-generated alternatives using commercially available synthesis tools.
The study found that voice clones created from existing human recordings achieved realism ratings equivalent to those of genuine human voices, whilst both cloned and generic AI voices were perceived as more dominant than their human counterparts. Some AI voices also scored higher for trustworthiness than human recordings.
“AI-generated voices are all around us now. We’ve all spoken to Alexa or Siri, or had our calls taken by automated customer service systems,” said Dr Nadine Lavan, Senior Lecturer in Psychology at Queen Mary University of London, who co-led the research. “Those things don’t quite sound like real human voices, but it was only a matter of time until AI technology began to produce naturalistic, human-sounding speech. Our study shows that this time has come, and we urgently need to understand how people perceive these realistic voices.”
The research team used ElevenLabs’ voice synthesis platform to create both voice clones based on existing human recordings and generic AI voices generated without specific human counterparts. Participants evaluated 80-120 voice samples for perceived realism, trustworthiness and dominance across three separate experiments.
Unlike previous research into AI-generated faces, which found that synthetic images often appeared more realistic than genuine photographs, the voice study detected no “hyperrealism effect”. However, the inability to distinguish voice clones from human recordings represents a significant milestone in AI audio synthesis capabilities.
The ease of creating convincing voice clones particularly concerned researchers. Dr Lavan noted that “the process required minimal expertise, only a few minutes of voice recordings, and almost no money”, demonstrating how accessible sophisticated AI voice technology has become.
Analysis revealed that participants correctly identified only 62-72% of genuine human recordings as authentic when presented alongside high-quality voice clones, with listeners showing a bias towards labelling ambiguous voices as human. Generic AI voices were more easily detected, with participants achieving above-chance discrimination accuracy.
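Results of this shape (a hit rate well below ceiling combined with a bias towards answering "human") are conventionally summarised with signal detection theory. The study does not specify its analysis, so the sketch below is purely illustrative: it computes sensitivity (d′) and response criterion from hypothetical hit and false-alarm rates chosen to sit within the reported 62-72% range, using only the Python standard library.

```python
from statistics import NormalDist

def dprime_and_bias(hit_rate: float, false_alarm_rate: float) -> tuple[float, float]:
    """Summarise a voice-discrimination experiment in signal-detection terms.

    hit_rate:         proportion of genuine human recordings labelled "human"
    false_alarm_rate: proportion of AI voice clones mistakenly labelled "human"

    Returns (d_prime, criterion):
      d_prime   -- sensitivity; 0 means listeners are at chance
      criterion -- response bias; negative values indicate a bias
                   towards answering "human"
    """
    z = NormalDist().inv_cdf  # convert proportions to z-scores
    d_prime = z(hit_rate) - z(false_alarm_rate)
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d_prime, criterion

# Hypothetical rates consistent with the reported pattern: humans
# identified 67% of the time, clones mislabelled "human" 45% of the time.
d, c = dprime_and_bias(0.67, 0.45)
print(f"d' = {d:.2f}, criterion = {c:.2f}")
```

With these illustrative numbers, d′ is small but positive (weak discrimination) and the criterion is negative, matching the reported tendency to call ambiguous voices human.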
The findings carry implications for fraud prevention, content verification and media literacy as voice synthesis technology becomes increasingly accessible. Dr Lavan emphasised that rapid technological advancement “carries many implications for ethics, copyright, and security, especially in areas like misinformation, fraud, and impersonation”.
However, she also highlighted beneficial applications, noting potential for “improved accessibility, education, and communication, where bespoke high-quality synthetic voices can enhance user experience”.