People speak to AI chatbots differently from the way they speak to other humans, using language that is 14.5 per cent less polite and 5.3 per cent less grammatically fluent. This stylistic gap reduces chatbot accuracy when models are trained only on human-to-human conversations.
Research published on arXiv compared thousands of messages people sent to human agents with those sent to AI chatbots, focusing on features including grammar, vocabulary and politeness. The analysis used the Claude 3.5 Sonnet model to score each message along these linguistic dimensions.
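The paper does not publish its evaluation code, but an LLM-as-judge step of this kind can be sketched roughly as follows, assuming the anthropic Python client. The prompt wording, the 1-to-5 scale and the score_message helper are illustrative assumptions rather than the study's own rubric; only the choice of Claude 3.5 Sonnet comes from the research.

```python
import json
import anthropic

# Linguistic dimensions named in the study; the scoring rubric below is an
# illustrative assumption, not the paper's actual prompt.
DIMENSIONS = [
    "grammar fluency",
    "politeness and formality",
    "lexical diversity",
    "informativeness",
    "explicitness and clarity",
    "emotional intensity",
]

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def score_message(message: str) -> dict:
    """Ask Claude 3.5 Sonnet to rate one user message on each dimension (1-5)."""
    prompt = (
        "Rate the following user message on each of these dimensions, "
        "using an integer from 1 (very low) to 5 (very high): "
        + ", ".join(DIMENSIONS)
        + ". Respond with a JSON object mapping dimension name to score.\n\n"
        f"Message: {message}"
    )
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # exact model version is an assumption
        max_tokens=300,
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.content[0].text)

# Example: a message written to a human agent versus one written to a chatbot.
print(score_message("Hi, could you please help me change my flight to Tuesday?"))
print(score_message("change flight tuesday"))
```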
Researchers from Amazon trained an AI model called Mistral 7B on approximately 13,000 real chats between people, then tested how well it understood 1,357 messages people had sent to chatbots. The team created rewritten versions of messages simulating different communication styles, from blunt and informal to polite and formal.
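The rewriting pipeline itself is not released. A minimal sketch of how such style variants might be produced with an off-the-shelf model is shown below; the style labels, prompt wording and rewrite_in_style helper are hypothetical, and the use of Claude for rewriting is an assumption made only to keep the example self-contained.

```python
import anthropic

client = anthropic.Anthropic()

# Hypothetical style labels spanning the range described in the study,
# from blunt and informal to polite and formal.
STYLES = {
    "blunt_informal": "terse, lowercase, with no greetings or please/thank-you",
    "casual": "conversational and mildly informal, with a few filler words",
    "polite_formal": "complete sentences, polite phrasing, formal register",
}

def rewrite_in_style(message: str, style: str) -> str:
    """Rewrite a user message in the given style while preserving its intent."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # rewriting model is an assumption
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": (
                f"Rewrite the message below so that it is {STYLES[style]}. "
                "Keep the meaning and the user's intent unchanged. "
                f"Return only the rewritten message.\n\nMessage: {message}"
            ),
        }],
    )
    return response.content[0].text.strip()

# Each original message yields several stylistic variants.
original = "Hello, I would like to cancel my order, please."
variants = {name: rewrite_in_style(original, name) for name in STYLES}
```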
Chatbots trained on a diverse mix of message styles were 2.9 per cent better at understanding user intent than AI trained solely on the original human conversations. The researchers also tried to improve performance by rewriting informal messages into a more formal style at inference time, but this approach reduced understanding by 1.9 per cent.
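The contrast between the two approaches can be sketched schematically as follows, reusing the hypothetical rewrite_in_style helper above; the data layout and the classifier's predict method are assumptions for illustration, not the paper's implementation.

```python
import random

def build_augmented_training_set(originals, rewrite_fn, styles):
    """Training-time diversity: pair each intent label with the original
    message plus its stylistic rewrites (illustrative assumption)."""
    examples = []
    for text, intent in originals:
        examples.append({"text": text, "intent": intent})
        for style in styles:
            examples.append({"text": rewrite_fn(text, style), "intent": intent})
    random.shuffle(examples)
    return examples

def normalise_then_predict(message, rewrite_fn, model):
    """Inference-time normalisation: rewrite the incoming message into a formal
    register before classifying it. The study found this hurt accuracy."""
    formal = rewrite_fn(message, "polite_formal")
    return model.predict(formal)  # 'model' is any intent classifier (hypothetical)
```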
The study quantified linguistic differences across six dimensions: grammar fluency, politeness and formality, lexical diversity, informativeness, explicitness and clarity, and emotional intensity. People communicating with human agents exhibited significantly higher grammar fluency, greater politeness and formality, and slightly richer lexical diversity compared to those chatting with AI assistants.
“Training-time exposure to diverse linguistic variation is more effective than inference-time normalisation,” the researchers stated. “Models must learn to interpret diverse communication styles during training, rather than rely on brittle post-hoc transformations that risk semantic distortion.”
The research revealed that while people adjust their linguistic style based on whether they are speaking to humans or AI, they maintain consistent levels of substantive detail and emotional expression across both interaction types. This stylistic divergence introduces a domain shift where models trained exclusively on polished human-to-human data may struggle when deployed in real-world AI assistant environments.
The study analysed user messages during the intent understanding phase of multi-turn conversations, extracting only the initial user message from each session to ensure clear intent signals. Non-informative utterances such as greetings or empty inputs were excluded.
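A preprocessing step of this kind might look roughly like the sketch below; the greeting pattern and session format are assumptions, since the paper does not specify its exclusion rules beyond greetings and empty inputs.

```python
import re

# Hypothetical filter for non-informative openers.
GREETING_PATTERN = re.compile(
    r"^\s*(hi|hello|hey|good (morning|afternoon|evening))[\s!.,]*$",
    re.IGNORECASE,
)

def first_user_messages(sessions):
    """From each multi-turn session, keep only the first user message,
    dropping greetings and empty inputs."""
    extracted = []
    for session in sessions:  # each session: a list of {"role": ..., "text": ...} turns
        for turn in session:
            if turn["role"] != "user":
                continue
            text = turn["text"].strip()
            if text and not GREETING_PATTERN.match(text):
                extracted.append(text)
            break  # only the initial user message per session is considered
    return extracted
```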