New research reveals that despite advances in AI, modern language models remain unable to fully replicate the emotional nuance and stylistic complexity of human communication, especially in social media contexts.

Researchers from the University of Zurich, the University of Amsterdam, Duke University, and New York University have demonstrated in a new study that modern artificial intelligence language models still cannot convincingly mimic the emotional expression of human communication. The research tested nine popular open-source large language models (LLMs), including Llama 3.1 variants and Mistral 7B, against social media posts from platforms such as X (formerly Twitter), Bluesky, and Reddit. Employing newly developed classifier algorithms, the team distinguished AI-generated texts from those written by humans with 70–80% accuracy.
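The paper’s actual classifiers and training data are not reproduced here, but the general approach can be pictured as a supervised binary classifier trained on labelled posts. The sketch below is a minimal illustration using generic TF-IDF features and logistic regression in scikit-learn, with a handful of placeholder posts standing in for a real corpus; it is an assumption-laden stand-in, not the study’s pipeline.

```python
# Minimal sketch of a human-vs-AI post classifier, illustrating the kind of
# detection setup described above. Features, model, and data are generic
# placeholders, not the study's actual pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder corpus: a real experiment would use thousands of labelled posts.
posts = [
    "lmao no shot that actually happened",           # human
    "ugh my feed is nothing but hot takes today",    # human
    "brb crying over this game again",               # human
    "Thank you for sharing this thoughtful view!",   # AI-generated
    "That is a great point, and I appreciate it.",   # AI-generated
    "What a wonderful perspective on the topic!",    # AI-generated
]
labels = [0, 0, 0, 1, 1, 1]  # 0 = human, 1 = AI-generated

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # word and bigram frequencies
    LogisticRegression(max_iter=1000),
)
clf.fit(posts, labels)

# Score an unseen post; evaluating on a held-out test set would yield an
# accuracy figure analogous to the 70-80% reported in the study.
print(clf.predict(["I really appreciate your kind and helpful feedback!"]))
```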

This research presents an updated version of the “Computational Turing Test”, which employs automated linguistic analysis to detect the subtle emotional and stylistic differences that betray the artificial origin of AI-generated texts. The investigators found that, despite fine-tuning and prompt refinement, AI models consistently exhibited an overly polite, smooth, and less toxic tone compared with the more informal, sarcastic, and emotionally varied style typical of human interaction online. Attempts to increase realism by supplying user examples or contextual information only partially closed the gap: they smoothed out differences in sentence length and structure but failed to replicate the nuanced emotional cues of human writing.
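To give a flavour of the surface-level stylistic signals such an analysis can examine, the sketch below computes a few generic features, such as sentence length and punctuation habits, from a post. These particular features are illustrative proxies chosen for this example, not the linguistic measures actually used in the study.

```python
# Illustrative stylistic features of the kind a computational Turing test
# might inspect. These are generic proxies, not the study's feature set.
import re
import statistics

def style_features(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        # Human posts online often have shorter, choppier sentences.
        "mean_sentence_len": statistics.mean(len(s.split()) for s in sentences)
                             if sentences else 0.0,
        # Exclamation marks and all-caps words signal informal emotional tone.
        "exclamations": text.count("!"),
        "allcaps_words": sum(1 for w in text.split()
                             if w.isupper() and len(w) > 1),
    }

print(style_features("AMAZING. totally called it!!"))                        # informal, human-style
print(style_features("That is a wonderful point. Thank you for sharing."))  # smoother, AI-style
```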

A surprising insight from the study was that instruction tuning, which aims to make models more helpful and aligned, actually reduced their ability to imitate genuine human emotional expression. For instance, base models such as Llama 3.1 8B and Mistral 7B v0.1 produced more ‘human-like’ responses than their instruction-tuned counterparts. Moreover, scaling up model size did not enhance human-likeness: the 70-billion-parameter Llama 3.1 was found less convincing than its 8-billion-parameter sibling. Intriguingly, when AI texts were pushed harder to disguise themselves as human, their semantic similarity to real user posts diminished, paradoxically making them easier to identify as machine-generated.
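That semantic-similarity effect can be illustrated by embedding a real post and a model’s imitation of it, then comparing the vectors with cosine similarity. The sketch below uses the sentence-transformers library with an arbitrary off-the-shelf embedding model; both the library and the model choice are illustrative assumptions here, not the study’s method.

```python
# Sketch of measuring how semantically close an AI imitation stays to a real
# post. The embedding model named here is an illustrative choice, not the
# one used in the study.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

human_post = "ugh my train is delayed AGAIN, of course"
ai_imitation = "I'm so frustrated that my train has been delayed once more."

embeddings = model.encode([human_post, ai_imitation], convert_to_tensor=True)
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"semantic similarity: {score:.3f}")  # lower scores indicate drift from the original meaning
```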

Platform-specific differences emerged as significant: AI-generated content was most convincing on X (formerly Twitter), where detection accuracy was lowest, perhaps because communication there is more formulaic. On Bluesky, AI performance was moderate, while on Reddit, where user exchanges are more varied and nuanced, AI texts stood out most distinctly. Researchers suggest these disparities may stem from differences in platform user behaviour and the extent to which LLM training datasets incorporated each platform’s data.

Despite rapid advancements in generating grammatically correct and contextually relevant text, modern LLMs still struggle to replicate the spontaneous emotional expressiveness and ambiguity that characterise human communication. This “emotional smoothness” remains a key signature distinguishing AI outputs from genuine human text.

These findings contrast with some recent studies demonstrating that AI models like GPT-4 and GPT-4.5 can pass interactive Turing tests under certain controlled conditions, being judged human a significant portion of the time. For example, GPT-4.5 was judged human up to 73% of the time in some three-party Turing tests, and GPT-4 passed an interactive conversation-based Turing test 54% of the time. However, those assessments often focus on conversational fluency and language coherence rather than the nuanced emotional texture and social media-style authenticity explored in the present study. Additionally, research by AI21 Labs found that approximately one-third of participants could not differentiate human from AI conversational bots, reflecting growing sophistication in AI dialogue.

Overall, while AI language models are increasingly effective at mimicking human language on a surface level, significant challenges remain in emulating the deeper emotional and affective dimensions of human communication. The researchers’ “Computational Turing Test” underscores that AI’s polite and ‘too nice’ demeanour in text is an enduring indicator of its artificial nature, limiting its ability to fully pass as human in everyday social media interactions.

📌 Reference Map:

  • [1] (Hi-Tech.ua) – Paragraphs 1, 2, 3, 4, 5, 6, 7
  • [2] (arXiv:2511.04195) – Paragraphs 2, 3
  • [3] (arXiv:2407.08853) – Paragraph 8
  • [4] (arXiv:2405.08007) – Paragraph 8
  • [5] (arXiv:2503.23674) – Paragraph 8
  • [6] (Ars Technica) – Paragraph 2
  • [7] (PR Newswire) – Paragraph 8

Source: Noah Wire Services

Noah Fact Check Pro

The draft above was created using the information available at the time the story first
emerged. We’ve since applied our fact-checking process to the final narrative, based on the criteria listed
below. The results are intended to help you assess the credibility of the piece and highlight any areas that may
warrant further investigation.

Freshness check

Score:
10

Notes:
The narrative presents original findings from a recent study published on 15 November 2025, with no evidence of prior publication or recycled content. The study introduces a new ‘Computational Turing Test’ to assess AI’s ability to mimic human emotional expression, a novel approach not previously reported. The article includes updated data and specific figures, such as the 70–80% accuracy with which AI-generated texts were identified, which are consistent with the study’s findings. No discrepancies in figures, dates, or quotes were identified. The inclusion of updated data alongside new material suggests a high freshness score. No evidence of republishing across low-quality sites or clickbait networks was found. The narrative is based on a press release, which typically warrants a high freshness score due to the timeliness of the information.

Quotes check

Score:
10

Notes:
The article does not contain direct quotes, indicating that the content is original and not reused from other sources. The absence of direct quotations suggests that the information is exclusive to this report.

Source reliability

Score:
7

Notes:
The narrative originates from Hi-Tech.ua, a technology news outlet. While Hi-Tech.ua provides timely and relevant information, it is not as widely recognized as major international news organizations. The article cites a study published on arXiv, a reputable preprint repository, which adds credibility to the information presented. However, the reliance on a single source for the study’s findings introduces some uncertainty regarding the comprehensiveness of the coverage.

Plausibility check

Score:
9

Notes:
The claims made in the narrative are plausible and align with current research trends in AI and emotional expression. The study’s findings that AI models struggle to mimic human emotional expression, particularly in social media contexts, are consistent with existing literature on the limitations of AI in replicating human-like interactions. The narrative provides specific, verifiable details, such as the testing of nine open-source models and the 70–80% accuracy with which AI-generated texts were identified, which support the plausibility of the claims. The language and tone are consistent with academic reporting, and there are no signs of sensationalism or off-topic details.

Overall assessment

Verdict (FAIL, OPEN, PASS): PASS

Confidence (LOW, MEDIUM, HIGH): HIGH

Summary:
The narrative presents original and timely information based on a recent study, with no evidence of recycled content or disinformation. The claims are plausible and supported by verifiable details, and the source, while not a major international news organization, is reputable within the technology news domain. The absence of direct quotes and the use of specific figures and findings from the study further support the credibility of the report.


