The growing sophistication of AI language models such as OpenAI’s GPT-5 has raised hopes that they could help verify photographs online, spotting visual clues, geolocating obscure images and even detecting fakes. But new research from the Tow Center for Digital Journalism suggests the technology is far from reliable when it comes to confirming provenance, and may risk adding confusion.
In tests of seven leading AI platforms, including GPT-5, Gemini, Claude and Perplexity, researchers gave each system ten authentic images from major news events and asked for details such as the date, location and photographer. Of the 280 resulting queries, only 14 met the study’s standard for accuracy and consistency. Even GPT-5, the best performer, was correct just over a quarter of the time.
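To make those figures concrete, the sketch below (Python, using only the numbers reported above) works out the per-image query count and the overall pass rate. The four-queries-per-image figure is an inference from the totals, not a detail stated in the study.

```python
# Back-of-envelope arithmetic for the study's headline numbers.
# Only the totals reported in the article are used; the
# queries-per-image figure is derived, not taken from the study.
platforms = 7       # GPT-5, Gemini, Claude, Perplexity and three others
images = 10         # authentic news photos shown to each platform
total_queries = 280
passing = 14        # responses meeting the accuracy/consistency standard

queries_per_image = total_queries // (platforms * images)  # -> 4
overall_pass_rate = passing / total_queries                # -> 0.05

print(f"{queries_per_image} queries per image per platform")
print(f"overall pass rate: {overall_pass_rate:.0%}")       # -> 5%
```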
Unlike reverse image search tools such as Google Images or TinEye, which use pixel-based matching, large language models generate descriptions of pictures and then build text-based searches around them. This can produce “confidently wrong” answers when superficial clues are over-emphasised. In one case, Grok mistook flooding in Valencia for floods in Venice after focusing on a “Venice Beach” t-shirt in the frame.
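To illustrate the structural difference, here is a minimal Python sketch; every function in it is an illustrative stand-in rather than any vendor’s real API. It shows how a text search built from a generated description can be derailed by a literal string such as “Venice Beach”:

```python
# Minimal sketch contrasting the two verification approaches described
# above. All function bodies are illustrative stand-ins, not real APIs.

def reverse_image_search(image_bytes: bytes) -> list[str]:
    """Pixel-based matching (Google Images / TinEye style):
    fingerprint the pixels and look up near-duplicates in an index."""
    fingerprint = hash(image_bytes)  # real tools use perceptual hashes
    return [f"indexed-match-for-{fingerprint}"]

def llm_style_search(image_bytes: bytes) -> list[str]:
    """LLM-style verification: describe the image in words, then run a
    *text* search on that description. Errors in the description
    propagate into the search and can yield confidently wrong answers."""
    caption = describe_image(image_bytes)  # vision-model step
    return text_web_search(caption)        # text-search step

def describe_image(image_bytes: bytes) -> str:
    # Stand-in caption; a superficial clue dominates the description.
    return "flooded street, person wearing a 'Venice Beach' t-shirt"

def text_web_search(query: str) -> list[str]:
    # A text search keys on the words, not the pixels, so the literal
    # string 'Venice' can pull up Venice, Italy, instead of Valencia.
    return [f"results-for: {query}"]

if __name__ == "__main__":
    img = b"...raw image bytes..."
    print(reverse_image_search(img))
    print(llm_style_search(img))
```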
The models were somewhat better at geolocation than at identifying photographers or dates. They were able, for instance, to highlight architectural details, vegetation or street furniture that might escape a human fact-checker’s notice, and their optical character recognition can read faint or blurred text. Investigators say these features show promise for generating leads or providing a “first draft” of analysis.
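As a sketch of that lead-generation idea, the snippet below runs the open-source Tesseract OCR engine over a photo via pytesseract; this is our choice of tool for illustration (the study does not name one), and the file path is hypothetical.

```python
# Sketch: pulling faint or blurred text out of a news photo as a lead
# for human verification. Assumes the Tesseract binary plus the
# pytesseract and Pillow packages are installed.
from PIL import Image, ImageOps
import pytesseract

def extract_text_leads(path: str) -> str:
    img = Image.open(path).convert("L")      # greyscale
    img = ImageOps.autocontrast(img)         # boost faint lettering
    return pytesseract.image_to_string(img)  # raw OCR output

if __name__ == "__main__":
    # Street signs, shopfronts or banners in the frame become search
    # terms, but the output is a lead to check, not a confirmed location.
    print(extract_text_leads("flood_photo.jpg"))  # hypothetical file
```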
But errors were frequent and sometimes serious. The systems mislabelled well-documented images, fabricated claims about metadata, and even suggested authentic photos were AI-generated. A flood photo from Kazakhstan, for example, was misattributed to other events and wrongly flagged as synthetic.
Because the models’ reasoning is opaque, researchers warn that it is difficult for non-specialists to know when an answer is credible. As media researcher Mike Caulfield noted, the danger lies in untrained users treating AI’s confident but inaccurate responses as fact.
The study also places AI’s weaknesses in a broader context: detection tools designed to identify synthetic media are struggling to keep up with the rapid evolution of image-generation technology.
The Tow Center concludes that AI can play a supporting role for professional fact-checkers but should never replace traditional methods. Used with caution, AI might surface overlooked details or accelerate searches, but human oversight and independent corroboration remain essential if journalism is to avoid amplifying mistakes at moments when accuracy matters most.
Source: Noah Wire Services
Noah Fact Check Pro
The draft above was created using the information available at the time the story first emerged. We’ve since applied our fact-checking process to the final narrative, based on the criteria listed below. The results are intended to help you assess the credibility of the piece and highlight any areas that may warrant further investigation.
Freshness check
Score: 10
Notes: The narrative is based on a press release from the Tow Center for Digital Journalism, dated August 26, 2025. Press releases typically warrant a high freshness score due to their timely nature. ([cjr.org](https://www.cjr.org/tow_center/why-ai-models-are-bad-at-verifying-photos.php?utm_source=openai))
Quotes check
Score: 10
Notes: No direct quotes are present in the provided text.
Source reliability
Score: 10
Notes: The narrative originates from the Tow Center for Digital Journalism, a reputable organisation known for its research in digital journalism.
Plausibility check
Score: 10
Notes: The claims align with existing research on AI’s limitations in image verification. For instance, a study published in Digital Journalism discusses the challenges and opportunities of implementing data journalism, digital verification, and AI in newsrooms. ([viewjournal.eu](https://viewjournal.eu/articles/10.18146/view.332?utm_source=openai))
Overall assessment
Verdict (FAIL, OPEN, PASS): PASS
Confidence (LOW, MEDIUM, HIGH): HIGH
Summary: The narrative is fresh, originating from a recent press release by a reputable organisation, and presents plausible claims supported by existing research.