Image by Markus Winkler from Pixabay
I came across an interesting (and rather alarming!) read from Nature.
In the study, researchers posted two papers about a fake disease called Bixonimania to a preprint server to see whether existing large language models (LLMs) would reference the papers in their health advice. The researchers included multiple "tips" in the papers' full text identifying them as fake (my favorite was an acknowledgment to someone from the Starfleet Academy!).
Despite these obvious tips, not only were the papers cited in LLM-generated summaries, but they were also cited by a few peer-reviewed publications as though they were legitimate sources! The researchers deduced that this latter result may be attributable to authors relying on AI-generated references for their research without reading the full text.
This study illustrates not only the dangers of relying on LLM-generated summaries for advice (especially when that advice is medical!), but also the dangers of relying on these summaries to generate citations for one's research.
Even AI literature summarizers that are supposedly dedicated to academic and medical research are subject to these pitfalls. My colleagues at the Ebling Library and I have compiled several examples of such AI tools citing lower-quality studies and, in many cases, wholly misrepresenting the contents of the articles they cite.
As those who have read about my previous clown shenanigans are all too aware (here are my first and second blog posts on the topic, if you would like some humorous reads!), even AI tools designed to "read" full-text PDFs don't always pick up on obvious red flags and can misrepresent the contents of an article.
For librarians, catching AI in these errors can feel a bit like detective work; however, given all the hype surrounding AI in research, alerting researchers to the current limitations of these tools is essential. As Sir Arthur Conan Doyle's Sherlock Holmes would say, "The game is afoot!"
