EndNote 2025 now has an AI-powered Research Assistant tool to accompany its Key Takeaway tool. This new feature allows you to ask the Research Assistant (i.e., a chatbot) questions, which it will answer using the content of the PDF you attach to a reference. As a note, to use the tool, EndNote 2025 users must work in a library they have synced with the online version of EndNote.
Of course, I couldn't help but resort to my old clown antics to test out the tool, just like I did with the Key Takeaway tool.
For this experiment, I tested whether the Research Assistant could pick up on clown nonsense in a study's methods. To do this, I replaced the abstract and methods sections of one of my own articles about clinical trials information sharing with nonsense about clowns (for a humorous read, you can take a look at the manipulated PDF here) and attached the PDF to its corresponding reference in EndNote. I also replaced the abstract metadata in the EndNote record with clown nonsense. After syncing my library, I entered the following question into the Research Assistant: "What methods were used to determine the extent of data sharing?"
Theoretically, the Research Assistant should have either stated that the authors didn't specify their methods for determining the extent of data sharing, or mentioned something along the lines of balloon animal residue analysis and unsolicited honking detection, but it did nothing of the sort. Instead, it appeared to draw on the results and introduction sections to inform its methods summary. It didn't mention clowns at all, even though every section relating to the methods had been replaced with clown nonsense.
Here is the output of the Research Assistant, as the screenshot can be a bit hard to read:
- Methods used to determine data sharing
  - The study analyzed 206 ClinicalTrials.gov (CTG) records of US federally funded COVID-19 clinical studies that included at least 200 participants.
  - It evaluated the records for links to supporting documents, individual participant data (IPD) sharing information, and associated publications.
  - The researchers looked for specific sections in the CTG records that indicated the presence of protocols, informed consent forms, and statistical analysis plans.
  - Data was collected on how many records provided links to these materials and the mechanisms for accessing IPD, as well as any discrepancies in the information provided.
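As for why this happened, here's my (purely speculative) guess: many PDF chatbots answer questions by splitting a document into chunks and retrieving only the chunks most similar to the question, rather than reading the paper start to finish. If the Research Assistant works anything like that, a methods section stuffed with clown vocabulary would simply never be retrieved for a question about data sharing. Below is a toy Python sketch of that kind of retrieval; the chunk text, the bag-of-words scoring, and everything else in it are my own illustration, not anything we know about EndNote's actual implementation.

```python
# Toy illustration (NOT EndNote's actual implementation) of why
# similarity-based retrieval can skip a rewritten methods section:
# chunks are ranked by word overlap with the question, so a methods
# section full of clown vocabulary scores lower than results or
# introduction chunks that still mention "data sharing".
import math
from collections import Counter

def bow(text: str) -> Counter:
    """Bag-of-words vector: token -> count."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical chunks standing in for sections of the manipulated PDF.
chunks = {
    "methods (clown version)": (
        "observers monitored high traffic areas for clown behaviors and "
        "collected wig fibers and greasepaint residue for forensic analysis"
    ),
    "results": (
        "of the 206 records reviewed many provided links to protocols "
        "statistical analysis plans and ipd sharing statements"
    ),
    "introduction": (
        "data sharing of clinical trial materials such as protocols and "
        "individual participant data supports research transparency"
    ),
}

question = bow("what methods were used to determine the extent of data sharing")

# Rank chunks by similarity to the question, highest first.
ranked = sorted(
    ((cosine(question, bow(text)), name) for name, text in chunks.items()),
    reverse=True,
)
for score, name in ranked:
    print(f"{score:.3f}  {name}")
```

In this toy ranking, the clown-filled methods chunk scores zero against the question, while the introduction and results chunks, which still mention data sharing, float to the top, mirroring what the Research Assistant appeared to do.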
To be fair, the Research Assistant did pick up on the clown nonsense when I asked it to "Summarize the methods used for this study."
Here is the output for that query, as well:
- Methods summary
  - Study design: The research employed a multi-phase, interdisciplinary approach to investigate the discoverability of supporting research materials in COVID-19 clinical studies on ClinicalTrials.gov.
  - Data collection:
    - Behavioral surveillance: Observers were trained to recognize clown behaviors, using tools like binoculars and audio recorders to monitor high-traffic areas for clown-like activities.
    - Material forensics: Environmental samples were collected from locations identified as high clown activity zones, focusing on traces like wig fibers and greasepaint residue.
    - Psychological profiling: Participants were assessed using three instruments to gauge their clown potential, including a questionnaire and observational tests.
  - Data analysis: Observations and samples were catalogued and scored using the Clown Threat Index (CTI), which quantified behaviors to identify individuals with clown-like tendencies.
  - Ethics and oversight: The study was approved by the Institutional Review Board, ensuring ethical standards in data collection and participant interaction.
While I hope it's unlikely that authors would replace their methods sections with utter nonsense, this little experiment demonstrates a key limitation of AI PDF summarizers: they can miss vital context within the full text. Researchers hoping to save time with these summarizers may miss (potentially critical!) methodological flaws in a study (such as in this case, where there was a mismatch, to say the least, between the methods and the aims of the study). While PDF summarization tools and chatbots can be convenient, researchers should exercise caution if they decide to use such tools, and always verify information by examining the full text.
For some additional resources relating to generative AI in the health sciences, check out:
- Evaluating large language models on medical evidence summarization: “In this study, we systematically examine the capabilities and limitations of LLMs, specifically GPT-3.5 and ChatGPT, in performing zero-shot medical evidence summarization across six clinical domains.”
- Evaluating a large language model’s ability to answer clinicians’ requests for evidence summaries: “This study investigated the performance of a generative artificial intelligence (AI) tool using GPT-4 in answering clinical questions in comparison with medical librarians' gold-standard evidence syntheses.”
- Evaluating ChatGPT's ability to simplify scientific abstracts for clinicians and the public: "This study evaluated ChatGPT's ability to simplify scientific abstracts for both public and clinician use. Ten questions were developed to assess ChatGPT's ability to simplify scientific abstracts and improve their readability for both the public and clinicians."
- How effectively do large language models and AI-based automation tools assist in writing and summarizing evidence syntheses?: Cochrane Training webinar recording on potential uses of large language models in systematic reviews.
- AI Hallucinations Are Getting Worse: NY Times coverage of the worsening issue of hallucinations in generative AI
- Ebling's AI Examples (Ebling's Box folder containing additional generative AI examples!)
Thanks for reading, and I hope everyone has a great weekend!