Tuesday, December 16, 2025

Can automatic indexing algorithms compare apples to Apple's?: An overview of automatic indexing in PubMed, errors, and potential band aids.

 

Apple on a stack of books
Image by Michal Jarmoluk from Pixabay

Inane indexing

As librarians, we've encountered some…well, let's say interesting indexing choices in PubMed records.

Such as this case, where an article about Apple iPad Pros was indexed with the MeSH term "Malus" (the genus name for apple trees)…

PubMed record about Apple technology, indexed with the MeSH term Malus
…or this case, where an article about early identification of autism was assigned the MeSH term "Humans." Anything else? Nope. Just "Humans."

PubMed record about autism detection, assigned the MeSH term, Humans

So what's going on here? Well, there are a few different ways a record can be indexed in PubMed [1]:

  • Manual: The record was indexed by a human indexer at the National Library of Medicine (NLM).
  • Automated: The record was indexed by an algorithm, with no review by human indexers.
  • Curated: The record was indexed by an algorithm, then reviewed or modified by human indexers.

If the indexing of a record seems particularly bizarre, as in our examples, odds are the record was automatically indexed.

A bit of background on automatic indexing in PubMed

Automatic indexing isn't new to PubMed. It began in 2002 as simple suggestions offered to NLM indexers as they reviewed a record [2-6]. In 2011, first-line automated indexing with human review was implemented for a selection of 14 journals [2, 4, 6, 7]. This progressed to full automatic indexing for all OLDMEDLINE records in 2015, for comments and backlogged citations in 2016, and finally for all MEDLINE journals in 2022; the NLM's newest automatic indexing algorithm, Medical Text Indexer NeXt generation (MTIX), was introduced in 2024 [4, 7, 8].

While automatic indexing has improved efficiency, it has also introduced precision and recall errors, as the examples above demonstrate. A few studies have documented these errors more formally [3, 9-12], including an article I recently published in the Journal of the Medical Library Association (JMLA), which examined precision errors in a sample of MEDLINE records automatically indexed with the MeSH term Malus [13].

How do I check how a PubMed record was indexed?

For those curious, there's a nifty trick for checking how a record was indexed: look at the PubMed record in XML. To access a record's XML, use the URL below, replacing the number after "id=" with the PMID of the record you would like to check. (The example URL requests two records at once; their PMIDs are separated by an encoded comma, "%2c".)

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=24410017%2c11700088&retmode=xml

The indexing method is typically listed to the right of the label "IndexingMethod=" in the record's opening MedlineCitation tag.

Below are screenshots of the XML of both of our examples from earlier. Sure enough, they were both automatically indexed.

XML of PubMed records, both list automated as indexing method

Link to XML for Example 1

Link to XML for Example 2
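For anyone who needs to check more than a handful of records, the same lookup can be scripted. Below is a minimal Python sketch, not an official NLM tool. The function names are my own, and it assumes the indexing method appears as an IndexingMethod attribute on each record's MedlineCitation element, as in the XML described above. Records without that attribute are reported as "not listed" (a placeholder of mine) rather than guessed at.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmids):
    """Build the efetch URL for one or more PMIDs (comma-joined, then URL-encoded)."""
    query = urllib.parse.urlencode({
        "db": "pubmed",
        "id": ",".join(str(pmid) for pmid in pmids),
        "retmode": "xml",
    })
    return f"{EFETCH_URL}?{query}"


def indexing_methods(xml_data):
    """Map each PMID in a PubMed XML document to its IndexingMethod value.

    Assumption: the value lives in the IndexingMethod attribute of the
    MedlineCitation element. If the attribute is absent, the placeholder
    "not listed" is returned instead of inferring a method.
    """
    root = ET.fromstring(xml_data)
    methods = {}
    for citation in root.iter("MedlineCitation"):
        pmid = citation.findtext("PMID")  # direct child only, not cited PMIDs
        methods[pmid] = citation.get("IndexingMethod", "not listed")
    return methods


def fetch_indexing_methods(pmids):
    """Download the records from efetch and report their indexing methods."""
    with urllib.request.urlopen(build_efetch_url(pmids)) as response:
        return indexing_methods(response.read())
```

Calling fetch_indexing_methods([24410017, 11700088]) requests the same two records as the example URL above and returns a dictionary keyed by PMID. For large batches, check NCBI's E-utilities usage guidelines before sending many requests.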

What can we do about automatic indexing errors in PubMed?

While we can't fix automatic indexing errors in PubMed ourselves or wholly prevent them from occurring, there are a few ways to ameliorate their effects.

In addition to conducting further studies on automatic indexing errors and reporting errors to the NLM HelpDesk, librarians can instruct researchers to use standardized language in the titles and abstracts of their manuscripts, including the very MeSH terms they would like to see assigned to their records. At the least, this way, correct MeSH terms may appear alongside the erroneous ones.

Fellow librarians from the University of Iowa developed a checklist that has additional tips for helping researchers reduce the effects of automatic indexing errors in their own manuscripts [14].
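The suggestion to work desired MeSH terms into the title and abstract can be spot-checked before a manuscript is submitted. The Python sketch below is my own illustration and is not part of the Iowa checklist. The function name, the sample title and abstract, and the chosen terms are all hypothetical. It only does a case-insensitive whole-phrase match, so it will not recognize plurals, synonyms, or MeSH entry terms.

```python
import re


def missing_terms(desired_terms, title, abstract):
    """Return the desired terms found in neither the title nor the abstract.

    Matching is a case-insensitive whole-phrase search. It does not handle
    plurals, synonyms, or MeSH entry terms.
    """
    text = f"{title} {abstract}"
    missing = []
    for term in desired_terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if not re.search(pattern, text, flags=re.IGNORECASE):
            missing.append(term)
    return missing


# Hypothetical manuscript text and desired terms, for illustration only.
title = "Early identification of autism in toddlers"
abstract = "We evaluated a screening tool for autism spectrum disorder."
desired = ["Autism Spectrum Disorder", "Early Diagnosis"]

print(missing_terms(desired, title, abstract))
```

Any term the function returns is one the author may want to add to the title or abstract, wording permitting.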

Will automatic indexing algorithms ever compare, with 100% accuracy, apples to Apple's? Who knows. But, regardless of whether indexing errors cause you laughter or grief, they're likely to be sticking around for a while.

 

Works Cited / Fun Reading

1.      Incorporating values for indexing method in MEDLINE/PubMed XML [Internet]. NLM Technical Bulletin; 2018. Available from: https://www.nlm.nih.gov/pubs/techbull/ja18/ja18_indexing_method.html#note

2.      Mork J, Aronson A, Demner-Fushman D. 12 years on - Is the NLM medical text indexer still useful and relevant? J Biomed Semantics. 2017 Feb 23;8(1):8. Epub 20170223. DOI: 10.1186/s13326-017-0113-5. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5324252/pdf/13326_2017_Article_113.pdf.

3.      Chen E, Bullard J, Giustini D. Automated indexing using NLM's Medical Text Indexer (MTI) compared to human indexing in Medline: a pilot study. J Med Libr Assoc. 2023 Jul 10;111(3):684–94. DOI: 10.5195/jmla.2023.1588. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10361558/pdf/jmla-111-3-684.pdf.

4.      MEDLINE 2022 Initiative: Transition to Automated Indexing [Internet]. NLM Technical Bulletin; 2021. Available from: https://www.nlm.nih.gov/pubs/techbull/nd21/nd21_medline_2022.html

5.      Rae AR, Pritchard DO, Mork JG, Demner-Fushman D. Automatic MeSH indexing: revisiting the subheading attachment problem. AMIA Annu Symp Proc. 2020;2020:1031–40. Epub 20210125. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8075546/pdf/139_3413087.pdf.

6.      Mork J, Jimeno Yepes A, Aronson A. The NLM Medical Text Indexer System for indexing biomedical literature. 2013. Available from: https://lhncbc.nlm.nih.gov/ii/information/Papers/MTI_System_Description_Expanded_2013_Accessible.pdf.

7.      Frequently Asked Questions about Indexing for MEDLINE: National Library of Medicine; 2023. Available from: https://www.nlm.nih.gov/bsd/indexfaq.html.

8.      Sticco A. NLM Office Hours: MEDLINE Indexing Update 2024. Available from: https://www.nlm.nih.gov/oet/ed/pubmed/02-24_oh_medline-automated-indexing.html.

9.      Aronson AR, Mork JG, Gay CW, Humphrey SM, Rogers WJ. The NLM Indexing Initiative's Medical Text Indexer. Stud Health Technol Inform. 2004;107(Pt 1):268–72. Available from: https://ebooks.iospress.nl/pdf/doi/10.3233/978-1-60750-949-3-268.

10. Fernandez-Llimos F, Negrão LG, Bond C, Stewart D. Influence of automated indexing in Medical Subject Headings (MeSH) selection for pharmacy practice journals. Res Social Adm Pharm. 2024 Jun 12. Epub 20240612. DOI: 10.1016/j.sapharm.2024.06.003.

11. Moore DAQ, Yaqub O, Sampat BN. Manual versus machine: How accurately does the Medical Text Indexer (MTI) classify different document types into disease areas? PLoS One. 2024;19(3):e0297526. Epub 20240313. DOI: 10.1371/journal.pone.0297526. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10936797/pdf/pone.0297526.pdf.

12. Brief Communication – concerning algorithmic indexing in MEDLINE. Journal of EAHIL. 2024 03/17 [cited 2025/04/09];20(1):18–21. DOI: 10.32384/jeahil20604. Available from: https://doi.org/10.32384/jeahil20604.

13. Wilson P. Sometimes the apple does fall far from the tree: a case study on automatic indexing precision errors in PubMed. J Med Libr Assoc. 2025 Oct 23;113(4):318–26. DOI: 10.5195/jmla.2025.2110.

14. Allen C, Carol H, Riley S, Deberg J. Developing an Author Checklist to Improve Discovery of Published Articles in the Era of Algorithmic Indexing. 2025. Zenodo. https://doi.org/10.5281/zenodo.17282091



Thanks for reading!
