Inane indexing
As librarians, we've encountered some…well, let's say interesting
indexing choices in PubMed records.
Take this case, where an article about the Apple iPad Pro was indexed with the MeSH term "Malus" (the genus name for apple trees)…
So what's going on here? Well, there are three ways a record can be indexed in PubMed [1]:
- Manual: This is where a record was indexed manually by an indexer at the National Library of Medicine (NLM).
- Automated: This is where a record was indexed by an algorithm, with no review by human indexers.
- Curated: This is where a record was indexed by an algorithm, but was later reviewed/modified by human indexers.
If the indexing of a record seems particularly bizarre, as in our examples, odds are it was automatically indexed.
A bit of background on automatic indexing in PubMed
Automatic indexing isn't new to PubMed. In 2002, automatic indexing began as simple suggestions offered to NLM indexers as they reviewed a record [2-6]. In 2011, first-line automated indexing with human review was implemented for a selection of 14 journals [2, 4, 6, 7]. This progressed to fully automatic indexing for all OLDMEDLINE records in 2015, comments and backlogged citations in 2016, and, finally, all MEDLINE journals in 2022. NLM's newest automatic indexing algorithm, Medical Text Indexer NeXt generation (MTIX), was introduced in 2024 [4, 7, 8].
While automatic indexing has improved efficiency, it has also introduced problems with precision and recall, as the examples mentioned earlier show. A few studies have documented these errors more formally [3, 9-12], including an article I recently published in the Journal of the Medical Library Association (JMLA), which examined precision errors in a sample of MEDLINE records automatically indexed with the MeSH term Malus (the genus name for apple trees) [13].
How do I check how a PubMed record was indexed?
For those curious, there's a nifty trick you can use to check how an item was indexed: look at the PubMed record in XML. To access a record in XML, use the URL below, replacing the numbers after "id=" with the PMID of the record you'd like to check. The sample URL retrieves two records at once (PMIDs 24410017 and 11700088), separated by %2c, which is a URL-encoded comma.
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=24410017%2c11700088&retmode=xml
The indexing method is typically listed to the right of the label "IndexingMethod=" in the record's MedlineCitation tag (for example, IndexingMethod="Automated").
Below are screenshots of the XML of both of our examples from earlier. Sure enough, they were both automatically indexed.
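For anyone who would rather not eyeball the XML, here is a minimal Python sketch (standard library only) that builds the same efetch URL and pulls out the indexing method for each record. It assumes the method is stored in an IndexingMethod attribute on each MedlineCitation element and, based on our reading of the NLM Technical Bulletin [1], that records without that attribute were indexed manually. The function names are hypothetical, so treat this as a starting point rather than an official NLM tool.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Base URL of the E-utilities efetch endpoint used in the sample URL above.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids):
    """Build an efetch URL returning PubMed XML for one or more PMIDs (hypothetical helper)."""
    query = urllib.parse.urlencode(
        # Multiple PMIDs are joined with commas; urlencode encodes each comma as %2C.
        {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    )
    return f"{EFETCH_BASE}?{query}"


def fetch_xml(url):
    """Download the raw XML bytes (needs network access; NCBI rate-limits requests)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def indexing_methods(xml_text):
    """Map each PMID to the IndexingMethod attribute of its MedlineCitation.

    ASSUMPTION: a missing attribute is reported as "Manual", based on our
    reading of the NLM Technical Bulletin; check current NLM documentation.
    """
    root = ET.fromstring(xml_text)
    methods = {}
    for citation in root.iter("MedlineCitation"):
        # findtext looks only at direct children, so PMIDs nested in
        # comment/correction links are not picked up by mistake.
        pmid = citation.findtext("PMID")
        methods[pmid] = citation.get("IndexingMethod", "Manual")
    return methods
```

For example, `indexing_methods(fetch_xml(efetch_url(["24410017", "11700088"])))` should return a dictionary mapping each PMID to "Automated", "Curated", or "Manual". Keeping the URL building, downloading, and parsing in separate functions makes it easy to check a saved XML file without hitting NCBI's servers again.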
What can we do about automatic indexing errors in PubMed?
While we can't fix automatic indexing errors in PubMed ourselves, or wholly prevent them from occurring, there are a few ways to ameliorate their effects.
Beyond conducting further studies on automatic indexing errors and reporting errors to the NLM HelpDesk, librarians can encourage researchers to use standardized language, including the exact MeSH terms they would like assigned to their records, in the titles and abstracts of their manuscripts. At the very least, this way the correct MeSH terms may appear alongside the erroneous ones.
Fellow librarians from the University of Iowa developed a checklist that has additional
tips for helping researchers reduce the effects of automatic indexing errors in
their own manuscripts [14].
Will automatic indexing algorithms ever compare apples to Apples with 100% accuracy? Who knows. But whether indexing errors cause you laughter or grief, they're likely to stick around for a while.
Works Cited / Fun Reading
1. Incorporating values for indexing method in MEDLINE/PubMed XML [Internet]. NLM Technical Bulletin; 2018. Available from: https://www.nlm.nih.gov/pubs/techbull/ja18/ja18_indexing_method.html#note
2. Mork J, Aronson A, Demner-Fushman D. 12 years on - Is the NLM medical text indexer still useful and relevant? J Biomed Semantics. 2017 Feb 23;8(1):8. Epub 20170223. DOI: 10.1186/s13326-017-0113-5. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5324252/pdf/13326_2017_Article_113.pdf.
3. Chen E, Bullard J, Giustini D. Automated indexing using NLM's Medical Text Indexer (MTI) compared to human indexing in Medline: a pilot study. J Med Libr Assoc. 2023 Jul 10;111(3):684–94. DOI: 10.5195/jmla.2023.1588. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10361558/pdf/jmla-111-3-684.pdf.
4. MEDLINE 2022 Initiative: Transition to Automated Indexing [Internet]. NLM Technical Bulletin; 2021. Available from: https://www.nlm.nih.gov/pubs/techbull/nd21/nd21_medline_2022.html
5. Rae AR, Pritchard DO, Mork JG, Demner-Fushman D. Automatic MeSH indexing: revisiting the subheading attachment problem. AMIA Annu Symp Proc. 2020;2020:1031–40. Epub 20210125. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8075546/pdf/139_3413087.pdf.
6. Mork J, Jimeno Yepes A, Aronson A. The NLM Medical Text Indexer System for indexing biomedical literature. 2013. Available from: https://lhncbc.nlm.nih.gov/ii/information/Papers/MTI_System_Description_Expanded_2013_Accessible.pdf.
7. Frequently Asked Questions about Indexing for MEDLINE [Internet]. National Library of Medicine; 2023. Available from: https://www.nlm.nih.gov/bsd/indexfaq.html.
8. Sticco A. NLM Office Hours: MEDLINE Indexing Update 2024. Available from: https://www.nlm.nih.gov/oet/ed/pubmed/02-24_oh_medline-automated-indexing.html.
9. Aronson AR, Mork JG, Gay CW, Humphrey SM, Rogers WJ. The NLM Indexing Initiative's Medical Text Indexer. Stud Health Technol Inform. 2004;107(Pt 1):268–72. Available from: https://ebooks.iospress.nl/pdf/doi/10.3233/978-1-60750-949-3-268.
10. Fernandez-Llimos F, Negrão LG, Bond C, Stewart D. Influence of automated indexing in Medical Subject Headings (MeSH) selection for pharmacy practice journals. Res Social Adm Pharm. 2024 Jun 12. Epub 20240612. DOI: 10.1016/j.sapharm.2024.06.003.
11. Moore DAQ, Yaqub O, Sampat BN. Manual versus machine: How accurately does the Medical Text Indexer (MTI) classify different document types into disease areas? PLoS One. 2024;19(3):e0297526. Epub 20240313. DOI: 10.1371/journal.pone.0297526. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10936797/pdf/pone.0297526.pdf.
12. Brief Communication – concerning algorithmic indexing in MEDLINE. Journal of EAHIL. 2024 03/17 [cited 2025/04/09];20(1):18–21. DOI: 10.32384/jeahil20604. Available from: https://doi.org/10.32384/jeahil20604.
13. Wilson P. Sometimes the apple does fall far from the tree: a case study on automatic indexing precision errors in PubMed. J Med Libr Assoc. 2025 Oct 23;113(4):318–26. DOI: 10.5195/jmla.2025.2110.
14. Allen C, Carol H, Riley S, Deberg J. Developing an Author Checklist to Improve Discovery of Published Articles in the Era of Algorithmic Indexing. 2025. Zenodo. Available from: https://doi.org/10.5281/zenodo.17282091
Thanks for reading!
