Development of a Natural Language Processing Engine to Generate Bladder Cancer Pathology Data for Health Services Research

Urology. 2017 Dec;110:84-91.

doi: 10.1016/j.urology.2017.07.056. Epub 2017 Sep 12.

Florian R Schroeck¹, Olga V Patterson², Patrick R Alba², Erik A Pattison³, John D Seigne⁴, Scott L DuVall², Douglas J Robertson⁵, Brenda Sirovich⁵, Philip P Goodney⁵

Affiliations

¹ VA Outcomes Group, White River Junction VA Medical Center, White River Junction, VT; Section of Urology, Dartmouth Hitchcock Medical Center, Lebanon, NH; Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, NH; The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Hanover, NH.
² Department of Internal Medicine, VA Salt Lake City Health Care System and University of Utah, Salt Lake City, UT.
³ VA Outcomes Group, White River Junction VA Medical Center, White River Junction, VT; Section of Urology, Dartmouth Hitchcock Medical Center, Lebanon, NH.
⁴ Section of Urology, Dartmouth Hitchcock Medical Center, Lebanon, NH; Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, NH.
⁵ VA Outcomes Group, White River Junction VA Medical Center, White River Junction, VT; The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Hanover, NH.

PMID: 28916254
PMCID: PMC5696035
DOI: 10.1016/j.urology.2017.07.056

Free PMC article

Florian R Schroeck et al. Urology. 2017 Dec.

Abstract

Objective: To take the first step toward assembling population-based cohorts of patients with bladder cancer with longitudinal pathology data, we developed and validated a natural language processing (NLP) engine that abstracts pathology data from full-text pathology reports.

Methods: Using 600 bladder pathology reports randomly selected from the Department of Veterans Affairs, we developed and validated an NLP engine to abstract data on histology, invasion (presence vs absence and depth), grade, the presence of muscularis propria, and the presence of carcinoma in situ. Our gold standard was based on an independent review of reports by 2 urologists, followed by adjudication. We assessed the NLP performance by calculating the accuracy, the positive predictive value, and the sensitivity. We subsequently applied the NLP engine to pathology reports from 10,725 patients with bladder cancer.
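The three performance measures named in the Methods can be illustrated with a minimal sketch for a single binary variable (for example, presence vs absence of carcinoma in situ). This is not the authors' code: the `Counts` class, the function names, and the toy labels are hypothetical and chosen only for illustration, and the multi-category variables in the study (such as depth of invasion) would need a per-category treatment that is not shown here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Counts:
    """Confusion counts for one binary variable (hypothetical helper)."""
    tp: int = 0  # gold standard positive, NLP positive
    fp: int = 0  # gold standard negative, NLP positive
    fn: int = 0  # gold standard positive, NLP negative
    tn: int = 0  # gold standard negative, NLP negative


def tally(gold: Sequence[bool], nlp: Sequence[bool]) -> Counts:
    """Compare NLP output to the gold standard, report by report."""
    if len(gold) != len(nlp):
        raise ValueError("gold and nlp must have the same length")
    counts = Counts()
    for g, p in zip(gold, nlp):
        if g and p:
            counts.tp += 1
        elif p:
            counts.fp += 1
        elif g:
            counts.fn += 1
        else:
            counts.tn += 1
    return counts


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return None instead of dividing by zero."""
    return numerator / denominator if denominator else None


def accuracy(c: Counts) -> Optional[float]:
    """Share of all reports on which NLP agrees with the gold standard."""
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)


def ppv(c: Counts) -> Optional[float]:
    """Positive predictive value: share of NLP positives that are correct."""
    return _ratio(c.tp, c.tp + c.fp)


def sensitivity(c: Counts) -> Optional[float]:
    """Share of gold standard positives that NLP identified."""
    return _ratio(c.tp, c.tp + c.fn)


if __name__ == "__main__":
    # Toy labels for five reports; these are NOT data from the study.
    gold_labels = [True, True, False, False, True]
    nlp_labels = [True, False, False, True, True]
    c = tally(gold_labels, nlp_labels)
    print(c, accuracy(c), ppv(c), sensitivity(c))
```

In this framing, the gold standard count for a category equals `tp + fn` and the NLP count equals `tp + fp`, which is one way to read a comparison of correctly classified reports against false negatives and false positives.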

Results: When comparing the NLP output to the gold standard, NLP achieved the highest accuracy (0.98) for the presence vs the absence of carcinoma in situ. Accuracy for histology, invasion (presence vs absence), grade, and the presence of muscularis propria ranged from 0.83 to 0.96. The most challenging variable was depth of invasion (accuracy 0.68), with an acceptable positive predictive value for lamina propria (0.82) and for muscularis propria (0.87) invasion. The validated engine was capable of abstracting pathologic characteristics for 99% of the patients with bladder cancer.

Conclusion: NLP had high accuracy for 5 of 6 variables and abstracted data for the vast majority of the patients. This now allows for the assembly of population-based cohorts with longitudinal pathology data.

Published by Elsevier Inc.

Conflict of interest statement

Conflicts of Interest: none

Figures

**Figure 1**
Documents classified correctly, as well as false positives and false negatives, among the 150 bladder cancer pathology reports included in the validation sample. Black bars indicate correctly identified reports, grey bars indicate NLP false negatives, and white bars indicate NLP false positives. Black and grey bars together represent the count based on the gold standard annotation.

Comment in

Editorial Comment.
Zeineh J, Donovan MJ. Urology. 2017 Dec;110:90-91. doi: 10.1016/j.urology.2017.07.057. Epub 2017 Oct 16. PMID: 29050642. No abstract available.

Grant support

U01 FD005478/FD/FDA HHS/United States
