ORIGINAL ARTICLE: Clinical Endoscopy

Natural language processing as an alternative to manual reporting of colonoscopy quality metrics

Gottumukkala S. Raju, MD, FASGE,1 Phillip J. Lum, BS,1 Rebecca S. Slack, MS,2 Selvi Thirumurthi, MD,1 Patrick M. Lynch, MD,1 Ethan Miller, MD,1 Brian R. Weston, MD,1 Marta L. Davila, MD, FASGE,1 Manoop S. Bhutani, MD, FASGE,1 Mehnaz A. Shafi, MD,1 Robert S. Bresalier, MD,1 Alexander A. Dekovich, MD,1 Jeffrey H. Lee, MD, MPH, FASGE,1 Sushovan Guha, MD, PhD,1 Mala Pande, PhD, MPH,1 Boris Blechacz, MD, PhD,1 Asif Rashid, MD, PhD,3 Mark Routbort, MD, PhD,4 Gladis Shuttlesworth, PhD, MBA,1 Lopa Mishra, MD,1 John R. Stroehlein, MD, FASGE,1 William A. Ross, MD, MBA, FASGE1

Houston, Texas, USA

Background and Aims: The adenoma detection rate (ADR) is a quality metric tied to the occurrence of interval colon cancer. However, manual extraction of data to calculate and track the ADR in clinical practice is labor-intensive. To overcome this difficulty, we developed a natural language processing (NLP) method to identify adenomas and sessile serrated adenomas (SSAs) in patients undergoing their first screening colonoscopy. We compared the NLP-generated results with those of manual data extraction to test the accuracy of NLP, and we report colonoscopy quality metrics obtained using NLP.

Methods: Identification of screening colonoscopies using NLP was compared with identification using the manual method for 12,748 patients who underwent colonoscopy from July 2010 to February 2013. Identification of adenomas and SSAs using NLP was also compared with that using the manual method in 2259 matched patient records. ADRs were generated for each physician using both methods.

Results: NLP correctly identified 91.3% of the screening examinations, whereas the manual method identified 87.8% of them. Both the manual method and NLP identified examinations of patients with adenomas and SSAs in the matched records almost perfectly. NLP and the manual method produced comparable ADRs for each endoscopist and for the group as a whole.

Conclusions: NLP can correctly identify screening colonoscopies, accurately identify adenomas and SSAs in a pathology database, and provide real-time quality metrics for colonoscopy. (Gastrointest Endosc 2015;82:512-9.)


Adenoma detection rate (ADR) is an important quality metric for colonoscopy performance.1 A recent study demonstrated an inverse association between the ADR and the subsequent risk of interval, advanced-stage interval, and fatal interval colorectal cancer.2 Currently, calculating the ADR and other colonoscopy quality metrics requires careful review of all electronic medical records and endoscopy and pathology reports to identify screening colonoscopies. Data must then be entered manually into a database. Manual reporting of ADRs is laborious, time-consuming, and resource-intensive, even for a small sample, when procedure reports are not highly structured.3 Reporting the quality metrics takes many hours of staff time.4 The number of colonoscopies performed in the United States is increasing (>14 million per year in 2012).5 At that volume, manual reporting of the ADR and other quality metrics is not sustainable. Endoscopy software has improved at providing colonoscopy technique-based quality metrics. However, most packages receive data from pathology software as PDF documents rather than as structured data, so they cannot calculate the ADR automatically (Table 1). A professional must therefore review the pathology data and enter them into the endoscopy software. Only then can quality metrics such as the ADR be calculated or uploaded to the GI Quality Improvement Consortium (GIQuIC) for reporting to Medicare.

512 GASTROINTESTINAL ENDOSCOPY Volume 82, No. 3 : 2015

www.giejournal.org

Raju et al

Natural language processing for colonoscopy quality metrics

TABLE 1. Current interaction of available endoscopy software with pathology databases*

| Endoscopy software | Vendor | Endoscopy–pathology interface |
|---|---|---|
| EndoWorks | Olympus | Can only interface with Caris Life Sciences pathology software |
| Provation MD Gastroenterology | Provation | Can receive PDF from pathology software |
| EndoPro | Pentax Medical | Can receive PDF from pathology software |
| Endosoft | Endosoft | Can receive PDF from pathology software |

*Currently available software packages do not allow pathology data to be merged with endoscopy data to report the adenoma detection rate automatically; they require manual entry.

Natural language processing (NLP) uses computer-based linguistics and artificial intelligence to identify and extract information from free-text data sources. Examples include progress notes, endoscopy procedure reports, laboratory test results, radiology reports, and pathology reports. NLP offers a way to extract data from unstructured procedure and pathology reports, and those data may suffice for producing colonoscopy quality metrics.6-9 We therefore examined the use of NLP in reporting colonoscopy quality metrics. We developed an NLP-based software program to identify asymptomatic, average-risk patients undergoing their first screening colonoscopy. The program also extracts information from their corresponding pathology reports, including the number and type of polyps (eg, adenomas, hyperplastic polyps). We then compared the results with those obtained using manual data extraction. Here we report the colonoscopy quality metrics for our group obtained using NLP.
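
The Python sketch below is a minimal illustration of this kind of extraction. It turns a free-text pathology diagnosis into yes/no fields for a few polyp types. It is not the CAADRR implementation, which was written in C#. Several assumptions apply:

- `POLYP_TERMS` and `abstract_polyp_terms` are hypothetical names.
- The term list is an illustrative subset of the pathology fields described in the Methods.
- Negated phrases such as "negative for adenoma" are not handled.

Longer terms are matched first and then blanked out. This stops, for example, "serrated adenoma" from being counted a second time inside "sessile serrated adenoma".

```python
import re

# Hypothetical subset of pathology terms. Each longer phrase is listed before
# any shorter phrase it contains (e.g., "sessile serrated adenoma" before
# "serrated adenoma").
POLYP_TERMS = [
    "traditional serrated adenoma",
    "sessile serrated adenoma",
    "tubulovillous adenoma",
    "tubular adenoma",
    "villous adenoma",
    "serrated adenoma",
    "hyperplastic polyp",
    "inflammatory polyp",
]


def abstract_polyp_terms(diagnosis_text: str) -> dict[str, bool]:
    """Map a free-text pathology diagnosis to one yes/no field per term."""
    # Normalize case and collapse runs of whitespace and line breaks.
    text = " ".join(diagnosis_text.lower().split())
    fields = {}
    for term in POLYP_TERMS:
        # Whole-word match that also accepts a plural "s".
        pattern = r"\b" + re.escape(term) + r"s?\b"
        found = re.search(pattern, text) is not None
        fields[term] = found
        if found:
            # Blank out the match so shorter terms cannot match inside it.
            text = re.sub(pattern, " ", text)
    return fields


report = ("Sigmoid colon, polyp: Tubulovillous adenoma. "
          "Ascending colon, polyp: sessile serrated adenoma; hyperplastic polyps.")
print({term: hit for term, hit in abstract_polyp_terms(report).items() if hit})
# prints {'sessile serrated adenoma': True, 'tubulovillous adenoma': True, 'hyperplastic polyp': True}
```

A production dictionary would need many more terms, synonyms, and negation rules. The ordering-plus-blanking approach is one simple way to avoid double counting nested phrases.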

METHODS

A computer application for ADR reporting (CAADRR) using NLP was developed as part of a quality improvement project. Its purpose was to replace the manual method of reporting the ADR. Three endoscopists manually reviewed and collected data on patients at The University of Texas MD Anderson Cancer Center who underwent screening colonoscopy. The MD Anderson Institutional Review Board approved this project.

CAADRR design
The CAADRR consists of 3 separate programs (Fig. 1):
1. data abstraction into a staging database, which holds the data from the different sources required to generate the ADR;
2. data processing, which uses NLP to extract data from paragraphs into structured fields and links colonoscopy reports with the correct pathology reports; and
3. data presentation and result reporting.

[Figure 1 (ADR Architecture) components: Endowriter, Colonoscopy Report, Pathology, Pathology Diagnosis, Electronic Medical Record (Gender, Race, Birthdate, Transcribed Documents, etc.), Data Transfer, Staging Database, Data Processing, Colonoscopy Report Structured Fields, Pathology Diagnosis Structured Fields, Reporting.]
Figure 1. Architecture of the computer application for adenoma detection rate reporting (CAADRR).

Data extraction. The data extraction program interfaces with external computer systems. It pulls data into the staging database according to the patient's medical record number (MRN). Data extraction consists of the 3 steps described below (Fig. 1).

Step 1: endoscopy data extraction. Our institution's endoscopy information system (EndoWorks 7; Olympus America, Center Valley, Pa) is coupled with a data warehousing platform. This platform allows a weekly data dump into a SQL Server database. The extraction system extracts all colonoscopy reports from this database into the staging database.

Step 2: demographic information and transcribed document extraction. Race, sex, and medical data are extracted from the transcribed records in the electronic medical record (Clinic Station, The University of Texas MD Anderson Cancer Center, Houston, Tex). They are loaded into the staging database according to the MRNs in the colonoscopy reports.

Step 3: pathology report extraction. All pathology reports matching the MRNs in the colonoscopy reports are extracted from the pathology database (Sunquest PowerPath; Sunquest Information Systems, Tucson, Ariz).

Data processing. The data processing program in the CAADRR uses NLP to convert paragraph text into structured fields in 7 steps:
1. Abstract key terms from the pathology reports into structured fields.
2. Match the MRNs on the pathology reports with the MRNs on the given colonoscopy reports.
3. Identify keywords in the pathology report to determine whether the report is linked with a colonoscopy report.
4. Abstract key terms for cecal intubation and preparation quality from the procedure report into structured fields.
5. Match the colonoscopy procedure date with either the collection date or the receiving date on the pathology report. If the procedure date matches neither, the program tries to match them within 5 days of the pathology report collection or receiving date.
6. Identify past colonoscopies and/or pathology reports in relation to the patient's current colonoscopy report. If a past colonoscopy pathology report is detected, it is noted in the previous colonoscopy field. The same process is applied if no past pathology reports are detected for the patient.

   The program then decides whether the colonoscopy report is a screening examination by looking for the keyword "screening" in the indication section of the report. Once an examination is identified as a screening examination, the next step is to eliminate nonscreening colonoscopy reports. The following patients are excluded:
   - patients not 50 to 75 years old, as indicated by the Endowriter;
   - patients with symptoms (eg, bleeding, abdominal pain, constipation, diarrhea, bowel obstruction);
   - patients with familial adenomatous polyposis, Peutz-Jeghers syndrome, Li-Fraumeni syndrome, Lynch syndrome, or serrated polyposis;
   - patients with a history of inflammatory bowel disease, as indicated in colonoscopy reports;
   - patients with a family history of colon cancer and polyps;
   - patients with a prior history of colon polyps;
   - patients with previous colonoscopy reports. These are found by looking for keywords in the colonoscopy reports (eg, "Last Colonoscopy x years ago," "1st Colonoscopy x years ago") and by searching the staging database for a past colonoscopy or pathology report.
7. Process the transcribed medical documents to determine whether any past colonoscopy procedures are mentioned in them. Any such document counts only if its date precedes the colonoscopy procedure date. This step is performed only if the examination is still considered a screening examination at this point.

Data presentation. The data presentation application in the CAADRR shows the raw data and NLP results on 1 tab and the resulting statistics on another. First, each colonoscopy report is shown as a row, along with its associated pathology report if one exists. Each row also has an extended field that indicates whether the report is for a screening colonoscopy and lists the keywords used to make that determination.

Second, each pathology report associated with a screening colonoscopy report contains the following abstracted pathology fields:
- tubular adenoma, tubulovillous adenoma, villous adenoma, adenomatous polyps, and mixed adenoma;
- sessile serrated adenoma (SSA), serrated adenoma, and traditional serrated adenoma;
- hyperplastic polyp in the right side of the colon, hyperplastic polyp in the left side of the colon, and hyperplastic polyp of unknown location;
- low-grade and high-grade dysplasia;
- cancer; and
- miscellaneous polyps (inflammatory polyps, lymphoid polyps, etc).

Whether an endoscopist identified 1 or more adenomas per patient is based on the pathology report for the specimens submitted in jars for analysis. Multiple adenomas were considered present only if adenomatous tissue was found in more than 1 submitted pathology jar.

Finally, the data processing results are grouped and fed into the statistical algorithms. The reporting application can give:
- the ADR;
- the adenoma burden (number of adenomas in a patient found to have an adenoma); and
- the number and percentage of SSAs, serrated lesions, adenomas, advanced adenomas, and polyps.

These data can be presented in many combinations. Examples include the whole group from the beginning to the present and by year, or each endoscopist in groups of 100 procedures. Either can be categorized by year, race, or age (≤60, 60-70, and ≥70 years).
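
The Python sketch below is a minimal illustration of the reporting statistics described above. It is not the CAADRR's C# code, and the following are assumptions:

- `ScreeningExam`, `adr_with_ci`, `multiple_adr`, `adenoma_burden`, and `adr_by_endoscopist` are hypothetical names.
- Each screening examination carries one adenoma flag per pathology jar. This mirrors the jar rule, so several adenomas in 1 jar count once.
- The adenoma burden is the mean adenoma count among adenoma-positive patients.
- The 95% CI is a normal-approximation (Wald) interval, because the article does not state which interval Table 5 uses. Output may therefore differ slightly from published CIs.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScreeningExam:
    """One screening colonoscopy (hypothetical record layout)."""
    endoscopist: str
    jar_has_adenoma: list[bool]  # one flag per pathology jar submitted


def adenoma_count(exam: ScreeningExam) -> int:
    # Jar rule: adenomatous tissue in one jar counts as a single adenoma,
    # so several adenomas submitted together are undercounted.
    return sum(exam.jar_has_adenoma)


def adr_with_ci(exams: list[ScreeningExam], z: float = 1.96) -> tuple[float, float, float]:
    """ADR in percent with a Wald (normal-approximation) confidence interval."""
    n = len(exams)
    if n == 0:
        raise ValueError("no screening examinations")
    p = sum(adenoma_count(e) > 0 for e in exams) / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return 100 * p, 100 * max(0.0, p - half_width), 100 * min(1.0, p + half_width)


def multiple_adr(exams: list[ScreeningExam]) -> float:
    """Percent of examinations with adenomatous tissue in more than 1 jar."""
    return 100 * sum(adenoma_count(e) > 1 for e in exams) / len(exams)


def adenoma_burden(exams: list[ScreeningExam]) -> float:
    """Mean adenoma count among patients with at least 1 adenoma."""
    counts = [adenoma_count(e) for e in exams if adenoma_count(e) > 0]
    return sum(counts) / len(counts) if counts else 0.0


def adr_by_endoscopist(exams: list[ScreeningExam]) -> dict[str, tuple[float, float, float]]:
    """Group examinations by endoscopist and report each ADR with its CI."""
    groups = defaultdict(list)
    for exam in exams:
        groups[exam.endoscopist].append(exam)
    return {name: adr_with_ci(group) for name, group in sorted(groups.items())}


# Toy data: 2 of 4 examinations have an adenoma; 1 has adenomas in 2 jars.
exams = [
    ScreeningExam("A", [True, False]),
    ScreeningExam("A", [True, True]),
    ScreeningExam("A", [False]),
    ScreeningExam("B", []),
]
print(adr_with_ci(exams), multiple_adr(exams), adenoma_burden(exams))
```

Grouping by other keys (year, race, or age band) works the same way as `adr_by_endoscopist`. With small per-physician counts, the interval widens sharply, which is the volume effect noted in the Methods.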

Development of the CAADRR
The entire CAADRR project was developed using the C# programming language and the Visual Studio Ultimate software program (2012 edition; Microsoft, Redmond, Wash). C# is a modern object-oriented programming language with a rich set of string manipulation functions built into its string type. These functions served as the building blocks for our NLP algorithms.
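
String operations of this kind are enough for a keyword-based screening filter. The Python sketch below, rather than C#, only approximates the screening logic described under Data processing. Its assumptions are:

- `EXCLUSION_TERMS` and `classify_screening` are hypothetical names.
- The exclusion dictionary is an abbreviated, illustrative list, not the CAADRR's actual keyword set.
- Matching is plain case-insensitive substring search with no negation handling, so "no bleeding" would still exclude an examination.
- The search of the staging database for earlier examinations is reduced to a boolean argument.

The function returns the matched keywords. This mirrors the data presentation field that records which keywords drove the decision.

```python
# Hypothetical, abbreviated exclusion dictionary modeled on the criteria in the
# Methods; the CAADRR's real keyword lists are not published in the article.
EXCLUSION_TERMS = {
    "symptoms": ["bleeding", "abdominal pain", "constipation", "diarrhea", "obstruction"],
    "hereditary syndrome": [
        "familial adenomatous polyposis", "peutz-jeghers", "li-fraumeni",
        "lynch syndrome", "serrated polyposis",
    ],
    "inflammatory bowel disease": ["inflammatory bowel disease", "ulcerative colitis", "crohn"],
    "family history": ["family history"],
    "prior polyps": ["history of colon polyps", "history of polyps"],
    "prior colonoscopy": ["last colonoscopy", "1st colonoscopy"],
}


def classify_screening(indication: str, age: int,
                       prior_exam_on_file: bool) -> tuple[bool, list[str]]:
    """Return (is_screening, reasons) using plain substring matching."""
    text = indication.lower()
    if "screening" not in text:
        return False, ["no 'screening' keyword"]
    reasons = []
    if not 50 <= age <= 75:
        reasons.append("age outside 50-75")
    for category, terms in EXCLUSION_TERMS.items():
        hits = [term for term in terms if term in text]
        if hits:
            reasons.append(f"{category}: {', '.join(hits)}")
    if prior_exam_on_file:
        # Stands in for the staging-database search for earlier colonoscopy
        # or pathology reports.
        reasons.append("prior colonoscopy or pathology report on file")
    # An examination stays "screening" only if no exclusion reason was found.
    return not reasons, reasons


print(classify_screening("Screening; family history of colon cancer", 60, False))
```

The manual review in this study deliberately counted some referrals as screening that a strict keyword filter like this would reject. An example is heme-positive stool in an otherwise asymptomatic patient. Keyword rules and review criteria therefore need to be aligned before their results are compared.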

Application Development Shortcuts
The ADR reporting application is built on an advanced data grid (XtraGrid) purchased from a third-party software vendor (DevExpress, Glendale, Calif). The grid has built-in functions such as:
- wildcard searching of free text;
- filtering on any field;
- grouping of data; and
- exporting of data to an Excel spreadsheet (Microsoft).

The code algorithms used to generate the statistics in the application were taken from the Remondo website (http://www.remondo.net/category/algorithm/).

CAADRR Development Cycle
The CAADRR development cycle consisted of 5 periods:
1. Two weeks for replicating the Olympus data warehouse tables, the pathology tables, and the transcribed documents and demographic data from the electronic medical record into the staging database
2. Two weeks for developing and testing an application to transfer data from the various source databases to the staging database
3. Six weeks for developing the NLP algorithms. These process the pathology and colonoscopy reports, link pathology reports with current colonoscopy reports, and identify preceding colonoscopy and pathology reports to confirm that a report identified as a screening colonoscopy is not actually a surveillance report
4. Two weeks for developing a reporting application with statistical output
5. Nine weeks for testing and refining the NLP algorithms

CAADRR Testing
CAADRR testing covered 15,471 patients with 19,331 colonoscopy reports from June 19, 2009 to February 28, 2013. These patients had 158,645 pathology reports (endoscopy, surgical pathology, and cytology reports) recorded in our pathology database. The staging database also included 32,450 colonoscopy reports (November 1998 to June 2009) exported from our retired EndoPRO database (Pentax Medical, Montvale, NJ), which were used to identify patients with prior colonoscopies.

The CAADRR was tested in 3 phases:

- Phase 1. NLP data were organized in a linear fashion, with 1 row per colonoscopy examination record (indications, findings, etc). Each row also held the full text and the abstracted pathology data. This layout allowed us to check thousands of records easily to see whether the NLP engine was functioning correctly. An iterative process was used to fine-tune the NLP algorithms and collect the keyword dictionary search terms. After each run of the data processing program, we filtered, sorted, and searched the data grid for key terms to check whether the NLP engine produced the desired results. New keyword definitions or features were then added to the NLP engine until it did.
- Phase 2. NLP results were compared with a manually abstracted data set from a previous study of 343 patients.3 The comparison checked the adenoma, serrated adenoma, advanced adenoma, and left-sided hyperplastic polyp detection rates. The accuracy of parsing the data from the pathology reports was verified. The program failed to link 3 examinations with pathology reports.
- Phase 3. NLP results were compared with those of a separate manual data extraction of 12,748 colonoscopy procedure reports from July 1, 2010 to February 28, 2013, to identify patients undergoing screening colonoscopy. In addition, MRN-matched records from manual and NLP processing were used to test the NLP program's ability to report the ADR.

In the manual review, all colonoscopy reports were evaluated to determine whether the examination was an initial screening examination. The review covered the endoscopy report, the reason for referral to endoscopy, the pathology report, and past records from both MD Anderson and outside facilities. This labor-intensive approach was adopted because endoscopists used the term "screening" in the colonoscopy template inconsistently.

The intent was to generate ADR data for as many providers as possible, with a minimum of 90 screening examinations per provider during the study period. Earlier work has shown that low procedural volumes produce wide ADR confidence intervals, which complicates interpretation.10 The aim was therefore to capture as many examinations as possible that could be considered screening. For example, the manual review counted as screening a referral for heme-positive stool or rectal bleeding in an otherwise asymptomatic 66-year-old with no prior colonoscopy.

Statistics
The numbers of screening examinations are reported for both manual data extraction and NLP. Manual data extraction is the standard practice of collecting data retrospectively from medical records. In our analysis, the total number of patients having colonoscopy examinations was defined as the number of examinations extracted using NLP. All examinations identified using both methods were then used to compare the methods' accuracy. When both methods agreed, the relevant information was assumed to be true.

When the manually extracted and NLP-extracted data disagreed, an experienced gastroenterologist re-examined the patient's medical record to resolve the disagreement. This reviewer was not involved in the initial manual review and knew the results of both the manual and NLP reviews. The agreements and resolved disagreements make up the criterion standard (called "reviewed") for determining the accuracy of each measure, including whether an examination is a screening examination.

The analysis then proceeded in 3 steps:
1. The numbers and percentages of correct (and incorrect) screening identifications were tabulated.
2. The ADR and SSA detection rate were assessed in all screens to determine whether NLP could identify these findings accurately.
3. The ADR in screening visits was calculated separately by each method, as if it were going to be reported without knowledge of the other method, to establish whether NLP is a reasonable approach for routine ADR reporting.

TABLE 2. Manual and NLP-based extraction of screening information for all patients undergoing colonoscopy examinations (N = 12,748)

| Screening status | Manual extraction* | NLP* | Reviewed |
|---|---|---|---|
| Screening† | 2259 | 2169 | 2288 |
| Correctly identified | 2009 (87.8) | 2089 (91.3) | 2288 |
| Incorrectly identified | 279 (12.2) | 199 (8.7) | 0 |
| Not screening† | 10,489 | 10,579 | 10,460 |
| Correctly identified | 10,210 (97.6) | 10,380 (99.2) | 10,460 |
| Incorrectly identified | 250 (2.4) | 80 (0.8) | 0 |

Values in parentheses are percents. NLP, Natural language processing.
*Percentages are based on the totals in the Reviewed column.
†For manually extracted data, the intention was to obtain all screening examinations. Therefore, if the examination was included in the data set, it was called "screening"; if not included, it was called "not screening."

TABLE 3. Manual and NLP-based identification of adenomas

| Adenoma diagnosis | Manual extraction* | NLP* | Reviewed |
|---|---|---|---|
| Called adenoma† | 951 | 962 | 966 |
| Correctly identified | 950 (98.3) | 960 (99.4) | 966 |
| Incorrectly identified | 16 (1.7) | 6 (0.6) | 0 |
| Called not adenoma† | 1308 | 1297 | 1293 |
| Correctly identified | 1292 (99.9) | 1291 (99.8) | 1293 |
| Incorrectly identified | 1 (0.1) | 2 (0.2) | 0 |

Values in parentheses are percents. NLP, Natural language processing.
*Percentages are based on totals in the Reviewed column.
†Adenoma identified for all matched data regardless of screening status, to assess the ability to accurately determine the presence of adenoma.
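
The Statistics section tallies, for each method, how many reviewed-positive and reviewed-negative examinations it classified correctly. Our reading of Tables 2 through 4, which the printed totals are consistent with, is as follows:

- "Incorrectly identified" under the positive row counts reviewed-positive examinations the method missed.
- "Incorrectly identified" under the negative row counts reviewed-negative examinations the method called positive.

The minimal Python sketch below recomputes Table 2's "called" totals and percentages from those 4 counts. The `Agreement` class and its field names are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    """Counts for one method against the reviewed criterion standard (hypothetical)."""
    true_pos: int   # reviewed-positive, called positive ("Correctly identified")
    false_neg: int  # reviewed-positive, missed ("Incorrectly identified")
    true_neg: int   # reviewed-negative, called negative
    false_pos: int  # reviewed-negative, called positive

    @property
    def called_positive(self) -> int:
        return self.true_pos + self.false_pos

    @property
    def called_negative(self) -> int:
        return self.true_neg + self.false_neg

    def pct_positive_correct(self) -> float:
        # Denominator is the Reviewed positive total, as in the table footnote.
        return round(100 * self.true_pos / (self.true_pos + self.false_neg), 1)

    def pct_negative_correct(self) -> float:
        # Denominator is the Reviewed negative total.
        return round(100 * self.true_neg / (self.true_neg + self.false_pos), 1)


# Counts read from Table 2 (screening status).
manual = Agreement(true_pos=2009, false_neg=279, true_neg=10210, false_pos=250)
nlp = Agreement(true_pos=2089, false_neg=199, true_neg=10380, false_pos=80)

for name, m in (("Manual", manual), ("NLP", nlp)):
    print(name, m.called_positive, m.called_negative,
          m.pct_positive_correct(), m.pct_negative_correct())
# Manual 2259 10489 87.8 97.6
# NLP 2169 10579 91.3 99.2
```

The same class reproduces Table 3 from its counts. One example is manual adenoma identification: `Agreement(950, 16, 1292, 1)` gives 951 called adenoma, 1308 called not adenoma, and 98.3% correct. Table 4 prints some percentages truncated rather than rounded (2065 of 2066 appears as 99.9), so exact equality with this sketch is not expected there.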

RESULTS

Identification of patients who underwent screening colonoscopy
A total of 12,748 patients underwent colonoscopy examinations from July 1, 2010 to February 28, 2013. Table 2 shows that 2259 of these examinations were initially identified as screening examinations using the manual method, and 2169 using the automated method. After differences between the manual and automated methods were resolved, there were 2288 true screening examinations and 10,460 examinations that did not meet the screening criteria. The manual data extraction method identified 2009 (87.8%) of the true screening examinations, whereas NLP identified 2089 (91.3%) of them.

Reporting of colonoscopy quality metrics
Both the manual data extraction method and NLP identified adenomas correctly in nearly every case (Table 3). Both methods also identified SSAs correctly in almost all cases, although NLP performed slightly better (Table 4). Table 5 presents the ADRs for each method, using the number of screens detected by that method. These are the ADRs that would be reported if each method were the only one used.

Across the male, female, and combined estimates, NLP differed from review by more than 3 percentage points in 4 cases, compared with 15 cases for the manual method. The maximum ADR difference between NLP and review was 7 percentage points, for physician 2. This physician's female ADR was under-reported by 7 points and male ADR was over-reported by 7 points. For manual collection, the maximum ADR difference from review was 6 percentage points. It occurred twice, both for males (physicians 5 and 10). Overall, NLP and the manual method always produced an ADR within the confidence interval of the reviewed ADR. There was no consistent tendency for NLP to over- or under-report the ADR in routine reporting.

DISCUSSION

Our study demonstrates that NLP can accurately identify patients who undergo screening colonoscopy and identify the pathology of colon polyps. It can thereby generate reliable ADR values for quality assurance. NLP also makes it easier to report the ADR stratified by age, sex, and race, over time, and by additional variables. It does so more rapidly and with less effort than the manual data extraction method.

Currently, ADR is calculated using manual methods. These are laborious and cumbersome because reports are often unstructured and written in free-form text. In addition, some endoscopists use the term "screening" liberally. They apply it to colon polyp surveillance examinations, examinations of patients with colonic symptoms, examinations of patients with family histories of colon cancer, and other examinations with a high likelihood of finding colon polyps. These reports require an additional manual search of the electronic medical records to determine whether the examination was truly a screening examination.

Manual data extraction requires dedicated personnel who know endoscopy and pathology terminology to enter the data. This is time-consuming and costly over the long term. In our study, endoscopists extracting records for internal audit spent approximately 5 minutes per record entry after reviewing the endoscopy
and pathology records as well as past medical records. Because of these challenges, some centers have resorted to calculating the polypectomy rate from administrative claims data. In preliminary reports the polypectomy rate has proven to be an accurate surrogate for the ADR, and it may become an important quality measure for external and internal use.11 However, the endoscopy community perceives the polypectomy rate as open to misuse: endoscopists could remove normal tissue or nonneoplastic polyps and report these resections as polypectomies. This problem does not arise with the ADR, which can be reported using NLP, as shown in our study and by others.12,13 Admittedly, the ADR can be influenced by examinations misclassified as screening examinations. We therefore made an intense effort to identify all screening examinations accurately, not just those noted in the procedure indication. To do so, we reviewed past pathology, endoscopy, and electronic medical records.

NLP accurately identifies the pathology of polyps, as demonstrated in our study and by Imler et al.14 It can also identify the locations, sizes, and numbers of adenomas and hyperplastic polyps when these data are included in either the pathology or the endoscopy report. This allows detailed reporting of colonoscopy quality metrics, such as the ADR, serrated lesion detection rate, multiple ADR, and adenoma burden per patient.

NLP-based measurement of the ADR and other colonoscopy quality metrics has several benefits. For example, NLP can report the ADR and various quality metrics in different groups according to age, sex, and race. These subgroups matter when comparing quality metrics across endoscopists, practices, and regions of the country, because patient demographics and evolving standards of practice have a bearing on the ADR.15-17 NLP can also search pathology databases, transcribed medical documents, and endoscopy databases for prior examinations. This lets it separate screening examinations from surveillance examinations when reporting quality metrics.

Once the NLP is set up, it can review tens of thousands of records quickly and provide accurate reports. In our study, reviewing and recording the data for 12,748 patients at 5 minutes per record entry took 1062 hours, or 26.5 weeks of full-time labor. A study this large would be prohibitively expensive with a manual data extraction method over the long term and on a national scale.

The design of the CAADRR as 3 separate programs has several benefits. Data extraction and data processing are time-consuming tasks, and both can run during off hours so that the system performs efficiently. The design also accommodates changes in electronic medical records, endoscopy report-writing software, and hospital computer systems. Because the data extraction program is independent, such changes have little effect on generating the quality metrics. In addition, the staging database has all the fields necessary for generating quality metrics. The program can therefore keep running even if the hospital computer system changes. The only part of the CAADRR that would need to change is how the data extraction program reads the endoscopy, pathology, and electronic medical record systems and places the data in the correct fields of the staging database.

In our study, NLP provided information as reliably and accurately as the manual method of extracting data. This NLP method could therefore be used to report quality metrics to the Centers for Medicare & Medicaid Services for national benchmarking.

However, our NLP method has some limitations. We used data from 1 academic institution with 1 particular type of endoscopy reporting system, 1 electronic medical record system, and 1 pathology management software system.
It has yet to be tested in terms of its ability to link with The GI Quality Improvement Consortium. The program has not been tested to extract data from scanned documents. It is not set up to include patients undergoing colonoscopy at an outside facility, especially when there is no entry about the procedure in the transcribed documents or in the endoscopy report. The program also depends on accurate data entry regarding indications to correctly identify patients who have undergone true screening examinations because discrepancies exist among NLP, the manual data extraction method, and the reviewed information regarding the screening examinations. The reviewed data are considered our criterion standard because screening examinations and adenomas may be missed by both the manual method and NLP. In the absence of a better method, we have no means of identifying those screening examinations without an additional manual audit of all colonoscopies called no screening or with no adenoma or SSA detected. Our comparison of NLP with a careful manual data collection procedure consistent with standard

www.giejournal.org

Volume 82, No. 3 : 2015 GASTROINTESTINAL ENDOSCOPY 517

TABLE 4. Manual and NLP-based identification of SSAs

SSA diagnosis Called SSAy Correctly identified Incorrectly identified Called not SSAy Correctly identified Incorrectly identified

Manual extraction*

NLP*

Reviewed

190

194

193

186 (96.4)

193 (100)

193

7 (3.6)

0

0

2069

2065

2066

2062 (99.8)

2065 (99.9)

2066

4 (0.2)

1 (0.0)

0

Values in parentheses are percents. SSA, Serrated sessile adenomas; NLP, natural language processing. *Percentages are based on totals in the Reviewed column. ySSA is identified for all matched data regardless of screening status to identify the ability to accurately determine the presence of SSA.

Natural language processing for colonoscopy quality metrics

Raju et al

TABLE 5. ADR by physician and gender for screening visits as identified by the same method

| Physician | Gender | Manual N | Manual ADR, % (95% CI) | NLP N | NLP ADR, % (95% CI) | Reviewed N | Reviewed ADR, % (95% CI) |
|---|---|---|---|---|---|---|---|
| All | All | 2259 | 42 (40, 44) | 2169 | 43 (41, 45) | 2288 | 43 (41, 45) |
| | F | 1453 | 36 (34, 39) | 1348 | 37 (34, 39) | 1439 | 37 (34, 39) |
| | M | 806 | 52 (49, 56) | 821 | 54 (50, 57) | 849 | 54 (51, 58) |
| 1 | All | 142 | 50 (42, 58) | 138 | 49 (41, 58) | 149 | 51 (43, 59) |
| | F | 88 | 43 (33, 54) | 85 | 46 (35, 57) | 93 | 45 (35, 55) |
| | M | 54 | 61 (48, 75) | 53 | 55 (41, 69) | 56 | 61 (48, 74) |
| 2 | All | 63 | 30 (19, 42) | 45 | 29 (15, 43) | 51 | 29 (16, 42) |
| | F | 39 | 28 (13, 43) | 26 | 19 (3, 35) | 31 | 26 (9, 42) |
| | M | 24 | 33 (13, 54) | 19 | 42 (18, 67) | 20 | 35 (12, 58) |
| 3 | All | 131 | 47 (38, 55) | 119 | 41 (32, 50) | 123 | 44 (35, 53) |
| | F | 93 | 39 (29, 49) | 85 | 32 (22, 42) | 85 | 34 (24, 44) |
| | M | 38 | 66 (50, 82) | 34 | 65 (48, 82) | 38 | 66 (50, 82) |
| 4 | All | 231 | 29 (24, 35) | 228 | 30 (24, 36) | 224 | 32 (26, 38) |
| | F | 146 | 23 (16, 30) | 140 | 24 (16, 31) | 140 | 25 (18, 32) |
| | M | 85 | 40 (29, 51) | 88 | 41 (30, 51) | 84 | 43 (32, 54) |
| 5 | All | 152 | 38 (30, 46) | 116 | 43 (34, 52) | 149 | 41 (33, 49) |
| | F | 97 | 29 (20, 38) | 71 | 31 (20, 42) | 95 | 29 (20, 39) |
| | M | 55 | 55 (41, 68) | 45 | 62 (47, 77) | 54 | 61 (48, 75) |
| 6 | All | 255 | 37 (31, 43) | 224 | 36 (30, 43) | 243 | 36 (30, 42) |
| | F | 161 | 28 (21, 35) | 136 | 26 (18, 33) | 151 | 25 (18, 32) |
| | M | 94 | 53 (43, 63) | 88 | 52 (42, 63) | 92 | 54 (44, 65) |
| 7 | All | 203 | 49 (42, 56) | 197 | 52 (45, 59) | 203 | 51 (44, 58) |
| | F | 130 | 44 (35, 52) | 122 | 46 (37, 55) | 128 | 45 (36, 53) |
| | M | 73 | 59 (47, 70) | 75 | 63 (51, 74) | 75 | 63 (51, 74) |
| 8 | All | 252 | 60 (54, 66) | 234 | 62 (55, 68) | 256 | 61 (55, 67) |
| | F | 158 | 58 (50, 66) | 142 | 59 (51, 67) | 156 | 59 (51, 67) |
| | M | 94 | 64 (54, 74) | 92 | 65 (55, 75) | 100 | 64 (54, 74) |
| 9 | All | 185 | 55 (47, 62) | 187 | 52 (45, 60) | 197 | 52 (45, 59) |
| | F | 122 | 48 (39, 57) | 124 | 46 (37, 55) | 130 | 46 (37, 55) |
| | M | 63 | 68 (56, 80) | 63 | 65 (53, 77) | 67 | 64 (52, 76) |
| 10 | All | 377 | 33 (29, 38) | 388 | 37 (32, 42) | 406 | 37 (32, 42) |
| | F | 254 | 31 (25, 37) | 249 | 33 (27, 39) | 265 | 33 (28, 39) |
| | M | 123 | 38 (30, 47) | 139 | 45 (36, 53) | 141 | 44 (36, 52) |
| 11 | All | 78 | 21 (11, 30) | 76 | 22 (13, 32) | 75 | 21 (12, 31) |
| | F | 49 | 12 (3, 22) | 47 | 17 (6, 28) | 46 | 15 (4, 26) |
| | M | 29 | 34 (16, 53) | 29 | 31 (13, 49) | 29 | 31 (13, 49) |
| 12 | All | 190 | 44 (37, 51) | 217 | 46 (39, 52) | 212 | 46 (39, 53) |
| | F | 116 | 40 (31, 49) | 121 | 39 (30, 48) | 119 | 39 (30, 48) |
| | M | 74 | 51 (40, 63) | 96 | 54 (44, 64) | 93 | 55 (45, 65) |

ADR, Adenoma detection rate; NLP, natural language processing.

practice is a reasonable approach. Another limitation of our study is the potential for underreporting of adenoma burden per patient, because the system cannot determine how many adenomas a patient had when multiple adenomas are submitted in a single jar.
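The jar limitation can be illustrated with a short Python sketch. It assumes a hypothetical specimen record `(patient_id, jar_id, n_adenomas_in_jar)`; the true per-jar count is included only to show the gap, since a parser that reads one diagnosis per jar can tell only whether the jar is adenoma-positive. Counting positive jars therefore gives a lower bound on each patient's adenoma burden, whereas the per-patient flag used for the ADR (at least one adenoma) is the same either way. The function name and data are illustrative, not from the study.

```python
from collections import defaultdict


def adenoma_counts(specimens):
    """Compare true adenoma counts with counts of adenoma-positive jars.

    specimens: iterable of (patient_id, jar_id, n_adenomas_in_jar) tuples
    (hypothetical layout). Returns two dicts keyed by patient_id:
    the true adenoma count and the jar-level count.
    """
    true_counts = defaultdict(int)
    jar_counts = defaultdict(int)
    for patient_id, _jar_id, n_adenomas in specimens:
        true_counts[patient_id] += n_adenomas
        # A jar-level reading only knows the jar is adenoma-positive,
        # so a jar holding several adenomas contributes just 1.
        jar_counts[patient_id] += 1 if n_adenomas > 0 else 0
    return dict(true_counts), dict(jar_counts)


# Hypothetical data: patient "A" has 3 adenomas pooled in one jar plus
# 1 adenoma in a second jar; patient "B" has one adenoma-free jar.
specimens = [("A", 1, 3), ("A", 2, 1), ("B", 1, 0)]
true_counts, jar_counts = adenoma_counts(specimens)
print(true_counts)  # {'A': 4, 'B': 0}
print(jar_counts)   # {'A': 2, 'B': 0}
```

In this example patient A's burden is undercounted (2 instead of 4), but both views agree that A, and not B, had at least one adenoma, so a per-patient metric such as the ADR is not affected; burden-type metrics such as adenomas per colonoscopy would be.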

In conclusion, the present study demonstrated the value of NLP in identifying patients who underwent screening colonoscopy, in correctly identifying adenomas and serrated adenomas, and in reporting, in real time, the performance of colonoscopists according to the types of patients they serve. The program needs to be explored further with respect to its use with different types of reporting systems and its ability to sync with registries such as the GI Quality Improvement Consortium. Such programs could substantially reduce the burden on practitioners of reporting quality metrics, allowing the metrics to be reported in a timely, accurate, and low-cost manner and freeing up practitioners' time to care for patients and address other regulatory issues.
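The conclusion describes two steps: flagging adenomas and SSAs in pathology text, and reporting the ADR (the proportion of screening colonoscopies with at least one adenoma) per colonoscopist according to patient type, as laid out in Table 5. The following minimal Python sketch illustrates that flow under stated assumptions. It is not the NLP program evaluated in this article: the regular expressions, the `(physician_id, gender, pathology_text)` record layout, the `include_ssa` switch, and all function names are hypothetical, and the 95% CIs use a normal-approximation (Wald) interval because this section does not state how the intervals in Table 5 were computed.

```python
import math
import re
from collections import defaultdict

# Hypothetical, deliberately naive patterns for illustration only. They ignore
# negation ("negative for adenoma"), terms such as "adenomatous polyp", and
# multi-part reports, all of which a production clinical NLP pipeline must handle.
CONVENTIONAL_ADENOMA = re.compile(r"\b(tubular|tubulovillous|villous)\s+adenoma", re.IGNORECASE)
SESSILE_SERRATED = re.compile(r"\bsessile\s+serrated\s+(adenoma|polyp)", re.IGNORECASE)


def has_adenoma(pathology_text: str, include_ssa: bool = False) -> bool:
    """True if the report mentions a conventional adenoma (or an SSA when include_ssa is set)."""
    if CONVENTIONAL_ADENOMA.search(pathology_text):
        return True
    return include_ssa and bool(SESSILE_SERRATED.search(pathology_text))


def wald_ci(k: int, n: int, z: float = 1.96):
    """Normal-approximation (Wald) interval for the proportion k/n, clipped to [0, 1]."""
    p = k / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def adr_report(visits, include_ssa: bool = False):
    """ADR per (physician, gender) cell, plus 'All' roll-ups.

    visits: iterable of (physician_id, gender, pathology_text), one per screening
    colonoscopy. Returns {(physician_id, gender): (n_exams, adr, ci_low, ci_high)}.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [exams, exams with >= 1 adenoma]
    for physician, gender, text in visits:
        positive = has_adenoma(text, include_ssa)
        # Each exam counts toward its own cell and the three roll-up cells.
        for cell in ((physician, gender), (physician, "All"), ("All", gender), ("All", "All")):
            counts[cell][0] += 1
            counts[cell][1] += int(positive)
    return {cell: (n, k / n, *wald_ci(k, n)) for cell, (n, k) in counts.items()}


# Hypothetical records (not study data).
visits = [
    ("1", "F", "Ascending colon, polypectomy: tubular adenoma."),
    ("1", "M", "Sigmoid colon, polypectomy: hyperplastic polyp."),
    ("2", "F", "Cecum, polypectomy: sessile serrated adenoma."),
]
for cell, (n, adr, lo, hi) in sorted(adr_report(visits).items()):
    print(cell, n, f"{adr:.0%} ({lo:.0%}, {hi:.0%})")
```

As a rough consistency check, a group of 2259 examinations with an ADR of about 42% gives an interval of roughly (40%, 44%) under this approximation, matching the pooled row of Table 5; some small-group intervals in the table are slightly wider than a Wald interval would give, so the authors may have used a different method. Whether SSAs count toward the ADR is left as an explicit switch rather than assumed.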


REFERENCES

1. Calderwood AH, Jacobson BC. Colonoscopy quality: metrics and implementation. Gastroenterol Clin North Am 2013;42:599-618.
2. Corley DA, Jensen CD, Marks AR, et al. Adenoma detection rate and risk of colorectal cancer and death. N Engl J Med 2014;370:1298-306.
3. Raju GS, Vadyala V, Slack R, et al. Adenoma detection in patients undergoing a comprehensive colonoscopy screening. Cancer Med 2013;2:391-402.
4. Ross WA, Thirumurthi S, Lynch PM, et al. Detection rates of premalignant polyps during screening colonoscopy: time to revise quality standards? Gastrointest Endosc 2015;81:567-74.
5. Seeff LC, Richards TB, Shapiro JA, et al. How many endoscopies are performed for colorectal cancer screening? Results from CDC’s survey of endoscopic capacity. Gastroenterology 2004;127:1670-7.
6. Ohno-Machado L. What’s new in informatics. J Am Med Inform Assoc 2011;18:1.
7. Ohno-Machado L. Realizing the full potential of electronic health records: the role of natural language processing. J Am Med Inform Assoc 2011;18:539.
8. Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc 2011;18:544-51.
9. Hou JK, Imler TD, Imperiale TF. Current and future applications of natural language processing in the field of digestive diseases. Clin Gastroenterol Hepatol 2014;12:1257-61.
10. Do A, Weinberg J, Kakkar A, et al. Reliability of adenoma detection rate is based on procedural volume. Gastrointest Endosc 2013;77:376-80.
11. Patel NC, Islam RS, Wu Q, et al. Measurement of polypectomy rate by using administrative claims data with validation against the adenoma detection rate. Gastrointest Endosc 2013;77:390-4.
12. Mehrotra A, Dellon ES, Schoen RE, et al. Applying a natural language processing tool to electronic health records to assess performance on colonoscopy quality measures. Gastrointest Endosc 2012;75:1233-9.
13. Deutsch JC. Colonoscopy quality, quality measures, and a natural language processing tool for electronic health records. Gastrointest Endosc 2012;75:1240-2.
14. Imler TD, Morea J, Kahi C, et al. Natural language processing accurately categorizes findings from colonoscopy and pathology reports. Clin Gastroenterol Hepatol 2013;11:689-94.
15. Kahi CJ, Vemulapalli KC, Johnson CS, et al. Improving measurement of the adenoma detection rate and adenoma per colonoscopy quality metric: the Indiana University experience. Gastrointest Endosc 2014;79:448-54.
16. Hernandez LV, Deas TM, Catalano MF, et al. Longitudinal assessment of colonoscopy quality indicators: a report from the Gastroenterology Practice Management Group. Gastrointest Endosc 2014;80:835-41.
17. Gawron AJ, Thompson WK, Keswani RN, et al. Anatomic and advanced adenoma detection rates as quality metrics determined via natural language processing. Am J Gastroenterol 2014;109:1844-9.

Abbreviations: ADR, adenoma detection rate; CAADRR, computer application for ADR reporting; MRN, medical record number; NLP, natural language processing; SSA, sessile serrated adenoma.

DISCLOSURE: All authors received research support for this study from the National Institutes of Health/National Cancer Institute under award number P30CA016672 (and K07CA160753 to M. Pande) and used the Biostatistics Shared Resource. All authors disclosed no financial relationships relevant to this publication.

See CME section; p. 557.

Copyright © 2015 by the American Society for Gastrointestinal Endoscopy
0016-5107/$36.00
http://dx.doi.org/10.1016/j.gie.2015.01.049

Received October 21, 2014. Accepted January 25, 2015.

Current affiliations: Department of Gastroenterology, Hepatology and Nutrition (1), Department of Biostatistics (2), Department of Pathology (3), Department of Hematopathology (4), The University of Texas MD Anderson Cancer Center, Houston, Texas, USA.

Reprint requests: G. S. Raju, MD, 1515 Holcombe Blvd, Unit Number 1466, Houston, TX 77030-4009.

If you would like to chat with an author of this article, you may contact Dr Raju at [email protected].

