Medical Teacher 2014; 36: 68–72

Automatic scoring of medical students' clinical notes to monitor learning in the workplace

ANDERSON SPICKARD III, HEATHER RIDINGER, JESSE WRENN, NATHAN O'BRIEN, ADAM SHPIGEL, MICHAEL WOLF, GLENN STEIN & JOSHUA DENNY

Vanderbilt University School of Medicine, USA

Correspondence: Anderson Spickard III, MD, MS, 7040 Medical Center East, Nashville, TN 37232, USA. Tel: 615 936 3177; fax: 615 936 3218; email: [email protected]

Abstract

Background: Educators need efficient and effective means to track students' clinical experiences and monitor their progress toward competency goals. Aim: To validate an electronic scoring system that rates medical students' clinical notes for relevance to priority topics of the medical school curriculum. Method: The Vanderbilt School of Medicine Core Clinical Curriculum enumerates 25 core clinical problems (CCPs) that graduating medical students must understand. Medical students upload clinical notes pertinent to each CCP to a web-based dashboard, but criteria for determining the relevance of a note and consistent uploading practices by students are lacking. The Vanderbilt Learning Portfolio (VLP) system automates both tasks by rating relevance for each CCP and uploading the note to the student's electronic dashboard. We validated this electronic scoring system by comparing the relevance of 265 clinical notes written by third year medical students to each of the 25 core patient problems as scored by VLP versus an expert panel of raters. Results: For 16 of the 25 clinical problems, we established the threshold score that yielded 75% positive prediction of relevance relative to expert opinion. Discussion: Automated scoring of students' clinical notes provides a novel, efficient and standardized means of tracking students' progress toward institutional competency goals.

Background

Practice points

- More and more, students write clinical notes of their patient encounters into the electronic medical record.
- We validate a computer tool that accurately locates priority concepts of the medical student curriculum within students' clinical notes.
- Automated extraction and cataloging of the content of students' clinical notes provides an efficient, comprehensive and real-time report of students' experiences and learning in the workplace.

Medical educators are responsible for providing rich clinical experiences that enable students to demonstrate clinical competence (Goroll et al. 2004; Corbett & Whitcomb 2012). Agencies such as the Liaison Committee on Medical Education (LCME) accredit institutions based on prescribed medical student involvement and mandate detailed accounting of clinical experiences (Functions and Structure of a Medical School 2012). Current tracking methods typically rely on student-created manual logs of patient encounters (Bird et al. 2001). A learner records or uploads into a web-based software tool a log of patient encounters including demographic data, location, diagnosis and severity of illness. The data can be compiled to assess a learner's clinical exposures as well as the overall case mix for students in the program. These methods require much manual data entry and can lead to incomplete capture. The typical system allows a limited number of diagnoses and does not capture diagnoses learners considered but ruled out. Teachers and learners often disagree about the teaching points of a particular case, which can result in inaccurate accounting of case mix (Rattner et al. 2001; Carney et al. 2002). Increasingly, medical students use the electronic medical record to write notes on their patients (Hammoud et al. 2012). Consequently, educators are developing strategies to analyze these notes to monitor students' learning (Elliot et al. 2011; Stephens et al. 2011). Such monitoring can be labor-intensive, but automatic tracking of students' clinical notes has the potential to provide a more efficient and robust means of authenticating learning from patient encounters.

The Vanderbilt Learning Portfolio (VLP) application is a novel web-based system that provides automatic display and computational analysis of students' clinical notes to document progress toward learning goals of the medical school. VLP links to the electronic medical record such that it automatically captures all trainees' (medical students' and residents') clinical notes (Spickard et al. 2008). Appropriate mentors are provided access to students' portfolios and are notified when a new note is posted. Medical students meet regularly with mentors to monitor their progress in seeing patients with the core clinical problems and understanding each case. They review their notes displayed on VLP, discuss lessons learned and review personal learning goals for future patient encounters. The system embeds natural language processing software to identify all biomedical concepts in each note. We have previously shown that we could develop algorithms to identify broad topics of interest in students' clinical notes (e.g. "chest pain", "fever" – each of which involves other related concepts encompassing associated symptoms, evaluations and treatments) (Denny et al. 2009). This functionality allows students to search their clinical notes in VLP for these specific topics; the system scores each note with a numerical relevance score based on the algorithm provided. In this study, we developed and evaluated specific search algorithms to meet our institutional goals of comprehensive, yet efficient tracking of student clinical experiences, specifically targeting 25 key patient complaints that students are to master.

Methods

Setting and participants

The Vanderbilt Core Clinical Curriculum enumerates 25 Core Clinical Problems (CCPs) to be mastered by graduating medical students (Vanderbilt Core Clinical Curriculum 2012). The CCPs range from serious illnesses to everyday problems (Figure 1). For each CCP, a team of clinical and educational experts created a set of learning objectives, differential diagnoses and appropriate diagnostic work-up for medical students to consider when encountering a patient with one of these complaints. During inpatient and outpatient rotations, students are assigned to evaluate patients with these presenting problems. Medical students perform a history and physical exam, help the team manage the patient's care and document their work in daily clinical notes submitted to the electronic medical record.

One means of monitoring student progress in the core clinical curriculum is for faculty members to analyze students' clinical notes. Clinical and educational experts devised a list of 50–150 relevant concepts for each CCP that serve as keywords and phrases that, in aggregate, can be used to search students' notes for relevance to any particular CCP. To derive this list of concepts for each CCP, the team first searched the curriculum management system in use at Vanderbilt to locate key concepts in core lectures on each CCP. The curriculum management system employs the same natural language processing system in use by VLP to automatically map uploaded documents to standard biomedical concepts with high sensitivity and positive predictive value (Denny et al. 2003). The team met to review the concepts for each CCP and, through an iterative process of adding and deleting concepts, arrived at a list of essential concepts for each CCP. These essential concepts include specific history items, physical exam findings, differential diagnoses and appropriate diagnostic work-up that should be learned. For example, the CCP "chest pain" includes the representative concepts "angina", "chest wall tenderness", "peptic ulcer disease" and "electrocardiogram". We employ the natural language processing tools integrated within VLP to find these essential concepts of each CCP in medical students' notes, and the system produces an automatic numeric score for each clinical note.
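The paper relies on the KnowledgeMap natural language processing tools to map note text to biomedical concepts, so the matching step is far richer than simple phrase lookup. As a much-simplified illustration only, the Python sketch below checks a note against a hypothetical excerpt of an essential-concept list; the CCP_CONCEPTS dictionary, the sample note and the function name are assumptions for illustration, not the actual VLP implementation.

```python
import re
from typing import Dict, List, Set

# Hypothetical excerpt of the expert-curated essential-concept lists.
# The real lists contain 50-150 concepts per core clinical problem (CCP)
# and are matched by a full NLP system, not simple phrase matching.
CCP_CONCEPTS: Dict[str, Set[str]] = {
    "chest pain": {"angina", "chest wall tenderness", "peptic ulcer disease",
                   "electrocardiogram"},
    "shortness of breath": {"dyspnea", "pleuritic discomfort", "pneumonia",
                            "chest x-ray"},
}

def find_concept_mentions(note_text: str, concepts: Set[str]) -> List[str]:
    """Return the essential concepts whose phrases appear in the note."""
    text = note_text.lower()
    return sorted(c for c in concepts
                  if re.search(r"\b" + re.escape(c) + r"\b", text))

# Usage: count how many "chest pain" concepts a student's note covers.
# Pertinent negatives ("no chest wall tenderness") still count as coverage.
note = "Pt with substernal angina, no chest wall tenderness. Electrocardiogram obtained."
print(find_concept_mentions(note, CCP_CONCEPTS["chest pain"]))
# ['angina', 'chest wall tenderness', 'electrocardiogram']
```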

Figure 1. Screenshot of a medical student’s dashboard of notes that have been assigned to the 25 Vanderbilt Core Clinical Curriculum topics. This screenshot demonstrates counts assigned from both manual and automated methods.

69

A. Spickard III et al.

Program description

At our institution, medical students write notes into the electronic medical record on all patients they evaluate in the inpatient and outpatient rotations during their clerkships. VLP captures these clinical notes from the electronic medical record and displays them on a web-based portal that is viewable by students and clinical instructors. Students can access the VLP portal separately from the electronic medical record to view all their clinical notes (history and physicals, progress notes and procedures), receive electronic feedback from mentors and search their notes for each CCP (Spickard et al. 2008). Students and clinical instructors are able to search a student's notes using the set of essential concepts devised for each CCP. The system ranks each of the student's notes by the count of concepts matching each CCP, weighted by the section of the note. Extra weight is given to matching concepts in more relevant sections of the note, such as "history of present illness" and "assessment and plan", than in less relevant sections, such as "review of systems", based on a prior evaluation of 10 CCPs with physician-generated gold standards (Denny et al. 2009). VLP then provides an overall numeric score of relevance for that particular CCP and displays the notes ranked from most to least relevant. Medical students use this tool to search their notes for each CCP and record the relevant notes they find to a dashboard on VLP. The dashboard displays the number of notes assigned to each CCP (Figure 1). In this way, students and mentors can track clinical experiences and assess progress according to the 25 CCP topics.
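The exact weighting scheme is described only qualitatively here (the published weights come from the prior evaluation in Denny et al. 2009). The following minimal Python sketch illustrates the section-weighted counting and ranking the paragraph describes; the section names, weight values and function names are illustrative assumptions rather than VLP's actual parameters.

```python
from typing import Dict, List, Set

# Illustrative section weights: concepts found in the HPI or assessment/plan
# count more toward relevance than concepts found in the review of systems.
SECTION_WEIGHTS = {
    "history of present illness": 2.0,
    "assessment and plan": 2.0,
    "review of systems": 0.5,
}
DEFAULT_WEIGHT = 1.0

def relevance_score(note_sections: Dict[str, str],
                    ccp_concepts: Set[str]) -> float:
    """Weighted count of essential-concept mentions across the note's sections."""
    score = 0.0
    for section, text in note_sections.items():
        weight = SECTION_WEIGHTS.get(section, DEFAULT_WEIGHT)
        lowered = text.lower()
        score += weight * sum(1 for concept in ccp_concepts if concept in lowered)
    return score

def rank_notes(notes: List[Dict[str, str]], ccp_concepts: Set[str]) -> List[int]:
    """Indices of a student's notes ordered from most to least relevant for one CCP."""
    scores = [relevance_score(note, ccp_concepts) for note in notes]
    return sorted(range(len(notes)), key=lambda i: scores[i], reverse=True)
```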

Program evaluation

Although students can easily review VLP to assign their notes to CCPs for an accounting of personal clinical experiences, not all students regularly comply with this responsibility. In addition, there are no clear criteria to direct students about what should "count" as a note relevant to a CCP. We sought to design a system that would automatically determine which notes contained adequate coverage of the essential concepts for each CCP and then populate students' dashboards with the clinical notes relevant to each CCP. To explore this possibility, we first established a team consisting of two board-certified attending physicians (AS III, JD), a senior internal medicine resident (HR) and three fourth year medical students (NO, AS, MW). They rated 265 randomly selected clinical notes written by 90 third year medical students on their core clerkships as being primarily relevant, partially relevant or irrelevant to each of the 25 CCPs. The team met to review sample notes and discuss criteria for relevance. It was determined that a note should be considered "primarily relevant" to a CCP if that presenting problem was the main reason the patient was being evaluated. Likewise, a note could be considered "partially relevant" if the CCP was one of many aspects of the case. For example, a note documenting a patient with an evolving myocardial infarction would be scored as primarily relevant for the CCP "chest pain". If this same patient also had documented concerns for alcohol withdrawal, then the note could be marked as partially relevant to the CCP "substance abuse".

With these definitions, each reviewer separately evaluated the same collection of 15 notes to establish inter-rater reliability. Once inter-rater reliability was established on this subset of notes, each of the five experts then reviewed 50 notes individually (a total of 250 documents). These were combined with the initial 15 documents for a total of 265 documents, which served as the gold standard set of notes for this study. This study was approved by the Institutional Review Board.

We conducted a VLP search of the gold standard collection of 265 student notes using the set of essential concepts for each CCP. We then compared the numeric relevance scores for each CCP with the expert panel's ratings of relevance. We analyzed the data treating notes rated as primarily relevant or partially relevant as true positives. Using receiver operating characteristic (ROC) analysis, the area under the curve (AUC) established the accuracy of the VLP scoring system in relation to expert opinion. We then calculated positive predictive values for each of the numeric VLP relevance scores, which enabled us to determine the cut-off score that yielded 75% positive prediction of relevance with regard to the expert panel's ratings.
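The analysis code itself is not published. A minimal Python sketch of the ROC and threshold calculation, assuming scikit-learn is available and that, for a single CCP, `scores` holds the VLP relevance score of each gold-standard note while `relevant` holds the expert panel's binary rating (all names here are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def threshold_for_ppv(scores: np.ndarray, relevant: np.ndarray,
                      target_ppv: float = 0.75):
    """Find the lowest score cut-off whose positive predictive value meets the target.

    Returns (cutoff, ppv, auc) for one CCP, or None if no cut-off reaches the target.
    """
    auc = roc_auc_score(relevant, scores)  # accuracy of VLP scores vs. expert opinion
    for cutoff in np.unique(scores):       # candidate thresholds, lowest first
        predicted_positive = scores >= cutoff
        if predicted_positive.sum() == 0:
            continue
        ppv = relevant[predicted_positive].mean()
        if ppv >= target_ppv:
            return cutoff, ppv, auc
    return None

# Hypothetical usage for one CCP:
# scores = np.array([...])    # VLP relevance score of each gold-standard note
# relevant = np.array([...])  # 1 = rated primarily/partially relevant, 0 = irrelevant
# print(threshold_for_ppv(scores, relevant))
```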

Results

Inter-rater reliability among the panel of experts was good (kappa = 0.57). Table 1 shows the area under the receiver operating characteristic curve (AUC) for the 16 CCPs whose threshold scores yielded a 75% or higher positive predictive value. Nine CCPs were not considered for automated scoring due to inadequate performance. Eight CCPs (abnormal vaginal bleeding, abnormal vaginal discharge, back pain, breast complaints, pelvic pain, seizure, sore throat and trauma) were removed due to inadequate numbers of relevant documents written by students in the gold standard set of notes. The CCP "substance abuse" was removed due to relatively poor algorithm performance (AUC 0.71, CI 0.62–0.79).

Table 1. The threshold score provided by Learning Portfolio and area under the curve (AUC) that yielded 75% or higher positive predictive relevance to each core clinical problem.

| Core clinical problem | Automated threshold score | AUC (CI)* | Positive predictive value (%) |
|---|---|---|---|
| Abdominal pain | 19 | 0.80 (0.74–0.86) | 74.7 |
| Abnormal mood | 21 | 0.74 (0.66–0.82) | 75.0 |
| Altered mental status | 15 | 0.75 (0.68–0.82) | 81.0 |
| Chest pain | 25 | 0.90 (0.87–0.94) | 75.2 |
| Cough | 23 | 0.87 (0.82–0.92) | 74.6 |
| Dysuria | 9 | 0.76 (0.69–0.83) | 80.0 |
| Fever | 16 | 0.86 (0.81–0.90) | 75.3 |
| GI bleeding | 35 | 0.88 (0.84–0.93) | 75.6 |
| Headache | 31 | 0.83 (0.77–0.88) | 75.0 |
| Jaundice | 16 | 0.90 (0.82–0.97) | 75.9 |
| Loss of consciousness | 13 | 0.81 (0.74–0.89) | 78.6 |
| Obesity | 16 | 0.88 (0.81–0.95) | 83.3 |
| Rash | 15 | 0.80 (0.73–0.88) | 76.9 |
| Shock | 21 | 0.85 (0.78–0.91) | 76.5 |
| Shortness of breath | 8 | 0.88 (0.84–0.92) | 74.8 |
| Unexplained weight loss | 11 | 0.77 (0.69–0.84) | 75.0 |

*Area under the receiver operating characteristic curve and confidence interval.


Discussion

The VLP system advances traditional clinical tracking systems by providing comprehensive, real-time capture of clinical notes, coupled with automated assessment tools. This study demonstrates that one can use natural language processing of students' clinical notes to automatically categorize them according to priority education topics. This provides educators with a means of comprehensive clinical experience tracking for their medical students. For example, a VLP report on "altered mental status" for a student may reveal that the student routinely documents adequate history-taking for a case of "altered mental status", but regularly omits a complete neurological or mental status exam, which we consider to be an essential feature of the evaluation. The portfolio system captures and elucidates these important opportunities for correction, giving the student time to practice and document comprehensive examinations. Additionally, educators can look across all medical students' notes to see trends for the group.

VLP is a robust electronic system. Traditional, manual logs may not represent all that a student experienced. For example, using a manual tracking system, a student may log a patient who presents with unstable angina as a case of chest pain. Yet the medical student's admission note may also document pertinent negative features of the case that rule out other potential causes of chest pain, such as pneumonia (e.g. no cough, fever, pleuritic discomfort or abnormal chest X-ray) or peptic ulcer disease (e.g. no bloating, hematemesis, history of nonsteroidal anti-inflammatory drug (NSAID) use or laboratory findings of anemia). In this example, VLP is capable of finding all these concepts in the note and scoring the student's work highly not only for "chest pain", but also for essential concepts of the CCPs "shortness of breath" and "abdominal pain". Other investigators have attempted to collect similar information about residents' clinic experiences using billing codes to find diagnoses. Their approach was limited to specific ICD-9 codes, thus only a few diagnoses were identified (Sequist et al. 2005).

Several limitations caution interpretation of these findings. The VLP relevance scoring system "worked" for most topics in this study, but not for topics with fewer relevant notes, such as "breast complaints", "vaginal bleeding" and "trauma". This can be explained by the fact that this study mostly consisted of inpatient notes of third year clinical clerks, and "breast complaints" and "vaginal bleeding" are primarily outpatient topics. In addition, a limited number of our medical students rotate on the trauma service, thus fewer notes would address this problem. The evaluation of "substance abuse" was inadequate, not because of a paucity of notes, but due to excess false positives. The system identified the concept "smoking" throughout many notes, scoring them heavily for "substance abuse", when in fact the note usually concerned a cardiac or pulmonary condition related to the patient's history of smoking. Because the learning objectives of "substance abuse" focus on alcohol and drug abuse, omission of "smoking" from the concept search terms may yield a more accurate algorithm for "substance abuse". Our score thresholds were designed to target a positive predictive value of 75%. This number was chosen empirically as an acceptable false positive rate when considered across the student population, but has not been experimentally evaluated as an educationally meaningful threshold. Nevertheless, a uniform threshold applied across a student population allows a "fair" means of crediting students for their work by standardizing the grading of notes for all students. Another potential limitation is that the validity of the scoring system depends on the gold standard set of notes, which was assessed by group consensus. Finally, this work is not generalizable to other institutions in which students do not write notes in an electronic medical record as a normal course of action on the inpatient and outpatient rotations. Nevertheless, the system could potentially integrate with any electronic medical record, provided the electronic medical record could export the notes externally.

Since completion of this study, we created and implemented infrastructure within VLP to automatically attribute credit to medical students for notes that surpass the score thresholds for these 16 CCPs (a hypothetical sketch of this thresholding step follows at the end of this section). Students' dashboards are "filling up" with notes that reflect their vigorous clinical experiences, without manual uploading effort by students. Nevertheless, students remain responsible for monitoring their dashboards. The automatic uploading of notes augments but does not replace students uploading notes manually to their dashboards. Mentors continue to meet with students to review these dashboards and plan future clinical experiences that enable them to complete core curriculum goals.

At the conclusion of the clinical year, all students at our institution are graded on their work-up of multiple standardized patients who present with one of the CCPs. More evidence is needed to confirm the link between a student's rich clinical experience and overall performance (Williams & Dunnington 2006; Wimmers et al. 2006). We plan to analyze the correlation between each student's end-of-the-year work-ups of standardized patients and the student's experience in encountering patients with the same problems, as documented by their detailed clinical notes.
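As a concrete illustration of the automated crediting described above, the sketch below applies per-CCP thresholds to per-note relevance scores. The two threshold values are taken from Table 1; the data structures and function name are hypothetical, not VLP's actual interface.

```python
from typing import Dict, List

# Thresholds for two CCPs taken from Table 1; the remaining CCPs follow the same pattern.
CCP_THRESHOLDS = {"chest pain": 25, "fever": 16}

def credit_notes(note_scores: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Assign each note to every CCP whose relevance score meets that CCP's threshold.

    `note_scores` maps a note identifier to its per-CCP relevance scores;
    the result maps each CCP to the note identifiers credited to it.
    """
    dashboard: Dict[str, List[str]] = {ccp: [] for ccp in CCP_THRESHOLDS}
    for note_id, scores in note_scores.items():
        for ccp, threshold in CCP_THRESHOLDS.items():
            if scores.get(ccp, 0.0) >= threshold:
                dashboard[ccp].append(note_id)
    return dashboard

# Example: one admission note scores 31 for "chest pain" and 12 for "fever".
print(credit_notes({"note-001": {"chest pain": 31, "fever": 12}}))
# {'chest pain': ['note-001'], 'fever': []}
```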

Conclusion

In the setting of demanding clinical environments and multiple caregivers, it is easy to overlook students' hard work and contribution to personal progress. Educators are now calling for more medical student involvement with the electronic medical record as a means of developing and demonstrating skills for clinical practice (Hammoud et al. 2012). Automatic tracking and scoring of electronic clinical notes, combined with appropriate mentoring, is a novel method that complements traditional monitoring and evaluation of medical students' learning in the workplace.

Declaration of interest: The authors declare no conflicts of interest with respect to the authorship and/or publication of this article.

Notes on Contributors

ANDERSON SPICKARD III, MD, MS, is an Associate Professor, Vanderbilt School of Medicine, Departments of Medicine and Biomedical Informatics.

HEATHER RIDINGER, MD, is an Assistant Professor, Vanderbilt School of Medicine, Department of Medicine.


JESSE WRENN, PhD, is a Senior Data Management Contractor, Vanderbilt School of Medicine, Department of Biomedical Informatics.

NATHAN O'BRIEN, MD, is a Physician Resident, University of Utah School of Medicine, Department of Emergency Medicine.

ADAM SHPIGEL, MD, is a Physician Resident, Johns Hopkins School of Medicine, Department of Medicine.

MICHAEL WOLF, MD, is a Physician Resident, Vanderbilt School of Medicine, Department of Pediatrics.

GLENN STEIN is a Senior Programmer, Vanderbilt School of Medicine, Department of Biomedical Informatics.

JOSHUA C. DENNY, MD, MS, is an Associate Professor, Vanderbilt School of Medicine, Departments of Medicine and Biomedical Informatics.

References

Bird SB, Zarum RS, Renzi FP. 2001. Emergency medicine resident patient care documentation using a hand-held computerized device. Acad Emerg Med 8(12):1200–1203.

Carney PA, Pipas CF, Eliassen MS, Mengshol SC, Fall LH, Schifferdecker EK, Olson AL, Peltier DA, Nierenberg DW. 2002. An analysis of students' clinical experiences in an integrated primary care clerkship. Acad Med 77(7):681–687.

Corbett EC, Whitcomb M. 2012. The AAMC project on the clinical education of medical students: Clinical skills education. [Retrieved 23 June 2012] Available from: http://www.im.org/AAIM/PublicPolicy/Docs/Clinical SKILLs.pdf

Denny JC, Bastarache L, Sastre EA, Spickard III A. 2009. Tracking medical students' clinical experiences using natural language processing. J Biomed Inform 42(5):781–789.

Denny JC, Smithers JD, Miller RA, Spickard III A. 2003. Understanding medical school curriculum content using KnowledgeMap. J Am Med Inform Assoc 10(4):351–362.


Elliot K, Judd T, McColl G. 2011. A student-centred electronic health record system for clinical education. Stud Health Technol Inform 168:57–64.

Functions and Structure of a Medical School. 2012. Standards for accreditation of medical education programs leading to the M.D. degree. Liaison Committee on Medical Education. [Accessed 23 June 2012] Available from: http://www.lcme.org/functions.pdf

Goroll AH, Sirio C, Duffy FD, LeBlond RF, Alguire P, Blackwell TA, Rodak WE, Nasca T. 2004. A new model for accreditation of residency programs in internal medicine. Ann Intern Med 140(11):902–909.

Hammoud MM, Dalrymple JL, Christner JG, Stewart RA, Fisher J, Margo K, Ali II, Briscoe GW, Pangaro LN. 2012. Medical student documentation in electronic health records: A collaborative statement from the Alliance for Clinical Education. Teach Learn Med 24(3):257–266.

Rattner SL, Louis DZ, Rabinowitz C, Gottlieb JE, Nasca TJ, Markham FW, Gottlieb RP, Caruso JW, Lane JL, Veloski J, et al. 2001. Documenting and comparing medical students' clinical experiences. JAMA 286(9):1035–1040.

Sequist TD, Singh S, Pereira AG, Rusinak D, Pearson SD. 2005. Use of an electronic medical record to profile the continuity clinic experiences of primary care residents. Acad Med 80(4):390–394.

Spickard III A, Gigante J, Stein G, Denny JC. 2008. Automatic capture of student notes to augment mentor feedback and student performance on patient write-ups. J Gen Intern Med 23(7):979–984.

Stephens MB, Gimbel RW, Pangaro L. 2011. The RIME/EMR scheme: An educational approach to clinical documentation in electronic medical records. Acad Med 86(1):11–14.

Vanderbilt Core Clinical Curriculum. 2012. [Accessed 16 December 2012] Available from: https://knowledgemap.mc.vanderbilt.edu/vc3

Williams RG, Dunnington GL. 2006. Assessing the ACGME competencies with methods that improve the quality of evidence and adequacy of sampling. ACGME Bull 2:39–42.

Wimmers PF, Schmidt HG, Splinter TA. 2006. Influence of clerkship experiences on clinical competence. Med Educ 40(5):450–458.
