Development and Validation of a Machine Learning Model for Automated Assessment of Resident Clinical Reasoning Documentation
BACKGROUND:Residents receive infrequent feedback on their clinical reasoning (CR) documentation. While machine learning (ML) and natural language processing (NLP) have been used to assess CR documentation in standardized cases, no studies have described similar use in the clinical environment. OBJECTIVE:Using Kane's framework, the authors developed and validated an ML model for automated assessment of CR documentation quality in residents' admission notes. DESIGN, PARTICIPANTS, MAIN MEASURES:Internal medicine residents' and subspecialty fellows' admission notes at one medical center from July 2014 to March 2020 were extracted from the electronic health record. Using a validated CR documentation rubric, the authors rated 414 notes for the ML development dataset. Notes were truncated to isolate the relevant portion; NLP software (cTAKES) extracted disease/disorder named entities, and human review generated CR terms. The final model had three input variables and classified notes as demonstrating low- or high-quality CR documentation. The ML model was applied to a retrospective dataset (9591 notes) for human validation and data analysis. Reliability between human and ML ratings was assessed on 205 of these notes with Cohen's kappa. CR documentation quality by post-graduate year (PGY) was evaluated with the Mantel-Haenszel test of trend. KEY RESULTS:The top-performing logistic regression model had an area under the receiver operating characteristic curve of 0.88, a positive predictive value of 0.68, and an accuracy of 0.79. Cohen's kappa was 0.67. Of the 9591 notes, 31.1% demonstrated high-quality CR documentation; quality increased from 27.0% (PGY1) to 31.0% (PGY2) to 39.0% (PGY3) (p < .001 for trend). Validity evidence was collected in each domain of Kane's framework (scoring, generalization, extrapolation, and implications).
CONCLUSIONS:The authors developed and validated a high-performing ML model that classifies CR documentation quality in resident admission notes in the clinical environment, a novel application of ML and NLP with many potential use cases.
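The abstract above reports agreement between ML and human ratings as Cohen's kappa (0.67). As a minimal illustration of how that chance-corrected statistic is computed on binary note ratings (the function name and sample labels are illustrative, not from the study):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two binary raters."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of notes where the two labels match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's marginals.
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

Kappa of 1.0 indicates perfect agreement, 0 indicates chance-level agreement; values in the 0.61-0.80 range, as reported here, are conventionally read as "substantial" agreement.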
Development of a Clinical Reasoning Documentation Assessment Tool for Resident and Fellow Admission Notes: a Shared Mental Model for Feedback
BACKGROUND:Residents and fellows receive little feedback on their clinical reasoning documentation. Barriers include lack of a shared mental model and variability in the reliability and validity of existing assessment tools. Of the existing tools, the IDEA assessment tool includes a robust assessment of clinical reasoning documentation focusing on four elements (interpretive summary, differential diagnosis, explanation of reasoning for the lead diagnosis, and explanation of alternative diagnoses) but lacks descriptive anchors, threatening its reliability. OBJECTIVE:Our goal was to develop a valid and reliable assessment tool for clinical reasoning documentation building on the IDEA assessment tool. DESIGN, PARTICIPANTS, AND MAIN MEASURES:The Revised-IDEA assessment tool was developed by four clinician educators through iterative review of admission notes written by medicine residents and fellows and subsequently piloted with additional faculty to ensure response process validity. A random sample of 252 notes from July 2014 to June 2017 written by 30 trainees across several chief complaints was rated. Three raters rated 20% of the notes to demonstrate internal structure validity. A quality cut-off score was determined using Hofstee standard setting. KEY RESULTS:The Revised-IDEA assessment tool includes the same four domains as the IDEA assessment tool with more detailed descriptive prompts, new Likert scale anchors, and a score range of 0-10. Intraclass correlation was high for the notes rated by three raters, 0.84 (95% CI 0.74-0.90). Scores ≥6 were determined to demonstrate high-quality clinical reasoning documentation. Only 53% of notes (134/252) were high-quality. CONCLUSIONS:The Revised-IDEA assessment tool is reliable and easy to use for feedback on clinical reasoning documentation in resident and fellow admission notes, with descriptive anchors that facilitate a shared mental model for feedback.
Lessons in clinical reasoning – pitfalls, myths and pearls: a case of recurrent pancreatitis
OBJECTIVES:Cognitive biases can result in clinical reasoning failures that can lead to diagnostic errors. Autobrewery syndrome is a rare, but likely underdiagnosed, condition in which gut flora ferment glucose, producing ethanol. It most frequently presents with unexplained episodes of inebriation, though more case studies are necessary to better characterize the syndrome. CASE PRESENTATION:This is a case of a 41-year-old male with a past medical history notable only for frequent sinus infections, who presented with recurrent episodes of acute pancreatitis. In the week prior to his first episode of pancreatitis, he consumed four beers, an increase from his baseline of 1-2 drinks per month. At home, he had several episodes of confusion, which he attributed to fatigue. He underwent laparoscopic cholecystectomy and testing for genetic and autoimmune causes of pancreatitis, which were non-revealing. He was hospitalized 10 more times over a 9-month period for acute pancreatitis with elevated transaminases. During these admissions, he had elevated triglycerides requiring an insulin drip and an elevated alcohol level despite abstaining from alcohol for the prior eight months. His alcohol level increased after consumption of complex carbohydrates, confirming the diagnosis of autobrewery syndrome. CONCLUSIONS:Through integrated commentary on the diagnostic reasoning process, this case underscores how overconfidence can lead to premature closure and anchoring, resulting in diagnostic error. Using a metacognitive overview, case discussants describe the importance of structured reflection and a standardized approach to early hypothesis generation to navigate these cognitive biases.
Hickam's dictum, Occam's razor, and Crabtree's bludgeon: a case of renal failure and a clavicular mass
OBJECTIVES:Our discussant's thoughtful consideration of the patient's case allows for review of three maxims of medicine: Occam's razor (the simplest diagnosis is the most likely to be correct), Hickam's dictum (multiple disease entities are more likely than one), and Crabtree's bludgeon (the tendency to make data fit an explanation we hold dear). CASE PRESENTATION:A 66-year-old woman with a history of hypertension presented to our hospital one day after arrival to the United States from Guinea with chronic daily vomiting, unintentional weight loss, and progressive shoulder pain. Her labs were notable for renal failure, nephrotic-range proteinuria, and normocytic anemia, while her shoulder X-ray showed osseous resorption in the lateral right clavicle. Multiple myeloma became the team's working diagnosis; however, a subsequent shoulder biopsy was consistent with follicular thyroid carcinoma. Imaging suggested the patient's renal failure was more likely a result of a chronic, unrelated process. CONCLUSIONS:It is tempting to bludgeon diagnostic possibilities into Occam's razor. Presumption that a patient's signs and symptoms are connected by one disease process often puts us at a cognitive advantage. However, atypical presentations, multiple disease processes, and unique populations often lend themselves more to Hickam's dictum than to Occam's razor. Diagnostic aids include performing a metacognitive checklist, engaging analytic thinking, and acknowledging the imperfections of these axioms.
An artificial intelligence system for predicting the deterioration of COVID-19 patients in the emergency department
During the coronavirus disease 2019 (COVID-19) pandemic, rapid and accurate triage of patients at the emergency department is critical to inform decision-making. We propose a data-driven approach for automatic prediction of deterioration risk using a deep neural network that learns from chest X-ray images and a gradient boosting model that learns from routine clinical variables. Our AI prognosis system, trained using data from 3661 patients, achieves an area under the receiver operating characteristic curve (AUC) of 0.786 (95% CI: 0.745-0.830) when predicting deterioration within 96 hours. The deep neural network extracts informative areas of chest X-ray images to assist clinicians in interpreting the predictions and performs comparably to two radiologists in a reader study. In order to verify performance in a real clinical setting, we silently deployed a preliminary version of the deep neural network at New York University Langone Health during the first wave of the pandemic, which produced accurate predictions in real-time. In summary, our findings demonstrate the potential of the proposed system for assisting front-line physicians in the triage of COVID-19 patients.
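The AUC reported above has a useful probabilistic reading: it is the probability that a randomly chosen patient who deteriorated received a higher risk score than a randomly chosen patient who did not. A minimal sketch of that rank-based (Mann-Whitney) computation, with an illustrative function and toy data rather than the study's code:

```python
def auc_from_scores(pairs):
    """AUC as the probability that a positive case outscores a negative one.

    `pairs` is a list of (risk_score, label) tuples, label 1 = deteriorated.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in pairs if y == 1]
    neg = [s for s, y in pairs if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On this reading, the system's AUC of 0.786 means a deteriorating patient outranks a non-deteriorating one roughly 79% of the time.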
Notesense: development of a machine learning algorithm for feedback on clinical reasoning documentation [Meeting Abstract]
BACKGROUND: Clinical reasoning (CR) is a core component of medical training, yet residents often receive little feedback on their CR documentation. Here we describe the process of developing a machine learning (ML) algorithm for feedback on CR documentation to increase the frequency and quality of feedback in this domain.
METHOD(S): To create this algorithm, note quality first had to be rated by gold-standard human rating. We selected the IDEA Assessment Tool, a note-rating instrument across four domains (I = Interpretive summary, D = Differential diagnosis, E = Explanation of reasoning, A = Alternative diagnoses explained) that uses a 3-point Likert scale without descriptive anchors. To develop descriptive anchors, we conducted an iterative process reviewing notes from the EHR written by medicine residents and validated the Revised-IDEA Assessment Tool using Messick's framework (content validity, response process, relation to other variables, internal structure, and consequences). Using the Hofstee standard-setting method, cutoffs for high-quality clinical reasoning for the IDEA and DEA scores were set. We then created a dataset of expert-rated notes to create the ML algorithm. First, natural language processing software was applied to the set of notes that enabled recognition and automatic encoding of clinical information as a diagnosis or disease (D's), a sign or symptom (E or A), or a semantic qualifier (e.g., "most likely"). Input variables to the ML algorithm included counts of D's, E/A's, and semantic qualifiers, and the proximity of semantic qualifiers to disease/diagnosis terms. ML output focused on DEA quality and was binarized to low- or high-quality CR. Finally, 200 notes were randomly selected for human validation review comparing ML output to the human-rated DEA score.
RESULT(S): The IDEA and DEA scores ranged from 0 to 10 and 0 to 6, respectively. An IDEA score ≥6.5 and a DEA score ≥3 were deemed high quality. 252 notes were rated to create the dataset, and 20% were rated by 3 raters with a high intraclass correlation of 0.84 (95% CI 0.74-0.90). 120 of these notes comprised the testing set for ML model development. The logistic regression model was the best-performing model, with an AUC of 0.87 and a positive predictive value (PPV) of 0.65. 48 (40%) of the notes were high quality. There was substantial interrater reliability between ML output and human rating on the 200-note validation set, with a Cohen's kappa of 0.64.
CONCLUSION(S): We have developed an ML algorithm for feedback on CR documentation that we hypothesize will increase the frequency and quality of feedback in this domain. We have subsequently developed a dashboard that will display the output of the ML model. Next steps will be to provide internal medicine residents feedback on their CR documentation using this dashboard and assess the impact this has on their documentation quality. LEARNING OBJECTIVE #1: Describe the importance of high-quality documentation of clinical reasoning. LEARNING OBJECTIVE #2: Identify machine learning as a novel assessment tool for feedback on clinical reasoning documentation.
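The methods above describe a logistic regression over counts of diagnoses (D's), explanation/alternative terms (E/A's), and semantic qualifiers, binarized to low- or high-quality CR. A minimal sketch of that scoring step follows; the weights, intercept, and threshold are entirely hypothetical (the abstract does not report the fitted coefficients), and serve only to show the shape of the computation:

```python
import math

# Hypothetical coefficients -- illustrative only, NOT the published model's.
# Feature names mirror the abstract's inputs: counts of disease entities (D),
# explanation/alternative terms (E/A), and semantic qualifiers.
WEIGHTS = {"d_count": 0.4, "ea_count": 0.6, "qualifier_count": 0.3}
BIAS = -3.0  # hypothetical intercept

def predict_quality(features, threshold=0.5):
    """Binarize a note's clinical-reasoning quality from feature counts."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    prob = 1.0 / (1.0 + math.exp(-z))  # logistic link
    return "high" if prob >= threshold else "low"
```

A note mentioning many explained alternatives and hedged qualifiers pushes the linear score up, so the design rewards exactly the behaviors the Revised-IDEA rubric describes.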
An artificial intelligence system for predicting the deterioration of COVID-19 patients in the emergency department [PrePrint]
During the COVID-19 pandemic, rapid and accurate triage of patients at the emergency department is critical to inform decision-making. We propose a data-driven approach for automatic prediction of deterioration risk using a deep neural network that learns from chest X-ray images, and a gradient boosting model that learns from routine clinical variables. Our AI prognosis system, trained using data from 3,661 patients, achieves an AUC of 0.786 (95% CI: 0.742-0.827) when predicting deterioration within 96 hours. The deep neural network extracts informative areas of chest X-ray images to assist clinicians in interpreting the predictions, and performs comparably to two radiologists in a reader study. In order to verify performance in a real clinical setting, we silently deployed a preliminary version of the deep neural network at NYU Langone Health during the first wave of the pandemic, which produced accurate predictions in real-time. In summary, our findings demonstrate the potential of the proposed system for assisting front-line physicians in the triage of COVID-19 patients.
A validated, real-time prediction model for favorable outcomes in hospitalized COVID-19 patients
The COVID-19 pandemic has challenged front-line clinical decision-making, leading to numerous published prognostic tools. However, few models have been prospectively validated and none report implementation in practice. Here, we use 3345 retrospective and 474 prospective hospitalizations to develop and validate a parsimonious model to identify patients with favorable outcomes within 96 h of a prediction, based on real-time lab values, vital signs, and oxygen support variables. In retrospective and prospective validation, the model achieves high average precision (88.6% [95% CI: 88.4-88.7] and 90.8% [90.8-90.8]) and discrimination (95.1% [95.1-95.2] and 86.8% [86.8-86.9]), respectively. We implemented and integrated the model into the EHR, achieving a positive predictive value of 93.3% with 41% sensitivity. Preliminary results suggest clinicians are adopting these scores into their clinical workflows.
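The deployment result above pairs a positive predictive value (93.3%) with sensitivity (41%): the model flags few favorable-outcome patients, but the ones it flags are almost always right. A small sketch of how both metrics fall out of a binary classifier's confusion counts (function name and sample labels are illustrative):

```python
def ppv_and_sensitivity(y_true, y_pred):
    """PPV (precision) and sensitivity (recall) for binary predictions.

    PPV = TP / (TP + FP): of the cases flagged positive, how many were right.
    Sensitivity = TP / (TP + FN): of the true positives, how many were caught.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return ppv, sensitivity
```

The high-PPV/low-sensitivity operating point is a deliberate trade-off for this use case: a conservative "safe to step down" signal must rarely be wrong, even at the cost of missing many eligible patients.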
Renal failure and a clavicular mass: Don't cut yourself on Occam's razor [Meeting Abstract]
Case Summary: A 66-year-old woman with hypertension presented to the hospital one day after arrival to New York City from Guinea with chronic daily vomiting, unintentional weight loss, progressive shoulder pain, and a subacute pruritic rash. On exam, the patient was hypertensive, had limited range of motion in her right shoulder, scaling plaques over the legs and trunk, and asterixis. Labs were notable for a creatinine of 15 with normocytic anemia. Calcium was normal. The patient was admitted to the hospital for acute renal failure and further workup. Diagnoses: Primary end-stage renal disease of unknown etiology; and follicular thyroid carcinoma with metastases to the clavicle and lungs with paraneoplastic rash.
Discussion(s): This case highlights the diagnostic principles of Occam's razor, Hickam's dictum, and Crabtree's bludgeon. The initial differential diagnosis and workup proceeded with the expectation of a unifying diagnosis to explain the wide constellation of presenting symptoms. But after common systemic unifying diagnoses, including multiple myeloma and other infiltrative processes, were ruled out, it became evident that two processes were at play. The first was end-stage renal failure, likely long-standing given imaging findings suggestive of chronicity. The second was a common malignancy with uncommon metastases. Perhaps an absence of regular interactions with the healthcare system prior to presentation increases the positive predictive value of Hickam's dictum. Additionally, common diseases with uncommon presentations are still more common than zebras. Lessons from this case allow us to expand our thinking beyond the dogma of any one diagnostic principle, to avoid Type 1 thinking, and to direct our diagnostic reasoning.
Platinum type is key in determining degree of neuropathy