Searched for: in-biosketch:yes person:vjm261
Total Results: 38

Implementing the NYU Electronic Patient Visit Assessment (ePVA)© for head and neck cancer in rural and urban populations: a study protocol for a type 1 hybrid effectiveness-implementation clinical trial

Van Cleave, Janet H; Brody, Abraham A; Schulman-Green, Dena; Hu, Kenneth S; Li, Zujun; Johnson, Stephen B; Major, Vincent J; Lominska, Christopher E; Bauman, Jessica R; Hanania, Alexander N; Tatlonghari, Ghia V; Tsikis, Marcely; Egleston, Brian L
BACKGROUND: The NYU Electronic Patient Visit Assessment (ePVA) was developed for head and neck cancer (HNC) as a digital patient-reported symptom monitoring system that enables early symptom detection and real-time interventions at the point of care. With this study protocol, we aim to test the effectiveness of the ePVA in improving HNC outcomes in real-world settings and to identify implementation strategies optimizing its effectiveness. METHODS: We will conduct a longitudinal mixed-methods hybrid type 1 study at four National Cancer Institute-designated Comprehensive Cancer Centers serving diverse populations in rural and urban settings (New York University, the University of Kansas Cancer Center, Fox Chase Cancer Center, and Baylor College of Medicine) guided by the Reach, Effectiveness, Adoption, Implementation, and Maintenance (RE-AIM) framework. Patient eligibility criteria include having histologically diagnosed HNC and undergoing radiation therapy with or without chemotherapy for curative intent. We will also interview clinicians caring for patients with HNC at the participating institutions regarding facilitators and barriers to implementing the ePVA. The accrual goal is 270 patients. Aim 1 is to determine the effect of the ePVA on HNC symptoms in a two-arm (usual care vs. ePVA + usual care) trial. The study's primary outcomes are patients' self-reported social function, senses of taste and smell, and swallowing, measured by the European Organization for Research and Treatment of Cancer QLQ-C30 and QLQ-H&N35. For Aim 2, we will interview patients (n = 40) as well as clinicians (n = 30) caring for patients with HNC at the participating institutions regarding facilitators and barriers to implementing the ePVA. In Aim 3, we will integrate Aims 1 and 2 data to identify strategies that optimize the use of the ePVA. DISCUSSION: The overarching goal of this research is to advance cancer care by identifying implementation standards for effective, widespread use of the ePVA that apply to all patient-reported outcomes in cancer care. TRIAL REGISTRATION: ClinicalTrials.gov NCT06030011. Registered on 8 September 2023.
PMCID:12690913
PMID: 41366462
ISSN: 1745-6215
CID: 5977322

Evaluating Hospital Course Summarization by an Electronic Health Record-Based Large Language Model

Small, William R; Austrian, Jonathan; O'Donnell, Luke; Burk-Rafel, Jesse; Hochman, Katherine A; Goodman, Adam; Zaretsky, Jonah; Martin, Jacob; Johnson, Stephen; Major, Vincent J; Jones, Simon; Henke, Christian; Verplanke, Benjamin; Osso, Jwan; Larson, Ian; Saxena, Archana; Mednick, Aron; Simonis, Choumika; Han, Joseph; Kesari, Ravi; Wu, Xinyuan; Heery, Lauren; Desel, Tenzin; Baskharoun, Samuel; Figman, Noah; Farooq, Umar; Shah, Kunal; Jahan, Nusrat; Kim, Jeong Min; Testa, Paul; Feldman, Jonah
IMPORTANCE: Hospital course (HC) summarization represents an increasingly onerous discharge summary component for physicians. Literature supports large language models (LLMs) for HC summarization, but whether physicians can effectively partner with electronic health record-embedded LLMs to draft HCs is unknown. OBJECTIVES: To compare the editing effort required by time-constrained resident physicians to improve LLM- vs physician-generated HCs toward a novel 4Cs (complete, concise, cohesive, and confabulation-free) HC. DESIGN, SETTING, AND PARTICIPANTS: Quality improvement study using a convenience sample of 10 internal medicine resident editors, 8 hospitalist evaluators, and randomly selected general medicine admissions in December 2023 lasting 4 to 8 days at New York University Langone Health. EXPOSURES: Residents and hospitalists reviewed randomly assigned patient medical records for 10 minutes. Residents blinded to author type then edited each HC pair (physician and LLM) for quality in 3 minutes, followed by comparative ratings by attending hospitalists. MAIN OUTCOMES AND MEASURES: Editing effort was quantified by analyzing the edits made to the HC pairs after controlling for length (percentage edited) and the degree to which the original HCs' meaning was altered (semantic change). Hospitalists compared edited HC pairs with A/B testing on the 4Cs (5-point Likert scales converted to 10-point bidirectional scales). RESULTS: Among 100 admissions, residents edited a smaller percentage of LLM HCs than of physician HCs (LLM mean [SD], 31.5% [16.6%] vs physicians, 44.8% [20.0%]; P < .001). Additionally, LLM HCs required less semantic change (LLM mean [SD], 2.4% [1.6%] vs physicians, 4.9% [3.5%]; P < .001). Attending physicians deemed LLM HCs to be more complete (mean [SD] difference LLM vs physicians on 10-point bidirectional scale, 3.00 [5.28]; P < .001), similarly concise (mean [SD], -1.02 [6.08]; P = .20) and cohesive (mean [SD], 0.70 [6.14]; P = .60), but with more confabulations (mean [SD], -0.98 [3.53]; P = .002). The composite scores were similar (mean [SD] difference LLM vs physician on 40-point bidirectional scale, 1.70 [14.24]; P = .46). CONCLUSIONS AND RELEVANCE: Electronic health record-embedded LLM HCs required less editing than physician-generated HCs to approach a quality standard, resulting in HCs that were comparably or more complete, concise, and cohesive, but contained more confabulations. Despite the potential influence of artificial time constraints, this study supports the feasibility of a physician-LLM partnership for writing HCs and provides a basis for monitoring LLM HCs in clinical practice.
PMID: 40802185
ISSN: 2574-3805
CID: 5906762
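
The editing-effort measures above (percentage edited and semantic change) are reported without their formal definitions. As a minimal, hedged sketch, the snippet below computes a length-normalized character edit distance as a stand-in for "percentage edited" and a bag-of-words cosine dissimilarity as a crude stand-in for "semantic change"; the study's actual operationalizations may differ.

```python
# Illustrative metrics only; these are assumed definitions, not the study's.
from collections import Counter
import math

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def percent_edited(original: str, edited: str) -> float:
    """Share of text changed, controlling for length."""
    return 100.0 * edit_distance(original, edited) / max(len(original), len(edited), 1)

def semantic_change(original: str, edited: str) -> float:
    """Crude proxy: 1 - cosine similarity of word-count vectors."""
    a, b = Counter(original.lower().split()), Counter(edited.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / norm if norm else 0.0)

draft = "Admitted with pneumonia, treated with antibiotics, improved."
edited = "Admitted with community-acquired pneumonia; treated with IV antibiotics and improved."
print(f"percent edited: {percent_edited(draft, edited):.1f}%")
print(f"semantic change: {semantic_change(draft, edited):.3f}")
```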

Identifying and mitigating algorithmic bias in the safety net

Mackin, Shaina; Major, Vincent J; Chunara, Rumi; Newton-Dame, Remle
Algorithmic bias occurs when predictive model performance varies meaningfully across sociodemographic classes, exacerbating systemic healthcare disparities. NYC Health + Hospitals, an urban safety net system, assessed bias in two binary classification models in our electronic medical record: one predicting acute visits for asthma and one predicting unplanned readmissions. We evaluated differences in subgroup performance across race/ethnicity, sex, language, and insurance using equal opportunity difference (EOD), a metric comparing false negative rates. The most biased classes (race/ethnicity for asthma, insurance for readmission) were targeted for mitigation using threshold adjustment, which adjusts subgroup thresholds to minimize EOD, and reject option classification, which re-classifies scores near the threshold by subgroup. Successful mitigation was defined as 1) absolute subgroup EODs <5 percentage points, 2) accuracy reduction <10%, and 3) alert rate change <20%. Threshold adjustment met these criteria; reject option classification did not. We introduce a Supplementary Playbook outlining our approach for low-resource bias mitigation.
PMCID:12141433
PMID: 40473916
ISSN: 2398-6352
CID: 5862762
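
The bias metric and the first mitigation strategy described above are standard constructions. The sketch below shows, on hypothetical scores, labels, and subgroups (not the health system's models or data), how an equal opportunity difference based on false negative rates can be computed and how per-subgroup thresholds can be searched to shrink it.

```python
# Hypothetical illustration of EOD and threshold adjustment; not the deployed code.
import numpy as np

def false_negative_rate(y_true, y_pred):
    positives = y_true == 1
    return float(np.mean(y_pred[positives] == 0)) if positives.any() else 0.0

def equal_opportunity_difference(y_true, y_pred, groups, reference):
    """Each subgroup's FNR minus the reference subgroup's FNR."""
    ref = false_negative_rate(y_true[groups == reference], y_pred[groups == reference])
    return {g: false_negative_rate(y_true[groups == g], y_pred[groups == g]) - ref
            for g in np.unique(groups)}

def adjust_thresholds(y_true, scores, groups, reference, base_threshold=0.5):
    """Choose each subgroup's threshold so its FNR is closest to the reference FNR."""
    ref_mask = groups == reference
    ref_fnr = false_negative_rate(y_true[ref_mask], scores[ref_mask] >= base_threshold)
    candidates = np.linspace(0.05, 0.95, 19)
    out = {}
    for g in np.unique(groups):
        mask = groups == g
        gaps = [abs(false_negative_rate(y_true[mask], scores[mask] >= t) - ref_fnr)
                for t in candidates]
        out[g] = float(candidates[int(np.argmin(gaps))])
    return out

# Synthetic data: a risk score that is slightly miscalibrated for group "B".
rng = np.random.default_rng(0)
groups = rng.choice(np.array(["A", "B"]), size=2000)
y_true = rng.binomial(1, 0.3, size=2000)
scores = np.clip(0.4 * y_true + rng.normal(0.3, 0.2, size=2000) - 0.08 * (groups == "B"), 0, 1)

print("EOD vs group A:", equal_opportunity_difference(y_true, scores >= 0.5, groups, "A"))
print("Adjusted thresholds:", adjust_thresholds(y_true, scores, groups, "A"))
```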

Periprocedural Myocardial Injury Using CKMB Following Elective PCI: Incidence and Associations With Long-Term Mortality

Talmor, Nina; Graves, Claire; Kozloff, Sam; Major, Vincent J; Xia, Yuhe; Shah, Binita; Babaev, Anvar; Razzouk, Louai; Rao, Sunil V; Attubato, Michael; Feit, Frederick; Slater, James; Smilowitz, Nathaniel R
BACKGROUND: Myocardial injury detected after percutaneous coronary intervention (PCI) is associated with increased mortality, but predictors of post-PCI myocardial injury are not well established and its long-term prognostic relevance remains uncertain. METHODS: Consecutive adults aged ≥18 years with stable ischemic heart disease who underwent elective PCI at NYU Langone Health between 2011 and 2020 were included in a retrospective, observational study. Patients with acute myocardial infarction or with creatine kinase-myocardial band (CKMB) or troponin concentrations above the 99th percentile upper reference limit before PCI were excluded. All patients had routine measurement of CKMB concentrations at 1 and 3 hours post-PCI. Post-PCI myocardial injury was defined as a peak CKMB concentration above the 99th percentile upper reference limit. Linear regression models were used to identify clinical factors associated with post-PCI myocardial injury. Cox proportional hazards models were generated to evaluate relationships between post-PCI myocardial injury and all-cause mortality at long-term follow-up. RESULTS: After adjustment for demographics and clinical covariates, post-PCI myocardial injury was associated with an excess hazard for long-term mortality (hazard ratio, 1.46 [95% CI, 1.20-1.78]). CONCLUSIONS: Myocardial injury defined by elevated CKMB early after PCI is common and is associated with long-term, all-cause mortality. More complex coronary anatomy is predictive of post-PCI myocardial injury.
PMID: 40160098
ISSN: 1941-7632
CID: 5818652
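
The survival analysis described in the methods can be reproduced in outline with the lifelines package. The sketch below fits a Cox proportional hazards model on synthetic, illustrative data (the covariates, effect sizes, and column names are assumptions, not the study's dataset) and prints adjusted hazard ratios analogous to the one reported.

```python
# Illustrative Cox model on synthetic data; not the study's cohort or covariate set.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(42)
n = 5000
df = pd.DataFrame({
    "post_pci_injury": rng.binomial(1, 0.25, n),   # peak CKMB above the reference limit
    "age": rng.normal(67, 10, n),
    "diabetes": rng.binomial(1, 0.35, n),
})

# Simulated follow-up (years): higher hazard with injury, age, and diabetes.
hazard = 0.03 * np.exp(0.38 * df["post_pci_injury"] + 0.04 * (df["age"] - 67) + 0.30 * df["diabetes"])
df["time"] = rng.exponential(1 / hazard)
df["death"] = (df["time"] < 10).astype(int)   # administrative censoring at 10 years
df["time"] = df["time"].clip(upper=10)

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="death")
print(cph.summary[["exp(coef)", "exp(coef) lower 95%", "exp(coef) upper 95%"]])
```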

Health system-wide access to generative artificial intelligence: the New York University Langone Health experience

Malhotra, Kiran; Wiesenfeld, Batia; Major, Vincent J; Grover, Himanshu; Aphinyanaphongs, Yindalon; Testa, Paul; Austrian, Jonathan S
OBJECTIVES: The study aimed to assess the usage and impact of a private and secure instance of a generative artificial intelligence (GenAI) application in a large academic health center, and to understand how employees interact with this technology and how it influences their perceptions of skill and work performance. MATERIALS AND METHODS: New York University Langone Health (NYULH) established a secure, private, and managed Azure OpenAI service (GenAI Studio) and granted widespread access to employees. Usage was monitored, and users were surveyed about their experiences. RESULTS: Over 6 months, 1007 individuals applied for access, with high usage among research and clinical departments. Users felt prepared to use the GenAI Studio, found it easy to use, and would recommend it to a colleague. Users employed the GenAI Studio for diverse tasks such as writing, editing, summarizing, data analysis, and idea generation. Challenges included difficulty educating the workforce to construct effective prompts, as well as token and API limitations. DISCUSSION: The study demonstrated high interest in and extensive use of GenAI in a healthcare setting, with users employing the technology for diverse tasks. While users identified several challenges, they also recognized the potential of GenAI and indicated a need for more instruction and guidance on effective usage. CONCLUSION: The private GenAI Studio provided a useful tool for employees to augment their skills and apply GenAI to their daily tasks. The study underscored the importance of workforce education when implementing system-wide GenAI and provided insights into its strengths and weaknesses.
PMCID:11756645
PMID: 39584477
ISSN: 1527-974x
CID: 5778212
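
The abstract describes a managed Azure OpenAI deployment made available across the workforce. As a hedged illustration of how such a private endpoint is typically called from Python, the snippet below uses the openai SDK's Azure client; the endpoint, deployment name, and environment variables are placeholders, not NYULH's actual configuration.

```python
# Placeholder configuration; endpoint, deployment name, and keys are illustrative.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # e.g. https://<resource>.openai.azure.com
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-4o",  # the Azure *deployment* name configured by administrators
    messages=[
        {"role": "system", "content": "You are a writing assistant for health system staff."},
        {"role": "user", "content": "Summarize this meeting note in three bullet points: ..."},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```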

Evaluating Large Language Models in extracting cognitive exam dates and scores

Zhang, Hao; Jethani, Neil; Jones, Simon; Genes, Nicholas; Major, Vincent J; Jaffe, Ian S; Cardillo, Anthony B; Heilenbach, Noah; Ali, Nadia Fazal; Bonanni, Luke J; Clayburn, Andrew J; Khera, Zain; Sadler, Erica C; Prasad, Jaideep; Schlacter, Jamie; Liu, Kevin; Silva, Benjamin; Montgomery, Sophie; Kim, Eric J; Lester, Jacob; Hill, Theodore M; Avoricani, Alba; Chervonski, Ethan; Davydov, James; Small, William; Chakravartty, Eesha; Grover, Himanshu; Dodson, John A; Brody, Abraham A; Aphinyanaphongs, Yindalon; Masurkar, Arjun; Razavian, Narges
Ensuring the reliability of Large Language Models (LLMs) in clinical tasks is crucial. Our study assesses two state-of-the-art LLMs (ChatGPT and LlaMA-2) for extracting clinical information, focusing on cognitive tests such as the MMSE and CDR. Our data consisted of 135,307 clinical notes (January 12, 2010 to May 24, 2023) mentioning MMSE, CDR, or MoCA. After applying inclusion criteria, 34,465 notes remained, of which 765 were processed by ChatGPT (GPT-4) and LlaMA-2, and 22 experts reviewed the responses. ChatGPT successfully extracted MMSE and CDR instances with dates from 742 notes. We used 20 notes for fine-tuning and training the reviewers. The remaining 722 were assigned to reviewers, with 309 notes each assigned to two reviewers simultaneously. Inter-rater agreement (Fleiss' kappa), precision, recall, true/false negative rates, and accuracy were calculated. Our study follows TRIPOD reporting guidelines for model validation. For MMSE information extraction, ChatGPT (vs. LlaMA-2) achieved accuracy of 83% (vs. 66.4%), sensitivity of 89.7% (vs. 69.9%), a true-negative rate of 96% (vs. 60.0%), and precision of 82.7% (vs. 62.2%). For CDR, the results were lower overall, with accuracy of 87.1% (vs. 74.5%), sensitivity of 84.3% (vs. 39.7%), a true-negative rate of 99.8% (vs. 98.4%), and precision of 48.3% (vs. 16.1%). We qualitatively evaluated the MMSE errors of ChatGPT and LlaMA-2 on double-reviewed notes. LlaMA-2 errors included 27 cases of total hallucination, 19 cases of reporting other scores instead of MMSE, 25 missed scores, and 23 cases of reporting only the wrong date. In comparison, ChatGPT's errors included only 3 cases of total hallucination, 17 cases of reporting the wrong test instead of MMSE, and 19 cases of reporting a wrong date. In this diagnostic/prognostic study of ChatGPT and LlaMA-2 for extracting cognitive exam dates and scores from clinical notes, ChatGPT exhibited high accuracy and better performance than LlaMA-2. The use of LLMs could benefit dementia research and clinical care by identifying patients eligible for treatment initiation or clinical trial enrollment. Rigorous evaluation of LLMs is crucial to understanding their capabilities and limitations.
PMCID:11634005
PMID: 39661652
ISSN: 2767-3170
CID: 5762692
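
The evaluation metrics named in the abstract (accuracy, sensitivity, true-negative rate, precision, and Fleiss' kappa) can be computed from adjudicated labels in a few lines. The sketch below uses hypothetical confusion-matrix counts and hypothetical two-reviewer ratings, not the paper's underlying data.

```python
# Hypothetical counts and ratings; for illustrating the metric definitions only.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

def classification_metrics(tp, fp, fn, tn):
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }

print(classification_metrics(tp=520, fp=109, fn=60, tn=280))

# Fleiss' kappa: rows are notes, columns are raters, values are assigned categories
# (here, 1 = extraction judged correct, 0 = incorrect).
ratings = np.array([
    [1, 1], [1, 1], [0, 1], [1, 1], [0, 0],
    [1, 1], [1, 0], [1, 1], [0, 0], [1, 1],
])
table, _ = aggregate_raters(ratings)
print("Fleiss' kappa:", round(fleiss_kappa(table), 3))
```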

Development and evaluation of an artificial intelligence-based workflow for the prioritization of patient portal messages

Yang, Jie; So, Jonathan; Zhang, Hao; Jones, Simon; Connolly, Denise M; Golding, Claudia; Griffes, Esmelin; Szerencsy, Adam C; Wu, Tzer Jason; Aphinyanaphongs, Yindalon; Major, Vincent J
OBJECTIVES: Accelerating demand for patient messaging has impacted the practice of many providers. Messages are not recommended for urgent medical issues, but some do require rapid attention. This presents an opportunity for artificial intelligence (AI) methods to prioritize review of messages. Our study aimed to highlight some patient portal messages for prioritized review using a custom AI system integrated into the electronic health record (EHR). MATERIALS AND METHODS: We developed a Bidirectional Encoder Representations from Transformers (BERT)-based large language model using 40 132 patient-sent messages to identify patterns involving high-acuity topics that warrant an immediate callback. The model was then implemented in 2 shared pools of patient messages managed by dozens of registered nurses. The primary outcome, the time before messages were read, was evaluated with a difference-in-differences methodology. RESULTS: In the evaluation period (n = 396 466), an improvement exceeding the trend was observed in the time high-scoring messages sat unread (21 minutes; 63 vs 42 minutes for messages sent outside business hours). DISCUSSION: Our work shows great promise in improving care when AI is aligned with human workflow. Future work involves audience expansion, aiding users with suggested actions, and drafting responses. CONCLUSION: Many patients utilize patient portal messages, and while most messages are routine, a small fraction describe alarming symptoms. Our AI-based workflow shortens the turnaround time needed to get a trained clinician to review these messages, providing safer, higher-quality care.
PMCID:11328532
PMID: 39156046
ISSN: 2574-2531
CID: 5680362
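
The difference-in-differences evaluation mentioned in the methods can be expressed as an interaction term in an ordinary least squares model. The sketch below builds a synthetic dataset in which high-scoring messages improve by about 21 minutes after go-live (all variable names and values are illustrative, not the study's data) and recovers that effect.

```python
# Synthetic difference-in-differences illustration; not the study's dataset.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 20000
df = pd.DataFrame({
    "high_acuity": rng.binomial(1, 0.05, n),   # flagged for immediate callback
    "post_golive": rng.binomial(1, 0.5, n),    # sent after the AI workflow launched
})
# Minutes unread: a secular 5-minute improvement for all messages, plus a
# 21-minute improvement that applies only to flagged messages after go-live.
df["minutes_unread"] = (60 - 5 * df["post_golive"]
                        - 21 * df["high_acuity"] * df["post_golive"]
                        + rng.normal(0, 15, n))

did = smf.ols("minutes_unread ~ high_acuity * post_golive", data=df).fit()
print(round(did.params["high_acuity:post_golive"], 1))  # DiD estimate, close to -21
```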

Evaluation of GPT-4 ability to identify and generate patient instructions for actionable incidental radiology findings

Woo, Kar-Mun C; Simon, Gregory W; Akindutire, Olumide; Aphinyanaphongs, Yindalon; Austrian, Jonathan S; Kim, Jung G; Genes, Nicholas; Goldenring, Jacob A; Major, Vincent J; Pariente, Chloé S; Pineda, Edwin G; Kang, Stella K
OBJECTIVES: To evaluate the proficiency of a HIPAA-compliant version of GPT-4 in identifying actionable, incidental findings from unstructured radiology reports of Emergency Department patients, and to assess the appropriateness of artificial intelligence (AI)-generated, patient-facing summaries of these findings. MATERIALS AND METHODS: Radiology reports extracted from the electronic health record of a large academic medical center were manually reviewed to identify non-emergent, incidental findings with a high likelihood of requiring follow-up, further sub-stratified as "definitely actionable" (DA) or "possibly actionable-clinical correlation" (PA-CC). Instruction prompts to GPT-4 were developed and iteratively optimized using a validation set of 50 reports. The optimized prompt was then applied to a test set of 430 unseen reports. GPT-4 performance was primarily graded on accuracy in identifying either DA or PA-CC findings, then secondarily for DA findings alone. Outputs were reviewed for hallucinations. AI-generated patient-facing summaries were assessed for appropriateness via Likert scale. RESULTS: For the primary outcome (DA or PA-CC), GPT-4 achieved 99.3% recall, 73.6% precision, and an F1 score of 84.5%. For the secondary outcome (DA only), GPT-4 demonstrated 95.2% recall, 77.3% precision, and an F1 score of 85.3%. No findings were "hallucinated" outright. However, 2.8% of cases included generated text about recommendations that were inferred without specific reference. The majority of true-positive AI-generated summaries required no or only minor revision. CONCLUSION: GPT-4 demonstrates proficiency in detecting actionable, incidental findings after refined instruction prompting. AI-generated patient instructions were most often appropriate, though a small fraction included inferred recommendations. While this technology shows promise to augment diagnostics, active clinician oversight via "human-in-the-loop" workflows remains critical for clinical implementation.
PMID: 38778578
ISSN: 1527-974x
CID: 5654832
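
The F1 values in the results are simply the harmonic mean of the stated precision and recall, which is easy to verify:

```python
# F1 as the harmonic mean of precision and recall, using the figures reported above.
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.736, 0.993), 3))  # primary outcome (DA or PA-CC): ~0.845
print(round(f1(0.773, 0.952), 3))  # secondary outcome (DA only):   ~0.853
```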

Large Language Model-Based Responses to Patients' In-Basket Messages

Small, William R; Wiesenfeld, Batia; Brandfield-Harvey, Beatrix; Jonassen, Zoe; Mandal, Soumik; Stevens, Elizabeth R; Major, Vincent J; Lostraglio, Erin; Szerencsy, Adam; Jones, Simon; Aphinyanaphongs, Yindalon; Johnson, Stephen B; Nov, Oded; Mann, Devin
IMPORTANCE: Virtual patient-physician communications have increased since 2020 and have negatively impacted primary care physician (PCP) well-being. Generative artificial intelligence (GenAI) drafts of patient messages could potentially reduce health care professional (HCP) workload and improve communication quality, but only if the drafts are considered useful. OBJECTIVES: To assess PCPs' perceptions of GenAI drafts and to examine linguistic characteristics associated with equity and perceived empathy. DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional quality improvement study tested the hypothesis that PCPs' ratings of GenAI drafts (created using the electronic health record [EHR] standard prompts) would be equivalent to HCP-generated responses on 3 dimensions. The study was conducted at NYU Langone Health using private patient-HCP communications at 3 internal medicine practices piloting GenAI. EXPOSURES: Randomly assigned patient messages coupled with either an HCP message or the draft GenAI response. MAIN OUTCOMES AND MEASURES: PCPs rated each response's information content quality (eg, relevance) and communication quality (eg, verbosity) on Likert scales, and indicated whether they would use the draft or start anew (usable vs unusable). Branching logic further probed for empathy, personalization, and professionalism of responses. Computational linguistics methods assessed content differences in HCP vs GenAI responses, focusing on equity and empathy. RESULTS: A total of 16 PCPs (8 [50.0%] female) reviewed 344 messages (175 GenAI drafted; 169 HCP drafted). Both GenAI and HCP responses were rated favorably. GenAI responses were rated higher for communication style than HCP responses (mean [SD], 3.70 [1.15] vs 3.38 [1.20]; P = .01, U = 12 568.5) but were similar to HCPs on information content (mean [SD], 3.53 [1.26] vs 3.41 [1.27]; P = .37; U = 13 981.0) and usable draft proportion (mean [SD], 0.69 [0.48] vs 0.65 [0.47]; P = .49; t = -0.6842). Usable GenAI responses were considered more empathetic than usable HCP responses (32 of 86 [37.2%] vs 13 of 79 [16.5%]; difference, 125.5%), possibly attributable to more subjective (mean [SD], 0.54 [0.16] vs 0.31 [0.23]; P < .001; difference, 74.2%) and positive (mean [SD] polarity, 0.21 [0.14] vs 0.13 [0.25]; P = .02; difference, 61.5%) language; they were also numerically longer (mean [SD] word count, 90.5 [32.0] vs 65.4 [62.6]; difference, 38.4%), although this difference was not statistically significant (P = .07), and were more linguistically complex (mean [SD] score, 125.2 [47.8] vs 95.4 [58.8]; P = .002; difference, 31.2%). CONCLUSIONS: In this cross-sectional study of PCP perceptions of an EHR-integrated GenAI chatbot, GenAI was found to communicate information better and with more empathy than HCPs, highlighting its potential to enhance patient-HCP communication. However, GenAI drafts were less readable than HCPs', a significant concern for patients with low health or English literacy.
PMCID:11252893
PMID: 39012633
ISSN: 2574-3805
CID: 5686582
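
The subjectivity and polarity figures in the results come from computational linguistics scoring of each response. As a minimal, hedged sketch (the study's actual pipeline is not specified here), the snippet below scores one hypothetical reply with TextBlob, whose sentiment analyzer returns a polarity score in [-1, 1] and a subjectivity score in [0, 1], analogous to the measures reported.

```python
# Illustrative scoring of one hypothetical reply with TextBlob (pip install textblob).
from textblob import TextBlob

reply = ("I'm sorry to hear you're still having headaches. That sounds frustrating, "
         "and I'm glad you reached out. Please keep a symptom diary, and we will "
         "review it together at your visit next week.")

sentiment = TextBlob(reply).sentiment
print(f"polarity:     {sentiment.polarity:+.2f}")     # -1 (negative) to +1 (positive)
print(f"subjectivity: {sentiment.subjectivity:.2f}")  # 0 (objective) to 1 (subjective)
print(f"word count:   {len(reply.split())}")
```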