Large Language Model-Based Responses to Patients' In-Basket Messages
Small, William R; Wiesenfeld, Batia; Brandfield-Harvey, Beatrix; Jonassen, Zoe; Mandal, Soumik; Stevens, Elizabeth R; Major, Vincent J; Lostraglio, Erin; Szerencsy, Adam; Jones, Simon; Aphinyanaphongs, Yindalon; Johnson, Stephen B; Nov, Oded; Mann, Devin
IMPORTANCE:Virtual patient-physician communications have increased since 2020 and negatively impacted primary care physician (PCP) well-being. Generative artificial intelligence (GenAI) drafts of patient messages could potentially reduce health care professional (HCP) workload and improve communication quality, but only if the drafts are considered useful. OBJECTIVES:To assess PCPs' perceptions of GenAI drafts and to examine linguistic characteristics associated with equity and perceived empathy. DESIGN, SETTING, AND PARTICIPANTS:This cross-sectional quality improvement study tested the hypothesis that PCPs' ratings of GenAI drafts (created using the electronic health record [EHR] standard prompts) would be equivalent to HCP-generated responses on 3 dimensions. The study was conducted at NYU Langone Health using private patient-HCP communications at 3 internal medicine practices piloting GenAI. EXPOSURES:Randomly assigned patient messages coupled with either an HCP message or the draft GenAI response. MAIN OUTCOMES AND MEASURES:PCPs rated responses' information content quality (eg, relevance) and communication quality (eg, verbosity), each on a Likert scale, and indicated whether they would use the draft or start anew (usable vs unusable). Branching logic further probed for empathy, personalization, and professionalism of responses. Computational linguistics methods assessed content differences in HCP vs GenAI responses, focusing on equity and empathy. RESULTS:A total of 16 PCPs (8 [50.0%] female) reviewed 344 messages (175 GenAI drafted; 169 HCP drafted). Both GenAI and HCP responses were rated favorably. GenAI responses were rated higher for communication style than HCP responses (mean [SD], 3.70 [1.15] vs 3.38 [1.20]; P = .01, U = 12 568.5) but were similar to HCPs on information content (mean [SD], 3.53 [1.26] vs 3.41 [1.27]; P = .37; U = 13 981.0) and usable draft proportion (mean [SD], 0.69 [0.48] vs 0.65 [0.47], P = .49, t = -0.6842). Usable GenAI responses were considered more empathetic than usable HCP responses (32 of 86 [37.2%] vs 13 of 79 [16.5%]; difference, 125.5%), possibly attributable to more subjective (mean [SD], 0.54 [0.16] vs 0.31 [0.23]; P < .001; difference, 74.2%) and positive (mean [SD] polarity, 0.21 [0.14] vs 0.13 [0.25]; P = .02; difference, 61.5%) language; they were also more linguistically complex (mean [SD] score, 125.2 [47.8] vs 95.4 [58.8]; P = .002; difference, 31.2%) and numerically longer (mean [SD] word count, 90.5 [32.0] vs 65.4 [62.6]; difference, 38.4%), although the difference in length was not statistically significant (P = .07). CONCLUSIONS:In this cross-sectional study of PCP perceptions of an EHR-integrated GenAI chatbot, GenAI was found to communicate information better and with more empathy than HCPs, highlighting its potential to enhance patient-HCP communication. However, GenAI drafts were less readable than HCPs', a significant concern for patients with low health or English literacy.
PMCID: 11252893
PMID: 39012633
ISSN: 2574-3805
CID: 5686582
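The "difference" percentages in the abstract above appear to be relative differences between the GenAI and HCP values. A minimal Python sketch, assuming the formula (GenAI - HCP) / HCP applied to the rounded figures as reported; the function name `relative_difference_pct` and the dictionary keys are illustrative and do not come from the paper:

```python
def relative_difference_pct(genai: float, hcp: float) -> float:
    """Relative difference of the GenAI value over the HCP value, in percent (1 decimal)."""
    return round((genai - hcp) / hcp * 100, 1)


# (GenAI, HCP) pairs copied from the abstract; key names are illustrative only.
reported = {
    "empathy_pct": (37.2, 16.5),
    "subjectivity": (0.54, 0.31),
    "polarity": (0.21, 0.13),
    "word_count": (90.5, 65.4),
    "complexity": (125.2, 95.4),
}

differences = {
    name: relative_difference_pct(genai, hcp)
    for name, (genai, hcp) in reported.items()
}

for name, diff in differences.items():
    print(f"{name}: {diff}%")
```

Under this assumption the computed values agree with the percentages quoted in the abstract. The empathy figure is computed from the rounded percentages (37.2% and 16.5%) rather than from the raw counts.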
Evaluating Large Language Models in Extracting Cognitive Exam Dates and Scores
Zhang, Hao; Jethani, Neil; Jones, Simon; Genes, Nicholas; Major, Vincent J; Jaffe, Ian S; Cardillo, Anthony B; Heilenbach, Noah; Ali, Nadia Fazal; Bonanni, Luke J; Clayburn, Andrew J; Khera, Zain; Sadler, Erica C; Prasad, Jaideep; Schlacter, Jamie; Liu, Kevin; Silva, Benjamin; Montgomery, Sophie; Kim, Eric J; Lester, Jacob; Hill, Theodore M; Avoricani, Alba; Chervonski, Ethan; Davydov, James; Small, William; Chakravartty, Eesha; Grover, Himanshu; Dodson, John A; Brody, Abraham A; Aphinyanaphongs, Yindalon; Masurkar, Arjun; Razavian, Narges
IMPORTANCE:Large language models (LLMs) are crucial for medical tasks. Ensuring their reliability is vital to avoid false results. Our study assesses two state-of-the-art LLMs (ChatGPT and LlaMA-2) for extracting clinical information, focusing on cognitive tests like MMSE and CDR. OBJECTIVE:Evaluate ChatGPT and LlaMA-2 performance in extracting MMSE and CDR scores, including their associated dates. METHODS:Our data consisted of 135,307 clinical notes (Jan 12th, 2010 to May 24th, 2023) mentioning MMSE, CDR, or MoCA. After applying inclusion criteria, 34,465 notes remained, of which 765 were processed by ChatGPT (GPT-4) and LlaMA-2, and 22 experts reviewed the responses. ChatGPT successfully extracted MMSE and CDR instances with dates from 742 notes. We used 20 notes for fine-tuning and training the reviewers. The remaining 722 were assigned to reviewers, with 309 each assigned to two reviewers simultaneously. Inter-rater agreement (Fleiss' Kappa), precision, recall, true/false negative rates, and accuracy were calculated. Our study follows TRIPOD reporting guidelines for model validation. RESULTS:For MMSE information extraction, ChatGPT (vs. LlaMA-2) achieved accuracy of 83% (vs. 66.4%), sensitivity of 89.7% (vs. 69.9%), true-negative rates of 96% (vs. 60.0%), and precision of 82.7% (vs. 62.2%). For CDR the results were lower overall, with accuracy of 87.1% (vs. 74.5%), sensitivity of 84.3% (vs. 39.7%), true-negative rates of 99.8% (vs. 98.4%), and precision of 48.3% (vs. 16.1%). We qualitatively evaluated the MMSE errors of ChatGPT and LlaMA-2 on double-reviewed notes. LlaMA-2 errors included 27 cases of total hallucination, 19 cases of reporting other scores instead of MMSE, 25 missed scores, and 23 cases of reporting only the wrong date. In comparison, ChatGPT's errors included only 3 cases of total hallucination, 17 cases of wrong test reported instead of MMSE, and 19 cases of reporting a wrong date. CONCLUSIONS:In this diagnostic/prognostic study of ChatGPT and LlaMA-2 for extracting cognitive exam dates and scores from clinical notes, ChatGPT exhibited high accuracy, with better performance compared to LlaMA-2. The use of LLMs could benefit dementia research and clinical care by identifying eligible patients for treatment initiation or clinical trial enrollment. Rigorous evaluation of LLMs is crucial to understanding their capabilities and limitations.
PMCID: 10888985
PMID: 38405784
CID: 5722422
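The metrics named in the abstract above (accuracy, sensitivity/recall, true-negative rate, precision) are standard functions of a confusion matrix. The sketch below shows the usual textbook definitions in Python. The counts are hypothetical and chosen only for illustration; the abstract does not report raw counts, and the paper's exact definitions for the extraction task may differ.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts of true/false positives and negatives for one extraction task."""
    tp: int
    fp: int
    tn: int
    fn: int

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def sensitivity(self) -> float:
        # Also called recall: share of true instances that were extracted.
        return self.tp / (self.tp + self.fn)

    def true_negative_rate(self) -> float:
        # Also called specificity: share of true absences reported as absent.
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        # Share of extracted instances that were correct.
        return self.tp / (self.tp + self.fp)


# Hypothetical counts, NOT taken from the study.
example = ConfusionCounts(tp=80, fp=20, tn=90, fn=10)

print(f"accuracy:           {example.accuracy():.3f}")
print(f"sensitivity:        {example.sensitivity():.3f}")
print(f"true-negative rate: {example.true_negative_rate():.3f}")
print(f"precision:          {example.precision():.3f}")
```

The definitions also show how precision can be low while the true-negative rate stays high, as in the CDR results: when true instances are rare, a small number of false positives barely changes the true-negative rate but makes up a large share of everything extracted.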
Predicting Robotic Hysterectomy Incision Time: Optimizing Surgical Scheduling with Machine Learning
Shah, Vaishali; Yung, Halley C; Yang, Jie; Zaslavsky, Justin; Algarroba, Gabriela N; Pullano, Alyssa; Karpel, Hannah C; Munoz, Nicole; Aphinyanaphongs, Yindalon; Saraceni, Mark; Shah, Paresh; Jones, Simon; Huang, Kathy
BACKGROUND AND OBJECTIVES:Operating rooms (ORs) are critical for hospital revenue and cost management, with utilization efficiency directly affecting financial outcomes. Traditional surgical scheduling often results in suboptimal OR use. We aim to build a machine learning (ML) model to predict incision times for robotic-assisted hysterectomies, enhancing scheduling accuracy and hospital finances. METHODS:A retrospective study was conducted using data from robotic-assisted hysterectomy cases performed between January 2017 and April 2021 across 3 hospitals within a large academic health system. Cases were filtered for surgeries performed by high-volume surgeons and those with an incision time of under 3 hours (n = 2,702). Features influencing incision time were extracted from electronic medical records and used to train 5 ML models (linear ridge regression, random forest, XGBoost, CatBoost, and explainable boosting machine [EBM]). Model performance was evaluated using a dynamic monthly update process and novel metrics such as wait-time blocks and excess-time blocks. RESULTS: < .001, 95% CI [-329 to -89]), translating to approximately 52 hours over the 51-month study period. The model predicted more surgeries within a 15% range of the true incision time compared to traditional methods. Influential features included surgeon experience, number of additional procedures, body mass index (BMI), and uterine size. CONCLUSION:The ML model enhanced the prediction of incision times for robotic-assisted hysterectomies, providing a potential solution to reduce OR underutilization and increase surgical throughput and hospital revenue.
PMCID: 11741200
PMID: 39831273
ISSN: 1938-3797
CID: 5778432
Impact of Patient-Clinician Relationships on Pain and Objective Functional Measures for Individuals with Chronic Low Back Pain: An Experimental Study
Vorensky, Mark; Squires, Allison; Jones, Simon; Sajnani, Nisha; Castillo, Elijah; Rao, Smita
PURPOSE:To compare the effects of enhanced and limited patient-clinician relationships during patient history taking on objective functional measures and pain appraisals for individuals with chronic low back pain (CLBP). METHODS:Fifty-two (52) participants with CLBP, unaware of the two groups, were randomized using concealed allocation to an enhanced (n=26) or limited (n=26) patient-clinician relationship condition. Participants shared their history of CLBP with a clinician who enacted either enhanced or limited communication strategies. Fingertip-to-floor, one-minute lift, and Biering-Sorensen tests, and visual analogue scale for pain at rest were assessed before and after the patient-clinician relationship conditions. FINDINGS:The enhanced condition resulted in significantly greater improvements in the one-minute lift test (F(1,49)=7.47, p<.01, ηp2=0.13) and pain at rest (F(1,46)=4.63, p=.04, ηp2=0.09), but not the fingertip-to-floor or Biering-Sorensen tests, compared with the limited group. CONCLUSIONS:Even without physical treatment, differences in patient-clinician relationships acutely affected lifting performance and pain among individuals with CLBP.
PMID: 39584210
ISSN: 1548-6869
CID: 5779832
Ambulatory antibiotic prescription rates for acute respiratory infection rebound two years after the start of the COVID-19 pandemic
Stevens, Elizabeth R; Feldstein, David; Jones, Simon; Twan, Chelsea; Cui, Xingwei; Hess, Rachel; Kim, Eun Ji; Richardson, Safiya; Malik, Fatima M; Tasneem, Sumaiya; Henning, Natalie; Xu, Lynn; Mann, Devin M
BACKGROUND:During the COVID-19 pandemic, acute respiratory infection (ARI) antibiotic prescribing in ambulatory care markedly decreased. It is unclear if antibiotic prescription rates will remain lowered. METHODS:We used trend analyses of antibiotics prescribed during and after the first wave of COVID-19 to determine whether ARI antibiotic prescribing rates in ambulatory care have remained suppressed compared to pre-COVID-19 levels. Retrospective data was used from patients with ARI or UTI diagnosis code(s) for their encounter from 298 primary care and 66 urgent care practices within four academic health systems in New York, Wisconsin, and Utah between January 2017 and June 2022. The primary measures included antibiotic prescriptions per 100 non-COVID ARI encounters, encounter volume, prescribing trends, and change from expected trend. RESULTS:At baseline, during and after the first wave, the overall ARI antibiotic prescribing rates were 54.7, 38.5, and 54.7 prescriptions per 100 encounters, respectively. ARI antibiotic prescription rates saw a statistically significant decline after COVID-19 onset (step change -15.2, 95% CI: -19.6 to -10.8). During the first wave, encounter volume decreased 29.4% and, after the first wave, remained decreased by 18.8%. After the first wave, ARI antibiotic prescription rates were no longer significantly suppressed from baseline (step change 0.01, 95% CI: -6.3 to 6.2). There was no significant difference between UTI antibiotic prescription rates at baseline versus the end of the observation period. CONCLUSIONS:The decline in ARI antibiotic prescribing observed after the onset of COVID-19 was temporary, not mirrored in UTI antibiotic prescribing, and does not represent a long-term change in clinician prescribing behaviors. During a period of heightened awareness of a viral cause of ARI, a substantial and clinically meaningful decrease in clinician antibiotic prescribing was observed. Future efforts in antibiotic stewardship may benefit from continued study of factors leading to this reduction and rebound in prescribing rates.
PMCID: 11198751
PMID: 38917147
ISSN: 1932-6203
CID: 5675032
Menu Labeling and Calories Purchased in Restaurants in a US National Fast Food Chain
Rummo, Pasquale E; Mijanovich, Tod; Wu, Erilia; Heng, Lloyd; Hafeez, Emil; Bragg, Marie A; Jones, Simon A; Weitzman, Beth C; Elbel, Brian
IMPORTANCE:Menu labeling has been implemented in restaurants in some US jurisdictions as early as 2008, but the extent to which menu labeling is associated with calories purchased is unclear. OBJECTIVE:To estimate the association of menu labeling with calories and nutrients purchased and assess geographic variation in results. DESIGN, SETTING, AND PARTICIPANTS:A cohort study was conducted with a quasi-experimental design using actual transaction data from Taco Bell restaurants from calendar years 2007 to 2014; US restaurants with menu labeling were matched to comparison restaurants using synthetic control methods. Data were analyzed from May to October 2023. EXPOSURE:Menu labeling policies in 6 US jurisdictions. MAIN OUTCOMES AND MEASURES:The primary outcome was calories per transaction. Secondary outcomes included total and saturated fat, carbohydrates, protein, sugar, fiber, and sodium. RESULTS:The final sample included 2329 restaurants, with menu labeling in 474 (31 468 restaurant-month observations). Most restaurants (94.3%) were located in California. Difference-in-differences model results indicated that customers purchased 24.7 (95% CI, 23.6-25.7) fewer calories per transaction from restaurants in the menu labeling group in the 3- to 24-month follow-up period vs the comparison group, including 21.9 (95% CI, 20.9-22.9) fewer calories in the 3- to 12-month follow-up period and 25.0 (95% CI, 24.0-26.1) fewer calories in the 13- to 24-month follow-up period. Changes in the nutrient content of transactions were consistent with calorie estimates. Findings in California were similar to overall estimates in magnitude and direction; yet, among restaurants outside of California, no association was observed in the 3- to 24-month period. The outcome of menu labeling also differed by item category and time of day, with a larger decrease in the number of tacos vs other items purchased and a larger decrease in calories purchased during breakfast vs other times of the day in the 3- to 24-month period. CONCLUSIONS AND RELEVANCE:In this quasi-experimental cohort study, fewer calories were purchased in restaurants with calorie labels compared with those with no labels, suggesting that consumers are sensitive to calorie information on menu boards, although associations differed by location.
PMID: 38100109
ISSN: 2574-3805
CID: 5588992
Reducing prescribing of antibiotics for acute respiratory infections using a frontline nurse-led EHR-Integrated clinical decision support tool: protocol for a stepped wedge randomized control trial
Stevens, Elizabeth R; Agbakoba, Ruth; Mann, Devin M; Hess, Rachel; Richardson, Safiya I; McGinn, Thomas; Smith, Paul D; Halm, Wendy; Mundt, Marlon P; Dauber-Decker, Katherine L; Jones, Simon A; Feldthouse, Dawn M; Kim, Eun Ji; Feldstein, David A
BACKGROUND:Overprescribing of antibiotics for acute respiratory infections (ARIs) remains a major issue in outpatient settings. Use of clinical prediction rules (CPRs) can reduce inappropriate antibiotic prescribing but they remain underutilized by physicians and advanced practice providers. A registered nurse (RN)-led model of an electronic health record-integrated CPR (iCPR) for low-acuity ARIs may be an effective alternative to address the barriers to a physician-driven model. METHODS:Following qualitative usability testing, we will conduct a stepped-wedge practice-level cluster randomized controlled trial (RCT) examining the effect of iCPR-guided RN care for low acuity patients with ARI. The primary hypothesis to be tested is: Implementation of RN-led iCPR tools will reduce antibiotic prescribing across diverse primary care settings. Specifically, this study aims to: (1) determine the impact of iCPRs on rapid strep test and chest x-ray ordering and antibiotic prescribing rates when used by RNs; (2) examine resource use patterns and cost-effectiveness of RN visits across diverse clinical settings; (3) determine the impact of iCPR-guided care on patient satisfaction; and (4) ascertain the effect of the intervention on RN and physician burnout. DISCUSSION:This study represents an innovative approach to using an iCPR model led by RNs and specifically designed to address inappropriate antibiotic prescribing. This study has the potential to provide guidance on the effectiveness of delegating care of low-acuity patients with ARIs to RNs to increase use of iCPRs and reduce antibiotic overprescribing for ARIs in outpatient settings. TRIAL REGISTRATION:ClinicalTrials.gov Identifier: NCT04255303, Registered February 5, 2020, https://clinicaltrials.gov/ct2/show/NCT04255303.
PMCID: 10644670
PMID: 37964232
ISSN: 1472-6947
CID: 5631732
Evaluating Whether an Inpatient Initiative to Time Lab Draws in the Evening Reduces Anemia
Zaretsky, Jonah; Eaton, Kevin P; Sonne, Christopher; Zhao, Yunan; Jones, Simon; Hochman, Katherine; Blecker, Saul
BACKGROUND:Hospital-acquired anemia is common during admission and can result in increased transfusion and length of stay. Recumbent posture is known to lead to lower hemoglobin measurements. We tested whether an initiative promoting evening lab draws would lead to higher hemoglobin measurements due to more time in upright posture during the day and evening. METHODS:We included patients hospitalized on 2 medical units, beginning March 26, 2020 and discharged prior to January 25, 2021. On one of the units, we implemented an initiative to have routine laboratory draws in the evening rather than the morning starting on August 26, 2020. There were 1217 patients on the control unit and 1265 on the intervention unit during the entire study period. First, we used a linear mixed-effects model to see if timing of blood draw was associated with hemoglobin level in the pre-intervention period. We then compared levels of hemoglobin before and after the intervention using a difference-in-difference analysis. RESULTS:In the pre-intervention period, evening blood draws were associated with higher hemoglobin compared to morning (0.28; 95% CI, 0.22-0.35). Evening blood draws increased with the intervention (10.3% vs 47.9%, P < 0.001). However, the intervention floor was not associated with hemoglobin levels in difference-in-difference analysis (coefficient of -0.15; 95% CI, -0.51 to 0.21). CONCLUSIONS:While evening blood draws were associated with higher hemoglobin levels, an intervention that successfully changed timing of routine labs to the evening did not lead to an increase in hemoglobin levels.
PMID: 37478815
ISSN: 2576-9456
CID: 5536212
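The difference-in-difference analysis mentioned in the abstract above compares the change over time on the intervention unit with the change over the same period on the control unit. A minimal Python sketch of that arithmetic follows. It is an unadjusted illustration only: the study itself reports a model coefficient, and the hemoglobin means used here are hypothetical, not study data.

```python
def difference_in_differences(
    treated_pre: float,
    treated_post: float,
    control_pre: float,
    control_post: float,
) -> float:
    """Change in the treated group minus change in the control group."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical mean hemoglobin values (g/dL), NOT taken from the study.
did = difference_in_differences(
    treated_pre=10.1,
    treated_post=10.2,
    control_pre=10.0,
    control_post=10.2,
)

print(f"difference-in-differences estimate: {did:.2f}")
```

In this made-up case both units improve, but the control unit improves more, so the estimate is negative. Subtracting the control unit's change removes time trends that affect both units equally.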
Continuity of Care Versus Language Concordance as an Intervention to Reduce Hospital Readmissions From Home Health Care
Squires, Allison; Engel, Patrick; Ma, Chenjuan; Miner, Sarah M; Feldman, Penny H; McDonald, Margaret V; Jones, Simon A
BACKGROUND:Language concordance between health care practitioners and patients has recently been shown to lower the risk of adverse health events. Continuity of care has also been shown to have the same impact. OBJECTIVE:The purpose of this paper is to examine the relative effectiveness of both continuity of care and language concordance as alternative or complementary interventions to improve health outcomes of people with limited English proficiency. DESIGN:A multivariable logistic regression model using rehospitalization as the dependent variable was built. The variable of interest was created to compare language concordance and continuity of care. PARTICIPANTS:The final sample included 22,103 patients from the New York City area between 2010 and 2015 who were non-English-speaking and admitted to their home health site following hospital discharge. MEASURES:The odds ratio (OR) and average marginal effect (AME) of each included variable were calculated for model analysis. RESULTS:When compared with low continuity of care and low language concordance, high continuity of care and high language concordance significantly decreased readmissions (OR=0.71, 95% CI: 0.62-0.80, P<0.001, AME=-4.95%), along with high continuity of care and low language concordance (OR=0.80, 95% CI: 0.74-0.86, P<0.001, AME=-3.26%). Low continuity of care and high language concordance did not significantly impact readmissions (OR=1.04, 95% CI: 0.86-1.26, P=0.672, AME=0.64%). CONCLUSION:In the US home health system, enhancing continuity of care for those with language barriers may be helpful to address disparities and reduce hospital readmission rates.
PMCID: 10421624
PMID: 37561604
ISSN: 1537-1948
CID: 5595402
The impact of COVID-19 monoclonal antibodies on clinical outcomes: A retrospective cohort study
Nagler, Arielle R; Horwitz, Leora I; Jones, Simon; Petrilli, Christopher M; Iturrate, Eduardo; Lighter, Jennifer L; Phillips, Michael; Bosworth, Brian P; Polsky, Bruce; Volpicelli, Frank M; Dapkins, Isaac; Viswanathan, Anand; François, Fritz; Kalkut, Gary
PURPOSE:Despite progress in the treatment of coronavirus disease 2019 (COVID-19), including the development of monoclonal antibodies (mAbs), more clinical data to support the use of mAbs in outpatients with COVID-19 is needed. This study is designed to determine the impact of bamlanivimab, bamlanivimab/etesevimab, or casirivimab/imdevimab on clinical outcomes within 30 days of COVID-19 diagnosis. METHODS:A retrospective cohort study was conducted at a single academic medical center with 3 campuses in Manhattan, Brooklyn, and Long Island, NY. Patients 12 years of age or older who tested positive for COVID-19 or were treated with a COVID-19-specific therapy, including COVID-19 mAb therapies, at the study site between November 24, 2020, and May 15, 2021, were included. The primary outcomes included rates of emergency department (ED) visit, inpatient admission, intensive care unit (ICU) admission, or death within 30 days from the date of COVID-19 diagnosis. RESULTS:A total of 1,344 mAb-treated patients were propensity matched to 1,344 patients with COVID-19 who were not treated with mAb therapy. Within 30 days of diagnosis, among the patients who received mAb therapy, 101 (7.5%) presented to the ED and 79 (5.9%) were admitted. Among the patients who did not receive mAb therapy, 165 (12.3%) presented to the ED and 156 (11.6%) were admitted (relative risk [RR], 0.61 [95% CI, 0.50-0.75] and 0.51 [95% CI, 0.40-0.64], respectively). Four mAb patients (0.3%) and 2.64 control patients (0.2%) were admitted to the ICU (RR, 1.51; 95% CI, 0.45-5.09). Six mAb-treated patients (0.4%) and 3.37 controls (0.3%) died and/or were admitted to hospice (RR, 1.61; 95% CI, 0.54-4.83). CONCLUSION:mAb therapy in ambulatory patients with COVID-19 decreases the risk of ED presentation and hospital admission within 30 days of diagnosis.
PMCID: 9619808
PMID: 36242772
ISSN: 1535-2900
CID: 5361302
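The relative risks for ED presentation and hospital admission quoted in the abstract above can be checked from the reported counts. The Python sketch below assumes the plain ratio-of-proportions definition of relative risk and uses the 1,344 patients per matched group; the function name `relative_risk` is illustrative. It reproduces only the point estimates, since the confidence intervals depend on the authors' matched analysis, which is not described in enough detail here to recompute.

```python
def relative_risk(
    events_treated: float,
    n_treated: int,
    events_control: float,
    n_control: int,
) -> float:
    """Risk in the treated group divided by risk in the control group."""
    return (events_treated / n_treated) / (events_control / n_control)


GROUP_SIZE = 1344  # patients per propensity-matched group, from the abstract

# Counts copied from the abstract.
rr_ed = relative_risk(101, GROUP_SIZE, 165, GROUP_SIZE)
rr_admission = relative_risk(79, GROUP_SIZE, 156, GROUP_SIZE)

print(f"ED presentation RR: {rr_ed:.2f}")
print(f"Admission RR:       {rr_admission:.2f}")
```

Both values round to the figures given in the abstract (0.61 and 0.51).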