Searched for: in-biosketch:true person:yaoj05

Total Results: 44


Exploring the potential of artificial intelligence and machine learning in orthopaedic surgery

Cohen, Leah J G; Yao, Jie J; Lajam, Claudette
Artificial intelligence (AI) has emerged as one of the most transformative technological forces in modern medicine, with rapidly expanding applications throughout medicine including orthopaedic surgery. Recent advances in machine learning, deep learning, and natural language processing have accelerated the development of AI-enabled tools with direct relevance to orthopaedic practice. This review provides a basic overview of how AI works and current uses of AI in medicine and orthopaedics across multiple domains including documentation efficiency, patient communication, operating room optimization, imaging analysis, rehabilitation, education, and research, and briefly describes AI's limitations, ethical and legal concerns, and cost. These applications are all already being used in orthopaedics or have clear direct translation. AI holds considerable potential to augment orthopaedic care by streamlining workflows, enhancing decision-making, and improving patient outcomes. However, responsible integration requires rigorous validation, transparency, clinician oversight, and ongoing education. As AI adoption accelerates, orthopaedic surgeons must critically evaluate emerging technologies to ensure that.
PMID: 42296285
ISSN: 2328-5273
CID: 6049472

General-purpose large language models outperform specialized clinical AI tools on medical benchmarks

Vishwanath, Krithik; Alyakin, Anton; Ghosh, Mrigayu; Hage, Ali; Neifert, Sean N; Orillac, Cordelia; Mandelberg, Nataniel J; Khan, Hammad A; Lee, Jin Vivian; Yao, Jie J; Small, William Robert; Varma, Aakaash; Hewitt, D Brock; Aphinyanaphongs, Yindalon; Alber, Daniel Alexander; Oermann, Eric Karl
Specialized clinical artificial intelligence (AI) tools are entering medical practice despite scarce independent evaluation. We quantitatively evaluate two clinical AI tools, OpenEvidence and UpToDate Expert AI, built on large language models (LLMs) against three frontier LLMs: GPT-5.2, Gemini 3.1 Pro and Claude Opus 4.6. Our evaluation has three stages: (1) 500 MedQA questions testing medical knowledge, (2) 500 HealthBench items measuring alignment with clinicians and (3) the real clinical queries (RCQ) benchmark, built from 100 de-identified queries from physicians to a general-purpose language model in a live clinical environment. For the RCQ benchmark, 12 US clinicians performed randomized, blinded review of model outputs, producing 1,800 model-question annotations. Frontier LLMs outperformed clinical AI tools in all three evaluations. Clinical AI tools performed comparably to auto-enabled Google Search AI Overview on the RCQ. These findings highlight the need for independent, real-world evaluation of AI tools before they enter clinical settings.
PMID: 42286322
ISSN: 1546-170x
CID: 6049082

Temporal trends and estimated lifetime attributable radiation risk of preoperative planning computed tomography for primary shoulder arthroplasty

Kalva, Swara R; Fucich, Dario; Perry, Arthur J; Lezak, Bradley; Torkieh, Joseph; Joshi, Tej; Virk, Mandeep S; Kwon, Young W; Yao, Jie J
BACKGROUND: Advantages of preoperative planning computed tomography (CT) scans for shoulder arthroplasty (SA) are known; however, risks of ionizing radiation exposure remain unknown. We retrospectively reviewed institutional trends in utilization and estimated radiation risk of preoperative planning CT scans obtained for SA. METHODS: From 2016 to 2024, the annual percent incidence of SA patients that received a preoperative planning CT was determined. The National Academy of Sciences Biological Effects of Ionizing Radiation VII report on lifetime attributable risk (LAR) cancer incidences and received radiation dose were utilized to estimate patient-level LAR projections for 100 patients stratified by sex, age, and cancer type. RESULTS: = 0.78) from 53% to 66% of all SA patients. For all solid cancers, LAR (per 100,000 people) decreased as patient age increased, ranging from 73.1 to 9.6 for females and 40.7 to 31.4 for males (from the 50-59-year-old group to the 80-89 group). In the 60-69-year-old and 70-79-year-old groups, estimated thyroid and lung LARs were significantly higher in women. DISCUSSION: Utilization of preoperative shoulder CT scans is increasing. Preoperative shoulder CT may be associated with a small but quantifiable projected cancer risk most pronounced in younger women.
PMCID: 13249610
PMID: 42282944
ISSN: 1758-5732
CID: 6048832

Current Artificial Intelligence Large Language Models Exhibit Sycophantic Behavior in Orthopaedic Contexts

Perry, Arthur J; Kalva, Swara; Fucich, Dario; Muppidi, Srikar; Aggarwal, Manan; Virk, Mandeep S; Zuckerman, Joseph D; Yao, Jie J
BACKGROUND: The use of large language models (LLMs) is increasingly common. However, LLMs may exhibit sycophancy, echoing users' beliefs while avoiding contradiction. In the present study, we describe sycophancy in general-purpose LLMs when applied to orthopaedic contexts. METHODS: We investigated sycophancy in 2 general-purpose LLMs. We evaluated performance on 3 tasks: (1) accuracy on benchmark answering: LLMs were tested on validated benchmark orthopaedic questions, with correct and incorrect cues, and the change in accuracy and sycophancy error rate were determined; (2) user belief agreement: LLMs were provided with ambiguous statements and a user belief, and LLM agreement, contradiction, and uncertainty were described; and (3) false information detection: false information was placed within a task prompt to measure noncontradiction and propagation rates. RESULTS: Baseline factual accuracy on benchmark questioning was 78%, decreasing with correct hints (71%) (p = 0.49). With incorrect hints, LLM accuracy declined significantly (48%) (p < 0.001), with a sycophancy error rate of 52%. Presented with user beliefs about an indefinite, controversial statement, models echoed user beliefs in 56%, expressed uncertainty in 12%, and contradicted users in 32% of statements. In noncontradiction tasks, models perpetuated incorrect attributions 99% of the time yet reliably corrected statistical distortions 97% of the time. CONCLUSIONS: Although popular general-purpose LLMs have useful orthopaedic applications, they exhibit sycophancy, with a tendency toward agreement and without recognition of ambiguity. This is a key weakness to be addressed. Findings should be interpreted cautiously given the variability in model design, prompting, and models evaluated. CLINICAL RELEVANCE: The tendency of general-purpose LLMs to agree without recognizing clinical ambiguity may limit their reliability in orthopaedic applications.
PMID: 42166556
ISSN: 1535-1386
CID: 6038532

The Persistent Challenges of Diagnosing Orthopaedic Implant-Related Infections

Lum, Zachary C; Cohen-Rosenblum, Anna; Yao, Jie J; Chen, Antonia F; Landy, David C; Parvizi, Javad
Infection remains one of the most catastrophic complications following orthopaedic surgery. Despite substantial advances in molecular diagnostics, biomarker assays, and consensus definitions, accurately diagnosing orthopaedic infection continues to challenge even the most experienced clinicians. There are differences in the diagnosis and treatment of infections that are related to different anatomic regions. The difficulty arises from the inherent biological diversity of infecting organisms and surgical locations, variable host responses, and the absence of a true diagnostic "gold standard." This article summarizes the current diagnostic challenges and emerging solutions, drawing on recent high-impact evidence and consensus frameworks.
PMID: 42018608
ISSN: 1535-1386
CID: 6032782

Revision-free reverse shoulder arthroplasty patients report greater difficulty with some activities of daily living compared to anatomic total shoulder arthroplasty patients at mid-term follow-up

Molokwu, Brian O; Xu, Jacquelyn J; Farrell, Steven G; Perry, Arthur; Roche, Christopher P; Virk, Mandeep S; Zuckerman, Joseph D; Yao, Jie J
BACKGROUND: Few studies have directly compared limitations in activities of daily living (ADLs) between reverse shoulder arthroplasty (RSA) and anatomic total shoulder arthroplasty (aTSA). This study evaluates ADL function at mid-term follow-up in patients with revision-free RSA and aTSA. METHODS: This retrospective cohort study included 250 patients who underwent primary aTSA (n = 177) or RSA (n = 73) with a minimum follow-up of 7 years (mean 10 ± 2 years). Patients who had revision surgery were excluded. Multivariable ordinal logistic regression analysis was used to assess the odds of RSA patients reporting better ADL function compared to aTSA patients. RESULTS: Postoperatively, a greater proportion of aTSA patients reported normal ADLs compared to RSA patients. On multivariable analysis, controlling for baseline differences, RSA patients reported lower ADL function for personal hygiene/toilet needs (odds ratio [OR] 0.21 [95% CI: 0.07-0.65]; p = 0.006), washing/combing hair (OR 0.36 [0.13-1.02]; p = 0.049), putting on a button-up shirt (OR 0.08 [0.02-0.25]; p < 0.001), and putting on pants (OR 0.12 [0.03-0.39]; p < 0.001). DISCUSSION: After adjusting for differences in baseline factors, RSA patients reported greater difficulty with specific ADL tasks, including toileting, personal hygiene, grooming, and dressing, compared to aTSA patients. LEVEL OF EVIDENCE: Level III; Retrospective cohort study.
PMCID: 12893930
PMID: 41695146
ISSN: 1758-5732
CID: 6004302

CORR Synthesis: How Should PROM Thresholds Be Determined and Interpreted to Reflect Clinically Meaningful Change in Orthopaedic Surgery?

Vallurupalli, Neel; Padon, Benjamin; Yao, Jie J
PMID: 41564289
ISSN: 1528-1132
CID: 5988412

Using Percentage of Maximal Possible Improvement (MPI) to Predict High Patient Satisfaction Following the Latarjet Procedure

Molokwu, Brian O; Xu, Jacquelyn J; Mercer, Nathaniel P; Sultan, Tanzeel; Myerson, C Lucas; Yao, Jie J; Meislin, Robert J; Virk, Mandeep S
BACKGROUND: Outcome thresholds such as the minimal clinically important difference (MCID), patient acceptable symptomatic state (PASS), and substantial clinical benefit (SCB) are commonly used to define meaningful clinical improvement. However, these measures apply uniform cutoffs that do not account for individual baseline scores. Maximal Possible Improvement (MPI) offers a patient-specific approach by considering the maximal potential gain in function or reduction in pain. The percentage of MPI (%MPI) that correlates with high postoperative patient satisfaction following the Latarjet procedure has not been defined. The purpose of this study was to (1) establish %MPI thresholds predictive of high patient satisfaction for the American Shoulder and Elbow Surgeons (ASES) score and the Patient-Reported Outcomes Measurement Information System (PROMIS) domains of Upper Extremity Function (PUE), Pain Interference (P-Interference), and Pain Intensity (P-Intensity); and (2) identify patient-level factors associated with achieving these thresholds for ASES and PUE. METHODS: A retrospective review identified 81 eligible patients who underwent the Latarjet procedure with minimum 1-year follow-up. Preoperative and postoperative ASES, PUE, P-Interference, and P-Intensity scores, along with postoperative patient degree of satisfaction, were recorded. Receiver operating characteristic curve analyses were performed to identify individual %MPI thresholds in each of the 4 scores that best predicted high satisfaction at minimum 1 year postoperatively. Univariate and multivariate logistic regression analyses were conducted sequentially to identify patient factors that were associated with achievement of the ASES and PUE thresholds. RESULTS: Among the 81 patients who met the inclusion criteria, the %MPI thresholds associated with high satisfaction were 65% for ASES (area under the curve [AUC]: 0.86), 29% for PUE (AUC: 0.84), 57% for P-Interference (AUC: 0.78), and 59% for P-Intensity (AUC: 0.77). Higher body mass index (odds ratio [OR]: 1.16, p = 0.048) and surgery on the dominant arm (OR: 3.87, p = 0.024) were associated with higher odds of achieving the ASES threshold. Recurrent dislocations preoperatively (OR: 0.20, p = 0.022) were associated with lower odds of achieving the PUE threshold. CONCLUSIONS: The percentage of maximal possible improvement (%MPI) following the Latarjet procedure offers an individualized measure of clinical success that accounts for baseline variability and mitigates ceiling effects. Thresholds associated with high patient satisfaction following the Latarjet procedure were ≥65% for ASES, ≥29% for PUE, ≥57% for P-Interference, and ≥59% for P-Intensity.
PMID: 40865902
ISSN: 1532-6500
CID: 5910272
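
The abstract above does not spell out how %MPI is computed. A minimal sketch follows, assuming the commonly used definition: the observed change divided by the largest change available from the patient's baseline, times 100. The function names, the `best` parameter, and the example scores are illustrative assumptions, not taken from the study; only the four threshold values come from the abstract.

```python
# Satisfaction thresholds (%MPI) as reported in the abstract above.
MPI_THRESHOLDS = {
    "ASES": 65.0,
    "PUE": 29.0,
    "P-Interference": 57.0,
    "P-Intensity": 59.0,
}


def percent_mpi(pre: float, post: float, best: float) -> float:
    """Percentage of maximal possible improvement (assumed definition).

    `best` is the best attainable score on the instrument: the maximum
    when higher is better, the minimum when lower is better. Because the
    numerator and denominator change sign together, the same formula
    covers both directions.
    """
    max_possible = best - pre
    if max_possible == 0:
        raise ValueError("baseline is already the best score; %MPI is undefined")
    return 100.0 * (post - pre) / max_possible


def meets_threshold(measure: str, pre: float, post: float, best: float) -> bool:
    """True if the patient's %MPI reaches the reported threshold for `measure`."""
    return percent_mpi(pre, post, best) >= MPI_THRESHOLDS[measure]


# Hypothetical patient: ASES (0-100, higher is better) improves from 40 to 82.
ases_mpi = percent_mpi(pre=40, post=82, best=100)
print(ases_mpi, meets_threshold("ASES", 40, 82, 100))
```

The same fixed gain means different things at different baselines under this definition: a 20-point ASES gain from 30 is about 29% of what was possible, while the same gain from 70 is about 67%, which is the baseline sensitivity the abstract contrasts with uniform cutoffs.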

Enhanced Risk of 90-Day Medical and 2-Year Implant Related Complications in Total Shoulder Arthroplasty Patients with Osteoporosis

Lawand, Jad; Lopez, Ryan; Boufade, Peter; Daher, Mohammad; Fares, Mohamad; Yao, Jie; Khan, Adam; Abboud, Joseph
BACKGROUND: As the average age of patients undergoing shoulder arthroplasty (SA) increases, the frequency of SA patients with osteoporosis is expected to rise. While the effects of osteoporosis have been described in the broader orthopedic literature, it is presently unclear how osteoporosis affects SA postoperative medical and implant-related outcomes. METHODS: A multicenter database, TriNetX, was queried for patients with and without osteoporosis who underwent SA from 2011 to 2021. Patients with less than 2 years of follow-up and those with a prior shoulder hemiarthroplasty were excluded. Primary outcomes included 2-year periprosthetic joint infection (PJI), prosthesis dislocation, periprosthetic fracture, and revision surgery. Secondary outcomes included 90-day medical complications and readmissions. Osteoporotic and control patient cohorts were propensity matched in a 1:1 ratio. RESULTS:; p < 0.001). Osteoporotic patients undergoing SA were more likely to experience wound disruptions, stroke, pulmonary embolism, deep vein thrombosis, myocardial infarction, anemia, pneumonia, renal failure, transfusion, and readmission within 90 days after surgery. At 2 years postoperative, osteoporotic SA patients experienced an elevated risk of mechanical loosening, PJI, dislocation, and periprosthetic fracture, and required revision surgery at a higher rate than control patients. CONCLUSIONS: Osteoporotic patients undergoing shoulder arthroplasty are at greater risk for medical complications within the 90-day perioperative period as well as implant-related complications within 2 years of surgery. Patients and surgeons should be aware of the potentially higher risk of complications in osteoporotic patients following SA, and further investigation into the benefits of preoperative management and treatment of osteoporosis is necessary.
PMID: 39384014
ISSN: 1532-6500
CID: 5745982

Early Postoperative Pain is Similar after Arthroscopic Rotator Cuff Repair versus Short-Stay Shoulder Arthroplasty: A Prospective Study

Lopez, Ryan; Schiffman, Corey; Singh, Jaspal; Yao, Jie; Vaughan, Alayna; Chen, Raymond; Lazarus, Mark; Namdari, Surena
BACKGROUND: One of the barriers to counseling patients for shoulder arthroplasty (SA) is the anticipated pain after surgery. This can be contrasted with the common perception of arthroscopic rotator cuff repair (RCR) surgery being less painful due to the less invasive nature of the procedure. We conducted a prospective study comparing postoperative pain levels and narcotic consumption after SA with those after RCR. METHODS: This prospective study enrolled 102 patients undergoing short-stay SA and RCR at a single hospital. Fifty patients underwent RCR and 52 underwent SA. All participants received a multimodal pain regimen consisting of an interscalene block with liposomal bupivacaine and one of two oral pain medication regimens. Patients were provided a daily pain diary to be completed for 14 postoperative days that tracked pain levels, narcotic consumption, and pain location. Patients were excluded for age <40, revision surgery, SA for fracture, history of chronic opioid use, or an inability to adhere to study protocol. Demographics, visual analogue scale (VAS) scores, and pain sensitivity questionnaires (PSQ) were collected preoperatively. Primary study outcomes were daily VAS pain scores and narcotic consumption during the 14 days after surgery. RESULTS: RCR patients were younger (60.6 vs. 68.9 years; p < 0.01) but other demographics, preoperative pain, and PSQ scores were similar between groups. Peak mean VAS pain levels for RCR and SA each occurred on postoperative day (POD) 2 and were 4.4 ± 3.1 and 5.1 ± 2.7, respectively (p = 0.214). There was no significant difference in VAS pain during the 14-day postoperative period between RCR and SA patients (p > 0.05) or between anatomic SA and reverse SA (p > 0.05). Narcotic usage was greater for RCR patients than for SA patients at POD 7 (0.5 vs. 0.2 tablets; p = 0.039) and POD 8 (0.5 vs. 0.2 tablets; p = 0.015). CONCLUSIONS: Our study demonstrated that postoperative pain levels do not significantly differ between RCR and short-stay SA, with greater narcotic usage observed for RCR at one week after surgery. These findings support the notion that despite the increased invasiveness of SA, early postoperative pain is comparable with early pain after RCR.
PMID: 39427728
ISSN: 1532-6500
CID: 5745992