NYUHSL Faculty Bibliography

Searched for:

in-biosketch:true

person:pat218

JMIR medical informatics. 2025:13.DOI: 10.2196/73504

Leveraging Machine Learning and Robotic Process Automation to Identify and Convert Unstructured Colonoscopy Results Into Actionable Data: Proof-of-Concept Study

Stevens, Elizabeth R; Hartman, Jager; Testa, Paul; Mansukhani, Ajay; Monina, Casey; Shunk, Amelia; Ranson, David; Imberg, Yana; Cote, Ann; Prabhu, Dinesha; Szerencsy, Adam

BACKGROUND: With rising patient volumes and a focus on quality, our health system had the objective to create a more efficient way to ensure accurate documentation of colorectal cancer (CRC) screening intervals from inbound colonoscopy reports to ensure timely follow-up. We developed an integrated end-to-end workflow solution using machine learning (ML) and robotic process automation (RPA) to extract and update electronic health record (EHR) follow-up dates from unstructured data. OBJECTIVE: This study aimed to automate data extraction from external, free-text colonoscopy reports to identify and document recommended follow-up dates for CRC screening in structured EHR fields. METHODS: As proof of concept, we outline the process development, validity, and implementation of an approach that integrates available tools to automate data retrieval and entry within the EHR of a large academic health system. The health system uses Epic Systems as its EHR platform, and the ML model used was trained on health system patient colonoscopy reports. This proof-of-concept process study consisted of six stages: (1) identification of gaps in documenting recommendations for follow-up CRC screening from external colonoscopy reports, (2) defining process objectives, (3) identification of technologies, (4) creation of process architecture, (5) process validation, and (6) health system-wide implementation. A chart review was performed to validate process outcomes and estimate impact. RESULTS: We developed an automated process with 3 primary steps leveraging ML and RPA to create a fully orchestrated workflow to update CRC screening recall dates based on colonoscopy reports received from external sources. Process validity was assessed with 690 scanned colonoscopy reports. During process validation, the overall automated process achieved an accuracy of 80.7% (557/690, 95% CI 77.8%-83.7%) for correctly identifying the presence or absence of a valid follow-up date and a follow-up date false negative identification rate of 32.9% (130/395, 95% CI 29.4%-36.4%). From the organization-wide implementation to go-live until December 31, 2024, the system processed 16,563 external colonoscopy reports. Of these, 35.3% (5841/16,563) had a follow-up date meeting the relevant ML model threshold and thus were identified as ready for RPA processing. CONCLUSIONS: Implementation of an automated workflow to extract and update CRC screening follow-up dates from colonoscopy reports is feasible and has the potential to improve accuracy in patient recall while reducing documentation burden. By standardizing data ingestion, extending this approach to various unstructured data types can address deficiencies in structured EHR documentation and solve for a lack of data integration and reporting for quality measures. Automated workflows leveraging ML and RPA offer practical solutions to overcome interoperability challenges and the use of unstructured data within health care systems.
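The accuracy figure in the abstract above can be checked by hand. What follows is a minimal sketch, assuming the reported interval is a normal-approximation (Wald) interval; the abstract does not state which method was used, and the function name `wald_ci` is illustrative only.

```python
import math


def wald_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (proportion, lower, upper) for a Wald (normal-approximation) interval.

    Assumption: the abstract does not say how its 95% CI was computed;
    the Wald method is used here only because it is the simplest choice.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


# Counts taken from the abstract: 557 of 690 reports classified correctly.
p, lo, hi = wald_ci(557, 690)
print(f"accuracy = {p:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```

Under that assumption the sketch gives 80.7% with an interval of 77.8% to 83.7%, in line with the reported accuracy. Other interval methods (for example Wilson) would give slightly different bounds.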

PMCID: 12634012

PMID: 41264858

ISSN: 2291-9694

CID: 5969362

JAMA network open. 2025:8(8).DOI: 10.1001/jamanetworkopen.2025.26339

Evaluating Hospital Course Summarization by an Electronic Health Record-Based Large Language Model

Small, William R; Austrian, Jonathan; O'Donnell, Luke; Burk-Rafel, Jesse; Hochman, Katherine A; Goodman, Adam; Zaretsky, Jonah; Martin, Jacob; Johnson, Stephen; Major, Vincent J; Jones, Simon; Henke, Christian; Verplanke, Benjamin; Osso, Jwan; Larson, Ian; Saxena, Archana; Mednick, Aron; Simonis, Choumika; Han, Joseph; Kesari, Ravi; Wu, Xinyuan; Heery, Lauren; Desel, Tenzin; Baskharoun, Samuel; Figman, Noah; Farooq, Umar; Shah, Kunal; Jahan, Nusrat; Kim, Jeong Min; Testa, Paul; Feldman, Jonah

IMPORTANCE: Hospital course (HC) summarization represents an increasingly onerous discharge summary component for physicians. Literature supports large language models (LLMs) for HC summarization, but whether physicians can effectively partner with electronic health record-embedded LLMs to draft HCs is unknown. OBJECTIVES: To compare the editing effort required by time-constrained resident physicians to improve LLM- vs physician-generated HCs toward a novel 4Cs (complete, concise, cohesive, and confabulation-free) HC. DESIGN, SETTING, AND PARTICIPANTS: Quality improvement study using a convenience sample of 10 internal medicine resident editors, 8 hospitalist evaluators, and randomly selected general medicine admissions in December 2023 lasting 4 to 8 days at New York University Langone Health. EXPOSURES: Residents and hospitalists reviewed randomly assigned patient medical records for 10 minutes. Residents blinded to author type edited each HC pair (physician and LLM) for quality in 3 minutes, followed by comparative ratings by attending hospitalists. MAIN OUTCOMES AND MEASURES: Editing effort was quantified by analyzing the edits that occurred on the HC pairs after controlling for length (percentage edited) and the degree to which the original HCs' meaning was altered (semantic change). Hospitalists compared edited HC pairs with A/B testing on the 4Cs (5-point Likert scales converted to 10-point bidirectional scales). RESULTS: Among 100 admissions, compared with physician HCs, residents edited a smaller percentage of LLM HCs (LLM mean [SD], 31.5% [16.6%] vs physicians, 44.8% [20.0%]; P < .001). Additionally, LLM HCs required less semantic change (LLM mean [SD], 2.4% [1.6%] vs physicians, 4.9% [3.5%]; P < .001). Attending physicians deemed LLM HCs to be more complete (mean [SD] difference LLM vs physicians on 10-point bidirectional scale, 3.00 [5.28]; P < .001), similarly concise (mean [SD], -1.02 [6.08]; P = .20), and cohesive (mean [SD], 0.70 [6.14]; P = .60), but with more confabulations (mean [SD], -0.98 [3.53]; P = .002). The composite scores were similar (mean [SD] difference LLM vs physician on 40-point bidirectional scale, 1.70 [14.24]; P = .46). CONCLUSIONS AND RELEVANCE: Electronic health record-embedded LLM HCs required less editing than physician-generated HCs to approach a quality standard, resulting in HCs that were comparably or more complete, concise, and cohesive, but contained more confabulations. Despite the potential influence of artificial time constraints, this study supports the feasibility of a physician-LLM partnership for writing HCs and provides a basis for monitoring LLM HCs in clinical practice.

PMID: 40802185

ISSN: 2574-3805

CID: 5906762

Applied clinical informatics. 2025:16(4):1114-1120.DOI: 10.1055/a-2675-3510

Disappearing Text as a Clinical Decision Support Layer: A Case Series

Silberlust, Jared; Small, William; Shah, Darshi; Chakravartty, Eesha; Moawad, Katherine; Moawad, Andrew; Testa, Paul; Feldman, Jonah

OBJECTIVES: This case series aims to evaluate several applications of inline disappearing text (DT) clinical decision support (CDS) tools within clinician documentation. METHODS: DT blocks were created to prompt documentation for perioperative anticoagulation planning (Scenario 1), pre-discharge intravenous antibiotic planning (Scenario 2), and advanced care planning (Scenario 3). In Scenario 1, DT was the only intervention. In Scenario 2, DT was paired with a documentation SmartList. In Scenario 3, DT was paired with a documentation SmartList and an OurPractice Advisory. The number of documented perioperative anticoagulation plans, pre-discharge intravenous antibiotic plans, and Advanced Care Planning notes were measured pre- and post-intervention and compared using chi-square analyses. RESULTS: In Scenario 1, there was no statistically significant change in the percentage of perioperative anticoagulation plans documented at 0-24 and 24-48 hours before surgery. In Scenario 2, documentation of antibiotic contingency planning in patients expected to be discharged within 24 hours increased from 60% (54 of 90 notes) to 93% (1,850 of 1,994 notes), χ²(1, N=2,084) = 113.1, p < 0.001. In Scenario 3, ACP note documentation by discharge in patients with a positive mandatory surprise question increased from 43% (821 of 1,909 encounters) to 52% (975 of 1,874 encounters), χ²(1, N=3,783) = 30.5, p < 0.001. CONCLUSIONS: Utilizing DT in conjunction with other forms of CDS was associated with an improvement of documentation quality in pre-discharge IV antibiotics and advanced care planning. A sociotechnical analysis explores how interactions between technology, people, workflow, and culture could contextualize how utilizing DT with other forms of CDS was more effective than DT alone.
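The chi-square statistics reported for Scenarios 2 and 3 can be recomputed from the counts given in the abstract. The sketch below is a hedged illustration, not the authors' analysis code: the function name `chi_square_2x2` is hypothetical, and the use of Yates's continuity correction is an assumption, since the abstract does not say whether a correction was applied.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = True) -> float:
    """Pearson chi-square statistic (1 degree of freedom) for the table [[a, b], [c, d]].

    With yates=True, Yates's continuity correction is applied.
    Assumption: the abstract does not state which variant was used.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denominator


# Scenario 2: 54 of 90 notes before, 1,850 of 1,994 notes after.
scenario_2 = chi_square_2x2(54, 90 - 54, 1850, 1994 - 1850)
# Scenario 3: 821 of 1,909 encounters before, 975 of 1,874 encounters after.
scenario_3 = chi_square_2x2(821, 1909 - 821, 975, 1874 - 975)
print(f"Scenario 2: {scenario_2:.1f}, Scenario 3: {scenario_3:.1f}")
```

With the continuity correction the sketch yields 113.1 and 30.5, which agree with the reported values; without it, the Scenario 2 statistic would be about 117, so the corrected form appears to be the closer match.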

PMID: 40763805

ISSN: 1869-0327

CID: 5905032

Journal of medical Internet research. 2025:27.DOI: 10.2196/69955

Enhancing the Readability of Online Patient Education Materials Using Large Language Models: Cross-Sectional Study

Will, John; Gupta, Mahin; Zaretsky, Jonah; Dowlath, Aliesha; Testa, Paul; Feldman, Jonah

BACKGROUND: Online accessible patient education materials (PEMs) are essential for patient empowerment. However, studies have shown that these materials often exceed the recommended sixth-grade reading level, making them difficult for many patients to understand. Large language models (LLMs) have the potential to simplify PEMs into more readable educational content. OBJECTIVE: We sought to evaluate whether 3 LLMs (ChatGPT [OpenAI], Gemini [Google], and Claude [Anthropic PBC]) can optimize the readability of PEMs to the recommended reading level without compromising accuracy. METHODS: This cross-sectional study used 60 randomly selected PEMs available online from 3 websites. We prompted LLMs to simplify the reading level of online PEMs. The primary outcome was the readability of the original online PEMs compared with the LLM-simplified versions. Readability scores were calculated using 4 validated indices: Flesch Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog Index, and Simple Measure of Gobbledygook Index. Accuracy and understandability were also assessed as balancing measures, with understandability measured using the Patient Education Materials Assessment Tool-Understandability (PEMAT-U). RESULTS: The original readability scores for the American Heart Association (AHA), American Cancer Society (ACS), and American Stroke Association (ASA) websites were above the recommended sixth-grade level, with mean grade level scores of 10.7, 10.0, and 9.6, respectively. After optimization by the LLMs, readability scores significantly improved across all 3 websites when compared with the original text. Compared with the original website, Wilcoxon signed rank test showed ChatGPT improved the readability to 7.6 from 10.1 (P<.001); Gemini, to 6.6 (P<.001); and Claude, to 5.6 (P<.001). Word counts were significantly reduced by all LLMs, with a decrease from a mean range of 410.9-953.9 words to a mean range of 201.9-248.1 words. None of the ChatGPT LLM-simplified PEMs were inaccurate, while 3.3% of Gemini and Claude LLM-simplified PEMs were inaccurate. Baseline understandability scores, as measured by PEMAT-U, were preserved across all LLM-simplified versions. CONCLUSIONS: This cross-sectional study demonstrates that LLMs have the potential to significantly enhance the readability of online PEMs while maintaining accuracy and understandability, making them more accessible to a broader audience. However, variability in model performance and demonstrated inaccuracies underscore the need for human review of LLM output. Further study is needed to explore advanced LLM techniques and models trained for medical content.
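Two of the readability indices named in the abstract above, Flesch Reading Ease and Flesch-Kincaid Grade Level, are simple functions of word, sentence, and syllable counts. The sketch below shows the standard published formulas; the function names and the example counts are hypothetical and are not taken from the study, and syllable counting (which needs a dictionary or heuristic) is deliberately left out.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for illustration only (not data from the study):
# a 100-word passage before and after simplification.
original = dict(words=100, sentences=5, syllables=150)
simplified = dict(words=100, sentences=10, syllables=130)

for label, counts in (("original", original), ("simplified", simplified)):
    grade = flesch_kincaid_grade(**counts)
    ease = flesch_reading_ease(**counts)
    print(f"{label}: grade level {grade:.1f}, reading ease {ease:.1f}")
```

Both formulas reward shorter sentences and fewer syllables per word, which is why a simplified passage of the same length can drop several grade levels.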

PMID: 40465378

ISSN: 1438-8871

CID: 5862402

Quality management in health care. 2025:34(3):243-248.DOI: 10.1097/QMH.0000000000000481

Specialty-Based Ambulatory Quality Improvement Program: A Specialty-Specific Ambulatory Metric Project

Nagler, Arielle R; Testa, Paul A; Cho, Ilseung; Ogedegbe, Gbenga; Kalkut, Gary; Gossett, Dana R

BACKGROUND AND OBJECTIVES: Healthcare is increasingly being delivered in the outpatient setting, but robust quality improvement programs and performance metrics are lacking in ambulatory care, particularly specialty-based ambulatory care. METHODS: To promote quality improvement in ambulatory care, we developed an infrastructure to create specialty-specific quality measures and dashboards that could be used to display providers' performance across relevant measures to individual providers and institutional leaders. RESULTS: The products of this program include a governance and infrastructure for specialty-specific ambulatory quality metrics as well as two distinct dashboards for data display. One dashboard is provider-facing, displaying each provider's performance on specialty-specific measures as compared to institutional standards. The second dashboard is a leadership dashboard that provides overall and provider-level information on performance across measures. CONCLUSIONS: The Specialty-based Ambulatory Quality program reflects a systematic, institutionally supported quality improvement framework that can be applied across diverse ambulatory specialties. As next steps, we plan to evaluate the program's impact on provider performance across measures and expand this program to other specialties practicing in the outpatient setting.

PMID: 39466606

ISSN: 1550-5154

CID: 5746782

Journal of the American Medical Informatics Association. 2025:32(2):268-274.DOI: 10.1093/jamia/ocae285

Health system-wide access to generative artificial intelligence: the New York University Langone Health experience

Malhotra, Kiran; Wiesenfeld, Batia; Major, Vincent J; Grover, Himanshu; Aphinyanaphongs, Yindalon; Testa, Paul; Austrian, Jonathan S

OBJECTIVES: The study aimed to assess the usage and impact of a private and secure instance of a generative artificial intelligence (GenAI) application in a large academic health center. The goal was to understand how employees interact with this technology and the influence on their perception of skill and work performance. MATERIALS AND METHODS: New York University Langone Health (NYULH) established a secure, private, and managed Azure OpenAI service (GenAI Studio) and granted widespread access to employees. Usage was monitored and users were surveyed about their experiences. RESULTS: Over 6 months, more than 1007 individuals applied for access, with high usage among research and clinical departments. Users felt prepared to use the GenAI studio, found it easy to use, and would recommend it to a colleague. Users employed the GenAI studio for diverse tasks such as writing, editing, summarizing, data analysis, and idea generation. Challenges included difficulties in educating the workforce in constructing effective prompts and token and API limitations. DISCUSSION: The study demonstrated high interest in and extensive use of GenAI in a healthcare setting, with users employing the technology for diverse tasks. While users identified several challenges, they also recognized the potential of GenAI and indicated a need for more instruction and guidance on effective usage. CONCLUSION: The private GenAI studio provided a useful tool for employees to augment their skills and apply GenAI to their daily tasks. The study underscored the importance of workforce education when implementing system-wide GenAI and provided insights into its strengths and weaknesses.

PMCID: 11756645

PMID: 39584477

ISSN: 1527-974X

CID: 5778212

American journal of obstetrics & gynecology MFM. 2024.DOI: 10.1016/j.ajogmf.2024.101520

Comparing Users to Non-Users of Remote Patient Monitoring for Postpartum Hypertension [Letter]

Kidd, Jennifer M J; Alku, Dajana; Vertichio, Rosanne; Akerman, Meredith; Prasannan, Lakha; Mann, Devin M; Testa, Paul A; Chavez, Martin; Heo, Hye J

PMID: 39396754

ISSN: 2589-9333

CID: 5718282

Applied clinical informatics. 2024:15(5):1093-1096.DOI: 10.1055/s-0044-1791488

Reference Ranges for All: Implementing Reference Ranges for Transgender and Nonbinary Patients [Case Report]

Cardillo, Anthony B; Chen, Dan; Haghi, Nina; O'Donnell, Luke; Jhang, Jeffrey; Testa, Paul A; Genes, Nicholas

OBJECTIVES: This study aimed to highlight the necessity of developing and implementing appropriate reference ranges for transgender and nonbinary (TGNB) patient populations to minimize misinterpretation of laboratory results and ensure equitable health care. CASE REPORT: We describe a situation where a TGNB patient's abnormal laboratory values were not flagged due to undefined reference ranges for gender "X" in the Laboratory Information System (LIS). Implementation of additional reference ranges mapped to sex label "X" showed significant improvement in flagging abnormal lab results, utilizing sex-invariant reporting as an interim solution while monitoring developments on TGNB-specific reference ranges. CONCLUSION: Informatics professionals should assess their institution's policies for registration and lab reporting on TGNB patients, as nonimplementation poses significant patient safety risks. Best practices include using TGNB-specific reference ranges emerging in the literature, reporting both male and female reference ranges for clinical interpretation, and sex-invariant reporting.

PMCID: 11655151

PMID: 39694068

ISSN: 1869-0327

CID: 5764552

ACI open. 2024:8(2):e62-e68.DOI: 10.1055/s-0044-1788621

Enhancing Secure Messaging in Electronic Health Records: Evaluating the Impact of Emoji Chat Reactions on the Volume of Interruptive Notifications

Will, John; Small, William; Iturrate, Eduardo; Testa, Paul; Feldman, Jonah

ORIGINAL:0017336

ISSN: 2566-9346

CID: 5686602

NPJ digital medicine. 2024:7(1).DOI: 10.1038/s41746-024-01179-5

From silos to synergy: integrating academic health informatics with operational IT for healthcare transformation

Mann, Devin M; Stevens, Elizabeth R; Testa, Paul; Mherabi, Nader

We have entered a new age of health informatics, applied health informatics, where digital health innovation cannot be pursued without considering operational needs. In this new digital health era, creating an integrated applied health informatics system will be essential for health systems to achieve informatics healthcare goals. Integration of information technology (IT) and health informatics does not naturally occur without a deliberate and intentional shift towards unification. Recognizing this, NYU Langone Health’s (NYULH) Medical Center IT (MCIT) has taken proactive measures to vertically integrate academic informatics and operational IT through the establishment of the MCIT Department of Health Informatics (DHI). The creation of the NYULH DHI showcases the drivers, challenges, and ultimate successes of our enterprise effort to align academic health informatics with IT, providing a model for the creation of the applied health informatics programs required for academic health systems to thrive in the increasingly digitized healthcare landscape.

PMCID: 11233608

PMID: 38982211

ISSN: 2398-6352

CID: 5732312