NYUHSL Faculty Bibliography

Journal of general internal medicine. 2017(40th):S369-S370.DOI:

Using natural language processing to automate grading of student's patient notes: A pilot study of machine learning text classification [Meeting Abstract]

Kalet, A; Oh, S -Y; Marin, M; Yu, Y; Dumorne, H; Aphinyanaphongs, Y

BACKGROUND: At NYU, as part of a comprehensive objective structured clinical skills exam, experienced medical educators judge clinical knowledge, decision-making, and clinical reasoning skills of trainees based on their patient notes. Despite being rubric-driven, this task requires tremendous time and effort to establish consistent scoring, delaying and limiting individualized feedback. We conducted pilot machine learning text classification studies to establish if accurate automated scoring of clinical notes is possible. METHODS: As a use case, we tested 100 student-written clinical notes from 7 standardized patient cases (Vision Loss, Tel Diarrhea, Difficulty Sleeping, Shoulder Pain, Failure To Thrive, Abdominal Pain, Palpitations) that had been scored for quality of clinical reasoning by faculty on a 1-4 scale. In order to assess performance of NLP strategies to categorize students in meaningful groups, we dichotomized students based on their faculty-given scores by case into "failing" (score of 1, 5-18 students per case) and "passing" (score 2, 3, 4). We treated each task as a binary classification task in a text classification pipeline. First, we treated each note as a bag of tokens and weighted each token with term frequency-inverse document frequency (TFIDF), a numerical statistic that reflects how important a word is to a document. We then applied 3 different classification algorithms (random forests, support vector machines, and Bayesian logistic regression) and measured discriminatory performance using Area Under Curve (AUC) in a cross validation evaluation design. RESULTS: TFIDF performed with AUCs between 0.669 and 0.905. Logistic regression provided the highest AUC in four cases: Difficulty Sleeping (0.905), Shoulder Pain (0.618), Failure To Thrive (0.717) and Abdominal Pain (0.892).
As we observed the highest AUCs in the Difficulty Sleeping and Abdominal Pain cases, we have begun to refine the algorithm for these two cases by identifying the important features that lead faculty to give students a higher grade and improve the accuracy of NLP-based scoring. Promising features include the presence and sequence of certain words in the problem representation, sentence length in the management section, ranking of the differential diagnosis, sequence between key words (e.g. rule out appendicitis), and evidence of "thinkingness" or what many call semantic qualifiers. CONCLUSIONS: With additional effort to build targeted case-specific classifiers for clinical content and reasoning, a validated machine-learning model may achieve partial or full automation of grading of the notes. This work, which builds on decades of clinical decision-making and critical reasoning research, may provide medical trainees with more and potentially better feedback, facilitating learning of clinical reasoning, freeing faculty to coach this process, and in the long run impacting healthcare quality and patient safety.

EMBASE:615581953

ISSN: 0884-8734

CID: 2553842

Seminars in musculoskeletal radiology. 2017:21(1):32-36.DOI: 10.1055/s-0036-1597255

Big Data Analyses in Health and Opportunities for Research in Radiology

Aphinyanaphongs, Yindalon

This article reviews examples of big data analyses in health care with a focus on radiology. We review the defining characteristics of big data, the use of natural language processing, traditional and novel data sources, and large clinical data repositories available for research. This article aims to invoke novel research ideas through a combination of examples of analyses and domain knowledge.

PMID: 28253531

ISSN: 1098-898x

CID: 2471542

Academic radiology. 2016:23(12):1573-1581.DOI: 10.1016/j.acra.2016.08.011

Use of a Machine-learning Method for Predicting Highly Cited Articles Within General Radiology Journals

Rosenkrantz, Andrew B; Doshi, Ankur M; Ginocchio, Luke A; Aphinyanaphongs, Yindalon

RATIONALE AND OBJECTIVES: This study aimed to assess the performance of a text classification machine-learning model in predicting highly cited articles within the recent radiological literature and to identify the model's most influential article features. MATERIALS AND METHODS: We downloaded from PubMed the title, abstract, and medical subject heading terms for 10,065 articles published in 25 general radiology journals in 2012 and 2013. Three machine-learning models were applied to predict the top 10% of included articles in terms of the number of citations to the article in 2014 (reflecting the 2-year time window in conventional impact factor calculations). The model having the highest area under the curve was selected to derive a list of article features (words) predicting high citation volume, which was iteratively reduced to identify the smallest possible core feature list maintaining predictive power. Overall themes were qualitatively assigned to the core features. RESULTS: The regularized logistic regression (Bayesian binary regression) model had highest performance, achieving an area under the curve of 0.814 in predicting articles in the top 10% of citation volume. We reduced the initial 14,083 features to 210 features that maintain predictivity. These features corresponded with topics relating to various imaging techniques (eg, diffusion-weighted magnetic resonance imaging, hyperpolarized magnetic resonance imaging, dual-energy computed tomography, computed tomography reconstruction algorithms, tomosynthesis, elastography, and computer-aided diagnosis), particular pathologies (prostate cancer; thyroid nodules; hepatic adenoma, hepatocellular carcinoma, non-alcoholic fatty liver disease), and other topics (radiation dose, electroporation, education, general oncology, gadolinium, statistics). 
CONCLUSIONS: Machine learning can be successfully applied to create specific feature-based models for predicting articles likely to achieve high influence within the radiological literature.

PMID: 27692588

ISSN: 1878-4046

CID: 2273812

Gynecologic oncology. 2016:142(3):508-13.DOI: 10.1016/j.ygyno.2016.06.010

The safety of same-day discharge after laparoscopic hysterectomy for endometrial cancer

Lee, Jessica; Aphinyanaphongs, Yindalon; Curtin, John P; Chern, Jing-Yi; Frey, Melissa K; Boyd, Leslie R

OBJECTIVE: To determine factors influencing discharge patterns after laparoscopic hysterectomy for endometrial cancer and to evaluate the safety of same-day discharge during the 30-day postoperative period. METHODS: Using the American College of Surgeons' National Surgical Quality Improvement Program database, patients who underwent hysterectomy for endometrial cancer from 2010 to 2014 were identified and categorized by their hospital length of stay. Statistical analyses were performed to assess the relationship between hospital stay and demographics, medical comorbidities, intraoperative surgical factors and postoperative outcomes. RESULTS: A total of 9020 patients had laparoscopic hysterectomies for endometrial cancer and of these, 729 patients (8.1%) were successfully discharged on the day of surgery. These patients were younger and had lower body mass indexes and fewer medical comorbidities than patients who were admitted after their procedure. The same-day discharge group underwent surgical procedures of less complexity than the hospital admission group based on shorter operative times and fewer relative value units (RVUs). There was a lower rate of surgical site infections in the same-day discharge group, and no difference in rates of other postoperative complications including hospital readmissions and reoperations. CONCLUSIONS: Rates of laparoscopic hysterectomy for endometrial cancer are gradually increasing but the rates of same-day discharge have increased at a much slower rate. Same-day discharge has been successful despite differences in preoperative demographics, medical comorbidities and intraoperative surgical complexity. Overall postoperative complication rates were equivalent regardless of length of hospital stay, demonstrating the safety and feasibility of same-day discharge after laparoscopic hysterectomy for endometrial cancer.

PMID: 27288543

ISSN: 1095-6859

CID: 2136712

Journal of translational medicine. 2016:14(1).DOI: 10.1186/s12967-016-0992-8

Classifying publications from the clinical and translational science award program along the translational research spectrum: a machine learning approach

Surkis, Alisa; Hogle, Janice A; DiazGranados, Deborah; Hunt, Joe D; Mazmanian, Paul E; Connors, Emily; Westaby, Kate; Whipple, Elizabeth C; Adamus, Trisha; Mueller, Meridith; Aphinyanaphongs, Yindalon

BACKGROUND: Translational research is a key area of focus of the National Institutes of Health (NIH), as demonstrated by the substantial investment in the Clinical and Translational Science Award (CTSA) program. The goal of the CTSA program is to accelerate the translation of discoveries from the bench to the bedside and into communities. Different classification systems have been used to capture the spectrum of basic to clinical to population health research, with substantial differences in the number of categories and their definitions. Evaluation of the effectiveness of the CTSA program and of translational research in general is hampered by the lack of rigor in these definitions and their application. This study adds rigor to the classification process by creating a checklist to evaluate publications across the translational spectrum and operationalizes these classifications by building machine learning-based text classifiers to categorize these publications. METHODS: Based on collaboratively developed definitions, we created a detailed checklist for categories along the translational spectrum from T0 to T4. We applied the checklist to CTSA-linked publications to construct a set of coded publications for use in training machine learning-based text classifiers to classify publications within these categories. The training sets combined T1/T2 and T3/T4 categories due to low frequency of these publication types compared to the frequency of T0 publications. We then compared classifier performance across different algorithms and feature sets and applied the classifiers to all publications in PubMed indexed to CTSA grants. To validate the algorithm, we manually classified the articles with the top 100 scores from each classifier. RESULTS: The definitions and checklist facilitated classification and resulted in good inter-rater reliability for coding publications for the training set. 
Very good performance was achieved for the classifiers as represented by the area under the receiver operating characteristic curve (AUC), with an AUC of 0.94 for the T0 classifier, 0.84 for T1/T2, and 0.92 for T3/T4. CONCLUSIONS: The combination of definitions agreed upon by five CTSA hubs, a checklist that facilitates more uniform definition interpretation, and algorithms that perform well in classifying publications along the translational spectrum provide a basis for establishing and applying uniform definitions of translational research categories. The classification algorithms allow publication analyses that would not be feasible with manual classification, such as assessing the distribution and trends of publications across the CTSA network and comparing the categories of publications and their citations to assess knowledge transfer across the translational research spectrum.

PMCID:4974725

PMID: 27492440

ISSN: 1479-5876

CID: 2199242

Gynecologic oncology. 2016:141:179-179.DOI: 10.1016/j.ygyno.2016.04.462

Factors associated with successful outpatient laparoscopic hysterectomy for women with endometrial cancer [Meeting Abstract]

Lee, J; Aphinyanaphongs, Y; Boyd, L R

Objectives: Minimally invasive surgery is the preferred surgical method to treat women with endometrial cancer. Several single-institution reports have described the feasibility and safety of outpatient laparoscopic hysterectomies (LH) for both benign and malignant indications. The objective of this study is to identify patient and surgical factors associated with outpatient laparoscopic hysterectomies (OLH) and to compare outcomes between OLH and inpatient laparoscopic hysterectomies (ILH) in women with endometrial cancer. Methods: Data were obtained from the American College of Surgeons' National Surgical Quality Improvement Program (NSQIP) database. All patients who underwent hysterectomies for endometrial cancer from 2007 to 2013 were identified by ICD-9 and CPT codes. These patients were then filtered for LH. Comparative analyses were performed and stratified by admission status to evaluate demographics, preoperative and intraoperative variables, and surgical outcomes. Statistical tests were performed with R Studio version 0.99.442. Results: LH rates have been steadily increasing. (See Table 1.) Between 2010 and 2013, 5,851 patients underwent LH for endometrial cancer; of these, 3,428 (58.6%) were ILH and 2,423 (41.4%) were OLH. OLH rates increased each year from 30.0% in 2010 to 50.0% in 2013. OLH patients were on average 61.81 years old compared with 63.03 years for ILH patients (P <.001). Medical comorbidities were not different between the 2 groups. Total operating time and anesthesia time were both significantly shorter in the OLH group: average times were 161.3 and 187.0 minutes (P <.001) and 245.2 versus 256.3 minutes (P =.002), respectively. More lymph node dissections were performed in the ILH group than the OLH group: 2,074 (60.5%) versus 1,390 (57.4%, P =.016). There were more radical hysterectomies in the ILH group (n = 803; 23.4%) compared with the OLH group (n = 315; 13.1%) (P <.001).
OLHs were assigned fewer relative value units than ILHs (mean 28.5 vs 30.6, respectively, P <.001). Postoperative complications were not different between the groups. Conclusions: Younger age, fewer RVUs, shorter operating and anesthesia times were associated with successful OLH in patients with endometrial cancer. Lymph node dissection and radical surgery were associated with an increased rate of ILH. There were no differences in postoperative complications between OLH and ILH. (table present)

EMBASE:72341428

ISSN: 1095-6859

CID: 2204972

Journal of general internal medicine. 2016:31:S458-S458.DOI:

USING NATURAL LANGUAGE PROCESSING TO AUTOMATE GRADING OF STUDENTS' PATIENT NOTES: PROOF OF CONCEPT [Meeting Abstract]

Gershgorin, Irina; Marin, Marina; Xu, Junchuan; Oh, So-Young; Zabar, Sondra; Crowe, Ruth; Tewksbury, Linda; Ogilvie, Jennifer; Gillespie, Colleen; Cantor, Michael; Aphinyanaphongs, Yindalon; Kalet, Adina

ISI:000392201601297

ISSN: 1525-1497

CID: 2481862

Academic emergency medicine. 2016:23:S116-S116.DOI:

Models to predict hospital admission from the emergency department through the sole use of the medication administration record [Meeting Abstract]

Aphinyanaphongs, Y; Liang, Y; Theobald, J; Grover, H; Swartz, J L

Background: Multiple models have been developed to predict hospital admission for patients presenting to the ED. However, these tools suffer from multiple limitations including reliance on manual data entry (e.g. ED arrival mechanism), multiple types of data, and data that are not completely generalizable across institutions (e.g. triage score). An ideal solution would produce a disposition score that requires no data entry, employs variables already captured by all EDs, and provides a score far enough in advance to expedite admission processes. Objectives: Evaluate the discriminatory power of machine learning algorithms for predicting hospital admission at two hours of ED arrival through the sole use of the medication administration record (MAR). Methods: Our dataset included 27,757 encounters (26% admitted) from January 2013 to September 2014 and 2,109 medications encoded to RxNorm CUI numbers using MedEx. We included all medications in the MAR, including those given during prior ED visits. We employed classic and state-of-the-art classifiers including logistic regression, naive Bayes, regularized logistic regression, classification and regression trees (CART), and linear support vector machine (SVM) with penalty parameter C. In all cases, we split the dataset into a training, validation, and test set. We used the validation set to optimize any parameters of the learning algorithm and used the test set to calculate performances. We employed 5-fold cross validation and reported AUC performances averaged across 5 folds. Results: The models performed with AUCs of 0.85 for linear SVM with penalty parameter C (95%CI 0.84-0.86), 0.83 for CART (95%CI 0.82-0.84), 0.79 for regularized logistic regression (95%CI 0.78-0.80), 0.70 for naive Bayes (95%CI 0.69-0.72), and 0.68 for logistic regression (95%CI 0.67-0.69).
Conclusion: MAR data is sufficient to reliably predict hospital admission two hours into the ED stay. Our models perform similarly to those from prior studies, but with the advantages of only requiring a single type of data and being highly generalizable to other institutions; MAR data is objective, does not require manual data entry, and is universally available across EDs.

EMBASE:72280952

ISSN: 1553-2712

CID: 2151612

Pacific Symposium on Biocomputing. 2016:21:480-91.DOI:

TEXT CLASSIFICATION FOR AUTOMATIC DETECTION OF E-CIGARETTE USE AND USE FOR SMOKING CESSATION FROM TWITTER: A FEASIBILITY PILOT

Aphinyanaphongs, Yin; Lulejian, Armine; Brown, Duncan Penfold; Bonneau, Richard; Krebs, Paul

Rapid increases in e-cigarette use and potential exposure to harmful byproducts have shifted public health focus to e-cigarettes as a possible drug of abuse. Effective surveillance of use and prevalence would allow appropriate regulatory responses. An ideal surveillance system would collect usage data in real time, focus on populations of interest, include populations unable to take the survey, allow a breadth of questions to be answered, and enable geo-location analysis. Social media streams may provide this ideal system. To realize this use case, a foundational question is whether we can detect e-cigarette use at all. This work reports two pilot tasks using text classification to automatically identify Tweets that indicate e-cigarette use and/or e-cigarette use for smoking cessation. We build and define both datasets and compare performance of 4 state-of-the-art classifiers and a keyword search for each task. Our results demonstrate excellent classifier performance of up to 0.90 and 0.94 area under the curve in each category. These promising initial results form the foundation for further studies to realize the ideal surveillance solution.

PMCID:4721250

PMID: 26776211

ISSN: 2335-6936

CID: 1921322

Journal of cardiac failure. 2016:22(8):S16-S16.DOI:

ICU Patients with Severe Sepsis Receive Less Aggressive Fluid Resuscitation if They Have a Prior History of Heart Failure [Meeting Abstract]

Tanna, Monique S; Major, Vincent; Jones, Simon; Aphinyanaphongs, Yin

ISI:000381064700039

ISSN: 1532-8414

CID: 2227902