Reusable Filtering Functions for Application in ICU data: a case study
Major, Vincent; Tanna, Monique S; Jones, Simon; Aphinyanaphongs, Yin
Complex medical data sometimes require significant preprocessing before analysis. This complexity can lead non-domain experts to apply simple filters to the available data or to not use the data at all. Preprocessing choices can also seriously affect a study's results if incorrect decisions or missteps are made. In this work, we present open-source data filters for an analysis motivated by understanding mortality in the context of sepsis-associated cardiomyopathy in the ICU. We report specific ICU filters and validate them through chart review and graphs. These published filters reduce the complexity of using data in analysis by (1) encapsulating the domain expertise and feature engineering applied to each filter, (2) providing debugged, ready-to-use code, and (3) providing sensible validations. We intend these filters to evolve through pull requests and forks and to serve as common starting points for specific analyses.
PMCID:5333239
PMID: 28269881
ISSN: 1942-597X
CID: 2476222
Text Classification for Automatic Detection of E-Cigarette Use and Use for Smoking Cessation from Twitter: A Feasibility Pilot
Aphinyanaphongs, Yin; Lulejian, Armine; Brown, Duncan Penfold; Bonneau, Richard; Krebs, Paul
Rapid increases in e-cigarette use and potential exposure to harmful byproducts have shifted public health focus to e-cigarettes as a possible drug of abuse. Effective surveillance of use and prevalence would allow appropriate regulatory responses. An ideal surveillance system would collect usage data in real time, focus on populations of interest, include populations unable to take surveys, allow a broad range of questions to be answered, and enable geo-location analysis. Social media streams may provide such a system. To realize this use case, a foundational question is whether we can detect e-cigarette use at all. This work reports two pilot tasks that use text classification to automatically identify Tweets indicating e-cigarette use and/or e-cigarette use for smoking cessation. We define and build datasets for both tasks and compare the performance of four state-of-the-art classifiers and a keyword search on each task. Our results demonstrate excellent classifier performance, with areas under the curve (AUC) of up to 0.90 and 0.94 in the two categories. These promising initial results form the foundation for further studies toward the ideal surveillance solution.
PMCID:4721250
PMID: 26776211
ISSN: 2335-6936
CID: 1921322
ICU Patients with Severe Sepsis Receive Less Aggressive Fluid Resuscitation if They Have a Prior History of Heart Failure [Meeting Abstract]
Tanna, Monique S; Major, Vincent; Jones, Simon; Aphinyanaphongs, Yin
ISI:000381064700039
ISSN: 1532-8414
CID: 2227902
A pilot application of automatic tweet detection of alcohol use at a music festival [Meeting Abstract]
Aphinyanaphongs, Y; Lucyk, S; Nguyen, V; Nelson, L; Krebs, P; Su, M; Smith, S W
Study Objectives: Previously, we built machine-learned models to automatically identify Tweets indicating alcohol use from 34,563 labeled Tweets collected over 24 hours during New Year's Day. The models demonstrated an estimated area under the receiver operating characteristic curve (AUROC) of 0.94 for identifying alcohol use Tweets. In this study, we validated our alcohol use model on an independently collected dataset from the Electric Zoo music festival on New York City's Randall's Island. This event attracted over 130,000 people in 2013 and resulted in two substance-associated deaths. Methods: The initial dataset contained all Tweets and Instagrams geo-tagged within 5 miles of Randall's Island, covering all event days from August 29-31, 2014. Two authors independently reviewed Tweets for drug- or alcohol-related content. Ten percent of the Tweets were randomly selected for dual independent review to determine agreement using a weighted Cohen's kappa. Identified Tweets were then jointly reviewed to determine those indicative of alcohol use according to previous definitions. Tweets and Instagrams were considered indicators of alcohol use if they referred to intention to drink, the act of drinking, location at a bar or liquor store, a specific brand, drinking paraphernalia (eg, flask), consequences of drinking (eg, drunk, wasted, tipsy), or alcohol-related hashtags. Our machine-learned Bayesian logistic regression model, which had been derived only from Tweets, was applied to a restricted dataset that excluded Instagrams. Results: The complete geo-located collection included 11,071 Tweets and Instagrams. The restricted, Tweet-only dataset consisted of 2,928 Tweets, of which 82 were classified as drug- or alcohol-related (weighted kappa = 0.92). Of these, 23 Tweets explicitly referenced alcohol use (eg, "Wine at Zoo is the right play. Instadrunk;" "Wow. I am not sober;" "#clskipfridays #livesummer #Ezoo #were dumb #and drunk"). The model achieved an AUROC of 0.87 when applied to this independent Tweet validation set. Conclusion: Our machine-learned model automatically identified alcohol use at Electric Zoo with high discriminatory power. The difference between the previously estimated AUROC and the validated AUROC is likely due to language variations between the two groups. An in-depth error analysis may identify approaches to improve model performance. The ability to automate social media geosurveillance of substance behavior at events could be coupled with real-time data feeds. Model automation would allow these real-time data feeds to be analyzed for potential public health interventions (including messaging, Tweet geodensity-dependent medical presence, or other measures) to further reduce harm.
EMBASE:72032552
ISSN: 0196-0644
CID: 1840842
Text Classification-based Automatic Recruitment of Patients for Clinical Trials: A Silver Standards-based Case Study
Chapter by: Ray, Bisakha; Aphinyanaphongs, Yindalon; Heffron, Sean
in: 2015 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2015) by Balakrishnan, P; Srivastava, J; Fu, WT; Harabagiu, S; Wang, F [Eds]
pp. 28-33
ISBN: 978-1-4673-9548-9
CID: 2352122
Integrating text messaging in a safety-net office-based buprenorphine program: A feasibility study [Meeting Abstract]
Tofighi, B; Grossman, E; Bereket, S; Aphinyanaphongs, Y; Lee, J D
Aims: (1) Assess the feasibility of a text message appointment reminder (TMR) intervention; (2) determine the clinical impact of the TMR on appointment adherence. Methods: A 52-item survey was administered to 100 patients in an urban, public sector, office-based buprenorphine program between June 2013 and March 2014. Survey domains included demographic characteristics, communication patterns, and content preferences for supportive, informational, and relapse prevention text message (TM) interventions. A TMR was then sent 7, 4, and 1 day(s) before each patient's upcoming appointment, followed by a 16-item survey that assessed satisfaction with and feedback on the TM reminders (n = 72). Results: Respondents were predominantly African-American (42%), unemployed or reliant on public assistance (68%), and lacked permanent housing (52%). Mobile phone (MP) ownership was common (93%), with the caveat of high turnover of phones (2) and phone numbers (2) in the past year. Most reported TM use (93%) and comfort with sending TM (79%). The feasibility survey demonstrated satisfaction with the TMR (100%), and most respondents preferred receiving text reminders (88%) in place of telephone reminders at 6 months. There was no significant difference in appointment adherence between participants who received the TMR and patients who did not. Conclusions: TM-based interventions are an acceptable and feasible strategy for enhancing the delivery of care in a safety-net, office-based buprenorphine program.
EMBASE:72176978
ISSN: 0376-8716
CID: 1946352
Pilot Study on Text Classification Methods to Identify Potential Subjects for Clinical Trials
Chapter by: Ray, Bisakha; Heffron, Sean; Kang, Stella; Aphinyanaphongs, Yindalon
in: Program & abstract book (9th Annual Machine Learning Symposium, March 13, 2015)
[New York] : New York Academy of Sciences, 2015
p. 56
CID: 1895872
Designing and Implementing INTREPID, an Intensive Program in Translational Research Methodologies for New Investigators
Plottel, Claudia S; Aphinyanaphongs, Yindalon; Shao, Yongzhao; Micoli, Keith J; Fang, Yixin; Goldberg, Judith D; Galeano, Claudia R; Stangel, Jessica H; Chavis-Keeling, Deborah; Hochman, Judith S; Cronstein, Bruce N; Pillinger, Michael H
Senior housestaff and junior faculty are often expected to perform clinical research, yet may not always have the requisite knowledge and skills to do so successfully. Formal degree programs provide such knowledge but require a significant commitment of time and money. Short-term training programs (days to weeks) provide alternative ways to accrue essential information and acquire fundamental methodological skills. Unfortunately, published information about short-term programs is sparse. To encourage discussion and exchange of ideas regarding such programs, here we share our experience developing and implementing INtensive Training in Research Statistics, Ethics, and Protocol Informatics and Design (INTREPID), a 24-day immersion training program in clinical research methodologies. Designing, planning, and offering INTREPID was feasible and required significant faculty commitment, support personnel and infrastructure, as well as committed trainees. Clin Trans Sci 2014; Volume #: 1-7.
PMCID:4267993
PMID: 25066862
ISSN: 1752-8062
CID: 1089772
A Comprehensive Empirical Comparison of Modern Supervised Classification and Feature Selection Methods for Text Categorization
Aphinyanaphongs, Yindalon; Fu, Lawrence D; Li, Zhiguo; Peskin, Eric R; Efstathiadis, Efstratios; Aliferis, Constantin F; Statnikov, Alexander
An important aspect of performing text categorization is selecting appropriate supervised classification and feature selection methods. A comprehensive benchmark is needed to inform best practices in this broad application field. Previous benchmarks have evaluated the performance of only a few supervised classification and feature selection methods and only limited ways of optimizing them. The present work updates prior benchmarks by increasing the number of classifiers and feature selection methods by an order of magnitude, including recently developed, state-of-the-art methods. Specifically, this study used 229 text categorization datasets/tasks and evaluated 28 classification methods (both well-established and proprietary/commercial) and 19 feature selection methods according to 4 classification performance metrics. We report several key findings that should help establish best methodological practices for text categorization.
ISI:000342346500002
ISSN: 2330-1643
CID: 1313832
Text classification for automatic detection of alcohol use-related tweets: A feasibility study
Chapter by: Aphinyanaphongs, Y; Ray, B; Statnikov, A; Krebs, P
in: 2014 IEEE 15th International Conference on Information Reuse and Integration
Piscataway, NJ : IEEE, 2014
pp. 93-97
ISBN: 978-1-4799-5880-1
CID: 1515072