
Searched for: in-biosketch:yes person:aphiny01
Total Results: 97


Pilot Study on Text Classification Methods to Identify Potential Subjects for Clinical Trials

Chapter by: Ray, Bisakha; Heffron, Sean; Kang, Stella; Aphinyanaphongs, Yindalon
in: Program & abstract book (9th Annual Machine Learning Symposium March 13, 2015) by
[New York] : New York Academy of Sciences, 2015
pp. 56-56
CID: 1895872

Designing and Implementing INTREPID, an Intensive Program in Translational Research Methodologies for New Investigators

Plottel, Claudia S; Aphinyanaphongs, Yindalon; Shao, Yongzhao; Micoli, Keith J; Fang, Yixin; Goldberg, Judith D; Galeano, Claudia R; Stangel, Jessica H; Chavis-Keeling, Deborah; Hochman, Judith S; Cronstein, Bruce N; Pillinger, Michael H
Senior housestaff and junior faculty are often expected to perform clinical research, yet may not always have the requisite knowledge and skills to do so successfully. Formal degree programs provide such knowledge, but require a significant commitment of time and money. Short-term training programs (days to weeks) provide alternative ways to accrue essential information and acquire fundamental methodological skills. Unfortunately, published information about short-term programs is sparse. To encourage discussion and exchange of ideas regarding such programs, we here share our experience developing and implementing INtensive Training in Research Statistics, Ethics, and Protocol Informatics and Design (INTREPID), a 24-day immersion training program in clinical research methodologies. Designing, planning, and offering INTREPID was feasible, and required significant faculty commitment, support personnel and infrastructure, as well as committed trainees. Clin Trans Sci 2014; Volume #: 1-7.
PMCID: 4267993
PMID: 25066862
ISSN: 1752-8062
CID: 1089772

A Comprehensive Empirical Comparison of Modern Supervised Classification and Feature Selection Methods for Text Categorization

Aphinyanaphongs, Yindalon; Fu, Lawrence D; Li, Zhiguo; Peskin, Eric R; Efstathiadis, Efstratios; Aliferis, Constantin F; Statnikov, Alexander
An important aspect of performing text categorization is selecting appropriate supervised classification and feature selection methods. A comprehensive benchmark is needed to inform best practices in this broad application field. Previous benchmarks have evaluated performance for only a few supervised classification and feature selection methods, with limited ways of optimizing them. The present work updates prior benchmarks by increasing the number of classifiers and feature selection methods by an order of magnitude, including recently developed, state-of-the-art methods. Specifically, this study used 229 text categorization data sets/tasks and evaluated 28 classification methods (both well-established and proprietary/commercial) and 19 feature selection methods according to 4 classification performance metrics. We report several key findings that will be helpful in establishing best methodological practices for text categorization.
ISI:000342346500002
ISSN: 2330-1643
CID: 1313832
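
A minimal sketch of the kind of benchmark grid this abstract describes: cross a few classifiers with a few feature selection methods on a public text corpus and score each pairing by AUC. The corpus, methods, and sizes below are illustrative scikit-learn stand-ins, not the paper's 229 tasks, 28 classifiers, or 19 selectors.

```python
# Toy benchmark grid: every (classifier, feature selector) pair is scored by
# cross-validated AUC on one binary text categorization task. All choices
# here are illustrative stand-ins for the much larger grid in the paper.
from itertools import product

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Binary task: distinguish two newsgroups (a stand-in categorization task).
data = fetch_20newsgroups(subset="train",
                          categories=["sci.med", "sci.space"],
                          remove=("headers", "footers", "quotes"))

classifiers = {"logreg": LogisticRegression(max_iter=1000),
               "naive_bayes": MultinomialNB()}
selectors = {"chi2": SelectKBest(chi2, k=500),
             "mutual_info": SelectKBest(mutual_info_classif, k=500)}

for (c_name, clf), (s_name, sel) in product(classifiers.items(),
                                            selectors.items()):
    pipe = make_pipeline(TfidfVectorizer(), sel, clf)
    auc = cross_val_score(pipe, data.data, data.target,
                          cv=5, scoring="roc_auc").mean()
    print(f"{c_name:12s} + {s_name:12s} AUC = {auc:.3f}")
```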

Text classification for automatic detection of alcohol use-related tweets: A feasibility study

Chapter by: Aphinyanaphongs, Y; Ray, B; Statnikov, A; Krebs, P
in: 2014 IEEE 15th International Conference on Information Reuse and Integration by
Piscataway, NJ : IEEE, 2014
pp. 93-97
ISBN: 978-1-4799-5880-1
CID: 1515072

Computer models for identifying instrumental citations in the biomedical literature

Fu, Lawrence D.; Aphinyanaphongs, Yindalon; Aliferis, Constantin F.
The most popular method for evaluating the quality of a scientific publication is citation count. This metric assumes that a citation is a positive indicator of the quality of the cited work. This assumption is not always true, since citations serve many purposes. As a result, citation count is an indirect and imprecise measure of impact. If instrumental citations could be reliably distinguished from non-instrumental ones, this would readily improve the performance of existing citation-based metrics by excluding the non-instrumental citations. A citation was operationally defined as instrumental if either of the following was true: the hypothesis of the citing work was motivated by the cited work, or the citing work could not have been executed without the cited work. This work investigated the feasibility of developing computer models for automatically classifying citations as instrumental or non-instrumental. Instrumental citations were manually labeled, and machine learning models were trained on a combination of content and bibliometric features. The experimental results indicate that models based on content and bibliometric features are able to automatically classify instrumental citations with high predictivity (AUC = 0.86). Additional experiments using independent hold-out data and prospective validation show that the models are generalizable and can handle unseen cases. This work demonstrates that it is feasible to train computer models to automatically identify instrumental citations.
ISI:000327219900020
ISSN: 0138-9130
CID: 687922
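
As a rough illustration of the setup this abstract describes (a model trained on combined content and bibliometric features, evaluated by AUC on held-out data), here is a hedged sketch. The feature names and the toy data are invented for demonstration; the paper's actual features and labeled corpus are not reproduced here.

```python
# Hypothetical citation classifier: content (the citing sentence) plus
# bibliometric features, scored by AUC on a held-out split. Data and feature
# names are fabricated for illustration, so the printed AUC is meaningless.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    # "context" stands in for the citing sentence (a content feature).
    "context": rng.choice(["we applied the method of", "as reviewed in",
                           "our hypothesis was motivated by", "see also"],
                          size=n),
    # Invented bibliometric features.
    "cited_count": rng.poisson(20, size=n),
    "self_citation": rng.integers(0, 2, size=n),
})
y = rng.integers(0, 2, size=n)  # 1 = instrumental (toy labels)

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "context"),
    ("biblio", "passthrough", ["cited_count", "self_citation"]),
])
model = make_pipeline(features, LogisticRegression(max_iter=1000))

X_tr, X_te, y_tr, y_te = train_test_split(df, y, test_size=0.25,
                                          random_state=0, stratify=y)
model.fit(X_tr, y_tr)
print("held-out AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```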

Identifying unproven cancer treatments on the health web: addressing accuracy, generalizability and scalability

Aphinyanaphongs, Yin; Fu, Lawrence D; Aliferis, Constantin F
Building machine learning models that identify unproven cancer treatments on the Health Web is a promising approach for dealing with the dissemination of false and dangerous information to vulnerable health consumers. Aside from the obvious requirement of accuracy, two issues are of practical importance in deploying these models in real-world applications. (a) Generalizability: the models must generalize to all treatments (not just the ones used to train the models). (b) Scalability: the models can be applied efficiently to billions of documents on the Health Web. First, we provide methods and related empirical data demonstrating strong accuracy and generalizability. Second, by combining the MapReduce distributed architecture with high-dimensionality compression via Markov Boundary feature selection, we show how to scale the application of the models to WWW-scale corpora. The present work provides evidence that (a) a very small subset of unproven cancer treatments is sufficient to build a model that identifies unproven treatments on the web; (b) unproven treatments use distinct language to market their claims, and this language is learnable; and (c) through distributed parallelization and state-of-the-art feature selection, it is possible to prepare the corpora and to build and apply the models at very large scale.
PMCID: 4162393
PMID: 23920640
ISSN: 0926-9630
CID: 484192
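
The scaling idea in this abstract (compress the model to a small selected feature subset, then score document shards in parallel, map/reduce style) can be sketched as follows. The term weights are invented, a hand-picked feature set stands in for Markov Boundary selection, and Python's multiprocessing stands in for a real MapReduce cluster.

```python
# Map/reduce-style scoring with a "compressed" linear model: only a handful
# of selected terms carry weights, so each mapper needs almost no state.
from multiprocessing import Pool

# Toy compressed model: invented term weights over a tiny feature subset.
WEIGHTS = {"miracle": 2.1, "cure": 1.7, "detox": 1.4,
           "clinical": -0.9, "trial": -1.2}
THRESHOLD = 1.0

def map_shard(docs):
    """Map step: score every document in one shard with the compact model."""
    flagged = []
    for doc_id, text in docs:
        score = sum(WEIGHTS.get(tok, 0.0) for tok in text.lower().split())
        if score > THRESHOLD:
            flagged.append(doc_id)
    return flagged

def reduce_results(per_shard):
    """Reduce step: merge the flagged document IDs from all shards."""
    return sorted(doc_id for shard in per_shard for doc_id in shard)

if __name__ == "__main__":
    shards = [
        [(0, "miracle detox cure for cancer"),
         (1, "randomized clinical trial results")],
        [(2, "this detox cure is guaranteed"),
         (3, "phase III clinical trial protocol")],
    ]
    with Pool(2) as pool:
        print("flagged:", reduce_results(pool.map(map_shard, shards)))
```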

A comparison of evaluation metrics for biomedical journals, articles, and websites in terms of sensitivity to topic

Fu, Lawrence D; Aphinyanaphongs, Yindalon; Wang, Lily; Aliferis, Constantin F
Evaluating the biomedical literature and health-related websites for quality poses challenging information retrieval tasks. Current commonly used methods include impact factor for journals, PubMed's clinical query filters and machine learning-based filter models for articles, and PageRank for websites. Previous work has focused on the average performance of these methods without considering the topic, and it is unknown how performance varies for specific topics or focused searches. Clinicians, researchers, and users should be aware when expected performance is not achieved for specific topics. The present work analyzes the behavior of these methods for a variety of topics. Impact factor, clinical query filters, and PageRank vary widely across different topics, while a topic-specific impact factor and machine learning-based filter models are more stable. The results demonstrate that a method may perform excellently on average but struggle when used on a number of narrower topics. Topic-adjusted metrics and other topic-robust methods have an advantage in such situations. Users of traditional topic-sensitive metrics should be aware of their limitations.
PMCID: 3143298
PMID: 21419864
ISSN: 1532-0480
CID: 135570
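
A toy illustration of the per-topic analysis this abstract performs: score each method separately on each topic and compare the spread, not just the mean. The numbers below are made up; the point is that two methods with similar averages can differ sharply in topic-level stability.

```python
# Made-up per-topic precision for two hypothetical article filters.
# A topic-sensitive method and a topic-robust one can have similar means
# while differing greatly in per-topic spread.
import statistics

per_topic = {
    "topic_sensitive_filter": {"cardiology": 0.95, "oncology": 0.60,
                               "genetics": 0.92, "psychiatry": 0.50},
    "topic_robust_filter": {"cardiology": 0.74, "oncology": 0.71,
                            "genetics": 0.76, "psychiatry": 0.69},
}

for name, scores in per_topic.items():
    vals = list(scores.values())
    print(f"{name}: mean={statistics.mean(vals):.2f} "
          f"stdev={statistics.stdev(vals):.2f}")
# Similar means (0.74 vs. 0.72), but the spreads differ by almost an order
# of magnitude: this is the kind of instability the abstract warns about.
```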

Trends and developments in bioinformatics in 2010: prospects and perspectives

Aliferis, C F; Alekseyenko, A V; Aphinyanaphongs, Y; Brown, S; Fenyo, D; Fu, L; Shen, S; Statnikov, A; Wang, J
OBJECTIVES: To survey major developments and trends in the field of Bioinformatics in 2010 and their relationships to those of previous years, with emphasis on long-term trends, on best practices, on the quality of the science of informatics, and on the quality of science as a function of informatics. METHODS: A critical review of articles in the literature of Bioinformatics over the past year. RESULTS: Our main results suggest that Bioinformatics continues to be a major catalyst for progress in Biology and Translational Medicine, as a consequence of new assaying technologies, most predominantly Next Generation Sequencing, which are changing the landscape of modern biological and medical research. These assays critically depend on bioinformatics and have led to rapid growth of corresponding informatics methods development. Clinical-grade molecular signatures are proliferating at a rapid rate. However, a highly publicized incident at a prominent university showed that deficiencies in informatics methods can lead to catastrophic consequences for important scientific projects. Developing evidence-driven protocols and best practices is greatly needed given how serious the implications are for the quality of translational and basic science. CONCLUSIONS: Several exciting new methods that have appeared over the past 18 months open new roads for progress in bioinformatics methods and their impact on biomedicine. At the same time, the range of open problems of great significance is extensive, ensuring the vitality of the field for many years to come.
PMID: 21938341
ISSN: 0943-4747
CID: 174460

Text categorization models for identifying unproven cancer treatments on the web

Aphinyanaphongs, Yin; Aliferis, Constantin
The nature of the internet as a non-peer-reviewed (and largely unregulated) publication medium has allowed widespread promotion of inaccurate and unproven medical claims on an unprecedented scale. Patients with conditions that are not currently fully treatable are particularly susceptible to unproven and dangerous promises about miracle treatments. In extreme cases, fatal adverse outcomes have been documented. Most commonly, the costs are financial and psychological, along with delayed application of imperfect but proven scientific modalities. To help protect patients, who may be desperately ill and thus prone to exploitation, we explored the use of machine learning techniques to identify web pages that make unproven claims. This feasibility study shows that the resulting models can identify web pages that make unproven claims in a fully automatic manner, and substantially better than previous web tools and state-of-the-art search engine technology.
PMID: 17911859
ISSN: 0926-9630
CID: 106405

A comparison of impact factor, clinical query filters, and pattern recognition query filters in terms of sensitivity to topic

Fu, Lawrence D; Wang, Lily; Aphinyanaphongs, Yindalon; Aliferis, Constantin F
Evaluating journal quality and finding high-quality articles in the biomedical literature are challenging information retrieval tasks. The most widely used method for journal evaluation is impact factor, while novel approaches for finding articles are PubMed's clinical query filters and machine learning-based filter models. The related literature has focused on the average behavior of these methods over all topics. The present study evaluates the variability of these approaches for different topics. We find that impact factor and clinical query filters are unstable across different topics, while a topic-specific impact factor and machine learning-based filter models appear more robust. Thus, when using the less stable methods for a specific topic, researchers should realize that their performance may diverge from the expected average performance. Better yet, the more stable methods should be preferred whenever applicable.
PMID: 17911810
ISSN: 0926-9630
CID: 86989