NYUHSL Faculty Bibliography

Searched for:

in-biosketch:yes

person:aphiny01

Total Results:

Studies in health technology & informatics. 2007:129(Pt 2):968-72.DOI:

Text categorization models for identifying unproven cancer treatments on the web

Aphinyanaphongs, Yin; Aliferis, Constantin

The nature of the internet as a non-peer-reviewed (and largely unregulated) publication medium has allowed wide-spread promotion of inaccurate and unproven medical claims in unprecedented scale. Patients with conditions that are not currently fully treatable are particularly susceptible to unproven and dangerous promises about miracle treatments. In extreme cases, fatal adverse outcomes have been documented. Most commonly, the cost is financial, psychological, and delayed application of imperfect but proven scientific modalities. To help protect patients, who may be desperately ill and thus prone to exploitation, we explored the use of machine learning techniques to identify web pages that make unproven claims. This feasibility study shows that the resulting models can identify web pages that make unproven claims in a fully automatic manner, and substantially better than previous web tools and state-of-the-art search engine technology

PMID: 17911859

ISSN: 0926-9630

CID: 106405

Studies in health technology & informatics. 2007:129(Pt 1):716-20.DOI:

A comparison of impact factor, clinical query filters, and pattern recognition query filters in terms of sensitivity to topic

Fu, Lawrence D; Wang, Lily; Aphinyanagphongs, Yindalon; Aliferis, Constantin F

Evaluating journal quality and finding high-quality articles in the biomedical literature are challenging information retrieval tasks. The most widely used method for journal evaluation is impact factor, while novel approaches for finding articles are PubMed's clinical query filters and machine learning-based filter models. The related literature has focused on the average behavior of these methods over all topics. The present study evaluates the variability of these approaches for different topics. We find that impact factor and clinical query filters are unstable for different topics while a topic-specific impact factor and machine learning-based filter models appear more robust. Thus when using the less stable methods for a specific topic, researchers should realize that their performance may diverge from expected average performance. Better yet, the more stable methods should be preferred whenever applicable

PMID: 17911810

ISSN: 0926-9630

CID: 86989

Journal of the American Medical Informatics Association. 2006:13(4):446-55.DOI: 10.1197/jamia.M2031

A comparison of citation metrics to machine learning filters for the identification of high quality MEDLINE documents

Aphinyanaphongs, Yindalon; Statnikov, Alexander; Aliferis, Constantin F

OBJECTIVE: The present study explores the discriminatory performance of existing and novel gold-standard-specific machine learning (GSS-ML) focused filter models (i.e., models built specifically for a retrieval task and a gold standard against which they are evaluated) and compares their performance to citation count and impact factors, and non-specific machine learning (NS-ML) models (i.e., models built for a different task and/or different gold standard). DESIGN: Three gold standard corpora were constructed using the SSOAB bibliography, the ACPJ-cited treatment articles, and the ACPJ-cited etiology articles. Citation counts and impact factors were obtained for each article. Support vector machine models were used to classify the articles using combinations of content, impact factors, and citation counts as predictors. MEASUREMENTS: Discriminatory performance was estimated using the area under the receiver operating characteristic curve and n-fold cross-validation. RESULTS: For all three gold standards and tasks, GSS-ML filters outperformed citation count, impact factors, and NS-ML filters. Combinations of content with impact factor or citation count produced no or negligible improvements to the GSS machine learning filters. CONCLUSIONS: These experiments provide evidence that when building information retrieval filters focused on a retrieval task and corresponding gold standard, the filter models have to be built specifically for this task and gold standard. Under those conditions, machine learning filters outperform standard citation metrics. Furthermore, citation counts and impact factors add marginal value to discriminatory performance. Previous research that claimed better performance of citation metrics than machine learning in one of the corpora examined here is attributed to using machine learning filters built for a different gold standard and task

PMCID:1513679

PMID: 16622165

ISSN: 1067-5027

CID: 86993

Journal of the American Medical Informatics Association. 2006:13(1):96-105.DOI: 10.1197/jamia.M1909

Using citation data to improve retrieval from MEDLINE

Bernstam, Elmer V; Herskovic, Jorge R; Aphinyanaphongs, Yindalon; Aliferis, Constantin F; Sriram, Madurai G; Hersh, William R

OBJECTIVE: To determine whether algorithms developed for the World Wide Web can be applied to the biomedical literature in order to identify articles that are important as well as relevant. DESIGN AND MEASUREMENTS A direct comparison of eight algorithms: simple PubMed queries, clinical queries (sensitive and specific versions), vector cosine comparison, citation count, journal impact factor, PageRank, and machine learning based on polynomial support vector machines. The objective was to prioritize important articles, defined as being included in a pre-existing bibliography of important literature in surgical oncology. RESULTS Citation-based algorithms were more effective than noncitation-based algorithms at identifying important articles. The most effective strategies were simple citation count and PageRank, which on average identified over six important articles in the first 100 results compared to 0.85 for the best noncitation-based algorithm (p < 0.001). The authors saw similar differences between citation-based and noncitation-based algorithms at 10, 20, 50, 200, 500, and 1,000 results (p < 0.001). Citation lag affects performance of PageRank more than simple citation count. However, in spite of citation lag, citation-based algorithms remain more effective than noncitation-based algorithms. CONCLUSION Algorithms that have proved successful on the World Wide Web can be applied to biomedical information retrieval. Citation-based algorithms can help identify important articles within large sets of relevant results. Further studies are needed to determine whether citation-based algorithms can effectively meet actual user information needs

PMCID:1380202

PMID: 16221938

ISSN: 1067-5027

CID: 86994

AMIA ... Annual Symposium proceedings. 2006:6-10.DOI:

Prospective validation of text categorization filters for identifying high-quality, content-specific articles in MEDLINE

Aphinyanaphongs, Yindalon; Aliferis, Constantin

In prior work, we introduced a machine learning method to identify high quality MEDLINE documents in internal medicine. The performance of the original filter models built with this corpus on years outside 1998-2000 was not assessed directly. Validating the performance of the original filter models on current corpora is crucial to validate them for use in current years, to verify that the model fitting and model error estimation procedures do not over-fit the models, and to validate consistency of the chosen ACPJ gold standard (i.e., that ACPJ editorial policies and criteria are stable over time). Our prospective validation results indicated that in the categories of treatment, etiology, diagnosis, and prognosis, the original machine learning filter models built from the 1998-2000 corpora maintained their discriminatory performance of 0.97, 0.97, 0.94, and 0.94 area under the curve in each respective category when applied to a 2005 corpus. The ACPJ is a stable, reliable gold standard and the machine learning methodology provides robust models and model performance estimates. Machine learning filter models built with 1998-2000 corpora can be applied to identify high quality articles in recent years

PMCID:1839419

PMID: 17238292

ISSN: 1559-4076

CID: 106403

Journal of the American Medical Informatics Association. 2005:12(2):207-16.DOI: 10.1197/jamia.M1641

Text categorization models for high-quality article retrieval in internal medicine

Aphinyanaphongs, Yindalon; Tsamardinos, Ioannis; Statnikov, Alexander; Hardin, Douglas; Aliferis, Constantin F

OBJECTIVE Finding the best scientific evidence that applies to a patient problem is becoming exceedingly difficult due to the exponential growth of medical publications. The objective of this study was to apply machine learning techniques to automatically identify high-quality, content-specific articles for one time period in internal medicine and compare their performance with previous Boolean-based PubMed clinical query filters of Haynes et al. DESIGN The selection criteria of the ACP Journal Club for articles in internal medicine were the basis for identifying high-quality articles in the areas of etiology, prognosis, diagnosis, and treatment. Naive Bayes, a specialized AdaBoost algorithm, and linear and polynomial support vector machines were applied to identify these articles. MEASUREMENTS The machine learning models were compared in each category with each other and with the clinical query filters using area under the receiver operating characteristic curves, 11-point average recall precision, and a sensitivity/specificity match method. RESULTS In most categories, the data-induced models have better or comparable sensitivity, specificity, and precision than the clinical query filters. The polynomial support vector machine models perform the best among all learning methods in ranking the articles as evaluated by area under the receiver operating curve and 11-point average recall precision. CONCLUSION This research shows that, using machine learning methods, it is possible to automatically build models for retrieving high-quality, content-specific articles using inclusion or citation by the ACP Journal Club as a gold standard in a given time period in internal medicine that perform better than the 1994 PubMed clinical query filters

PMCID:551552

PMID: 15561789

ISSN: 1067-5027

CID: 86997

Studies in health technology & informatics. 2004:107(Pt 1):263-7.DOI:

Learning Boolean queries for article quality filtering

Aphinyanaphongs, Yin; Aliferis, Constantin F

Prior research has shown that Support Vector Machine models have the ability to identify high quality content-specific articles in the domain of internal medicine. These models, though powerful, cannot be used in Boolean search engines nor can the content of the models be verified via human inspection. In this paper, we use decision trees combined with several feature selection methods to generate Boolean query filters for the same domain and task. The resulting trees are generated automatically and exhibit high performance. The trees are understandable, manageable, and able to be validated by humans. The subsequent Boolean queries are sensible and can be readily used as filters by Boolean search engines

PMID: 15360815

ISSN: 0926-9630

CID: 87001

AMIA ... Annual Symposium proceedings. 2003:31-5.DOI:

Text categorization models for retrieval of high quality articles in internal medicine

Aphinyanaphongs, Y; Aliferis, C F

The discipline of Evidence Based Medicine (EBM) studies formal and quasi-formal methods for identifying high quality medical information and abstracting it in useful forms so that patients receive the best customized care possible [1]. Current computer-based methods for finding high quality information in PubMed and similar bibliographic resources utilize search tools that employ preconstructed Boolean queries. These clinical queries are derived from a combined application of (a) user interviews, (b) ad-hoc manual document quality review, and (c) search over a constrained space of disjunctive Boolean queries. The present research explores the use of powerful text categorization (machine learning) methods to identify content-specific and high-quality PubMed articles. Our results show that models built with the proposed approach outperform the Boolean based PubMed clinical query filters in discriminatory power

PMCID:1480096

PMID: 14728128

ISSN: 1559-4076

CID: 87002

Investigative ophthalmology & visual science. IOVS. 1999:40(4):664B624-664B624.DOI:

Computational resolving power improvement for the scanning laser ophthalmoscope [Meeting Abstract]

O'Connor, N; Aphinyanaphongs, Y; Zinser, G; Bartsch, D; Freeman, W; Flanagan, J; Hutchins, N; Hudson, C; Holmes, T

ISI:000079269200664

ISSN: 0146-0404

CID: 106430