NYUHSL Faculty Bibliography

Searched for:

in-biosketch:yes

person:sbj2002

Total Results:

132

PLoS one. 2021:16(4).DOI: 10.1371/journal.pone.0244641

ReCiter: An open source, identity-driven, authorship prediction algorithm optimized for academic institutions

Albert, Paul J; Dutta, Sarbajit; Lin, Jie; Zhu, Zimeng; Bales, Michael; Johnson, Stephen B; Mansour, Mohammad; Wright, Drew; Wheeler, Terrie R; Cole, Curtis L

Academic institutions need to maintain publication lists for thousands of faculty and other scholars. Automated tools are essential to minimize the need for direct feedback from the scholars themselves who are practically unable to commit necessary effort to keep the data accurate. In relying exclusively on clustering techniques, author disambiguation applications fail to satisfy key use cases of academic institutions. Algorithms can perfectly group together a set of publications authored by a common individual, but, for them to be useful to an academic institution, they need to programmatically and recurrently map articles to thousands of scholars of interest en masse. Consistent with a savvy librarian's approach for generating a scholar's list of publications, identity-driven authorship prediction is the process of using information about a scholar to quantify the likelihood that person wrote certain articles. ReCiter is an application that attempts to do exactly that. ReCiter uses institutionally-maintained identity data such as name of department and year of terminal degree to predict which articles a given scholar has authored. To compute the overall score for a given candidate article from PubMed (and, optionally, Scopus), ReCiter uses: up to 12 types of commonly available, identity data; whether other members of a cluster have been accepted or rejected by a user; and the average score of a cluster. In addition, ReCiter provides scoring and qualitative evidence supporting why particular articles are suggested. This context and confidence scoring allows curators to more accurately provide feedback on behalf of scholars. To help users to more efficiently curate publication lists, we used a support vector machine analysis to optimize the scoring of the ReCiter algorithm. In our analysis of a diverse test group of 500 scholars at an academic private medical center, ReCiter correctly predicted 98% of their publications in PubMed.

PMCID:8016248

PMID: 33793563

ISSN: 1932-6203

CID: 4862332

New England journal of medicine. 2020:382(25):2441-2448.DOI: 10.1056/NEJMoa2008975

Renin-Angiotensin-Aldosterone System Inhibitors and Risk of Covid-19

Reynolds, Harmony R; Adhikari, Samrachana; Pulgarin, Claudia; Troxel, Andrea B; Iturrate, Eduardo; Johnson, Stephen B; Hausvater, Anaïs; Newman, Jonathan D; Berger, Jeffrey S; Bangalore, Sripal; Katz, Stuart D; Fishman, Glenn I; Kunichoff, Dennis; Chen, Yu; Ogedegbe, Gbenga; Hochman, Judith S

BACKGROUND:There is concern about the potential of an increased risk related to medications that act on the renin-angiotensin-aldosterone system in patients exposed to coronavirus disease 2019 (Covid-19), because the viral receptor is angiotensin-converting enzyme 2 (ACE2). METHODS:We assessed the relation between previous treatment with ACE inhibitors, angiotensin-receptor blockers, beta-blockers, calcium-channel blockers, or thiazide diuretics and the likelihood of a positive or negative result on Covid-19 testing as well as the likelihood of severe illness (defined as intensive care, mechanical ventilation, or death) among patients who tested positive. Using Bayesian methods, we compared outcomes in patients who had been treated with these medications and in untreated patients, overall and in those with hypertension, after propensity-score matching for receipt of each medication class. A difference of at least 10 percentage points was prespecified as a substantial difference. RESULTS:Among 12,594 patients who were tested for Covid-19, a total of 5894 (46.8%) were positive; 1002 of these patients (17.0%) had severe illness. A history of hypertension was present in 4357 patients (34.6%), among whom 2573 (59.1%) had a positive test; 634 of these patients (24.6%) had severe illness. There was no association between any single medication class and an increased likelihood of a positive test. None of the medications examined was associated with a substantial increase in the risk of severe illness among patients who tested positive. CONCLUSIONS:We found no substantial increase in the likelihood of a positive test for Covid-19 or in the risk of severe Covid-19 among patients who tested positive in association with five common classes of antihypertensive medications.

PMID: 32356628

ISSN: 1533-4406

CID: 4412912

Journal of the American Medical Informatics Association. 2019:26(8-9):722-729.DOI: 10.1093/jamia/ocz040

Underserved populations with missing race ethnicity data differ significantly from those with structured race/ethnicity documentation

Sholle, Evan T; Pinheiro, Laura C; Adekkanattu, Prakash; Davila, Marcos A; Johnson, Stephen B; Pathak, Jyotishman; Sinha, Sanjai; Li, Cassidie; Lubansky, Stasi A; Safford, Monika M; Campion, Thomas R

OBJECTIVE:We aimed to address deficiencies in structured electronic health record (EHR) data for race and ethnicity by identifying black and Hispanic patients from unstructured clinical notes and assessing differences between patients with or without structured race/ethnicity data. MATERIALS AND METHODS/METHODS:Using EHR notes for 16 665 patients with encounters at a primary care practice, we developed rule-based natural language processing (NLP) algorithms to classify patients as black/Hispanic. We evaluated performance of the method against an annotated gold standard, compared race and ethnicity between NLP-derived and structured EHR data, and compared characteristics of patients identified as black or Hispanic using only NLP vs patients identified as such only in structured EHR data. RESULTS:For the sample of 16 665 patients, NLP identified 948 additional patients as black, a 26% increase, and 665 additional patients as Hispanic, a 20% increase. Compared with the patients identified as black or Hispanic in structured EHR data, patients identified as black or Hispanic via NLP only were older, more likely to be male, less likely to have commercial insurance, and more likely to have higher comorbidity. DISCUSSION/CONCLUSIONS:Structured EHR data for race and ethnicity are subject to data quality issues. Supplementing structured EHR race data with NLP-derived race and ethnicity may allow researchers to better assess the demographic makeup of populations and draw more accurate conclusions about intergroup differences in health outcomes. CONCLUSIONS:Black or Hispanic patients who are not documented as such in structured EHR race/ethnicity fields differ significantly from those who are. Relatively simple NLP can help address this limitation.

PMCID:6696506

PMID: 31329882

ISSN: 1527-974x

CID: 4259162

Computers, informatics, nursing. 2019:37(8):396-404.DOI: 10.1097/CIN.0000000000000537

Alignment of American Association of Colleges of Nursing Graduate-Level Nursing Informatics Competencies With American Medical Informatics Association Health Informatics Core Competencies

Monsen, Karen A; Bush, Ruth A; Jones, Josette; Manos, E LaVerne; Skiba, Diane J; Johnson, Stephen B

This study yielded a map of the alignment of American Association of Colleges of Nursing Graduate-Level Nursing Informatics Competencies with American Medical Informatics Association Health Informatics Core Competencies in an effort to understand graduate-level accreditation and certification opportunities in nursing informatics. Nursing Informatics Program Directors from the American Medical Informatics Association and a health informatics expert independently mapped the American Association of Colleges of Nursing competencies to the American Medical Informatics Association Health Informatics knowledge, skills, and attitudes. The Nursing Informatics Program Directors' map connected an average of 4.0 American Medical Informatics Association Core Competencies per American Association of Colleges of Nursing competency, whereas the health informatics expert's map connected an average of 5.0 American Medical Informatics Association Core Competencies per American Association of Colleges of Nursing competency. Agreement across the two maps ranged from 14% to 60% per American Association of Colleges of Nursing competency, revealing alignment between the two groups' competencies according to knowledge, skills, and attitudes. These findings suggest that graduates of master's degree programs in nursing, especially those specializing in nursing informatics, will likely be prepared to sit for the proposed Advanced Health Informatics Certification in addition to the American Nurses Credentialing Center bachelor's-level Informatics Nursing Certification. This preliminary map sets the stage for further in-depth mapping of nursing informatics curricula with American Medical Informatics Association Core Competencies and will enable interprofessional conversations around nursing informatics specialty program accreditation, nursing workforce preparation, and nursing informatics advanced certification. 
Nursing informaticists should examine their need for credentials as key contributors who will address critical health informatics needs.

PMID: 31149911

ISSN: 1538-9774

CID: 4100602

Epilepsia. 2019:60(6):1209-1220.DOI: 10.1111/epi.15966

Automated detection of sudden unexpected death in epilepsy risk factors in electronic medical records using natural language processing

Barbour, Kristen; Hesdorffer, Dale C; Tian, Niu; Yozawitz, Elissa G; McGoldrick, Patricia E; Wolf, Steven; McDonough, Tiffani L; Nelson, Aaron; Loddenkemper, Tobias; Basma, Natasha; Johnson, Stephen B; Grinspan, Zachary M

OBJECTIVE:Sudden unexpected death in epilepsy (SUDEP) is an important cause of mortality in epilepsy. However, there is a gap in how often providers counsel patients about SUDEP. One potential solution is to electronically prompt clinicians to provide counseling via automated detection of risk factors in electronic medical records (EMRs). We evaluated (1) the feasibility and generalizability of using regular expressions to identify risk factors in EMRs and (2) barriers to generalizability. METHODS:Data included physician notes for 3000 patients from one medical center (home) and 1000 from five additional centers (away). Through chart review, we identified three SUDEP risk factors: (1) generalized tonic-clonic seizures, (2) refractory epilepsy, and (3) epilepsy surgery candidacy. Regular expressions of risk factors were manually created with home training data, and performance was evaluated with home test and away test data. Performance was evaluated by sensitivity, positive predictive value, and F-measure. Generalizability was defined as an absolute decrease in performance by <0.10 for away versus home test data. To evaluate underlying barriers to generalizability, we identified causes of errors seen more often in away data than home data. To demonstrate how small revisions can improve generalizability, we removed three "boilerplate" standard text phrases from away notes and repeated performance. RESULTS:We observed high performance in home test data (F-measure range = 0.86-0.90), and low to high performance in away test data (F-measure range = 0.53-0.81). After removing three boilerplate phrases, away performance improved (F-measure range = 0.79-0.89) and generalizability was achieved for nearly all measures. The only significant barrier to generalizability was use of boilerplate phrases, causing 104 of 171 errors (61%) in away data. 
SIGNIFICANCE/CONCLUSIONS:Regular expressions are a feasible and probably a generalizable method to identify variables related to SUDEP risk. Our methods may be implemented to create large patient cohorts for research and to generate electronic prompts for SUDEP counseling.

PMID: 31111463

ISSN: 1528-1167

CID: 3935952
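
The regular-expression approach in the abstract above can be illustrated with a minimal Python sketch. The abstract does not give the study's actual expressions or boilerplate phrases, so the pattern, the boilerplate sentence, and all function names below are hypothetical; only the generalizability criterion (an absolute decrease in performance of less than 0.10 for away versus home test data) comes from the abstract.

```python
import re

# Hypothetical pattern for one risk factor (generalized tonic-clonic seizures).
# The expressions actually used in the study are not given in the abstract.
GTC_PATTERN = re.compile(r"\b(generalized tonic[- ]clonic|GTC)\b", re.IGNORECASE)

# Hypothetical example of a site-specific "boilerplate" standard text phrase.
BOILERPLATE_PHRASES = [
    "Seizure precautions reviewed, including risk of generalized tonic-clonic seizures.",
]


def strip_boilerplate(note, phrases=BOILERPLATE_PHRASES):
    """Remove known standard text phrases before pattern matching."""
    for phrase in phrases:
        note = note.replace(phrase, " ")
    return note


def mentions_gtc(note):
    """Return True if the note matches the hypothetical risk-factor pattern."""
    return GTC_PATTERN.search(note) is not None


def is_generalizable(home_score, away_score, threshold=0.10):
    """Criterion from the abstract: absolute decrease (home minus away) below 0.10."""
    return (home_score - away_score) < threshold


away_note = BOILERPLATE_PHRASES[0] + " No events reported."
print(mentions_gtc(away_note))                     # boilerplate triggers a match
print(mentions_gtc(strip_boilerplate(away_note)))  # no match once it is removed
```

In this sketch the boilerplate sentence produces a false positive that disappears after stripping, which is the kind of error the abstract reports as the main barrier to generalizability.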

AMIA ... Annual Symposium proceedings. 2018:2018:147-156.DOI:

Ascertaining Depression Severity by Extracting Patient Health Questionnaire-9 (PHQ-9) Scores from Clinical Notes

Adekkanattu, Prakash; Sholle, Evan T; DeFerio, Joseph; Pathak, Jyotishman; Johnson, Stephen B; Campion, Thomas R

The Patient Health Questionnaire-9 (PHQ-9) is a validated instrument for assessing depression severity. While some electronic health record (EHR) systems capture PHQ-9 scores in a structured format, unstructured clinical notes remain the only source in many settings, which presents data retrieval challenges for research and clinical decision support. To address this gap, we extended the open-source Leo natural language processing (NLP) platform to extract PHQ-9 scores from clinical notes and evaluated performance using EHR data for n=123,703 patients who were prescribed antidepressants. Compared to a reference standard, the NLP method exhibited high accuracy (97%), sensitivity (98%), precision (97%), and F-score (97%). Furthermore, of patients with PHQ-9 scores identified by the NLP method, 31% (n=498) had at least one PHQ-9 score clinically indicative of major depressive disorder (MDD), but lacked a structured ICD-9/10 diagnosis code for MDD. This NLP technique may facilitate accurate identification and stratification of patients with depression.

PMCID:6371338

PMID: 30815052

ISSN: 1942-597x

CID: 4259152
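
As a rough illustration of what extracting PHQ-9 scores from note text involves, the following Python sketch uses a single regular expression. It is not the Leo-based method described in the abstract; the pattern, the 20-character gap allowance, and the function name are assumptions chosen for illustration. The only domain fact relied on is that PHQ-9 total scores range from 0 to 27.

```python
import re

# Hypothetical pattern: "PHQ-9" or "PHQ9", up to 20 non-digit characters,
# then a one- or two-digit number. Not the study's actual extraction logic.
PHQ9_PATTERN = re.compile(r"PHQ[- ]?9\D{0,20}?(\d{1,2})\b", re.IGNORECASE)


def extract_phq9_scores(note):
    """Return candidate PHQ-9 total scores found in a note, in order of appearance."""
    scores = []
    for match in PHQ9_PATTERN.finditer(note):
        value = int(match.group(1))
        if 0 <= value <= 27:  # valid PHQ-9 total score range
            scores.append(value)
    return scores


print(extract_phq9_scores("PHQ-9 score: 14. Prior PHQ9 = 7."))
```

A production system would also need to handle negation, dates, item-level scores, and template text, which this sketch ignores.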

Journal of the American Medical Informatics Association. 2018:25(12):1657-1668.DOI: 10.1093/jamia/ocy132

AMIA Board White Paper: AMIA 2017 core competencies for applied health informatics education at the master's degree level

Valenta, Annette L; Berner, Eta S; Boren, Suzanne A; Deckard, Gloria J; Eldredge, Christina; Fridsma, Douglas B; Gadd, Cynthia; Gong, Yang; Johnson, Todd; Jones, Josette; Manos, E LaVerne; Phillips, Kirk T; Roderer, Nancy K; Rosendale, Douglas; Turner, Anne M; Tusch, Guenter; Williamson, Jeffrey J; Johnson, Stephen B

This White Paper presents the foundational domains with examples of key aspects of competencies (knowledge, skills, and attitudes) that are intended for curriculum development and accreditation quality assessment for graduate (master's level) education in applied health informatics. Through a deliberative process, the AMIA Accreditation Committee refined the work of a task force of the Health Informatics Accreditation Council, establishing 10 foundational domains with accompanying example statements of knowledge, skills, and attitudes that are components of competencies by which graduates from applied health informatics programs can be assessed for competence at the time of graduation. The AMIA Accreditation Committee developed the domains for application across all the subdisciplines represented by AMIA, ranging from translational bioinformatics to clinical and public health informatics, spanning the spectrum from molecular to population levels of health and biomedicine. This document will be periodically updated, as part of the responsibility of the AMIA Accreditation Committee, through continued study, education, and surveys of market trends.

PMID: 30371862

ISSN: 1527-974x

CID: 3586582

AMIA Summits on Translational Science proceedings. 2018:2017:104-112.DOI:

From Sour Grapes to Low-Hanging Fruit: A Case Study Demonstrating a Practical Strategy for Natural Language Processing Portability

Johnson, Stephen B; Adekkanattu, Prakash; Campion, Thomas R; Flory, James; Pathak, Jyotishman; Patterson, Olga V; DuVall, Scott L; Major, Vincent; Aphinyanaphongs, Yindalon

Natural Language Processing (NLP) holds potential for patient care and clinical research, but a gap exists between promise and reality. While some studies have demonstrated portability of NLP systems across multiple sites, challenges remain. Strategies to mitigate these challenges can strive for complex NLP problems using advanced methods (hard-to-reach fruit), or focus on simple NLP problems using practical methods (low-hanging fruit). This paper investigates a practical strategy for NLP portability using extraction of left ventricular ejection fraction (LVEF) as a use case. We used a tool developed at the Department of Veterans Affairs (VA) to extract the LVEF values from free-text echocardiograms in the MIMIC-III database. The approach showed an accuracy of 98.4%, sensitivity of 99.4%, a positive predictive value of 98.7%, and F-score of 99.0%. This experience, in which a simple NLP solution proved highly portable with excellent performance, illustrates the point that simple NLP applications may be easier to disseminate and adapt, and in the short term may prove more useful, than complex applications.

PMCID:5961788

PMID: 29888051

ISSN: 2153-4063

CID: 3154942
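
The F-score reported in the abstract above can be checked against its reported sensitivity and positive predictive value. The sketch below assumes the F-score is the balanced F1 measure, the harmonic mean of positive predictive value (precision) and sensitivity (recall); the abstract does not state which variant was used, and the function name is illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed as fractions.
reported_ppv = 0.987
reported_sensitivity = 0.994

print(f"{f_score(reported_ppv, reported_sensitivity):.3f}")  # prints 0.990
```

Under that assumption the result agrees with the reported F-score of 99.0%.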

Epilepsia open. 2018:3(1):91-97.DOI: 10.1002/epi4.12095

Common terms for rare epilepsies: Synonyms, associated terms, and links to structured vocabularies

Grinspan, Zachary M; Tian, Niu; Yozawitz, Elissa G; McGoldrick, Patricia E; Wolf, Steven M; McDonough, Tiffani L; Nelson, Aaron; Hafeez, Baria; Johnson, Stephen B; Hesdorffer, Dale C

Identifying individuals with rare epilepsy syndromes in electronic data sources is difficult, in part because of missing codes in the International Classification of Diseases (ICD) system. Our objectives were the following: (1) to describe the representation of rare epilepsies in other medical vocabularies, to identify gaps; and (2) to compile synonyms and associated terms for rare epilepsies, to facilitate text and natural language processing tools for cohort identification and population-based surveillance. We describe the representation of 33 epilepsies in 3 vocabularies: Orphanet, SNOMED-CT, and UMLS-Metathesaurus. We compiled terms via 2 surveys, correspondence with parent advocates, and review of web resources and standard vocabularies. UMLS-Metathesaurus had entries for all 33 epilepsies, Orphanet 32, and SNOMED-CT 25. The vocabularies had redundancies and missing phenotypes. Emerging epilepsies (SCN2A-, SCN8A-, KCNQ2-, SLC13A5-, and SYNGAP-related epilepsies) were underrepresented. Survey and correspondence respondents included 160 providers, 375 caregivers, and 11 advocacy group leaders. Each epilepsy syndrome had a median of 15 (range 6-28) synonyms. Nineteen had associated terms, with a median of 4 (range 1-41). We conclude that medical vocabularies should fill gaps in representation of rare epilepsies to improve their value for epilepsy research. We encourage epilepsy researchers to use this resource to develop tools to identify individuals with rare epilepsies in electronic data sources.

PMCID:5839304

PMID: 29588993

ISSN: 2470-9239

CID: 3010882

Yearbook of medical informatics. 2017:26(1):193-200.DOI: 10.15265/IY-2017-022

Clinical Research Informatics: Supporting the Research Study Lifecycle

Johnson, S B

Objectives: The primary goal of this review is to summarize significant developments in the field of Clinical Research Informatics (CRI) over the years 2015-2016. The secondary goal is to contribute to a deeper understanding of CRI as a field, through the development of a strategy for searching and classifying CRI publications. Methods: A search strategy was developed to query the PubMed database, using medical subject headings to both select and exclude articles, and filtering publications by date and other characteristics. A manual review classified publications using stages in the "research study lifecycle", with key stages that include study definition, participant enrollment, data management, data analysis, and results dissemination. Results: The search strategy generated 510 publications. The manual classification identified 125 publications as relevant to CRI, which were classified into seven different stages of the research lifecycle, and one additional class that pertained to multiple stages, referring to general infrastructure or standards. Important cross-cutting themes included new applications of electronic media (Internet, social media, mobile devices), standardization of data and procedures, and increased automation through the use of data mining and big data methods. Conclusions: The review revealed increased interest and support for CRI in large-scale projects across institutions, regionally, nationally, and internationally. A search strategy based on medical subject headings can find many relevant papers, but a large number of non-relevant papers need to be detected using text words which pertain to closely related fields such as computational statistics and clinical informatics. The research lifecycle was useful as a classification scheme by highlighting the relevance to the users of clinical research informatics solutions.

PMCID:6239240

PMID: 29063565

ISSN: 2364-0502

CID: 3650952