NYUHSL Faculty Bibliography

Searched for:

in-biosketch:yes

person:sbj2002

Total Results:

132

AMIA ... Annual Symposium proceedings. 2003.

A native XML database design for clinical document research

Johnson, Stephen B; Campbell, David A; Krauthammer, Michael; Tulipano, P Karina; Mendonca, Eneida A; Friedman, Carol; Hripcsak, George

Health-care institutions are gaining an increasing interest in exploiting the data that are gathered through electronic medical records. Narrative data, generated by transcription or direct entry, represents a far greater challenge for analytic tasks. Moreover, a small number of institutions are beginning to explore deeper structuring of narrative data using natural language processing (NLP). The data produced by NLP systems has a complex, nested structure. Current electronic medical records do not have the ability to store and retrieve data of this complexity in a suitable way.

PMCID: 1479907

PMID: 14728388

ISSN: 1942-597X

CID: 3585942

AMIA ... Annual Symposium proceedings. 2003.

Automatic learning of the morphology of medical language using information compression

Mollah, Shamim Ara; Johnson, Stephen B

Conversion of free-text strings in a natural language to a standard representation (codes) is an important reoccurring problem in biomedical informatics. Determining the content of a string involves identifying its meaningful constituents (morphemes). One current method of identifying these constituents is to look them up in a preexisting table (lexicon). Manual construction of lexicons and grammars in complex domains such as biomedicine is extremely laborious. As an alternative to the lexico-grammatical approach, we introduce a segmentation algorithm that automatically learns lexical and structural preferences from corpora via information compression. The method is based on the Minimum Description Length (MDL) principle from classic information theory.

PMCID: 1480252

PMID: 14728443

ISSN: 1942-597X

CID: 3585952

JAMA. 2003:289(10):1278-87. DOI: 10.1001/jama.289.10.1278

Central challenges facing the national clinical research enterprise

Sung, Nancy S; Crowley, William F Jr; Genel, Myron; Salber, Patricia; Sandy, Lewis; Sherwood, Louis M; Johnson, Stephen B; Catanese, Veronica; Tilson, Hugh; Getz, Kenneth; Larson, Elaine L; Scheinberg, David; Reece, E Albert; Slavkin, Harold; Dobs, Adrian; Grebb, Jack; Martinez, Rick A; Korn, Allan; Rimoin, David

Medical scientists and public health policy makers are increasingly concerned that the scientific discoveries of the past generation are failing to be translated efficiently into tangible human benefit. This concern has generated several initiatives, including the Clinical Research Roundtable at the Institute of Medicine, which first convened in June 2000. Representatives from a diverse group of stakeholders in the nation's clinical research enterprise have collaborated to address the issues it faces. The context of clinical research is increasingly encumbered by high costs, slow results, lack of funding, regulatory burdens, fragmented infrastructure, incompatible databases, and a shortage of qualified investigators and willing participants. These factors have contributed to 2 major obstacles, or translational blocks: impeding the translation of basic science discoveries into clinical studies and of clinical studies into medical practice and health decision making in systems of care. Considering data from across the entire health care system, it has become clear that these 2 translational blocks can be removed only by the collaborative efforts of multiple system stakeholders. The goal of this article is to articulate the 4 central challenges facing clinical research at present--public participation, information systems, workforce training, and funding; to make recommendations about how they might be addressed by particular stakeholders; and to invite a broader, participatory dialogue with a view to improving the overall performance of the US clinical research enterprise.

PMID: 12633190

ISSN: 0098-7484

CID: 164318

Yearbook of medical informatics. 2002(1):173-180.

Medical Informatics Training and Research at Columbia University

Shortliffe, E H; Johnson, S B

PMID: 27706367

ISSN: 2364-0502

CID: 3650902

Journal of the American Medical Informatics Association. 2002:9(6):621-36. DOI: 10.1197/jamia.m1101

Automatic resolution of ambiguous terms based on machine learning and conceptual relations in the UMLS

Liu, Hongfang; Johnson, Stephen B; Friedman, Carol

MOTIVATION: The UMLS has been used in natural language processing applications such as information retrieval and information extraction systems. The mapping of free-text to UMLS concepts is important for these applications. To improve the mapping, we need a method to disambiguate terms that possess multiple UMLS concepts. In the general English domain, machine-learning techniques have been applied to sense-tagged corpora, in which senses (or concepts) of ambiguous terms have been annotated (mostly manually). Sense disambiguation classifiers are then derived to determine senses (or concepts) of those ambiguous terms automatically. However, manual annotation of a corpus is an expensive task. We propose an automatic method that constructs sense-tagged corpora for ambiguous terms in the UMLS using MEDLINE abstracts. METHODS: For a term W that represents multiple UMLS concepts, a collection of MEDLINE abstracts that contain W is extracted. For each abstract in the collection, occurrences of concepts that have relations with W as defined in the UMLS are automatically identified. A sense-tagged corpus, in which senses of W are annotated, is then derived based on those identified concepts. The method was evaluated on a set of 35 frequently occurring ambiguous biomedical abbreviations using a gold standard set that was automatically derived. The quality of the derived sense-tagged corpus was measured using precision and recall. RESULTS: The derived sense-tagged corpus had an overall precision of 92.9% and an overall recall of 47.4%. After removing rare senses and ignoring abbreviations with closely related senses, the overall precision was 96.8% and the overall recall was 50.6%. CONCLUSIONS: UMLS conceptual relations and MEDLINE abstracts can be used to automatically acquire knowledge needed for resolving ambiguity when mapping free-text to UMLS concepts.

PMCID: 349379

PMID: 12386113

ISSN: 1067-5027

CID: 3585872

Proceedings (AMIA Annual Symposium). 2002:405-9.

Representing nested semantic information in a linear string of text using XML

Krauthammer, Michael; Johnson, Stephen B; Hripcsak, George; Campbell, David A; Friedman, Carol

XML has been widely adopted as an important data interchange language. The structure of XML enables sharing of data elements with variable degrees of nesting as long as the elements are grouped in a strict tree-like fashion. This requirement potentially restricts the usefulness of XML for marking up written text, which often includes features that do not properly nest within other features. We encountered this problem while marking up medical text with structured semantic information from a Natural Language Processor. Traditional approaches to this problem separate the structured information from the actual text mark up. This paper introduces an alternative solution, which tightly integrates the semantic structure with the text. The resulting XML markup preserves the linearity of the medical texts and can therefore be easily expanded with additional types of information.

PMCID: 2244450

PMID: 12463856

ISSN: 1531-605X

CID: 3585882

Proceedings (AMIA Annual Symposium). 2002:742-6.

The sublanguage of cross-coverage

Stetson, Peter D; Johnson, Stephen B; Scotch, Matthew; Hripcsak, George

At Columbia-Presbyterian Medical Center, free-text "Signout" notes are typed into the electronic record by clinicians for the purpose of cross-coverage. We plan to "unlock" information about adverse events contained in these notes in a subsequent project using Natural Language Processing (NLP). To better understand the requirements for parsing, Signout notes were compared to other common medical notes (ambulatory clinic notes and discharge summaries) on a series of quantitative metrics. They are shorter (mean length 59.25 words vs. 144.11 and 340.85 for ambulatory and discharge notes respectively) and use more abbreviations (26.88% vs. 20.07% and 3.57%). Despite being terser, Signout notes use less ambiguous abbreviations (8.34% vs. 9.09% and 18.02%). Differences were found using Relative Entropy and Squared Chi-square Distance in a novel fashion to compare these medical corpora. Signout notes appear to constitute a unique sublanguage of medicine. The implications for parsing free-text cross-coverage notes into coded medical data are discussed.

PMCID: 2244148

PMID: 12463923

ISSN: 1531-605X

CID: 3585892

Proceedings (AMIA Annual Symposium). 2002:850-4.

The cognitive demands of an innovative query user interface

Wang, Di; Kaufman, David R; Mendonca, Eneida A; Seol, Yoon-Hu; Johnson, Stephen B; Cimino, James J

Too often, online searches for health information are time consuming and produce results that are not sufficiently precise to answer clinicians' or patients' questions. The PERSIVAL project is designed to circumvent this problem by personalizing and tailoring searches and presentation to the demands of the user and the particular clinical context. This paper focuses on a cognitive evaluation of one component of this project, a Query User Interface (QUI). The study examines the system's ability to allow users to easily and intuitively express their information needs. We performed several analyses including a cognitive walkthrough of the interface and quantitative estimations of cognitive load. The paper also presents a preliminary analysis of usability testing. The analyses suggest that there are features in the QUI that contribute to a greater cognitive load and result in greater effort on the part of the subject. The results of usability testing are consistent with these findings. However, subjects found it to be relatively easy and intuitive to generate well-formed queries using the interface. This study contributed to the iterative design of the interface and to the next generation of the PERSIVAL system.

PMCID: 2244191

PMID: 12463945

ISSN: 1531-605X

CID: 3585902

Journal of biomedical informatics. 2001:34(2):85-98. DOI: 10.1006/jbin.2001.1012

Accessing heterogeneous sources of evidence to answer clinical questions

Mendonça, E A; Cimino, J J; Johnson, S B; Seol, Y H

The large and rapidly growing number of information sources relevant to health care, and the increasing amounts of new evidence produced by researchers, are improving the access of professionals and students to valuable information. However, seeking and filtering useful, valid information can be still very difficult. An online information system that conducts searches based on individual patient data can have a beneficial influence on the particular patient's outcome and educate the healthcare worker. In this paper, we describe the underlying model for a system that aims to facilitate the search for evidence based on clinicians' needs. This paper reviews studies of information needs of clinicians, describes principles of information retrieval, and examines the role that standardized terminologies can play in the integration between a clinical system and literature resources, as well as in the information retrieval process. The paper also describes a model for a digital library system that supports the integration of clinical systems with online information sources, making use of information available in the electronic medical record to enhance searches and information retrieval. The model builds on several different, previously developed techniques to identify information themes that are relevant to specific clinical data. Using a framework of evidence-based practice, the system generates well-structured questions with the intent of enhancing information retrieval. We believe that by helping clinicians to pose well-structured clinical queries and including in them relevant information from individual patients' medical records, we can enhance information retrieval and thus can improve patient-care.

PMID: 11515415

ISSN: 1532-0464

CID: 3650672

Proceedings (AMIA Annual Symposium). 2001:90-4.

Comparing syntactic complexity in medical and non-medical corpora

Campbell, D A; Johnson, S B

With the growing use of Natural Language Processing (NLP) techniques as solutions in Medical Informatics, the need to quickly and efficiently create the knowledge structures used by these systems has grown concurrently. Automatic discovery of a lexicon for use by an NLP system through machine learning will require information about the syntax of medical language. Understanding the syntactic differences between medical and non-medical corpora may allow more efficient acquisition of a lexicon. Three experiments designed to quantify the syntactic differences in medical and non-medical corpora were conducted. The results show that the syntax of medical language shows less variation than non-medical language and is likely simpler. The differences were great enough to question the applicability of general language tools on medical language. These differences may reduce the difficulty of some free text machine learning problems by capitalizing on the simpler nature of narrative medical syntax.

PMCID: 2243419

PMID: 11825160

ISSN: 1531-605X

CID: 3650682