Searched for:

in-biosketch:yes

person:sbj2002

Total Results:

127


Unlocking clinical data from narrative reports: a study of natural language processing

Hripcsak, G; Friedman, C; Alderson, P O; DuMouchel, W; Johnson, S B; Clayton, P D
OBJECTIVE: To evaluate the automated detection of clinical conditions described in narrative reports. DESIGN/METHODS: Automated methods and human experts detected the presence or absence of six clinical conditions in 200 admission chest radiograph reports. STUDY SUBJECTS/METHODS: A computerized, general-purpose natural language processor; 6 internists; 6 radiologists; 6 lay persons; and 3 other computer methods. MAIN OUTCOME MEASURES/METHODS: Intersubject disagreement was quantified by "distance" (the average number of clinical conditions per report on which two subjects disagreed) and by sensitivity and specificity with respect to the physicians. RESULTS: Using a majority vote, physicians detected 101 conditions in the 200 reports (0.51 per report); the most common condition was acute bacterial pneumonia (prevalence, 0.14), and the least common was chronic obstructive pulmonary disease (prevalence, 0.03). Pairs of physicians disagreed on the presence of at least 1 condition for an average of 20% of reports. The average intersubject distance among physicians was 0.24 (95% CI, 0.19 to 0.29) out of a maximum possible distance of 6. No physician had a significantly greater distance than the average. The average distance of the natural language processor from the physicians was 0.26 (CI, 0.21 to 0.32; not significantly greater than the average among physicians). Lay persons and alternative computer methods had significantly greater distance from the physicians (all > 0.5). The natural language processor had a sensitivity of 81% (CI, 73% to 87%) and a specificity of 98% (CI, 97% to 99%); physicians had an average sensitivity of 85% and an average specificity of 98%. CONCLUSIONS: Physicians disagreed on the interpretation of narrative reports, but this was not caused by outlier physicians or a consistent difference in the way internists and radiologists read reports. The natural language processor was not distinguishable from the physicians and was superior to all other comparison subjects. Although the domain of this study was restricted (six clinical conditions in chest radiographs), natural language processing seems to have the potential to extract clinical information from narrative reports in a manner that will support automated decision-support and clinical research.
PMID: 7702231
ISSN: 0003-4819
CID: 3650972
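
The "distance" measure and the sensitivity/specificity figures in the abstract above can be illustrated with a minimal sketch. This is not the study's code: the function names, the 0/1 encoding of each report, and the toy data (4 reports, 3 conditions) are all assumptions made for illustration, and the study's majority-vote reference standard and confidence intervals are not reproduced here.

```python
from typing import List, Sequence, Tuple

# Each rater's output is assumed to be one row per report, with one 0/1 flag
# per clinical condition (1 = condition judged present). Hypothetical encoding.
Ratings = Sequence[Sequence[int]]


def pairwise_distance(a: Ratings, b: Ratings) -> float:
    """Average number of conditions per report on which two raters disagree."""
    if len(a) != len(b):
        raise ValueError("both raters must rate the same reports")
    disagreements = sum(
        sum(x != y for x, y in zip(row_a, row_b)) for row_a, row_b in zip(a, b)
    )
    return disagreements / len(a)


def sensitivity_specificity(pred: Ratings, ref: Ratings) -> Tuple[float, float]:
    """Sensitivity and specificity of `pred`, treating `ref` as the reference."""
    tp = fp = tn = fn = 0
    for row_p, row_r in zip(pred, ref):
        for p, r in zip(row_p, row_r):
            if r and p:
                tp += 1
            elif r and not p:
                fn += 1
            elif not r and p:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (invented): 4 reports, 3 conditions each.
rater_a: List[List[int]] = [[1, 0, 0], [0, 0, 0], [0, 1, 0], [1, 1, 0]]
rater_b: List[List[int]] = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [1, 0, 0]]

distance = pairwise_distance(rater_a, rater_b)
sens, spec = sensitivity_specificity(rater_b, rater_a)
print(distance, sens, spec)
```

With six conditions per report, the maximum possible value of this distance is 6, which matches the scale quoted in the abstract.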

Managing vocabulary for a centralized clinical system

Cimino, J J; Johnson, S B; Hripcsak, G; Hill, C L; Clayton, P D
The clinical computing environment at Columbia-Presbyterian Medical Center is organized around a centralized database of coded patient information collected from various ancillary sources. The Medical Entities Dictionary (MED) is the central repository for the controlled vocabulary used to encode the patient data. The MED is composed of terms used in the ancillary departments and, as such, changes in the source vocabularies must be maintained in the MED. The MED also contains some basic knowledge about the terms, and sophisticated maintenance tools have been developed that take advantage of this knowledge. This paper describes the success of the knowledge-based approach by describing the techniques used in two tasks: addition of a new vocabulary and maintenance of an existing one.
PMID: 8591133
ISSN: 1569-6332
CID: 3651092

Applying a controlled medical terminology to a distributed, production clinical information system

Forman, B H; Cimino, J J; Johnson, S B; Sengupta, S; Sideli, R; Clayton, P
To maximize the value of computerized medical records systems, an organizing structure is needed. That structure can be provided by a controlled medical terminology (CMT). At Columbia-Presbyterian Medical Center, we have been employing a controlled medical terminology, our Medical Entities Dictionary (MED), to mediate the storage and retrieval of patient data and enable decision support applications. This paper describes how the MED is actually used for data management in our distributed clinical information systems environment. Our system tools which access the MED for production purposes facilitate the mapping of terms from many sources to a uniform representation of concepts and also return information about the relationships between concepts. Applications which access a CMT for production purposes should be optimized for performance in high volume settings, fault tolerant, synchronizable, extensible, portable, and maintainable. We briefly describe our system architecture and then demonstrate how we utilize the MED for translation and semantic information as data is moved into and out of our patient database. We discuss our current tools and present a preview of the next generation of applications which will manage access to the MED for our production systems.
PMCID:2579127
PMID: 8563316
ISSN: 0195-4210
CID: 3651082

A data model that captures clinical reasoning about patient problems

Barrows, R C; Johnson, S B
We describe a data model that has been implemented for the CPMC Ambulatory Care System, and exemplify its function for patient problems. The model captures some nuances of clinical thinking about patients that are not accommodated in most other models, such as an evolution of clinical understanding about patient problems. A record of this understanding has clinical utility, and serves research interests as well as medical audit concerns. The model is described with an example, and advantages and limitations in the current implementation are discussed.
PMCID:2579123
PMID: 8563311
ISSN: 0195-4210
CID: 3651072

Medical decision support: experience with implementing the Arden Syntax at the Columbia-Presbyterian Medical Center

Jenders, R A; Hripcsak, G; Sideli, R V; DuMouchel, W; Zhang, H; Cimino, J J; Johnson, S B; Sherman, E H; Clayton, P D
We began implementation of a medical decision support system (MDSS) at the Columbia-Presbyterian Medical Center (CPMC) using the Arden Syntax in 1992. The Clinical Event Monitor which executes the Medical Logic Modules (MLMs) runs on a mainframe computer. Data are stored in a relational database and accessed via PL/I programs known as Data Access Modules (DAMs). Currently we have 18 clinical, 12 research and 10 administrative MLMs. On average, the clinical MLMs generate 50357 simple interpretations of laboratory data and 1080 alerts each month. The number of alerts actually read varies by subject of the MLM from 32.4% to 73.5%. Most simple interpretations are not read at all. A significant problem of MLMs is maintenance, and changes in laboratory testing and message output can impair MLM execution significantly. We are now using relational database technology and coded MLM output to study the process outcome of our MDSS.
PMCID:2579077
PMID: 8563259
ISSN: 0195-4210
CID: 3651062

A general natural-language text processor for clinical radiology

Friedman, C; Alderson, P O; Austin, J H; Cimino, J J; Johnson, S B
OBJECTIVE: Development of a general natural-language processor that identifies clinical information in narrative reports and maps that information into a structured representation containing clinical terms. DESIGN/METHODS: The natural-language processor provides three phases of processing, all of which are driven by different knowledge sources. The first phase performs the parsing. It identifies the structure of the text through use of a grammar that defines semantic patterns and a target form. The second phase, regularization, standardizes the terms in the initial target structure via a compositional mapping of multi-word phrases. The third phase, encoding, maps the terms to a controlled vocabulary. Radiology is the test domain for the processor and the target structure is a formal model for representing clinical information in that domain. MEASUREMENTS/METHODS: The impression sections of 230 radiology reports were encoded by the processor. Results of an automated query of the resultant database for the occurrences of four diseases were compared with the analysis of a panel of three physicians to determine recall and precision. RESULTS: Without training specific to the four diseases, recall and precision of the system (combined effect of the processor and query generator) were 70% and 87%. Training of the query component increased recall to 85% without changing precision.
PMCID:116194
PMID: 7719797
ISSN: 1067-5027
CID: 3650982
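
The recall and precision reported in the abstract above compare what an automated query retrieves with what a physician panel identifies. A minimal sketch of that comparison follows; the function name, the use of report IDs, and the toy sets are hypothetical, and the actual evaluation procedure of the study may have differed in detail.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Precision and recall of a retrieved set against a reference set.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    """
    if not retrieved or not relevant:
        raise ValueError("both sets must be non-empty")
    true_positives = len(retrieved & relevant)
    return true_positives / len(retrieved), true_positives / len(relevant)


# Toy data (invented): IDs of reports flagged for one disease.
system_hits = {1, 2, 3, 5}      # returned by the automated query
panel_hits = {1, 2, 3, 4, 6}    # identified by the physician panel

precision, recall = precision_recall(system_hits, panel_hits)
print(precision, recall)
```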

IAIMS and sharing

Sengupta, S; Clayton, P D; Molholt, P; Sideli, R V; Cimino, J J; Hripcsak, G; Johnson, S B; Allen, B; McCormack, M; Hill, C
The Integrated Academic Information Management System (IAIMS) concept is about sharing resources and information, and about improving the decision-making ability of health care professionals by integrating information. At Columbia-Presbyterian Medical Center, the IAIMS project has established an information architecture based on common, shared computing and networking resources. The institutional computing culture has been changed with increased sharing of information and, consequently, improved quality of information. Several classes of information in the areas of clinical, scholarly, administrative, basic research, and core resources have been identified for better understanding of information responsibility. Technical problems such as heterogeneity on workstation platforms and lack of universal syntactic and semantic standards for health care information exchange still impede inter-institutional sharing of information.
PMID: 8125648
ISSN: 0020-7101
CID: 3651002

Accessing the Columbia Clinical Repository

Johnson, S B; Hripcsak, G; Chen, J; Clayton, P
The Columbia Clinical Repository is the foundation of the Clinical Information System at the Columbia Presbyterian Medical Center (CPMC). The Repository is implemented as a relational database on an IBM mainframe, using a generic design that employs a small number of tables. Client applications on remote platforms send and receive data through Database Access Modules (DAMs), which support the HL7 protocol, while applications on the mainframe manipulate data through DAMs supporting a locally defined "query template". Implementation using static (compiled) SQL is compared to dynamic (ad hoc) SQL in terms of efficiency and flexibility.
PMCID:2247734
PMID: 7949935
ISSN: 0195-4210
CID: 3650992

Full-text document storage and retrieval in a clinical information system

Sideli, R V; Johnson, S B; Clayton, P D
The overall design of the CIS at CPMC is heavily influenced by the decision support component. The type of automated decision support being implemented dictates the need for highly structured or coded data. The value of decision support systems has been well documented. The current reliance on free-text documents is natural and a rewarding first step to a more valuable mix of coded and free text. While the health care provider might find the textual comments of the various reports extremely useful, the capability of an automated system to vigilantly review every data element for trends and anomalies is becoming invaluable in today's ever more complex health care delivery environment. Other approaches such as optical imaging systems would facilitate human decision support, but do not supply data in a format that can be processed by automated decision support systems. The developers of the CIS at CPMC believe that data are most valuable when available for both human and automated decision support.
PMID: 10139111
ISSN: 1065-0989
CID: 3650572

Generic queries for meeting clinical information needs

Cimino, J J; Aguirre, A; Johnson, S B; Peng, P
This paper describes a model for automated information retrieval in which questions posed by clinical users are analyzed to establish common syntactic and semantic patterns. The patterns are used to develop a set of general-purpose questions called generic queries. These generic queries are used in responding to specific clinical information needs. Users select generic queries in one of two ways. The user may type in questions, which are then analyzed, using natural language processing techniques, to identify the most relevant generic query; or the user may indicate patient data of interest and then pick one of several potentially relevant questions. Once the query and medical concepts have been determined, an information source is selected automatically, a retrieval strategy is composed and executed, and the results are sorted and filtered for presentation to the user. This work makes extensive use of the National Library of Medicine's Unified Medical Language System (UMLS): medical concepts are derived from the Metathesaurus, medical queries are based on semantic relations drawn from the UMLS Semantic Network, and automated source selection makes use of the Information Sources Map. The paper describes research currently under way to implement this model and reports on experience and results to date.
PMCID:225762
PMID: 8472005
ISSN: 0025-7338
CID: 3651052