Use of the Extensible Stylesheet Language (XSL) for medical data transformation
Seol, Y H; Johnson, S B; Starren, J
Recently, the Extensible Markup Language (XML) has received growing attention as a simple but flexible mechanism to represent medical data. As XML-based markups become more common, there will be an increasing need to transform data stored in one XML markup into another markup. The Extensible Stylesheet Language (XSL) is a stylesheet language for XML. Development of a new mammography reporting system created a need to convert XML output from the MedLEE natural language processing system into a format suitable for cross-patient reporting. This paper examines the capability of XSL as a rule specification language that supports the medical XML data transformation. A set of nine relevant transformations was identified: Filtering, Substitution, Specification, Aggregation, Merging, Splitting, Transposition, Push-down and Pull-up. XSL-based methods for implementing these transformations are presented. The strengths and limitations of XSL are discussed in the context of XML medical data transformation.
PMCID:2232783
PMID: 10566337
ISSN: 1531-605X
CID: 3650602
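As a rough illustration of two of the nine transformations named in the abstract above (Filtering and Substitution), the following sketch uses Python's standard `xml.etree.ElementTree` rather than XSL, so it shows the idea and not the paper's method. The element names, the `certainty` attribute, and the reading of "Filtering" as dropping elements and "Substitution" as renaming them are assumptions made for the example; they are not taken from the paper or from MedLEE's actual output format.

```python
import xml.etree.ElementTree as ET

# Hypothetical source markup, invented for this example.
SOURCE = """
<report>
  <finding certainty="high">mass</finding>
  <finding certainty="low">calcification</finding>
  <bodyloc>breast</bodyloc>
</report>
"""


def filter_elements(root, keep):
    """Filtering (assumed meaning): drop direct children that fail keep()."""
    for child in list(root):  # copy the child list so removal is safe
        if not keep(child):
            root.remove(child)


def substitute_tags(root, mapping):
    """Substitution (assumed meaning): rename elements using a tag mapping."""
    for elem in root.iter():
        elem.tag = mapping.get(elem.tag, elem.tag)


root = ET.fromstring(SOURCE)
# Keep everything except findings explicitly marked as low certainty.
filter_elements(root, lambda e: e.get("certainty") != "low")
# Map the source vocabulary onto a (hypothetical) reporting vocabulary.
substitute_tags(root, {"finding": "observation", "bodyloc": "site"})

print(ET.tostring(root, encoding="unicode"))
```

In XSL the same rules would typically be written as templates that match elements and either copy, rename, or omit them; the Python version only makes the two operations concrete.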
A technique for semantic classification of unknown words using UMLS resources
Campbell, D A; Johnson, S B
Natural Language Processing (NLP) is a tool for transforming natural text into codable form. Success of NLP systems is contingent on a well-constructed semantic lexicon. However, creation and maintenance of these lexicons is difficult, costly and time-consuming. The UMLS contains semantic and syntactic information of medical terms, which may be used to automate some of this task. Using UMLS resources we have observed that it is possible to define one semantic type by its syntactic combinations with other types in a corpus of discharge summaries. These patterns of combination can then be used to classify words which are not in the lexicon. The technique was applied to a corpus for a single semantic type and generated a list of 875 words which matched the classification criteria for that type. The words were ranked by number of patterns matched and the top 95 words were correctly typed with 80% accuracy.
PMCID:2232586
PMID: 10566453
ISSN: 1531-605X
CID: 3650622
Extended SQL for manipulating clinical warehouse data
Johnson, S B; Chatziantoniou, D
Health care institutions are beginning to collect large amounts of clinical data through patient care applications. Clinical data warehouses make these data available for complex analysis across patient records, benefiting administrative reporting, patient care and clinical research. Data gathered for patient care purposes are difficult to manipulate for analytic tasks; the schema presents conceptual difficulties for the analyst, and many queries perform poorly. An extension to SQL is presented that enables the analyst to designate groups of rows. These groups can then be manipulated and aggregated in various ways to solve a number of useful analytic problems. The extended SQL is concise and runs in linear time, while standard SQL requires multiple statements with polynomial performance. The extensions are extremely powerful for performing aggregations on large amounts of data, which is useful in clinical data mining applications.
PMCID:2232585
PMID: 10566474
ISSN: 1531-605X
CID: 3650632
Security architecture for multi-site patient records research
Behlen, F M; Johnson, S B
A security system was developed as part of a patient records research database project intended for both local and multi-site studies. A comprehensive review of ethical foundations and legal environment was undertaken, and a security system comprising both administrative policies and computer tools was developed. For multi-site studies, Institutional Review Board (IRB) approval is required for each study, at each participating site. A sponsoring Principal Investigator (PI) is required at each site, and each PI needs automated enforcement tools. Systems fitting this model were implemented at two academic medical centers. Security features of commercial database systems were found to be adequate for basic enforcement of approved research protocols.
PMCID:2232693
PMID: 10566404
ISSN: 1531-605X
CID: 3650612
Conceptual graph grammar--a simple formalism for sublanguage
Johnson, S B
There are a wide variety of computer applications that deal with various aspects of medical language: concept representation, controlled vocabulary, natural language processing, and information retrieval. While technical and theoretical methods appear to differ, all approaches investigate different aspects of the same phenomenon: medical sublanguage. This paper surveys the properties of medical sublanguage from a formal perspective, based on detailed analyses cited in the literature. A review of several computer systems based on sublanguage approaches shows some of the difficulties in addressing the interaction between the syntactic and semantic aspects of sublanguage. A formalism called Conceptual Graph Grammar is presented that attempts to combine both syntax and semantics into a single notation by extending standard Conceptual Graph notation. Examples from the domain of pathology diagnoses are provided to illustrate the use of this formalism in medical language analysis. The strengths and weaknesses of the approach are then considered. Conceptual Graph Grammar is an attempt to synthesize the common properties of different approaches to sublanguage into a single formalism, and to begin to define a common foundation for language-related research in medical informatics.
PMID: 9865032
ISSN: 0026-1270
CID: 3651152
Developing online support for clinical information system developers: the FAQ approach
Wilcox, A; Hripcsak, G; Johnson, S B; Hwang, J J; Wu, M
OBJECTIVE: We investigate a knowledge-based help system for developers of an integrated clinical information system (CIS). The first objective in the study was to determine the system's ability to answer users' questions effectively. User performance and behavior were studied. The second objective was to evaluate the effect of using questions and answers to augment or replace traditional program documentation. DESIGN/METHODS: A comparative study of user and system effectiveness was conducted using a collection of 47 veritable questions regarding the CIS, solicited from various CIS developers. Most questions concerned the clinical data model and acquiring the data. MEASUREMENTS/METHODS: Answers found using current documentation known to users were compared to answers found using the help system. Answers existing within traditional documentation were compared to answers existing within question-answer exchanges (Q-A's). RESULTS: The support system augmented 39% of users' answers to test questions. Though the Q-A's were less than 5% of the total documentation collected, these files contained answers to nearly 50% of the questions in the test group. The rest of the documentation contained about 75% of the answers. CONCLUSIONS: A knowledge-based help system built by collecting questions and answers can be a viable alternative to large documentation files, provided the questions and answers can be collected effectively.
PMID: 9570902
ISSN: 0010-4809
CID: 3651142
Generic database design for patient management information
Johnson, S B; Paul, T; Khenina, A
Patient management information tracks general facts about the location of the patient and the providers assigned to care for the patient. The Clinical Data Repository at Columbia Presbyterian Medical Center employs a generic schema to record patient management events. The schema is extremely simple, yet can support several different views of patient information, as required by different applications: a longitudinal view of patient visits, including both inpatient and outpatient encounters; a visit-oriented view, to record facts related to a current encounter; a location-based view to provide a census of a nursing ward; and a provider-based view to give a list of the patients currently being cared for by a given clinician. All of these views can be supported in a highly efficient manner by the use of appropriate indexes.
PMCID:2233478
PMID: 9357581
ISSN: 1091-8280
CID: 3651132
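The abstract above describes one simple generic schema serving several views through appropriate indexes. A minimal sketch of that idea, using Python's built-in `sqlite3`, might look as follows. The table name, column names, index names, and sample rows are all hypothetical; the actual Clinical Data Repository schema is not given in the abstract.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One generic table of patient management events (hypothetical layout).
CREATE TABLE patient_event (
    patient_id  TEXT NOT NULL,
    visit_id    TEXT NOT NULL,
    visit_type  TEXT NOT NULL,   -- 'inpatient' or 'outpatient'
    location    TEXT,
    provider_id TEXT,
    start_time  TEXT NOT NULL,
    end_time    TEXT             -- NULL while the event is still current
);
-- One index per view, so each lookup avoids a full table scan.
CREATE INDEX idx_patient  ON patient_event (patient_id, start_time);
CREATE INDEX idx_location ON patient_event (location, end_time);
CREATE INDEX idx_provider ON patient_event (provider_id, end_time);
""")

# Invented sample data.
conn.executemany(
    "INSERT INTO patient_event VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("p1", "v1", "outpatient", "clinic-A", "dr-x", "1997-01-05", "1997-01-05"),
        ("p1", "v2", "inpatient", "ward-7", "dr-y", "1997-03-01", None),
        ("p2", "v3", "inpatient", "ward-7", "dr-y", "1997-03-02", None),
        ("p3", "v4", "inpatient", "ward-7", "dr-x", "1997-02-01", "1997-02-10"),
    ],
)


def longitudinal_view(patient_id):
    """All visits for one patient, inpatient and outpatient, oldest first."""
    sql = ("SELECT visit_id, visit_type FROM patient_event "
           "WHERE patient_id = ? ORDER BY start_time")
    return conn.execute(sql, (patient_id,)).fetchall()


def ward_census(location):
    """Patients currently at a location (events with no end time)."""
    sql = ("SELECT patient_id FROM patient_event "
           "WHERE location = ? AND end_time IS NULL ORDER BY patient_id")
    return [row[0] for row in conn.execute(sql, (location,))]


def provider_list(provider_id):
    """Patients currently assigned to a given clinician."""
    sql = ("SELECT patient_id FROM patient_event "
           "WHERE provider_id = ? AND end_time IS NULL ORDER BY patient_id")
    return [row[0] for row in conn.execute(sql, (provider_id,))]
```

The point of the sketch is that the three views differ only in which indexed column drives the lookup; no view needs its own table.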
Generic data modeling for clinical repositories
Johnson, S B
OBJECTIVE: To construct a large-scale clinical repository that accurately captures a detailed understanding of the data vital to the process of health care and that provides highly efficient access to patient information for the users of a clinical information system. DESIGN/METHODS: Conventional approaches to data modeling encourage the development of a highly specific data schema in order to capture as much information as possible about a given domain. In contrast, current database technology functions most effectively for clinical databases when a generic data schema is used. The technique of "generic data modeling" is presented as a method of reconciling these opposing views of clinical data, using formal operations to transform a detailed schema into a generic one. RESULTS: A complex schema consisting of hundreds of entities and representing a rich set of constraints about the patient care domain is transformed into a generic schema consisting of roughly two dozen tables. The resulting database design is efficient for patient-oriented queries and is highly flexible in adapting to the changing information needs of a health care institution, particularly changes involving the collection of new data elements. CONCLUSIONS: Conventional approaches to data modeling can be used to develop rich, complex models of clinical data that are useful for understanding and managing the process of patient care. Generic data modeling techniques can successfully transform a detailed design into an efficient generic design that is flexible enough to meet the needs of an operational clinical information system.
PMCID:116317
PMID: 8880680
ISSN: 1067-5027
CID: 3651112
Design of a clinical event monitor
Hripcsak, G; Clayton, P D; Jenders, R A; Cimino, J J; Johnson, S B
The issues and implementation of a clinical event monitor are described. An event monitor generates messages for providers, patients, and organizations based on clinical events and patient data. For example, an order for a medication might trigger the generation of a warning about a drug interaction. A model based on the active database literature has as its main components an event (which triggers a rule to fire), a condition (which tests whether an action ought to be performed), and an action (often the generation of a message). The details of implementing such a monitor are described, using as an example the Columbia-Presbyterian Medical Center clinical event monitor, which is based on the Arden Syntax for Medical Logic Modules.
PMID: 8812070
ISSN: 0010-4809
CID: 3651102
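The event, condition, action model described in the abstract above can be sketched in a few lines of Python. This is only an illustration of the general pattern, not the Columbia-Presbyterian monitor and not Arden Syntax; every name below (`Rule`, `run_monitor`, the toy interaction table) is hypothetical, and the drug pair is a placeholder rather than clinical guidance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical shapes for illustration only.
Event = Dict[str, str]               # e.g. {"type": "medication_order", "drug": "warfarin"}
PatientData = Dict[str, List[str]]   # e.g. {"active_drugs": ["aspirin"]}


@dataclass
class Rule:
    event_type: str                                   # event that triggers the rule
    condition: Callable[[Event, PatientData], bool]   # should the action be performed?
    action: Callable[[Event, PatientData], str]       # here: build a message


def run_monitor(event: Event, patient: PatientData, rules: List[Rule]) -> List[str]:
    """Fire each rule triggered by the event and collect the resulting messages."""
    messages = []
    for rule in rules:
        # The condition is only evaluated for rules the event actually triggers.
        if rule.event_type == event["type"] and rule.condition(event, patient):
            messages.append(rule.action(event, patient))
    return messages


# Toy interaction table, invented for the example.
INTERACTIONS = {frozenset({"warfarin", "aspirin"})}


def interacts(event: Event, patient: PatientData) -> bool:
    """Condition: the ordered drug pairs with a current drug in the toy table."""
    return any(
        frozenset({event["drug"], current}) in INTERACTIONS
        for current in patient["active_drugs"]
    )


def warn(event: Event, patient: PatientData) -> str:
    """Action: generate a warning message for the provider."""
    return f"Warning: {event['drug']} may interact with a current medication."


RULES = [Rule("medication_order", interacts, warn)]

order = {"type": "medication_order", "drug": "warfarin"}
print(run_monitor(order, {"active_drugs": ["aspirin"]}, RULES))
```

Separating the trigger from the condition means cheap event-type matching filters out most rules before any patient data is examined, which mirrors the split the abstract describes.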
Integrating data from natural language processing into a clinical information system
Johnson, S B; Friedman, C
Demographic data extracted from discharge summaries by natural language processing was compared to data gathered by a conventional hospital admitting system. Discrepancies in data were noted in names, age, sex, race, and ethnicity. Some differences are attributable to errors in collection: interaction with patient, dictation, transcription, and data entry. Very few differences were due to errors in natural language processing. Other differences can be used to critique existing data, or to enhance data with more detailed information. Discrepancies in data as elementary as patient demographics raise the issue of resolving conflicts when neither source of data is known to be more reliable. Clinical repositories can represent conflicting data from multiple sources, but clinical information systems must bear the cost of increased complexity in the application programs that will use the data.
PMCID:2233157
PMID: 8947724
ISSN: 1091-8280
CID: 3651122