Fast kernel-based association testing of non-linear genetic effects for biobank-scale data

Fu, Boyang; Pazokitoroudi, Ali; Sudarshan, Mukund; Liu, Zhengtong; Subramanian, Lakshminarayanan; Sankararaman, Sriram
Our knowledge of non-linear genetic effects on complex traits remains limited, in part due to the modest power to detect such effects. While kernel-based tests offer a versatile approach to test for non-linear relationships between sets of genetic variants and traits, current approaches cannot be applied to Biobank-scale datasets containing hundreds of thousands of individuals. We propose FastKAST, a kernel-based approach that can test for non-linear effects of a set of variants on a quantitative trait. FastKAST provides calibrated hypothesis tests while enabling analysis of Biobank-scale datasets with hundreds of thousands of unrelated individuals from a homogeneous population. We apply FastKAST to 53 quantitative traits measured across ≈ 300 K unrelated white British individuals in the UK Biobank to detect sets of variants with non-linear effects at genome-wide significance.
PMID: 37582955
ISSN: 2041-1723
CID: 5595662

Predicting food crises using news streams

Balashankar, Ananth; Subramanian, Lakshminarayanan; Fraiberger, Samuel P
Anticipating food crisis outbreaks is crucial to efficiently allocate emergency relief and reduce human suffering. However, existing predictive models rely on risk measures that are often delayed, outdated, or incomplete. Using the text of 11.2 million news articles focused on food-insecure countries and published between 1980 and 2020, we leverage recent advances in deep learning to extract high-frequency precursors to food crises that are both interpretable and validated by traditional risk indicators. We demonstrate that over the period from July 2009 to July 2020 and across 21 food-insecure countries, news indicators substantially improve the district-level predictions of food insecurity up to 12 months ahead relative to baseline models that do not include text information. These results could have profound implications on how humanitarian aid gets allocated and open previously unexplored avenues for machine learning to improve decision-making in data-scarce environments.
PMID: 36867695
ISSN: 2375-2548
CID: 5448562

Deep significance clustering: a novel approach for identifying risk-stratified and predictive patient subgroups

Huang, Yufang; Liu, Yifan; Steel, Peter A D; Axsom, Kelly M; Lee, John R; Tummalapalli, Sri Lekha; Wang, Fei; Pathak, Jyotishman; Subramanian, Lakshminarayanan; Zhang, Yiye
OBJECTIVE: Deep significance clustering (DICE) is a self-supervised learning framework. DICE identifies clinically similar and risk-stratified subgroups that neither unsupervised clustering algorithms nor supervised risk prediction algorithms alone are guaranteed to generate. MATERIALS AND METHODS: Enabled by an optimization process that enforces statistical significance between the outcome and subgroup membership, DICE jointly trains 3 components (representation learning, clustering, and outcome prediction) while providing interpretability to the deep representations. DICE also allows unseen patients to be predicted into trained subgroups for population-level risk stratification. We evaluated DICE using electronic health record datasets derived from 2 urban hospitals. Outcomes and patient cohorts used include discharge disposition to home among heart failure (HF) patients and acute kidney injury among COVID-19 (Cov-AKI) patients, respectively. RESULTS: Compared to baseline approaches including principal component analysis, DICE demonstrated superior performance in the cluster purity metrics: Silhouette score (0.48 for HF, 0.51 for Cov-AKI), Calinski-Harabasz index (212 for HF, 254 for Cov-AKI), and Davies-Bouldin index (0.86 for HF, 0.66 for Cov-AKI), and in the prediction metric: area under the receiver operating characteristic (ROC) curve (0.83 for HF, 0.78 for Cov-AKI). Clinical evaluation of DICE-generated subgroups revealed more meaningful distributions of member characteristics across subgroups, and higher risk ratios between subgroups. Furthermore, DICE-generated subgroup membership alone was moderately predictive of outcomes. DISCUSSION: DICE addresses a gap in current machine learning approaches where predicted risk may not lead directly to actionable clinical steps. CONCLUSION: DICE demonstrated the potential to apply in heterogeneous populations, where having the same quantitative risk does not equate with having a similar clinical profile.
PMID: 34571540
ISSN: 1527-974x
CID: 5090592

High Precision Mammography Lesion Identification From Imprecise Medical Annotations

An, Ulzee; Bhardwaj, Ankit; Shameer, Khader; Subramanian, Lakshminarayanan
Breast cancer screening using mammography serves as the earliest defense against breast cancer, revealing anomalous tissue years before it can be detected through physical screening. Despite the use of high-resolution radiography, the presence of densely overlapping patterns challenges the consistency of human-driven diagnosis and drives interest in leveraging the state-of-the-art localization ability of deep convolutional neural networks (DCNN). The growing availability of digitized clinical archives enables the training of deep segmentation models, but training using the most widely available form of coarse hand-drawn annotations works against learning the precise boundary of cancerous tissue in evaluation, while producing results that are more aligned with the annotations than with the underlying lesions. The expense of collecting high-quality pixel-level data in the field of medical science makes this even more difficult. To surmount this fundamental challenge, we propose LatentCADx, a deep learning segmentation model capable of precisely annotating cancer lesions underlying hand-drawn annotations, which we procedurally obtain using joint classification training and a strict segmentation penalty. We demonstrate the capability of LatentCADx on a publicly available dataset of 2,620 mammogram case files, where LatentCADx obtains classification ROC of 0.97, AP of 0.87, and segmentation AP of 0.75 (IOU = 0.5), giving comparable or better performance than other models. Qualitative and precision evaluation of LatentCADx annotations on validation samples reveals that LatentCADx increases the specificity of segmentations beyond that of existing models trained on hand-drawn annotations, with pixel-level specificity reaching a staggering value of 0.90. It also obtains sharp boundaries around lesions, unlike other methods, reducing the confused pixels in the output by more than 60%.
PMID: 34977563
ISSN: 2624-909x
CID: 5106812

Sepsis in the era of data-driven medicine: personalizing risks, diagnoses, treatments and prognoses

Liu, Andrew C; Patel, Krishna; Vunikili, Ramya Dhatri; Johnson, Kipp W; Abdu, Fahad; Belman, Shivani Kamath; Glicksberg, Benjamin S; Tandale, Pratyush; Fontanez, Roberto; Mathew, Oommen K; Kasarskis, Andrew; Mukherjee, Priyabrata; Subramanian, Lakshminarayanan; Dudley, Joel T; Shameer, Khader
Sepsis is a series of clinical syndromes caused by the immunological response to infection. The clinical evidence for sepsis can typically be attributed to bacterial infection or bacterial endotoxins, but infections due to viruses, fungi or parasites could also lead to sepsis. Regardless of the etiology, rapid clinical deterioration, prolonged stay in intensive care units and high risk for mortality correlate with the incidence of sepsis. Despite its prevalence and morbidity, improvement in sepsis outcomes has remained limited. In this comprehensive review, we summarize the current landscape of risk estimation, diagnosis, treatment and prognosis strategies in the setting of sepsis and discuss future challenges. We argue that the advent of modern technologies such as in-depth molecular profiling, biomedical big data and machine intelligence methods will augment the treatment and prevention of sepsis. The volume, variety, veracity and velocity of heterogeneous data generated as part of healthcare delivery and recent advances in biotechnology-driven therapeutics and companion diagnostics may provide a new wave of approaches to identify the most at-risk sepsis patients and reduce the symptom burden in patients within shorter turnaround times. Developing novel therapies by leveraging modern drug discovery strategies including computational drug repositioning, cell and gene therapy, clustered regularly interspaced short palindromic repeats-based genetic editing systems, immunotherapy, microbiome restoration, nanomaterial-based therapy and phage therapy may help to develop treatments to target sepsis. We also provide empirical evidence for potential new sepsis targets including FER and STARD3NL. Implementing data-driven methods that use real-time collection and analysis of clinical variables to trace, track and treat sepsis-related adverse outcomes will be key. Understanding the root and route of sepsis and its comorbid conditions that complicate treatment outcomes and lead to organ dysfunction may help to facilitate identification of the most at-risk patients and prevent further deterioration. To conclude, leveraging the advances in precision medicine, biomedical data science and translational bioinformatics approaches may help to develop better strategies to diagnose and treat sepsis in the next decade.
PMID: 31190075
ISSN: 1477-4054
CID: 3955532

The Importance of Long-term Care Populations in Models of COVID-19

Pillemer, Karl; Subramanian, Lakshminarayanan; Hupert, Nathaniel
PMID: 32501504
ISSN: 1538-3598
CID: 4469482

Quantifying the localized relationship between vector containment activities and dengue incidence in a real-world setting: A spatial and time series modelling analysis based on geo-located data from Pakistan

Abdur Rehman, Nabeel; Salje, Henrik; Kraemer, Moritz U G; Subramanian, Lakshminarayanan; Saif, Umar; Chunara, Rumi
Increasing urbanization is having a profound effect on infectious disease risk, posing significant challenges for governments to allocate limited resources for their optimal control at a sub-city scale. With recent advances in data collection practices, empirical evidence about the efficacy of highly localized containment and intervention activities, which can lead to optimal deployment of resources, is possible. However, there are several challenges in analyzing data from such real-world observational settings. Using data on 3.9 million instances of seven dengue vector containment activities collected between 2012 and 2017, here we develop and assess two frameworks for understanding how the generation of new dengue cases changes in space and time with respect to application of different types of containment activities. Accounting for the non-random deployment of each containment activity in relation to dengue cases and other types of containment activities, as well as deployment of activities in different epidemiological contexts, results from both frameworks reinforce existing knowledge about the efficacy of containment activities aimed at the adult phase of the mosquito lifecycle. Results show a 10% (95% CI: 1-19%) and a 20% (95% CI: 4-34%) reduction in the probability of a case occurring within 50 meters and 30 days of cases that had Indoor Residual Spraying (IRS) and fogging performed in the immediate vicinity, respectively, compared to cases of similar epidemiological context that had no containment in their vicinity. Simultaneously, limitations due to the real-world nature of activity deployment are used to guide recommendations for future deployment of resources during outbreaks as well as data collection practices. Conclusions from this study will enable more robust and comprehensive analyses of localized containment activities in resource-scarce urban settings and lead to improved allocation of government resources in an outbreak setting.
PMID: 32392225
ISSN: 1935-2735
CID: 4431002

Identifying unreliable and adversarial workers in crowdsourced labeling tasks

Jagabathula, Srikanth; Subramanian, Lakshminarayanan; Venkataraman, Ashwin
We study the problem of identifying unreliable and adversarial workers in crowdsourcing systems where workers (or users) provide labels for tasks (or items). Most existing studies assume that worker responses follow specific probabilistic models; however, recent evidence shows the presence of workers adopting non-random or even malicious strategies. To account for such workers, we suppose that workers comprise a mixture of honest and adversarial workers. Honest workers may be reliable or unreliable, and they provide labels according to an unknown but explicit probabilistic model. Adversaries adopt labeling strategies different from those of honest workers, whether probabilistic or not. We propose two reputation algorithms to identify unreliable honest workers and adversarial workers from only their responses. Our algorithms assume that honest workers are in the majority, and they classify workers with outlier label patterns as adversaries. Theoretically, we show that our algorithms successfully identify unreliable honest workers, workers adopting deterministic strategies, and worst-case sophisticated adversaries who can adopt arbitrary labeling strategies to degrade the accuracy of the inferred task labels. Empirically, we show that filtering out outliers using our algorithms can significantly improve the accuracy of several state-of-the-art label aggregation algorithms in real-world crowdsourcing datasets.
ISSN: 1532-4435
CID: 2874732

The fake vs real goods problem: Microscopy and machine learning to the rescue

Chapter by: Sharma, Ashlesh; Srinivasan, Vidyuth; Kanchan, Vishal; Subramanian, Lakshminarayanan
in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
[S.l.] : Association for Computing Machinery, 2017
pp. 2011-2019
ISBN: 9781450348874
CID: 2874742

Extracting signals from news streams for disease outbreak prediction

Chapter by: Chakraborty, Sunandan; Subramanian, Lakshminarayanan
in: 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings
[S.l.] : Institute of Electrical and Electronics Engineers Inc., 2017
pp. 1300-1304
ISBN: 9781509045457
CID: 2874722