NYUHSL Faculty Bibliography

Searched for:

in-biosketch:yes

person:diazi07

Total Results:

145

International journal of biostatistics. 2015:11(2):233-51.DOI: 10.1515/ijb-2014-0039

Targeted Maximum Likelihood Estimation using Exponential Families

Díaz, Iván; Rosenblum, Michael

Targeted maximum likelihood estimation (TMLE) is a general method for estimating parameters in semiparametric and nonparametric models. The key step in any TMLE implementation is constructing a sequence of least-favorable parametric models for the parameter of interest. This has been done for a variety of parameters arising in causal inference problems, by augmenting standard regression models with a "clever-covariate." That approach requires deriving such a covariate for each new type of problem; for some problems such a covariate does not exist. To address these issues, we give a general TMLE implementation based on exponential families. This approach does not require deriving a clever-covariate, and it can be used to implement TMLE for estimating any smooth parameter in the nonparametric model. A computational advantage is that each iteration of TMLE involves estimation of a parameter in an exponential family, which is a convex optimization problem for which software implementing reliable and computationally efficient methods exists. We illustrate the method in three estimation problems, involving the mean of an outcome missing at random, the parameter of a median regression model, and the causal effect of a continuous exposure, respectively. We conduct a simulation study comparing different choices for the parametric submodel. We find that the choice of submodel can have an important impact on the behavior of the estimator in finite samples.
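As an illustration of the classical "clever-covariate" approach that the abstract describes (not the exponential-family submodel the paper proposes), the following is a minimal sketch of a TMLE for the mean of an outcome missing at random. It assumes an outcome in [0, 1], an observation indicator R, and initial estimates of E[Y | R=1, W] and P(R=1 | W) supplied by the user. The logistic fluctuation with clever covariate 1/g(W) is fitted by Newton's method. The function name `tmle_mean_mar` is hypothetical.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def tmle_mean_mar(y, r, q_bar, g):
    """Hypothetical sketch: TMLE of E[Y] when Y is observed only if r == 1.

    y:     outcomes in [0, 1]; entries with r == 0 are ignored
    r:     observation indicators (1 = outcome observed)
    q_bar: initial estimates of E[Y | R=1, W], strictly inside (0, 1)
    g:     estimates of P(R=1 | W), strictly positive
    """
    n = len(r)
    # Clever covariate for the mean under missingness at random: H(W) = 1 / g(W).
    h = [1.0 / gi for gi in g]
    offset = [logit(q) for q in q_bar]
    # Least-favorable submodel: logit Q_eps(W) = logit Q(W) + eps * H(W).
    # Estimate eps by maximum likelihood (Newton's method) on observed units.
    eps = 0.0
    for _ in range(50):
        score = 0.0
        info = 0.0
        for i in range(n):
            if r[i] == 1:
                p = expit(offset[i] + eps * h[i])
                score += h[i] * (y[i] - p)
                info += h[i] * h[i] * p * (1.0 - p)
        step = score / info
        eps += step
        if abs(step) < 1e-10:
            break
    # Plug-in step: average the updated regression over all units.
    q_star = [expit(offset[i] + eps * h[i]) for i in range(n)]
    return sum(q_star) / n
```

With no missing outcomes and g(W) = 1, the fitted fluctuation solves the score equation exactly, so the estimate reduces to the sample mean of Y.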

PMID: 26197469

ISSN: 1557-4679

CID: 5304222

Health economics. 2015:24(9):1213-28.DOI: 10.1002/hec.3189

Evaluation of the Effect of a Continuous Treatment: A Machine Learning Approach with an Application to Treatment for Traumatic Brain Injury

Kreif, Noémi; Grieve, Richard; Díaz, Iván; Harrison, David

For a continuous treatment, the generalised propensity score (GPS) is defined as the conditional density of the treatment, given covariates. GPS adjustment may be implemented by including it as a covariate in an outcome regression. Here, the unbiased estimation of the dose-response function assumes correct specification of both the GPS and the outcome-treatment relationship. This paper introduces a machine learning method, the 'Super Learner', to address model selection in this context. In the two-stage estimation approach proposed, the Super Learner selects a GPS and then a dose-response function conditional on the GPS, as the convex combination of candidate prediction algorithms. We compare this approach with parametric implementations of the GPS and with regression methods. We contrast the methods in the Risk Adjustment in Neurocritical care cohort study, in which we estimate the marginal effects of increasing transfer time from emergency departments to specialised neuroscience centres, for patients with acute traumatic brain injury. With parametric models for the outcome, we find that dose-response curves differ according to choice of specification. With the Super Learner approach to both regression and the GPS, we find that transfer time does not have a statistically significant marginal effect on the outcomes.
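To make the definition of the GPS concrete, here is a minimal sketch of a parametric GPS of the kind the paper compares against, not the Super Learner approach it proposes. It assumes a single covariate and a normal linear treatment model T | X ~ N(a + bX, sigma^2) fitted by ordinary least squares; the GPS for each unit is the fitted normal density evaluated at its observed treatment. The function name `normal_gps` is hypothetical.

```python
import math
from statistics import fmean


def normal_gps(t, x):
    """Hypothetical sketch: GPS under an assumed normal linear model
    T | X ~ N(a + b * X, sigma^2), with a single covariate X."""
    mx, mt = fmean(x), fmean(t)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxt = sum((xi - mx) * (ti - mt) for xi, ti in zip(x, t))
    # Ordinary least-squares fit of the treatment on the covariate.
    b = sxt / sxx
    a = mt - b * mx
    resid = [ti - (a + b * xi) for xi, ti in zip(x, t)]
    # Maximum-likelihood variance estimate (divides by n).
    sigma2 = sum(e * e for e in resid) / len(resid)
    # GPS: conditional density of each unit's observed treatment given X.
    return [
        math.exp(-e * e / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)
        for e in resid
    ]
```

In the two-stage approach the abstract describes, these GPS values would then enter an outcome regression alongside the treatment; the sketch covers only the first stage.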

PMCID:4744663

PMID: 26059721

ISSN: 1099-1050

CID: 5304442

PLoS one. 2015:10(3).DOI: 10.1371/journal.pone.0120031

Variable importance and prediction methods for longitudinal problems with missing variables

Díaz, Iván; Hubbard, Alan; Decker, Anna; Cohen, Mitchell

We present prediction and variable importance (VIM) methods for longitudinal data sets containing continuous and binary exposures subject to missingness. We demonstrate the use of these methods for prognosis of medical outcomes of severe trauma patients, a field in which current medical practice involves rules of thumb and scoring methods that only use a few variables and ignore the dynamic and high-dimensional nature of trauma recovery. Well-principled prediction and VIM methods can provide a tool to make care decisions informed by the patient's high-dimensional physiological and clinical history. Our VIM parameters are analogous to slope coefficients in adjusted regressions, but they do not depend on a specific statistical model, nor do they require a certain functional form of the prediction regression to be estimated. In addition, they can be causally interpreted under causal and statistical assumptions as the expected outcome under time-specific clinical interventions, related to changes in the mean of the outcome if each individual experiences a specified change in the variable (keeping other variables in the model fixed). Better yet, the targeted MLE used is doubly robust and locally efficient. Because the proposed VIM does not constrain the prediction model fit, we use a very flexible ensemble learner (the SuperLearner), which returns a linear combination of a list of user-given algorithms. Not only is such a prediction algorithm intuitively appealing, it has theoretical justification as being asymptotically equivalent to the oracle selector. The results of the analysis show effects whose size and significance would not have been found using a parametric approach (such as stepwise regression or LASSO). In addition, the procedure is even more compelling because the predictor on which it is based showed significant improvements in cross-validated fit, for instance in the area under the receiver operating characteristic (ROC) curve (AUC).
Thus, given that 1) our VIM applies to any model fitting procedure, 2) under assumptions has meaningful clinical (causal) interpretations and 3) has asymptotic (influence-curve) based robust inference, it provides a compelling alternative to existing methods for estimating variable importance in high-dimensional clinical (or other) data.

PMCID:4376910

PMID: 25815719

ISSN: 1932-6203

CID: 5304432

Journal of causal inference. 2014:3(1):21-31.DOI: 10.1515/em-2014-0012

Discussion of Identification, Estimation and Approximation of Risk under Interventions that Depend on the Natural Value of Treatment Using Observational Data, by Jessica Young, Miguel Hernán, and James Robins

van der Laan, Mark J; Luedtke, Alexander R; Díaz, Iván

Young, Hernán, and Robins consider the mean outcome under a dynamic intervention that may rely on the natural value of treatment. They first identify this value with a statistical target parameter, and then show that this statistical target parameter can also be identified with a causal parameter which gives the mean outcome under a stochastic intervention. The authors then describe estimation strategies for these quantities. Here we augment the authors' insightful discussion by sharing our experiences in situations where two causal questions lead to the same statistical estimand, or the newer problem that arises in the study of data adaptive parameters, where two statistical estimands can lead to the same estimation problem. Given a statistical estimation problem, we encourage others to always use a robust estimation framework where the data generating distribution truly belongs to the statistical model. We close with a discussion of a framework which has these properties.

PMCID:4666557

PMID: 26636024

ISSN: 2193-3677

CID: 5304452

American journal of epidemiology. 2014:180(7):737-48.DOI: 10.1093/aje/kwu197

Estimating population treatment effects from a survey subsample

Rudolph, Kara E; Díaz, Iván; Rosenblum, Michael; Stuart, Elizabeth A

We considered the problem of estimating an average treatment effect for a target population using a survey subsample. Our motivation was to generalize a treatment effect that was estimated in a subsample of the National Comorbidity Survey Replication Adolescent Supplement (2001-2004) to the population of US adolescents. To address this problem, we evaluated easy-to-implement methods that account for both nonrandom treatment assignment and a nonrandom 2-stage selection mechanism. We compared the performance of a Horvitz-Thompson estimator using inverse probability weighting and 2 doubly robust estimators in a variety of scenarios. We demonstrated that the 2 doubly robust estimators generally outperformed inverse probability weighting in terms of mean-squared error even under misspecification of one of the treatment, selection, or outcome models. Moreover, the doubly robust estimators are easy to implement and provide an attractive alternative to inverse probability weighting for applied epidemiologic researchers. We demonstrated how to apply these estimators to our motivating example.
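The contrast between inverse probability weighting and doubly robust estimation can be sketched as follows. This is a simplified illustration under stated assumptions, not either of the paper's two doubly robust estimators: it ignores the two-stage selection mechanism and estimates only the mean outcome under treatment, E[Y(1)], from user-supplied propensity scores g(W) = P(A=1 | W) and outcome-model predictions of E[Y | A=1, W]. The augmented IPW (AIPW) form shown is one standard doubly robust estimator. The function names are hypothetical.

```python
def ipw_mean(y, a, g):
    """Hypothetical sketch: Horvitz-Thompson IPW estimate of E[Y(1)],
    the sample average of A * Y / g(W)."""
    return sum(ai * yi / gi for yi, ai, gi in zip(y, a, g)) / len(y)


def aipw_mean(y, a, g, q1):
    """Hypothetical sketch: augmented IPW (doubly robust) estimate of E[Y(1)].

    q1: outcome-model predictions of E[Y | A=1, W] for every unit.
    The estimator is consistent if either g or q1 is correctly specified.
    """
    total = 0.0
    for yi, ai, gi, qi in zip(y, a, g, q1):
        # Outcome-model prediction plus an inverse-weighted residual correction.
        total += qi + ai * (yi - qi) / gi
    return total / len(y)
```

When the outcome model is exact for every unit, the residual correction vanishes and AIPW returns the average prediction even if g is misspecified, whereas IPW depends entirely on g.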

PMCID:4172168

PMID: 25190679

ISSN: 1476-6256

CID: 5304972

arXiv. 2014.DOI: 10.48550/arXiv.1406.0423

Targeted Maximum Likelihood Estimation using Exponential Families [PrePrint]

Diaz, Ivan; Rosenblum, Michael

ORIGINAL:0015896

ISSN: 2331-8422

CID: 5305492

International journal of biostatistics. 2013:9(2):149-60.DOI: 10.1515/ijb-2013-0004

Sensitivity analysis for causal inference under unmeasured confounding and measurement error problems

Díaz, Iván; van der Laan, Mark J

In this article, we present a sensitivity analysis for drawing inferences about parameters that are not estimable from observed data without additional assumptions. We present the methodology using two different examples: a causal parameter that is not identifiable due to violations of the randomization assumption, and a parameter that is not estimable in the nonparametric model due to measurement error. Existing methods for tackling these problems assume a parametric model for the type of violation to the identifiability assumption and require the development of new estimators and inference for every new model. The method we present can be used in conjunction with any existing asymptotically linear estimator of an observed data parameter that approximates the unidentifiable full data parameter and does not require the study of additional models.

PMID: 24246288

ISSN: 1557-4679

CID: 5304382

International journal of biostatistics. 2013:9(2):161-74.DOI: 10.1515/ijb-2013-0014

Assessing the causal effect of policies: an example using stochastic interventions

Díaz, Iván; van der Laan, Mark J

Assessing the causal effect of an exposure often involves the definition of counterfactual outcomes in a hypothetical world in which the stochastic nature of the exposure is modified. Although stochastic interventions are a powerful tool to measure the causal effect of a realistic intervention that intends to alter the population distribution of an exposure, their importance for answering questions about plausible policy interventions has been obscured by the generalized use of deterministic interventions. In this article, we follow the approach described in Díaz and van der Laan (2012) to define and estimate the effect of an intervention that is expected to cause a truncation in the population distribution of the exposure. The observed data parameter that identifies the causal parameter of interest is established, as well as its efficient influence function under the non-parametric model. Inverse probability of treatment weighted (IPTW), augmented IPTW and targeted minimum loss-based estimators (TMLE) are proposed, and their consistency and efficiency properties are established. An extension to longitudinal data structures is presented and its use is demonstrated with a real data example.

PMID: 24246287

ISSN: 1557-4679

CID: 5304372

Journal of trauma & acute care surgery. 2013:75(1 Suppl 1):S53-60.DOI: 10.1097/TA.0b013e3182914553

Time-dependent prediction and evaluation of variable importance using superlearning in high-dimensional clinical data

Hubbard, Alan; Munoz, Ivan Diaz; Decker, Anna; Holcomb, John B; Schreiber, Martin A; Bulger, Eileen M; Brasel, Karen J; Fox, Erin E; del Junco, Deborah J; Wade, Charles E; Rahbar, Mohammad H; Cotton, Bryan A; Phelan, Herb A; Myers, John G; Alarcon, Louis H; Muskat, Peter; Cohen, Mitchell J

BACKGROUND: Prediction of outcome after injury is fraught with uncertainty and statistically beset by misspecified models. Single-time-point regression gives prediction and inference at only one time, which is of dubious value for continuous prediction of ongoing bleeding. New statistical machine learning techniques such as SuperLearner (SL) can make superior predictions at iterative time points while evaluating the changing relative importance of each measured variable for an outcome. This can then provide continuously updated prediction of outcome and evaluation of which clinical variables likely drive a particular outcome. METHODS: PROMMTT data were evaluated using both naive (standard stepwise logistic regression) and SL techniques to develop a time-dependent prediction of future mortality within discrete time intervals. We avoided both underfitting and overfitting by using cross-validation to select an optimal combination of predictors among candidate predictors/machine learning algorithms. SL was also used to produce interval-specific robust variable importance measures (VIMs), resulting in an ordered list, by time point, of the variables that have the strongest impact on future mortality. RESULTS: Nine hundred eighty patients had complete clinical and outcome data and were included in the analysis. The prediction of ongoing transfusion with SL was superior to the naive approach for all time intervals (correlations of cross-validated predictions with the outcome were 0.819, 0.789, 0.792 for time intervals 30-90, 90-180, 180-360, >360 minutes). The estimated VIM of mortality also changed significantly at each time point. CONCLUSIONS: The SL technique for prediction of outcome from a complex dynamic multivariate data set is superior at each time interval to standard models. In addition, the SL VIM at each time point provides insight into the time-specific drivers of future outcome, patient trajectory, and targets for clinical intervention.
Thus, this automated approach mimics clinical practice, changing form and content through time to optimize the accuracy of the prognosis based on the evolving trajectory of the patient.

PMCID:3744063

PMID: 23778512

ISSN: 2163-0763

CID: 5304912

Journal of causal inference. 2013:1(2):171-192.

Targeted Data Adaptive Estimation of the Causal Dose-Response Curve

Diaz, Ivan; Van der Laan, Mark J.

ISI:000218558300001

ISSN: 2193-3677

CID: 5304812