Enhanced precision in the analysis of randomized trials with ordinal outcomes
Díaz, Iván; Colantuoni, Elizabeth; Rosenblum, Michael
We present a general method for estimating the effect of a treatment on an ordinal outcome in randomized trials. The method is robust in that it does not rely on the proportional odds assumption. Our estimator leverages information in prognostic baseline variables, and has all of the following properties: (i) it is consistent; (ii) it is locally efficient; (iii) it is guaranteed to have equal or better asymptotic precision than both the inverse probability-weighted and the unadjusted estimators. To the best of our knowledge, this is the first estimator of the causal relation between a treatment and an ordinal outcome to satisfy these properties. We demonstrate the estimator in simulations based on resampling from a completed randomized clinical trial of a new treatment for stroke; we show potential gains of up to 39% in relative efficiency compared to the unadjusted estimator. The proposed estimator could be a useful tool for analyzing randomized trials with ordinal outcomes, since existing methods either rely on model assumptions that are untenable in many practical applications, or lack the efficiency properties of the proposed estimator. We provide R code implementing the estimator.
PMID: 26576013
ISSN: 1541-0420
CID: 5304252
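For context on the proportional odds assumption mentioned above: under a model of the form logit P(Y ≤ k | A) = α_k + βA, every cutoff-specific log odds ratio equals the same β. Without that assumption they can differ. The sketch below computes the unadjusted (no baseline covariates) cumulative log odds ratios and averages them. The averaging is a hypothetical one-number summary assumed for illustration, and the helper names are hypothetical; this is the unadjusted comparator the abstract refers to, not the proposed covariate-adjusted estimator.

```python
import math
from typing import List, Sequence


def cumulative_log_odds_ratios(y: Sequence[int], a: Sequence[int], levels: int) -> List[float]:
    """Unadjusted log odds ratio of P(Y <= k) for arm 1 versus arm 0,
    at each cutoff k = 1, ..., levels - 1 (outcome coded 1..levels)."""

    def cdf(arm: int, k: int) -> float:
        outcomes = [yi for yi, ai in zip(y, a) if ai == arm]
        return sum(1 for yi in outcomes if yi <= k) / len(outcomes)

    def logit(p: float) -> float:
        if not 0.0 < p < 1.0:
            raise ValueError("each arm needs outcomes on both sides of every cutoff")
        return math.log(p / (1.0 - p))

    return [logit(cdf(1, k)) - logit(cdf(0, k)) for k in range(1, levels)]


def average_log_odds_ratio(y: Sequence[int], a: Sequence[int], levels: int) -> float:
    """Hypothetical summary: the mean of the cutoff-specific log odds ratios."""
    ratios = cumulative_log_odds_ratios(y, a, levels)
    return sum(ratios) / len(ratios)
```

A positive value means arm 1 puts more mass on low outcome levels. If the proportional odds model held exactly, the list returned by `cumulative_log_odds_ratios` would estimate a single common β at every cutoff.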
Second-Order Inference for the Mean of a Variable Missing at Random
Díaz, Iván; Carone, Marco; van der Laan, Mark J
We present a second-order estimator of the mean of a variable subject to missingness, under the missing at random assumption. The estimator improves upon existing methods by using an approximate second-order expansion of the parameter functional, in addition to the first-order expansion employed by standard doubly robust methods. This results in weaker assumptions about the convergence rates necessary to establish consistency, local efficiency, and asymptotic linearity. The general estimation strategy is developed under the targeted minimum loss-based estimation (TMLE) framework. We present a simulation comparing the sensitivity of the first- and second-order estimators to the convergence rate of the initial estimators of the outcome regression and missingness score. In our simulation, the second-order TMLE always had a coverage probability equal to or closer to the nominal value 0.95 than its first-order counterpart. In the best-case scenario, the proposed second-order TMLE had a coverage probability of 0.86 when the first-order TMLE had a coverage probability of zero. We also present a novel first-order estimator inspired by a second-order expansion of the parameter functional. This estimator only requires one-dimensional smoothing, whereas implementation of the second-order TMLE generally requires kernel smoothing on the covariate space. The proposed first-order estimator is expected to have improved finite sample performance compared to existing first-order estimators. In the best-case scenario of our simulation study, the novel first-order TMLE improved the coverage probability from 0 to 0.90. We provide an illustration of our methods using a publicly available dataset to determine the effect of an anticoagulant on health outcomes of patients undergoing percutaneous coronary intervention. We provide R code implementing the proposed estimator.
PMID: 27227727
ISSN: 1557-4679
CID: 5304262
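The "standard doubly robust methods" that this abstract builds on can be illustrated by the augmented inverse probability weighted (AIPW) estimator of the mean under missing at random: ψ̂ = (1/n) Σ_i [Q̄(W_i) + R_i / g(W_i) · (Y_i − Q̄(W_i))], where R_i = 1 if Y_i is observed. A minimal sketch follows, assuming the fitted outcome regression Q̄ and missingness score g are supplied as plain functions; it is a first-order estimator only, not the second-order TMLE proposed in the paper, and `aipw_mean` is a hypothetical name.

```python
from typing import Callable, Optional, Sequence


def aipw_mean(
    w: Sequence[float],
    r: Sequence[int],
    y: Sequence[Optional[float]],
    q_bar: Callable[[float], float],
    g: Callable[[float], float],
) -> float:
    """Augmented IPW estimate of E[Y] when Y is missing at random given W.

    r[i] = 1 if y[i] was observed (y[i] may be None otherwise);
    q_bar(w) estimates E[Y | W = w, R = 1];
    g(w) estimates the missingness score P(R = 1 | W = w) and must be > 0.
    """
    total = 0.0
    for wi, ri, yi in zip(w, r, y):
        prediction = q_bar(wi)
        # Inverse-weighted residual correction; only observed outcomes contribute.
        correction = ri / g(wi) * (yi - prediction) if ri == 1 else 0.0
        total += prediction + correction
    return total / len(w)
```

The estimate stays consistent if either `q_bar` or `g` is consistently estimated, which is the double robustness property; the rate conditions the paper weakens concern how fast these two nuisance estimates converge.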
Estimator and model selection using cross-validation
Chapter by: Diaz, Ivan
in: Handbook of Big Data by Bühlmann, Peter; et al [Eds]
Boca Raton, FL : CRC Press, Taylor & Francis Group, 2016
pp. 223-239
ISBN: 9781482249088
CID: 5304882
Efficient Estimation of Quantiles in Missing Data Models [PrePrint]
Diaz, Ivan
ORIGINAL:0015893
ISSN: 2331-8422
CID: 5305462
Deductive derivation and Turing-computerization of semiparametric efficient estimation
Frangakis, Constantine E; Qian, Tianchen; Wu, Zhenke; Diaz, Ivan
Researchers often seek robust inference for a parameter through semiparametric estimation. Efficient semiparametric estimation currently requires theoretical derivation of the efficient influence function (EIF), which can be a challenging and time-consuming task. If this task can be computerized, it can save considerable human effort, which can be redirected, for example, to the design of new studies. Although the EIF is, in principle, a derivative, simple numerical differentiation to calculate the EIF by a computer masks the EIF's functional dependence on the parameter of interest. For this reason, the standard approach to obtaining the EIF relies on the theoretical construction of the space of scores under all possible parametric submodels. This process currently depends on the correctness of conjectures about these spaces, and the correct verification of such conjectures. The correct guessing of such conjectures, though successful in some problems, is a nondeductive process, i.e., is not guaranteed to succeed (e.g., is not computerizable), and the verification of conjectures is generally susceptible to mistakes. We propose a method that can deductively produce semiparametric locally efficient estimators. The proposed method is computerizable, meaning that it requires neither conjecturing nor otherwise theoretically deriving the functional form of the EIF, and is guaranteed to produce the desired estimates even for complex parameters. The method is demonstrated through an example.
PMID: 26237182
ISSN: 1541-0420
CID: 5304242
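The remark that the EIF is "in principle, a derivative" can be made concrete with a toy functional. For the variance T(P) = Var_P(X) in the nonparametric model, the derivative of T((1 − ε)P + εδ_x) at ε = 0 is (x − μ)² − σ². The sketch below approximates that derivative by a forward difference at the empirical distribution. It is only the "simple numerical differentiation" the abstract says masks the EIF's functional dependence, not the authors' deductive method; the function names and the choice of the variance are assumptions for illustration.

```python
from typing import Callable, List, Sequence

Functional = Callable[[Sequence[float], Sequence[float]], float]


def weighted_variance(points: Sequence[float], weights: Sequence[float]) -> float:
    """Variance of a discrete distribution placing weights[i] on points[i]."""
    mean = sum(p * w for p, w in zip(points, weights))
    return sum(w * (p - mean) ** 2 for p, w in zip(points, weights))


def numerical_influence(functional: Functional, data: Sequence[float], x: float, eps: float = 1e-6) -> float:
    """Forward-difference derivative of the functional at the empirical
    distribution P_n, in the direction of a point mass at x."""
    n = len(data)
    base = functional(data, [1.0 / n] * n)
    # Contaminated distribution (1 - eps) * P_n + eps * delta_x.
    points: List[float] = list(data) + [x]
    weights = [(1.0 - eps) / n] * n + [eps]
    return (functional(points, weights) - base) / eps
```

For data [1, 2, 3, 4] (μ = 2.5, σ² = 1.25), the approximation at x = 5 is close to (5 − 2.5)² − 1.25 = 5. Note that the result is a number at one point x, not a formula in x and the parameter, which illustrates the abstract's objection.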
Rejoinder to Discussions on: Deductive derivation and Turing-computerization of semiparametric efficient estimation [Comment]
Frangakis, Constantine E; Qian, Tianchen; Wu, Zhenke; Díaz, Iván
PMID: 26229019
ISSN: 1541-0420
CID: 5304232
Second-Order Inference for the Mean of a Variable Missing at Random [PrePrint]
Diaz, Ivan; Carone, Marco; van der Laan, Mark J
ORIGINAL:0015895
ISSN: 2331-8422
CID: 5305482
Improved Precision in the Analysis of Randomized Trials with Survival Outcomes, without Assuming Proportional Hazards [PrePrint]
Diaz, Ivan; Colantuoni, Elizabeth; Hanley, Daniel F; Rosenblum, Michael
ORIGINAL:0015894
ISSN: 2331-8422
CID: 5305472
Targeted Maximum Likelihood Estimation using Exponential Families
Díaz, Iván; Rosenblum, Michael
Targeted maximum likelihood estimation (TMLE) is a general method for estimating parameters in semiparametric and nonparametric models. The key step in any TMLE implementation is constructing a sequence of least-favorable parametric models for the parameter of interest. This has been done for a variety of parameters arising in causal inference problems, by augmenting standard regression models with a "clever-covariate." That approach requires deriving such a covariate for each new type of problem; for some problems such a covariate does not exist. To address these issues, we give a general TMLE implementation based on exponential families. This approach does not require deriving a clever-covariate, and it can be used to implement TMLE for estimating any smooth parameter in the nonparametric model. A computational advantage is that each iteration of TMLE involves estimation of a parameter in an exponential family, which is a convex optimization problem for which software implementing reliable and computationally efficient methods exists. We illustrate the method in three estimation problems, involving the mean of an outcome missing at random, the parameter of a median regression model, and the causal effect of a continuous exposure, respectively. We conduct a simulation study comparing different choices for the parametric submodel. We find that the choice of submodel can have an important impact on the behavior of the estimator in finite samples.
PMID: 26197469
ISSN: 1557-4679
CID: 5304222
Evaluation of the Effect of a Continuous Treatment: A Machine Learning Approach with an Application to Treatment for Traumatic Brain Injury
Kreif, Noémi; Grieve, Richard; Díaz, Iván; Harrison, David
For a continuous treatment, the generalised propensity score (GPS) is defined as the conditional density of the treatment, given covariates. GPS adjustment may be implemented by including it as a covariate in an outcome regression. Here, the unbiased estimation of the dose-response function assumes correct specification of both the GPS and the outcome-treatment relationship. This paper introduces a machine learning method, the 'Super Learner', to address model selection in this context. In the two-stage estimation approach proposed, the Super Learner selects a GPS and then a dose-response function conditional on the GPS, as the convex combination of candidate prediction algorithms. We compare this approach with parametric implementations of the GPS and with regression methods. We contrast the methods in the Risk Adjustment in Neurocritical care cohort study, in which we estimate the marginal effects of increasing transfer time from emergency departments to specialised neuroscience centres, for patients with acute traumatic brain injury. With parametric models for the outcome, we find that dose-response curves differ according to choice of specification. With the Super Learner approach to both regression and the GPS, we find that transfer time does not have a statistically significant marginal effect on the outcomes.
PMCID:4744663
PMID: 26059721
ISSN: 1099-1050
CID: 5304442
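A minimal parametric sketch of the two-stage GPS adjustment described above, of the kind the paper contrasts with the Super Learner. It assumes a single covariate X, a normal linear model for the treatment T given X (stage 1), and an outcome regression on (1, T, GPS) (stage 2); richer outcome specifications, such as quadratic and interaction terms, are common in practice. The dose-response estimate at level t averages the fitted outcome over units, plugging in each unit's GPS evaluated at t. All function names are hypothetical, and least squares is solved by hand only to keep the block dependency-free.

```python
import math
from typing import Callable, List, Sequence


def ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least squares via the normal equations and Gauss-Jordan elimination (small problems only)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    m = [xtx[i] + [xty[i]] for i in range(p)]  # augmented system [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(p):
            if k != col:
                factor = m[k][col] / m[col][col]
                m[k] = [a - factor * b for a, b in zip(m[k], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]


def normal_gps(t: Sequence[float], x: Sequence[float]) -> Callable[[float, float], float]:
    """Stage 1: GPS r(t, x), the density of T given X = x under an assumed normal linear model."""
    b0, b1 = ols([[1.0, xi] for xi in x], t)
    resid = [ti - b0 - b1 * xi for ti, xi in zip(t, x)]
    sigma = math.sqrt(sum(e * e for e in resid) / (len(t) - 2))

    def density(tv: float, xv: float) -> float:
        z = (tv - b0 - b1 * xv) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

    return density


def dose_response(
    y: Sequence[float], t: Sequence[float], x: Sequence[float], grid: Sequence[float]
) -> List[float]:
    """Average predicted outcome at each treatment level in `grid`."""
    r = normal_gps(t, x)
    gps = [r(ti, xi) for ti, xi in zip(t, x)]
    # Stage 2: outcome regression on (1, T, GPS), a deliberately simple specification.
    c0, c1, c2 = ols([[1.0, ti, gi] for ti, gi in zip(t, gps)], y)
    # Evaluate each unit's GPS at the level tv, then average the predictions over units.
    return [sum(c0 + c1 * tv + c2 * r(tv, xi) for xi in x) / len(x) for tv in grid]
```

Both stages are fixed parametric choices here; if either is misspecified the curve can be biased, which is the sensitivity to specification that the abstract reports and that the Super Learner approach is meant to reduce.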