Seminar

Targeted Learning with High Dimensional Data - Mark van der Laan

Date
Mon October 28th 2013, 12:45pm
Event Sponsor
the Institute for Research in the Social Sciences (IRiSS) and the Graduate School of Business (GSB)
Location
Room P102 in the Patterson Building (part of the Knight Management Center) at Stanford's Graduate School of Business

Mark van der Laan, Professor of Biostatistics and Statistics at UC Berkeley and Founder, Target Analytics, Inc.

Slides available here.

Abstract

Learning from data involves defining 1) the experiment that generated the data, 2) the (often low-dimensional) target parameter of the data-generating distribution that we want to learn, the so-called estimand, 3) the collection of possible data-generating distributions, the so-called statistical model, and 4) its possible parameterization in terms of underlying distributions, often involving non-testable assumptions, giving the so-called model.

The statistical model represents our statistical knowledge and should be defined so that it contains the true data-generating distribution. The statistical estimation problem is now defined by the target parameter and the statistical model. Realistic estimation problems thus involve learning a target parameter in a very large semiparametric model, often for very high-dimensional data structures. Classical methods such as maximum likelihood based estimation, though optimal for small semiparametric models, break down for such large semiparametric models because they make the wrong (non-targeted) bias-variance trade-off.

In response to this, we developed targeted maximum likelihood estimation, and its natural generalization, targeted minimum loss based estimation (TMLE), as a template for the construction of semiparametric efficient estimators of pathwise differentiable target parameters. It involves defining an initial estimator of the relevant part of the data-generating distribution, which allows the integration of state-of-the-art ensemble learning that fully utilizes the power of cross-validation, followed by a targeted bias reduction step defined by a least favorable parametric submodel through the initial estimator and a loss function used to estimate the amount of fluctuation. The estimator of the target parameter is then the plug-in estimator corresponding to this updated initial estimator. Under appropriate conditions, TMLE results in semiparametric efficient substitution estimators that are often robust with respect to various misspecifications. We assigned the name Targeted Learning to the field concerned with data-adaptive estimation of target parameters while still providing statistical inference.
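As a rough illustration of this template (not taken from the talk), the sketch below applies it to one standard special case: the average treatment effect of a binary treatment A on a binary outcome Y given a covariate W. Everything specific here is an assumption made for the example: plain logistic regressions stand in for the ensemble learners, the data are simulated, the propensity scores are truncated as a practical guard, and the names fit_logistic and tmle_ate are hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    p = np.clip(p, 1e-9, 1 - 1e-9)  # guard against log(0)
    return np.log(p / (1.0 - p))

def fit_logistic(X, y, offset=None, n_iter=50):
    """Logistic regression MLE by Newton-Raphson, with an optional fixed offset."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta + offset)
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def tmle_ate(W, A, Y):
    """Hypothetical helper: TMLE of E[Y | A=1, W] - E[Y | A=0, W] averaged over W."""
    n = len(Y)
    ones, zeros = np.ones(n), np.zeros(n)

    # Initial estimator of Q(A, W) = P(Y = 1 | A, W).
    # Assumption: a simple logistic regression replaces an ensemble learner.
    XQ = np.column_stack([ones, A, W])
    bQ = fit_logistic(XQ, Y)
    Q_AW = expit(XQ @ bQ)
    Q_1W = expit(np.column_stack([ones, ones, W]) @ bQ)
    Q_0W = expit(np.column_stack([ones, zeros, W]) @ bQ)

    # Estimator of the treatment mechanism g(W) = P(A = 1 | W), truncated.
    Xg = np.column_stack([ones, W])
    g = np.clip(expit(Xg @ fit_logistic(Xg, A)), 0.01, 0.99)

    # Targeting step: one-dimensional logistic fluctuation through the
    # initial fit, with the log-likelihood loss used to estimate eps.
    H = A / g - (1 - A) / (1 - g)
    eps = fit_logistic(H[:, None], Y, offset=logit(Q_AW))[0]
    Q_AW_star = expit(logit(Q_AW) + eps * H)
    Q_1W_star = expit(logit(Q_1W) + eps / g)
    Q_0W_star = expit(logit(Q_0W) - eps / (1 - g))

    # Plug-in (substitution) estimator based on the updated fit.
    psi = np.mean(Q_1W_star - Q_0W_star)

    # Standard error from the estimated efficient influence curve.
    ic = H * (Y - Q_AW_star) + Q_1W_star - Q_0W_star - psi
    se = np.sqrt(np.var(ic) / n)
    return psi, se

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=n)
A = rng.binomial(1, expit(0.5 * W))
Y = rng.binomial(1, expit(-0.5 + A + W))

psi, se = tmle_ate(W, A, Y)
print(f"estimate: {psi:.3f}, 95% CI: ({psi - 1.96 * se:.3f}, {psi + 1.96 * se:.3f})")
```

In practice the two logistic fits would be replaced by cross-validated ensemble learners, which is the point at which data-adaptive estimation enters; the targeting step and the influence-curve-based interval are what keep statistical inference available afterwards.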

In our talk we will review this template and demonstrate some recent work involving applications of TMLE to nonparametrically estimate optimal individualized treatment rules, while providing statistical inference for their gain relative to a standard treatment, and to estimate causal effects of stochastic interventions on a network of individuals.
