Semiparametric Estimation

The goal is general-purpose learning and inference for the nonparametric causal parameter \(\theta_0 \in \mathbb{R}\). Many such parameters \(\theta_0\) have multiply robust moment functions with nuisance parameters \((\nu_0, \delta_0, \alpha_0, \eta_0)\). In this section, we describe a meta algorithm that turns estimators \((\hat{\nu}, \hat{\delta}, \hat{\alpha}, \hat{\eta})\) into an estimator \(\hat{\theta}\) with a valid and practical confidence interval. This meta algorithm can be seen as an extension of classic one-step corrections that is amenable to the use of modern machine learning, and it has been termed debiased machine learning. Unlike targeted minimum loss inference with a finite sample, it does not involve substitution, iteration, or bootstrapping.

The target estimator \(\hat{\theta}\), as well as its confidence interval, will depend on the nuisance estimators \((\hat{\nu}, \hat{\delta}, \hat{\alpha}, \hat{\eta})\). These nuisance estimators will typically come from estimating the nested NPIV for an outcome model, which yields \((\hat{\nu}, \hat{\delta})\), or the nested NPIV for an action model, which yields \((\hat{\alpha}, \hat{\eta})\).

The general theory only requires that each nuisance estimator converges to the corresponding nuisance parameter in mean square error. The general meta algorithm is as follows.

Algorithm (Debiased machine learning). Given a sample \((Y_i, W_i)\) (\(i = 1, \ldots, n\)), partition the sample into folds (\(I_\ell\)) (\(\ell = 1, \ldots, L\)). Denote by \(I^c_\ell\) the complement of \(I_\ell\).

For each fold \(\ell\), estimate \((\hat{\nu}_\ell, \hat{\delta}_\ell, \hat{\alpha}_\ell, \hat{\eta}_\ell)\) from observations in \(I^c_\ell\).
Estimate \(\theta_0\) as

\[\hat{\theta} = \frac{1}{n} \sum_{\ell=1}^L \sum_{i \in I_\ell} \left[ \hat{\nu}_\ell(W_i) + \hat{\alpha}_\ell(W_i)\{Y_i - \hat{\delta}_\ell(W_i)\} + \hat{\eta}_\ell(W_i)\{\hat{\delta}_\ell(W_i) - \hat{\nu}_\ell(W_i)\} \right].\]
Estimate its \((1 - \alpha)100\%\) confidence interval as \(\hat{\theta} \pm c_\alpha \hat{\sigma} n^{-1/2}\), where \(c_\alpha\) is the \(1 - \alpha/2\) quantile of the standard Gaussian (here \(\alpha\) denotes the significance level, not the nuisance \(\alpha_0\)) and

\[\hat{\sigma}^2 = \frac{1}{n} \sum_{\ell=1}^L \sum_{i \in I_\ell} \left[ \hat{\nu}_\ell(W_i) + \hat{\alpha}_\ell(W_i)\{Y_i - \hat{\delta}_\ell(W_i)\} + \hat{\eta}_\ell(W_i)\{\hat{\delta}_\ell(W_i) - \hat{\nu}_\ell(W_i)\} - \hat{\theta} \right]^2.\]
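
The steps of the algorithm can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a reference implementation: the names `dml_estimate`, `fit_nuisances`, and `toy_fit_nuisances` are hypothetical, the folds are formed by a random permutation (the text does not specify how to partition), and the toy nuisance fitter is a trivial stand-in rather than a nested NPIV estimator. Any procedure returning four callables \((\hat{\nu}_\ell, \hat{\delta}_\ell, \hat{\alpha}_\ell, \hat{\eta}_\ell)\) could be plugged in instead.

```python
from statistics import NormalDist

import numpy as np


def dml_estimate(Y, W, fit_nuisances, n_folds=5, level=0.05, seed=0):
    """Cross-fitted point estimate, standard deviation, and confidence interval.

    fit_nuisances(Y_train, W_train) is a user-supplied (hypothetical) routine
    returning four callables (nu, delta, alpha, eta), each mapping an array of
    W values to a 1-D array of predictions.
    """
    n = len(Y)
    rng = np.random.default_rng(seed)
    # Partition the indices into folds I_1, ..., I_L (random split is an assumption).
    folds = np.array_split(rng.permutation(n), n_folds)

    psi = np.empty(n)  # per-observation summand of the estimator
    for fold_idx in folds:
        # Fit the nuisances on the complement of the fold.
        comp_idx = np.setdiff1d(np.arange(n), fold_idx)
        nu, delta, alpha, eta = fit_nuisances(Y[comp_idx], W[comp_idx])

        # Evaluate the summand on the held-out fold.
        y, w = Y[fold_idx], W[fold_idx]
        psi[fold_idx] = (
            nu(w) + alpha(w) * (y - delta(w)) + eta(w) * (delta(w) - nu(w))
        )

    theta_hat = psi.mean()
    sigma_hat = np.sqrt(np.mean((psi - theta_hat) ** 2))
    c = NormalDist().inv_cdf(1 - level / 2)  # standard Gaussian quantile
    half_width = c * sigma_hat / np.sqrt(n)
    return theta_hat, sigma_hat, (theta_hat - half_width, theta_hat + half_width)


def toy_fit_nuisances(Y_train, W_train):
    """Trivial stand-in for illustration only: NOT a nested NPIV estimator.

    Sets nu = delta = training mean of Y, alpha = 1, eta = 0, so the summand
    reduces to Y_i and the estimator reduces to the sample mean of Y.
    """
    m = float(np.mean(Y_train))
    const = lambda w: np.full(len(w), m)
    ones = lambda w: np.ones(len(w))
    zeros = lambda w: np.zeros(len(w))
    return const, const, ones, zeros


# Usage on simulated data (purely synthetic, for illustration).
rng = np.random.default_rng(1)
Y = rng.normal(size=200)
W = rng.normal(size=(200, 2))
theta_hat, sigma_hat, (ci_lo, ci_hi) = dml_estimate(Y, W, toy_fit_nuisances)
```

With the toy nuisances the summand collapses to \(Y_i\), so \(\hat{\theta}\) equals the sample mean of \(Y\) and \(\hat{\sigma}\) its standard deviation, which gives a quick sanity check on the cross-fitting loop before substituting real nuisance estimators.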