Nonparametric Instrumental Variables
This module implements Debiased Machine Learning for Nonparametric Instrumental Variables (DML-npiv). It provides tools for estimating causal effects using a combination of machine learning models and instrumental variables techniques. The module supports cross-validation, kernel density estimation for localization, and confidence interval computation.
- Classes:
DML_npiv: Main class for performing DML-npiv with various configuration options.
- DML_npiv Methods:
__init__: Initialize the DML_npiv instance with data and model configurations.
_calculate_confidence_interval: Calculate confidence intervals for the estimates.
_localization: Perform localization using kernel density estimation.
_npivfit_outcome: Fit the outcome model using nonparametric instrumental variables.
_propensity_score: Estimate the propensity score.
_npivfit_action: Fit the action model using nonparametric instrumental variables.
_process_fold: Process a single fold for cross-validation.
_split_and_estimate: Split the data and estimate the model using cross-validation.
dml: Perform Debiased Machine Learning for Nonparametric Instrumental Variables.
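`_process_fold` and `_split_and_estimate` implement cross-fitting, the usual DML fold logic. The sketch below shows that pattern under several assumptions. It targets a plain ATE with an AIPW (doubly robust) score. It fits ordinary scikit-learn regressions instead of the package's NPIV nuisance models. It aggregates the `n_rep` repetitions by the median. The function `crossfit_ate` is hypothetical and is not part of this package.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import KFold


def crossfit_ate(Y, D, X, n_folds=5, n_rep=1, random_seed=123):
    """Hypothetical cross-fitted AIPW estimate of the ATE (illustration only)."""
    thetas, variances = [], []
    for rep in range(n_rep):
        psi = np.empty(len(Y))  # out-of-fold scores, one per observation
        kf = KFold(n_splits=n_folds, shuffle=True, random_state=random_seed + rep)
        for train, test in kf.split(X):
            Xtr, Dtr, Ytr = X[train], D[train], Y[train]
            # Nuisance models are fitted on the training folds only.
            e = LogisticRegression().fit(Xtr, Dtr).predict_proba(X[test])[:, 1]
            mu1 = LinearRegression().fit(Xtr[Dtr == 1], Ytr[Dtr == 1]).predict(X[test])
            mu0 = LinearRegression().fit(Xtr[Dtr == 0], Ytr[Dtr == 0]).predict(X[test])
            d, y = D[test], Y[test]
            # AIPW score evaluated on the held-out fold.
            psi[test] = mu1 - mu0 + d * (y - mu1) / e - (1 - d) * (y - mu0) / (1 - e)
        thetas.append(psi.mean())
        variances.append(psi.var(ddof=1) / len(psi))
    # Assumption: repetitions are aggregated by the median.
    return float(np.median(thetas)), float(np.median(variances))


rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 1))
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 2.0 * D + X[:, 0] + rng.normal(size=n)  # true ATE = 2
theta, theta_var = crossfit_ate(Y, D, X, n_folds=5, n_rep=3)
print(theta, theta_var)
```

Each observation's score comes from nuisance models that never saw that observation. That separation is what removes the own-observation overfitting bias.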
- class dml_npiv.DML_npiv(Y, D, Z, W, X1=None, V=None, v_values=None, loc_kernel='gau', bw_loc='silverman', estimator='MR', model1=<nnpiv.rkhs.rkhsiv.ApproxRKHSIVCV object>, nn_1=False, modelq1=<nnpiv.rkhs.rkhsiv.ApproxRKHSIVCV object>, nn_q1=False, alpha=0.05, n_folds=5, n_rep=1, random_seed=123, prop_score=sklearn.linear_model.LogisticRegression, CHIM=False, verbose=True, fitargs1=None, fitargsq1=None, opts=None)
Bases: object
Debiased Machine Learning for Nonparametric Instrumental Variables (DML-npiv) class.
- Parameters
Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
Z (array-like) – Instrumental variable.
W (array-like) – Negative control outcome.
X1 (array-like, optional) – Additional covariates.
V (array-like, optional) – Localization covariates.
v_values (array-like, optional) – Values for localization.
loc_kernel (str, optional) – Kernel for localization. Options are [‘gau’, ‘epa’, ‘uni’].
bw_loc (str, optional) – Bandwidth for localization.
estimator (str, optional) – Estimator type (‘MR’, ‘OR’, ‘IPW’).
model1 (estimator, optional) – Model for the first stage.
nn_1 (bool, optional) – Use neural network for the first stage.
modelq1 (estimator, optional) – Model for the second stage.
nn_q1 (bool, optional) – Use neural network for the second stage.
alpha (float, optional) – Significance level for confidence intervals.
n_folds (int, optional) – Number of folds for estimation.
n_rep (int, optional) – Number of repetitions for estimation.
random_seed (int, optional) – Seed for random number generator.
prop_score (estimator, optional) – Model for propensity score.
CHIM (bool, optional) – If True, apply the CHIM trimming rule (Crump, Hotz, Imbens and Mitnik, 2009), dropping observations with extreme propensity score values.
verbose (bool, optional) – Print progress information.
fitargs1 (dict, optional) – Arguments for fitting the first stage model.
fitargsq1 (dict, optional) – Arguments for fitting the second stage model.
opts (dict, optional) – Additional options.
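`model1` and `modelq1` default to an approximate RKHS NPIV estimator (`ApproxRKHSIVCV`). Such a nuisance fit solves a conditional moment restriction of the form E[Y - h(W) | Z] = 0, with an endogenous input W and an instrument Z (covariates are omitted here for brevity). A polynomial sieve two-stage least squares estimator is a much simpler way to solve the same kind of problem, shown below. It is a stand-in for intuition, not the RKHS method used by default. The function `sieve_npiv` and the simulated design are hypothetical.

```python
import numpy as np


def sieve_npiv(Y, W, Z, degree=2):
    """Polynomial sieve 2SLS for h in E[Y - h(W) | Z] = 0 (illustration only)."""
    B = np.vander(W, degree + 1, increasing=True)  # basis in the endogenous variable W
    Q = np.vander(Z, degree + 1, increasing=True)  # basis in the instrument Z
    # First stage: project each basis function of W onto the instrument basis.
    first_stage, *_ = np.linalg.lstsq(Q, B, rcond=None)
    B_hat = Q @ first_stage
    # Second stage: regress Y on the projected basis.
    beta, *_ = np.linalg.lstsq(B_hat, Y, rcond=None)
    return beta  # coefficients of h(w) = beta[0] + beta[1] w + beta[2] w^2 + ...


rng = np.random.default_rng(1)
n = 10000
Z = rng.normal(size=n)
U = rng.normal(size=n)              # unobserved confounder
W = Z + U + rng.normal(size=n)      # W is endogenous: it depends on U
Y = W + 0.5 * W**2 + U              # structural function h(w) = w + 0.5 w^2
beta = sieve_npiv(Y, W, Z)
print(beta)
```

A naive regression of Y on W would be biased here because U drives both variables. Projecting onto functions of Z removes that dependence.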
- _calculate_confidence_interval(theta, theta_var)
Calculate the confidence interval for the given estimates.
- Parameters
theta (array-like) – Estimated values.
theta_var (array-like) – Variance of the estimates.
- Returns
Lower and upper bounds of the confidence intervals.
- Return type
array-like
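The sketch below computes a two-sided normal-approximation interval. It assumes that `theta_var` already holds the variance of each estimate (not the variance of individual scores) and that `alpha` is the significance level passed to the constructor. The helper `normal_ci` is hypothetical, and the package's own computation may differ in detail.

```python
from statistics import NormalDist

import numpy as np


def normal_ci(theta, theta_var, alpha=0.05):
    """Return (lower, upper) bounds of a two-sided (1 - alpha) normal interval."""
    theta = np.asarray(theta, dtype=float)
    se = np.sqrt(np.asarray(theta_var, dtype=float))
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return theta - z * se, theta + z * se


lower, upper = normal_ci([1.0, 0.5], [0.04, 0.01])
print(lower, upper)
```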
- _localization(V, v_val, bw)
Perform localization using kernel density estimation.
- Parameters
V (array-like) – Localization covariates.
v_val (array-like) – Values for localization.
bw (float) – Bandwidth for localization.
- Returns
Weights for localization.
- Return type
array-like
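A common way to turn kernel density estimation into localization weights is to evaluate a scaled kernel at each observation, w_i = K((V_i - v) / h) / h. The bandwidth h comes from Silverman's rule of thumb, which matches the default `bw_loc='silverman'`. The sketch below assumes a Gaussian kernel (the `'gau'` option) and a single localization covariate. The helper names are hypothetical, and whether `_localization` scales the weights exactly this way is an assumption.

```python
import numpy as np


def silverman_bw(V):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.349) * n^(-1/5)."""
    V = np.asarray(V, dtype=float)
    iqr = np.subtract(*np.percentile(V, [75, 25]))
    spread = min(np.std(V, ddof=1), iqr / 1.349)
    return 0.9 * spread * V.size ** (-0.2)


def gaussian_localization_weights(V, v_val, bw):
    """Gaussian-kernel weights K((V - v_val) / bw) / bw for each observation."""
    u = (np.asarray(V, dtype=float) - v_val) / bw
    return np.exp(-0.5 * u**2) / (np.sqrt(2 * np.pi) * bw)


V = np.array([-1.0, 0.0, 1.0])
w = gaussian_localization_weights(V, v_val=0.0, bw=1.0)
print(w)  # symmetric around v_val, largest at V == v_val
```

One possible use is a localized estimate at `v_val`, computed as a weighted mean of per-observation scores, sum(w * psi) / sum(w). This is shown for intuition only.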
- _npivfit_action(ps_hat_1, W, X, Z, alfa=0.0)
Fit the action model using nonparametric instrumental variables.
- Parameters
ps_hat_1 (array-like) – Estimated propensity scores.
W (array-like) – Negative control outcome.
X (array-like) – Covariates.
Z (array-like) – Instrumental variable.
alfa (float, optional) – Threshold alpha for propensity scores.
- Returns
Fitted models for treated and control groups.
- Return type
- _npivfit_outcome(Y, D, X, Z)
Fit the outcome model using nonparametric instrumental variables.
- Parameters
Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
X (array-like) – Covariates.
Z (array-like) – Instrumental variable.
- Returns
Fitted models for treatment and control groups.
- Return type
- _propensity_score(X, W, D)
Estimate the propensity score.
- Parameters
X (array-like) – Covariates.
W (array-like) – Negative control outcome.
D (array-like) – Treatment variable.
- Returns
Estimated propensity scores and threshold alpha.
- Return type
- dml_npiv._fun_threshold_alpha(alpha, g)
Auxiliary function for computing the optimal trimming threshold alpha that improves overlap, following CHIM: Richard K. Crump, V. Joseph Hotz, Guido W. Imbens, Oscar A. Mitnik, "Dealing with limited overlap in estimation of average treatment effects", Biometrika, Volume 96, Issue 1, March 2009.
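The CHIM rule keeps units with alpha <= e(x) <= 1 - alpha. If the largest value of 1/(e(x)(1 - e(x))) is at most twice its mean, no trimming is needed and alpha = 0. Otherwise gamma = 1/(alpha(1 - alpha)) solves gamma = 2 E[1/(e(X)(1 - e(X))) | 1/(e(X)(1 - e(X))) <= gamma]. The sketch below applies that rule directly to a vector of estimated propensity scores. It is not the body of `_fun_threshold_alpha`, whose exact formulation is not shown here, and `chim_threshold` is a hypothetical name.

```python
import numpy as np


def chim_threshold(ps):
    """Optimal trimming threshold alpha of Crump, Hotz, Imbens and Mitnik (2009).

    Assumes every propensity score lies strictly between 0 and 1.
    """
    ps = np.asarray(ps, dtype=float)
    h = 1.0 / (ps * (1.0 - ps))
    if h.max() <= 2.0 * h.mean():
        return 0.0  # overlap is already adequate: keep every observation
    h_sorted = np.sort(h)
    cum_mean = np.cumsum(h_sorted) / np.arange(1, h.size + 1)
    gamma = 2.0 * cum_mean[-1]
    # Find gamma with gamma = 2 * mean(h | h <= gamma) by scanning the sorted values.
    for k in range(h.size):
        candidate = 2.0 * cum_mean[k]
        upper = h_sorted[k + 1] if k + 1 < h.size else np.inf
        if h_sorted[k] <= candidate < upper:
            gamma = candidate
            break
    # Invert gamma = 1 / (alpha * (1 - alpha)) on (0, 1/2].
    return float(0.5 - np.sqrt(0.25 - 1.0 / gamma))


ps = np.array([0.5] * 9 + [0.001])
alpha = chim_threshold(ps)
keep = (ps >= alpha) & (ps <= 1 - alpha)  # observations retained after trimming
print(alpha, keep.sum())
```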
Mediation Analysis
Sequential mediation
This module performs Debiased Machine Learning for mediation analysis, using the sequential estimators for the longitudinal nonparametric parameters (in the Nested NPIV framework). It provides tools for estimating causal effects with mediation using a combination of machine learning models and instrumental variables techniques. The module supports different types of mediated estimands, cross-validation, kernel density estimation for localization, and confidence interval computation.
- Classes:
DML_mediated: Main class for performing DML for mediation analysis with various configuration options.
- DML_mediated Methods:
__init__: Initialize the DML_mediated instance with data and model configurations.
_calculate_confidence_interval: Calculate confidence intervals for the estimates.
_localization: Perform localization using kernel density estimation.
_nnpivfit_outcome_m: Fit the mediated outcome model using nonparametric instrumental variables.
_npivfit_outcome: Fit the outcome model using nonparametric instrumental variables.
_propensity_score: Estimate the propensity score.
_nnpivfit_action_m: Fit the mediated action model using nonparametric instrumental variables.
_npivfit_action: Fit the action model using nonparametric instrumental variables.
_scores_mediated: Calculate the scores for the mediated effects.
_scores_Y1: Calculate the scores for the Y1 estimand.
_process_fold: Process a single fold for cross-validation.
_split_and_estimate: Split the data and estimate the model for each fold.
dml: Perform Debiased Machine Learning for Nonparametric Instrumental Variables.
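The sequential approach estimates nested parameters such as E[Y(1,M(0))] stage by stage. The first stage models the outcome given the mediator under D = 1. The second stage models that fitted outcome given covariates under D = 0, and the result is averaged over the population.

The sketch below shows only this nesting. It uses linear regressions under a sequential ignorability assumption in place of the package's proximal NPIV stages (`model1`, `model2`). It relies on the standard natural direct and indirect effect definitions, and the package's exact conventions for 'Direct' and 'Indirect' are not stated here. `nested_mean` and the simulated design are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def nested_mean(Y, D, M, X, d_outcome, d_mediator):
    """Regression estimate of E[Y(d_outcome, M(d_mediator))] (illustration only)."""
    MX = np.column_stack([M, X])
    # Stage 1: outcome regression E[Y | D = d_outcome, M, X].
    stage1 = LinearRegression().fit(MX[D == d_outcome], Y[D == d_outcome])
    pseudo = stage1.predict(MX)
    # Stage 2: regress the stage-1 predictions on X among units with D = d_mediator.
    stage2 = LinearRegression().fit(X[D == d_mediator], pseudo[D == d_mediator])
    # Average the stage-2 predictions over the whole sample.
    return stage2.predict(X).mean()


rng = np.random.default_rng(2)
n = 20000
X = rng.normal(size=(n, 1))
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
M = D + X[:, 0] + rng.normal(size=n)
Y = 2.0 * D + 3.0 * M + X[:, 0] + rng.normal(size=n)

y11 = nested_mean(Y, D, M, X, 1, 1)  # E[Y(1, M(1))] = E[Y(1)]
y10 = nested_mean(Y, D, M, X, 1, 0)  # E[Y(1, M(0))]
y00 = nested_mean(Y, D, M, X, 0, 0)  # E[Y(0, M(0))] = E[Y(0)]
direct, indirect = y10 - y00, y11 - y10
print(direct, indirect)  # true values in this design: 2 and 3
```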
- class dml_mediated.DML_mediated(Y, D, M, W, Z, X1=None, V=None, v_values=None, loc_kernel='gau', bw_loc='silverman', estimator='MR', estimand='ATE', model1=<nnpiv.rkhs.rkhsiv.ApproxRKHSIVCV object>, nn_1=False, model2=<nnpiv.rkhs.rkhsiv.ApproxRKHSIVCV object>, nn_2=False, modelq1=<nnpiv.rkhs.rkhsiv.ApproxRKHSIVCV object>, nn_q1=False, modelq2=<nnpiv.rkhs.rkhsiv.ApproxRKHSIVCV object>, nn_q2=False, alpha=0.05, n_folds=5, n_rep=1, random_seed=123, prop_score=sklearn.linear_model.LogisticRegression, CHIM=False, verbose=True, fitargs1=None, fitargs2=None, fitargsq1=None, fitargsq2=None, opts=None)
Bases: object
Debiased Machine Learning for mediation analysis (DML-mediation) class.
- Parameters
Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
M (array-like) – Mediator variable.
W (array-like) – Negative control outcome.
Z (array-like) – Instrumental variable.
X1 (array-like, optional) – Additional covariates.
V (array-like, optional) – Localization covariates.
v_values (array-like, optional) – Values for localization.
loc_kernel (str, optional) – Kernel for localization. Options are [‘gau’, ‘epa’, ‘uni’].
bw_loc (str, optional) – Bandwidth for localization.
estimator (str, optional) – Estimator type (‘MR’, ‘OR’, ‘hybrid’, ‘IPW’).
estimand (str, optional) – Type of estimand (‘ATE’, ‘Indirect’, ‘Direct’, ‘E[Y1]’, ‘E[Y0]’, ‘E[Y(1,M(0))]’).
model1 (estimator, optional) – Model for the first stage.
nn_1 (bool, optional) – Use neural network for the first stage.
model2 (estimator, optional) – Model for the second stage.
nn_2 (bool, optional) – Use neural network for the second stage.
modelq1 (estimator, optional) – Model for the q1 stage.
nn_q1 (bool, optional) – Use neural network for the q1 stage.
modelq2 (estimator, optional) – Model for the q2 stage.
nn_q2 (bool, optional) – Use neural network for the q2 stage.
alpha (float, optional) – Significance level for confidence intervals.
n_folds (int, optional) – Number of folds for estimation.
n_rep (int, optional) – Number of repetitions for estimation.
random_seed (int, optional) – Seed for random number generator.
prop_score (estimator, optional) – Model for propensity score.
CHIM (bool, optional) – If True, apply the CHIM trimming rule (Crump, Hotz, Imbens and Mitnik, 2009), dropping observations with extreme propensity score values.
verbose (bool, optional) – Print progress information.
fitargs1 (dict, optional) – Arguments for fitting the first stage model.
fitargs2 (dict, optional) – Arguments for fitting the second stage model.
fitargsq1 (dict, optional) – Arguments for fitting the q1 stage model.
fitargsq2 (dict, optional) – Arguments for fitting the q2 stage model.
opts (dict, optional) – Additional options.
- _calculate_confidence_interval(theta, theta_var)
Calculate the confidence interval for the given estimates.
- Parameters
theta (array-like) – Estimated values.
theta_var (array-like) – Variance of the estimates.
- Returns
Lower and upper bounds of the confidence intervals.
- Return type
array-like
- _localization(V, v_val, bw)
Perform localization using kernel density estimation.
- Parameters
V (array-like) – Localization covariates.
v_val (array-like) – Values for localization.
bw (float) – Bandwidth for localization.
- Returns
Weights for localization.
- Return type
array-like
- _nnpivfit_action_m(ps_hat_0, ps_hat_00, D, M, W, X, Z, alfa=0.0)
Fit the mediated action model using nonparametric instrumental variables.
- Parameters
ps_hat_0 (array-like) – Estimated propensity scores for control group.
ps_hat_00 (array-like) – Estimated propensity scores for mediated control group.
D (array-like) – Treatment variable.
M (array-like) – Mediator variable.
W (array-like) – Negative control outcome.
X (array-like) – Covariates.
Z (array-like) – Instrumental variable.
alfa (float, optional) – Threshold alpha for propensity scores.
- Returns
Fitted models for mediated action.
- Return type
- _nnpivfit_outcome_m(Y, D, M, W, X, Z)
Fit the mediated outcome model using nonparametric instrumental variables.
- Parameters
Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
M (array-like) – Mediator variable.
W (array-like) – Negative control outcome.
X (array-like) – Covariates.
Z (array-like) – Instrumental variable.
- Returns
Fitted models for treatment and control groups.
- Return type
- _npivfit_action(ps_hat_1, W, X, Z, alfa=0.0)
Fit the action model using nonparametric instrumental variables.
- _npivfit_outcome(Y, D, X, Z)
Fit the outcome model using nonparametric instrumental variables.
- Parameters
Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
X (array-like) – Covariates.
Z (array-like) – Instrumental variable.
- Returns
Fitted model.
- Return type
- _propensity_score(M, X, W, D)
Estimate the propensity score.
- Parameters
M (array-like) – Mediator variable.
X (array-like) – Covariates.
W (array-like) – Negative control outcome.
D (array-like) – Treatment variable.
- Returns
Estimated propensity scores and threshold alpha.
- Return type
- _scores_Y1(train_Y, train_D, train_M, train_W, train_X, train_Z, test_Y, test_D, test_X, test_Z)
Calculate the scores for the Y1 estimand.
- Parameters
train_Y (array-like) – Training outcome variable.
train_D (array-like) – Training treatment variable.
train_M (array-like) – Training mediator variable.
train_W (array-like) – Training negative control outcome.
train_X (array-like) – Training covariates.
train_Z (array-like) – Training instrumental variable.
test_Y (array-like) – Testing outcome variable.
test_D (array-like) – Testing treatment variable.
test_X (array-like) – Testing covariates.
test_Z (array-like) – Testing instrumental variable.
- Returns
Estimated moment functions for the test data.
- Return type
array-like
- _scores_mediated(train_Y, train_D, train_M, train_W, train_X, train_Z, test_Y, test_D, test_M, test_W, test_X, test_Z)
Calculate the scores for the mediated effects.
- Parameters
train_Y (array-like) – Training outcome variable.
train_D (array-like) – Training treatment variable.
train_M (array-like) – Training mediator variable.
train_W (array-like) – Training negative control outcome.
train_X (array-like) – Training covariates.
train_Z (array-like) – Training instrumental variable.
test_Y (array-like) – Testing outcome variable.
test_D (array-like) – Testing treatment variable.
test_M (array-like) – Testing mediator variable.
test_W (array-like) – Testing negative control outcome.
test_X (array-like) – Testing covariates.
test_Z (array-like) – Testing instrumental variable.
- Returns
Estimated moment functions for the test data.
- Return type
array-like
- dml_mediated._fun_threshold_alpha(alpha, g)
Auxiliary function for computing the optimal trimming threshold alpha that improves overlap, following CHIM (Crump et al., "Dealing with limited overlap in estimation of average treatment effects", Biometrika, 2009).
Joint mediation
This module performs Debiased Machine Learning for mediation analysis, using joint estimation for longitudinal nonparametric parameters (in the Nested NPIV framework). It provides tools for estimating causal effects with mediation using a combination of machine learning models and instrumental variables techniques.
- Classes:
DML_joint_mediated: Main class for performing DML for mediation analysis with joint model fitting.
- DML_joint_mediated Methods:
__init__: Initialize the DML_joint_mediated instance with data and model configurations.
_calculate_confidence_interval: Calculate confidence intervals for the estimates.
_localization: Perform localization using kernel density estimation.
_npivfit_outcome: Fit the outcome model using nonparametric instrumental variables.
_propensity_score: Estimate the propensity score.
_npivfit_action: Fit the action model using nonparametric instrumental variables.
_scores_mediated: Calculate the scores for the mediated effects.
_scores_Y1: Calculate the scores for the Y1 estimand.
_process_fold: Process a single fold for cross-validation.
_split_and_estimate: Split the data and estimate the model for each fold.
dml: Perform Debiased Machine Learning for Nonparametric Instrumental Variables.
- class dml_joint_mediated.DML_joint_mediated(Y, D, M, W, Z, X1=None, V=None, v_values=None, loc_kernel='gau', bw_loc='silverman', estimator='MR', estimand='ATE', model1=<nnpiv.rkhs.rkhs2iv.RKHS2IVCV object>, nn_1=False, modelq1=<nnpiv.rkhs.rkhs2iv.RKHS2IVCV object>, nn_q1=False, alpha=0.05, n_folds=5, n_rep=1, random_seed=123, prop_score=sklearn.linear_model.LogisticRegression, CHIM=False, verbose=True, fitargs1=None, fitargsq1=None, opts=None)
Bases: object
Debiased Machine Learning for mediation analysis (DML-mediation) class with joint model fitting.
- Parameters
Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
M (array-like) – Mediator variable.
W (array-like) – Negative control outcome.
Z (array-like) – Instrumental variable.
X1 (array-like, optional) – Additional covariates.
V (array-like, optional) – Localization covariates.
v_values (array-like, optional) – Values for localization.
loc_kernel (str, optional) – Kernel for localization. Options are [‘gau’, ‘epa’, ‘uni’].
bw_loc (str, optional) – Bandwidth for localization.
estimator (str, optional) – Estimator type (‘MR’, ‘OR’, ‘hybrid’, ‘IPW’).
estimand (str, optional) – Type of estimand (‘ATE’, ‘Indirect’, ‘Direct’, ‘E[Y1]’, ‘E[Y0]’, ‘E[Y(1,M(0))]’).
model1 (estimator, optional) – Model for the first stage.
nn_1 (bool, optional) – Use neural network for the first stage.
modelq1 (estimator, optional) – Model for the q1 stage.
nn_q1 (bool, optional) – Use neural network for the q1 stage.
alpha (float, optional) – Significance level for confidence intervals.
n_folds (int, optional) – Number of folds for estimation.
n_rep (int, optional) – Number of repetitions for estimation.
random_seed (int, optional) – Seed for random number generator.
prop_score (estimator, optional) – Model for propensity score.
CHIM (bool, optional) – If True, apply the CHIM trimming rule (Crump, Hotz, Imbens and Mitnik, 2009), dropping observations with extreme propensity score values.
verbose (bool, optional) – Print progress information.
fitargs1 (dict, optional) – Arguments for fitting the first stage model.
fitargsq1 (dict, optional) – Arguments for fitting the q1 stage model.
opts (dict, optional) – Additional options.
- _calculate_confidence_interval(theta, theta_var)
Calculate the confidence interval for the given estimates.
- Parameters
theta (array-like) – Estimated values.
theta_var (array-like) – Variance of the estimates.
- Returns
Lower and upper bounds of the confidence intervals.
- Return type
array-like
- _localization(V, v_val, bw)
Perform localization using kernel density estimation.
- Parameters
V (array-like) – Localization covariates.
v_val (array-like) – Values for localization.
bw (float) – Bandwidth for localization.
- Returns
Weights for localization.
- Return type
array-like
- _npivfit_action(ps_hat_1, W, X, Z, alfa=0.0)
Fit the action model using nonparametric instrumental variables.
- _npivfit_outcome(Y, D, X, Z)
Fit the outcome model using nonparametric instrumental variables.
- Parameters
Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
X (array-like) – Covariates.
Z (array-like) – Instrumental variable.
- Returns
Fitted model.
- Return type
- _propensity_score(M, X, W, D)
Estimate the propensity score.
- Parameters
M (array-like) – Mediator variable.
X (array-like) – Covariates.
W (array-like) – Negative control outcome.
D (array-like) – Treatment variable.
- Returns
Estimated propensity scores and threshold alpha.
- Return type
- _scores_Y1(train_Y, train_D, train_M, train_W, train_X, train_Z, test_Y, test_D, test_X, test_Z)
Calculate the scores for the Y1 estimand.
- Parameters
train_Y (array-like) – Training outcome variable.
train_D (array-like) – Training treatment variable.
train_M (array-like) – Training mediator variable.
train_W (array-like) – Training negative control outcome.
train_X (array-like) – Training covariates.
train_Z (array-like) – Training instrumental variable.
test_Y (array-like) – Testing outcome variable.
test_D (array-like) – Testing treatment variable.
test_X (array-like) – Testing covariates.
test_Z (array-like) – Testing instrumental variable.
- Returns
Estimated moment functions for the test data.
- Return type
array-like
- _scores_mediated(train_Y, train_D, train_M, train_W, train_X, train_Z, test_Y, test_D, test_M, test_W, test_X, test_Z)
Calculate the scores for the mediated effects.
- Parameters
train_Y (array-like) – Training outcome variable.
train_D (array-like) – Training treatment variable.
train_M (array-like) – Training mediator variable.
train_W (array-like) – Training negative control outcome.
train_X (array-like) – Training covariates.
train_Z (array-like) – Training instrumental variable.
test_Y (array-like) – Testing outcome variable.
test_D (array-like) – Testing treatment variable.
test_M (array-like) – Testing mediator variable.
test_W (array-like) – Testing negative control outcome.
test_X (array-like) – Testing covariates.
test_Z (array-like) – Testing instrumental variable.
- Returns
Estimated moment functions for the test data.
- Return type
array-like
- dml_joint_mediated._fun_threshold_alpha(alpha, g)
Auxiliary function for computing the optimal trimming threshold alpha that improves overlap, following CHIM (Crump et al., "Dealing with limited overlap in estimation of average treatment effects", Biometrika, 2009).
Longterm Analysis
Sequential longterm
This module implements the Debiased Machine Learning for long-term causal analysis (DML-longterm) class. The estimand can be defined either under a surrogacy assumption (Athey et al., 2020b. [Estimating treatment effects using multiple surrogates: the role of the surrogate score and the surrogate index](https://arxiv.org/abs/1603.09326)) or under a latent unconfounded model (Athey et al., 2020a. [Combining experimental and observational data to estimate treatment effects on long-term outcomes](https://arxiv.org/abs/2006.09676)). Semiparametric efficiency results are derived in Chen and Ritzwoller (2023. [Semiparametric estimation of long-term treatment effects](https://doi.org/10.1016/j.jeconom.2023.105545)).
- class dml_longterm.DML_longterm(Y, D, S, G, X1=None, V=None, v_values=None, loc_kernel='gau', bw_loc='silverman', estimator='MR', longterm_model='surrogacy', model1=<nnpiv.rkhs.rkhsiv.ApproxRKHSIVCV object>, nn_1=False, model2=<nnpiv.rkhs.rkhsiv.ApproxRKHSIVCV object>, nn_2=False, alpha=0.05, n_folds=5, n_rep=1, random_seed=123, prop_score=sklearn.linear_model.LogisticRegression, CHIM=False, verbose=True, fitargs1=None, fitargs2=None, opts=None)
Bases: object
Debiased Machine Learning for long-term causal analysis (DML-longterm) class.
The estimand can be defined either under a surrogacy assumption (Athey, S., Chetty, R., Imbens, G., Kang, H., 2020b. Estimating treatment effects using multiple surrogates: the role of the surrogate score and the surrogate index. arXiv preprint arXiv:1603.09326) or under a latent unconfounded model (Athey, S., Chetty, R., Imbens, G., 2020a. Combining experimental and observational data to estimate treatment effects on long-term outcomes. arXiv preprint arXiv:2006.09676). Semiparametric efficiency results are derived in Chen, J., Ritzwoller, D. M., 2023. Semiparametric estimation of long-term treatment effects. Journal of Econometrics, Volume 237, Issue 2, Part A.
- Parameters
Y (array-like) – Long-term outcome variable.
D (array-like) – Treatment variable.
S (array-like) – Surrogate outcome variable.
G (array-like) – Group indicator (0 for experimental, 1 for observational).
X1 (array-like, optional) – Additional covariates.
V (array-like, optional) – Localization covariates.
v_values (array-like, optional) – Values for localization.
loc_kernel (str, optional) – Kernel for localization. Options are [‘gau’, ‘epa’, ‘uni’].
bw_loc (str, optional) – Bandwidth for localization.
estimator (str, optional) – Estimator type (‘MR’, ‘OR’, ‘hybrid’, ‘IPW’).
longterm_model (str, optional) – Long-term model type (‘latent_unconfounded’, ‘surrogacy’).
model1 (estimator, optional) – Model for the first stage.
nn_1 (bool, optional) – Use neural network for the first stage.
model2 (estimator, optional) – Model for the second stage.
nn_2 (bool, optional) – Use neural network for the second stage.
alpha (float, optional) – Significance level for confidence intervals.
n_folds (int, optional) – Number of folds for estimation.
n_rep (int, optional) – Number of repetitions for estimation.
random_seed (int, optional) – Seed for random number generator.
prop_score (estimator, optional) – Model for propensity score.
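CHIM (bool, optional) – If True, apply the CHIM trimming rule (Crump, Hotz, Imbens and Mitnik, 2009), dropping observations with extreme propensity score values.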
verbose (bool, optional) – Print progress information.
fitargs1 (dict, optional) – Arguments for fitting the first stage model.
fitargs2 (dict, optional) – Arguments for fitting the second stage model.
opts (dict, optional) – Additional options.
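Under the surrogacy model, the long-term effect can be recovered with a surrogate index. First fit E[Y | S, X] in the observational sample, where Y is observed. Then compare the predicted index across treatment arms in the experimental sample.

The sketch below makes several assumptions:
- the group coding documented above holds (G = 0 experimental, G = 1 observational);
- linear models stand in for the package's nuisance estimators;
- the experimental contrast is adjusted for X by regression.

It is a plain plug-in estimate without any debiasing correction. `surrogate_index_effect` and the simulated design are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def surrogate_index_effect(Y, D, S, X, G):
    """Plug-in surrogate-index estimate of a long-term effect (illustration only)."""
    SX = np.column_stack([S, X])
    is_obs, is_exp = G == 1, G == 0
    # Surrogate index: E[Y | S, X] learned where the long-term outcome is observed.
    index = LinearRegression().fit(SX[is_obs], Y[is_obs]).predict(SX[is_exp])
    # Experimental contrast of the index, adjusting for X.
    DX = np.column_stack([D[is_exp], X[is_exp]])
    return LinearRegression().fit(DX, index).coef_[0]


rng = np.random.default_rng(3)
n = 16000
G = rng.binomial(1, 0.5, size=n)
X = rng.normal(size=n)
# Randomized treatment in the experiment (G = 0), confounded by X in observational data.
p = np.where(G == 0, 0.5, 1 / (1 + np.exp(-X)))
D = rng.binomial(1, p)
S = D + X + rng.normal(size=n)
Y = np.where(G == 1, 2.0 * S + X + rng.normal(size=n), np.nan)  # Y missing in experiment
effect = surrogate_index_effect(Y, D, S, X, G)
print(effect)  # true long-term effect in this design: 2
```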
- _calculate_confidence_interval(theta, theta_var)
Calculate the confidence interval for the given estimates.
- Parameters
theta (array-like) – Estimated values.
theta_var (array-like) – Variance of the estimates.
- Returns
Lower and upper bounds of the confidence intervals.
- Return type
array-like
- _localization(V, v_val, bw)
Perform localization using kernel density estimation.
- Parameters
V (array-like) – Localization covariates.
v_val (array-like) – Values for localization.
bw (float) – Bandwidth for localization.
- Returns
Weights for localization.
- Return type
array-like
- _nnpivfit_outcome_latent(Y, D, S, X, G)
Fit the outcome model using the latent unconfounded framework.
This method is based on the model proposed in Athey, S., Chetty, R., Imbens, G., 2020a. Combining experimental and observational data to estimate treatment effects on long-term outcomes. arXiv preprint arXiv:2006.09676.
- Parameters
Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
S (array-like) – Surrogate variable.
X (array-like) – Covariates.
G (array-like) – Group indicator.
- Returns
Fitted models for treatment and control groups.
- Return type
- _nnpivfit_outcome_surrogacy(Y, D, S, X, G)
Fit the outcome model using the surrogacy framework.
This method is based on the model proposed in Athey, S., Chetty, R., Imbens, G., Kang, H., 2020b. Estimating treatment effects using multiple surrogates: the role of the surrogate score and the surrogate index. arXiv preprint arXiv:1603.09326.
- Parameters
Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
S (array-like) – Surrogate variable.
X (array-like) – Covariates.
G (array-like) – Group indicator.
- Returns
Fitted models for the outcome.
- Return type
- _process_fold(fold_idx, train_data, test_data)
Process each fold in the K-fold cross-validation.
- _propensity_score_latent(S_train, X_train, D_train, G_train, S_test, X_test)
Estimate the propensity scores using the latent unconfounded framework.
This method is based on the model proposed in Athey, S., Chetty, R., Imbens, G., 2020a. Combining experimental and observational data to estimate treatment effects on long-term outcomes. arXiv preprint arXiv:2006.09676.
- Parameters
S_train (array-like) – Training surrogate variable.
X_train (array-like) – Training covariates.
D_train (array-like) – Training treatment variable.
G_train (array-like) – Training group indicator.
S_test (array-like) – Testing surrogate variable.
X_test (array-like) – Testing covariates.
- Returns
Estimated propensity scores and threshold alpha.
- Return type
- _propensity_score_surrogacy(S_train, X_train, D_train, G_train, S_test, X_test)
Estimate the propensity scores using the surrogacy framework.
This method is based on the model proposed in Athey, S., Chetty, R., Imbens, G., Kang, H., 2020b. Estimating treatment effects using multiple surrogates: the role of the surrogate score and the surrogate index. arXiv preprint arXiv:1603.09326.
- Parameters
S_train (array-like) – Training surrogate variable.
X_train (array-like) – Training covariates.
D_train (array-like) – Training treatment variable.
G_train (array-like) – Training group indicator.
S_test (array-like) – Testing surrogate variable.
X_test (array-like) – Testing covariates.
- Returns
Estimated propensity scores and threshold alpha.
- Return type
- dml_longterm._fun_threshold_alpha(alpha, g)
Auxiliary function for computing the optimal trimming threshold alpha that improves overlap, following CHIM (Crump et al., "Dealing with limited overlap in estimation of average treatment effects", Biometrika, 2009).
Joint longterm
Debiased Machine Learning for long-term causal analysis with a joint estimator (DML-joint-longterm) class. The estimand can be defined either under a surrogacy assumption (Athey et al., 2020b. [Estimating treatment effects using multiple surrogates: the role of the surrogate score and the surrogate index](https://arxiv.org/abs/1603.09326)) or under a latent unconfounded model (Athey et al., 2020a. [Combining experimental and observational data to estimate treatment effects on long-term outcomes](https://arxiv.org/abs/2006.09676)). Semiparametric efficiency results are derived in Chen and Ritzwoller (2023. [Semiparametric estimation of long-term treatment effects](https://doi.org/10.1016/j.jeconom.2023.105545)).
- class dml_joint_longterm.DML_joint_longterm(Y, D, S, G, X1=None, V=None, v_values=None, loc_kernel='gau', bw_loc='silverman', estimator='MR', longterm_model='surrogacy', model1=<nnpiv.rkhs.rkhs2iv.RKHS2IVCV object>, nn_1=False, model2=<nnpiv.rkhs.rkhs2iv.RKHS2IVCV object>, nn_2=False, alpha=0.05, n_folds=5, n_rep=1, random_seed=123, prop_score=sklearn.linear_model.LogisticRegression, CHIM=False, verbose=True, fitargs1=None, fitargs2=None, opts=None)
Bases: object
Debiased Machine Learning for long-term causal analysis (DML-longterm) class with joint model fitting.
- Parameters
Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
S (array-like) – Surrogate variable.
G (array-like) – Group variable.
X1 (array-like, optional) – Additional covariates.
V (array-like, optional) – Localization covariates.
v_values (array-like, optional) – Values for localization.
loc_kernel (str, optional) – Kernel for localization. Options are [‘gau’, ‘epa’, ‘uni’].
bw_loc (str, optional) – Bandwidth for localization.
estimator (str, optional) – Estimator type (‘MR’, ‘OR’, ‘hybrid’, ‘IPW’).
longterm_model (str, optional) – Model type for long-term analysis (‘surrogacy’, ‘latent_unconfounded’).
model1 (estimator, optional) – Model for the first stage.
nn_1 (bool, optional) – Use neural network for the first stage.
model2 (estimator, optional) – Model for the second stage.
nn_2 (bool, optional) – Use neural network for the second stage.
alpha (float, optional) – Significance level for confidence intervals.
n_folds (int, optional) – Number of folds for estimation.
n_rep (int, optional) – Number of repetitions for estimation.
random_seed (int, optional) – Seed for random number generator.
prop_score (estimator, optional) – Model for propensity score.
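CHIM (bool, optional) – If True, apply the CHIM trimming rule (Crump, Hotz, Imbens and Mitnik, 2009), dropping observations with extreme propensity score values to deal with limited overlap.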
verbose (bool, optional) – Print progress information.
fitargs1 (dict, optional) – Arguments for fitting the first stage model.
fitargs2 (dict, optional) – Arguments for fitting the second stage model.
opts (dict, optional) – Additional options.
- _calculate_confidence_interval(theta, theta_var)
Calculate the confidence interval for the given estimates.
- Parameters
theta (array-like) – Estimated values.
theta_var (array-like) – Variance of the estimates.
- Returns
Lower and upper bounds of the confidence intervals.
- Return type
array-like
- _localization(V, v_val, bw)
Perform localization using kernel density estimation.
- Parameters
V (array-like) – Localization covariates.
v_val (array-like) – Values for localization.
bw (float) – Bandwidth for localization.
- Returns
Weights for localization.
- Return type
array-like
- _nnpivfit_outcome_latent(train_Y, train_D, train_S, train_X, train_G, test_X, test_S)
Fit the outcome model using nonparametric instrumental variables for the latent unconfounded model.
- Parameters
train_Y (array-like) – Training outcome variable.
train_D (array-like) – Training treatment variable.
train_S (array-like) – Training surrogate variable.
train_X (array-like) – Training covariates.
train_G (array-like) – Training group variable.
test_X (array-like) – Testing covariates.
test_S (array-like) – Testing surrogate variable.
- Returns
Estimated values for delta_d1_hat, delta_d0_hat, nu_1_hat, nu_0_hat.
- Return type
- _nnpivfit_outcome_surrogacy(train_Y, train_D, train_S, train_X, train_G, test_X, test_S)
Fit the outcome model using nonparametric instrumental variables for the surrogacy model.
- Parameters
train_Y (array-like) – Training outcome variable.
train_D (array-like) – Training treatment variable.
train_S (array-like) – Training surrogate variable.
train_X (array-like) – Training covariates.
train_G (array-like) – Training group variable.
test_X (array-like) – Testing covariates.
test_S (array-like) – Testing surrogate variable.
- Returns
Estimated values for delta_d1_hat, delta_d0_hat, nu_1_hat, nu_0_hat.
- Return type
- _propensity_score_latent(S_train, X_train, D_train, G_train, S_test, X_test)
Estimate the propensity score for the latent unconfounded model.
- Parameters
S_train (array-like) – Training surrogate variable.
X_train (array-like) – Training covariates.
D_train (array-like) – Training treatment variable.
G_train (array-like) – Training group variable.
S_test (array-like) – Testing surrogate variable.
X_test (array-like) – Testing covariates.
- Returns
Estimated propensity scores and threshold alpha.
- Return type
- _propensity_score_surrogacy(S_train, X_train, D_train, G_train, S_test, X_test)
Estimate the propensity score for the surrogacy model.
- Parameters
S_train (array-like) – Training surrogate variable.
X_train (array-like) – Training covariates.
D_train (array-like) – Training treatment variable.
G_train (array-like) – Training group variable.
S_test (array-like) – Testing surrogate variable.
X_test (array-like) – Testing covariates.
- Returns
Estimated propensity scores and threshold alpha.
- Return type
- dml_joint_longterm._fun_threshold_alpha(alpha, g)
Auxiliary function for computing the optimal trimming threshold alpha that improves overlap, following CHIM (Crump et al., "Dealing with limited overlap in estimation of average treatment effects", Biometrika, 2009).