Regularized Kernel Hilbert Space
In this section we assume that the function classes whenever \(\mathcal{G}\), \(\mathcal{H}\), \(\mathcal{F}\), \(\mathcal{F}^\prime\) are RKHS. Let \(\Phi_A:\mathcal{G}\rightarrow\mathbb{R}^n\) be an operator with \(i\) th row \(\langle \phi(A_i), \cdot \rangle_{\mathcal{G}}\) with corresponding kernel matrix \(K_A\). Define analogously \(\Phi_B, \ldots\) for the rest of the function classes.
Closed form - Estimator 1
We study the estimator
Formula of minimizers
The minimizer takes the form \(\hat{g} = \Phi_A^* \hat{\alpha}\) where,
|
RKHS IV estimator. |
|
RKHS IV estimator with cross-validation. |
Remark (Nystrom approximation) A low-rank approximation using Nystrom method is also implemented.
|
Approximate RKHS IV estimator using kernel approximations. |
|
Approximate RKHS IV estimator with cross-validation using kernel approximations. |
Closed form - Estimator 2
We study the estimator
Formula of minimizers
The minimizer takes the form \(\hat{g} = \Phi_A^* \hat{\alpha}\) where,
|
RKHS IV estimator with L2 regularization. |
|
RKHS IV estimator with L2 regularization and cross-validation. |
Closed form - Estimator 3
We study the ridge regularized joint estimator:
Let \(V_{g,h}' = g(A) - Y\) and \(V_{g,h} = h(B) - g(A)\). Let \(\Phi_C : \mathcal{F} \rightarrow \mathbb{R}^n\) be an operator with \(i\) th row \(\langle \phi(C_i), \cdot \rangle_{\mathcal{F}}\). Define \(\Phi_{C'}\) analogously, replacing \(C_i\) with \(C_i'\). Let \(K_C\) and \(K_{C'}\) be the corresponding kernel matrices.
In remarks below, we also study the following modification, which we call the “subsetted” estimator:
where \([p]\) and \([q]\) partition \([n] = (1, \ldots, n)\), so \(p + q = n\).
For the index set \([p]\), let \(I_{[p]} \in \mathbb{R}^{p \times n}\) be the matrix of ones and zeros such that \(V_{[p]} = I_{[p]} V\) gives the elements of \(V\) whose indices are in \([p]\).
Maximizers
Existence of maximizers
There exist coefficients \(\hat{\gamma}_{g,h}, \hat{\gamma}'_{g,h} \in \mathbb{R}^n\) such that maximizers take the form \(\hat{f}_{g,h} = \Phi_C^* \hat{\gamma}_{g,h}\) and \(\hat{f}'_{g,h} = \Phi_{C'}^* \hat{\gamma}'_{g,h}\).
Remark (Subsetted estimator)
For the subsetted estimator, the same results hold but with \(\hat{\gamma}_{g,h;[q]} \in \mathbb{R}^q\) and \(\hat{\gamma}'_{g,h;[p]} \in \mathbb{R}^p\), acting on appropriately modified feature operators \(\Phi^*_{C;[q]}\) and \(\Phi^*_{C';[p]}\).
Proof
Write the objectives for the maximizers as
We prove the former result; the latter is similar. By the Riesz representation theorem,
For an RKHS, evaluation is a continuous functional represented as the inner product with the feature map. Due to the ridge penalty, the stated objective has a maximizer \(\hat{f}_{g,h}\) that obtains the maximum.
To lighten notation, we suppress the indexing of \(\hat{f}_{g,h}\) by \((g,h)\) for the rest of this argument. Write \(\hat{f} = \hat{f}_n + \hat{f}^{\perp}_n\) where \(\hat{f}_n \in \text{row}(\Phi_C)\) and \(\hat{f}_n^{\perp} \in \text{null}(\Phi_C)\). Substituting this decomposition of \(\hat{f}\) into the objective, we see that
Hence if \(\hat{f}\) is a maximizer, then there exists \(\hat{f}_n\) that is also a maximizer.
Formula of maximizers
The explicit formula for the coefficients is \(\hat{\gamma}_{g,h} = K_C^{\dagger} \vec{V}_{g,h}\) and \(\hat{\gamma}'_{g,h} = K_{C'}^{\dagger} \vec{V}'_{g,h}\).
Remark (Subsetted estimator)
For the subsetted estimator, the same results hold but with \(\hat{\gamma}_{g,h;[q]} = K_{C;[q,q]}^{\dagger} \vec{V}_{g,h;[q]}\) and \(\hat{\gamma}'_{g,h;[p]} = K_{C';[p,p]}^{\dagger} \vec{V}'_{g,h;[p]}\).
Proof
We prove the former result; the latter is similar. Write the objective as
where \(\hat{\mu}_{g,h} = \mathbb{E}_n \{ V_{g,h} \phi(C) \} = \frac{1}{n} \Phi_C^* \vec{V}_{g,h}\) and \(\hat{T}_C = \mathbb{E}_n \{ \phi(C) \otimes \phi(C)^* \} = \frac{1}{n} \Phi_C^* \Phi_C\). Hence by the existence of maximizers,
Since \(K_C = \Phi_C \Phi_C^*\), the first order condition yields \(K_C \vec{V}_{g,h} = K_C^2 \hat{\gamma}_{g,h}\), i.e. \(\hat{\gamma}_{g,h} = K_C^{\dagger} \vec{V}_{g,h}\) where \(K_C^{\dagger}\) is the pseudoinverse of \(K_C\).
Minimizers
Let \(\Phi_A : \mathcal{H} \rightarrow \mathbb{R}^n\) be an operator with \(i\)langle phi(A_i), cdot rangle_{mathcal{H}}`. Define \(\Phi_B\) analogously, replacing \(A_i\) with \(B_i\). Let \(K_A\) and \(K_B\) be the corresponding kernel matrices.
Existence of minimizers
There exist coefficients \(\alpha, \beta \in \mathbb{R}^n\) such that minimizers take the form \(\hat{g} = \Phi_A^* \hat{\alpha}\) and \(\hat{h} = \Phi_B^* \hat{\beta}\).
Remark (Subsetted estimator)
The result remains true for the subsetted estimator.
Proof
To begin, write the objective \(\mathcal{E}(g,h)\) as
By the existence and formula of maximizers,
Hence \((g,h)\) only appear via \(V'_{g,h} = g(A) - Y\), \(V_{g,h} = h(B) - g(A)\), and directly as \(g(A)\) and \(h(B)\). In all of these expressions, they can be further expressed as \(g(A) = \langle g, \phi(A) \rangle_{\mathcal{G}}\) and \(h(B) = \langle h, \phi(B) \rangle_{\mathcal{H}}\), which is a linear functional. The overall objective is quadratic in such terms, so the stated objective has maximizers \((\hat{g}, \hat{h})\) that obtain the maximum.
By a similar argument to the existence of maximizers, for any \((\hat{g}, \hat{h})\) attaining the maximum, \(\mathcal{E}(\hat{g}, \hat{h}) = \mathcal{E}(\hat{g}_n, \hat{h}_n)\) where \(\hat{g}_n \in \text{row}(\Phi_A)\) and \(\hat{h}_n \in \text{row}(\Phi_B)\).
Properties of pseudo-inverse
For any square symmetric matrix \(K \in \mathbb{R}^{n \times n}\), its eigendecomposition is \(K = U \Sigma U^{\top}\) where \(\Sigma \in \mathbb{R}^{r \times r}\) and \(r \leq n\). Its pseudo-inverse is \(K^- = U \Sigma^{\dagger} U^{\top}\). Moreover, \(K^{\dagger} K = KK^{\dagger} = UU^{\top}\), which is a projection.
To lighten notation, let \(K_C^{\dagger} K_C = P_C\).
Formula of minimizers
The explicit formula for the coefficients is
|
Nested RKHS IV estimator with L2 regularization. |
|
Nested RKHS IV estimator with L2 regularization and cross-validation. |
Proof
We proceed in steps.
Write the objective \(\mathcal{E}(g,h)\) as
where
and the covariance operators are defined analogously to the formula of maximizers. Hence,
Let \(Y, G, H \in \mathbb{R}^n\) be defined with \(G_i = g(A_i)\) and \(H_i = h(B_i)\). In this notation,
Combining with \(G = \Phi_A g = K_A \alpha\) and \(H = \Phi_B h = K_B \beta\) from the existence of minimizers,
The first order conditions yield
Substituting the latter into the former,
and solving for \(\hat{\beta}\),
Remark (Subsetted estimator)
Formula of minimizers (Subsetted estimator)
The explicit formula for the coefficients is
where \(\tilde{P}_{C'} = \frac{n}{p} I_{[p]}^{\top} P_{C';[p,p]} I_{[p]}\) and \(\tilde{P}_C = \frac{n}{q} I_{[q]}^{\top} P_{C;[q,q]} I_{[q]}\). Note that \(P_{C';[p,p]} = (K_{C';[p,p]})^- K_{C';[p,p]}\) and \(K_{C';[p,p]} = I_{[p]} K_{C'} I_{[p]}^{\top}\).
Proof
We proceed in steps.
Write the objective \(\mathcal{E}(g,h)\) as
where
and the covariance operators are defined analogously to the subsetted estimator. Hence,
Let \(Y, G, H \in \mathbb{R}^n\) be defined with \(G_i = g(A_i)\) and \(H_i = h(B_i)\) as before. Now, let \(\tilde{P}_{C'} = \frac{n}{p} I_{[p]}^{\top} P_{C';[p,p]} I_{[p]}\) and \(\tilde{P}_C = \frac{n}{q} I_{[q]}^{\top} P_{C;[q,q]} I_{[q]}\). Then
Hereafter we use the same argument as in the formula of minimizers.
Closed form - Estimator 3 (RKHS norm)
We study the RKHS-norm regularized joint estimator:
Formula of minimizers
The minimizer takes the form \(\hat{g} = \Phi_A^*\hat\alpha\), \(\hat{h} = \Phi_B^*\hat\beta\) where,
and
|
Nested RKHS IV estimator. |
|
Nested RKHS IV estimator with cross-validation. |
Remark: Subsetted estimator
Formula of minimizers (Subsetted estimator)
The subsetted estimator satisfies:
with \(\tilde{P}_{C'}=\frac{n}{p}I_{[p]}^{\top}P_{C';[p,p]}I_{[p]}\) and \(\tilde{P}_{C}=\frac{n}{q}I_{[q]}^{\top}P_{C;[q,q]}I_{[q]}\). And