Reconstruction-error curve for basis-rank selection
Source: R/rank_selection.R
reconstruction_error_curve.Rd
For each candidate number of bases K, run dfspa() followed by
project_to_simplex() and report the average projection residual
$$\mathcal E(K) = \frac{1}{m}\sum_{i=1}^m
\left\|\hat f^{(i)} - \sum_{k=1}^K \hat\pi_{ik}\hat g_k\right\|_{L^2(\mu)}.$$
Plotting the error against K typically shows an elbow: the point beyond which adding further bases reduces the error only marginally.
Usage
reconstruction_error_curve(
  F_hat,
  K_range = NULL,
  grid_weights = NULL,
  dfspa_args = list()
)
Arguments
- F_hat
An m-by-G_grid numeric matrix of study-level function evaluations on the shared grid.
- K_range
Integer vector of candidate K values. Defaults to 2:min(nrow(F_hat) - 1, 10).
- grid_weights
Optional length-G_grid non-negative numeric vector used for the \(L^2(\mu)\) norm; defaults to uniform.
- dfspa_args
Named list of extra arguments passed to dfspa(), e.g. list(denoise = FALSE, N = 2).
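To illustrate what a non-uniform grid_weights vector might look like, the sketch below builds trapezoidal quadrature weights for a grid on [0, 1]. The helper name trapezoid_weights and the choice of trapezoidal weights are assumptions made for illustration; the package does not prescribe them, and whether the weights are rescaled internally is not stated here.

```r
# Hypothetical helper (not part of the package): trapezoidal quadrature
# weights for a sorted grid x, so that sum(w * f(x)) approximates the
# integral of f over [min(x), max(x)].
trapezoid_weights <- function(x) {
  stopifnot(is.numeric(x), length(x) >= 2, !is.unsorted(x))
  h <- diff(x)                 # gaps between neighbouring grid points
  (c(h, 0) + c(0, h)) / 2      # each point receives half of each adjacent gap
}

x <- seq(0, 1, length.out = 40)
w <- trapezoid_weights(x)

# The weights are non-negative, have length G_grid, and sum to the
# length of the interval covered by the grid.
sum(w)

# They could then be supplied as, for example:
# reconstruction_error_curve(F_hat, K_range = 2:6, grid_weights = w)
```

On an equally spaced grid these weights differ from uniform ones only at the two endpoints, so the choice matters mainly when the grid is irregular.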
Value
A data frame with columns K (integer) and error (numeric).
Rows where dfspa() or the projection fails are reported with
error = NA and a single warning summarising the failures.
Details
This is the unsupervised rank-selection criterion of Section 3.2 of the paper (the equation defining \(\mathcal E(K)\)). It does not require study-level covariates.
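The criterion can be sketched in a few lines of base R once the mixture weights and bases are available. The helper avg_l2_residual below is hypothetical and only mirrors the formula for \(\mathcal E(K)\); it does not call dfspa() or project_to_simplex(), and the default uniform weights of 1 / G_grid are an assumption about how "uniform" is normalised.

```r
# Hypothetical helper (not part of the package): average weighted L2
# residual between the rows of F_hat and their reconstructions Pi %*% G_hat.
#   F_hat : m x G_grid matrix of function evaluations
#   Pi    : m x K matrix of mixture weights (rows on the simplex)
#   G_hat : K x G_grid matrix of basis functions evaluated on the grid
#   w     : length-G_grid non-negative weights defining the L2(mu) norm
avg_l2_residual <- function(F_hat, Pi, G_hat,
                            w = rep(1 / ncol(F_hat), ncol(F_hat))) {
  stopifnot(nrow(F_hat) == nrow(Pi), ncol(Pi) == nrow(G_hat),
            ncol(F_hat) == ncol(G_hat), length(w) == ncol(F_hat),
            all(w >= 0))
  resid <- F_hat - Pi %*% G_hat             # m x G_grid residual matrix
  row_norms <- sqrt(drop(resid^2 %*% w))    # weighted L2 norm of each row
  mean(row_norms)                           # average over the m studies
}

# Toy check with two bases on a five-point grid.
x <- seq(0, 1, length.out = 5)
G_hat <- rbind(x, 1 - x)
Pi <- rbind(c(1, 0), c(0.5, 0.5), c(0, 1))
F_exact <- Pi %*% G_hat

err_exact <- avg_l2_residual(F_exact, Pi, G_hat)        # exact fit: zero error
err_shift <- avg_l2_residual(F_exact + 0.1, Pi, G_hat)  # constant offset of 0.1
```

Note that the norm is not squared: each study contributes the square root of its weighted sum of squared residuals, so a constant offset of 0.1 yields an error of about 0.1 under weights that sum to one.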
Examples
set.seed(1)
G <- 40
x <- seq(0, 1, length.out = G)
basis <- rbind(sin(pi * x), cos(pi * x), x)
m <- 50
pi_mat <- matrix(stats::rgamma(m * 3, shape = 0.5), m, 3)
pi_mat <- pi_mat / rowSums(pi_mat)
F_hat <- pi_mat %*% basis + matrix(stats::rnorm(m * G, sd = 0.02), m, G)
elbow <- reconstruction_error_curve(F_hat, K_range = 2:6)
elbow
#>   K      error
#> 1 2 0.10309809
#> 2 3 0.05037484
#> 3 4 0.04141392
#> 4 5 0.03386482
#> 5 6 0.02471533
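One way to read the elbow off the returned data frame is sketched below, using the values printed above. The second-difference rule is a hypothetical heuristic added for illustration, not something the package provides, and it assumes the candidate K values are consecutive integers.

```r
# Error curve copied from the printed output above.
elbow <- data.frame(
  K = 2:6,
  error = c(0.10309809, 0.05037484, 0.04141392, 0.03386482, 0.02471533)
)

# Drop any candidate ranks whose fit failed (error = NA) before plotting.
ok <- elbow[!is.na(elbow$error), ]
plot(ok$K, ok$error, type = "b", xlab = "K", ylab = "Reconstruction error")

# Hypothetical heuristic: pick the interior K where the curve bends most
# sharply, i.e. where the second difference of the error is largest.
second_diff <- diff(ok$error, differences = 2)
K_elbow <- ok$K[which.max(second_diff) + 1]
K_elbow
```

For this curve the heuristic selects K = 3, which agrees with the three basis functions used to simulate F_hat. Visual inspection of the plot remains the more reliable check when the curve is noisy.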