Recovers a set of K latent basis functions from a collection of
study-level function estimates under the low-rank cross-study heterogeneity
assumption of Shi, Imai, and Zhang. Implements Algorithm 1 of the paper
("The d-fSPA Algorithm for basis hunting").
Arguments

- F_hat
  An m-by-G numeric matrix where row i is the evaluation of the estimated
  function \(\hat f^{(i)}\) at G grid points.

- K
  Integer number of basis functions to recover. Must satisfy
  1 <= K <= m after denoising.

- grid_weights
  Optional length-G non-negative numeric vector of grid weights defining
  the \(L^2(\mu)\) inner product. Defaults to uniform weights 1 / G.

- N, Delta
  Optional numeric tuning parameters controlling denoising. See Details.

- denoise
  Logical; if FALSE, the denoising step is skipped and plain fSPA is run.
  Defaults to TRUE.
Value

An object of class "dfspa": a list containing

- bases
  A K-by-G matrix whose rows are the recovered basis functions evaluated
  on the grid (denoised, if applicable).

- selected
  Length-K integer vector of the selected row indices into the
  post-denoising function matrix.

- original_indices
  Length-K integer vector of the selected study indices in the original
  input F_hat (before any rows were dropped by denoising).

- kept
  Integer vector of row indices of F_hat that survived denoising.

- F_denoised
  The post-denoising function matrix (length(kept)-by-G).

- grid_weights
  Grid weights used.

- N, Delta
  Tuning parameters actually used (or NA when denoise = FALSE).

- K
  Number of bases requested.

- call
  The matched call.
Details
Each study-level function is represented by its evaluations on a shared
grid of G points. The (weighted) \(L^2(\mu)\) inner product is
\(\langle f,g\rangle = \sum_{j=1}^G w_j f(x_j) g(x_j)\), where the
grid_weights w_j are proportional to the measure \(\mu\). If not
supplied, uniform weights 1 / G are used.
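This inner product can be sketched in R as follows; `l2_inner` and `l2_norm` are illustrative helpers, not functions exported by the package:

```r
# Illustrative helpers (not package exports) for the weighted L2(mu)
# inner product and norm on a shared grid of G points.
l2_inner <- function(f, g, w = rep(1 / length(f), length(f))) {
  sum(w * f * g)            # <f, g> = sum_j w_j f(x_j) g(x_j)
}
l2_norm <- function(f, w = rep(1 / length(f), length(f))) {
  sqrt(l2_inner(f, f, w))
}
```

With the default uniform weights 1 / G this is the Riemann-sum approximation of the \(L^2([0,1])\) inner product.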
Denoising follows Jin (2024): for each study i, let
\(B_\Delta(\hat f^{(i)}) = \{j : \|\hat f^{(j)} - \hat f^{(i)}\| \le
\Delta\}\). If \(|B_\Delta(\hat f^{(i)})| < N\), study i is discarded;
otherwise \(\hat f^{(i)}\) is replaced by the average of the functions
in \(B_\Delta\).
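The denoising step above can be sketched as follows. This is an illustrative re-implementation, not the package internals; `denoise_step` and `dist_mat` (the matrix of pairwise \(L^2(\mu)\) distances between rows) are assumed names:

```r
# Illustrative sketch of the Jin (2024) denoising step: average each
# study with its Delta-neighbourhood, or drop it if the neighbourhood
# contains fewer than N studies.
denoise_step <- function(F_hat, dist_mat, N, Delta) {
  m <- nrow(F_hat)
  keep <- logical(m)
  F_new <- F_hat
  for (i in seq_len(m)) {
    ball <- which(dist_mat[i, ] <= Delta)    # B_Delta(f_i), includes i itself
    if (length(ball) >= N) {                 # enough neighbours: keep and average
      keep[i] <- TRUE
      F_new[i, ] <- colMeans(F_hat[ball, , drop = FALSE])
    }                                        # otherwise study i is discarded
  }
  list(F_denoised = F_new[keep, , drop = FALSE], kept = which(keep))
}
```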
After denoising, the functional SPA step iteratively selects, at each of
the K iterations, the remaining function with the largest norm after
projecting out the span of previously selected bases.
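The greedy selection can be sketched as below; `fspa_select` is an illustrative re-implementation, not the package's internal code, and `Fmat` plays the role of the post-denoising function matrix:

```r
# Illustrative greedy functional SPA: at each of K iterations pick the
# row with the largest residual norm, then project it out of all rows.
fspa_select <- function(Fmat, K, w = rep(1 / ncol(Fmat), ncol(Fmat))) {
  R <- Fmat                               # residuals after projections
  selected <- integer(0)
  for (k in seq_len(K)) {
    norms <- sqrt(rowSums(sweep(R^2, 2, w, `*`)))  # weighted L2 norms
    j <- which.max(norms)
    selected <- c(selected, j)
    b <- R[j, ] / norms[j]                # normalized new basis direction
    coef <- as.vector(R %*% (w * b))      # <r_i, b> for every row i
    R <- R - outer(coef, b)               # deflate: project out span(b)
  }
  selected
}
```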
Default tuning parameters follow the heuristics of the paper:
N = 0.5 * log(m) and
\(\Delta = \max_{ij} \|\hat f^{(i)} - \hat f^{(j)}\| / 10\).
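Under uniform grid weights these defaults can be computed as sketched below; `default_tuning` is an illustrative helper, not a package export:

```r
# Illustrative computation of the default tuning heuristics for an
# m-by-G matrix F_hat under uniform grid weights 1 / G.
default_tuning <- function(F_hat) {
  m <- nrow(F_hat)
  G <- ncol(F_hat)
  # Euclidean row distances rescaled to weighted L2(mu) norms with w_j = 1/G
  D <- as.matrix(dist(F_hat)) / sqrt(G)
  list(N = 0.5 * log(m), Delta = max(D) / 10)
}
```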
Examples
set.seed(1)
G <- 50
x <- seq(0, 1, length.out = G)
basis <- rbind(sin(pi * x), cos(pi * x), x) # 3 true bases
pi_mat <- rbind(diag(3), # 3 pure studies
c(0.5, 0.3, 0.2),
c(0.2, 0.5, 0.3),
c(0.3, 0.3, 0.4))
F_hat <- pi_mat %*% basis # m = 6, G = 50
fit <- dfspa(F_hat, K = 3, denoise = FALSE)
fit$original_indices # should be a permutation of 1, 2, 3
#> [1] 2 1 3