Choose d-fSPA denoising parameters by cross-validation
Source:R/select_denoising_params.R
select_denoising_params.RdAt a fixed K, performs k-fold cross-validation over a grid of denoising
parameter pairs (N, Delta) for dfspa(). For each candidate pair and
each fold, the full MetaHunt pipeline is fit on the out-of-fold studies
and predicts the held-out studies' functions. The pair with the lowest
average prediction error is selected.
Arguments
- F_hat
An
m-by-G_gridnumeric matrix of study-level function evaluations.- W
An
m-by-pmatrix or data frame of study-level covariates.- K
Integer number of basis functions (fixed during this search).
- N_grid
Optional numeric vector of candidate
Nvalues. Defaults toc(0.2, 0.5, 1.0) * log(nrow(F_hat)).- Delta_grid
Optional numeric vector of candidate
Deltavalues. Defaults toc(0.05, 0.10, 0.20, 0.30)times the maximum pairwise \(L^2(\mu)\) distance among studies.- n_folds
Integer number of folds (default
5).- grid_weights
Optional length-
G_gridnon-negative numeric vector.- dfspa_args
Named list of additional arguments for
dfspa()(e.g.list(); do not includeN,Delta, ordenoisehere, they are set by the search).- weight_model_args
Named list of additional arguments for
fit_weight_model().- seed
Optional integer seed for reproducible fold assignment.
Value
An object of class metahunt_denoising_search: a list with
gridA data frame with one row per
(N, Delta)pair, columnsN,Delta,cv_error,cv_se,n_folds_ok.bestA list with the
(N, Delta)minimisingcv_error.K,n_folds,grid_weightsInputs echoed back for traceability.
Details
This is the cross-validated tuning of the denoising parameters discussed
in Section 3.1 of the paper. Joint tuning over (K, N, Delta) is not
supported because it scales poorly; if you also want to choose K, do
it first via cv_error_curve() and then call this function at the
selected K.
Default candidate grids:
N_grid = m * c(NA, NA, NA)resolved at runtime toc(0.2, 0.5, 1.0) * log(m).Delta_grid = max_pairwise_dist * c(0.05, 0.10, 0.20, 0.30).
Pass your own N_grid / Delta_grid (in original units) to override.
See also
cv_error_curve() for selecting K, dfspa() for the
underlying basis-hunting algorithm.
Examples
# \donttest{
set.seed(1)
G <- 30; m <- 80
x <- seq(0, 1, length.out = G)
basis <- rbind(sin(pi * x), cos(pi * x), x)
W <- data.frame(w1 = rnorm(m), w2 = rnorm(m))
eta <- as.matrix(W) %*% cbind(c(1, -0.5), c(-0.4, 1), c(0, 0))
pi_true <- exp(eta) / rowSums(exp(eta))
F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.05), m, G)
tune <- select_denoising_params(F_hat, W, K = 3, n_folds = 4, seed = 1)
#> Warning: 8 (N, Delta, fold) combinations failed:
#> (N=2.19101, Delta=0.0479349, fold 1): Only 0 studies survive denoising but K = 3; consider increasing `Delta`, decreasing `N`, or reducing `K`.
#> (N=2.19101, Delta=0.0479349, fold 2): Only 0 studies survive denoising but K = 3; consider increasing `Delta`, decreasing `N`, or reducing `K`.
#> (N=2.19101, Delta=0.0479349, fold 3): Only 0 studies survive denoising but K = 3; consider increasing `Delta`, decreasing `N`, or reducing `K`.
#> (N=2.19101, Delta=0.0479349, fold 4): Only 0 studies survive denoising but K = 3; consider increasing `Delta`, decreasing `N`, or reducing `K`.
#> (N=4.38203, Delta=0.0479349, fold 1): Only 0 studies survive denoising but K = 3; consider increasing `Delta`, decreasing `N`, or reducing `K`.
#> (N=4.38203, Delta=0.0479349, fold 2): Only 0 studies survive denoising but K = 3; consider increasing `Delta`, decreasing `N`, or reducing `K`.
#> (N=4.38203, Delta=0.0479349, fold 3): Only 0 studies survive denoising but K = 3; consider increasing `Delta`, decreasing `N`, or reducing `K`.
#> (N=4.38203, Delta=0.0479349, fold 4): Only 0 studies survive denoising but K = 3; consider increasing `Delta`, decreasing `N`, or reducing `K`.
tune$grid
#> N Delta cv_error cv_se n_folds_ok
#> 1 0.8764053 0.04793491 0.07080549 0.002541663 4
#> 2 2.1910133 0.04793491 NaN NA 0
#> 3 4.3820266 0.04793491 NaN NA 0
#> 4 0.8764053 0.09586983 0.07454212 0.003150261 4
#> 5 2.1910133 0.09586983 0.07890784 0.003902666 4
#> 6 4.3820266 0.09586983 0.08239793 0.003161490 4
#> 7 0.8764053 0.19173966 0.08429097 0.003326955 4
#> 8 2.1910133 0.19173966 0.08429097 0.003326955 4
#> 9 4.3820266 0.19173966 0.08525947 0.003824418 4
#> 10 0.8764053 0.28760949 0.09502216 0.006381783 4
#> 11 2.1910133 0.28760949 0.09502216 0.006381783 4
#> 12 4.3820266 0.28760949 0.09502216 0.006381783 4
tune$best
#> $N
#> [1] 0.8764053
#>
#> $Delta
#> [1] 0.04793491
#>
#> $cv_error
#> [1] 0.07080549
#>
# }