Cross-validation for solution path of Logistic FAR.
Source: R/opath_solver.R
Logistic_FAR_CV_opath.Rd
Logistic_FAR_CV_opath finds the solution path of logistic functional additive regression with a log-contrast constraint via Logistic_FAR_OPath, which means it performs within-group orthonormalization to standardize the data before the actual computation. It also uses cross-validation to assess the goodness of the estimates along the solution path.
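To make "within-group orthonormalization" concrete, here is a minimal sketch of the idea. It is not the package's implementation: the helper name orthonormalize_groups, the SVD-based transform and the full-rank check are assumptions made for illustration only. It treats each block of kn basis columns separately and leaves the first h columns untouched.

```r
# Hypothetical helper, NOT part of the package: replace each block of kn basis
# columns by an orthonormal basis of its column space.
orthonormalize_groups <- function(x_mat, h, kn, p, svd_thresh = 1e-6) {
  stopifnot(ncol(x_mat) == h + p * kn)
  out <- x_mat
  for (j in seq_len(p)) {
    # Columns belonging to the j-th functional covariate
    cols <- h + (j - 1) * kn + seq_len(kn)
    sv <- svd(x_mat[, cols, drop = FALSE])
    # Assumption: every block has full column rank; otherwise stop
    if (any(sv$d < svd_thresh)) stop("group ", j, " is rank deficient")
    # x_j = U D V', so U has orthonormal columns spanning the same space
    out[, cols] <- sv$u
  }
  out
}

set.seed(1)
n <- 50; h <- 2; kn <- 4; p <- 3
x_mat <- cbind(1, rnorm(n), matrix(rnorm(n * p * kn), n, p * kn))
x_orth <- orthonormalize_groups(x_mat, h, kn, p)

# Within the first group, the Gram matrix should be the identity
g1 <- x_orth[, h + seq_len(kn)]
gram_err <- max(abs(crossprod(g1) - diag(kn)))
```

After the transform, columns within one group are orthonormal, while columns from different groups are generally still correlated.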
Usage
Logistic_FAR_CV_opath(
  y_vec,
  x_mat,
  h,
  kn,
  p,
  p_type,
  p_param,
  lambda_seq,
  lambda_length,
  min_lambda_ratio = 0.01,
  mu2,
  a = 1,
  bj_vec = rep(1/sqrt(kn), p),
  cj_vec = rep(1, p),
  rj_vec = 1e-05,
  svd_thresh = 10^(-6),
  relax_vec,
  delta_init,
  eta_stack_init,
  mu_1_init,
  tol,
  max_iter,
  nfold = 5,
  fold_seed = NULL,
  post_selection = FALSE,
  post_a = 1,
  verbose = 0
)
Arguments
- y_vec
response vector, 0 for control, 1 for case. n = length(y_vec) is the number of observations.
- x_mat
covariate matrix consisting of two parts, with dim(x_mat) = (n, h + p * kn). The first h columns are demographic covariates (and can include an intercept term). The remaining columns are for the p functional covariates, each represented by a set of basis functions that yields kn covariates.
- h, kn, p
dimension information for the dataset (x_mat).
- p_type
a character variable indicating the type of penalty
- p_param
numerical vector of parameters for the penalty function. p_param[1] stores the lambda value and will be supplied from lambda_seq.
- lambda_seq
a non-negative sequence of lambda values along which the solution path is searched. It is RECOMMENDED not to supply this parameter and to let the function determine it from the given data.
- lambda_length
length of the lambda sequence when computing lambda_seq. If lambda_seq is provided, then lambda_length = length(lambda_seq).
- mu2
quadratic term in the ADMM algorithm
- a, bj_vec, cj_vec, rj_vec
parameters for the algorithm. See Algorithm_Details.pdf for more information.
- svd_thresh
a small value for thresholding the singular values.
- relax_vec
not used
- delta_init, eta_stack_init, mu_1_init
initial values for the algorithm.
- tol, max_iter
convergence tolerance and maximum number of iterations of the algorithm.
- nfold
integer, number of folds
- fold_seed
if supplied, use this seed to generate the partitions for cross-validation. Can be useful for reproducible runs.
- post_selection
logical; whether the function should also compute cross-validation results based on the post-selection estimates.
- post_a
the value of a for the post-selection estimation.
- verbose
integer indicating the level of information printed during computation. Currently supported:
  always: some information if something goes wrong, e.g. when no penalty function is matched
  1: information about the start and stop of the iteration
  2: how the loss value changes during each iteration
- min_lambda_ratio
the ratio min(lambda_seq) / max(lambda_seq). The function uses this parameter to determine the minimal value of lambda_seq. If p > n, it is recommended to set this no smaller than 0.01 (sometimes even 0.05); otherwise it can be set to 0.001 or even smaller.
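As an illustration of the argument descriptions above, the following sketch builds an x_mat with the documented column layout and a lambda grid whose smallest and largest values satisfy the documented ratio. The data are simulated, the column name age, the helper group_cols and the placeholder lambda_max are hypothetical, and the decreasing log-spaced grid is an assumption; the function may construct its own sequence differently.

```r
set.seed(2)
n <- 30; h <- 2; kn <- 5; p <- 4

# First h columns: demographic covariates (here an intercept and one covariate)
demo_part <- cbind(intercept = 1, age = rnorm(n))
# Remaining p * kn columns: kn basis covariates per functional covariate
func_part <- matrix(rnorm(n * p * kn), nrow = n, ncol = p * kn)
x_mat <- cbind(demo_part, func_part)

# Hypothetical helper: column indices of the j-th functional covariate
group_cols <- function(j) h + (j - 1) * kn + seq_len(kn)

# Assumed log-spaced grid from lambda_max down to lambda_max * min_lambda_ratio
lambda_max <- 1            # placeholder, not derived from the data
lambda_length <- 10
min_lambda_ratio <- 0.01
lambda_seq <- exp(seq(log(lambda_max),
                      log(lambda_max * min_lambda_ratio),
                      length.out = lambda_length))
```

With this layout, group_cols(1) selects columns 3 to 7 and group_cols(p) selects the last kn columns of x_mat.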
Value
A list containing the solution path of delta, eta_stack and mu1, together with computation information such as convergence status, iteration number and the lambda sequence of this solution path. Cross-validation information is also returned, such as the fold ID of each observation, the log-likelihood on each test set and the index with the highest average log-likelihood across the test sets. If post_selection = TRUE, the same results based on the post-selection estimates are also returned.
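The cross-validation quantities described here can be sketched in a few lines of base R. This is an assumed illustration, not the function's internals: the variable names are hypothetical, the fold assignment scheme is one common choice, and the log-likelihood matrix holds made-up toy numbers rather than real output.

```r
nfold <- 5; n <- 23; fold_seed <- 123

# Seeded fold assignment: balanced fold labels, randomly permuted
set.seed(fold_seed)
fold_id <- sample(rep(seq_len(nfold), length.out = n))

# Re-using the same seed reproduces the same partition
set.seed(fold_seed)
fold_id_again <- sample(rep(seq_len(nfold), length.out = n))

# Toy nfold x n_lambda matrix of test-set log-likelihoods (made-up values):
# rows are folds, columns are positions in the lambda sequence
loglik_test <- rbind(c(-12, -10, -11),
                     c(-13,  -9, -10),
                     c(-11, -10, -12),
                     c(-12, -11, -11),
                     c(-14, -10, -10))

# Average over folds, then take the lambda index with the highest average
avg_loglik <- colMeans(loglik_test)
best_idx <- which.max(avg_loglik)
```

In this toy matrix the second lambda has the highest average test log-likelihood, so best_idx is 2.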
Note
Although this function returns the index of the lambda giving the highest average log-likelihood on the test sets, it is recommended to use the standalone *_pick functions in this package, such as CV_Pick, to find an optimal lambda, since those functions give more flexibility.
This function conducts cross-validation sequentially. For a possible parallel implementation, see Logistic_FAR_CV_opath_par.