Logistic_FAR_CV_path finds the solution path of logistic functional additive regression with a log-contrast constraint via Logistic_FAR_Path. It then uses cross-validation to assess the goodness of the estimates along the solution path.
Usage
Logistic_FAR_CV_path(
y_vec,
x_mat,
h,
kn,
p,
p_type,
p_param,
lambda_seq,
lambda_length,
min_lambda_ratio = 0.01,
mu2,
a = 1,
bj_vec = rep(1/sqrt(kn), p),
cj_vec = rep(1, p),
rj_vec = 1e-05,
weight_vec = 1,
logit_weight_vec = 1,
weight_already_combine = FALSE,
relax_vec,
delta_init,
eta_stack_init,
mu_1_init,
tol,
max_iter,
nfold = 5,
fold_seed,
post_selection = FALSE,
post_a = 1,
verbose = 0
)
Arguments
- y_vec
response vector, 0 for control, 1 for case. n = length(y_vec) is the number of observations.
- x_mat
covariate matrix consisting of two parts, with dim(x_mat) = (n, h + p * kn). The first h columns are demographic covariates (and can include an intercept term). The remaining columns are for the p functional covariates, each represented by a set of basis functions resulting in kn covariates.
- h, kn, p
dimension information for the dataset (x_mat).
- p_type
a character variable indicating the type of the penalty.
- p_param
numerical vector for the penalty function. p_param[1] stores the lambda value and will be provided by lambda_seq.
- lambda_seq
a non-negative sequence of lambda values, along which the solution path is searched. It is RECOMMENDED not to supply this parameter and to let the function determine it from the given data.
- lambda_length
length of the lambda sequence when computing lambda_seq. If lambda_seq is provided, then lambda_length = length(lambda_seq).
- mu2
quadratic term in the ADMM algorithm.
- a, bj_vec, cj_vec, rj_vec
parameters for the algorithm. See Algorithm_Details.pdf for more information.
- weight_vec
weight vector for each subject. The final weight for each subject is also adjusted by logit_weight_vec, and the final weight vector is normalized so that it sums to n, the sample size.
- logit_weight_vec
weight vector for each subject when computing the integral in the logit values. Each entry should be positive and no more than 1. This is a naive method for adjusting for early stopping during the interval.
- weight_already_combine
boolean, indicating whether weight_vec is already combined with logit_weight_vec for each subject.
- relax_vec
not used.
- delta_init, eta_stack_init, mu_1_init
initial values for the algorithm.
- tol, max_iter
convergence tolerance and maximum number of iterations of the algorithm.
- nfold
integer, number of folds
- fold_seed
if supplied, use this seed to generate the partitions for cross-validation. Can be useful for reproducible runs.
- post_selection
boolean, whether the function should also compute cross-validation results based on the post-selection estimation results.
- post_a
a for the post-selection estimation.
- verbose
integer, indicating the level of information to be printed during computation. Currently supported: always: some info if something went wrong, e.g. when no penalty function is matched; 1: information about the start and stop of the iteration; 2: how the loss value changes during each iteration.
- min_lambda_ratio
min(lambda_seq) / max(lambda_seq). This function uses this parameter to determine the minimal value of lambda_seq. If p > n, then it is recommended to set this no smaller than 0.01 (sometimes even 0.05); otherwise you can set it to 0.001 or even smaller.
- svd_thresh
not used.
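The argument layout described above can be sketched in base R. This is only an illustration under stated assumptions: the dimensions and random data are hypothetical, the weights are assumed to be combined by multiplication, and the lambda grid is assumed to be log-spaced between a placeholder lambda_max and min_lambda_ratio * lambda_max. The function's actual internal rules may differ.

```r
# Hypothetical dimensions, chosen only for illustration.
set.seed(1)
n <- 20; h <- 2; kn <- 4; p <- 3

# First h columns: demographic covariates (here an intercept and one covariate).
demo_mat <- cbind(1, rnorm(n))
# Remaining p * kn columns: kn basis covariates for each of the p functional
# covariates. Random numbers stand in for real basis expansions.
func_mat <- matrix(rnorm(n * p * kn), nrow = n, ncol = p * kn)
x_mat <- cbind(demo_mat, func_mat)
y_vec <- rbinom(n, size = 1, prob = 0.5)

# Helper (hypothetical): columns of x_mat holding the j-th functional covariate.
cols_of <- function(j) h + (j - 1) * kn + seq_len(kn)

# ASSUMPTION: subject weights are combined by multiplication,
# then rescaled so that they sum to n.
weight_vec <- rep(1, n)
logit_weight_vec <- runif(n, min = 0.5, max = 1)
final_weight <- weight_vec * logit_weight_vec
final_weight <- final_weight / sum(final_weight) * n

# ASSUMPTION: a log-spaced grid from lambda_max down to
# min_lambda_ratio * lambda_max. lambda_max is a placeholder value.
lambda_max <- 1
min_lambda_ratio <- 0.01
lambda_length <- 10
lambda_seq <- exp(seq(log(lambda_max), log(lambda_max * min_lambda_ratio),
                      length.out = lambda_length))
```

With these values, x_mat has h + p * kn = 14 columns, and cols_of(2) picks out the block belonging to the second functional covariate.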
Value
A list containing the solution path of delta, eta_stack and mu1, together with some computation information such as convergence status, iteration number and the lambda sequence of this solution path. Cross-validation information is also returned, such as the fold ID for each observation, the log-likelihood results on each test set and the index with the highest average log-likelihood on the test sets. If post_selection = TRUE, the same results based on the post-selection estimation are also returned.
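The cross-validation summary described above (a fold ID per observation, a test-set log-likelihood per fold and lambda, and the index with the highest average) can be sketched in base R as follows. The names fold_id, test_loglik and best_ind, and all the numbers, are hypothetical and are not taken from the function's actual return value.

```r
set.seed(2024)  # plays the role of fold_seed
n <- 23; nfold <- 5; lambda_length <- 4

# Balanced random fold assignment: fold sizes differ by at most one.
fold_id <- sample(rep(seq_len(nfold), length.out = n))

# Hypothetical test-set log-likelihoods: one row per fold, one column per lambda.
test_loglik <- matrix(c(-9, -7, -8, -10,
                        -8, -6, -7,  -9,
                        -9, -8, -7, -11,
                        -7, -6, -8,  -9,
                        -8, -7, -7, -10),
                      nrow = nfold, ncol = lambda_length, byrow = TRUE)

# Average over folds, then pick the lambda index with the highest average.
avg_loglik <- colMeans(test_loglik)
best_ind <- which.max(avg_loglik)
```

Here best_ind indexes into the lambda sequence, so the selected penalty level would be the corresponding entry of lambda_seq.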