Cross-validation for solution path of Logistic FAR. — Logistic_FAR_CV

Logistic_FAR_CV_opath finds the solution path of logistic functional additive regression with a log-contrast constraint via Logistic_FAR_OPath, which performs within-group orthonormalization to standardize the data before the actual computation. It also uses cross-validation to assess the quality of the estimates along the solution path.
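As a rough illustration of what within-group orthonormalization means, the sketch below replaces each block of kn basis columns with an orthonormal basis of the same column space, obtained from a singular value decomposition. It is only a sketch under stated assumptions: the helper orthonormalize_groups, the simulated data, and the use of the left singular vectors are hypothetical and are not taken from the package source, which may scale or threshold differently.

```r
# Hypothetical sketch of within-group orthonormalization via SVD.
# This is NOT the package's code; all names here are illustrative.
set.seed(1)
n <- 50; h <- 2; p <- 3; kn <- 4
x_mat <- cbind(1, rnorm(n), matrix(rnorm(n * p * kn), n, p * kn))

orthonormalize_groups <- function(x_mat, h, kn, p, svd_thresh = 1e-6) {
  out <- x_mat
  for (j in seq_len(p)) {
    # Columns holding the kn basis covariates of the j-th functional covariate
    cols <- h + (j - 1) * kn + seq_len(kn)
    sv <- svd(x_mat[, cols, drop = FALSE])
    # Assumed rule: refuse groups whose smallest singular value is negligible
    if (min(sv$d) <= svd_thresh * max(sv$d)) {
      stop("group ", j, " is numerically rank deficient")
    }
    # The left singular vectors are orthonormal and span the same column space
    out[, cols] <- sv$u
  }
  out
}

x_orth <- orthonormalize_groups(x_mat, h, kn, p)
g1 <- x_orth[, h + seq_len(kn)]
round(crossprod(g1), 8)  # close to the kn x kn identity matrix
```

Coefficients estimated on the orthonormalized columns live on a rotated scale, so a real implementation has to map them back to the original basis; that step is omitted here.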

Usage

Logistic_FAR_CV_opath(
  y_vec,
  x_mat,
  h,
  kn,
  p,
  p_type,
  p_param,
  lambda_seq,
  lambda_length,
  min_lambda_ratio = 0.01,
  mu2,
  a = 1,
  bj_vec = rep(1/sqrt(kn), p),
  cj_vec = rep(1, p),
  rj_vec = 1e-05,
  svd_thresh = 10^(-6),
  relax_vec,
  delta_init,
  eta_stack_init,
  mu_1_init,
  tol,
  max_iter,
  nfold = 5,
  fold_seed = NULL,
  post_selection = FALSE,
  post_a = 1,
  verbose = 0
)

Arguments

y_vec: response vector, 0 for control, 1 for case. n = length(y_vec) is the number of observations.
x_mat: covariate matrix with dim(x_mat) = (n, h + p * kn), consisting of two parts. The first h columns are demographic covariates (and may include an intercept term). The remaining columns are the p functional covariates, each represented by a set of basis functions that yields kn covariates.
h, kn, p: dimension information for the dataset (x_mat).
p_type: a character variable indicating the type of penalty.
p_param: numeric vector of parameters for the penalty function. p_param[1] stores the lambda value and will be supplied from lambda_seq.
lambda_seq: a non-negative sequence of lambda values along which the solution path is searched. It is RECOMMENDED not to supply this parameter and to let the function determine it from the given data.
lambda_length: length of the lambda sequence when computing lambda_seq. If lambda_seq is provided, then lambda_length = length(lambda_seq).
mu2: quadratic term in the ADMM algorithm
a, bj_vec, cj_vec, rj_vec: parameters for the algorithm. See Algorithm_Details.pdf for more information.
svd_thresh: a small value for thresholding the vector of singular values.
relax_vec: not used
delta_init, eta_stack_init, mu_1_init: initial values for the algorithm.
tol, max_iter: convergence tolerance and maximum number of iterations of the algorithm.
nfold: integer, number of folds
fold_seed: if supplied, use this seed to generate the partitions for cross-validation. Can be useful for reproducible runs.
post_selection: logical, whether the function should also compute cross-validation results based on the post-selection estimation.
post_a: a for the post selection estimation.
verbose: integer indicating the level of information printed during computation. Currently supported: always: some information if something went wrong, e.g. when no penalty function is matched; 1: information about the start and stop of the iteration; 2: how the loss value changes during each iteration.
min_lambda_ratio: min(lambda_seq) / max(lambda_seq). This function uses this parameter to determine the minimal value of lambda_seq. If p > n, it is recommended to set this no smaller than 0.01 (sometimes even 0.05); otherwise you can set it to 0.001 or even smaller.
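The layout of the inputs described above can be sketched as follows. The data are simulated placeholders, the helper group_cols is hypothetical, and the log-spaced lambda grid is only an assumption about how min_lambda_ratio fixes the smallest lambda; the function may build its grid, and in particular its largest lambda, differently.

```r
# Simulated inputs with the documented shapes (placeholder data only).
set.seed(2)
n <- 40; h <- 2; kn <- 5; p <- 3
y_vec <- rbinom(n, size = 1, prob = 0.5)            # 0 = control, 1 = case
demo_part <- cbind(intercept = 1, age = rnorm(n))   # first h columns
func_part <- matrix(rnorm(n * p * kn), nrow = n)    # p blocks of kn columns
x_mat <- cbind(demo_part, func_part)
stopifnot(nrow(x_mat) == length(y_vec), ncol(x_mat) == h + p * kn)

# Hypothetical helper: column indices of the j-th functional covariate
group_cols <- function(j) h + (j - 1) * kn + seq_len(kn)
group_cols(2)  # 8 9 10 11 12

# Default weight vectors as written in the Usage section
bj_vec <- rep(1 / sqrt(kn), p)
cj_vec <- rep(1, p)

# Assumed log-spaced grid; lambda_max is a placeholder, not a computed value
lambda_max <- 1
lambda_length <- 10
min_lambda_ratio <- 0.01
lambda_seq <- exp(seq(log(lambda_max),
                      log(lambda_max * min_lambda_ratio),
                      length.out = lambda_length))
min(lambda_seq) / max(lambda_seq)  # equals min_lambda_ratio up to rounding
```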

Value

A list containing the solution path of delta, eta_stack, mu1 and some computation information such as convergence status, iteration counts and the lambda sequence of this solution path. Cross-validation information is also returned, such as the fold ID of each observation, the log-likelihood on each test set, and the index with the highest average log-likelihood over the test sets. If post_selection = TRUE, the same results based on the post-selection estimation are also returned.
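The cross-validation bookkeeping described here can be pictured with the minimal sketch below. It is not the function's actual code: the variable names (fold_id, test_loglik, best_ind) are hypothetical rather than the names of the returned list elements, and the log-likelihood values are random placeholders standing in for fitted results.

```r
# Hypothetical sketch of fold assignment and lambda-index selection.
n <- 23; nfold <- 5; lambda_length <- 6
fold_seed <- 123

# Seed only when one is supplied, so that partitions are reproducible
if (!is.null(fold_seed)) set.seed(fold_seed)
# Balanced fold IDs in random order, one per observation
fold_id <- sample(rep(seq_len(nfold), length.out = n))
table(fold_id)

# Placeholder test-set log-likelihoods: one row per fold, one column per lambda
test_loglik <- matrix(rnorm(nfold * lambda_length, mean = -10),
                      nrow = nfold, ncol = lambda_length)

# Average over folds and pick the lambda index with the highest mean
mean_loglik <- colMeans(test_loglik)
best_ind <- which.max(mean_loglik)
best_ind
```

Keeping the full fold-by-lambda matrix, rather than only the winning index, is what allows other selection rules to be applied afterwards.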

Note

Although this function returns the index of the lambda giving the highest average log-likelihood on the test sets, it is recommended to use the stand-alone *_pick functions in this package, such as CV_Pick, to find an optimal lambda, since those functions give more flexibility.

This function conducts cross-validation sequentially. For a possible parallel implementation, see Logistic_FAR_CV_opath_par.