
Logistic_FAR_FLiRTI_CV_path finds the solution path of logistic functional additive regression with a log-contrast constraint via Logistic_FAR_FLiRTI_Path, and uses cross-validation to assess the goodness of the estimates along the solution path. NOTE: This documentation needs to be double-checked and updated!

Usage

Logistic_FAR_FLiRTI_CV_path(
  y_vec,
  x_mat,
  h,
  kn,
  p,
  p_type,
  p_param,
  lambda_seq,
  lambda_length,
  min_lambda_ratio = 0.01,
  mu2,
  a = 1,
  bj_vec = rep(1/sqrt(kn), p),
  cj_vec = rep(1, p),
  rj_vec = 1e-05,
  weight_vec = weight_vec,
  relax_vec,
  delta_init,
  eta_stack_init,
  mu_1_init,
  tol,
  max_iter,
  nfold = 5,
  fold_seed,
  post_selection = FALSE,
  post_a = 1,
  verbose = 0
)

Arguments

y_vec

response vector, 0 for control, 1 for case. n = length(y_vec) is the number of observations.

x_mat

covariate matrix consisting of two parts, with dim(x_mat) = (n, h + p * kn). The first h columns are demographic covariates (these can include an intercept term). The remaining columns are for the p functional covariates, each represented by a set of basis functions resulting in kn covariates.
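As a rough illustration of this layout, the sketch below builds a toy x_mat with simulated values. The dimensions, the random data, and the helper func_cols are hypothetical; the sketch also assumes that the kn columns of each functional covariate are stored contiguously and in order, which this page does not state explicitly.

```r
# Illustrative dimensions (assumptions, not package defaults)
n  <- 20  # number of observations
h  <- 2   # demographic covariates, here an intercept plus one variable
p  <- 3   # number of functional covariates
kn <- 4   # basis covariates per functional covariate

set.seed(1)
demo_part <- cbind(1, rnorm(n))                   # n x h block
func_part <- matrix(rnorm(n * p * kn), nrow = n)  # n x (p * kn) block
x_mat <- cbind(demo_part, func_part)

# Hypothetical helper: columns of the j-th functional covariate,
# assuming contiguous blocks of kn columns after the first h columns
func_cols <- function(j) h + (j - 1) * kn + seq_len(kn)

dim(x_mat)     # n rows, h + p * kn columns
func_cols(2)   # columns 7 to 10 under the assumed layout
```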

h, kn, p

dimension information for the dataset (x_mat): h demographic covariates and p functional covariates with kn basis covariates each.

p_type

a character variable indicating the type of penalty.

p_param

numerical vector of parameters for the penalty function. p_param[1] stores the lambda value and will be provided by lambda_seq.

lambda_seq

a non-negative sequence of lambda values along which the solution path is searched. It is RECOMMENDED not to supply this parameter and to let the function determine it from the given data.

lambda_length

length of the lambda sequence when computing lambda_seq. If lambda_seq is provided, then of course lambda_length = length(lambda_seq).

mu2

quadratic term in the ADMM algorithm

a, bj_vec, cj_vec, rj_vec

parameters for the algorithm. See Algorithm_Details.pdf for more information.

relax_vec

not used.

delta_init, eta_stack_init, mu_1_init

initial values for the algorithm.

tol, max_iter

convergence tolerance and maximum number of iterations of the algorithm.

nfold

integer, number of folds

fold_seed

if supplied, use this seed to generate the partitions for cross-validation. Can be useful for reproducible runs.
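The effect of a fold seed can be sketched as follows. The helper make_folds is hypothetical and is not the package's internal partitioning code, which may differ (for example, it might balance cases and controls across folds); the sketch only shows why a fixed seed makes the fold assignment reproducible.

```r
# Hypothetical helper: assign n observations to nfold roughly equal folds.
# Note that set.seed() changes the global RNG state as a side effect.
make_folds <- function(n, nfold = 5, fold_seed = NULL) {
  if (!is.null(fold_seed)) set.seed(fold_seed)
  # Repeat fold labels 1..nfold up to length n, then shuffle them
  sample(rep(seq_len(nfold), length.out = n))
}

fold_id <- make_folds(23, nfold = 5, fold_seed = 42)
table(fold_id)                                          # fold sizes differ by at most 1
identical(fold_id, make_folds(23, 5, fold_seed = 42))   # same seed, same partition
```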

post_selection

logical; should the function also compute cross-validation results based on the post-selection estimates?

post_a

value of a used for the post-selection estimation.

verbose

integer indicating the level of information printed during computation. Currently supported levels: always, some information is printed if something goes wrong, e.g. when no penalty function is matched; 1, information about the start and stop of the iteration; 2, how the loss value changes during each iteration.

min_lambda_ratio

min(lambda_seq) / max(lambda_seq). The function uses this parameter to determine the minimal value of lambda_seq. If p > n, it is recommended to set this no smaller than 0.01 (sometimes even 0.05); otherwise it can be set to 0.001 or even smaller.
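To make the role of this ratio concrete, here is a minimal sketch of building a decreasing lambda sequence from a largest value and a ratio. The helper make_lambda_seq, the log-scale spacing, and the value of lambda_max are all assumptions for illustration; this page does not say how the function spaces the sequence or how it chooses the largest lambda from the data.

```r
# Hypothetical helper: lambda_length values from lambda_max down to
# lambda_max * min_lambda_ratio, equally spaced on the log scale (an assumption)
make_lambda_seq <- function(lambda_max, lambda_length, min_lambda_ratio = 0.01) {
  exp(seq(log(lambda_max),
          log(lambda_max * min_lambda_ratio),
          length.out = lambda_length))
}

lambda_seq <- make_lambda_seq(lambda_max = 2, lambda_length = 5)
lambda_seq
min(lambda_seq) / max(lambda_seq)   # equals min_lambda_ratio up to rounding
```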

svd_thresh

not used.

Value

A list containing the solution path of delta, eta_stack and mu1, together with computation information such as convergence status, iteration number and the lambda sequence of this solution path. Cross-validation information is also returned, such as the fold ID of each observation, the log-likelihood results on each test set, and the index with the highest average log-likelihood on the test sets. If post_selection = TRUE, the same results based on the post-selection estimation are also returned.
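The selection of the index with the highest average test-set log-likelihood can be sketched as below. The matrix loglik_mat and its values are made up for illustration, with one row per fold and one column per lambda; the names and layout of the corresponding elements in the returned list are not documented here, so none are assumed.

```r
# Hypothetical test-set log-likelihoods: 3 folds (rows) x 4 lambda values (columns)
loglik_mat <- rbind(
  c(-12.0, -10.0,  -9.5, -11.0),
  c(-11.5, -10.5,  -9.0, -10.0),
  c(-12.5,  -9.5, -10.0, -10.5)
)

avg_loglik <- colMeans(loglik_mat)   # average over folds for each lambda
best_idx   <- which.max(avg_loglik)  # index of the lambda with the highest average

avg_loglik
best_idx
```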

Note

Although this function returns the index of the lambda giving the highest average log-likelihood on the test sets, it is recommended to use the stand-alone *_pick functions in this package, such as CV_Pick, to find an optimal lambda, since those functions give more flexibility.