Logistic_FARMM_CV_path finds the solution path of logistic functional additive regression with a log-contrast constraint via Logistic_FAR_Path. It then uses cross-validation to assess the goodness of the estimates along the solution path.

Usage

Logistic_FARMM_CV_path(
  y_vec,
  x_mat,
  h,
  kn,
  p,
  rand_eff_df,
  p_type,
  p_param,
  lambda_seq,
  lambda_length,
  min_lambda_ratio = 0.01,
  mu2,
  a = 1,
  bj_vec = rep(1/sqrt(kn), p),
  cj_vec = rep(1, p),
  rj_vec = 1e-05,
  weight_vec = 1,
  logit_weight_vec = 1,
  weight_already_combine = FALSE,
  relax_vec,
  delta_init,
  eta_stack_init,
  mu_1_init,
  tol,
  max_iter,
  nfold = 5,
  fold_seed,
  post_selection = TRUE,
  post_a = 1,
  verbose = 0
)

Arguments

y_vec

response vector, 0 for control, 1 for case. n = length(y_vec) is the number of observations.

x_mat

covariate matrix consisting of two parts, with dim(x_mat) = (n, h + p * kn). The first h columns hold the demographic covariates (and can include an intercept term). The remaining columns hold the p functional covariates, each represented by a set of basis functions that yields kn columns.
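The column layout described above can be sketched as follows. This is only an illustration: the sizes are made up, the data are random, and the helper func_cols is hypothetical (not part of the package). It also assumes that the kn columns of each functional covariate are stored contiguously, in the order j = 1, ..., p.

```r
# Hypothetical sizes, chosen only for illustration
n  <- 6   # observations
h  <- 2   # demographic columns (here: an intercept plus one covariate)
kn <- 3   # basis coefficients per functional covariate
p  <- 4   # number of functional covariates

set.seed(1)
demo_part <- cbind(1, rnorm(n))                    # n x h block
func_part <- matrix(rnorm(n * p * kn), nrow = n)   # n x (p * kn) block
x_mat <- cbind(demo_part, func_part)

# The documented shape: n rows and h + p * kn columns
stopifnot(nrow(x_mat) == n, ncol(x_mat) == h + p * kn)

# Hypothetical helper: columns of the j-th functional covariate,
# ASSUMING contiguous blocks of kn columns after the first h columns
func_cols <- function(j, h, kn) h + (j - 1) * kn + seq_len(kn)

func_cols(2, h, kn)  # columns 6, 7, 8
```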

h, kn, p

dimension information for the dataset (x_mat).

rand_eff_df

data.frame of random-effect data. It must contain at least one column named "subj_vec_fct", which indicates the subject level. If this is the only column in rand_eff_df, a constant random effect is applied. Any other columns are all added to the random effect additively as slope terms. The number of rows of rand_eff_df is the same as length(y_vec).
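As a rough sketch of the two cases described above, the data frames below show one possible way to build rand_eff_df. The column visit_time is a hypothetical slope covariate, and storing subj_vec_fct as a factor is an assumption suggested by the column name, not something this page states.

```r
# Hypothetical design: 3 subjects with 2 observations each
n_subj <- 3
n_rep  <- 2
subj_id <- rep(seq_len(n_subj), each = n_rep)

# Case 1: only "subj_vec_fct", so a constant random effect per subject
# (ASSUMPTION: the subject indicator is stored as a factor)
rand_eff_df_const <- data.frame(subj_vec_fct = factor(subj_id))

# Case 2: an extra column, which would enter additively as a slope term
# ("visit_time" is a made-up covariate name)
rand_eff_df_slope <- data.frame(
  subj_vec_fct = factor(subj_id),
  visit_time   = rep(seq_len(n_rep), times = n_subj)
)

# The row count has to match length(y_vec)
y_vec <- c(0, 1, 0, 0, 1, 1)
stopifnot(nrow(rand_eff_df_const) == length(y_vec),
          nrow(rand_eff_df_slope) == length(y_vec))
```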

p_type

a character variable indicating the type of penalty

p_param

numerical vector for the penalty function. p_param[1] stores the lambda value and will be provided by lambda_seq.

lambda_seq

a non-negative sequence of lambda values, along which the solution path is searched. It is RECOMMENDED not to supply this parameter and to let the function determine it from the given data.

lambda_length

length of the lambda sequence when computing lambda_seq. If lambda_seq is provided, then of course lambda_length = length(lambda_seq).

mu2

quadratic term in the ADMM algorithm

a, bj_vec, cj_vec, rj_vec

parameters for the algorithm. See Algorithm_Details.pdf for more information.

weight_vec

weight vector for each subject. The final weight for each subject is also adjusted by logit_weight_vec, and the final weight vector is normalized so that it sums to n, the sample size.

logit_weight_vec

weight vector for each subject when computing the integral in the logit values. Each entry should be positive and no more than 1. This is a naive method for adjusting for early stop during the interval.

weight_already_combine

boolean, indicating whether weight_vec is already combined with logit_weight_vec for each subject.
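The weight handling described for weight_vec and logit_weight_vec might look like the sketch below. The page only says that the final weight is adjusted by logit_weight_vec and normalized to sum to n. Combining the two vectors by an elementwise product is an assumption made here for illustration, and the numbers are made up.

```r
# Made-up weights for n = 4 subjects
weight_vec       <- c(1, 1, 2, 1)
logit_weight_vec <- c(1, 0.5, 1, 0.8)   # each entry positive and at most 1
n <- length(weight_vec)

# ASSUMPTION: the two weight vectors are combined by elementwise product
combined <- weight_vec * logit_weight_vec

# Normalize so that the final weights sum to the sample size n
final_weight <- combined / sum(combined) * n

sum(final_weight)  # equals n up to floating-point error
```

Under this reading, weight_already_combine = TRUE would correspond to passing something like combined directly as weight_vec.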

relax_vec

not used.

delta_init, eta_stack_init, mu_1_init

initial values for the algorithm.

tol, max_iter

convergence tolerance and maximum number of iterations of the algorithm.

nfold

integer, number of folds

fold_seed

if supplied, use this seed to generate the partitions for cross-validation. Can be useful for reproducible runs.
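To illustrate why a seed makes the partition reproducible, here is a minimal sketch of a seeded fold assignment. The helper make_fold_id is hypothetical and is not the package's internal routine, which may partition the data differently (for example by subject).

```r
# Hypothetical helper: assign each of n observations to one of nfold folds
make_fold_id <- function(n, nfold, fold_seed = NULL) {
  # Fixing the seed makes the random permutation repeatable
  if (!is.null(fold_seed)) set.seed(fold_seed)
  # Balanced labels 1..nfold, then shuffled
  sample(rep(seq_len(nfold), length.out = n))
}

fold_id <- make_fold_id(n = 23, nfold = 5, fold_seed = 2024)
table(fold_id)  # fold sizes differ by at most one

# The same seed reproduces the same partition
identical(fold_id, make_fold_id(n = 23, nfold = 5, fold_seed = 2024))
```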

post_selection

boolean, whether the function should also compute cross-validation results based on the post-selection estimates.

post_a

a for the post selection estimation.

verbose

integer, indicating the level of information printed during computation. Currently supported:

always: some information if something went wrong, e.g. when no penalty function is matched

1: information about the start and stop of the iteration

2: how the loss value changes during each iteration

min_lambda_ratio

min(lambda_seq) / max(lambda_seq). This function uses this parameter to determine the minimal value of lambda_seq. If p > n, it is recommended to set this no smaller than 0.01 (sometimes even 0.05); otherwise you can set it to 0.001 or even smaller.
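The role of this ratio can be shown with a small sketch. The helper make_lambda_seq is hypothetical, the starting value lambda_max is made up, and the log-equal spacing is an assumption borrowed from common practice in penalized regression software. This page does not say how the function spaces its lambda values or how it finds the largest one.

```r
# Hypothetical helper: decreasing lambda grid from lambda_max down to
# lambda_max * min_lambda_ratio (ASSUMPTION: equally spaced on the log scale)
make_lambda_seq <- function(lambda_max, lambda_length, min_lambda_ratio = 0.01) {
  exp(seq(log(lambda_max),
          log(lambda_max * min_lambda_ratio),
          length.out = lambda_length))
}

lam <- make_lambda_seq(lambda_max = 2, lambda_length = 10, min_lambda_ratio = 0.01)

length(lam)           # 10 values
min(lam) / max(lam)   # recovers the ratio, up to floating-point error
```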

svd_thresh

not used.

Value

A list containing the solution path of delta, eta_stack and mu1, together with computation information such as convergence status, iteration numbers and the lambda sequence of this solution path. Cross-validation information is also returned, such as the fold ID of each observation, the log-likelihood on each test set and the index with the highest average log-likelihood over the test sets. If post_selection = TRUE, the same results based on the post-selection estimates are also returned.
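The selection rule mentioned above (highest average log-likelihood over the test sets) can be sketched on made-up numbers. The matrix cv_loglik and its layout (one row per lambda, one column per fold) are hypothetical; the names and shapes of the elements in the returned list are not specified on this page.

```r
# Made-up test-set log-likelihoods: 3 lambda values (rows) x 3 folds (columns)
cv_loglik <- matrix(
  c(-10.0, -8.0, -9.0,    # fold 1
    -11.0, -7.5, -9.5,    # fold 2
    -10.5, -8.2, -9.1),   # fold 3
  nrow = 3
)

# Average over folds for each lambda, then take the largest average
avg_loglik <- rowMeans(cv_loglik)
best_idx <- which.max(avg_loglik)

best_idx  # 2: the second lambda has the highest average log-likelihood
```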

Note

Although this function returns the index of the lambda with the highest average log-likelihood on the test sets, it is recommended to use the standalone *_pick functions in this package, such as CV_Pick, to find an optimal lambda, since those functions give more flexibility.