Introduction
In this note we talk about some properties of the oracle estimator.
This LogisticFAR
package deals with high-dimensional
logistic regression problem with groupwise-covariate selection feature
and sum-to-zero constraint. For the oracle estimator, the important
covariates are pre-given, so there’s no need for the variable selection
part. We just need to perform the logistic regression with sum-to-zero
constrant.
First we generate a toy example:
set.seed(1024)
n <- 200
kn <- 2
p <- 3
demox <- matrix(rnorm(n), nrow = n)
fx <- matrix(rnorm(n * p * kn), nrow = n)
beta0 <- c(1, # intercept
1, # demographical covariates
1, -2, # group covaritates 1
-2, 1, # group covariates 2
1, 1 # group covariates 3
)
logit0 <- cbind(1, demox, fx) %*% beta0
pi0 <- exp(logit0) / (1 + exp(logit0))
y <- rbinom(n, size = 1, prob = pi0)
Here we generate a sample of size 200, with 1 demographical variables and 3 groups of covariates, each group contains 2 covariates. Here we omit other non-important group covariates since they will not be included in the oracle estimator. Note that the group covariate coefficient vector satisfying
Oracle Estimator
Fit the model via glm
Since the low-dimensional model structure is given, we can safely
pick a reference level and fit a non-constraint model via
glm
logitx <- cbind(demox, fx[, c(1, 3)] - fx[, 5], fx[, c(2, 4)] - fx[, 6])
colnames(logitx) <- c("demo",
"x1_ref5", "x2_ref6",
"x3_ref5", "x4_ref6")
glmfit <- glm(y ~ logitx, family = binomial())
glmfit$coefficients
#> (Intercept) logitxdemo logitxx1_ref5 logitxx2_ref6 logitxx3_ref5
#> 0.5655721 1.0069234 0.7934938 -1.6096452 -1.6637660
#> logitxx4_ref6
#> 0.8454335
Here we use the 3rd group as the reference level and the full covariate effect vector would be
glmest <- c(glmfit$coefficients,
-(glmfit$coefficients["logitxx1_ref5"] + glmfit$coefficients["logitxx3_ref5"]),
-(glmfit$coefficients["logitxx2_ref6"] + glmfit$coefficients["logitxx4_ref6"]))
names(glmest) <- c("intercept", "demo",
paste("x", 1 : 6, sep = ""))
glmest
#> intercept demo x1 x2 x3 x4 x5
#> 0.5655721 1.0069234 0.7934938 -1.6096452 -1.6637660 0.8454335 0.8702722
#> x6
#> 0.7642117
Fit the model via
LogisticFAR::Logistic_FAR_Path_Further_Improve
The LogisticFAR::Logistic_FAR_Path_Further_Improve
used
for post-selection estimation is essentially an unpenalized estiamtor,
which is suitable for oracle estimation as long as we give the right
model structure.
h <- 2 # intercept term is included
far_x <- cbind(1, demox, fx)
delta_init <- rep(0, 2)
eta_init <- rep(0.001, p * kn)
farfit <- LogisticFAR::Logistic_FAR_Path_Further_Improve(far_x, y, h = h, k_n = kn, p = p,
delta_vec_init = delta_init,
eta_stack_init = eta_init,
mu1_vec_init = rep(0, kn),
a = 1,
weight_vec = 1,
mu2 = 5,
lam = 0.001, tol = 10 ^ (-10))
farest <- c(farfit$delta_vec, farfit$eta_stack_vec)
names(farest) <- c("intercept", "demo",
paste("x", 1 : 6, sep = ""))
farest
#> intercept demo x1 x2 x3 x4 x5
#> 0.5653995 1.0066646 0.7932918 -1.6633419 -1.6091950 0.8452099 0.8159059
#> x6
#> 0.8181323
In LogisticFAR::Logistic_FAR_Path_Further_Improve
, the
objective function takes the form
Therefore a
and mu2
controls the balance
between MLE and sum-to-zero constraint, lam
is a robust
ridge-type regularizer and weight_vec
is a weight for each
observation. Varying these parameters will lead to different estimators
than the glm
one.
Check the langrangian condition
By the method of subgradient(or equivalently the KKT condition), we know that the oracel estimator satisfies
We can verify this condition for the oracle estimator.
LogisticFAR
fit
pi_vec_far <- exp(far_x %*% c(farfit$delta_vec, farfit$eta_stack_vec)) / (1 + exp(far_x %*% c(farfit$delta_vec, farfit$eta_stack_vec)))
t(fx) %*% (pi_vec_far - y)
#> [,1]
#> [1,] -0.1984088
#> [2,] 4.1651383
#> [3,] -0.1950729
#> [4,] 4.1614994
#> [5,] -0.1984187
#> [6,] 4.1617573
As we can see, they all satisfy this condition(up to some numerical
error). The oracle estimator from glm
based on referencing
a group is more accurate than LogisticFAR
fit since there
is a robust regularizer in the LogisticFAR
objective
function also the tol
and mu2
should also be
adjusted for a more accurate estimator.
NOTE: If we plug-in the real underlying parameters, then we get