Robust Subgroup Analysis for High-dimensional Data (preprint)

By Chao Cheng, Xingdong Feng in Subgroup analysis, Variable selection, Parallel computation, Robust regression, M-estimation

May 1, 2020

Abstract: Identifying subgroup structures has become an interesting problem in data analysis, as populations are likely to be heterogeneous in practice. In this paper, we consider M-estimators with both concave and pairwise fusion penalties, which can handle high-dimensional data containing outliers. The penalties are applied to both the covariate effects and the treatment effects, so that the estimation is expected to achieve variable selection and data clustering simultaneously. We propose an algorithm based on parallel computing to process relatively large datasets. We establish the convergence of the proposed algorithm, the oracle property of the penalized M-estimators, and the selection consistency of the proposed criterion. Our numerical studies demonstrate that the proposed method is promising for efficiently identifying subgroups hidden in high-dimensional data.
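The abstract does not spell out the objective function. The following is a rough illustration only. It sketches a penalized M-estimation objective of this kind in Python under several assumptions:

- the model is y_i = μ_i + x_iᵀβ + ε_i, where the subject-specific intercepts μ_i are one possible reading of the "treatment effects";
- the robust loss ρ is the Huber loss;
- the concave penalty is the MCP, used both for the sparsity term on β and for the pairwise fusion term on the μ_i.

These choices, the scaling of each term, and the helper names `huber_loss`, `mcp`, `objective` and `subgroups` are hypothetical. They are not the paper's or RSAVS's exact formulation, and the parallel fitting algorithm is not reproduced here.

```python
import numpy as np

def huber_loss(r, delta=1.345):
    """Huber loss (assumed choice of rho): quadratic for |r| <= delta, linear beyond,
    so large residuals from outliers grow only linearly."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * a - 0.5 * delta**2)

def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty (MCP), evaluated elementwise; constant for |t| > gamma*lam."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a**2 / (2.0 * gamma),
                    0.5 * gamma * lam**2)

def objective(y, X, mu, beta, lam1, lam2, delta=1.345, gamma=3.0):
    """Hypothetical penalized M-estimation objective:
    (1/n) sum_i rho(y_i - mu_i - x_i'beta)   robust loss
      + sum_j p(|beta_j|; lam1)              concave penalty -> variable selection
      + sum_{i<k} p(|mu_i - mu_k|; lam2)     pairwise fusion -> subgroup clustering
    """
    n = len(y)
    resid = y - mu - X @ beta
    loss = huber_loss(resid, delta).sum() / n
    sparsity = mcp(beta, lam1, gamma).sum()
    i, k = np.triu_indices(n, k=1)  # index pairs with i < k
    fusion = mcp(mu[i] - mu[k], lam2, gamma).sum()
    return loss + sparsity + fusion

def subgroups(mu, tol=1e-6):
    """Label subjects whose fitted intercepts coincide (within tol) as one subgroup."""
    labels = np.empty(len(mu), dtype=int)
    centers = []
    for idx, m in enumerate(mu):
        for g, c in enumerate(centers):
            if abs(m - c) <= tol:
                labels[idx] = g
                break
        else:  # no existing center matched: start a new subgroup
            centers.append(m)
            labels[idx] = len(centers) - 1
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 3))
    beta = np.array([1.0, 0.0, 0.0])                     # sparse covariate effects
    mu = np.array([2.0, 2.0, 2.0, -2.0, -2.0, -2.0])     # two latent subgroups
    y = mu + X @ beta
    print(subgroups(mu))  # [0 0 0 1 1 1]
    print(objective(y, X, mu, beta, lam1=0.5, lam2=0.5))
```

The fusion term is what produces clustering. Because the MCP is flat beyond `gamma * lam`, large gaps between subgroup intercepts incur only a bounded penalty. Small gaps, by contrast, are pushed toward exactly zero, and the equal μ_i values then define the subgroups.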

An R package, RSAVS, is provided for this paper; its source code is available in a GitHub repository.

Figure: Subgroup analysis results for student performance data.
