Robust analysis of cancer heterogeneity for high-dimensional data

By Chao Cheng, Xingdong Feng, Xiaoguang Li, Mengyun Wu in Subgroup analysis, Variable selection, Parallel computation, Robust regression, M-estimation

November 6, 2022

Cancer heterogeneity plays an important role in understanding tumor etiology, progression, and response to treatment. To accommodate heterogeneity, cancer subgroup analysis has been extensively conducted. However, most existing studies share the limitation that they cannot accommodate heavy-tailed or contaminated outcomes together with high-dimensional covariates, both of which are not uncommon in biomedical research. In this study, we propose a robust subgroup identification approach based on M-estimators together with concave and pairwise fusion penalties, which improves on existing studies by effectively accommodating high-dimensional data that contain outliers. The penalties are applied to both the latent heterogeneity factors and the covariates, so that the estimation is expected to achieve subgroup identification and variable selection simultaneously, with the number of subgroups unknown a priori. We develop an innovative algorithm based on a parallel computing strategy, which has the significant advantage of being able to process large-scale data. The convergence of the proposed algorithm, the oracle property of the penalized M-estimators, and the selection consistency of the proposed BIC criterion are carefully established. Simulations and an analysis of TCGA breast cancer data demonstrate that the proposed approach is promising for efficiently identifying underlying subgroups in high-dimensional data.
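To make the ingredients above concrete, here is a minimal Python sketch that evaluates a penalized M-estimation objective of this general kind. It assumes a subgroup model y_i = mu_i + x_i^T beta + eps_i, in which the subject-specific intercepts mu_i stand in for the latent heterogeneity factors. It uses the Huber loss as the robust M-estimation loss and the minimax concave penalty (MCP) as the concave penalty, applied both to all pairwise differences mu_i - mu_j (fusion) and to the coefficients beta_k (sparsity). The model form, the function names (`huber_loss`, `mcp`, `objective`), the tuning constants, and the scaling of each term are illustrative assumptions, not the paper's exact formulation or the RSAVS API. The sketch only evaluates the objective; it does not implement the parallel algorithm or the BIC criterion.

```python
import numpy as np

# Hypothetical illustration: NOT the RSAVS implementation.

def huber_loss(r, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond (limits the pull of outliers)."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * a - 0.5 * delta**2)

def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty of |t|; constant (flat) once |t| exceeds gamma * lam."""
    a = np.abs(t)
    return np.where(a <= gamma * lam, lam * a - a**2 / (2 * gamma), 0.5 * gamma * lam**2)

def objective(y, X, mu, beta, lam1, lam2, delta=1.345, gamma=3.0):
    """Assumed objective: mean Huber loss + MCP fusion on mu + MCP sparsity on beta."""
    n = len(y)
    resid = y - mu - X @ beta
    loss = huber_loss(resid, delta).sum() / n
    # Pairwise fusion penalty over all pairs i < j of subject-specific intercepts
    i, j = np.triu_indices(n, k=1)
    fusion = mcp(mu[i] - mu[j], lam1, gamma).sum()
    # Concave penalty on each covariate coefficient for variable selection
    sparsity = mcp(beta, lam2, gamma).sum()
    return loss + fusion + sparsity

# Usage: two subgroups of three subjects each, one active covariate out of three
rng = np.random.default_rng(0)
n, p = 6, 3
X = rng.normal(size=(n, p))
mu = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
beta = np.array([0.5, 0.0, 0.0])
y = mu + X @ beta  # noise-free, so the loss term is (numerically) zero
print(objective(y, X, mu, beta, lam1=0.1, lam2=0.1))  # approximately 0.15
```

Because MCP is flat beyond gamma * lam, each of the nine between-subgroup pairs (|mu_i - mu_j| = 2) adds the same constant gamma * lam^2 / 2 = 0.015, within-subgroup pairs add nothing, and the single nonzero coefficient adds another 0.015. This flatness is what lets a concave fusion penalty merge subjects into subgroups without shrinking large between-subgroup differences, unlike an L1 fusion penalty.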

An R package, RSAVS, is provided for this paper; its source code is available on GitHub.

Composition of PAM50 subtypes for the five identified subgroups
