Biomarker detection for disease classification in longitudinal microbiome data

By Chao Cheng, Hanteng Ma, Yujie Zhong, Anne-Catrin Uhlemann, Xingdong Feng, Jianhua Hu in Compositional data, Functional data analysis, High-dimensional data, logistic regression

June 1, 2025

The microbiome is closely linked to human health, and advances in sequencing technologies have enabled in-depth studies of microbial communities and their associations with various diseases. Microbiome data are commonly normalized to the compositional scale to ensure statistical validity, which in turn requires special treatment to address the unique characteristics of such data. Biomedical studies also often involve repeated measurements of microbial samples, which further complicates the analysis. In this paper, we focus on a liver transplant microbiome study whose main objective is to investigate the association between the colonization status of multidrug-resistant bacteria (MDRB) and the longitudinal microbial abundance profile. To this end, we employ a regularized functional logistic regression model. Specifically, we use the log-contrast model with a low-rank approximation to handle the compositional covariates, and nonconvex penalties to select the important components in the covariate space. We propose an efficient estimation algorithm and establish the oracle property of the estimator. We call the new method the Functional Compositional data Quadratic Method (FCQM). We demonstrate its promise through extensive simulation studies and the liver transplant application.
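As a rough illustration of two ingredients named in the abstract, the sketch below evaluates a log-contrast linear predictor at a single time point and a nonconvex penalty. It is not the FCQM algorithm and not the LogisticFAR API: the function names, the toy counts, and the coefficients are all hypothetical, the functional (time-varying) part and the low-rank approximation are omitted, and SCAD is used only as one common nonconvex penalty, which may differ from the paper's choice.

```python
import math

def closure(counts):
    """Compositional normalization: rescale positive counts to sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def log_contrast(x, beta):
    """Linear predictor sum_j beta_j * log(x_j), with beta summing to zero."""
    if abs(sum(beta)) > 1e-12:
        raise ValueError("log-contrast coefficients must sum to zero")
    return sum(b * math.log(v) for b, v in zip(beta, x))

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001), one common nonconvex penalty."""
    t = abs(t)
    if t <= lam:
        return lam * t                      # lasso-like near zero
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2          # constant for large coefficients

# Hypothetical toy data: abundances of three taxa in one sample.
counts = [120.0, 30.0, 50.0]
beta = [1.0, -0.4, -0.6]                    # hypothetical, sums to zero

eta_raw = log_contrast(counts, beta)            # on raw counts
eta_comp = log_contrast(closure(counts), beta)  # on proportions
prob = 1.0 / (1.0 + math.exp(-eta_comp))        # logistic link
```

Because the coefficients sum to zero, `eta_raw` and `eta_comp` agree up to floating-point error, so the predictor does not depend on the sequencing depth of the sample. This scale invariance is the usual motivation for the zero-sum constraint in log-contrast models.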

An R package, LogisticFAR, is provided for this paper. Its source code is hosted on GitHub.

Figure: Estimated functional coefficients of OTUs chosen by FCQM

See Also:
Variable Selection under Logistic Regression for Compositional Functional Data