QUICK REVIEW

[Paper Review] Learning functional groups in complex microbiomes

Matthew S. Schmitt, Kiseok Keith Lee|arXiv (Cornell University)|Mar 3, 2026
Gut microbiota and health | Citations: 0
One-Line Summary

SCiFI, a neural-network-based soft clustering algorithm, learns functional groups from microbiome abundance data in a function-informed way and links them to community function, yielding sparse, interpretable structure-function maps across gut, soil, and ocean systems.

ABSTRACT

From soil to the gut, communities composed of thousands of microbes perform functions such as carbon sequestration and immune system regulation. Here, we introduce a data-driven approach that explains how community function can be traced to just a few groups of microbes or genes. In gut communities, our neural-network based clustering algorithm correctly recovers known functional groups. In the ocean metagenome, it distills ~500 gene modules down to three sparse groups highlighting survival strategies at different depths. In soils, it distills ~4400 bacterial species into two groups that enter a mathematical model of nitrate metabolism. By combining interpretable ML with strain isolation and sequencing experiments, we connect the metabolic specialization of each group to community-wide responses to perturbations. This integrated approach yields simple structure-function maps of microbiomes, allowing the discovery of molecular mechanisms underlying human and environmental health. More broadly, we illustrate how to do function-informed dimensionality reduction in biology.

Motivation and Objectives

  • Distill a small number of functional groups from high-dimensional microbiome data that are informative of specific community functions.
  • Develop a function-informed clustering method that enables nonlinear mappings from group abundances to function.
  • Demonstrate that the learned groups are sparse, interpretable, and experimentally verifiable.
  • Integrate machine learning with targeted experiments to connect group metabolism to community responses under perturbations.

Proposed Method

  • Introduce SCiFI (Soft Clustering Function Informed), an algorithm that uses the Gumbel-softmax trick to learn group assignments jointly with a neural network mapping group abundances to function.
  • Represent grouping through a differentiable clustering matrix that aggregates species by summation into functionally relevant groups.
  • Optionally apply gating to promote sparsity, yielding groups with few member species or modules.
  • Train the clustering matrix and neural network parameters end-to-end by minimizing prediction error on the target function.
  • Benchmark SCiFI against methods lacking function-informed clustering or nonlinear structure-function mapping.
  • Apply SCiFI to synthetic gut communities, ocean Tara Oceans metagenomes, and soil microcosms to identify functional groups and relate them to measured functions.
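
The steps listed above can be illustrated with a minimal forward-pass sketch in NumPy. It is not the authors' implementation: the function names (`gumbel_softmax`, `scifi_like_forward`), the one-hidden-layer `tanh` network, the temperature value, and the synthetic Dirichlet abundances are all assumptions made for illustration. Gating for sparsity and the end-to-end training loop are omitted; in practice the clustering logits and network weights would be fitted jointly with an automatic-differentiation framework.

```python
import numpy as np

def gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed one-hot assignment per row of `logits` (Gumbel-softmax trick)."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))               # Gumbel(0, 1) noise
    z = (logits + gumbel) / temperature
    z = z - z.max(axis=1, keepdims=True)       # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)    # each row sums to 1

def scifi_like_forward(X, logits, params, temperature, rng):
    """Soft-cluster species into groups, sum abundances per group, predict function."""
    C = gumbel_softmax(logits, temperature, rng)   # clustering matrix, (n_species, n_groups)
    G = X @ C                                      # group abundances, (n_samples, n_groups)
    H = np.tanh(G @ params["W1"] + params["b1"])   # hypothetical hidden layer
    y_hat = H @ params["w2"] + params["b2"]        # predicted community function
    return C, G, y_hat

# Synthetic example (all sizes and values are illustrative assumptions).
rng = np.random.default_rng(0)
n_samples, n_species, n_groups, n_hidden = 8, 20, 3, 16
X = rng.dirichlet(np.ones(n_species), size=n_samples)  # relative abundances per sample
logits = rng.normal(size=(n_species, n_groups))        # learnable in a real training run
params = {
    "W1": rng.normal(scale=0.5, size=(n_groups, n_hidden)),
    "b1": np.zeros(n_hidden),
    "w2": rng.normal(scale=0.5, size=n_hidden),
    "b2": 0.0,
}

C, G, y_hat = scifi_like_forward(X, logits, params, temperature=0.5, rng=rng)
hard_groups = C.argmax(axis=1)   # most likely group for each species
```

Because every row of the clustering matrix sums to one, summing species into groups preserves each sample's total abundance. Lowering the temperature pushes the soft assignments toward hard, one-hot group memberships.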
Graphical Abstract: An integrated ML and experimental pipeline to discover functional groups and their dynamics in complex microbiomes and beyond. (a) First, our Soft Clustering Function Informed (SCiFI) algorithm identifies functional groups directly from species abundances data using neural networ

Experimental Results

Research Questions

  • RQ1: Can a function-informed clustering approach identify a small set of microbial groups that predict community function from abundance data?
  • RQ2: Do learned functional groups enable nonlinear structure-function mappings that explain real microbiome dynamics across different ecosystems?
  • RQ3: Can the identified groups be interpreted biologically and experimentally validated through targeted sequencing or isolation?
  • RQ4: How does SCiFI perform relative to methods that cluster without function information or that assume linear mappings?

Key Findings

  • SCiFI accurately predicts function in gut, soil, and ocean microbiomes and recovers known functional groups in gut communities.
  • Learned groups are sparse and biologically interpretable, enabling links to metabolic pathways and gene signatures.
  • In gut and soil datasets, nonlinear structure-function mappings are essential for accurate predictions.
  • In ocean metagenomes, three sparse gene groups capture environmental gradients and can be interpreted via KEGG modules.
  • The two learned soil groups can be incorporated into a simple consumer-resource model to predict nitrate dynamics across pH perturbations, aligning with experimental observations.
  • Targeted isolation and sequencing of representative group members reveal distinct denitrification capabilities that explain pH-dependent nitrate reduction.
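
To make the idea of a two-group consumer-resource model concrete, here is a generic sketch, assuming Monod-type nitrate uptake and a fixed biomass yield per group. The equations in the paper are not given in this review, so the functional form, the function name `simulate_nitrate`, and every parameter value below are hypothetical. pH dependence is not modeled; one could, as a further assumption, let the uptake rates `r` vary with pH.

```python
import numpy as np

def simulate_nitrate(n0, b0, r, gamma, k, dt, n_steps):
    """Forward-Euler integration of a Monod-type consumer-resource model.

    dN/dt   = -sum_i r_i * B_i * N / (k + N)
    dB_i/dt = gamma_i * r_i * B_i * N / (k + N)
    """
    n = float(n0)
    b = np.asarray(b0, dtype=float).copy()
    r = np.asarray(r, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    nitrate = np.empty(n_steps + 1)
    biomass = np.empty((n_steps + 1, b.size))
    nitrate[0], biomass[0] = n, b
    for t in range(n_steps):
        uptake = r * b * n / (k + n)    # nitrate consumption rate of each group
        n = n - dt * uptake.sum()       # nitrate is depleted by all groups
        b = b + dt * gamma * uptake     # each group grows on what it consumes
        nitrate[t + 1], biomass[t + 1] = n, b
    return nitrate, biomass

# Hypothetical parameters: group 1 reduces nitrate faster than group 2.
nitrate, biomass = simulate_nitrate(
    n0=2.0, b0=[0.01, 0.01], r=[1.0, 0.3], gamma=[0.05, 0.05],
    k=0.1, dt=0.01, n_steps=2000,
)
```

In this form the quantity N + Σ B_i / gamma_i is conserved, so nitrate can only decrease as biomass accumulates, and the group with the larger uptake rate ends up dominating consumption.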
Figure 1: Data-driven discovery of functional groups and their dynamics (a) Microbial communities perform crucial environmental functions from the soil to the ocean to the gut. (b) In soils, microbes collectively reduce nitrate to dinitrogen gas in a process called denitrification. This process is c


This review was generated by AI and checked by a human editor.