QUICK REVIEW

[Paper Review] Learning functional groups in complex microbiomes

Matthew S. Schmitt, Kiseok Keith Lee|arXiv (Cornell University)|Mar 3, 2026
Gut microbiota and health | Cited by 0
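One-sentence summary

SCiFI is a neural-network-based soft clustering algorithm that learns function-informed functional groups from abundance data and links them to community function, yielding sparse, interpretable structure-function maps in gut, soil, and ocean systems.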

ABSTRACT

From soil to the gut, communities composed of thousands of microbes perform functions such as carbon sequestration and immune system regulation. Here, we introduce a data-driven approach that explains how community function can be traced to just a few groups of microbes or genes. In gut communities, our neural-network based clustering algorithm correctly recovers known functional groups. In the ocean metagenome, it distills ~500 gene modules down to three sparse groups highlighting survival strategies at different depths. In soils, it distills ~4400 bacterial species into two groups that enter a mathematical model of nitrate metabolism. By combining interpretable ML with strain isolation and sequencing experiments, we connect the metabolic specialization of each group to community-wide responses to perturbations. This integrated approach yields simple structure-function maps of microbiomes, allowing the discovery of molecular mechanisms underlying human and environmental health. More broadly, we illustrate how to do function-informed dimensionality reduction in biology.

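Motivation and objectives

  • Distill high-dimensional microbiome data into a small number of functional groups that are informative about a specific community function.
  • Develop a function-informed clustering method in which the map from group abundances to function can be nonlinear.
  • Show that the learned groups are sparse, interpretable, and experimentally verifiable.
  • Combine machine learning with targeted experiments to link group metabolism to community responses under perturbation.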

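Proposed method

  • Introduce SCiFI, a soft-clustering, function-informed algorithm that uses the Gumbel-softmax trick to learn group assignments jointly with a neural network that maps group abundances to function.
  • Represent the grouping as a differentiable clustering matrix that aggregates species into functionally relevant groups by summation.
  • Optionally apply gating to promote sparsity, yielding groups with fewer member species or modules.
  • Train the clustering matrix and the neural-network parameters end to end by minimizing the prediction error on the target function.
  • Benchmark SCiFI against methods that lack function-informed clustering or a nonlinear structure-function map.
  • Apply SCiFI to synthetic gut communities, the Tara Oceans ocean metagenome, and soil microcosms to identify functional groups and relate them to measured functions.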
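
The forward pass described in the list above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the per-species sigmoid gate, the tanh hidden layer, and all sizes and values are assumptions made for the example. It only shows how a Gumbel-softmax sample yields a soft species-by-group assignment, how abundances are summed into groups, and how a small network maps group abundances to a predicted function.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed one-hot sample from a list of logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(u)) with u ~ Uniform(0, 1);
        # u is clamped away from 0 and 1 so both logs are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def assignment_matrix(logits, gate_logits, temperature, rng):
    """Species-by-group soft assignment; each row is scaled by a gate in (0, 1).

    The gate is a hypothetical stand-in for the optional sparsity gating:
    a gate near 0 effectively removes a species from every group.
    """
    rows = []
    for row_logits, gate_logit in zip(logits, gate_logits):
        probs = gumbel_softmax(row_logits, temperature, rng)
        gate = sigmoid(gate_logit)
        rows.append([gate * p for p in probs])
    return rows


def group_abundances(sample, assignment):
    """Aggregate species into groups by summation: g_k = sum_i x_i * C[i][k]."""
    n_groups = len(assignment[0])
    return [
        sum(x * row[k] for x, row in zip(sample, assignment))
        for k in range(n_groups)
    ]


def predict_function(groups, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer tanh network mapping group abundances to a scalar."""
    hidden = [
        math.tanh(sum(w * g for w, g in zip(weights, groups)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Toy example: 6 species, 2 groups, 3 hidden units (all values illustrative).
rng = random.Random(0)
n_species, n_groups, n_hidden = 6, 2, 3
logits = [[rng.gauss(0, 1) for _ in range(n_groups)] for _ in range(n_species)]
gate_logits = [2.0] * n_species
C = assignment_matrix(logits, gate_logits, temperature=0.5, rng=rng)

sample = [0.3, 0.1, 0.2, 0.05, 0.25, 0.1]  # relative abundances of one community
groups = group_abundances(sample, C)

w_hidden = [[rng.gauss(0, 1) for _ in range(n_groups)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.gauss(0, 1) for _ in range(n_hidden)]
y_hat = predict_function(groups, w_hidden, b_hidden, w_out, b_out=0.0)
print(groups, y_hat)
```

The sketch stops at prediction. End-to-end training, as the list describes, would additionally require gradients of the prediction error with respect to both the assignment logits and the network weights, which in practice means an automatic-differentiation framework rather than plain Python.
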
Graphical Abstract: An integrated ML and experimental pipeline to discover functional groups and their dynamics in complex microbiomes and beyond. (a) First, our Soft Clustering Function Informed (SCiFI) algorithm identifies functional groups directly from species abundances data using neural networ

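Experimental results

Research questions

  • RQ1: Can a function-informed clustering method predict community function from abundance data and thereby identify a small number of microbial groups?
  • RQ2: Do the learned functional groups support a nonlinear structure-function map that explains real microbiome dynamics across different ecosystems?
  • RQ3: Can the identified groups be interpreted biologically and validated experimentally through targeted sequencing or isolation?
  • RQ4: How does SCiFI perform compared with methods that lack function-informed clustering or assume a linear map?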

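Key findings

  • SCiFI accurately predicts function in gut, soil, and ocean microbiomes and, in the gut communities, recovers known functional groups.
  • The learned groups are sparse and biologically interpretable, and can be linked to metabolic pathways and gene markers.
  • In the gut and soil datasets, a nonlinear structure-function map is essential for accurate prediction.
  • In the ocean metagenome, three sparse gene groups capture environmental gradients and can be interpreted through KEGG modules.
  • The two learned soil groups can be incorporated into a simple consumer-resource model that predicts nitrate dynamics under pH perturbation, in agreement with experimental observations.
  • Targeted isolation and sequencing of representative group members reveal distinct denitrification capabilities, which explain the pH-dependent nitrate reduction.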
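
To make the idea of a two-group consumer-resource model concrete, the following is a generic sketch and not the model from the paper. It assumes Monod uptake of nitrate by two groups and a Gaussian-shaped pH response for each group. The function names, the pH optima, and every parameter value are hypothetical and chosen only for illustration.

```python
import math


def ph_response(ph, optimum, width):
    """Hypothetical bell-shaped pH dependence of a group's uptake rate, in (0, 1]."""
    return math.exp(-(((ph - optimum) / width) ** 2))


def simulate_nitrate(ph, biomass, optima, nitrate0=2.0, max_rate=1.0,
                     half_sat=0.5, yield_coeff=0.1, width=1.5,
                     dt=0.01, steps=2000):
    """Forward-Euler integration of a two-group consumer-resource model.

    dN/dt   = -sum_k r_k(pH) * B_k * N / (K + N)
    dB_k/dt = yield * r_k(pH) * B_k * N / (K + N)
    """
    nitrate = nitrate0
    biomass = list(biomass)
    trajectory = [nitrate]
    for _ in range(steps):
        uptake = [
            max_rate * ph_response(ph, opt, width) * b * nitrate / (half_sat + nitrate)
            for b, opt in zip(biomass, optima)
        ]
        # Nitrate is consumed by both groups; clamp at zero to avoid overshoot.
        nitrate = max(nitrate - dt * sum(uptake), 0.0)
        biomass = [b + dt * yield_coeff * u for b, u in zip(biomass, uptake)]
        trajectory.append(nitrate)
    return trajectory, biomass


# Two hypothetical groups with different pH optima, simulated at two pH values.
optima = [5.0, 7.5]
for ph in (5.0, 7.5):
    traj, final_biomass = simulate_nitrate(ph, biomass=[0.1, 0.1], optima=optima)
    print(f"pH {ph}: final nitrate {traj[-1]:.3f}, final biomass {final_biomass}")
```

In a sketch like this, shifting the pH changes which group dominates nitrate uptake, which is the kind of group-level mechanism the findings above attribute to the soil community. The actual functional forms and fitted parameters would have to come from the paper.
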
Figure 1: Data-driven discovery of functional groups and their dynamics (a) Microbial communities perform crucial environmental functions from the soil to the ocean to the gut. (b) In soils, microbes collectively reduce nitrate to dinitrogen gas in a process called denitrification. This process is c

This review was generated by AI and reviewed by a human editor.