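[Paper Digest] The inference of gene trees with species trees
This review surveys models that reconcile gene trees with species trees by accounting for gene duplication, loss, transfer and incomplete lineage sorting, and shows that integrating these models with models of sequence evolution improves the accuracy of gene tree inference. Its main contribution is to advocate joint inference methods that make studies of genome evolution and reconstructions of ancestral genomes more accurate.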
Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can co-exist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice-versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. In this article we review the various models that have been used to describe the relationship between gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a better basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.
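The abstract describes gene trees as being generated along the species tree by duplication, loss, transfer and allele sorting. As a rough illustration of the duplication-loss part only, the following Python sketch simulates a linear birth-death process with the Gillespie algorithm on each branch of a species tree and reports how many gene copies end up in each extant species. It is not a method from the reviewed paper: the tree encoding, the names `simulate_dl_branch` and `simulate_family`, and the rate values are hypothetical choices, and transfers, incomplete lineage sorting and the gene tree topology itself are deliberately left out.

```python
import random


def simulate_dl_branch(n_copies, dup_rate, loss_rate, length, rng):
    """Run a linear birth-death (duplication-loss) process along one branch.

    Every gene copy independently duplicates at rate `dup_rate` and is lost
    at rate `loss_rate`. Waiting times between events are exponential
    (Gillespie algorithm). Returns the number of copies at the branch end.
    """
    t = 0.0
    per_copy = dup_rate + loss_rate
    while n_copies > 0 and per_copy > 0:
        t += rng.expovariate(n_copies * per_copy)
        if t >= length:
            break  # the next event would happen after the branch ends
        if rng.random() < dup_rate / per_copy:
            n_copies += 1  # duplication: one copy becomes two
        else:
            n_copies -= 1  # loss: one copy disappears
    return n_copies


def simulate_family(tree, n_root, dup_rate, loss_rate, rng):
    """Propagate gene copy numbers from the root to the leaves.

    Hypothetical encoding: a leaf is ("name", branch_length) and an internal
    node is ((left, right), branch_length). At each speciation every
    surviving copy is inherited by both daughter lineages.
    Returns {species name: number of gene copies}.
    """
    counts = {}

    def walk(node, n_in):
        children, length = node
        n_out = simulate_dl_branch(n_in, dup_rate, loss_rate, length, rng)
        if isinstance(children, str):
            counts[children] = n_out
        else:
            for child in children:
                walk(child, n_out)

    walk(tree, n_root)
    return counts


if __name__ == "__main__":
    rng = random.Random(7)
    species_tree = ((((("human", 1.0), ("chimp", 1.0)), 0.5), ("mouse", 1.5)), 0.0)
    for _ in range(3):
        print(simulate_family(species_tree, 1, dup_rate=0.3, loss_rate=0.3, rng=rng))
```

Running it a few times shows how equal duplication and loss rates already produce very different copy numbers across species, which is exactly the kind of history a reconciliation model must explain.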
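Motivation and Goals
- Address the mismatch between gene trees (the history of individual genes) and species trees (the history of whole lineages), which frequently disagree because of biological processes such as incomplete lineage sorting, gene duplication, gene loss and horizontal transfer.
- Overcome the limitation of traditional approaches that infer gene trees in isolation and ignore the species tree context, which leads to biased or inconsistent reconstructions.
- Promote joint inference methods that reconstruct gene trees and species trees together, using both models of gene family evolution and models of sequence evolution.
- Stress the need for scalable, integrated and incremental computational frameworks to cope with the ever-growing size of genomic data.
- Advance the field by integrating further evolutionary processes, such as gene order evolution and rearrangements, into phylogenetic models, enabling more accurate reconstruction of ancestral genomes.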
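Proposed Methods
- Review and compare existing gene tree-species tree reconciliation models, including DTL (duplication-transfer-loss) and DL (duplication-loss) models.
- Integrate models of sequence evolution with models of gene family evolution to improve the accuracy of gene tree inference.
- Model gene family evolution along the species tree with birth-death processes and dynamic programming, enabling statistical inference of reconciliations.
- Propose incorporating models of adjacency and neighborhood evolution (e.g., changes in gene order) into gene tree-species tree reconciliation, to capture large-scale genomic changes.
- Advocate incremental computational frameworks that reuse information from previous analyses, reducing redundant computation in large-scale genome projects.
- Explore integrating breakpoint detection models (e.g., HMM-based models) into models of gene family evolution, to detect phylogenetic discordance at genome scale.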
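The reconciliation models reviewed here range from parsimony to full probabilistic (birth-death) models. The simplest DL reconciliation, and a useful mental model for the dynamic-programming variants, is LCA mapping: each gene tree node is mapped to the lowest common ancestor of the species below it, a node mapped to the same species node as one of its children is a duplication, and species nodes skipped along a gene lineage imply losses. The Python sketch below implements that classic parsimony scheme under stated assumptions: both trees are rooted, binary and encoded as nested tuples of leaf names, and the species of every gene leaf is known. It ignores transfers and incomplete lineage sorting, the function names are hypothetical, and it is not the probabilistic approach the paper advocates.

```python
def index_species_tree(tree):
    """Record parent and depth for every node of a rooted binary species tree.

    Leaves are species names (str); internal nodes are 2-tuples of subtrees.
    Nodes are used directly as dictionary keys, so leaf names must be unique.
    """
    parent, depth = {}, {}

    def walk(node, par, d):
        parent[node] = par
        depth[node] = d
        if not isinstance(node, str):
            for child in node:
                walk(child, node, d + 1)

    walk(tree, None, 0)
    return parent, depth


def lca(a, b, parent, depth):
    """Lowest common ancestor of two species-tree nodes (naive climb)."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def reconcile_dl(gene_tree, species_tree, species_of):
    """Parsimonious DL reconciliation by LCA mapping.

    Returns (number of duplications, number of losses).
    """
    parent, depth = index_species_tree(species_tree)
    dups = losses = 0

    def walk(g):
        nonlocal dups, losses
        if isinstance(g, str):
            return species_of[g]  # gene leaf -> species leaf
        s_left, s_right = walk(g[0]), walk(g[1])
        s = lca(s_left, s_right, parent, depth)
        is_dup = s in (s_left, s_right)
        if is_dup:
            dups += 1
        for s_child in (s_left, s_right):
            # Species levels skipped between s and the child's mapping are
            # lineages where a copy was lost; below a speciation node the
            # first level is the child's own lineage, so it is not a loss.
            losses += depth[s_child] - depth[s] - (0 if is_dup else 1)
        return s

    walk(gene_tree)
    return dups, losses


if __name__ == "__main__":
    species = (("human", "chimp"), "mouse")
    species_of = {"h1": "human", "h2": "human", "c1": "chimp",
                  "c2": "chimp", "m1": "mouse"}
    print(reconcile_dl(((("h1", "c1"), ("h2", "c2")), "m1"), species, species_of))  # → (1, 0)
    print(reconcile_dl((("h1", "c1"), ("h2", "m1")), species, species_of))  # → (1, 2)
```

The first gene tree is explained by one duplication in the human-chimp ancestor with no loss; the second needs a duplication at the root followed by two losses, one copy lost in mouse and the other in chimp.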
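Experimental Results
Research Questions
- RQ1: When the species tree is known, to what extent do gene tree-species tree reconciliation models improve the accuracy of gene tree inference?
- RQ2: To what extent do incomplete lineage sorting, gene duplication, gene loss and horizontal transfer cause topological discordance between gene trees and species trees?
- RQ3: Does integrating models of sequence evolution with models of gene family evolution yield more accurate and more consistent gene tree reconstructions?
- RQ4: What computational and conceptual challenges arise when jointly inferring gene trees and species trees on large-scale genomic data sets?
- RQ5: How can models of genome rearrangement and gene neighborhood evolution be integrated into gene tree-species tree reconciliation to improve ancestral genome reconstruction?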
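RQ2 asks how much incomplete lineage sorting alone can make gene trees disagree with the species tree. For three species, a classic multispecies-coalescent result gives a closed form: if the internal branch of the species tree lasts t coalescent units, the two lineages entering it fail to coalesce there with probability e^{-t}, and in that case each of the three rooted topologies is equally likely, so a gene tree is discordant with probability (2/3)e^{-t}. The Python sketch below evaluates and inverts this formula; it assumes neutral loci without recombination, gene flow or any other source of discordance, and the function names are hypothetical.

```python
import math


def p_discordant(t):
    """Probability that a rooted three-taxon gene tree disagrees with the
    species tree under the multispecies coalescent.

    `t` is the internal branch length in coalescent units (generations
    divided by 2N for a diploid population of size N).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # No coalescence on the internal branch (prob. exp(-t)), then 2 of the
    # 3 equally likely topologies are discordant.
    return (2.0 / 3.0) * math.exp(-t)


def branch_for_discordance(p):
    """Internal branch length (coalescent units) that yields discordance p."""
    if not 0.0 < p <= 2.0 / 3.0:
        raise ValueError("p must lie in (0, 2/3]")
    return math.log((2.0 / 3.0) / p)


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 2.0):
        print(f"t = {t:.1f}: P(discordant) = {p_discordant(t):.3f}")
    print(f"30% discordance needs t = {branch_for_discordance(0.3):.2f}")
```

Under these idealized assumptions, a discordance level of about 30%, the figure this digest cites for the human genome, would correspond to an internal branch of ln(20/9), roughly 0.8 coalescent units, which illustrates how short ancestral branches alone can produce widespread gene tree discordance.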
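Key Findings
- Simulation and empirical studies show that combining gene tree-species tree models with models of sequence evolution substantially improves the accuracy of gene tree reconstruction.
- Gene trees inferred under a species tree constraint are more consistent and have lower error rates. This matters especially under incomplete lineage sorting: up to 30% of the human genome may have a history that disagrees with the species tree.
- Current DTL and DL models may be biased because events affecting several genes at once are frequent, while these models handle each gene family independently; integrating models of neighborhood evolution could reduce this bias.
- Integrating the evolution of gene order and adjacencies into reconciliation models makes it possible to reconstruct ancestral genome structure, such as ancestral chromosomes and gene neighborhoods.
- Scalable, incremental algorithms that reuse earlier results are urgently needed, because current methods usually recompute gene families, alignments and trees from scratch for every new data set.
- Future methods must balance richer models that better reflect biological reality against scalability to large genomic data sets; striking this balance is a key challenge for comparative genomics.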
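Reconstructing ancestral gene neighborhoods starts from adjacencies, that is, pairs of genes that are neighbors on a chromosome. Methods that integrate adjacency evolution with reconciliation work on gene trees; as a much simpler stand-in, the Python sketch below treats a single adjacency as a presence/absence character on the species tree and applies Fitch parsimony to decide in which ancestors it was present. This swapped-in technique ignores duplications, losses and transfers entirely, and the tree encoding and the name `fitch_binary` are hypothetical.

```python
def fitch_binary(tree, leaf_state):
    """Fitch small parsimony for one presence/absence character.

    `tree` is a rooted binary species tree (leaves are names, internal nodes
    are 2-tuples); `leaf_state` maps each leaf to True/False. Returns the
    minimum number of gains plus losses and one optimal state per internal
    node. A tie at the root is broken towards absence (False).
    """
    candidates = {}
    changes = 0

    def up(node):
        nonlocal changes
        if isinstance(node, str):
            candidates[node] = {leaf_state[node]}
            return candidates[node]
        left, right = up(node[0]), up(node[1])
        if left & right:
            candidates[node] = left & right
        else:
            candidates[node] = left | right
            changes += 1  # the two subtrees disagree: one change is needed
        return candidates[node]

    assigned = {}

    def down(node, parent_state):
        if isinstance(node, str):
            return
        options = candidates[node]
        # Keep the parent's state when it is optimal here; otherwise the
        # option set is a single state (or we are at the root).
        state = parent_state if parent_state in options else min(options)
        assigned[node] = state
        down(node[0], state)
        down(node[1], state)

    up(tree)
    down(tree, None)
    return changes, assigned


if __name__ == "__main__":
    species = ((("human", "chimp"), "mouse"), "chicken")
    present = {"human": True, "chimp": True, "mouse": False, "chicken": False}
    n_changes, ancestors = fitch_binary(species, present)
    print(n_changes, ancestors)
```

In the example, the adjacency is shared by human and chimp only, so the most parsimonious history has a single gain on the branch leading to their common ancestor and absence in all older ancestors.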
This digest was generated by AI and reviewed by human editors.