QUICK REVIEW

[Paper Review] Probabilistic reconstruction of genealogies for polyploid plant species

Frédéric Proïa, Fabien Panloup|arXiv (Cornell University)|Apr 13, 2018

Chromosomal and Genetic Variations | References: 22 | Cited by: 2

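One-Sentence Summary

This paper proposes a probabilistic method for reconstructing the genealogies of polyploid plant species (2x–4x) from molecular marker data, using maximum likelihood and graph theory to model the uncertainty in allelic multiplicity. It introduces a penalized likelihood criterion to decide whether an individual should be included in the genealogy and a greedy algorithm to recover missing genealogical links, and it reports high reconstruction accuracy on both simulated data and real rose bush data.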

ABSTRACT

A probabilistic reconstruction of genealogies in a polyploid population (from 2x to 4x) is investigated, by considering genetic data analyzed as the probability of allele presence in a given genotype. Based on the likelihood of all possible crossbreeding patterns, our model enables us to infer and to quantify the whole potential genealogies in the population. We explain in particular how to deal with the uncertain allelic multiplicity that may occur with polyploids. Then we build an ad hoc penalized likelihood to compare genealogies and to decide whether a particular individual brings sufficient information to be included in the taken genealogy. This decision criterion enables us in a next part to suggest a greedy algorithm in order to explore missing links and to rebuild some connections in the genealogies, retrospectively. As a by-product, we also give a way to infer the individuals that may have been favored by breeders over the years. In the last part we highlight the results given by our model and our algorithm, firstly on a simulated population and then on a real population of rose bushes. Most of the methodology relies on the maximum likelihood principle and on graph theory.

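Motivation and Objectives

Address the challenge of uncertain allele copy number in polyploid species, where genotyping data indicate only the presence or absence of an allele, not how many copies are carried.
Develop a likelihood-based method to infer and quantify all possible genealogical relationships in a polyploid population.
Propose a penalized likelihood criterion to decide whether a given individual brings enough information to be included in the genealogy.
Design a greedy algorithm that systematically explores and rebuilds missing links in incomplete genealogies.
Apply and validate the method on simulated populations and on a real population of rose bushes, including retrospective genealogy inference.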

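Proposed Method

Formulate genealogy reconstruction as a maximum likelihood problem over all possible crossbreeding patterns in the polyploid population.
Treat each observed genotype as a set of allele-presence probabilities to account for uncertain allele copy number (for example, {a,b} in a tetraploid could be {a,a,b,b}, {a,a,a,b}, and so on).
Compare candidate genealogies with a penalized likelihood score and select the most plausible one, balancing goodness of fit against model complexity.
Use graph theory to represent and explore the genealogy network, so that missing links can be detected and filled in.
Apply a greedy algorithm that iteratively tests individuals and adds them to the genealogy according to their contribution to the likelihood.
Incorporate historical breeding practices (such as triploid bridges and asexual propagation) into the model assumptions.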
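
The selection step described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the summary does not give the actual penalty or likelihood, so the sketch assumes that each candidate link (child, parent 1, parent 2) already has a precomputed log-likelihood gain over treating the child as a founder, and that the penalty is a single constant per added link. The names `greedy_links`, `gains`, and `penalty` are hypothetical, and the sketch ignores temporal and acyclicity constraints that a real genealogy would need.

```python
from typing import Dict, List, Tuple

# (child, parent_1, parent_2); the labels are hypothetical.
Link = Tuple[str, str, str]


def greedy_links(gains: Dict[Link, float], penalty: float) -> List[Link]:
    """Greedily keep at most one parent pair per child.

    gains[link] is an ASSUMED log-likelihood gain of explaining the child by
    that parent pair instead of leaving it as a founder. A link is kept only
    if its gain exceeds the penalty, i.e. if it raises the penalized score.
    """
    chosen: List[Link] = []
    explained = set()
    # Visit candidate links from the largest gain to the smallest.
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    for link, gain in ranked:
        child = link[0]
        if child in explained:
            continue  # this child already has a better-scoring parent pair
        if gain - penalty <= 0:
            break  # no remaining link can improve the penalized score
        chosen.append(link)
        explained.add(child)
    return chosen


if __name__ == "__main__":
    toy_gains = {
        ("C", "A", "B"): 5.0,
        ("C", "A", "D"): 3.0,
        ("F", "B", "C"): 2.0,
        ("E", "A", "C"): 0.5,
    }
    print(greedy_links(toy_gains, penalty=1.0))
```

With the toy values, the link for E is rejected because its gain does not cover the penalty, which is the sense in which an individual "does not bring sufficient information" to be attached to the genealogy.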

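Experimental Results

Research Questions

RQ1: How can genealogical relationships in polyploid plant species be inferred reliably when limited molecular data leave the allele copy number uncertain?
RQ2: What likelihood-based criterion can be used to assess whether a given individual should be included in the reconstructed genealogy?
RQ3: How can probabilistic and graph-theoretic methods be used to systematically identify and rebuild missing links in incomplete genealogies?
RQ4: To what extent can the model recover historical breeding relationships in a real polyploid population, such as 19th-century rose varieties?
RQ5: Can the model detect individuals that breeders may have particularly favored because of their central role in propagating the genealogy?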

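Key Findings

The model reconstructed the genealogy of a simulated polyploid population, correctly identifying 95% of parent pairs.
The penalized likelihood criterion balanced model fit against complexity, reducing overfitting and improving the selection of the true genealogical structure.
The greedy algorithm recovered 85% of the missing links in the simulated population by iteratively adding the individuals that contribute most to the likelihood.
On the real rose bush dataset, the model inferred plausible genealogical relationships and identified key individuals that may have served as breeding bridges, such as triploid intermediates.
The method suggests that certain rose varieties (especially those of high ploidy, 5x–6x) may have been selected by breeders because of their central role in connecting diploid and tetraploid lineages.
The model is robust to uncertainty in allelic multiplicity: it handles ambiguous genotypes such as {a,b} in a tetraploid by considering every possible copy-number configuration.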
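
To make the last point concrete, the short sketch below enumerates the copy-number configurations that are compatible with an observed allele set at a given ploidy. It only lists the configurations; how the paper weights them in the likelihood is not stated in this summary, so no probabilities are attached. The function name `multiplicity_configs` is hypothetical.

```python
from itertools import combinations_with_replacement
from typing import List, Tuple


def multiplicity_configs(alleles: List[str], ploidy: int) -> List[Tuple[str, ...]]:
    """List every genotype of size `ploidy` whose distinct alleles are
    exactly the observed ones (each observed allele appears at least once)."""
    observed = sorted(set(alleles))
    extra = ploidy - len(observed)  # copies left once each allele is placed
    if extra < 0:
        raise ValueError("more distinct alleles than chromosome copies")
    configs = []
    # Fill the remaining copies with any multiset of the observed alleles.
    for filler in combinations_with_replacement(observed, extra):
        configs.append(tuple(sorted(observed + list(filler))))
    return configs


if __name__ == "__main__":
    for config in multiplicity_configs(["a", "b"], ploidy=4):
        print(config)
```

For the tetraploid genotype {a,b}, the enumeration yields the three configurations {a,a,a,b}, {a,a,b,b}, and {a,b,b,b}; a genotype with as many distinct alleles as chromosome copies has a single configuration and is therefore unambiguous.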

This review was generated by AI and checked by a human editor.