QUICK REVIEW

[Paper review] Fréchet ChemblNet Distance: A metric for generative models for molecules.

Kristina Preuer, Philipp Renz|arXiv (Cornell University)|Mar 26, 2018

Computational Drug Discovery Methods | References: 15 | Cited by: 4

ONE-SENTENCE SUMMARY

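This paper proposes the Fréchet ChemblNet Distance (FCD), a new metric for evaluating generative models for molecular design. FCD uses features from the penultimate layer of ChemblNet, a deep neural network trained to predict drug activities, to measure how far a set of generated molecules departs from real molecules in chemical and biological similarity and in diversity. It offers a single, robust alternative to existing metrics, which are applied inconsistently and are easy to game.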

ABSTRACT

The new wave of successful generative models in machine learning has increased the interest in deep learning driven de novo drug design. However, assessing the performance of such generative models is notoriously difficult. Metrics that are typically used to assess the performance of such generative models are the percentage of chemically valid molecules or the similarity to real molecules in terms of particular descriptors, such as the partition coefficient (logP) or druglikeness. However, method comparison is difficult because of the inconsistent use of evaluation metrics, the necessity for multiple metrics, and the fact that some of these measures can easily be tricked by simple rule-based systems. We propose a novel distance measure between two sets of molecules, called Fréchet ChemblNet distance (FCD), that can be used as an evaluation metric for generative models. The FCD is similar to a recently established performance metric for comparing image generation methods, the Fréchet Inception Distance (FID). Whereas the FID uses one of the hidden layers of InceptionNet, the FCD utilizes the penultimate layer of a deep neural network called ChemblNet, which was trained to predict drug activities. Thus, the FCD metric takes into account chemically and biologically relevant information about molecules, and also measures the diversity of the set via the distribution of generated molecules. The FCD's advantage over previous metrics is that it can detect if generated molecules are a) diverse and have similar b) chemical and c) biological properties as real molecules. We further provide an easy-to-use implementation that only requires the SMILES representation of the generated molecules as input to calculate the FCD. Implementations are available at: this https URL
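
For reference, the Fréchet distance that FID uses, and that FCD carries over to ChemblNet activations, has a closed form when each set of activations is modeled as a multivariate Gaussian. Writing m, C for the mean and covariance of the activations of the real molecules and m_w, C_w for those of the generated molecules (this notation is ours, chosen for illustration):

```latex
d^{2}\big((\mathbf{m}, C), (\mathbf{m}_w, C_w)\big)
  = \lVert \mathbf{m} - \mathbf{m}_w \rVert_2^{2}
  + \operatorname{Tr}\!\big(C + C_w - 2\,(C\,C_w)^{1/2}\big)
```

The first term compares where the two sets sit on average in feature space; the trace term compares their spread, which is how a distribution-level distance also reflects the diversity of the generated set.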

MOTIVATION AND OBJECTIVES

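Address the lack of consistent, reliable metrics for evaluating generative models in de novo drug design.
Overcome the limitations of existing metrics such as validity, logP, or drug-likeness, which are easy to game and lack biological relevance.
Develop a single, comprehensive metric that captures both the diversity of generated molecules and their similarity to real molecules in chemical and biological space.
Provide a practical, easy-to-use evaluation tool that needs only the SMILES strings of the generated molecules as input.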

PROPOSED METHOD

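The FCD is the Fréchet distance between multivariate Gaussian distributions fitted to ChemblNet's penultimate-layer representations.
ChemblNet is a deep neural network pretrained on drug-activity prediction, so its embeddings of molecules carry biological information.
Latent vectors for the real and the generated molecules are taken from ChemblNet's penultimate layer, capturing both structural and activity-related features.
The Fréchet distance between the empirical distributions of these vectors, each summarized by its mean and covariance, measures how similar the two sets are in this high-dimensional space.
No additional model training is required; the evaluation relies only on the SMILES strings of the generated molecules.
An open-source implementation is provided so the metric can be plugged into existing generative-modeling pipelines.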
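
The steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' released implementation: it assumes the ChemblNet penultimate-layer activations have already been computed and are passed in as NumPy arrays of shape (n_molecules, n_features), and the function names frechet_distance and fcd_from_activations are hypothetical. Instead of a general matrix square root, it obtains the trace of (C C_w)^{1/2} through symmetric eigendecompositions, using the identity Tr((C C_w)^{1/2}) = Tr((C^{1/2} C_w C^{1/2})^{1/2}).

```python
import numpy as np


def frechet_distance(mu1, cov1, mu2, cov2):
    """Squared Fréchet distance between N(mu1, cov1) and N(mu2, cov2)."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    cov1 = np.atleast_2d(np.asarray(cov1, dtype=float))
    cov2 = np.atleast_2d(np.asarray(cov2, dtype=float))

    # Symmetric square root of cov1 via eigendecomposition (clip tiny
    # negative eigenvalues caused by rounding).
    w1, v1 = np.linalg.eigh(cov1)
    sqrt_cov1 = (v1 * np.sqrt(np.clip(w1, 0.0, None))) @ v1.T

    # Tr((cov1 cov2)^{1/2}) equals Tr((sqrt_cov1 cov2 sqrt_cov1)^{1/2});
    # the inner matrix is symmetric PSD, so its eigenvalues are real and >= 0.
    inner = sqrt_cov1 @ cov2 @ sqrt_cov1
    tr_sqrt = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()

    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)


def fcd_from_activations(act_real, act_gen):
    """FCD-style score from two (n_molecules, n_features) activation arrays.

    Hypothetical helper: in the real pipeline the activations would come
    from ChemblNet's penultimate layer applied to each set of SMILES.
    """
    act_real = np.asarray(act_real, dtype=float)
    act_gen = np.asarray(act_gen, dtype=float)
    mu_r, cov_r = act_real.mean(axis=0), np.cov(act_real, rowvar=False)
    mu_g, cov_g = act_gen.mean(axis=0), np.cov(act_gen, rowvar=False)
    return frechet_distance(mu_r, cov_r, mu_g, cov_g)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for penultimate-layer activations (hypothetical data).
    real = rng.normal(size=(1000, 8))
    generated = rng.normal(loc=0.1, size=(1000, 8))
    print(fcd_from_activations(real, generated))
```

Identical activation sets give a distance of (numerically) zero, a pure shift of the mean adds exactly the squared length of the shift, and collapsing all generated activations onto one point leaves the full trace of the reference covariance as a penalty.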

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

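RQ1: Can a single metric effectively assess both the diversity and the chemical/biological similarity of molecules generated for de novo drug design?
RQ2: How does FCD compare with traditional metrics such as validity, logP, or drug-likeness at detecting distribution shifts and model failures?
RQ3: To what extent can FCD detect the chemical plausibility and biological relevance of generated molecules without relying on hand-crafted descriptors?
RQ4: Is FCD robust to simple rule-based generation strategies that can fool traditional metrics?
RQ5: Can FCD serve as a reliable, unified benchmark for generative models across different datasets and model architectures?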

KEY FINDINGS

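By building on the representations learned by ChemblNet, FCD captures the chemical and biological relevance of generated molecules.
The metric detects distribution shifts and model failures that traditional metrics such as logP or drug-likeness often miss.
Compared with a collection of separate metrics, FCD gives a more robust and consistent evaluation and is less likely to be misled by simple rule-based systems.
Different generative models can be compared directly, without access to their architectures or training data.
Because the open-source implementation computes FCD from SMILES strings alone, results are easier to reproduce and the metric is easier to adopt.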
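
A toy illustration of why a distribution-level distance resists being gamed where a comparison of averages does not: for one-dimensional Gaussians the Fréchet distance reduces to (mu_1 - mu_2)^2 + (sigma_1 - sigma_2)^2. The sketch below uses hypothetical values of a single descriptor and a hypothetical helper frechet_1d; it uses the population standard deviation as an assumption. A "collapsed" set that reproduces the reference mean with no spread matches on the average but is still penalized.

```python
import statistics


def frechet_1d(xs, ys):
    """Squared Fréchet distance between 1-D Gaussians fitted to xs and ys."""
    mu_x, mu_y = statistics.fmean(xs), statistics.fmean(ys)
    # Population standard deviation (an assumption; sample stdev also works).
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)
    return (mu_x - mu_y) ** 2 + (sd_x - sd_y) ** 2


# Hypothetical values of one descriptor for three sets of molecules.
reference = [0.0, 1.0, 2.0, 3.0, 4.0]
collapsed = [2.0, 2.0, 2.0, 2.0, 2.0]  # same mean, no diversity
diverse = [0.5, 1.0, 2.0, 3.0, 3.5]    # same mean, somewhat narrower spread

for name, sample in [("collapsed", collapsed), ("diverse", diverse)]:
    mean_gap = abs(statistics.fmean(reference) - statistics.fmean(sample))
    print(f"{name}: mean gap = {mean_gap}, Fréchet = {frechet_1d(reference, sample):.3f}")
```

Both sets have a mean gap of zero, so an average-only check cannot tell them apart, while the Fréchet distance is clearly larger for the collapsed set.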

This review was generated by AI and checked by human editors.