QUICK REVIEW

[Paper Review] Sample Efficiency Matters: A Benchmark for Practical Molecular Optimization

Wenhao Gao, Tianfan Fu | arXiv (Cornell University) | Jun 22, 2022

Computational Drug Discovery Methods | Cited by 56

ONE-SENTENCE SUMMARY

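PMO evaluates 25 molecular design methods on 23 oracle functions under a strict budget of 10,000 oracle calls, showing that many recent methods do not outperform older baselines and that sample efficiency is critical.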

ABSTRACT

Molecular optimization is a fundamental goal in the chemical sciences and is of central interest to drug and material design. In recent years, significant progress has been made in solving challenging problems across various aspects of computational molecular optimizations, emphasizing high validity, diversity, and, most recently, synthesizability. Despite this progress, many papers report results on trivial or self-designed tasks, bringing additional challenges to directly assessing the performance of new methods. Moreover, the sample efficiency of the optimization (the number of molecules evaluated by the oracle) is rarely discussed, despite being an essential consideration for realistic discovery applications. To fill this gap, we have created an open-source benchmark for practical molecular optimization, PMO, to facilitate the transparent and reproducible evaluation of algorithmic advances in molecular optimization. This paper thoroughly investigates the performance of 25 molecular design algorithms on 23 tasks with a particular focus on sample efficiency. Our results show that most "state-of-the-art" methods fail to outperform their predecessors under a limited oracle budget allowing 10K queries and that no existing algorithm can efficiently solve certain molecular optimization problems in this setting. We analyze the influence of the optimization algorithm choices, molecular assembly strategies, and oracle landscapes on the optimization performance to inform future algorithm development and benchmarking. PMO provides a standardized experimental setup to comprehensively evaluate and compare new molecule optimization methods with existing ones. All code can be found at https://github.com/wenhao-gao/mol_opt.

MOTIVATION AND GOALS

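Evaluate a broad range of molecular optimization algorithms under a realistic oracle budget (at most 10,000 evaluations).
Provide a standardized, reproducible benchmark (PMO) with diverse oracle landscapes for comparing methods.
Identify how algorithm choices, molecular assembly strategies, and oracle landscapes affect performance.
Promote transparency through independent repeated runs and extensive hyperparameter tuning, and offer guidance for future method development.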

PROPOSED METHOD

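Formulate a generic single-objective molecular optimization setting: search a large chemical space for molecules that maximize a scalar objective (the oracle).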
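Categorize molecular assembly strategies (SMILES, SELFIES, graph-based, synthesis-based) and optimization algorithms: genetic algorithms (GA), Monte Carlo tree search (MCTS), Bayesian optimization (BO), variational autoencoders (VAE), generative adversarial networks (GAN), score-based modeling (SBM), GFlowNet, reinforcement learning (RL), hill climbing (HC), gradient ascent (GRAD), and others.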
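Evaluate 25 methods on 23 oracle functions (QED, DRD2, GSK3β, JNK3, and GuacaMol-derived tasks such as MPOs), with scores normalized to [0, 1].
Use AUC Top-10 (the area under the curve of the mean property value of the top-10 molecules versus the number of oracle calls) as the primary sample-efficiency metric, with a cap of 10,000 oracle calls.
Run multiple independent trials and re-tune hyperparameters so that comparisons are robust.
Release open-source code and a standardized experimental protocol for reproducibility.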
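The budget accounting and the AUC Top-10 metric described above can be illustrated with a minimal Python sketch. It is not code from the mol_opt repository: the names BudgetedOracle, BudgetExhausted, and auc_top_k are hypothetical, and three conventions are assumptions made for illustration: repeated molecules are served from a cache without consuming budget, the area is approximated with one unit-width step per oracle call, and a run that stops early carries its last top-k mean forward to the full budget. Because every oracle score lies in [0, 1], dividing the area by the budget keeps the metric in [0, 1].

```python
import heapq
from typing import Callable, Dict, List


class BudgetExhausted(Exception):
    """Raised when an optimizer requests more unique oracle calls than allowed."""


class BudgetedOracle:
    """Hypothetical oracle wrapper (not the mol_opt API).

    Assumption: scores are cached per SMILES string, so only unique molecules
    count against the budget.
    """

    def __init__(self, score_fn: Callable[[str], float], max_calls: int = 10_000) -> None:
        self.score_fn = score_fn
        self.max_calls = max_calls
        self.cache: Dict[str, float] = {}
        self.history: List[float] = []  # scores of unique molecules, in query order

    def __call__(self, smiles: str) -> float:
        if smiles in self.cache:  # repeated query: served from cache, no budget used
            return self.cache[smiles]
        if len(self.history) >= self.max_calls:
            raise BudgetExhausted(f"oracle budget of {self.max_calls} calls is used up")
        score = self.score_fn(smiles)
        self.cache[smiles] = score
        self.history.append(score)
        return score


def auc_top_k(history: List[float], k: int = 10, max_calls: int = 10_000) -> float:
    """Normalized area under the top-k-mean curve versus number of oracle calls.

    After each call the curve takes the mean of the k best scores seen so far
    (or of all scores while fewer than k exist). The area is a sum of
    unit-width steps; an early-stopped run carries its last value forward to
    max_calls (assumed convention). Dividing by max_calls keeps [0, 1] scores
    in [0, 1].
    """
    best: List[float] = []  # min-heap holding the k best scores so far
    best_sum = 0.0
    current = 0.0  # top-k mean after the latest call
    area = 0.0
    calls = 0
    for score in history[:max_calls]:
        if len(best) < k:
            heapq.heappush(best, score)
            best_sum += score
        elif score > best[0]:
            # Replace the worst of the current k best and update the running sum.
            best_sum += score - heapq.heapreplace(best, score)
        current = best_sum / len(best)
        area += current
        calls += 1
    area += (max_calls - calls) * current  # pad an early-stopped run to the budget
    return area / max_calls
```

For example, after an optimizer has queried oracle = BudgetedOracle(score_fn) on its candidates, auc_top_k(oracle.history) scores the whole run. Because the metric integrates over calls, an optimizer that finds good molecules early earns a larger area than one that reaches the same final top-10 only near the end of the budget.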

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

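RQ1: How do different molecular assembly strategies and optimization algorithms perform under a fixed, realistic oracle budget (at most 10,000 calls)?
RQ2: When evaluated with a sample-efficiency metric such as AUC Top-10, do newer, purportedly state-of-the-art methods actually outperform older baselines?
RQ3: How do oracle landscapes (isomer-based, similarity-based, and MPO-based) affect method performance, and which methods suit which landscapes best?
RQ4: Which factors (hyperparameter tuning, randomness, model-based versus model-free methods) most affect sample efficiency in practical molecular optimization?
RQ5: What guidance does a standardized benchmark such as PMO offer for designing future molecular optimization algorithms and benchmarking protocols?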

KEY FINDINGS

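Within 10,000 oracle calls, none of the methods studied can efficiently solve certain de novo molecular optimization tasks; only a few simple tasks are solved efficiently.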
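Under the PMO protocol, older methods such as REINVENT and Graph GA often outperform newer ones, underscoring the value of strong baselines.
String-based genetic algorithms (e.g., SMILES GA, STONED) perform well on isomer-type tasks, whereas model-based methods give mixed results depending on surrogate quality and design.
Model-based optimization can improve sample efficiency but requires careful design of the inner and outer loops; simply adding a predictive model does not guarantee gains.
SELFIES does not consistently outperform SMILES in optimization ability or sample efficiency, although SELFIES-based GAs show an advantage when actions are defined at the token level.
Re-tuning hyperparameters is essential; the default settings from the original papers often underperform in the limited-budget setting.
Hyperparameter sensitivity and run-to-run nondeterminism make multiple independent trials necessary for robust benchmarking.
PMO highlights that progress in molecular optimization requires standardized reporting, extensive hyperparameter tuning, and reproducibility.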
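Because results vary from run to run, a benchmark score is best reported as a mean and spread over independent seeds rather than as a single run. The sketch below assumes a hypothetical run(seed) callable that performs one full optimization and returns its AUC Top-10; toy_random_search is a stand-in that maximizes over plain random numbers, not a molecular optimizer, and the seed list is arbitrary rather than the number of trials used in PMO.

```python
import random
from statistics import mean, stdev
from typing import Callable, Iterable, Tuple


def repeat_trials(run: Callable[[int], float], seeds: Iterable[int]) -> Tuple[float, float]:
    """Run one independent trial per seed; return (mean, sample std) of the scores."""
    scores = [run(seed) for seed in seeds]
    if len(scores) < 2:
        raise ValueError("need at least two independent trials to estimate spread")
    return mean(scores), stdev(scores)


def toy_random_search(seed: int, budget: int = 100) -> float:
    """Hypothetical stand-in optimizer: best of `budget` uniform draws in [0, 1).

    Each trial owns a seeded RNG, so scores differ across seeds but are
    reproducible for a fixed seed.
    """
    rng = random.Random(seed)
    return max(rng.random() for _ in range(budget))
```

Calling repeat_trials(toy_random_search, range(5)) returns the same (mean, std) pair every time, so the spread itself is reproducible; reporting only the best seed would hide exactly the run-to-run variance the findings above warn about.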

This review was generated by AI and reviewed by human editors.