QUICK REVIEW

[Paper Review] Re-evaluating Retrosynthesis Algorithms with Syntheseus

Krzysztof Maziarz, Austin Tripp|arXiv (Cornell University)|Oct 30, 2023

Machine Learning in Materials Science | Cited by 8

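One-Sentence Summary

The paper introduces syntheseus, a benchmarking library for consistent evaluation of single-step and multi-step retrosynthesis, and uses it to re-evaluate existing methods, showing that model rankings change under careful evaluation.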

ABSTRACT

Automated Synthesis Planning has recently re-emerged as a research area at the intersection of chemistry and machine learning. Despite the appearance of steady progress, we argue that imperfect benchmarks and inconsistent comparisons mask systematic shortcomings of existing techniques, and unnecessarily hamper progress. To remedy this, we present a synthesis planning library with an extensive benchmarking framework, called syntheseus, which promotes best practice by default, enabling consistent meaningful evaluation of single-step models and multi-step planning algorithms. We demonstrate the capabilities of syntheseus by re-evaluating several previous retrosynthesis algorithms, and find that the ranking of state-of-the-art models changes in controlled evaluation experiments. We end with guidance for future works in this area, and call the community to engage in the discussion on how to improve benchmarks for synthesis planning.

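Motivation and Objectives

Promote better evaluation practices for retrosynthesis, given that current benchmarks and comparisons are inconsistent.
Provide a standardized, extensible evaluation framework (syntheseus) that enforces best practice by default.
Re-evaluate existing single-step and multi-step retrosynthesis methods to show how rankings change under careful end-to-end evaluation.
Offer guidance for future retrosynthesis research and evaluation, grounded in a systematic analysis.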

Proposed Method

Introduce syntheseus as a modular, model-agnostic evaluation platform for retrosynthesis.
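
To make "model-agnostic" concrete, here is a minimal Python sketch of how an evaluation harness could treat every single-step model through one shared interface and memoize its calls. This is not the syntheseus API: `SingleStepModel`, `CachedModel`, and `ToyTemplateModel` are hypothetical names invented for illustration, and the toy model simply returns hard-coded reactant SMILES.

```python
from typing import Dict, List, Protocol, Tuple


class SingleStepModel(Protocol):
    """Hypothetical interface: product SMILES in, ranked reactant sets out."""

    def predict(self, product: str, num_results: int) -> List[str]: ...


class CachedModel:
    """Wraps any object exposing `predict` and memoizes its calls."""

    def __init__(self, model: SingleStepModel) -> None:
        self._model = model
        self._cache: Dict[Tuple[str, int], List[str]] = {}
        self.num_model_calls = 0  # counts real (uncached) calls only

    def predict(self, product: str, num_results: int) -> List[str]:
        key = (product, num_results)
        if key not in self._cache:
            self.num_model_calls += 1
            self._cache[key] = self._model.predict(product, num_results)
        # Return a copy so callers cannot mutate the cached list.
        return list(self._cache[key])


class ToyTemplateModel:
    """Stand-in model with hard-coded answers, for illustration only."""

    _TABLE = {"CC(=O)OC": ["CC(=O)O.CO", "CC(=O)Cl.CO"]}

    def predict(self, product: str, num_results: int) -> List[str]:
        return self._TABLE.get(product, [])[:num_results]


model = CachedModel(ToyTemplateModel())
first = model.predict("CC(=O)OC", 1)
second = model.predict("CC(=O)OC", 1)  # served from the cache
print(first, model.num_model_calls)
```

Because the harness depends only on the `predict` signature, a template-based model and a sequence decoder could be swapped in without changing the evaluation code, and a search algorithm that revisits the same molecule would not pay for a second model call.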

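Experimental Results

Research Questions

RQ1: What are the pitfalls of current evaluation practice for single-step and multi-step retrosynthesis?
RQ2: How does a standardized benchmarking pipeline affect the reported performance and ranking of retrosynthesis models?
RQ3: Which best practices should the field adopt to evaluate CASP (computer-aided synthesis planning) systems fairly and end to end?
RQ4: Does re-evaluation with syntheseus correct or revise previously reported results and rankings?
RQ5: What guidance can be offered for future work on retrosynthesis evaluation?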

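Key Findings

Syntheseus enables consistent end-to-end evaluation of retrosynthesis methods and reveals changes in model rankings relative to the prior literature.
Re-evaluation shows that some metrics improve relative to the values reported in the literature, owing to consistent post-processing, deduplication, and validity checks.
Single-step models differ in their speed-accuracy trade-offs; models that output graph transformations tend to outperform purely decoder-based approaches at higher top-k accuracy.
Multi-step search results depend on the fixed single-step model and the evaluation setup, underscoring the need for fair baselines and controlled comparisons.
The study highlights the limitations of recall-based metrics and argues for reporting inference time and diversity to better reflect end-to-end CASP performance.
Best practices include precision-focused evaluation, deduplicating outputs, checking molecule validity, caching model calls, and incorporating qualitative assessment by experts.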
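
The effect of post-processing on a reported metric can be seen in a small Python sketch. It filters a ranked prediction list for validity, removes duplicates, and then computes top-k accuracy. It is an illustration under stated assumptions, not code from the paper or from syntheseus: `postprocess`, `top_k_accuracy`, and `toy_is_valid` are hypothetical names, the SMILES strings are assumed to be already canonical, and `toy_is_valid` only checks balanced parentheses (a real pipeline would typically parse each molecule with a cheminformatics toolkit such as RDKit).

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def reactant_key(smiles: str) -> Tuple[str, ...]:
    """Order-insensitive key, so "A.B" and "B.A" count as one reactant set."""
    return tuple(sorted(smiles.split(".")))


def toy_is_valid(smiles: str) -> bool:
    """Toy validity check (assumption): parentheses must balance."""
    return smiles.count("(") == smiles.count(")")


def postprocess(predictions: Iterable[str],
                is_valid: Callable[[str], bool]) -> List[str]:
    """Drop invalid predictions and duplicates, preserving rank order."""
    seen = set()
    cleaned = []
    for smiles in predictions:
        key = reactant_key(smiles)
        if is_valid(smiles) and key not in seen:
            seen.add(key)
            cleaned.append(smiles)
    return cleaned


def top_k_accuracy(ranked: Sequence[Sequence[str]],
                   truths: Sequence[str], k: int) -> float:
    """Fraction of targets whose true reactants appear in the first k predictions."""
    if not truths:
        raise ValueError("need at least one ground-truth entry")
    hits = sum(
        1 for preds, truth in zip(ranked, truths)
        if reactant_key(truth) in {reactant_key(p) for p in preds[:k]}
    )
    return hits / len(truths)


# Raw output for one product: a duplicate in rank 2 and an invalid string in rank 3.
raw = ["CC(=O)O.CO", "CO.CC(=O)O", "C(C", "CC(=O)Cl.CO"]
truth = "CC(=O)Cl.CO"
cleaned = postprocess(raw, toy_is_valid)

print(top_k_accuracy([raw], [truth], k=2))      # 0.0
print(top_k_accuracy([cleaned], [truth], k=2))  # 1.0
```

In this toy case the correct answer moves from rank 4 to rank 2 once the duplicate and the invalid string are removed, so the same model scores differently depending only on whether post-processing is applied. This is the kind of inconsistency that makes numbers from differently configured evaluations hard to compare.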

This review was generated by AI and reviewed by a human editor.