QUICK REVIEW

[Paper Review] Bayesian Optimization of Catalysis With In-Context Learning

Mayk Caldas Ramos, Shane S. Michtavy|arXiv (Cornell University)|Apr 11, 2023
Machine Learning in Materials Science | Cited by 30
One-Sentence Summary

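This paper shows that in-context learning (ICL) with frozen large language models can perform regression with uncertainty estimates, which enables Bayesian optimization for catalyst design and synthesis-condition prediction without any training. It compares prompting strategies and model families on solubility and C2 yield tasks, and shows that Bayesian optimization is feasible with both ICL and baseline surrogate models.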

ABSTRACT

Large language models (LLMs) can perform accurate classification with zero or few examples through in-context learning. We extend this capability to regression with uncertainty estimation using frozen LLMs (e.g., GPT-3.5, Gemini), enabling Bayesian optimization (BO) in natural language without explicit model training or feature engineering. We apply this to materials discovery by representing experimental catalyst synthesis and testing procedures as natural language prompts. A key challenge in materials discovery is the need to characterize suboptimal candidates, which slows progress. While BO is effective for navigating large design spaces, standard surrogate models like Gaussian processes assume smoothness and continuity, an assumption that fails in highly non-linear domains such as heterogeneous catalysis. Our task-agnostic BO workflow overcomes this by operating directly in language space, producing interpretable and actionable predictions without requiring structural or electronic descriptors. On benchmarks like aqueous solubility and oxidative coupling of methane (OCM), BO-ICL matches or outperforms Gaussian processes. In live experiments on the reverse water-gas shift (RWGS) reaction, BO-ICL identifies near-optimal multi-metallic catalysts within six iterations from a pool of 3,700 candidates. Our method redefines materials representation and accelerates discovery, with broad applications across catalysis, materials science, and AI. Code: https://github.com/ur-whitelab/BO-ICL.

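Motivation and Objectives

  • Demonstrate that in-context learning in frozen large language models can produce predictions with uncertainty that are suitable for Bayesian optimization in catalysis and related materials design.
  • Show how natural-language synthesis procedures can represent catalysts and conditions for property prediction.
  • Evaluate performance on solubility and catalyst yield datasets, and compare against baselines and fine-tuned models.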
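
To make the natural-language representation concrete, here is a minimal sketch of how labeled synthesis procedures and one unlabeled query might be assembled into a few-shot prompt. The function name `build_icl_prompt`, the Q/A template, and the placeholder procedure strings are assumptions made for illustration; they are not the template used in the paper or in the BO-ICL repository.

```python
from typing import List, Tuple


def build_icl_prompt(
    examples: List[Tuple[str, float]],
    query: str,
    property_name: str = "C2 yield",
) -> str:
    """Assemble a few-shot prompt from (procedure, value) pairs plus one query.

    The Q/A wording is a hypothetical template, chosen only for illustration.
    """
    lines = []
    for procedure, value in examples:
        lines.append(f"Q: Given {procedure}, what is the {property_name}?")
        lines.append(f"A: {value:.2f}")
        lines.append("")  # blank line between examples
    # The final answer slot is left open for the frozen LLM to complete.
    lines.append(f"Q: Given {query}, what is the {property_name}?")
    lines.append("A:")
    return "\n".join(lines)


# Placeholder strings and values, not real experimental data.
demo_prompt = build_icl_prompt(
    [("procedure A", 1.5), ("procedure B", 2.0)],
    "procedure C",
)
```

Because the model is frozen, adding a new measurement only means appending another example to the list; no weights are updated.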

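Proposed Method

  • Use LIFT to convert catalyst synthesis procedures into natural-language prompts for decoder-style large language models (GPT-3, GPT-3.5, GPT-4).
  • Derive regression with uncertainty from token-level probabilities using two prompting strategies: five-option multiple choice, and top-k with k completions.
  • Quantify uncertainty so that acquisition functions (expected improvement, EI, and upper confidence bound, UCB) can drive Bayesian optimization in an ask-tell loop.
  • Compare ICL against baselines (LIFT fine-tuning, KRR, GPR, KNN) on the ESOL solubility and C2 yield datasets.
  • Select context examples with maximal marginal relevance (MMR) so that ICL can draw on more data than fits in the model's context window.
  • Evaluate a newer model (GPT-4) and assess calibration with an uncertainty recalibration step.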
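
The top-k strategy and the acquisition step can be sketched as follows. This assumes each of the k completions has already been parsed into a number with an associated total log-probability; the function names, the maximization convention, and the default `beta` are illustrative choices rather than the paper's implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple


def mean_std_from_completions(
    values: Sequence[float], logprobs: Sequence[float]
) -> Tuple[float, float]:
    """Probability-weighted mean and standard deviation of k numeric completions."""
    shift = max(logprobs)  # subtract the max for numerical stability
    weights = [math.exp(lp - shift) for lp in logprobs]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    return mean, math.sqrt(var)


def _norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def upper_confidence_bound(mean: float, std: float, beta: float = 2.0) -> float:
    """UCB for maximization: favor high predictions and high uncertainty."""
    return mean + beta * std


def expected_improvement(mean: float, std: float, best: float) -> float:
    """EI for maximization under a Gaussian predictive distribution."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    return (mean - best) * _norm_cdf(z) + std * _norm_pdf(z)


def ask(
    candidates: List[str],
    predict: Callable[[str], Tuple[float, float]],
    best: float,
    acquisition: str = "ei",
    beta: float = 2.0,
) -> str:
    """Return the candidate with the highest acquisition value.

    `predict` is a hypothetical callable mapping a candidate to (mean, std),
    for example an LLM call followed by mean_std_from_completions.
    """
    def score(candidate: str) -> float:
        mean, std = predict(candidate)
        if acquisition == "ucb":
            return upper_confidence_bound(mean, std, beta)
        return expected_improvement(mean, std, best)

    return max(candidates, key=score)
```

In an ask-tell loop, the selected candidate would be measured and then added to the pool of context examples (the "tell" step) before the next call to `ask`.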
Figure 1: Our approach uses a Language-Interfaced Fine-Tuning (LIFT) framework with a Generative Pre-trained Transformer (GPT) to generate tokens that represent the reaction conditions that include a synthesis procedure. The catalyst synthesis and testing data is converted to an embedding vector and

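Experimental Results

Research Questions

  • RQ1: Can in-context learning with frozen large language models provide predictive uncertainty good enough for Bayesian optimization in catalysis problems?
  • RQ2: How do the prompting strategy (multiple choice vs. top-k) and the selection of context examples affect ICL accuracy and BO performance?
  • RQ3: How does ICL compare with traditional baselines (KRR, GPR, KNN) and with fine-tuning in predicting solubility and catalyst yield?
  • RQ4: How do model scale and recency (GPT-4 vs. Curie) affect BO results in this domain?
  • RQ5: Can ICL prompting enable inverse design, steering experimental synthesis procedures toward a desired performance?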
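
Context example selection (RQ2) relies on maximal marginal relevance. The sketch below implements the standard MMR criterion over embedding vectors using cosine similarity; the function names and the default trade-off `lambda_mult=0.5` are assumptions, and the embeddings themselves would come from whatever text-embedding model is in use.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick k indices, trading relevance to the query against
    redundancy with the examples already selected."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected: List[int] = []
    remaining = list(range(len(doc_vecs)))

    while remaining and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy is the highest similarity to any selected example.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1.0 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lambda_mult=1.0` the selection reduces to plain nearest-neighbor retrieval; lower values push the context toward more diverse examples.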

Key Findings

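| Model | Prompt | RMSE ↓ | MAE ↓ | r ↑ | neg-ll ↓ |
|---|---|---|---|---|---|
| text-curie-001 | multi | 13.487 | 3.878 | 0.051 | 8.139 |
| text-curie-001 | topk | 3.016 | 2.271 | 0.499 | 16.985 |
| text-davinci-003 | multi | 3.615 | 2.576 | 0.411 | 15.031 |
| text-davinci-003 | topk | 2.652 | 1.996 | 0.603 | 4.842 |
| gpt-4 | topk | 2.683 | 1.854 | 0.613 | 7.629 |
| Fine-tuned text-ada-001 | topk | 1.936 | 1.325 | 0.824 | 9.775 |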
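
For reference, the four columns can be computed from predicted means and standard deviations as sketched below. The negative log-likelihood here assumes a Gaussian predictive distribution averaged over test points; the source does not state the exact definition behind the neg-ll column, so treat that part as an assumption.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    y_std: Sequence[float],
) -> Dict[str, float]:
    """RMSE, MAE, Pearson r, and mean Gaussian negative log-likelihood."""
    n = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    # Pearson r is undefined when either series is constant.
    r = cov / math.sqrt(var_t * var_p) if var_t > 0 and var_p > 0 else float("nan")

    # Assumed definition: mean of -log N(y_true | y_pred, y_std^2); needs y_std > 0.
    nll = sum(
        0.5 * math.log(2.0 * math.pi * s * s) + (e * e) / (2.0 * s * s)
        for e, s in zip(errors, y_std)
    ) / n
    return {"rmse": rmse, "mae": mae, "r": r, "neg_ll": nll}
```

Note that RMSE, MAE, and r depend only on the predicted means, while the negative log-likelihood also penalizes over- or under-confident standard deviations, which is why a model can rank well on one and poorly on the other.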
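  • ICL can match or exceed baseline performance on solubility prediction, and it performs better still after uncertainty recalibration.
  • Top-k prompting is generally more data-efficient than multiple-choice prompting, reaching comparable MAE and correlation with fewer examples.
  • GPT-4 and other newer models improve on earlier LLMs, although chat models may not expose logprobs, which limits the use of uncertainty.
  • In the low-data regime, BO with ICL is feasible and can identify high solubility values, but the C2 yield task is more complex, and GPR on embeddings can outperform ICL in some settings.
  • Recalibrating the uncertainty improves calibration and, on the solubility task, lets ICL surpass the baselines.
  • A GPR baseline on text embeddings can be strong, and fine-tuning still leads on the more complex C2 dataset; ICL, however, avoids the training cost.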
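
As one illustration of what an uncertainty recalibration step can look like, the sketch below fits a single scale factor for the predicted standard deviations on a held-out calibration set. This is a simple Gaussian maximum-likelihood rescaling chosen for clarity; it is an assumption and may differ from the recalibration procedure used in the paper.

```python
import math
from typing import List, Sequence


def fit_std_scale(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    y_std: Sequence[float],
) -> float:
    """Scale factor s minimizing the Gaussian negative log-likelihood when
    every predicted std is replaced by s * std.

    Closed form: s = sqrt(mean(z^2)) with z = (y_true - y_pred) / y_std.
    Requires all y_std > 0.
    """
    z_squared = [((t - p) / s) ** 2 for t, p, s in zip(y_true, y_pred, y_std)]
    return math.sqrt(sum(z_squared) / len(z_squared))


def recalibrate(y_std: Sequence[float], scale: float) -> List[float]:
    """Apply a fitted scale factor to new predicted standard deviations."""
    return [scale * s for s in y_std]
```

A scale above 1 indicates the raw uncertainties were overconfident, and a scale below 1 indicates they were too wide. Fitting the scale on data separate from the test set avoids an optimistic estimate of calibration.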
Figure 2: Dependence of the six models considered in this work as a function of the number of training points $N$ from where the model could select examples to create the context (for ICL models) or to train (for baseline models). In these experiments, our ICL models have a fixed example selector si

This review was generated by AI and checked by a human editor.