QUICK REVIEW

[Paper Review] Escaping the Hydrolysis Trap: An Agentic Workflow for Inverse Design of Durable Photocatalytic Covalent Organic Frameworks

Iman Peivaste, Nicolas D. Boscher|arXiv (Cornell University)|2026. 03. 05.
Covalent Organic Framework Applications | Citations: 0
One-Line Summary

The paper introduces Ara, an LLM-guided agent that navigates COF design space to jointly optimize band gap, band-edge, and hydrolytic stability, achieving superior hit rates and faster first-hit discovery compared with random search and Bayesian optimization.

ABSTRACT

Covalent organic frameworks (COFs) are promising photocatalysts for solar hydrogen production, yet the most electronically favorable linkages, imines, hydrolyze rapidly in water, creating a stability–activity trade-off that limits practical deployment. Navigating the combinatorial design space of nodes, linkers, linkages, and functional groups to identify candidates that are simultaneously active and durable remains a formidable challenge. Here we introduce Ara, a large-language-model (LLM) agent that leverages pretrained chemical knowledge, donor–acceptor theory, conjugation effects, and linkage stability hierarchies, to guide the search for photocatalytic COFs satisfying joint band-gap, band-edge, and hydrolytic-stability criteria. Evaluated against random search and Bayesian optimization (BO) over a space consisting of candidates with various nodes, linkers, linkages, and R-groups, screened with a GFN1-xTB fragment pipeline, Ara achieves a 52.7% hit rate (11.5× random, p = 0.006), finds its first hit at iteration 12 versus 25 for random search, and significantly outperforms BO (p = 0.006). Inspection of the agent's reasoning traces reveals interpretable chemical logic: early convergence on vinylene and beta-ketoenamine linkages for stability, node selection informed by electron-withdrawing character, and systematic R-group optimization to center the band gap at 2.0 eV. Exhaustive evaluation of the full search space uncovers a complementary exploitation–exploration trade-off between the agent and BO, suggesting that hybrid strategies may combine the strengths of both approaches. These results demonstrate that LLM chemical priors can substantially accelerate multi-criteria materials discovery.

Research Motivation and Objectives

  • Address the stability–activity trade-off in imine-linked COFs under aqueous photocatalytic conditions.
  • Develop and evaluate an LLM-based agent that leverages chemical priors to guide multi-criteria COF design.
  • Compare the agent's performance against random search and Bayesian optimization on a defined COF design space.
  • Calibrate a fragment-based screening pipeline and quantify the agent's sample efficiency and decision rationale.

Proposed Method

  • Define a combinatorial COF design space with nodes, linkers, linkages, and R-groups (820 candidates after compatibility constraints).
  • Use a fragment-based screening pipeline with RDKit assembly, 3D embedding, GFN1-xTB geometry optimization, and delta-SCF IP−EA to estimate electronic gaps.
  • Calibrate xTB gaps to DFT scale via a linear transfer function using a set of 13 COFs; map CBM positions relative to NHE.
  • Compute a composite stability index SCSI from linkage stability, shielding, and hydrophobicity with weights (0.50, 0.30, 0.20).
  • Classify hits by satisfying: band gap 1.8–2.2 eV, CBM < 0 V, and SCSI ≥ 0.7; use a continuous reward r combining gap, CBM score, and SCSI.
  • Compare three search strategies (random, Bayesian optimization, Ara) over 200 iterations with five seeds.
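
The stability index and hit criteria listed above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes SCSI is a plain weighted sum of three components already normalized to [0, 1], and the names `Candidate`, `scsi`, and `is_hit` are hypothetical. The continuous reward r is omitted because the review does not give its functional form.

```python
from dataclasses import dataclass

# Weights quoted in the review: (linkage stability, shielding, hydrophobicity).
SCSI_WEIGHTS = (0.50, 0.30, 0.20)


@dataclass
class Candidate:
    """Hypothetical container for one screened COF fragment."""
    band_gap_ev: float        # calibrated band gap in eV
    cbm_v_vs_nhe: float       # conduction band minimum in V vs. NHE
    linkage_stability: float  # assumed to be normalized to [0, 1]
    shielding: float          # assumed to be normalized to [0, 1]
    hydrophobicity: float     # assumed to be normalized to [0, 1]


def scsi(c: Candidate, weights=SCSI_WEIGHTS) -> float:
    """Composite stability index, ASSUMED here to be a weighted sum."""
    w_link, w_shield, w_hydro = weights
    return (w_link * c.linkage_stability
            + w_shield * c.shielding
            + w_hydro * c.hydrophobicity)


def is_hit(c: Candidate) -> bool:
    """Apply the three hit criteria stated in the review."""
    gap_ok = 1.8 <= c.band_gap_ev <= 2.2
    cbm_ok = c.cbm_v_vs_nhe < 0.0
    stable = scsi(c) >= 0.7
    return gap_ok and cbm_ok and stable


# Illustrative inputs only; these are not values from the paper.
robust = Candidate(2.0, -0.5, 0.9, 0.6, 0.5)
labile = Candidate(2.0, -0.5, 0.2, 0.6, 0.5)
```

Under these assumptions, a candidate with a well-placed gap and CBM still fails when its linkage-stability component is low, which is the stability–activity trade-off the paper targets.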
Figure 1: Overview of the Ara agentic workflow for COF photocatalyst discovery. (a) The combinatorial design space comprises 820 candidates formed from 7 trigonal nodes, 19 ditopic linkers, 4 linkage chemistries of varying hydrolytic stability, and 10 aromatic R-group substituents, subject to chemic

Experimental Results

Research Questions

  • RQ1: Can an LLM-guided agent efficiently navigate a multi-criteria COF design space to identify candidates that meet electronic and stability requirements?
  • RQ2: How does agent-guided search compare to random search and Bayesian optimization in hit rate, first-hit timing, and cumulative hits?
  • RQ3: What driving chemical strategies (linkage type, node choice, and R-group tuning) enable high-quality hits within budget constraints?
  • RQ4: Is the agent’s advantage robust to variations in the stability scoring weights (SCSI)?

Key Results

Method | Cum. hits | Hit rate (%) | First hit (iter.) | Best reward | Success rate (%)
Random | 9.2 ± 1.9 | 4.6 | 25 ± 16 | 0.903 ± 0.012 | 82.1
Bayesian optimization | 28.2 ± 3.8 | 14.1 | 22 ± 31 | 0.921 ± 0.000 | 100.0
Agent (Ara) | 105.4 ± 40.9 | 52.7 | 12 ± 13 | 0.895 ± 0.016 | 95.7
  • Ara achieved a 52.7% hit rate over 200 iterations (105.4 ± 40.9 cumulative hits across five seeds), roughly 11.5 times the 4.6% rate of random search (9.2 ± 1.9 hits; p = 0.006).
  • Ara also outperformed Bayesian optimization, which reached a 14.1% hit rate (28.2 ± 3.8 hits); the difference is statistically significant (p = 0.006).
  • On average, Ara found its first hit at iteration 12, versus 25 for random search and 22 for BO, although the reported spread is large (± 13, ± 16, and ± 31 iterations, respectively).
  • The agent’s reasoning traces show a progression toward vinylene and β-ketoenamine linkages for stability, avoidance of overly electron-withdrawing nodes, and systematic R-group tuning to center the band gap around 2.0 eV.
  • An exhaustive evaluation of 670 successfully computed candidates identified 38 ground-truth hits, revealing an exploitation–exploration trade-off where the agent excels at rapid high-quality hits while BO covers broader hit landscapes.
  • Sensitivity analysis confirmed the agent’s advantage persists across 30 stability-weight triples for SCSI, indicating robustness to scoring-parameter choices.
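
The headline percentages follow directly from the mean cumulative hits and the 200-iteration budget. The short sketch below reproduces that arithmetic. The variable names are illustrative, and the final cross-check rests on an assumption the review does not state: that candidates whose computation failed count as misses, so the base rate is taken over all 820 candidates.

```python
BUDGET = 200  # iterations per run, as stated in the review

# Mean cumulative hits per method, taken from the results table.
mean_cum_hits = {
    "Random": 9.2,
    "Bayesian optimization": 28.2,
    "Agent (Ara)": 105.4,
}


def hit_rate_percent(cum_hits: float, budget: int = BUDGET) -> float:
    """Hit rate in percent: hits per evaluated candidate."""
    return 100.0 * cum_hits / budget


hit_rates = {name: hit_rate_percent(h) for name, h in mean_cum_hits.items()}

# Fold improvement of the agent over each baseline.
fold_vs_random = hit_rates["Agent (Ara)"] / hit_rates["Random"]
fold_vs_bo = hit_rates["Agent (Ara)"] / hit_rates["Bayesian optimization"]

# Cross-check (ASSUMPTION: failed computations count as non-hits):
# 38 ground-truth hits in an 820-candidate space gives the expected
# number of hits for 200 uniform draws without replacement.
expected_random_hits = BUDGET * 38 / 820

for name, rate in hit_rates.items():
    print(f"{name}: {rate:.1f}%")
print(f"Ara vs. random: {fold_vs_random:.1f}x, Ara vs. BO: {fold_vs_bo:.1f}x")
print(f"Expected random hits: {expected_random_hits:.1f}")
```

Under that assumption the expected random count of about 9.3 hits is close to the observed 9.2 ± 1.9, which suggests the random baseline behaves as uniform sampling should.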
Figure 2: Scatter plot of xTB (IP $-$ EA) fundamental gap versus DFT band gap for 13 COFs spanning six linkage types, with the linear transfer function overlaid. The calibration set includes boronate ester, boroxine, and triazine linkage types not present in the search space to broaden the range of
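
A linear transfer function of this kind can be obtained with an ordinary least-squares fit of DFT gaps against xTB (IP − EA) gaps. The sketch below shows the idea in plain Python. The function names are hypothetical, the review does not report the fitted slope or intercept, and the demo points are synthetic placeholders that lie exactly on an arbitrary line, not the 13 calibration COFs.

```python
def fit_linear_transfer(xtb_gaps, dft_gaps):
    """Least-squares fit of dft_gap ≈ slope * xtb_gap + intercept."""
    n = len(xtb_gaps)
    if n != len(dft_gaps) or n < 2:
        raise ValueError("need at least two paired gap values")
    mean_x = sum(xtb_gaps) / n
    mean_y = sum(dft_gaps) / n
    sxx = sum((x - mean_x) ** 2 for x in xtb_gaps)
    if sxx == 0.0:
        raise ValueError("xTB gaps must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(xtb_gaps, dft_gaps))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def calibrate(xtb_gap, slope, intercept):
    """Map a raw xTB gap (eV) onto the DFT scale (eV)."""
    return slope * xtb_gap + intercept


# SYNTHETIC placeholder data on the line y = 0.5 * x + 0.2.
# These are NOT the paper's calibration values.
xtb_demo = [4.0, 5.0, 6.0, 7.0]
dft_demo = [2.2, 2.7, 3.2, 3.7]

slope, intercept = fit_linear_transfer(xtb_demo, dft_demo)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"calibrated gap for 3.6 eV xTB gap: {calibrate(3.6, slope, intercept):.2f} eV")
```

With real calibration pairs, the fitted coefficients would be applied to every screened candidate before testing the 1.8–2.2 eV band-gap window.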

This review was generated by AI and reviewed by a human editor.