QUICK REVIEW

[Paper Review] Efficient Protein Optimization via Structure-aware Hamiltonian Dynamics

Jiahao Wang, Shuangjia Zheng|arXiv (Cornell University)|Jan 16, 2026

Protein Structure and Dynamics|Citations: 0

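In brief

HADES combines structure-informed Bayesian optimization with Hamiltonian dynamics to explore protein sequence space efficiently and design variants that are both high-fitness and structurally compatible. It outperforms the baselines in in-silico evaluations on GB1 and PhoQ.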

ABSTRACT

The ability to engineer optimized protein variants has transformative potential for biotechnology and medicine. Prior sequence-based optimization methods struggle with the high-dimensional complexities due to the epistasis effect and the disregard for structural constraints. To address this, we propose HADES, a Bayesian optimization method utilizing Hamiltonian dynamics to efficiently sample from a structure-aware approximated posterior. Leveraging momentum and uncertainty in the simulated physical movements, HADES enables rapid transition of proposals toward promising areas. A position discretization procedure is introduced to propose discrete protein sequences from such a continuous state system. The posterior surrogate is powered by a two-stage encoder-decoder framework to determine the structure and function relationships between mutant neighbors, consequently learning a smoothed landscape to sample from. Extensive experiments demonstrate that our method outperforms state-of-the-art baselines in in-silico evaluations across most metrics. Remarkably, our approach offers a unique advantage by leveraging the mutual constraints between protein structure and sequence, facilitating the design of protein sequences with similar structures and optimized properties. The code and data are publicly available at https://github.com/GENTEL-lab/HADES.

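Motivation and objectives

Frame protein engineering as a high-dimensional optimization problem with pervasive epistasis.
Use protein structure as prior information to smooth the fitness landscape.
Develop a structure-aware Bayesian optimization framework that samples promising variants.
Produce discrete sequence proposals from a continuous representation through a position discretization procedure.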

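Proposed method

Use Hamiltonian dynamics to propose distant samples in a continuous protein state space.
Define the potential energy U(q) as the negative log-probability under the surrogate predictor, paired with a momentum-based kinetic energy K(p).
Apply a virtual barrier (bounce-back) mechanism during leapfrog updates to discretize the continuous state into a discrete amino acid sequence.
Use an ensemble surrogate with an upper confidence bound (UCB) as an uncertainty-aware acquisition strategy.
Implement a two-stage encoder-decoder surrogate, consisting of a shared sequence encoder and separate structure and fitness decoders, trained with RMSD priors from ESMFold.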
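
The sampling loop described above can be illustrated with a small NumPy sketch. This is not the HADES implementation; it is a toy under several stated assumptions: the ensemble is a list of hypothetical quadratic scorers, the potential is taken as U(q) = -UCB(q) (that is, exp(UCB) is treated as an unnormalized density), gradients are estimated by finite differences, each sequence position is a single coordinate in [0, 20] whose integer bin selects an amino acid, and the barrier is modeled as a simple reflecting wall at the box boundary. The Metropolis accept step is the standard HMC correction and may differ from what the paper does. All function names here are made up for illustration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues

def ensemble_ucb(q, models, beta=1.0):
    """UCB acquisition: ensemble mean plus beta times ensemble std."""
    preds = np.array([m(q) for m in models])
    return preds.mean() + beta * preds.std()

def grad_fd(f, q, eps=1e-4):
    """Central finite-difference gradient (assumption: no autodiff available)."""
    g = np.zeros_like(q)
    for i in range(q.size):
        step = np.zeros_like(q)
        step[i] = eps
        g[i] = (f(q + step) - f(q - step)) / (2 * eps)
    return g

def leapfrog_reflect(q, p, potential, step_size, n_steps, lo, hi):
    """Leapfrog integration with reflecting walls at lo and hi."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * step_size * grad_fd(potential, q)      # initial half kick
    for i in range(n_steps):
        q += step_size * p                            # full position step
        below = q < lo                                # bounce off lower wall
        q[below] = 2 * lo - q[below]
        p[below] = -p[below]
        above = q > hi                                # bounce off upper wall
        q[above] = 2 * hi - q[above]
        p[above] = -p[above]
        np.clip(q, lo, hi, out=q)                     # guard against huge steps
        if i < n_steps - 1:
            p -= step_size * grad_fd(potential, q)    # full kick between steps
    p -= 0.5 * step_size * grad_fd(potential, q)      # final half kick
    return q, p

def hmc_step(q, models, rng, beta=1.0, step_size=0.1, n_steps=10,
             lo=0.0, hi=20.0):
    """One HMC proposal with U(q) = -UCB(q) and K(p) = |p|^2 / 2."""
    potential = lambda x: -ensemble_ucb(x, models, beta)
    p0 = rng.standard_normal(q.shape)
    q_new, p_new = leapfrog_reflect(q, p0, potential, step_size, n_steps, lo, hi)
    h_old = potential(q) + 0.5 * p0 @ p0
    h_new = potential(q_new) + 0.5 * p_new @ p_new
    accept = rng.random() < np.exp(min(0.0, h_old - h_new))
    return (q_new if accept else q), bool(accept)

def discretize(q):
    """Map each coordinate's integer bin to an amino-acid letter."""
    idx = np.clip(np.floor(q).astype(int), 0, len(AMINO_ACIDS) - 1)
    return "".join(AMINO_ACIDS[i] for i in idx)

def make_toy_ensemble(centers):
    """Hypothetical surrogate ensemble: one quadratic bump per center."""
    return [lambda q, c=c: -0.1 * np.sum((q - c) ** 2) for c in centers]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    models = make_toy_ensemble([np.full(4, 5.0), np.full(4, 5.5), np.full(4, 6.0)])
    q = np.full(4, 10.0)
    for _ in range(20):
        q, _ = hmc_step(q, models, rng)
    print(discretize(q))
```

The ordinal one-coordinate-per-position encoding is chosen only to keep the example short; it imposes an arbitrary ordering on amino acids that a real surrogate would not rely on.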

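Experimental results

Research questions

RQ1: Can structure-aware priors stabilize and accelerate Bayesian optimization for protein design?
RQ2: Does Hamiltonian-dynamics-based sampling improve exploration of high-dimensional, discrete sequence spaces?
RQ3: How does incorporating structural perturbations as priors affect the quality and diversity of designed proteins?
RQ4: How do uncertainty estimation and discretization constraints affect design performance?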

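Key findings

HADES outperforms state-of-the-art baselines on GB1 and PhoQ in cumulative maximum fitness, mean fitness, and diversity metrics.
On GB1, it finds the optimal sequence in all 10 runs, with zero variance.
On PhoQ, it reaches a higher maximum fitness and maintains functional diversity (fDiv) across runs.
Ablations show that removing either Hamiltonian sampling or the structural prior degrades performance, while uncertainty estimation and the virtual barrier improve robustness to discretization error.
Performance improves as the query budget (K) and the number of rounds increase, and the gap over the baselines widens.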

This review was generated by AI and checked by a human editor.