QUICK REVIEW

[Paper Review] Junction Tree Variational Autoencoder for Molecular Graph Generation

Wengong Jin, Regina Barzilay|arXiv (Cornell University)|Feb 12, 2018

Machine Learning in Materials Science|References: 37|Citations: 711

One-line summary

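JT-VAE generates molecular graphs in two stages: it first builds a junction tree over valid substructures, then a graph decoder assembles the full molecule. This yields 100% validity and strong performance on property optimization.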

ABSTRACT

We seek to automate the design of molecules based on specific chemical properties. In computational terms, this task involves continuous embedding and generation of molecular graphs. Our primary contribution is the direct realization of molecular graphs, a task previously approached by generating linear SMILES strings instead of graphs. Our junction tree variational autoencoder generates molecular graphs in two phases, by first generating a tree-structured scaffold over chemical substructures, and then combining them into a molecule with a graph message passing network. This approach allows us to incrementally expand molecules while maintaining chemical validity at every step. We evaluate our model on multiple tasks ranging from molecular generation to optimization. Across these tasks, our model outperforms previous state-of-the-art baselines by a significant margin.

Motivation and objectives

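Automate molecular design by learning a continuous representation that supports both property optimization and valid graph generation.
Overcome the limitations of SMILES-based methods by modeling molecular graphs directly, with chemically valid intermediates at every step.
Develop a two-stage junction tree → graph decoder that guarantees feasibility throughout generation.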

Proposed method

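Represent each molecule as a junction tree whose nodes are valid substructures (clusters), such as rings and bonds.
Encode the junction tree and the full molecular graph into latent vectors z_T and z_G, respectively, using message passing neural networks (a tree encoder and a graph encoder).
Decode by first reconstructing the junction tree from z_T, then assembling its subgraphs into the full molecular graph with the graph decoder, guided by z_G.
Train with a variational autoencoder objective that includes cross-entropy losses for tree topology and node label prediction.
To keep every decoding step chemically valid, restrict the predicted cluster labels to chemically compatible options.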
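As a rough illustration of the junction tree representation described in the method list, the following is a minimal Python sketch of a simplified decomposition on a toy molecule. It assumes atoms are integer indices, bonds are index pairs, and rings are already known (a real pipeline would obtain them from a cheminformatics toolkit's ring perception). Clusters are the rings plus every bond outside a ring; clusters that share atoms are linked with a weight equal to the number of shared atoms, and a maximum spanning tree is kept. The function `junction_tree` and the toy data are hypothetical, and the paper's extra handling of bridged rings and of atoms shared by several clusters is omitted.

```python
from itertools import combinations


def junction_tree(bonds, rings):
    """Simplified junction-tree decomposition (illustrative sketch only).

    bonds: list of (i, j) atom-index pairs
    rings: list of atom-index lists, assumed precomputed by ring perception
    Returns (clusters, tree_edges); tree_edges are index pairs into clusters.
    """
    ring_sets = [frozenset(r) for r in rings]
    clusters = list(ring_sets)
    # Every bond that does not lie inside a ring becomes its own cluster.
    for i, j in bonds:
        if not any(i in r and j in r for r in ring_sets):
            clusters.append(frozenset((i, j)))

    # Cluster-graph edges between clusters that share atoms,
    # weighted by the number of shared atoms.
    candidates = []
    for a, b in combinations(range(len(clusters)), 2):
        shared = len(clusters[a] & clusters[b])
        if shared:
            candidates.append((shared, a, b))

    # Kruskal's algorithm on descending weights yields a maximum spanning tree.
    parent = list(range(len(clusters)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree_edges = []
    for _, a, b in sorted(candidates, key=lambda t: -t[0]):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree_edges.append((a, b))
    return clusters, tree_edges


# Hypothetical toy molecule: two fused six-membered rings plus an ethyl group.
TOY_RINGS = [[0, 1, 2, 3, 4, 5], [4, 5, 6, 7, 8, 9]]
TOY_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
             (4, 6), (6, 7), (7, 8), (8, 9), (9, 5),
             (0, 10), (10, 11)]

if __name__ == "__main__":
    clusters, edges = junction_tree(TOY_BONDS, TOY_RINGS)
    for k, c in enumerate(clusters):
        print(k, sorted(c))
    print("tree edges:", edges)
```

For this toy molecule the sketch produces four clusters (two rings and two non-ring bonds) and three tree edges; the fused rings are linked through their two shared atoms, so the result is a tree over substructures rather than over atoms.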
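The training objective in the method list combines reconstruction losses with the usual VAE regularizer. The sketch below computes a negative evidence lower bound for one example under stated assumptions: topology decisions are treated as binary expand/backtrack predictions scored with binary cross-entropy, cluster labels are scored with categorical cross-entropy, and the KL term is the closed form for a diagonal Gaussian posterior against a standard normal prior. Names such as `negative_elbo` and `kl_weight` are hypothetical; in JT-VAE both z_T and z_G would carry a KL term, and the graph-assembly loss is not shown.

```python
import math

EPS = 1e-12  # guards against log(0)


def kl_diag_gaussian(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def binary_cross_entropy(p, target):
    # p: predicted probability of "expand a new child"; target: 1 (expand) or 0.
    return -(target * math.log(p + EPS) + (1 - target) * math.log(1 - p + EPS))


def categorical_cross_entropy(probs, target_index):
    # probs: predicted distribution over the cluster vocabulary.
    return -math.log(probs[target_index] + EPS)


def negative_elbo(topo_preds, label_preds, mu, log_var, kl_weight=1.0):
    """Reconstruction loss (topology + labels) plus a weighted KL term.

    topo_preds: list of (probability, target) pairs
    label_preds: list of (probability_vector, target_index) pairs
    """
    recon = sum(binary_cross_entropy(p, t) for p, t in topo_preds)
    recon += sum(categorical_cross_entropy(q, k) for q, k in label_preds)
    return recon + kl_weight * kl_diag_gaussian(mu, log_var)


if __name__ == "__main__":
    loss = negative_elbo([(0.5, 1)], [([0.25, 0.75], 1)], mu=[1.0], log_var=[0.0])
    print(f"negative ELBO: {loss:.4f}")
```

Minimizing this quantity maximizes the ELBO; the KL term pulls posteriors toward the prior, which is what makes decoding from prior samples meaningful.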

Figure 1 : Two almost identical molecules with markedly different canonical SMILES in RDKit. The edit distance between two strings is 22 (50.5% of the whole sequence).
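The caption's edit distance is presumably the standard Levenshtein distance (single-character insertions, deletions, and substitutions) between the two canonical SMILES strings. A minimal sketch of that computation follows; normalizing by the longer string is an assumption, since the caption does not say which length the 50.5% is relative to. As a tiny example unrelated to the figure, `CCO` and `OCC` are both valid SMILES for ethanol, yet their edit distance is 2.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    # Assumption: normalize by the longer string's length.
    return levenshtein(a, b) / max(len(a), len(b), 1)


if __name__ == "__main__":
    print(levenshtein("CCO", "OCC"), normalized_edit_distance("CCO", "OCC"))
```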

Experimental results

Research questions

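RQ1: Can direct graph-based generation with a junction tree representation improve chemical validity and diversity over SMILES-based methods?
RQ2: Does the two-stage JT-VAE improve reconstruction accuracy, validity when sampling from the prior, and property-directed optimization?
RQ3: How does JT-VAE perform in Bayesian optimization and constrained optimization of molecular properties?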

Key findings

Method	Reconstruction accuracy	Validity (samples from prior)
CVAE	44.6%	0.7%
GVAE	53.7%	7.2%
SD-VAE	76.2%	43.5%
GraphVAE	-	13.5%
Atom-by-Atom LSTM	-	89.2%
JT-VAE	76.7%	100.0%

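JT-VAE achieves 76.7% reconstruction accuracy, and 100% of the molecules decoded from latent vectors sampled from the prior are valid.
JT-VAE substantially outperforms SMILES-based baselines on both molecule generation and optimization tasks.
In Bayesian optimization, JT-VAE finds top molecules with higher property scores than the baselines (top-1 score 5.30 vs. 4.04 for SD-VAE).
A sparse Gaussian process trained on JT-VAE embeddings predicts molecular properties better than those trained on baseline embeddings (log-likelihood LL = -1.658, RMSE = 1.290).
Under a similarity constraint of δ = 0.4, constrained optimization reaches a success rate of about 80%, with an average property improvement of 0.84.
Because the junction tree decomposition lets the decoder work cluster by cluster, decoding cost scales linearly with the number of clusters, which keeps it efficient.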
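The Bayesian optimization result in the findings list depends on an acquisition function that decides which latent points to decode next. Expected improvement (EI) is a common choice for a sparse-GP setup like this one; the following sketch computes EI for a Gaussian prediction under maximization. Treat it as a generic illustration: this review does not state the exact acquisition function or batch strategy, and the name `expected_improvement` and the `xi` margin are assumptions.

```python
import math


def expected_improvement(mu, sigma, best_so_far, xi=0.0):
    """EI of a Gaussian prediction N(mu, sigma^2) over the best observed value
    (maximization). xi is an optional exploration margin."""
    improvement = mu - best_so_far - xi
    if sigma <= 0.0:
        # Deterministic prediction: EI reduces to the positive part of the gain.
        return max(improvement, 0.0)
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return improvement * cdf + sigma * pdf


if __name__ == "__main__":
    best = 1.0
    candidates = [(1.0, 0.1), (0.8, 1.0)]  # (predicted mean, predicted std)
    for mu, sigma in candidates:
        print(mu, sigma, round(expected_improvement(mu, sigma, best), 4))
```

With a best observed score of 1.0, a confident prediction at 1.0 (σ = 0.1) receives a lower EI than an uncertain prediction at 0.8 (σ = 1.0); this is how EI balances exploiting known good regions against exploring uncertain ones.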
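The LL and RMSE reported for the sparse Gaussian process are standard regression metrics on held-out molecules: the RMSE of the predictive means, and the average log density of the true property values under each Gaussian predictive distribution (higher LL and lower RMSE are better). A minimal sketch of both, assuming predictive means and variances have already been produced by some GP, which is not shown; the function names are hypothetical.

```python
import math


def rmse(y_true, y_mean):
    """Root-mean-square error of the predictive means."""
    n = len(y_true)
    return math.sqrt(sum((y - m) ** 2 for y, m in zip(y_true, y_mean)) / n)


def mean_gaussian_log_likelihood(y_true, y_mean, y_var):
    """Average log N(y | mean, var) over the test set (higher is better)."""
    total = 0.0
    for y, m, v in zip(y_true, y_mean, y_var):
        total += -0.5 * (math.log(2.0 * math.pi * v) + (y - m) ** 2 / v)
    return total / len(y_true)


if __name__ == "__main__":
    # Made-up predictions for three test molecules.
    y, mean, var = [1.0, 2.0, 0.5], [1.2, 1.5, 0.4], [0.5, 1.0, 0.3]
    print(rmse(y, mean), mean_gaussian_log_likelihood(y, mean, var))
```

Unlike RMSE, the log-likelihood also rewards well-calibrated uncertainty, which matters because Bayesian optimization relies on the GP's variances as well as its means.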
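The constrained optimization result pairs a property gain with a similarity floor δ. The sketch below shows one way to score a single starting molecule: among candidate molecules (for example, decoded from nearby latent points), keep those whose Tanimoto similarity to the start is at least δ and pick the largest property gain. Fingerprints are represented as sets of on-bit indices; the success criterion (at least one qualifying candidate with a positive gain) and the helper names are assumptions for illustration, not the paper's exact protocol.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def best_constrained_candidate(start_fp, start_score, candidates, delta):
    """Return (improvement, candidate_id) for the candidate with the largest
    positive gain among those with similarity >= delta, or None if no
    candidate qualifies (counted as a failure in this sketch).

    candidates: iterable of (candidate_id, fingerprint_set, property_score)
    """
    best = None
    for cand_id, fp, score in candidates:
        if tanimoto(start_fp, fp) < delta:
            continue  # violates the similarity constraint
        gain = score - start_score
        if gain > 0 and (best is None or gain > best[0]):
            best = (gain, cand_id)
    return best


if __name__ == "__main__":
    start = {1, 2, 3, 4}
    cands = [("a", {1, 2, 3, 5}, 2.0), ("b", {7, 8}, 5.0), ("c", {1, 2, 3, 4, 9}, 1.5)]
    for delta in (0.4, 0.7, 0.9):
        print(delta, best_constrained_candidate(start, 1.0, cands, delta))
```

Raising δ removes the less similar candidates first, so both the achievable improvement and the success rate tend to fall as the constraint tightens.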

Figure 2 : Comparison of two graph generation schemes: the structure-by-structure approach is preferred because it avoids the invalid intermediate states (marked in red) encountered in the node-by-node approach.

This review was generated by AI and checked by a human editor.