QUICK REVIEW

[Paper Review] Hierarchical Generation of Molecular Graphs using Structural Motifs

Wengong Jin, Regina Barzilay|arXiv (Cornell University)|Feb 8, 2020
Machine Learning in Materials Science|References: 49|Cited by: 109
One-Sentence Summary

We introduce HierVAE, a motif-based hierarchical graph encoder-decoder that uses large structural motifs to generate and reconstruct large molecular graphs, outperforming prior atom- and substructure-based methods on polymers and graph translation tasks.

ABSTRACT

Graph generation techniques are increasingly being adopted for drug discovery. Previous graph generation approaches have utilized relatively small molecular building blocks such as atoms or simple cycles, limiting their effectiveness to smaller molecules. Indeed, as we demonstrate, their performance degrades significantly for larger molecules. In this paper, we propose a new hierarchical graph encoder-decoder that employs significantly larger and more flexible graph motifs as basic building blocks. Our encoder produces a multi-resolution representation for each molecule in a fine-to-coarse fashion, from atoms to connected motifs. Each level integrates the encoding of constituents below with the graph at that level. Our autoregressive coarse-to-fine decoder adds one motif at a time, interleaving the decision of selecting a new motif with the process of resolving its attachments to the emerging molecule. We evaluate our model on multiple molecule generation tasks, including polymers, and show that our model significantly outperforms previous state-of-the-art baselines.

Motivation and Objectives

  • Motivate the use of large structural motifs to improve generation of large molecules like polymers.
  • Develop a hierarchical encoder that represents molecules from atoms to motifs for multi-resolution grounding.
  • Propose a motif-based autoregressive decoder that builds molecules motif-by-motif with attachment decisions.
  • Demonstrate superior reconstruction, translation performance, and decoding speed compared to existing baselines.

Proposed Method

  • Extract a motif vocabulary from training molecules by decomposing graphs at bridge bonds and selecting frequently occurring subgraphs as motifs.
  • Build a three-layer hierarchical graph representation (motif, attachment, atom) and encode it with three hierarchical MPNs to obtain latent z for each molecule.
  • Use an autoregressive, coarse-to-fine decoder that, conditioned on z, predicts the next motif, its attachment configuration, and how it attaches to the existing graph.
  • Train with teacher forcing to maximize a variational lower bound (ELBO) on the molecule distribution.
  • Extend to graph-to-graph translation by incorporating latent variables to produce diverse, property-optimized outputs with attention mechanisms.
  • When translating, use an encoder–decoder with hierarchical attention over multi-resolution representations to guide motif-level predictions.
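The first step above, cutting molecules at bridge bonds and keeping the fragments that recur often, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: molecules are given as hypothetical `(symbols, bonds)` pairs, every graph-theoretic bridge is cut (the paper's exact rules for which bonds to break are not reproduced here), and `fragment_signature` is a crude stand-in for a canonical fragment identifier such as a canonical SMILES string. All function names and the `min_count` threshold are illustrative.

```python
from collections import Counter


def find_bridges(n_atoms, bonds):
    """Return indices of bonds whose removal disconnects the graph (Tarjan's DFS).

    Recursive, so intended for small toy graphs only.
    """
    adj = [[] for _ in range(n_atoms)]
    for idx, (u, v) in enumerate(bonds):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    disc = [-1] * n_atoms   # discovery time of each atom
    low = [0] * n_atoms     # earliest discovery time reachable from the subtree
    bridges = set()
    timer = 0

    def dfs(u, parent_edge):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue
            if disc[v] == -1:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:   # no back edge bypasses bond idx
                    bridges.add(idx)
            else:
                low[u] = min(low[u], disc[v])

    for start in range(n_atoms):
        if disc[start] == -1:
            dfs(start, -1)
    return bridges


def split_at_bridges(n_atoms, bonds):
    """Remove all bridge bonds and return the connected fragments (sorted atom lists)."""
    bridges = find_bridges(n_atoms, bonds)
    adj = [[] for _ in range(n_atoms)]
    for idx, (u, v) in enumerate(bonds):
        if idx not in bridges:
            adj[u].append(v)
            adj[v].append(u)
    seen = [False] * n_atoms
    fragments = []
    for start in range(n_atoms):
        if seen[start]:
            continue
        seen[start] = True
        stack, frag = [start], []
        while stack:
            u = stack.pop()
            frag.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        fragments.append(sorted(frag))
    return fragments


def fragment_signature(symbols, bonds, frag):
    """Crude fragment key (assumption): sorted atom symbols plus internal bond count."""
    atoms = set(frag)
    n_internal = sum(1 for u, v in bonds if u in atoms and v in atoms)
    return (tuple(sorted(symbols[a] for a in frag)), n_internal)


def build_motif_vocab(molecules, min_count):
    """Keep fragment signatures that occur at least min_count times across molecules."""
    counts = Counter()
    for symbols, bonds in molecules:
        for frag in split_at_bridges(len(symbols), bonds):
            counts[fragment_signature(symbols, bonds, frag)] += 1
    return {sig for sig, c in counts.items() if c >= min_count}


# Toy data: two three-carbon rings joined by one bond, and one ring with an oxygen attached.
mol_a = (["C"] * 6, [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)])
mol_b = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 0), (2, 3)])
vocab = build_motif_vocab([mol_a, mol_b], min_count=2)
print(vocab)  # {(('C', 'C', 'C'), 3)}
```

In the toy data the three-carbon ring appears three times and survives the frequency cut, while the lone oxygen appears once and is dropped. A real pipeline would use a cheminformatics toolkit to parse molecules and to produce canonical fragment identifiers, since the signature used here cannot distinguish fragments that share the same atoms and bond count.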

Experimental Results

Research Questions

  • RQ1: Can larger, flexible motifs as building blocks improve generation and reconstruction of large molecules compared to atom- or small-substructure-based methods?
  • RQ2: How does hierarchical motif-based encoding decouple and inform the decoding process for scalable polymer generation and graph translation?
  • RQ3: Does the motif-based decoder enable faster decoding and better distributional similarity to real molecules on polymer and translation tasks?
  • RQ4: What is the impact of using larger motifs versus restricted small motifs on reconstruction accuracy and property optimization metrics?

Key Findings

  • HierVAE achieves significantly higher reconstruction accuracy (79.9%) than baselines (e.g., JT-VAE 58.5%).
  • On polymer generation, HierVAE attains state-of-the-art distributional statistics, with improvements in logP and molecular weight metrics.
  • HierVAE decodes faster than prior substructure-based methods: it needs fewer generation steps and is about 6.3x faster than a baseline.
  • In graph-to-graph translation, HierG2G attains higher QED and DRD2 improvements and faster decoding than JTNN and AtomG2G baselines.
  • Ablations show large motifs outperform small motifs, validating the central claim that motif-scale building blocks improve performance for large molecules.

This review was generated by AI and reviewed by human editors.