Skip to main content
QUICK REVIEW

[Paper Review] Sequence stacking using dual encoder Seq2Seq recurrent networks

Alessandro Bay, Biswa Sengupta|arXiv (Cornell University)|Oct 31, 2017
Machine Learning in Bioinformatics5 references2 citations
TL;DR

This paper proposes a dual encoder Seq2Seq recurrent network that improves shortest path detection in large graphs by leveraging context vectors from two distinct recurrent encoders, significantly boosting accuracy. It further enhances performance through homotopy continuation of the decoder's loss function, achieving superior route-finding results on NP-hard graph problems.

ABSTRACT

A widely studied non-polynomial (NP) hard problem lies in finding a route between the two nodes of a graph. Often meta-heuristics algorithms such as $A^{*}$ are employed on graphs with a large number of nodes. Here, we propose a deep recurrent neural network architecture based on the Sequence-2-Sequence model, widely used, for instance in text translation. Particularly, we illustrate that utilising a context vector that has been learned from two different recurrent networks enables increased accuracies in learning the shortest route of a graph. Additionally, we show that one can boost the performance of the Seq2Seq network by smoothing the loss function using a homotopy continuation of the decoder's loss function.

Motivation & Objective

  • To address the NP-hard problem of finding shortest paths in large graphs where traditional algorithms like A* become computationally prohibitive.
  • To explore whether a dual encoder Seq2Seq architecture can better capture structural graph information than single-encoder variants.
  • To improve training stability and performance by smoothing the decoder’s loss function using homotopy continuation.
  • To evaluate the model’s ability to generalize across diverse graph topologies and node counts.

Proposed method

  • Employ a dual encoder architecture where two separate recurrent networks process source and target nodes, learning distinct contextual representations.
  • Combine the final hidden states of both encoders into a unified context vector for the decoder.
  • Use a standard attention-based decoder to generate the sequence of nodes forming the shortest path.
  • Apply homotopy continuation to gradually transition the loss function from a smooth approximation to the true loss, improving optimization.
  • Train the model end-to-end using sequence-to-sequence learning with masked cross-entropy loss.
  • Evaluate performance on synthetic and benchmark graphs using path accuracy and path length deviation metrics.

Experimental results

Research questions

  • RQ1Can a dual encoder Seq2Seq model outperform single-encoder models in learning shortest paths on large graphs?
  • RQ2How does homotopy continuation of the loss function affect convergence and path accuracy in sequence-based route learning?
  • RQ3To what extent can the model generalize across different graph structures and node counts?
  • RQ4Does the use of two distinct recurrent encoders improve the model’s ability to capture source-target dependencies?

Key findings

  • The dual encoder architecture achieved higher path accuracy compared to single-encoder baselines, particularly on graphs with over 1000 nodes.
  • Homotopy continuation of the loss function led to faster convergence and reduced training instability during sequence generation.
  • The model demonstrated improved generalization across diverse graph topologies, including grid-like and random sparse graphs.
  • The context vector derived from two encoders captured more nuanced source-target relationships than a single encoder's representation.

Better researchstarts right now

From paper design to paper writing, dramatically reduce your research time.

No credit card · Free plan available

This review was created by AI and reviewed by human editors.