QUICK REVIEW

[Paper Review] Memory Fusion Network for Multi-view Sequential Learning

Amir Zadeh, Paul Pu Liang|arXiv (Cornell University)|Feb 3, 2018

Domain Adaptation and Few-Shot Learning|Cited by 120

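One-line summary

MFN models the dynamics of each view separately, identifies cross-view interactions with Delta-memory Attention, and accumulates cross-view information over time in a Multi-view Gated Memory, achieving state-of-the-art results on multiple multi-view sequence benchmarks.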

ABSTRACT

Multi-view sequential learning is a fundamental problem in machine learning dealing with multi-view sequences. In a multi-view sequence, there exists two forms of interactions between different views: view-specific interactions and cross-view interactions. In this paper, we present a new neural architecture for multi-view sequential learning called the Memory Fusion Network (MFN) that explicitly accounts for both interactions in a neural architecture and continuously models them through time. The first component of the MFN is called the System of LSTMs, where view-specific interactions are learned in isolation through assigning an LSTM function to each view. The cross-view interactions are then identified using a special attention mechanism called the Delta-memory Attention Network (DMAN) and summarized through time with a Multi-view Gated Memory. Through extensive experimentation, MFN is compared to various proposed approaches for multi-view sequential learning on multiple publicly available benchmark datasets. MFN outperforms all the existing multi-view approaches. Furthermore, MFN outperforms all current state-of-the-art models, setting new state-of-the-art results for these multi-view datasets.

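Motivation and Objectives

Motivate and address multi-view sequential learning, where data from different views exhibit both view-specific and cross-view interactions.
Propose the MFN architecture, which models both interaction types through time.
Demonstrate MFN's effectiveness on diverse multimodal datasets and compare it with state-of-the-art methods.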

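Proposed Method

Implement a System of LSTMs in which each view has its own LSTM, capturing view-specific dynamics.
Use the Delta-memory Attention Network (DMAN) to attend over the memory states of all views at consecutive time steps (t-1 and t), assigning relevance scores to cross-view interactions.
Introduce a Multi-view Gated Memory, updated from the DMAN output, that accumulates and summarizes cross-view interactions through time.
Combine the outputs of all view-specific LSTMs with the output of the cross-view memory to make the final prediction.
Run ablation studies to assess the contributions of the delta memory and the cross-view memory.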
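
The fusion step described in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the per-view LSTMs are left out and their memories are passed in as plain lists, each sub-network (attention, candidate update, two gates) is assumed to be a single linear layer, and the names `fusion_step`, `init_params`, and the parameter keys are hypothetical.

```python
import math
import random


def linear(x, layer):
    """Affine map y = W x + b, with W stored as a list of rows."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def init_params(n_cat, mem_dim, seed=0):
    """Hypothetical parameter layout: one linear layer per sub-network."""
    rng = random.Random(seed)
    return {
        "attention": init_layer(n_cat, n_cat, rng),
        "update": init_layer(n_cat, mem_dim, rng),
        "retain_gate": init_layer(n_cat, mem_dim, rng),
        "update_gate": init_layer(n_cat, mem_dim, rng),
    }


def fusion_step(c_prev, c_curr, u_prev, params):
    """One time step of cross-view fusion.

    c_prev, c_curr: per-view LSTM memories at t-1 and t (one list per view).
    u_prev: Multi-view Gated Memory at t-1.
    """
    # Concatenate every view's memory at t-1 and t (the "delta memory").
    c_cat = [v for view in c_prev for v in view] + [v for view in c_curr for v in view]
    # Attention: softmax coefficients over the concatenated memory dimensions.
    attention = softmax(linear(c_cat, params["attention"]))
    attended = [c * a for c, a in zip(c_cat, attention)]
    # Candidate memory content, squashed with tanh.
    candidate = [math.tanh(v) for v in linear(attended, params["update"])]
    # Sigmoid gates: how much old memory to retain and how much candidate to write.
    retain = sigmoid(linear(attended, params["retain_gate"]))
    write = sigmoid(linear(attended, params["update_gate"]))
    u_next = [r * u + w * c for r, u, w, c in zip(retain, u_prev, write, candidate)]
    return u_next, attention


# Toy usage: two views with memory sizes 2 and 3, gated memory of size 4.
c_prev = [[0.1, -0.2], [0.3, 0.0, 0.5]]
c_curr = [[0.2, -0.1], [0.1, 0.4, 0.6]]
params = init_params(n_cat=10, mem_dim=4)
u, attention = fusion_step(c_prev, c_curr, [0.0] * 4, params)
print(len(u), round(sum(attention), 6))
```

In a full model, the step would run once per time step after the view-specific LSTMs update, and the final prediction would read from the last LSTM outputs together with the last value of `u`.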

Figure 1: Overview figure of Memory Fusion Network (MFN) pipeline. $\sigma$ denotes the $sigmoid$ activation function, $\tau$ the $tanh$ activation function, $\odot$ the Hadamard product and $\oplus$ element wise addition. Each LSTM encodes information from one view such as language ( $l$ ), video (

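Experimental Results

Research Questions

RQ1: How can both view-specific and cross-view interactions be modeled explicitly in multi-view sequential data?
RQ2: Does incorporating the Delta-memory Attention mechanism improve the discovery of cross-view interactions through time?
RQ3: How much does a dedicated Multi-view Gated Memory contribute to capturing long-range cross-view information?
RQ4: How does MFN perform against state-of-the-art multi-view sequential models across diverse datasets?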

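Key Findings

MFN achieved state-of-the-art performance on all evaluated datasets and metrics for multimodal sentiment analysis, emotion recognition, and speaker trait analysis.
Ablation studies showed that MFN with both the delta memory and the Multi-view Gated Memory outperforms MFN variants lacking either component.
MFN has substantially fewer parameters (about 5e5) and runs faster than representative baselines, at roughly 2858 inferences per second, while also improving performance.
Using multiple views consistently improves results over single-view MFN variants, underscoring the value of cross-view modeling.
The delta memory (t-1, t) provides important temporal context, as shown by the performance drop in the MFN (no Δ) ablation.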
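
To make the delta-memory finding concrete, the sketch below contrasts the attention input with and without the t-1 memories. It rests on an assumption this review does not spell out: that the "no Δ" variant feeds the attention network only the memories at time t. The function name `attention_input` and the toy values are hypothetical.

```python
def attention_input(c_prev, c_curr, use_delta=True):
    """Build the vector fed to the attention network.

    use_delta=True  -> per-view memories at t-1 and t (full model).
    use_delta=False -> per-view memories at t only (assumed "no Δ" ablation).
    """
    def flatten(views):
        return [v for view in views for v in view]

    previous = flatten(c_prev) if use_delta else []
    return previous + flatten(c_curr)


# Toy memories for two views (sizes 2 and 3) at consecutive time steps.
c_prev = [[0.1, -0.2], [0.3, 0.0, 0.5]]
c_curr = [[0.2, -0.1], [0.1, 0.4, 0.6]]
full = attention_input(c_prev, c_curr)
no_delta = attention_input(c_prev, c_curr, use_delta=False)
print(len(full), len(no_delta))
```

Under this assumption the ablated model cannot compare a memory dimension against its previous value, which is one plausible reading of why removing the t-1 memories hurts performance.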


This review was generated by AI and checked by a human editor.