QUICK REVIEW

[Paper Review] Binding and Perspective Taking as Inference in a Generative Neural Network Model

Mahdi Sadeghi, Fabian Schrodt|arXiv (Cornell University)|Dec 9, 2020

Action Observation and Synchronization|References: 28|Citations: 3

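Summary in Brief

This paper proposes a generative neural network model that solves perspective taking and feature binding for known motion patterns through gradient-based retrospective inference on parametric bias neurons. The model is trained on canonical motion patterns and then adapts its binding and perspective parameters by backpropagating the prediction error. This yields robust Gestalt perception of biological motion even under distorted viewpoints, and population encoding markedly improves performance.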

ABSTRACT

The ability to flexibly bind features into coherent wholes from different perspectives is a hallmark of cognition and intelligence. Importantly, the binding problem is not only relevant for vision but also for general intelligence, sensorimotor integration, event processing, and language. Various artificial neural network models have tackled this problem with dynamic neural fields and related approaches. Here we focus on a generative encoder-decoder architecture that adapts its perspective and binds features by means of retrospective inference. We first train a model to learn sufficiently accurate generative models of dynamic biological motion or other harmonic motion patterns, such as a pendulum. We then scramble the input to a certain extent, possibly vary the perspective onto it, and propagate the prediction error back onto a binding matrix, that is, hidden neural states that determine feature binding. Moreover, we propagate the error further back onto perspective taking neurons, which rotate and translate the input features onto a known frame of reference. Evaluations show that the resulting gradient-based inference process solves the perspective taking and binding problem for known biological motion patterns, essentially yielding a Gestalt perception mechanism. In addition, redundant feature properties and population encodings are shown to be highly useful. While we evaluate the algorithm on biological motion patterns, the principled approach should be applicable to binding and Gestalt perception problems in other domains.
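
The binding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a fixed set of 2D points stands in for the prediction of a trained generative model, the input is a single static frame, and a row-wise softmax over binding logits is an assumption made here for simplicity. All names (`infer_binding`, `canonical`, `perm`) are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def infer_binding(observed, canonical, lr=1.0, steps=200):
    """Adapt binding logits so that each model slot j selects the observed
    feature that best explains canonical[j]. Only the logits change; the
    stand-in generative model (canonical) stays fixed."""
    n = len(canonical)
    logits = [[0.0] * n for _ in range(n)]          # uniform binding at the start
    for _ in range(steps):
        for j in range(n):
            a = softmax(logits[j])                  # soft assignment for slot j
            bound = [sum(a[i] * observed[i][d] for i in range(n)) for d in range(2)]
            err = [bound[d] - canonical[j][d] for d in range(2)]
            # gradient of 0.5 * |err|^2 with respect to each assignment weight a[i]
            g = [err[0] * observed[i][0] + err[1] * observed[i][1] for i in range(n)]
            g_mean = sum(a[i] * g[i] for i in range(n))
            for k in range(n):                      # chain rule through the softmax
                logits[j][k] -= lr * a[k] * (g[k] - g_mean)
    return [softmax(row) for row in logits]

canonical = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
perm = [2, 0, 3, 1]                                 # scrambling, unknown to the model
observed = [canonical[p] for p in perm]
binding = infer_binding(observed, canonical)
# For each model slot, the index of the observed feature with the largest weight
recovered = [max(range(4), key=lambda i: row[i]) for row in binding]
```

In this toy setting, `recovered` inverts the scrambling in `perm`. Convergence to the correct assignment is not guaranteed for arbitrary feature sets; the example uses well-separated points so that the gradient clearly favors one assignment per slot.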

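Motivation and Objectives

Address the dual challenge of perspective taking and feature binding in perception with a neural network approach.
Develop a model that can infer a canonical perspective and a coherent feature binding from distorted or scrambled visual input.
Investigate how decomposing motion features into position, direction, and magnitude, and encoding them with population codes, contributes to feature binding and perspective inference.
Show that the parametric bias neurons for perspective and binding can be adapted online by backpropagating the prediction error.
Extend the model's applicability beyond biological motion to other domains that require flexible feature integration and perspective transformation.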

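Proposed Method

Adopt a generative autoencoder architecture with separate modules for perspective taking (rotation and translation matrices) and feature binding (a binding matrix).
Decompose the motion of each joint into three submodalities (relative position, motion direction, and motion magnitude) and encode each with a population code.
Train the model on canonical motion patterns, such as a pendulum or a walking gait, so that it learns an accurate generative model.
Apply retrospective inference by backpropagating the reconstruction error onto the perspective and binding parameters, adapting them online.
Treat the rotation, translation, and binding matrices as parametric bias neurons, that is, parameters optimized by gradient descent.
Evaluate performance under various distortions, such as rotation and translation, and compare results with and without population encoding.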
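
The perspective inference step in the list above can be sketched in a few lines. This is an illustration under strong simplifying assumptions, not the paper's method: the setting is 2D, a fixed point pattern stands in for the output of the trained generative model, perspective is reduced to one angle and one translation vector, and the gradients are written by hand instead of being backpropagated through a network. The names `infer_perspective`, `canonical`, `lr`, and `steps` are hypothetical.

```python
import math

def rotate(point, angle):
    """Rotate a 2D point counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = point
    return (c * x - s * y, s * x + c * y)

def infer_perspective(observed, canonical, lr=0.2, steps=300):
    """Gradient descent on an angle `phi` and a translation `b` so that
    rotate(x - b, phi) matches the canonical pattern. The stand-in model
    (canonical) stays fixed; only the perspective parameters are adapted."""
    phi, b = 0.0, [0.0, 0.0]
    n = len(observed)
    for _ in range(steps):
        g_phi, g_b = 0.0, [0.0, 0.0]
        c, s = math.cos(phi), math.sin(phi)
        for x, p in zip(observed, canonical):
            u = (x[0] - b[0], x[1] - b[1])           # undo the translation
            y = rotate(u, phi)                       # undo the rotation
            e = (y[0] - p[0], y[1] - p[1])           # prediction error
            # derivative of the rotated point with respect to phi
            dy = (-s * u[0] - c * u[1], c * u[0] - s * u[1])
            g_phi += (e[0] * dy[0] + e[1] * dy[1]) / n
            # derivative of the loss with respect to b is -R(phi)^T e
            g_b[0] -= (c * e[0] + s * e[1]) / n
            g_b[1] -= (-s * e[0] + c * e[1]) / n
        phi -= lr * g_phi
        b[0] -= lr * g_b[0]
        b[1] -= lr * g_b[1]
    return phi, b

canonical = [(1.0, 0.0), (0.0, 2.0), (-1.0, 0.0), (0.0, -2.0)]
alpha, t = math.radians(80), (0.5, -0.3)             # distortion, unknown to the model
observed = []
for p in canonical:
    r = rotate(p, alpha)
    observed.append((r[0] + t[0], r[1] + t[1]))
phi, b = infer_perspective(observed, canonical)      # phi approaches -alpha, b approaches t
```

The loss is the mean squared error between the re-aligned observation and the canonical pattern, so the same loop would apply to a learned predictor by replacing `canonical` with its output at each time step.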

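Experimental Results

Research Questions

RQ1: Can the neural network model infer the canonical perspective from distorted visual input of dynamic motion patterns?
RQ2: Can the model bind individual motion features into a coherent Gestalt percept even when the perspective is distorted?
RQ3: How do the decomposition into position, direction, and magnitude submodalities and population encoding affect feature binding and perspective inference?
RQ4: To what extent can the parametric bias neurons for perspective and binding be adapted through retrospective backpropagation of error?
RQ5: Does the model generalize to complex motion patterns, such as 3D human walking, under pronounced perspective changes?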

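Key Findings

The model correctly infers the canonical perspective (rotation and translation) even under strong distortions, such as rotations of nearly 90° about all three axes.
Translation and rotation distortions are optimized jointly, and convergence takes longer for more extreme transformations.
Population encoding of motion features markedly improves the ability to infer the correct perspective and binding, especially in complex or noisy conditions.
Reliable feature binding and perspective taking are achieved without regularization, residual connections, or other scaling techniques commonly used in deep networks.
Decomposing features into position, direction, and magnitude submodalities improves the model's robustness and the interpretability of the binding mechanism.
The retrospective inference mechanism enables online adaptation of the binding and perspective parameters, mimicking biologically plausible approximate Bayesian inference.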
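
Population encoding, which the findings above credit with a marked performance gain, can be illustrated with a small sketch. The Gaussian tuning curves, the number of units, and the tuning width are assumptions chosen for illustration; the review does not state the exact encoding used in the paper. The function names are hypothetical.

```python
import math

def population_encode(value, centers, width=0.15):
    """Activation of each unit: a Gaussian tuning curve around its preferred value."""
    return [math.exp(-0.5 * ((value - c) / width) ** 2) for c in centers]

def population_decode(activations, centers):
    """Read out the encoded value as the activation-weighted mean of the centers."""
    total = sum(activations)
    return sum(a * c for a, c in zip(activations, centers)) / total

centers = [i / 10 for i in range(11)]    # 11 units tiling the range [0, 1]
code = population_encode(0.5, centers)   # one scalar becomes 11 graded activations
decoded = population_decode(code, centers)
```

A single scalar feature, such as a normalized motion magnitude, is spread over several overlapping units, so nearby values produce similar activation patterns. That redundancy is one plausible reason why gradients with respect to binding and perspective parameters become more informative than with a single raw value.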

This review was generated by AI and checked by a human editor.