QUICK REVIEW

[Paper Review] RPNT: Robust Pre-trained Neural Transformer -- A Pathway for Generalized Motor Decoding

Hao Fang, Ryan A. Canfield | arXiv (Cornell University) | Jan 25, 2026

EEG and Brain-Computer Interfaces | Citations: 0

One-Line Summary

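RPNT introduces a robust pretrained neural transformer equipped with Multidimensional Rotary Positional Embedding (MRoPE) and a context-based attention mechanism, achieving strong generalized motor decoding across sessions, subjects, tasks, and recording sites. It is demonstrated on a microelectrode dataset and a Neuropixel dataset.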

ABSTRACT

Brain decoding aims to interpret and translate neural activity into behaviors. As such, it is imperative that decoding models are able to generalize across variations, such as recordings from different brain sites, distinct sessions, different types of behavior, and a variety of subjects. Current models can only partially address these challenges and warrant the development of pretrained neural transformer models capable of adapting and generalizing. In this work, we propose RPNT - Robust Pretrained Neural Transformer, designed to achieve robust generalization through pretraining, which in turn enables effective finetuning given a downstream task. In particular, RPNT's unique components include 1) Multidimensional rotary positional embedding (MRoPE) to aggregate experimental metadata such as site coordinates, session name and behavior types; 2) Context-based attention mechanism via convolution kernels operating on global attention to learn local temporal structures for handling non-stationarity of neural population activity; 3) Robust self-supervised learning (SSL) objective with uniform causal masking strategies and contrastive representations. We pretrained two separate versions of RPNT on distinct datasets: a) Multi-session, multi-task, and multi-subject microelectrode benchmark; b) Multi-site recordings using high-density Neuropixel 1.0 probes. The datasets include recordings from the dorsal premotor cortex (PMd) and from the primary motor cortex (M1) regions of nonhuman primates (NHPs) as they performed reaching tasks. After pretraining, we evaluated the generalization of RPNT in cross-session, cross-type, cross-subject, and cross-site downstream behavior decoding tasks. Our results show that RPNT consistently achieves and surpasses the decoding performance of existing decoding models in all tasks.
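The MRoPE component can be illustrated with a minimal NumPy sketch. This review does not say how the paper splits the embedding across metadata axes or how categorical metadata such as session name and behavior type become positions, so the following is an assumption: the feature dimension is split into equal chunks, one per metadata axis (here a time bin, two site coordinates, and an integer session index, all hypothetical encodings), and each chunk receives a standard 1D rotary embedding driven by its own coordinate. The names `rope_1d` and `mrope` are illustrative, not the authors' code.

```python
import numpy as np


def rope_1d(x: np.ndarray, pos: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Standard rotary embedding on the last axis of x.

    x: (n_tokens, d) with d even; pos: (n_tokens,) scalar positions.
    Feature pairs (2i, 2i+1) are rotated by the angle pos * base**(-2i/d).
    """
    d = x.shape[-1]
    if d % 2:
        raise ValueError("rotary embedding needs an even feature size")
    inv_freq = base ** (-np.arange(0, d, 2) / d)   # (d/2,) per-pair frequencies
    angles = pos[:, None] * inv_freq[None, :]      # (n_tokens, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=float)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


def mrope(x: np.ndarray, coords: np.ndarray) -> np.ndarray:
    """Hypothetical multidimensional RoPE.

    x: (n_tokens, d); coords: (n_tokens, n_axes), one column per metadata axis.
    The features are split into n_axes equal chunks (each of even size), and
    chunk k is rotated by the coordinate in column k.
    """
    n_axes = coords.shape[1]
    chunks = np.split(x, n_axes, axis=-1)          # raises if d is not divisible
    rotated = [rope_1d(c, coords[:, k].astype(float)) for k, c in enumerate(chunks)]
    return np.concatenate(rotated, axis=-1)


# Demo: two tokens with hypothetical metadata [time bin, site x, site y, session index].
rng = np.random.default_rng(0)
q = rng.standard_normal((1, 16))
k = rng.standard_normal((1, 16))
cq = np.array([[3, 1, 2, 0]])
ck = np.array([[7, 4, 2, 1]])
shift = np.array([[10, -2, 5, 3]])

s1 = (mrope(q, cq) @ mrope(k, ck).T).item()
s2 = (mrope(q, cq + shift) @ mrope(k, ck + shift).T).item()
print(np.isclose(s1, s2))  # prints True: scores depend only on per-axis coordinate offsets
```

Because each chunk is rotated independently, the query-key score depends only on the offset between two tokens' coordinates on every axis, which is the property that lets rotary embeddings encode relative position; the demo checks it by shifting both tokens by the same amount on all four axes.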

Motivation and Objectives

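Address non-stationarity and recording variability in neural decoding across sessions, sites, subjects, and behaviors.
Develop a pretraining-finetuning pipeline that enables robust generalization from neural spikes.
Design neural-transformer components tailored to neural data (MRoPE, context-based attention, and SSL with uniform causal masking).
Demonstrate improved cross-session, cross-subject, cross-type, and cross-site decoding performance relative to state-of-the-art baselines.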

Proposed Method

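Introduces Multidimensional Rotary Positional Embedding (MRoPE), which encodes experimental metadata such as site coordinates, session name, behavior type, and temporal position.
Implements a context-based attention mechanism that applies learnable convolution kernels on top of global attention to capture local temporal structure and handle non-stationarity.
Pretrains RPNT with a robust self-supervised learning objective that uses uniform causal masking and contrastive representations.
Trains two RPNT variants on distinct neural datasets (a microelectrode benchmark and Neuropixel recordings) and evaluates decoding after finetuning on cross-session, cross-type, cross-subject, and cross-site tasks.
Uses a causally masked autoregressive objective and an auxiliary site-invariance loss during SSL pretraining.
Provides interpretable attention maps that offer data-driven insight into how motor variables are encoded.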
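A minimal single-head sketch of one way "convolution kernels operating on global attention" could work. The review gives no kernel size, placement, or padding scheme, so these are assumptions: a small learnable kernel (fixed numbers here) slides over the pre-softmax query-key score map, padded only on the top and left so the convolution stays causal, and the usual causal mask is applied afterwards. `causal_conv2d` and `context_attention` are hypothetical names, not the paper's implementation.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def causal_conv2d(scores: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a small kernel over the (query, key) score map.

    Padding only on the top and left means output (i, j) mixes scores from
    queries <= i and keys <= j, so no future time step leaks in.
    The bottom-right kernel tap weights the current (i, j) cell.
    """
    kh, kw = kernel.shape
    n, m = scores.shape
    padded = np.pad(scores, ((kh - 1, 0), (kw - 1, 0)))
    out = np.zeros_like(scores)
    for a in range(kh):
        for b in range(kw):
            out += kernel[a, b] * padded[a:a + n, b:b + m]
    return out


def context_attention(query, key, value, kernel):
    """Hypothetical context-based causal attention for one head.

    query, key: (T, d); value: (T, d_v); kernel: (kh, kw) weights (learnable in practice).
    """
    T, d = query.shape
    scores = query @ key.T / np.sqrt(d)                # global (T x T) attention logits
    scores = causal_conv2d(scores, kernel)             # local temporal context on the score map
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)         # standard causal mask
    weights = softmax(scores, axis=-1)
    return weights @ value, weights


rng = np.random.default_rng(0)
T, d = 6, 8
query, key, value = (rng.standard_normal((T, d)) for _ in range(3))
kernel = np.array([[0.1, 0.2], [0.2, 1.0]])
out, w = context_attention(query, key, value, kernel)
```

With a 1x1 kernel equal to 1 the layer reduces to ordinary causal attention, so the convolution is the only added ingredient; a larger kernel lets each score borrow evidence from neighboring time steps, which is one plausible reading of "learning local temporal structure."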
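The review describes the SSL objective only at a high level ("uniform causal masking" in the method list, "uniform random masking" in the Figure 2 caption, plus a contrastive term), so the sketch below is an assumption-laden illustration of a masked-reconstruction step, not the authors' objective. A mask ratio is drawn uniformly per trial; either the trailing bins (causal mode) or a random subset of bins (random mode) are hidden; and a Poisson negative log-likelihood is scored on the hidden bins only. The Poisson loss is a common choice for spike counts but is not confirmed by the review, and the contrastive term is omitted. All names and the 0.1 to 0.5 ratio range are hypothetical.

```python
import numpy as np


def sample_mask(n_bins: int, rng: np.random.Generator, lo: float = 0.1,
                hi: float = 0.5, causal: bool = True) -> np.ndarray:
    """Return a boolean mask over time bins (True = hidden from the encoder)."""
    ratio = rng.uniform(lo, hi)                    # mask ratio drawn uniformly per trial
    n_mask = max(1, int(round(ratio * n_bins)))
    mask = np.zeros(n_bins, dtype=bool)
    if causal:
        mask[n_bins - n_mask:] = True              # hide the most recent bins: predict future from past
    else:
        mask[rng.choice(n_bins, size=n_mask, replace=False)] = True
    return mask


def masked_poisson_nll(log_rates: np.ndarray, counts: np.ndarray, mask: np.ndarray) -> float:
    """Poisson NLL without the log(k!) constant, averaged over masked bins.

    log_rates, counts: (n_bins, n_units); mask: (n_bins,).
    """
    nll = np.exp(log_rates) - counts * log_rates
    return float(nll[mask].mean())


rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(50, 32))           # synthetic spike counts: 50 bins x 32 units
mask = sample_mask(50, rng)
encoder_input = np.where(mask[:, None], 0, counts) # zero out hidden bins before encoding
log_rates = np.full(counts.shape, np.log(2.0))     # stand-in for the model's predicted log-rates
loss = masked_poisson_nll(log_rates, counts, mask)
```

Drawing the ratio per trial, rather than fixing it, exposes the model to both short and long prediction horizons; whether RPNT does this is not stated in the review.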

Figure 1: Overall illustration of the pretraining and finetuning workflow for generalized motor decoding. (A) Experimental setup for data collection while NHPs performed reaching tasks. (B) Preparation of pretraining data. (C) and (D) Overall schemes for SSL and SFT, respectively. (E) Illustration of

Experimental Results

Research Questions

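RQ1: Can RPNT achieve robust generalization of motor decoding to unseen brain sites, sessions, behaviors, and subjects?
RQ2: Do the proposed architectural components (MRoPE, context-based attention) and SSL strategy improve on existing neural decoders in cross-domain scenarios?
RQ3: How do RPNT's pretraining and finetuning regimes (FS-SFT vs. Full-SFT) compare with prior baselines across diverse datasets?
RQ4: How much does RPNT improve downstream decoding on cross-site Neuropixel data and on the cross-session benchmark?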

Key Findings

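Decoding performance on the public benchmark (R^2, mean ± std; higher is better; "-" = not reported)
Method	Cross-Session (C-CO)	Cross-Subject (T-CO)	Cross-Task (T-RT)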
Wiener filter	0.8712 ±0.0137	-	-
MLP	0.9210 ±0.0010	0.7976 ±0.0220	0.7007 ±0.0774
S4D	0.9381 ±0.0083	0.8526 ±0.0243	0.7145 ±0.0671
Mamba	0.9287 ±0.0034	0.7692 ±0.0235	0.6694 ±0.1220
GRU	0.9376 ±0.0036	0.8453 ±0.0200	0.7279 ±0.0679
POYO-SS	0.9427 ±0.0019	0.8705 ±0.0193	0.7156 ±0.0966
POSSM-S4D-SS	0.9515 ±0.0021	0.8838 ±0.0171	0.7505 ±0.0735
POSSM-Mamba-SS	0.9550 ±0.0003	0.8747 ±0.0173	0.7418 ±0.0790
POSSM-GRU-SS	0.9549 ±0.0012	0.8863 ±0.0222	0.7687 ±0.0669
RPNT	0.9647 ±0.0026	0.9103 ±0.0182	0.8356 ±0.0914
NDT-2 (FT)	0.8507 ±0.0110	0.6549 ±0.0290	0.5903 ±0.1430
POYO-1 (FT)	0.9611 ±0.0035	0.8859 ±0.0275	0.7591 ±0.0770
o-POSSM-S4D (FT)	0.9618 ±0.0007	0.9069 ±0.0120	0.7584 ±0.0637
o-POSSM-Mamba (FT)	0.9574 ±0.0016	0.9011 ±0.0148	0.7621 ±0.0765
o-POSSM-GRU (FT)	0.9587 ±0.0052	0.9021 ±0.0241	0.7717 ±0.0595
RPNT (FS-SFT)	0.9801 ±0.0060	0.9431 ±0.0103	0.8515 ±0.1071
RPNT (Full-SFT)	0.9894 ±0.0037	0.9626 ±0.0059	0.8778 ±0.1005
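The margins behind "outperforms the baselines" can be computed directly from the table. The grouping is an assumption based on the row labels and the from-scratch versus finetuned framing of the findings: RPNT trained from scratch is compared with the best non-finetuned baseline, and the two finetuned RPNT variants with the best "(FT)" baseline. Only the table's means are used; standard deviations are ignored, so these are point differences, not significance tests.

```python
# Mean R^2 values copied from the table: (C-CO, T-CO, T-RT); None = not reported.
SCRATCH_BASELINES = {
    "Wiener filter": (0.8712, None, None),
    "MLP": (0.9210, 0.7976, 0.7007),
    "S4D": (0.9381, 0.8526, 0.7145),
    "Mamba": (0.9287, 0.7692, 0.6694),
    "GRU": (0.9376, 0.8453, 0.7279),
    "POYO-SS": (0.9427, 0.8705, 0.7156),
    "POSSM-S4D-SS": (0.9515, 0.8838, 0.7505),
    "POSSM-Mamba-SS": (0.9550, 0.8747, 0.7418),
    "POSSM-GRU-SS": (0.9549, 0.8863, 0.7687),
}
FINETUNED_BASELINES = {
    "NDT-2 (FT)": (0.8507, 0.6549, 0.5903),
    "POYO-1 (FT)": (0.9611, 0.8859, 0.7591),
    "o-POSSM-S4D (FT)": (0.9618, 0.9069, 0.7584),
    "o-POSSM-Mamba (FT)": (0.9574, 0.9011, 0.7621),
    "o-POSSM-GRU (FT)": (0.9587, 0.9021, 0.7717),
}
# Assumed pairing of each RPNT variant with its comparison group.
RPNT_VARIANTS = {
    "RPNT": ((0.9647, 0.9103, 0.8356), SCRATCH_BASELINES),
    "RPNT (FS-SFT)": ((0.9801, 0.9431, 0.8515), FINETUNED_BASELINES),
    "RPNT (Full-SFT)": ((0.9894, 0.9626, 0.8778), FINETUNED_BASELINES),
}
COLUMNS = ("C-CO", "T-CO", "T-RT")


def best_baseline(group: dict, col: int) -> tuple:
    """Name and mean R^2 of the strongest baseline in one column."""
    candidates = [(name, row[col]) for name, row in group.items() if row[col] is not None]
    return max(candidates, key=lambda item: item[1])


def margins() -> dict:
    """(variant, column) -> (best baseline name, RPNT minus best baseline)."""
    result = {}
    for variant, (scores, group) in RPNT_VARIANTS.items():
        for col, label in enumerate(COLUMNS):
            name, best = best_baseline(group, col)
            result[(variant, label)] = (name, round(scores[col] - best, 4))
    return result


for (variant, label), (name, gap) in margins().items():
    print(f"{variant:16s} {label}: {gap:+.4f} R^2 over {name}")
```

Every gap comes out positive, consistent with the findings below. The largest gaps fall in the cross-task (T-RT) column, which is also where the reported standard deviations are widest (roughly ±0.06 to ±0.14 across methods).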

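RPNT outperforms the baseline models in all three generalization scenarios on the public benchmark (cross-session, cross-subject, and cross-task).
In the single-session, from-scratch regime, RPNT reaches R^2 of 0.9647±0.0026 on C-CO, 0.9103±0.0182 on T-CO, and 0.8356±0.0914 on T-RT.
When RPNT is pretrained and then few-shot or fully finetuned, it consistently achieves higher R^2 than the finetuned baselines: FS-SFT reaches 0.9801±0.0060 (C-CO), 0.9431±0.0103 (T-CO), and 0.8515±0.1071 (T-RT); Full-SFT reaches 0.9894±0.0037, 0.9626±0.0059, and 0.8778±0.1005, respectively.
On cross-site Neuropixel data, pretrained RPNT scores 0.6612±0.0328 versus 0.6358±0.0311 for RPNT trained from scratch, and pretrained RPNT performs strongly in the few-shot setting (e.g., with a 10% training split).
Ablations show that MRoPE outperforms other positional encodings and that context-based attention yields a significant gain over standard attention (about 5%).
Functional connectivity can be estimated from the spatial attention maps, enabling data-driven insight into the neural encoding of movement.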
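All benchmark scores above are R^2 values. A minimal sketch of the coefficient of determination for multi-dimensional behavioral targets, assuming (the review does not say) that R^2 is computed per output dimension, such as x and y hand velocity, and then averaged with equal weight; this matches the default `multioutput="uniform_average"` behavior of scikit-learn's `r2_score`. The ± values are presumably spreads over repeated runs or sessions, which this sketch does not model. `r2_uniform` is a hypothetical helper name and the velocity trace is synthetic.

```python
import numpy as np


def r2_uniform(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """R^2 per output column, averaged with equal weight.

    y_true, y_pred: (n_samples, n_outputs), e.g. decoded (vx, vy) hand velocity.
    """
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)                 # residual sum of squares
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)    # total sum of squares
    return float(np.mean(1.0 - ss_res / ss_tot))


rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
velocity = np.stack([np.cos(t), np.sin(t)], axis=1)               # synthetic smooth 2D velocity trace
decoded = velocity + 0.1 * rng.standard_normal(velocity.shape)    # noisy stand-in for a decoder
print(round(r2_uniform(velocity, decoded), 3))
```

A decoder that always outputs the mean velocity scores exactly 0 and a perfect decoder scores 1, so values such as 0.9894 versus 0.9618 measure how much of the remaining unexplained variance each model removes.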
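The cross-site pretraining gain, worked out from the two reported Neuropixel means (assuming both numbers are the same decoding score):

```latex
\Delta = 0.6612 - 0.6358 = 0.0254,
\qquad
\frac{\Delta}{0.6358} = \frac{0.0254}{0.6358} \approx 0.040 \quad (\approx 4.0\%\ \text{relative})
```

The absolute gain of 0.0254 is smaller than either reported standard deviation (0.0311 and 0.0328), so the mean ± std ranges of the two settings overlap.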

Figure 2: A schematic of components in RPNT. Components in black indicate standard transformer signal flow (i.e., no masking and standard attention mechanism). Our novel proposed components include MRoPE (green), context-based attention (cyan), and the uniform random masking strategy (pink). MRoPE incorp

This review was generated by AI and checked by a human editor.