QUICK REVIEW

[Paper Review] RPNT: Robust Pre-trained Neural Transformer -- A Pathway for Generalized Motor Decoding

Hao Fang, Ryan A. Canfield | arXiv (Cornell University) | 2026-01-25

EEG and Brain-Computer Interfaces | Citations: 0

One-line summary

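RPNT is a robust pretrained neural transformer that combines multidimensional rotary positional embedding with context-based attention to achieve generalized motor decoding across sessions, subjects, tasks, and recording sites, demonstrated on a microelectrode dataset and a Neuropixel dataset.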

ABSTRACT

Brain decoding aims to interpret and translate neural activity into behaviors. As such, it is imperative that decoding models are able to generalize across variations, such as recordings from different brain sites, distinct sessions, different types of behavior, and a variety of subjects. Current models can only partially address these challenges and warrant the development of pretrained neural transformer models capable to adapt and generalize. In this work, we propose RPNT - Robust Pretrained Neural Transformer, designed to achieve robust generalization through pretraining, which in turn enables effective finetuning given a downstream task. In particular, RPNT unique components include 1) Multidimensional rotary positional embedding (MRoPE) to aggregate experimental metadata such as site coordinates, session name and behavior types; 2) Context-based attention mechanism via convolution kernels operating on global attention to learn local temporal structures for handling non-stationarity of neural population activity; 3) Robust self-supervised learning (SSL) objective with uniform causal masking strategies and contrastive representations. We pretrained two separate versions of RPNT on distinct datasets a) Multi-session, multi-task, and multi-subject microelectrode benchmark; b) Multi-site recordings using high-density Neuropixel 1.0 probes. The datasets include recordings from the dorsal premotor cortex (PMd) and from the primary motor cortex (M1) regions of nonhuman primates (NHPs) as they performed reaching tasks. After pretraining, we evaluated the generalization of RPNT in cross-session, cross-type, cross-subject, and cross-site downstream behavior decoding tasks. Our results show that RPNT consistently achieves and surpasses the decoding performance of existing decoding models in all tasks.

Research motivation and goals

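Address non-stationarity and recording variability in neural decoding across sessions, sites, subjects, and behaviors.
Develop a pretraining-finetuning pipeline that enables robust generalization of motor decoding from neural spikes.
Design transformer components tailored to neural data (MRoPE, context-based attention, and SSL with uniform causal masking).
Demonstrate improved cross-session, cross-subject, cross-type, and cross-site decoding performance over state-of-the-art baselines on diverse datasets.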

Proposed method

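Introduces Multidimensional Rotary Positional Embedding (MRoPE) to encode experimental metadata (site coordinates, session name, behavior type, temporal position, etc.).
Implements a context-based attention mechanism that applies learnable convolution kernels to global attention, capturing local temporal structure and handling non-stationarity.
Pretrains RPNT with a robust self-supervised learning objective that uses uniform causal masking and contrastive representations.
Trains two RPNT variants on two different neural datasets (microelectrode benchmark; Neuropixel recordings) and evaluates downstream decoding via finetuning on cross-session, cross-type, cross-subject, and cross-site tasks.
Introduces a causally masked autoregressive objective and an auxiliary site-invariance loss during SSL pretraining.
Provides interpretable attention maps that offer data-driven insight into how motion variables are encoded.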
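
As a rough illustration of the MRoPE idea in the first item above, the sketch below applies a standard rotary embedding separately to each metadata axis. It is a minimal sketch under stated assumptions, not the paper's formulation: the even split of the feature dimension across axes, the function names (`rope_angles`, `rotate_pairs`, `mrope`), and the numeric coordinates (time index, site x, site y) are all hypothetical. Categorical metadata such as session name or behavior type would first need to be mapped to numeric IDs.

```python
import numpy as np

def rope_angles(pos, dim, base=10000.0):
    """Standard RoPE angles for one coordinate axis: shape (n_tokens, dim // 2)."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return np.outer(pos, inv_freq)

def rotate_pairs(x, angles):
    """Rotate consecutive feature pairs (x[2i], x[2i+1]) by angles[:, i]."""
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

def mrope(x, coords):
    """Apply a multi-axis rotary embedding.

    x:      (n_tokens, d) float array of queries or keys
    coords: (n_tokens, n_axes) one numeric coordinate per metadata axis
    Assumption of this sketch: d is split evenly across the axes.
    """
    n_tokens, d = x.shape
    n_axes = coords.shape[1]
    chunk = d // n_axes
    assert chunk * n_axes == d and chunk % 2 == 0
    out = np.empty_like(x)
    for a in range(n_axes):
        sl = slice(a * chunk, (a + 1) * chunk)
        out[:, sl] = rotate_pairs(x[:, sl], rope_angles(coords[:, a], chunk))
    return out

# Hypothetical tokens with (time index, site x, site y) coordinates.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 12))
k = rng.normal(size=(4, 12))
coords = np.array([[0, 1.0, 2.0], [1, 1.0, 2.0], [2, 3.0, 0.5], [3, 3.0, 0.5]])
scores = mrope(q, coords) @ mrope(k, coords).T

# Shifting every token by the same offset leaves the attention scores unchanged.
shifted = coords + np.array([5.0, -2.0, 1.0])
scores_shifted = mrope(q, shifted) @ mrope(k, shifted).T
```

The final lines show the property that makes rotary embeddings attractive here: query-key scores depend only on coordinate differences between tokens, not on absolute coordinates.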

Figure 1: Overall illustration of the pretraining and finetuning workflow for generalized motor decoding. (A) Experimental setup for data collection while NHPs performed reaching tasks. (B) Preparation of pretraining data. (C) and (D) overall schemes for SSL and SFT, respectively. (E) Illustration of

Experimental results

Research questions

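RQ1: Can RPNT achieve robust motor decoding generalization across unseen brain sites, sessions, behaviors, and subjects?
RQ2: Do the proposed architectural components (MRoPE, context-based attention) and the SSL strategy improve on existing decoders in cross-domain scenarios?
RQ3: How do RPNT's pretraining and finetuning regimes (FS-SFT vs. Full-SFT) compare with state-of-the-art baselines across datasets?
RQ4: What downstream decoding gains does RPNT deliver on cross-site Neuropixel data and on cross-session benchmarks?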

Key results

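RPNT outperforms all baseline models in three generalization scenarios (cross-session, cross-subject, cross-task) on the public benchmark.
When trained from scratch on a single session, RPNT reaches R^2 of 0.9647±0.0026 (C-CO), 0.9103±0.0182 (T-CO), and 0.8356±0.0914 (T-RT).
Pretraining RPNT and then applying few-shot or full finetuning consistently yields higher R^2 than the baselines: FS-SFT reaches 0.9801±0.0060 (C-CO), 0.9431±0.0103 (T-CO), and 0.8515±0.1071 (T-RT), and Full-SFT reaches 0.9894±0.0037, 0.9626±0.0059, and 0.8778±0.1005, respectively.
On cross-site Neuropixel data, RPNT trained from scratch reaches 0.6358±0.0311 versus 0.6612±0.0328 for pretrained RPNT, suggesting that pretrained RPNT has strong few-shot performance (e.g., with a 10% training split).
Ablations show that MRoPE outperforms other positional encodings and that context-based attention provides a substantial gain (about 5%) over standard attention.
Functional connectivity can be inferred from spatial attention maps, enabling data-driven insight into the neural encoding of motion variables.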
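
The results above are reported as R^2 (mean ± standard deviation). The sketch below shows one common way to compute such a score for decoded kinematics. It is an assumption-laden illustration: the averaging over output dimensions, the hypothetical `r2_score` helper, and the synthetic "sessions" are not taken from the paper, whose exact evaluation convention is not given in this review.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination, averaged over output dimensions
    (e.g. x and y velocity). The averaging convention is an assumption."""
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))

# Synthetic stand-in for per-session decoding: true velocity plus small decoder noise.
rng = np.random.default_rng(0)
session_scores = []
for _ in range(5):
    velocity = rng.normal(size=(200, 2))
    decoded = velocity + 0.1 * rng.normal(size=(200, 2))
    session_scores.append(r2_score(velocity, decoded))

# Report in the same "mean ± std" form used in the results above.
print(f"{np.mean(session_scores):.4f} ± {np.std(session_scores):.4f}")
```

A perfect decoder scores 1, and a decoder that always predicts the mean of the target scores 0, which is why values near 0.96 to 0.99 on the center-out tasks indicate very close tracking.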

Figure 2: A schematic of components in RPNT. Components in black indicate standard transformer signal flow (i.e., no masking and standard attention mechanism). Our novel proposed components include MRoPE (green), context-based attention (cyan), and uniform random masking strategy (pink). MRoPE incorp
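
One possible reading of "convolution kernels operating on global attention" is sketched below: causal attention scores are computed as usual, then smoothed along the key axis with a small kernel before the softmax. This is only a sketch under assumptions. The placement of the convolution (pre-softmax, along keys), the fixed kernel values (learnable in a real model), and the function name `context_attention` are hypothetical and may differ from the paper's design.

```python
import numpy as np

def context_attention(q, k, v, kernel):
    """Causal attention whose score matrix is convolved along the key axis.

    q, k, v: (T, d) arrays for one head; kernel: (w,) weights.
    """
    T, d = q.shape
    w = len(kernel)
    scores = q @ k.T / np.sqrt(d)                  # global attention scores (T, T)
    causal = np.tril(np.ones((T, T), dtype=bool))  # query i may see keys j <= i
    scores = np.where(causal, scores, 0.0)         # hide future keys from the convolution
    padded = np.pad(scores, ((0, 0), (w - 1, 0)))  # left-pad so each window ends at key j
    # mixed[i, j] = sum_m kernel[m] * scores[i, j - m]
    mixed = sum(kernel[m] * padded[:, w - 1 - m : w - 1 - m + T] for m in range(w))
    mixed = np.where(causal, mixed, -np.inf)       # re-apply the causal mask
    mixed -= mixed.max(axis=1, keepdims=True)      # numerically stable softmax
    weights = np.exp(mixed)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 8)) for _ in range(3))
kernel = np.array([0.6, 0.3, 0.1])
out, weights = context_attention(q, k, v, kernel)
```

Because the window only reaches backward over keys and the causal mask is re-applied after mixing, each time step still attends only to the present and past, while neighboring keys share evidence through the kernel.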

This review was generated by AI and reviewed by a human editor.