QUICK REVIEW

[Paper Review] Clinical-Prior Guided Multi-Modal Learning with Latent Attention Pooling for Gait-Based Scoliosis Screening

Dong Chen, Zizhuang Wei|arXiv (Cornell University)|Feb 6, 2026

Scoliosis diagnosis and treatment | Citations: 0

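One-Sentence Summary

This paper introduces ScoliGait, a non-overlapping, radiographically labeled gait video benchmark for AIS screening, and proposes a multi-modal model that combines a clinical-prior knowledge map with latent attention pooling. By integrating the knowledge map, video, and text, the model achieves interpretable, state-of-the-art performance.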

ABSTRACT

Adolescent Idiopathic Scoliosis (AIS) is a prevalent spinal deformity whose progression can be mitigated through early detection. Conventional screening methods are often subjective, difficult to scale, and reliant on specialized clinical expertise. Video-based gait analysis offers a promising alternative, but current datasets and methods frequently suffer from data leakage, where performance is inflated by repeated clips from the same individual, or employ oversimplified models that lack clinical interpretability. To address these limitations, we introduce ScoliGait, a new benchmark dataset comprising 1,572 gait video clips for training and 300 fully independent clips for testing. Each clip is annotated with radiographic Cobb angles and descriptive text based on clinical kinematic priors. We propose a multi-modal framework that integrates a clinical-prior-guided kinematic knowledge map for interpretable feature representation, alongside a latent attention pooling mechanism to fuse video, text, and knowledge map modalities. Our approach establishes a new state of the art, showing a significant performance gain on a realistic, subject-independent benchmark. This work provides a robust, interpretable, and clinically grounded foundation for scalable, non-invasive AIS assessment.

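Motivation and Objectives

Address data leakage and the lack of subject independence in gait-based AIS screening datasets.
Provide an interpretable, clinically grounded representation of gait through a kinematic knowledge map.
Develop a robust multi-modal fusion method that integrates video, knowledge map, and text via latent attention pooling.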
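
The data-leakage concern above comes down to how clips are assigned to train and test sets. The following is a minimal sketch of a subject-disjoint split, not the authors' actual procedure: the `(clip_id, subject_id)` schema, the function name `subject_disjoint_split`, and the 20% test fraction are all assumptions made for illustration.

```python
import random


def subject_disjoint_split(clips, test_fraction=0.2, seed=0):
    """Split clips so that no subject appears in both train and test.

    clips: list of (clip_id, subject_id) pairs (hypothetical schema).
    """
    # Shuffle subjects, not clips, so all clips of one person stay together.
    subjects = sorted({subject for _, subject in clips})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [c for c in clips if c[1] not in test_subjects]
    test = [c for c in clips if c[1] in test_subjects]
    return train, test


# Toy data: 10 subjects with 3 clips each (not ScoliGait data).
clips = [(f"clip_{s}_{i}", f"subject_{s}") for s in range(10) for i in range(3)]
train, test = subject_disjoint_split(clips)

train_subjects = {s for _, s in train}
test_subjects = {s for _, s in test}
print(train_subjects.isdisjoint(test_subjects))
```

Splitting at the clip level instead would let several clips of the same person land on both sides, which is the inflation effect the abstract describes.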

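Proposed Method

Introduce the ScoliGait dataset, comprising 1,572 training clips and 300 independent test clips annotated with radiographic Cobb angles and clinical text prompts.
Construct a kinematic knowledge map of 238 features covering motion space, self-skeleton space, and signal correlation.
Integrate the modalities using three modality-specific encoders (knowledge map, video, and text) and a latent attention pooling mechanism.
Align positional embeddings across modalities to improve fusion performance.
Provide interpretability by mapping attention scores back onto the clinically meaningful knowledge map.
Use Sentence-Transformers for text encoding and Vision Transformer backbones for the video and knowledge map modalities.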
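
To make the fusion step more concrete, here is a minimal sketch of one common form of latent attention pooling, in which a small set of latent query vectors attends over the concatenated tokens of all modalities. The review does not give the paper's exact formulation, so everything here is an assumption: the function name `latent_attention_pool`, the absence of learned projection matrices, the averaging of latent readouts, and the random toy tokens standing in for encoder outputs.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def latent_attention_pool(tokens, latents):
    """Pool a variable-length token list into one vector.

    tokens:  list of d-dim vectors (tokens from all modalities, concatenated)
    latents: list of d-dim query vectors (learned in a real model; random here)
    Returns (fused, attention), where attention[i][j] is the weight that
    latent i assigns to token j.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    attention = [softmax([dot(q, t) * scale for t in tokens]) for q in latents]
    # Each latent reads out a weighted average of the tokens.
    readouts = [
        [sum(w * t[k] for w, t in zip(weights, tokens)) for k in range(d)]
        for weights in attention
    ]
    # Average the latent readouts into a single fused representation.
    fused = [sum(r[k] for r in readouts) / len(readouts) for k in range(d)]
    return fused, attention


random.seed(0)
d = 8


def rand_tokens(n):
    return [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]


# Toy stand-ins for the outputs of the three modality encoders.
knowledge_map_tokens = rand_tokens(5)
video_tokens = rand_tokens(4)
text_tokens = rand_tokens(1)
tokens = knowledge_map_tokens + video_tokens + text_tokens
latents = rand_tokens(2)

fused, attention = latent_attention_pool(tokens, latents)
```

In this sketch, the slice `attention[i][:5]` holds the weights that latent `i` places on the knowledge-map tokens. Reading such weights back against named kinematic features is one plausible way to obtain the kind of interpretability the method describes.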

Figure 1: ScoliGait system for multi-modal gait analysis from mobile video. Left: temporal alignment of the knowledge map and video. Right: generation of video, knowledge map, and text modalities via pose estimation, showing kinematic alignment and knowledge-guided synthesis.

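Experimental Results

Research Questions

RQ1: Does a clinically grounded multi-modal framework improve AIS screening accuracy on a subject-independent dataset?
RQ2: Does integrating a structured kinematic knowledge map with video and text improve interpretability and diagnostic performance?
RQ3: How do latent attention pooling and cross-modal alignment affect fusion quality and clinical relevance?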

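Key Findings

The knowledge map alone outperforms video alone on binary AIS screening, improving accuracy by 1.7% and F1 score by 3.2%.
Multi-modal fusion of the knowledge map, video, and text with latent attention pooling achieves the best performance: 70.0% accuracy and a 61.9% F1 score.
ScoliGait provides a non-overlapping, radiographically labeled benchmark comprising 1,572 training clips from unique individuals and 300 independent test clips.
Mapping attention onto the clinically meaningful knowledge map makes gait features interpretable over time, which improves explainability.
Ablations show that latent attention pooling outperforms simple concatenation and that cross-modal embedding alignment improves results.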
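
The findings above report accuracy and F1 side by side. As a reminder of how the two metrics relate in a binary screening task, here is a small sketch using the standard definitions. The labels are toy values chosen for illustration and are not drawn from ScoliGait, and the helper name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, f1) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return accuracy, f1


# Toy labels: 2 positives out of 10 cases.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
accuracy, f1 = binary_metrics(y_true, y_pred)
print(accuracy, f1)  # 0.8 0.5
```

Because F1 ignores true negatives, it can sit well below accuracy when positives are the minority, which is why screening results are usually reported with both.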

Figure 2: Proposed three-modal fusion architecture for AIS screening. Inputs from Knowledge Map, Vision, and Text modalities are integrated via a Latent Attention Pooling mechanism (bottom). Remapped attention scores from the Knowledge Map (top) are filtered for salient features to enable clinical i


This review was generated by AI and checked by a human editor.