QUICK REVIEW

[Paper Review] Pyramid Self-attention Polymerization Learning for Semi-supervised Skeleton-based Action Recognition

Binqian Xu, Xiangbo Shu | arXiv (Cornell University) | 2023-02-05

Human Pose and Action Recognition | Citations: 41

One-line Summary

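PSP Learning jointly learns body-, part-, and joint-level skeleton representations through pyramid self-attention polymerization and coarse-to-fine contrastive learning, and uses them for semi-supervised action recognition. It achieves competitive performance on the NTU RGB+D and NW-UCLA datasets.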

ABSTRACT

Most semi-supervised skeleton-based action recognition approaches aim to learn the skeleton action representations only at the joint level, but neglect the crucial motion characteristics at the coarser-grained body (e.g., limb, trunk) level that provide rich additional semantic information, though the number of labeled data is limited. In this work, we propose a novel Pyramid Self-attention Polymerization Learning (dubbed as PSP Learning) framework to jointly learn body-level, part-level, and joint-level action representations of joint and motion data containing abundant and complementary semantic information via contrastive learning covering coarse-to-fine granularity. Specifically, to complement semantic information from coarse to fine granularity in skeleton actions, we design a new Pyramid Polymerizing Attention (PPA) mechanism that firstly calculates the body-level attention map, part-level attention map, and joint-level attention map, as well as polymerizes these attention maps in a level-by-level way (i.e., from body level to part level, and further to joint level). Moreover, we present a new Coarse-to-fine Contrastive Loss (CCL) including body-level contrast loss, part-level contrast loss, and joint-level contrast loss to jointly measure the similarity between the body/part/joint-level contrasting features of joint and motion data. Finally, extensive experiments are conducted on the NTU RGB+D and North-Western UCLA datasets to demonstrate the competitive performance of the proposed PSP Learning in the semi-supervised skeleton-based action recognition task. The source codes of PSP Learning are publicly available at https://github.com/1xbq1/PSP-Learning.
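The level-by-level polymerization described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' PPA implementation (their code is in the linked repository). The assumptions are:
- a hypothetical toy hierarchy of 6 joints, 3 parts, and 2 bodies;
- mean pooling to obtain the coarser levels;
- projection matrices shared across levels;
- an assumed fusion rule that broadcasts the coarser attention map onto the finer indices, adds it, and re-normalizes each row.

All helper names (`membership`, `attention_map`, `pyramid_polymerizing_attention`) are hypothetical.

```python
import numpy as np

# Hypothetical toy hierarchy (not the paper's): 6 joints -> 3 parts -> 2 bodies.
N_JOINTS = 6
PARTS = [[0, 1], [2, 3], [4, 5]]   # joint indices per part
BODIES = [[0], [1, 2]]             # part indices per body


def membership(groups, n_members):
    """Binary matrix M (n_members x n_groups) with M[i, g] = 1 if member i is in group g."""
    m = np.zeros((n_members, len(groups)))
    for g, members in enumerate(groups):
        m[members, g] = 1.0
    return m


def softmax_rows(s):
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def attention_map(x, wq, wk):
    """Scaled dot-product self-attention map over the rows of x."""
    q, k = x @ wq, x @ wk
    return softmax_rows(q @ k.T / np.sqrt(q.shape[1]))


def pyramid_polymerizing_attention(x_joint, wq, wk, wv):
    m_jp = membership(PARTS, N_JOINTS)      # joint -> part
    m_pb = membership(BODIES, len(PARTS))   # part -> body
    # Coarser features by mean pooling: joints -> parts -> bodies.
    x_part = (m_jp.T @ x_joint) / m_jp.sum(axis=0)[:, None]
    x_body = (m_pb.T @ x_part) / m_pb.sum(axis=0)[:, None]

    a_body = attention_map(x_body, wq, wk)
    a_part = attention_map(x_part, wq, wk)
    a_joint = attention_map(x_joint, wq, wk)

    # Polymerize level by level (assumed rule): broadcast the coarser map onto the
    # finer indices, add it, and re-normalize each row to sum to 1.
    a_part = a_part + m_pb @ a_body @ m_pb.T
    a_part = a_part / a_part.sum(axis=1, keepdims=True)
    a_joint = a_joint + m_jp @ a_part @ m_jp.T
    a_joint = a_joint / a_joint.sum(axis=1, keepdims=True)
    return a_joint, a_joint @ (x_joint @ wv)


rng = np.random.default_rng(0)
d = 16
x = rng.standard_normal((N_JOINTS, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
attn, out = pyramid_polymerizing_attention(x, wq, wk, wv)
print(attn.shape, out.shape)  # (6, 6) (6, 16)
```

In this reading, joint i's attention to joint j rises whenever part(i) attends strongly to part(j). Part-level scores are in turn raised by body-level ones. Other fusion rules, such as multiplicative gating or learned level weights, are equally plausible interpretations of "polymerize".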

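Research Motivation and Goals

Motivates going beyond joint-level representations of skeleton data by exploiting coarse-to-fine semantic information.
Proposes the Pyramid Polymerizing Attention (PPA) mechanism, which fuses body-, part-, and joint-level attention information from coarse to fine.
Introduces a coarse-to-fine contrastive loss that aligns body-, part-, and joint-level features across the joint and motion modalities.
Develops an end-to-end semi-supervised framework that trains on labeled and unlabeled skeleton data together.
Validates the approach through comparisons and analyses on the public NTU RGB+D and Northwestern-UCLA datasets.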
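A coarse-to-fine contrastive objective of the kind described above could look like the sketch below. It rests on these assumptions:
- a standard cross-modal InfoNCE term at each level, where a sample's joint-stream feature is the positive for its own motion-stream feature and the other samples in the batch are negatives;
- equal weights for the body, part, and joint terms;
- a temperature `tau=0.1`;
- a hypothetical weight `lam` for combining the contrastive loss with the supervised cross-entropy.

The paper's exact similarity measure, weighting, and loss form may differ.

```python
import numpy as np

LEVELS = ("body", "part", "joint")


def cross_entropy(logits, targets):
    """Mean softmax cross-entropy of integer targets under row-wise logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(log_prob[np.arange(len(targets)), targets])


def info_nce(z_a, z_b, tau=0.1):
    """Cross-modal InfoNCE: row i of z_a is positive with row i of z_b only.
    Averaged over both directions (a -> b and b -> a)."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / tau              # (N, N) scaled cosine similarities
    targets = np.arange(len(z_a))           # positives sit on the diagonal
    return 0.5 * (cross_entropy(logits, targets) + cross_entropy(logits.T, targets))


def coarse_to_fine_contrastive_loss(joint_feats, motion_feats, tau=0.1):
    """Sum of body-, part- and joint-level InfoNCE terms (equal weights assumed)."""
    return sum(info_nce(joint_feats[lvl], motion_feats[lvl], tau) for lvl in LEVELS)


def semi_supervised_loss(class_logits, labels, ccl, lam=1.0):
    """Cross-entropy on a labeled batch plus a weighted contrastive term (lam is hypothetical)."""
    return cross_entropy(class_logits, labels) + lam * ccl


rng = np.random.default_rng(0)
n, d = 8, 32
joint = {lvl: rng.standard_normal((n, d)) for lvl in LEVELS}
motion = {lvl: joint[lvl] + 0.1 * rng.standard_normal((n, d)) for lvl in LEVELS}  # aligned pairs
unrelated = {lvl: rng.standard_normal((n, d)) for lvl in LEVELS}                  # no alignment
ccl = coarse_to_fine_contrastive_loss(joint, motion)
ccl_unrelated = coarse_to_fine_contrastive_loss(joint, unrelated)
total = semi_supervised_loss(rng.standard_normal((4, 60)), np.array([0, 5, 12, 59]), ccl)
```

Well-aligned joint/motion pairs give a much smaller loss than unrelated ones. This is the signal the loss uses to pull the two modalities together at every granularity.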

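Proposed Method

Converts raw skeleton sequences into joint data and motion data, which are fed to a joint encoder and a motion encoder, respectively.
Builds a Skeleton Pyramid to obtain body-level, part-level, and joint-level features from the joint and motion representations.
Applies Pyramid Polymerizing Attention to polymerize the attention maps from body to part to joint level and produce the corresponding polymerized features.
Defines a coarse-to-fine contrastive loss with body-, part-, and joint-level branches for contrastive learning between the joint and motion representations.
Trains by combining the contrastive loss on unlabeled data with a recognition (cross-entropy) loss on labeled data.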
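The first two steps (motion data and the Skeleton Pyramid) can be sketched as below. It rests on these assumptions:
- Motion data is the frame-to-frame displacement of joint coordinates, with the last frame zero-padded. This convention is common in skeleton contrastive learning but is not confirmed by this review.
- The pyramid is built by averaging over hypothetical joint groups of the 25-joint NTU RGB+D skeleton. The 5 body-level groups cover the trunk and four limbs, following the abstract's examples; the 10 finer part-level groups are our own choice.

The pooling works on any (C, T, V) array, so it could equally be applied to encoder feature maps. The encoders themselves are not reproduced.

```python
import numpy as np

# Hypothetical 0-based grouping of the 25 NTU RGB+D joints (not taken from the paper).
PARTS = [[2, 3], [0, 1, 20],            # head/neck, torso
         [4, 5], [6, 7, 21, 22],        # left upper arm, left forearm/hand
         [8, 9], [10, 11, 23, 24],      # right upper arm, right forearm/hand
         [12, 13], [14, 15],            # left thigh, left shank/foot
         [16, 17], [18, 19]]            # right thigh, right shank/foot
BODIES = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]  # part indices: trunk, four limbs


def to_motion(joint):
    """Motion data as frame-to-frame differences of a (C, T, V) joint sequence.
    The last frame is zero-padded so the output keeps T frames (assumption)."""
    motion = np.zeros_like(joint)
    motion[:, :-1] = joint[:, 1:] - joint[:, :-1]
    return motion


def pool_groups(x, groups):
    """Average the last axis over each index group: (C, T, V) -> (C, T, len(groups))."""
    return np.stack([x[..., g].mean(axis=-1) for g in groups], axis=-1)


def skeleton_pyramid(x):
    """Joint-, part- and body-level views; the body level pools the part-level means."""
    part = pool_groups(x, PARTS)
    body = pool_groups(part, BODIES)
    return {"joint": x, "part": part, "body": body}


rng = np.random.default_rng(0)
seq = rng.standard_normal((3, 64, 25))  # (xyz, frames, joints)
joint_pyr = skeleton_pyramid(seq)
motion_pyr = skeleton_pyramid(to_motion(seq))
print({k: v.shape for k, v in motion_pyr.items()})
# {'joint': (3, 64, 25), 'part': (3, 64, 10), 'body': (3, 64, 5)}
```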

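Experimental Results

Research Questions

RQ1. Can coarse-to-fine (body/part/joint) representations improve semi-supervised skeleton-based action recognition over joint-only approaches?
RQ2. Does Pyramid Polymerizing Attention effectively fuse multi-level semantic information to produce better features for contrastive learning?
RQ3. How does the coarse-to-fine contrastive loss affect the alignment between the joint and motion modalities at multiple granularities?
RQ4. Is the proposed method robust and competitive on standard skeleton datasets such as NTU RGB+D and NW-UCLA when only part of the data is labeled?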

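Key Findings

PSP Learning achieves competitive performance on NTU RGB+D and NW-UCLA in the semi-supervised setting.
The Pyramid Polymerizing Attention mechanism effectively combines body-, part-, and joint-level information from coarse to fine.
The coarse-to-fine contrastive loss jointly constrains the similarity of body-, part-, and joint-level features between the joint and motion modalities.
The framework demonstrates that multi-granularity contrastive learning benefits semi-supervised skeleton-based action recognition.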

This review was generated by AI and reviewed by a human editor.