QUICK REVIEW

[Paper Review] RAPT: Model-Predictive Out-of-Distribution Detection and Failure Diagnosis for Sim-to-Real Humanoid Robots

Humphrey Munn, Brendan Tidd | arXiv (Cornell University) | 2026-02-02

Robotic Locomotion and Control | Citations: 0

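One-Line Summary

RAPT is a lightweight deployment-time monitor for 50 Hz humanoid control that detects out-of-distribution execution and provides interpretable, post-hoc failure diagnosis for sim-to-real transfer.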

ABSTRACT

Deploying learned control policies on humanoid robots is challenging: policies that appear robust in simulation can execute confidently in out-of-distribution (OOD) states after Sim-to-Real transfer, leading to silent failures that risk hardware damage. Although anomaly detection can mitigate these failures, prior methods are often incompatible with high-rate control, poorly calibrated at the extremely low false-positive rates required for practical deployment, or operate as black boxes that provide a binary stop signal without explaining why the robot drifted from nominal behavior. We present RAPT, a lightweight, self-supervised deployment-time monitor for 50Hz humanoid control. RAPT learns a probabilistic spatio-temporal manifold of nominal execution from simulation and evaluates execution-time predictive deviation as a calibrated, per-dimension signal. This yields (i) reliable online OOD detection under strict false-positive constraints and (ii) a continuous, interpretable measure of Sim-to-Real mismatch that can be tracked over time to quantify how far deployment has drifted from training. Beyond detection, we introduce an automated post-hoc root-cause analysis pipeline that combines gradient-based temporal saliency derived from RAPT's reconstruction objective with LLM-based reasoning conditioned on saliency and joint kinematics to produce semantic failure diagnoses in a zero-shot setting. We evaluate RAPT on a Unitree G1 humanoid across four complex tasks in simulation and on physical hardware. In large-scale simulation, RAPT improves True Positive Rate (TPR) by 37% over the strongest baseline at a fixed episode-level false positive rate of 0.5%. On real-world deployments, RAPT achieves a 12.5% TPR improvement and provides actionable interpretability, reaching 75% root-cause classification accuracy across 16 real-world failures using only proprioceptive data.

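Motivation and Objectives

Make learned policies reliably deployable on humanoid robots by addressing silent, high-confidence OOD failures that occur after sim-to-real transfer.
Develop a lightweight online detector that runs at 50 Hz and produces a calibrated, per-dimension anomaly signal.
Provide interpretable failure diagnoses through gradient-based saliency and an LLM-conditioned semantic classifier.
Enable detailed root-cause analysis by examining Sim-to-Real mismatch post hoc from proprioceptive data and, where available, visual cues.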

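Proposed Method

Train a probabilistic reconstruction-based detector (RAPT) on nominal simulation data to model the spatio-temporal manifold of valid humanoid behavior.
Use a GRU-based latent model with a residual encoder and a probabilistic decoder to produce per-dimension, uncertainty-aware negative log-likelihood (NLL) reconstruction scores.
Apply a Sim-to-Real calibration step that calibrates the anomaly thresholds and combines the per-dimension and global gates with a bounding-box (range) detector.
Run online detection at roughly 50 Hz with a three-gate system (per-dimension maximum, global mean, range check) for robust safety.
Compute temporal saliency with Integrated Gradients, backpropagating the reconstruction NLL through time to attribute a failure across timesteps and sensors.
Translate the structured saliency and kinematic data into semantic root-cause diagnoses with a multimodal LLM in a zero-shot setting.
Trigger safety responses such as a safe stop or a controlled fall according to an operator-defined policy. RAPT does not autonomously correct the controller; the response is dictated by that policy.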
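The per-dimension NLL score and the three-gate check described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the decoder outputs a Gaussian mean and standard deviation for each observation dimension, and that the three gates are combined with a logical OR, which the summary does not state. The names `GateThresholds` and `three_gate_check` and all threshold values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


def gaussian_nll(x: float, mean: float, std: float) -> float:
    """Negative log-likelihood of x under a Gaussian N(mean, std^2)."""
    var = std * std
    return 0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


@dataclass
class GateThresholds:
    """Hypothetical thresholds; in practice they would come from calibration."""
    per_dim_max: float        # limit on the largest per-dimension NLL
    global_mean: float        # limit on the mean NLL across dimensions
    lower: Sequence[float]    # per-dimension lower bounds for the range check
    upper: Sequence[float]    # per-dimension upper bounds for the range check


def three_gate_check(
    obs: Sequence[float],
    pred_mean: Sequence[float],
    pred_std: Sequence[float],
    th: GateThresholds,
) -> Tuple[bool, Dict[str, bool], List[float]]:
    """Return (is_ood, per-gate flags, per-dimension NLL) for one control step."""
    nll = [gaussian_nll(x, m, s) for x, m, s in zip(obs, pred_mean, pred_std)]
    gates = {
        # Gate 1: a single dimension deviates strongly from its prediction.
        "per_dim_max": max(nll) > th.per_dim_max,
        # Gate 2: many dimensions deviate moderately at the same time.
        "global_mean": sum(nll) / len(nll) > th.global_mean,
        # Gate 3: a raw observation leaves its allowed range.
        "range": any(
            not (lo <= x <= hi) for x, lo, hi in zip(obs, th.lower, th.upper)
        ),
    }
    # Assumption: any single gate firing marks the step as OOD.
    return any(gates.values()), gates, nll


if __name__ == "__main__":
    th = GateThresholds(per_dim_max=5.0, global_mean=2.0,
                        lower=[-5.0] * 3, upper=[5.0] * 3)
    is_ood, gates, nll = three_gate_check(
        obs=[0.0, 4.0, 0.0], pred_mean=[0.0] * 3, pred_std=[1.0] * 3, th=th
    )
    print(is_ood, gates)
```

Returning the per-dimension NLL alongside the flags keeps the signal interpretable: the dimension with the largest NLL points at the sensor or joint that drifted most from the predicted nominal behavior.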

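Experimental Results

Research Questions

RQ1: Does RAPT outperform state-of-the-art baselines at detecting OOD events on humanoid tasks in simulation and in the real world?
RQ2: Does RAPT generalize from simulation to real hardware while keeping false positives low?
RQ3: How effective are gradient-based saliency and LLM-based reasoning at diagnosing the causes of failures in real deployments?
RQ4: How much does each component (temporal recurrence, probabilistic decoding, calibration, saliency, multimodal diagnosis) contribute to detection performance?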

Key Results

In simulation, RAPT achieves the best safety score and AUROC across tasks at very low latency (1.63 ms).
At the same fixed FPR, RAPT improves absolute AUROC by +0.34 over the strongest baseline (LSTM-VAE).
On real hardware, Hybrid RAPT (RAPT combined with the Range detector) detects 18 of 24 anomalous runs (75% recall), a higher recall than the baselines.
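RAPT's diagnosis, driven by proprioceptive saliency, gains top-1 and top-3 root-cause classification accuracy when augmented with visual frames.
The diagnostic pipeline identifies silent sim-to-real discrepancies such as PD gains and supports deployment verification beyond simple range checks.
The model supports zero-shot semantic failure classification by conditioning an LLM on saliency and kinematics.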
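Reporting TPR at a fixed episode-level FPR, as in the results above, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's evaluation code: it assumes an episode's score is the maximum per-step anomaly score and that an episode is flagged when that score is strictly above the threshold. The function names and the toy scores are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def episode_score(step_scores: Sequence[float]) -> float:
    """Assumption: an episode is as anomalous as its worst control step."""
    return max(step_scores)


def threshold_at_fpr(nominal_scores: Sequence[float], target_fpr: float) -> float:
    """Lowest threshold that flags at most target_fpr of the nominal episodes."""
    ranked = sorted(nominal_scores, reverse=True)
    # Number of nominal episodes allowed to trigger a false alarm.
    # The small epsilon guards against floating-point products like 0.999999.
    allowed = math.floor(target_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return -math.inf  # every nominal episode may be flagged
    # Episodes scoring strictly above this value are flagged, so at most
    # `allowed` nominal episodes exceed it (ties only reduce that count).
    return ranked[allowed]


def tpr_at_fpr(
    nominal_episodes: Sequence[Sequence[float]],
    anomalous_episodes: Sequence[Sequence[float]],
    target_fpr: float,
) -> Tuple[float, float, float]:
    """Return (threshold, TPR on anomalous episodes, realized FPR on nominal)."""
    nominal: List[float] = [episode_score(e) for e in nominal_episodes]
    anomalous: List[float] = [episode_score(e) for e in anomalous_episodes]
    thr = threshold_at_fpr(nominal, target_fpr)
    tpr = sum(s > thr for s in anomalous) / len(anomalous)
    fpr = sum(s > thr for s in nominal) / len(nominal)
    return thr, tpr, fpr


if __name__ == "__main__":
    # Toy data: ten nominal episodes and four anomalous ones.
    nominal_eps = [[0.5, float(k)] for k in range(1, 11)]
    anomalous_eps = [[9.5], [12.0], [3.0], [20.0]]
    print(tpr_at_fpr(nominal_eps, anomalous_eps, target_fpr=0.1))
```

Fixing the threshold from nominal episodes first and only then measuring detections keeps the comparison between detectors fair: every method is held to the same false-alarm budget before its recall is counted.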


This review was generated by AI and reviewed by a human editor.