QUICK REVIEW

[Paper Review] Spatiotemporal Semantic V2X Framework for Cooperative Collision Prediction

Murat Arda Onsu, Poonam Lohan|arXiv (Cornell University)|2026. 01. 23.

Autonomous Vehicle Technology and Safety | Citations: 0

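One-Line Summary

This paper proposes a semantic V2X framework in which an RSU uses V-JEPA to predict future-frame embeddings from video and transmits a compact semantic message to vehicles, where a lightweight on-board classifier predicts imminent collisions. The approach greatly reduces the payload while maintaining high accuracy.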

ABSTRACT

Intelligent Transportation Systems (ITS) demand real-time collision prediction to ensure road safety and reduce accident severity. Conventional approaches rely on transmitting raw video or high-dimensional sensory data from roadside units (RSUs) to vehicles, which is impractical under vehicular communication bandwidth and latency constraints. In this work, we propose a semantic V2X framework in which RSU-mounted cameras generate spatiotemporal semantic embeddings of future frames using the Video Joint Embedding Predictive Architecture (V-JEPA). To evaluate the system, we construct a digital twin of an urban traffic environment enabling the generation of diverse traffic scenarios with both safe and collision events. These embeddings of the future frame, extracted from V-JEPA, capture task-relevant traffic dynamics and are transmitted via V2X links to vehicles, where a lightweight attentive probe and classifier decode them to predict imminent collisions. By transmitting only semantic embeddings instead of raw frames, the proposed system significantly reduces communication overhead while maintaining predictive accuracy. Experimental results demonstrate that the framework with an appropriate processing method achieves a 10% F1-score improvement for collision prediction while reducing transmission requirements by four orders of magnitude compared to raw video. This validates the potential of semantic V2X communication to enable cooperative, real-time collision prediction in ITS.

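Research Motivation and Goals

Motivates proactive collision prediction for ITS under V2X bandwidth and latency constraints.
Develops a semantic V2X pipeline that transmits future-frame embeddings rather than raw video.
Uses a digital twin to generate diverse urban traffic scenarios for training and evaluation.
Evaluates how post-processing improves the extraction of task-relevant features from V-JEPA representations.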

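Proposed Method

The RSU uses V-JEPA to extract spatiotemporal semantic embeddings of future frames.
The RSU transmits a single compact embedding to vehicles over a V2X link.
A lightweight attentive-probe classifier on the vehicle decodes the embedding to predict collision risk.
Before encoding, a processing step such as a heatmap, a binary mask, or a hybrid of the two highlights task-relevant regions.
Decoder complexity is kept low so that inference can run in real time on vehicle hardware.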
Reduces the payload relative to raw video by about four orders of magnitude (the figure stated in the abstract) while maintaining accuracy.
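
The vehicle-side decoding step can be illustrated with a minimal NumPy sketch of an attentive probe: a single learned query cross-attends over the received embedding tokens, and a linear head maps the pooled vector to safe/collision probabilities. This is not the authors' implementation. The class name `AttentiveProbe`, the token count (16), the embedding dimension (64), and the random, untrained weights are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class AttentiveProbe:
    """Hypothetical probe: one learned query attends over tokens, then a linear head."""

    def __init__(self, dim, num_classes=2, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        # Random (untrained) parameters; a real probe would learn these.
        self.query = rng.normal(0.0, scale, size=(1, dim))
        self.w_k = rng.normal(0.0, scale, size=(dim, dim))
        self.w_v = rng.normal(0.0, scale, size=(dim, dim))
        self.w_out = rng.normal(0.0, scale, size=(dim, num_classes))
        self.b_out = np.zeros(num_classes)

    def forward(self, tokens):
        """tokens: array of shape (num_tokens, dim) received over the V2X link."""
        keys = tokens @ self.w_k
        values = tokens @ self.w_v
        # Scaled dot-product attention of the single query over all tokens.
        scores = (self.query @ keys.T) / np.sqrt(keys.shape[-1])
        weights = softmax(scores, axis=-1)          # shape (1, num_tokens)
        pooled = weights @ values                   # shape (1, dim)
        logits = pooled @ self.w_out + self.b_out   # shape (1, num_classes)
        return softmax(logits, axis=-1)[0], weights[0]


# Assumed embedding size: 16 tokens of dimension 64 (not taken from the paper).
tokens = np.random.default_rng(1).normal(size=(16, 64))
probe = AttentiveProbe(dim=64)
probs, attn = probe.forward(tokens)
print("class probabilities (safe, collision):", probs)
```

Because the weights are untrained, the printed probabilities carry no meaning; the sketch only shows why the decoder is cheap, since it needs a few small matrix products per received embedding.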

Experimental Results

Research Questions

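RQ1: Can predictive spatiotemporal embeddings enable real-time, bandwidth-efficient collision prediction over V2X communication?
RQ2: How do post-processing techniques affect embedding quality and collision-prediction accuracy?
RQ3: How does the proposed semantic framework perform in a digital-twin-generated urban traffic environment containing both safe and collision events?
RQ4: What are the computational and latency characteristics of the encoder-decoder split (RSU versus vehicle) in V2X semantic communication?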

Key Results

Achieves 92% accuracy in collision prediction.
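Improves the collision-prediction F1-score by 10% over the baseline, as reported in the abstract.
Reduces the communication payload by about four orders of magnitude compared to raw video, as reported in the abstract.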
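Latency stays within the 5 ms V2X threshold across multiple modulation schemes (e.g., QAM16).
With post-processing, the F1-score improves to as much as 84% in certain frame-interval configurations.
Demonstrates the feasibility of cooperative, real-time collision prediction over V2X using semantic embeddings.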
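
The payload and latency claims above can be sanity-checked with simple arithmetic. The sketch below compares the size of an uncompressed video clip with that of a float32 embedding and estimates the airtime of the embedding under a given modulation order. Every number in it (1920x1080 RGB frames, a 16-frame clip, a 1024-value embedding, a 10 Msymbol/s link, no coding overhead) is an illustrative assumption, not a value from the paper.

```python
import math


def raw_video_bytes(width, height, channels, frames, bytes_per_sample=1):
    """Size of an uncompressed clip in bytes."""
    return width * height * channels * frames * bytes_per_sample


def embedding_bytes(num_values, bytes_per_value=4):
    """Size of an embedding stored as float32 values."""
    return num_values * bytes_per_value


def orders_of_magnitude(raw, compact):
    """Base-10 logarithm of the size ratio."""
    return math.log10(raw / compact)


def airtime_ms(payload_bytes, symbol_rate_hz, bits_per_symbol, code_rate=1.0):
    """Idealized transmission time in milliseconds, ignoring protocol overhead."""
    bits = payload_bytes * 8
    return 1000.0 * bits / (symbol_rate_hz * bits_per_symbol * code_rate)


# Illustrative assumptions only (not taken from the paper).
raw = raw_video_bytes(width=1920, height=1080, channels=3, frames=16)
emb = embedding_bytes(num_values=1024)
reduction = orders_of_magnitude(raw, emb)
print(f"raw clip: {raw} B, embedding: {emb} B, reduction: 10^{reduction:.2f}")

# Bits per symbol for a few common modulation orders.
for name, bits_per_symbol in [("QPSK", 2), ("QAM16", 4), ("QAM64", 6)]:
    t = airtime_ms(emb, symbol_rate_hz=10e6, bits_per_symbol=bits_per_symbol)
    print(f"{name}: {t:.3f} ms (5 ms budget met: {t <= 5.0})")
```

Under these assumptions the size ratio is a little over four orders of magnitude and the embedding's airtime is well under 5 ms for each modulation order; different frame sizes, embedding dimensions, or link rates would change both figures.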

This review was generated by AI and reviewed by a human editor.