QUICK REVIEW

[Paper Review] SEGAR: Selective Enhancement for Generative Augmented Reality

Fanjun Bu, Chenyang Yuan | arXiv (Cornell University) | 2026-03-25

Generative Adversarial Networks and Image Synthesis · Citations: 0

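ONE-LINE SUMMARY

SEGAR introduces a two-stage framework that first generates augmented future frames containing region-specific edits, then selectively corrects safety-critical regions so that they match real observations while the edits elsewhere are preserved. The framework is demonstrated in driving scenarios.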

ABSTRACT

Generative world models offer a compelling foundation for augmented-reality (AR) applications: by predicting future image sequences that incorporate deliberate visual edits, they enable temporally coherent, augmented future frames that can be computed ahead of time and cached, avoiding per-frame rendering from scratch in real time. In this work, we present SEGAR, a preliminary framework that combines a diffusion-based world model with a selective correction stage to support this vision. The world model generates augmented future frames with region-specific edits while preserving others, and the correction stage subsequently aligns safety-critical regions with real-world observations while preserving intended augmentations elsewhere. We demonstrate this pipeline in driving scenarios as a representative setting where semantic region structure is well defined and real-world feedback is readily available. We view this as an early step toward generative world models as practical AR infrastructure, where future frames can be generated, cached, and selectively corrected on demand.
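The generate, cache, and correct-on-demand idea in the abstract can be pictured as a small cache keyed by timestep. This is a minimal illustration, not the paper's implementation: `AugmentedFrameCache`, `prefetch`, and `display` are hypothetical names, frames are toy nested lists, and the `generate` and `correct` callables only stand in for the Stage I world model and the Stage II correction.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Frame = List[List[float]]  # hypothetical toy grayscale frame (rows of pixel values)


@dataclass
class AugmentedFrameCache:
    """Hypothetical cache of pre-generated augmented future frames, keyed by timestep."""

    generate: Callable[[int], Frame]          # stand-in for the Stage I world model
    correct: Callable[[Frame, Frame], Frame]  # stand-in for the Stage II correction
    frames: Dict[int, Frame] = field(default_factory=dict)

    def prefetch(self, start: int, horizon: int) -> None:
        """Generate frames start .. start + horizon - 1 ahead of time."""
        for t in range(start, start + horizon):
            if t not in self.frames:
                self.frames[t] = self.generate(t)

    def display(self, t: int, observation: Frame) -> Frame:
        """Fetch the cached frame (generating it on a miss), then ground it on demand."""
        if t not in self.frames:
            self.frames[t] = self.generate(t)
        return self.correct(self.frames[t], observation)
```

In this sketch the expensive generation is amortized by `prefetch`, and only the correction runs at display time against the live observation.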

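MOTIVATION AND OBJECTIVES

Make the case for generative world models as practical AR infrastructure by enabling temporally coherent, pre-generated augmented futures.
Ground the outputs in real-world observations in critical regions by combining a diffusion-based world model with a selective correction mechanism.
Show that selective correction can improve safety-critical fidelity in dynamic driving scenarios while preserving the intended augmentations.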

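PROPOSED METHOD

Stage I uses Vista, a diffusion-based driving world model, as a generative stylizer that produces future frames reflecting region-specific edits.
Stage I is trained end to end with three condition frames and a 12-frame target, using VACE-based inpainting guided by semantic masks.
Stage II is a LoRA-finetuned correction stage that aligns safety-critical regions with real observations while preserving the augmentations; it is trained with a spatially masked latent reconstruction loss.
Stage II conditioning separates VAE latent grounding (from the real observation) from CLIP semantic context (from the augmented frame) to guide the correction.
Buffer zones at region boundaries keep the reconstruction loss from being applied at transitions, and the region-wise loss uses downsampled masks.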
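The region-wise loss with buffer zones and mask downsampling can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: pixel masks are max-pooled to the latent grid (a latent cell counts as safety-critical if any of its pixels is), the buffer zone is every latent cell within `width` cells of a label change, and, as an assumption, critical cells are pulled toward the real-observation latent while all other cells are pulled toward the augmented latent. The function names, the single-channel `Grid` latents, and the squared-error form are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # hypothetical single-channel latent
Mask = List[List[bool]]


def downsample_mask(mask: Mask, factor: int) -> Mask:
    """Max-pool a pixel mask to the latent grid: a cell is critical if any pixel is."""
    h, w = len(mask) // factor, len(mask[0]) // factor
    return [
        [
            any(
                mask[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def buffer_zone(mask: Mask, width: int) -> Mask:
    """Mark cells within `width` cells (Chebyshev distance) of a label change."""
    h, w = len(mask), len(mask[0])
    return [
        [
            any(
                mask[y][x] != mask[i][j]
                for y in range(max(0, i - width), min(h, i + width + 1))
                for x in range(max(0, j - width), min(w, j + width + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def masked_latent_loss(pred: Grid, real: Grid, augmented: Grid,
                       critical: Mask, buffer: Mask) -> float:
    """Mean squared error over cells outside the buffer zone.

    Assumed formulation: critical cells target the real-observation latent,
    all other cells target the augmented latent.
    """
    total, count = 0.0, 0
    for i, row in enumerate(pred):
        for j, p in enumerate(row):
            if buffer[i][j]:
                continue  # no loss at region transitions
            target = real[i][j] if critical[i][j] else augmented[i][j]
            total += (p - target) ** 2
            count += 1
    return total / count if count else 0.0
```

In this sketch, widening the buffer removes supervision near region boundaries, so the model is not forced to reproduce the exact seam between real and augmented content.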

Figure 1 : SEGAR system pipeline overview. In Stage I, we train a Vista-based generative stylizer to take three condition frames ($t\in[1,3]$) and output future frames with desired augmented edits ($t\in[4,12]$). In Stage II, the generative stylizer finetuned with LoRA takes the augmented future
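The frame indexing in Figure 1 (conditions at $t\in[1,3]$, predictions at $t\in[4,12]$) maps onto zero-based slicing as follows. The two constants come from the caption; `split_clip`, `sliding_clips`, and the `stride` parameter are hypothetical helpers for cutting a longer drive into 12-frame clips, not something the paper describes.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

NUM_COND = 3   # condition frames, t in [1, 3] (Figure 1)
CLIP_LEN = 12  # frames per clip; predicted frames are t in [4, 12]


def split_clip(frames: Sequence[T]) -> Tuple[List[T], List[T]]:
    """Split one 12-frame clip into 3 condition frames and 9 target frames."""
    if len(frames) != CLIP_LEN:
        raise ValueError(f"expected {CLIP_LEN} frames, got {len(frames)}")
    # One-based t in [1, 3] is zero-based 0..2; t in [4, 12] is zero-based 3..11.
    return list(frames[:NUM_COND]), list(frames[NUM_COND:])


def sliding_clips(sequence: Sequence[T], stride: int) -> List[Tuple[List[T], List[T]]]:
    """Hypothetical: cut a longer sequence into overlapping 12-frame clips."""
    starts = range(0, len(sequence) - CLIP_LEN + 1, stride)
    return [split_clip(sequence[s:s + CLIP_LEN]) for s in starts]
```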

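EXPERIMENTAL RESULTS

Research Questions

RQ1: How can a generative diffusion model produce temporally coherent augmented futures with region-specific edits for AR?
RQ2: Can a lightweight, selective correction stage improve safety-critical fidelity to real observations without compromising the intended augmentations?
RQ3: In driving scenarios, how does staged correction affect the balance between aligning safety-critical regions and preserving stylization edits?
RQ4: How effective are region-based losses built on offline masks at enforcing real-world grounding at per-frame granularity?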

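Key Findings

Stage II substantially improved the alignment of safety-critical regions over Stage I (SSIM from 0.770 to 0.943; LPIPS from 0.397 to 0.285).
Augmented regions preserve the intended edits relative to the Stage I augmentations, with SSIM 0.866 and LPIPS 0.130.
After Stage II, augmentation drift away from the real frames decreases in critical regions, while non-critical edits remain visually consistent.
Qualitative results show that the corrected safety-critical elements (e.g., pedestrians, vehicles, road signs) match the real observations.
The method points toward generating, caching, and selectively correcting future AR frames in real-time settings such as driving.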
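Region-restricted scores like the SSIM values above require evaluating only the pixels inside a mask. The sketch below computes SSIM from global statistics over the masked pixels of two grayscale images in $[0, 1]$, with the usual constants $C_1=(0.01L)^2$ and $C_2=(0.03L)^2$. It is a simplification (standard SSIM averages over local Gaussian windows), the paper's exact evaluation protocol is not given in this review, and `masked_ssim` is a hypothetical name. LPIPS requires a pretrained network and is not sketched.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[bool]]


def masked_ssim(x: Image, y: Image, mask: Mask, data_range: float = 1.0) -> float:
    """Single-window SSIM over the pixels where mask is True (simplified)."""
    xs = [v for row, mrow in zip(x, mask) for v, m in zip(row, mrow) if m]
    ys = [v for row, mrow in zip(y, mask) for v, m in zip(row, mrow) if m]
    n = len(xs)
    if n == 0:
        raise ValueError("mask selects no pixels")
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((a - mx) ** 2 for a in xs) / n
    vy = sum((b - my) ** 2 for b in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )
```

For reference, the reported LPIPS change from 0.397 to 0.285 is a relative reduction of about 28%, and the SSIM gain in safety-critical regions is 0.173.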

Figure 2 : Given an input image sequence, we compute inpainting regions using semantic segmentation. The resulting masks guide VACE’s inpainting process to augment static scene elements into a Tokyo-style appearance.
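The mask computation in Figure 2 amounts to selecting pixels by semantic class. The sketch below assumes a per-pixel label map of class-name strings and a hypothetical class split: static classes to be restyled by inpainting, and safety-critical classes (the review names pedestrians, vehicles, and road signs) that Stage II later grounds against real observations. Classes in neither set, such as `road` here, are left untouched. The authors' actual label taxonomy and class split are not given in this review.

```python
from typing import List, Set, Tuple

LabelMap = List[List[str]]
Mask = List[List[bool]]

# Hypothetical class split; the paper's segmentation taxonomy is not specified here.
STATIC_CLASSES: Set[str] = {"building", "vegetation", "sky", "wall", "fence"}
SAFETY_CRITICAL_CLASSES: Set[str] = {"pedestrian", "vehicle", "traffic sign"}


def region_masks(labels: LabelMap) -> Tuple[Mask, Mask]:
    """Return (inpainting mask, safety-critical mask) from a semantic label map."""
    # The sets must not overlap, or a pixel would be both restyled and grounded.
    assert not STATIC_CLASSES & SAFETY_CRITICAL_CLASSES
    inpaint = [[c in STATIC_CLASSES for c in row] for row in labels]
    critical = [[c in SAFETY_CRITICAL_CLASSES for c in row] for row in labels]
    return inpaint, critical
```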

This review was generated by AI and reviewed by a human editor.