QUICK REVIEW

[Paper Review] V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration

Shenghe Zheng, Junpeng Jiang | arXiv (Cornell University) | 2026. 03. 13.

Generative Adversarial Networks and Image Synthesis | Citations: 0

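One-line summary

V-Bridge repurposes pretrained video generative priors and models restoration as progressive generative refinement, performing versatile few-shot image restoration and achieving competitive results with only about 1,000 training samples.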

ABSTRACT

Large-scale video generative models are trained on vast and diverse visual data, enabling them to internalize rich structural, semantic, and dynamic priors of the visual world. While these models have demonstrated impressive generative capability, their potential as general-purpose visual learners remains largely untapped. In this work, we introduce V-Bridge, a framework that bridges this latent capacity to versatile few-shot image restoration tasks. We reinterpret image restoration not as a static regression problem, but as a progressive generative process, and leverage video models to simulate the gradual refinement from degraded inputs to high-fidelity outputs. Surprisingly, with only 1,000 multi-task training samples (less than 2% of existing restoration methods), pretrained video models can be induced to perform competitive image restoration, achieving multiple tasks with a single model, rivaling specialized architectures designed explicitly for this purpose. Our findings reveal that video generative models implicitly learn powerful and transferable restoration priors that can be activated with only extremely limited data, challenging the traditional boundary between generative modeling and low-level vision, and opening a new design paradigm for foundation models in visual tasks.

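Research Motivation and Goals

Reframe image restoration as a progressive, video-like generative process that exploits the transferable priors of large-scale video models.
Develop a data-efficient training curriculum that moves from the moderate resolutions used in video pretraining to high-resolution restoration.
Demonstrate that a single pretrained video model can handle multiple restoration tasks even with limited task-specific data.
Introduce drift correction to mitigate the resolution bias of video priors and improve fine-detail fidelity.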

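Proposed Method

Construct a pseudo-temporal sequence from paired low-quality/high-quality images to simulate a progressive restoration trajectory.
Train with a progressive curriculum that gradually increases spatial resolution, so the model learns coarse-to-fine restoration dynamics.
Formalize restoration as a conditional generative process f_theta(I_0, t) that predicts the intermediate frame I_t.
Incorporate a drift correction module to align low- and mid-resolution priors with the high-resolution ground truth.
Use a lightweight auxiliary model to refine the final frame and improve texture and color fidelity.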
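
The first two steps above can be illustrated with a minimal sketch. It is not the authors' implementation: this review does not say how the intermediate frames are built or which resolutions the curriculum uses, so the linear blend between the degraded and clean image, the equal-length curriculum stages, and the names `pseudo_temporal_sequence` and `resolution_for_step` are all assumptions made for illustration. Images are plain nested lists so the example has no dependencies.

```python
from typing import List

# Grayscale image as rows of pixel values in [0, 1] (illustrative representation).
Image = List[List[float]]


def pseudo_temporal_sequence(degraded: Image, clean: Image, num_frames: int) -> List[Image]:
    """Build frames I_0..I_{T-1} that move from the degraded to the clean image.

    ASSUMPTION: linear blending is a stand-in; the paper's actual trajectory
    construction is not described in this review.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames (degraded start, clean end)")
    if len(degraded) != len(clean) or any(
        len(a) != len(b) for a, b in zip(degraded, clean)
    ):
        raise ValueError("degraded and clean images must have the same shape")

    frames = []
    for k in range(num_frames):
        t = k / (num_frames - 1)  # pseudo-time in [0, 1]
        frame = [
            [(1.0 - t) * d + t * c for d, c in zip(row_d, row_c)]
            for row_d, row_c in zip(degraded, clean)
        ]
        frames.append(frame)
    return frames


def resolution_for_step(step: int, total_steps: int, resolutions: List[int]) -> int:
    """Hypothetical coarse-to-fine schedule: equal-length stages, one per resolution."""
    if total_steps <= 0 or not resolutions:
        raise ValueError("total_steps must be positive and resolutions non-empty")
    stage = min(step * len(resolutions) // total_steps, len(resolutions) - 1)
    return resolutions[stage]
```

In this sketch the first frame equals the degraded input and the last frame equals the clean target, so a video model trained on such clips would see restoration as motion through pseudo-time. The schedule simply maps a training step to the resolution of its stage, for example `resolution_for_step(100, 300, [256, 512, 1024])` selects the middle resolution.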

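Experimental Results

Research Questions

RQ1: Can pretrained video generative priors be activated for diverse image restoration tasks with minimal task-specific data?
RQ2: Does a coarse-to-fine training curriculum effectively transfer video priors to high-resolution restoration?
RQ3: How does the drift correction module affect high-frequency detail recovery and perceptual quality in restoration?
RQ4: How well does a single video model generalize to unseen degradations and out-of-distribution tasks?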

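Key Results

V-Bridge achieves competitive restoration quality with at most 1,000 multi-task training samples, which is only 0.1%–7% of the data used by the baselines.
On FoundIR, it shows a 1.6 dB PSNR gain over baselines trained with 15× to 1,000× more data, along with an improvement in SSIM.
Drift correction yields gains of about 1.4 dB PSNR and 0.024 SSIM, improving fine texture and color fidelity.
Progressive curriculum training with increasing resolution improves restoration performance and training stability.
It shows strong out-of-distribution generalization across diverse benchmarks and on low-resolution and noisy inputs.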
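
To put the dB figures above in context, the following sketch computes PSNR with its standard definition, 10 * log10(MAX^2 / MSE), and converts a PSNR gain into the corresponding reduction in mean squared error. The function names and the flattened-list image format are illustrative assumptions, and the review does not state which evaluation protocol (for example RGB versus luminance channel) the paper uses.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized flattened images."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)
```

Because PSNR is logarithmic, a gain of 1.6 dB corresponds to the MSE shrinking by a factor of about 1.45 (roughly 31% lower error), and a gain of 1.4 dB to a factor of about 1.38 (roughly 28% lower error), assuming both scores are measured against the same peak value.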


This review was generated by AI and reviewed by a human editor.