QUICK REVIEW

[Paper Review] Amortised MAP Inference for Image Super-resolution

Casper Kaae Sønderby, J. A. Caballero | arXiv (Cornell University) | 14 October 2016

Advanced Image Processing Techniques | Citations: 155

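One-line summary

This paper introduces amortised MAP inference for single-image super-resolution by enforcing affine consistency with the downsampling operator, and explores GAN-based, denoiser-guided, and density-based approaches to approximating the MAP solution.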

ABSTRACT

Image super-resolution (SR) is an underdetermined inverse problem, where a large number of plausible high-resolution images can explain the same downsampled image. Most current single image SR methods use empirical risk minimisation, often with a pixel-wise mean squared error (MSE) loss. However, the outputs from such methods tend to be blurry, over-smoothed and generally appear implausible. A more desirable approach would employ Maximum a Posteriori (MAP) inference, preferring solutions that always have a high probability under the image prior, and thus appear more plausible. Direct MAP estimation for SR is non-trivial, as it requires us to build a model for the image prior from samples. Furthermore, MAP inference is often performed via optimisation-based iterative algorithms which don't compare well with the efficiency of neural-network-based alternatives. Here we introduce new methods for amortised MAP inference whereby we calculate the MAP estimate directly using a convolutional neural network. We first introduce a novel neural network architecture that performs a projection to the affine subspace of valid SR solutions ensuring that the high resolution output of the network is always consistent with the low resolution input. We show that, using this architecture, the amortised MAP inference problem reduces to minimising the cross-entropy between two distributions, similar to training generative models. We propose three methods to solve this optimisation problem: (1) Generative Adversarial Networks (GAN) (2) denoiser-guided SR which backpropagates gradient-estimates from denoising to train the network, and (3) a baseline method using a maximum-likelihood-trained image prior. Our experiments show that the GAN based approach performs best on real image data. Lastly, we establish a connection between GANs and amortised variational inference as in e.g. variational autoencoders.
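
The reduction to a cross-entropy mentioned in the abstract can be sketched as follows. This is a rough outline, not the paper's full derivation. It assumes a noiseless linear downsampling model x = Ay in which A has full row rank, and the symbols f_θ (unconstrained network output) and g_θ (projected output) are notation introduced here for illustration.

```latex
\begin{aligned}
g_\theta(x) &= (I - A^{+}A)\, f_\theta(x) + A^{+}x, \\
A\, g_\theta(x) &= (A - AA^{+}A)\, f_\theta(x) + AA^{+}x = x
  \quad (\text{using } AA^{+}A = A \text{ and } AA^{+} = I), \\
\log p_{Y|X}(y \mid x) &= \log p_{X|Y}(x \mid y) + \log p_Y(y) - \log p_X(x), \\
\arg\max_\theta\ \mathbb{E}_{x}\big[\log p_{Y|X}(g_\theta(x) \mid x)\big]
  &= \arg\max_\theta\ \mathbb{E}_{x}\big[\log p_Y(g_\theta(x))\big]
   = \arg\min_\theta\ H[q_\theta, p_Y], \\
H[q_\theta, p_Y] &= -\mathbb{E}_{y \sim q_\theta}\big[\log p_Y(y)\big]
   = \mathrm{KL}[q_\theta \,\|\, p_Y] + H[q_\theta].
\end{aligned}
```

Because every projected output is consistent with x, the likelihood term takes the same value for all of them and drops out of the maximisation, leaving only the prior term. Here q_θ denotes the distribution of g_θ(x) when x is drawn from the LR data distribution.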

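Motivation and objectives

Motivate MAP inference for SR so that the model produces plausible, high-probability high-resolution images instead of the blur caused by MSE-based training.
Propose a neural network architecture that projects its output onto the affine subspace of valid SR solutions, guaranteeing LR–HR consistency.
Develop and compare three amortised MAP inference methods for SR: GAN-based, denoiser-guided, and density-model-based.
Show that the GAN-based AffGAN approach yields visually sharp, plausible SR results on real images such as CelebA and natural images.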

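Proposed method

Introduces an affine projection layer that enforces consistency with the LR input through the downsampling operator A and its Moore–Penrose pseudoinverse A+.
Formalises amortised MAP inference as minimising the cross-entropy between the model output distribution qθ and the HR image prior pY.
Proposes AffGAN, a GAN whose generator includes the affine projection, trained to minimise KL[qθ∥pY].
Proposes AffDG, a denoiser-guided method that updates θ by backpropagating gradient estimates obtained from a Bayes-optimal denoiser.
Proposes AffLL, a density-guided variant that uses a PixelCNN-style density model (MCGSM) as the image prior when minimising the cross-entropy with pY.
Discusses instance noise as a stabilisation trick for GAN training, and relates a stochastic AffGAN variant to amortised variational inference.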
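
The affine projection in the first item can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the 1-D average-pooling operator, the helper names `downsampling_matrix` and `affine_project`, and the random vector standing in for a network output are all assumptions made for illustration. The projection itself, (I − A+A)y + A+x, follows from the definition of the Moore–Penrose pseudoinverse.

```python
import numpy as np

def downsampling_matrix(hr_len: int, factor: int) -> np.ndarray:
    """Average-pooling downsampler A for a 1-D signal (hypothetical stand-in)."""
    lr_len = hr_len // factor
    A = np.zeros((lr_len, hr_len))
    for i in range(lr_len):
        A[i, i * factor:(i + 1) * factor] = 1.0 / factor
    return A

def affine_project(y: np.ndarray, x: np.ndarray,
                   A: np.ndarray, A_pinv: np.ndarray) -> np.ndarray:
    """Project y onto the affine subspace {y' : A y' = x}.

    Computes (I - A^+ A) y + A^+ x without forming the identity matrix.
    """
    return y - A_pinv @ (A @ y) + A_pinv @ x

rng = np.random.default_rng(0)
A = downsampling_matrix(hr_len=16, factor=4)
A_pinv = np.linalg.pinv(A)            # Moore-Penrose pseudoinverse A^+

y_true = rng.normal(size=16)          # an HR signal
x = A @ y_true                        # its LR observation
raw_output = rng.normal(size=16)      # stands in for an unconstrained network output
y_hat = affine_project(raw_output, x, A, A_pinv)

residual_before = np.linalg.norm(A @ raw_output - x)
residual_after = np.linalg.norm(A @ y_hat - x)
print(f"downsampling error before projection: {residual_before:.3e}")
print(f"downsampling error after projection:  {residual_after:.3e}")
```

After projection the downsampling error should sit at floating-point precision, whatever the raw output was. In a network the same operation would be applied as a final differentiable layer, so gradients only move the output within the subspace of solutions consistent with x.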

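Experimental results

Research questions

RQ1: Can amortised MAP inference for image SR be learned effectively by constraining outputs to the affine subspace consistent with the LR input?
RQ2: Which of the proposed strategies (AffGAN, AffDG, AffLL) best minimises the cross-entropy H[qθ, pY] and produces perceptually plausible SR results?
RQ3: How does enforcing affine consistency affect SR accuracy and realism compared with conventional MSE-based training?
RQ4: What connects GAN-based SR to variational inference frameworks in this setting?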

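Key results

The affine projection layer guarantees LR–HR consistency; in the experiments, the affine-projected models reduce the downsampling error to nearly zero.
AffGAN (GAN-based) delivers the sharpest, most plausible SR images on real data such as CelebA and natural images, and surpasses the soft-constraint variants in perceptual quality.
AffGAN outputs tend to be sharp and plausible, with some of the high-frequency noise characteristic of GAN-based SR, whereas MSE-trained models are blurrier.
AffDG and AffLL can give plausible results on some datasets but tend to be blurrier or less sharp than AffGAN on natural images and face data.
Across the 2D toy MAP demonstration and the real image datasets, the AffGAN and AffDG methods minimise the cross-entropy and so converge closer to the MAP solution than the MSE/MAE baselines.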
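
Why an MSE-trained model ends up blurrier than a MAP estimate can be seen in a small hypothetical example. This is not the paper's toy experiment: the 1-D two-component Gaussian mixture, its weights, and all names below are assumptions chosen for illustration. The variable t stands for the HR detail that the LR input leaves undetermined.

```python
import numpy as np

# Hypothetical 1-D toy: t is the HR detail that the LR input cannot determine.
MEANS = np.array([-1.0, 1.0])
WEIGHTS = np.array([0.7, 0.3])
SIGMA = 0.1

def log_prior(t):
    """Log-density of the two-component Gaussian mixture at t."""
    t = np.asarray(t, dtype=float)[..., None]
    log_comp = (np.log(WEIGHTS)
                - 0.5 * ((t - MEANS) / SIGMA) ** 2
                - np.log(SIGMA * np.sqrt(2.0 * np.pi)))
    return np.logaddexp.reduce(log_comp, axis=-1)

rng = np.random.default_rng(0)
component = rng.choice(2, size=10_000, p=WEIGHTS)
samples = MEANS[component] + SIGMA * rng.normal(size=10_000)

# The constant prediction that minimises mean squared error is the sample mean.
mse_estimate = samples.mean()

# The MAP estimate is the point of highest density, found here by grid search.
grid = np.linspace(-2.0, 2.0, 4001)
map_estimate = grid[np.argmax(log_prior(grid))]

print(f"MSE estimate: {mse_estimate:+.3f}, log-density {log_prior(mse_estimate):.2f}")
print(f"MAP estimate: {map_estimate:+.3f}, log-density {log_prior(map_estimate):.2f}")
```

The MSE estimate lands between the two modes, where the prior density is very low, while the MAP estimate sits on the heavier mode. In image terms, the first corresponds to averaging several plausible sharp solutions into one blurry image, and the second to committing to a single plausible solution.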

This review was generated by AI and reviewed by a human editor.