[Paper Review] RGI: Robust GAN-inversion for mask-free image inpainting and unsupervised pixel-wise anomaly detection
Summary
Generative adversarial networks (GANs), trained on a large-scale image dataset, can be a good approximator of the natural image manifold. GAN-inversion, using a pre-trained generator as a deep generative prior, is a promising tool for image restoration under corruptions. However, the performance of GAN-inversion can be limited by a lack of robustness to unknown gross corruptions, i.e., the restored image might easily deviate from the ground truth. In this paper, we propose a Robust GAN-inversion (RGI) method with a provable robustness guarantee to achieve image restoration under unknown gross corruptions, where a small fraction of pixels are completely corrupted. Under mild assumptions, we show that the restored image and the identified corrupted region mask converge asymptotically to the ground truth. Moreover, we extend RGI to Relaxed-RGI (R-RGI) for generator fine-tuning, which mitigates the gap between the GAN-learned manifold and the true image manifold while avoiding trivial overfitting to the corrupted input image; this further improves image restoration and corrupted region mask identification. The proposed RGI/R-RGI method unifies two important applications with state-of-the-art (SOTA) performance: (i) mask-free semantic inpainting, where the corruptions are unknown missing regions and the restored background can be used to restore the missing content; and (ii) unsupervised pixel-wise anomaly detection, where the corruptions are unknown anomalous regions and the retrieved mask can be used as the segmentation mask of the anomalous region.
Research motivation and objectives
- Address the lack of robustness of standard GAN-inversion to unknown gross corruptions.
- Propose RGI to recover clean images and identify corrupted regions without prior masks.
- Provide theoretical guarantees for asymptotic convergence of the restored image and mask.
- Extend RGI to R-RGI, which fine-tunes the generator to reduce the GAN approximation gap.
- Demonstrate state-of-the-art performance on mask-free inpainting and pixel-wise anomaly detection.
Proposed method
- Formulate joint optimization over latent code z and a sparse mask M with L_rec((1−M)⊙x, (1−M)⊙G(z)) plus λ||M||_1.
- Prove asymptotic convergence: ẑ(λ) → z* as λ ↓ 0 (Theorem 1).
- Prove asymptotic mask convergence: M̂(λ) → M* as λ ↓ 0 (Theorem 2).
- Introduce Relaxed-RGI (R-RGI) by also optimizing generator parameters θ to reduce the GAN-manifold gap (Equation 4).
- Discuss connections to robust statistics and robust ML (M-estimators, Winsorizing) and relate to prior GAN-inversion approaches.
- Demonstrate mask-free semantic inpainting and unsupervised pixel-wise anomaly detection within a unified framework.
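
The joint objective over z and M described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a toy linear "generator" G(z) = z · basis in place of a pre-trained GAN, a squared-error L_rec, a mask relaxed to [0, 1], and closed-form alternating updates instead of whatever optimizer the authors use. The names `generator`, `rgi_toy`, `basis`, and `lam` are hypothetical.

```python
def generator(z, basis):
    """Toy linear stand-in for a pre-trained generator: G(z) = z * basis."""
    return [z * b for b in basis]


def rgi_toy(x, basis, lam=0.1, n_iters=20):
    """Alternately minimize sum(((1 - M) * (x - G(z)))**2) + lam * sum(M)
    over a scalar latent z and a per-pixel mask M in [0, 1]."""
    mask = [0.0] * len(x)
    z = 0.0
    for _ in range(n_iters):
        # z-step: with M fixed, the objective is a weighted least-squares
        # problem in z with weights (1 - M_i)^2, solved in closed form.
        weights = [(1.0 - m) ** 2 for m in mask]
        num = sum(w * b * xi for w, b, xi in zip(weights, basis, x))
        den = sum(w * b * b for w, b in zip(weights, basis))
        z = num / den

        # M-step: with z fixed, each pixel minimizes (1 - m)^2 * r^2 + lam * m,
        # whose minimizer on [0, 1] is max(0, 1 - lam / (2 * r^2)).
        recon = generator(z, basis)
        mask = []
        for xi, gi in zip(x, recon):
            r2 = (xi - gi) ** 2
            mask.append(0.0 if r2 <= lam / 2 else 1.0 - lam / (2 * r2))
    return z, mask


# Synthetic example: clean signal generated with z* = 0.5, two gross corruptions.
basis = [float(i) for i in range(1, 9)]
x = generator(0.5, basis)
x[2], x[5] = 10.0, -10.0

z_hat, mask_hat = rgi_toy(x, basis)
flagged = [i for i, m in enumerate(mask_hat) if m > 0.5]
print(z_hat, flagged)
```

In this toy setting the mask absorbs the two grossly corrupted entries, so they stop influencing the estimate of z, which is the robustness mechanism the bullets describe. The 0.5 threshold used to binarize the mask is an illustrative choice, not a value taken from the paper.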
Experimental results
Research questions
- RQ1: Can RGI restore a clean image and identify the corrupted region without a pre-configured mask?
- RQ2: Do the recovered image and mask converge to the ground truth under mild assumptions and an appropriate λ?
- RQ3: Does the relaxation in R-RGI further improve restoration quality by mitigating the GAN approximation gap?
- RQ4: Can the method unify mask-free semantic inpainting and pixel-wise anomaly detection with state-of-the-art performance?
- RQ5: What theoretical guarantees link the optimized mask to the true corrupted regions under unknown corruptions?
Key findings
| Dataset | Case | Metric | Yeh et al. w/o mask | Yeh et al. w/ mask | RGI | Pan et al. w/ mask | R-RGI |
|---|---|---|---|---|---|---|---|
| CelebA | (i) | PSNR ↑ | 11.50 | 20.82 | 19.70 | 21.74 | 20.05 |
| CelebA | (i) | SSIM ↑ | 0.358 | 0.492 | 0.451 | 0.570 | 0.509 |
| CelebA | (ii) | PSNR ↑ | 19.64 | 22.63 | 21.52 | 27.63 | 23.73 |
| CelebA | (ii) | SSIM ↑ | 0.440 | 0.536 | 0.490 | 0.766 | 0.655 |
| Cars | (i) | PSNR ↑ | 16.57 | 17.50 | 16.89 | 20.98 | 19.31 |
| Cars | (i) | SSIM ↑ | 0.359 | 0.377 | 0.363 | 0.636 | 0.618 |
| Cars | (ii) | PSNR ↑ | 17.36 | 17.71 | 17.52 | 21.61 | 21.18 |
| Cars | (ii) | SSIM ↑ | 0.361 | 0.382 | 0.363 | 0.650 | 0.588 |
| LSUN bedroom | (i) | PSNR ↑ | 16.15 | 19.27 | 17.67 | 21.36 | 18.72 |
| LSUN bedroom | (i) | SSIM ↑ | 0.405 | 0.428 | 0.416 | 0.587 | 0.567 |
| LSUN bedroom | (ii) | PSNR ↑ | 19.26 | 19.66 | 19.72 | 22.30 | 22.29 |
| LSUN bedroom | (ii) | SSIM ↑ | 0.419 | 0.433 | 0.420 | 0.599 | 0.557 |
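
For readers unfamiliar with the PSNR metric reported in the table, the standard definition is 10 · log10(MAX² / MSE). The sketch below computes it for flattened pixel lists; the function name `psnr` and the assumed pixel range of [0, 1] (`max_value=1.0`) are illustrative, since this review does not state the evaluation protocol (pixel range, or whether the metric is computed over the whole image or only the corrupted region).

```python
import math


def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(restored):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Two pixels, each off by 0.1, give an MSE of 0.01.
example = psnr([0.0, 1.0], [0.1, 0.9])
print(example)
```

Because PSNR is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.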
- RGI achieves robustness to unknown gross corruptions, with asymptotic convergence of the restored image to the ground-truth background as λ→0.
- The identified mask converges to the true corrupted region mask in the limit of small λ, enabling exact mask recovery under mild conditions.
- R-RGI further improves restoration by fine-tuning the generator, closing the gap between learned and true image manifolds and boosting performance.
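- On mask-free semantic inpainting, RGI clearly outperforms the mask-free baseline (Yeh et al. w/o mask) and comes close to the same method when it is given the true mask (e.g., 19.70 vs. 20.82 dB PSNR on CelebA case (i)), without needing a pre-configured mask; R-RGI improves on RGI and approaches the mask-aware method of Pan et al.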
- In unsupervised pixel-wise anomaly detection, RGI and especially R-RGI yield strong Dice scores and competitive/leading AUROC relative to SOTA baselines.
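
The Dice score mentioned for anomaly segmentation measures the overlap between a predicted binary mask and the ground-truth mask: 2|A ∩ B| / (|A| + |B|). A minimal sketch follows; the helper names `binarize` and `dice_score` and the 0.5 threshold are assumptions for illustration, not details taken from the paper.

```python
def binarize(soft_mask, threshold=0.5):
    """Turn a soft mask with values in [0, 1] into a 0/1 mask."""
    return [1 if m > threshold else 0 for m in soft_mask]


def dice_score(pred, truth):
    """Dice coefficient between two equally sized 0/1 masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Convention used here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred_mask = binarize([0.9, 0.7, 0.1, 0.0])  # -> [1, 1, 0, 0]
true_mask = [1, 0, 0, 0]
score = dice_score(pred_mask, true_mask)    # 2 * 1 / (2 + 1)
print(score)
```

Unlike AUROC, which is computed from the soft scores across all thresholds, Dice depends on the chosen binarization threshold, so the two metrics can rank methods differently.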
This review was generated by AI and checked by a human editor.