QUICK REVIEW

[Paper Review] Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules

Zhengxue Cheng, Heming Sun|arXiv (Cornell University)|2020. 01. 06.

Advanced Data Compression Techniques | References: 26 | Citations: 44

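One-Line Summary

Summary: This paper introduces discretized Gaussian mixture likelihoods for entropy modeling together with simplified attention modules, achieving state-of-the-art RD performance among learned methods. It reaches PSNR comparable to VVC and produces visually better results when optimized for MS-SSIM.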

ABSTRACT

Image compression is a fundamental research field and many well-known compression standards have been developed for many decades. Recently, learned compression methods exhibit a fast development trend with promising results. However, there is still a performance gap between learned compression algorithms and reigning compression standards, especially in terms of widely used PSNR metric. In this paper, we explore the remaining redundancy of recent learned compression algorithms. We have found accurate entropy models for rate estimation largely affect the optimization of network parameters and thus affect the rate-distortion performance. Therefore, in this paper, we propose to use discretized Gaussian Mixture Likelihoods to parameterize the distributions of latent codes, which can achieve a more accurate and flexible entropy model. Besides, we take advantage of recent attention modules and incorporate them into network architecture to enhance the performance. Experimental results demonstrate our proposed method achieves a state-of-the-art performance compared to existing learned compression methods on both Kodak and high-resolution datasets. To our knowledge our approach is the first work to achieve comparable performance with latest compression standard Versatile Video Coding (VVC) regarding PSNR. More importantly, our approach generates more visually pleasant results when optimized by MS-SSIM. The project page is at https://github.com/ZhengxueCheng/Learned-Image-Compression-with-GMM-and-Attention

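Research Motivation and Goals

Reduce the redundancy that remains in learned image compression by improving entropy modeling.
Propose a flexible and accurate entropy model based on discretized Gaussian mixture likelihoods.
Integrate lightweight attention modules to improve performance without increasing network capacity.
Demonstrate state-of-the-art performance on standard benchmarks (Kodak, CLIC) against traditional codecs and existing learned methods.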

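Proposed Method

Models the latent code distribution with discretized Gaussian mixture likelihoods to better approximate the true conditional distribution p(y|z).
Uses a hyperprior framework augmented with Gaussian mixtures to capture spatial and contextual redundancy.
Integrates simplified attention modules into the encoder and decoder to strengthen the focus on complex regions.
Trains with an RD objective that combines the bit rates of y and z with a distortion term, across multiple values of lambda.
For stable training, clips the range of y and evaluates the discretized likelihood (the mixture convolved with a unit-width uniform) through differences of cumulative probabilities.
Compares against JPEG, JPEG2000, HEVC/VVC, and existing learned methods on standard datasets.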
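The entropy model and training objective listed above can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names (`gmm_likelihood`, `rate_bits`, `rd_loss`), the `eps` floor, and the example mixture parameters are assumptions made for illustration. In the paper the weights, means, and scales are predicted by the network. The sketch shows the probability mass of an integer latent as a difference of mixture CDFs at y ± 0.5, the resulting bit estimate, and a rate-plus-lambda-times-distortion loss.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of a Gaussian N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def gmm_likelihood(y_hat, weights, means, scales, eps=1e-9):
    """Probability mass of the integer y_hat under a Gaussian mixture
    convolved with a unit-width uniform: CDF(y + 0.5) - CDF(y - 0.5).
    The weights are assumed to sum to 1."""
    p = 0.0
    for w, mu, sigma in zip(weights, means, scales):
        upper = normal_cdf(y_hat + 0.5, mu, sigma)
        lower = normal_cdf(y_hat - 0.5, mu, sigma)
        p += w * (upper - lower)
    # Assumed safeguard: floor the mass so log2 stays finite.
    return max(p, eps)


def rate_bits(latents, weights, means, scales):
    """Estimated code length in bits for a sequence of quantized latents
    that share one set of mixture parameters (a simplification)."""
    return sum(-math.log2(gmm_likelihood(y, weights, means, scales))
               for y in latents)


def rd_loss(bits_y, bits_z, distortion, lam, num_pixels):
    """Rate (bits per pixel for y and z) plus lambda-weighted distortion."""
    return (bits_y + bits_z) / num_pixels + lam * distortion


# Hypothetical mixture with K = 3 components.
weights = [0.3, 0.5, 0.2]
means = [-1.5, 0.2, 4.0]
scales = [0.5, 1.0, 2.0]

latents = [0, -2, 4, 1]
print("bits for latents:", rate_bits(latents, weights, means, scales))
```

Because each component's mass is a CDF difference over a unit interval, the masses over all integers sum to one, which is what lets the negative log-likelihood be read as a bit count.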

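Experimental Results

Research Questions

RQ1: Can discretized Gaussian mixture likelihoods provide a more accurate entropy model than existing Gaussian/hyperprior approaches?
RQ2: Does introducing attention modules improve RD performance without excessive training cost?
RQ3: How closely can a learned compression method approach the PSNR performance of VVC while maintaining or improving MS-SSIM quality?
RQ4: What effect does model capacity (N) have on RD performance when Gaussian mixture likelihoods are used?
RQ5: Does the learned method with the proposed entropy model outperform traditional codecs on Kodak and high-resolution datasets?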

Main Results

Model	PSNR (dB)	MS-SSIM	Rate (bpp)
Joint	33.435	0.980	0.533
Ours	33.623	0.981	0.519

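Gaussian mixture likelihoods yield smaller predicted scales and better removal of spatial redundancy, which improves the entropy model.
The proposed method achieves state-of-the-art performance among learned methods on Kodak and high-resolution datasets.
The method achieves PSNR comparable to VVC and better MS-SSIM results than existing methods.
The simplified attention modules improve performance at moderate training cost and outperform the version without attention.
Ablation studies confirm the benefit of Gaussian mixture modeling across multiple capacity settings (N).
Table 1, Joint vs. Ours: PSNR 33.435 vs. 33.623 dB; MS-SSIM 0.980 vs. 0.981; rate 0.533 vs. 0.519 bpp.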
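To put the Joint vs. Ours row in perspective, the short calculation below derives the PSNR gain and the relative rate saving from the reported numbers. The variable names are illustrative. This is a single operating point, so the result should not be read as a BD-rate figure.

```python
# Values copied from the results table above.
joint = {"psnr_db": 33.435, "ms_ssim": 0.980, "rate_bpp": 0.533}
ours = {"psnr_db": 33.623, "ms_ssim": 0.981, "rate_bpp": 0.519}

# Absolute PSNR improvement in dB.
psnr_gain_db = ours["psnr_db"] - joint["psnr_db"]

# Relative bit-rate reduction with respect to the Joint baseline.
rate_saving_pct = (100.0 * (joint["rate_bpp"] - ours["rate_bpp"])
                   / joint["rate_bpp"])

print(f"PSNR gain: {psnr_gain_db:.3f} dB")      # PSNR gain: 0.188 dB
print(f"Rate saving: {rate_saving_pct:.2f} %")  # Rate saving: 2.63 %
```

In other words, the proposed model is about 0.19 dB better in PSNR while spending roughly 2.6% fewer bits at this point.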


This review was generated by AI and reviewed by a human editor.