QUICK REVIEW

[Paper Review] Assessing UHD Image Quality from Aesthetics, Distortions, and Saliency

Wei Sun, Weixia Zhang | arXiv (Cornell University) | 2024. 09. 01.

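Image and Video Quality Assessment | Citations: 5

One-Line Summary

This paper proposes a three-branch deep neural network (DNN) that assesses UHD image quality from global aesthetics, local distortions, and salient content, using downsampled inputs and Swin Transformer backbones to keep computational cost low while achieving high accuracy. The method achieves state-of-the-art results on UHD-IQA with far fewer MACs than competing methods, and it won the AIM 2024 UHD-IQA Challenge.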

ABSTRACT

UHD images, typically with resolutions equal to or higher than 4K, pose a significant challenge for efficient image quality assessment (IQA) algorithms, as adopting full-resolution images as inputs leads to overwhelming computational complexity and commonly used pre-processing methods like resizing or cropping may cause substantial loss of detail. To address this problem, we design a multi-branch deep neural network (DNN) to assess the quality of UHD images from three perspectives: global aesthetic characteristics, local technical distortions, and salient content perception. Specifically, aesthetic features are extracted from low-resolution images downsampled from the UHD ones, which lose high-frequency texture information but still preserve the global aesthetics characteristics. Technical distortions are measured using a fragment image composed of mini-patches cropped from UHD images based on the grid mini-patch sampling strategy. The salient content of UHD images is detected and cropped to extract quality-aware features from the salient regions. We adopt the Swin Transformer Tiny as the backbone networks to extract features from these three perspectives. The extracted features are concatenated and regressed into quality scores by a two-layer multi-layer perceptron (MLP) network. We employ the mean square error (MSE) loss to optimize prediction accuracy and the fidelity loss to optimize prediction monotonicity. Experimental results show that the proposed model achieves the best performance on the UHD-IQA dataset while maintaining the lowest computational complexity, demonstrating its effectiveness and efficiency. Moreover, the proposed model won first prize in ECCV AIM 2024 UHD-IQA Challenge. The code is available at https://github.com/sunwei925/UIQA.
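
The training objective mentioned in the abstract (MSE for accuracy, fidelity loss for monotonicity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the fidelity loss is computed over all pairs in a batch, that the predicted preference probability follows a Thurstone-style model with unit variance, and that the two terms are summed with a hypothetical `weight` parameter. None of these details are stated in the abstract.

```python
import math
from itertools import combinations


def mse_loss(pred, target):
    """Mean squared error between predicted and ground-truth quality scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def fidelity_loss(pred, target):
    """Pairwise fidelity loss averaged over all pairs in the batch.

    Assumption: P(image i is better than image j) is modelled as
    Phi((pred_i - pred_j) / sqrt(2)), where Phi is the standard normal CDF.
    """
    if len(pred) < 2:
        raise ValueError("fidelity loss needs at least two samples")
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(pred)), 2):
        # Ground-truth preference: 1 if i is better, 0 if worse, 0.5 on a tie.
        if target[i] > target[j]:
            p = 1.0
        elif target[i] < target[j]:
            p = 0.0
        else:
            p = 0.5
        # Phi(d / sqrt(2)) written with erf: 0.5 * (1 + erf(d / 2)).
        p_hat = 0.5 * (1.0 + math.erf((pred[i] - pred[j]) / 2.0))
        total += 1.0 - math.sqrt(p * p_hat) - math.sqrt((1.0 - p) * (1.0 - p_hat))
        n_pairs += 1
    return total / n_pairs


def combined_loss(pred, target, weight=1.0):
    """MSE plus weighted fidelity loss; `weight` is a hypothetical knob."""
    return mse_loss(pred, target) + weight * fidelity_loss(pred, target)
```

With made-up scores, predictions that preserve the ground-truth ordering give a lower fidelity term than predictions in reversed order, which is the ranking behavior the second loss is meant to encourage.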

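Research Motivation and Goals

Assess the quality of UHD (4K and above) images efficiently, without processing them directly at full resolution.
Decompose image quality into global aesthetics, local distortions, and salient content to reduce computation.
Propose a downsampling strategy and a multi-branch architecture for extracting quality-aware features.
Demonstrate state-of-the-art performance on UHD-IQA at low computational cost.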

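Proposed Method

A three-branch architecture that extracts aesthetic features from the downsampled UHD image, distortion features from a fragment image assembled from grid mini-patches, and saliency features from a center-cropped patch.
Each branch uses a Swin Transformer Tiny backbone pre-trained on AVA to extract quality-aware features.
The branch features are concatenated and regressed to a quality score by a two-layer MLP (128 units, then 1).
Training uses a combined loss of MSE and fidelity loss, which optimize prediction accuracy and rank consistency, respectively.
Pre-processing: the resized image x_l for aesthetics, the fragment image x_f built from grid mini-patches for distortions, and the center crop x_s for saliency.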
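
The three pre-processing paths above can be sketched with NumPy as follows. This is an illustrative sketch only: the output sizes (512, an 8 x 8 grid of 32-pixel mini-patches, a 480-pixel crop) are hypothetical values chosen for the example, nearest-neighbour sampling stands in for a proper resize filter, and the function names are not taken from the authors' code.

```python
import numpy as np


def downsample(image, out_size):
    """Nearest-neighbour resize to out_size x out_size (stand-in for a real resize)."""
    h, w = image.shape[:2]
    rows = np.arange(out_size) * h // out_size
    cols = np.arange(out_size) * w // out_size
    return image[rows][:, cols]


def fragment(image, grid=8, patch=32, rng=None):
    """Grid mini-patch sampling.

    Split the image into grid x grid cells, crop one patch x patch mini-patch
    at a random position inside each cell, and tile the crops in grid order.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    cell_h, cell_w = h // grid, w // grid
    if cell_h < patch or cell_w < patch:
        raise ValueError("each grid cell must be at least patch x patch pixels")
    out = np.empty((grid * patch, grid * patch) + image.shape[2:], dtype=image.dtype)
    for i in range(grid):
        for j in range(grid):
            top = i * cell_h + rng.integers(0, cell_h - patch + 1)
            left = j * cell_w + rng.integers(0, cell_w - patch + 1)
            out[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = \
                image[top:top + patch, left:left + patch]
    return out


def center_crop(image, size):
    """Crop a size x size window around the image center."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]


# Synthetic 4K frame (random pixels) used only to exercise the functions.
rng = np.random.default_rng(0)
uhd = rng.integers(0, 256, size=(2160, 3840, 3), dtype=np.uint8)
x_l = downsample(uhd, 512)                    # aesthetics input
x_f = fragment(uhd, grid=8, patch=32, rng=rng)  # distortion input
x_s = center_crop(uhd, 480)                   # saliency input
```

Each mini-patch is copied at its native resolution, so the fragment image keeps local distortion detail that the resized image loses, while all three inputs stay far smaller than the original frame.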

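Experimental Results

Research Questions

RQ1: Can UHD image quality be predicted accurately by decomposing it into aesthetics, distortions, and saliency, without full-resolution processing?
RQ2: Do the downsampling and patch-based strategies preserve enough information for reliable UHD IQA?
RQ3: Does the multi-branch Swin Transformer approach improve accuracy and efficiency on the UHD-IQA dataset?
RQ4: How do AVA pre-training and the choice of backbone affect UHD-IQA performance?
RQ5: How does the proposed method compare with existing state-of-the-art UHD IQA methods in accuracy and computation?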

Key Results

Method	SRCC	PLCC	KRCC	RMSE	MAE	MACs (G)
HyperIQA	0.553	0.103	0.389	0.118	0.070	211
Effnet-2C-MLSP	0.615	0.627	0.445	0.060	0.050	345
CONTRIQUE	0.716	0.712	0.521	0.049	0.038	855
ARNIQA	0.718	0.717	0.523	0.050	0.039	855
CLIP-IQA+	0.743	0.732	0.546	0.108	0.087	895
QualiCLIP	0.757	0.752	0.557	0.079	0.064	901
Proposed	0.817	0.823	0.625	0.040	0.032	43.5
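
For readers unfamiliar with the table's columns, the sketch below shows one way to compute SRCC, PLCC, KRCC, RMSE, and MAE from predicted scores and ground-truth opinion scores. It is a plain-Python illustration under stated assumptions: KRCC is computed as Kendall's tau-b, no nonlinear mapping is applied before PLCC or RMSE, and the example scores are made-up toy numbers, not values from the paper.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            result[order[k]] = average
        start = end + 1
    return result


def plcc(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return plcc(ranks(x), ranks(y))


def krcc(x, y):
    """Kendall rank correlation (tau-b), counted over all pairs."""
    conc = disc = tie_x = tie_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from tau-b
            if dx == 0:
                tie_x += 1
            elif dy == 0:
                tie_y += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    return (conc - disc) / math.sqrt((conc + disc + tie_x) * (conc + disc + tie_y))


def rmse(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def mae(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Toy scores for illustration only.
pred = [0.2, 0.4, 0.5, 0.9]
mos = [0.1, 0.5, 0.4, 0.8]
scores = {"SRCC": srcc(pred, mos), "PLCC": plcc(pred, mos), "KRCC": krcc(pred, mos),
          "RMSE": rmse(pred, mos), "MAE": mae(pred, mos)}
```

Higher is better for the three correlation metrics, and lower is better for RMSE and MAE, which is why the proposed method's row leads on every column.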

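The proposed method achieves the highest SRCC among the compared methods on UHD-IQA (0.817 on validation, 0.846 on test).
On the test set, the proposed method achieves SRCC 0.846, PLCC 0.798, KRCC 0.657, RMSE 0.061, and MAE 0.042 at 43.5 G MACs.
The method greatly reduces computational cost relative to competing methods (43.5 G MACs versus 211 G or more).
In the ablation study, the distortion branch is the most influential, and combining the aesthetics and distortion branches yields strong performance.
Pre-training Swin-T on AVA, as well as using a larger backbone (Swin-B), improves performance over ImageNet pre-training.
Within the proposed downsampling framework, raising the input resolution improves performance, although the input still falls short of full UHD resolution.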
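
To put the computational-cost claim in concrete terms, the short calculation below divides each baseline's MACs by those of the proposed method. The MACs figures are copied from the results table above; the dictionary and variable names are only for illustration.

```python
# MACs in G, taken from the results table.
macs = {
    "HyperIQA": 211,
    "Effnet-2C-MLSP": 345,
    "CONTRIQUE": 855,
    "ARNIQA": 855,
    "CLIP-IQA+": 895,
    "QualiCLIP": 901,
}
proposed_macs = 43.5

# How many times more MACs each baseline needs than the proposed method.
ratios = {name: value / proposed_macs for name, value in macs.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the MACs of the proposed method")
```

By this measure the baselines need between about 4.9 and 20.7 times the MACs of the proposed method.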


This review was generated by AI and reviewed by a human editor.