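[Paper Summary] AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results
This paper reports the results of the AIM 2024 Challenge on Compressed Video Quality Assessment. It details the dataset, the evaluation protocol, the participating teams, and the top-performing methods, which leverage features from visual-language models.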
Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dataset of 459 videos, encoded with 14 codecs of various compression standards (AVC/H.264, HEVC/H.265, AV1, and VVC/H.266) and containing a comprehensive collection of compression artifacts. To measure the methods' performance, we employed traditional correlation coefficients between their predictions and subjective scores, which were collected via large-scale crowdsourced pairwise human comparisons. For training purposes, participants were provided with the Compressed Video Quality Assessment Dataset (CVQAD), a previously developed dataset of 1022 videos. Up to 30 teams registered for the challenge; we report the results of the 6 teams that submitted valid final solutions and code for reproducing the results. Moreover, we computed and report the performance of state-of-the-art VQA methods on the developed dataset, providing a comprehensive benchmark for future research. The dataset, results, and online leaderboard are publicly available at https://challenges.videoprocessing.ai/challenges/compressedvideo-quality-assessment.html.
Research Motivation and Goals
- Promote advancements in compressed video quality assessment (VQA) for diverse codecs and artifacts.
- Provide a large, diverse dataset with subjective scores to evaluate VQA methods.
- Benchmark state-of-the-art VQA approaches and establish reproducible baselines.
- Encourage participation and reproducibility through code submissions and an online leaderboard.
Proposed Method
- Expanded the CVQAD dataset with 459 validation/testing videos across AVC, HEVC, AV1, and VVC codecs.
- Collected ground-truth subjective scores via crowdsourced pairwise comparisons and converted them into per-video scores with the Bradley-Terry model.
- Evaluated methods with SROCC, PLCC, and KROCC between predictions and subjective scores; the final score is the mean of the three.
- Provided a training set (CVQAD) with 1022 videos for development, and a public/private test split for evaluation.
- Compared submitted methods against baselines MS-SSIM (FR) and VSFA (NR).
- Reported final rankings and analyzed top methods focusing on features from Visual-Language Models (VLMs).
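The summary says only that the pairwise votes were processed with a Bradley-Terry model; it does not give the fitting procedure. The sketch below assumes the standard minorization-maximization (Zermelo/Hunter) iteration on a win-count matrix, ignores ties, and requires every video to have at least one win in a connected comparison graph. The function name `bradley_terry`, the `wins` layout, and the geometric-mean normalization are illustrative choices, not the organizers' code.

```python
import numpy as np


def bradley_terry(wins, n_iter=1000, tol=1e-10):
    """Fit Bradley-Terry strengths with the MM (Zermelo) iteration.

    wins[i][j] = number of comparisons in which video i was preferred over video j
    (hypothetical input layout; ties are assumed to be discarded beforehand).
    Returns log-strengths whose mean is zero; higher means better perceived quality.
    """
    wins = np.array(wins, dtype=float)   # copy so the caller's array is untouched
    np.fill_diagonal(wins, 0.0)          # a video is never compared with itself
    games = wins + wins.T                # n_ij: total comparisons between i and j
    total_wins = wins.sum(axis=1)        # W_i: total wins of video i
    p = np.ones(wins.shape[0])
    for _ in range(n_iter):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        denom = (games / (p[:, None] + p[None, :])).sum(axis=1)
        p_new = total_wins / denom
        # Strengths are only identified up to scale: fix the geometric mean at 1.
        p_new /= np.exp(np.mean(np.log(p_new)))
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    return np.log(p)


# Three hypothetical videos: A beat B 8/10 times, A beat C 9/10, B beat C 7/10.
scores = bradley_terry([[0, 8, 9], [2, 0, 7], [1, 3, 0]])
print(scores)
```

With two items the fitted strength ratio equals the win ratio, which is a quick sanity check on the update rule.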
Experimental Results
Research Questions
- RQ1: How well do contemporary VQA methods (including NR and FR approaches) correlate with subjective quality for compressed videos across multiple codecs?
- RQ2: Which architectural and feature choices (e.g., VLM-based features, multi-aspect quality modeling) yield the strongest performance in compressed-VQA scenarios?
- RQ3: How does model size relate to correlation performance in the AIM 2024 compressed video quality assessment setting?
- RQ4: What is the impact of including compression-dynamics or temporal information in VQA models for compressed content?
Key Findings
| Rank | Team | Type | SROCC ↑ | PLCC ↑ | KROCC ↑ | Result ↑ | #Params.(M) |
|---|---|---|---|---|---|---|---|
| 1 | TVQA-C | NR Video | 0.9376 | 0.9772 | 0.8505 | 0.9218 | 388.34 |
| 2 | SJTU-MultimediaLab | NR Video | 0.9378 | 0.9680 | 0.8442 | 0.9167 | 91.50 |
| 3 | FudanVIP | NR Video | 0.9113 | 0.9568 | 0.8009 | 0.8896 | 288 |
| 4 | Test IQA | FR Image | 0.8873 | 0.9497 | 0.7605 | 0.8658 | — |
| 5 | Fredlovematt | NR Image | 0.8688 | 0.9411 | 0.7617 | 0.8572 | 7 |
| 6 | VPT | NR Image | 0.8160 | 0.7741 | 0.5542 | 0.7148 | 317.3 |
| — | MS-SSIM [52] (baseline) | FR Image | 0.9149 | 0.9531 | 0.8062 | 0.8914 | — |
| — | VSFA [27] (baseline) | NR Video | 0.8844 | 0.9403 | 0.7913 | 0.8720 | — |
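The final score is described as the average of SROCC, PLCC, and KROCC. The sketch below implements these metrics in plain NumPy, assuming Kendall's tau-b (the default variant in `scipy.stats.kendalltau`) and PLCC on raw predictions with no logistic fitting; the summary does not say whether the organizers applied such a mapping. The helper names are hypothetical. The last part recomputes the Result column from the table: every row agrees to within 1e-4, consistent with the Result having been averaged before rounding.

```python
import numpy as np


def rankdata(x):
    """1-based ranks; tied values share the average of their ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        tied = x == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def plcc(pred, mos):
    """Pearson linear correlation on raw predictions (no logistic fitting)."""
    return float(np.corrcoef(pred, mos)[0, 1])


def srocc(pred, mos):
    """Spearman rank-order correlation = Pearson correlation of the ranks."""
    return plcc(rankdata(pred), rankdata(mos))


def krocc(pred, mos):
    """Kendall tau-b in O(n^2); pairs tied in both lists count in neither term."""
    a, b = np.asarray(pred, dtype=float), np.asarray(mos, dtype=float)
    conc = disc = tie_a = tie_b = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            da, db = np.sign(a[i] - a[j]), np.sign(b[i] - b[j])
            if da == 0 and db == 0:
                continue
            if da == 0:
                tie_a += 1
            elif db == 0:
                tie_b += 1
            elif da == db:
                conc += 1
            else:
                disc += 1
    return float((conc - disc) / np.sqrt((conc + disc + tie_a) * (conc + disc + tie_b)))


def challenge_score(pred, mos):
    """Final score: arithmetic mean of SROCC, PLCC and KROCC."""
    return (srocc(pred, mos) + plcc(pred, mos) + krocc(pred, mos)) / 3


# Values copied from the table: (SROCC, PLCC, KROCC, reported Result).
TABLE = {
    "TVQA-C": (0.9376, 0.9772, 0.8505, 0.9218),
    "SJTU-MultimediaLab": (0.9378, 0.9680, 0.8442, 0.9167),
    "FudanVIP": (0.9113, 0.9568, 0.8009, 0.8896),
    "Test IQA": (0.8873, 0.9497, 0.7605, 0.8658),
    "Fredlovematt": (0.8688, 0.9411, 0.7617, 0.8572),
    "VPT": (0.8160, 0.7741, 0.5542, 0.7148),
    "MS-SSIM": (0.9149, 0.9531, 0.8062, 0.8914),
    "VSFA": (0.8844, 0.9403, 0.7913, 0.8720),
}
for name, (s, p, k, reported) in TABLE.items():
    print(f"{name:20s} mean={(s + p + k) / 3:.4f} reported={reported:.4f}")
```

Kendall's tau is written as an explicit pairwise loop for readability; for thousands of videos, an O(n log n) implementation such as the one in SciPy would be preferable.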
- The top three teams (TVQA-C, SJTU-MultimediaLab, FudanVIP) achieved the highest SROCC, PLCC, and KROCC among all participants and outperformed the VSFA baseline on all three metrics.
- TVQA-C achieved SROCC 0.9376, PLCC 0.9772, KROCC 0.8505, with 388.34M parameters; SJTU-MultimediaLab achieved SROCC 0.9378, PLCC 0.9680, KROCC 0.8442, with 91.50M parameters; FudanVIP achieved SROCC 0.9113, PLCC 0.9568, KROCC 0.8009, with 288M parameters.
- All top-performing solutions leveraged features from Visual-Language Models (VLMs) and employed multi-aspect strategies (aesthetics/technical quality, local/global temporal information).
- The SJTU-MultimediaLab method included a module predicting video compression degree, highlighting compression-aware modeling as important for VQA.
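- All three top teams outperformed the VSFA (NR) baseline, but MS-SSIM (FR) proved a strong baseline: only TVQA-C and SJTU-MultimediaLab exceeded its overall score (0.8914), while FudanVIP (0.8896) fell just below it.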
- The paper provides a public dataset, online leaderboard, and reproducible evaluation protocol.
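RQ3 asks how model size relates to performance. As a rough illustration only (this is not an analysis reported in the paper), the Spearman correlation between #Params and Result can be computed for the five teams whose parameter counts appear in the table; Test IQA and the baselines are excluded because no count is listed. The helper `spearman_no_ties` is hypothetical and uses the closed-form shortcut, which is valid only because the values contain no ties.

```python
import numpy as np

# (#Params in millions, final Result) copied from the table above.
TEAMS = {
    "TVQA-C": (388.34, 0.9218),
    "SJTU-MultimediaLab": (91.50, 0.9167),
    "FudanVIP": (288.0, 0.8896),
    "Fredlovematt": (7.0, 0.8572),
    "VPT": (317.3, 0.7148),
}


def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n(n^2 - 1)); assumes no tied values."""
    rx = np.argsort(np.argsort(x)) + 1  # 1-based ranks of x
    ry = np.argsort(np.argsort(y)) + 1  # 1-based ranks of y
    d = rx - ry
    n = len(x)
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))


params, results = zip(*TEAMS.values())
rho = spearman_no_ties(np.array(params), np.array(results))
print(f"Spearman rho(#Params, Result) = {rho:.2f} over n={len(params)} teams")
```

This yields ρ = 0.30, a weak positive association. With only five points it supports no firm conclusion. The clearest signal in the table is that SJTU-MultimediaLab nearly matched TVQA-C with roughly a quarter of its parameters.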
This summary was generated by AI and reviewed by human editors.