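[Paper Summary] Advancing Zero-Shot Digital Human Quality Assessment through Text-Prompted Evaluation
This paper introduces SJTU-H3D, the first large-scale database for full-body digital human quality assessment (DHQA), and proposes a zero-shot, no-reference Digital Human Quality Index (DHQI) built from three measures: text-prompted semantic affinity, spatial naturalness, and geometry loss. The method combines CLIP-based semantics, NIQE, and dihedral-angle mesh geometry to achieve strong zero-shot performance.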
Digital humans have witnessed extensive applications in various domains, necessitating related quality assessment studies. However, there is a lack of comprehensive digital human quality assessment (DHQA) databases. To address this gap, we propose SJTU-H3D, a subjective quality assessment database specifically designed for full-body digital humans. It comprises 40 high-quality reference digital humans and 1,120 labeled distorted counterparts generated with seven types of distortions. The SJTU-H3D database can serve as a benchmark for DHQA research, allowing evaluation and refinement of processing algorithms. Further, we propose a zero-shot DHQA approach that focuses on no-reference (NR) scenarios to ensure generalization capabilities while mitigating database bias. Our method leverages semantic and distortion features extracted from projections, as well as geometry features derived from the mesh structure of digital humans. Specifically, we employ the Contrastive Language-Image Pre-training (CLIP) model to measure semantic affinity and incorporate the Naturalness Image Quality Evaluator (NIQE) model to capture low-level distortion information. Additionally, we utilize dihedral angles as geometry descriptors to extract mesh features. By aggregating these measures, we introduce the Digital Human Quality Index (DHQI), which demonstrates significant improvements in zero-shot performance. The DHQI can also serve as a robust baseline for DHQA tasks, facilitating advancements in the field. The database and the code are available at https://github.com/zzc-1998/SJTU-H3D.
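Research Motivation and Goals
- Build a database for perceptual quality assessment of full-body digital humans (SJTU-H3D) to support DHQA research and benchmarking.
- Develop a zero-shot, no-reference DHQA method that generalizes beyond labeled datasets.
- Integrate semantic, spatial, and geometric cues into a robust DHQI for DHQA tasks.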
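Proposed Method
- Construct SJTU-H3D from 40 high-quality textured-mesh references and 1,120 distorted instances generated with seven distortion types.
- Render projections from six cube-like viewpoints as the input for semantic and spatial quality analysis.
- Compute a CLIP-based semantic affinity between each projection and positive/negative text prompts to obtain a semantic quality score.
- Apply NIQE to capture low-level spatial distortions, and normalize its output so that it can be combined with the other measures.
- Extract a dihedral-angle-based geometry loss from the mesh to quantify structural degradation, and map it to a quality score.
- Aggregate the semantic, spatial, and geometric measures by summation to form the DHQI.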
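The semantic and aggregation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are taken as precomputed CLIP image and text vectors, the function names `semantic_affinity` and `dhqi` are hypothetical, the softmax scale of 100 and the averaging over views are assumptions, and the spatial and geometry scores are assumed to be already normalized to a comparable range.

```python
import numpy as np


def _unit(v):
    """Return v scaled to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def semantic_affinity(image_emb, pos_emb, neg_emb, scale=100.0):
    """Softmax probability that a projection matches the positive prompt.

    Assumption: image_emb, pos_emb, and neg_emb are precomputed embeddings
    of one projection, a positive prompt, and a negative prompt.
    """
    img = _unit(image_emb)
    # Scaled cosine similarities to the positive and negative prompts.
    logits = scale * np.array([img @ _unit(pos_emb), img @ _unit(neg_emb)])
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[0])


def dhqi(view_embs, pos_emb, neg_emb, spatial_score, geometry_score):
    """Sum the three branch scores into a single index.

    Assumption: the semantic score is the mean affinity over all views, and
    spatial_score / geometry_score are already normalized quality scores.
    """
    semantic = float(np.mean([semantic_affinity(e, pos_emb, neg_emb)
                              for e in view_embs]))
    return semantic + spatial_score + geometry_score
```

Because the affinity is a two-way softmax, a projection that is equally similar to both prompts scores 0.5, and the score moves toward 1 as the projection aligns with the positive prompt.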
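Experimental Results
Research Questions
- RQ1: How can a zero-shot, no-reference framework assess digital human quality without relying on training data with subjective scores?
- RQ2: Does a multimodal combination of semantic, spatial, and geometric features improve the generalization and robustness of DHQA?
- RQ3: Can a text-prompted CLIP semantic measure, combined with low-level NIQE and geometry descriptors, reliably predict the perceptual quality of full-body digital humans?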
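Key Findings
- SJTU-H3D is the first large-scale full-body DHQA database, with 40 references and 1,120 distorted samples.
- The proposed DHQI improves zero-shot performance and is competitive with supervised methods.
- The three-branch quality index (semantic affinity, spatial naturalness, geometry loss) can be aggregated effectively without fine-tuning.
- CLIP-based semantic prompting with quality-related text pairs captures content-aware distortions in 3D projections.
- Dihedral-angle geometry descriptors correlate with distortion level, supporting a robust geometry loss measure.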
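To make the dihedral-angle descriptor concrete, here is a small sketch of how such angles could be computed from a triangle mesh. It is an illustration only: the function name `dihedral_angles` is hypothetical, the mesh is assumed to have consistently oriented, non-degenerate faces, and the angle reported is the one between adjacent face normals (0 for a flat surface), which may differ from the convention used in the paper. How the paper maps these angles to a quality score is not reproduced here.

```python
import numpy as np


def dihedral_angles(vertices, faces):
    """Angle in radians between the normals of each pair of faces sharing an edge.

    Assumptions: triangle mesh, consistent face orientation, no degenerate faces.
    """
    v = np.asarray(vertices, dtype=float)
    f = np.asarray(faces, dtype=int)

    # Unit normal of every face from the cross product of two edge vectors.
    normals = np.cross(v[f[:, 1]] - v[f[:, 0]], v[f[:, 2]] - v[f[:, 0]])
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)

    # Map every undirected edge to the faces that contain it.
    edge_faces = {}
    for face_idx, (a, b, c) in enumerate(f.tolist()):
        for edge in ((a, b), (b, c), (c, a)):
            edge_faces.setdefault(tuple(sorted(edge)), []).append(face_idx)

    # Keep only interior edges, i.e. those shared by exactly two faces.
    angles = []
    for adjacent in edge_faces.values():
        if len(adjacent) == 2:
            cos = np.clip(normals[adjacent[0]] @ normals[adjacent[1]], -1.0, 1.0)
            angles.append(np.arccos(cos))
    return np.array(angles)
```

Statistics of the returned array (for example its mean) could then serve as a simple geometry descriptor; distortions such as noise or aggressive simplification would be expected to shift that distribution.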
This summary was generated by AI and reviewed by a human editor.