QUICK REVIEW

[Paper Review] Advancing Zero-Shot Digital Human Quality Assessment through Text-Prompted Evaluation

Zicheng Zhang, Wei Sun|arXiv (Cornell University)|2023. 07. 06.

Human Pose and Action Recognition | Citations: 14

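One-Line Summary

This paper introduces SJTU-H3D as the first large-scale full-body DHQA database and proposes a zero-shot, no-reference Digital Human Quality Index (DHQI) built from three measures: text-prompted semantic affinity, spatial naturalness, and geometry loss. The method combines CLIP-based semantics, NIQE, and dihedral-angle mesh geometry to achieve strong zero-shot performance.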

ABSTRACT

Digital humans have witnessed extensive applications in various domains, necessitating related quality assessment studies. However, there is a lack of comprehensive digital human quality assessment (DHQA) databases. To address this gap, we propose SJTU-H3D, a subjective quality assessment database specifically designed for full-body digital humans. It comprises 40 high-quality reference digital humans and 1,120 labeled distorted counterparts generated with seven types of distortions. The SJTU-H3D database can serve as a benchmark for DHQA research, allowing evaluation and refinement of processing algorithms. Further, we propose a zero-shot DHQA approach that focuses on no-reference (NR) scenarios to ensure generalization capabilities while mitigating database bias. Our method leverages semantic and distortion features extracted from projections, as well as geometry features derived from the mesh structure of digital humans. Specifically, we employ the Contrastive Language-Image Pre-training (CLIP) model to measure semantic affinity and incorporate the Naturalness Image Quality Evaluator (NIQE) model to capture low-level distortion information. Additionally, we utilize dihedral angles as geometry descriptors to extract mesh features. By aggregating these measures, we introduce the Digital Human Quality Index (DHQI), which demonstrates significant improvements in zero-shot performance. The DHQI can also serve as a robust baseline for DHQA tasks, facilitating advancements in the field. The database and the code are available at https://github.com/zzc-1998/SJTU-H3D.
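The text-prompted semantic affinity mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a projection embedding is compared against one positive and one negative quality prompt embedding, and that the two cosine similarities are turned into a score with a two-way softmax. The function names, the temperature value, and the toy vectors are hypothetical; in practice the embeddings would come from CLIP's image and text encoders, and the paper's exact formulation may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_affinity(image_emb, pos_emb, neg_emb, temperature=0.01):
    """Two-way softmax over similarities to a positive and a negative prompt.

    Returns a value in (0, 1); higher means the projection sits closer to
    the positive ("high quality") prompt. The temperature is an assumed
    value, not one taken from the paper.
    """
    s_pos = cosine(image_emb, pos_emb)
    s_neg = cosine(image_emb, neg_emb)
    # softmax([s_pos, s_neg] / T)[0] equals a sigmoid of the scaled difference
    return 1.0 / (1.0 + math.exp(-(s_pos - s_neg) / temperature))

def mean_affinity(view_embs, pos_emb, neg_emb):
    """Average the affinity over all projected views of one digital human."""
    scores = [semantic_affinity(e, pos_emb, neg_emb) for e in view_embs]
    return sum(scores) / len(scores)

# Toy 2-D embeddings standing in for real CLIP features (hypothetical values).
positive_prompt = [1.0, 0.0]
negative_prompt = [0.0, 1.0]
views = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2]]
score = mean_affinity(views, positive_prompt, negative_prompt)
```

Because the score depends only on the difference between the two similarities, it needs no labeled quality data, which is what makes this kind of measure usable in a zero-shot setting.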

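Motivation and Objectives

Build a perceptual quality assessment database for full-body digital humans (SJTU-H3D) to enable DHQA research and benchmarking.
Develop a zero-shot, no-reference DHQA method that generalizes beyond labeled datasets.
Integrate semantic, spatial, and geometric cues into a robust DHQI for DHQA tasks.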

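Proposed Method

SJTU-H3D is built from 40 high-quality textured mesh references and 1,120 distorted samples spanning seven distortion types.
Projections from six cube-like viewpoints are used as input for semantic and spatial quality analysis.
A semantic quality score is obtained from CLIP-based semantic affinity with positive/negative text prompts.
NIQE is introduced to capture low-level spatial distortions, and its output is normalized so it can be combined with the other measures.
A dihedral-angle-based geometry loss is extracted from the mesh to quantify structural degradation and is mapped to a quality score.
The semantic, spatial, and geometry measures are summed to form the DHQI.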
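The geometry branch and the final summation can be sketched as follows. This is a minimal illustration, assuming a triangle mesh with consistently oriented, non-degenerate faces and measuring each dihedral angle as the angle between the normals of two faces that share an edge (0 for coplanar faces). The function names are hypothetical, the paper may use a different angle convention and mapping to a quality score, and the aggregation assumes the three inputs were already normalized to a common scale.

```python
import math
from collections import defaultdict

def face_normal(vertices, face):
    """Unit normal of a triangle given as three vertex indices."""
    a, b, c = (vertices[i] for i in face)
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(x * x for x in n))  # assumes a non-degenerate face
    return [x / length for x in n]

def dihedral_angles(vertices, faces):
    """Angle (radians) between normals of each pair of faces sharing an edge."""
    edge_to_faces = defaultdict(list)
    for face_index, face in enumerate(faces):
        for k in range(3):
            edge = tuple(sorted((face[k], face[(k + 1) % 3])))
            edge_to_faces[edge].append(face_index)
    normals = [face_normal(vertices, f) for f in faces]
    angles = []
    for shared in edge_to_faces.values():
        if len(shared) == 2:  # skip boundary and non-manifold edges
            n1, n2 = normals[shared[0]], normals[shared[1]]
            dot = sum(p * q for p, q in zip(n1, n2))
            angles.append(math.acos(max(-1.0, min(1.0, dot))))
    return angles

def aggregate_dhqi(semantic, spatial, geometry):
    """Sum three already-normalized branch scores (normalization is assumed)."""
    return semantic + spatial + geometry

# Two triangles folded at a right angle along the edge between vertices 0 and 1.
fold_vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
fold_faces = [(0, 1, 2), (0, 3, 1)]
fold_angles = dihedral_angles(fold_vertices, fold_faces)
```

A summary statistic of these angles, such as their mean, is one plausible way to turn the descriptor into a single geometry value before it is mapped to a quality score and added to the other two branches.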

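Experimental Results

Research Questions

RQ1: How can a zero-shot, no-reference framework assess digital human quality without relying on subjective rating data?
RQ2: Can a multimodal combination of semantic, spatial, and geometric features improve the generalization and robustness of DHQA?
RQ3: Does a CLIP-based semantic measure with quality-related text pairs, combined with low-level NIQE and geometry descriptors, reliably predict the perceptual quality of full-body digital humans?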

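Key Findings

SJTU-H3D provides the first large-scale full-body DHQA database, consisting of 40 references and 1,120 distorted samples.
The proposed DHQI improves zero-shot performance and is competitive with supervised methods.
The three branches (semantic affinity, spatial naturalness, and geometry loss) can be aggregated effectively without elaborate fine-tuning.
CLIP-based semantic prompts with quality-related text pairs capture content-aware distortions in the 3D projections.
The dihedral-angle geometry descriptor correlates with distortion level and supports a robust geometry loss measure.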

This review was generated by AI and reviewed by a human editor.