QUICK REVIEW

[Paper Review] Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

Zibo Zhao, Zhiping Lai|ArXiv.org|2025. 01. 21.

Computer Graphics and Visualization Techniques | Citations: 6

One-line summary

Hunyuan3D 2.0 presents a two-stage system (shape: Hunyuan3D-DiT with ShapeVAE; texture: Hunyuan3D-Paint) that generates high-resolution textured 3D assets and outperforms prior methods in geometry and texture quality.

ABSTRACT

We present Hunyuan3D 2.0, an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets. This system includes two foundation components: a large-scale shape generation model -- Hunyuan3D-DiT, and a large-scale texture synthesis model -- Hunyuan3D-Paint. The shape generative model, built on a scalable flow-based diffusion transformer, aims to create geometry that properly aligns with a given condition image, laying a solid foundation for downstream applications. The texture synthesis model, benefiting from strong geometric and diffusion priors, produces high-resolution and vibrant texture maps for either generated or hand-crafted meshes. Furthermore, we build Hunyuan3D-Studio -- a versatile, user-friendly production platform that simplifies the re-creation process of 3D assets. It allows both professional and amateur users to manipulate or even animate their meshes efficiently. We systematically evaluate our models, showing that Hunyuan3D 2.0 outperforms previous state-of-the-art models, including the open-source models and closed-source models in geometry details, condition alignment, texture quality, and etc. Hunyuan3D 2.0 is publicly released in order to fill the gaps in the open-source 3D community for large-scale foundation generative models. The code and pre-trained weights of our models are available at: https://github.com/Tencent/Hunyuan3D-2

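Motivation and Objectives

Addresses the automatic generation of high-resolution textured 3D assets.
Decouples shape and texture generation to improve quality and flexibility.
Leverages a large-scale diffusion transformer and geometric priors to improve condition alignment and texture realism.
Provides a production platform (Hunyuan3D-Studio) for designers and developers.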

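Proposed Method

Two-stage generation pipeline: Hunyuan3D-DiT generates a bare (untextured) mesh, and Hunyuan3D-Paint synthesizes the texture map for it.
Shape model: uses ShapeVAE with importance sampling for latent 3D tokenization, followed by a flow-based diffusion transformer that operates in the VAE latent space (flow matching objective).
Texture model: mesh-conditioned multi-view generation through a double-stream image-conditioning reference network, multi-view and geometry conditioning, and texture baking with dense-view inference.
Texture preprocessing: the input image is delighted to uniform white-light illumination, which enables illumination-invariant texture synthesis.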
Viewpoint strategy: 8–12 viewpoints are selected greedily in a geometry-aware manner to guide texture generation.
Training details: fine-tuned from Stable Diffusion 2.x at 512x512 for 80k steps with a learning rate of 5e-5; uses text- and image-to-texture conditioning (ControlNet, IP-Adapter).
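
The flow matching objective mentioned for the shape model can be illustrated with a minimal sketch. It assumes the common straight-line (affine) path between a noise sample and a data latent; the function names, the `condition` argument, and the `predict_velocity` callable are hypothetical stand-ins and are not taken from the Hunyuan3D-2 code base.

```python
import random


def flow_matching_pair(x0, x1, t):
    """Point x_t and target velocity u_t on the straight path from noise x0 to data x1."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - a for a, b in zip(x0, x1)]  # constant velocity along the straight path
    return x_t, u_t


def flow_matching_loss(predict_velocity, x1, condition, rng):
    """One-sample estimate of the flow matching loss for a single latent vector x1.

    predict_velocity(x_t, t, condition) is a hypothetical model returning a
    velocity with the same length as x_t.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
    t = rng.random()                        # time drawn uniformly from [0, 1)
    x_t, u_t = flow_matching_pair(x0, x1, t)
    v = predict_velocity(x_t, t, condition)
    # Mean squared error between predicted and target velocity.
    return sum((p - q) ** 2 for p, q in zip(v, u_t)) / len(x1)


if __name__ == "__main__":
    rng = random.Random(0)
    latent = [1.0, -1.0, 0.5]  # toy stand-in for a ShapeVAE latent
    zero_model = lambda x_t, t, cond: [0.0] * len(x_t)
    print(flow_matching_loss(zero_model, latent, None, rng))
```

In a real training loop the loss would be averaged over a batch and backpropagated through the transformer; the sketch only shows how the regression target is formed.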

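Experimental Results

Research Questions

RQ1: Can a two-stage, open-source foundation-model approach generate high-fidelity, high-resolution 3D shapes and textures aligned with an image prompt?
RQ2: Does decoupling shape and texture generation improve geometric detail, texture realism, and multi-view consistency compared with end-to-end methods?
RQ3: How do geometric priors and multi-view conditioning affect the texture seamlessness and view consistency of generated assets?
RQ4: Do perceptual and task-based metrics (CLIP-based scores, FID, CMMD, LPIPS) show improvement over existing baselines?
RQ5: Is there a production platform that lets both professionals and non-professionals effectively create and manipulate textured 3D assets?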

Key Results

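Method	V-IoU (↑)	S-IoU (↑)	Notes
3DShape2VecSet	87.88%	80.66%	reconstruction baseline
Michelangelo	84.93%	76.27%	reconstruction baseline
Direct3D	88.43%	81.55%	reconstruction baseline
Hunyuan3D-ShapeVAE	93.6%	89.16%	proposed method
Hunyuan3D-DiT			shape generation; evaluated with ULIP-T, ULIP-I, Uni3D-T, Uni3D-I
TEXTure			texture synthesis baseline
Text2Tex			texture synthesis baseline
SyncMVD			texture synthesis baseline
Paint3D			texture synthesis baseline
TexPainter			texture synthesis baseline
Hunyuan3D-Paint			texture synthesis; proposed method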
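
To make the V-IoU and S-IoU columns concrete, here is a small sketch of how an occupancy-based IoU could be computed. It assumes V-IoU compares occupancy over query points in the whole volume and S-IoU restricts the comparison to points near the ground-truth surface; the sphere shapes, grid size, and band width are toy values chosen for illustration, not the paper's evaluation protocol.

```python
import itertools
import math


def occupancy_iou(pred, target):
    """IoU of two boolean occupancy lists evaluated at the same query points."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(1 for p, g in zip(pred, target) if p and g)
    union = sum(1 for p, g in zip(pred, target) if p or g)
    return intersection / union if union else 1.0  # both empty: treat as a perfect match


def grid_points(n):
    """n**3 regularly spaced query points in the cube [-1, 1]^3."""
    axis = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return list(itertools.product(axis, repeat=3))


ORIGIN = (0.0, 0.0, 0.0)
points = grid_points(17)

# Toy shapes: ground truth is a sphere of radius 0.8, the "reconstruction" has radius 0.7.
gt = [math.dist(p, ORIGIN) <= 0.8 for p in points]
pred = [math.dist(p, ORIGIN) <= 0.7 for p in points]

# Volume IoU: every query point counts.
v_iou = occupancy_iou(pred, gt)

# Near-surface IoU: only points within an assumed band of 0.15 around the true surface.
near = [i for i, p in enumerate(points) if abs(math.dist(p, ORIGIN) - 0.8) < 0.15]
s_iou = occupancy_iou([pred[i] for i in near], [gt[i] for i in near])

print(f"V-IoU = {v_iou:.3f}, S-IoU = {s_iou:.3f}")
```

Because the near-surface variant discards the easy interior points, it is the stricter of the two in this toy case, which is consistent with the S-IoU column being lower than the V-IoU column for every method in the table.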

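Hunyuan3D-ShapeVAE achieves higher shape reconstruction volume IoU (V-IoU) and near-surface IoU (S-IoU) than the baselines.
Hunyuan3D-DiT achieves the strongest condition-following scores (ULIP-T/I, Uni3D-T/I) and produces hole-free bare meshes.
Hunyuan3D-Paint delivers the best texture map quality relative to the baselines on the CMMD, FID_CLIP, CLIP-score, and LPIPS metrics.
Textured 3D assets generated by Hunyuan3D 2.0 achieve the best overall image-based similarity and semantic alignment with the prompt (across several CLIP-based metrics).
A user study (50 participants, 300 results) suggests that Hunyuan3D 2.0 outperforms the compared methods in image-condition adherence and perceptual quality.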
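
As a rough illustration of the CLIP-based alignment scores listed above, the sketch below assumes such a score is the cosine similarity between an embedding of each rendered view and an embedding of the prompt, averaged over views. The embeddings are toy vectors rather than real CLIP outputs, and the averaging rule and the name `mean_view_score` are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_view_score(view_embeddings, prompt_embedding):
    """Average similarity between each rendered-view embedding and the prompt embedding."""
    scores = [cosine_similarity(v, prompt_embedding) for v in view_embeddings]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    prompt = [1.0, 0.0]                # toy prompt embedding
    views = [[1.0, 0.0], [0.0, 1.0]]   # toy embeddings of two rendered views
    print(mean_view_score(views, prompt))
```

A higher value means the rendered views sit closer to the prompt in the embedding space; distribution-level metrics such as FID and CMMD compare sets of embeddings instead and are not covered by this sketch.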

This review was generated by AI and reviewed by a human editor.