QUICK REVIEW

[Paper Review] PackUV: Packed Gaussian UV Maps for 4D Volumetric Video

Aashish Rai, Angela Xing|arXiv (Cornell University)|Feb 26, 2026

Video Coding and Compression Technologies | Citations: 0

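One-line summary

PackUV combines a UV-map-based 4D Gaussian representation with an atlas pyramid to achieve compact storage and codec-compatible streaming of long sequences, and it provides PackUV-GS, a fitting method that preserves temporal consistency. The paper also introduces PackUV-2B, which the authors describe as the largest long-horizon multi-view 4D dataset.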

ABSTRACT

Volumetric videos offer immersive 4D experiences, but remain difficult to reconstruct, store, and stream at scale. Existing Gaussian Splatting based methods achieve high-quality reconstruction but break down on long sequences, temporal inconsistency, and fail under large motions and disocclusions. Moreover, their outputs are typically incompatible with conventional video coding pipelines, preventing practical applications. We introduce PackUV, a novel 4D Gaussian representation that maps all Gaussian attributes into a sequence of structured, multi-scale UV atlas, enabling compact, image-native storage. To fit this representation from multi-view videos, we propose PackUV-GS, a temporally consistent fitting method that directly optimizes Gaussian parameters in the UV domain. A flow-guided Gaussian labeling and video keyframing module identifies dynamic Gaussians, stabilizes static regions, and preserves temporal coherence even under large motions and disocclusions. The resulting UV atlas format is the first unified volumetric video representation compatible with standard video codecs (e.g., FFV1) without losing quality, enabling efficient streaming within existing multimedia infrastructure. To evaluate long-duration volumetric capture, we present PackUV-2B, the largest multi-view video dataset to date, featuring more than 50 synchronized cameras, substantial motion, and frequent disocclusions across 100 sequences and 2B (billion) frames. Extensive experiments demonstrate that our method surpasses existing baselines in rendering fidelity while scaling to sequences up to 30 minutes with consistent quality.

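Motivation and objectives

Resolve the memory and streaming bottlenecks in 4D volumetric video reconstruction.
Create a unified 4D Gaussian representation that is compatible with standard video codecs.
Develop a temporally consistent fitting pipeline that handles large motions and disocclusions.
Provide a long-horizon dataset so that volumetric video methods can be evaluated.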

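Proposed method

PackUV packs 3D Gaussian attributes into a sequence of multi-scale UV atlases, organized as a progressive atlas pyramid.
The UV atlas layers are downsampled in a geometric progression and packed into a single atlas per frame, which enables codec-compatible storage.
PackUV-GS optimizes Gaussian parameters directly in UV space with a fixed number of layers, and uses optical-flow keyframing and dynamic/static Gaussian labeling to guide the Gaussians.
The flow-guided Gaussian labeling identifies dynamic Gaussians and freezes static ones to preserve temporal consistency.
Two UV-space pruning strategies (valid UV projection pruning and max-K UV pruning) keep the representation sparse and efficient.
Low-precision optimization in the UV representation enables 8-bit-per-channel storage that is compatible with standard codecs (FFV1/HEVC).
Video keyframing splits long sequences into segments, using RAFT-based optical flow to choose keyframes and propagate updates.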
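
The two UV-space pruning strategies can be illustrated with a small sketch. The review does not spell out the exact criteria, so the following assumes that valid UV projection pruning drops Gaussians whose projected texel falls outside the UV map, and that max-K UV pruning keeps at most K Gaussians per texel, ranked by opacity. The function name `prune_uv` and the `u`, `v`, `opacity` fields are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def prune_uv(gaussians, width, height, max_k):
    """Keep at most max_k Gaussians per UV texel (assumed ranking: opacity).

    gaussians: list of dicts with integer texel coords 'u', 'v' and 'opacity'.
    Returns the survivors, each tagged with the UV layer it would occupy.
    """
    per_texel = defaultdict(list)
    for g in gaussians:
        # Assumed "valid UV projection" test: the texel must lie inside the map.
        if 0 <= g["u"] < width and 0 <= g["v"] < height:
            per_texel[(g["u"], g["v"])].append(g)

    kept = []
    for bucket in per_texel.values():
        # Assumed "max-K" rule: the K most opaque Gaussians survive per texel.
        bucket.sort(key=lambda g: g["opacity"], reverse=True)
        for layer, g in enumerate(bucket[:max_k]):
            kept.append({**g, "layer": layer})
    return kept


example = [
    {"u": 0, "v": 0, "opacity": 0.9},
    {"u": 0, "v": 0, "opacity": 0.2},
    {"u": 0, "v": 0, "opacity": 0.5},
    {"u": 5, "v": 0, "opacity": 0.8},  # outside a 2x2 map, so it is dropped
    {"u": 1, "v": 1, "opacity": 0.1},
]
survivors = prune_uv(example, width=2, height=2, max_k=2)
```

Capping the count per texel is what keeps the number of UV layers fixed, which matches the fixed layer count mentioned for PackUV-GS.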

Figure 2 : (Top) Three UV-map organization strategies: (a) naïvely stacking UV layers (deep layers become more and more sparse); (b) a geometric-progression UV pyramid (more uniform sparsity with less storage); (c) PackUV, which packs all pyramid layers into a single UV atlas for efficient, codec-fr
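
To make the storage argument behind the pyramid concrete, here is a minimal layout sketch. It assumes square layers whose side length halves at each level and a mipmap-style placement (the base layer on the left, the smaller layers stacked in a column to its right). The paper's actual ratio and packing layout are not given in this review, so `pack_pyramid` and its layout are illustrative assumptions only.

```python
def pack_pyramid(base, num_layers):
    """Place a halving pyramid of square UV layers into one atlas.

    Returns ((atlas_width, atlas_height), rects), where rects holds one
    (x, y, size) tuple per layer. The halving ratio is an assumption.
    """
    rects = [(0, 0, base)]          # layer 0 fills the left part of the atlas
    y, size = 0, base
    for _ in range(1, num_layers):
        size //= 2
        if size == 0:
            raise ValueError("too many layers for this base resolution")
        rects.append((base, y, size))  # smaller layers stack in a right column
        y += size
    atlas_width = base + (base // 2 if num_layers > 1 else 0)
    return (atlas_width, base), rects


def texel_count(rects):
    """Total number of texels actually used by the layers."""
    return sum(size * size for _, _, size in rects)


atlas_size, rects = pack_pyramid(base=1024, num_layers=4)
naive_texels = 4 * 1024 * 1024       # four full-resolution stacked layers
pyramid_texels = texel_count(rects)  # the same four layers as a pyramid
```

Under these assumptions the four pyramid layers use 1,392,640 texels instead of 4,194,304 for naive stacking, and they fit in one 1536x1024 image that a video codec can treat as an ordinary frame.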

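Experimental results

Research questions

RQ1: Can PackUV achieve image-native compressed storage while preserving reconstruction quality for 4D volumetric video?
RQ2: Can PackUV-GS fit long dynamic scenes directly in UV space, with temporal consistency, under large motions and disocclusions?
RQ3: How does PackUV compare with prior 4D Gaussian methods in rendering quality and scalability?
RQ4: Is the PackUV representation compatible with standard video codecs for streaming and deployment?
RQ5: How do atlas packing and UV-space pruning affect performance and memory usage?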

Key findings

Method	PackUV-2B PSNR	PackUV-2B SSIM	PackUV-2B LPIPS	PackUV-2B Train	SelfCap PSNR	SelfCap SSIM	SelfCap LPIPS	SelfCap Train	N3DV PSNR	N3DV SSIM	N3DV LPIPS	N3DV Train
3DGStream	23.17	0.826	0.33	1.00	19.77	0.769	0.36	1.43	31.17	0.952	0.23	0.31
4DGS	23.11	0.808	0.35	2.30	19.56	0.745	0.37	3.18	29.81	0.951	0.21	3.27
RealTime	21.37	0.790	0.38	4.48	19.46	0.747	0.41	8.07	32.29	0.955	0.21	2.48
Deformable	20.07	0.778	0.33	2.04	17.89	0.708	0.38	2.09	26.51	0.935	0.24	0.62
ATGS	21.42	0.796	0.36	1.13	15.48	0.664	0.51	1.82	30.99	0.934	0.24	1.97
Grid4D	21.58	0.790	0.37	1.13	17.53	0.701	0.44	1.82	30.87	0.954	0.197	1.97
Ex4DGS	20.73	0.789	0.39	0.83	17.62	0.680	0.39	1.23	31.57	0.944	0.23	0.59
GIFStream	21.92	0.795	0.39	1.61	19.78	0.745	0.35	2.05	31.10	0.954	0.25	0.42
Ours (PackUV)	27.41	0.842	0.28	1.05	22.52	0.783	0.31	1.12	32.81	0.953	0.21	1.37
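
As a quick reading aid for the table, the snippet below computes PackUV's PSNR margin over the strongest baseline on each dataset. The numbers are copied from the PSNR columns above; the helper name `margin_over_best_baseline` is only for this illustration.

```python
METHODS = ["3DGStream", "4DGS", "RealTime", "Deformable", "ATGS",
           "Grid4D", "Ex4DGS", "GIFStream", "Ours (PackUV)"]

# PSNR values per dataset, in the same row order as the table.
PSNR = {
    "PackUV-2B": [23.17, 23.11, 21.37, 20.07, 21.42, 21.58, 20.73, 21.92, 27.41],
    "SelfCap":   [19.77, 19.56, 19.46, 17.89, 15.48, 17.53, 17.62, 19.78, 22.52],
    "N3DV":      [31.17, 29.81, 32.29, 26.51, 30.99, 30.87, 31.57, 31.10, 32.81],
}


def margin_over_best_baseline(scores, ours="Ours (PackUV)"):
    """Return (best baseline name, PSNR gain of `ours` over it, rounded to 2 dp)."""
    table = dict(zip(METHODS, scores))
    baselines = {m: s for m, s in table.items() if m != ours}
    best = max(baselines, key=baselines.get)
    return best, round(table[ours] - baselines[best], 2)


margins = {name: margin_over_best_baseline(scores) for name, scores in PSNR.items()}
for name, (best, gain) in margins.items():
    print(f"{name}: +{gain} over {best}")
```

The gain is largest on the long-sequence PackUV-2B data and smallest on N3DV, where the baselines are already close to each other.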

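PackUV packs Gaussian attributes into UV atlases and then into a single progressive atlas, which gives image-native storage and enables codec-compatible streaming.
PackUV-GS optimizes Gaussians directly in UV space with flow-guided keyframing and produces temporally consistent results on long sequences (up to 30 minutes).
PackUV achieves the highest PSNR on all three datasets, and the best SSIM and LPIPS on PackUV-2B and SelfCap. On N3DV its SSIM (0.953) and LPIPS (0.21) are close to, but not better than, the best baselines (0.955 and 0.197).
PackUV-2B is presented as the largest real-world, long-duration multi-view 4D dataset for evaluating dynamic volumetric reconstruction, with 100 sequences, 2B (billion) frames, and more than 50 synchronized cameras.
In the video coding experiments, PackUV atlases can be encoded losslessly with FFV1, and the 8-bit channels of the UV data are compatible with standard codecs, which makes streaming practical.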
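
The 8-bit-per-channel point can be illustrated with a simple uniform quantizer. This is a sketch under stated assumptions, not the paper's procedure: PackUV-GS is described as optimizing at low precision rather than quantizing after the fact, and the per-channel range `lo`/`hi` and both function names are hypothetical.

```python
def quantize_u8(values, lo, hi):
    """Map floats in [lo, hi] to integer codes 0..255 (uniform steps, clamped)."""
    scale = 255.0 / (hi - lo)
    return [min(255, max(0, round((v - lo) * scale))) for v in values]


def dequantize_u8(codes, lo, hi):
    """Invert quantize_u8 up to the rounding error of one half step."""
    step = (hi - lo) / 255.0
    return [lo + c * step for c in codes]


channel = [-1.0, -0.25, 0.0, 0.5, 1.0]   # e.g. one attribute channel of a UV map
codes = quantize_u8(channel, lo=-1.0, hi=1.0)
restored = dequantize_u8(codes, lo=-1.0, hi=1.0)
max_error = max(abs(a - b) for a, b in zip(channel, restored))
```

Because a lossless codec such as FFV1 reproduces the 8-bit codes exactly, the only error in this sketch comes from the quantization step itself, which is bounded by half a step, (hi - lo) / 510.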

Figure 3 : PackUV-GS vs. baselines for large motion and disocclusion handling. The proposed keyframing and Gaussian labeling strategy effectively manages complex scenarios, such as new objects or people entering a room and dispersing. Zoom to view better.
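
The keyframing and labeling idea can be sketched as two small decisions driven by optical-flow magnitudes. In the paper the flow comes from a RAFT-based estimator; here it is replaced by plain numbers, and both rules (a per-Gaussian magnitude threshold and an accumulated-motion budget per segment) are assumptions made for illustration, since this review does not state the exact criteria.

```python
def label_dynamic(flow_magnitudes, threshold):
    """Label a Gaussian dynamic (True) if the flow at its projection exceeds
    the threshold; static Gaussians (False) would be frozen during fitting."""
    return [m > threshold for m in flow_magnitudes]


def select_keyframes(mean_flow_per_frame, budget):
    """Start a new keyframe once the motion accumulated since the previous
    keyframe exceeds the budget.

    mean_flow_per_frame[t] is the mean flow magnitude from frame t-1 to t.
    """
    if not mean_flow_per_frame:
        return []
    keyframes = [0]          # the first frame always opens a segment
    accumulated = 0.0
    for t in range(1, len(mean_flow_per_frame)):
        accumulated += mean_flow_per_frame[t]
        if accumulated > budget:
            keyframes.append(t)
            accumulated = 0.0
    return keyframes


labels = label_dynamic([0.1, 2.0, 0.4, 5.3], threshold=1.0)
keyframes = select_keyframes([0.0, 0.5, 0.5, 3.0, 0.2, 0.2, 2.5], budget=2.0)
```

With these toy inputs, a burst of motion (for example a person entering the room) triggers a new keyframe, while quiet stretches stay within a single segment.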

This review was generated by AI and checked by a human editor.