QUICK REVIEW

[Paper Review] PackUV: Packed Gaussian UV Maps for 4D Volumetric Video

Aashish Rai, Angela Xing|arXiv (Cornell University)|Feb 26, 2026

Video Coding and Compression Technologies | Cited by 0

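Summary in Brief

PackUV introduces a UV-map-based 4D Gaussian representation, together with an atlas pyramid for compact, codec-compatible storage and long-duration streaming, and PackUV-GS, a fitting method that preserves temporal consistency. The paper also presents PackUV-2B, which the authors describe as the largest long-horizon multi-view 4D dataset to date.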

ABSTRACT

Volumetric videos offer immersive 4D experiences, but remain difficult to reconstruct, store, and stream at scale. Existing Gaussian Splatting based methods achieve high-quality reconstruction but break down on long sequences, temporal inconsistency, and fail under large motions and disocclusions. Moreover, their outputs are typically incompatible with conventional video coding pipelines, preventing practical applications. We introduce PackUV, a novel 4D Gaussian representation that maps all Gaussian attributes into a sequence of structured, multi-scale UV atlas, enabling compact, image-native storage. To fit this representation from multi-view videos, we propose PackUV-GS, a temporally consistent fitting method that directly optimizes Gaussian parameters in the UV domain. A flow-guided Gaussian labeling and video keyframing module identifies dynamic Gaussians, stabilizes static regions, and preserves temporal coherence even under large motions and disocclusions. The resulting UV atlas format is the first unified volumetric video representation compatible with standard video codecs (e.g., FFV1) without losing quality, enabling efficient streaming within existing multimedia infrastructure. To evaluate long-duration volumetric capture, we present PackUV-2B, the largest multi-view video dataset to date, featuring more than 50 synchronized cameras, substantial motion, and frequent disocclusions across 100 sequences and 2B (billion) frames. Extensive experiments demonstrate that our method surpasses existing baselines in rendering fidelity while scaling to sequences up to 30 minutes with consistent quality.

Motivation and Objectives

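Address the memory and streaming bottlenecks in 4D volumetric video reconstruction.
Create a unified 4D Gaussian representation that is compatible with standard video codecs.
Develop a temporally consistent fitting pipeline that handles large motions and disocclusions.
Provide a large-scale, long-horizon dataset for evaluating volumetric video methods.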

Proposed Method

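PackUV packs 3D Gaussian attributes into a sequence of multi-scale UV atlases, which can be viewed as a progressive atlas pyramid.
The UV layers are downsampled in a geometric progression to form the pyramid, and for each frame all layers are packed into a single atlas for codec-friendly storage.
PackUV-GS optimizes Gaussian parameters directly in UV space with a fixed number of layers, guided by optical-flow-based keyframing and a dynamic/static Gaussian labeling scheme.
A flow-based Gaussian labeling step identifies dynamic Gaussians and freezes static ones to strengthen temporal consistency.
Two UV-space pruning strategies (valid-UV-projection pruning and max-K UV pruning) maintain sparsity and efficiency.
Low-precision optimization directly in the UV representation allows 8-bit-per-channel storage that is compatible with standard codecs (FFV1/HEVC).
Video keyframing splits long sequences into segments, using RAFT-based optical flow to select keyframes and propagate updates.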
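
The packing idea can be illustrated with a minimal sketch. It is not the authors' implementation: the halving ratio between levels, the mipmap-style layout (base layer on the left, smaller layers stacked in a column on the right), and the names `build_pyramid`, `pack_atlas`, and `unpack_atlas` are all assumptions made for illustration. The summary only states that the layers follow a geometric progression and end up in a single atlas per frame.

```python
import numpy as np

def build_pyramid(base_res, num_layers, channels):
    """Allocate empty UV layers whose side length halves at every level.

    The factor of 2 is an assumed ratio, not a value taken from the paper.
    """
    sizes = [base_res >> i for i in range(num_layers)]
    if sizes[-1] < 1:
        raise ValueError("too many layers for this base resolution")
    return [np.zeros((s, s, channels), dtype=np.uint8) for s in sizes]

def pack_atlas(layers):
    """Pack all layers into one image and return it with each layer's rectangle.

    Layer 0 fills the left square; layers 1..n are stacked top to bottom in a
    column to its right. With halving sizes the column never exceeds the
    height of layer 0.
    """
    base = layers[0].shape[0]
    side = layers[1].shape[1] if len(layers) > 1 else 0
    atlas = np.zeros((base, base + side, layers[0].shape[2]), dtype=np.uint8)
    atlas[:base, :base] = layers[0]
    rects = [(0, 0, base, base)]  # (row, col, height, width) per layer
    y = 0
    for layer in layers[1:]:
        h, w = layer.shape[:2]
        atlas[y:y + h, base:base + w] = layer
        rects.append((y, base, h, w))
        y += h
    return atlas, rects

def unpack_atlas(atlas, rects):
    """Recover the individual layers from a packed atlas."""
    return [atlas[r:r + h, c:c + w].copy() for r, c, h, w in rects]
```

With a base resolution of 8 and four layers (sides 8, 4, 2, 1), the atlas is 8 by 12 pixels: only half the area of the base layer is added to hold every coarser level, and the result is a single image per frame that an ordinary video encoder can consume.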

Figure 2 : (Top) Three UV-map organization strategies: (a) naïvely stacking UV layers (deep layers become more and more sparse); (b) a geometric-progression UV pyramid (more uniform sparsity with less storage); (c) PackUV, which packs all pyramid layers into a single UV atlas for efficient, codec-fr

Experimental Results

Research Questions

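RQ1: Can PackUV store 4D volumetric video compactly in an image-native format while preserving reconstruction quality?
RQ2: Can PackUV-GS fit long dynamic scenes directly in UV space and stay temporally consistent under large motions and disocclusions?
RQ3: How does PackUV compare with existing 4D Gaussian methods in rendering quality and scalability?
RQ4: Is the PackUV representation compatible with standard video codecs for streaming and deployment?
RQ5: How do atlas packing and UV-space pruning affect performance and memory use?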

Key Findings

Method	PackUV-2B PSNR	PackUV-2B SSIM	PackUV-2B LPIPS	PackUV-2B Train	SelfCap PSNR	SelfCap SSIM	SelfCap LPIPS	SelfCap Train	N3DV PSNR	N3DV SSIM	N3DV LPIPS	N3DV Train
3DGStream	23.17	0.826	0.33	1.00	19.77	0.769	0.36	1.43	31.17	0.952	0.23	0.31
4DGS	23.11	0.808	0.35	2.30	19.56	0.745	0.37	3.18	29.81	0.951	0.21	3.27
RealTime	21.37	0.790	0.38	4.48	19.46	0.747	0.41	8.07	32.29	0.955	0.21	2.48
Deformable	20.07	0.778	0.33	2.04	17.89	0.708	0.38	2.09	26.51	0.935	0.24	0.62
ATGS	21.42	0.796	0.36	1.13	15.48	0.664	0.51	1.82	30.99	0.934	0.24	1.97
Grid4D	21.58	0.790	0.37	1.13	17.53	0.701	0.44	1.82	30.87	0.954	0.197	1.97
Ex4DGS	20.73	0.789	0.39	0.83	17.62	0.680	0.39	1.23	31.57	0.944	0.23	0.59
GIFStream	21.92	0.795	0.39	1.61	19.78	0.745	0.35	2.05	31.10	0.954	0.25	0.42
Ours (PackUV)	27.41	0.842	0.28	1.05	22.52	0.783	0.31	1.12	32.81	0.953	0.21	1.37
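
To make the size of the PSNR gaps concrete, the short sketch below recomputes, for each dataset, the margin between PackUV and the strongest baseline. The numbers are copied from the PSNR columns of the table above; the helper name `psnr_margins` is introduced here for illustration only.

```python
# PSNR values (dB) copied from the table: (PackUV-2B, SelfCap, N3DV).
DATASETS = ("PackUV-2B", "SelfCap", "N3DV")
BASELINE_PSNR = {
    "3DGStream": (23.17, 19.77, 31.17),
    "4DGS": (23.11, 19.56, 29.81),
    "RealTime": (21.37, 19.46, 32.29),
    "Deformable": (20.07, 17.89, 26.51),
    "ATGS": (21.42, 15.48, 30.99),
    "Grid4D": (21.58, 17.53, 30.87),
    "Ex4DGS": (20.73, 17.62, 31.57),
    "GIFStream": (21.92, 19.78, 31.10),
}
PACKUV_PSNR = (27.41, 22.52, 32.81)

def psnr_margins(baselines, ours):
    """Return {dataset: (best baseline, PackUV PSNR minus that baseline's PSNR)}."""
    margins = {}
    for i, dataset in enumerate(DATASETS):
        best = max(baselines, key=lambda method: baselines[method][i])
        margins[dataset] = (best, round(ours[i] - baselines[best][i], 2))
    return margins

for dataset, (best, gain) in psnr_margins(BASELINE_PSNR, PACKUV_PSNR).items():
    print(f"{dataset}: +{gain:.2f} dB over {best}")
```

The margin is 4.24 dB over 3DGStream on PackUV-2B, 2.74 dB over GIFStream on SelfCap, and 0.52 dB over RealTime on N3DV, so the advantage is largest on the long-sequence datasets and small on the short N3DV clips.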

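By packing Gaussian attributes into UV atlases and then into a single progressive atlas, PackUV enables image-native storage and codec-friendly streaming.
PackUV-GS optimizes Gaussians directly in UV space and, combined with flow-guided keyframing and labeling, produces temporally consistent results on long sequences (up to 30 minutes).
PackUV reaches the highest PSNR on all three datasets and the best SSIM and LPIPS on PackUV-2B and SelfCap. On N3DV its SSIM (0.953) and LPIPS (0.21) are on par with the best baselines (0.955 for RealTime, 0.197 for Grid4D) rather than ahead of them.
PackUV-2B is presented as the largest real-world, long-horizon multi-view 4D dataset to date (100 sequences, 2B frames, 50+ cameras) for evaluating dynamic volumetric reconstruction.
Video coding experiments show that PackUV atlases can be encoded losslessly with FFV1 and that the 8-bit-per-channel UV data is compatible with standard codecs, which makes practical streaming feasible.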
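
The following sketch shows why 8-bit storage pairs well with a lossless codec such as FFV1: once attributes live in `uint8`, encoding and decoding add no further error, so the only loss is the quantization step itself. Note that this is plain post-hoc linear quantization, swapped in for illustration; the paper instead optimizes at low precision directly in UV space. The value range `lo`/`hi` and the function names are assumptions.

```python
import numpy as np

def quantize_u8(values, lo, hi):
    """Linearly map floats in [lo, hi] to 8-bit codes (out-of-range values are clipped)."""
    scaled = (np.clip(values, lo, hi) - lo) / (hi - lo)
    return np.round(scaled * 255.0).astype(np.uint8)

def dequantize_u8(codes, lo, hi):
    """Map 8-bit codes back to floats in [lo, hi]."""
    return codes.astype(np.float32) / 255.0 * (hi - lo) + lo

# Example: one attribute channel with an assumed range of [-1, 1].
rng = np.random.default_rng(0)
attr = rng.uniform(-1.0, 1.0, size=(64, 64))
codes = quantize_u8(attr, -1.0, 1.0)
restored = dequantize_u8(codes, -1.0, 1.0)

# For in-range values the error is bounded by half a quantization step.
half_step = (1.0 - (-1.0)) / 255.0 / 2.0
max_error = float(np.max(np.abs(restored - attr)))
```

In this setup the worst-case error is half a step, (hi - lo) / 510, and quantizing the restored values again reproduces the same codes, which is the property a lossless video codec preserves frame after frame.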

Figure 3 : PackUV-GS vs. baselines for large motion and disocclusion handling. The proposed keyframing and Gaussian labeling strategy effectively manages complex scenarios, such as new objects or people entering a room and dispersing. Zoom to view better.


This review was generated by AI and checked by a human editor.