QUICK REVIEW

[Paper Review] Uni-LVC: A Unified Method for Intra- and Inter-Mode Learned Video Compression

Yichi Zhang, Ruoyu Yang|arXiv (Cornell University)|Mar 5, 2026

Advanced Data Compression Techniques | Cited by 0

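One-Sentence Summary

Uni-LVC is a single learned video compression model that handles intra, low-delay inter, and random-access inter coding by conditioning on temporal cues through a cross-attention module, gating those cues by reference reliability, and training with a multistage strategy.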

ABSTRACT

Recent advances in learned video compression (LVC) have led to significant performance gains, with codecs such as DCVC-RT surpassing the H.266/VVC low-delay mode in compression efficiency. However, existing LVCs still exhibit key limitations: they often require separate models for intra and inter coding modes, and their performance degrades when temporal references are unreliable. To address this, we introduce Uni-LVC, a unified LVC method that supports both intra and inter coding with low-delay and random-access in a single model. Building on a strong intra-codec, Uni-LVC formulates inter-coding as intra-coding conditioned on temporal information extracted from reference frames. We design an efficient cross-attention adaptation module that integrates temporal cues, enabling seamless support for both unidirectional (low-delay) and bidirectional (random-access) prediction modes. A reliability-aware classifier is proposed to selectively scale the temporal cues, making Uni-LVC behave closer to intra coding when references are unreliable. We further propose a multistage training strategy to facilitate adaptive learning across various coding modes. Extensive experiments demonstrate that Uni-LVC achieves superior rate-distortion performance in intra and inter configurations while maintaining comparable computational efficiency.

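Motivation and Objectives

Remove the need for separate models by supporting both intra and inter coding modes in a single model, without retraining.
Build a strong intra-codec backbone as the foundation for unified coding.
Design an efficient temporal conditioning mechanism and reliability-aware gating to cope with unreliable reference frames.
Introduce a multistage training strategy for adaptive learning across intra, low-delay (LD), and random-access (RA) modes.
Demonstrate competitive rate-distortion performance across coding configurations while maintaining efficiency.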

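Proposed Method

Formulate inter coding as intra coding conditioned on temporal features extracted from reference frames.
Build a strong intra-codec backbone on DCVC-RT, with enhancements to entropy modeling and quantization.
Introduce a lightweight cross-attention adaptation module that fuses temporal cues through deformable neighborhood cross-attention and polarity-aware linear cross-attention.
Introduce a reliability-aware classifier that dynamically gates (scales) the temporal features according to reference reliability, keeping the model robust when references are unreliable.
Adopt a multistage, curriculum-based training strategy with knowledge replay so that one model learns the all-intra (AI), low-delay (LD), and random-access (RA) modes.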
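The conditioning and gating ideas above can be illustrated with a minimal sketch. It is not the authors' implementation: plain scaled dot-product attention stands in for the deformable neighborhood and polarity-aware linear cross-attention, and the `gate` argument is a hypothetical stand-in for the output of the reliability-aware classifier. The function names and the toy feature tokens are assumptions made for illustration only.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    # Plain scaled dot-product cross-attention over lists of token vectors.
    # (Assumption: a simple substitute for the paper's attention variants.)
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out

def condition_on_references(current, references, gate):
    # current:    feature tokens of the frame being coded.
    # references: one token list per reference frame
    #             ([] = intra, past only = low-delay, past + future = random-access).
    # gate:       reliability score in [0, 1] (hypothetical classifier output).
    if not references or gate == 0.0:
        return [list(tok) for tok in current]  # falls back to the intra path
    ref_tokens = [tok for ref in references for tok in ref]
    cues = cross_attention(current, ref_tokens, ref_tokens)
    # Residual update: intra features plus scaled temporal cues.
    return [[x + gate * c for x, c in zip(tok, cue)]
            for tok, cue in zip(current, cues)]

# Toy usage with made-up 2-dimensional feature tokens.
cur = [[1.0, 0.0], [0.0, 1.0]]
past = [[0.5, 0.5], [1.0, 0.0]]
future = [[0.0, 1.0]]
intra = condition_on_references(cur, [], gate=1.0)
low_delay = condition_on_references(cur, [past], gate=0.8)
random_access = condition_on_references(cur, [past, future], gate=0.8)
unreliable = condition_on_references(cur, [past], gate=0.0)
```

With no references, or with a gate of zero, the features pass through unchanged, which mirrors the described behavior of moving closer to intra coding when references are unreliable. The same function serves unidirectional and bidirectional prediction simply by changing which reference frames are supplied.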

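Experimental Results

Research Questions

RQ1: Can a single model effectively support intra, low-delay inter, and random-access inter video coding without separate models?
RQ2: How can temporal cues be integrated robustly so that inter coding is expressed as conditioning on top of an intra architecture?
RQ3: Which training strategies allow a unified model to perform well across diverse coding modes?
RQ4: How can unreliable temporal references be handled without degrading intra performance?
RQ5: Which mechanisms ensure practical rate-distortion performance and efficiency across modes?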

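Key Findings

Uni-LVC achieves better rate-distortion performance than existing learned codecs in both intra and inter configurations.
The cross-attention adaptation module effectively integrates temporal cues into the intra backbone to support inter coding.
The reliability-aware classifier dynamically gates temporal features, preserving robustness when references are unreliable.
The multistage training curriculum enables unified learning of the intra, LD, and RA modes with stable optimization.
The method extends mode support while keeping computational efficiency comparable to existing learned codecs.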

This review was generated by AI and reviewed by a human editor.