QUICK REVIEW

[Paper Digest] UniE2F: A Unified Diffusion Framework for Event-to-Frame Reconstruction with Video Foundation Models

Gang Xu, Zhiyu Zhu|arXiv (Cornell University)|Feb 22, 2026

Advanced Memory and Neural Computing | Cited by 0

One-Sentence Summary

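UniE2F fine-tunes a pre-trained video diffusion model on event representations, adds event-based inter-frame residual guidance, and extends to zero-shot video frame interpolation and prediction, all driven by sparse event data.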

ABSTRACT

Event cameras excel at high-speed, low-power, and high-dynamic-range scene perception. However, as they fundamentally record only relative intensity changes rather than absolute intensity, the resulting data streams suffer from a significant loss of spatial information and static texture details. In this paper, we address this limitation by leveraging the generative prior of a pre-trained video diffusion model to reconstruct high-fidelity video frames from sparse event data. Specifically, we first establish a baseline model by directly applying event data as a condition to synthesize videos. Then, based on the physical correlation between the event stream and video frames, we further introduce the event-based inter-frame residual guidance to enhance the accuracy of video frame reconstruction. Furthermore, we extend our method to video frame interpolation and prediction in a zero-shot manner by modulating the reverse diffusion sampling process, thereby creating a unified event-to-frame reconstruction framework. Experimental results on real-world and synthetic datasets demonstrate that our method significantly outperforms previous approaches both quantitatively and qualitatively. We also refer the reviewers to the video demo contained in the supplementary material for video results. The code will be publicly available at https://github.com/CS-GangXu/UniE2F.
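To make the point about relative intensity changes concrete, here is a minimal sketch of the commonly used idealized event model, in which each event signals that the log intensity at a pixel changed by a fixed contrast threshold. The function name `accumulate_events`, the `(x, y, t, polarity)` tuple layout, and the threshold value 0.2 are illustrative assumptions, not the event representation used in the paper.

```python
import numpy as np

def accumulate_events(events, height, width, contrast_threshold=0.2):
    """Sum event polarities per pixel, scaled by the contrast threshold.

    events: iterable of (x, y, t, polarity) with polarity in {+1, -1}.
    Returns an approximate per-pixel log-intensity change over the window.
    The threshold value is an assumption for illustration only.
    """
    residual = np.zeros((height, width), dtype=np.float64)
    for x, y, _t, polarity in events:
        residual[y, x] += polarity * contrast_threshold
    return residual

# Three toy events: two positive at pixel (x=0, y=0), one negative at (x=2, y=1).
events = [(0, 0, 0.001, +1), (0, 0, 0.002, +1), (2, 1, 0.003, -1)]
delta_log_intensity = accumulate_events(events, height=2, width=3)
print(delta_log_intensity)
```

Pixels that receive no events stay at zero, which is why static texture is absent from the stream: the accumulated map describes how the scene changed, not what it looked like.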

Motivation and Goals

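Bridge sparse event data and rich video texture with a pre-trained video diffusion model.
Introduce event-based inter-frame residual guidance to improve frame fidelity.
Achieve zero-shot video frame interpolation and prediction by modulating the diffusion score during reverse diffusion sampling.
Provide a unified event-to-frame reconstruction framework covering reconstruction, interpolation, and prediction.
Demonstrate strong quantitative and qualitative performance on real-world and synthetic datasets.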

Proposed Method

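Fine-tune a pre-trained video diffusion model (SVD) on encoded event representations.
Introduce event-based inter-frame residual guidance to constrain inter-frame differences during reverse diffusion.
Propose a residual loss L_residual that updates the denoised latent by gradient descent while keeping the update on the data manifold.
Prove theoretically that residual guidance operates within the tangent space of the diffusion model's data manifold and tightens the reconstruction error bound.
Extend the method to zero-shot video frame interpolation and prediction by modulating the diffusion score with the preceding or following frame during reverse sampling.
Give an algorithm that integrates the deviations from the preceding and following frames into reverse diffusion sampling to guide reconstruction of the intermediate frames.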
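The residual guidance idea can be sketched as gradient descent on a loss that compares the differences between consecutive estimated frames with target residuals. The sketch below is a simplified stand-in, not the paper's implementation: it works on plain NumPy arrays rather than diffusion latents, the target residuals are synthetic instead of being predicted from events, and the names `residual_loss`, `residual_loss_grad`, and `guide`, along with the step size and step count, are hypothetical.

```python
import numpy as np

def residual_loss(frames, target_residuals):
    """Sum of squared errors between frame differences and target residuals."""
    diffs = frames[1:] - frames[:-1]
    return float(np.sum((diffs - target_residuals) ** 2))

def residual_loss_grad(frames, target_residuals):
    """Analytic gradient of residual_loss with respect to frames."""
    err = (frames[1:] - frames[:-1]) - target_residuals
    grad = np.zeros_like(frames)
    grad[1:] += 2.0 * err   # each difference depends on the later frame with +1
    grad[:-1] -= 2.0 * err  # and on the earlier frame with -1
    return grad

def guide(frames, target_residuals, step_size=0.1, num_steps=50):
    """Plain gradient descent on the residual loss (assumed schedule)."""
    frames = frames.copy()
    for _ in range(num_steps):
        frames -= step_size * residual_loss_grad(frames, target_residuals)
    return frames

rng = np.random.default_rng(0)
true_frames = rng.random((4, 8, 8))                    # toy clip: 4 frames of 8x8
target_residuals = true_frames[1:] - true_frames[:-1]  # stand-in for event-predicted residuals
estimate = true_frames + 0.1 * rng.standard_normal(true_frames.shape)  # stand-in for a denoised estimate

loss_before = residual_loss(estimate, target_residuals)
guided = guide(estimate, target_residuals)
loss_after = residual_loss(guided, target_residuals)
print(loss_before, loss_after)
```

In this toy setting the gradient sums to zero across frames, so the updates change only how frames differ from one another and leave their per-pixel temporal mean untouched. That matches the division of labor described above: the residual term constrains inter-frame change, while absolute appearance has to come from the generative prior.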

Figure 1 : Illustration of the forward and backward diffusion processes for our UniE2F under the conditional event data. The right and left parts indicate the inputs and results of our algorithm, while in the central plot, the solid and dashed lines with the same color represent the reverse-time sam

Experimental Results

Research Questions

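RQ1: How can sparse event data effectively guide a pre-trained video diffusion model to reconstruct high-fidelity frames?
RQ2: Does event-based inter-frame residual guidance improve reconstruction accuracy while keeping results on the diffusion model's data manifold?
RQ3: Can the framework be extended to zero-shot video frame interpolation and prediction using available reference frames?
RQ4: What theoretical basis supports the stability and quality of residual-guided diffusion in this setting?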

Key Findings

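UniE2F achieves state-of-the-art reconstruction quality on real-world and synthetic datasets: MSE 0.0612, SSIM 0.4990, and LPIPS 0.6740 on real-world data, and MSE 0.0167, SSIM 0.7100, and LPIPS 0.3940 on synthetic data.
The framework reconstructs color-rich video from near-grayscale event input by exploiting the pre-trained video diffusion prior.
Event-based inter-frame residual guidance improves reconstruction accuracy by aligning the inter-frame changes generated by the model with those predicted from event data.
The zero-shot extension performs 4x and 11x video frame interpolation (VFI) as well as video frame prediction (VFP) without task-specific training.
On a single RTX A6000, reconstructing 12 RGB frames at 448x320 resolution takes about 48 seconds.
Qualitative results show more natural colors and fewer artifacts than existing methods; some color-tone discrepancies remain because event data carry inherently limited color information.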
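The reported latency can be converted into per-frame figures with simple arithmetic. The snippet below uses only the numbers quoted above (48 seconds, 12 frames, 448x320); the variable names are illustrative, and the result is an average that says nothing about how the time is split across sampling steps.

```python
total_seconds = 48.0      # reported wall-clock time for one reconstruction
num_frames = 12           # RGB frames produced per run
width, height = 448, 320  # reported output resolution

seconds_per_frame = total_seconds / num_frames
frames_per_second = num_frames / total_seconds
pixels_per_frame = width * height

print(seconds_per_frame, frames_per_second, pixels_per_frame)
```

On average this is about 4 seconds per frame, or 0.25 frames per second, which places the method in offline rather than real-time reconstruction.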

Figure 2 : The schematic of the proposed framework, which integrates event-based inter-frame residual guidance during the inference stage. At step $t$ ( $t\leq\tau$ ), given event representations, we utilize a ResNet to predict the inter-frame residuals between consecutive frames. Then, these residu


This digest was generated by AI and reviewed by human editors.