QUICK REVIEW

[Paper Review] Generative Semantic Communication: Diffusion Models Beyond Bit Recovery

Eleonora Grassucci, Sergio Barbarossa|arXiv (Cornell University)|Jun 7, 2023

Generative Adversarial Networks and Image Synthesis | Cited by 20

One-Sentence Summary

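The paper proposes a diffusion-model-based semantic communication framework that transmits compressed semantic maps and, using a fast denoising block and training on noisy maps, synthesizes semantically consistent images under varying channel conditions, outperforming existing methods.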

ABSTRACT

Semantic communication is expected to be one of the cores of next-generation AI-based communications. One of the possibilities offered by semantic communication is the capability to regenerate, at the destination side, images or videos semantically equivalent to the transmitted ones, without necessarily recovering the transmitted sequence of bits. The current solutions still lack the ability to build complex scenes from the received partial information. Clearly, there is an unmet need to balance the effectiveness of generation methods and the complexity of the transmitted information, possibly taking into account the goal of communication. In this paper, we aim to bridge this gap by proposing a novel generative diffusion-guided framework for semantic communication that leverages the strong abilities of diffusion models in synthesizing multimedia content while preserving semantic features. We reduce bandwidth usage by sending highly-compressed semantic information only. Then, the diffusion model learns to synthesize semantic-consistent scenes through spatially-adaptive normalizations from such denoised semantic information. We prove, through an in-depth assessment of multiple scenarios, that our method outperforms existing solutions in generating high-quality images with preserved semantic information even in cases where the received content is significantly degraded. More specifically, our results show that objects, locations, and depths are still recognizable even in the presence of extremely noisy conditions of the communication channel. The code is available at https://github.com/ispamm/GESCO.

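Research Motivation and Objectives

Advance a semantic communication paradigm that goes beyond bit recovery by focusing on preserving semantic content.
Propose a diffusion-guided framework that generates photorealistic images conditioned on the transmitted semantic layout.
Reduce bandwidth by transmitting only the informative one-hot encoded semantic maps and denoising them at the receiver, enabling robust image synthesis.
Train the diffusion model on noisy semantic maps to cope with adverse channel conditions.
Demonstrate robustness and semantic fidelity across multiple datasets and channel scenarios.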

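Proposed Method

Transmit compressed, one-hot encoded semantic maps instead of full images to save bandwidth.
Use a semantic diffusion model conditioned on the transmitted map to synthesize semantically consistent images.
Introduce a Fast Denoising Semantic (FDS) block at inference time to clean the received noisy maps.
Train the diffusion model on noisy maps so that it learns channel-robust generation.
Apply classifier-free guidance to improve conditional generation quality.
Optimize with a loss that combines a denoising term (L_d) and a KL-divergence term (L_KL).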
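
The transmit-and-clean steps listed above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names are hypothetical, the noise level is derived from a target PSNR under the assumption that the peak value of a one-hot map is 1.0, and a plain per-pixel argmax stands in for the FDS block, whose actual design is not described in this review. The last function shows the standard classifier-free guidance combination of conditional and unconditional noise predictions; the guidance-scale convention used in the paper may differ.

```python
import numpy as np


def one_hot(label_map: np.ndarray, num_classes: int) -> np.ndarray:
    """Turn an (H, W) integer label map into an (H, W, C) one-hot array."""
    return np.eye(num_classes, dtype=np.float32)[label_map]


def awgn_at_psnr(signal: np.ndarray, psnr_db: float,
                 rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise whose variance matches a target PSNR.

    Assumption: PSNR is measured against a peak value of 1.0, so
    PSNR = 10 * log10(1 / MSE)  =>  MSE = 10 ** (-PSNR / 10).
    """
    noise_var = 10.0 ** (-psnr_db / 10.0)
    noise = rng.normal(0.0, np.sqrt(noise_var), size=signal.shape)
    return signal + noise.astype(np.float32)


def argmax_cleanup(noisy_one_hot: np.ndarray) -> np.ndarray:
    """Naive stand-in for a denoising block: keep the strongest class per pixel."""
    return noisy_one_hot.argmax(axis=-1)


def cfg_combine(eps_cond: np.ndarray, eps_uncond: np.ndarray,
                scale: float) -> np.ndarray:
    """Classifier-free guidance: move the unconditional prediction toward
    the conditional one. scale=0 is unconditional, scale=1 is conditional."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

Calling `awgn_at_psnr(one_hot(labels, C), psnr_db, rng)` followed by `argmax_cleanup` gives a rough feel for how much of a label map survives a given channel PSNR before any learned denoising or generation is applied.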

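Experimental Results

Research Questions

RQ1: Can a diffusion-based generative model reconstruct semantically faithful images from severely degraded semantic maps?
RQ2: Under AWGN conditions, how does transmitting semantic maps (rather than raw images) affect bandwidth and semantic fidelity?
RQ3: Does training on noisy maps and applying fast denoising improve visual and semantic quality across datasets and PSNR ranges?
RQ4: How does the proposed method compare with SPADE, CC-FPSE, SMIS, OASIS, and SDM in semantic preservation, perceptual quality, and generation realism?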

Key Findings

Method	mIoU ↑	PSNR 100	PSNR 30	PSNR 20	PSNR 15	PSNR 10	PSNR 5
Full image	-	0.955 ± .032	0.911 ± .155	0.906 ± .247	0.906 ± .339	0.240 ± .193	0.110 ± .298
SPADE [park2019SPADE]	0.909 ± .127	0.914 ± .255	0.921 ± .315	0.812 ± .364	0.672 ± .321	0.253 ± .288	0.313 ± .144
CC-FPSE [liu2019learning]	0.908 ± .045	0.908 ± .121	0.911 ± .315	0.928 ± .345	0.852 ± .245	0.653 ± .183	0.322 ± .284
SMIS [Zhu2020SemanticallyMI]	0.909 ± .064	0.919 ± .066	0.909 ± .214	0.931 ± .208	0.901 ± .244	0.899 ± .290	0.876 ± .211
OASIS [schonfeld2021you]	0.910 ± .111	0.908 ± .191	0.912 ± .232	0.697 ± .165	0.662 ± .356	0.345 ± .112	0.232 ± .191
SDM [Wang2022SemanticIS]	0.921 ± .051	0.340 ± .022	0.333 ± .061	0.351 ± .011	0.297 ± .021	0.256 ± .019	0.211 ± .043
Our method	0.940 ± .014	0.942 ± .212	0.944 ± .297	0.945 ± .141	0.905 ± .112	0.913 ± .214	0.925 ± .111
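
For readers unfamiliar with the metric in the table above, the sketch below shows the common definition of mean intersection-over-union (mIoU) between a predicted and a reference label map. It is an illustration only: the function name is hypothetical, and this review does not state how the paper obtains predicted labels from generated images or whether scores are aggregated per image or per dataset.

```python
import numpy as np


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean IoU over the classes that appear in either label map."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps, so it is skipped
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    # NaN signals that no class was present at all
    return float(np.mean(ious)) if ious else float("nan")
```

A value of 1.0 means the two maps agree on every pixel; lower values mean that class regions overlap less.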

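The proposed method achieves higher semantic fidelity (mIoU) than the compared generative baselines at every PSNR level, including very low PSNR.
LPIPS scores indicate that the proposed method has better perceptual similarity across channel conditions.
FID scores show that the proposed method has lower (better) generation error and greater robustness across PSNR levels.
On Cityscapes, the method keeps object meaning and depth discernible under degraded channels.
On COCO-Stuff, it yields meaningful samples and competitive semantic metrics even at PSNR = 10.
The method reduces the transmitted bit budget on Cityscapes by roughly 92% while maintaining semantic quality.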

This review was generated by AI and checked by a human editor.