QUICK REVIEW

[Paper Summary] DisSR: Disentangling Speech Representation for Degradation-Prior Guided Cross-Domain Speech Restoration

Ziqi Liang, Zhijun Jia | arXiv (Cornell University) | Feb 13, 2026

Speech and Audio Processing | Cited by 0

ONE-SENTENCE SUMMARY

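DisSR introduces a disentangled speech representation framework that combines degradation-prior guidance with cross-domain adaptation to perform general-purpose, diffusion-based speech restoration across multiple distortion types.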

ABSTRACT

Previous speech restoration (SR) primarily focuses on single-task speech restoration (SSR), which cannot address general speech restoration problems. Training specific SSR models for different distortions is time-consuming and lacks generality. In addition, most studies ignore the problem of model generalization across unseen domains. To overcome those limitations, we propose DisSR, a Disentangling Speech Representation based general speech restoration model with two properties: 1) Degradation-prior guidance, which extracts speaker-invariant degradation representation to guide the diffusion-based speech restoration model. 2) Domain adaptation, where we design cross-domain alignment training to enhance the model's adaptability and generalization on cross-domain data, respectively. Experimental results demonstrate that our method can produce high-quality restored speech under various distortion conditions. Audio samples can be found at https://itspsp.github.io/DisSR.
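The summary does not describe DisSR's network or which diffusion formulation it uses. As a rough illustration of what "guiding a diffusion-based restoration model with a degradation representation" can mean, the sketch below shows one standard DDPM-style epsilon-prediction training step in which a degradation embedding is passed to the denoiser as an extra conditioning input. Everything here is an assumption: the linear beta schedule, the function names (`linear_beta_schedule`, `alpha_bars`, `q_sample`, `diffusion_loss`), and `toy_denoiser`, a hypothetical zero-output placeholder for a trained network.

```python
import math
import random

# Assumed DDPM-style setup; DisSR's actual diffusion formulation is not given in this summary.

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    # Linearly spaced noise variances beta_1..beta_T (a common DDPM choice, assumed here).
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    # Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s).
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, noise):
    # Forward diffusion: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]

def toy_denoiser(x_t, t, degraded, degradation_emb):
    # HYPOTHETICAL placeholder for the restoration network. A real model would be a
    # neural net conditioned on the degraded speech and the degradation embedding;
    # this stub ignores its inputs and predicts zero noise so the example runs.
    return [0.0 for _ in x_t]

def diffusion_loss(clean, degraded, degradation_emb, abar, rng, denoiser=toy_denoiser):
    # One epsilon-prediction training step: sample a timestep and noise, corrupt the
    # clean signal, and score the denoiser's noise estimate with mean squared error.
    t = rng.randrange(len(abar))
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    x_t = q_sample(clean, t, abar, noise)
    eps_hat = denoiser(x_t, t, degraded, degradation_emb)
    return sum((e - eh) ** 2 for e, eh in zip(noise, eps_hat)) / len(clean)
```

In a real system the degradation embedding would come from a separate encoder applied to the degraded input, and at inference the reverse process would start from noise while receiving the same conditioning at every step.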

MOTIVATION AND GOALS

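Address the lack of generality of single-task speech restoration (SSR) models, each of which must be trained separately for one specific distortion.
Generalize to unseen domains, a problem most prior studies ignore.
Use degradation-prior guidance to drive diffusion-based restoration.
Improve adaptability and generalization on cross-domain data through domain-adaptation training.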

PROPOSED METHOD

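Degradation-prior guidance: extract a speaker-invariant degradation representation and use it to guide the diffusion-based restoration model.
Domain adaptation: cross-domain alignment training on data from different domains improves the model's adaptability and generalization to cross-domain data.
Disentangled representation: the speech representation is disentangled so that degradation information is separated from speech content.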
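Neither the abstract nor this summary specifies the cross-domain alignment objective. One common way to align feature distributions across domains is to penalize the Maximum Mean Discrepancy (MMD) between encoder features from a source-domain batch and a target-domain batch. The sketch below uses MMD with a Gaussian kernel purely as a stand-in, not as DisSR's actual loss; the function names and the kernel bandwidth `sigma` are illustrative assumptions.

```python
import math
from typing import Sequence

Vector = Sequence[float]

# Illustrative stand-in for a cross-domain alignment loss; not taken from the DisSR paper.

def gaussian_kernel(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    # k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def _mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float) -> float:
    # Average kernel value over all pairs (x, y) with x in xs and y in ys.
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd2(source: Sequence[Vector], target: Sequence[Vector], sigma: float = 1.0) -> float:
    # Biased (V-statistic) estimate of squared MMD between source-domain and
    # target-domain feature sets; it is 0 when the two sets are identical and
    # grows as the feature distributions drift apart.
    return (_mean_kernel(source, source, sigma)
            + _mean_kernel(target, target, sigma)
            - 2.0 * _mean_kernel(source, target, sigma))
```

In training, such a term would typically be added to the restoration objective, e.g. `total_loss = restoration_loss + lam * mmd2(src_feats, tgt_feats)`, where `lam` is a hypothetical weighting hyperparameter.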

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

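RQ1: Does degradation-prior guidance improve the robustness of diffusion-based speech restoration under diverse distortions?
RQ2: Does cross-domain alignment training improve generalization to unseen domains?
RQ3: Can the disentangled representation effectively separate degradation from speech content for restoration?
RQ4: How does DisSR compare with baselines across different distortion conditions?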

KEY FINDINGS

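According to the authors, the method produces high-quality restored speech under various distortion conditions.
With degradation-prior guidance and cross-domain alignment training, DisSR is reported to generalize better to cross-domain data.
The framework handles both degradation-representation extraction and domain adaptation within a single model.
The authors present their experimental results as supporting the effectiveness of the disentangled representation for cross-domain speech restoration.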

This summary was generated by AI and reviewed by a human editor.