QUICK REVIEW

[Paper Review] SEP-YOLO: Fourier-Domain Feature Representation for Transparent Object Instance Segmentation

Fan Zhang, Tao Yan|arXiv (Cornell University)|Mar 3, 2026

Advanced Neural Network Applications|Cited by 0

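One-Sentence Summary

SEP-YOLO introduces a dual-domain framework that combines a frequency-domain detail enhancement module with a multi-scale spatial refinement stream to sharpen transparent object instance segmentation, achieving state-of-the-art performance at real-time inference speed on Trans10K and GVD.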

ABSTRACT

Transparent object instance segmentation presents significant challenges in computer vision, due to the inherent properties of transparent objects, including boundary blur, low contrast, and high dependence on background context. Existing methods often fail as they depend on strong appearance cues and clear boundaries. To address these limitations, we propose SEP-YOLO, a novel framework that integrates a dual-domain collaborative mechanism for transparent object instance segmentation. Our method incorporates a Frequency Domain Detail Enhancement Module, which separates and enhances weak high-frequency boundary components via learnable complex weights. We further design a multi-scale spatial refinement stream, which consists of a Content-Aware Alignment Neck and a Multi-scale Gated Refinement Block, to ensure precise feature alignment and boundary localization in deep semantic features. We also provide high-quality instance-level annotations for the Trans10K dataset, filling the critical data gap in transparent object instance segmentation. Extensive experiments on the Trans10K and GVD datasets show that SEP-YOLO achieves state-of-the-art (SOTA) performance.

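Research Motivation and Goals

Address the blurred boundaries and low contrast that make transparent object instance segmentation difficult.
Use frequency-domain processing to enhance boundary-related high-frequency signals.
Develop cross-scale feature fusion and alignment that preserve boundary detail in deep features.
Provide high-quality instance-level annotations for Trans10K to support the task.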

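Proposed Method

A Frequency Domain Detail Enhancement Module (FDDEM) applies learnable complex weights in FFT space to strengthen boundary-related high-frequency components.
A Multi-scale Gated Refinement Block (MS-GRB), built on multi-scale gating units and MSDWConv, performs cross-scale refinement and noise suppression.
A Content-Aware Alignment Neck (CA2-Neck) downsamples with linear deformable convolution and upsamples adaptively with DySample to preserve boundary detail.
A dual-attention mechanism fuses the frequency-enhanced features with the spatial features using adaptive weighting.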
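
The frequency-domain idea behind FDDEM can be illustrated with a minimal NumPy sketch. This is not the authors' module: the function name `enhance_high_freq`, the radial `cutoff`, and the hard low/high split are assumptions made for illustration, and the `weight` array is a fixed stand-in for weights that would be learned during training.

```python
import numpy as np

def enhance_high_freq(feat, weight, cutoff=0.25):
    """Reweight the high-frequency part of a (C, H, W) feature map.

    feat   : real array of shape (C, H, W)
    weight : complex array of shape (C, H, W), a stand-in for learnable weights
    cutoff : assumed radial threshold in cycles/sample (hypothetical parameter)
    """
    _, h, w = feat.shape
    spec = np.fft.fft2(feat, axes=(-2, -1))      # per-channel 2D FFT
    fy = np.fft.fftfreq(h)[:, None]              # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]              # horizontal frequencies
    high = np.sqrt(fy ** 2 + fx ** 2) > cutoff   # (H, W) high-frequency mask
    # Low frequencies pass through unchanged; high frequencies are reweighted.
    out_spec = np.where(high, spec * weight, spec)
    # Keep the real part; an arbitrary complex weight can leave an imaginary residue.
    return np.fft.ifft2(out_spec, axes=(-2, -1)).real

rng = np.random.default_rng(0)
feat = rng.standard_normal((2, 8, 8))
identity = np.ones((2, 8, 8), dtype=complex)
# With all-ones weights the map is returned unchanged.
assert np.allclose(enhance_high_freq(feat, identity), feat)
```

Setting the weights above 1 in magnitude amplifies fine structure such as edges while leaving smooth regions alone, which is the intuition behind boosting weak boundary signals in the frequency domain.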

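Experimental Results

Research Questions

RQ1: How can the high-frequency boundary details of transparent objects be enhanced beyond the limits of spatial-domain processing?
RQ2: Does combining frequency-domain processing with cross-scale refinement improve instance-level segmentation of transparent objects?
RQ3: Can more advanced alignment and upsampling mechanisms reduce boundary misalignment in feature pyramids for transparent objects?
RQ4: What performance gains do the proposed components deliver on the transparent object benchmarks (Trans10K and GVD)?
RQ5: How do the high-quality Trans10K instance-level annotations affect model performance?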

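Key Findings

SEP-YOLO achieves state-of-the-art box and mask mAP results on both Trans10K and GVD.
Relative to the YOLO11 baseline, adding FDDEM raises Trans10K Box mAP50 from 0.816 to 0.836 and Mask mAP50 from 0.813 to 0.833.
Adding MS-GRB and CA2-Neck brings further gains: the full SEP-YOLO reaches Box mAP50 0.852 and Mask mAP50 0.851 on Trans10K.
On GVD, SEP-YOLO reaches Box mAP50 0.882 and Mask mAP50 0.872 with 2.98M parameters at 88 FPS.
The ablation study shows step-by-step improvement on both Trans10K and GVD: YOLO11 baseline -> +FDDEM -> +MS-GRB -> +CA2-Neck -> SEP-YOLO.
SEP-YOLO keeps a lightweight architecture and real-time inference while delivering a notable accuracy improvement on transparent object segmentation.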
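
To put the reported numbers in perspective, the short sketch below computes the absolute mAP50 gains over the YOLO11 baseline on Trans10K and converts the reported 88 FPS into per-frame latency. Only figures quoted in this review are used; the dictionary layout and the helper name `gains_over_baseline` are illustrative choices. GVD gains are not computed because the GVD baseline is not quoted here.

```python
# Trans10K mAP50 values as quoted in this review.
trans10k = {
    "YOLO11 baseline": {"box": 0.816, "mask": 0.813},
    "+FDDEM": {"box": 0.836, "mask": 0.833},
    "SEP-YOLO (full)": {"box": 0.852, "mask": 0.851},
}

def gains_over_baseline(table, baseline="YOLO11 baseline"):
    """Absolute mAP50 gain of each configuration over the baseline row."""
    base = table[baseline]
    return {
        name: {metric: round(value - base[metric], 3)
               for metric, value in scores.items()}
        for name, scores in table.items()
        if name != baseline
    }

gains = gains_over_baseline(trans10k)
for name, g in gains.items():
    print(f"{name}: box +{g['box']:.3f}, mask +{g['mask']:.3f}")

# Reported GVD throughput of 88 FPS expressed as time per frame.
latency_ms = 1000 / 88
print(f"per-frame latency: {latency_ms:.1f} ms")
```

By these figures, FDDEM alone accounts for 0.020 of the improvement on both metrics, and the full model adds 0.036 (box) and 0.038 (mask) over the baseline, at roughly 11.4 ms per frame.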

This review was generated by AI and reviewed by a human editor.