QUICK REVIEW

[Paper Explained] FlowComposer: Composable Flows for Compositional Zero-Shot Learning

Zhenqi He, Lin Z. Li | arXiv (Cornell University) | Mar 17, 2026
Domain Adaptation and Few-Shot Learning | Cited by 0
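One-Sentence Summary

FlowComposer uses two primitive flow models and a learnable Composer to transport visual features explicitly toward attribute and object text embeddings, which yields explicit composition in the embedding space and improves CZSL performance when plugged into baseline methods.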

ABSTRACT

Compositional zero-shot learning (CZSL) aims to recognize unseen attribute-object compositions by recombining primitives learned from seen pairs. Recent CZSL methods built on vision-language models (VLMs) typically adopt parameter-efficient fine-tuning (PEFT). They apply visual disentanglers for decomposition and manipulate token-level prompts or prefixes to encode compositions. However, such PEFT-based designs suffer from two fundamental limitations: (1) Implicit Composition Construction, where composition is realized only via token concatenation or branch-wise prompt tuning rather than an explicit operation in the embedding space; (2) Remained Feature Entanglement, where imperfect disentanglement leaves attribute, object, and composition features mutually contaminated. Together, these issues limit the generalization ability of current CZSL models. In this paper, we are the first to systematically study flow matching for CZSL and introduce FlowComposer, a model-agnostic framework that learns two primitive flows to transport visual features toward attribute and object text embeddings, and a learnable Composer that explicitly fuses their velocity fields into a composition flow. To exploit the inevitable residual entanglement, we further devise a leakage-guided augmentation scheme that reuses leaked features as auxiliary signals. We thoroughly evaluate FlowComposer on three public CZSL benchmarks by integrating it as a plug-and-play component into various baselines, consistently achieving significant improvements.

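Motivation and Objectives

  • Motivate and address the limitations of PEFT-based CZSL methods in explicit composition and feature disentanglement.
  • Propose a model-agnostic framework that learns attribute and object flows transporting visual features toward text embeddings.
  • Introduce a learnable Composer that explicitly fuses the primitive velocity fields into a composition flow.
  • Exploit leakage-guided augmentation, which reuses leaked features as cross-branch supervision signals to enrich velocity supervision.
  • Demonstrate performance gains when FlowComposer is integrated into existing CZSL baselines.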

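Proposed Method

  • Two primitive flow models learn time-conditioned velocities that transport the attribute and object visual embeddings to their text embeddings.
  • A learnable Composer predicts coefficients that combine the primitive velocities into a composition flow.
  • Leakage-guided augmentation reuses leaked features as cross-branch supervision signals to enrich velocity supervision.
  • End-to-end training uses a flow-matching loss to align endpoints, plus a cross-entropy term for endpoint recognition.
  • Inference uses one-step transport to map image features into the corresponding text space, and the composition coefficients are learned via least squares.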
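The transport-and-compose idea in the list above can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a straight-line path between a visual feature and a text embedding (so the target velocity is simply their difference), a single Euler step for the one-step transport, and a closed-form two-coefficient least-squares fit standing in for the Composer. All function names and the 3-dimensional vectors are hypothetical.

```python
# Toy sketch under stated assumptions; vectors are plain Python lists.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def straight_line_velocity(x_visual, y_text):
    """Velocity of the path x_t = (1 - t) * x + t * y; constant in t (assumed path)."""
    return [y - x for x, y in zip(x_visual, y_text)]

def one_step_transport(x_visual, velocity):
    """Single Euler step over t in [0, 1]: x + 1.0 * v."""
    return [x + v for x, v in zip(x_visual, velocity)]

def least_squares_coeffs(v_attr, v_obj, v_comp):
    """Solve min over (a, b) of ||a * v_attr + b * v_obj - v_comp||^2
    using the 2x2 normal equations."""
    g11, g12, g22 = dot(v_attr, v_attr), dot(v_attr, v_obj), dot(v_obj, v_obj)
    r1, r2 = dot(v_attr, v_comp), dot(v_obj, v_comp)
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("primitive velocities are (nearly) collinear")
    a = (r1 * g22 - r2 * g12) / det
    b = (g11 * r2 - g12 * r1) / det
    return a, b

def compose(a, b, v_attr, v_obj):
    """Composition velocity as a weighted sum of the primitive velocities."""
    return [a * p + b * q for p, q in zip(v_attr, v_obj)]

# Hypothetical embeddings for one image and its attribute/object/composition labels.
x = [1.0, 2.0, 0.0]
text_attr = [3.0, 2.0, 0.0]
text_obj = [1.0, 6.0, 0.0]
text_comp = [2.0, 4.0, 0.0]

v_attr = straight_line_velocity(x, text_attr)
v_obj = straight_line_velocity(x, text_obj)
v_comp = straight_line_velocity(x, text_comp)

a, b = least_squares_coeffs(v_attr, v_obj, v_comp)
print(a, b)                                                  # 0.5 0.5
print(one_step_transport(x, compose(a, b, v_attr, v_obj)))   # [2.0, 4.0, 0.0]
```

In this toy case the composed velocity carries the visual feature exactly onto the composition embedding. In a real model the velocities would come from trained networks and the coefficients from the Composer, so the fit would generally be approximate.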
Figure 1: (a) Humans recognize new concepts by recombining familiar primitives. (b) Prior CZSL methods compose only at the token level, which may not yield valid unseen compositions in the embedding space. (c) We perform explicit composition in the embedding space via learned attribute and object f

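Experimental Results

Research Questions

  • RQ1: Does explicit velocity-based composition in the embedding space improve CZSL generalization to unseen attribute-object pairs?
  • RQ2: Does the structure of two primitive flows plus a Composer outperform single-flow or multi-flow variants in both closed-world and open-world CZSL settings?
  • RQ3: Does leakage-guided augmentation improve disentanglement robustness and overall CZSL performance?
  • RQ4: Can FlowComposer plug into existing CZSL baselines (such as CSP and Troika) without global changes to the model?
  • RQ5: Compared with conventional token-level prompting methods, is flow matching a suitable paradigm for modeling compositionality in CZSL?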

Key Findings

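Method                  | Seen  Unseen  HM    AUC  | Seen  Unseen  HM    AUC  | Seen  Unseen  HM    AUC
Baseline (Troika)       | 49.3  52.5    39.2  22.1 | 66.3  73.4    55.4  41.8 | (values not present in source)
+FlowComposer (CSP)     | 48.3  50.4    37.6  20.7 | 66.6  68.2    51.2  37.8 | 29.0  30.9    22.9  7.7
+FlowComposer (Troika)  | 50.4  53.2    40.2  23.5 | 71.1  74.9    58.6  46.8 | 44.8, 34.0, 15.9 (one of the four values is missing in source)

The three column groups are not labeled in the source; they presumably correspond to the three benchmarks MIT-States, UT-Zappos, and C-GQA, in that order.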
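  • When plugged into CSP and Troika, FlowComposer yields consistent HM and AUC improvements on MIT-States, UT-Zappos, and C-GQA.
  • In the closed-world setting, FlowComposer combined with Troika achieves state-of-the-art AUC on all three datasets, surpassing several LLM-augmented methods.
  • In the open-world setting, FlowComposer obtains notable HM gains over the baseline (e.g., +1.3% on MIT-States and +4.4% on UT-Zappos) as well as AUC gains.
  • Ablation studies show that every component (Flows, Composer, LG-Aug) contributes to the gains, and the full FlowComposer yields the largest improvement.
  • Comparisons with predictor variants show that the Composer's explicit composition rule outperforms directly regressing the composition velocity.
  • A parameter-matched regression baseline indicates that the gains come from the flow-matching design rather than from an increased parameter count.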
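For readers unfamiliar with the Seen, Unseen, HM, and AUC columns, the following sketch shows how such numbers are commonly derived in generalized CZSL evaluation. It assumes the widely used protocol in which a calibration bias on unseen-composition scores is swept, giving one (seen accuracy, unseen accuracy) point per bias value; this summary does not spell out the paper's exact protocol, and the function names and sample points are hypothetical.

```python
def harmonic_mean(seen, unseen):
    """Harmonic mean of two accuracies; defined as 0.0 when both are zero."""
    if seen + unseen == 0:
        return 0.0
    return 2 * seen * unseen / (seen + unseen)

def summarize_curve(points):
    """points: (seen_acc, unseen_acc) pairs, one per calibration bias value.

    Returns best seen accuracy, best unseen accuracy, best harmonic mean
    over the sweep, and the trapezoidal area under the seen-unseen curve.
    """
    best_seen = max(s for s, _ in points)
    best_unseen = max(u for _, u in points)
    best_hm = max(harmonic_mean(s, u) for s, u in points)
    pts = sorted(points)  # order by seen accuracy before integrating
    auc = sum((x1 - x0) * (y0 + y1) / 2
              for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return best_seen, best_unseen, best_hm, auc

# Hypothetical three-point sweep (accuracies as fractions, not real results).
curve = [(0.0, 0.5), (0.25, 0.25), (0.5, 0.0)]
print(summarize_curve(curve))  # (0.5, 0.5, 0.25, 0.125)
```

Under this protocol the best Seen and best Unseen accuracies are reached at different bias values, so the reported HM is not the harmonic mean of the reported Seen and Unseen columns. That is consistent with the table: the harmonic mean of 49.3 and 52.5 is about 50.8, whereas the HM listed for the Troika baseline is 39.2.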
Figure 2: Training dynamics and performance comparison with the baseline, Troika [15]. Our method yields a more balanced seen/unseen accuracy trajectory and consistently improves HM and AUC over the baseline on all three datasets.


This explainer was generated by AI and reviewed by human editors.