QUICK REVIEW

[Paper Review] Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

Andreas Veit, Balázs Kovács | arXiv (Cornell University) | Sep 24, 2015

Generative Adversarial Networks and Image Synthesis | References: 13 | Cited by: 41

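One-Sentence Summary

This paper proposes a Siamese convolutional neural network (Siamese CNN) framework that learns a visual style space from heterogeneous dyadic co-occurrences, i.e., clothing items from different categories that frequently appear together, as in Amazon co-purchase data. By strategically sampling compatible and incompatible cross-category pairs, the model embeds images into a latent space in which stylistically compatible items lie close to one another. This enables accurate retrieval of complete multi-category outfits and outperforms both ImageNet features and a baseline without strategic sampling.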

ABSTRACT

With the rapid proliferation of smart mobile devices, users now take millions of photos every day. These include large numbers of clothing and accessory images. We would like to answer questions like "What outfit goes well with this pair of shoes?" To answer these types of questions, one has to go beyond learning visual similarity and learn a visual notion of compatibility across categories. In this paper, we propose a novel learning framework to help answer these types of questions. The main idea of this framework is to learn a feature transformation from images of items into a latent space that expresses compatibility. For the feature transformation, we use a Siamese Convolutional Neural Network (CNN) architecture, where training examples are pairs of items that are either compatible or incompatible. We model compatibility based on co-occurrence in large-scale user behavior data; in particular co-purchase data from Amazon.com. To learn cross-category fit, we introduce a strategic method to sample training data, where pairs of items are heterogeneous dyads, i.e., the two elements of a pair belong to different high-level categories. While this approach is applicable to a wide variety of settings, we focus on the representative problem of learning compatible clothing style. Our results indicate that the proposed framework is capable of learning semantic information about visual style and is able to generate outfits of clothes, with items from different categories, that go well together.
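
The heterogeneous-dyad sampling idea can be illustrated with a minimal Python sketch. It rests on assumptions the summary does not state: every item carries one high-level category label, co-purchases arrive as an edge list, and each compatible pair receives one incompatible partner drawn from the same category as its second item, so positives and negatives share the same category mix. The function name `sample_heterogeneous_dyads` and the data layout are hypothetical and are not taken from the paper.

```python
import random
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str, int]  # (item_a, item_b, label): 1 = compatible, 0 = incompatible


def sample_heterogeneous_dyads(
    categories: Dict[str, str],
    co_purchases: List[Tuple[str, str]],
    seed: int = 0,
) -> List[Pair]:
    """Build labelled training pairs whose two items come from different categories.

    Hypothetical sketch: positives are co-purchased cross-category pairs; the
    negative-sampling rule (same anchor, same partner category) is an assumption.
    """
    rng = random.Random(seed)
    # Index items by category so negatives can be drawn per category.
    by_category: Dict[str, List[str]] = {}
    for item, cat in categories.items():
        by_category.setdefault(cat, []).append(item)
    # Symmetric set of co-occurring pairs for fast "is this a positive?" checks.
    linked: Set[Tuple[str, str]] = set()
    for a, b in co_purchases:
        linked.add((a, b))
        linked.add((b, a))

    pairs: List[Pair] = []
    for a, b in co_purchases:
        if categories[a] == categories[b]:
            continue  # homogeneous dyad: skipped, only cross-category fit is learned
        pairs.append((a, b, 1))
        # Negative: same anchor, random non-co-occurring item from b's category.
        candidates = [c for c in by_category[categories[b]] if (a, c) not in linked]
        if candidates:
            pairs.append((a, rng.choice(candidates), 0))
    return pairs
```

Keeping the partner category fixed for each negative is a design choice meant to stop the network from separating pairs by category alone; a purely random negative would be the naive alternative.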

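Motivation and Objectives

Learn a visual style space that captures semantic compatibility across clothing categories, rather than mere visual similarity.
Address the challenge of learning cross-category compatibility without relying on fine-grained attributes or extensive manual annotation.
Develop a robust training strategy that exploits heterogeneous dyadic co-occurrences (e.g., co-purchased items) to generalize across diverse clothing categories.
Enable structured outfit generation by retrieving nearest neighbors across categories in the learned style space.
Compare the model against baselines using quantitative metrics and a user study of perceived style compatibility.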

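Proposed Method

The framework uses a Siamese CNN to learn a feature transformation from image space into a latent style space in which compatible items are embedded close to one another.
Training examples are sampled as heterogeneous dyads: pairs of items from different high-level categories (e.g., shoes and shirts) that frequently co-occur in user behavior data such as Amazon co-purchases.
The model is trained with a contrastive loss that minimizes the distance between compatible pairs while pushing incompatible pairs apart.
Robust nearest-neighbor retrieval is used to cope with label noise in real-world co-occurrence data, making outfit generation more reliable.
Outfits are generated by querying the style space with a reference item and retrieving its nearest neighbors from the other categories.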
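
The training objective and the outfit-generation step can be sketched in plain Python. This assumes the common margin-based contrastive loss on Euclidean distance with a hypothetical margin of 1.0; the summary does not give the paper's exact loss form or margin. In the real model the embeddings would come from the two shared-weight CNN branches; here they are passed in as ready-made vectors. `generate_outfit` is a hypothetical helper that does plain nearest-neighbor lookup per category, not the robust variant described above, whose details the summary omits.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u: Vector, v: Vector, label: int, margin: float = 1.0) -> float:
    """Margin-based contrastive loss for one pair (the margin value is an assumption).

    label == 1 (compatible):   0.5 * d^2, pulls the pair together.
    label == 0 (incompatible): 0.5 * max(0, margin - d)^2, pushes the pair
    apart until it is at least `margin` away, then contributes nothing.
    """
    d = euclidean(u, v)
    if label == 1:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def generate_outfit(
    query: Vector,
    query_category: str,
    catalog: Dict[str, Tuple[str, Vector]],
) -> Dict[str, str]:
    """Hypothetical helper: nearest item per other category (plain, not robust, NN).

    `catalog` maps item id -> (category, embedding).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for item, (category, emb) in catalog.items():
        if category == query_category:
            continue  # an outfit takes one item from each *other* category
        d = euclidean(query, emb)
        if category not in best or d < best[category][1]:
            best[category] = (item, d)
    return {category: item for category, (item, _) in best.items()}
```

For example, with a query shoe embedded at the origin, `generate_outfit` returns the closest top and the closest bottom in the catalog and ignores other shoes. Once an incompatible pair is farther apart than the margin, its loss is zero, so training effort concentrates on pairs that are still confusable.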

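Experimental Results

Research Questions

RQ1: Can a deep learning model learn a meaningful visual style space that captures cross-category clothing compatibility without relying on fine-grained attributes?
RQ2: How much does strategically sampling heterogeneous dyadic co-occurrences improve compatibility prediction compared with random or naive sampling?
RQ3: How well does the learned style space generalize to unseen clothing categories?
RQ4: How do human users perceive the compatibility of predicted outfits compared with those from baseline models?
RQ5: Beyond objective compatibility measures, what other factors influence human judgments of style compatibility?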

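Key Findings

With strategic sampling, the proposed framework reaches an AUC of 82.6% on the "bought together" co-occurrence data, clearly outperforming the raw ImageNet-feature baseline (67.5%) and the baseline without strategic sampling.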
On the "also bought" dataset, the method reaches 83.1% accuracy, below the baseline's 88.7%, but it remains competitive.
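A user study shows that the model outperforms the random-selection and naive-sampling baselines in two of the four test groups, with a statistically significant preference in two scenarios.
The learned style features transfer to unseen clothing categories, indicating strong generalization.
The user survey indicates that style compatibility is not the only factor in these decisions: functionality, visual similarity, and personal preference also play important roles.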
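
To make the AUC figures concrete, here is a small sketch of how an AUC can be computed for compatibility prediction when a smaller embedding distance means "more compatible". It uses the standard pairwise (Mann-Whitney) definition of ROC AUC; the paper's exact evaluation protocol is not described in this summary, and the function name `pairwise_auc` is hypothetical. On this scale, an AUC of 82.6% means that about 83% of (compatible, incompatible) pair comparisons rank the compatible pair as closer.

```python
from typing import Sequence


def pairwise_auc(distances: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC when a *smaller* embedding distance means 'more compatible'.

    Equals the probability that a randomly chosen compatible pair (label 1)
    is closer than a randomly chosen incompatible pair (label 0); ties count
    as half a win.
    """
    pos = [d for d, y in zip(distances, labels) if y == 1]
    neg = [d for d, y in zip(distances, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one compatible and one incompatible pair")
    wins = 0.0
    for p in pos:  # O(P * N) comparison; a sort-based version scales better
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With distances `[0.2, 0.4, 0.9, 0.3]` and labels `[1, 1, 0, 0]`, three of the four compatible/incompatible comparisons are ordered correctly, giving 0.75. A random embedding would score about 0.5.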

This review was generated by AI and reviewed by human editors.