QUICK REVIEW

[Paper Review] The Freiburg Groceries Dataset

Philipp Jund, Nichola Abdo | arXiv (Cornell University) | Nov 17, 2016

Advanced Image and Video Retrieval Techniques | References: 27 | Cited by: 50

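One-Sentence Summary

The Freiburg Groceries Dataset introduces a real-world benchmark of 5,000 images covering 25 grocery classes, collected in a variety of apartments and stores, to address the lack of realistic training data for object recognition in service robotics. By fine-tuning a CaffeNet model, the authors achieve a mean accuracy of 78.9% under five-fold cross-validation, providing a baseline for future research on service robots and vision systems.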

ABSTRACT

With the increasing performance of machine learning techniques in the last few years, the computer vision and robotics communities have created a large number of datasets for benchmarking object recognition tasks. These datasets cover a large spectrum of natural images and object categories, making them not only useful as a testbed for comparing machine learning approaches, but also a great resource for bootstrapping different domain-specific perception and robotic systems. One such domain is domestic environments, where an autonomous robot has to recognize a large variety of everyday objects such as groceries. This is a challenging task due to the large variety of objects and products, and where there is great need for real-world training data that goes beyond product images available online. In this paper, we address this issue and present a dataset consisting of 5,000 images covering 25 different classes of groceries, with at least 97 images per class. We collected all images from real-world settings at different stores and apartments. In contrast to existing groceries datasets, our dataset includes a large variety of perspectives, lighting conditions, and degrees of clutter. Overall, our images contain thousands of different object instances. It is our hope that machine learning and robotics researchers find this dataset of use for training, testing, and bootstrapping their approaches. As a baseline classifier to facilitate comparison, we re-trained the CaffeNet architecture (an adaptation of the well-known AlexNet) on our dataset and achieved a mean accuracy of 78.9%. We release this trained model along with the code and data splits we used in our experiments.

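Motivation and Goals

Address the lack of realistic, real-world training data for grocery recognition in service robotics.
Provide a benchmark dataset that spans different lighting conditions, perspectives, and degrees of clutter, so that it reflects domestic environments more faithfully.
Support the development and comparison of machine learning and robot perception systems for everyday object recognition.
Release the trained CaffeNet model and the data splits to enable reproducible evaluation and baseline comparison.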

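Proposed Method

The images were collected in real stores and apartments in Germany and cover different lighting conditions, perspectives, and degrees of clutter.
The images are grouped into 25 classes with at least 97 images per class, about 5,000 images in total.
Evaluation uses five-fold cross-validation, with the images of each class distributed evenly across the folds to keep the evaluation balanced.
A CaffeNet model (an adaptation of AlexNet) was fine-tuned on the dataset, starting from pre-trained weights; only the fully connected layers were fine-tuned.
During training, images of under-represented classes were duplicated so that the classes stay balanced.
As a qualitative test, a model trained on single-class images was used to classify image patches extracted manually from cluttered scenes (dataset D2).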
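The fold assignment and duplication-based balancing described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' released code: the function names `stratified_folds` and `oversample_by_duplication` are hypothetical, and the exact shuffling and duplication rules used in the paper are not given in this summary.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds so each class is spread evenly.

    Hypothetical helper: per class, fold sizes differ by at most one.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def oversample_by_duplication(train_indices, labels, seed=0):
    """Duplicate samples of smaller classes until all classes match the largest.

    Assumption: duplicates are drawn at random with replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in train_indices:
        by_class[labels[idx]].append(idx)

    target = max(len(indices) for indices in by_class.values())
    balanced = []
    for label in sorted(by_class):
        indices = by_class[label]
        balanced.extend(indices)
        balanced.extend(rng.choices(indices, k=target - len(indices)))
    return balanced
```

For one cross-validation round, one fold would be held out for testing and the remaining four concatenated and passed to `oversample_by_duplication`. Balancing only the training indices keeps duplicated images out of the test fold.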

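Experimental Results

Research Questions

RQ1: How well does a standard deep learning model perform on a real-world grocery recognition dataset with highly variable lighting, perspective, and clutter?
RQ2: To what extent do misleading package designs (such as fruit images on cereal boxes) hurt classification performance?
RQ3: Can a model trained on single-object images generalize to recognizing objects in complex, cluttered scenes that contain multiple overlapping classes?
RQ4: How does performance vary across grocery classes, especially between visually similar ones (for example, white packaging)?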

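Key Findings

The CaffeNet model reached a mean accuracy of 78.9% over five-fold cross-validation, with a standard deviation of 0.5%.
The model performed best on classes such as water, jam, and juice, with accuracies between 88.1% and 93.2%.
The model performed worst on flour, at only 59.9% accuracy, possibly because its plain white packaging resembles that of other products.
Misclassifications stem mainly from visual ambiguity; for example, cereal boxes printed with fruit images are often mistaken for juice.
The model shows some potential to generalize to complex scenes, as evidenced by successfully classified image patches from multi-object scenes, although performance is sensitive to patch size.
The confusion matrix reveals systematic errors, especially between visually similar classes, which underlines the difficulty of fine-grained recognition.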
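Figures like the ones above (per-class accuracy, frequently confused class pairs, and mean and standard deviation across folds) can be derived from confusion matrices roughly as follows. This is a sketch with hypothetical helper names. It assumes rows are true classes and columns are predictions, and it uses the population standard deviation, since the summary does not say which variant the paper reports.

```python
from statistics import mean, pstdev


def per_class_accuracy(confusion, class_names):
    """Return {class: accuracy}, where confusion[i][j] counts
    samples of true class i that were predicted as class j."""
    result = {}
    for i, name in enumerate(class_names):
        row_total = sum(confusion[i])
        result[name] = confusion[i][i] / row_total if row_total else float("nan")
    return result


def most_confused_pairs(confusion, class_names, top=3):
    """List the largest off-diagonal entries as (count, true, predicted)."""
    pairs = []
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if i != j and count > 0:
                pairs.append((count, class_names[i], class_names[j]))
    pairs.sort(reverse=True)
    return pairs[:top]


def summarize_folds(fold_accuracies):
    """Mean and population standard deviation over the folds."""
    return mean(fold_accuracies), pstdev(fold_accuracies)
```

Applied to a matrix in which many cereal images land in the juice column, `most_confused_pairs` would surface that pair near the top, which is the kind of systematic error the findings describe.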

This review was generated by AI and reviewed by a human editor.