QUICK REVIEW

[Paper Review] RPC: A Large-Scale Retail Product Checkout Dataset

Xiu-Shen Wei, Quan Cui|arXiv (Cornell University)|Jan 22, 2019

Advanced Neural Network Applications | References: 30 | Citations: 111

One-Sentence Summary

This paper introduces the Retail Product Checkout (RPC) dataset, the largest automatic checkout (ACO) benchmark in terms of both SKUs and images, and evaluates cross-domain detection baselines using exemplar and checkout images.

ABSTRACT

Over recent years, emerging interest has occurred in integrating computer vision technology into the retail industry. Automatic checkout (ACO) is one of the critical problems in this area which aims to automatically generate the shopping list from the images of the products to purchase. The main challenge of this problem comes from the large scale and the fine-grained nature of the product categories as well as the difficulty for collecting training images that reflect the realistic checkout scenarios due to continuous update of the products. Despite its significant practical and research value, this problem is not extensively studied in the computer vision community, largely due to the lack of a high-quality dataset. To fill this gap, in this work we propose a new dataset to facilitate relevant research. Our dataset enjoys the following characteristics: (1) It is by far the largest dataset in terms of both product image quantity and product categories. (2) It includes single-product images taken in a controlled environment and multi-product images taken by the checkout system. (3) It provides different levels of annotations for the check-out images. Comparing with the existing datasets, ours is closer to the realistic setting and can derive a variety of research problems. Besides the dataset, we also benchmark the performance on this dataset with various approaches. The dataset and related resources can be found at https://rpc-dataset.github.io/.

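Research Motivation and Objectives

Motivate automatic checkout research by addressing the large-scale, fine-grained, and domain-shift challenges of real-world retail scenarios.
Introduce RPC, a dataset of 200 SKUs, 53,739 exemplar images, and 30,000 checkout images spanning three clutter levels.
Provide hierarchical meta-categories and weak-to-strong annotations to enable a variety of learning settings.
Benchmark baseline methods to establish a feasibility baseline and identify room for improvement.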
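
To make the idea of weak-to-strong annotations concrete, the sketch below shows how the three annotation levels named in this review (shopping list, point-level, bounding box) relate to one another: each weaker level can be derived from the stronger one. The `Box` schema, the SKU names, and the choice of the box centre as the point annotation are assumptions made for illustration, not the dataset's actual file format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Strongest level: one bounding box per product instance (hypothetical schema)."""
    sku: str
    x: float  # left edge in pixels
    y: float  # top edge in pixels
    w: float  # box width in pixels
    h: float  # box height in pixels


def to_points(boxes):
    """Middle level: keep the SKU and one point per instance (here, the box centre)."""
    return [(b.sku, b.x + b.w / 2, b.y + b.h / 2) for b in boxes]


def to_shopping_list(boxes):
    """Weakest level: per-SKU counts only, with no location information."""
    return dict(Counter(b.sku for b in boxes))


# Hypothetical checkout image containing three product instances.
boxes = [
    Box("cola_330ml", 10, 20, 40, 80),
    Box("cola_330ml", 100, 20, 40, 80),
    Box("chips_bbq", 200, 50, 60, 90),
]
points = to_points(boxes)
shopping_list = to_shopping_list(boxes)
```

The derivation only works in one direction: a shopping list cannot be turned back into points or boxes, which is why methods trained on the weaker levels fall under weakly supervised learning.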

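Proposed Method

Define the ACO task and its data requirements: single-product exemplars are used for training and checkout images for evaluation.
Build RPC with two image types (exemplar and checkout) and three clutter levels (easy, medium, hard) to reflect realistic conditions.
Provide weak-to-strong annotations for checkout images (shopping list, point-level, bounding box) to support weakly supervised learning.
Implement four cross-domain detection baselines (Single, Syn, Render, Syn+Render) using detectors trained on exemplars, with data augmentation through synthesis and Cycle-GAN-based domain translation.
Evaluate the detectors with custom ACO metrics (cAcc, ACD, mCCD, mCIoU) and standard detection metrics (mAP50, mmAP).
Analyze the impact of synthesis and rendering, showing substantial gains from domain translation and from combining synthesized and rendered data.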
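
The review names the four ACO metrics without defining them. The sketch below follows the definitions as commonly stated for this benchmark, all built on the per-image, per-category counting distance |predicted count - true count|: cAcc is the share of images with no counting error at all, ACD is the mean number of counting errors per image, mCCD is the per-category error normalized by the true count and averaged over categories, and mCIoU is the per-category ratio of min to max counts averaged over categories. These definitions are an assumption here and should be checked against the paper. The function name `aco_metrics` and the rule of skipping categories with a zero denominator are illustrative choices.

```python
def aco_metrics(pred, gt):
    """Compute (cAcc, ACD, mCCD, mCIoU) from per-image count vectors.

    pred[i][k] is the predicted count of category k in image i,
    gt[i][k] is the ground-truth count. Both are lists of equal-length lists.
    """
    n_images = len(gt)
    n_cats = len(gt[0])
    # Counting distance CD[i][k] = |pred - gt| for every image and category.
    cd = [[abs(p - g) for p, g in zip(pi, gi)] for pi, gi in zip(pred, gt)]

    # cAcc: fraction of images whose full shopping list is exactly right.
    c_acc = sum(1 for row in cd if sum(row) == 0) / n_images
    # ACD: average number of counting errors per image.
    acd = sum(sum(row) for row in cd) / n_images

    ccd_terms, ciou_terms = [], []
    for k in range(n_cats):
        gt_k = [g[k] for g in gt]
        pred_k = [p[k] for p in pred]
        total_gt = sum(gt_k)
        if total_gt > 0:  # assumption: skip categories that never occur
            ccd_terms.append(sum(row[k] for row in cd) / total_gt)
        union = sum(max(g, p) for g, p in zip(gt_k, pred_k))
        if union > 0:  # assumption: skip categories absent from both sides
            inter = sum(min(g, p) for g, p in zip(gt_k, pred_k))
            ciou_terms.append(inter / union)

    m_ccd = sum(ccd_terms) / len(ccd_terms) if ccd_terms else 0.0
    m_ciou = sum(ciou_terms) / len(ciou_terms) if ciou_terms else 0.0
    return c_acc, acd, m_ccd, m_ciou


# Two hypothetical images, three categories.
gt = [[1, 0, 2], [0, 1, 1]]
pred = [[1, 0, 2], [1, 1, 0]]
c_acc, acd, m_ccd, m_ciou = aco_metrics(pred, gt)
```

In this toy case the first image is counted perfectly and the second has two errors, so cAcc is 0.5 and ACD is 1.0. cAcc is the strictest of the four because a single miscounted item makes the whole image count as wrong.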

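Experimental Results

Research Questions

RQ1: Can a large-scale, multi-category dataset that realistically reproduces real-world checkout clutter support effective automatic checkout research?
RQ2: How does the domain gap between single-product exemplar images and checkout scenes affect detector performance, and can synthesis and domain translation bridge it?
RQ3: Which annotations and supervision levels (weak to strong) are useful for advancing the ACO task?
RQ4: How do different training-data strategies (single, synthesized, rendered, combined) perform across the easy, medium, and hard clutter levels?
RQ5: What are the practical failure modes of ACO detectors, and which approaches mitigate them most effectively?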

Key Findings

Clutter mode	Methods	cAcc (↑)	ACD (↓)	mCCD (↓)	mCIoU (↑)	mAP50 (↑)	mmAP (↑)
Easy	Single	0.02%	7.83	1.09	4.36%	3.65%	2.04%
Easy	Syn	18.49%	2.58	0.37	69.33%	81.51%	56.39%
Easy	Render	63.19%	0.72	0.11	90.64%	96.21%	77.65%
Easy	Syn+Render	73.17%	0.49	0.07	93.66%	97.34%	79.01%
Medium	Single	0.00%	19.77	1.67	3.96%	2.06%	1.11%
Medium	Syn	6.54%	4.33	0.37	68.61%	79.72%	51.75%
Medium	Render	43.02%	1.24	0.11	90.64%	95.83%	72.53%
Medium	Syn+Render	54.69%	0.90	0.08	92.95%	96.56%	73.24%
Hard	Single	0.00%	22.61	1.33	2.06%	0.97%	0.55%
Hard	Syn	2.91%	5.94	0.34	70.25%	80.98%	53.11%
Hard	Render	31.01%	1.77	0.10	90.41%	95.18%	71.56%
Hard	Syn+Render	42.48%	1.28	0.07	93.06%	96.45%	72.72%
Averaged	Single	0.01%	12.84	1.06	2.14%	1.83%	1.01%
Averaged	Syn	9.27%	4.27	0.35	69.65%	80.66%	53.08%
Averaged	Render	45.60%	1.25	0.10	90.58%	95.50%	72.76%
Averaged	Syn+Render	56.68%	0.89	0.07	93.19%	96.57%	73.83%
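
As a reading aid for the table above, the following sketch recomputes the absolute cAcc gain (in percentage points) from adding synthesized data on top of rendered data at each clutter level. The values are copied from the cAcc column; the dictionary layout and the helper name `gain` are illustrative.

```python
# cAcc (%) copied from the table: clutter level -> method -> value.
C_ACC = {
    "Easy": {"Single": 0.02, "Syn": 18.49, "Render": 63.19, "Syn+Render": 73.17},
    "Medium": {"Single": 0.00, "Syn": 6.54, "Render": 43.02, "Syn+Render": 54.69},
    "Hard": {"Single": 0.00, "Syn": 2.91, "Render": 31.01, "Syn+Render": 42.48},
    "Averaged": {"Single": 0.01, "Syn": 9.27, "Render": 45.60, "Syn+Render": 56.68},
}


def gain(level, base, new):
    """Absolute cAcc gain in percentage points when moving from `base` to `new`."""
    return round(C_ACC[level][new] - C_ACC[level][base], 2)


# Gain of Syn+Render over Render at each clutter level.
gains = {level: gain(level, "Render", "Syn+Render") for level in C_ACC}

# Unweighted mean of the three per-level Syn+Render scores, for comparison
# with the table's own "Averaged" row.
levels = ("Easy", "Medium", "Hard")
macro_mean = round(sum(C_ACC[lv]["Syn+Render"] for lv in levels) / len(levels), 2)
```

The gains come out to 9.98 (easy), 11.67 (medium), and 11.47 (hard) points, so mixing in synthesized data helps at least as much in cluttered scenes as in easy ones. The unweighted mean of the three levels is 56.78%, slightly different from the 56.68% in the Averaged row; the review does not say how that row is computed, so it may be averaged over images rather than over levels.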

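RPC comprises 200 SKUs, 53,739 exemplar images, and 30,000 checkout images, enabling large-scale evaluation.
Training on exemplar images alone yields near-zero cAcc at every clutter level, even in easy mode (0.02%); adding synthesized data raises averaged cAcc to 9.27%.
Rendering (domain translation) improves performance dramatically, raising easy-mode cAcc from 0.02% (Single) to 63.19% (Render), and further to 73.17% (Syn+Render).
Combining synthesized and rendered data gives the best results at every clutter level: Syn+Render reaches cAcc of 73.17% (easy), 54.69% (medium), and 42.48% (hard), along with the highest averaged mmAP (73.83%).
Standard detection metrics (mAP50, mmAP) also improve substantially with Render and Syn+Render; for example, averaged mAP50 rises from 1.83% (Single) to 96.57% (Syn+Render).
The study confirms that substantial room for improvement remains and highlights practical challenges such as missed detections, densely packed items, fine-grained discrimination, and false detections.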
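
The failure modes listed above can be separated at the shopping-list level with very little code. The sketch below is an illustrative diagnostic, not a procedure taken from the paper: it splits the difference between a predicted and a true shopping list into missed items and spurious items per SKU. The function name `count_errors` and the SKU names are hypothetical.

```python
from collections import Counter


def count_errors(pred_list, gt_list):
    """Split shopping-list errors into missed and spurious items per SKU."""
    pred, gt = Counter(pred_list), Counter(gt_list)
    # Counter subtraction keeps only positive counts, so each side is one-directional.
    missed = gt - pred      # items on the counter that the detector did not report
    spurious = pred - gt    # items the detector reported that are not there
    return dict(missed), dict(spurious)


# Hypothetical case: one cola was reported as a second bag of chips.
missed, spurious = count_errors(
    pred_list=["cola", "chips", "chips"],
    gt_list=["cola", "cola", "chips"],
)
```

Here the result is one missed cola and one spurious chips. A matched pair like this is what a fine-grained confusion between two SKUs looks like at count level, whereas a miss with no matching spurious item points to an undetected product, for example one occluded in a dense arrangement.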

This review was generated by AI and checked by a human editor.