QUICK REVIEW

[Paper Review] Learning from Label Proportions with Dual-proportion Constraints

Tianhao Ma, Ximing Li|arXiv (Cornell University)|Mar 22, 2026

Machine Learning and Data Classification | Citations: 0

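One-line summary

LLP-DC learns an instance-level classifier from bags annotated only with label proportions. It imposes constraints at both the bag level and the instance level, and uses minimum-cost maximum-flow to generate hard pseudo-labels.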

ABSTRACT

Learning from Label Proportions (LLP) is a weakly supervised problem in which the training data comprise bags, that is, groups of instances, each annotated only with bag-level class label proportions, and the objective is to learn a classifier that predicts instance-level labels. This setting is widely applicable when privacy constraints limit access to instance-level annotations or when fine-grained labeling is costly or impractical. In this work, we introduce a method that leverages Dual proportion Constraints (LLP-DC) during training, enforcing them at both the bag and instance levels. Specifically, the bag-level training aligns the mean prediction with the given proportion, and the instance-level training aligns hard pseudo-labels that satisfy the proportion constraint, where a minimum-cost maximum-flow algorithm is used to generate hard pseudo-labels. Extensive experimental results across various benchmark datasets empirically validate that LLP-DC consistently improves over previous LLP methods across datasets and bag sizes. The code is publicly available at https://github.com/TianhaoMa5/CVPR2026_Findings_LLP_DC.

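Motivation and objectives

Motivate learning from label proportions (LLP) and the need for effective instance-level supervision derived from bag-level proportion annotations.
Introduce Dual-proportion Constraints (LLP-DC), which impose constraints at both the bag level and the instance level.
Provide an efficient pseudo-label generation mechanism based on minimum-cost maximum-flow that yields hard pseudo-label assignments satisfying the bag proportions.
Demonstrate empirical improvements of LLP-DC over existing LLP methods across multiple datasets and bag sizes.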

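Proposed method

Formulate LLP with a bag-level proportion constraint and an instance-level pseudo-label generation step.
For each bag, construct a directed multi-stage graph and solve a minimum-cost maximum-flow problem to obtain hard pseudo-label assignments that satisfy the bag proportions.
Train with a bag-level loss that aligns the mean prediction with the bag proportions, combined with an instance-level loss on high-confidence pseudo-labels.
Use weak and strong data augmentation, and filter pseudo-labels by a confidence threshold for instance-level supervision.
Optimize the total loss L = L_bag + lambda * L_ins with SGD.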
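The pseudo-label generation step in the list above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the graph layout (source, instances, classes, sink), the edge cost -log p, the rounding of proportions to per-class counts, and the names `min_cost_max_flow` and `hard_pseudo_labels` are all assumptions made for illustration. The solver is a plain successive-shortest-path routine, chosen for brevity rather than speed.

```python
import math
from collections import deque


def min_cost_max_flow(n_nodes, edges, source, sink):
    """Successive shortest augmenting paths. edges: list of (u, v, capacity, cost)."""
    graph = [[] for _ in range(n_nodes)]  # node -> indices of outgoing edges
    to, cap, cost = [], [], []
    for u, v, c, w in edges:
        # Forward edge gets an even index; its residual twin is the next odd index.
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)
    flow, total_cost = 0, 0.0
    while True:
        # Queue-based Bellman-Ford (handles negative residual costs).
        dist = [math.inf] * n_nodes
        prev_edge = [-1] * n_nodes
        in_queue = [False] * n_nodes
        dist[source] = 0.0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            in_queue[u] = False
            for e in graph[u]:
                v = to[e]
                if cap[e] > 0 and dist[u] + cost[e] < dist[v] - 1e-12:
                    dist[v] = dist[u] + cost[e]
                    prev_edge[v] = e
                    if not in_queue[v]:
                        queue.append(v)
                        in_queue[v] = True
        if prev_edge[sink] == -1:
            break  # no augmenting path left: the flow is maximal
        push, v = math.inf, sink
        while v != source:  # bottleneck capacity along the path
            push = min(push, cap[prev_edge[v]])
            v = to[prev_edge[v] ^ 1]
        v = sink
        while v != source:  # augment along the path
            e = prev_edge[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        flow += push
        total_cost += push * dist[sink]
    return flow, total_cost, cap


def hard_pseudo_labels(probs, proportions):
    """Assign one hard label per instance so class counts match the bag proportions."""
    m, k = len(probs), len(proportions)
    counts = [round(a * m) for a in proportions]  # assumption: alpha * m is integral
    if sum(counts) != m:
        raise ValueError("proportions do not yield integer counts summing to bag size")
    source, sink = 0, m + k + 1
    edges = []
    for j in range(m):
        edges.append((source, 1 + j, 1, 0.0))  # each instance receives one label
        for c in range(k):
            # Assumed cost: negative log-probability of class c for instance j.
            edges.append((1 + j, 1 + m + c, 1, -math.log(max(probs[j][c], 1e-12))))
    for c in range(k):
        edges.append((1 + m + c, sink, counts[c], 0.0))  # class quota from proportion
    flow, _, cap = min_cost_max_flow(m + k + 2, edges, source, sink)
    if flow != m:
        raise ValueError("could not label every instance")
    labels = [-1] * m
    for i, (u, v, _, _) in enumerate(edges):
        if 1 <= u <= m and cap[2 * i] == 0:  # saturated instance -> class edge
            labels[u - 1] = v - 1 - m
    return labels


# Toy bag: 3 instances, 2 classes, proportions (2/3, 1/3).
probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4]]
print(hard_pseudo_labels(probs, [2 / 3, 1 / 3]))  # [0, 0, 1]
```

In this toy bag, a plain argmax would label all three instances as class 0 and violate the proportion; the flow solution moves the least confident instance to class 1 because that is the cheapest way to meet the quota.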

Figure 1 : Training an instance-level classifier from a dataset with only bag-level class proportions.

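Experimental results

Research questions

RQ1: Can instance-level classification be improved consistently across diverse datasets and bag sizes using bags annotated only with label proportions?
RQ2: How do the dual-proportion constraints (bag level and instance level) affect convergence, pseudo-label quality, and training efficiency?
RQ3: How does pseudo-label generation via minimum-cost maximum-flow affect accuracy and computational cost compared with soft-label approaches?
RQ4: How sensitive is LLP-DC to hyperparameters such as the instance-level loss weight lambda and the confidence threshold tau?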

Key findings

Accuracy (%), mean ± std where available; "?" marks values missing from this review.

Dataset	Bag Size	Fully Supervised	DLLP	LLP-VAT	ROT (iter=3)	ROT (iter=75)	SoftMatch	FLMm*	L2P-AHIL	LLP-DC
CIFAR-10	16	91.59 ± 1.52	88.61 ± 0.90	89.11 ± 0.22	94.34 ± 0.65	93.97 ± 0.96	95.25 ± 0.14	92.34	94.96 ± 0.13	95.97 ± 0.03
CIFAR-10	32	64.95 ± 0.01	79.76 ± 1.45	78.75 ± 0.46	93.97 ± 0.96	92.23 ± 0.81	95.25 ± 0.14	92.00	95.00 ± 0.11	95.90 ± 0.07
CIFAR-10	64	96.05 ± 0.33	64.95 ± 0.01	63.89 ± 0.19	92.23 ± 0.81	91.? 0	94.23 ± 0.18	91.74	94.58 ± 0.21	95.46 ± 0.03
CIFAR-10	128	?	?	?	?	?	?	?	?	94.47 ± 0.05
CIFAR-100	16	79.89 ± 0.14	69.92 ± 2.86	71.62 ± 0.07	69.31 ± 0.22	17.48 ± 0.86	80.14 ± 0.12	66.16	78.65 ± 0.28	80.32 ± 0.10
CIFAR-100	32	?	?	?	?	?	?	?	?	79.85 ± 0.03
CIFAR-100	64	?	?	?	?	?	?	?	?	79.05 ± 0.19
CIFAR-100	128	?	?	?	?	?	?	?	?	73.29 ± 0.26
SVHN	16	97.77 ± 0.03	96.93 ± 0.23	96.68 ± 0.01	94.?	96.75 ± 0.11	22.39 ± 0.11	46.86	97.91 ± 0.02	98.01 ± 0.02
SVHN	32	?	?	?	?	?	?	?	?	97.99 ± 0.04
SVHN	64	?	?	?	?	?	?	?	?	97.97 ± 0.02
SVHN	128	?	?	?	?	?	?	?	?	97.97 ± 0.07
Fashion-MNIST	16	96.39 ± 0.02	93.70 ± 0.39	94.69 ± 0.20	93.68 ± 0.22	92.53 ± 0.46	95.85 ± 0.22	-	96.93 ± 0.23	95.90 ± 0.02
Fashion-MNIST	32	96.39 ± 0.02	93.18 ± 0.22	93.25 ± 0.18	92.53 ± 0.46	91.84 ± 0.19	95.86 ± 0.25	-	95.78 ± 0.15	95.86 ± 0.06
Fashion-MNIST	64	96.39 ± 0.02	93.70 ± 0.39	93.25 ± 0.18	?	?	95.18 ± 0.21	-	95.27 ± 0.13	95.19 ± 0.20
Fashion-MNIST	128	96.39 ± 0.02	91.70 ± 0.21	92.30 ± 0.13	?	?	94.73 ± 0.20	-	94.19 ± 0.14	94.74 ± 0.07
MiniImageNet	16	73.95 ± 0.22	61.?	54.??	46.??	46.??	46.03	-	46.86	63.01
MiniImageNet	32	73.95 ± 0.22	?	?	?	?	42.95	-	43.46	62.65
MiniImageNet	64	73.29 ± 0.26	?	?	?	?	42.23	-	42.91	59.32
MiniImageNet	128	?	?	?	?	?	41.73	-	41.58	58.32

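In the reported results, LLP-DC outperforms or is competitive with established LLP baselines on CIFAR-10, CIFAR-100, SVHN, Fashion-MNIST, and MiniImageNet for bag sizes from 16 to 128.
On CIFAR-10, LLP-DC reaches 95.97% at bag size 16 and 94.47% at bag size 128, ahead of strong baselines.
On CIFAR-100, LLP-DC reaches 80.32% at bag size 16 and 73.29% at bag size 128, indicating robustness to increasing bag size.
On SVHN and Fashion-MNIST, LLP-DC is competitive or better across bag sizes; on the harder MiniImageNet dataset, it shows notable improvements at large bag sizes.
Despite the additional pseudo-label generation step, LLP-DC keeps a runtime comparable to other LLP methods, and it is robust to tau and lambda within practical ranges.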
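To make the roles of tau and lambda concrete, the following minimal sketch shows where each enters an objective of the form L = L_bag + lambda * L_ins for a single bag. The exact loss forms are not given in this review, so the choices here are assumptions: cross-entropy between the given proportions and the mean prediction for the bag-level term, cross-entropy on the strongly augmented view for the instance-level term, confidence measured on the weakly augmented view, and normalization by bag size. The function name `dual_constraint_loss` and the default values of `lam` and `tau` are hypothetical.

```python
import math


def dual_constraint_loss(weak_probs, strong_probs, proportions, pseudo_labels,
                         lam=1.0, tau=0.95, eps=1e-12):
    """Per-bag loss sketch: bag-level term plus thresholded instance-level term."""
    m, k = len(weak_probs), len(proportions)

    # Bag level (assumed form): cross-entropy between the given proportions
    # and the mean prediction over the bag.
    mean_pred = [sum(p[c] for p in weak_probs) / m for c in range(k)]
    loss_bag = -sum(a * math.log(max(q, eps)) for a, q in zip(proportions, mean_pred))

    # Instance level (assumed form): keep only pseudo-labels whose weak-view
    # confidence reaches tau, then score the strong-view prediction against them.
    kept = [j for j in range(m) if weak_probs[j][pseudo_labels[j]] >= tau]
    loss_ins = sum(-math.log(max(strong_probs[j][pseudo_labels[j]], eps))
                   for j in kept) / m

    return loss_bag + lam * loss_ins, loss_bag, loss_ins, kept


# Toy bag with 3 instances and 2 classes; pseudo-labels satisfy proportions (2/3, 1/3).
weak = [[0.97, 0.03], [0.60, 0.40], [0.20, 0.80]]
strong = [[0.90, 0.10], [0.50, 0.50], [0.30, 0.70]]
total, l_bag, l_ins, kept = dual_constraint_loss(
    weak, strong, [2 / 3, 1 / 3], [0, 0, 1], lam=0.5, tau=0.95)
print(kept)  # [0]: only the first instance passes the confidence threshold
```

Under these assumptions, raising tau shrinks the set of instances that contribute to L_ins, while lambda scales how strongly the surviving pseudo-labels compete with the bag-level term.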

Figure 2 : A toy example of minimum-cost maximum-flow problem with 3 instances and 2 labels. Given a labeled bag $\{\mathbf{x}_{i1},\mathbf{x}_{i2},\mathbf{x}_{i3},\bm{\alpha}_{i}\}$ and the current predicted outputs of instances $\{\mathbf{p}_{i1},\mathbf{p}_{i2},\mathbf{p}_{i3}\}$ , we can form a


This review was generated by AI and checked by a human editor.