QUICK REVIEW

[Paper Review] Women also Snowboard: Overcoming Bias in Captioning Models

Kaylee Burns, Lisa Anne Hendricks|arXiv (Cornell University)|Mar 26, 2018

Multimodal Machine Learning Applications | References: 48 | Citations: 120

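One-Sentence Summary

This paper proposes the Equalizer model, which adds two losses, the Appearance Confusion Loss (ACL) and the Confident Loss (Conf), so that gender prediction depends on visual evidence and adapts to shifts in the gender distribution at test time.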

ABSTRACT

Most machine learning methods are known to capture and exploit biases of the training data. While some biases are beneficial for learning, others are harmful. Specifically, image captioning models tend to exaggerate biases present in training data (e.g., if a word is present in 60% of training sentences, it might be predicted in 70% of sentences at test time). This can lead to incorrect captions in domains where unbiased captions are desired, or required, due to over-reliance on the learned prior and image context. In this work we investigate generation of gender-specific caption words (e.g. man, woman) based on the person's appearance or the image context. We introduce a new Equalizer model that ensures equal gender probability when gender evidence is occluded in a scene and confident predictions when gender evidence is present. The resulting model is forced to look at a person rather than use contextual cues to make a gender-specific predictions. The losses that comprise our model, the Appearance Confusion Loss and the Confident Loss, are general, and can be added to any description model in order to mitigate impacts of unwanted bias in a description dataset. Our proposed model has lower error than prior work when describing images with people and mentioning their gender and more closely matches the ground truth ratio of sentences including women to sentences including men. We also show that unlike other approaches, our model is indeed more often looking at people when predicting their gender.

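Research Motivation and Objectives

Determine whether captioning models amplify gender bias present in the training data.
Propose a bias-mitigating captioning framework that encourages right-for-the-right-reasons descriptions.
Make gender prediction depend on visual evidence rather than contextual cues.
Evaluate bias reduction under distribution shift between the training and test sets.
Show that the model decides "for the right reasons" by grounding gendered-word predictions in evidence about the person.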

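Proposed Method

The base captioning framework uses InceptionV3 image features to initialize an LSTM description generator.
Two new losses are introduced, the Appearance Confusion Loss (ACL) and the Confident Loss (Conf): together they push gendered caption words to rest on gender evidence when it is present and discourage reliance on non-evidence cues when it is absent.
The Appearance Confusion Loss is computed on images in which the gender evidence has been masked out, and encourages equal probability for male and female words when that evidence is absent.
The Confident Loss raises the confidence of the correct gender prediction when gender evidence is present; it uses a ratio-based confidence measure, which leaves room for gender-neutral words.
The final objective is L = alpha L_CE + beta L_AC + mu L_Con (alpha=1, beta=10, mu=1 in the experiments), where L_CE is the standard cross-entropy captioning loss, L_AC the Appearance Confusion Loss, and L_Con the Confident Loss.
The datasets used are MSCOCO-Bias and MSCOCO-Balanced, with ground-truth gender-evidence masks supplied for computing the ACL.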
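The combined objective above can be sketched for a single time step as follows. This is a minimal illustration, not the authors' implementation: the word sets, the function names, the `eps` constant, and the exact form of each penalty (absolute difference for the ACL, wrong-gender mass over right-gender mass for the Confident Loss) are assumptions chosen to match the descriptions in this review. Only the weights alpha=1, beta=10, mu=1 come from the source.

```python
import math

# Hypothetical word sets; the review does not list the actual vocabularies.
WOMAN_WORDS = {"woman", "girl", "lady"}
MAN_WORDS = {"man", "boy", "guy"}


def gender_mass(probs):
    """Sum the probability assigned to woman-words and to man-words."""
    p_w = sum(p for w, p in probs.items() if w in WOMAN_WORDS)
    p_m = sum(p for w, p in probs.items() if w in MAN_WORDS)
    return p_w, p_m


def appearance_confusion(probs_masked):
    """Assumed ACL form: zero when both genders are equally likely on the masked image."""
    p_w, p_m = gender_mass(probs_masked)
    return abs(p_w - p_m)


def confident(probs, target_word, eps=1e-6):
    """Assumed Conf form: wrong-gender mass divided by right-gender mass (unmasked image)."""
    p_w, p_m = gender_mass(probs)
    if target_word in WOMAN_WORDS:
        return p_m / (p_w + eps)
    if target_word in MAN_WORDS:
        return p_w / (p_m + eps)
    return 0.0  # gender-neutral targets contribute nothing


def cross_entropy(probs, target_word):
    """Standard negative log-likelihood of the ground-truth word."""
    return -math.log(probs[target_word])


def total_loss(probs, probs_masked, target_word, alpha=1.0, beta=10.0, mu=1.0):
    """L = alpha * L_CE + beta * L_AC + mu * L_Con for one ground-truth word."""
    l_ce = cross_entropy(probs, target_word)
    gendered = target_word in (WOMAN_WORDS | MAN_WORDS)
    # The two extra terms are only active when the ground-truth word is gendered.
    l_ac = appearance_confusion(probs_masked) if gendered else 0.0
    l_con = confident(probs, target_word)
    return alpha * l_ce + beta * l_ac + mu * l_con
```

Here `probs` stands for the model's word distribution on the original image and `probs_masked` for its distribution on the image with the person masked out. Because the Conf term in this sketch is a ratio between the two gendered masses, shifting probability onto a neutral word such as "person" does not increase it, which is one way to read the review's remark that gender-neutral words remain permitted.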

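Experimental Results

Research Questions

RQ1: Can captioning models expose and amplify gender bias when predicting gendered words in captions?
RQ2: Do the proposed ACL and Confident Loss reduce the misclassification rate of gendered words relative to the baselines?
RQ3: Does Equalizer keep the distribution of gendered words in captions matched to the ground-truth distribution even under test-time distribution shift?
RQ4: Do explanations such as Grad-CAM and saliency maps show that the model attends to the person rather than to contextual cues when predicting gendered words?
RQ5: Does grounding gender prediction in evidence about the person lead the model to make better decisions "for the right reasons"?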

Key Findings

Model	Error_MSCOCO-Bias	RatioΔ_MSCOCO-Bias	Error_MSCOCO-Balanced	RatioΔ_MSCOCO-Balanced
Baseline-FT	12.83	0.15	19.30	0.51
Balanced	12.85	0.14	18.30	0.47
UpWeight	13.56	0.08	16.30	0.35
Equalizer w/o ACL	7.57	0.04	10.10	0.26
Equalizer w/o Conf	9.62	0.09	13.90	0.40
Equalizer	7.02	-0.03	8.10	0.13
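
The rankings implied by the table above can be checked with a few lines of Python. The numbers are copied from the table; the dictionary layout and the helper `best_by` are illustrative names, and treating "closest to the ground-truth ratio" as the smallest absolute RatioΔ is an assumption, since this review does not define the sign convention.

```python
# Rows copied from the table:
# (error on MSCOCO-Bias, ratio delta on MSCOCO-Bias,
#  error on MSCOCO-Balanced, ratio delta on MSCOCO-Balanced)
RESULTS = {
    "Baseline-FT": (12.83, 0.15, 19.30, 0.51),
    "Balanced": (12.85, 0.14, 18.30, 0.47),
    "UpWeight": (13.56, 0.08, 16.30, 0.35),
    "Equalizer w/o ACL": (7.57, 0.04, 10.10, 0.26),
    "Equalizer w/o Conf": (9.62, 0.09, 13.90, 0.40),
    "Equalizer": (7.02, -0.03, 8.10, 0.13),
}


def best_by(column, key=lambda value: value):
    """Return the model whose value in `column` is smallest under `key`."""
    return min(RESULTS, key=lambda name: key(RESULTS[name][column]))


lowest_error_bias = best_by(0)
closest_ratio_bias = best_by(1, key=abs)  # assumed: compare by absolute value
lowest_error_balanced = best_by(2)
closest_ratio_balanced = best_by(3, key=abs)
```

All four lookups return "Equalizer", which agrees with the findings listed for this table.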

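Equalizer achieves the lowest gendered-word error on both the MSCOCO-Bias and MSCOCO-Balanced test sets.
On MSCOCO-Bias, Equalizer's error is 7.02, lower than every ablation and baseline; on MSCOCO-Balanced it is 8.10, again the lowest of all variants in the table.
Equalizer's gender ratio is the closest to the ground truth of all models on both datasets (Ratio Δ: -0.03 on MSCOCO-Bias, 0.13 on MSCOCO-Balanced).
The ablations indicate that the ACL and the Confident Loss are complementary: removing either one raises the error (7.57 / 10.10 without ACL and 9.62 / 13.90 without Conf, on MSCOCO-Bias / MSCOCO-Balanced).
Equalizer reduces the divergence in outcomes between genders (Jensen-Shannon divergence of 0.018, the lowest among the compared models).
Visual explanations show that Equalizer attends to the person more often when predicting gendered words, which supports the "right for the right reasons" claim.
When gender is unclear under the annotator-confidence threshold, Equalizer tends to use gender-neutral words, and when the evidence is clear it tends to use gendered words, matching the pattern seen in human descriptions.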
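
For readers unfamiliar with the Jensen-Shannon divergence quoted above, the following is a minimal sketch of the standard definition for two discrete distributions. The review does not say which distributions the paper compares or which logarithm base it uses, so the natural log and the function names here are assumptions.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

The measure is symmetric, equals zero when the two distributions are identical, and with the natural log is bounded above by ln 2, so a value near zero means the model's outcomes for the two groups are nearly indistinguishable.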

This review was generated by AI and checked by a human editor.