QUICK REVIEW

[Paper Review] A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models

James Urquhart Allingham, Jie Ren|arXiv (Cornell University)|Feb 13, 2023

Generative Adversarial Networks and Image Synthesis | Cited by 9

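One-line summary

This paper proposes Zero-shot Prompt Ensembling (ZPE): it automatically scores a large pool of prompts without labeled data, normalizes the scores to reduce pre-training and test-data biases, applies softmax weighting or prompt selection, and demonstrates higher zero-shot accuracy than hand-crafted prompts on ImageNet, its variants, and fine-grained classification datasets.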

ABSTRACT

Contrastively trained text-image models have the remarkable ability to perform zero-shot classification, that is, classifying previously unseen images into categories that the model has never been explicitly trained to identify. However, these zero-shot classifiers need prompt engineering to achieve high accuracy. Prompt engineering typically requires hand-crafting a set of prompts for individual downstream tasks. In this work, we aim to automate this prompt engineering and improve zero-shot accuracy through prompt ensembling. In particular, we ask "Given a large pool of prompts, can we automatically score the prompts and ensemble those that are most suitable for a particular downstream dataset, without needing access to labeled validation data?". We demonstrate that this is possible. In doing so, we identify several pathologies in a naive prompt scoring method where the score can be easily overconfident due to biases in pre-training and test data, and we propose a novel prompt scoring method that corrects for the biases. Using our proposed scoring method to create a weighted average prompt ensemble, our method outperforms equal average ensemble, as well as hand-crafted prompts, on ImageNet, 4 of its variants, and 11 fine-grained classification benchmarks, all while being fully automatic, optimization-free, and not requiring access to labeled validation data.

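Motivation and objectives

Automate prompt engineering for zero-shot classifiers by scoring a large pool of prompts for a specific downstream task without labeled validation data.
Develop a bias-corrected prompt scoring method that avoids overconfident prompts whose scores are dominated by frequencies in the pre-training and test data.
Show, across a range of datasets, that score-based weighted or selected prompt ensembles achieve higher zero-shot classification accuracy than equal-weight ensembles and hand-crafted prompts.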

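Proposed method

Use a large pool of prompts and compute a zero-shot score s_p for each prompt without access to labeled data.
Identify pathologies of the naive max-logit score that stem from word-frequency bias, and propose normalizing by the expected logits under the pre-training and test distributions.
Reduce the bias with the normalization logits_normalized = logits - (E_pretrain + E_test)/2.
Compute the prompt score s_p by averaging, over images, the maximum over classes of the normalized logits.
Apply softmax weighting to the prompt scores to suppress long-tail effects, and form a weighted ensemble of the logits (Eq. 3/5).
Optionally perform prompt selection, keeping only the top prompts by means of a threshold tau based on outlier detection (median and median absolute deviation, MAD) (Eq. 4).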
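
The scoring and weighting steps listed above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`zpe_scores`, `softmax`, `weighted_ensemble`), the array layout (prompts x images x classes), and the choice to estimate E_pretrain and E_test as per-class mean logits over a sample of images are all assumptions made for this example. Ensembling the raw logits, rather than the normalized ones or the text embeddings, is likewise an assumption.

```python
import numpy as np


def zpe_scores(test_logits, pretrain_logits):
    """Bias-corrected prompt scores s_p (a sketch, not the paper's code).

    test_logits:     array of shape (P, N_test, C), logits per prompt/image/class
    pretrain_logits: array of shape (P, N_pre, C), logits on a sample of
                     pre-training images (hypothetical input for this example)
    """
    # Expected logits per prompt and class, estimated as sample means.
    e_test = test_logits.mean(axis=1, keepdims=True)          # (P, 1, C)
    e_pretrain = pretrain_logits.mean(axis=1, keepdims=True)  # (P, 1, C)

    # logits_normalized = logits - (E_pretrain + E_test) / 2
    normalized = test_logits - (e_pretrain + e_test) / 2.0

    # Max over classes for each image, then average over images.
    return normalized.max(axis=2).mean(axis=1)                # (P,)


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def weighted_ensemble(test_logits, scores):
    """Weighted average of per-prompt logits, with softmax(scores) as weights."""
    weights = softmax(scores)                                  # (P,)
    # Contract the prompt axis: (P,) x (P, N, C) -> (N, C)
    return np.tensordot(weights, test_logits, axes=(0, 0))
```

With `P` prompts, `predictions = weighted_ensemble(test_logits, zpe_scores(test_logits, pretrain_logits)).argmax(axis=1)` would give one predicted class per test image. No labels enter the computation at any point.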

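Experimental results

Research questions

RQ1: Can prompts be automatically selected and weighted from a large prompt pool to maximize downstream zero-shot accuracy, without labeled validation data?
RQ2: Can prompt scoring be corrected to mitigate word-frequency bias in the pre-training data and concept-frequency bias in the test data?
RQ3: Do weighted or selected prompt ensembles outperform equal-weight ensembles and hand-crafted prompts across diverse datasets?
RQ4: How do the normalization and weighting schemes affect the effectiveness of zero-shot prompt ensembling?
RQ5: How do the size and composition of the prompt pool affect zero-shot performance on ImageNet and fine-grained datasets?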

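Key findings

The weighted ZPE ensemble outperforms equal-average ensembles and hand-crafted prompts on ImageNet, ImageNet variants, and several fine-grained benchmarks.
Normalization with E_pretrain and E_test reduces word-frequency bias and spurious concept-frequency bias, improving zero-shot accuracy across tasks.
Softmax weighting of prompt scores generally performs better than weighting by raw scores or the naive max-logit method.
Prompt selection (outlier-based tau threshold) brings improvements, especially on fine-grained datasets where domain-specific prompts are valuable.
Across CLIP ViT-B/16 and LiT ViT-L/16, the ZPE-based weighted average achieves higher average accuracy than hand-crafted prompts and naive methods, with notable gains on some datasets.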
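
The outlier-based prompt selection mentioned in the findings can be illustrated with a short NumPy sketch. The review only states that selection uses the median, the MAD, and a threshold tau. The specific rule used here (keep a prompt when its score exceeds median + tau * MAD) and the function name `select_prompts` are assumptions for illustration and may differ from the paper's Eq. 4.

```python
import numpy as np


def select_prompts(scores, tau=1.0):
    """Boolean mask of prompts whose score is an upper outlier.

    Assumed rule (not confirmed by the source): keep prompt p when
    s_p > median(s) + tau * MAD(s), where MAD is the median absolute
    deviation of the scores from their median.
    """
    scores = np.asarray(scores, dtype=float)
    median = np.median(scores)
    mad = np.median(np.abs(scores - median))
    return scores > median + tau * mad


# Example: only the clearly higher-scoring prompt survives.
example_scores = [1.0, 2.0, 3.0, 4.0, 50.0]
mask = select_prompts(example_scores, tau=2.0)
```

In this example the median is 3 and the MAD is 1, so with tau = 2 the threshold is 5 and only the last prompt is kept. Under this sketch, a larger tau keeps fewer prompts, and the kept prompts could then be ensembled with equal or softmax weights.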

This review was generated by AI and checked by a human editor.