QUICK REVIEW

[論文レビュー] Breaking the Bank with ChatGPT: Few-Shot Text Classification for Finance

Lefteris Loukas, Ilias Stogiannidis|arXiv (Cornell University)|Aug 28, 2023

Topic Modeling被引用数 18

ひとこと要約

本論文は、Banking77に対する少数ショットの金融テキスト分類を、GPT-3.5/GPT-4を用いたインコンテキスト学習とSetFitによるMPNet系のファインチューニングで調査し、生成型LLMが少数ショット設定でファインチューニング済みモデルを上回る可能性を示すとともに、人間がキュレーションしたサンプルが性能をさらに向上させ、コスト要因にも留意している。

ABSTRACT

We propose the use of conversational GPT models for easy and quick few-shot text classification in the financial domain using the Banking77 dataset. Our approach involves in-context learning with GPT-3.5 and GPT-4, which minimizes the technical expertise required and eliminates the need for expensive GPU computing while yielding quick and accurate results. Additionally, we fine-tune other pre-trained, masked language models with SetFit, a recent contrastive learning technique, to achieve state-of-the-art results both in full-data and few-shot settings. Our findings show that querying GPT-3.5 and GPT-4 can outperform fine-tuned, non-generative models even with fewer examples. However, subscription fees associated with these solutions may be considered costly for small organizations. Lastly, we find that generative models perform better on the given task when shown representative samples selected by a human expert rather than when shown random ones. We conclude that a) our proposed methods offer a practical solution for few-shot tasks in datasets with limited label availability, and b) our state-of-the-art results can inspire future work in the area.

研究の動機と目的

Banking77 を用いた対話型LLM（GPT-3.5/GPT-4）による少数-shot の金融意図分類を実証する。
限られたデータ領域でインコンテキスト学習がファインチューニングされていない非生成モデルを上回ることができることを示す。
SetFitベースの MPNet 系モデルのファインチューニングを全データおよび少数-shot設定で最新技術として評価する。
LLM ベースアプローチの費用とトークン制限を含む実践的考慮事項を評価する。
少数-shot の金融NLPタスクの今後の研究に向けた指針を提供する。

提案手法

Banking77 に対する金融テキスト分類での GPT-3.5 および GPT-4 を用いた少数-shot のインコンテキスト学習。
SetFit による MPNet 系モデル（S-MPNet-v2 および P-MPNet-v2）の少数-shotおよび全データ設定でのファインチューニング。
改善された少数-shot 学習のための代表サンプルをキュレートする人間専門家の注釈（各クラスあたり上位3件、10件中）
3-shot GPT-4 のための前回のチャット履歴とシステム文脈プロンプトの比較を含むプロンプト設計実験。
全データおよび少数-shot設定での micro-F1 および macro-F1 指標を用いた評価。

実験結果

リサーチクエスチョン

RQ1GPT-3.5 および GPT-4 を用いたインコンテキスト学習は、少数サンプルで Banking77 に対して競争力のある性能を達成できるか？
RQ2人間がキュレーションした代表サンプルは、少数-shot の金融意図分類のための LLM プロンプティングにおいて、ランダムサンプルより優れているか？
RQ3MPNet 系の SetFit は全データおよび 10-shot 設定で最先端の結果を達成できるか？
RQ4生成型 LLM を少数-shot の金融NLPタスクに用いる際のコストと実用的なトレードオフは何か？

主な発見

GPT-3.5 および GPT-4 のインコンテキスト学習は、非常に少数の例で、ファインチューニングされた非生成モデルと比較して競争力があるか、あるいは上回ることができる。
人間がキュレーションした代表サンプルは、GPT-3.5 および GPT-4 に対するインコンテキスト学習の性能を、ランダムに選ばれたサンプルよりも高くする。
MPNet-v2 系の SetFit は全データおよび 10-shot 設定で最先端の結果を達成する。
プロンプト戦略（system-context vs previous chat history）は GPT-4 の性能に影響を与え、system-context が優れた結果を示す。
Genative LLM approaches は、購読およびトークンコストの考慮を伴い、小規模組織には障害となる可能性がある。

より良い研究を、今すぐ始めましょう

論文設計から論文執筆まで、研究時間を劇的に削減しましょう。

クレジットカード登録不要

このレビューはAIが作成し、人間の編集者が確認しました。