QUICK REVIEW

[Paper Review] Binding Language Models in Symbolic Languages

Zhoujun Cheng, Tianbao Xie|arXiv (Cornell University)|Oct 6, 2022

Topic Modeling | Cited by 38

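One-line summary

Binder is a training-free neural-symbolic framework that maps task inputs to executable programs and binds a unified LM API to a programming language. Without any training, and using only dozens of in-context exemplars, it achieves state-of-the-art results on WikiTableQuestions and TabFact.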

ABSTRACT

Though end-to-end neural approaches have recently been dominating NLP tasks in both performance and ease-of-use, they lack interpretability and robustness. We propose Binder, a training-free neural-symbolic framework that maps the task input to a program, which (1) allows binding a unified API of language model (LM) functionalities to a programming language (e.g., SQL, Python) to extend its grammar coverage and thus tackle more diverse questions, (2) adopts an LM as both the program parser and the underlying model called by the API during execution, and (3) requires only a few in-context exemplar annotations. Specifically, we employ GPT-3 Codex as the LM. In the parsing stage, with only a few in-context exemplars, Codex is able to identify the part of the task input that cannot be answerable by the original programming language, correctly generate API calls to prompt Codex to solve the unanswerable part, and identify where to place the API calls while being compatible with the original grammar. In the execution stage, Codex can perform versatile functionalities (e.g., commonsense QA, information extraction) given proper prompts in the API calls. Binder achieves state-of-the-art results on WikiTableQuestions and TabFact datasets, with explicit output programs that benefit human debugging. Note that previous best systems are all finetuned on tens of thousands of task-specific samples, while Binder only uses dozens of annotations as in-context exemplars without any training. Our code is available at https://github.com/HKUNLP/Binder .

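Motivation and objectives

Extend the coverage of symbolic languages by leveraging a language model API, without large-scale task-specific training.
Improve the interpretability and robustness of systems that map natural language to executable programs.
Demonstrate state-of-the-art performance on WikiTableQuestions and TabFact with minimal annotation.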

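Proposed method

Map the task input to a Binder program: a program in a standard programming language (e.g., SQL, Python) extended with API calls to an LM.
Use Codex both as the semantic parser that generates Binder programs and as the model that executes the API calls inside them.
Generate multiple Binder programs via in-context learning with a few demonstrations, then take a majority vote over their answers.
Instantiate Binder on SQL and Python with the neural APIs f_col and f_val, which bring LM capabilities to column-level and value-level reasoning.
Execute programs through a Binder interpreter whose grammar is extended to accept API calls, evaluating nested API calls bottom-up.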
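
The last two ideas in the list above, bottom-up evaluation of nested API calls and majority voting over several candidate programs, can be sketched in a few lines. This is a toy illustration, not the Binder interpreter: the call syntax `f("question"; argument)`, the lookup-table `stub_lm`, and the names `evaluate` and `majority_vote` are all assumptions made for this example, and a real system would prompt an LM and execute SQL or Python around the calls.

```python
import re
from collections import Counter

# Toy call syntax, assumed for this sketch only: f("question"; argument)
# The pattern matches only innermost calls, because the argument part
# is not allowed to contain parentheses.
INNERMOST_CALL = re.compile(r'f\("([^"]*)";\s*([^()]*?)\s*\)')

# Hypothetical stand-in for the LM; a real system would prompt a model here.
TOY_ANSWERS = {
    ("capital of?", "France"): "Paris",
    ("river through?", "Paris"): "Seine",
}


def stub_lm(question: str, argument: str) -> str:
    """Answer a (question, argument) pair from the toy lookup table."""
    return TOY_ANSWERS.get((question, argument), "unknown")


def evaluate(program: str) -> str:
    """Resolve API calls bottom-up: innermost calls are answered first."""
    while True:
        match = INNERMOST_CALL.search(program)
        if match is None:
            return program.strip()
        answer = stub_lm(match.group(1), match.group(2))
        # Splice the answer in place of the call, then look for the next one.
        program = program[:match.start()] + answer + program[match.end():]


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer among the candidate programs."""
    return Counter(answers).most_common(1)[0][0]


# Three hypothetical candidate programs for the same question.
candidates = [
    'f("river through?"; f("capital of?"; France))',
    'f("river through?"; Paris)',
    'f("capital of?"; France)',
]
answers = [evaluate(p) for p in candidates]
final_answer = majority_vote(answers)
```

In the first candidate the inner call is resolved to `Paris` before the outer call is evaluated, which is the bottom-up order described above. Voting over answers rather than over program strings lets differently written programs that agree on the result reinforce each other.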

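Experimental results

Research questions

RQ1: Can a training-free neural-symbolic framework achieve competitive performance on structured knowledge tasks without large-scale fine-tuning?
RQ2: How does binding LM-backed API calls to a programming language affect coverage, interpretability, and robustness on real-world questions?
RQ3: What effect does Binder have on program-unsolvable versus program-solvable questions (in WikiTQ and related datasets)?
RQ4: How well does Binder generalize to multimodal inputs (text, tables, images) and to alternative programming languages (SQL, Python)?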

Key findings

Dataset	Method	Split	Accuracy (%)
WikiTQ	Codex Binder (Ours)	Dev	65.0
WikiTQ	Codex Binder (Ours)	Test	64.6
TabFact	Codex Binder (Ours)	Test	85.1
TabFact	Codex Binder (Ours) with few-shot retriever	Test	86.0

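Using Codex, Binder achieves state-of-the-art execution accuracy on WikiTableQuestions and TabFact with no task-specific training and only dozens of in-context exemplars.
On WikiTQ, Codex Binder scores 65.0 on dev and 64.6 on test, surpassing prior baselines, including fine-tuned models.
On TabFact, Codex Binder reaches 85.1 test accuracy, or 86.0 with few-shot retrieval, surpassing prior symbolic methods and remaining competitive with fine-tuned baselines.
On program-unsolvable questions (those SQL alone cannot answer), Binder improves performance, beating pure SQL by 10.1 percentage points on the unsolvable subset of WikiTQ.
Compared with end-to-end QA, Codex Binder shows large gains (e.g., 15.9% on WikiTQ and 12.4% on TabFact), underscoring the coverage gained from symbolic execution.
Binder produces interpretable intermediate programs that support debugging and error analysis, and it is robust to large inputs and noisy content.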
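
To make the notion of a program-unsolvable question concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table stores countries and medal counts but no continent column, so plain SQL cannot filter by continent; registering an LM-like function with `create_function` lets the query ask for that missing knowledge. The table contents, the question string, and `stub_lm_column` are assumptions for illustration, and the per-row scalar function is a simplification rather than Binder's actual f_col implementation.

```python
import sqlite3


# Hypothetical stand-in for an LM call; a real system would prompt a model.
def stub_lm_column(question: str, value: str) -> str:
    north_america = {"Canada", "Mexico", "United States"}
    if question == "is it in North America?":
        return "yes" if value in north_america else "no"
    return "unknown"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE w (country TEXT, medals INTEGER)")
conn.executemany(
    "INSERT INTO w VALUES (?, ?)",
    [("Canada", 5), ("France", 7), ("Mexico", 3)],
)

# Bind the stub under the SQL name f_col (2 arguments), so ordinary SQL
# can call it like any built-in scalar function.
conn.create_function("f_col", 2, stub_lm_column)

# "How many medals did North American countries win?" is not answerable
# from the stored columns alone; the bound function supplies the missing fact.
row = conn.execute(
    "SELECT SUM(medals) FROM w "
    "WHERE f_col('is it in North America?', country) = 'yes'"
).fetchone()
total = row[0]
conn.close()
```

The symbolic part (filtering and summing) stays in SQL, while only the knowledge the table lacks is delegated to the bound function, which is why the resulting program remains readable and debuggable.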


This review was generated by AI and checked by a human editor.