QUICK REVIEW

[論文レビュー] Can Generative Large Language Models Perform ASR Error Correction?

Rao Ma, Mengjie Qian|arXiv (Cornell University)|Jul 9, 2023

Speech Recognition and Synthesis被引用数 7

ひとこと要約

この論文は、2つのASRアーキテクチャ（Conformer-TransducerとWhisper AED）に対するChatGPTを用いたゼロショットおよび少数ショットのASRエラー訂正を調査し、0-shotおよび1-shotプロンプトで、制約なし、選択的、最近傍対応のアプローチを比較する。

ABSTRACT

ASR error correction is an interesting option for post processing speech recognition system outputs. These error correction models are usually trained in a supervised fashion using the decoding results of a target ASR system. This approach can be computationally intensive and the model is tuned to a specific ASR system. Recently generative large language models (LLMs) have been applied to a wide range of natural language processing tasks, as they can operate in a zero-shot or few shot fashion. In this paper we investigate using ChatGPT, a generative LLM, for ASR error correction. Based on the ASR N-best output, we propose both unconstrained and constrained, where a member of the N-best list is selected, approaches. Additionally, zero and 1-shot settings are evaluated. Experiments show that this generative LLM approach can yield performance gains for two different state-of-the-art ASR architectures, transducer and attention-encoder-decoder based, and multiple test sets.

研究の動機と目的

ASR出力の後処理としてのエラー訂正を再訓練なしで動機付け評価する。
複数のASRアーキテクチャとデータセットで、ゼロショットおよび少数ショットの生成型LLMアプローチを評価する。
ChatGPTを用いたエラー訂正において、無制約と制約付き（N-best）プロンプト戦略を比較する。

提案手法

ASRのN-bestリスト（デフォルトトップ5）を、エラー訂正器の訓練なしにChatGPTへ入力する。
ゼロショットの制約なし、ゼロショットの選択的、1-shotの制約なしプロンプトをテストする。
出力をN-best候補に制限するよう、制約付きデコード変種を適用する：選択的アプローチと最近傍対応。
2つのASRシステム（Conformer-TransducerとWhisper small.en）を対象に、LibriSpeechテストセット、TED-LIUM3、Artie Biasで評価する。
ChatGPTの結果を、微調整済みN-best T5エラー訂正モデルと比較する。

Fig. 1 : N-best T5 error correction model structure.

実験結果

リサーチクエスチョン

RQ1生成型LLM（ChatGPT）は、追加の訓練なしでゼロショットおよび1-shot設定におけるASRエラー訂正を改善できるか。
RQ2N-best制約付きデコード変種（選択的、最近傍）は、ChatGPTを用いた場合、制約なし生成より改善をもたらすか。
RQ3ChatGPTベースの訂正は、異なるASRアーキテクチャとドメインで、微調整済みのT5 N-bestエラー訂正モデルと比較してどうか。

主な発見

ChatGPTは、Conformer-TransducerとWhisper AEDの両方のシステムでASRベースラインより性能を向上させる。
ゼロショット1-shotの制約なしプロンプトでChatGPTはWERを改善し、1-shot最近傍は一部設定でT5の性能に近い。
制約付きアプローチ（最近傍、選択的）は、ゼロショットの制約なしより優れる場合があり、最近傍が特に強力な結果をもたらすことがある。
ChatGPTベースの訂正は、LibriSpeech関連のインドメインデータおよび一部のアウト・オブ・ドメインセットでより良い向上を示すが、N-bestの多様性問題のため特定のWhisper駆動ケースで劣化する。
T5ベースのエラー訂正モデルと比較して、ChatGPTは複数のシナリオで競合的または優位になり得し、事前のモデル訓練を必要としない。

Fig. 2 : Prompt design for (a) zero-shot unconstrained error correction, (b) zero-shot selective approach, and (c) 1-shot unconstrained error correction. Here we use a 3-best list generated by the ASR system as input to ChatGPT for illustration.

より良い研究を、今すぐ始めましょう

論文設計から論文執筆まで、研究時間を劇的に削減しましょう。

クレジットカード登録不要

このレビューはAIが作成し、人間の編集者が確認しました。