QUICK REVIEW

[Paper Review] FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

Yichong Leng, Xu Tan | arXiv (Cornell University) | May 9, 2021

Natural Language Processing Techniques | References: 35 | Citations: 29

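One-Line Summary

FastCorrect is a non-autoregressive ASR error correction model that uses edit-distance-based alignment to detect and correct errors. It cuts latency by 6-9x relative to autoregressive correction while achieving a competitive WER reduction of 8-14%.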

ABSTRACT

Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER) than original ASR outputs. Previous works usually use a sequence-to-sequence model to correct an ASR output sentence autoregressively, which causes large latency and cannot be deployed in online ASR services. A straightforward solution to reduce latency, inspired by non-autoregressive (NAR) neural machine translation, is to use an NAR sequence generation model for ASR error correction, which, however, comes at the cost of significantly increased ASR error rate. In this paper, observing distinctive error patterns and correction operations (i.e., insertion, deletion, and substitution) in ASR, we propose FastCorrect, a novel NAR error correction model based on edit alignment. In training, FastCorrect aligns each source token from an ASR output sentence to the target tokens from the corresponding ground-truth sentence based on the edit distance between the source and target sentences, and extracts the number of target tokens corresponding to each source token during edition/correction, which is then used to train a length predictor and to adjust the source tokens to match the length of the target sentence for parallel generation. In inference, the token number predicted by the length predictor is used to adjust the source tokens for target sequence generation. Experiments on the public AISHELL-1 dataset and an internal industrial-scale ASR dataset show the effectiveness of FastCorrect for ASR error correction: 1) it speeds up the inference by 6-9 times and maintains the accuracy (8-14% WER reduction) compared with the autoregressive correction model; and 2) it outperforms the popular NAR models adopted in neural machine translation and text edition by a large margin.

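Motivation and Objectives

Motivate fast, low-latency ASR error correction suitable for online services.
Use edit-distance-based alignment to guide non-autoregressive correction.
Pre-train on large-scale pseudo correction data, then fine-tune on real ASR correction data.
Demonstrate speed and accuracy gains on AISHELL-1 and a large internal Chinese ASR dataset.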

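Proposed Method

Introduce an edit alignment between the ASR output and the ground-truth sentence based on minimum edit distance; when several alignments are possible, select the best one using n-gram frequency.
Use a Transformer-based NAR encoder-decoder with a length predictor that estimates how many target tokens align to each source token (0 for deletion, 1 for substitution or no change, more than 1 for insertion).
Train the length predictor with an MSE loss, and use its predictions to adjust the source tokens so that the target sequence can be generated in parallel.
Pre-train FastCorrect on a large pseudo correction dataset, created by editing text with a noise-generation process that incorporates homophone information, then fine-tune on real ASR correction data.
Compare against AR correction and other NAR models (LevT, FELIX) in terms of WER reduction and latency on GPU and CPU.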
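
The edit alignment and length targets described in the method list can be illustrated with a small Python sketch. This is a simplified illustration, not the authors' implementation: the function names `edit_alignment` and `adjust_source` are hypothetical, ties between equally short edit paths are broken arbitrarily instead of by n-gram frequency, and an inserted target token is always attached to the source token on its left (or to the first source token when nothing precedes it).

```python
def edit_alignment(src, tgt):
    """Align each source token to a list of target tokens along one
    minimum-edit-distance path (simplified: no n-gram based path selection)."""
    if not src:
        raise ValueError("source sentence must contain at least one token")
    n, m = len(src), len(tgt)
    # dist[i][j] = edit distance between src[:i] and tgt[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # keep or substitute
                             dist[i - 1][j] + 1,         # delete source token
                             dist[i][j - 1] + 1)         # insert target token

    aligned = [[] for _ in src]
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                aligned[i - 1].insert(0, tgt[j - 1])  # one-to-one alignment
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            i -= 1  # deletion: this source token receives no target token
        else:
            # insertion: attach the target token to the left source neighbour
            k = i - 1 if i > 0 else 0
            aligned[k].insert(0, tgt[j - 1])
            j -= 1
    return aligned


def adjust_source(src, counts):
    """Repeat or drop source tokens so the length matches the target."""
    adjusted = []
    for token, count in zip(src, counts):
        adjusted.extend([token] * count)
    return adjusted


src = ["a", "q", "b", "c", "d", "e"]   # toy ASR output
tgt = ["a", "b", "c", "x", "d", "f"]   # toy ground truth
alignment = edit_alignment(src, tgt)
counts = [len(tokens) for tokens in alignment]  # length-predictor targets
print(counts)                      # [1, 0, 1, 2, 1, 1]
print(adjust_source(src, counts))  # ['a', 'b', 'c', 'c', 'd', 'e']
```

In this toy example, "q" is deleted (count 0), "c" absorbs the inserted "x" (count 2), and "e" is substituted by "f" (count 1). During training the counts come from the alignment; at inference they would come from the length predictor, and the adjusted source would serve as the input for parallel decoding.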

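Experimental Results

Research Questions

RQ1: Can an edit-alignment-guided non-autoregressive error correction model match the WER reduction of AR models while substantially reducing latency?
RQ2: Does exploiting insertion/deletion/substitution patterns through edit alignment improve correction quality over existing NAR approaches to ASR error correction?
RQ3: How does pre-training on pseudo correction data affect performance when fine-tuning on limited ASR correction data?
RQ4: What is the latency-accuracy trade-off of FastCorrect on the public dataset and on the industrial-scale Chinese ASR dataset?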

Key Findings

Model	AISHELL-1 Test WER (%)	AISHELL-1 Test WERR (%)	AISHELL-1 Dev WER (%)	AISHELL-1 Dev WERR (%)	AISHELL-1 Latency, GPU (ms/sent)	AISHELL-1 Latency, CPU4 (ms/sent)	AISHELL-1 Latency, CPU (ms/sent)	Internal Test WER (%)	Internal Test WERR (%)	Internal Latency, GPU (ms/sent)	Internal Latency, CPU4 (ms/sent)
No correction	4.83	-	-	-	-	-	-	11.17	-	-	-
AR model	4.08	15.53	3.80	-	149.5 (1x)	248.9 (1x)	-	-	-	-	-
LevT (MIter=1)	4.73	2.07	4.37	-	54.0 (2.8x)	82.7 (3.0x)	-	-	-	-	-
LevT (MIter=3)	4.74	1.86	4.38	-	60.5 (2.5x)	83.9 (3.0x)	-	-	-	-	-
FELIX	4.63	4.14	4.26	-	23.8 (6.3x)	41.7 (6.0x)	-	-	-	-	-
FastCorrect	4.16	13.87	3.89	-	21.2 (7.1x)	40.8 (6.1x)	82.3 (6.5x)	-	-	-	-
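
The WERR and speedup figures in the table can be reproduced from the raw WER and latency values. The short Python check below assumes that WERR is the relative WER reduction with respect to the uncorrected ASR output and that the speedup is the ratio of AR latency to model latency; the helper names `werr` and `speedup` are introduced here only for illustration.

```python
def werr(baseline_wer, corrected_wer):
    """Relative WER reduction in percent (assumed definition)."""
    return 100.0 * (baseline_wer - corrected_wer) / baseline_wer


def speedup(ar_latency_ms, model_latency_ms):
    """Latency ratio relative to the AR correction model."""
    return ar_latency_ms / model_latency_ms


# AISHELL-1 test set, values taken from the table above
print(round(werr(4.83, 4.08), 2))      # AR model:    15.53
print(round(werr(4.83, 4.16), 2))      # FastCorrect: 13.87
print(round(speedup(149.5, 21.2), 1))  # FastCorrect on GPU:  7.1
print(round(speedup(248.9, 40.8), 1))  # FastCorrect on CPU4: 6.1
```

Under these assumptions the computed values agree with the reported ones, which suggests the parenthesized speedups are all measured against the AR model on the same hardware.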

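FastCorrect speeds up inference by 6-9x over AR correction on AISHELL-1 and the internal dataset.
FastCorrect achieves an 8-14% WER reduction (WERR) relative to the uncorrected ASR output, close to the AR model's performance on both datasets.
FastCorrect substantially outperforms LevT and FELIX in WERR and correction quality.
Ablations show that edit alignment (the length predictor) and pre-training are both important for achieving a strong WER reduction.
Compared with AR models that use a deep encoder and a shallow decoder, FastCorrect delivers comparable or better accuracy at much lower latency.
Table 1 shows that FastCorrect reaches a WER of 4.16 on AISHELL-1 (Test) and 10.27 on Internal (Test) with a marked latency reduction. Table 4 reports a higher P_right and similar P_edit/R_edit compared with the baselines.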


This review was generated by AI and checked by a human editor.