QUICK REVIEW

[Paper Review] Domain-Incremental Continual Learning for Robust and Efficient Keyword Spotting in Resource Constrained Systems

Prakash Dhungana, Sayed Ahmad Salehi|arXiv (Cornell University)|Jan 22, 2026

Speech Recognition and Synthesis · Citations: 0

One-Line Summary

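This paper proposes an on-device, domain-incremental continual learning framework for keyword spotting that stays robust to noise on resource-constrained devices. The framework combines a dual-input MFCC+LogMel CNN, wavelet and spectral denoising, a rehearsal buffer, and prototype-based selection of effective samples.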
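The rehearsal buffer mentioned in the summary can be illustrated with a small sketch. The review does not say how the buffer is filled or how old and new samples are mixed. Reservoir sampling, the `RehearsalBuffer` class, `make_update_batch`, and the default 50/50 replay ratio are therefore hypothetical choices for illustration, not the authors' design.

```python
import random
from typing import Generic, List, Tuple, TypeVar

T = TypeVar("T")
Sample = Tuple[List[float], int]  # (feature vector, pseudo-label); hypothetical layout


class RehearsalBuffer(Generic[T]):
    """Hypothetical fixed-size replay memory filled by reservoir sampling."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[T] = []
        self.seen = 0  # number of items offered to the buffer so far
        self.rng = random.Random(seed)

    def add(self, item: T) -> None:
        # Reservoir sampling: once the buffer is full, every item offered so
        # far is retained with equal probability capacity / seen.
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k: int) -> List[T]:
        # Draw without replacement; return fewer items if the buffer is small.
        return self.rng.sample(self.items, min(k, len(self.items)))


def make_update_batch(new_samples: List[Sample],
                      buffer: RehearsalBuffer[Sample],
                      replay_ratio: float = 0.5) -> List[Sample]:
    """Mix pseudo-labeled new samples with replayed old ones (assumed ratio)."""
    if not 0.0 <= replay_ratio < 1.0:
        raise ValueError("replay_ratio must be in [0, 1)")
    n_replay = round(len(new_samples) * replay_ratio / (1.0 - replay_ratio))
    return list(new_samples) + buffer.sample(n_replay)


# Usage: fill the buffer with 1000 past samples, then build one update batch.
buf: RehearsalBuffer[Sample] = RehearsalBuffer(capacity=100)
for i in range(1000):
    buf.add(([float(i)], i % 10))
batch = make_update_batch([([0.0], 3)] * 8, buf)
print(len(buf.items), len(batch))  # 100 16
```

Reservoir sampling is a common choice for fixed-memory replay because it does not need to know the stream length in advance. A class-balanced buffer would be an equally plausible alternative.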

ABSTRACT

Keyword Spotting (KWS) systems with small-footprint models deployed on edge devices face significant accuracy and robustness challenges due to domain shifts caused by varying noise and recording conditions. To address this, we propose a comprehensive continual learning framework designed to adapt to new domains while maintaining computational efficiency. The proposed pipeline integrates a dual-input Convolutional Neural Network that uses both Mel Frequency Cepstral Coefficients (MFCC) and Mel-spectrogram features. It is supported by a multi-stage denoising process based on the discrete wavelet transform and spectral subtraction, together with model and prototype update blocks. Unlike prior methods that restrict updates to specific layers, our approach updates the complete quantized model, which the compact model architecture makes possible. A subset of input samples is selected at runtime using class prototypes and confidence-driven filtering. These samples are then pseudo-labeled and combined with a rehearsal buffer for incremental model retraining. Experimental results on a noisy test dataset demonstrate the framework's effectiveness: it achieves 99.63% accuracy on clean data and maintains robust performance (above 94% accuracy) across diverse noisy environments, even at -10 dB Signal-to-Noise Ratio. This work confirms that integrating efficient denoising with prototype-based continual learning enables KWS models to operate autonomously and robustly in resource-constrained, dynamic environments.
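The abstract names spectral subtraction as one denoising stage but does not give its formulation. The sketch below shows textbook power spectral subtraction with an over-subtraction factor `alpha` and a spectral floor `beta`. Several details are assumptions: the noise estimate taken from the first few frames, the parameter values, and the `spectral_subtract` name. The paper may instead apply masking directly to its MFCC/LogMel feature maps.

```python
from typing import List

Spectrogram = List[List[float]]  # frames x frequency bins, linear power values


def spectral_subtract(power: Spectrogram, noise_frames: int = 5,
                      alpha: float = 2.0, beta: float = 0.01) -> Spectrogram:
    """Power spectral subtraction with over-subtraction and a spectral floor.

    Assumption: the first `noise_frames` frames contain noise only, so their
    per-bin mean is used as the noise power estimate.
    """
    if not power or noise_frames < 1:
        raise ValueError("need at least one frame for the noise estimate")
    n = min(noise_frames, len(power))
    n_bins = len(power[0])
    noise = [sum(frame[k] for frame in power[:n]) / n for k in range(n_bins)]
    cleaned: Spectrogram = []
    for frame in power:
        out = []
        for k, p in enumerate(frame):
            s = p - alpha * noise[k]   # over-subtract the noise estimate
            floor = beta * noise[k]    # small residual limits "musical noise"
            out.append(s if s > floor else floor)
        cleaned.append(out)
    return cleaned


# Usage: five noise-only frames followed by one frame with energy in bin 0.
spec = [[1.0, 1.0]] * 5 + [[10.0, 1.0]]
cleaned_spec = spectral_subtract(spec)
print(cleaned_spec[-1])  # [8.0, 0.01]
```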

Motivation and Objectives

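Address domain shift in keyword spotting on resource-limited edge devices.
Develop an on-device continual learning framework that can adapt to new noise conditions.
Integrate robust feature extraction (wavelet and spectral denoising) with a dual-feature CNN (MFCC + LogMel).
Use class prototypes for efficient sample selection and pseudo-labeling so that the model can be retrained incrementally.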
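Prototype-based sample selection and pseudo-labeling can be sketched as below. The review confirms class prototypes, confidence-driven filtering, MAE distance to prototypes, and pseudo-labeling. The following are assumptions: mean-embedding prototypes, the rule that the classifier and the nearest prototype must agree, both thresholds, and the exponential-moving-average update. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def mae(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between two embedding vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def compute_prototypes(embeddings: List[Sequence[float]],
                       labels: List[int]) -> Dict[int, List[float]]:
    """Class prototype = mean embedding of that class (assumed definition)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for e, y in zip(embeddings, labels):
        if y not in sums:
            sums[y] = [0.0] * len(e)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], e)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def select_and_pseudo_label(embedding: Sequence[float],
                            probs: Sequence[float],
                            prototypes: Dict[int, List[float]],
                            conf_threshold: float = 0.9,
                            dist_threshold: float = 0.5) -> Optional[int]:
    """Return a pseudo-label if the sample passes both filters, else None.

    Hypothetical rule: accept only if (1) the classifier is confident,
    (2) the nearest prototype by MAE agrees with the predicted class, and
    (3) that MAE distance is at most `dist_threshold`.
    """
    pred = max(range(len(probs)), key=lambda c: probs[c])
    if probs[pred] < conf_threshold:
        return None
    nearest = min(prototypes, key=lambda c: mae(embedding, prototypes[c]))
    if nearest != pred or mae(embedding, prototypes[nearest]) > dist_threshold:
        return None
    return pred


def update_prototype(proto: List[float], embedding: Sequence[float],
                     momentum: float = 0.9) -> List[float]:
    """Exponential moving average update of a prototype (assumed rule)."""
    return [momentum * p + (1.0 - momentum) * e for p, e in zip(proto, embedding)]


# Usage with toy 2-D embeddings for classes 0 and 1.
protos = compute_prototypes([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0]], [0, 0, 1])
print(select_and_pseudo_label([0.1, 0.1], [0.95, 0.05], protos))  # 0 (accepted)
print(select_and_pseudo_label([0.1, 0.1], [0.60, 0.40], protos))  # None (low confidence)
print(select_and_pseudo_label([0.9, 0.9], [0.95, 0.05], protos))  # None (prototype disagrees)
```

Requiring agreement between the classifier and the prototype is one way to keep noisy pseudo-labels out of the retraining batch. A distance-only or confidence-only filter would be cheaper but less conservative.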

Proposed Method

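Uses a compact CNNClassifier with either a single-input MFCC path or a dual-input MFCC+LogMel path.
Denoises raw audio frames with Haar wavelet denoising and VisuShrink-based thresholding.
Performs spectral denoising of the MFCC/LogMel feature maps using temporal and spectral masking.
Quantizes the model to INT8 and runs continual learning with a rehearsal buffer and pseudo-labeled effective samples.
Maintains and updates class prototypes in the latent space, enabling prototype-based selection of effective samples.
Identifies effective samples by their MAE distance to the prototypes and triggers on-device retraining per mini-batch.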
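The raw-audio denoising step (Haar wavelet with VisuShrink thresholding) can be sketched in pure Python. VisuShrink uses the universal threshold λ = σ·sqrt(2 ln N). Here σ is estimated from the finest detail coefficients as median(|d|)/0.6745. The following are assumptions: the number of decomposition levels, soft rather than hard thresholding, zero-padding, and the function names. The paper's frame length and settings are not given in this review.

```python
import math
import random
import statistics
from typing import List, Tuple


def haar_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    r = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / r for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / r for i in range(0, len(x), 2)]
    return approx, detail


def inverse_haar_step(approx: List[float], detail: List[float]) -> List[float]:
    """Invert one Haar level, interleaving the reconstructed sample pairs."""
    r = math.sqrt(2.0)
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / r, (a - d) / r])
    return out


def soft(c: float, lam: float) -> float:
    """Soft thresholding: shrink a coefficient towards zero by lam."""
    return math.copysign(max(abs(c) - lam, 0.0), c)


def visushrink_denoise(signal: List[float], levels: int = 3) -> List[float]:
    n = len(signal)
    block = 2 ** levels
    padded = list(signal) + [0.0] * ((-n) % block)  # zero-pad to a multiple of 2**levels
    approx, details = padded, []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)  # details[0] is the finest scale
    # Noise level from the finest details (MAD / 0.6745), then the universal threshold.
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    lam = sigma * math.sqrt(2.0 * math.log(len(padded)))
    for d in reversed(details):  # reconstruct from the coarsest level to the finest
        approx = inverse_haar_step(approx, [soft(c, lam) for c in d])
    return approx[:n]


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Usage: a noisy step signal aligned to the Haar grid.
rng = random.Random(0)
clean = [0.0] * 128 + [2.0] * 128
noisy = [c + rng.gauss(0.0, 0.3) for c in clean]
denoised = visushrink_denoise(noisy, levels=3)
print(mse(noisy, clean), mse(denoised, clean))  # the second value should be clearly smaller
```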

Experimental Results

Research Questions

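RQ1: How can on-device continual learning adapt a KWS model to unseen noise domains without retraining from scratch?
RQ2: Does integrating dual-feature input (MFCC and LogMel) with denoising improve robustness across different SNRs?
RQ3: Can prototype-based effective-sample selection and a rehearsal buffer maintain accuracy while updating the entire quantized model?
RQ4: How much performance gain does domain-incremental CL for KWS deliver on resource-constrained hardware?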

Key Findings

Model	Test Accuracy (%)
Single Input (MFCC)	97.45
Dual Input (MFCC + LogMel)	99.63
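The two accuracies in the table are easier to compare as error rates. The snippet below only re-expresses the reported numbers. The relative error reduction is derived arithmetic, not a figure stated by the authors.

```python
# Clean-data test accuracies (%) from the table.
accuracy = {"Single Input (MFCC)": 97.45, "Dual Input (MFCC + LogMel)": 99.63}

err_single = 100.0 - accuracy["Single Input (MFCC)"]          # error rate, %
err_dual = 100.0 - accuracy["Dual Input (MFCC + LogMel)"]     # error rate, %
gain_pp = accuracy["Dual Input (MFCC + LogMel)"] - accuracy["Single Input (MFCC)"]
rel_err_reduction = (err_single - err_dual) / err_single      # fraction of errors removed

print(f"error: {err_single:.2f}% -> {err_dual:.2f}%")                          # error: 2.55% -> 0.37%
print(f"gain: {gain_pp:.2f} pp, relative error reduction: {rel_err_reduction:.1%}")  # gain: 2.18 pp, relative error reduction: 85.5%
```

The 2.18-point accuracy gain therefore removes roughly 85% of the single-input model's errors on clean data.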

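The dual-input MFCC+LogMel model reaches 99.63% test accuracy on clean data.
The framework stays robust across noisy environments, keeping accuracy above 94% even at -10 dB SNR.
On-device continual learning with a rehearsal buffer and effective samples achieves competitive accuracy at noise levels from -10 to 10 dB.
Compared with existing on-device learning frameworks, the proposed method achieves high accuracy with lower memory and compute load on constrained hardware.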
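One concrete source of the lower memory footprint is the INT8 quantization step described under the proposed method. The review does not specify the quantization scheme. The sketch below therefore assumes symmetric per-tensor quantization with a single float32 scale. `quantize_int8` and `dequantize_int8` are hypothetical names, and the five weights are toy values.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization (an assumed, common scheme)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric range [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map INT8 values back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 0.8, -0.03]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
fp32_bytes = 4 * len(weights)      # 4 bytes per float32 weight
int8_bytes = 1 * len(weights) + 4  # 1 byte per int8 weight plus one float32 scale
print(q, fp32_bytes, int8_bytes)   # [50, -127, 0, 80, -3] 20 9
```

Each weight shrinks from 4 bytes to 1, so storage approaches a 4x reduction as the tensor grows. The cost is a rounding error of at most half the scale per weight.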

This review was generated by AI and checked by a human editor.