QUICK REVIEW

[Paper Review] Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift

Jisheng Bai, Mou Wang|arXiv (Cornell University)|Feb 5, 2024

Speech and Audio Processing|Cited by 5

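One-line summary

This paper introduces the ICME 2024 Grand Challenge on semi-supervised ASC under domain shift. It presents the CAS 2023 dataset, recorded in 22 Chinese cities; a development set with 4.8 h of labeled and 19.3 h of unlabeled data; and an SE-Trans baseline model. With semi-supervised fine-tuning, the baseline reaches 59% macro-average accuracy on evaluation data containing unseen cities.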

ABSTRACT

Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Although this task, in recent years, has achieved substantial progress in device generalization, the challenge of domain shift between different geographical regions, involving discrepancies such as time, space, culture, and language, remains insufficiently explored at present. In addition, considering the abundance of unlabeled acoustic scene data in the real world, it is important to study the possible ways to utilize these unlabelled data. Therefore, we introduce the task Semi-supervised Acoustic Scene Classification under Domain Shift in the ICME 2024 Grand Challenge. We encourage participants to innovate with semi-supervised learning techniques, aiming to develop more robust ASC models under domain shift.

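Motivation and objectives

Highlight the challenge of domain shift in acoustic scene classification across geographical regions and recording devices.
Provide the large-scale CAS dataset of Chinese acoustic scenes to promote research on semi-supervised ASC under domain shift.
Propose a baseline semi-supervised framework for benchmarking ASC methods under domain shift.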

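Proposed method

Start from an ASC model pretrained on the TAU UAS 2020 Mobile dataset and fine-tune it on the labeled CAS 2023 development data.
Generate pseudo-labels for the unlabeled data and fine-tune SE-Trans further on the pseudo-labeled data.
Adopt the SE-Trans architecture (Squeeze-and-Excitation blocks + Transformer encoder) with log-mel spectrogram input.
Extract log-mel features (64 bands) from audio resampled to 44.1 kHz and train with the Adam optimizer.
Evaluate with macro-average accuracy on an evaluation dataset that includes unseen cities.
The reported baseline results serve as a reference point for future research on semi-supervised ASC under domain shift.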
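
The pseudo-labeling step in the list above can be sketched as follows. This is a minimal illustration, not the baseline's actual code: the function name `pseudo_label`, the confidence threshold of 0.9, and the toy probabilities are all assumptions, and the summary does not say whether the baseline filters pseudo-labels by confidence at all.

```python
from typing import List, Sequence, Tuple


def pseudo_label(
    probabilities: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> List[Tuple[int, int]]:
    """Return (clip_index, class_index) pairs for confident predictions.

    probabilities[i] holds a model's class probabilities for unlabeled
    clip i. The confidence threshold is an illustrative assumption.
    """
    selected = []
    for clip_index, probs in enumerate(probabilities):
        # Predicted class = index of the largest probability.
        class_index = max(range(len(probs)), key=lambda k: probs[k])
        # Keep the clip only if the model is confident enough.
        if probs[class_index] >= threshold:
            selected.append((clip_index, class_index))
    return selected


# Toy predictions over three classes for four unlabeled clips.
toy_probs = [
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.05, 0.05, 0.90],
    [0.10, 0.85, 0.05],
]
pseudo_labels = pseudo_label(toy_probs, threshold=0.9)
print(pseudo_labels)
```

In a full pipeline, the selected clips and their predicted classes would be merged with the labeled data for a further round of fine-tuning. Setting the threshold to 0 keeps every prediction, which corresponds to pseudo-labeling without filtering.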

Fig. 1 : The domain shift problem in acoustic scene classification.

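Experimental results

Research questions

RQ1: How does domain shift caused by geography and urban context affect ASC performance?
RQ2: Can semi-supervised learning use unlabeled data to improve ASC under domain shift?
RQ3: What baseline performance can the SE-Trans architecture achieve on the CAS-derived challenge task?
RQ4: How well does the baseline generalize to unseen cities at evaluation time?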

Key findings

Scene	Accuracy
Bus	40%
Airport	55%
Metro	90%
Restaurant	69%
Shopping mall	51%
Public square	29%
Urban park	46%
Traffic street	65%
Construction site	68%
Bar	87%
Average	59%
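
Macro-average accuracy is commonly defined as the unweighted mean of per-class accuracies, so every scene counts equally regardless of how many clips it has. The sketch below assumes that definition; the function names and toy labels are illustrative and are not taken from the challenge's evaluation code.

```python
from collections import defaultdict
from typing import Dict, Sequence


def per_class_accuracy(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, float]:
    """Fraction of correctly classified clips for each true class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


def macro_average_accuracy(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> float:
    """Unweighted mean of the per-class accuracies."""
    scores = per_class_accuracy(y_true, y_pred)
    if not scores:
        raise ValueError("no labels given")
    return sum(scores.values()) / len(scores)


# Toy example: four Bus clips and two Metro clips.
y_true = ["Bus", "Bus", "Bus", "Bus", "Metro", "Metro"]
y_pred = ["Bus", "Bus", "Bus", "Metro", "Metro", "Bar"]

scores = per_class_accuracy(y_true, y_pred)
macro = macro_average_accuracy(y_true, y_pred)
print(scores, macro)
```

In this toy case Bus scores 3/4 and Metro 1/2, giving a macro average of 0.625, whereas plain accuracy over all six clips would be 4/6. The difference shows why the metric is preferred when classes are imbalanced.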

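The baseline reaches 59% macro-average accuracy on the evaluation data.
The Metro (90%) and Bar (87%) scenes score highest; Public square is the hardest at 29%.
The CAS 2023 dataset covers 10 scenes and contains more than 130 hours of data from 22 Chinese cities.
The development set contains 24 hours of data, 20% of it labeled, to support semi-supervised development.
The unlabeled data was pseudo-labeled and used to strengthen the fine-tuning of SE-Trans.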

Fig. 2 : The recording device of the CAS dataset. The left side of the figure shows the physical representation of the device, and the right side displays the relevant dimensional parameters of the device, measured in millimeters.


This review was generated by AI and checked by a human editor.