QUICK REVIEW

[Paper Review] MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos

Amir Zadeh, Rowan Zellers|arXiv (Cornell University)|Jun 20, 2016

Sentiment Analysis and Opinion Mining | References: 33 | Citations: 344

One-Line Summary

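This paper introduces MOSI, the first multimodal corpus of online videos annotated for sentiment intensity and subjectivity at the opinion level. It provides per-frame visual features and per-millisecond audio features, together with baselines and a multimodal fusion model.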

ABSTRACT

People are sharing their opinions, stories and reviews through online video sharing websites every day. Studying sentiment and subjectivity in these opinion videos is experiencing a growing attention from academia and industry. While sentiment analysis has been successful for text, it is an understudied research question for videos and multimedia content. The biggest setbacks for studies in this direction are lack of a proper dataset, methodology, baselines and statistical analysis of how information from different modality sources relate to each other. This paper introduces to the scientific community the first opinion-level annotated corpus of sentiment and subjectivity analysis in online videos called Multimodal Opinion-level Sentiment Intensity dataset (MOSI). The dataset is rigorously annotated with labels for subjectivity, sentiment intensity, per-frame and per-opinion annotated visual features, and per-milliseconds annotated audio features. Furthermore, we present baselines for future studies in this direction as well as a new multimodal fusion approach that jointly models spoken words and visual gestures.

Motivation and Objectives

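Address the lack of a suitable multimodal dataset for sentiment and subjectivity in online videos.
Provide an opinion-level annotated corpus with rich annotations across modalities (visual, audio, and spoken content).
Establish baselines for multimodal sentiment analysis and subjectivity detection on video data.
Propose a multimodal fusion approach that jointly models spoken words and visual gestures.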

Proposed Approach

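Introduce MOSI as the first opinion-level annotated corpus for sentiment and subjectivity in online videos.
Annotate the data with subjectivity labels, sentiment intensity, per-frame and per-opinion visual features, and per-millisecond audio features.
Provide baseline models for future research on multimodal sentiment analysis.
Propose a new multimodal fusion approach that jointly models spoken words and visual gestures.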
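The review does not describe how the per-frame visual stream and the per-millisecond audio stream are brought down to the opinion level, or how the modalities are combined. The following is a hypothetical illustration, not the authors' fusion model: it mean-pools each stream over an opinion segment's time span and concatenates the pooled vectors with a text vector (plain early fusion). `OpinionSegment`, `pool_stream`, and `early_fusion` are illustrative names, and the frame rate and feature sizes in the usage note are arbitrary assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class OpinionSegment:
    """Hypothetical opinion segment, given as a time span in seconds."""
    start_s: float
    end_s: float


def pool_stream(features: np.ndarray, timestamps_s: np.ndarray,
                seg: OpinionSegment) -> np.ndarray:
    """Mean-pool the rows of a (T, D) feature stream whose timestamps fall in [start_s, end_s)."""
    mask = (timestamps_s >= seg.start_s) & (timestamps_s < seg.end_s)
    if not mask.any():
        # No samples inside the segment: fall back to a zero vector of width D.
        return np.zeros(features.shape[1])
    return features[mask].mean(axis=0)


def early_fusion(text_vec: np.ndarray,
                 visual: np.ndarray, visual_t: np.ndarray,
                 audio: np.ndarray, audio_t: np.ndarray,
                 seg: OpinionSegment) -> np.ndarray:
    """Concatenate a text vector with segment-pooled visual and audio vectors."""
    v = pool_stream(visual, visual_t, seg)  # per-frame stream -> one vector
    a = pool_stream(audio, audio_t, seg)    # per-millisecond stream -> one vector
    return np.concatenate([text_vec, v, a])
```

For example, with a 30 fps visual stream (`visual_t = np.arange(n_frames) / 30.0`) and a 1 kHz audio stream (`audio_t = np.arange(n_samples) / 1000.0`), both streams are reduced to one vector per opinion, so segments of different lengths yield fused vectors of the same size. Averaging discards temporal dynamics; a model that jointly reasons over words and gestures would need a richer alignment than this.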

Experimental Results

Research Questions

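RQ1: How can sentiment intensity and subjectivity in online videos be effectively annotated and measured at the opinion level?
RQ2: What baselines are suitable for multimodal sentiment analysis of video data that combines textual, acoustic, and visual cues?
RQ3: Can a fusion model that jointly uses spoken words and visual gestures improve sentiment and subjectivity analysis over single-modality approaches?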
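This review does not state the labeling scale behind the sentiment-intensity annotations. Assuming, as MOSI is commonly described, that each opinion segment receives intensity scores in [-3, +3] from several annotators, the sketch that follows averages those scores and maps the average to a sign-based polarity and to a seven-class label. The function names, the range check, and the choice to treat 0 as positive are hypothetical conventions, not taken from the paper.

```python
import numpy as np

SCALE_MIN, SCALE_MAX = -3, 3  # assumed intensity range per annotator


def aggregate_intensity(scores) -> float:
    """Average one segment's per-annotator intensity scores."""
    arr = np.asarray(scores, dtype=float)
    if arr.size == 0:
        raise ValueError("no annotator scores")
    if np.any((arr < SCALE_MIN) | (arr > SCALE_MAX)):
        raise ValueError("score outside the assumed [-3, 3] range")
    return float(arr.mean())


def to_binary(intensity: float) -> str:
    """Sign-based polarity; 0 counts as positive (an arbitrary convention)."""
    return "positive" if intensity >= 0 else "negative"


def to_seven_class(intensity: float) -> int:
    """Round to the nearest integer in [-3, 3]; np.round sends exact halves to the even neighbor."""
    return int(np.clip(np.round(intensity), SCALE_MIN, SCALE_MAX))
```

Keeping the averaged score as a continuous target supports intensity regression, while the two bucketing functions derive coarser classification labels from the same annotation.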

Key Findings

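MOSI provides a rigorously annotated corpus for opinion-level sentiment and subjectivity in online videos.
The dataset includes per-frame visual features and per-millisecond audio features that support fine-grained analysis.
Baseline models are presented alongside a new multimodal fusion approach that jointly models spoken content and visual gestures.
The work lays a foundation for research on multimodal sentiment analysis of video data.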


This review was generated by AI and checked by a human editor.