QUICK REVIEW

[Paper Review] Mapping of Subjective Accounts into Interpreted Clusters (MOSAIC): Topic Modelling and LLM applied to Stroboscopic Phenomenology

Romy Beauté, David J. Schwartzman | arXiv.org | Feb 25, 2025

Advanced Text Analysis Techniques | Citations: 3

One-Sentence Summary

The paper introduces MOSAIC, an open-source NLP pipeline that combines BERTopic-based topic modelling with LLM-generated topic labels to analyse open-ended Dreamachine reports, uncovering latent experiential themes in stroboscopic phenomenology.

ABSTRACT

Stroboscopic light stimulation (SLS) on closed eyes typically induces simple visual hallucinations (VHs), characterised by vivid, geometric and colourful patterns. A dataset of 862 sentences, extracted from 422 open subjective reports, was recently compiled as part of the Dreamachine programme (Collective Act, 2022), an immersive multisensory experience that combines SLS and spatial sound in a collective setting. Although open reports extend the range of reportable phenomenology, their analysis presents significant challenges, particularly in systematically identifying patterns. To address this challenge, we implemented a data-driven approach leveraging Large Language Models and Topic Modelling to uncover and interpret latent experiential topics directly from the Dreamachine's text-based reports. Our analysis confirmed the presence of simple VHs typically documented in scientific studies of SLS, while also revealing experiences of altered states of consciousness and complex hallucinations. Building on these findings, our computational approach expands the systematic study of subjective experience by enabling data-driven analyses of open-ended phenomenological reports, capturing experiences not readily identified through standard questionnaires. By revealing rich and multifaceted aspects of experiences, our study broadens our understanding of stroboscopically-induced phenomena while highlighting the potential of Natural Language Processing and Large Language Models in the emerging field of computational (neuro)phenomenology. More generally, this approach provides a practically applicable methodology for uncovering subtle hidden patterns of subjective experience across diverse research domains.

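Motivation and Objectives

Promote data-driven analysis of open-ended subjective reports, going beyond predefined questionnaires.
Characterise the full spectrum of stroboscopic phenomenology from the Dreamachine dataset.
Develop and document an open-source NLP pipeline for phenomenological text analysis.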

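Proposed Method

Tokenise reports at the sentence level to create fine-grained input for embedding.
Encode the text into 768-dimensional embeddings using a pre-trained SBERT model.
Reduce dimensionality with UMAP in preparation for clustering.
Identify experiential topics with HDBSCAN, without setting a predefined number of topics.
Extract topic keywords with c-TF-IDF, then have Llama-3-8B-Instruct generate a label for each topic from those keywords and representative excerpts.
Provide an end-to-end open-source workflow from preprocessing to label generation.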
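The keyword and labelling steps above can be sketched with the standard library alone. This is a minimal illustration, not the MOSAIC implementation: the cluster assignments are written by hand in place of the SBERT, UMAP and HDBSCAN stages, the toy sentences are invented and do not come from the Dreamachine dataset, the stopword list and all function names are hypothetical, and the prompt wording is an assumption. The weighting follows the class-based TF-IDF form used by BERTopic, where a term's frequency within a cluster is multiplied by log(1 + A / f_t), with A the average number of words per cluster and f_t the term's frequency across all clusters.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list; a real run would use a fuller one.
STOPWORDS = {"i", "a", "and", "in", "of", "the", "to"}


def split_sentences(report: str) -> list[str]:
    """Naively split a report on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", report.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOPWORDS]


def c_tf_idf_keywords(clusters: dict[int, list[str]], top_n: int = 3) -> dict[int, list[str]]:
    """Rank each cluster's terms by class-based TF-IDF and keep the top_n."""
    if not clusters:
        return {}
    # Term counts per cluster (each cluster is treated as one document).
    counts = {cid: Counter(t for s in sents for t in tokenize(s))
              for cid, sents in clusters.items()}
    # Term counts across all clusters.
    corpus = Counter()
    for c in counts.values():
        corpus.update(c)
    avg_words = sum(corpus.values()) / len(counts)

    keywords = {}
    for cid, c in counts.items():
        n_words = sum(c.values())
        scores = {t: (f / n_words) * math.log(1 + avg_words / corpus[t])
                  for t, f in c.items()}
        # Sort by descending score; break ties alphabetically for determinism.
        keywords[cid] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return keywords


def build_label_prompt(keywords: list[str], excerpts: list[str]) -> str:
    """Assemble an illustrative labelling prompt (wording is an assumption)."""
    lines = ["Give a short, descriptive label for the topic below.",
             "Keywords: " + ", ".join(keywords),
             "Example sentences:"]
    lines += [f"- {e}" for e in excerpts]
    return "\n".join(lines)


# Invented toy data; cluster ids are assigned by hand for illustration.
report = "I saw bright geometric patterns. Geometric patterns in vivid colours."
clusters = {
    0: split_sentences(report),
    1: ["I felt calm and lost track of time.", "Time seemed to stop."],
}
keywords = c_tf_idf_keywords(clusters, top_n=2)
prompt = build_label_prompt(keywords[0], clusters[0])
print(keywords)
print(prompt)
```

The resulting prompt string would then be passed to an instruction-tuned model such as Llama-3-8B-Instruct; that call is left out here because it depends on the serving setup.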

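Experimental Results

Research Questions

RQ1: What latent experiential topics emerge from open-ended Dreamachine reports?
RQ2: How does topic structure differ between the High Sensory (HS) and Deep Listening (DL) Dreamachine conditions?
RQ3: Can automatic LLM-based labelling produce reliable, interpretable topic descriptions without researcher bias?
RQ4: Which coherence and clustering characteristics best capture the structure of subjective Dreamachine phenomenology?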

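Key Findings

The HS analysis yielded 13 experiential topics, automatically labelled by Llama 3, covering visual phenomena, altered states, and autobiographical experiences.
The DL analysis yielded 7 experiential topics, likewise labelled by Llama 3, including Dream Imagery and Dissociative Experiences.
Hierarchical clustering revealed three main HS phenomenological groups: visual experiences, altered states, and memory, spiritual and autobiographical themes.
Topic-model coherence scores were 0.56 for HS (14 topics) and 0.57 for DL (8 topics), indicating acceptable topic quality.
The MOSAIC pipeline is implemented as an open-source workflow with reproducible steps for preprocessing, embedding, clustering, and labelling.
The approach demonstrates the potential of data-driven analysis of diverse subjective experiences beyond standard questionnaires.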

This review was generated by AI and checked by a human editor.