QUICK REVIEW

[Paper Review] Mapping of Subjective Accounts into Interpreted Clusters (MOSAIC): Topic Modelling and LLM applied to Stroboscopic Phenomenology

Romy Beauté, David J. Schwartzman | arXiv.org | 2025-02-25

Advanced Text Analysis Techniques | Citations: 3

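One-line summary

This paper introduces MOSAIC, an open-source NLP pipeline that uses BERTopic-based topic modelling and an LLM to automatically label topics in free-response Dreamachine reports, revealing latent experiential topics in stroboscopic phenomenology.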

ABSTRACT

Stroboscopic light stimulation (SLS) on closed eyes typically induces simple visual hallucinations (VHs), characterised by vivid, geometric and colourful patterns. A dataset of 862 sentences, extracted from 422 open subjective reports, was recently compiled as part of the Dreamachine programme (Collective Act, 2022), an immersive multisensory experience that combines SLS and spatial sound in a collective setting. Although open reports extend the range of reportable phenomenology, their analysis presents significant challenges, particularly in systematically identifying patterns. To address this challenge, we implemented a data-driven approach leveraging Large Language Models and Topic Modelling to uncover and interpret latent experiential topics directly from the Dreamachine's text-based reports. Our analysis confirmed the presence of simple VHs typically documented in scientific studies of SLS, while also revealing experiences of altered states of consciousness and complex hallucinations. Building on these findings, our computational approach expands the systematic study of subjective experience by enabling data-driven analyses of open-ended phenomenological reports, capturing experiences not readily identified through standard questionnaires. By revealing rich and multifaceted aspects of experiences, our study broadens our understanding of stroboscopically-induced phenomena while highlighting the potential of Natural Language Processing and Large Language Models in the emerging field of computational (neuro)phenomenology. More generally, this approach provides a practically applicable methodology for uncovering subtle hidden patterns of subjective experience across diverse research domains.

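Motivation and objectives

Facilitate data-driven analysis of open-ended subjective reports, which cover a broader range of experience than fixed questionnaires.
Characterise the full spectrum of stroboscopic phenomena in the Dreamachine dataset.
Develop and document an open-source NLP pipeline for phenomenological text analysis.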

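Proposed method

Split the reports into sentences to create fine-grained inputs for embedding.
Encode the text into 768-dimensional embeddings with a pre-trained SBERT model.
Use UMAP to reduce dimensionality ahead of clustering.
Cluster with HDBSCAN, without a predefined number of topics, to identify experiential topics.
Automatically label topics with c-TF-IDF and Llama-3-8B-Instruct, based on key terms and excerpts.
Provide an end-to-end open-source workflow from preprocessing to label generation.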
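The sentence-splitting and labelling steps in the list above can be illustrated with a minimal, standard-library-only sketch. It is not the MOSAIC implementation. The regex sentence splitter is a naive stand-in for whatever tokenizer the authors use, the cluster assignments are hand-written placeholders for HDBSCAN output, the weighting follows the c-TF-IDF formulation documented for BERTopic (term frequency within a cluster times log(1 + A / f), with A the average word count per cluster and f the term's total frequency), and the prompt wording and all function names are hypothetical. No LLM is called; the sketch only builds the text that a model such as Llama-3-8B-Instruct might receive.

```python
import math
import re
from collections import Counter


def split_sentences(report):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]


def tokenize(text):
    """Lowercase word tokens; this sketch does no stop-word removal."""
    return re.findall(r"[a-z']+", text.lower())


def ctfidf_keywords(clusters, top_n=3):
    """Rank terms per cluster with c-TF-IDF: tf(t, c) * log(1 + A / f(t))."""
    # All sentences of a cluster are pooled into one bag of words.
    counts = {
        cid: Counter(tok for sent in sents for tok in tokenize(sent))
        for cid, sents in clusters.items()
    }
    # f(t): frequency of each term across all clusters.
    term_totals = Counter()
    for cluster_counts in counts.values():
        term_totals.update(cluster_counts)
    # A: average number of words per cluster.
    avg_words = sum(term_totals.values()) / len(counts)

    keywords = {}
    for cid, cluster_counts in counts.items():
        n_words = sum(cluster_counts.values())
        scores = {
            term: (freq / n_words) * math.log(1 + avg_words / term_totals[term])
            for term, freq in cluster_counts.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        keywords[cid] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return keywords


def build_label_prompt(keywords, excerpts, max_excerpts=2):
    """Assemble a hypothetical labelling prompt from key terms and excerpts."""
    lines = [
        "Give a short descriptive label for this topic.",
        "Keywords: " + ", ".join(keywords),
        "Example sentences:",
    ]
    lines += ["- " + sentence for sentence in excerpts[:max_excerpts]]
    return "\n".join(lines)


# Hand-assigned clusters standing in for HDBSCAN output (invented sentences).
clusters = {
    0: ["I saw colourful geometric patterns.", "Geometric patterns kept spinning."],
    1: ["I felt I was floating outside my body.", "My body felt weightless."],
}
keywords = ctfidf_keywords(clusters)
prompt = build_label_prompt(keywords[0], clusters[0])
```

Because the sketch skips stop-word removal, a function word such as "my" can rank among the top terms of a cluster; a real pipeline would normally filter these before ranking.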

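Experimental results

Research questions

RQ1: Which latent experiential topics emerge from the free-response Dreamachine reports?
RQ2: How does the topic structure differ between the High Sensory (HS) and Deep Listening (DL) Dreamachine conditions?
RQ3: Can automatic LLM-based labelling produce reliable, interpretable topic descriptors without researcher bias?
RQ4: Which coherence and cluster characteristics best capture the structure of subjective Dreamachine phenomenology?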

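Key findings

The HS analysis yielded 13 experiential topics, automatically labelled by Llama 3, spanning visual phenomena, altered states, and autobiographical experiences.
The DL analysis yielded 7 experiential topics, likewise labelled by Llama 3, including Dream Imagery and Dissociative Experiences.
Hierarchical clustering revealed three main HS phenomenological groups: visual experiences, altered states, and memory-spiritual/autobiographical themes.
Topic-model coherence scores were 0.56 (HS, 14 topics) and 0.57 (DL, 8 topics), suggesting adequate topic quality.
The MOSAIC pipeline is implemented as an open-source workflow with reproducible preprocessing, embedding, clustering, and labelling stages.
The approach demonstrates the potential for data-driven analysis of diverse subjective experiences beyond standard questionnaires.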

This review was generated by AI and reviewed by a human editor.