QUICK REVIEW

[Paper Review] PVminerLLM: Structured Extraction of Patient Voice from Patient-Generated Text using Large Language Models

Samah Fodeh, Linhai Ma|arXiv (Cornell University)|Mar 6, 2026
Machine Learning in Healthcare | Citations: 0
One-Sentence Summary

This paper introduces PVminer, a framework for structured extraction of patient voice from patient-generated text, and PVminerLLM, a family of supervised fine-tuned LLMs that achieve high F1 on Code, Sub-code, and Span extraction.

ABSTRACT

Motivation: Patient-generated text contains critical information about patients' lived experiences, social circumstances, and engagement in care, including factors that strongly influence adherence, care coordination, and health equity. However, these patient voice signals are rarely available in structured form, limiting their use in patient-centered outcomes research and clinical quality improvement. Reliable extraction of such information is therefore essential for understanding and addressing non-clinical drivers of health outcomes at scale.

Results: We introduce PVminer, a benchmark for structured extraction of patient voice, and propose PVminerLLM, a supervised fine-tuned large language model tailored to this task. Across multiple datasets and model sizes, PVminerLLM substantially outperforms prompt-based baselines, achieving up to 83.82% F1 for Code prediction, 80.74% F1 for Sub-code prediction, and 87.03% F1 for evidence Span extraction. Notably, strong performance is achieved even with smaller models, demonstrating that reliable patient voice extraction is feasible without extreme model scale. These results enable scalable analysis of social and experiential signals embedded in patient-generated text.

Availability and Implementation: Code, evaluation scripts, and trained LLMs will be released publicly. Annotated datasets will be made available upon request for research use.

Keywords: Large Language Models, Supervised Fine-Tuning, Medical Annotation, Patient-Generated Text, Clinical NLP

Motivation and Objectives

  • Formalize patient voice extraction as schema-constrained structured prediction from unstructured patient-generated text.
  • Develop a hierarchical Code/Sub-code schema with Span grounding for multi-label extraction.
  • Benchmark prompt-based approaches and demonstrate the benefits of supervised fine-tuning (PVminerLLM).
  • Provide datasets, annotation schema, and evaluation protocols to enable scalable analysis of patient voice signals.
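
To make the first objective concrete, the sketch below shows one way a schema-constrained prediction could be represented and checked. It is an illustration only: the code and sub-code names in `SCHEMA` are placeholders rather than the paper's actual labels, and `Extraction` and `is_schema_valid` are hypothetical helpers, not part of any released code.

```python
from dataclasses import dataclass

# Placeholder schema for illustration only. These names are assumptions,
# not the label set defined in the paper.
SCHEMA = {
    "SDOH": {"Housing", "Transportation"},
    "Partnership": {"Active Participation", "Expressing Appreciation"},
}


@dataclass(frozen=True)
class Extraction:
    code: str      # top-level label
    sub_code: str  # label nested under the code
    span: str      # evidence text copied verbatim from the message


def is_schema_valid(item: Extraction, message: str) -> bool:
    """Return True if the (code, sub_code) pair is allowed by SCHEMA
    and the span is a non-empty substring of the message."""
    allowed = SCHEMA.get(item.code, set())
    return item.sub_code in allowed and bool(item.span) and item.span in message


message = "I missed my appointment because I have no ride to the clinic."
good = Extraction("SDOH", "Transportation", "I have no ride to the clinic")
bad = Extraction("SDOH", "Active Participation", "no ride")  # wrong parent code
```

Here `is_schema_valid(good, message)` holds, while `bad` is rejected because its sub-code does not belong to the given code. Requiring the span to be a literal substring is one simple reading of "grounded in the text"; the paper may define grounding differently.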

Proposed Method

  • Define the PVminer task as schema-constrained structured extraction over a message, producing (Code, Sub-code, Span) tuples grounded in the text.
  • Develop an eight-code, 26-sub-code hierarchical labeling schema with Span grounding (Appendix B).
  • Benchmark prompt-based extraction across instruction-tuned LLMs in zero-shot and few-shot settings with engineered prompts (Prompt 2).
  • Introduce PVminerLLM by supervised fine-tuning of instruction-tuned LLMs with QLoRA adapters, using a masked token-level objective to enforce schema-valid outputs.
  • Train on an annotated corpus from multiple sources totaling 1,137 messages with multi-label, Span-grounded annotations.
  • Evaluate using Code, Sub-code, and Span metrics with multi-label precision/recall/F1 and a relaxed token-level Span match criterion.
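
The evaluation described in the last item can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation script: it assumes micro-averaging over per-message label sets, and it assumes the relaxed Span criterion is a token-overlap F1 at or above a threshold of 0.5. The function names and the threshold are hypothetical.

```python
from collections import Counter


def prf1(tp: int, n_pred: int, n_gold: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from counts, returning 0.0 on empty denominators."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def label_scores(pred_sets, gold_sets):
    """Micro-averaged multi-label P/R/F1 (assumed averaging scheme)."""
    tp = sum(len(p & g) for p, g in zip(pred_sets, gold_sets))
    n_pred = sum(len(p) for p in pred_sets)
    n_gold = sum(len(g) for g in gold_sets)
    return prf1(tp, n_pred, n_gold)


def token_overlap_f1(pred_span: str, gold_span: str) -> float:
    """Token-level F1 between two spans, using whitespace tokens."""
    pred_tokens = Counter(pred_span.lower().split())
    gold_tokens = Counter(gold_span.lower().split())
    overlap = sum((pred_tokens & gold_tokens).values())
    return prf1(overlap, sum(pred_tokens.values()), sum(gold_tokens.values()))[2]


def spans_match(pred_span: str, gold_span: str, threshold: float = 0.5) -> bool:
    """Relaxed match: the threshold value is an assumption, not from the paper."""
    return token_overlap_f1(pred_span, gold_span) >= threshold


# Two messages, each with a predicted and a gold set of Codes.
pred = [{"SDOH", "Partnership"}, {"SDOH"}]
gold = [{"SDOH"}, {"SDOH", "Partnership"}]
precision, recall, f1 = label_scores(pred, gold)
```

In this toy example two of three predicted labels are correct and two of three gold labels are recovered, so precision, recall, and F1 all equal 2/3. The same `label_scores` function could be applied to Sub-code sets, and `spans_match` could decide which predicted Spans count as true positives.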

Experimental Results

Research Questions

  • RQ1: Can a PVminer schema reliably extract structured patient voice signals from unstructured patient-generated text?
  • RQ2: Do prompt-based methods suffice, or is task-tuned supervision necessary for high-fidelity extraction under schema constraints?
  • RQ3: What are the performance gains of PVminerLLM over prompting baselines across Codes, Sub-codes, and Spans?
  • RQ4: How well do models generalize across diverse data sources and message directions (patient vs. provider)?

Key Findings

  • Engineered prompts improve zero-shot performance over baselines across Code, Sub-code, and Span tasks (e.g., Code: 0.0→47.09 for 8B; Span: 50.10→54.15 for 8B).
  • Supervised fine-tuning (PVminerLLM) yields substantial gains, e.g., Code F1 83.82%, Sub-code F1 80.74%, and Span F1 87.03% for the 70B model.
  • PVminerLLM outperforms prompt-based approaches across sizes, with large gains in SDOH, Shared Decision-Making, and Partnership domains.
  • Two-shot prompting reveals domain prevalence and variability; PVminerLLM mitigates under-identification of socio-economic and care coordination signals.
  • PVminerLLM achieves strong domain-level performance, e.g., PartnershipPatient 83.82% F1 and PartnershipProvider 84.21% F1 under two-shot, and higher scores after SFT.

This review was generated by AI and checked by a human editor.