QUICK REVIEW

[Paper Review] PVminerLLM: Structured Extraction of Patient Voice from Patient-Generated Text using Large Language Models

Samah Fodeh, Linhai Ma|arXiv (Cornell University)|2026. 03. 06.

Machine Learning in Healthcare | Citations: 0

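One-Line Summary

This paper introduces PVminer, a benchmark for structured extraction of patient voice from patient-generated text, and PVminerLLM, a supervised fine-tuned LLM that achieves high F1 on Code, Sub-code, and Span extraction.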

ABSTRACT

Motivation: Patient-generated text contains critical information about patients' lived experiences, social circumstances, and engagement in care, including factors that strongly influence adherence, care coordination, and health equity. However, these patient voice signals are rarely available in structured form, limiting their use in patient-centered outcomes research and clinical quality improvement. Reliable extraction of such information is therefore essential for understanding and addressing non-clinical drivers of health outcomes at scale. Results: We introduce PVminer, a benchmark for structured extraction of patient voice, and propose PVminerLLM, a supervised fine-tuned large language model tailored to this task. Across multiple datasets and model sizes, PVminerLLM substantially outperforms prompt-based baselines, achieving up to 83.82% F1 for Code prediction, 80.74% F1 for Sub-code prediction, and 87.03% F1 for evidence Span extraction. Notably, strong performance is achieved even with smaller models, demonstrating that reliable patient voice extraction is feasible without extreme model scale. These results enable scalable analysis of social and experiential signals embedded in patient-generated text. Availability and Implementation: Code, evaluation scripts, and trained LLMs will be released publicly. Annotated datasets will be made available upon request for research use. Keywords: Large Language Models, Supervised Fine-Tuning, Medical Annotation, Patient-Generated Text, Clinical NLP

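Research Motivation and Objectives

Formalize patient voice extraction from unstructured patient-generated text as schema-constrained structured prediction.
Develop a hierarchical Code/Sub-code schema with span grounding for multi-label extraction.
Benchmark prompt-based approaches and demonstrate the benefit of supervised fine-tuning (PVminerLLM).
Provide a dataset, annotation schema, and evaluation protocol that enable scalable analysis of patient voice signals.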
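
To make "schema-constrained structured prediction" concrete, the sketch below checks that each predicted (Code, Sub-code, Span) tuple uses a valid Code, a Sub-code that belongs to that Code, and a Span that appears verbatim in the message. It is only an illustration: the `SCHEMA` contents, the `Annotation` class, and the `validate` function are hypothetical names, and the sub-code labels are placeholders because this review does not list the actual 8 codes and 26 sub-codes.

```python
from dataclasses import dataclass

# Hypothetical mini-schema. The real PVminer schema (8 codes, 26 sub-codes)
# is not enumerated in this review, so these labels are placeholders.
SCHEMA: dict[str, set[str]] = {
    "SDOH": {"Transportation", "Housing"},
    "Partnership": {"AskingQuestions", "ExpressingPreferences"},
}


@dataclass(frozen=True)
class Annotation:
    code: str
    sub_code: str
    span: str  # evidence text expected to appear verbatim in the message


def validate(message: str, annotations: list[Annotation]) -> list[str]:
    """Return a list of schema or grounding violations (empty if all valid)."""
    errors: list[str] = []
    for i, ann in enumerate(annotations):
        if ann.code not in SCHEMA:
            errors.append(f"{i}: unknown code {ann.code!r}")
        elif ann.sub_code not in SCHEMA[ann.code]:
            errors.append(f"{i}: sub-code {ann.sub_code!r} not under {ann.code!r}")
        # Grounding check: the span must be a non-empty substring of the message.
        if not ann.span or ann.span not in message:
            errors.append(f"{i}: span is not grounded in the message")
    return errors


message = "I missed my appointment because I have no ride to the clinic."
good = Annotation("SDOH", "Transportation", "no ride to the clinic")
bad = Annotation("SDOH", "AskingQuestions", "no car")
print(validate(message, [good]))
print(validate(message, [bad]))
```

A check of this kind can be run on model output regardless of whether it came from prompting or fine-tuning, which is one way to separate format failures from labeling errors.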

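Proposed Method

Define the PVminer task as schema-constrained structured extraction, in which the (Code, Sub-code, Span) tuples produced for a message are grounded in its text.
Develop a hierarchical labeling schema of 8 codes and 26 sub-codes that includes span grounding (Appendix B).
Benchmark prompt-based extraction with instruction-tuned LLMs in zero-shot and few-shot settings using engineered prompts (Prompt 2).
Introduce PVminerLLM, trained by adapter-based (QLoRA) supervised fine-tuning with a masked token-level objective that enforces schema-valid outputs.
Train on multi-label, span-grounded annotations for 1,137 messages from an annotated corpus drawn from multiple sources.
Evaluate Code, Sub-code, and Span predictions with multi-label precision/recall/F1 and a relaxed token-level span matching criterion.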
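
The evaluation step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation script: it assumes micro-averaging for the multi-label scores and interprets "relaxed token-level span matching" as whitespace-token overlap F1 between the gold and predicted spans. The function names `prf`, `multilabel_micro_prf`, and `relaxed_span_f1` are hypothetical.

```python
from collections import Counter


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def multilabel_micro_prf(
    gold: list[set[str]], pred: list[set[str]]
) -> tuple[float, float, float]:
    """Micro-averaged scores over per-message label sets (assumed averaging)."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    return prf(tp, fp, fn)


def relaxed_span_f1(gold_span: str, pred_span: str) -> float:
    """Token-overlap F1 between two spans (assumed relaxed criterion)."""
    g = Counter(gold_span.lower().split())
    p = Counter(pred_span.lower().split())
    overlap = sum((g & p).values())  # multiset intersection of tokens
    return prf(overlap, sum(p.values()) - overlap, sum(g.values()) - overlap)[2]


# Two messages with placeholder labels A-D.
gold_labels = [{"A", "B"}, {"C"}]
pred_labels = [{"A"}, {"C", "D"}]
print(multilabel_micro_prf(gold_labels, pred_labels))
print(relaxed_span_f1("no ride to the clinic", "no ride"))
```

Under this relaxed criterion a partially overlapping span earns partial credit, whereas exact matching would score it as a miss.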

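Experimental Results

Research Questions

RQ1: Can structured patient voice signals be reliably extracted from unstructured patient-generated text using the PVminer schema?
RQ2: Are prompt-based methods sufficient, or is task-tuned supervision needed for high-fidelity extraction under schema constraints?
RQ3: What performance gains does PVminerLLM provide over prompt-based baselines across Codes, Sub-codes, and Spans?
RQ4: How well does the model generalize across data sources and message directions (patient vs. provider)?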

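Key Results

Engineered prompts improve zero-shot performance on the Code, Sub-code, and Span tasks (e.g., Code: 0.0→47.09 for 8B; Span: 50.10→54.15 for 8B).
Supervised fine-tuning (PVminerLLM) yields substantial gains, reaching 83.82% Code F1, 80.74% Sub-code F1, and 87.03% Span F1 (70B model).
PVminerLLM outperforms prompt-based approaches at every model size, with large gains in the SDOH, Shared Decision-Making, and Partnership domains.
Two-shot prompting reveals domain bias and variability, whereas PVminerLLM mitigates the under-identification of socioeconomic and care coordination signals.
PVminerLLM achieves strong domain-level performance; for example, PartnershipPatient is reported at 83.82% F1 and PartnershipProvider at 84.21% F1 in the two-shot setting, with higher scores after SFT.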


This review was generated by AI and reviewed by a human editor.