QUICK REVIEW

[Paper Review] Identifying Reasons for Contraceptive Switching from Real-World Data Using Large Language Models

Brenda Y. Miao, Christopher Y. K. Williams | arXiv (Cornell University) | Feb 6, 2024

Computational and Text Analysis Methods | Cited by 5

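One-Sentence Summary

This study uses GPT-4 to extract reasons for contraceptive switching from UCSF clinical notes, outperforming BERT-based baselines and reaching 91.4% accuracy in human evaluation.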

ABSTRACT

Prescription contraceptives play a critical role in supporting women's reproductive health. With nearly 50 million women in the United States using contraceptives, understanding the factors that drive contraceptive selection and switching is of significant interest. However, many factors related to medication switching are often only captured in unstructured clinical notes and can be difficult to extract. Here, we evaluate the zero-shot abilities of a recently developed large language model, GPT-4 (via HIPAA-compliant Microsoft Azure API), to identify reasons for switching between classes of contraceptives from the UCSF Information Commons clinical notes dataset. We demonstrate that GPT-4 can accurately extract reasons for contraceptive switching, outperforming baseline BERT-based models with microF1 scores of 0.849 and 0.881 for contraceptive start and stop extraction, respectively. Human evaluation of GPT-4-extracted reasons for switching showed 91.4% accuracy, with minimal hallucinations. Using extracted reasons, we identified patient preference, adverse events, and insurance as key reasons for switching using unsupervised topic modeling approaches. Notably, we also showed using our approach that "weight gain/mood change" and "insurance coverage" are disproportionately found as reasons for contraceptive switching in specific demographic populations. Our code and supplemental data are available at https://github.com/BMiao10/contraceptive-switching.
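Zero-shot extraction means the model receives only an instruction and the note, with no labelled examples. The paper's actual prompt and output schema are not reproduced in this summary (the authors' code is in the linked repository), so the following is a hypothetical Python sketch of what such a prompt and a reply parser could look like. `PROMPT_TEMPLATE`, `build_prompt`, `parse_response`, and the JSON keys `started`, `stopped`, and `reason` are illustrative assumptions, and the call to the Azure-hosted GPT-4 endpoint is deliberately left out.

```python
import json
from typing import Dict, List, Optional

# Hypothetical zero-shot prompt: an instruction plus the note, no labelled examples.
# Doubled braces become literal braces after str.format; {note} is the placeholder.
PROMPT_TEMPLATE = (
    "Read the clinical note below. List any contraceptives that were started "
    "or stopped, and the reason the note gives for the change. "
    'Reply with JSON only, in the form {{"started": [], "stopped": [], "reason": null}}, '
    "using empty lists or null when the note does not say.\n\n"
    "Note:\n{note}"
)


def build_prompt(note: str) -> str:
    """Fill the hypothetical template with one clinical note."""
    return PROMPT_TEMPLATE.format(note=note)


def parse_response(text: str) -> Dict[str, object]:
    """Parse the model's JSON reply into a fixed shape, tolerating missing keys."""
    data = json.loads(text)
    started: List[str] = list(data.get("started") or [])
    stopped: List[str] = list(data.get("stopped") or [])
    reason: Optional[str] = data.get("reason")
    return {"started": started, "stopped": stopped, "reason": reason}


# The model call is omitted; a reply string is supplied by hand instead.
prompt = build_prompt("Pt stopped OCPs due to mood changes; IUD placed today.")
reply = '{"started": ["IUD"], "stopped": ["oral contraceptive"], "reason": "mood changes"}'
parsed = parse_response(reply)
```

Asking for a fixed JSON shape makes each reply machine-checkable, which is what allows scoring against annotations; whether the authors constrained the output this way is not stated in this summary.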

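Motivation and Objectives

Improve understanding of the factors that drive contraceptive selection and switching.
Evaluate GPT-4's zero-shot ability on unstructured clinical notes.
Compare GPT-4's extraction performance with baseline BERT-based models.
Use unsupervised analysis to identify the key drivers of switching and their demographic patterns.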

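Proposed Method

Apply GPT-4 (via the HIPAA-compliant Microsoft Azure API) to clinical notes from the UCSF Information Commons.
In a zero-shot setting, extract contraceptive starts and stops and the reasons for switching.
Quantitatively compare GPT-4's extractions against baseline BERT-based models using micro-F1 scores.
Manually evaluate the GPT-4-extracted reasons for accuracy and hallucinations (fabricated information).
Apply unsupervised topic modeling to identify the main drivers of switching and their demographic associations.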
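Micro-F1 pools true positives (TP), false positives (FP), and false negatives (FN) across all notes before computing precision P = TP / (TP + FP), recall R = TP / (TP + FN), and F1 = 2PR / (P + R), so notes with more mentions carry more weight. The summary does not say at what granularity the paper scored extractions; the Python sketch below assumes, hypothetically, that each note is annotated with a set of contraceptive classes started (or stopped) and that a prediction counts only on an exact label match. The function name `micro_f1` and the toy labels are illustrative.

```python
from typing import Dict, Set


def micro_f1(gold: Dict[str, Set[str]], pred: Dict[str, Set[str]]) -> float:
    """Micro-averaged F1 over per-note label sets (exact-match scoring assumed)."""
    tp = fp = fn = 0
    # Include notes that appear in only one of the two dicts.
    for note_id in gold.keys() | pred.keys():
        g = gold.get(note_id, set())
        p = pred.get(note_id, set())
        tp += len(g & p)  # labels both annotated and predicted
        fp += len(p - g)  # predicted but not annotated
        fn += len(g - p)  # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, made-up labels: contraceptive classes started per note.
gold_start = {"note_1": {"IUD"}, "note_2": {"implant", "pill"}}
pred_start = {"note_1": {"IUD"}, "note_2": {"pill", "patch"}}
score = micro_f1(gold_start, pred_start)  # TP=2, FP=1, FN=1 -> P=R=F1=2/3
```

Because the paper reports start and stop extraction separately (0.849 and 0.881), this sketch would be called once on start labels and once on stop labels.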

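Experimental Results

Research Questions

RQ1: Can GPT-4 identify explicit reasons for contraceptive switching from unstructured clinical notes?
RQ2: How does GPT-4's zero-shot extraction compare with baseline BERT-based models on the start/stop extraction tasks?
RQ3: What are the main reasons for switching, and do they differ across demographic groups?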

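Key Findings

GPT-4 achieved micro-F1 scores of 0.849 (start) and 0.881 (stop) on the extraction tasks, outperforming the baseline BERT-based models.
Human evaluation of the GPT-4-extracted reasons showed 91.4% accuracy, with minimal hallucinations.
Unsupervised topic modeling of the extracted reasons identified patient preference, adverse events, and insurance as key drivers of switching.
"Weight gain/mood change" and "insurance coverage" appeared disproportionately as reasons for switching in specific demographic groups.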
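Saying a reason is "disproportionately found" in a group amounts to comparing its share of switches within that group to its share overall. The summary does not describe the statistical method the authors used, so the following is only an illustrative Python sketch of a simple representation ratio, not a significance test; the function name `representation_ratio`, the `(group, reason)` record format, and the group labels are hypothetical, and the data are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def representation_ratio(records: Iterable[Tuple[str, str]], reason: str) -> Dict[str, float]:
    """Share of switches citing `reason` within each group, divided by the overall share.

    A ratio above 1 means the reason is over-represented in that group.
    """
    records = list(records)
    if not records:
        raise ValueError("no records")
    overall_hits = sum(1 for _, r in records if r == reason)
    if overall_hits == 0:
        raise ValueError(f"reason {reason!r} never occurs")
    overall_share = overall_hits / len(records)
    totals = Counter(group for group, _ in records)
    hits = Counter(group for group, r in records if r == reason)
    return {group: (hits[group] / totals[group]) / overall_share for group in totals}


# Made-up records of (demographic group, extracted reason), for illustration only.
records = [
    ("group_A", "insurance coverage"),
    ("group_A", "patient preference"),
    ("group_A", "adverse event"),
    ("group_B", "insurance coverage"),
    ("group_B", "insurance coverage"),
    ("group_B", "adverse event"),
]
ratios = representation_ratio(records, "insurance coverage")
# group_A: (1/3) / (3/6) ≈ 0.67; group_B: (2/3) / (3/6) ≈ 1.33
```

In practice a ratio like this would be paired with a statistical test before any difference between groups is treated as meaningful.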

This summary was generated by AI and reviewed by human editors.