QUICK REVIEW

[Paper Review] ChatGPT-3.5, ChatGPT-4, Google Bard, and Microsoft Bing to Improve Health Literacy and Communication in Pediatric Populations and Beyond

Kanhai Amin, Linda C. Mayes|arXiv (Cornell University)|Nov 16, 2023

Health Literacy and Information Accessibility|Cited by 10

One-Line Summary

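This paper evaluates how four large language models (ChatGPT-3.5, ChatGPT-4, Google Bard, and Microsoft Bing) tailor health information for pediatric populations, and shows that the models differ both in the reading levels of their outputs and in how they respond to prompts.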

ABSTRACT

Purpose: Enhanced health literacy has been linked to better health outcomes; however, few interventions have been studied. We investigate whether large language models (LLMs) can serve as a medium to improve health literacy in children and other populations. Methods: We ran 288 conditions using 26 different prompts through ChatGPT-3.5, Microsoft Bing, and Google Bard. Given constraints imposed by rate limits, we tested a subset of 150 conditions through ChatGPT-4. The primary outcome measurements were the reading grade level (RGL) and word counts of output. Results: Across all models, outputs for basic prompts such as "Explain" and "What is (are)" were at, or exceeded, a 10th-grade RGL. When prompts were specified to explain conditions from the 1st to 12th RGL, we found that LLMs had varying abilities to tailor responses based on RGL. ChatGPT-3.5 provided responses that ranged from the 7th-grade to the college-freshman RGL, while ChatGPT-4 produced responses from the 6th-grade to the college-senior RGL. Microsoft Bing provided responses from the 9th to 11th RGL, while Google Bard provided responses from the 7th to 10th RGL. Discussion: ChatGPT-3.5 and ChatGPT-4 were better at producing lower-grade-level outputs. Meanwhile, Bard and Bing tended to consistently produce an RGL at the high school level regardless of prompt. Additionally, Bard's hesitancy in providing certain outputs indicates a cautious approach towards health information. LLMs demonstrate promise in enhancing health communication, but future research should verify the accuracy and effectiveness of such tools in this context. Implications: LLMs face challenges in crafting outputs below a sixth-grade reading level. However, their capability to modify outputs above this threshold provides a potential mechanism to improve health literacy and communication in a pediatric population and beyond.

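Motivation and Objectives

Investigate whether large language models can serve as a medium for improving health literacy in children and other populations.
Quantify how outputs (reading level and length) vary across models and prompts.
Assess the feasibility and limitations of using LLMs to deliver health information tailored to different reading levels.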

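Method

Ran 288 conditions using 26 prompts through ChatGPT-3.5, Microsoft Bing, and Google Bard.
Because of rate limits, tested a subset of 150 conditions on ChatGPT-4.
Primary outcomes: the reading grade level (RGL) and the word count of each generated output.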
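The review does not say which readability formula produced the RGL scores. Purely as an illustration, the sketch below assumes the Flesch-Kincaid Grade Level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59, and pairs it with a word count. The helper names (`count_syllables`, `text_stats`, `flesch_kincaid_grade`) are hypothetical, and the vowel-group syllable counter is a crude heuristic, so scores from this sketch should not be expected to match the paper's.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (a crude heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a trailing silent "e" ("make"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) using simple punctuation rules."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level (an assumed metric; the review names none)."""
    n_sentences, n_words, n_syllables = text_stats(text)
    if n_sentences == 0 or n_words == 0:
        raise ValueError("text needs at least one sentence and one word")
    return (0.39 * (n_words / n_sentences)
            + 11.8 * (n_syllables / n_words)
            - 15.59)


sample = "Asthma makes it hard to breathe. Your airways get tight and swollen."
n_sentences, n_words, _ = text_stats(sample)
print(f"words={n_words}, grade={flesch_kincaid_grade(sample):.1f}")
```

Dedicated readability tools use pronunciation dictionaries or more careful rules for syllables and sentence boundaries; the point here is only to show how a grade level and a word count can both be derived from the same model output.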

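Experimental Results

Research Questions

RQ1: Can LLMs effectively tailor health information to a specified reading grade level?
RQ2: How do different LLMs produce lower- versus higher-reading-level outputs across prompts?
RQ3: What are the limitations of using LLMs in pediatric health communication (e.g., accuracy, hesitancy)?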

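Key Findings

Across models, basic prompts such as "Explain" and "What is (are)" tended to produce output at or above a 10th-grade RGL.
ChatGPT-3.5 produced outputs ranging from the 7th-grade to the college-freshman RGL.
ChatGPT-4 produced outputs ranging from the 6th-grade to the college-senior RGL.
Microsoft Bing produced outputs ranging from the 9th- to the 11th-grade RGL.
Google Bard produced outputs ranging from the 7th- to the 10th-grade RGL.
Bard showed hesitancy in providing certain outputs, suggesting a cautious approach to health information.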
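To compare the reported ranges side by side, the sketch below encodes them and computes each model's span of grade levels. Treating "college freshman" as grade 13 and "college senior" as grade 16 is an assumption made only for this arithmetic, and the span is at best a rough proxy for how well a model adapts its output to the requested RGL. The names `REPORTED_RGL_RANGES`, `RangeSummary`, and `summarize` are hypothetical.

```python
from typing import NamedTuple

# Ranges as reported in this review. Mapping "college freshman" to grade 13
# and "college senior" to grade 16 is an assumption made for this sketch.
COLLEGE_FRESHMAN = 13
COLLEGE_SENIOR = 16

REPORTED_RGL_RANGES: dict[str, tuple[int, int]] = {
    "ChatGPT-3.5": (7, COLLEGE_FRESHMAN),
    "ChatGPT-4": (6, COLLEGE_SENIOR),
    "Microsoft Bing": (9, 11),
    "Google Bard": (7, 10),
}


class RangeSummary(NamedTuple):
    model: str
    lowest: int
    highest: int
    span: int          # highest minus lowest grade level
    below_floor: bool  # True if the lowest output fell under the floor grade


def summarize(ranges: dict[str, tuple[int, int]], floor: int = 6) -> list[RangeSummary]:
    """Compute each model's RGL span and sort from widest to narrowest."""
    rows = [
        RangeSummary(model, lo, hi, hi - lo, lo < floor)
        for model, (lo, hi) in ranges.items()
    ]
    return sorted(rows, key=lambda row: row.span, reverse=True)


for row in summarize(REPORTED_RGL_RANGES):
    print(f"{row.model:<15} grades {row.lowest}-{row.highest}  "
          f"span={row.span}  below 6th grade: {row.below_floor}")
```

Under these assumptions, ChatGPT-4 spans 10 grade levels and Microsoft Bing only 2, and no model's lowest output falls below the 6th grade, which is consistent with the abstract's observation that LLMs struggle to write below a sixth-grade reading level.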

This review was generated by AI and checked by a human editor.