QUICK REVIEW

[Paper Review] Opportunities and Risks of LLMs for Scalable Deliberation with Polis

Christopher Small, Ivan Vendrov|arXiv (Cornell University)|Jun 20, 2023

Ethics and Social Impacts of AI|Cited by 18

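One-line summary

This paper explores how large language models (LLMs) can augment Polis as a scalable deliberation process, demonstrating capabilities in topic modeling, summarization, and vote prediction while highlighting risks and mitigation strategies.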

ABSTRACT

Polis is a platform that leverages machine intelligence to scale up deliberative processes. In this paper, we explore the opportunities and risks associated with applying Large Language Models (LLMs) towards challenges with facilitating, moderating and summarizing the results of Polis engagements. In particular, we demonstrate with pilot experiments using Anthropic's Claude that LLMs can indeed augment human intelligence to help more efficiently run Polis conversations. In particular, we find that summarization capabilities enable categorically new methods with immense promise to empower the public in collective meaning-making exercises. And notably, LLM context limitations have a significant impact on insight and quality of these results. However, these opportunities come with risks. We discuss some of these risks, as well as principles and techniques for characterizing and mitigating them, and the implications for other deliberative or political systems that may employ LLMs. Finally, we conclude with several open future research directions for augmenting tools like Polis with LLMs.

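Motivation and objectives

Assess how LLMs can complement Polis to improve the scalability of deliberative processes.
Evaluate LLM-assisted tasks in Polis such as topic modeling, summarization, moderation, and consensus discovery.
Identify risks such as bias, hallucination, and misrepresentation, and propose mitigation strategies.
Augment Polis workflows in pilot experiments using Anthropic's Claude.
Outline future directions for integrating LLMs into deliberation platforms.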

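Proposed method

Run pilot experiments in which Anthropic's Claude operates inside Polis workflows.
Perform topic modeling by prompting the LLM to assign topics to batches of comments.
Generate automatic summaries and consensus statements from Polis data.
Query LLMs for vote predictions and evaluate how well they anticipate participants' agreement or disagreement with comments they have not yet seen.
Examine the use of long context windows (8K to 100K tokens) for handling large conversations.
Discuss human-in-the-loop evaluation of LLM outputs and safety measures.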
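
The batch-wise topic prompting described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the prompt wording, and the four-characters-per-token estimate are all assumptions made for the example, and no model call is included.

```python
from typing import Iterable, List


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. A real tokenizer will differ.
    return max(1, len(text) // 4)


def batch_comments(comments: Iterable[str], token_budget: int) -> List[List[str]]:
    """Greedily pack comments into batches that stay within token_budget.

    A single comment larger than the budget still gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for comment in comments:
        cost = estimate_tokens(comment)
        # Start a new batch when adding this comment would exceed the budget.
        if current and used + cost > token_budget:
            batches.append(current)
            current, used = [], 0
        current.append(comment)
        used += cost
    if current:
        batches.append(current)
    return batches


def build_topic_prompt(batch: List[str]) -> str:
    # Hypothetical prompt wording; the paper's actual prompts are not reproduced here.
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(batch, start=1))
    return (
        "Assign a short topic label to each numbered comment below.\n"
        "Answer with one line per comment in the form '<number>: <topic>'.\n\n"
        + numbered
    )
```

A larger `token_budget` means fewer batches, which is one way to see why the 8K to 100K context range matters for large conversations: with a small window the model never sees the whole conversation at once.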

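Experimental results

Research questions

RQ1: Can LLMs reliably identify topics in Polis conversations and support reporting?
RQ2: To what extent can LLMs generate natural summaries from Polis data and identify group consensus?
RQ3: How accurately can LLMs predict votes, given participants' past voting history?
RQ4: What are the risks of LLM-augmented Polis (bias, misinformation, moderation), and how can they be mitigated?
RQ5: How do extended context windows affect LLM performance on large Polis conversations?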
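
For RQ3, one simple way to score vote predictions is plain accuracy on held-out votes, compared against a majority-vote baseline. The sketch below is an illustration only: the vote labels, the `(participant_id, comment_id)` keys, and both function names are assumptions, and this is not claimed to be the evaluation protocol used in the paper.

```python
from collections import Counter
from typing import Dict, Tuple

Vote = str  # assumed labels: "agree", "disagree", or "pass"
Key = Tuple[str, str]  # (participant_id, comment_id), hypothetical identifiers


def prediction_accuracy(actual: Dict[Key, Vote], predicted: Dict[Key, Vote]) -> float:
    """Fraction of held-out votes where the predicted vote matches the real one."""
    shared = [key for key in actual if key in predicted]
    if not shared:
        raise ValueError("no overlapping (participant, comment) pairs to score")
    correct = sum(1 for key in shared if actual[key] == predicted[key])
    return correct / len(shared)


def majority_baseline(actual: Dict[Key, Vote]) -> float:
    """Accuracy obtained by always predicting the most common vote."""
    if not actual:
        raise ValueError("no votes to score")
    counts = Counter(actual.values())
    return counts.most_common(1)[0][1] / len(actual)
```

Comparing the two numbers guards against reading too much into a high raw accuracy when most votes in a conversation are, for example, "agree".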

Main findings

Groups: Government and Public Policy; Infrastructure and Development; Public Services; Safety Health and Environment; Economy and Business

Topics: Local government and politics; Laws and regulations; Taxes and services; Transparency and accountability; Housing; Transportation; Utilities; Historic preservation; Urban planning; Healthcare; Education; Public spaces; Homeless services; Disability access; Law enforcement; Public health; Pollution; Emergency management; Marginalized groups; Animal control; Job opportunities; Local business support; Competition and legalization issues

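The topics generated by Claude matched the manual analysis and produced a hierarchical topic structure.
LLMs could automatically generate concise summaries consistent with the trends found in manual analysis, but context-window limits and potential inaccuracies call for careful prompting and human review.
A plain, off-the-shelf LLM was shown to predict with high confidence whether a participant would agree or disagree with a given comment, indicating strong predictive ability.
Using an LLM to draft consensus statements surfaced sample consensus statements, and serves as an example of a use that requires live testing and governance to be ethical.
Topic modeling and summarization can be updated to accommodate new comments as an online conversation continues.
Risks include potential misinformation and biased representation; disclosure for transparency and human-in-the-loop oversight are needed.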

This review was generated by AI and checked by a human editor.