QUICK REVIEW

[Paper Review] Enabling Large Language Models to Perform Power System Simulations with Previously Unseen Tools: A Case of Daline

Mengshuo Jia, Zeyu Cui|arXiv (Cornell University)|Jun 25, 2024

Power Systems and Technologies|Cited by 5

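One-Sentence Summary

This paper presents a four-module modular framework for running power system simulations with Daline, a previously unseen toolbox, and raises GPT-4o's coding accuracy to 96.07% across 34 tasks.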

ABSTRACT

The integration of experiment technologies with large language models (LLMs) is transforming scientific research, offering AI capabilities beyond specialized problem-solving to becoming research assistants for human scientists. In power systems, simulations are essential for research. However, LLMs face significant challenges in power system simulations due to limited pre-existing knowledge and the complexity of power grids. To address this issue, this work proposes a modular framework that integrates expertise from both the power system and LLM domains. This framework enhances LLMs' ability to perform power system simulations on previously unseen tools. Validated using 34 simulation tasks in Daline, a (optimal) power flow simulation and linearization toolbox not yet exposed to LLMs, the proposed framework improved GPT-4o's simulation coding accuracy from 0% to 96.07%, also outperforming the ChatGPT-4o web interface's 33.8% accuracy (with the entire knowledge base uploaded). These results highlight the potential of LLMs as research assistants in power systems.

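Motivation and Objectives

Motivate and realize the use of LLMs as research assistants for power system simulations.
Develop a modular framework that combines prompt design, enhanced RAG, toolbox-oriented design, and a feedback loop.
Validate the framework on 34 tasks spanning Daline's power flow and linearization functionality.
Show that integrating multiple techniques yields higher simulation coding accuracy than existing approaches.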

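Proposed Method

The paper proposes a four-module framework: (i) prompt design, (ii) enhanced retrieval-augmented generation (RAG), (iii) LLM-oriented auxiliary toolbox design, and (iv) feedback loop design.
Chain-of-thought and few-shot prompting, tailored to toolbox-based simulation, guide the LLM's behavior step by step.
An enhanced RAG strategy uses query planning to decompose long requests into sub-requests and maps them to function/parameter keywords, enabling parallel retrieval.
RAG-friendly knowledge-base documents, together with syntax checking and error reporting in the toolbox, improve retrieval and reliability.
A feedback loop with detailed error reports and troubleshooting guidance iteratively corrects the generated code.
The framework is validated on 34 Daline simulation tasks under 20 technique schemes, measuring coding accuracy with up to three attempts per task.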
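
The query-planning retrieval and the feedback loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the knowledge-base entries, the function names (plan_query, map_to_keywords, enhanced_retrieve, run_with_feedback), the string-splitting planner, and the retry budget are all assumptions made for the example. The generate and execute callables are placeholders for the LLM call and the toolbox run.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

# Hypothetical mini knowledge base: keyword -> documentation snippet.
# These entries are placeholders, not real Daline documentation.
KNOWLEDGE_BASE = {
    "generate data": "Snippet: how to generate training data.",
    "fit model": "Snippet: how to fit a linearized model.",
    "plot results": "Snippet: how to plot results.",
}


def plan_query(request: str) -> List[str]:
    """Split a long request into sub-requests.

    A real system would ask an LLM to do this; splitting on
    ' and then ' is only a stand-in for that planning step.
    """
    parts = request.replace(" and then ", ";").split(";")
    return [p.strip() for p in parts if p.strip()]


def map_to_keywords(sub_request: str) -> List[str]:
    """Return keywords whose words all appear in the sub-request."""
    words = set(sub_request.lower().split())
    return [k for k in KNOWLEDGE_BASE if set(k.split()) <= words]


def retrieve(sub_request: str) -> List[str]:
    """Look up the snippets for one sub-request."""
    return [KNOWLEDGE_BASE[k] for k in map_to_keywords(sub_request)]


def enhanced_retrieve(request: str) -> List[str]:
    """Retrieve snippets for all sub-requests in parallel, without duplicates."""
    sub_requests = plan_query(request)
    with ThreadPoolExecutor() as pool:
        per_sub = list(pool.map(retrieve, sub_requests))  # order is preserved
    context: List[str] = []
    for snippets in per_sub:
        for snippet in snippets:
            if snippet not in context:
                context.append(snippet)
    return context


def run_with_feedback(
    generate: Callable[[str, List[str], Optional[str]], str],
    execute: Callable[[str], Tuple[bool, Optional[str]]],
    request: str,
    max_attempts: int = 3,  # assumed retry budget, chosen for illustration
) -> Tuple[Optional[str], int]:
    """Generate code, run it, and feed any error report back to the generator."""
    context = enhanced_retrieve(request)
    error_report: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(request, context, error_report)
        ok, error_report = execute(code)
        if ok:
            return code, attempt
    return None, max_attempts
```

Passing the error report back into generate is what makes the loop a feedback loop: each retry sees why the previous attempt failed, rather than regenerating blindly.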

Experimental Results

Research Questions

RQ1: Can a modular framework enable LLMs to perform power system simulations with tools unknown to the model?
RQ2: How do prompt design, enhanced RAG, toolbox design, and feedback loops individually and cumulatively affect coding accuracy in power system simulations?
RQ3: What is the impact of using Daline as a previously unseen toolbox on LLM performance?
RQ4: What combination of techniques yields the highest simulation coding accuracy for LLMs?

Key Findings

GPT-4o with the proposed framework achieves 96.07% coding accuracy across 34 tasks, far above the 33.82% achieved by the ChatGPT-4o web interface.
Using enhanced RAG alone improves accuracy over baseline retrieval approaches (e.g., GPT-3.5-NK to GPT-3.5-Full).
Few-shot prompting substantially boosts accuracy when combined with enhanced RAG (e.g., from 45.09% to 81.37%).
RAG-friendly knowledge-base documents and syntax-checking/error reporting materially improve reliability and accuracy (e.g., 75.49% vs 60.29% when comparing RAG-friendly docs to user manual only).
The accuracy gains are cumulative across techniques, with multiple components contributing to the top performance (GPT-4o-Full at 96.07%).
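
To make the size of these improvements concrete, the accuracy pairs quoted in the findings above can be turned into percentage-point gains. The short sketch below only does arithmetic on the figures already reported in this review; the dictionary labels and the helper name gain_in_points are assumptions made for the example.

```python
# (before, after) accuracy pairs in percent, as quoted in the findings.
# The labels are illustrative names, not identifiers from the paper.
REPORTED = {
    "few-shot added to enhanced RAG": (45.09, 81.37),
    "RAG-friendly docs vs user manual only": (60.29, 75.49),
    "full framework vs ChatGPT-4o web interface": (33.82, 96.07),
}


def gain_in_points(before: float, after: float) -> float:
    """Return the improvement in percentage points, rounded to two decimals."""
    return round(after - before, 2)


for label, (before, after) in REPORTED.items():
    print(f"{label}: +{gain_in_points(before, after):.2f} points")
```

These are differences in percentage points, not relative improvements.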

This review was generated by AI and checked by a human editor.