QUICK REVIEW

[Paper Summary] Data Interpreter: An LLM Agent For Data Science

Sirui Hong, Yizhang Lin|arXiv (Cornell University)|Feb 28, 2024

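Semantic Web and Ontologies | Cited by 11

One-Sentence Summary

Data Interpreter is an LLM-based agent that models data science workflows as hierarchical graphs with programmable node generation. This enables end-to-end, dynamic problem solving and improves performance on multiple benchmarks.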
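The hierarchical graph idea can be illustrated with a minimal Python sketch. It assumes a hypothetical `TaskNode` structure (not MetaGPT's actual classes) in which each task-level node lists its dependencies and holds its own action-level steps. It then uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to derive an execution order that respects every dependency. The task names and the code snippet are invented for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class TaskNode:
    """One subproblem in the task-level graph (hypothetical structure)."""
    task_id: str
    instruction: str
    dependencies: list[str] = field(default_factory=list)
    # Action-level steps (e.g. code snippets) attached to this task;
    # this nested level is what makes the graph "hierarchical" in this sketch.
    actions: list[str] = field(default_factory=list)


def execution_order(tasks: list[TaskNode]) -> list[str]:
    """Return task ids in an order that respects every dependency edge."""
    # TopologicalSorter expects {node: set_of_predecessors}.
    graph = {t.task_id: set(t.dependencies) for t in tasks}
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    return list(TopologicalSorter(graph).static_order())


tasks = [
    TaskNode("load", "Load the CSV file"),
    TaskNode("clean", "Handle missing values", ["load"], ["df = df.dropna()"]),
    TaskNode("eda", "Plot feature distributions", ["clean"]),
    TaskNode("features", "Engineer new features", ["clean"]),
    TaskNode("train", "Train a classifier", ["features"]),
]

order = execution_order(tasks)
print(order)
```

Because `static_order()` raises `graphlib.CycleError` on a cyclic graph, the same call doubles as a check that the task graph is a valid DAG. Independent tasks such as `eda` and `features` may come out in either order.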

ABSTRACT

Large Language Model (LLM)-based agents have shown effectiveness across many applications. However, their use in data science scenarios requiring solving long-term interconnected tasks, dynamic data adjustments and domain expertise remains challenging. Previous approaches primarily focus on individual tasks, making it difficult to assess the complete data science workflow. Moreover, they struggle to handle real-time changes in intermediate data and fail to adapt dynamically to evolving task dependencies inherent to data science problems. In this paper, we present Data Interpreter, an LLM-based agent designed to automatically solve various data science problems end-to-end. Our Data Interpreter incorporates two key modules: 1) Hierarchical Graph Modeling, which breaks down complex problems into manageable subproblems, enabling dynamic node generation and graph optimization; and 2) Programmable Node Generation, a technique that refines and verifies each subproblem to iteratively improve code generation results and robustness. Extensive experiments consistently demonstrate the superiority of Data Interpreter. On InfiAgent-DABench, it achieves a 25% performance boost, raising accuracy from 75.9% to 94.9%. For machine learning and open-ended tasks, it improves performance from 88% to 95%, and from 60% to 97%, respectively. Moreover, on the MATH dataset, Data Interpreter achieves remarkable performance with a 26% improvement compared to state-of-the-art baselines. The code is available at https://github.com/geekan/MetaGPT.

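Research Motivation and Goals

Reframe data science workflows as hierarchical graph models to manage long-horizon, interdependent tasks.
Develop a programmable node generation mechanism that refines and verifies subproblems and their code in real time.
Enable dynamic task-graph optimization and iterative execution so the agent adapts to changes in data and tasks.
Demonstrate end-to-end data science problem solving across diverse benchmarks.
Show improved robustness and performance over existing open-source frameworks on data analysis and machine learning tasks.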

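Proposed Method

Represent a data science problem as a directed acyclic graph (DAG) whose nodes are subprocesses and whose edges encode dependencies.
Use a task graph generator to produce a task-level graph from the project requirements.
Use an action graph generator to convert each task into executable code snippets, optionally integrating tools.
Use a stateful graph executor with reflection to run and debug the action graph, refining it based on runtime feedback.
Iteratively refine the task graph (Iterative Graph Refinement, IGR) and apply Programmable Node Generation (PNG) to improve robustness and adaptability.
Rank and select tools based on task metadata, then integrate them into the generated code for context-aware execution.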
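The execution-and-refinement loop described above can be approximated by a minimal Python sketch. It assumes a plain dict passed to `exec` as a stand-in for the stateful kernel, so variables persist across tasks. It also assumes a hypothetical `refine(code, error)` callback standing in for the LLM's reflection step. Neither reflects MetaGPT's actual executor API. Because `exec` runs arbitrary code, this pattern should only be used with trusted snippets or inside a sandbox.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExecResult:
    success: bool
    code: str       # last code that was executed
    attempts: int
    error: str = ""


def run_with_refinement(
    code: str,
    namespace: dict,
    refine: Callable[[str, str], str],
    max_attempts: int = 3,
) -> ExecResult:
    """Run code in a shared namespace; on failure ask `refine` for new code.

    `namespace` persists across calls, so later tasks see variables created
    by earlier ones. `refine(code, error)` is a hypothetical hook where an LLM
    would reflect on the error message and return corrected code.
    """
    error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            exec(code, namespace)
            return ExecResult(True, code, attempt)
        except Exception as exc:
            # Runtime feedback that drives the next attempt.
            error = f"{type(exc).__name__}: {exc}"
            if attempt < max_attempts:
                code = refine(code, error)
    return ExecResult(False, code, max_attempts, error)


def toy_refine(code: str, error: str) -> str:
    """Toy stand-in for LLM reflection: fixes one known typo."""
    if "NameError" in error:
        return code.replace("valeus", "values")
    return code


state: dict = {}
first = run_with_refinement("values = [3, 1, 4, 1, 5]", state, toy_refine)
second = run_with_refinement("total = sum(valeus)", state, toy_refine)
print(first.attempts, second.attempts, state["total"])  # prints: 1 2 14
```

The second snippet fails with a `NameError`, the toy fixer corrects the typo, and the retry succeeds using `values` left in the shared state by the first snippet. A fuller implementation might also inspect or roll back partial state left behind by a failed attempt.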

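Experimental Results

Research Questions

RQ1: How can data science workflows be effectively decomposed into hierarchical graphs that capture interdependencies and support dynamic planning?
RQ2: Compared with static or single-task LLM systems, does the hierarchical graph approach improve end-to-end performance on data science benchmarks?
RQ3: What is the impact of Iterative Graph Refinement (IGR) on task success rate and efficiency?
RQ4: How does Programmable Node Generation (PNG) affect the robustness and accuracy of the generated data science code?
RQ5: How do dynamic tool selection and integration affect outcomes across different data science tasks?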

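Key Findings

Data Interpreter achieves a 25% performance improvement on InfiAgent-DABench, raising accuracy from 75.9% to 94.9%.
On the MATH dataset, it shows a 26% improvement over state-of-the-art baselines.
On ML-Benchmark tasks, Data Interpreter reaches a comprehensive score of 0.95 and outperforms several baselines on multiple tasks.
On the open-ended task benchmark, it achieves a high completion rate, averaging 0.97.
Ablation studies show that Iterative Graph Refinement and Programmable Node Generation bring significant performance gains (comprehensive scores of 0.95 to 0.96 with PNG/IGR).
Using LLMs with longer context windows (e.g., gpt-4o) amplifies the gains, allowing Data Interpreter to outperform direct LLM inference in multi-step reasoning scenarios.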
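The 25% InfiAgent-DABench figure appears to be a relative gain: the absolute change from 75.9% to 94.9% is 19 percentage points, and 19 / 75.9 is about 0.25. The short Python sketch below makes the absolute/relative distinction explicit for the before/after figures quoted in the abstract; the helper names are illustrative.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before` (0.25 means +25%)."""
    return (after - before) / before


def absolute_gain(before: float, after: float) -> float:
    """Difference in percentage points."""
    return after - before


# Before/after figures taken from the abstract.
results = [
    ("InfiAgent-DABench", 75.9, 94.9),
    ("Machine learning tasks", 88.0, 95.0),
    ("Open-ended tasks", 60.0, 97.0),
]

for name, before, after in results:
    print(
        f"{name}: +{absolute_gain(before, after):.1f} points, "
        f"+{relative_gain(before, after):.1%} relative"
    )
```

The MATH result is left out because the summary gives no baseline score for it.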


This summary was generated by AI and reviewed by human editors.