QUICK REVIEW

[Paper Review] EcoAssistant: Using LLM Assistant More Affordably and Accurately

Jieyu Zhang, Ranjay Krishna|arXiv (Cornell University)|Oct 3, 2023

Topic Modeling|Cited by 7

One-Sentence Summary

TL;DR: EcoAssistant enables LLMs to answer code-driven queries more cheaply and accurately by (1) iterating with an automatic code executor, (2) using a hierarchy that tries cheaper LLMs before pricier ones, and (3) retrieving past successful solutions as in-context demonstrations.

ABSTRACT

Today, users ask Large language models (LLMs) as assistants to answer queries that require external knowledge; they ask about the weather in a specific city, about stock prices, and even about where specific locations are within their neighborhood. These queries require the LLM to produce code that invokes external APIs to answer the user's question, yet LLMs rarely produce correct code on the first try, requiring iterative code refinement upon execution results. In addition, using LLM assistants to support high query volumes can be expensive. In this work, we contribute a framework, EcoAssistant, that enables LLMs to answer code-driven queries more affordably and accurately. EcoAssistant contains three components. First, it allows the LLM assistants to converse with an automatic code executor to iteratively refine code or to produce answers based on the execution results. Second, we use a hierarchy of LLM assistants, which attempts to answer the query with weaker, cheaper LLMs before backing off to stronger, expensive ones. Third, we retrieve solutions from past successful queries as in-context demonstrations to help subsequent queries. Empirically, we show that EcoAssistant offers distinct advantages for affordability and accuracy, surpassing GPT-4 by 10 points of success rate with less than 50% of GPT-4's cost.

Research Motivation and Objectives

Motivate the challenge of code-driven question answering requiring external API calls.
Propose a system that reduces cost while improving accuracy for LLM-based code generation and execution.
Introduce methods to automatically iterate with a code executor, employ a hierarchy of assistants, and reuse past solutions as demonstrations.
Evaluate the impact of these techniques on cost and success rate across multiple domains (Places, Weather, Stock).

Proposed Method

Introduce a two-agent conversation framework where an LLM assistant chats with a code executor to iteratively refine generated code.
Employ an assistant hierarchy that starts with the cheapest LLM and backs off to more expensive ones if needed.
Implement solution demonstration by storing successful query-code pairs in a vector database and retrieving the most similar past solution as a prompt demonstration.
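The three components above can be sketched together in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `run_code`, `converse`, `SolutionStore`, and `answer` are hypothetical, an "assistant" is any Python callable standing in for an LLM, a bag-of-words cosine similarity stands in for a real vector database, and "the code ran without raising" stands in for the success signal.

```python
import contextlib
import io
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Assumption: an assistant maps (query, executor feedback, demonstration) to source code.
Assistant = Callable[[str, str, Optional[str]], str]


def run_code(code: str) -> Tuple[bool, str]:
    """Execute code; return (succeeded, captured stdout or error text)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # illustration only: sandbox any untrusted code
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, buffer.getvalue().strip()


def converse(assistant: Assistant, query: str, demo: Optional[str],
             max_turns: int = 3) -> Optional[Tuple[str, str]]:
    """Assistant/executor loop: feed execution errors back until the code runs."""
    feedback = ""
    for _ in range(max_turns):
        code = assistant(query, feedback, demo)
        ok, output = run_code(code)
        if ok:
            return code, output
        feedback = output  # the error message becomes the next turn's input
    return None


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SolutionStore:
    """Toy stand-in for a vector database of successful query-code pairs."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, str]] = []

    def add(self, query: str, code: str) -> None:
        self._items.append((query, code))

    def most_similar(self, query: str) -> Optional[str]:
        if not self._items:
            return None
        q = Counter(query.lower().split())
        best = max(self._items,
                   key=lambda item: _cosine(q, Counter(item[0].lower().split())))
        return best[1]


def answer(query: str, hierarchy: List[Tuple[str, Assistant]],
           store: SolutionStore) -> Optional[Tuple[str, str]]:
    """Try assistants cheapest first; save the first successful solution."""
    demo = store.most_similar(query)
    for name, assistant in hierarchy:
        result = converse(assistant, query, demo)
        if result is not None:
            code, output = result
            store.add(query, code)
            return name, output
    return None
```

The point of the design is that a solution found by an expensive assistant is stored once and can later be handed to a cheaper assistant as a demonstration, so similar follow-up queries may never need to reach the expensive tier.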

Experimental Results

Research Questions

RQ1: Can iterative code execution with a code executor improve accuracy in code-driven QA?
RQ2: Does an assistant hierarchy reduce dollar cost without sacrificing accuracy?
RQ3: Does incorporating past successful solutions as demonstrations improve performance and speed?
RQ4: What is the combined effect of hierarchy and demonstrations across diverse datasets (Places, Weather, Stock)?

Key Findings

Stronger, more expensive models yield higher success rates when used alone.
CoT prompting improves GPT-3.5-turbo but not always GPT-4 or LLAMA-2-13B-chat, and it can increase cost.
Solution demonstration significantly boosts success rates for weaker models with modest cost impact.
Assistant hierarchy can reduce cost by about 10–50% compared to GPT-4 alone, with potential slight performance gains.
The full system (hierarchy + solution demonstration) achieves the highest success rate across datasets and reduces cost further compared to hierarchy alone.
On mixed datasets, EcoAssistant with hierarchy and demonstration shows best performance while maintaining relatively low cost, and autonomous variants maintain similar trends.
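One way to see why a hierarchy can lower cost without lowering the success rate is a back-of-the-envelope expected-cost calculation. The sketch below is an assumption-laden illustration, not a result from the paper: the function name `expected_hierarchy_cost` is hypothetical, each tier is described by a made-up per-query cost and success probability, and a tier's success is assumed to be independent of earlier tiers failing.

```python
from typing import List, Tuple


def expected_hierarchy_cost(tiers: List[Tuple[float, float]]) -> Tuple[float, float]:
    """tiers: (cost_per_query, success_probability), ordered cheapest first.

    Returns (expected cost per query, overall success probability).
    Assumes each tier succeeds independently of earlier failures.
    """
    expected_cost = 0.0
    reach_prob = 1.0  # probability that a query reaches the current tier
    for cost, p_success in tiers:
        expected_cost += reach_prob * cost  # pay only if the query got this far
        reach_prob *= 1.0 - p_success       # query falls through on failure
    return expected_cost, 1.0 - reach_prob
```

With purely hypothetical tiers `(1.0, 0.5)` and `(10.0, 0.8)`, the expected cost is 6.0 per query and the overall success probability is about 0.9, compared with a cost of 10.0 and a success probability of 0.8 for the expensive tier alone. Real savings depend on how often the cheap tier actually succeeds.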

This review was generated by AI and reviewed by a human editor.