QUICK REVIEW

[Paper Review] AutoDroid: LLM-powered Task Automation in Android

Hao Wen, Yuanchun Li | arXiv (Cornell University) | Aug 29, 2023

Topic: Topic Modeling | Cited by: 12

ONE-LINE SUMMARY

AutoDroid automates arbitrary tasks in Android apps using an LLM augmented with app-specific memory, achieving high action accuracy and task success rates while lowering query cost.

ABSTRACT

Mobile task automation is an attractive technique that aims to enable voice-based hands-free user interaction with smartphones. However, existing approaches suffer from poor scalability due to the limited language understanding ability and the non-trivial manual efforts required from developers or end-users. The recent advance of large language models (LLMs) in language understanding and reasoning inspires us to rethink the problem from a model-centric perspective, where task preparation, comprehension, and execution are handled by a unified language model. In this work, we introduce AutoDroid, a mobile task automation system capable of handling arbitrary tasks on any Android application without manual efforts. The key insight is to combine the commonsense knowledge of LLMs and domain-specific knowledge of apps through automated dynamic analysis. The main components include a functionality-aware UI representation method that bridges the UI with the LLM, exploration-based memory injection techniques that augment the app-specific domain knowledge of LLM, and a multi-granularity query optimization module that reduces the cost of model inference. We integrate AutoDroid with off-the-shelf LLMs including online GPT-4/GPT-3.5 and on-device Vicuna, and evaluate its performance on a new benchmark for memory-augmented Android task automation with 158 common tasks. The results demonstrated that AutoDroid is able to precisely generate actions with an accuracy of 90.9%, and complete tasks with a success rate of 71.3%, outperforming the GPT-4-powered baselines by 36.4% and 39.7%. The demo, benchmark suites, and source code of AutoDroid will be released at https://autodroid-sys.github.io/.

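MOTIVATION AND GOALS

Enable scalable mobile task automation without manual, task-specific wiring or demonstrations.
Bridge the smartphone GUI to the LLM through structured, HTML-like UI prompts.
Augment the LLM with app-specific knowledge through dynamic UI analysis and simulated task synthesis.
Reduce LLM query cost through memory-guided prompting, UI merging, and token pruning.
Demonstrate effectiveness on a new Android task automation benchmark spanning diverse apps.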

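PROPOSED METHOD

Represent the GUI state as a simplified HTML-style prompt that guides the LLM.
Build a UI transition graph (UTG) through offline random exploration of each app.
Generate simulated tasks from the UTG and inject the resulting app knowledge into the prompt.
Attach relevant app memory to the prompt using similarity-based retrieval.
Fine-tune local LLMs on app-specific data to improve cost efficiency and accuracy.
Apply multi-granularity query optimization, including token pruning and GUI merging.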
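The HTML-style GUI prompt, combined with a simple form of token pruning, can be sketched roughly as follows. This is a minimal Python illustration, not AutoDroid's code: the `UIElement` class and its fields, the widget-to-tag table `TAG_MAP`, and the rule of dropping unlabeled, non-clickable nodes are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class UIElement:
    # Hypothetical, simplified view-hierarchy node (field names are assumptions).
    widget: str                # e.g. "Button", "TextView", "EditText", "CheckBox"
    text: str = ""
    content_desc: str = ""
    clickable: bool = False
    children: list["UIElement"] = field(default_factory=list)


# Assumed mapping from Android widget classes to HTML-like tags.
TAG_MAP = {"Button": "button", "EditText": "input", "CheckBox": "checkbox", "TextView": "p"}


def to_html_prompt(root: UIElement) -> str:
    """Flatten a UI tree into numbered HTML-like lines, pruning unlabeled, non-clickable nodes."""
    lines: list[str] = []

    def visit(node: UIElement) -> None:
        label = node.text or node.content_desc
        # Assumed pruning rule: keep a node only if it has a label or can be acted on.
        if label or node.clickable:
            tag = TAG_MAP.get(node.widget, "button" if node.clickable else "div")
            lines.append(f"<{tag} id={len(lines)}>{escape(label)}</{tag}>")
        for child in node.children:  # depth-first, in child order
            visit(child)

    visit(root)
    return "\n".join(lines)


screen = UIElement("FrameLayout", children=[
    UIElement("TextView", text="Settings"),
    UIElement("Button", content_desc="More options", clickable=True),
    UIElement("LinearLayout"),  # pruned: no label and not clickable
    UIElement("CheckBox", text="Dark theme", clickable=True),
])
print(to_html_prompt(screen))
# <p id=0>Settings</p>
# <button id=1>More options</button>
# <checkbox id=2>Dark theme</checkbox>
```

Numbering elements with `id` gives the LLM a compact handle to answer with (for example "tap id=1"), and dropping empty layout containers is one simple way to cut prompt tokens.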
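The offline exploration and simulated-task steps can be illustrated on a toy app. In this sketch, a hypothetical `TOY_APP` transition table stands in for a real device: `explore` taps random elements to record a UTG, and `synthesize_tasks` turns UTG paths into memory entries that map a task description to an action path. AutoDroid uses an LLM to describe simulated tasks; the string template here is a deliberate stand-in, not the paper's method.

```python
import random
from collections import defaultdict, deque

# Hypothetical toy app: screen -> {element label: screen reached by tapping it}.
TOY_APP = {
    "Home": {"Settings": "SettingsScreen", "New note": "Editor"},
    "SettingsScreen": {"Dark theme": "SettingsScreen", "Back": "Home"},
    "Editor": {"Save": "Home", "Back": "Home"},
}

UTG = dict[str, set[tuple[str, str]]]  # screen -> {(action, next screen)}


def explore(app: dict[str, dict[str, str]], start: str, steps: int, seed: int = 0) -> UTG:
    """Offline random exploration: tap random elements and record every observed transition."""
    rng = random.Random(seed)
    utg: dict[str, set[tuple[str, str]]] = defaultdict(set)
    screen = start
    for _ in range(steps):
        action = rng.choice(sorted(app[screen]))  # sorted so a fixed seed is reproducible
        next_screen = app[screen][action]
        utg[screen].add((action, next_screen))
        screen = next_screen
    return dict(utg)


def synthesize_tasks(utg: UTG, start: str) -> dict[str, list[str]]:
    """Turn UTG paths into simulated-task memory: description -> action path from `start`."""
    paths = {start: []}  # BFS shortest action path to each reachable screen
    queue = deque([start])
    while queue:
        screen = queue.popleft()
        for action, next_screen in sorted(utg.get(screen, ())):
            if next_screen not in paths:
                paths[next_screen] = paths[screen] + [action]
                queue.append(next_screen)
    # Template descriptions stand in for LLM-written task descriptions.
    return {
        f"tap '{action}' on {screen}": path + [action]
        for screen, path in paths.items()
        for action, _ in sorted(utg.get(screen, ()))
    }


utg = explore(TOY_APP, "Home", steps=50)
memory = synthesize_tasks(utg, "Home")
for task, path in memory.items():
    print(task, "->", path)
```

Because every memory entry is built only from transitions actually observed during exploration, each stored action path can be replayed from the start screen.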
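Similarity-based memory retrieval might look like the sketch below. The review does not say which similarity model AutoDroid uses, so a bag-of-words cosine similarity is swapped in purely for illustration; `MEMORY`, `retrieve`, `build_prompt`, and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical app memory: simulated task description -> UI actions that achieved it.
MEMORY = {
    "turn on dark theme in settings": ["Settings", "Dark theme"],
    "create a new note": ["New note"],
    "save the current note": ["New note", "Save"],
}


def bow(text: str) -> Counter:
    """Lower-cased bag-of-words vector, a stand-in for a learned sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(task: str, memory: dict[str, list[str]], k: int = 2) -> list[tuple[str, list[str]]]:
    """Return the k memory entries most similar to the user's task."""
    query = bow(task)
    ranked = sorted(memory.items(), key=lambda item: cosine(query, bow(item[0])), reverse=True)
    return ranked[:k]


def build_prompt(task: str, screen_html: str, memory: dict[str, list[str]], k: int = 2) -> str:
    """Prepend retrieved memory to the task and current screen (prompt wording is hypothetical)."""
    hints = "\n".join(
        f"- '{desc}' was done by tapping: {' -> '.join(path)}" for desc, path in retrieve(task, memory, k)
    )
    return (f"Task: {task}\nRelevant app memory:\n{hints}\n"
            f"Current screen:\n{screen_html}\nWhich element id should be tapped next?")


top = retrieve("enable the dark theme", MEMORY, k=1)
print(top)  # [('turn on dark theme in settings', ['Settings', 'Dark theme'])]
print(build_prompt("enable the dark theme", "<button id=0>Settings</button>", MEMORY, k=1))
```

Replacing `bow` with a real sentence-embedding model would change only the scoring function; the top-k ranking and prompt assembly stay the same.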

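EXPERIMENTAL RESULTS

Research Questions

RQ1: Can an LLM-based agent complete unseen smartphone tasks across multiple apps with high accuracy?
RQ2: How much do app-specific memory and simulated task synthesis improve planning and action selection in mobile task automation?
RQ3: What is the trade-off between online LLM query cost and task success rate in AutoDroid?
RQ4: How effective are prompt augmentation and local LLM tuning at reducing reliance on online LLMs?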

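Key Findings

AutoDroid achieved 90.9% action accuracy on executed steps.
With GPT-4 as the backend LLM, the task success rate reached 71.3%.
AutoDroid outperformed the GPT-4-powered baselines by 36.4% in action accuracy and by 39.7% in task success rate.
LLM query cost was reduced by 51.7% compared with the baseline.
The benchmark consisted of 158 tasks across 13 open-source Android apps.
The results demonstrate the feasibility of scalable mobile task automation that combines LLMs with app-specific memory.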

This review was generated by AI and checked by a human editor.