QUICK REVIEW

[Paper Review] ELLMPEG: An Edge-based Agentic LLM Video Processing Tool

Zoha Azimi, Reza Farahani|arXiv (Cornell University)|2026. 01. 17.

Multimodal Machine Learning Applications | Citations: 0

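One-line summary

ELLMPEG is an edge-deployable agentic LLM system that uses Retrieval-Augmented Generation (RAG) and self-reflection to generate and verify FFmpeg and VVenC commands locally, removing the dependence on cloud APIs. With Qwen2.5 it reaches an average command-generation accuracy of 78% at zero recurring API cost, and the paper also reports inference time and energy efficiency for four open-source LLMs.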

ABSTRACT

Large language models (LLMs), the foundation of generative AI systems like ChatGPT, are transforming many fields and applications, including multimedia, enabling more advanced content generation, analysis, and interaction. However, cloud-based LLM deployments face three key limitations: high computational and energy demands, privacy and reliability risks from remote processing, and recurring API costs. Recent advances in agentic AI, especially in structured reasoning and tool use, offer a better way to exploit open and locally deployed tools and LLMs. This paper presents ELLMPEG, an edge-enabled agentic LLM framework for the automated generation of video-processing commands. ELLMPEG integrates tool-aware Retrieval-Augmented Generation (RAG) with iterative self-reflection to produce and locally verify executable FFmpeg and VVenC commands directly at the edge, eliminating reliance on external cloud APIs. To evaluate ELLMPEG, we collect a dedicated prompt dataset comprising 480 diverse queries covering different categories of FFmpeg and the Versatile Video Codec (VVC) encoder (VVenC) commands. We validate command generation accuracy and evaluate four open-source LLMs based on command validity, tokens generated per second, inference time, and energy efficiency. We also execute the generated commands to assess their runtime correctness and practical applicability. Experimental results show that Qwen2.5, when augmented with the ELLMPEG framework, achieves an average command-generation accuracy of 78 % with zero recurring API cost, outperforming all other open-source models across both the FFmpeg and VVenC datasets.

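Motivation and objectives

Reduce dependence on cloud LLMs and APIs to enable privacy-preserving video processing at the edge.
Design an architecture that combines RAG with self-reflection to generate multimedia-processing commands that are executable at the edge.
Evaluate open-source LLMs on FFmpeg and VVenC command generation in terms of command validity, speed, and energy efficiency.
Provide a dataset of FFmpeg and VVenC queries and benchmark the accuracy and practicality of the system for edge deployment.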

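Proposed method

Proposes an edge-deployable agentic LLM workflow with three stages: RAG setup, LLM inference, and command execution.
Maintains two tool-aware FAISS vector stores (one for FFmpeg, one for VVenC) and performs tool-specific retrieval for accurate command generation.
Uses a dual-embedding scheme to map chunks to the relevant tool documentation during retrieval.
Implements a self-reflection loop of at most I_max iterations to correct errors and improve command accuracy.
Uses a pattern-matching module to extract the executable command from the LLM output before passing it to the FFmpeg or VVenC backend.
Evaluates on a dedicated 480-query dataset covering FFmpeg and VVenC commands, measuring accuracy, speed, and energy efficiency on an edge CPU and on server-class hardware.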
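
The command extraction and bounded self-reflection steps listed above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the regular expression, the prompt wording, the executable names (`ffmpeg`, `vvencapp`, `vvencFFapp`), and the `llm` and `validate` callables are all assumptions made for the example, and `i_max` plays the role of I_max.

```python
import re
from typing import Callable, Optional, Tuple

# Assumption: a command is a line that starts with one of these executable names.
COMMAND_RE = re.compile(
    r"^\s*(?:\$\s*)?((?:ffmpeg|vvencapp|vvencFFapp)\s+.+?)\s*$",
    re.MULTILINE,
)


def extract_command(llm_output: str) -> Optional[str]:
    """Return the first line that looks like an FFmpeg or VVenC invocation."""
    cleaned = llm_output.replace("`", "")  # drop Markdown fences and inline ticks
    match = COMMAND_RE.search(cleaned)
    return match.group(1) if match else None


def generate_with_reflection(
    query: str,
    llm: Callable[[str], str],                      # hypothetical local model call
    validate: Callable[[str], Tuple[bool, str]],    # hypothetical local checker
    i_max: int = 3,
) -> Tuple[Optional[str], bool, int]:
    """Ask the model, check the command, and retry with feedback up to i_max times.

    Returns (last extracted command, whether it passed validation, attempts used).
    """
    prompt = query
    last_command: Optional[str] = None
    for attempt in range(1, i_max + 1):
        command = extract_command(llm(prompt))
        if command is None:
            feedback = "no executable command found in the response"
        else:
            last_command = command
            ok, feedback = validate(command)
            if ok:
                return command, True, attempt
        # Feed the failure back to the model for the next iteration.
        prompt = (
            f"{query}\n\nPrevious attempt: {command}\n"
            f"Problem: {feedback}\nReturn a corrected command."
        )
    return last_command, False, i_max
```

In practice `validate` might run the command against a small test clip and return the tool's error output as feedback; here it is left as a plain callable so the loop itself stays independent of any particular backend.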

Figure 1. Comparison of responses to two queries: green borders indicate valid commands, red borders denote invalid ones.

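Experimental results

Research questions

RQ1: Can an edge-deployable LLM with RAG and self-reflection generate accurate FFmpeg and VVenC commands without cloud APIs?
RQ2: How do open-source models with 2–8B parameters perform on domain-specific multimedia command generation when augmented with ELLMPEG?
RQ3: What are the trade-offs among command-generation accuracy, inference time, and energy consumption in edge and server environments?
RQ4: Does the tool-aware dual vector-store RAG setup improve retrieval relevance and reduce cross-tool confusion in command generation?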

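Key results

Qwen2.5 augmented with ELLMPEG reaches an average command-generation accuracy of 78% with no recurring API cost.
Qwen2.5 with ELLMPEG outperforms the other open-source models evaluated in command accuracy on both the FFmpeg and VVenC datasets.
The system runs on edge hardware (Intel i7-8700) and on server hardware (Xeon Gold with GPUs); the edge setup avoids cloud APIs.
Two separate FAISS vector stores for FFmpeg and VVenC reduce retrieval noise and improve tool-routing accuracy.
A self-reflection loop with a bounded number of iterations improves command accuracy while keeping latency acceptable on edge devices.
The dataset consists of 480 diverse queries (380 FFmpeg, 100 VVenC) generated with GPT-4o and drawn from real-world sources, and is released publicly for reproducibility.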
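
The idea of keeping one store per tool and routing each query to a single store can be illustrated with a small, dependency-free sketch. Several things here are assumptions rather than details from the paper: the in-memory `ToyStore` with bag-of-words cosine similarity only stands in for a FAISS index with learned embeddings, the keyword-based `route` function is a hypothetical routing rule, and the documentation chunks are short illustrative strings.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words term counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9.]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyStore:
    """In-memory stand-in for one per-tool vector index."""

    def __init__(self, chunks: List[str]) -> None:
        self.items = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


# Illustrative documentation chunks, one store per tool.
STORES = {
    "ffmpeg": ToyStore([
        "-vf scale=W:H resizes the video frames",
        "-c:v libx264 selects the H.264 encoder",
    ]),
    "vvenc": ToyStore([
        "--preset selects the speed/quality preset",
        "--qp sets the quantization parameter",
    ]),
}

# Hypothetical routing rule: VVC-related keywords go to the VVenC store.
VVENC_HINTS = ("vvenc", "vvc", "h.266")


def route(query: str) -> str:
    q = query.lower()
    return "vvenc" if any(hint in q for hint in VVENC_HINTS) else "ffmpeg"


def retrieve(query: str, k: int = 1) -> Tuple[str, List[str]]:
    """Pick one store, then search only that store, so chunks from the other tool never appear."""
    tool = route(query)
    return tool, STORES[tool].search(query, k)
```

Because each query only ever sees chunks from one store, an FFmpeg option cannot be retrieved as context for a VVenC request, which is the kind of cross-tool confusion the separate stores are meant to reduce.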

This review was generated by AI and reviewed by a human editor.