QUICK REVIEW

[Paper Review] CHAI: CacHe Attention Inference for text2video

Joel Mathew Cherian, Ashutosh Muralidhara Bharadwaj | arXiv (Cornell University) | 2026. 02. 18.

Generative Adversarial Networks and Image Synthesis | Citations: 0

One-Line Summary

CHAI introduces Cache Attention to enable cross-inference caching in text-to-video diffusion, reusing entity-level cached latents to achieve substantial speedups with minimal quality loss.

ABSTRACT

Text-to-video diffusion models deliver impressive results but remain slow because of the sequential denoising of 3D latents. Existing approaches to speed up inference either require expensive model retraining or use heuristic-based step skipping, which struggles to maintain video quality as the number of denoising steps decreases. Our work, CHAI, aims to use cross-inference caching to reduce latency while maintaining video quality. We introduce Cache Attention as an effective method for attending to shared objects/scenes across cross-inference latents. This selective attention mechanism enables effective reuse of cached latents across semantically related prompts, yielding high cache hit rates. We show that it is possible to generate high-quality videos using Cache Attention with as few as 8 denoising steps. When integrated into the overall system, CHAI is 1.65x - 3.35x faster than baseline OpenSora 1.2 while maintaining video quality.

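Motivation and Goals

Reduce the latency of text-to-video diffusion without retraining or heavy engineering.
Explore cross-inference reuse at the entity level (objects/scenes) rather than at the level of the whole prompt.
Develop a training-free mechanism that injects cached information without degrading quality.
Demonstrate practical cache budgets and a scalable cache management scheme for real deployments.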

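Proposed Method

Cache Attention uses cached latents as the key/value inputs, while the query remains the prompt-conditioned noisy latent.
An Entity Extractor identifies entities in the prompt and stores their embeddings in a vector DB linked to the latent cache.
Cache use is restricted to the 2nd, 3rd, and 4th denoising steps to balance latency and quality.
Two diffusion modes are built on OpenSora 1.2: full (cache miss) and fast (cache hit).
Evaluation uses the VBench and VidProM datasets, with comparisons against OpenSora 1.2, NIRVANA-VID, and AdaCache.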
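
The Cache Attention idea in the list above can be sketched as plain scaled dot-product attention in which the noisy latent tokens supply the queries and the cached latent tokens supply both keys and values. This is a minimal illustration, not the authors' implementation: the single-head form, the absence of learned projection matrices, and the names `cache_attention` and `uses_cache` are all assumptions made here. Only the restriction to denoising steps 2, 3, and 4 comes from the review.

```python
import math

# Denoising steps (1-indexed) at which cached latents are used, per the review.
CACHE_STEPS = (2, 3, 4)


def uses_cache(step):
    """Return True if cached latents should be attended to at this step."""
    return step in CACHE_STEPS


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cache_attention(queries, cached):
    """Attend from noisy latent tokens to cached latent tokens.

    queries: list of d-dimensional tokens from the current noisy latent.
    cached:  list of d-dimensional tokens from a cached latent, used as
             both keys and values (assumption: no learned projections).
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in queries:
        # Similarity of this query to every cached token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in cached]
        weights = softmax(scores)
        # Weighted average of the cached tokens.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, cached)) for j in range(d)]
        )
    return outputs


# Toy usage: two query tokens attending to three cached tokens.
noisy = [[1.0, 0.0], [0.0, 1.0]]
cache = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
mixed = cache_attention(noisy, cache)
```

Each output token is a convex combination of cached tokens, so a query pulls most strongly from the cached regions it resembles. How the result is merged back into the denoiser is not described in this review and is left out of the sketch.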

Figure 1 : Feature distance between latents produced by adjacent denoising steps in a single text-to-video inference. The highlighted region indicates steps that are skipped by intra-inference caching approaches due to low degree of difference.

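Experimental Results

Research Questions

RQ1: Does Cache Attention preserve video quality while reducing the number of denoising steps?
RQ2: Under a limited cache budget, how does cross-inference caching scale with cache size?
RQ3: How does CHAI compare with intra-inference caching baselines in latency and quality?
RQ4: How is the cache management strategy implemented to achieve a high hit rate?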

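Key Results

CHAI achieves a 1.65x–3.35x end-to-end speedup over OpenSora 1.2 at cache hit rates of 52–100% while preserving video quality.
At 8 denoising steps, CHAI reaches a VBench score of 0.7985, which is 0.3% lower than OpenSora 1.2 at 30 steps.
CHAI achieves high cache hit rates (above 80%) within an acceptable storage budget (1–5 GB).
Under a limited cache (10% of the full cache), entity-level reuse achieves a 52% hit rate and a 1.65x latency reduction on VidProM, outperforming whole-prompt reuse.
CHAI outperforms NIRVANA-VID in quality while keeping latency low, and outperforms AdaCache on VBench score at similar or lower latency.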
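
The link between cache hit rate and speedup in the results above can be approximated with a back-of-the-envelope model. The sketch assumes that a cache miss runs the full 30-step schedule, that a cache hit runs 8 steps, and that latency is proportional to the number of denoising steps. These are simplifying assumptions made here: the model ignores entity extraction, vector DB lookup, and every other cost, so it only roughly tracks the reported 1.65x–3.35x range and is not the paper's latency model.

```python
def expected_speedup(hit_rate, full_steps=30, fast_steps=8):
    """Step-count-only speedup estimate over an always-full baseline.

    hit_rate:   fraction of requests served in fast mode (0.0 to 1.0).
    full_steps: denoising steps on a cache miss (assumed 30).
    fast_steps: denoising steps on a cache hit (assumed 8).
    """
    avg_steps = hit_rate * fast_steps + (1.0 - hit_rate) * full_steps
    return full_steps / avg_steps


# Estimates at the two hit rates quoted in the review.
for rate in (0.52, 1.0):
    print(f"hit rate {rate:.0%}: ~{expected_speedup(rate):.2f}x")
```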

Figure 2 : Cache hit rate (%) vs. cache size on 2000 unseen VidProM prompts. Cached and unseen prompts show little overall similarity, but they share common entities and thus achieve a higher entity-similarity-based cache hit rate.
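
The effect described in the Figure 2 caption, where two prompts are dissimilar overall yet share an entity, can be illustrated with a toy similarity lookup. Everything here is a stand-in chosen for illustration: the bag-of-words `embed` function replaces a learned text encoder, the Python list replaces the vector DB, the 0.8 threshold is arbitrary, and the entity lists are written by hand in place of the Entity Extractor's output.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: bag-of-words counts (assumption, not a real encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def lookup(query, cached_keys, threshold=0.8):
    """Return the most similar cached key if it clears the threshold, else None."""
    q = embed(query)
    best = max(cached_keys, key=lambda k: cosine(q, embed(k)), default=None)
    if best is not None and cosine(q, embed(best)) >= threshold:
        return best
    return None


# A previously served prompt and its (hand-written) entities.
cached_prompts = ["a corgi sleeping on a sofa"]
cached_entities = ["corgi", "sofa"]

# A new prompt and its (hand-written) entities.
new_prompt = "a corgi surfing a wave at sunset"
new_entities = ["corgi", "wave", "sunset"]

prompt_hit = lookup(new_prompt, cached_prompts)
entity_hits = [e for e in new_entities if lookup(e, cached_entities)]
```

In this toy case the whole-prompt lookup misses because the two sentences share few words, while the entity lookup still finds `corgi`.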

This review was generated by AI and reviewed by a human editor.