QUICK REVIEW

[Paper Review] StreamReady: Learning What to Answer and When in Long Streaming Videos

Shehreen Azad, Vibhav Vineet|arXiv (Cornell University)|2026. 03. 09.

Multimodal Machine Learning Applications | Citations: 0

One-Line Summary

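StreamReady introduces the Answer Readiness Score (ARS) to jointly optimize answer correctness and answer timing in streaming video QA, and presents a readiness-aware framework that combines memory with a lightweight readiness mechanism so that the model responds only once the evidence is sufficient. The paper also introduces ProReady-QA, a proactive multi-turn QA benchmark for long streaming videos.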

ABSTRACT

Streaming video understanding often involves time-sensitive scenarios where models need to answer exactly when the supporting visual evidence appears: answering before the evidence reflects speculation, answering after it has passed reduces real-time utility. To capture this behavior, we introduce a readiness-aware formulation of streaming video understanding with the Answer Readiness Score (ARS), a timing-aware objective with asymmetric early and late penalties. When combined with correctness, ARS defines an effective accuracy that measures not just whether a model is right, but whether it answers at the appropriate moment. Building on this formulation, we introduce StreamReady, a framework to unify temporal reasoning with on-time answering through a lightweight readiness mechanism that decides if sufficient evidence has been observed before responding. To evaluate this capability, we further introduce ProReady-QA, a benchmark with annotated answer evidence windows and proactive multi-turn questions across local and global contexts. StreamReady achieves superior performance on ProReady-QA, and consistently outperforms prior methods across eight additional streaming and offline long-video benchmarks, demonstrating robust and broadly generalizable video understanding capability.

Motivation and Goals

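Formalize readiness-aware streaming understanding, which accounts for when an answer should be given
Define the Answer Readiness Score (ARS), which asymmetrically penalizes answers given too early and answers given too late
Develop a lightweight readiness mechanism that triggers an answer only after sufficient evidence has appeared
Build StreamReady, which uses memory-augmented QA to unify temporal reasoning with the readiness signal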

Proposed Method

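Introduce ARS, an asymmetric, timing-aware evaluation metric that combines early and late penalties with correctness
Propose StreamReady, which stores and retrieves visual and semantic history across multiple temporal spans using a Hierarchical Visual Memory Tree and a Contextual Memory Bank
Use a dual-branch Q-Former for query-aware short-term and long-term reasoning over the memory slots
Introduce a learnable <RDY> token and a readiness head that control the generation gate and enforce timing correctness
Develop ProReady-QA, which provides annotated evidence windows and proactive multi-turn questions, to evaluate readiness
Learn the readiness signal through semi-supervision derived from the memory representations, without requiring ground-truth evidence timestamps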
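
To make the idea of an asymmetric timing score concrete, here is a minimal Python sketch. It is not the paper's ARS formula, which this review does not reproduce. The linear decay, the `early_rate` and `late_rate` values, the choice to penalize early answers more steeply than late ones, and the names `EvidenceWindow`, `readiness_score`, and `effective_accuracy` are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class EvidenceWindow:
    """Annotated time span (in seconds) in which the supporting evidence is visible."""
    start: float
    end: float


def readiness_score(answer_time: float, window: EvidenceWindow,
                    early_rate: float = 1.0, late_rate: float = 0.25) -> float:
    """Hypothetical timing score in [0, 1].

    Returns 1.0 inside the evidence window and decays linearly outside it.
    The two decay rates differ, which makes the penalty asymmetric; the
    specific rates here are assumptions, not values from the paper.
    """
    if answer_time < window.start:
        return max(0.0, 1.0 - early_rate * (window.start - answer_time))
    if answer_time > window.end:
        return max(0.0, 1.0 - late_rate * (answer_time - window.end))
    return 1.0


def effective_accuracy(records: Iterable[Tuple[bool, float, EvidenceWindow]]) -> float:
    """Average of correctness weighted by the timing score.

    Each record is (is_correct, answer_time, evidence_window). A wrong answer
    contributes 0 regardless of timing; a correct answer contributes its
    timing score.
    """
    records = list(records)
    if not records:
        return 0.0
    total = sum(readiness_score(t, w) if ok else 0.0 for ok, t, w in records)
    return total / len(records)
```

Under these assumptions, a correct answer given half a second before a window opens scores the same as a correct answer given two seconds after it closes, which shows how the asymmetry between the two penalties can be tuned.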

Figure 2 : Framework Overview. StreamReady encodes streaming videos into a visual memory tree and reasons through short and long-term branches. A learnable <RDY> token, guided by a readiness head, gates the reasoning output until sufficient evidence is observed. Once ready, the long-term representat
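
The gating behavior described in this caption can be illustrated with a small Python sketch. It assumes that some upstream component already yields a per-step readiness probability; in the paper that signal comes from the learnable <RDY> token and the readiness head, which are not modeled here. The function name `first_ready_step` and the `threshold` and `patience` parameters are hypothetical and exist only for illustration.

```python
from typing import Iterable, Optional


def first_ready_step(readiness_probs: Iterable[float],
                     threshold: float = 0.5,
                     patience: int = 1) -> Optional[int]:
    """Return the index of the first step at which answering is allowed.

    The gate opens once the readiness probability has stayed at or above
    `threshold` for `patience` consecutive steps. Returns None if the gate
    never opens, meaning the model keeps withholding its answer.
    """
    streak = 0
    for step, prob in enumerate(readiness_probs):
        # Count consecutive steps above the threshold; reset on any dip.
        streak = streak + 1 if prob >= threshold else 0
        if streak >= patience:
            return step
    return None
```

For example, `first_ready_step([0.1, 0.3, 0.7, 0.9])` opens the gate at step 2, while requiring `patience=2` delays it to step 3. A larger `patience` trades responsiveness for fewer speculative early answers.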

Experimental Results

Research Questions

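RQ1: How can timing, and not only answer correctness, be formally evaluated in streaming video QA?
RQ2: Can a lightweight readiness mechanism reliably judge whether sufficient visual evidence has accumulated?
RQ3: Does memory-augmented reasoning improve both accuracy and responsiveness in proactive streaming scenarios?
RQ4: How well does readiness-aware streaming generalize to long videos and to diverse streaming benchmarks?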

Key Findings

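StreamReady achieves higher accuracy and higher ARS than the baseline models on the ProReady-QA tasks, indicating improvements in both timing alignment and correctness.
The readiness mechanism reduces the timing error between the evidence and the answer, which strengthens timing alignment.
StreamReady consistently outperforms prior methods across multiple streaming benchmarks, both proactive and non-proactive.
The memory hierarchy and query-aware reasoning make long-horizon understanding and evidence retrieval in streaming QA more robust.
Beyond readiness evaluation, StreamReady also performs strongly on offline long-video benchmarks, demonstrating its generalizability.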

Figure 3 : Examples of each task in ProReady-QA. Here, the question and answer frames are color-coded.


This review was generated by AI and reviewed by a human editor.