QUICK REVIEW

[Paper Review] TimeSqueeze: Dynamic Patching for Efficient Time Series Forecasting

Sravan Kumar Ankireddy, Nikita Seleznev|arXiv (Cornell University)|2026. 03. 11.

Time Series Analysis and Forecasting | Citations: 0

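One-Line Summary

TimeSqueeze adds a dynamic, content-aware patching mechanism on top of a lightweight state-space encoder, producing variable-length tokens for a Transformer backbone and achieving substantial efficiency gains while maintaining or improving forecasting accuracy.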

ABSTRACT

Transformer-based time series foundation models face a fundamental trade-off in choice of tokenization: point-wise embeddings preserve temporal fidelity but scale poorly with sequence length, whereas fixed-length patching improves efficiency by imposing uniform boundaries that may disrupt natural transitions and blur informative local dynamics. In order to address these limitations, we introduce TimeSqueeze, a dynamic patching mechanism that adaptively selects patch boundaries within each sequence based on local signal complexity. TimeSqueeze first applies a lightweight state-space encoder to extract full-resolution point-wise features, then performs content-aware segmentation by allocating short patches to information-dense regions and long patches to smooth or redundant segments. This variable-resolution compression preserves critical temporal structure while substantially reducing the token sequence presented to the Transformer backbone. Specifically for large-scale pretraining, TimeSqueeze attains up to 20x faster convergence and 8x higher data efficiency compared to equivalent point-token baselines. Experiments across long-horizon forecasting benchmarks show that TimeSqueeze consistently outperforms comparable architectures that use either point-wise tokenization or fixed-size patching.

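Research Motivation and Goals

Reduce the compute and memory burden of long-context time series forecasting.
Adaptively compress the input representation while preserving salient temporal dynamics.
Enable scalable pretraining of large time series foundation models without sacrificing accuracy.
Demonstrate compatibility with a range of Transformer backbones and pretraining datasets.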

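Proposed Method

Extract full-resolution local features with a lightweight state-space model (SSM) encoder.
Apply content-aware dynamic patching that assigns short patches to information-dense regions and long patches to smooth regions.
Pass the downsampled patch-level embeddings to a decoder-only Mixture-of-Experts (MoE) Transformer backbone.
Use an unpatching module that restores the compressed representation while preserving causality.
Train with multi-horizon forecasting heads and a composite loss that combines an autoregressive loss with an auxiliary load-balancing loss.
Pretrain on a mix of real and synthetic data from Time-300B, using the patching threshold to target an average compression of about 4x.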
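
The patching and unpatching steps above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the absolute first difference of the raw series as a stand-in for "local signal complexity" (the paper derives its features from an SSM encoder), and the names `patch_boundaries`, `squeeze`, `unsqueeze`, and `max_patch_len` are hypothetical. Each patch is represented by the feature at its first time step, and unpatching copies that feature forward, so no position ever reads from a later time step.

```python
import numpy as np

def patch_boundaries(x, threshold, max_patch_len=16):
    """Return the start index of every patch.

    Assumption (not from the paper): a new patch starts at t when
    |x[t] - x[t-1]| exceeds `threshold`, or when the current patch
    has already reached `max_patch_len` points.
    """
    x = np.asarray(x, dtype=float)
    starts = [0]
    for t in range(1, len(x)):
        jump = abs(x[t] - x[t - 1])
        if jump > threshold or t - starts[-1] >= max_patch_len:
            starts.append(t)
    return np.array(starts)

def squeeze(features, starts):
    """Downsample: keep only the feature at each patch start."""
    return features[starts]

def unsqueeze(tokens, starts, length):
    """Causal upsample: position t receives the token of the patch
    that contains t, whose start index is always <= t."""
    patch_id = np.searchsorted(starts, np.arange(length), side="right") - 1
    return tokens[patch_id]

# Toy series: a flat (smooth) stretch followed by a rapidly oscillating burst.
flat = np.zeros(32)
burst = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
x = np.concatenate([flat, burst])

starts = patch_boundaries(x, threshold=0.5, max_patch_len=16)
patch_lengths = np.diff(np.append(starts, len(x)))
compression = len(x) / len(starts)

features = x[:, None]                 # stand-in for per-point encoder features
tokens = squeeze(features, starts)    # shorter sequence a backbone would see
restored = unsqueeze(tokens, starts, len(x))

print("patch lengths:", patch_lengths.tolist())
print("compression ratio:", compression)
```

On this toy input the smooth stretch is covered by two long patches of 16 points and the burst by eight single-point patches, so 40 points become 10 tokens. Raising `threshold` produces fewer, longer patches, which is how a threshold can be tuned toward a target average compression.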

Figure 1 : Architectural overview of TimeSqueeze forecasting model. An SSM encoder first processes the raw series at full resolution to extract fine-grained features. Dynamic patching then adaptively compresses the sequence, selecting the salient subset of features. A Transformer backbone performs c

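Experimental Results

Research Questions

RQ1: Does dynamic, content-aware patching improve efficiency over fixed-size patching or point-wise tokenization without sacrificing forecasting accuracy?
RQ2: How well does TimeSqueeze integrate with different Transformer backbones and pretraining data regimes?
RQ3: How does pretraining context length affect downstream forecasting performance when using TimeSqueeze?
RQ4: Can temporal integrity and causality be maintained during downsampling and upsampling?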

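Key Results

TimeSqueeze achieves up to 20x faster pretraining convergence and 8x higher data efficiency than point-token baselines.
Across long-horizon forecasting benchmarks, TimeSqueeze consistently outperforms architectures that use point-wise tokenization or fixed-size patching.
Zero-shot results show that TimeSqueeze comes close to Time-MoE performance on standard long-horizon forecasting datasets.
When fine-tuned, TimeSqueeze maintains strong full-shot performance and often outperforms many state-of-the-art baselines.
In efficiency comparisons, TimeSqueeze requires up to 3.4x less memory and roughly 20x less training time at a given budget, and delivers up to 10.5x higher inference throughput for long-horizon forecasting.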


This review was generated by AI and reviewed by a human editor.