QUICK REVIEW

[Paper Review] Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series

Vijay Ekambaram, Arindam Jati|arXiv (Cornell University)|2024. 01. 08.

Stock Market Forecasting Methods|Citations: 11

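One-Line Summary

TTMs are fast, compact pre-trained models (starting from 1M parameters) for zero/few-shot multivariate time series forecasting. Using adaptive patching, downsampling-based augmentation, and resolution prefix tuning, they transfer well to unseen data while requiring far less compute than LLM-based time series models.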

ABSTRACT

Large pre-trained models excel in zero/few-shot learning for language and vision tasks but face challenges in multivariate time series (TS) forecasting due to diverse data characteristics. Consequently, recent research efforts have focused on developing pre-trained TS forecasting models. These models, whether built from scratch or adapted from large language models (LLMs), excel in zero/few-shot forecasting tasks. However, they are limited by slow performance, high computational demands, and neglect of cross-channel and exogenous correlations. To address this, we introduce Tiny Time Mixers (TTM), a compact model (starting from 1M parameters) with effective transfer learning capabilities, trained exclusively on public TS datasets. TTM, based on the light-weight TSMixer architecture, incorporates innovations like adaptive patching, diverse resolution sampling, and resolution prefix tuning to handle pre-training on varied dataset resolutions with minimal model capacity. Additionally, it employs multi-level modeling to capture channel correlations and infuse exogenous signals during fine-tuning. TTM outperforms existing popular benchmarks in zero/few-shot forecasting by (4-40%), while reducing computational requirements significantly. Moreover, TTMs are lightweight and can be executed even on CPU-only machines, enhancing usability and fostering wider adoption in resource-constrained environments. The model weights for reproducibility and research use are available at https://huggingface.co/ibm/ttm-research-r2/, while enterprise-use weights under the Apache license can be accessed as follows: the initial TTM-Q variant at https://huggingface.co/ibm-granite/granite-timeseries-ttm-r1, and the latest variants (TTM-B, TTM-E, TTM-A) weights are available at https://huggingface.co/ibm-granite/granite-timeseries-ttm-r2.

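Motivation and Objectives

Motivates improving zero/few-shot multivariate time series forecasting when public pre-training data is scarce and heterogeneous.
Proposes a small, general-purpose pre-trained model with transfer capability, trained only on public time series data.
Introduces architectural and training improvements to handle multi-resolution data and cross-dataset transfer.
Demonstrates accuracy gains and better computational efficiency than large LLM-based time series methods across multiple datasets.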

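Proposed Method

Builds a multi-level TTM backbone on top of the lightweight TSMixer architecture.
Pre-trains TTMs on public time series datasets in a univariate (channel-independent) fashion to learn general temporal dynamics.
Applies adaptive patching across backbone levels to handle multi-resolution data.
Augments the pre-training data by downsampling, which produces additional resolutions.
Introduces resolution prefix tuning to inject resolution information alongside the patches.
Fine-tunes with a decoder that enables channel mixing and an exogenous mixer that exploits exogenous signals.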
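
The pre-training ideas listed above (downsampling augmentation, patching, and a resolution prefix) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `downsample`, `make_patches`, and `add_resolution_prefix` are hypothetical, downsampling is assumed to be block averaging, patches are assumed to be non-overlapping, and the projection matrix and resolution embedding table are random stand-ins for learned parameters.

```python
import numpy as np

def downsample(series: np.ndarray, factor: int) -> np.ndarray:
    """Average consecutive blocks of `factor` points (assumed scheme)."""
    usable = (len(series) // factor) * factor  # drop the ragged tail
    return series[:usable].reshape(-1, factor).mean(axis=1)

def make_patches(context: np.ndarray, patch_len: int) -> np.ndarray:
    """Split a 1-D context window into non-overlapping patches."""
    if len(context) % patch_len != 0:
        raise ValueError("context length must be a multiple of patch_len")
    return context.reshape(-1, patch_len)

def add_resolution_prefix(patch_emb: np.ndarray, resolution_id: int,
                          table: np.ndarray) -> np.ndarray:
    """Prepend the embedding of the given resolution as an extra token."""
    prefix = table[resolution_id][None, :]
    return np.concatenate([prefix, patch_emb], axis=0)

rng = np.random.default_rng(0)

# Toy hourly series and a derived lower-resolution copy for pre-training.
hourly = np.arange(48, dtype=float)
two_hourly = downsample(hourly, factor=2)          # 24 points

# Different patch lengths stand in for the per-level adaptive patching.
coarse_patches = make_patches(two_hourly, patch_len=8)   # shape (3, 8)
fine_patches = make_patches(two_hourly, patch_len=4)     # shape (6, 4)

# Random stand-ins for learned parameters (hypothetical sizes).
proj = rng.normal(size=(8, 16))        # patch -> hidden projection
res_table = rng.normal(size=(4, 16))   # one row per resolution id

patch_emb = coarse_patches @ proj                  # shape (3, 16)
tokens = add_resolution_prefix(patch_emb, resolution_id=1, table=res_table)
print(tokens.shape)
```

The prefix adds one token in front of the patch embeddings, so the model sees the sampling resolution without any change to the patches themselves.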

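Experimental Results

Research Questions

RQ1: Can a small pre-trained model (starting from 1M parameters) trained only on public TS data achieve competitive zero/few-shot forecasting on unseen datasets?
RQ2: Does multi-resolution pre-training through adaptive patching and downsampling improve generalization to time series of varying resolutions?
RQ3: Do decoder channel mixing and exogenous fusion improve multivariate forecasting performance during fine-tuning?
RQ4: How do the transfer learning performance and computational footprint of TTMs compare with large LLM-based time series methods on standard benchmarks?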

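Key Results

TTMs achieve 12-38% accuracy gains over popular benchmarks in few/zero-shot forecasting.
TTMs cut compute needs relative to large LLM-TS methods, with 14x fewer learnable parameters and 106x fewer total parameters.
With TTMs, fine-tuning time drops by 65x, inference time by 54x, and memory usage by 27x.
Zero-shot TTM results often surpass few-shot results on many benchmarks, highlighting effective transfer learning from diverse public TS data.
TTM-CM (decoder channel mixing plus exogenous fusion) outperforms competing models by 15-44% on exogenous/multivariate datasets.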
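
The percentages and "x" factors above are relative figures. Assuming the accuracy gains are measured as the relative reduction in a forecasting error such as MSE (this summary does not state the metric), the arithmetic can be sketched as follows. The helper names and all numeric inputs are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_improvement(baseline_error: float, model_error: float) -> float:
    """Percentage reduction in error relative to the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - model_error) / baseline_error

def reduction_factor(baseline_cost: float, model_cost: float) -> float:
    """How many times smaller the model's cost is than the baseline's."""
    if model_cost <= 0:
        raise ValueError("model_cost must be positive")
    return baseline_cost / model_cost

# Hypothetical MSE values for a baseline and a model on one dataset.
gain = relative_improvement(baseline_error=0.5, model_error=0.375)

# Hypothetical fine-tuning times in minutes.
speedup = reduction_factor(baseline_cost=130.0, model_cost=2.0)

print(f"{gain:.1f}% lower error, {speedup:.0f}x faster fine-tuning")
```

A range such as 12-38% would then correspond to the smallest and largest such gains across the compared baselines or datasets, under the same assumption about the metric.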

This review was generated by AI and checked by a human editor.