QUICK REVIEW

[Paper Review] BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting

Patrick Emami, Abhijeet Sahu | arXiv (Cornell University) | 2023-06-30

Energy Load and Power Forecasting | Citations: 8

One-Line Summary

BuildingsBench introduces Buildings-900K, a simulated dataset of nearly one million (900K) buildings, and provides an evaluation platform for zero-shot STLF and transfer learning on real buildings.

ABSTRACT

Short-term forecasting of residential and commercial building energy consumption is widely used in power systems and continues to grow in importance. Data-driven short-term load forecasting (STLF), although promising, has suffered from a lack of open, large-scale datasets with high building diversity. This has hindered exploring the pretrain-then-fine-tune paradigm for STLF. To help address this, we present BuildingsBench, which consists of: 1) Buildings-900K, a large-scale dataset of 900K simulated buildings representing the U.S. building stock; and 2) an evaluation platform with over 1,900 real residential and commercial buildings from 7 open datasets. BuildingsBench benchmarks two under-explored tasks: zero-shot STLF, where a pretrained model is evaluated on unseen buildings without fine-tuning, and transfer learning, where a pretrained model is fine-tuned on a target building. The main finding of our benchmark analysis is that synthetically pretrained models generalize surprisingly well to real commercial buildings. An exploration of the effect of increasing dataset size and diversity on zero-shot commercial building performance reveals a power-law with diminishing returns. We also show that fine-tuning pretrained models on real commercial and residential buildings improves performance for a majority of target buildings. We hope that BuildingsBench encourages and facilitates future research on generalizable STLF. All datasets and code can be accessed from https://github.com/NREL/BuildingsBench.

Motivation and Goals

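Demonstrates the value of large-scale synthetic data for pretraining STLF models across residential and commercial buildings.
Provides an open evaluation platform for zero-shot STLF and transfer learning that combines simulated data with real-building datasets.
Analyzes how well synthetic pretraining generalizes to real buildings and how much fine-tuning on target-building data helps.
Investigates how dataset size, model size, and architecture affect zero-shot and transfer learning performance.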

Proposed Method

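Introduces Buildings-900K, a simulated dataset derived from the NREL End-Use Load Profiles (EULP) that contains 900K residential and commercial building models.
Adopts a probabilistic short-term load forecasting formulation: given 168 hours of load history and covariates, predict the distribution of the load over the next 24 hours.
Pretrains transformer-based time-series models (Gaussian and tokenized variants) on Buildings-900K at a scale of 1B load-hours, then evaluates zero-shot and transfer learning performance on real datasets.
Provides, as part of BuildingsBench, a real-building evaluation set of more than 1,900 buildings drawn from 7 open datasets.
Benchmarks baselines both with and without fine-tuning: Persistence, LightGBM, Linear/DLinear/RNN variants, and transformer-based forecasters.
Evaluates point forecasts with NRMSE (normalized root mean squared error) and probabilistic forecasts with the ranked probability score (RPS).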
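The forecasting setup described above (168 hours of history in, a 24-hour-ahead load distribution out) can be sketched as a sliding-window split over an hourly load series. This is a minimal illustration in plain Python, not the BuildingsBench data loader: the helper name `make_windows` and its `stride` parameter are hypothetical, and covariates are omitted.

```python
from typing import List, Tuple

CONTEXT_LEN = 168  # hours of history fed to the model (one week)
HORIZON = 24       # hours to forecast (one day ahead)


def make_windows(
    load: List[float], stride: int = 24
) -> List[Tuple[List[float], List[float]]]:
    """Split an hourly load series into (context, target) pairs.

    Hypothetical helper: each pair holds CONTEXT_LEN past hours and the
    following HORIZON hours that a probabilistic model would forecast.
    """
    windows = []
    last_start = len(load) - CONTEXT_LEN - HORIZON
    for start in range(0, last_start + 1, stride):
        context = load[start : start + CONTEXT_LEN]
        target = load[start + CONTEXT_LEN : start + CONTEXT_LEN + HORIZON]
        windows.append((context, target))
    return windows


if __name__ == "__main__":
    # Two weeks of toy hourly data (not real building loads).
    series = [float(h % 24) for h in range(14 * 24)]
    pairs = make_windows(series)
    print(len(pairs), len(pairs[0][0]), len(pairs[0][1]))  # prints: 7 168 24
```

A daily stride keeps every target window aligned to the same hour of day; a stride of 1 would yield many more, heavily overlapping training pairs.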
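For the Gaussian variant, a standard way to train a model that outputs a per-hour mean and standard deviation is to minimize the Gaussian negative log-likelihood of the observed loads. The sketch below assumes that setup; the function name `gaussian_nll` is hypothetical, and the exact parameterization and loss used in BuildingsBench may differ.

```python
import math
from typing import Sequence


def gaussian_nll(
    y: Sequence[float], mu: Sequence[float], sigma: Sequence[float]
) -> float:
    """Mean Gaussian negative log-likelihood over a forecast horizon.

    y: observed loads; mu, sigma: predicted per-hour mean and std. dev.
    """
    if not (len(y) == len(mu) == len(sigma)):
        raise ValueError("y, mu and sigma must have the same length")
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        if si <= 0:
            raise ValueError("standard deviations must be positive")
        z = (yi - mi) / si
        # -log N(y | mu, sigma^2) = 0.5*log(2*pi) + log(sigma) + 0.5*z^2
        total += 0.5 * math.log(2 * math.pi) + math.log(si) + 0.5 * z * z
    return total / len(y)
```

The loss rewards forecasts that are both accurate and honestly calibrated: a biased mean is penalized through `z`, while an overconfident (too small) `sigma` inflates `z` and an overly wide one inflates `log(sigma)`.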
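Persistence is the simplest of these baselines: it reuses recently observed load as the forecast. A minimal previous-day variant is sketched below. This specific form (repeat the last 24 hours of the context) is an assumption for illustration, the function name is hypothetical, and the benchmark's own persistence baselines may be defined differently.

```python
from typing import List

HORIZON = 24  # hours to forecast


def previous_day_persistence(context: List[float], horizon: int = HORIZON) -> List[float]:
    """Forecast the next `horizon` hours by repeating the last 24 observed hours."""
    if len(context) < 24:
        raise ValueError("need at least 24 hours of context")
    last_day = context[-24:]
    # Hour h of the forecast copies the same hour of day from the last observed day.
    return [last_day[h % 24] for h in range(horizon)]
```

Because it needs no training, a persistence forecast is a useful floor: a pretrained model that cannot beat it on a given building adds little for that building.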
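The two metrics can be sketched as follows. `nrmse` divides the RMSE by the mean observed load, which is one common convention and an assumption about the paper's exact normalization (it may also be reported as a percentage). RPS is shown in two forms: the (unnormalized) discrete ranked probability score for a categorical forecast over ordered load bins, as a tokenized model would produce, and the closed-form continuous RPS for a Gaussian forecast. All function names are hypothetical.

```python
import math
from typing import Sequence


def nrmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE divided by the mean observed load (assumed normalization)."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    return math.sqrt(mse) / (sum(y_true) / n)


def discrete_rps(probs: Sequence[float], observed_bin: int) -> float:
    """Ranked probability score of one categorical forecast over ordered bins."""
    rps, cdf_pred = 0.0, 0.0
    for k, p in enumerate(probs):
        cdf_pred += p
        cdf_obs = 1.0 if k >= observed_bin else 0.0  # step CDF of the observation
        rps += (cdf_pred - cdf_obs) ** 2
    return rps


def gaussian_crps(y: float, mu: float, sigma: float) -> float:
    """Closed-form continuous ranked probability score of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))
```

Both RPS forms compare the predicted CDF with the step CDF of the observation, so lower is better, and a forecast that puts all its mass on the observed value scores zero.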


Experimental Results

Research Questions

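RQ1: Can models pretrained on Buildings-900K generalize to real buildings, and to commercial buildings in particular?
RQ2: Does fine-tuning a pretrained model on a limited amount of real data improve performance on the target building?
RQ3: How do pretraining dataset size and model size affect zero-shot performance and generalization to real data?
RQ4: For large-scale STLF pretraining, is there a benefit to pretraining on tokenized loads instead of continuous loads?
RQ5: How does adding geospatial and building-type covariates affect forecast accuracy?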

Key Findings

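Pretraining on Buildings-900K yields strong zero-shot STLF performance on real commercial buildings, whereas a sim-to-real gap remains for residential buildings.
Fine-tuning the pretrained transformers on 6 months of real data improves STLF for both commercial and residential buildings, with especially large gains for commercial buildings.
Zero-shot performance on commercial buildings follows a power law with diminishing returns as the pretraining dataset grows; residential buildings, which show a larger sim-to-real mismatch, generalize less well.
The Transformer-M model benefits most from fine-tuning, approaching or matching the large pretrained baselines, while very large models can saturate in the zero-shot setting.
Tokenized load representations (Tokens) offer training stability but generally trail the Gaussian transformers in accuracy, although quantization compresses the load data effectively.
Geospatial covariates yield a modest accuracy gain, suggesting that location awareness helps generalization slightly.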
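A power law with diminishing returns means the error falls roughly as E(N) ≈ a·N^(-b) with a small exponent b > 0, so each tenfold increase in dataset size N buys a smaller absolute improvement. A minimal way to estimate a and b is ordinary least squares on log E versus log N. The function name is hypothetical, and the points in the demo are synthetic, generated from a known power law only to show that the fit recovers it; they are not results from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(n: Sequence[float], err: Sequence[float]) -> Tuple[float, float]:
    """Fit err ≈ a * n**(-b) by least squares in log-log space; returns (a, b)."""
    xs = [math.log(v) for v in n]
    ys = [math.log(v) for v in err]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # slope of log(err) vs log(n) equals -b
    intercept = y_mean - slope * x_mean  # intercept equals log(a)
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic illustration only: errors generated from a = 2.0, b = 0.1.
    sizes = [1e3, 1e4, 1e5, 1e6]
    errors = [2.0 * s ** -0.1 for s in sizes]
    a, b = fit_power_law(sizes, errors)
    print(round(a, 3), round(b, 3))  # prints: 2.0 0.1
    # Diminishing returns: the gain from each 10x more data shrinks as N grows.
    print([round(errors[i] - errors[i + 1], 4) for i in range(len(errors) - 1)])
```

On real benchmark results the points will not lie exactly on a line in log-log space, so the fitted exponent is only a summary of the trend.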
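Tokenizing loads means mapping each continuous value to the index of a discrete bin, so the transformer can be trained as a classifier over a fixed vocabulary. Decoding maps a token back to a representative value, which is where both the compression and some accuracy loss come from. The uniform-width quantizer below is an assumption for illustration: the class name is hypothetical, the paper's tokenizer may place bins differently (for example, learned from data), and the vocabulary size is arbitrary.

```python
from typing import List


class UniformLoadTokenizer:
    """Hypothetical uniform quantizer mapping loads in [lo, hi] to vocab_size tokens."""

    def __init__(self, lo: float, hi: float, vocab_size: int) -> None:
        if hi <= lo or vocab_size < 2:
            raise ValueError("need hi > lo and vocab_size >= 2")
        self.lo, self.hi, self.vocab_size = lo, hi, vocab_size
        self.width = (hi - lo) / vocab_size

    def encode(self, loads: List[float]) -> List[int]:
        """Clip each load to [lo, hi] and return its bin index."""
        tokens = []
        for x in loads:
            x = min(max(x, self.lo), self.hi)
            idx = int((x - self.lo) / self.width)
            tokens.append(min(idx, self.vocab_size - 1))  # x == hi goes to the last bin
        return tokens

    def decode(self, tokens: List[int]) -> List[float]:
        """Map each token back to its bin center."""
        return [self.lo + (t + 0.5) * self.width for t in tokens]


if __name__ == "__main__":
    tok = UniformLoadTokenizer(lo=0.0, hi=100.0, vocab_size=10)
    ids = tok.encode([3.2, 47.9, 100.0, 250.0])
    print(ids, tok.decode(ids))  # prints: [0, 4, 9, 9] [5.0, 45.0, 95.0, 95.0]
```

The round-trip error is bounded by half a bin width for in-range values, while out-of-range values are clipped; this trade-off between vocabulary size and precision is one plausible reason a tokenized model can trail a Gaussian one in accuracy.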


This review was generated by AI and reviewed by a human editor.