QUICK REVIEW

[Paper Review] Hyp-RL: Hyperparameter Optimization by Reinforcement Learning

Hadi S. Jomaa, Josif Grabocka | arXiv (Cornell University) | June 27, 2019

Machine Learning and Data Classification | References: 37 | Citations: 34

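One-Line Summary

This paper formulates hyperparameter tuning as a reinforcement learning problem and introduces Hyp-RL, a Q-learning policy with an LSTM that navigates the hyperparameter space to maximize future reward, reporting gains over state-of-the-art baselines on a meta-dataset of 50 datasets.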

ABSTRACT

Hyperparameter tuning is an omnipresent problem in machine learning as it is an integral aspect of obtaining the state-of-the-art performance for any model. Most often, hyperparameters are optimized just by training a model on a grid of possible hyperparameter values and taking the one that performs best on a validation sample (grid search). More recently, methods have been introduced that build a so-called surrogate model that predicts the validation loss for a specific hyperparameter setting, model and dataset and then sequentially select the next hyperparameter to test, based on a heuristic function of the expected value and the uncertainty of the surrogate model called acquisition function (sequential model-based Bayesian optimization, SMBO). In this paper we model the hyperparameter optimization problem as a sequential decision problem, which hyperparameter to test next, and address it with reinforcement learning. This way our model does not have to rely on a heuristic acquisition function like SMBO, but can learn which hyperparameters to test next based on the subsequent reduction in validation loss they will eventually lead to, either because they yield good models themselves or because they allow the hyperparameter selection policy to build a better surrogate model that is able to choose better hyperparameters later on. Experiments on a large battery of 50 data sets demonstrate that our method outperforms the state-of-the-art approaches for hyperparameter learning.

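Research Motivation and Goals

Motivates hyperparameter tuning as a scalable, robust, automated process for high-dimensional models.
Proposes a reinforcement learning framework to replace the acquisition function used in traditional Bayesian optimization of hyperparameters.
Demonstrates transfer learning potential by training across multiple datasets and evaluating on unseen datasets.
Provides empirical evidence of improved final model performance over baselines on a large meta-dataset.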

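Proposed Method

Formulates hyperparameter optimization as a Markov Decision Process whose state comprises the dataset meta-features and the history of tested configurations and their rewards.
Uses a Q-learning policy with an LSTM (Hyp-RL) to model the action-value function and explore the hyperparameter response surface.
Defines an action as selecting the next hyperparameter configuration from a grid; the reward is the negative validation loss of the resulting model.
Introduces meta-features to set the initial LSTM state (h0 = W0 * s_static) so that the policy adapts to the dataset.
Trains with experience replay and a target network, and terminates an episode when the budget is exhausted or an action is repeated.
Builds a meta-dataset (nnMeta) from 50 randomly chosen UCI classification datasets to study and evaluate cross-dataset transfer.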
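
The formulation above can be made concrete with a small toy environment. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class name `HpoEpisode`, the helper `td_target`, the lookup-table loss, the zero reward on a repeated action, and the discount factor are all hypothetical choices made for illustration. It shows the pieces the review describes: actions index a grid of configurations, the reward is the negative validation loss, the state combines static meta-features with the trial history, and an episode ends when the budget is spent or an action is repeated. `td_target` is the standard bootstrapped Q-learning target as used with a target network; the review does not confirm the exact form used in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Config = Tuple[float, ...]


@dataclass
class HpoEpisode:
    """Toy episode for grid-based hyperparameter selection (illustrative only)."""

    grid: Sequence[Config]                      # one discrete action per grid point
    validation_loss: Callable[[Config], float]  # hypothetical stand-in for train + validate
    meta_features: Sequence[float]              # static dataset descriptors
    budget: int                                 # maximum number of trials per episode
    history: List[Tuple[int, float]] = field(default_factory=list)

    def state(self):
        # State = static meta-features plus the (action, reward) history so far.
        return tuple(self.meta_features), tuple(self.history)

    def step(self, action: int):
        # A repeated action ends the episode. The zero reward is an assumption.
        if any(a == action for a, _ in self.history):
            return self.state(), 0.0, True
        reward = -self.validation_loss(self.grid[action])
        self.history.append((action, reward))
        done = len(self.history) >= self.budget
        return self.state(), reward, done


def td_target(reward: float, next_q_values: Sequence[float], done: bool,
              gamma: float = 0.99) -> float:
    """Standard Q-learning target; next_q_values would come from a target network."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Usage with a made-up loss table over three learning rates.
grid = [(0.1,), (0.01,), (0.001,)]
toy_losses = {grid[0]: 0.40, grid[1]: 0.25, grid[2]: 0.30}
episode = HpoEpisode(grid=grid, validation_loss=toy_losses.__getitem__,
                     meta_features=[150.0, 4.0], budget=3)
_, first_reward, first_done = episode.step(1)    # new configuration: reward is -loss
_, repeat_reward, repeat_done = episode.step(1)  # repeated action: episode terminates
```

In a full agent, the history would be fed through the LSTM and the meta-features would set its initial hidden state, as the method description states; here both are simply returned as tuples to keep the sketch dependency-free.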

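Experimental Results

Research Questions

RQ1: Can a reinforcement learning policy effectively navigate a high-dimensional hyperparameter space to improve validation loss?
RQ2: Does conditioning on dataset meta-features enable transfer across datasets in hyperparameter optimization?
RQ3: How does Hyp-RL compare with Bayesian-optimization-based baselines and meta-learned surrogates (such as F-MLP) across diverse datasets?
RQ4: What are the computational characteristics and scalability of the Hyp-RL approach?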

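Key Results

Hyp-RL consistently outperforms baselines that do not transfer knowledge across datasets.
Hyp-RL is competitive with the meta-learned surrogate (F-MLP) and is faster at inference because it does not refit a surrogate for each configuration.
The policy shows learning progress through rising episode reward and better exploration of the hyperparameter response surface over time.
Conditioning the policy on dataset meta-features yields better initial configurations and faster improvement on unseen data.
Policy training requires substantial upfront computation (about 24 GPU-hours for 10 million frames), but online inference for selecting a configuration is immediate.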


This review was generated by AI and reviewed by a human editor.