QUICK REVIEW

[Paper Review] Tunability: Importance of Hyperparameters of Machine Learning Algorithms

Philipp Probst, Bernd Bischl | arXiv (Cornell University) | 2018-02-26

Machine Learning and Data Classification | References: 15 | Citations: 572

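One-line summary

This study formalizes hyperparameter tuning as a statistical problem, defines data-based defaults and tunability measures, and benchmarks six algorithms on 38 OpenML datasets to quantify how much tuning improves performance. It provides data-based optimal defaults, practical tuning spaces, and insight into which hyperparameters matter most.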

ABSTRACT

Modern supervised machine learning algorithms involve hyperparameters that have to be set before running them. Options for setting hyperparameters are default values from the software package, manual configuration by the user or configuring them for optimal predictive performance by a tuning procedure. The goal of this paper is two-fold. Firstly, we formalize the problem of tuning from a statistical point of view, define data-based defaults and suggest general measures quantifying the tunability of hyperparameters of algorithms. Secondly, we conduct a large-scale benchmarking study based on 38 datasets from the OpenML platform and six common machine learning algorithms. We apply our measures to assess the tunability of their parameters. Our results yield default values for hyperparameters and enable users to decide whether it is worth conducting a possibly time-consuming tuning strategy, to focus on the most important hyperparameters and to choose adequate hyperparameter spaces for tuning.

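Research motivation and goals

Formalize the hyperparameter tuning problem from a statistical point of view and define data-based defaults.
Introduce measures that quantify the tunability of individual hyperparameters and of hyperparameter combinations.
Develop a procedure that uses surrogate models to estimate tunability and optimal tuning spaces.
Apply the framework to a large-scale OpenML benchmark to derive practical defaults and insights across algorithms.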

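Proposed method

Define R(theta) as the expected risk under hyperparameter configuration theta.
Propose optimal defaults theta* by minimizing a summary of R^{(j)}(theta) over the m datasets.
Introduce the tunability measures d^{(j)} (whole algorithm) and d_i^{(j)} (single hyperparameter i), each the difference between the risk at the default and the risk at the optimum on dataset j.
Extend tunability to hyperparameter pairs and to the joint gain g_{i1,i2}, including a comparison with sequential tuning.
Define optimal hyperparameter spaces Theta* from dataset-wise quantiles so that they capture robust tuning ranges.
Estimate R^{(j)}(theta) with surrogate models (random forest, among others) and find defaults and tuned configurations by black-box optimization.
Assess tunability on 38 binary classification datasets from OpenML100 using six algorithms (glmnet, rpart, kknn, svm, ranger, xgboost) and cross-validation.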
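
The definitions listed above can be written out as follows. This is a reconstruction from the summary, not a quotation of the paper: agg is a placeholder name for the unspecified summary over datasets (the mean is one natural choice), Theta denotes the search space, and theta^{(j)*} and theta_i^{(j)*} are notation introduced here for the per-dataset optimum and for the optimum when only hyperparameter i may deviate from the default.

```latex
\begin{align*}
\theta^{(j)*} &:= \arg\min_{\theta \in \Theta} R^{(j)}(\theta), \\
\theta^{*} &:= \arg\min_{\theta \in \Theta}
  \operatorname{agg}\bigl(R^{(1)}(\theta), \dots, R^{(m)}(\theta)\bigr), \\
d^{(j)} &:= R^{(j)}(\theta^{*}) - R^{(j)}(\theta^{(j)*}), \\
\theta_i^{(j)*} &:= \arg\min_{\theta \in \Theta,\ \theta_l = \theta^{*}_l\ \forall l \neq i}
  R^{(j)}(\theta), \\
d_i^{(j)} &:= R^{(j)}(\theta^{*}) - R^{(j)}(\theta_i^{(j)*}).
\end{align*}
```

Both measures are non-negative by construction, since the default is itself a feasible configuration in each minimization. Tunability of a pair (i1, i2) follows the same pattern, with two hyperparameters allowed to deviate from the default instead of one.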

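Experimental results

Research questions

RQ1. How should defaults be defined so that they work well across diverse datasets?
RQ2. How tunable are common ML algorithms overall, and which hyperparameters are the most influential?
RQ3. What is the benefit of tuning hyperparameters in combination rather than individually?
RQ4. Which hyperparameter tuning spaces capture the regions where performance improves across datasets?
RQ5. How can surrogate models help estimate tunability and guide automated tuning?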

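Key results

For several algorithms, the optimal defaults improve performance substantially over the software defaults, and tunability varies across algorithms.
glmnet and svm show higher tunability than ranger, which has the lowest tunability in the study.
Individual hyperparameters can carry considerable tunability on their own (e.g., gamma for svm, lambda for glmnet, eta and booster for xgboost).
Tuning a pair of hyperparameters jointly often yields a larger gain than tuning a single parameter (e.g., minsplit and minbucket for rpart).
Tuning spaces defined by the 5th and 95th percentiles cover the optimal defaults on many datasets, but some package defaults fall outside this robust range.
Surrogate models (random forest, among others) provide reliable estimates of R(theta), enabling efficient tuning decisions.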
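
To make the quantities behind these results concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: the two hyperparameters `a` and `b`, their integer-coded grid, and the `risk` function, which is a toy stand-in for a surrogate estimate of R^{(j)}(theta). It uses the mean as the summary over datasets and takes quantiles of the per-dataset best values, which is one plausible reading of the procedure rather than the paper's exact implementation.

```python
from itertools import product
from statistics import mean

# Hypothetical grid: two hyperparameters, each coded as three integer levels.
GRID = {"a": [0, 1, 2], "b": [0, 1, 2]}
CONFIGS = [dict(zip(GRID, levels)) for levels in product(*GRID.values())]

# Hypothetical per-dataset optima; they only generate a toy risk surface.
DATASET_OPTIMA = [
    {"a": 0, "b": 1},
    {"a": 1, "b": 1},
    {"a": 1, "b": 2},
    {"a": 2, "b": 1},
]


def risk(j, config):
    """Toy stand-in for the surrogate estimate of R^(j)(theta)."""
    optimum = DATASET_OPTIMA[j]
    return sum((config[name] - optimum[name]) ** 2 for name in GRID)


def optimal_default():
    """Configuration minimizing the mean risk over all datasets."""
    m = len(DATASET_OPTIMA)
    return min(CONFIGS, key=lambda c: mean(risk(j, c) for j in range(m)))


def best_config(j, default=None, free=None):
    """Best configuration on dataset j. If `free` is given, only those
    hyperparameters may differ from `default`."""
    candidates = CONFIGS
    if free is not None:
        candidates = [
            c for c in CONFIGS
            if all(c[n] == default[n] for n in GRID if n not in free)
        ]
    return min(candidates, key=lambda c: risk(j, c))


def quantile(values, q):
    """Linearly interpolated empirical quantile."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


default = optimal_default()
datasets = range(len(DATASET_OPTIMA))

# Overall tunability d^(j): default risk minus best achievable risk.
tunability = [risk(j, default) - risk(j, best_config(j)) for j in datasets]

# Per-hyperparameter tunability d_i^(j): tune one, keep the rest at default.
tunability_by_param = {
    name: [
        risk(j, default) - risk(j, best_config(j, default, {name}))
        for j in datasets
    ]
    for name in GRID
}

# Tuning range: 5% and 95% quantiles of the per-dataset best values.
tuning_space = {}
for name in GRID:
    best_values = [best_config(j)[name] for j in datasets]
    tuning_space[name] = (quantile(best_values, 0.05), quantile(best_values, 0.95))

print("optimal default:", default)
print("tunability per dataset:", tunability)
print("mean tunability:", mean(tunability))
print("tunability by hyperparameter:", tunability_by_param)
print("tuning space:", tuning_space)
```

In this toy setup the second dataset's optimum coincides with the default, so its tunability is zero, while each of the other datasets gains from tuning exactly one of the two hyperparameters. A real study would replace `risk` with a surrogate model fitted to benchmark results and search a continuous space rather than a small grid.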


This review was generated by AI and reviewed by a human editor.