
hyeonsangjeon/Hyperparameters-Optimization


  • ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ์กฐ์ •์€ ๋ชจ๋“  ๊ธฐ๊ณ„ ํ•™์Šต ํ”„๋กœ์ ํŠธ์˜ ํ•„์ˆ˜ ๋ถ€๋ถ„์ด๋ฉฐ ๊ฐ€์žฅ ์‹œ๊ฐ„์ด ๋งŽ์ด ๊ฑธ๋ฆฌ๋Š” ์ž‘์—… ์ค‘ ํ•˜๋‚˜์ž…๋‹ˆ๋‹ค.
  • ๊ฐ€์žฅ ๋‹จ์ˆœํ•œ ๋ชจ๋ธ์˜ ๊ฒฝ์šฐ์—๋„ ํ•˜๋ฃจ, ๋ช‡ ์ฃผ ๋˜๋Š” ๊ทธ ์ด์ƒ ์ตœ์ ํ™” ํ•  ์ˆ˜์žˆ๋Š” ์‹ ๊ฒฝ๋ง์„ ์–ธ๊ธ‰ํ•˜์ง€ ์•Š๊ณ  ์ตœ์ ์˜ ๋งค๊ฐœ ๋ณ€์ˆ˜๋ฅผ ์ฐพ๋Š” ๋ฐ ๋ช‡ ์‹œ๊ฐ„์ด ๊ฑธ๋ฆด ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ์ด ํŠœํ† ๋ฆฌ์–ผ์—์„œ๋Š” Grid Search , Random Search, HyperBand, Bayesian optimization, Tree-structured Parzen Estimator(TPE)์— ๋Œ€ํ•ด ์†Œ๊ฐœํ•ฉ๋‹ˆ๋‹ค.
  • ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ์กฐ์ •์€ ํ•จ์ˆ˜ ์ตœ์ ํ™” ์ž‘์—…์— ์ง€๋‚˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋ถ„๋ช…ํžˆ Grid ๋˜๋Š” Random Search๊ฐ€ ์œ ์ผํ•˜๊ณ  ์ตœ์ƒ์˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ์•„๋‹ˆ์ง€๋งŒ ํšจ์œจ์ ์ธ ์†๋„์™€ ๊ฒฐ๊ณผ ์ธก๋ฉด์—์„œ ๊พธ์ค€ํžˆ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
  • ์ด๋ก ์  ๊ด€์ ์—์„œ ์–ด๋–ป๊ฒŒ ์ž‘๋™ํ•˜๋Š”์ง€ ์„ค๋ช…ํ•˜๊ณ  Hyperopt ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์‹ค์ œ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด์—ฌ์ค„ ๊ฒƒ์ž…๋‹ˆ๋‹ค.
  • ์ด ํŠœํ† ๋ฆฌ์–ผ์„ ๋งˆ์น˜๋ฉด ๋ชจ๋ธ๋ง ํ”„๋กœ์„ธ์Šค์˜ ์†๋„๋ฅผ ์‰ฝ๊ฒŒ ๋†’์ด๋Š” ๋ฐฉ๋ฒ•์„ ์•Œ๊ฒŒ๋ฉ๋‹ˆ๋‹ค.
  • ํŠœํ† ๋ฆฌ์–ผ์˜ ์œ ์ผํ•œ ๋ชฉํ‘œ๋Š” ๋งค์šฐ ๋‹จ์ˆœํ™” ๋œ ์˜ˆ์ œ์—์„œ Hyperparameter optimization์„ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์‹œ์—ฐํ•˜๊ณ  ์„ค๋ช…ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

Related Korean presentation and article

#!pip install git+https://github.com/darenr/scikit-optimize

Preparation step

  • Import the standard libraries
#!pip install lightgbm
import numpy as np
import pandas as pd

from lightgbm.sklearn import LGBMRegressor
from sklearn.metrics import mean_squared_error

%matplotlib inline

import warnings                                  # "do not disturb" mode
warnings.filterwarnings('ignore')

We will demonstrate and compare the various algorithms on the diabetes dataset from sklearn.datasets. Let's load it.

์—ฌ๊ธฐ์—์„œ ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ๋Œ€ํ•œ ์„ค๋ช…์„ ์ฐพ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. [https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html] add: ์ผ๋ถ€ ํ™˜์ž ๋ฐ ๋Œ€์ƒ ์ธก์ • ํ•ญ๋ชฉ์— ๋Œ€ํ•œ ์ •๋ณด๊ฐ€ ํฌํ•จ ๋œ ๋ฐ์ดํ„ฐ ์„ธํŠธ์ž…๋‹ˆ๋‹ค. "๊ธฐ์ค€์„  1 ๋…„ ํ›„ ์งˆ๋ณ‘ ์ง„ํ–‰์˜ ์ •๋Ÿ‰์  ์ธก์ •". ์ด ์˜ˆ์ œ์˜ ๋ชฉ์ ์„ ์œ„ํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ์ดํ•ดํ•  ํ•„์š”๋„ ์—†์Šต๋‹ˆ๋‹ค. ํšŒ๊ท€ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ณ  ์žˆ์œผ๋ฉฐ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์กฐ์ •ํ•˜๋ ค๊ณ ํ•œ๋‹ค๋Š” ์ ์„ ๋ช…์‹ฌํ•˜์‹ญ์‹œ์˜ค.

from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
n = diabetes.data.shape[0]

data = diabetes.data
targets = diabetes.target
  • ๋ฐ์ดํ„ฐ ์„ธํŠธ๋Š” ๋งค์šฐ ์ž‘์Šต๋‹ˆ๋‹ค.
  • ์ด๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ธฐ๋ณธ ๊ฐœ๋…์„ ์‰ฝ๊ฒŒ ๋ณด์—ฌ์ค„ ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์„ ํƒํ–ˆ์Šต๋‹ˆ๋‹ค.( ๋ชจ๋“  ๊ฒƒ์ด ๊ณ„์‚ฐ ๋  ๋•Œ ๋ช‡ ์‹œ๊ฐ„์„ ๊ธฐ๋‹ค๋ฆด ํ•„์š”๊ฐ€ ์—†์Šต๋‹ˆ๋‹ค.)
  • ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ํ•™์Šต ๋ฐ ํ…Œ์ŠคํŠธ ๋ถ€๋ถ„์œผ๋กœ ๋‚˜๋ˆŒ ๊ฒƒ์ž…๋‹ˆ๋‹ค. train ๋ถ€๋ถ„์€ 2 ๊ฐœ๋กœ ๋ถ„ํ• ๋˜๋ฉฐ, ๋งค๊ฐœ ๋ณ€์ˆ˜๋ฅผ ์ตœ์ ํ™”ํ•˜๋Š” ๋ฐ ๋”ฐ๋ผ ๊ต์ฐจ ๊ฒ€์ฆ MSE๋ฅผ ์ตœ์ข… ์ธก์ • ํ•ญ๋ชฉ์œผ๋กœ ์‚ฌ์šฉํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค.

Note: this simple example is not how real modeling should be approached. With the small dataset and the 2 folds used only for a quick demo, results can be unstable; they change significantly depending on random_state.

iteration์€ 50์œผ๋กœ ๊ณ ์ •ํ•ฉ๋‹ˆ๋‹ค.

from sklearn.model_selection import KFold, cross_val_score
from sklearn.model_selection import train_test_split

random_state=42
n_iter=50

train_data, test_data, train_targets, test_targets = train_test_split(data, targets, 
                                                                      test_size=0.20, shuffle=True,
                                                                      random_state=random_state)
num_folds=2
kf = KFold(n_splits=num_folds, shuffle=True, random_state=random_state)  # shuffle is required for random_state to take effect
print('train_data : ',train_data.shape)
print('test_data : ',test_data.shape)

print('train_targets : ',train_targets.shape)
print('test_targets : ',test_targets.shape)
train_data :  (353, 10)
test_data :  (89, 10)
train_targets :  (353,)
test_targets :  (89,)

๋ชจ๋ธ์ƒ์„ฑ

Let's solve the problem with LGBMRegressor. Gradient Boosting has many hyperparameters to optimize, which makes it a good choice for the demo.

model = LGBMRegressor(random_state=random_state)

Let's train a baseline model using the default parameters.

  • The immediate result (cross-validated MSE) of the baseline model is 3532.
%%time
score = -cross_val_score(model, train_data, train_targets, cv=kf, scoring="neg_mean_squared_error", n_jobs=-1).mean()
print(score)
3532.0822189641976
CPU times: user 23 ms, sys: 33 ms, total: 56 ms
Wall time: 806 ms

The Scikit-Learn parameters used in the experiment are as follows.

  • base_estimator: the base model
  • n_estimators: number of models. Default 10
  • bootstrap: whether data samples are drawn with replacement. Default True
  • max_samples: the number or fraction of samples to draw. Default 1.0
  • bootstrap_features: whether features are drawn with replacement. Default False
  • max_features: the number or fraction of features to draw. Default 1.0

์ตœ์ ํ™” ์ ‘๊ทผ ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ๋ชจ ๋ชฉ์ ์œผ๋กœ 3 ๊ฐœ์˜ ๋งค๊ฐœ ๋ณ€์ˆ˜ ๋งŒ ์กฐ์ •ํ•˜๋Š” ๋ชจ๋ธ์„ ์ตœ์ ํ™” ํ•˜๋Š” ์˜ˆ์ œ์ž…๋‹ˆ๋‹ค.

  • n_estimators: from 100 to 2000
  • max_depth: from 2 to 20
  • learning_rate: from 10e-5 to 1

For computing power, we used a typical local mini-server and a cloud computing environment.

  • Local mini-server: AMD 2700x (1 CPU - 8 cores)
  • Cloud server: (18 CPU - 162 cores) [Intel(R) Xeon(R) 2.00GHZ]

1. GridSearch

ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ์ตœ์ ํ™”๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ์ „ํ†ต์ ์ธ ๋ฐฉ๋ฒ•์€ ๊ทธ๋ฆฌ๋“œ ๊ฒ€์ƒ‰ ๋˜๋Š” ๋งค๊ฐœ ๋ณ€์ˆ˜ ์Šค์œ•์œผ๋กœ, ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ๊ณต๊ฐ„์—์„œ ์ˆ˜๋™์œผ๋กœ ์ง€์ •๋œ ํ•˜์œ„ ์ง‘ํ•ฉ์„ ํ†ตํ•ด ์ „์ฒด์ ์œผ๋กœ ๊ฒ€์ƒ‰ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

  • The simplest method to try first is GridSearchCV, included in sklearn.model_selection. This approach tries every possible combination of the given parameter values, one by one, and selects the one with the best cross-validation result.

This approach has several downsides:

  1. It is very slow: it tries every combination of every parameter, and that takes a lot of time. Every additional parameter to vary multiplies the number of iterations to complete. Suppose you add a new parameter with 10 possible values to the parameter grid; this parameter may turn out to be meaningless, but the computation time grows 10x.
  2. It works only with discrete values. If the global optimum is at n_estimators = 550, but you run GridSearchCV from 100 to 1000 with a step of 100, you will never reach the optimal point.
  3. To complete the search in a reasonable time, you have to know (or guess) the approximate localization of the optimum.

Some of these downsides can be overcome: you can grid-search parameter by parameter, or start from a wide grid with large steps and run it several times, narrowing the boundaries and reducing the step size with each iteration (see the sketch below). But it will still be very computationally intensive and long.
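As a rough illustration of that coarse-to-fine strategy, here is a minimal sketch over a single parameter (assuming the model, train_data, train_targets, and kf objects defined above; the grids and step sizes are made-up illustrative values, not the settings used later in this tutorial):

from sklearn.model_selection import GridSearchCV

# Pass 1: a wide grid with a large step.
coarse = GridSearchCV(model, {'n_estimators': [100, 500, 1000, 1500, 2000]},
                      scoring='neg_mean_squared_error', cv=kf, n_jobs=-1)
coarse.fit(train_data, train_targets)
center = coarse.best_params_['n_estimators']

# Pass 2: a narrower grid with a smaller step around the best coarse value.
fine = GridSearchCV(model,
                    {'n_estimators': [max(1, center + d) for d in (-200, -100, 0, 100, 200)]},
                    scoring='neg_mean_squared_error', cv=kf, n_jobs=-1)
fine.fit(train_data, train_targets)
print(fine.best_params_)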

  • ์šฐ๋ฆฌ์˜ ๊ฒฝ์šฐ ๊ทธ๋ฆฌ๋“œ ๊ฒ€์ƒ‰์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ฐ ๊ฑธ๋ฆฌ๋Š” ์‹œ๊ฐ„์„ ์ถ”์ • ํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.
  • ๊ทธ๋ฆฌ๋“œ๊ฐ€ 'n_estimators'(100 ~ 2000)์˜ ๊ฐ€๋Šฅํ•œ ๊ฐ’ 20 ๊ฐœ,
  • 'max_depth'์˜ 19 ๊ฐœ ๊ฐ’ (2 ~ 20),
  • 'learning_rate'(10e-4 ~ 0.1)์˜ 5 ๊ฐœ ๊ฐ’์œผ๋กœ ๊ตฌ์„ฑ๋˜๊ธฐ๋ฅผ ์›ํ•œ๋‹ค๊ณ  ๊ฐ€์ • ํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.
  • ์ฆ‰, cross_val_score 20 * 19 * 5 = 1900 ๋ฒˆ ๊ณ„์‚ฐํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค. 1 ๋ฒˆ ๊ณ„์‚ฐ์— 0.5 ~ 1.0 ์ดˆ๊ฐ€ ๊ฑธ๋ฆฌ๋ฉด ๊ทธ๋ฆฌ๋“œ ๊ฒ€์ƒ‰์€ 15 ~ 30 ๋ถ„ ๋™์•ˆ ์ง€์†๋ฉ๋‹ˆ๋‹ค. ~ 400 ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๊ฐ€์žˆ๋Š” ๋ฐ์ดํ„ฐ ์„ธํŠธ์—๋Š” ๋„ˆ๋ฌด ๋งŽ์Šต๋‹ˆ๋‹ค.
  • ์‹คํ—˜ ์‹œ๊ฐ„์€ ์˜ค๋ž˜ ๊ฑธ๋ฆฌ์ง€ ๋ง์•„์•ผํ•˜๋ฏ€๋กœ, ์ด ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ถ„์„ ํ•  ๊ตฌ๊ฐ„์„ ์ขํ˜€ ์•ผํ•ฉ๋‹ˆ๋‹ค. 5 * 8 * 3 = 120 ์กฐํ•ฉ ๋งŒ ๋‚จ๊ฒผ์Šต๋‹ˆ๋‹ค.
  • (18 CPU- 162core) Wall time: 5.5 s
  • AMD 2700x (1 CPU - 8Core) Wall time: 6.7 s
%%time
from sklearn.model_selection import GridSearchCV

param_grid={'max_depth':  np.linspace(5,12,8,dtype = int),
            'n_estimators': np.linspace(800,1200,5, dtype = int),
            'learning_rate': np.logspace(-3, -1, 3),            
            'random_state': [random_state]}

gs=GridSearchCV(model, param_grid, scoring='neg_mean_squared_error', n_jobs=-1, cv=kf, verbose=False)

gs.fit(train_data, train_targets)
gs_test_score=mean_squared_error(test_targets, gs.predict(test_data))


print('===========================')
print("Best MSE = {:.3f} , when params {}".format(-gs.best_score_, gs.best_params_))
print('===========================')
===========================
Best MSE = 3319.975 , when params {'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 800, 'random_state': 42}
===========================
CPU times: user 1.58 s, sys: 21 ms, total: 1.6 s
Wall time: 6.3 s

We improved the result, but spent a lot of time doing so. Let's look at how the parameters changed from iteration to iteration.

  • In the figure below (see each parameter's relationship where the MSE is low),
  • we can see, for example, that max_depth is the least important parameter: it does not significantly affect the score. Yet we search over 8 different values of max_depth while the search uses fixed values for the other parameters. A waste of time and resources.
gs_results_df=pd.DataFrame(np.transpose([-gs.cv_results_['mean_test_score'],
                                         gs.cv_results_['param_learning_rate'].data,
                                         gs.cv_results_['param_max_depth'].data,
                                         gs.cv_results_['param_n_estimators'].data]),
                           columns=['score', 'learning_rate', 'max_depth', 'n_estimators'])
gs_results_df.plot(subplots=True,figsize=(10, 10))

png

2. Random Search

Research Paper Random Search

  • Random Search is, on average, more effective than grid search.

Main advantages:

  1. It does not spend time on meaningless parameters: at every step, random search varies all parameters.
  2. On average, it finds near-optimal parameters much faster than grid search.
  3. It is not limited by a grid when optimizing continuous parameters.

๋‹จ์ :

  1. ๊ทธ๋ฆฌ๋“œ์—์„œ ๊ธ€๋กœ๋ฒŒ ์ตœ์  ๋งค๊ฐœ ๋ณ€์ˆ˜๋ฅผ ์ฐพ์ง€ ๋ชปํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  2. ๋ชจ๋“  ๋‹จ๊ณ„๋Š” ๋…๋ฆฝ์ ์ž…๋‹ˆ๋‹ค. ๋ชจ๋“  ํŠน์ • ๋‹จ๊ณ„์—์„œ ์ง€๊ธˆ๊นŒ์ง€ ์ˆ˜์ง‘ ๋œ ๊ฒฐ๊ณผ์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

In this example, we use RandomizedSearchCV from sklearn.model_selection. We start with a very wide parameter space and make only 50 random steps.

์ˆ˜ํ–‰์†๋„:

  • (18 CPU- 162core) Wall time: 2.51 s
  • AMD 2700x (1 CPU - 8Core) Wall time: 3.08 s
%%time
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

param_grid_rand={'learning_rate': np.logspace(-5, 0, 100),
                 'max_depth':  randint(2,20),
                 'n_estimators': randint(100,2000),
                 'random_state': [random_state]}

rs=RandomizedSearchCV(model, param_grid_rand, n_iter = n_iter, scoring='neg_mean_squared_error',
                n_jobs=-1, cv=kf, verbose=False, random_state=random_state)

rs.fit(train_data, train_targets)

rs_test_score=mean_squared_error(test_targets, rs.predict(test_data))

print('===========================')
print("Best MSE = {:.3f} , when params {}".format(-rs.best_score_, rs.best_params_))
print('===========================')
===========================
Best MSE = 3200.402 , when params {'learning_rate': 0.0047508101621027985, 'max_depth': 19, 'n_estimators': 829, 'random_state': 42}
===========================
CPU times: user 1.16 s, sys: 25 ms, total: 1.19 s
Wall time: 3.15 s

The result is better than GridSearchCV's: less time was spent, and the search was more complete. Let's look at the visualization.

  • Every step of random search is completely random. It helps avoid spending time on useless parameters, but it still does not use the information gathered in earlier steps to improve the results of later ones.
rs_results_df=pd.DataFrame(np.transpose([-rs.cv_results_['mean_test_score'],
                                         rs.cv_results_['param_learning_rate'].data,
                                         rs.cv_results_['param_max_depth'].data,
                                         rs.cv_results_['param_n_estimators'].data]),
                           columns=['score', 'learning_rate', 'max_depth', 'n_estimators'])
rs_results_df.plot(subplots=True,figsize=(10, 10))

png

3. HyperBand

Research Paper HyperBand

Abstract excerpt: Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. While recent approaches use Bayesian optimization to adaptively select configurations, this work focuses on speeding up random search through adaptive resource allocation and early stopping. It formulates hyperparameter optimization as a pure-exploration, non-stochastic, infinite-armed bandit problem where a predefined resource such as iterations, data samples, or features is allocated to randomly sampled configurations. It introduces a novel algorithm, Hyperband, for this framework and analyzes its theoretical properties, providing several desirable guarantees. Furthermore, it compares Hyperband with popular Bayesian optimization methods on a collection of hyperparameter optimization problems. Hyperband can provide a far greater speedup than its competitors on a variety of deep-learning and kernel-based learning problems. © 2018 Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar.

  • Hyperband Search focuses on the speed of the optimization search.
  • Randomly sample n hyperparameter combinations.
  • Split the total resource into n shares and allocate one share to each hyperparameter combination for training.
  • Each training round keeps only the top fraction of combinations and discards the rest (a simplified sketch of this successive-halving step follows).
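Below is a minimal sketch of that successive-halving step (illustration only, assuming the model/data objects defined above; the number of configurations, budgets, and halving fraction are made-up values, and the HyperbandSearchCV used below implements the full bracketed algorithm):

import numpy as np
from sklearn.model_selection import cross_val_score

# sample n random configurations
rng = np.random.RandomState(random_state)
configs = [{'learning_rate': 10 ** rng.uniform(-5, 0),
            'max_depth': rng.randint(2, 20)} for _ in range(16)]

budget = 25  # initial resource per configuration (number of trees here)
while len(configs) > 1:
    # evaluate every surviving configuration with the current budget
    scores = []
    for cfg in configs:
        m = LGBMRegressor(random_state=random_state, n_estimators=budget, **cfg)
        mse = -cross_val_score(m, train_data, train_targets, cv=kf,
                               scoring='neg_mean_squared_error', n_jobs=-1).mean()
        scores.append(mse)
    # keep the best half and double the budget for the survivors
    keep = np.argsort(scores)[:len(configs) // 2]
    configs = [configs[i] for i in keep]
    budget *= 2

print('surviving config:', configs[0])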

์ˆ˜ํ–‰์†๋„:

  • (18 CPU- 162core) Wall time: 2.51 s
  • AMD 2700x (1 CPU - 8Core) Wall time: 1.19 s

When using cloud resources, we could see that the preparation time for the distributed resources was relatively long.

!git clone https://github.com/thuijskens/scikit-hyperband.git 2>/dev/null 1>/dev/null
!cp -r scikit-hyperband/* .
!python setup.py install 2>/dev/null 1>/dev/null
%%time
from hyperband import HyperbandSearchCV

from scipy.stats import randint as sp_randint
from sklearn.preprocessing import LabelBinarizer


param_hyper_band={'learning_rate': np.logspace(-5, 0, 100),
                 'max_depth':  randint(2,20),
                 'n_estimators': randint(100,2000),                  
                 #'num_leaves' : randint(2,20),
                 'random_state': [random_state]
                 }


hb = HyperbandSearchCV(model, param_hyper_band, max_iter = n_iter, scoring='neg_mean_squared_error', resource_param='n_estimators', random_state=random_state)


#%time search.fit(new_training_data, y)
hb.fit(train_data, train_targets)



hb_test_score=mean_squared_error(test_targets, hb.predict(test_data))

print('===========================')
print("Best MSE = {:.3f} , when params {}".format(-hb.best_score_, hb.best_params_))
print('===========================')
===========================
Best MSE = 3431.685 , when params {'learning_rate': 0.13848863713938717, 'max_depth': 12, 'n_estimators': 16, 'random_state': 42}
===========================
CPU times: user 13.4 s, sys: 64 ms, total: 13.5 s
Wall time: 2.06 s
hb_results_df=pd.DataFrame(np.transpose([-hb.cv_results_['mean_test_score'],
                                         hb.cv_results_['param_learning_rate'].data,
                                         hb.cv_results_['param_max_depth'].data,
                                         hb.cv_results_['param_n_estimators'].data]),
                           columns=['score', 'learning_rate', 'max_depth', 'n_estimators'])
hb_results_df.plot(subplots=True,figsize=(10, 10))

png

4. Bayesian optimization

Research Paper Bayesian optimization

Unlike Random or Grid Search, the Bayesian approach keeps track of past evaluation results, which it uses to form a probabilistic model that maps hyperparameters to the probability of a score on the objective function:

P(Score | Hyperparameters)

๋…ผ๋ฌธ์—์„œ ์ด ๋ชจ๋ธ์€ ๋ชฉ์  ํ•จ์ˆ˜์— ๋Œ€ํ•œ "surrogate"๋ผ๊ณ ํ•˜๋ฉฐ p (y | x)๋กœ ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค. surrogate ํ•จ์ˆ˜๋Š” ๋ชฉ์  ํ•จ์ˆ˜๋ณด๋‹ค ์ตœ์ ํ™”ํ•˜๊ธฐ ํ›จ์”ฌ ์‰ฌ์šฐ ๋ฉฐ ๋ฒ ์ด์ง€์•ˆ ๋ฐฉ๋ฒ•์€ ๋Œ€๋ฆฌ ํ•จ์ˆ˜์—์„œ ๊ฐ€์žฅ ์ž˜ ์ˆ˜ํ–‰๋˜๋Š” ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์„ ํƒํ•˜์—ฌ ์‹ค์ œ ๋ชฉ์  ํ•จ์ˆ˜๋ฅผ ํ‰๊ฐ€ํ•  ๋‹ค์Œ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ์„ธํŠธ๋ฅผ ์ฐพ๋Š” ๋ฐฉ์‹์œผ๋กœ ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค.

Summarized as pseudocode (a minimal code sketch follows the list):

  1. Build a surrogate probability model of the objective function
  2. Find the hyperparameters that perform best on the surrogate
  3. Apply these hyperparameters to the true objective function
  4. Update the surrogate model incorporating the new results
  5. Repeat steps 2–4 until the maximum number of iterations or the time limit is reached
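To make the loop concrete, here is a minimal one-dimensional sketch that uses a Gaussian-process surrogate from scikit-learn with a simple lower-confidence-bound acquisition standing in for EI (an illustration of the idea only, under those assumptions; the BayesSearchCV used below does this properly):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import cross_val_score

def objective(lr):
    # the "expensive" objective: cross-validated MSE for a given learning_rate
    m = LGBMRegressor(random_state=random_state, learning_rate=lr)
    return -cross_val_score(m, train_data, train_targets, cv=kf,
                            scoring='neg_mean_squared_error', n_jobs=-1).mean()

candidates = np.logspace(-5, 0, 200).reshape(-1, 1)
X_obs = np.array([[1e-4], [1e-2], [0.5]])              # a few initial points
y_obs = np.array([objective(x[0]) for x in X_obs])

for _ in range(5):
    # 1) fit the surrogate on everything observed so far (in log10 space)
    gp = GaussianProcessRegressor(normalize_y=True).fit(np.log10(X_obs), y_obs)
    mu, sigma = gp.predict(np.log10(candidates), return_std=True)
    # 2) pick the candidate that looks best on the surrogate:
    #    low predicted MSE, with a bonus for high uncertainty (exploration)
    x_next = candidates[np.argmin(mu - 1.96 * sigma)]
    # 3) evaluate the true objective and 4) update the observations
    X_obs = np.vstack([X_obs, [x_next]])
    y_obs = np.append(y_obs, objective(x_next[0]))

print('best learning_rate found:', X_obs[np.argmin(y_obs)][0])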

An excellent kernel on Bayesian optimization in more depth is here: https://www.kaggle.com/artgor/bayesian-optimization-for-robots

  • Surrogate Model: a model that makes a probabilistic estimate of the shape of the unknown objective function, based on the input–output points investigated so far, $\{(x_1, f(x_1)), \dots, (x_t, f(x_t))\}$

  • Acquisition Function: based on the probabilistic estimate of the objective function so far, a function that recommends the next input candidate $x_{t+1}$ most likely to be useful for finding the optimal input $x^*$. The Expected Improvement (EI) function is used here (written out below).
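For reference, for a minimization problem the Expected Improvement acquisition can be written as follows, where $f(x^+)$ is the best (lowest) objective value observed so far:

$$\mathrm{EI}(x) = \mathbb{E}\big[\max\big(0,\ f(x^+) - f(x)\big)\big], \qquad x_{t+1} = \arg\max_x \mathrm{EI}(x)$$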

์ˆ˜ํ–‰์†๋„:

  • (18 CPU- 162core) Wall time: 2min 24s
  • AMD 2700x (1 CPU - 8Core) Wall time: 1min 36s

์ƒ๋Œ€์ ์œผ๋กœ ๋กœ์ปฌํ…Œ์ŠคํŠธ์˜ ์ˆ˜ํ–‰ ์†๋„๊ฐ€ ๋น ๋ฅธ๊ฒƒ์„ ๋ณผ์ˆ˜์žˆ์—ˆ๋‹ค.

#! pip install scikit-optimize
#https://towardsdatascience.com/hyperparameter-optimization-with-scikit-learn-scikit-opt-and-keras-f13367f3e796
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer
%%time

search_space={'learning_rate': np.logspace(-5, 0, 100),
                 "max_depth": Integer(2, 20), 
                 'n_estimators': Integer(100,2000),
                 'random_state': [random_state]}
                 

def on_step(optim_result):
    """
    Callback meant to view scores after
    each iteration while performing Bayesian
    Optimization in Skopt"""
    score = bayes_search.best_score_
    print("best score: %s" % score)
    if score >= 0.98:
        print('Interrupting!')
        return True
    
bayes_search = BayesSearchCV(model, search_space, n_iter=n_iter, # specify how many iterations
                                    scoring='neg_mean_squared_error', n_jobs=-1, cv=5)
bayes_search.fit(train_data, train_targets, callback=on_step) # callback=on_step will print score after each iteration

bayes_test_score=mean_squared_error(test_targets, bayes_search.predict(test_data))

print('===========================')
print("Best MSE = {:.3f} , when params {}".format(-bayes_search.best_score_, bayes_search.best_params_))
print('===========================')
best score: -4415.920614880022
best score: -4415.920614880022
best score: -4415.920614880022
best score: -4415.920614880022
best score: -4116.905834420919
best score: -4116.905834420919
best score: -4116.905834420919
best score: -4116.905834420919
best score: -4116.905834420919
best score: -3540.855689828868
best score: -3467.4059934906645
best score: -3467.4059934906645
best score: -3467.4059934906645
best score: -3467.4059934906645
best score: -3467.4059934906645
best score: -3467.4059934906645
best score: -3467.4059934906645
best score: -3467.4059934906645
best score: -3467.4059934906645
best score: -3465.869585251784
best score: -3462.4668073239764
best score: -3462.4668073239764
best score: -3462.4668073239764
best score: -3460.603434822278
best score: -3460.603434822278
best score: -3460.603434822278
best score: -3460.603434822278
best score: -3460.603434822278
best score: -3460.603434822278
best score: -3460.603434822278
best score: -3460.603434822278
best score: -3459.5705953392157
best score: -3456.063877875675
best score: -3456.063877875675
best score: -3454.9987003394112
best score: -3454.9987003394112
best score: -3454.9987003394112
best score: -3454.9987003394112
best score: -3454.9987003394112
best score: -3454.9987003394112
best score: -3454.9987003394112
best score: -3454.9987003394112
best score: -3454.9987003394112
best score: -3454.9987003394112
best score: -3454.9987003394112
best score: -3454.9987003394112
best score: -3454.9987003394112
best score: -3454.9987003394112
best score: -3454.9987003394112
best score: -3454.9987003394112
===========================
Best MSE = 3454.999 , when params OrderedDict([('learning_rate', 0.005336699231206307), ('max_depth', 2), ('n_estimators', 655), ('random_state', 42)])
===========================
CPU times: user 1min 59s, sys: 3min 34s, total: 5min 33s
Wall time: 1min 26s
bayes_results_df=pd.DataFrame(np.transpose([
                                         -np.array(bayes_search.cv_results_['mean_test_score']),
                                         np.array(bayes_search.cv_results_['param_learning_rate']).data,
                                         np.array(bayes_search.cv_results_['param_max_depth']).data,
                                         np.array(bayes_search.cv_results_['param_n_estimators']).data
                                        ]),
                           columns=['score', 'learning_rate', 'max_depth', 'n_estimators'])



bayes_results_df.plot(subplots=True,figsize=(10, 10))

png

5. Hyperopt

  • To handle these algorithms we use the hyperopt library [https://github.com/hyperopt/hyperopt].
  • It is currently one of the state-of-the-art libraries for hyperparameter optimization.
#!pip install hyperopt

First, let's introduce a few useful functions from hyperopt.

  • fmin: the main function, minimizes the objective
  • tpe and anneal: optimization approaches
  • hp: includes distributions for the various variables
  • Trials: used for logging
from hyperopt import fmin, tpe, hp, anneal, Trials

The interface of hyperopt.fmin differs from Grid or Randomized search. First, we define the function to minimize.

  • ์•„๋ž˜๋Š” 'learning_rate', 'max_depth', 'n_estimators'์™€ ๊ฐ™์€ ๋‹ค์–‘ํ•œ ๋งค๊ฐœ ๋ณ€์ˆ˜๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” gb_mse_cv () ํ•จ์ˆ˜๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.
def gb_mse_cv(params, random_state=random_state, cv=kf, X=train_data, y=train_targets):
    # the function gets a set of variable parameters in "param"
    params = {'n_estimators': int(params['n_estimators']), 
              'max_depth': int(params['max_depth']), 
             'learning_rate': params['learning_rate']}
    
    # we use this params to create a new LGBM Regressor
    model = LGBMRegressor(random_state=random_state, **params)
    
    # and then conduct the cross validation with the same folds as before
    score = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error", n_jobs=-1).mean()

    return score

5.1 Tree-structured Parzen Estimator (TPE)

Research Paper TPE

TPE is the default algorithm of Hyperopt. It uses a Bayesian approach for optimization: at every step it builds a probabilistic model of the function and tries to select the most promising parameters for the next step. Generally, this type of algorithm works as follows (a note on the specific densities TPE uses follows the list):

  • 1. Generate a random initial point $x^*$
  • 2. Compute $F(x^*)$
  • 3. Using the history of trials, build the conditional probability model $P(F \mid x)$
  • 4. Choose the $x_i$ that, according to $P(F \mid x)$, is most likely to yield a better value of $F(x_i)$
  • 5. Compute the real value of $F(x_i)$
  • 6. Repeat steps 3–5 until one of the stop criteria is met (e.g., $i$ > max_evals)
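Concretely, instead of modeling $P(F \mid x)$ directly, TPE splits the logged trials at a threshold $y^*$ (a quantile of the observed scores) and models two densities over the inputs:

$$l(x) = p(x \mid y < y^*), \qquad g(x) = p(x \mid y \ge y^*)$$

The next candidate is then chosen to maximize the ratio $l(x)/g(x)$, which under this model is proportional to Expected Improvement.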

For details on the specific TPE algorithm, see the link below. (The linked write-up is a detailed version that goes beyond the scope of this tutorial.)

[https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f]

  • Using fmin is very simple: just define the possible parameter space and call the function.

์ˆ˜ํ–‰์†๋„:

  • (18 CPU- 162core) Wall time: 7.3s
  • AMD 2700x (1 CPU - 8Core) Wall time: 7.98s
%%time

# possible values of parameters
space={'n_estimators': hp.quniform('n_estimators', 100, 2000, 1),
       'max_depth' : hp.quniform('max_depth', 2, 20, 1),
       'learning_rate': hp.loguniform('learning_rate', -5, 0)
      }

# trials will contain logging information
trials = Trials()

best=fmin(fn=gb_mse_cv, # function to optimize
          space=space, 
          algo=tpe.suggest, # optimization algorithm, hyperotp will select its parameters automatically
          max_evals=n_iter, # maximum number of iterations
          trials=trials, # logging
          rstate=np.random.RandomState(random_state) # fixing random state for the reproducibility
         )

# computing the score on the test set
model = LGBMRegressor(random_state=random_state, n_estimators=int(best['n_estimators']),
                      max_depth=int(best['max_depth']),learning_rate=best['learning_rate'])
model.fit(train_data,train_targets)
tpe_test_score=mean_squared_error(test_targets, model.predict(test_data))

print("Best MSE {:.3f} params {}".format( gb_mse_cv(best), best))
100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 50/50 [00:06<00:00,  8.32trial/s, best loss: 3186.7910608402444]
Best MSE 3186.791 params {'learning_rate': 0.026975706032324936, 'max_depth': 20.0, 'n_estimators': 168.0}
CPU times: user 784 ms, sys: 37 ms, total: 821 ms
Wall time: 6.08 s

With a best MSE of 3186, TPE takes more time than RandomizedSearch but finds a somewhat better solution.

tpe_results=np.array([[x['result']['loss'],
                      x['misc']['vals']['learning_rate'][0],
                      x['misc']['vals']['max_depth'][0],
                      x['misc']['vals']['n_estimators'][0]] for x in trials.trials])

tpe_results_df=pd.DataFrame(tpe_results,
                           columns=['score', 'learning_rate', 'max_depth', 'n_estimators'])
tpe_results_df.plot(subplots=True,figsize=(10, 10))

png

Results

Let's visualize the best cumulative score against the number of iterations for each approach.

scores_df=pd.DataFrame(index=range(n_iter))
scores_df['Grid Search']=gs_results_df['score'].cummin()
scores_df['Random Search']=rs_results_df['score'].cummin()
scores_df['Hyperband']=hb_results_df['score'].cummin()
scores_df['Bayesian optimization']=bayes_results_df['score'].cummin()
scores_df['TPE']=tpe_results_df['score'].cummin()


ax = scores_df.plot()

ax.set_xlabel("number_of_iterations")
ax.set_ylabel("best_cumulative_score")

png

  • Random Search is simple, and we saw that it achieves a high score relative to the time spent.

  • The TPE algorithm keeps improving its search results over time, even in later stages, whereas Random search randomly found a fairly good solution at the beginning and then improved the results only slightly.

  • The current difference between the TPE and RandomizedSearch results is quite small, but in some real-world applications with a more diverse range of hyperparameters, hyperopt could deliver a significant score improvement for the time spent.

  • Note: in real life it is more accurate to compare by elapsed time rather than by a fixed number of iterations, but in this toy example the share of time spent on the extra computations of TPE and annealing is high compared to the cross_val_score computation time; we therefore plot against iteration count so that the computation speed of hyperopt is not misjudged.

Let's compare the results by evaluating on the actual test data and check that they are in line with the cross-validation results.

print('Test MSE scored:')
print("Grid Search : {:.3f}".format(gs_test_score))
print("Random Search :  {:.3f}".format(rs_test_score))
print("Hyperband : {:.3f}".format(hb_test_score))
print("Bayesian optimization : {:.3f}".format(bayes_test_score))
print("TPE : {:.3f}".format(tpe_test_score))
Test MSE scored:
Grid Search : 3045.329
Random Search :  2877.117
Hyperband : 2852.900
Bayesian optimization : 2710.621
TPE : 2942.574

In the test-data evaluation, the model with hyperparameters found by the Bayesian optimization algorithm shows the lowest MSE score. (Note that this is a toy dataset for experimentation, so the results vary from run to run.)

  • We looked at the various hyperparameter optimization approaches applied to the modeler AutoDL of Accuinsight+.
  • We learned how to use the TPE algorithm from the up-to-date hyperopt library.
  • In a real modeling environment you cannot know in advance which approach will work best, and sometimes a simple RandomizedSearch is a good choice, so it is always useful to have these methods at hand.
  • We hope this tutorial saves you a lot of time in future ML and DL projects.