Hyperparameter Search using Trainer API [[hyperparameter-search-using-trainer-api]]

🤗 Transformers provides a [Trainer] class optimized for training 🤗 Transformers models, so you can train without writing your own training loop. [Trainer] also provides an API for hyperparameter search. This document shows how to use that API, with examples.

Hyperparameter Search backend [[hyperparameter-search-backend]]

[Trainer] currently supports four hyperparameter search backends: optuna, sigopt, raytune, and wandb.

Before using one of them as the hyperparameter search backend, install the corresponding library with the command below.

pip install optuna  # or: sigopt, wandb, "ray[tune]"
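Before choosing a backend, it can help to check which of the four libraries are importable in the current environment. The following is a minimal sketch that uses only the standard library. The helper name `installed_backends` is hypothetical and not part of Transformers, and the check only looks for the top-level `ray` package, not for the Tune extras.

```python
from importlib.util import find_spec

# Import names of the four backend libraries (ray[tune] is imported as "ray").
BACKEND_MODULES = ("optuna", "sigopt", "wandb", "ray")


def installed_backends():
    """Map each backend's import name to True if it can be found here."""
    return {name: find_spec(name) is not None for name in BACKEND_MODULES}


for name, available in installed_backends().items():
    print(f"{name}: {'installed' if available else 'missing'}")
```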

How to enable Hyperparameter search in example [[how-to-enable-hyperparameter-search-in-example]]

Define the hyperparameter search space. Each hyperparameter search backend requires a different format.

For sigopt, see the sigopt object_parameter documentation and write the search space as follows:

>>> def sigopt_hp_space(trial):
...     return [
...         {"bounds": {"min": 1e-6, "max": 1e-4}, "name": "learning_rate", "type": "double"},
...         {
...             "categorical_values": ["16", "32", "64", "128"],
...             "name": "per_device_train_batch_size",
...             "type": "categorical",
...         },
...     ]
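The batch sizes above are written as strings because sigopt categorical values are strings. This sketch assumes that [Trainer] casts each suggested value to the type of the existing training argument before applying it, so that "32" ends up as the integer 32. It imitates that behavior with a stand-in object: `apply_trial_params` and the `SimpleNamespace` arguments are hypothetical and are not the library's code.

```python
from types import SimpleNamespace


def apply_trial_params(args, params):
    """Copy suggested values onto args, casting to each attribute's current type."""
    for key, value in params.items():
        if not hasattr(args, key):
            continue  # skip names that are not training arguments
        current = getattr(args, key)
        if current is not None:
            value = type(current)(value)  # e.g. int("32") gives 32
        setattr(args, key, value)
    return args


# Stand-in for TrainingArguments, holding only the two searched fields.
args = SimpleNamespace(learning_rate=5e-5, per_device_train_batch_size=8)
apply_trial_params(args, {"learning_rate": 3e-5, "per_device_train_batch_size": "32"})
print(type(args.per_device_train_batch_size).__name__)
```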

For optuna, see the optuna object_parameter documentation and write the search space as follows:

>>> def optuna_hp_space(trial):
...     return {
...         "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
...         "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32, 64, 128]),
...     }

For raytune, see the raytune object_parameter documentation and write the search space as follows:

>>> from ray import tune

>>> def ray_hp_space(trial):
...     return {
...         "learning_rate": tune.loguniform(1e-6, 1e-4),
...         "per_device_train_batch_size": tune.choice([16, 32, 64, 128]),
...     }

For wandb, see the wandb object_parameter documentation and write the search space as follows:

>>> def wandb_hp_space(trial):
...     return {
...         "method": "random",
...         "metric": {"name": "objective", "goal": "minimize"},
...         "parameters": {
...             "learning_rate": {"distribution": "uniform", "min": 1e-6, "max": 1e-4},
...             "per_device_train_batch_size": {"values": [16, 32, 64, 128]},
...         },
...     }
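The optuna and raytune examples sample the learning rate on a log scale, while the wandb example uses a uniform distribution over the same range. The difference matters when a range spans several orders of magnitude, because uniform sampling rarely picks values near the lower bound. The following standard-library sketch compares the two. The function names are illustrative and do not belong to any backend.

```python
import math
import random


def sample_uniform(rng, low, high):
    """Draw a value uniformly from [low, high]."""
    return rng.uniform(low, high)


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
low, high = 1e-6, 1e-4
uniform_draws = [sample_uniform(rng, low, high) for _ in range(10_000)]
log_draws = [sample_log_uniform(rng, low, high) for _ in range(10_000)]

# Share of draws that fall in the lowest decade of the range, [1e-6, 1e-5).
uniform_share = sum(x < 1e-5 for x in uniform_draws) / len(uniform_draws)
log_share = sum(x < 1e-5 for x in log_draws) / len(log_draws)
print(f"uniform: {uniform_share:.2f}, log-uniform: {log_share:.2f}")
```

In expectation, about 9% of the uniform draws land below 1e-5, against about half of the log-uniform draws.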

Define a model_init function and pass it to [Trainer]. For example:

>>> from transformers import AutoModelForSequenceClassification

>>> def model_init(trial):
...     return AutoModelForSequenceClassification.from_pretrained(
...         model_args.model_name_or_path,
...         from_tf=bool(".ckpt" in model_args.model_name_or_path),
...         config=config,
...         cache_dir=model_args.cache_dir,
...         revision=model_args.model_revision,
...         use_auth_token=True if model_args.use_auth_token else None,
...     )

Create a [Trainer] with your model_init function, training arguments, training and test datasets, and evaluation function:

>>> from transformers import Trainer

>>> trainer = Trainer(
...     model=None,
...     args=training_args,
...     train_dataset=small_train_dataset,
...     eval_dataset=small_eval_dataset,
...     compute_metrics=compute_metrics,
...     tokenizer=tokenizer,
...     model_init=model_init,
...     data_collator=data_collator,
... )

Call hyperparameter search and retrieve the parameters of the best trial. The backend can be "optuna", "sigopt", "wandb", or "ray". The direction can be "minimize" or "maximize", which determines whether the objective is minimized or maximized.

You can define your own compute_objective function. If you do not, the default compute_objective is called. It returns the evaluation loss when no other metric is reported, and otherwise the sum of the evaluation metrics, such as f1, as the objective value.
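As a minimal sketch, a custom compute_objective receives the dictionary of evaluation metrics and returns a single number. The version below optimizes f1 alone. It assumes that compute_metrics reports an "f1" value, which then appears under the key "eval_f1". The sample metric values are made up for illustration.

```python
def compute_objective(metrics):
    """Return the value to optimize: here, the evaluation f1 score only."""
    return metrics["eval_f1"]


# Illustrative metrics dictionary, shaped like the output of an evaluation run.
example_metrics = {"eval_loss": 0.41, "eval_f1": 0.87, "eval_accuracy": 0.90, "epoch": 3.0}
print(compute_objective(example_metrics))  # 0.87
```

With this objective, the direction should be "maximize", since a higher f1 is better.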

>>> best_trial = trainer.hyperparameter_search(
...     direction="maximize",
...     backend="optuna",
...     hp_space=optuna_hp_space,
...     n_trials=20,
...     compute_objective=compute_objective,
... )
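The returned best_trial holds the run id, the objective value, and the winning hyperparameters. A common follow-up is to copy those hyperparameters onto the training arguments and train once more. The sketch below uses stand-in objects so that it runs without Transformers: the `BestRun` namedtuple only mirrors the fields `run_id`, `objective`, and `hyperparameters`, and all values are invented for illustration.

```python
from collections import namedtuple
from types import SimpleNamespace

# Stand-in mirroring the fields of the object returned by hyperparameter_search.
BestRun = namedtuple("BestRun", ["run_id", "objective", "hyperparameters"])

# Illustrative values only, not results from a real search.
best_trial = BestRun(
    run_id="7",
    objective=0.87,
    hyperparameters={"learning_rate": 3e-5, "per_device_train_batch_size": 32},
)

# Stand-in for the TrainingArguments instance held by the trainer.
training_args = SimpleNamespace(learning_rate=5e-5, per_device_train_batch_size=8)

# Copy the best hyperparameters onto the training arguments.
for name, value in best_trial.hyperparameters.items():
    setattr(training_args, name, value)

print(training_args.learning_rate, training_args.per_device_train_batch_size)
```

With a real [Trainer], the same loop would target trainer.args and be followed by trainer.train().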

Hyperparameter search for DDP fine-tuning [[hyperparameter-search-for-ddp-finetune]]

Currently, hyperparameter search for DDP (Distributed Data Parallelism) is available for optuna and sigopt. The rank-zero process starts the hyperparameter search and passes the results to the other processes.