
PruningCallback doesn't work #20

Closed
himkt opened this issue Nov 14, 2020 · 6 comments

himkt (Owner) commented Nov 14, 2020

Split off from #18; see #18 (comment).

himkt (Owner, Author) commented Nov 14, 2020

@vikigenius

Thank you so much for diving into allennlp-optuna.

Which storage backend do you use? (It should be one of SQLite3, MySQL, PostgreSQL, or Redis.)
And could you please share a simple, reproducible configuration with me?
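
For reference, these backends correspond to Optuna storage URLs. A minimal sketch in plain Optuna (not the allennlp-optuna command line), with illustrative connection strings:

import optuna

# Each backend is selected by passing a storage URL when creating the study.
# The study name and URLs below are placeholders; Redis may additionally
# require the redis client package (or optuna.storages.RedisStorage explicitly).
study = optuna.create_study(
    study_name="allennlp_optuna_demo",  # hypothetical study name
    direction="maximize",
    storage="sqlite:///optuna.db",  # SQLite3, the default file-based backend
    # storage="mysql://user:pass@localhost/optuna",       # MySQL
    # storage="postgresql://user:pass@localhost/optuna",  # PostgreSQL
    # storage="redis://localhost:6379",                   # Redis
    load_if_exists=True,
)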

vikigenius commented

Thanks for creating the issue, @himkt. I use the default sqlite3 storage.

local model_name = "models/distilroberta-base-msmarco-v1/0_Transformer";
local num_gpus = 8;
local data_base_url = "data/mydata/processed/";
local batch_size = std.parseInt(std.extVar('batch_size'));
local lr = std.parseJson(std.extVar('lr'));
local model = "my_model";
local dataset_reader = "my_reader";

{
  "train_data_path": data_base_url + "train.tsv.part*",
  "validation_data_path": data_base_url + "valid.tsv.part*",
  "dataset_reader": {
    "type": "sharded",
    "base_reader": {
      "type": dataset_reader,
      "query_tokenizer": {
        "type": "pretrained_transformer",
        "model_name": model_name,
        "max_length": 500,
      },
      "query_token_indexers": {
        "tokens": {
          "type": "pretrained_transformer",
          "model_name": model_name,
          "namespace": "tokens"
        }
      },
    }
  },
  "model": {
    "type": model,
    "transformer_model": model_name,
  },
  "data_loader": {
    "batch_size": batch_size,
    "shuffle": true
  },
  "distributed": {
    "cuda_devices": if num_gpus > 1 then std.range(0, num_gpus - 1) else 0,
  },
  "trainer": {
    "num_epochs": 10,
    "optimizer": {
      "type": "huggingface_adamw",
      "lr": lr,
      "betas": [0.9, 0.999],
      "eps": 1e-8,
      "correct_bias": true
    },
    "learning_rate_scheduler": {
      "type": "polynomial_decay",
    },
    "use_amp": true,
    "grad_norm": 1.0,
    "validation_metric": "+rec1",
    "epoch_callbacks": [
      {
        "type": "optuna_pruner"
      }
    ]
  }
}

This is the config I was using. You would have to change the model and dataset reader; I can try to reproduce it with a simpler example using predefined models, etc., but that will take me a while since I won't have access to the multi-GPU cluster for some time.
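
For reference, the two std.extVar values read by the config above ("batch_size" and "lr") are the hyperparameters Optuna fills in per trial. They correspond roughly to suggest calls like the following sketch; the ranges are made-up placeholders, and this is plain Optuna rather than the allennlp-optuna entry point.

import optuna

def define_search_space(trial: optuna.Trial) -> None:
    # The names must match the std.extVar(...) keys read by the Jsonnet config.
    trial.suggest_int("batch_size", 8, 64)           # hypothetical range
    trial.suggest_float("lr", 1e-6, 1e-4, log=True)  # hypothetical range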

@himkt himkt mentioned this issue Nov 14, 2020
himkt (Owner, Author) commented Nov 14, 2020

@vikigenius Thank you for your help.

Let me ask a question: does this configuration work correctly when run on a single GPU (i.e., with distributed disabled)?
The current implementation of the AllenNLP integration's pruning feature may not work in a distributed setting.

If your configuration works on a single GPU, I'll investigate the AllenNLP integration in Optuna. It may take time, though, because the mechanism supporting PruningCallback in the integration is relatively complicated (I implemented it...) and I don't have access to a multi-GPU cluster right now.

Sorry for the inconvenience. 🙇
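
For context, what the optuna_pruner callback is meant to do each epoch follows Optuna's generic pruning protocol, sketched below (this is not the actual AllenNLP integration code). In a distributed run each worker trains in its own process, which is presumably why the callback cannot reach the live trial object there.

import optuna

def on_epoch_end(trial: optuna.Trial, epoch: int, validation_metric: float) -> None:
    # Report the monitored metric (e.g. the "+rec1" value above) for this epoch,
    # then stop the trial early if the pruner judges it unpromising.
    trial.report(validation_metric, step=epoch)
    if trial.should_prune():
        raise optuna.TrialPruned()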

@himkt himkt self-assigned this Nov 14, 2020
@himkt himkt added the bug Something isn't working label Nov 14, 2020
himkt (Owner, Author) commented Nov 14, 2020

Related to optuna/optuna#1990.

himkt (Owner, Author) commented Jul 23, 2021

FYI @vikigenius

I'm working on entirely refactoring the AllenNLP integration in Optuna (optuna/optuna#2796).
After that PR is merged, PruningCallback should work with distributed training.

himkt (Owner, Author) commented Dec 9, 2021

In Optuna v3.0.0a0, we finally introduced support for the pruning callback in distributed training.
https://github.com/optuna/optuna/releases/tag/v3.0.0-a0

pip install -U optuna==3.0.0a0
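
A quick, illustrative way to confirm that the pre-release is the version actually picked up by your environment:

import optuna

print(optuna.__version__)  # expected: 3.0.0a0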

@himkt himkt closed this as completed Dec 9, 2021