
Port Ray example to Ignite #1735

Open · vfdev-5 opened this issue Mar 3, 2021 · 14 comments

@vfdev-5 (Collaborator) commented Mar 3, 2021

@Devanshu24 (Contributor)

Hi!
I do not have much experience with distributed algorithms, but I really like them and am learning them. It would be great if I could work on this, since it would give me good exposure to both ML and distributed workflows (both of which I really like :D). However, I am not sure I will be able to work at a very fast pace, so if this is urgent (or not doable by beginners), someone else can take it up; otherwise I'd love to work on it :)

@sdesrozis (Contributor) commented Mar 6, 2021

@vfdev-5 Your idea is to use ray.tune as in the doc you mentioned? I mean the experiment tool?

@Devanshu24 if so, the baseline for this should be our CIFAR distributed training use case. Please see https://github.com/pytorch/ignite/tree/master/examples/contrib/cifar10

If you are motivated to learn about distributed training, why not have a look at the link above? Before going further, it would be important to be comfortable with this. What do you think?

@Devanshu24 (Contributor)

Thanks for the reply @sdesrozis!
To confirm that I am getting it correctly: we want to use ray.tune and the other distributed utilities provided by Ray and see how it performs in comparison to the CIFAR example already in Ignite (https://github.com/pytorch/ignite/tree/master/examples/contrib/cifar10). Correct?
If so, then sure, I completely agree. I'll start by going through the Ignite example and hopefully make some headway before starting on the Ray implementation! :D

@vfdev-5 (Collaborator, Author) commented Mar 7, 2021

The idea is to port this example: https://github.com/ray-project/ray/blob/master/python/ray/tune/examples/cifar10_pytorch.py

  • use Ignite for training and validation
  • use Ray Tune for hyperparameter tuning

as a simple script file under examples/contrib/cifar10_ray_tune (a rough sketch is included below).

A great addition would also be a PR to the Ray docs with the example.
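
For reference, a minimal sketch of what such a script could look like, assuming Ray Tune's function API (tune.report) and using placeholder Net and get_data_loaders helpers that stand in for the model and data code of the real example; this is an illustration of the idea, not the final implementation.

import torch
import torch.nn as nn
from torch.optim import SGD

from ray import tune

from ignite.engine import Events, create_supervised_evaluator, create_supervised_trainer
from ignite.metrics import Accuracy, Loss


def train_cifar(config):
    # Net and get_data_loaders are placeholders for the model/data code of the real example.
    model = Net()
    train_loader, val_loader = get_data_loaders(config["batch_size"])

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    optimizer = SGD(model.parameters(), lr=config["lr"], momentum=config["momentum"])
    criterion = nn.CrossEntropyLoss()

    # Ignite handles training and validation
    trainer = create_supervised_trainer(model, optimizer, criterion, device=device)
    evaluator = create_supervised_evaluator(
        model, metrics={"accuracy": Accuracy(), "loss": Loss(criterion)}, device=device
    )

    @trainer.on(Events.EPOCH_COMPLETED)
    def report_to_tune(engine):
        # Run validation and report the metrics to Ray Tune after every epoch
        evaluator.run(val_loader)
        metrics = evaluator.state.metrics
        tune.report(loss=metrics["loss"], mean_accuracy=metrics["accuracy"])

    trainer.run(train_loader, max_epochs=config["max_epochs"])


# Ray Tune handles the hyperparameter search over the Ignite training function
analysis = tune.run(
    train_cifar,
    config={
        "lr": tune.loguniform(1e-4, 1e-1),
        "momentum": 0.9,
        "batch_size": tune.choice([32, 64, 128]),
        "max_epochs": 10,
    },
    metric="loss",
    mode="min",
    num_samples=10,
)
print("Best hyperparameters found were:", analysis.best_config)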

@Rajathbharadwaj

Can't we do it the same way Ray is integrated with PyTorch Lightning (PL)?
By creating callbacks?

@vfdev-5 (Collaborator, Author) commented Mar 8, 2021

Can't we do it the same way Ray is integrated with PyTorch Lightning (PL)?
By creating callbacks?

Could you please detail your idea?

@Rajathbharadwaj

https://docs.ray.io/en/master/tune/tutorials/tune-pytorch-lightning.html#training-with-gpus
Something similar to the above. An abstract implementation:

import os
import shutil
import sys
import tempfile

import torch
import torch.nn as nn
from torch.optim import SGD

from ignite.engine import create_supervised_evaluator, create_supervised_trainer
from ignite.metrics import Accuracy, Loss
from ignite.utils import setup_logger

from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import ASHAScheduler

# Proposed integration module -- this does not exist in Ray yet, adding it is the point of this issue:
from ray.tune.integration.pytorch_ignite import TuneReportCallback


def run(train_batch_size, val_batch_size, epochs, lr, momentum, log_dir):
    # Net and get_data_loaders as defined in Ignite's MNIST example
    train_loader, val_loader = get_data_loaders(train_batch_size, val_batch_size)
    model = Net()

    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda"

    model.to(device)  # Move model before creating optimizer
    optimizer = SGD(model.parameters(), lr=lr, momentum=momentum)
    criterion = nn.CrossEntropyLoss()
    trainer = create_supervised_trainer(model, optimizer, criterion, device=device)
    trainer.logger = setup_logger("Trainer")

    if sys.version_info > (3,):
        from ignite.contrib.metrics.gpu_info import GpuInfo

        try:
            GpuInfo().attach(trainer)
        except RuntimeError:
            print(
                "INFO: By default, in this example it is possible to log GPU information (used memory, utilization). "
                "As there is no pynvml python package installed, GPU information won't be logged. Otherwise, please "
                "install it : `pip install pynvml`"
            )

    metrics = {"accuracy": Accuracy(), "loss": Loss(criterion)}

    train_evaluator = create_supervised_evaluator(model, metrics=metrics, device=device)
    train_evaluator.logger = setup_logger("Train Evaluator")
    validation_evaluator = create_supervised_evaluator(model, metrics=metrics, device=device)
    validation_evaluator.logger = setup_logger("Val Evaluator")

    # Where the Lightning example passes callbacks=[TuneReportCallback(...)] to its Trainer,
    # the Ignite version would attach the proposed callback to the validation evaluator,
    # reporting the listed metrics back to Tune after each validation run
    # (the attach() call below is a sketch of the hypothetical API):
    TuneReportCallback({"loss": "loss", "mean_accuracy": "accuracy"}).attach(validation_evaluator)


def tune_mnist_asha(num_samples=10, num_epochs=10, gpus_per_trial=0):
    data_dir = os.path.join(tempfile.gettempdir(), "mnist_data_")

    config = {
        "layer_1_size": tune.choice([32, 64, 128]),
        "layer_2_size": tune.choice([64, 128, 256]),
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([32, 64, 128]),
    }

    scheduler = ASHAScheduler(
        max_t=num_epochs,
        grace_period=1,
        reduction_factor=2)

    reporter = CLIReporter(
        parameter_columns=["layer_1_size", "layer_2_size", "lr", "batch_size"],
        metric_columns=["loss", "mean_accuracy", "training_iteration"])

    analysis = tune.run(
        # train_mnist_tune would be the Ignite training function (like run() above)
        # wrapped as a Tune trainable
        tune.with_parameters(
            train_mnist_tune,
            data_dir=data_dir,
            num_epochs=num_epochs,
            num_gpus=gpus_per_trial),
        resources_per_trial={
            "cpu": 1,
            "gpu": gpus_per_trial
        },
        metric="loss",
        mode="min",
        config=config,
        num_samples=num_samples,
        scheduler=scheduler,
        progress_reporter=reporter,
        name="tune_mnist_asha")

    print("Best hyperparameters found were: ", analysis.best_config)

    shutil.rmtree(data_dir)

Since most of the heavy lifting is done by Ray, what I was thinking is that we could extrapolate by adding a pytorch_ignite module to the ray.tune.integration namespace and implementing it in Ignite's particular way of calling handlers.
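
For illustration, a rough sketch of what such an integration could look like, assuming the callback is written as a plain Ignite event handler; the class name TuneReportHandler and its attach() signature are hypothetical here, only tune.report() and Ignite's handler API are real.

from ray import tune
from ignite.engine import Engine, Events


class TuneReportHandler:
    """Reports Ignite metrics to Ray Tune whenever the attached event fires (hypothetical sketch)."""

    def __init__(self, metrics):
        # Maps the name reported to Tune -> the key in engine.state.metrics
        self.metrics = metrics

    def attach(self, engine: Engine, event=Events.COMPLETED):
        engine.add_event_handler(event, self)

    def __call__(self, engine: Engine):
        report = {
            tune_name: engine.state.metrics[ignite_name]
            for tune_name, ignite_name in self.metrics.items()
        }
        tune.report(**report)


# Usage: attach to the validation evaluator so metrics are reported to Tune
# each time evaluator.run(val_loader) completes, e.g.
# TuneReportHandler({"loss": "loss", "mean_accuracy": "accuracy"}).attach(validation_evaluator)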

@vfdev-5 (Collaborator, Author) commented Mar 8, 2021

@Rajathbharadwaj thanks for the details. Yes, this would be great!

@Rajathbharadwaj

Awesome, I'll work on the integration.
Any tips would be appreciated!

https://github.com/ray-project/ray/blob/master/python/ray/tune/integration/pytorch_lightning.py

I will convert this to PyTorch Ignite's way of implementing it.

@vfdev-5 (Collaborator, Author) commented Mar 22, 2021

@Rajathbharadwaj any updates on this porting?

@Rajathbharadwaj

Hey @vfdev-5, I got a bit held up. But I'm working on it. Will ping you.

@vfdev-5 (Collaborator, Author) commented May 15, 2021

@Rajathbharadwaj are you still working on this issue?

@gucifer (Contributor) commented Feb 12, 2022

Hey @vfdev-5, if no one else is working on this, can I pick it up?

@vfdev-5 (Collaborator, Author) commented Feb 12, 2022

Hey @vfdev-5, if no one else is working on this, can I pick it up?

Sure, go ahead. Thanks!
