
[torchbench] opacus_cifar10 training runs unexpectedly without errors. #6391

Open · ysiraichi opened this issue Jan 26, 2024 · 2 comments

@ysiraichi (Collaborator)

🐛 Bug

Unexpectedly, the opacus_cifar10 benchmark runs training successfully with our benchmarking script, but it errors out when run with PyTorch's own benchmarking script.

# Command for running opacus_cifar10 training with XLA's benchmarking script
python xla/benchmarks/experiment_runner.py \
    --suite-name torchbench --accelerator cuda --repeat 1 --test train --xla None --dynamo inductor \
    -k opacus_cifar10

# Command for running opacus_cifar10 training with PyTorch's benchmarking script
python benchmarks/dynamo/torchbench.py \
    --device cuda --repeat 1 --inductor --training --performance \
    -k opacus_cifar10
Traceback (most recent call last):
  File "benchmarks/dynamo/common.py", line 2585, in warmup
    fn(model, example_inputs)
  File "torch/_dynamo/eval_frame.py", line 417, in _fn
    return fn(*args, **kwargs)
  File "benchmarks/dynamo/torchbench.py", line 535, in forward_and_backward_pass
    cloned_inputs = clone_inputs(inputs)
  File "benchmarks/dynamo/torchbench.py", line 536, in resume_in_forward_and_backward_pass_at_535
    self.optimizer_zero_grad(mod)
  File "benchmarks/dynamo/torchbench.py", line 538, in resume_in_forward_and_backward_pass_at_536
    pred = mod(*cloned_inputs)
  File "torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/opacus/grad_sample/grad_sample_module.py", line 148, in forward
    return self._module(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/torchvision-0.17.0a0+a8ebd0b-py3.8-linux-x86_64.egg/torchvision/models/resnet.py", line 285, in forward
    return self._forward_impl(x)
  File "/lib/python3.8/site-packages/torchvision-0.17.0a0+a8ebd0b-py3.8-linux-x86_64.egg/torchvision/models/resnet.py", line 268, in _forward_impl
    x = self.conv1(x)
  File "torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1574, in _call_impl
    hook_result = hook(self, args, result)
  File "/lib/python3.8/site-packages/opacus/grad_sample/grad_sample_module.py", line 287, in capture_activations_hook
    for _, p in trainable_parameters(module):
  File "/lib/python3.8/site-packages/opacus/grad_sample/grad_sample_module.py", line 288, in resume_in_capture_activations_hook_at_287
    p._forward_counter += 1
AttributeError: 'Parameter' object has no attribute '_forward_counter'
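
For context, here is a minimal sketch of the dependency the traceback points at. The opacus internals are paraphrased from the stack trace, not copied from the installed source, and install_counters is a hypothetical stand-in for whatever GradSampleModule does when it adds its hooks: a counter attribute is stamped onto each trainable parameter up front, and a forward hook later increments it. If the stamping step never ran (or was traced away), the increment raises exactly this AttributeError.

import torch
import torch.nn as nn

model = nn.Linear(4, 2)

def install_counters(module: nn.Module) -> None:
    # Hypothetical analogue of the set-up step: stamp a counter onto
    # every trainable parameter before any forward pass runs.
    for p in module.parameters():
        if p.requires_grad:
            p._forward_counter = 0

def capture_activations_hook(module, args, output):
    # Mirrors the failing line in grad_sample_module.py: raises
    # AttributeError if install_counters() was never run.
    for p in module.parameters():
        if p.requires_grad:
            p._forward_counter += 1

model.register_forward_hook(capture_activations_hook)

install_counters(model)   # comment this out to reproduce the AttributeError
model(torch.randn(1, 4))
print(model.weight._forward_counter)  # 1

This only illustrates the invariant the hook relies on; it does not establish why the attribute ends up missing under the inductor run.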

Expected behavior

I would expect our inductor run to match PyTorch's, i.e. to fail with the same error.

Environment

  • Reproducible on XLA backend [CPU/TPU]: CUDA
  • torch_xla version: 9e4db96
  • PyTorch version: 534c73d478fc967cb3a6a29f5eada94bc4ce2c29 (Jan 8)

cc @miladm @JackCaoG

@ysiraichi (Collaborator, Author)

Note: I got this error by skipping the eager runs that happen before dynamo, since they were also failing with the same error.

@ysiraichi (Collaborator, Author)

Not sure if related, but I'm getting a bunch of warnings:

torch/nn/modules/module.py:1352: UserWarning: Using a non-full backward hook when the forward contains 
multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some 
grad_input. Please use register_full_backward_hook to get the documented behavior.
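
For reference, the warning comes from torch's deprecated non-full backward hooks; register_full_backward_hook is the documented replacement. Below is a minimal sketch of the two APIs (plain torch, nothing opacus-specific; separate modules are used to avoid mixing both hook kinds on one module):

import torch
import torch.nn as nn

def hook(mod, grad_input, grad_output):
    print(type(mod).__name__,
          [None if g is None else tuple(g.shape) for g in grad_input])

# Deprecated API: when the forward spans multiple autograd nodes,
# grad_input may be incomplete -- this is what the UserWarning is about.
legacy = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
legacy.register_backward_hook(hook)
legacy(torch.randn(2, 4)).sum().backward()

# Replacement with the documented behavior.
full = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
full.register_full_backward_hook(hook)
full(torch.randn(2, 4)).sum().backward()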
