
[torchbench] opacus_cifar10 training runs unexpectedly without errors. #6391

Open · ysiraichi opened this issue Jan 26, 2024 · 2 comments

@ysiraichi (Collaborator)

🐛 Bug

Unexpectedly, the opacus_cifar10 benchmark runs training successfully with our benchmarking script, but it errors out when run with PyTorch's own benchmarking script.

# Command for running opacus_cifar10 training with XLA's benchmarking script
python xla/benchmarks/experiment_runner.py \
    --suite-name torchbench --accelerator cuda --repeat 1 --test train --xla None --dynamo inductor \
    -k opacus_cifar10

# Command for running opacus_cifar10 training with PyTorch's benchmarking script
python benchmarks/dynamo/torchbench.py \
    --device cuda --repeat 1 --inductor --training --performance \
    -k opacus_cifar10
Traceback (most recent call last):
  File "benchmarks/dynamo/common.py", line 2585, in warmup
    fn(model, example_inputs)
  File "torch/_dynamo/eval_frame.py", line 417, in _fn
    return fn(*args, **kwargs)
  File "benchmarks/dynamo/torchbench.py", line 535, in forward_and_backward_pass
    cloned_inputs = clone_inputs(inputs)
  File "benchmarks/dynamo/torchbench.py", line 536, in resume_in_forward_and_backward_pass_at_535
    self.optimizer_zero_grad(mod)
  File "benchmarks/dynamo/torchbench.py", line 538, in resume_in_forward_and_backward_pass_at_536
    pred = mod(*cloned_inputs)
  File "torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/opacus/grad_sample/grad_sample_module.py", line 148, in forward
    return self._module(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/torchvision-0.17.0a0+a8ebd0b-py3.8-linux-x86_64.egg/torchvision/models/resnet.py", line 285, in forward
    return self._forward_impl(x)
  File "/lib/python3.8/site-packages/torchvision-0.17.0a0+a8ebd0b-py3.8-linux-x86_64.egg/torchvision/models/resnet.py", line 268, in _forward_impl
    x = self.conv1(x)
  File "torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1574, in _call_impl
    hook_result = hook(self, args, result)
  File "/lib/python3.8/site-packages/opacus/grad_sample/grad_sample_module.py", line 287, in capture_activations_hook
    for _, p in trainable_parameters(module):
  File "/lib/python3.8/site-packages/opacus/grad_sample/grad_sample_module.py", line 288, in resume_in_capture_activations_hook_at_287
    p._forward_counter += 1
AttributeError: 'Parameter' object has no attribute '_forward_counter'
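
For context, here is a minimal sketch of the dependency the traceback points at. The opacus internals are paraphrased from the stack trace, not copied from the installed source, and install_counters is a hypothetical stand-in for whatever GradSampleModule does when it adds its hooks: a counter attribute is stamped onto each trainable parameter up front, and a forward hook later increments it. If the stamping step never ran (or was traced away), the increment raises exactly this AttributeError.

import torch
import torch.nn as nn

model = nn.Linear(4, 2)

def install_counters(module: nn.Module) -> None:
    # Hypothetical analogue of the set-up step: stamp a counter onto
    # every trainable parameter before any forward pass runs.
    for p in module.parameters():
        if p.requires_grad:
            p._forward_counter = 0

def capture_activations_hook(module, args, output):
    # Mirrors the failing line in grad_sample_module.py: raises
    # AttributeError if install_counters() was never run.
    for p in module.parameters():
        if p.requires_grad:
            p._forward_counter += 1

model.register_forward_hook(capture_activations_hook)

install_counters(model)   # comment this out to reproduce the AttributeError
model(torch.randn(1, 4))
print(model.weight._forward_counter)  # 1

This only illustrates the invariant the hook relies on; it does not establish why the attribute ends up missing under the inductor run.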

Expected behavior

I would expect our inductor run to match PyTorch's, i.e. to fail with the same error.

Environment

  • Reproducible on XLA backend [CPU/TPU]: CUDA
  • torch_xla version: 9e4db96
  • PyTorch version: 534c73d478fc967cb3a6a29f5eada94bc4ce2c29 (Jan 8)

cc @miladm @JackCaoG

@ysiraichi (Collaborator, Author)

Note: I got this error by skipping the eager runs that happen before dynamo, since they were also failing with the same error.

@ysiraichi (Collaborator, Author)

Not sure if related, but I'm getting a bunch of warnings:

torch/nn/modules/module.py:1352: UserWarning: Using a non-full backward hook when the forward contains 
multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some 
grad_input. Please use register_full_backward_hook to get the documented behavior.
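
For reference, the warning comes from torch's deprecated non-full backward hooks; register_full_backward_hook is the documented replacement. Below is a minimal sketch of the two APIs (plain torch, nothing opacus-specific; separate modules are used to avoid mixing both hook kinds on one module):

import torch
import torch.nn as nn

def hook(mod, grad_input, grad_output):
    print(type(mod).__name__,
          [None if g is None else tuple(g.shape) for g in grad_input])

# Deprecated API: when the forward spans multiple autograd nodes,
# grad_input may be incomplete -- this is what the UserWarning is about.
legacy = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
legacy.register_backward_hook(hook)
legacy(torch.randn(2, 4)).sum().backward()

# Replacement with the documented behavior.
full = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
full.register_full_backward_hook(hook)
full(torch.randn(2, 4)).sum().backward()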
