[torchbench] Regression after changing load_benchmark method. #6348

ysiraichi opened this issue Jan 22, 2024 · 0 comments · Fixed by #6389

🐛 Bug

Starting with #6296, we began instantiating the model on the accelerator passed on the command line and, for XLA executions, moving the model to the XLA device afterwards (a rough sketch of this flow follows the list below). While this worked for most models, it broke the following ones:

  • detectron2_fasterrcnn_r_101_c4
  • detectron2_fasterrcnn_r_101_dc5
  • detectron2_fasterrcnn_r_101_fpn
  • detectron2_fasterrcnn_r_50_c4
  • detectron2_fasterrcnn_r_50_dc5
  • detectron2_fasterrcnn_r_50_fpn
  • detectron2_fcos_r_50_fpn
  • detectron2_maskrcnn_r_101_c4
  • detectron2_maskrcnn_r_101_fpn
  • detectron2_maskrcnn_r_50_c4
  • detectron2_maskrcnn_r_50_fpn
  • hf_Bart
  • timm_regnet
  • mobilenet_v3_large

These benchmarks fail under both the non-dynamo and the dynamo+openxla configurations.
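
For reference, the loading flow described above looks roughly like this. This is a minimal sketch only; `load_model_sketch`, `model_cls`, and the `use_xla` flag are illustrative and not the actual `experiment_runner.py` code:

```python
import torch
import torch_xla.core.xla_model as xm


def load_model_sketch(model_cls, accelerator: str, use_xla: bool):
    # Since #6296, the model is first instantiated on the accelerator
    # passed on the command line (e.g. "cuda").
    model = model_cls().to(accelerator)

    # For XLA executions, the instantiated model is then moved to the
    # XLA device. The failing benchmarks end up with a dtype mismatch
    # between the example inputs and the model's parameters (see the
    # traceback below).
    if use_xla:
        model = model.to(xm.xla_device())

    return model
```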

Raw Error
Traceback (most recent call last):
  File "xla/benchmarks/experiment_runner.py", line 906, in <module>
    main()
  File "xla/benchmarks/experiment_runner.py", line 902, in main
    runner.run()
  File "xla/benchmarks/experiment_runner.py", line 59, in run
    self.run_single_config()
  File "xla/benchmarks/experiment_runner.py", line 247, in run_single_config
    metrics, last_output = self.run_once_and_gather_metrics(
  File "xla/benchmarks/experiment_runner.py", line 324, in run_once_and_gather_metrics
    output, _ = loop(iter_fn=self._default_iter_fn)
  File "xla/benchmarks/experiment_runner.py", line 293, in loop
    output, timing, trace = iter_fn(benchmark_experiment, benchmark_model,
  File "xla/benchmarks/experiment_runner.py", line 209, in _default_iter_fn
    output = benchmark_model.model_iter_fn(
  File "xla/benchmarks/benchmark_model.py", line 155, in eval
    pred = self.module(*inputs)
  File "torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 150, in forward
    return self.inference(batched_inputs)
  File "/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 208, in inference
    proposals, _ = self.proposal_generator(images, features, None)
  File "torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/detectron2/modeling/proposal_generator/rpn.py", line 454, in forward
    pred_objectness_logits, pred_anchor_deltas = self.rpn_head(features)
  File "torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/detectron2/modeling/proposal_generator/rpn.py", line 175, in forward
    pred_objectness_logits.append(self.objectness_logits(t))
  File "torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (float) and bias type (c10::Half) should be the same
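
Outside the benchmark harness, the same class of error can be reproduced with a tiny standalone snippet. This is a hypothetical sketch, not code from the runner, and the exact message wording depends on the device and backend:

```python
import torch
import torch.nn.functional as F

# A float32 input against half-precision convolution parameters trips
# the same family of dtype check as in the traceback above.
x = torch.randn(1, 3, 8, 8)                          # float32 input
weight = torch.randn(4, 3, 3, 3, dtype=torch.half)   # c10::Half weight
bias = torch.randn(4, dtype=torch.half)              # c10::Half bias

try:
    F.conv2d(x, weight, bias, stride=1, padding=1)
except RuntimeError as e:
    print(e)  # dtype-mismatch error, e.g. "... should be the same"
```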

To Reproduce

python xla/benchmarks/experiment_runner.py --no-resume --suite-name torchbench --repeat 2 --accelerator cuda --test eval --xla PJRT --dynamo None -k <benchmark>

Environment

  • Reproducible on XLA backend: CUDA
  • torch_xla version: a8b27eb

Additional Context

Further discussion can be found in #6336.

cc @miladm @JackCaoG
