
[torchbench] timm_efficientdet training failing on non-dynamo. #7083

Closed · Fixed by #7091
ysiraichi opened this issue May 20, 2024 · 0 comments

ysiraichi (Collaborator) commented May 20, 2024

After #7067, timm_efficientdet started failing with the following error:

python xla/benchmarks/experiment_runner.py \
    --suite-name torchbench --accelerator cuda --repeat 8 --iterations-per-run 1 \
    --xla PJRT --dynamo None --test train \
    --filter timm_efficientdet
Traceback (most recent call last):
  File "xla/benchmarks/experiment_runner.py", line 945, in <module>
    main()
  File "xla/benchmarks/experiment_runner.py", line 941, in main
    runner.run()
  File "xla/benchmarks/experiment_runner.py", line 61, in run
    self.run_single_config()
  File "xla/benchmarks/experiment_runner.py", line 256, in run_single_config
    metrics, last_output = self.run_once_and_gather_metrics(
  File "xla/benchmarks/experiment_runner.py", line 345, in run_once_and_gather_metrics
    output, _ = loop(iter_fn=self._default_iter_fn)
  File "xla/benchmarks/experiment_runner.py", line 302, in loop
    output, timing, trace = iter_fn(benchmark_experiment, benchmark_model,
  File "xla/benchmarks/experiment_runner.py", line 218, in _default_iter_fn
    output = benchmark_model.model_iter_fn(
  File "xla/benchmarks/torchbench_model.py", line 411, in train
    super().train(inputs, collect_full_output=collect_full_output)
  File "xla/benchmarks/benchmark_model.py", line 160, in train
    loss.backward()
  File "torch/_tensor.py", line 523, in backward
    torch.autograd.backward(
  File "torch/autograd/__init__.py", line 267, in backward
    _engine_run_backward(
  File "torch/autograd/graph.py", line 767, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Bad StatusOr access: INTERNAL: during context [Unknown]: Seen floating point types of different precisions in %concatenate.7662 = f32[1,88,10,10,2]{4,3,2,1,0} concatenate(f16[1,88,10,10,1]{4,3,2,1,0} %reshape.7660, f32[1,88,10,10,1]{4,3,2,1,0} %reshape.7661), dimensions={4}, but mixed precision is disallowed.
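For reference, the rejected HLO concatenates an f16 operand with an f32 operand along dimension 4; XLA disallows mixed-precision operands in a single concatenate. The sketch below is purely illustrative (the benchmark command above is the actual repro) and just mirrors the shapes and dtypes reported in the error. At the eager level torch.cat promotes the f16 operand to f32, so this snippet by itself is expected to compile; the failure suggests the backward lowering builds the concatenate without the corresponding convert.

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
# Operand shapes/dtypes taken from the error message:
# f16[1,88,10,10,1] and f32[1,88,10,10,1], concatenated on dim 4.
a = torch.randn(1, 88, 10, 10, 1, dtype=torch.float16, device=device)
b = torch.randn(1, 88, 10, 10, 1, dtype=torch.float32, device=device)
# Eager type promotion casts `a` to float32 before the concatenate is traced,
# so this sketch should pass; the failing backward graph instead emits the
# concatenate with the raw f16 operand, which the compiler rejects.
out = torch.cat([a, b], dim=4)
xm.mark_step()  # force compilation/execution of the traced graph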

Environment

  • Reproducible on XLA backend [CPU/TPU]: CUDA
  • torch_xla version: 62c3ba6

cc @miladm @JackCaoG @vanbasten23 @zpcore
