
[torchbench] timm_efficientdet training failing on non-dynamo. #7083

Closed · Fixed by #7091
ysiraichi opened this issue May 20, 2024 · 0 comments

ysiraichi (Collaborator) commented May 20, 2024

After #7067, timm_efficientdet started failing with the following error:

python xla/benchmarks/experiment_runner.py \
    --suite-name torchbench --accelerator cuda --repeat 8 --iterations-per-run 1 \
    --xla PJRT --dynamo None --test train \
    --filter timm_efficientdet
Traceback (most recent call last):
  File "xla/benchmarks/experiment_runner.py", line 945, in <module>
    main()
  File "xla/benchmarks/experiment_runner.py", line 941, in main
    runner.run()
  File "xla/benchmarks/experiment_runner.py", line 61, in run
    self.run_single_config()
  File "xla/benchmarks/experiment_runner.py", line 256, in run_single_config
    metrics, last_output = self.run_once_and_gather_metrics(
  File "xla/benchmarks/experiment_runner.py", line 345, in run_once_and_gather_metrics
    output, _ = loop(iter_fn=self._default_iter_fn)
  File "xla/benchmarks/experiment_runner.py", line 302, in loop
    output, timing, trace = iter_fn(benchmark_experiment, benchmark_model,
  File "xla/benchmarks/experiment_runner.py", line 218, in _default_iter_fn
    output = benchmark_model.model_iter_fn(
  File "xla/benchmarks/torchbench_model.py", line 411, in train
    super().train(inputs, collect_full_output=collect_full_output)
  File "xla/benchmarks/benchmark_model.py", line 160, in train
    loss.backward()
  File "torch/_tensor.py", line 523, in backward
    torch.autograd.backward(
  File "torch/autograd/__init__.py", line 267, in backward
    _engine_run_backward(
  File "torch/autograd/graph.py", line 767, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Bad StatusOr access: INTERNAL: during context [Unknown]: Seen floating point types of different precisions in %concatenate.7662 = f32[1,88,10,10,2]{4,3,2,1,0} concatenate(f16[1,88,10,10,1]{4,3,2,1,0} %reshape.7660, f32[1,88,10,10,1]{4,3,2,1,0} %reshape.7661), dimensions={4}, but mixed precision is disallowed.
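For reference, the rejected HLO concatenates an f16 operand with an f32 operand along dimension 4; XLA disallows mixed-precision operands in a single concatenate. The sketch below is purely illustrative (the benchmark command above is the actual repro) and just mirrors the shapes and dtypes reported in the error. At the eager level torch.cat promotes the f16 operand to f32, so this snippet by itself is expected to compile; the failure suggests the backward lowering builds the concatenate without the corresponding convert.

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
# Operand shapes/dtypes taken from the error message:
# f16[1,88,10,10,1] and f32[1,88,10,10,1], concatenated on dim 4.
a = torch.randn(1, 88, 10, 10, 1, dtype=torch.float16, device=device)
b = torch.randn(1, 88, 10, 10, 1, dtype=torch.float32, device=device)
# Eager type promotion casts `a` to float32 before the concatenate is traced,
# so this sketch should pass; the failing backward graph instead emits the
# concatenate with the raw f16 operand, which the compiler rejects.
out = torch.cat([a, b], dim=4)
xm.mark_step()  # force compilation/execution of the traced graph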

Environment

  • Reproducible on XLA backend [CPU/TPU]: CUDA
  • torch_xla version: 62c3ba6

cc @miladm @JackCaoG @vanbasten23 @zpcore
