
[torchbench] timm_efficientdet inference fails to run. #6899

Closed · ysiraichi opened this issue Apr 8, 2024 · 1 comment · Fixed by #6933

ysiraichi (Collaborator) commented:
🐛 Bug

timm_efficientdet inference fails to run with both dynamo and non-dynamo configurations. See the error below:

Traceback (most recent call last):
  File "xla/benchmarks/experiment_runner.py", line 945, in <module>
    main()
  File "xla/benchmarks/experiment_runner.py", line 941, in main
    runner.run()
  File "xla/benchmarks/experiment_runner.py", line 61, in run
    self.run_single_config()
  File "xla/benchmarks/experiment_runner.py", line 256, in run_single_config
    metrics, last_output = self.run_once_and_gather_metrics(
  File "xla/benchmarks/experiment_runner.py", line 345, in run_once_and_gather_metrics
    output, _ = loop(iter_fn=self._default_iter_fn)
  File "xla/benchmarks/experiment_runner.py", line 302, in loop
    output, timing, trace = iter_fn(benchmark_experiment, benchmark_model,
  File "xla/benchmarks/experiment_runner.py", line 218, in _default_iter_fn
    output = benchmark_model.model_iter_fn(
  File "xla/benchmarks/benchmark_model.py", line 170, in eval
    pred = self.module(*inputs)
  File "torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/effdet/bench.py", line 110, in forward
    return _batch_detection(
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: torch_xla/csrc/tensor_impl.cpp:138 : Check failed: !has_symbolic_sizes_strides_
*** Begin stack trace ***
        tsl::CurrentStackTrace[abi:cxx11]()
        torch_xla::XLATensorImpl::sizes_custom() const
        at::FunctionalTensorWrapper::sizes_custom() const
        c10::TensorType::create(at::Tensor const&)
        torch::jit::tensorTypeInCurrentExecutionContext(at::Tensor const&)
        _PyObject_MakeTpCall
        PyVectorcall_Call
        _PyObject_MakeTpCall
        _PyEval_EvalFrameDefault
        _PyEval_EvalCodeWithName
        _PyFunction_Vectorcall
        PyVectorcall_Call
        _PyEval_EvalFrameDefault
        _PyEval_EvalCodeWithName
        _PyFunction_Vectorcall
        PyVectorcall_Call
        _PyEval_EvalFrameDefault
        _PyEval_EvalCodeWithName
        _PyFunction_Vectorcall
        _PyObject_FastCallDict
        _PyObject_Call_Prepend
        PyObject_Call
        _PyEval_EvalFrameDefault
        _PyEval_EvalCodeWithName
        _PyFunction_Vectorcall
        _PyEval_EvalFrameDefault
        _PyEval_EvalFrameDefault
        _PyEval_EvalCodeWithName
        _PyFunction_Vectorcall
        _PyEval_EvalFrameDefault
        _PyEval_EvalCodeWithName
        _PyFunction_Vectorcall
        _PyEval_EvalFrameDefault
        _PyEval_EvalFrameDefault
        _PyEval_EvalFrameDefault
        _PyEval_EvalFrameDefault
        _PyEval_EvalCodeWithName
        PyEval_EvalCodeEx
        PyEval_EvalCode
        PyRun_SimpleFileExFlags
        Py_RunMain
        Py_BytesMain
        __libc_start_main
        _start
*** End stack trace ***
Cannot call sizes_custom() on an XLA tensor with symbolic sizes/strides

Affected Configurations

  • Inference+Dynamo
  • Inference+NonDynamo

Environment

  • Reproducible on XLA backend: CUDA
  • torch_xla version: 5c48be1

cc @miladm @JackCaoG @vanbasten23 @cota @golechwierowicz @frgossen @zpcore

ysiraichi (Collaborator, Author) commented:
I've confirmed this is due to #6814. I believe this is similar to the issue we have with data-dependent operations, e.g. nonzero: after the operation, the resulting tensors have dynamic (bounded) sizes (when XLA_EXPERIMENTAL=nonzero is set):

>>> nms(boxes, scores, threshold).shape
torch.Size([<=5])

>>> a = torch.zeros(10, device=xla_device())
>>> a[::2] = torch.tensor(1, device=xla_device())
>>> a.nonzero().shape
torch.Size([<=10, 1])

In this case, I think we should probably do the same thing: check whether experimental dynamic-shape support is enabled for the operation, and fall back otherwise.
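A minimal sketch of the gating being proposed, in plain Python. The actual check in torch_xla lives in C++; the helper names below are illustrative, and the only assumption taken from the source is that XLA_EXPERIMENTAL is a colon-separated list of op names allowed to produce dynamic shapes:

```python
import os


def dynamic_shape_enabled(op_name: str) -> bool:
    """Return True if op_name appears in the colon-separated
    XLA_EXPERIMENTAL environment variable (illustrative helper)."""
    enabled = os.environ.get("XLA_EXPERIMENTAL", "")
    return op_name in enabled.split(":")


def run_op(op_name: str) -> str:
    # If dynamic shapes are not enabled for this op, fall back
    # (e.g. to a non-XLA implementation) instead of producing a
    # tensor with symbolic sizes, which sizes_custom() cannot handle.
    if dynamic_shape_enabled(op_name):
        return "xla-dynamic"
    return "fallback"


os.environ["XLA_EXPERIMENTAL"] = "nonzero"
print(run_op("nonzero"))  # -> xla-dynamic
print(run_op("nms"))      # -> fallback
```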

@JackCaoG What do you think?
