[torchbench] Regression: `detectron2_maskrcnn` training fails on non-dynamo. #6353

ysiraichi · 2024-01-22T21:26:14Z

🐛 Bug

```
python xla/benchmarks/experiment_runner.py --suite-name torchbench --accelerator cuda --xla PJRT --dynamo None --test train --repeat 2 -k detectron2_maskrcnn
```

```
2024-01-19 02:28:43.908250: F ./torch_xla/csrc/runtime/debug_macros.h:20] Non-OK-status: status.status() status: INVALID_ARGUMENT: Slice size at index 0 in gather op is out of range, must be within [0, 1), got 1.
*** Begin stack trace ***
        tsl::CurrentStackTrace[abi:cxx11]()
        xla::Shape const* ConsumeValue<xla::Shape const*>(absl::lts_20230802::StatusOr<xla::Shape const*>&&)
        torch_xla::ShapeHelper::ShapeOfXlaOp(xla::XlaOp)
        torch_xla::InferOutputShape(absl::lts_20230802::Span<xla::Shape const>, std::function<xla::XlaOp (absl::lts_20230802::Span<xla::XlaOp const>)> const&)


        torch_xla::XlaNode::GetOpShape(std::function<xla::Shape ()> const&) const
        torch_xla::XlaNode::XlaNode(torch::lazy::OpKind, c10::ArrayRef<torch::lazy::Value>, std::function<xla::Shape ()> const&, unsigned long, torch::lazy::hash_t)
        torch_xla::IndexGet::IndexGet(torch::lazy::Value const&, torch::lazy::Value const&, long)
        torch_xla::IndexByTensors(c10::intrusive_ptr<torch_xla::XLATensor, c10::detail::intrusive_target_default_null_type<torch_xla::XLATensor> > const&, absl::lts_20230802::Span<c10::intrusive_ptr<torch_xla::XLATensor, c10::detail::intrusive_target_default_null_type<torch_xla::XLATensor> > const>, long)
        torch_xla::tensor_methods::index(c10::intrusive_ptr<torch_xla::XLATensor, c10::detail::intrusive_target_default_null_type<torch_xla::XLATensor> > const&, absl::lts_20230802::Span<c10::intrusive_ptr<torch_xla::XLATensor, c10::detail::intrusive_target_default_null_type<torch_xla::XLATensor> > const>, long)
        torch_xla::XLANativeFunctions::index(at::Tensor const&, c10::List<std::optional<at::Tensor> > const&)


        at::_ops::index_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::List<std::optional<at::Tensor> > const&)


        at::_ops::index_Tensor::call(at::Tensor const&, c10::List<std::optional<at::Tensor> > const&)
        torch::autograd::THPVariable_getitem(_object*, _object*)
        _PyEval_EvalFrameDefault
        ...
*** End stack trace ***
```
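The trace shows the failure is raised while inferring the output shape of an `IndexGet` IR node, i.e. when a Python `tensor[index_tensor]` is traced, before anything executes. The message itself comes from XLA's gather shape check. The sketch below is a hypothetical Python re-implementation of that one check (`check_gather_slice_sizes` is an illustrative name, not an XLA or PyTorch/XLA API), assuming XLA accepts each slice size only in the range `[0, operand_dim + 1)`. Under that assumption, the reported bound of `1` would mean dimension 0 of the gathered operand had size 0, so the indexing likely hit a tensor that is empty along dim 0. This is a reading of the error message, not a confirmed root cause.

```python
from typing import Sequence


def check_gather_slice_sizes(operand_shape: Sequence[int],
                             slice_sizes: Sequence[int]) -> None:
    """Hypothetical re-implementation of one XLA gather shape check.

    Assumption: slice_sizes[i] is valid only in [0, operand_shape[i] + 1),
    i.e. a slice may not be larger than the operand dimension it reads from.
    """
    if len(slice_sizes) != len(operand_shape):
        raise ValueError("slice_sizes needs one entry per operand dimension")
    for i, (dim, size) in enumerate(zip(operand_shape, slice_sizes)):
        bound = dim + 1  # exclusive upper bound reported in the message
        if not 0 <= size < bound:
            raise ValueError(
                f"Slice size at index {i} in gather op is out of range, "
                f"must be within [0, {bound}), got {size}.")


# Gathering whole rows (slice size 1 along dim 0) from a non-empty operand passes.
check_gather_slice_sizes([5, 4], [1, 4])

# The same gather on an operand that is empty along dim 0 fails like the log.
try:
    check_gather_slice_sizes([0, 4], [1, 4])
except ValueError as err:
    print(err)
```

Running it prints `Slice size at index 0 in gather op is out of range, must be within [0, 1), got 1.`, matching the fatal status above, which is consistent with (but does not prove) an empty operand reaching the index lowering.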

Environment

Reproducible on XLA backend [CPU/TPU]: CUDA
torch_xla version: a8b27eb

cc @miladm @JackCaoG


ysiraichi added the xla:gpu label Jan 22, 2024

ysiraichi mentioned this issue in Failing Torchbench Models: tracking issue #5932 (open) on Jan 29, 2024
