[torchbench] dlrm fails to run on training #6008

Comments
I thought that we could fall back, but I'm thinking that there's no way out of this error unless we actually support XLA sparse tensors. That's because the backward of `EmbeddingBag` with `sparse=True` produces a sparse gradient tensor, which XLA cannot represent.
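For reference, a minimal sketch of the pattern we believe triggers this (device handling via `torch_xla` as usual; the table sizes are illustrative):

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()
# dlrm builds its embedding tables with sparse=True, so the backward pass
# tries to produce a sparse gradient for `bag.weight`.
bag = nn.EmbeddingBag(1000, 16, mode="sum", sparse=True).to(device)
out = bag(torch.randint(0, 1000, (8, 4), device=device))
out.sum().backward()  # fails: XLA has no sparse tensor representation
```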
Hmm. Do you know what [...] does?
Can we be a bit cheeky and lower `EmbeddingBag` with `sparse=True` into `sparse=False`?
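If we did, one hypothetical place to do it is on the instantiated model rather than in the lowering itself; `patch_sparse_embedding_bags` below is a made-up helper, not an existing API:

```python
import torch.nn as nn

def patch_sparse_embedding_bags(model: nn.Module) -> nn.Module:
    """Flip every sparse EmbeddingBag to dense gradients (illustrative hack)."""
    for module in model.modules():
        if isinstance(module, nn.EmbeddingBag) and module.sparse:
            # The flag controls which backward formula autograd uses, so
            # flipping it before the forward makes the gradient strided.
            module.sparse = False
    return model
```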
Good question. I tried calling [...] and got:

```
Traceback (most recent call last):
  File "xla/benchmarks/experiment_runner.py", line 963, in <module>
    main()
  File "xla/benchmarks/experiment_runner.py", line 959, in main
    runner.run()
  File "xla/benchmarks/experiment_runner.py", line 61, in run
    self.run_single_config()
  File "xla/benchmarks/experiment_runner.py", line 257, in run_single_config
    metrics, last_output = self.run_once_and_gather_metrics(
  File "xla/benchmarks/experiment_runner.py", line 352, in run_once_and_gather_metrics
    output, _ = loop(iter_fn=self._default_iter_fn)
  File "xla/benchmarks/experiment_runner.py", line 309, in loop
    output, timing, trace = iter_fn(benchmark_experiment, benchmark_model,
  File "xla/benchmarks/experiment_runner.py", line 219, in _default_iter_fn
    output = benchmark_model.model_iter_fn(
  File "xla/benchmarks/torchbench_model.py", line 413, in train
    super().train(inputs, collect_full_output=collect_full_output)
  File "xla/benchmarks/benchmark_model.py", line 183, in train
    loss.backward()
  File "torch/_tensor.py", line 523, in backward
    torch.autograd.backward(
  File "torch/autograd/__init__.py", line 284, in backward
    _engine_run_backward(
  File "torch/autograd/graph.py", line 767, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: layout_or_default(layout_opt) == Layout::Strided INTERNAL ASSERT FAILED at "../aten/src/ATen/EmptyTensor.cpp":448, please report a bug to PyTorch.
```

I checked with [...]. So maybe the problem is in [...].
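For what it's worth, the sparse gradient is easy to observe in eager CPU mode, where the op is supported (a diagnostic sketch):

```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(1000, 16, mode="sum", sparse=True)  # eager CPU
out = bag(torch.randint(0, 1000, (8, 4)))
out.sum().backward()
# Prints torch.sparse_coo: exactly the non-strided layout the assert rejects.
print(bag.weight.grad.layout)
```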
I guess so. But, then, it would be hard to compare the performance results, since we are running different programs.
Yeah, it'd be difficult to compare results, but at least things would run :D
We could also just error out with a clean error asking the user to set `sparse=False`.
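A sketch of what that guard could look like (the function name and call site are hypothetical):

```python
import torch.nn as nn

def reject_sparse_embedding_bags(model: nn.Module) -> None:
    """Raise an actionable error instead of the INTERNAL ASSERT in backward."""
    for name, module in model.named_modules():
        if isinstance(module, nn.EmbeddingBag) and module.sparse:
            raise RuntimeError(
                f"EmbeddingBag {name!r} was built with sparse=True, which XLA "
                "does not support; please construct it with sparse=False.")
```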
🐛 Bug
Running the upstreamed benchmarking scripts with the following command results in an unexpected error.
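The exact command is not reproduced here; an invocation of roughly this shape (the flag names are guesses, not verified against the runner) hits the path shown in the traceback above:

```
python xla/benchmarks/experiment_runner.py \
    --suite-name=torchbench --filter=dlrm --test=train ...
```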
Environment