Fix some more core aten ops #6342

Merged: wonjoolee95 merged 7 commits into master from wonjoo/core-aten-ops-week-6 on Jan 23, 2024
Conversation

@wonjoolee95 (Collaborator) commented Jan 22, 2024

Fixes #5896, fixes #5867, fixes #5884, fixes #5889

@wonjoolee95 requested a review from ManfeiBai on January 22, 2024 06:43
@ManfeiBai (Collaborator) left a comment:

LGTM

@wonjoolee95 changed the title from "Fix sore core aten ops" to "Fix some more core aten ops" on Jan 22, 2024
@wonjoolee95 force-pushed the wonjoo/core-aten-ops-week-6 branch from ae77bfa to 34786e8 on January 22, 2024 20:23
@wonjoolee95 force-pushed the wonjoo/core-aten-ops-week-6 branch from 34786e8 to 0478d2f on January 23, 2024 08:32
@wonjoolee95 merged commit 99a1341 into master on Jan 23, 2024
18 checks passed
@cota (Collaborator) commented Jan 24, 2024

I've bisected a large number of failures (all torchbench inference on XLA:GPU) down to this commit.

Some example failures:

INFO:__main__:Run with --model-config={"model_name": "BERT_pytorch"} --experiment-config={"accelerator": "cuda", "xla": "PJRT", "xla_flags": null, "dynamo": "openxla", "test": "train"}
ERROR:torchbench_model:Cannot load benchmark model
Traceback (most recent call last):
  File "/home/ecg/nightly_runs/2024-01-24/pytorch/xla/benchmarks/torchbench_model.py", line 288, in default_precision_flag
    benchmark = self.load_benchmark()
  File "/home/ecg/nightly_runs/2024-01-24/pytorch/xla/benchmarks/torchbench_model.py", line 267, in load_benchmark
    return benchmark_cls(
  File "/home/ecg/nightly_runs/2024-01-24/benchmark/torchbenchmark/util/model.py", line 24, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/home/ecg/nightly_runs/2024-01-24/benchmark/torchbenchmark/models/BERT_pytorch/__init__.py", line 148, in __init__
    trainer = BERTTrainer(bert, len(vocab), train_dataloader=train_data_loader, test_dataloader=test_data_loader,
  File "/home/ecg/nightly_runs/2024-01-24/benchmark/torchbenchmark/models/BERT_pytorch/bert_pytorch/trainer/pretrain.py", line 38, in __init__
    self.device = torch.device(device)
TypeError: device() received an invalid combination of arguments - got (bool), but expected one of:
 * (torch.device device)
      didn't match because some of the arguments have invalid types: (!bool!)
 * (str type, int index)
 * 
INFO:__main__:Run with --model-config={"model_name": "Background_Matting"} --experiment-config={"accelerator": "cuda", "xla": "PJRT", "xla_flags": null, "dynamo": "openxla", "test": "eval"}
ERROR:torchbench_model:Cannot load benchmark model
Traceback (most recent call last):
  File "/home/ecg/nightly_runs/2024-01-24/pytorch/xla/benchmarks/torchbench_model.py", line 288, in default_precision_flag
    benchmark = self.load_benchmark()
  File "/home/ecg/nightly_runs/2024-01-24/pytorch/xla/benchmarks/torchbench_model.py", line 267, in load_benchmark
    return benchmark_cls(
  File "/home/ecg/nightly_runs/2024-01-24/benchmark/torchbenchmark/util/model.py", line 24, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/home/ecg/nightly_runs/2024-01-24/benchmark/torchbenchmark/models/Background_Matting/__init__.py", line 72, in __init__
    netB.to(self.device)
  File "/home/ecg/nightly_runs/2024-01-24/pytorch/torch/nn/modules/module.py", line 1137, in to
    raise TypeError('nn.Module.to only accepts floating point or complex '
TypeError: nn.Module.to only accepts floating point or complex dtypes, but got desired dtype=torch.bool

Does this ring any bells?

@wonjoolee95 (Collaborator, Author) commented:

Thanks for catching this. It's hard to identify the offending op just from the trace, but this PR essentially only touches two ops -- aten::reciprocal and aten::sigmoid. Let me revert this PR's changes to these two ops for now and investigate.
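
For context, a minimal sanity-check sketch (not taken from this PR, and assuming torch_xla is installed with an XLA device available) that exercises the two lowerings in question and compares them against the CPU reference:

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.rand(4, 4, device=device) + 0.5  # keep values away from zero for reciprocal

# The two ops this PR touches, checked against their CPU results.
print(torch.allclose(torch.reciprocal(x).cpu(), torch.reciprocal(x.cpu())))
print(torch.allclose(torch.sigmoid(x).cpu(), torch.sigmoid(x.cpu())))
```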

wonjoolee95 added a commit that referenced this pull request Jan 25, 2024
@wonjoolee95 (Collaborator, Author) commented Jan 25, 2024

Reading the error, it's complaining that a boolean is being passed to the .device() and .to() methods. At a quick glance, the errors seem unrelated to this PR's changes, but let me continue to investigate.
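
For reference, a hypothetical snippet that reproduces the failure mode in the BERT_pytorch traceback above (a Python bool reaching torch.device where a device string was expected):

```python
import torch

device = True                       # hypothetically, a bool was passed instead of e.g. "cuda"
self_device = torch.device(device)
# TypeError: device() received an invalid combination of arguments - got (bool), ...
```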

@cota, is there something that describes the setup for me to reproduce this (running torchbench) on GPU?

cota added a commit to cota/pytorch-xla that referenced this pull request Jan 26, 2024
This reverts commit 4ab7a24.

Turns out that the revert was unnecessary; things broke
from a different commit. This reverts the revert, i.e.
it reinstates pytorch#6342.

@cota (Collaborator) commented Jan 26, 2024

@wonjoolee95 I redid the bisection, paying closer attention this time. It turns out that the problem was introduced by a prior commit, not by this PR. My apologies! :(
Things are now working on master, and I have confirmed that reinstating this PR still works.
I've sent #6387 to reapply this change.
