-
Notifications
You must be signed in to change notification settings - Fork 480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[torchbench] Detectron2 benchmarks failing to run. #6336
Comments
This error seems to be from pytorch which is weird.. do we know why bias is fp16? |
I don't think #6296 is wrong and should be reverted, since I believe it to be the best way to compare against inductor: instantiate the module in the original accelerator, and then move to XLA. That said, I can think of 2 solutions:
Particularly, I believe (2) to be better, so I will focus on doing that. |
Not really sure. I still have to investigate. |
But, yes, this seems like PyTorch is doing something weird. |
I've just confirmed that instantiating the model with XLA device solves the error. i.e. changing the line below with xla/benchmarks/torchbench_model.py Line 233 in 423bb0b
|
I thought you want to instantiate the model on CUDA and then move to XLA? |
Yes. I do think that's better. I was just confirming that that was the change that caused these models to break. |
After further investigation, I found out the issue is due to a combination of 2 factors:
This causes the function Why wasn't it failing before?Torchbench already converts the model and example inputs to Possible SolutionsI can think of a few possible solutions:
Of those, I think (1) is the best one. Maybe we should thrown a specific error for the case when using @miladm @JackCaoG @vanbasten23 |
Apparently, after doing (1), I am getting another error: File "/lib/python3.8/site-packages/detectron2/modeling/proposal_generator/proposal_utils.py", line 121, in find_top_rpn_proposals
keep = batched_nms(boxes.tensor, scores_per_img, lvl, nms_thresh)
File "/lib/python3.8/site-packages/detectron2/layers/nms.py", line 20, in batched_nms
return box_ops.batched_nms(boxes.float(), scores, idxs, iou_threshold)
...
File "/lib/python3.8/site-packages/torchvision/torchvision/ops/boxes.py", line 109, in resume_in__batched_nms_vanilla_at_107
curr_keep_indices = nms(boxes[curr_indices], scores[curr_indices], iou_threshold)
File "/lib/python3.8/site-packages/torchvision/torchvision/ops/boxes.py", line 41, in nms
return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
File "torch/_ops.py", line 825, in __call__
return self_._op(*args, **(kwargs or {}))
RuntimeError: dets (Float) should have the same type as scores (Half) In summary:
Given the problem above, I think, for now, we should: use solution (3) in the short-term. |
FYI |
I see. So, maybe a solution is to pass |
That seems to be a reasonable workaround until upstream fixed the |
Apparently, this issue was not due to conversion issues (pytorch/pytorch#115792) as we once thought, but it's a real problem (more details in this comment). |
Here's what I found when looking into this issue ( Registering an implementation for XLA to the dispatcher should solve this problem. That said, I don't think we can leverage current codegen infrastructure, since What I think could be done: register the XLA implementation manually by: TORCH_LIBRARY_IMPL(torchvision, XLA, m) {
m.impl(TORCH_SELECTIVE_NAME("torchvision::nms"), TORCH_FN(xla_nms_kernel));
} Let me know what you think. |
I don't think that XLA nms is well tested, it is better to figure out what that ops does and test it before we register it to be the default implemenation. |
@JackCaoG While the solution in this comment works, I thought it would make more sense to implement a |
what difference does it make to mark it as |
The difference is that it would be decomposable with, hopefully, already supported operations. That said, I'm thinking on the following plan:
I think that it would be better to have the composite kernel because:
|
@JackCaoG Question: how important is it to keep the old behavior?
nms(boxes, scores, score_threshold, iou_threshold, output_size)
nms(boxes, scores, iou_threshold) |
ehh, no very? I guess no one using our nms at the moment. |
So, can we kill it, in favor of the TorchVision variant? |
yea |
🐛 Bug
After #6296, a few detectron2 benchmarks started failing when using XLA:
python xla/benchmarks/experiment_runner.py \ --suite-name torchbench --accelerator cuda --repeat 2 \ --test eval --xla PJRT --dynamo openxla \ -k detectron2_fasterrcnn_r_50_c4
Environment
cc @miladm @JackCaoG
The text was updated successfully, but these errors were encountered: