Conversation
Any reason you are removing int16 support?
@marcoabreu That's because ndarray in MXNet does not support int16, so I removed it.
I find that it still raises a CUDA error and need more time to fix it.
@sxjscience ping! Did you get a chance to look into the failure? This might be required for fixing the flaky test I referenced.
@sxjscience could you please update the issue? Resolving flaky test #12358 is blocked because of this.
I’ll look into this again. Sorry for the delay.
@mxnet-label-bot [pr-awaiting-response]
@@ -455,8 +457,7 @@ void TopKImpl(const RunContext &ctx,
   // Cast `ret_indices` from int to real_t could introduce conversion error when the element_num
   // is large enough.
   if (param.ret_typ == topk_enum::kReturnMask) {
-    Tensor<xpu, 2, DType> ret_mask =
-        ret[0].get_with_shape<xpu, 2, DType>(Shape2(ret[0].Size(), 1), s);
+    Tensor<xpu, 1, DType> ret_mask = ret[0].FlatTo1D<xpu, DType>(s);
     ret_mask = scalar<DType>(0);
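For intuition, here is a numpy stand-in for what the kReturnMask path computes: the output mask is zeroed over the flattened input, then the top-k positions are set to 1. This is a simplified sketch, not the MXNet kernel itself, and the `topk_mask` helper name is hypothetical.

```python
import numpy as np

# Hypothetical numpy sketch of the kReturnMask path: zero a flat mask,
# then set the positions of the k largest elements to 1.
def topk_mask(data, k):
    flat = data.ravel()
    mask = np.zeros_like(flat)        # analogous to `ret_mask = scalar<DType>(0)`
    topk_idx = np.argsort(flat)[-k:]  # indices of the k largest elements
    mask[topk_idx] = 1                # analogous to the IndexFill step
    return mask.reshape(data.shape)

m = topk_mask(np.array([3.0, 1.0, 4.0, 1.0, 5.0]), k=2)  # -> [0., 0., 1., 0., 1.]
```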
Now it raises a really weird "CUDA Misaligned Memory Error". I currently have no idea what triggers it; it happens when we initialize ret_mask to all zeros.
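A hedged guess at the failure mode (an assumption, not confirmed in this thread): when several tensors are carved out of one flat byte workspace, a sub-tensor whose byte offset is not a multiple of its element size is misaligned, and CUDA kernels fault on such accesses. The snippet below only illustrates the offset arithmetic in plain Python, not the actual workspace code.

```python
import numpy as np

# Illustration of the alignment rule: a float32 view into a byte buffer
# must start at a multiple of 4 bytes, otherwise the access is misaligned.
elem_size = np.dtype(np.float32).itemsize  # 4 bytes
offset = 6                                 # not a multiple of 4 -> misaligned
assert offset % elem_size != 0
# Round the offset up to the next multiple of elem_size to restore alignment.
aligned_offset = ((offset + elem_size - 1) // elem_size) * elem_size
assert aligned_offset == 8
```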
@anirudh2290 @azai91 @apeforest @samskalicky @eric-haibin-lin - maybe one of you guys can help?
@sxjscience were you able to resolve the error?
Sorry, I have no time to look at this until Oct 3rd; I need to review a lot of AAAI papers and submit my final thesis. I'll try to solve it after Oct 3rd.
@sxjscience once you've fixed this, could you submit a PR to 1.3.x as well? Currently the branch build is broken due to the
Description
This PR fixes the flaky topk test reported in #12358 and #12310. The previous bug was caused by not explicitly setting the dtype of mx.ndarrays when constructing them from numpy ndarrays (related issue: #12268). After reimplementing the IndexFill function, the test now passes.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments
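The dtype pitfall mentioned in the description can be reproduced without MXNet: distinct float64 values can collapse to the same float32 value, producing ties that make top-k ordering ambiguous. This is a sketch of the suspected mechanism, assuming the test data was implicitly converted to float32.

```python
import numpy as np

# Two values that are distinct in float64...
a = np.float64(1.0)
b = np.float64(1.0 + 1e-9)
assert a != b
# ...collapse to the same float32 value, since 1e-9 is below float32
# precision (~1.2e-7 near 1.0). Ties like this make top-k order ambiguous.
assert np.float32(a) == np.float32(b)
```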