Failed to find any forward convolution algorithm. #11176
Comments
Hi @nswamy, can you add the 'CI' label to this one? |
This is not CI related |
Is it possible that memory is exhausted on CI? |
I also encountered this error on Windows with CUDA 9.2 and cuDNN 7.1.4 while running a command. Reducing the batch size to 16 resolved the issue. |
I am currently facing this issue, and reducing the batch size does not seem to fix it. I am trying to train fast neural style transfer. The issue seems to arise when calling mod.save_params(), which throws the following error:
|
Update: I've managed to find a rather bizarre workaround to this issue. I was facing this issue when saving the model. Catching the exception and retrying the save gets past it:
try:
    mod.save_checkpoint(model_save_path, epoch)
except Exception as excep:
    print("Exception caught: ", excep)
    mod.save_checkpoint(model_save_path, epoch) |
Update: Sleeping for 0.5 seconds before saving the checkpoint also seems to help.
time.sleep(0.5)
mod.save_checkpoint(model_save_path, epoch) |
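The two workarounds above (retrying after an exception, and pausing before the save) can be combined into one small helper. This is only a sketch: `call_with_retry` and its `attempts` and `delay` parameters are hypothetical names, not part of MXNet, and the thread does not establish why retrying or sleeping helps, so treat it as a stopgap rather than a fix.

```python
import time


def call_with_retry(fn, *args, attempts=2, delay=0.5, **kwargs):
    """Call fn(*args, **kwargs); on failure, wait `delay` seconds and retry.

    Hypothetical helper: `attempts` and `delay` are illustrative defaults
    taken from the workarounds in this thread (one retry, 0.5 s pause).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:  # broad on purpose, mirroring the workaround
            last_exc = exc
            print(f"Attempt {attempt} failed: {exc}")
            if attempt < attempts:
                time.sleep(delay)
    # Every attempt failed: surface the last error instead of hiding it.
    raise last_exc
```

With the names used in the comments above, the save would become `call_with_retry(mod.save_checkpoint, model_save_path, epoch)`. Unlike the bare try/except, the helper re-raises if the final attempt also fails, so a persistent error is not silently swallowed.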
@ThomasDelteil Is this still occurring on CI now? If it's not appearing again would you mind closing this? |
See this test failing: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/915/pipeline/
with
I have encountered this in the wild very rarely too.