I've found an odd result when using Apex AMP (Automatic Mixed Precision). My boss and I have run the same computer vision program in PyTorch. My boss trained a PyTorch graph with and without Apex AMP; other than enabling Apex AMP, the training process was identical, i.e. same image set, same parameters, etc. The Apex AMP graph was trained using the O1 opt level. I'm 100% certain we're using the same program and graph, and we've compared our settings and the command prompt messages pertaining to the Apex configuration shown on start-up, and everything seems to be the same.
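For anyone reproducing this, here is a minimal sketch of what a typical O1 setup looks like, assuming the standard apex.amp API; the resnet18 model, SGD optimizer, and random data are placeholders, not the actual program from this issue:

```python
# Hedged sketch of a typical Apex AMP O1 training step (placeholder model/data).
import torch
import torchvision
from apex import amp

model = torchvision.models.resnet18().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# O1 patches whitelisted ops (e.g. convolutions, GEMMs) to run in FP16
# and keeps everything else in FP32.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

criterion = torch.nn.CrossEntropyLoss()
images = torch.randn(8, 3, 224, 224, device="cuda")
labels = torch.randint(0, 1000, (8,), device="cuda")

optimizer.zero_grad()
loss = criterion(model(images), labels)
# Dynamic loss scaling guards FP16 gradients against underflow.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```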
My boss tested on a desktop with an RTX 2080 and also on a Jetson Xavier (512-core Volta GPU with Tensor Cores). He found a significant speed improvement in both cases when switching to the Apex AMP enabled graph.
I tested on a GTX 1650 and found the opposite: the Apex AMP enabled graph ran substantially slower than the one without it.
Upon some searching I found these posts:
#297
#325
Issue #297 in particular seems to imply that Apex AMP is not expected to work well with GTX 1xxx series GPUs.
Can anybody confirm whether this is the expected result? Are other people seeing the same thing? Is there a setting I can change before or during the Apex install so it works better with GTX 1xxx series hardware? Or is there a setting that can or should be changed when training the graph?
Something else I should mention: when I installed Apex I received various warnings, ending with "Given no hashes to check 137 links for project 'pip': discarding no candidates". Many other people have reported a similar message in #690. Is it possible this has something to do with Apex not working well on my machine?
Tensor Cores were introduced in Volta, weren't they? So without hardware support for mixed precision training in the cards preceding that, you'd just be adding overhead to the training process (casting, scaling the loss, and the extra backward-pass bookkeeping).
It would be handy if the library detected the availability of Tensor Cores and operated in a pass-through mode otherwise.
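As a workaround along those lines, one could gate the opt level on compute capability by hand; this is a sketch, not a built-in Apex feature, and the model/optimizer names are placeholders. Note that compute capability is only a proxy for Tensor Core availability: the GTX 16xx cards report 7.5 despite lacking Tensor Cores, so a device-name check or a quick timing probe may still be needed for those.

```python
# Manual guard (not built into Apex): enable mixed precision only on GPUs
# with compute capability 7.0+ (Volta and newer, where Tensor Cores appeared).
# "O0" is Apex's pure-FP32 pass-through mode, so older cards skip the
# casting/loss-scaling overhead entirely.
import torch
import torchvision
from apex import amp

model = torchvision.models.resnet18().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

major, _minor = torch.cuda.get_device_capability(0)
opt_level = "O1" if major >= 7 else "O0"
model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)
```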