Apex AMP performance bad on GTX 1650, good on RTX 2080 and Volta w/ Tensor Cores. Is this normal? #806

Open
CDahmsCellarEye opened this issue Apr 24, 2020 · 2 comments


@CDahmsCellarEye

I've found an odd result when using Apex AMP (Automatic Mixed Precision). My boss and I have run the same PyTorch computer vision program. My boss trained a PyTorch graph with and without Apex AMP. Other than using or not using Apex AMP, the training process was the same, i.e. same image set, same parameters, etc. The Apex AMP graph was trained using the O1 opt level. I'm 100% certain we're using the same program and graph, and we've compared our settings and the command-prompt messages pertaining to the Apex configuration shown on start-up, and everything seems to be the same.
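
For reference, the O1 setup follows the standard Apex pattern; here is a minimal, self-contained sketch with a toy model and data (placeholders, not our actual program):

```python
# Minimal sketch of the standard Apex AMP O1 training pattern.
# The toy model and data below are placeholders, not the program from this issue.
import torch
import torch.nn as nn
from apex import amp

device = torch.device("cuda")
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# O1 patches whitelisted ops to run in FP16 and keeps FP32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

images = torch.randn(8, 3, 64, 64, device=device)
targets = torch.randint(0, 10, (8,), device=device)

optimizer.zero_grad()
loss = criterion(model(images), targets)
# Loss scaling keeps small FP16 gradients from underflowing to zero.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```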

My boss tested on a desktop with an RTX 2080 and also on a Jetson Xavier (512-core Volta GPU with Tensor Cores). He found a significant speed improvement in both cases when switching to the Apex AMP enabled graph.

I tested on a GTX 1650 and found the opposite: the Apex-AMP-enabled graph ran substantially slower than the non-AMP graph.
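
In case it helps anyone reproduce the comparison, I'm measuring along the lines of the generic sketch below (not my exact program; the model and batch are whatever you want to benchmark):

```python
# Generic timing sketch: average per-iteration latency of a CUDA model,
# with explicit synchronization so host-side timers don't under-count
# asynchronous GPU work.
import time
import torch

@torch.no_grad()
def avg_latency_ms(model, batch, iters=100, warmup=10):
    for _ in range(warmup):        # warm up kernels / cuDNN autotuning
        model(batch)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(batch)
    torch.cuda.synchronize()       # wait for all queued GPU work to finish
    return (time.perf_counter() - start) / iters * 1000.0
```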

Upon some searching I found these posts:

#297
#325

Issue #297 in particular seems to imply that Apex AMP is not expected to work well with GTX 1xxx-series GPUs.

Can anybody confirm whether this is the expected result? Are other people finding the same thing? Is there a setting I can change before or during the Apex install so it works better with GTX 1xxx-series hardware? Or is there a setting that can or should be changed when training the graph?

Something else I should mention: when I installed Apex I received various warnings, ending with `Given no hashes to check 137 links for project 'pip': discarding no candidates`, similar to what is described in #690, where many other people have reported the same message. Is it possible this has something to do with Apex not working well on my machine?

@aabzaliev

I tested on a V100 with the same warnings you mentioned during installation. I can confirm it's slower than without AMP.

@sk0g

sk0g commented May 5, 2020

Tensor Cores were introduced in Volta, weren't they? So without hardware support for mixed-precision training in the cards preceding that, you'd just be adding overhead to the training process (casting, loss scaling, and the scaled backward step).

It would be handy if the library detected the availability of Tensor Cores and operated in a pass-through mode otherwise, along the lines of the sketch below.
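
A rough version of that check could use the compute capability PyTorch reports (Tensor Cores arrived with Volta, CC 7.0). This is only a heuristic sketch, not Apex's actual behavior; note that some Turing parts such as the GTX 16xx series report CC 7.5 yet ship without Tensor Cores:

```python
# Heuristic sketch: gate AMP on reported compute capability.
# Caveat: approximate only -- e.g. GTX 16xx Turing parts report CC 7.5
# but lack Tensor Cores, so a capability check alone can still enable
# AMP on hardware that won't benefit from it.
import torch

def probably_has_tensor_cores() -> bool:
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability()
    return (major, minor) >= (7, 0)  # Tensor Cores arrived with Volta (7.0)

# Hypothetical pass-through: O0 is Apex's pure-FP32 opt level.
opt_level = "O1" if probably_has_tensor_cores() else "O0"
```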
