Embedding gradient performance optimization on GPU #16355

MoisesHer · 2019-10-02T17:30:07Z

Description

This PR adds a dedicated Embedding-backward operator for GPU.
Two new CUDA kernels have been implemented to improve the operator's performance on GPU.
According to our measurements on Volta GPUs, the previous version took 2.2 ms,
whereas the new implementation takes 0.3 ms, i.e. a speedup of more than 7x.
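
For readers unfamiliar with the operator, the following is a minimal sequential sketch of what an embedding backward pass computes: each row of the output gradient is added into the weight-gradient row selected by the corresponding index. It is plain Python for illustration only, not the CUDA kernels from this PR; the names `embedding_backward`, `indices`, `grad_out`, and `vocab_size` are hypothetical.

```python
def embedding_backward(indices, grad_out, vocab_size):
    """Reference embedding gradient: scatter-add rows of grad_out by index.

    Illustrative only; names and layout are assumptions, not the PR's code.
    """
    dim = len(grad_out[0]) if grad_out else 0
    grad_weight = [[0.0] * dim for _ in range(vocab_size)]
    for row, idx in zip(grad_out, indices):
        # Repeated indices accumulate into the same weight-gradient row.
        for j, g in enumerate(row):
            grad_weight[idx][j] += g
    return grad_weight


indices = [2, 0, 2]
grad_out = [[1.0, 1.0], [0.5, 0.5], [2.0, 3.0]]
grad_weight = embedding_backward(indices, grad_out, vocab_size=4)
```

Because index 2 appears twice, its two gradient rows are summed into the same output row; this accumulation is what a parallel implementation has to coordinate.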

Checklist

Essentials

[x] Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests already existed (tests/python/gpu/test_operator_gpu:test_embedding_with_type) and changes do not affect correctness.
Code is well-documented:
For new C++ functions in header files, their functionalities and arguments are documented.

Changes

[x] Embedding-backward operator for GPU, test: tests/python/gpu/test_operator_gpu:test_embedding_with_type

src/operator/tensor/indexing_op.cu

ptrendx · 2019-10-02T21:22:19Z

@sxjscience FYI

src/operator/tensor/indexing_op.cu

ptrendx

LGTM

sxjscience · 2019-10-05T00:02:07Z

Nice! LGTM. So the BinarySearch version of FindBounds has complexity O(|V| log |N|), where |V| is the vocabulary size and |N| is the number of indices. I guess our initial version (https://github.com/dmlc/mshadow/blob/bc49327a44650c3f2b427e953ff95d2c27566c04/mshadow/cuda/tensor_gpu-inl.cuh#L619-L672) has complexity O(|N|) for finding the boundaries. Thus, in some workloads (those in which |N| is small), the O(|N|) version might be faster.
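
The two strategies compared above can be sketched as follows, assuming the indices have already been sorted. This is a sequential Python illustration rather than the actual FindBounds kernel or the mshadow code; `find_bounds_binary` and `find_bounds_linear` are hypothetical names, and the binary-search variant uses the standard-library `bisect` module.

```python
from bisect import bisect_left, bisect_right


def find_bounds_binary(sorted_indices, vocab_size):
    """One binary search pair per vocabulary id: O(|V| log |N|) overall.

    Returns a [lower, upper) range for every id; absent ids get an empty range.
    """
    return [
        (bisect_left(sorted_indices, v), bisect_right(sorted_indices, v))
        for v in range(vocab_size)
    ]


def find_bounds_linear(sorted_indices):
    """Single pass over the sorted indices: O(|N|).

    Returns {id: (lower, upper)} only for ids that actually occur.
    """
    bounds = {}
    start = 0
    n = len(sorted_indices)
    for pos in range(1, n + 1):
        # A run ends at the end of the array or where the value changes.
        if pos == n or sorted_indices[pos] != sorted_indices[start]:
            bounds[sorted_indices[start]] = (start, pos)
            start = pos
    return bounds


sorted_indices = [0, 2, 2, 3]
binary = find_bounds_binary(sorted_indices, vocab_size=5)
linear = find_bounds_linear(sorted_indices)
```

In the binary-search form every vocabulary id can be handled independently, which is presumably why it maps well onto a GPU, while the linear form does less total work when |N| is much smaller than |V|.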

ptrendx · 2019-10-05T22:59:26Z

Moises did a performance comparison between the new version and both the old one and the old buggy one. The new kernel is faster than the old working version in all cases and roughly the same speed as the buggy one. The biggest performance change is actually seen when changing how many distinct elements are in the input data (as a small number of distinct elements limits parallelism in the backward pass).
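
One way to picture why the number of distinct elements matters is a sort-then-reduce formulation, sketched below under the assumption that each distinct index forms one independent unit of work. This is an illustrative Python model, not the PR's kernels; `backward_by_segments` is a hypothetical name.

```python
from itertools import groupby


def backward_by_segments(indices, grad_out, vocab_size):
    """Group gradient rows by index, then reduce each group separately.

    Returns the weight gradient and the number of segments (distinct indices).
    Assumption for illustration: each segment is an independent work unit.
    """
    dim = len(grad_out[0]) if grad_out else 0
    grad_weight = [[0.0] * dim for _ in range(vocab_size)]
    # Positions of the input, ordered so equal indices are adjacent.
    order = sorted(range(len(indices)), key=lambda i: indices[i])
    segments = 0
    for idx, group in groupby(order, key=lambda i: indices[i]):
        segments += 1
        for i in group:
            for j in range(dim):
                grad_weight[idx][j] += grad_out[i][j]
    return grad_weight, segments


grads, n_segments = backward_by_segments(
    [2, 0, 2], [[1.0, 1.0], [0.5, 0.5], [2.0, 3.0]], vocab_size=4
)
_, one_segment = backward_by_segments([1, 1, 1, 1], [[1.0]] * 4, vocab_size=2)
```

With four inputs that all share index 1 there is a single segment, so in this model only one reduction can proceed at a time no matter how many inputs there are.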

* Add Embedding backward Op for GPU
* Add some code documentation
* Use unnamed namespace for integer log2 function
* Fix lint issues
* Fix one more lint problem
* Remove unnecessary conditions ops
* Fix one more lint problem

MoisesHer added 2 commits October 1, 2019 19:09

Add Embedding backward Op for GPU

022fc0a

Add some code documentation

080fcf6

ptrendx self-requested a review October 2, 2019 17:38

MoisesHer changed the title ~~Pr embedding gradient~~ Embedding gradient operator on GPU Oct 2, 2019

ptrendx reviewed Oct 2, 2019

src/operator/tensor/indexing_op.cu (outdated)

MoisesHer changed the title ~~Embedding gradient operator on GPU~~ Embedding gradient performance optimization on GPU Oct 2, 2019

Use unnamed namespace for integer log2 function

84a6272

ptrendx mentioned this pull request Oct 2, 2019

[Discussion] 1.6.0 Roadmap #15589

Closed

MoisesHer added 2 commits October 2, 2019 14:40

Fix lint issues

19ef3d9

Fix one more lint problem

7d4acc0

ptrendx reviewed Oct 3, 2019

src/operator/tensor/indexing_op.cu (outdated)

ptrendx reviewed Oct 3, 2019

src/operator/tensor/indexing_op.cu (outdated)

ptrendx reviewed Oct 3, 2019

src/operator/tensor/indexing_op.cu (outdated)

ptrendx reviewed Oct 3, 2019

src/operator/tensor/indexing_op.cu (outdated)

MoisesHer added 2 commits October 3, 2019 10:40

Remove unnecessary conditions ops

38672a8

Fix one more lint problem

80c0542

ptrendx approved these changes Oct 3, 2019

ptrendx merged commit 8096421 into apache:master Oct 5, 2019
