[Sparse-Gluon] embedding with sparse grad #10924
Conversation
arg_arrays = {}
contains_sparse = False
for param in self._params:
    arg_arrays[param.name] = param.data(self._contexts[0])
Did you try calling step on a parameter with deferred initialization? What error message did you get?
Yes, I tried the language model example. param.data() was already called in the previous implementation (line 113).
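For context, a minimal sketch of the deferred-initialization behavior under discussion, assuming the standard Gluon API (a Dense block with unspecified in_units defers its weight shape until the first forward pass):

```python
import mxnet as mx
from mxnet import gluon

# in_units is not given, so the weight shape is unknown and
# initialization is deferred until a forward pass infers it.
dense = gluon.nn.Dense(4)
dense.initialize()

try:
    # data() on a deferred-initialized parameter raises instead of
    # returning an NDArray.
    dense.weight.data(mx.cpu())
except gluon.parameter.DeferredInitializationError as err:
    print('deferred init:', err)
```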
    if param._grad_stype != 'default':
        contains_sparse = True
        # update_on_kvstore is set to False by the user
        if self._update_on_kvstore is False:
if not self._update_on_kvstore
If self._update_on_kvstore is None, I don't need to throw the error.
OK
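The distinction matters because the flag is tri-state. A sketch, with attribute names as in the diff above:

```python
# self._update_on_kvstore is tri-state:
#   None  -> the user left it unset; the Trainer decides later.
#   True  -> updates run on the kvstore (what sparse grads need).
#   False -> the user explicitly opted out; this is the only case
#            that conflicts with sparse gradients and should raise.
if self._update_on_kvstore is False:
    raise RuntimeError("Cannot set update_on_kvstore to False when "
                       "sparse gradients are present")
```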
python/mxnet/gluon/parameter.py (outdated)
@@ -114,6 +116,11 @@ def __init__(self, name, grad_req='write', shape=None, dtype=mx_real_t,
         self.wd_mult = wd_mult
         self.grad_req = grad_req
         self.init = init
+        grad_stype = 'default' if grad_stype is None else grad_stype
Why not use 'default' as the argument's default value?
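What the reviewer suggests, as a sketch (the signature is abbreviated here; only grad_stype matters):

```python
# Using 'default' as the default value makes the None check on the
# added line above unnecessary.
def __init__(self, name, grad_req='write', shape=None,
             grad_stype='default', **kwargs):
    ...
```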
python/mxnet/gluon/trainer.py (outdated)
        contains_sparse = True
        # update_on_kvstore is set to False by the user
        if self._update_on_kvstore is False:
            raise RuntimeError("Cannot set update_on_kvstore when sparse gradients "
Include the parameter name in the message.
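A sketch of the message with the parameter name included, as requested (param and self._update_on_kvstore as in the diff above):

```python
if self._update_on_kvstore is False:
    # Naming the offending parameter makes the failure actionable.
    raise RuntimeError("Cannot set update_on_kvstore to False when "
                       "parameter '%s' has sparse gradients (grad_stype "
                       "'%s')" % (param.name, param._grad_stype))
```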
@@ -390,13 +390,14 @@ class Embedding(HybridBlock):
     - **out**: N-D tensor with shape: `(x1, x2, ..., xN-1, output_dim)`.
     """
     def __init__(self, input_dim, output_dim, dtype='float32',
-                 weight_initializer=None, **kwargs):
+                 weight_initializer=None, sparse_grad=False, **kwargs):
Please add documentation for the new argument.
Added
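For reference, a sketch of the kind of parameter documentation requested, in the numpydoc style the file already uses (wording is illustrative, not necessarily the exact text that was added):

```python
class Embedding(HybridBlock):
    """Turns non-negative integers (indexes/tokens) into dense vectors.

    Parameters
    ----------
    sparse_grad : bool, default False
        If True, the gradient w.r.t. the weight is a 'row_sparse'
        NDArray: only rows for indices that appear in the input carry
        gradient values, which saves memory for large vocabularies.
    """
```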
* draft
* update test
* fix kvstore
* fix lint
* fix test
* add proper error msg
* CR comment
Description
Add a sparse_grad option to the hybrid Embedding block. The block does not include optimizations for distributed training.
Manually verified the word_language_model example with sparse_grad=True (without gradient clipping); the result is exactly the same as with the original implementation.
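A minimal end-to-end sketch of the new option (sizes are hypothetical; the API is the one added in this PR):

```python
import mxnet as mx
from mxnet import gluon, autograd

# Embedding whose weight gradient is stored as a row_sparse NDArray.
embedding = gluon.nn.Embedding(input_dim=10000, output_dim=50,
                               sparse_grad=True)
embedding.initialize()

# Leave update_on_kvstore unset: explicitly passing False would raise
# because sparse gradients must be updated on the kvstore.
trainer = gluon.Trainer(embedding.collect_params(), 'sgd',
                        {'learning_rate': 0.1})

tokens = mx.nd.array([3, 17, 17, 42])
with autograd.record():
    loss = embedding(tokens).sum()
loss.backward()
trainer.step(batch_size=tokens.shape[0])

# Only rows 3, 17 and 42 of the gradient hold values.
print(embedding.weight.grad().stype)  # row_sparse
```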