
Adding inputs_embeds argument and switch to paddle.nn.TransformerEncoder for Electra models #3401

Merged: 19 commits merged into PaddlePaddle:develop on Nov 3, 2022

Conversation

@sijunhe (Collaborator) commented on Oct 1, 2022

PR types

Function optimization

PR changes

APIs

Description

Addressing part of #3382

  • Adding an inputs_embeds argument to the Electra models to provide more control over how input_ids indices
    are converted into the embedding space. This is particularly useful for use cases such as P-Tuning (see the sketch below the list).
  • Remove TransformerEncoderPro and switch to paddle.nn.TransformerEncoder
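
For illustration, a caller could compute the token embeddings itself and feed them to the model in place of input_ids, e.g. to prepend learned prompt vectors for P-Tuning. This is a minimal sketch, not code from the PR: the checkpoint name and the model.embeddings.word_embeddings path are assumptions about the PaddleNLP Electra implementation.

    import paddle
    from paddlenlp.transformers import ElectraModel, ElectraTokenizer

    # "electra-small" is used here only as an illustrative checkpoint name.
    tokenizer = ElectraTokenizer.from_pretrained("electra-small")
    model = ElectraModel.from_pretrained("electra-small")

    encoded = tokenizer("PaddleNLP is great!")
    input_ids = paddle.to_tensor([encoded["input_ids"]])

    # Look up the token embeddings manually so they can be edited (e.g. by
    # prepending learned prompt vectors) before they enter the encoder.
    inputs_embeds = model.embeddings.word_embeddings(input_ids)

    # Pass the embeddings directly; input_ids is then left as None.
    outputs = model(input_ids=None, inputs_embeds=inputs_embeds)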

@CLAassistant commented on Oct 1, 2022

CLA assistant check: all committers have signed the CLA.

@guoshengCS (Contributor) previously approved these changes on Oct 11, 2022 and left a comment:

LGTM

@wj-Mcat (Contributor) left a comment:

Your PR looks great. Let's discuss two small suggestions; I'm waiting for your comments.

Comment on lines 208 to 211
if input_ids is not None:
    input_embeddings = self.word_embeddings(input_ids)
else:
    input_embeddings = inputs_embeds
@wj-Mcat (Contributor) commented:
I think this code block can be improved to:

        if inputs_embeds is None:
            inputs_embeds = self.word_embeddings(input_ids)

and in the forward method, rename input_embeddings to inputs_embeds. That way the code reads more concisely. What do you think?

@guoshengCS (Contributor) commented:

Should the input_embeddings = self.word_embeddings(input_ids) line in the following original code be removed?

        if token_type_ids is None:
            token_type_ids = paddle.zeros_like(input_ids, dtype="int64")
        input_embeddings = self.word_embeddings(input_ids)
        position_embeddings = self.position_embeddings(position_ids)
        token_type_embeddings = self.token_type_embeddings(token_type_ids)
        embeddings = input_embeddings + position_embeddings + token_type_embeddings

@sijunhe (Collaborator, Author) replied on Oct 11, 2022:

Thanks for the review, folks.
@wj-Mcat While I agree that renaming input_embeddings to inputs_embeds makes the code more concise, it also makes it less explicit/readable, so I prefer to keep it the way it is now.
@guoshengCS Good call; I removed the redundant line of code.
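
For reference, after both comments are addressed the embedding branch presumably ends up along these lines. This is a reconstruction from the snippets quoted above, not the exact merged diff; in particular the token_type_ids handling for the input_ids=None case is an assumption.

    if input_ids is not None:
        input_embeddings = self.word_embeddings(input_ids)
    else:
        input_embeddings = inputs_embeds

    if token_type_ids is None:
        # Derive the shape from the embeddings so this also works when
        # input_ids is None (assumption, not taken from the PR).
        token_type_ids = paddle.zeros(input_embeddings.shape[:-1], dtype="int64")

    position_embeddings = self.position_embeddings(position_ids)
    token_type_embeddings = self.token_type_embeddings(token_type_ids)
    embeddings = input_embeddings + position_embeddings + token_type_embeddings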

Comment on lines 81 to 86
inputs_embeds = None
if self.use_inputs_embeds:
    inputs_embeds = floats_tensor(
        [self.batch_size, self.seq_length, self.embedding_size])
    # In order to use inputs_embeds, input_ids needs to be set to None
    input_ids = None
A reviewer (Contributor) commented:

If use_inputs_embeds is set, the prepare_config_and_inputs method should not prepare the input_ids tensor in the first place.
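
Concretely, the suggested test setup might look something like the sketch below, which prepares only one of the two tensors. The ids_tensor helper is assumed to exist alongside floats_tensor in the test utilities; the actual test code may differ.

    input_ids = None
    inputs_embeds = None
    if self.use_inputs_embeds:
        inputs_embeds = floats_tensor(
            [self.batch_size, self.seq_length, self.embedding_size])
    else:
        # Only prepare input_ids when embeddings are not supplied directly.
        input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size)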

@sijunhe (Collaborator, Author) replied:

Addressed

@sijunhe (Collaborator, Author) commented on Oct 11, 2022:

Addressed both comments. I think this PR should be ready for merging.

guoshengCS previously approved these changes on Oct 12, 2022.
@sijunhe requested a review from wj-Mcat on Oct 13, 2022.
@wj-Mcat (Contributor) commented on Oct 17, 2022:

There is another to-do under the TransformerEncoderPro class; see:

if cache is None and getattr(self, "_use_cache", False):
    cache = [tuple(self.layers[0].gen_cache(src))] * len(self.layers)
# To be compatible with `TransformerEncoder.forward`, `_use_cache` defaults
# to True when cache is not None.
new_caches = [] if cache is not None and getattr(self, "_use_cache", True) else None

You set the _use_cache attribute in ElectraModel but did not implement the handler for it, so this block should be changed. What do you think? @sijunhe

@sijunhe (Collaborator, Author) commented on Oct 17, 2022:

(Quoting wj-Mcat's comment above about the _use_cache to-do in TransformerEncoderPro.)

Good catch!
As this to-do pertains to the use of cache and past_key_values, it is outside the scope of this PR. To incorporate the change you asked for, I would need to change not only the handler but the unit tests as well. I think we should merge this PR first, and I'll create a new PR for the implementation.

Regarding the to-do: I looked at the _transformer_encoder_fwd you linked and the TransformerEncoderPro in ELECTRA, and they seem to be identical. I should be able to directly use the existing patched paddle.nn.Transformer instead of creating one for Electra, right?

@wj-Mcat (Contributor) commented on Oct 17, 2022:

There are a few things I want to tell you:

  1. The paddle.nn.Transformer* implementations don't contain the full set of features we want, which is why TransformerEncoderPro was born.
  2. _transformer_encoder_fwd and TransformerEncoderPro are actually identical and should be refactored into paddlenlp/layers/transformer.py as the TransformerEncoderPro class.

There are also some modules that use paddle.nn.TransformerEncoder, paddle.nn.TransformerEncoderLayer, paddle.nn.TransformerDecoder, and paddle.nn.TransformerDecoderLayer; you could change the related modules to make things more unified.

I would prefer that you do it in this PR. What do you think? @sijunhe @guoshengCS

@wj-Mcat (Contributor) commented on Oct 17, 2022:

To get this PR merged, you can make the changes in the TransformerEncoderPro class under the electra.modeling module. The refactoring work can be done over the next few weeks. @sijunhe

@sijunhe (Collaborator, Author) commented on Oct 17, 2022:

(Quoting wj-Mcat's comment above about making the changes in TransformerEncoderPro.)

I noticed that before #3411, TransformerEncoderPro was basically paddle.nn.TransformerEncoder without the cache input and output. With the cache functionality added in #3411 and #3401, TransformerEncoderPro is identical to paddle.nn.TransformerEncoder. Hence I think we can just use paddle.nn.TransformerEncoder in Electra.
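
In that case the Electra encoder can be built directly from the stock Paddle layers, roughly as follows. This is a sketch with illustrative electra-small-sized hyperparameters, not the exact PR diff.

    import paddle.nn as nn

    # Illustrative electra-small-style sizes.
    hidden_size = 256
    num_attention_heads = 4
    intermediate_size = 1024
    num_hidden_layers = 12

    encoder_layer = nn.TransformerEncoderLayer(
        d_model=hidden_size,
        nhead=num_attention_heads,
        dim_feedforward=intermediate_size,
        dropout=0.1,
        activation="gelu",
        attn_dropout=0.1,
        act_dropout=0,
    )
    # The stock paddle.nn.TransformerEncoder replaces the custom TransformerEncoderPro.
    encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_hidden_layers)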

@sijunhe changed the title from "Adding inputs_embeds argument for Electra models" to "Adding inputs_embeds argument and switch to paddle.nn.TransformerEncoder for Electra models" on Oct 17, 2022.
@wj-Mcat (Contributor) left a comment:

LGTM

@guoshengCS merged commit c38902e into PaddlePaddle:develop on Nov 3, 2022.
@sijunhe deleted the electra_inputs_embeds branch on Nov 3, 2022.