Fix some bugs of sequence_parallel (#746)
* Add sequence parallel strategy support.
  1. Add a sequence parallel strategy for GPTModelHybrid.
  2. Outputs have been checked layer by layer in both the forward
     and backward passes, and the loss curve over the first
     5000 steps matches the peer run.
  3. Performance improves by about 10% with the sequence_parallel
     strategy compared with the pretrain_gpt_1.3B_mp8 baseline.

* Add the sequence_parallel_utils.py file.

* Fix some bugs of sequence_parallel.
  1. Add a sequence_parallel option for GPTModel.
  2. When mp=1, the sequence_parallel option must always be set
     to False (see the sketch after this message).
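
The mp=1 guard is easy to motivate: sequence parallelism splits activations along the sequence dimension across the model-parallel group, so with a single model-parallel rank there is nothing to split and the flag would only add useless scatter/gather work. A minimal, self-contained sketch of the guard's semantics; the helper name `effective_sequence_parallel` is hypothetical, not part of the repo:

```python
def effective_sequence_parallel(requested: bool, mp_size: int) -> bool:
    """Mirror this commit's guard: with a single model-parallel rank
    (mp=1) there is no group to split the sequence across, so the
    option is forced off regardless of what the caller requested."""
    return requested and mp_size > 1

assert effective_sequence_parallel(True, mp_size=8) is True    # mp8: honored
assert effective_sequence_parallel(True, mp_size=1) is False   # single rank: forced off
assert effective_sequence_parallel(False, mp_size=8) is False  # stays opt-in
```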
GhostScreaming authored Sep 17, 2022
1 parent 12fbfd2 commit d6c186d
Showing 2 changed files with 7 additions and 2 deletions.
ppfleetx/models/language_model/gpt/dygraph/hybrid_model.py (5 additions, 0 deletions)
```diff
@@ -646,6 +646,11 @@ def __init__(self,
         self.hidden_size = hidden_size
         self.vocab_size = vocab_size
 
+        hcg = fleet.get_hybrid_communicate_group()
+        mp_size = hcg.get_model_parallel_world_size()
+        if mp_size <= 1:
+            sequence_parallel = False
+
         self.embeddings = GPTEmbeddings(
             vocab_size, hidden_size, hidden_dropout_prob,
             max_position_embeddings, type_vocab_size, self.initializer_range,
```
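
For readers unfamiliar with Paddle's hybrid-parallel API: `fleet.get_hybrid_communicate_group()` is only valid after the collective runtime has been initialized with a hybrid strategy. A hedged setup sketch follows; the degree values are assumptions chosen to mirror the mp8 baseline from the commit message, not values pinned down by this commit:

```python
import paddle.distributed.fleet as fleet

# Illustrative hybrid-parallel setup; the degrees are assumptions.
strategy = fleet.DistributedStrategy()
strategy.hybrid_configs = {
    "dp_degree": 1,  # data parallel
    "mp_degree": 8,  # model (tensor) parallel, as in pretrain_gpt_1.3B_mp8
    "pp_degree": 1,  # pipeline parallel
}
fleet.init(is_collective=True, strategy=strategy)

# The query the new guard relies on: with mp_degree=1 this returns 1,
# and sequence_parallel is forced off.
hcg = fleet.get_hybrid_communicate_group()
mp_size = hcg.get_model_parallel_world_size()
```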
ppfleetx/models/language_model/gpt/dygraph/single_model.py (2 additions, 2 deletions)
```diff
@@ -486,10 +486,10 @@ def __init__(self,
                  use_recompute=False,
                  initializer_range=0.02,
                  fused_linear=False,
-                 recompute_granularity="full"):
+                 recompute_granularity="full",
+                 sequence_parallel=False):
-
         super(GPTModel, self).__init__()
 
         self.initializer_range = initializer_range
         self.hidden_size = hidden_size
         self.vocab_size = vocab_size
```
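
A hedged usage sketch of the new keyword: only the keyword arguments visible in this diff are pinned down, the rest of GPTModel's signature is left to the caller, and `make_gpt` is a hypothetical helper, not repo code:

```python
from functools import partial

from ppfleetx.models.language_model.gpt.dygraph.single_model import GPTModel

# Pre-bind only the kwargs this diff confirms; the remaining arguments
# (vocab_size, hidden_size, ...) are supplied at call time.
make_gpt = partial(
    GPTModel,
    use_recompute=False,
    initializer_range=0.02,
    fused_linear=False,
    recompute_granularity="full",
    sequence_parallel=False,  # must stay False when mp=1 (this commit's fix)
)
```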
