Add Pipeline Parallel for PPO training and support generation with InferenceModel #7953

guoshengCS · 2024-02-02T08:05:58Z

PR types

New features

PR changes

Others

Description

Add Pipeline Parallel for PPO training and support generation with InferenceModel

codecov · 2024-02-02T08:46:10Z

Codecov Report

Attention: Patch coverage is 30.55556% with 25 lines in your changes missing coverage. Please review.

Project coverage is 54.41%. Comparing base (b36b6a0) to head (f1e66f2).
Report is 290 commits behind head on develop.

Files with missing lines	Patch %	Lines
paddlenlp/utils/nested.py	11.11%	16 Missing ⚠️
paddlenlp/transformers/llama/modeling_pp.py	28.57%	5 Missing ⚠️
paddlenlp/generation/utils.py	33.33%	2 Missing ⚠️
paddlenlp/trainer/plugins/timer.py	0.00%	1 Missing ⚠️
paddlenlp/transformers/model_utils.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #7953      +/-   ##
===========================================
+ Coverage    54.29%   54.41%   +0.12%     
===========================================
  Files          617      632      +15     
  Lines        96339    99476    +3137     
===========================================
+ Hits         52310    54134    +1824     
- Misses       44029    45342    +1313


CLAassistant · 2024-06-12T02:26:46Z

All committers have signed the CLA.

gongel · 2024-06-12T04:50:02Z

paddlenlp/generation/utils.py

+            # and sampling, and then broadcast to avoid broadcast logits.
+            if hasattr(self, "pp_group"):
+                paddle.distributed.broadcast(
+                    next_tokens, src=self.pp_group.ranks[0], group=self.pp_group  # use rank 0 for same seed to check


From the CI run, it looks like self.pp_group can be None here, which makes .ranks[0] raise an error.
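
The concern in the comment above is that `hasattr(self, "pp_group")` is still true when the attribute exists but holds `None`. A minimal sketch of a stricter guard follows. It is not necessarily the fix adopted in this PR; `pp_broadcast_src` and the `SimpleNamespace` stand-ins are hypothetical names used only for illustration, and no Paddle calls are made.

```python
from types import SimpleNamespace


def pp_broadcast_src(model):
    """Return the source rank for the pipeline-parallel broadcast, or None to skip it.

    hasattr(model, "pp_group") is True even when the attribute is set to None,
    so getattr with a None default covers both "missing" and "set to None".
    """
    pp_group = getattr(model, "pp_group", None)
    if pp_group is None:
        return None
    return pp_group.ranks[0]


# Hypothetical stand-ins for a model object; a real pp_group would be a
# communication group exposing a `ranks` list.
no_attr = SimpleNamespace()
none_group = SimpleNamespace(pp_group=None)
real_group = SimpleNamespace(pp_group=SimpleNamespace(ranks=[4, 5, 6, 7]))

for name, obj in [("no_attr", no_attr), ("none_group", none_group), ("real_group", real_group)]:
    print(name, pp_broadcast_src(obj))
```

In the generation code, the broadcast of `next_tokens` would then run only when the returned source rank is not `None`.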

wawltor

LGTM

guoshengCS added 2 commits February 2, 2024 08:03

Add Pipeline Parallel for PPO training.

c71fb92

Move new_ppo_trainer.py to ppo_trainer.py

2f2ad5e

guoshengCS and others added 7 commits February 4, 2024 06:04

Fix padding among batches of accumulation steps in _prepare_pipeline_…

8e8143e

…inputs_func.

Fix hcg using in TP generation

e4d7781

Try to support generation in PP. And allow extra training args passed…

4d1641b

… from main from_pratrined.

Support PP generation.

34d4cd1

Fix PP eval by unify prediction_step

665fee2

Fix reward value showing error cased by BF16 dtype when eval

a2e9702

fix all

6c8441c

guoshengCS requested review from ZHUI and wawltor February 22, 2024 07:59

guoshengCS and others added 18 commits February 22, 2024 08:11

Make non-PipelineParallel models use the same loss layer with PipeMod…

d295d11

…el to unify.

add offload.

38cc1a7

Use create_loss to unify Pipe and non-Pipe usage.

6ff38c8

Add eval mode and offload level.

6e49431

merge

c421af7

fix all

f6b5f97

support tp+pp

63df4fd

fix data split.

c9e5cad

Fix position_ids in generation/eval/train.

5979507

fix data group.

16d886a

add tp rank guard

1786357

Support rollout label data both with target length or source+target l…

3bc48cb

…ength.

Merge remote-tracking branch 'guosheng/ppo-4d' into ppo-4d/support_uc

5ae2c6f

Move metric calculation to rl_step to avoid comm.

1b50869

fix pad

bc80256

Merge remote-tracking branch 'guosheng/ppo-4d' into ppo-4d/support_uc

986b407

fix create group.

b3f22c2

no print

8c7e612

Open PolicyTrainer loss logging postprocess. More StepTrainer docs.

860e61d

guoshengCS changed the title ~~Add Pipeline Parallel for PPO training.~~ Add Pipeline Parallel for PPO training and support generation with InferenceModel Mar 13, 2024

ZHUI and others added 17 commits March 15, 2024 15:31

more timer.

afa1b53

Merge remote-tracking branch 'guosheng/ppo-4d' into ppo-4d/support_uc

80e47e6

fix bugs.

757d3a7

Merge pull request #1 from PaddlePaddle/ppo-4d/support_uc

6f2eff6

PPP 4d/support uc

Add EMA and PPOMetric

1448b73

add tests

b809631

add unit test for rank guard.

2f8d032

Merge pull request #2 from ZHUI/ppo-4d-test

edf28f2

Add ppo 4d test

Fix reshard zero3 and reshard infer.

fbb9ac3

Merge branch 'ppo-4d' of https://github.com/guoshengCS/PaddleNLP into…

aebdc89

… ppo-4d-reshard

Revert PaddlePaddle#7818 for llama and remove position_ids for gen/tr…

cb6e4ff

…ain/eval to align.

Move reload/clean/data_group to comm_utils and use guard to decorate …

4ddd415

…them.

Offload sync and other data reuse fix.

b68cb0d

Clead code

5e46ab6

Update README

d538917

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…

510ef03

…nto ppo-4d

Update ppo_trainer

1feb5ee

gongel marked this pull request as ready for review June 11, 2024 03:19

gongel and others added 3 commits June 11, 2024 09:02

format code

c8b3c61

Fix make_position_ids by 4d causal mask.

c26583f

Merge pull request #4 from gongel/ppo-4d

1b8e4a3

[PPO] Format code

guoshengCS added 2 commits June 12, 2024 02:29

Merge branch 'ppo-4d' of https://github.com/guoshengCS/PaddleNLP into…

9acd87c

… ppo-4d-eb

Fix nested_broadcast_tensor_with_empty import

ffa4658

gongel reviewed Jun 12, 2024

Update eval with make_attention_mask

f1e66f2

wawltor approved these changes Jun 13, 2024

wawltor merged commit bf6d4e7 into PaddlePaddle:develop Jun 13, 2024
7 of 12 checks passed
