GQA optimization for TP #498
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Very hard to review in detail as it is a very complex pull request with many changes.
It looks good at first glance, but I have one question: you seem to sometimes use Tensor.copy_() (which copies data into an existing tensor in place) and sometimes Tensor.clone() without detach() (so the result is still attached to the old tensor's autograd graph). Is it on purpose?
We use
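For context, a minimal sketch of the distinction the reviewer is asking about (illustrative only, not code from the PR):

import torch

src = torch.arange(3.0, requires_grad=True)

# Tensor.copy_() copies data *into* an existing tensor, in place;
# the destination's storage is reused.
with torch.no_grad():
    dst = torch.empty(3)
    dst.copy_(src)

# Tensor.clone() returns a new tensor that is still attached to
# src's autograd graph: gradients flow back to src.
attached = src.clone()
print(attached.requires_grad)  # True

# clone() + detach() gives a new tensor with the same data but
# no connection to the autograd graph.
detached = src.clone().detach()
print(detached.requires_grad)  # False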
@@ -422,7 +422,8 @@ def _prepare_model_for_mp(
cpu_ids = {name: id(param) for name, param in model.named_parameters()}
tied_parameters_dict = get_tied_parameters_dict(model)
model_main_input_name = getattr(model, "main_input_name", None)
model = self.state.mp_plugin.parallelize_model(model, device=self.device)
# TODO: use self.device.
You changed that back as well... Is it because the tests were failing?
Yes, and since it is not related to this PR, I will work on that in another PR.
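For readers unfamiliar with the deferred change, a hypothetical sketch of what initializing a parallel layer directly on the xla device could look like (assumes a torch_xla environment; the layer shape is made up, and this is not the PR's actual code):

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
# The weight is allocated on the XLA device directly, so no full-size
# copy needs to be materialized in host memory first.
layer = torch.nn.Linear(4096, 4096, device=device)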
LGTM, thanks for adding the feature. I'm not familiar with this feature, so I don't have much meaningful advice to offer...
@@ -435,6 +436,11 @@ def _prepare_model_for_mp(
else:
    model_to_cast = model

# Update CPU ids
original_parameter_names_to_gqa_qkv_names = model._gqa_qkv_metadata["original_names_to_gqa_qkv_names"]
for key in list(cpu_ids.keys()):
Suggested change:
-for key in list(cpu_ids.keys()):
+for key in cpu_ids.keys():
No, because I update the keys inside the for loop, so I want to iterate over a copy of the initial keys.
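A minimal illustration of why the snapshot is needed (the dictionary contents here are hypothetical):

# Mutating a dict while iterating over its live keys view raises
# "RuntimeError: dictionary changed size during iteration".
cpu_ids = {"q_proj.weight": 1, "k_proj.weight": 2}

# Unsafe:
#     for key in cpu_ids.keys():
#         cpu_ids["new." + key] = cpu_ids.pop(key)

# Safe: list(...) snapshots the keys before the loop mutates the dict.
for key in list(cpu_ids.keys()):
    cpu_ids["new." + key] = cpu_ids.pop(key)

print(cpu_ids)  # {'new.q_proj.weight': 1, 'new.k_proj.weight': 2}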
assert query_or_output in ["query", "output"]
assert full_weight.device == torch.device("cpu")
Maybe raise an exception with more information?
I do not think it's needed as these are quite low-level functions.
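For comparison, a sketch of what "raise with information" could look like here; _check_args is a hypothetical helper, not part of the PR:

import torch

def _check_args(query_or_output: str, full_weight: torch.Tensor) -> None:
    # Same checks as the asserts above, rewritten to raise with context.
    if query_or_output not in ("query", "output"):
        raise ValueError(
            f'query_or_output must be "query" or "output", got {query_or_output!r}.'
        )
    if full_weight.device != torch.device("cpu"):
        raise ValueError(
            f"Expected full_weight on CPU, but it is on {full_weight.device}."
        )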
What does this PR do?
- Adds GQAQKVColumnParallelLinear, which makes it possible to have tp_size >>> num_key_value_heads.
- Initialize the parallel layers directly on the xla device to save host memory: deferred to another PR.
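To make the constraint concrete, a small sketch of the arithmetic behind it (under the assumption that KV heads are replicated when tp_size exceeds num_key_value_heads; kv_replication_factor is illustrative, not the PR's API):

# With plain column parallelism each TP rank must own at least one KV
# head, so tp_size <= num_key_value_heads. Replicating KV heads lifts
# that limit and allows tp_size >>> num_key_value_heads.
def kv_replication_factor(tp_size: int, num_key_value_heads: int) -> int:
    """Number of copies of each KV head needed so that every TP rank
    holds the same number of (possibly replicated) KV heads."""
    if tp_size <= num_key_value_heads:
        if num_key_value_heads % tp_size != 0:
            raise ValueError("num_key_value_heads must be divisible by tp_size")
        return 1
    if tp_size % num_key_value_heads != 0:
        raise ValueError("tp_size must be a multiple of num_key_value_heads")
    return tp_size // num_key_value_heads

# Example: 8 KV heads sharded over a 32-way TP group -> each KV head
# is replicated 4 times.
print(kv_replication_factor(tp_size=32, num_key_value_heads=8))  # 4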