
Merge LoCo with Zero++ #6730

Merged: 22 commits into microsoft:master on Dec 10, 2024
Conversation

XingyuXie
Contributor

Integration of LoCo Method into ZeRO++

Overview

This PR introduces the integration of the LoCo method, as outlined in this paper, into the ZeRO++ framework of DeepSpeed. The key enhancement involves applying error feedback compensation to 4-bit gradients before communication. This approach improves pre-training loss outcomes without additional time overhead, though it requires extra GPU memory. The extent of this memory increase depends on model size and training configuration.
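
To make the mechanism concrete, here is a minimal PyTorch sketch of error feedback wrapped around a 4-bit gradient quantizer. This is an illustration only, not the PR's CUDA implementation; `quantize_4bit` and `LocoCompensator` are names invented for this sketch.

```python
import torch

def quantize_4bit(t: torch.Tensor, group_size: int = 128) -> torch.Tensor:
    """Illustrative symmetric per-group 4-bit fake quantization.
    Assumes t.numel() is divisible by group_size."""
    flat = t.reshape(-1, group_size)
    scale = flat.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(flat / scale), -8, 7)
    return (q * scale).reshape(t.shape)  # dequantized values

class LocoCompensator:
    """Per-parameter error buffer (hypothetical name, not the PR's API)."""

    def __init__(self):
        self.err = {}

    def compensate(self, name: str, grad: torch.Tensor) -> torch.Tensor:
        # Fold the previous step's quantization error into the gradient
        # before quantizing, then store the new residual for the next step.
        e = self.err.get(name, torch.zeros_like(grad))
        compensated = grad + e
        q = quantize_4bit(compensated)
        self.err[name] = compensated - q
        return q  # what would be communicated
```

The error buffer is also the source of the extra GPU memory mentioned above: it holds one residual tensor per compensated gradient.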

Experimental Results

We conducted pre-training experiments using the Llama2 architecture, adjusting the number of layers and hidden size. The experiments included:

  • A smaller-scale model with 0.8B parameters trained on 30B tokens.
  • A larger-scale model with 8B parameters trained on 5B tokens.

The training data was sampled from Redpajama-V2.

Findings:

  • Smaller Models (0.8B parameters): Significant gains were observed when applying the LoCo method.
  • Larger Models (8B parameters): The gains were present but less pronounced. This could be due to:
    1. The relatively small training-data volume (5B tokens) for a model of this size.
    2. The lower pre-training loss of larger models, which leaves less headroom for large improvements.

However, even a smaller pre-training loss gap in larger models can translate to meaningful gains in downstream tasks.

Example Script

For reference, the run.sh script used for the 8B-parameter, 5B-token experiment is attached. The experiment was conducted on the DeepSpeed-Megatron platform.
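
The attached script is not reproduced in this thread. As a rough orientation only, a DeepSpeed config enabling ZeRO++ quantized gradient communication might look like the sketch below; the LoCo-specific field name and its defaults are assumptions on my part and should be checked against the merged documentation.

```python
# Sketch of a DeepSpeed config dict. "zero_quantized_gradients" is the
# existing ZeRO++ 4-bit gradient-communication switch; "zeropp_loco_param"
# and its keys are ASSUMED here based on this PR and may differ in the
# merged code.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "zero_quantized_gradients": True,   # quantized gradient communication
        "zeropp_loco_param": {              # assumed LoCo knobs
            "err_beta": 0.8,                # decay applied to the error buffer
            "reset_T": 1024,                # steps between error resets
        },
    },
}
# Typical usage (requires deepspeed):
# model_engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config)
```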

Acknowledgments

Special thanks to @GuanhuaWang for ongoing communication and guidance throughout this work.


We appreciate your consideration of this PR and welcome any feedback or questions!

microsoft-github-policy-service (bot)

@XingyuXie please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
    @microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
    @microsoft-github-policy-service agree company="Microsoft"

@XingyuXie
Contributor Author

@microsoft-github-policy-service agree

@loadams loadams requested review from GuanhuaWang and removed request for awan-10 November 12, 2024 14:48
@GuanhuaWang (Member) left a comment

@XingyuXie thanks for this effort.

Overall looks good to me, just left a few comments.

Review threads (since resolved) on: deepspeed/runtime/zero/config.py, deepspeed/runtime/zero/stage3.py, csrc/includes/quantization_utils.h
@XingyuXie XingyuXie requested a review from hwchen2017 November 19, 2024 20:59
@XingyuXie XingyuXie requested a review from hwchen2017 November 20, 2024 17:17
@XingyuXie XingyuXie requested a review from loadams as a code owner November 27, 2024 09:22
@XingyuXie
Contributor Author

XingyuXie commented Nov 27, 2024

As requested by @GuanhuaWang, we added unit-test code to verify the logic of the CUDA kernels used in LoCo-ZeRO++. A toy illustration of the property such a test can check is sketched below.
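
This is not the PR's actual test (which exercises the CUDA kernels); it is a self-contained toy check of the error-feedback identity, using the same illustrative quantizer as the earlier sketch: the gap between the true gradient sum and the communicated sum telescopes down to the final error buffer.

```python
import torch

def quantize_4bit(t: torch.Tensor, group_size: int = 128) -> torch.Tensor:
    # Same illustrative per-group 4-bit fake quantizer as in the overview sketch.
    flat = t.reshape(-1, group_size)
    scale = flat.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
    return (torch.clamp(torch.round(flat / scale), -8, 7) * scale).reshape(t.shape)

def test_error_feedback_identity():
    torch.manual_seed(0)
    err = torch.zeros(1024)
    true_sum = torch.zeros(1024)
    sent_sum = torch.zeros(1024)
    for _ in range(200):
        g = torch.randn(1024)
        compensated = g + err          # fold in last step's residual
        q = quantize_4bit(compensated) # what would be communicated
        err = compensated - q          # new residual
        true_sum += g
        sent_sum += q
    # Telescoping: sum(sent) = sum(true) - final error, so the gap equals
    # the last error buffer exactly (up to float accumulation error).
    assert torch.allclose(true_sum - sent_sum, err, atol=1e-3)

test_error_feedback_identity()
```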

@GuanhuaWang
Member

Thanks @XingyuXie for the PR updates on the unit tests. Overall, it looks good to me.

cc @tjruwase @hwchen2017

@loadams loadams enabled auto-merge December 6, 2024 22:29
@loadams loadams added this pull request to the merge queue Dec 10, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 10, 2024
@loadams loadams added this pull request to the merge queue Dec 10, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 10, 2024
@loadams loadams merged commit 1b58ba5 into microsoft:master Dec 10, 2024
13 checks passed