fix 3.1 rope init for compile #1544
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1544
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 3d25ee0 with merge base df29d8a.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Neat!
with torch.device(x.device):
    self.rope_init()

raise RuntimeError(
    "RoPE cache is not built. Please call rope_init() first."
)
this feels a bit inconvenient. Do we have to raise the error?

I also think that we should update the other positional embedding to follow the same pattern. They don't have is_cache_built. In the recipes, we should add a check (see the sketch below):

if hasattr(m, "rope_init") and not m.is_cache_built

currently it is just:

if hasattr(m, "rope_init")
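A minimal sketch of the recipe-side check being proposed here, assuming a `model.modules()` traversal and the `is_cache_built` flag from the discussion above; the helper name is hypothetical, and the `getattr` default is an assumption to cover positional embeddings that don't expose the flag:

```python
import torch.nn as nn

def reinit_rope_caches(model: nn.Module) -> None:
    """Re-run RoPE init only where the cache wasn't built during __init__
    (e.g. because the module was constructed on meta device)."""
    for m in model.modules():
        # Modules without an is_cache_built flag are treated as not built,
        # which matches the existing unconditional hasattr check.
        if hasattr(m, "rope_init") and not getattr(m, "is_cache_built", False):
            m.rope_init()
```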
> this feels a bit inconvenient. Do we have to raise the error?

I think it's safest. Was talking with @joecummings about this a bit; previously we had an issue where RoPE was not getting initialized and so there were garbage values left over from the to_empty call. This was extremely hard to debug (Joe can tell you all about that one).

> I also think that we should update the other positional embedding to follow the same pattern.

Updating the other one is fine, but I might save that for now since it has a larger blast radius and isn't blocking anything like this is.

> if hasattr(m, "rope_init") and m.is_cache_built

I think it'd be not m.is_cache_built, right? Either way I don't think there's any real cost to calling it twice. But I plan to move this init logic into a common util soon, so we can make the change as part of that move.
Currently our Llama 3.1 doesn't work with compile because we init the RoPE cache in the first forward. For our single-device recipe we can init the cache as part of `__init__` no problem, but for our distributed recipe we load on meta device. In that case we cannot init the cache in `__init__` because the RoPE scaling factors require data to exist in intermediate tensors (which they won't on meta device). So our distributed recipes currently initialize the RoPE cache directly in the recipe here, after sharding the model.

So we just need a way to skip calling `rope_init` from a meta device context. Actually we've solved this problem hackily before, see e.g. here. This change is basically identical to that, but this time `rope_init` always gets called. Then once the frequency tensor is constructed, we break out early if it's on meta device and rely on the recipe to manually call `rope_init` after FSDP sharding.
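For concreteness, a hedged sketch of the shape of this change, assuming a simplified RoPE module; the real torchtune `Llama3ScaledRoPE` differs in its scaling math and buffer handling, and only the early meta-device return, the `is_cache_built` flag, and the forward-pass error are taken from this PR discussion:

```python
import torch
from torch import nn

class ScaledRoPESketch(nn.Module):
    """Simplified illustration of the init pattern; not the torchtune implementation."""

    def __init__(self, dim: int, max_seq_len: int = 8192, base: float = 500_000.0):
        super().__init__()
        self.dim = dim
        self.max_seq_len = max_seq_len
        self.base = base
        self.is_cache_built = False
        # rope_init always runs here; on meta device it bails out early below.
        self.rope_init()

    def rope_init(self) -> None:
        theta = 1.0 / (
            self.base ** (torch.arange(0, self.dim, 2).float() / self.dim)
        )
        # The Llama 3.1 frequency scaling needs real values in this intermediate
        # tensor, which don't exist on meta device, so break out early and rely
        # on the recipe to call rope_init() again after FSDP sharding.
        if theta.is_meta:
            return
        # (Llama 3.1 scaling of theta would happen here.)
        seq_idx = torch.arange(self.max_seq_len, dtype=theta.dtype, device=theta.device)
        idx_theta = torch.einsum("i,j->ij", seq_idx, theta)
        cache = torch.stack([torch.cos(idx_theta), torch.sin(idx_theta)], dim=-1)
        self.register_buffer("cache", cache, persistent=False)
        self.is_cache_built = True

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fail loudly instead of silently using uninitialized (garbage) values
        # left over from to_empty().
        if not self.is_cache_built:
            raise RuntimeError(
                "RoPE cache is not built. Please call rope_init() first."
            )
        # ... apply rotary embeddings to x using self.cache ...
        return x
```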
Test plan

Apart from green CI, checking that loss curves are the same on a few different configs:
Single-device LoRA 8B
Distributed LoRA 8B
Single-device FFT 8B