Test different input sequence lengths for Llama #1070
base: main
Conversation
    ],
)
@pytest.mark.parametrize("seq_len", [1, 2, 4, 7, 8, 16, 28, 32, 63, 64, 99, 117, 128, 256, 341, 512, 1024, 1790, 2048])
@pytest.mark.skip(reason="No need to run in CI as it takes a long time to run.")
My recommendation is to choose which of these will be part of the training focus, instead of skipping the test entirely.
E.g. if we're going to focus on training a 2048-seq-len model, let's fully compile and run that variant alone as part of push CI. A sketch of what that could look like is below.
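A minimal sketch of narrowing the test to a single training-focus variant, assuming a hypothetical `push` marker is used to select tests for push CI (the marker name and test name are illustrative, not from this PR):

```python
import pytest

# Sketch: run only the sequence length chosen for training instead of
# skipping the whole test. The "push" marker name is an assumption.
@pytest.mark.push
@pytest.mark.parametrize("seq_len", [2048])
def test_llama_training_seq_len(seq_len):
    # Compile and run the fwd pass for the single seq len we plan to train with.
    ...
```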
That's right - understanding which sequence lengths are relevant for Llama finetuning is one of the training team's tasks.
Once we establish which set of seq lengths is needed, we will continue with PCC tests and run them as part of CI.
Agreed as well; I will update the seq_len parameters with the ones required for training once we choose them (we will run some experiments separately).
This is updated to use only the dim sizes we care about. Additionally, I set up only one hidden layer for the test to speed it up (I also ran the full-model test locally to make sure it passes). A sketch of the single-layer setup is below.
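A sketch of the single-hidden-layer setup, assuming the Hugging Face `transformers` Llama classes (the checkpoint name is illustrative, not necessarily the one used in the test):

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Load the full config, shrink the decoder depth, then build the model from it.
config = LlamaConfig.from_pretrained("openlm-research/open_llama_3b")  # illustrative checkpoint
config.num_hidden_layers = 1  # one decoder layer is enough for a compile/fwd smoke test
framework_model = LlamaForCausalLM(config)  # full-depth run is still done locally
```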
    input_ids = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids

    # Compile the model and run fwd pass
    compiled_model = forge.compile(framework_model, input_ids)
Do we want to test the bwd compile/run as well?
One general question: is there a clean way to test the backward part of a graph in isolation? For example, our compile should return a compiled context that contains information about each compiled component (e.g. fwd, bwd, loss, etc.).
Therefore, is there a clean way to call just the bwd part of the graph with random inputs, without needing to run the forward part or initialize the loss and optimizer parts of the training workflow? A rough sketch of the idea is below.
Note: this is not a requirement for this PR, just a general question that can be useful here as well. I.e. can we have granular tests that target specific functionality (only the bwd part of the model) rather than the whole workflow? I see this as especially useful for the bwd generality push in the future. cc @vladimirjovanovicTT
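To make the question concrete, a granular bwd-only test could look roughly like the sketch below; `bwd_input_shapes` and `run_backward` are hypothetical names for what the compiled context might expose, not existing forge API:

```python
import torch
import forge

# Sketch only: every attribute/method name below is hypothetical and just
# illustrates driving the compiled bwd graph directly with random gradients.
compiled = forge.compile(framework_model, input_ids)  # as in the test above

# Hypothetical accessors on the compiled context: query the bwd component's
# expected input shapes, feed it random incoming gradients, and run it without
# a fwd pass, loss computation, or optimizer setup.
incoming_grads = [torch.randn(shape) for shape in compiled.bwd_input_shapes]
grads = compiled.run_backward(incoming_grads)
```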
I think this is a must-have functionality as part of our training generality/BFS effort.
Let's discuss the implementation details offline.
Force-pushed from 5b45490 to 710afb4
Add a test to make sure Llama compiles and runs the fwd pass with different input sequence lengths, since we will have inputs of various lengths during training.
Close #1071
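For reference, the added test follows roughly this shape (a simplified sketch; the checkpoint name, the seq_len subset, and the direct call on the compiled model are assumptions, not the exact test code):

```python
import pytest
import forge
from transformers import LlamaConfig, LlamaForCausalLM, LlamaTokenizer


@pytest.mark.parametrize("seq_len", [128, 512, 2048])  # illustrative subset
def test_llama_input_seq_lengths(seq_len):
    model_path = "openlm-research/open_llama_3b"  # illustrative checkpoint
    config = LlamaConfig.from_pretrained(model_path)
    config.num_hidden_layers = 1  # single decoder layer keeps CI runtime low
    framework_model = LlamaForCausalLM(config)

    tokenizer = LlamaTokenizer.from_pretrained(model_path)
    tokenizer.pad_token = tokenizer.eos_token

    prompt = "Q: What is the tallest mountain on Earth?\nA:"
    input_ids = tokenizer(
        prompt,
        padding="max_length",
        max_length=seq_len,
        truncation=True,
        return_tensors="pt",
    ).input_ids

    # Compile the model and run the fwd pass for this sequence length.
    compiled_model = forge.compile(framework_model, input_ids)
    compiled_model(input_ids)  # assumes the compiled model is directly callable
```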