Add test without gradient accumulation #841
Conversation
pred = tt_model(data)[0]
golden_pred = framework_model(data)
assert golden_pred.dtype == dtype
assert compare_with_golden(golden_pred, pred, verify_cfg=VerifyConfig(pcc=0.95))
After this PR, there is a unified method for comparing the golden output with the output of the compiled model: link to verify. It calls this same compare_with_golden, but also performs dataformat and some other sanity checks...
In the PR above you can find examples of its use in tests. It makes things more concise...
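For illustration, a call in this test might look roughly like the sketch below; the import paths and argument order are assumptions based on the description above, not taken from the linked PR.

from forge.verify.config import VerifyConfig   # import paths are assumed
from forge.verify.verify import verify

# Runs both models on the same inputs, compares outputs via compare_with_golden,
# and additionally checks dataformats and other sanity conditions.
verify([data], framework_model, tt_model, VerifyConfig(pcc=0.95))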
I need the predictions from the models since I am calculating the loss after this. Should I change the verify method to make it return the predictions?
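For context, the pattern being discussed looks roughly like this; loss_fn and target are placeholder names, not taken from the PR.

pred = tt_model(data)[0]
golden_pred = framework_model(data)
assert compare_with_golden(golden_pred, pred, verify_cfg=VerifyConfig(pcc=0.95))

# The prediction is still needed after the comparison, so a helper that only
# asserts and returns nothing would not be enough here.
loss = loss_fn(pred, target)   # loss_fn and target are hypothetical names
loss.backward()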
Verify is under development at the moment (by @vkovinicTT), so I wouldn't change it. Let's leave compare_with_golden then...
@vkovinicTT, did you plan on adding a return for the predictions any time soon?
Let's not diverge and extend the scope of the planned changes for the verify refactor yet. There aren't that many training tests, so it's not a big problem at the moment IMO...
I didn't plan on extending it, but I can do that if needed?
@@ -21,6 +21,92 @@
def test_mnist_training():
    torch.manual_seed(0)

    # Model and data type. To use bfloat16, follow the instructions found at test_forge_vs_torch() in this file.
test_forge_vs_torch is moving into tt-thomas, so it makes sense to copy the instructions here.
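As a rough illustration of what such instructions usually amount to (the real ones live in test_forge_vs_torch(), so every line below is an assumption): cast the framework model and the inputs to bfloat16 before compiling, and keep the dtype assertion in sync.

import torch

dtype = torch.bfloat16                       # assumed: the dtype the test switches to
framework_model = framework_model.to(dtype)  # cast model weights to bfloat16
data = data.to(dtype)                        # cast inputs to match
# The compiled model is then produced from the bfloat16 framework model,
# and the existing "assert golden_pred.dtype == dtype" check still applies.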
Makes sense, done here: de73a93
Also, I am not sure whether we should keep the old test. We should probably test the gradient accumulation part differently (sort of like a unit test)...
    learning_rate = 0.001

    # Limit number of batches to run - quicker test
    limit_num_batches = 1000
How about removing this "parameter" and just running this with a higher batch size, e.g. 2048?
MNIST has 60k inputs, so with a large batch size we would never reach limit_num_batches (at a batch size of 2048 that is only about 29 batches per epoch). I agree that limit_num_batches can be removed.
Done here: ae7bee0
I agree that it should be reworked, but maybe in a different PR? We can create an issue for this.
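For reference, the kind of unit test suggested above could look like the sketch below: accumulate gradients over several micro-batches and check that they match the gradients from one full batch. It is written against plain PyTorch, and the tiny model, shapes, and tolerance are made up for the example.

import torch
import torch.nn as nn


def test_gradient_accumulation_matches_full_batch():
    torch.manual_seed(0)
    model = nn.Linear(10, 2)                    # hypothetical tiny model
    data = torch.randn(8, 10)
    target = torch.randn(8, 2)
    loss_fn = nn.MSELoss(reduction="mean")

    # Reference gradients from a single full-batch backward pass.
    model.zero_grad()
    loss_fn(model(data), target).backward()
    full_grads = [p.grad.clone() for p in model.parameters()]

    # Accumulate gradients over 4 equally sized micro-batches.
    model.zero_grad()
    for chunk, tgt in zip(data.chunk(4), target.chunk(4)):
        # Scale each micro-batch loss so the accumulated mean matches the full batch.
        (loss_fn(model(chunk), tgt) / 4).backward()
    accum_grads = [p.grad for p in model.parameters()]

    for g_full, g_accum in zip(full_grads, accum_grads):
        assert torch.allclose(g_full, g_accum, atol=1e-6)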
Adds a new test that does standard batched training without gradient accumulation.
Renames the existing test to specify that it uses gradient accumulation.
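For readers unfamiliar with the distinction, the two training-loop styles differ roughly as sketched below (plain PyTorch, with all names and hyperparameters made up for the illustration): the new test does one optimizer step per batch, while the renamed test steps only after accumulating gradients over several batches.

import torch
import torch.nn as nn

# Illustrative setup; model, data, and hyperparameters are placeholders.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()
loader = [(torch.randn(4, 10), torch.randn(4, 2)) for _ in range(8)]
accumulation_steps = 4

# Standard batched training (the new test): one optimizer step per batch.
for data, target in loader:
    optimizer.zero_grad()
    loss_fn(model(data), target).backward()
    optimizer.step()

# Gradient accumulation (the renamed test): step once every accumulation_steps batches.
optimizer.zero_grad()
for i, (data, target) in enumerate(loader):
    (loss_fn(model(data), target) / accumulation_steps).backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()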
Closes #806