
Post-init model patching fix #280

Merged: 11 commits merged into linkedin:main on Sep 30, 2024
Conversation

shimizust (Collaborator)

Summary

  • Previously, the pre-trained weights were not being loaded if patching the model post-initialization

  • Instead of loading weights, just patch the model instance module's forward method (see linkedin#279)

Testing Done

  • In convergence tests, check that pre-init patching and post-init patching match results from the original model

  • Hardware Type: A100

  • [x] run make test to ensure correctness

  • [x] run make checkstyle to ensure code style

  • [ ] run make test-convergence to ensure convergence: most tests pass; the remaining tests are waiting on other fixes
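The convergence check described above can be sketched in miniature. The class and function names below are hypothetical stand-ins, not Liger-Kernel APIs; the point is that pre-init patching (swapping forward on the class before construction) and post-init patching (rebinding forward on an existing instance) should both reproduce the original model's output.

```python
import types

class TinyModel:
    """Hypothetical stand-in for a HF module with pre-trained weights."""
    def __init__(self):
        self.scale = 2.0  # stands in for loaded pre-trained weights

    def forward(self, xs):
        return [self.scale * x for x in xs]

def patched_forward(self, xs):
    # Stand-in for an optimized kernel: numerically equivalent output.
    return [self.scale * x for x in xs]

xs = [1.0, 2.0, 3.0]
reference = TinyModel().forward(xs)        # original, unpatched model

# Pre-init patching: swap forward on the class before instantiation.
PatchedTinyModel = type("PatchedTinyModel", (TinyModel,), {"forward": patched_forward})
pre_init = PatchedTinyModel().forward(xs)

# Post-init patching: instantiate first, then rebind on the instance,
# so the already-loaded state (self.scale) is untouched.
m = TinyModel()
m.forward = types.MethodType(patched_forward, m)
post_init = m.forward(xs)

# The convergence test asserts all three paths agree.
assert reference == pre_init == post_init
```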

lancerts previously approved these changes Sep 29, 2024
@ByronHsu ByronHsu mentioned this pull request Sep 30, 2024

shimizust commented Sep 30, 2024

@ByronHsu Confirmed all tests pass locally on A100 with transformers 4.44.2; feel free to force merge

@ByronHsu ByronHsu merged commit f2b288c into linkedin:main Sep 30, 2024
1 of 2 checks passed
tyler-romero pushed a commit to tyler-romero/Liger-Kernel that referenced this pull request Oct 1, 2024
## Summary
- Previously, the pre-trained weights were not being loaded if patching the model post-initialization
- Instead of loading weights, just patch the model instance module's forward method (see linkedin#279)

## Testing Done
- In convergence tests, check that pre-init patching and post-init patching match results from the original model

- Hardware Type: A100
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence: most tests pass; the remaining tests are waiting on other fixes
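The fix described in the summary — patching the instance's forward method instead of reloading weights — can be illustrated with a minimal sketch. The `Block` and `fast_forward` names below are illustrative stand-ins, not the actual Liger-Kernel patching API; the key property is that rebinding `forward` on the instance leaves the already-loaded state intact.

```python
import types

class Block:
    """Hypothetical stand-in for a transformer block."""
    def __init__(self, weight):
        self.weight = weight  # pretend this was loaded from a checkpoint

    def forward(self, x):
        return self.weight * x

def fast_forward(self, x):
    # Stand-in for a fused-kernel forward with identical semantics.
    return self.weight * x

block = Block(weight=3)        # "post-init": weights already loaded
before = block.forward(7)

# Re-instantiating the module would risk dropping the loaded weights;
# instead, rebind the forward method on this instance only.
block.forward = types.MethodType(fast_forward, block)
after = block.forward(7)

assert before == after         # same numerics before and after patching
assert block.weight == 3       # loaded state untouched
```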