GNS Calculation at Transformer Model #1

Open
HariSeldon11988 opened this issue Jul 29, 2024 · 2 comments

Comments

@HariSeldon11988

Dear all,

I ran your transformer code in gns_workloads to look at how the GNS is calculated, but I couldn't find the part that calculates it. Have I overlooked it, or does the code not calculate the GNS?

It would help me a lot if you could give me some support or feedback.

@ruipeterpan
Member

ruipeterpan commented Jul 29, 2024

Hi, apologies for the inconvenience -- it seems that, indeed, the current scripts for the translation workloads do not calculate GNS. It's been a few years since I touched this codebase and I no longer have access to my original dev setup, so I unfortunately won't be able to provide a script that works out of the box, but I believe that the GNS calculation scripts are model-agnostic as they are only accessing param.grad.data. For reference, please refer to how get_GNS() is used and implemented here.
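To illustrate the "model-agnostic" point, here is a minimal sketch of a GNS estimate that only reads `param.grad.data`. It is not the repository's `get_GNS()`; it assumes the two-batch-size estimator from Appendix A of An Empirical Model of Large-Batch Training, and the function names `grad_sq_norm` and `estimate_g2_and_s` are hypothetical. Whether the repository uses this exact estimator is not confirmed in this thread.

```python
def grad_sq_norm(params):
    """Squared L2 norm of the gradients currently stored on `params`.

    Assumes PyTorch-style parameters: each has a `.grad` tensor (or None),
    e.g. `model.parameters()` after `loss.backward()`.
    """
    total = 0.0
    for p in params:
        if p.grad is not None:
            total += p.grad.data.pow(2).sum().item()
    return total


def estimate_g2_and_s(sq_norm_small, sq_norm_big, b_small, b_big):
    """Unbiased estimates of |G|^2 and tr(Sigma) from two batch sizes.

    sq_norm_small: squared gradient norm measured at batch size b_small
    sq_norm_big:   squared gradient norm measured at batch size b_big
    Relies on E[|G_B|^2] = |G|^2 + tr(Sigma) / B.
    """
    if b_big <= b_small:
        raise ValueError("b_big must be strictly larger than b_small")
    g2 = (b_big * sq_norm_big - b_small * sq_norm_small) / (b_big - b_small)
    s = (sq_norm_small - sq_norm_big) / (1.0 / b_small - 1.0 / b_big)
    return g2, s
```

In a data-parallel or gradient-accumulation setup, `sq_norm_small` could come from one worker's (or one microbatch's) gradient and `sq_norm_big` from the averaged gradient. Single-step values of `g2` and `s` are noisy and `g2` can even come out negative, so it is usually safer to average each of them over many steps and only then take the ratio `s / g2` as the GNS.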

cc'ing @DartingMelody as she is more familiar with the GNS implementations in case :)

@HariSeldon11988
Author

@DartingMelody
Is your GNS implementation based on Appendix A1 of "An Empirical Model of Large-Batch Training"?

2 participants