
[docs] APIs #1075

Merged 15 commits into bitsandbytes-foundation:main from api-build on Mar 7, 2024

Conversation

@stevhliu (Contributor) commented on Feb 21, 2024:

This PR expands the API docs to showcase high-level classes and core lower-level building blocks, starting with the optimizers.

  • LinearKbit (Linear4bit, Linear8bitLt, etc.)
  • Parameters (Params4bit, Int8Params)
  • Embeddings (StableEmbedding, Embedding)
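
For reviewers who want a concrete picture, here is a minimal usage sketch of the modules listed above; the constructor arguments and shapes are illustrative assumptions, not taken from this PR:

```python
import torch
import bitsandbytes as bnb

# Linear k-bit layers: 8-bit and 4-bit drop-in replacements for nn.Linear.
linear_8bit = bnb.nn.Linear8bitLt(768, 3072, has_fp16_weights=False, threshold=6.0)
linear_4bit = bnb.nn.Linear4bit(768, 3072, compute_dtype=torch.bfloat16, quant_type="nf4")

# Their weights are instances of the documented parameter classes.
print(type(linear_8bit.weight))  # Int8Params
print(type(linear_4bit.weight))  # Params4bit

# StableEmbedding adds layer normalization (and is meant to keep 32-bit
# optimizer states for the embedding weights) for more stable training.
embedding = bnb.nn.StableEmbedding(num_embeddings=32000, embedding_dim=768)
```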


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@stevhliu (Contributor, Author) commented:

Hey @Titus-von-Koeller, can you check the docstring descriptions for AdaGrad and the base optimizer classes? Since most of the docstring descriptions are the same, if these look good to you, I can easily copy most of them over to the remaining optimizers.
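
For context while reviewing the docstrings, these optimizers are constructed like their torch.optim counterparts; a minimal sketch follows (the model, learning rate, and the 8-bit variant chosen here are placeholder assumptions, and a CUDA device is assumed):

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(128, 128).cuda()

# 8-bit AdaGrad; bnb.optim.Adagrad / Adagrad32bit are the higher-precision variants.
optimizer = bnb.optim.Adagrad8bit(model.parameters(), lr=1e-2)

# One illustrative training step.
loss = model(torch.randn(4, 128, device="cuda")).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```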

@Titus-von-Koeller (Collaborator) commented:

Hey @stevhliu, I'll take a look now. Please be sure to run `pre-commit install` inside your local BNB Git directory.

@Titus-von-Koeller (Collaborator) commented:

Everything looks really good, thanks! I think the docstrings for the optimizers and the base class are good the way you did them.

@@ -1,17 +1,18 @@
 # AdaGrad
 
-[AdaGrad (Adaptive Gradient)](https://jmlr.org/papers/v12/duchi11a.html) is an optimizer that adaptively adjusts the learning rate for each parameter based on their historical gradients.
+[AdaGrad (Adaptive Gradient)](https://jmlr.org/papers/v12/duchi11a.html) is an adaptive learning rate optimizer. AdaGrad stores a sum of the squared past gradients for each parameter and uses it to scale their learning rate. This allows the learning rate to be automatically lower or higher depending on the magnitude of the gradient, eliminating the need to manually tune the learning rate.
(Collaborator) commented on the diff:

nice summary, much cleaner now!
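
For readers skimming this thread, the update the revised summary describes is the standard textbook AdaGrad rule; the notation below is ours, not quoted from the PR:

```latex
% Standard AdaGrad update, written here for reference (standard notation, not from the PR).
% G_t accumulates elementwise squared gradients; each coordinate's step is scaled by 1/sqrt(G_t).
G_t = G_{t-1} + g_t \odot g_t, \qquad
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{G_t} + \epsilon} \odot g_t
```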

@stevhliu marked this pull request as ready for review on March 5, 2024 at 21:27
@Titus-von-Koeller merged commit ac5d6ee into bitsandbytes-foundation:main on Mar 7, 2024
10 checks passed
@stevhliu deleted the api-build branch on March 7, 2024 at 21:37
@akx mentioned this pull request on Mar 13, 2024