Docs / Quantization: refactor quantization documentation #30942
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Awesome work @younesbelkada. Thanks for refactoring the docs so that users can better choose which quantization method to use!
Quantization techniques focus on representing data with less information while also trying to not lose too much accuracy. This often means converting a data type to represent the same information with fewer bits. For example, if your model weights are stored as 32-bit floating points and they're quantized to 16-bit floating points, this halves the model size, which makes it easier to store and reduces memory usage. Lower precision can also speed up inference because it takes less time to perform calculations with fewer bits.
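To make the size claim from the paragraph above concrete, here is a minimal sketch (using plain NumPy rather than any Transformers quantization API, and invented variable names) showing that casting 32-bit weights to 16-bit floats halves the byte footprint:

```python
import numpy as np

# Hypothetical "model weights": a 1000x1000 matrix stored in fp32.
weights_fp32 = np.random.randn(1000, 1000).astype(np.float32)

# Casting to fp16 keeps the same number of values but uses half the bits each.
weights_fp16 = weights_fp32.astype(np.float16)

print(weights_fp32.nbytes)  # 4000000 bytes (4 bytes per value)
print(weights_fp16.nbytes)  # 2000000 bytes (2 bytes per value)
print(weights_fp16.nbytes / weights_fp32.nbytes)  # 0.5
```

The cast is lossy (fp16 has fewer mantissa bits), which is exactly the accuracy trade-off the docs describe; real quantization schemes such as int8 or 4-bit go further and add calibration to limit that loss.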
For those who are interested in learning more about quantization, do you think we could add links to the DLAI course?
Very nice! 🔥
Makes a lot of sense to create separate pages for each method especially if the community keeps adding new quantization methods!
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* refactor quant docs
* delete file
* rename to overview
* fix
* fix table
* fix
* add content
* fix library versions
* fix table
* fix table
* fix table
* fix table
* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* replace to quantization_config
* fix aqlm snippet
* add DLAI courses
* fix
* fix table
* fix bulet points

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Since this PR deletes a doc file, you should update the following file to ensure that users coming from other places don't get redirected to an empty/outdated page: https://github.com/huggingface/transformers/blob/main/docs/source/en/_redirects.yml
We likely want to redirect to the overview.md file.
Thanks for the heads up! Done in #31063
What does this PR do?
As per the title, this PR refactors the quantization documentation to make it clearer, less overwhelming for users, and simpler to understand, mainly about which quantization method to use when - still WIP
cc @SunMarc @stevhliu @Titus-von-Koeller