
[WIP] [doc] performance/scalability revamp #15723

Merged
merged 26 commits into main from doc-perf-revamp-2 on May 16, 2022

Conversation

stas00 (Contributor) commented Feb 18, 2022

moved from #15213 so that we get the doc generator working

XXX: The previous PR has comments/suggestions that need to be integrated here


@lvwerra and I are working on a massive performance/scalability docs revamp:

So the rough plan is to make a custom guide for each of the combinations [inference|training] * [1 gpu|many gpus|cpu], so that it's very easy for users to follow the instructions specific to their needs.

So the proposed doc layout is:

  • performance.mdx (main entry point)
  • perf_infer.mdx
    • perf_infer_cpu.mdx
    • perf_infer_gpu_many.mdx
    • perf_infer_gpu_one.mdx
  • perf_train.mdx
    • perf_train_gpu_many.mdx
    • perf_train_gpu_one.mdx
  • scalability.mdx (rename from parallelism.mdx) (XXX: to do)

See the PR's changes for a rough layout of the content.

One big question is this: at the moment everything is PyTorch-centric, as we don't have any info on tf/flax yet. Down the road we will either inject tf/flax-specific instructions into the current docs, or perhaps it'd be better to have dedicated docs for pt/tf/flax. It'd help a lot to decide ahead of time, to avoid renaming documents and potentially breaking links. If we plan to keep these PyTorch-specific, perhaps we should embed _pt in the filenames?

@lvwerra

@HuggingFaceDocBuilder

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

lvwerra (Member) commented Mar 15, 2022

Hi @stas00, here's a first stab at the single-GPU section. The text is still WIP but you could have a look at the sections and rough content to see if you agree.

Looking at the docs as a whole again, here are a few thoughts:

  • Instead of dividing the whole section into speed- and memory-related techniques, an alternative could be a summary table at the beginning or end describing the memory and speed impact of each method. That could also help users navigate the doc.
  • In the previous PR we discussed where to explain common concepts (the first time they occur vs. in the performance docs). I am currently leaning towards the former. The main thing currently in performance.mdx that I am not sure where to put is the hardware discussion, which is quite different from everything else we are doing; maybe we should create a dedicated document hardware.mdx, where we could also expand a bit to other accelerators (TPUs/IPUs, for example).
  • If we take that road, performance.mdx is again quite empty, and I think we could move the content from perf_infer.mdx and perf_train.mdx into it as a shared entry point, then remove those two files.

If you are happy with the structure, I could start outlining performance.mdx and polishing the single-GPU section, while you work on, for example, the multi-GPU training content.

What do you think?

stas00 (Contributor, Author) commented Mar 17, 2022

Thank you for taking a stab at redesign, @lvwerra

It's very clear that we have fundamentally conflicting visions of how the performance documentation should be laid out, so it would be wasteful to spend energy trying to come up with something that resonates with both of us.

I don't think this project can have two masterminds, and since I'm currently busy with the BigScience training and we don't want this to fall between the cracks, I propose that you take over and proceed unimpeded in whatever way sounds good to you, @lvwerra.

Once you have completed the restructuring and added what you know, let me know which gaps you'd like me to fill in and I will do so. Alternatively, tell me which sections are missing and I will write them in free form, and you can adapt them to whatever structure you end up choosing.

lvwerra requested review from sgugger and LysandreJik on April 13, 2022 09:59
lvwerra (Member) commented Apr 13, 2022

Hi @LysandreJik and @sgugger,

I revamped the documentation a bit and reduced the scope from all the documents @stas00 laid out to just performance.mdx and training on single/multi-GPU, so we can converge on those before adding the others; this also prevents a monster PR. For some reason the preview of the docs still does not work - if you have any idea, let me know.

To guide you a bit in the review:

  • performance.mdx should be the entry point and give an overview
  • perf_train_gpu_single.mdx shows all the tricks to efficiently train large models on a single GPU (gradient accumulation/checkpointing, optimizers, DeepSpeed ZeRO, data loaders etc.) - see the sketch after this list
  • perf_train_gpu_many.mdx shows all the tricks to efficiently train large models on many GPUs (mostly parallelism strategies: data, tensor, pipeline parallelism)
  • perf_hardware.mdx is where I put some tips and tricks about custom hardware setups
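
To make the single-GPU item above concrete, here is a minimal sketch of how those knobs combine through the Trainer API (an illustration, not code from this PR; the checkpoint and toy dataset are arbitrary, and fp16 assumes a CUDA GPU):

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased")

# Tiny toy dataset so the example is self-contained.
ds = Dataset.from_dict({"text": ["good", "bad"] * 8, "label": [1, 0] * 8})
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                 padding="max_length", max_length=32))

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,   # small per-step memory footprint ...
    gradient_accumulation_steps=16,  # ... with an effective batch size of 16
    gradient_checkpointing=True,     # trade extra compute for activation memory
    fp16=True,                       # mixed precision (assumes a CUDA GPU)
    optim="adafactor",               # a more memory-lean optimizer choice
)

Trainer(model=model, args=args, train_dataset=ds).train()
```

The multi-GPU doc then covers how the same script scales out, e.g. launched with torchrun or python -m torch.distributed.launch for DDP rather than relying on naive DataParallel.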

In subsequent PRs we can:

  • add more TensorFlow examples
  • add TPU training guide
  • add inference side
  • add guides for train/inference on specialized hardware (Optimum)

This is the plan for the full documentation:

[diagram: perf_overview - the planned structure of the full documentation]

Let me know what you think about style and content of the documentation. There are still a few rough edges but I think it is good for a first review.

sgugger (Collaborator) commented Apr 13, 2022

The preview can't work if you don't rebase on master to have the new doc structure (everything nested in an "en" subfolder).

lvwerra force-pushed the doc-perf-revamp-2 branch from 5c72639 to 5fb8e15 on April 13, 2022 12:06
lvwerra force-pushed the doc-perf-revamp-2 branch from 5fb8e15 to cd93337 on April 13, 2022 12:30
HuggingFaceDocBuilderDev commented Apr 13, 2022

The documentation is not available anymore as the PR was closed or merged.

lvwerra (Member) commented Apr 13, 2022

Thank you @sgugger, it worked!

Comment on lines 80 to 81
- local: fast_tokenizers
title: "Using tokenizers from 🤗 Tokenizers"
Review comment (Collaborator):

This change does not seem to be linked to this PR.

Reply (Member):

Good catch! Must have happened during rebase - I'll remove it.

Resolved review threads: docs/source/en/perf_hardware.mdx (1), docs/source/en/perf_train_gpu_many.mdx (2), docs/source/en/perf_train_gpu_one.mdx (5)
Comment on lines +569 to +570
### `_multi_tensor`
pytorch-nightly introduced `torch.optim._multi_tensor`, which should significantly speed up the optimizers in situations with lots of small feature tensors. It should eventually become the default, but if you want to experiment with it sooner and don't mind using the bleeding edge, see: https://github.com/huggingface/transformers/issues/9965
Review comment (Collaborator):
This is one year old, is it really the bleeding edge?

stas00 (Contributor, Author) commented Apr 13, 2022:

I did spend time investigating this a few months ago; you can see the details here: pytorch/pytorch#71274

The pytorch devs swear it's supposed to be faster, but I wasn't able to see any speed improvement. tl;dr: pytorch/pytorch#71274 (comment)

I have a separate PR that expands the optimizers section (#14708) and I was thinking of adding it after the dust settles, but I suppose it might be better to put it into the "fodder" now? A rough sketch of the usage follows this comment.
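
For context, a minimal sketch of what opting into the multi-tensor optimizers looked like at the time (not code from this PR): it assumes a pytorch-nightly build that ships the private torch.optim._multi_tensor module, whose API was subject to change.

```python
import torch
from torch.optim import _multi_tensor  # private, pytorch-nightly-era module

model = torch.nn.Linear(512, 512)  # stand-in for a real model

# Same constructor signature as torch.optim.AdamW; the difference is that
# parameter updates are batched across tensors instead of looped one by one.
optimizer = _multi_tensor.AdamW(model.parameters(), lr=1e-4)

loss = model(torch.randn(8, 512)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```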

sgugger (Collaborator) commented Apr 13, 2022:

As long as it's not forgotten here I'm fine with both solutions.

lvwerra and others added 2 commits April 14, 2022 10:23
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
leandro and others added 4 commits April 14, 2022 10:33
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
lvwerra marked this pull request as ready for review on April 19, 2022 11:59
stas00 (Contributor, Author) commented Apr 20, 2022

@lvwerra, I rebased and resolved the conflict introduced by #16860, fixing it in the new file the content was moved to.

lvwerra (Member) commented Apr 22, 2022

@stas00 thank you!

@sgugger @LysandreJik would you like to take a look at the PR? I integrated the previous suggestion.

LysandreJik (Member) left a comment:

Impressive refactor!

lvwerra (Member) commented May 6, 2022

@sgugger are you happy to merge this?

Resolved review thread: docs/source/en/_toctree.yml
sgugger (Collaborator) commented May 9, 2022

Yes, LGTM :-)

lvwerra merged commit 71abd3a into main on May 16, 2022
lvwerra deleted the doc-perf-revamp-2 branch on May 16, 2022 11:36
ArthurZucker pushed a commit to ArthurZucker/transformers that referenced this pull request May 17, 2022
* [doc] performance/scalability revamp

* link the new docs

* no :

* mixed precision

* work on the first doc

* expand the main doc

* Trigger CI

* style

* revamp single GPU training section

* work on training performance

* remove files not used anymore or will be added later

* final touches

* fix rebase

* Add hardware section to toctree

* fix toctree again

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove `fast_tokenizers` entry that was copied in rebase

* add warning about DP vs DDP

* remove todo

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix missing closure of codeblock

* Update docs/source/en/perf_train_gpu_many.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* sync with huggingface#16860

* update toc

Co-authored-by: leandro <leandro.vonwerra@spoud.io>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Narsil pushed a commit to Narsil/transformers that referenced this pull request May 30, 2022
fix tokenizer autodoc
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
[doc] performance/scalability revamp