Releases: huggingface/optimum-neuron
v0.0.16: T5 export and inference, general training fixes
What's Changed
Training
A few fixes related to precompilation and checkpointing. These fixes enable training LLMs on AWS Trainium instances without friction.
- Skip model saving during precompilation and provide option to skip cache push (#365)
- Fixes checkpoint saving and consolidation for TP (#378)
- A `torch_xla`-compatible version of `safetensors.torch.save_file` is now used in the `NeuronTrainer` (#329); a sketch of the idea follows this list
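For context, `safetensors` cannot serialize device (XLA) tensors directly, so a `torch_xla`-compatible save typically moves the state dict to CPU first. Below is a minimal sketch of that idea; the helper name `xla_safe_save` is hypothetical, not the actual `NeuronTrainer` implementation:

```python
# Minimal sketch: move XLA tensors to CPU before handing them to safetensors.
# `xla_safe_save` is a hypothetical helper, not optimum-neuron's actual code.
from safetensors.torch import save_file


def xla_safe_save(state_dict, path):
    cpu_state_dict = {k: v.to("cpu").contiguous() for k, v in state_dict.items()}
    save_file(cpu_state_dict, path)
```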
Inference
v0.0.15: Mistral training, Tensor parallelism improvement, better integration with the AWS SDK
What's Changed
Training
Distributed Training
- `parallel_cross_entropy` loss support for tensor parallelism (#246); see the sketch after this list
- Support for training the Mistral architecture with tensor parallelism (#303)
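For readers unfamiliar with the technique, here is a single-process sketch of the math behind a vocab-parallel cross-entropy. The real `parallel_cross_entropy` runs all-reduces across tensor-parallel ranks; in this sketch the "ranks" are just slices of one logits tensor:

```python
# Sketch of vocab-parallel cross-entropy: each "rank" holds a slice of the
# vocabulary dimension, and only per-row scalars need to be exchanged.
import torch
import torch.nn.functional as F


def parallel_cross_entropy_sketch(logit_shards, target):
    # Step 1: global max over the vocab (an all-reduce MAX in the real TP version).
    global_max = torch.stack([s.max(dim=-1).values for s in logit_shards]).max(dim=0).values
    # Step 2: global sum of exponentials (an all-reduce SUM in the real TP version).
    sum_exp = sum((s - global_max.unsqueeze(-1)).exp().sum(dim=-1) for s in logit_shards)
    # Step 3: target logit lookup. In the real TP version each rank does a masked
    # local lookup plus an all-reduce; concatenating is a single-process shortcut.
    full = torch.cat(logit_shards, dim=-1)
    target_logit = full.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    # Cross-entropy: logsumexp(x) - x_target, averaged over the batch.
    return (sum_exp.log() + global_max - target_logit).mean()


logits = torch.randn(4, 100)            # batch of 4, vocab of 100
shards = list(logits.chunk(2, dim=-1))  # two "tensor-parallel ranks"
target = torch.randint(0, 100, (4,))
loss = parallel_cross_entropy_sketch(shards, target)
print(torch.allclose(loss, F.cross_entropy(logits, target)))  # True
```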
AWS SDK
- Fix: `neuron_parallel_compile` is compatible with the cache system (#352)
- Full support for `neuron_parallel_compile` with the cache system: compilation files produced by `neuron_parallel_compile` will be pushed to the remote cache repo on the Hugging Face Hub at the beginning of the next training job (#354)
Inference
- Data parallelism option for Stable Diffusion and LCM, allowing multi-device inference (#346)
- Support decoding sequences of byte tokens in TGI (#350)
Documentation
- Updated the documentation on LCM (#351)
v0.0.14: LCM support
What's Changed
LCM support
- [Stable Diffusion] Add LCM (Latent Consistency Models) support by @JingyaHuang in #323
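As a quick illustration, here is a hedged sketch of compiling and running an LCM checkpoint, assuming the pipeline class added in #323 follows the naming of the other Neuron pipelines (`NeuronLatentConsistencyModelPipeline`); the checkpoint name and input shapes are illustrative assumptions:

```python
from optimum.neuron import NeuronLatentConsistencyModelPipeline

# Neuron export requires static input shapes; these values are illustrative.
input_shapes = {"batch_size": 1, "height": 768, "width": 768}
pipe = NeuronLatentConsistencyModelPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", export=True, **input_shapes
)
# LCMs typically need only a handful of denoising steps.
image = pipe("a cup of coffee on a wooden table", num_inference_steps=4).images[0]
```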
Tutorials and doc improvement
- notebooks: add llama2 chatbot example by @dacorvo in #300
- Add llama 2 tutorial by @dacorvo in #321
- Migrate documentation of Stable Diffusion and add notebooks by @JingyaHuang in #312
Major bugfixes
- Noisy loss fix by @bocchris-aws in #293
- Fix neuron cache starting compilation before fetching by @michaelbenayoun in #280
- fix(pipelines): support passing decoder model + tokenizer by @dacorvo in #319
Other changes
- chore: update dev version by @dacorvo in #276
- Explicitly mention aws repo extra url in documentation by @dacorvo in #277
- Update supported architecture in the doc by @JingyaHuang in #281
- Fix doc build source code broken links by @JingyaHuang in #282
- Add revision to push_to_hub by @philschmid in #292
- Set default device id for SD and SDXL by @JingyaHuang in #297
- Add missing decoder model architectures by @dacorvo in #298
- Official support for AWS inferentia2 TGI container by @dacorvo in #302
- Transformers fix by @dacorvo in #320
- Add sagemaker compatible image by @dacorvo in #322
- Fix broken tests by @michaelbenayoun in #274
- chore: align with AWS Neuron SDK 2.15.1 by @dacorvo in #325
- Deleted the 'maybe_free_model_hooks()' from Diffusers Pipelines by @Cerrix in #330
- Bump diffusers version by @JingyaHuang in #335
Full Changelog: v0.0.13...v0.0.14
v0.0.13: AWS Neuron SDK 2.15
What's Changed
The main change in this release is the alignment with AWS Neuron SDK 2.15.
Text-generation
Other changes
- Use attention masks for TGI generation by @dacorvo in #264
- Various fixes for TP by @michaelbenayoun in #260
- Fix neuron pipelines by @dacorvo in #265
- Fix #241 by @michaelbenayoun in #268
- Fixes generation during the evaluation step by @michaelbenayoun in #266
- Save / load from checkpoint TP by @michaelbenayoun in #269
Full Changelog: v0.0.12...v0.0.13
v0.0.12.1: Patch release for training with Neuron SDK 2.14
v0.0.12: SDXL refiner, Sequence parallelism training
What's Changed
Stable Diffusion: SDXL Refiner, Stable Diffusion Img2Img, Inpaint support
- [Stable Diffusion] Image2image and inpaint pipeline support by @JingyaHuang in #161
- [SDXL] Add SDXL image to image support by @JingyaHuang in #239
Distributed Training:
- Sequence parallelism by @michaelbenayoun in #233
- Parallelism support for GPTNeoX by @michaelbenayoun in #244
Text generation updates
Other changes
- TGI stability fixes by @dacorvo in #226
- Remove experimental compilation flag for text-generation models by @dacorvo in #228
- Patch for diffusers 0.21.0 release by @JingyaHuang in #229
- test_examples uses ExampleRunner by @michaelbenayoun in #227
- Using the real model name instead of hard code "model" by @davidshtian in #231
- Replace transformers list of logits warpers by a fused logits warper by @dacorvo in #234
- Use AWS Neuron SDK 2.14 by @dacorvo in #236
- Weight loading after lazy loading fix by @michaelbenayoun in #238
- Add `debug` attribute to `NeuronPartialState` by @michaelbenayoun in #240
- Update `tests/test_examples.py` for AWS team by @michaelbenayoun in #242
- Rework text-generation example by @dacorvo in #245
- Fix evaluation recompilation issue by @michaelbenayoun in #248
- test(generation): specify revision for hub test model by @dacorvo in #250
- Add sequence length for generative models and llama tests by @dacorvo in #251
- Fix noisy loss for T5 when doing TP by @michaelbenayoun in #257
- Fix bug with transformers 4.34 by @michaelbenayoun in #259
New Contributors
- @davidshtian made their first contribution in #231
Full Changelog: v0.0.11...v0.0.12
v0.0.11: SDXL, Llama v2 training and inference, Inf2 powered TGI
SDXL Export and Inference
Optimum CLI now supports compiling components in the SDXL pipeline for inference on Neuron devices (inf2/trn1).
Below is an example of compiling SDXL models. You can either compile it on an inf2 instance (`inf2.8xlarge` or larger recommended) or on a CPU-only instance (disable the validation with `--disable-validation`):
```bash
optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 --task stable-diffusion-xl --batch_size 1 --height 1024 --width 1024 --auto_cast matmul --auto_cast_type bf16 sdxl_neuron/
```
And then run inference with the `NeuronStableDiffusionXLPipeline` class:
```python
from optimum.neuron import NeuronStableDiffusionXLPipeline

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained(
    model_id="sdxl_neuron/", device_ids=[0, 1]
)
image = stable_diffusion_xl(prompt).images[0]
```
- Add sdxl exporter support by @JingyaHuang in #203
- Add Stable Diffusion XL inference support by @JingyaHuang in #212
Llama v1, v2 Inference
Llama v2 Training
- Llama V2 training support by @michaelbenayoun in #211
- LLama V1 training fix by @michaelbenayoun in #211
TGI
Major bugfixes
- `neuron_parallel_compile`, `ParallelLoader` and Zero-1 fixes for torchneuron 8+ by @michaelbenayoun in #200
- flan-t5 fix: `T5Parallelizer`, `NeuronCacheCallback` and `NeuronHash` refactors by @michaelbenayoun in #207
- Fix optimum-cli broken by optimum 1.13.0 release by @JingyaHuang in #217
Other changes
- Bump Inference APIs to Neuron 2.13 by @JingyaHuang in #206
- Add log for SD when applying optim attn & pipelines lazy loading by @JingyaHuang in #208
- Cancel concurrency CIs for inference by @JingyaHuang in #218
- fix(tgi): typer does not support Union types by @dacorvo in #219
- Bump neuron-cc version to 1.18.* by @JingyaHuang in #224
Full Changelog: v0.0.10...v0.0.11
v0.0.10: Bugfixes and enhancements
Major bugfixes
- Improve and Fix inferentia exporter by @JingyaHuang in #168
- [Stable Diffusion] Fix the image size value inferral by @JingyaHuang in #167
- Fix inferral of dynamic batch size from the config & Be compatible with transformers 4.32 by @JingyaHuang in #190
Enhancements of APIs
- Enable exporter on non INF instances by @JingyaHuang in #178
- Support multiple prompts for generation example by @dacorvo in #173
- Fix unet export when using optimized attn score by @JingyaHuang in #165
- Improve default compilation arguments for stable diffusion by @JingyaHuang in #182
- Add `num_image_per_prompt` support for stable diffusion by @JingyaHuang in #192
Other changes
- minor doc fix by @oOraph in #164
- Fix duplicates handling in converting to `safetensors` by @michaelbenayoun in #172
- Fix empty preprocessor issue by @JingyaHuang in #180
- Update models.mdx by @philschmid in #183
- Only run INF2 CI for .code change by @JingyaHuang in #184
- Improve Readme and installation guide by @JingyaHuang in #181
- Fixes #150 by @michaelbenayoun in #177
- Fix TP for t5 by @michaelbenayoun in #179
- Improve SD logging by @JingyaHuang in #194
- Add mark step after optimizer step by @michaelbenayoun in #195
- Option to disable the parallelization of the embedding with TP by @michaelbenayoun in #191
- Restrict generation to sampling and greedy search by @dacorvo in #201
Full Changelog: v0.0.9...v0.0.10
v0.0.9: Tensor Parallelism training for T5, more stable Stable Diffusion inference
Tensor Parallelism support for T5 on training
- TP tests and additional support by @michaelbenayoun in #155
Enhance Stable Diffusion Inference
- Enhance robustness of stable diffusion inference by @JingyaHuang in #156
- Some other enhancement for stable diffusion by @JingyaHuang in #159
- SD quick fix export by @JingyaHuang in #160
- Stable Diffusion quick fix by @JingyaHuang in #162
What's Changed
- Doc upgrade by @michaelbenayoun in #152
Full Changelog: v0.0.8...v0.0.9
v0.0.8: Tensor Parallelism, ZeRO-1 optimization and Stable Diffusion model classes
Tensor Parallelism and ZeRO-1 optimization
Tensor Parallelism
It is now possible to shard a model's parameters across several Neuron cores using tensor parallelism, enabling the training of much larger models than before.
The following model architectures are supported:
- BERT
- RoBERTa
- GPT Neo
- LLaMa
ZeRO-1
DeepSpeed ZeRO Stage 1 optimization is supported as well; it shards the optimizer state across data-parallel ranks, resulting in significant memory savings.
Relevant PRs: #140
Note: Tensor Parallelism and ZeRO-1 can be combined. A minimal sketch of enabling both is shown below.
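Here is a hedged sketch of a training script enabling both. It assumes the `NeuronTrainingArguments` API exposes `tensor_parallel_size` and `zero_1` options; those argument names are assumptions based on later versions of the library, not guaranteed by this release, so check the documentation for your version:

```python
# Hedged sketch: tensor parallelism + ZeRO-1 via NeuronTrainingArguments.
# The tensor_parallel_size and zero_1 argument names are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.neuron import NeuronTrainer, NeuronTrainingArguments

model_id = "EleutherAI/gpt-neo-125M"  # GPT Neo is in the supported list above
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, max_length=128, padding="max_length")
    out["labels"] = out["input_ids"].copy()  # causal LM: labels mirror inputs
    return out

dataset = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

training_args = NeuronTrainingArguments(
    output_dir="tp_zero1_demo",
    per_device_train_batch_size=1,
    max_steps=10,
    tensor_parallel_size=2,  # shard model parameters across 2 Neuron cores (assumption)
    zero_1=True,             # shard optimizer state across data-parallel ranks (assumption)
)
NeuronTrainer(model=model, args=training_args, train_dataset=dataset).train()
```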
Stable Diffusion Models Inference support
`NeuronStableDiffusionPipeline` allows you to export your Stable Diffusion checkpoint to a Neuron-compatible format and run inference on Inf2 or Trn1 instances, while preserving the Python interface you are used to from 🤗 diffusers.
Example:
```python
from optimum.neuron import NeuronStableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
input_shapes = {"batch_size": 1, "height": 512, "width": 512}
stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(model_id, export=True, **input_shapes)

prompt = "a photo of an astronaut riding a horse on mars"
image = stable_diffusion(prompt).images[0]
```
Currently, only the text-to-image generation task is supported.