Releases: huggingface/optimum-neuron
v0.0.16: T5 export and inference, general training fixes
What's Changed
Training
A few fixes related to precompilation and checkpointing. These fixes enable training LLMs on AWS Trainium instances without friction.
- Skip model saving during precompilation and provide option to skip cache push (#365)
- Fixes checkpoint saving and consolidation for TP (#378)
- A `torch_xla`-compatible version of `safetensors.torch.save_file` is now used in the `NeuronTrainer` (#329); a sketch of the idea follows this list
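For context, `safetensors` cannot serialize device (XLA) tensors directly, so a `torch_xla`-compatible save typically moves the state dict to CPU first. Below is a minimal sketch of that idea; the helper name `xla_safe_save` is hypothetical, not the actual `NeuronTrainer` implementation:

```python
# Minimal sketch: move XLA tensors to CPU before handing them to safetensors.
# `xla_safe_save` is a hypothetical helper, not optimum-neuron's actual code.
from safetensors.torch import save_file


def xla_safe_save(state_dict, path):
    cpu_state_dict = {k: v.to("cpu").contiguous() for k, v in state_dict.items()}
    save_file(cpu_state_dict, path)
```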
Inference
v0.0.15: Mistral training, Tensor parallelism improvement, better integration with the AWS SDK
What's Changed
Training
Distributed Training
- `parallel_cross_entropy` loss support for tensor parallelism (#246); see the sketch after this list
- Support for training the Mistral architecture with tensor parallelism (#303)
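For readers unfamiliar with the technique, here is a single-process sketch of the math behind a vocab-parallel cross-entropy. The real `parallel_cross_entropy` runs all-reduces across tensor-parallel ranks; in this sketch the "ranks" are just slices of one logits tensor:

```python
# Sketch of vocab-parallel cross-entropy: each "rank" holds a slice of the
# vocabulary dimension, and only per-row scalars need to be exchanged.
import torch
import torch.nn.functional as F


def parallel_cross_entropy_sketch(logit_shards, target):
    # Step 1: global max over the vocab (an all-reduce MAX in the real TP version).
    global_max = torch.stack([s.max(dim=-1).values for s in logit_shards]).max(dim=0).values
    # Step 2: global sum of exponentials (an all-reduce SUM in the real TP version).
    sum_exp = sum((s - global_max.unsqueeze(-1)).exp().sum(dim=-1) for s in logit_shards)
    # Step 3: target logit lookup. In the real TP version each rank does a masked
    # local lookup plus an all-reduce; concatenating is a single-process shortcut.
    full = torch.cat(logit_shards, dim=-1)
    target_logit = full.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    # Cross-entropy: logsumexp(x) - x_target, averaged over the batch.
    return (sum_exp.log() + global_max - target_logit).mean()


logits = torch.randn(4, 100)            # batch of 4, vocab of 100
shards = list(logits.chunk(2, dim=-1))  # two "tensor-parallel ranks"
target = torch.randint(0, 100, (4,))
loss = parallel_cross_entropy_sketch(shards, target)
print(torch.allclose(loss, F.cross_entropy(logits, target)))  # True
```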
AWS SDK
- Fix: `neuron_parallel_compile` is compatible with the cache system (#352)
- Full support for `neuron_parallel_compile` with the cache system: compilation files produced by `neuron_parallel_compile` will be pushed to the remote cache repo on the Hugging Face Hub at the beginning of the next training job (#354)
Inference
- Data parallelism option for Stable Diffusion and LCM, allowing multi-device inference (#346)
- Support decoding sequences of byte tokens in TGI (#350)
Documentation
- Updated the documentation on LCM (#351)
v0.0.14: LCM support
What's Changed
LCM support
- [Stable Diffusion] Add LCM (Latent Consistency Models) support by @JingyaHuang in #323
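As a quick illustration, here is a hedged sketch of compiling and running an LCM checkpoint, assuming the pipeline class added in #323 follows the naming of the other Neuron pipelines (`NeuronLatentConsistencyModelPipeline`); the checkpoint name and input shapes are illustrative assumptions:

```python
from optimum.neuron import NeuronLatentConsistencyModelPipeline

# Neuron export requires static input shapes; these values are illustrative.
input_shapes = {"batch_size": 1, "height": 768, "width": 768}
pipe = NeuronLatentConsistencyModelPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", export=True, **input_shapes
)
# LCMs typically need only a handful of denoising steps.
image = pipe("a cup of coffee on a wooden table", num_inference_steps=4).images[0]
```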
Tutorials and doc improvement
- notebooks: add llama2 chatbot example by @dacorvo in #300
- Add llama 2 tutorial by @dacorvo in #321
- Migrate documentation of Stable Diffusion and add notebooks by @JingyaHuang in #312
Major bugfixes
- Noisy loss fix by @bocchris-aws in #293
- Fix neuron cache starting compilation before fetching by @michaelbenayoun in #280
- fix(pipelines): support passing decoder model + tokenizer by @dacorvo in #319
Other changes
- chore: update dev version by @dacorvo in #276
- Explicitly mention aws repo extra url in documentation by @dacorvo in #277
- Update supported architecture in the doc by @JingyaHuang in #281
- Fix doc build source code broken links by @JingyaHuang in #282
- Add revision to push_to_hub by @philschmid in #292
- Set default device id for SD and SDXL by @JingyaHuang in #297
- Add missing decoder model architectures by @dacorvo in #298
- Official support for AWS inferentia2 TGI container by @dacorvo in #302
- Transformers fix by @dacorvo in #320
- Add sagemaker compatible image by @dacorvo in #322
- Fix broken tests by @michaelbenayoun in #274
- chore: align with AWS Neuron SDK 2.15.1 by @dacorvo in #325
- Deleted the 'maybe_free_model_hooks()' from Diffusers Pipelines by @Cerrix in #330
- Bump diffusers version by @JingyaHuang in #335
Full Changelog: v0.0.13...v0.0.14
v0.0.13: AWS Neuron SDK 2.15
What's Changed
The main change in this release is the alignment with AWS Neuron SDK 2.15.
Text-generation
Other changes
- Use attention masks for TGI generation by @dacorvo in #264
- Various fixes for TP by @michaelbenayoun in #260
- Fix neuron pipelines by @dacorvo in #265
- Fix #241 by @michaelbenayoun in #268
- Fixes generation during the evaluation step by @michaelbenayoun in #266
- Save / load from checkpoint TP by @michaelbenayoun in #269
Full Changelog: v0.0.12...v0.0.13
v0.0.12.1: Patch release for training with Neuron SDK 2.14
v0.0.12: SDXL refiner, Sequence parallelism training
What's Changed
Stable Diffusion: SDXL Refiner, Stable Diffusion Img2Img, Inpaint support
- [Stable Diffusion] Image2image and inpaint pipeline support by @JingyaHuang in #161
- [SDXL] Add SDXL image to image support by @JingyaHuang in #239
Distributed Training:
- Sequence parallelism by @michaelbenayoun in #233
- Parallelism support for GPTNeoX by @michaelbenayoun in #244
Text generation updates
Other changes
- TGI stability fixes by @dacorvo in #226
- Remove experimental compilation flag for text-generation models by @dacorvo in #228
- Patch for diffusers 0.21.0 release by @JingyaHuang in #229
- test_examples uses ExampleRunner by @michaelbenayoun in #227
- Using the real model name instead of hard code "model" by @davidshtian in #231
- Replace transformers list of logits warpers by a fused logits warper by @dacorvo in #234
- Use AWS Neuron SDK 2.14 by @dacorvo in #236
- Weight loading after lazy loading fix by @michaelbenayoun in #238
- Add `debug` attribute to `NeuronPartialState` by @michaelbenayoun in #240
- Update `tests/test_examples.py` for AWS team by @michaelbenayoun in #242
- Rework text-generation example by @dacorvo in #245
- Fix evaluation recompilation issue by @michaelbenayoun in #248
- test(generation): specify revision for hub test model by @dacorvo in #250
- Add sequence length for generative models and llama tests by @dacorvo in #251
- Fix noisy loss for T5 when doing TP by @michaelbenayoun in #257
- Fix bug with transformers 4.34 by @michaelbenayoun in #259
New Contributors
- @davidshtian made their first contribution in #231
Full Changelog: v0.0.11...v0.0.12
v0.0.11: SDXL, Llama v2 training and inference, Inf2 powered TGI
SDXL Export and Inference
Optimum CLI now supports compiling components in the SDXL pipeline for inference on Neuron devices (inf2/trn1).
Below is an example of compiling SDXL models. You can either compile it on an inf2 instance (`inf2.8xlarge` or larger recommended) or on a CPU-only instance (disable the validation with `--disable-validation`):
```bash
optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 --task stable-diffusion-xl --batch_size 1 --height 1024 --width 1024 --auto_cast matmul --auto_cast_type bf16 sdxl_neuron/
```
And then run inference with the `NeuronStableDiffusionXLPipeline` class:
```python
from optimum.neuron import NeuronStableDiffusionXLPipeline

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained(
    model_id="sdxl_neuron/", device_ids=[0, 1]
)
image = stable_diffusion_xl(prompt).images[0]
```
- Add sdxl exporter support by @JingyaHuang in #203
- Add Stable Diffusion XL inference support by @JingyaHuang in #212
Llama v1, v2 Inference
Llama v2 Training
- Llama V2 training support by @michaelbenayoun in #211
- LLama V1 training fix by @michaelbenayoun in #211
TGI
Major bugfixes
- `neuron_parallel_compile`, `ParallelLoader` and Zero-1 fixes for torchneuron 8+ by @michaelbenayoun in #200
- flan-t5 fix: `T5Parallelizer`, `NeuronCacheCallback` and `NeuronHash` refactors by @michaelbenayoun in #207
- Fix optimum-cli broken by optimum 1.13.0 release by @JingyaHuang in #217
Other changes
- Bump Inference APIs to Neuron 2.13 by @JingyaHuang in #206
- Add log for SD when applying optim attn & pipelines lazy loading by @JingyaHuang in #208
- Cancel concurrency CIs for inference by @JingyaHuang in #218
- fix(tgi): typer does not support Union types by @dacorvo in #219
- Bump neuron-cc version to 1.18.* by @JingyaHuang in #224
Full Changelog: v0.0.10...v0.0.11
v0.0.10: Bugfixes and enhancements
Major bugfixes
- Improve and Fix inferentia exporter by @JingyaHuang in #168
- [Stable Diffusion] Fix the image size value inferral by @JingyaHuang in #167
- Fix inferral of dynamic batch size from the config & Be compatible with transformers 4.32 by @JingyaHuang in #190
Enhancements of APIs
- Enable exporter on non INF instances by @JingyaHuang in #178
- Support multiple prompts for generation example by @dacorvo in #173
- Fix unet export when using optimized attn score by @JingyaHuang in #165
- Improve default compilation arguments for stable diffusion by @JingyaHuang in #182
- Add `num_image_per_prompt` support for stable diffusion by @JingyaHuang in #192
Other changes
- minor doc fix by @oOraph in #164
- Fix duplicates handling in converting to `safetensors` by @michaelbenayoun in #172
- Fix empty preprocessor issue by @JingyaHuang in #180
- Update models.mdx by @philschmid in #183
- Only run INF2 CI for .code change by @JingyaHuang in #184
- Improve Readme and installation guide by @JingyaHuang in #181
- Fixes #150 by @michaelbenayoun in #177
- Fix TP for t5 by @michaelbenayoun in #179
- Improve SD logging by @JingyaHuang in #194
- Add mark step after optimizer step by @michaelbenayoun in #195
- Option to disable the parallelization of the embedding with TP by @michaelbenayoun in #191
- Restrict generation to sampling and greedy search by @dacorvo in #201
Full Changelog: v0.0.9...v0.0.10
v0.0.9: Tensor Parallelism training for T5, more stable Stable Diffusion inference
Tensor Parallelism support for T5 on training
- TP tests and additional support by @michaelbenayoun in #155
Enhance Stable Diffusion Inference
- Enhance robustness of stable diffusion inference by @JingyaHuang in #156
- Some other enhancement for stable diffusion by @JingyaHuang in #159
- SD quick fix export by @JingyaHuang in #160
- Stable Diffusion quick fix by @JingyaHuang in #162
What's Changed
- Doc upgrade by @michaelbenayoun in #152
Full Changelog: v0.0.8...v0.0.9
v0.0.8: Tensor Parallelism, ZeRO-1 optimization and Stable Diffusion model classes
Tensor Parallelism and ZeRO-1 optimization
Tensor Parallelism
It is now possible to shard a model's parameters across several Neuron cores using tensor parallelism, enabling the training of much larger models than before.
The following model architectures are supported:
- BERT
- RoBERTa
- GPT Neo
- LLaMa
ZeRO-1
DeepSpeed ZeRO Stage 1 optimization is supported as well; it shards the optimizer state across data-parallel ranks, resulting in significant memory savings.
Relevant PRs: #140
Note: Tensor Parallelism and ZeRO-1 can be combined. A minimal sketch of enabling both is shown below.
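Here is a hedged sketch of a training script enabling both. It assumes the `NeuronTrainingArguments` API exposes `tensor_parallel_size` and `zero_1` options; those argument names are assumptions based on later versions of the library, not guaranteed by this release, so check the documentation for your version:

```python
# Hedged sketch: tensor parallelism + ZeRO-1 via NeuronTrainingArguments.
# The tensor_parallel_size and zero_1 argument names are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.neuron import NeuronTrainer, NeuronTrainingArguments

model_id = "EleutherAI/gpt-neo-125M"  # GPT Neo is in the supported list above
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, max_length=128, padding="max_length")
    out["labels"] = out["input_ids"].copy()  # causal LM: labels mirror inputs
    return out

dataset = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

training_args = NeuronTrainingArguments(
    output_dir="tp_zero1_demo",
    per_device_train_batch_size=1,
    max_steps=10,
    tensor_parallel_size=2,  # shard model parameters across 2 Neuron cores (assumption)
    zero_1=True,             # shard optimizer state across data-parallel ranks (assumption)
)
NeuronTrainer(model=model, args=training_args, train_dataset=dataset).train()
```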
Stable Diffusion Models Inference support
`NeuronStableDiffusionPipeline` allows you to export your Stable Diffusion checkpoint to a Neuron-compatible format and run inference on Inf2 or Trn1 instances, while preserving the Python interface you are used to from 🤗 diffusers.
Example:
```python
from optimum.neuron import NeuronStableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
input_shapes = {"batch_size": 1, "height": 512, "width": 512}
stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(model_id, export=True, **input_shapes)

prompt = "a photo of an astronaut riding a horse on mars"
image = stable_diffusion(prompt).images[0]
```
Currently, only the text-to-image generation task is supported.