Releases · huggingface/optimum-neuron

05 Jul 12:39

michaelbenayoun

v0.0.7

53cfcc4

v0.0.7: Stable diffusion, `transformers` pipeline and cache fix

Stable diffusion

Supports stable diffusion compilation with neuronx-cc for inference with inf2 / trn1.

Components chosen to be exported from StableDiffusionPipeline are:

CLIP text encoder
VAE decoder
UNet
VAE_post_quant_conv

The export can be done with optimum-cli as follows:

optimum-cli export neuron --model stabilityai/stable-diffusion-2-1-base --task stable-diffusion --batch_size 1 --num_channels 4 --height 64 --width 64 --sequence_length 32 sd_neuron/
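To make the static shapes in the command above concrete, here is a minimal sketch of the tensor sizes they imply. It assumes that `--height` and `--width` describe the latent tensor (not the final image) and that the VAE scales by a factor of 8, which is the usual value for a 512x512 Stable Diffusion checkpoint. `latent_shape` and `image_size` are hypothetical helper names and are not part of optimum-neuron.

```python
# Hypothetical helpers illustrating the static shapes implied by the export flags.
VAE_SCALE_FACTOR = 8  # assumption: usual Stable Diffusion VAE scaling factor


def latent_shape(batch_size: int, num_channels: int, height: int, width: int) -> tuple:
    """Shape of the latent tensor in (batch, channels, height, width) order."""
    return (batch_size, num_channels, height, width)


def image_size(height: int, width: int, scale: int = VAE_SCALE_FACTOR) -> tuple:
    """Pixel size of the decoded image for a given latent size."""
    return (height * scale, width * scale)


# Values taken from the export command above.
shape = latent_shape(batch_size=1, num_channels=4, height=64, width=64)
pixels = image_size(64, 64)
print(shape, pixels)
```

Because the shapes are fixed at compile time, inputs with a different batch size or latent size would likely need a separate export.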

Relevant PR: #101
Full guide: Exporting stable diffusion to neuron

`transformers` pipeline support

Pipelines running on Inferentia instances are now supported.

It can be used with an online export as follows:

from optimum.neuron.pipelines import pipeline

clf = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english", export=True)
clf("Amazon is a great company")
# [{'label': 'POSITIVE', 'score': 0.9998538494110107}]

clf = pipeline("question-answering")
clf({"context": "This is a sample context", "question": "What is the context here?"})
# {'score': 0.4972594678401947, 'start': 8, 'end': 16, 'answer': 'a sample'}
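As a small illustration of how to read the question-answering result above: in the Transformers question-answering pipeline, `start` and `end` are character offsets into the context string. The sketch below reuses the output shown above as plain data, so it runs without a Neuron device; the variable names are illustrative only.

```python
# Output dictionary copied from the question-answering example above.
context = "This is a sample context"
prediction = {"score": 0.4972594678401947, "start": 8, "end": 16, "answer": "a sample"}

# `start` and `end` are character offsets into the context string,
# so slicing the context recovers the answer text.
span = context[prediction["start"]:prediction["end"]]
print(span)
```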

Or with precompiled models as follows:

from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForQuestionAnswering, pipeline

tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

# Loading the PyTorch checkpoint and converting to the neuron format by providing export=True
model = NeuronModelForQuestionAnswering.from_pretrained(
    "deepset/roberta-base-squad2",
    export=True
)

neuron_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
question = "What's my name?"
context = "My name is Philipp and I live in Nuremberg."

pred = neuron_qa(question=question, context=context)

Relevant PR: #107

Cache repo fix

The cache repo system was broken starting from Neuron 2.11.
This release fixes that; the relevant PR is #119.

26 Jun 16:35

michaelbenayoun

v0.0.6

7265e94

v0.0.6: Patch release

Introduces a fix for #109 (#113).

23 Jun 16:07

michaelbenayoun

v0.0.5

14fc839

v0.0.5: NeuronModel classes and generation methods during training

NeuronModel classes

NeuronModel classes allow you to run inference on Inf1 and Inf2 instances while preserving the Python interface you are used to from Transformers' auto model classes.

Example:

from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained(
    "optimum/distilbert-base-uncased-finetuned-sst-2-english-neuronx"
)
model = NeuronModelForSequenceClassification.from_pretrained(
    "optimum/distilbert-base-uncased-finetuned-sst-2-english-neuronx"
)

inputs = tokenizer("Hamilton is considered to be the best musical of human history.", return_tensors="pt")

outputs = model(**inputs)
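The example above stops at the raw model outputs. As a rough sketch of a typical next step for text classification, the code below turns a pair of logits into a label and a probability. The logits and the label mapping are hypothetical stand-ins: in real code they would usually come from `outputs.logits` and the model configuration. Pure Python is used here only to keep the sketch self-contained.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits standing in for one row of outputs.logits.
example_logits = [-4.2, 4.5]
# Assumed label mapping for a binary sentiment model.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(id2label[best], round(probs[best], 4))
```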

Supported tasks are:

Feature extraction
Masked language modeling
Text classification
Token classification
Question answering
Multiple choice

Relevant PR: #45

Generation methods

Two generation methods are now supported:

Greedy decoding (#70)
Beam search (#93)

This allows you to perform evaluation with generation while training decoder and seq2seq models.
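For readers unfamiliar with the two strategies, here is a toy sketch of greedy decoding and beam search. It is not the optimum-neuron implementation: the "model" is a hypothetical lookup table of next-token probabilities, chosen so that beam search finds a higher-scoring sequence than greedy decoding.

```python
import math

# Toy "model": log-probabilities of the next token given the last token.
# Entirely hypothetical; a real model would produce logits from a forward pass.
LOG_PROBS = {
    "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
    "a": {"x": math.log(0.55), "y": math.log(0.45)},
    "b": {"x": math.log(0.9), "y": math.log(0.1)},
}


def greedy_decode(start, steps):
    """Pick the single most likely next token at every step."""
    seq, score = [start], 0.0
    for _ in range(steps):
        options = LOG_PROBS.get(seq[-1])
        if not options:
            break
        token, logp = max(options.items(), key=lambda kv: kv[1])
        seq.append(token)
        score += logp
    return seq, score


def beam_search(start, steps, num_beams):
    """Keep the num_beams best partial sequences at every step."""
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            options = LOG_PROBS.get(seq[-1])
            if not options:
                candidates.append((seq, score))  # finished sequence: carry over
                continue
            for token, logp in options.items():
                candidates.append((seq + [token], score + logp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0]


greedy_seq, greedy_score = greedy_decode("<s>", steps=2)
beam_seq, beam_score = beam_search("<s>", steps=2, num_beams=2)
print(greedy_seq, beam_seq)
```

Greedy decoding commits to "a" because it is the most likely first token, while beam search keeps "b" alive and discovers that "b" followed by "x" has a higher overall probability.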

Misc

The Optimum CLI now provides two new commands to help manage the cache:

optimum-cli neuron cache list: To list a remote cache repo on the Hugging Face Hub (#85)
optimum-cli neuron cache add: To add compilation files related to a model to a remote cache repo on the Hugging Face Hub (#51)

02 Jun 12:05

JingyaHuang

v0.0.4

31957da

v0.0.4: Patch release for Neuron installation

`optimum-cli neuron cache` command line

The optimum-cli now provides two commands to work with the Trainium cache:

Cache creation:

optimum-cli neuron cache create

Cache setting:

optimum-cli neuron cache set

Documentation

New Trainium model cache documentation page

26 Apr 08:31

michaelbenayoun

v0.0.3

51f76e1

v0.0.3: Patch release for the `huggingface_hub` library version

Pins the version of the huggingface_hub library to be greater than or equal to 0.14.0.
Should fix errors related to #41.

25 Apr 12:22

michaelbenayoun

v0.0.2

2809674

v0.0.2: Compilation caching system and inference with Inferentia

Compilation caching system

Compiling a model before training can be a real bottleneck (for example, on small datasets, compilation can take longer than training), so we introduce a caching system directly connected to the Hugging Face Hub.

Before starting compilation, the TrainiumTrainer checks whether the needed compilation files are on the Hub and fetches them if so, sparing users from doing it themselves.
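The check-then-fetch flow described above can be sketched as follows. This is only an illustration of the idea: the release notes do not say how cache entries are keyed, so the hashing scheme, the function names, and the in-memory dictionary standing in for a Hub repo are all assumptions.

```python
import hashlib
import json


def cache_key(model_config: dict, compiler_version: str) -> str:
    """Hypothetical key: hash of the model config and the compiler version."""
    payload = json.dumps(
        {"config": model_config, "compiler": compiler_version}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def get_compiled_files(key, remote_cache, compile_fn):
    """Return (files, was_cached): fetch on a hit, compile and push on a miss."""
    if key in remote_cache:
        return remote_cache[key], True
    files = compile_fn()
    remote_cache[key] = files  # stands in for pushing to the cache repo
    return files, False


remote = {}  # stand-in for a remote cache repo
key = cache_key({"model_type": "bert", "hidden_size": 768}, "placeholder-version")
first = get_compiled_files(key, remote, lambda: ["placeholder_compiled_file"])
second = get_compiled_files(key, remote, lambda: ["placeholder_compiled_file"])
print(first[1], second[1])
```

The first call misses and compiles; the second call with the same key is served from the cache, which is where the time saving comes from.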

Custom cache repo

Since users might want their own cache repo, to be able to push to it and/or keep things private, this is possible via the CUSTOM_CACHE_REPO environment variable:

CUSTOM_CACHE_REPO=michaelbenayoun/cache_test python train.py
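As a minimal sketch of how a script could honor such a variable, the snippet below reads `CUSTOM_CACHE_REPO` and falls back to a default when it is unset. The default repo name and the `resolve_cache_repo` helper are hypothetical and only illustrate the pattern; they are not taken from the library.

```python
import os

# Hypothetical fallback name used only for this sketch.
DEFAULT_CACHE_REPO = "my-org/default-neuron-cache"


def resolve_cache_repo(environ=os.environ) -> str:
    """Prefer CUSTOM_CACHE_REPO when it is set and non-empty."""
    return environ.get("CUSTOM_CACHE_REPO") or DEFAULT_CACHE_REPO


custom = resolve_cache_repo({"CUSTOM_CACHE_REPO": "michaelbenayoun/cache_test"})
fallback = resolve_cache_repo({})
print(custom, fallback)
```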

Neuron export

Supports exporting PyTorch models to a serialized TorchScript module compiled by the Neuron compiler (neuron-cc or neuronx-cc) that can be used on AWS INF2 or INF1.

Example: Export the BERT model with static shapes:

optimum-cli export neuron --help
optimum-cli export neuron --model bert-base-uncased --sequence_length 128 --batch_size 16 bert_neuron/
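Static shapes mean that inputs at inference time are expected to match the exported sizes. As a hedged sketch of what that implies for a single sequence, the helper below pads or truncates token ids to a fixed length and builds the matching attention mask. `pad_to_static_length` is a hypothetical name and the token ids are illustrative; with a real tokenizer, `padding="max_length"`, `max_length=128` and `truncation=True` achieve the same effect.

```python
def pad_to_static_length(token_ids, sequence_length, pad_token_id=0):
    """Pad or truncate to exactly sequence_length; return (ids, attention_mask)."""
    ids = list(token_ids)[:sequence_length]
    mask = [1] * len(ids)
    padding = sequence_length - len(ids)
    return ids + [pad_token_id] * padding, mask + [0] * padding


# Illustrative token ids; 128 matches --sequence_length in the command above.
ids, mask = pad_to_static_length([101, 7592, 2088, 102], sequence_length=128)
print(len(ids), sum(mask))
```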

By default, on INF2, matmul operations are cast from fp32 to bf16, and on INF1, all operations are cast to bf16. Use --auto_cast to configure which operations are auto-cast and --auto_cast_type to set the data type used for auto-casting.

Example: Auto-cast all operations (this option can potentially lower precision/accuracy) to fp16 data type:

optimum-cli export neuron --model bert-base-uncased --auto_cast all --auto_cast_type fp16 bert_neuron/
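To give a feel for the precision trade-off mentioned above, here is a small standard-library sketch comparing bf16 and fp16. It is independent of the Neuron compiler: the bf16 conversion simply truncates the lower 16 bits of a float32, an assumption made for simplicity (hardware typically rounds instead), and both helper names are hypothetical.

```python
import struct


def to_bf16_truncate(x: float) -> float:
    """Approximate bf16 by keeping only the top 16 bits of the float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def to_fp16(x: float) -> float:
    """Round-trip through IEEE half precision; values out of range become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf")


small_step = 1.0 + 2 ** -10  # needs 10 mantissa bits
print(to_bf16_truncate(small_step), to_fp16(small_step))
print(to_bf16_truncate(70000.0), to_fp16(70000.0))
```

The first line shows fp16 keeping a small increment that bf16 drops, and the second shows bf16 representing a magnitude that overflows fp16: bf16 trades mantissa bits for the exponent range of fp32.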

13 Mar 14:06

michaelbenayoun

v0.0.1

13d792c

v0.0.1: Training on AWS Trainium

The following architectures can be trained on AWS Trainium instances (trn1.2xlarge and trn1.32xlarge):

ALBERT
BERT
DistilBERT
RoBERTa
XLM-RoBERTa
CamemBERT
Electra
GPT-2
GPT-Neo
MarianMT
T5
BART
ViT

Training examples for many tasks are provided here.
