
Releases: huggingface/optimum-neuron

v0.0.7: Stable diffusion, `transformers` pipeline and cache fix

05 Jul 12:39

Stable diffusion

Supports Stable Diffusion compilation with neuronx-cc for inference on Inf2 / Trn1 instances.

The components exported from StableDiffusionPipeline are:

  • CLIP text encoder
  • VAE decoder
  • UNet
  • VAE_post_quant_conv

The export can be done with optimum-cli as follows:

optimum-cli export neuron --model stabilityai/stable-diffusion-2-1-base --task stable-diffusion --batch_size 1 --num_channels 4 --height 64 --width 64 --sequence_length 32 sd_neuron/
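
The command writes one compiled artifact per component listed above into sd_neuron/. A quick way to check that all four components were produced is to list the output directory; a minimal sketch (the exact file layout depends on the exporter version):

from pathlib import Path

# Print every file the exporter wrote, grouped by component subdirectory.
for artifact in sorted(Path("sd_neuron").rglob("*")):
    if artifact.is_file():
        print(artifact)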

Relevant PR: #101
For a more detailed guide, see: Exporting stable diffusion to neuron

transformers pipeline support

Pipelines running on Inferentia instances are now supported.

They can be used with an on-the-fly export as follows:

from optimum.neuron.pipelines import pipeline

clf = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english", export=True)
clf("Amazon is a great company")
# [{'label': 'POSITIVE', 'score': 0.9998538494110107}]

clf = pipeline("question-answering")
clf({"context": "This is a sample context", "question": "What is the context here?"})
# {'score': 0.4972594678401947, 'start': 8, 'end': 16, 'answer': 'a sample'}
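
Passing export=True compiles the model when the pipeline is created. To avoid recompiling on the next run, one option is to save the compiled model and tokenizer and reload them from disk later; a minimal sketch (the target directory name is arbitrary):

# Persist the compiled model next to its tokenizer for later reuse.
clf.model.save_pretrained("distilbert_neuron/")
clf.tokenizer.save_pretrained("distilbert_neuron/")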

Or with precompiled models as follows:

from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForQuestionAnswering, pipeline

tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

# Loading the PyTorch checkpoint and converting to the neuron format by providing export=True
model = NeuronModelForQuestionAnswering.from_pretrained(
    "deepset/roberta-base-squad2",
    export=True
)

neuron_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
question = "What's my name?"
context = "My name is Philipp and I live in Nuremberg."

pred = neuron_qa(question=question, context=context)
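
The prediction follows the standard question-answering output format, e.g.:

print(pred["answer"])  # e.g. "Philipp"
print(pred["score"])   # confidence score of the extracted span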

Relevant PR: #107

Cache repo fix

The cache repo system was broken starting from Neuron 2.11.
This release fixes it; the relevant PR is #119.

v0.0.6: Patch release

26 Jun 16:35

Introduces a fix for #109 (#113).

v0.0.5: NeuronModel classes and generation methods during training

23 Jun 16:07

NeuronModel classes

NeuronModel classes allow you to run inference on Inf1 and Inf2 instances while preserving the Python interface you are used to from Transformers' auto model classes.

Example:

from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained(
    "optimum/distilbert-base-uncased-finetuned-sst-2-english-neuronx"
)
model = NeuronModelForSequenceClassification.from_pretrained(
    "optimum/distilbert-base-uncased-finetuned-sst-2-english-neuronx"
)

inputs = tokenizer("Hamilton is considered to be the best musical of human history.", return_tensors="pt")

outputs = model(**inputs)
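
The outputs mirror Transformers' model outputs, so the usual post-processing applies. A minimal sketch, assuming the model config carries the standard id2label mapping:

# Map the highest logit to its human-readable label.
predicted_id = outputs.logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])  # e.g. "POSITIVE"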

Supported tasks are:

  • Feature extraction
  • Masked language modeling
  • Text classification
  • Token classification
  • Question answering
  • Multiple choice

Relevant PR: #45
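
Each of the tasks above maps to a dedicated class following Transformers' auto-model naming; a sketch of the correspondence:

from optimum.neuron import (
    NeuronModelForFeatureExtraction,       # feature extraction
    NeuronModelForMaskedLM,                # masked language modeling
    NeuronModelForSequenceClassification,  # text classification
    NeuronModelForTokenClassification,     # token classification
    NeuronModelForQuestionAnswering,       # question answering
    NeuronModelForMultipleChoice,          # multiple choice
)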

Generation methods

Two generation methods are now supported:

  • Greedy decoding (#70)
  • Beam search (#93)

This allows you to perform evaluation with generation while training decoder and seq2seq models.
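
Both strategies go through the familiar generate() API and are selected via num_beams. A minimal sketch using plain Transformers to illustrate the two decoding strategies (the model choice is an arbitrary placeholder):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
inputs = tokenizer("translate English to German: Hello, world!", return_tensors="pt")

# Greedy decoding: pick the single most likely token at each step (num_beams=1, the default).
greedy_ids = model.generate(**inputs, max_new_tokens=20)

# Beam search: keep num_beams candidate sequences alive and return the best one.
beam_ids = model.generate(**inputs, max_new_tokens=20, num_beams=4)
print(tokenizer.decode(beam_ids[0], skip_special_tokens=True))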

Misc

The Optimum CLI now provides two new commands to help manage the cache:

  • optimum-cli neuron cache list: To list a remote cache repo on the Hugging Face Hub (#85)
  • optimum-cli neuron cache add: To add compilation files related to a model to a remote cache repo on the Hugging Face Hub (#51)

v0.0.4: Patch release for Neuron installation

02 Jun 12:05

optimum-cli neuron cache command line

The optimum-cli now provides two commands to work with the Trainium cache:

  • Cache creation:
optimum-cli neuron cache create
  • Cache setting:
optimum-cli neuron cache set

Documentation

  • New Trainium model cache documentation page

v0.0.3: Patch release for the `huggingface_hub` library version

26 Apr 08:31

Pins the version of the huggingface_hub library to be greater than or equal to 0.14.0.
Should fix errors related to #41.

v0.0.2: Compilation caching system and inference with Inferentia

25 Apr 12:22

Compilation caching system

Since compiling models before being able to train them can be a real bottleneck (for example, on small datasets, compilation time is longer than training time), we introduce a caching system directly connected to the Hugging Face Hub.

Before starting compilation, the TrainiumTrainer checks whether the needed compilation files are on the Hub and, if so, fetches them, saving the user from having to do that themselves.

Custom cache repo

Since each user might want their own cache repo, to push to it and/or keep it private, we offer the possibility to do so via the CUSTOM_CACHE_REPO environment variable:

CUSTOM_CACHE_REPO=michaelbenayoun/cache_test python train.py
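
The variable can equally be set from Python, which can be convenient in notebooks, assuming it is set before the trainer is created (a minimal sketch; the repo name is the same placeholder as above):

import os

# Set before creating the trainer so the cache lookup picks it up.
os.environ["CUSTOM_CACHE_REPO"] = "michaelbenayoun/cache_test"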

Neuron export

Adds support for exporting PyTorch models to serialized TorchScript modules compiled by the Neuron compiler (neuron-cc or neuronx-cc) that can be used on AWS Inf2 or Inf1 instances.

Example: Export the BERT model with static shapes:

optimum-cli export neuron --help
optimum-cli export neuron --model bert-base-uncased --sequence_length 128 --batch_size 16 bert_neuron/

By default, on Inf2, matmul operations are cast from fp32 to bf16, and on Inf1, all operations are cast to bf16. Use --auto_cast to configure which operations are auto-cast and --auto_cast_type to define the data type they are cast to.

Example: Auto-cast all operations to the fp16 data type (this can potentially lower precision/accuracy):

optimum-cli export neuron --model bert-base-uncased --auto_cast all --auto_cast_type fp16 bert_neuron/
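
Since the exported artifact is a serialized TorchScript module, it can in principle be loaded back with torch.jit.load on an Inf2/Trn1 instance (via torch_neuronx; Inf1 uses torch_neuron instead). A minimal sketch, assuming a hypothetical file name inside bert_neuron/ and that the traced module takes input_ids and attention_mask positionally, padded to the static shapes used at export (including the batch size):

import torch
import torch_neuronx  # registers the Neuron runtime so the compiled module can execute
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Pad to the static sequence_length used at export time; the static batch size
# from the export command must be matched as well.
inputs = tokenizer("Hello, world!", padding="max_length", max_length=128, return_tensors="pt")

# Hypothetical file name; inspect bert_neuron/ for the actual artifact.
model = torch.jit.load("bert_neuron/model.neuron")
outputs = model(inputs["input_ids"], inputs["attention_mask"])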

v0.0.1: Training on AWS Trainium

13 Mar 14:06

The following architectures can be trained on AWS Trainium instances (trn1.2xlarge and trn1.32xlarge):

  • ALBERT
  • BERT
  • DistilBERT
  • RoBERTa
  • XLM-RoBERTa
  • CamemBERT
  • Electra
  • GPT-2
  • GPT-Neo
  • MarianMT
  • T5
  • BART
  • ViT

Training examples for many tasks are provided here.