Releases: huggingface/transformers
Patch release: v4.33.3
Patch release: v4.33.2
Falcon, Code Llama, ViTDet, DINO v2, VITS
Falcon
Falcon is a class of causal decoder-only models built by TII. The largest Falcon checkpoints have been trained on >=1T tokens of text, with a particular emphasis on the RefinedWeb corpus. They are made available under the Apache 2.0 license.
Falcon’s architecture is modern and optimized for inference, with multi-query attention and support for efficient attention variants like FlashAttention. Both ‘base’ models, trained only as causal language models, and ‘instruct’ models that have received further fine-tuning are available.
- Falcon port #24523 by @Rocketknight1
- Falcon: Add RoPE scaling by @gante in #25878
- Add proper Falcon docs and conversion script by @Rocketknight1 in #25954
- Put Falcon back by @LysandreJik in #25960
- [Falcon] Remove SDPA for falcon to support earlier versions of PyTorch (< 2.0) by @younesbelkada in #25947
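To illustrate what multi-query attention changes, here is a minimal NumPy sketch. It is not the Falcon implementation in transformers. In multi-query attention, all query heads share a single key/value head, which shrinks the key/value cache kept during generation. The function name `grouped_query_attention` and the `(heads, seq_len, head_dim)` layout are assumptions for illustration. With a number of key/value heads between 1 and the number of query heads, the same code gives grouped-query attention, the variant Llama 2 adds.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def grouped_query_attention(q, k, v):
    """Causal attention where several query heads share each key/value head.

    q: (n_heads, seq_len, head_dim); k, v: (n_kv_heads, seq_len, head_dim).
    n_kv_heads == 1 is multi-query attention; n_kv_heads == n_heads is ordinary multi-head attention.
    """
    n_heads, seq_len, head_dim = q.shape
    n_kv_heads = k.shape[0]
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be a multiple of n_kv_heads")
    group = n_heads // n_kv_heads
    # Repeat each K/V head for the `group` consecutive query heads that share it.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    # Causal mask: token i may only attend to tokens 0..i.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), 1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v


rng = np.random.default_rng(0)
q = rng.normal(size=(8, 5, 16))  # 8 query heads
k = rng.normal(size=(1, 5, 16))  # a single shared key head (multi-query)
v = rng.normal(size=(1, 5, 16))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (8, 5, 16)
```

In this toy configuration, caching one key/value head instead of eight makes the key/value cache eight times smaller. That reduction is where the inference benefit comes from.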
Code Llama
Code Llama is a family of large language models for code based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks.
- [CodeLlama] Add support for CodeLlama by @ArthurZucker in #25740
- [CodeLlama] Fix CI by @ArthurZucker in #25890
ViTDet
ViTDet reuses the ViT model architecture, adapted to object detection.
- Add ViTDet by @NielsRogge in #25524
DINO v2
DINO v2 is the next iteration of the DINO model. It is added as a backbone class, allowing it to be re-used in downstream models.
- [DINOv2] Add backbone class by @NielsRogge in #25520
VITS
VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is an end-to-end speech synthesis model that predicts a speech waveform conditional on an input text sequence. It is a conditional variational autoencoder (VAE) composed of a posterior encoder, a decoder, and a conditional prior.
Breaking changes:
- 🚨🚨🚨 [Refactor] Move third-party related utility files into integrations/ folder 🚨🚨🚨 by @younesbelkada in #25599
Moves all utility files related to third-party libraries (outside the HF ecosystem) into integrations/ instead of keeping them directly in transformers.
To keep your existing code working, update your imports as follows:
- from transformers.deepspeed import HfDeepSpeedConfig
+ from transformers.integrations import HfDeepSpeedConfig
Bugfixes and improvements
- [DOCS] MusicGen Docs Update by @xNul in #25510
- [MINOR:TYPO] by @cakiki in #25646
- Pass the proper token to PEFT integration in auto classes by @sgugger in #25649
- Put IDEFICS in the right section of the doc by @sgugger in #25650
- TF 2.14 compatibility by @Rocketknight1 in #25630
- Fix bloom add prefix space by @ArthurZucker in #25652
- removing unnecessary extra parameter by @rafaelpadilla in #25643
- Adds TRANSFORMERS_TEST_BACKEND by @vvvm23 in #25655
- stringify config by @AleksanderWWW in #25637
- Add input_embeds functionality to gpt_neo Causal LM by @gaasher in #25659
- Update doc toctree by @ydshieh in #25661
- Add Llama2 resources by @wonhyeongseo in #25531
- [SPM] Patch spm Llama and T5 by @ArthurZucker in #25656
- [GPTNeo] Add input_embeds functionality to gpt_neo Causal LM by @ArthurZucker in #25664
- fix wrong path in some doc by @ydshieh in #25658
- Remove utils/documentation_tests.txt by @ydshieh in #25680
- Prevent Dynamo graph fragmentation in GPTNeoX with torch.baddbmm fix by @norabelrose in #24941
- ⚠️ [CLAP] Fix dtype of logit scales in init by @sanchit-gandhi in #25682
- Sets the stalebot to 10 AM CEST by @LysandreJik in #25678
- Fix pad_token check condition by @ydshieh in #25685
- [DOCS] Added docstring example for EpsilonLogitsWarper #24783 by @sanjeevk-os in #25378
- correct resume training steps number in progress bar by @pphuc25 in #25691
- Generate: general test for decoder-only generation from inputs_embeds by @gante in #25687
- Fix typo in configuration_gpt2.py by @susnato in #25676
- fix ram efficient fsdp init by @pacman100 in #25686
- [LlamaTokenizer] make unk_token_length a property by @ArthurZucker in #25689
- Update list of persons to tag by @sgugger in #25708
- docs: Resolve typos in warning text by @tomaarsen in #25711
- Fix failing test_batch_generation for bloom by @ydshieh in #25718
- [PEFT] Fix peft version by @younesbelkada in #25710
- Fix number of minimal calls to the Hub with peft integration by @sgugger in #25715
- [AutoGPTQ] Add correct installation of GPTQ library + fix slow tests by @younesbelkada in #25713
- Generate: nudge towards do_sample=False when temperature=0.0 by @gante in #25722
- [from_pretrained] Simpler code for peft by @ArthurZucker in #25726
- [idefics] idefics-9b test use 4bit quant by @stas00 in #25734
- ImageProcessor - check if input pixel values between 0-255 by @amyeroberts in #25688
- [from_pretrained] Fix failing PEFT tests by @younesbelkada in #25733
- [ASR Pipe Test] Fix CTC timestamps error message by @sanchit-gandhi in #25727
- 🌐 [i18n-KO] Translated visual_question_answering.md to Korean by @wonhyeongseo in #25679
- [PEFT] Fix PeftConfig save pretrained when calling add_adapter by @younesbelkada in #25738
- fixed typo in speech encoder decoder doc by @asusevski in #25745
- Add FlaxCLIPTextModelWithProjection by @pcuenca in #25254
- Generate: add missing logits processors docs by @gante in #25653
- [DOCS] Add example for HammingDiversityLogitsProcessor by @jessthebp in #25481
- Generate: logits processors are doctested and fix broken doctests by @gante in #25692
- [CLAP] Fix logit scales dtype for fp16 by @sanchit-gandhi in #25754
- [Sentencepiece] make sure legacy do not require protobuf by @ArthurZucker in #25684
- fix encoder hook by @SunMarc in #25735
- Docs: fix indentation in HammingDiversityLogitsProcessor by @gante in #25756
- Add type hints for several pytorch models (batch-3) by @nablabits in #25705
- Correct attention mask dtype for Flax GPT2 by @liutianlin0121 in #25636
- fix a typo in docstring by @statelesshz in #25759
- [idefics] small fixes by @stas00 in #25764
- Add docstrings and fix VIVIT examples by @Geometrein in #25628
- [LlamaFamily] add a tip about dtype by @ArthurZucker in #25794
- Add type hints for several pytorch models (batch-2) by @nablabits in #25557
- Add type hints for pytorch models (final batch) by @nablabits in #25750
- Add type hints for several pytorch models (batch-4) by @nablabits in #25749
- [idefics] fix vision's hidden_act by @stas00 in #25787
- Arde/fsdp activation checkpointing by @arde171 in #25771
- Fix incorrect Boolean value in deepspeed example by @tmm1 in #25788
- fixing name position_embeddings to object_queries by @Lorenzobattistela in #24652
- Resolving Attribute error when using the FSDP ram efficient feature by @pacman100 in #25820
- [Docs] More clarifications on BT + FA by @younesbelkada in #25823
- fix register by @zspo in #25779
- Minor wording changes for Code Llama by @osanseviero in #25815
- [LlamaTokenizer] tokenize nits. by @ArthurZucker in #25793
- fix warning trigger for embed_positions when loading xglm by @MattYoon in #25798
- 🌐 [i18n-KO] Translated peft.md to Korean by @nuatmochoi in #25706
- 🌐 [i18n-KO] model_memory_anatomy.md to Korean by @mjk0618 in #25755
- Error with checking args.eval_accumulation_steps to gather tensors by @chaumng in #25819
- Tests: detect lines removed from "utils/not_doctested.txt" and doctest ALL generation files by @gante in #25763
- 🌐 [i18n-KO] Translated add_new_pipeline.md to Korean by @heuristicwave in #25498
- 🌐 [i18n-KO] Translated community.md to Korean by @sim-so in #25674
- 🤦update warning to If you want to use the new behaviour, set `legacy=… by @ArthurZucker in #25833
- update remaining Pop2Piano checkpoints by @susnato in #25827
- [AutoTokenizer] Add data2vec to mapping by @sanchit-gandhi in #25835
- MaskFormer,Mask2former - reduce memory load by @amyeroberts in #25741
- Support loading base64 images in pipelines by @InventivetalentDev in #25633
- Update README.md by @NinoRisteski in #25834
- Generate: models with custom generate() return True in can_generate() by @gante in #25838
- Update README.md by @NinoRisteski in #25832
- minor typo fix in PeftAdapterMixin docs by @tmm1 in #25829
- Add flax installation in daily doctest workflow by @ydshieh in #25860
- Add Blip2 model in VQA pipeline by @jpizarrom in #25532
- Remote tools are turned off by @LysandreJik in #25867
- Fix imports by @ydshieh in #25869
- fix max_memory for bnb by @SunMarc in #25842
- Docs: fix example failing doctest in generation_strategies.md by @gante in #25874
- pin pandas==2.0.3 by @ydshieh in #25875
- Reduce CI output by @ydshieh in #25876
- [ViTDet] Fix doc tests by @NielsRogge in #25880
- For xla tensors, use an alternative way to get a unique id by @qihqi in #25802
- fix ds z3 checkpointing when stage3_gather_16bit_weights_on_model_save=False by @pacman100 in #25817
- Modify efficient GPU training doc with now-available adamw_bnb_8bit optimizer by @veezbo in #25807
- [TokenizerFast] can_save_slow_tokenizer as a property for when vocab_file's folder was removed by @ArthurZucker in #25626
- Save image_processor while saving pipeline (ImageSegmentationPipeline) by @raghavanone in #25884
- [InstructBlip] FINAL Fix instructblip test by @younesbelkada in #25887
- Add type hints for tf models batch 1 by @nablabits in #25853
- Update setup.py by @ydshieh in #25893
- Smarter check for is_tensor by @sgugger in #25871
- remove torch_dtype override by @SunMarc in #25894
- fix FSDP model resume optimizer & schedu...
Patch release: v4.32.1
Patch release including several patches from v4.31.0, listed below:
IDEFICS, GPTQ Quantization
IDEFICS
The IDEFICS model was proposed in OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh.
IDEFICS is the first open state-of-the-art visual language model at the 80B scale!
The model accepts arbitrary sequences of image and text and produces text, similarly to a multimodal ChatGPT.
Blogpost: hf.co/blog/idefics
Playground: HuggingFaceM4/idefics_playground
MPT
MPT has been added and is now officially supported within Transformers. The repositories from MosaicML have been updated to work best with the model integration within Transformers.
- [MPT] Add MosaicML's MPT model to transformers by @ArthurZucker & @younesbelkada in #24629
GPTQ Integration
GPTQ quantization is now supported in Transformers through the optimum library. The backend relies on the auto_gptq library, from which we use the GPTQ and QuantLinear classes.
See below for an example of the API, quantizing a model using the new GPTQConfig configuration utility.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig
model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# quantize to 4 bits, using the c4 dataset for calibration
config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer, group_size=128, desc_act=False)
# works also with device_map (cpu offload works but not disk offload)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, quantization_config=config)
Most models under the TheBloke namespace with the suffix GPTQ should be supported. For example, to load the GPTQ-quantized model TheBloke/Llama-2-13B-chat-GPTQ, simply run the following (after installing the latest optimum and auto-gptq libraries):
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "TheBloke/Llama-2-13B-chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
For more information about this feature, we recommend taking a look at the following announcement blogpost: https://huggingface.co/blog/gptq-integration
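The bits=4 and group_size=128 arguments above mean that each weight is stored as a 4-bit integer. Within each row, every 128 consecutive weights share one scale and, in the asymmetric variant sketched here, one zero point. The NumPy sketch below shows only that storage format and uses plain round-to-nearest quantization. It is not GPTQ. GPTQ additionally uses calibration data (the dataset argument) to adjust the not-yet-quantized weights and compensate for quantization error. The function names are hypothetical.

```python
import numpy as np


def quantize_groupwise(w, bits=4, group_size=128):
    """Round-to-nearest asymmetric quantization of one weight row, one scale/zero point per group.

    Assumes len(w) is a multiple of group_size. Values fit in `bits` bits but are stored
    one per uint8 here for simplicity (real kernels pack several per byte).
    """
    qmax = 2**bits - 1
    groups = w.reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant groups
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(groups / scale) + zero, 0, qmax).astype(np.uint8)
    return q, scale, zero


def dequantize_groupwise(q, scale, zero):
    return ((q.astype(np.float64) - zero) * scale).reshape(-1)


rng = np.random.default_rng(0)
w = rng.normal(size=512)  # one row of a weight matrix
q, scale, zero = quantize_groupwise(w, bits=4, group_size=128)
w_hat = dequantize_groupwise(q, scale, zero)
print(q.shape, scale.shape)  # (4, 128) (4, 1)
print(np.abs(w - w_hat).max())  # at most half a quantization step of the coarsest group
```

Smaller groups give each scale fewer weights to cover, so reconstruction error drops, but more scales and zero points must be stored.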
Pipelines
A new pipeline, dedicated to text-to-audio and text-to-speech models, has been added to Transformers. It currently supports the 3 text-to-audio models integrated into transformers: SpeechT5ForTextToSpeech, MusicGen and Bark.
See below for an example:
from transformers import pipeline
synthesiser = pipeline(model="suno/bark")
output = synthesiser("Hey it's HuggingFace on the phone!")
audio = output["audio"]
sampling_rate = output["sampling_rate"]
Classifier-Free Guidance decoding
Classifier-Free Guidance decoding is a text generation technique developed by EleutherAI, announced in this paper. With this technique, you can increase prompt adherence in generation. You can also set it up with negative prompts, ensuring your generation doesn't go in specific directions. See its docs for usage instructions.
- add CFG for .generate() by @Vermeille in #24654
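The core of classifier-free guidance fits in one line. Run the model once with the prompt and once without it (or with a negative prompt in place of the unconditional input). Then extrapolate from the unconditional next-token distribution towards the conditional one. The NumPy sketch below works on log-probabilities. The function name and the exact normalization are assumptions for illustration, not the logits processor shipped in transformers.

```python
import numpy as np


def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def cfg_scores(cond_logits, uncond_logits, guidance_scale):
    """Classifier-free guidance on next-token scores.

    cond_logits: logits from the prompted forward pass.
    uncond_logits: logits without the prompt, or with a negative prompt instead.
    guidance_scale == 1 reproduces ordinary decoding; larger values move further
    away from the unconditional distribution.
    """
    cond = log_softmax(cond_logits)
    uncond = log_softmax(uncond_logits)
    return uncond + guidance_scale * (cond - uncond)


cond = np.array([2.0, 1.9, 0.0])    # with the prompt, token 1 is nearly as likely as token 0...
uncond = np.array([2.0, 0.0, 0.0])  # ...but without the prompt it is much less likely
print(np.argmax(cfg_scores(cond, uncond, 1.0)))  # 0: plain greedy choice
print(np.argmax(cfg_scores(cond, uncond, 3.0)))  # 1: guidance favours the prompt-specific token
```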
Task guides
A new task guide going into Visual Question Answering has been added to Transformers.
- VQA task guide by @MKhalusova in #25244
Model deprecation
We continue the deprecation of models that was introduced in #24787.
By deprecating, we indicate that we will stop maintaining such models, but there is no intention of actually removing those models and breaking support for them (they might one day move into a separate repo/on the Hub, but we would still add the necessary imports to make sure backward compatibility stays). The main point is that we stop testing those models. The usage of the models drives this choice and aims to ease the burden on our CI so that it may be used to focus on more critical aspects of the library.
- Deprecate unused OpenLlama architecture by @tomaarsen in #24922
Translation Efforts
There are ongoing efforts to translate the transformers documentation into other languages. These efforts are driven by groups independent of Hugging Face, and their work is greatly appreciated, as it further lowers the barrier to entry to ML and Transformers.
If you'd like to kickstart such an effort or help out on an existing one, please feel free to reach out by opening an issue.
- 🌐 [i18n-KO] Translated tasks/document_question_answering.md to Korean by @jungnerd in #24588
- 🌐 [i18n-KO] Fixed Korean and English quicktour.md by @wonhyeongseo in #24664
- 🌐 [i18n-KO] Updated Korean serialization.md by @wonhyeongseo in #24686
- 🌐 [i18n-KO] Translated performance.md to Korean by @augustinLib in #24883
- 🌐 [i18n-KO] Translated testing.md to Korean by @Sunmin0520 in #24900
- 🌐 [i18n-KO] Translated perf_train_cpu.md to Korean by @seank021 in #24911
- 🌐 [i18n-KO] Translated <tf_xla>.md to Korean by @54data in #24904
- 🌐 [i18n-KO] Translated perf_hardware.md to Korean by @augustinLib in #24966
- 🌐 [i18n-KO] Translated hpo_train.md to Korean by @harheem in #24968
- 🌐 [i18n-KO] Translated perf_infer_cpu.md to Korean by @junejae in #24920
- 🌐 [i18n-KO] Translated pipeline_webserver.md to Korean by @kihoon71 in #24828
- 🌐 [i18n-KO] Translated transformers_agents.md to Korean by @sim-so in #24881
- 🌐 [i18n-KO] Translated perf_infer_gpu_many.md to Korean by @heuristicwave in #24943
- 🌐 [i18n-KO] Translated perf_infer_gpu_one.md to Korean by @eenzeenee in #24978
- 🌐 [i18n-KO] Translated add_tensorflow_model.md to Korean by @keonju2 in #25017
- 🌐 [i18n-KO] Translated perf_train_cpu_many.md to Korean by @nuatmochoi in #24923
- 🌐 [i18n-KO] Translated add_new_model.md to Korean by @mjk0618 in #24957
- 🌐 [i18n-KO] Translated model_summary.md to Korean by @0525hhgus in #24625
- 🌐 [i18n-KO] Translated philosophy.md to Korean by @TaeYupNoh in #25010
- 🌐 [i18n-KO] Translated perf_train_tpu_tf.md to Korean by @0525hhgus in #25433
- 🌐 [i18n-KO] Translated docs: ko: pr_checks.md to Korean by @sronger in #24987
Explicit input data format for image processing
Addition of an input_data_format argument to image transforms and ImageProcessor methods, allowing the user to explicitly set the data format of the images being processed. This enables processing images with a non-standard number of channels (e.g. 4) and avoids errors that occurred when the data format was inferred but the channel dimension was ambiguous.
import numpy as np
from transformers import ViTImageProcessor
img = np.random.randint(0, 256, (4, 6, 3))
image_processor = ViTImageProcessor()
inputs = image_processor(img, image_mean=0, image_std=1, input_data_format="channels_first")
- Input data format by @amyeroberts in #25464
- Add input_data_format argument, image transforms by @amyeroberts in #25462
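A toy heuristic shows why an explicit format helps. The sketch below is hypothetical and is not the inference logic in transformers. It guesses the channel axis from the shape alone and misreads the 4-channel, channels-first image from the example above. It also cannot resolve a shape such as (3, 5, 3) at all. Passing the format explicitly, as input_data_format does, removes the guesswork.

```python
import numpy as np


def guess_channel_format(image, num_channels=(1, 3)):
    """Hypothetical shape-only heuristic for the channel axis of a 3-D image."""
    first = image.shape[0] in num_channels
    last = image.shape[-1] in num_channels
    if first and last:
        return "ambiguous"
    if first:
        return "channels_first"
    if last:
        return "channels_last"
    return "unknown"


def to_channels_last(image, input_data_format):
    """With the format stated explicitly, no guessing is needed."""
    if input_data_format == "channels_first":
        return np.moveaxis(image, 0, -1)
    if input_data_format == "channels_last":
        return image
    raise ValueError(f"unknown input_data_format: {input_data_format!r}")


img = np.zeros((4, 6, 3))  # 4 channels, height 6, width 3, as in the example above
print(guess_channel_format(img))  # channels_last: the trailing 3 is mistaken for RGB
print(guess_channel_format(np.zeros((3, 5, 3))))  # ambiguous
print(to_channels_last(img, "channels_first").shape)  # (6, 3, 4)
```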
Documentation clarification about efficient inference through torch.scaled_dot_product_attention & Flash Attention
Many users are not aware that the torch.scaled_dot_product_attention method from torch can be forced to dispatch to Flash Attention kernels. This leads to considerable speedups and memory savings, and is also compatible with quantized models. We decided to make this explicit in the documentation.
- [Docs / BetterTransformer ] Added more details about flash attention + SDPA : #25265
In a nutshell, one can just run:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m").to("cuda")
# convert the model to BetterTransformer
model.to_bettertransformer()
input_text = "Hello my dog is cute and"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
# force dispatch to the Flash Attention kernels
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
This enables Flash Attention in the model. Note, however, that this feature does not support padding yet.
FSDP and DeepSpeed Changes
Users will no longer encounter CPU RAM OOM errors when using FSDP to train very large models in multi-GPU or multi-node multi-GPU settings.
Users no longer have to pass fsdp_transformer_layer_cls_to_wrap, as the code now uses _no_split_modules by default, which is available for most popular models. DeepSpeed Z3 init now works properly with the Accelerate launcher + Trainer.
- add util for ram efficient loading of model when using fsdp by @pacman100 in #25107
- fix fsdp checkpointing issues by @pacman100 in #24926
- fsdp fixes and enhancements by @pacman100 in #24980
- fix deepspeed load best model at end when the model gets sharded by @pacman100 in #25057
- resolving zero3 init when using accelerate config with Trainer by @pacman100 in #25227
- fix z3 init when using accelerate launcher by @pacman100 in #25589
Breaking changes
Default optimizer in the Trainer class
The defaul...
v4.31.0: Llama v2, MusicGen, Bark, MMS, EnCodec, InstructBLIP, Umt5, MRa, vIvIt
New models
Llama v2
Llama 2 was proposed in Llama 2: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron et al. It builds upon the Llama architecture, adding Grouped Query Attention for efficient inference.
- Add support for Llama 2 by @ArthurZucker in #24891
Musicgen
The MusicGen model was proposed in the paper Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
MusicGen is a single stage auto-regressive Transformer model capable of generating high-quality music samples conditioned on text descriptions or audio prompts. The text descriptions are passed through a frozen text encoder model to obtain a sequence of hidden-state representations. MusicGen is then trained to predict discrete audio tokens, or audio codes, conditioned on these hidden-states. These audio tokens are then decoded using an audio compression model, such as EnCodec, to recover the audio waveform.
Through an efficient token interleaving pattern, MusicGen does not require a self-supervised semantic representation of the text/audio prompts, thus eliminating the need to cascade multiple models to predict a set of codebooks (e.g. hierarchically or upsampling). Instead, it is able to generate all the codebooks in a single forward pass.
- Add Musicgen by @sanchit-gandhi in #24109
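The paper compares several ways of interleaving the codebooks. The "delay" pattern sketched below is, to our understanding, the one MusicGen uses. It offsets codebook k by k time steps, so that a single forward pass can emit one token per codebook. Each finer codebook can still condition on the coarser codebooks of the same frame, which were produced at earlier steps. This is a minimal NumPy sketch; the function names and the pad_id of -1 are hypothetical.

```python
import numpy as np


def apply_delay_pattern(codes, pad_id=-1):
    """Offset codebook k by k steps.

    codes: (num_codebooks, num_frames) audio token ids.
    Returns (num_codebooks, num_frames + num_codebooks - 1), filled with pad_id where empty.
    At step t the model emits one token per codebook: codebook k for frame t - k.
    """
    num_codebooks, num_frames = codes.shape
    out = np.full((num_codebooks, num_frames + num_codebooks - 1), pad_id, dtype=codes.dtype)
    for k in range(num_codebooks):
        out[k, k:k + num_frames] = codes[k]
    return out


def undo_delay_pattern(delayed, num_codebooks):
    num_frames = delayed.shape[1] - num_codebooks + 1
    return np.stack([delayed[k, k:k + num_frames] for k in range(num_codebooks)])


codes = np.arange(12).reshape(4, 3)  # 4 codebooks, 3 frames
delayed = apply_delay_pattern(codes)
print(delayed)
```

With this pattern, 3 frames with 4 codebooks take 3 + 4 - 1 = 6 decoding steps. A fully flattened sequence would need 3 × 4 = 12.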
Bark
Bark is a transformer-based text-to-speech model proposed by Suno AI in suno-ai/bark.
MMS
The MMS model was proposed in Scaling Speech Technology to 1,000+ Languages by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.
- Add MMS CTC Fine-Tuning by @patrickvonplaten in #24281
EnCodec
The EnCodec neural codec model was proposed in High Fidelity Neural Audio Compression by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi.
InstructBLIP
The InstructBLIP model was proposed in InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. InstructBLIP leverages the BLIP-2 architecture for visual instruction tuning.
- Add InstructBLIP by @NielsRogge in #23460
Umt5
The UMT5 model was proposed in UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.
- [Umt5] Add google's umt5 to transformers by @ArthurZucker in #24477
MRA
The MRA model was proposed in Multi Resolution Analysis (MRA) for Approximate Self-Attention by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, and Vikas Singh.
ViViT
The ViViT model was proposed in ViViT: A Video Vision Transformer by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid. The paper proposes one of the first successful sets of pure-transformer models for video understanding.
Python 3.7
The last version to support Python 3.7 was 4.30.x, as Python 3.7 reached end-of-life on June 27, 2023 and is no longer supported by the Python Software Foundation.
PyTorch 1.9
The last version to support PyTorch 1.9 was 4.30.x. As PyTorch 1.9 is more than 2 years old, and we're looking forward to using features available in PyTorch 1.10 and up, we do not support PyTorch 1.9 for v4.31 and up.
RoPE scaling
This PR adds RoPE scaling to the LLaMA and GPTNeoX families of models. It makes it possible to extrapolate beyond the original maximum sequence length (e.g. 2048 tokens on LLaMA) without fine-tuning. It offers two strategies:
- Linear scaling
- Dynamic NTK scaling
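Both strategies only change the rotation angles of the rotary embeddings. In transformers, the strategy is selected through the rope_scaling configuration entry, for example {"type": "dynamic", "factor": 2.0}. The NumPy sketch below computes the angles directly. The rope_angles helper and its arguments are illustrative. The dynamic formula mirrors, to our understanding, the one in the Llama implementation; treat it as a sketch rather than a reference.

```python
import numpy as np


def rope_angles(seq_len, dim, base=10000.0, scaling=None, factor=1.0, max_position_embeddings=2048):
    """Rotary-embedding angles, shape (seq_len, dim // 2).

    scaling=None: vanilla RoPE.
    scaling="linear": positions are divided by `factor`, so a sequence `factor` times
        longer is squeezed into the position range seen in training.
    scaling="dynamic": once seq_len exceeds max_position_embeddings, the rotary base is
        enlarged (NTK-aware), stretching low frequencies more than high ones.
    """
    if scaling not in (None, "linear", "dynamic"):
        raise ValueError(f"unknown scaling: {scaling!r}")
    positions = np.arange(seq_len, dtype=np.float64)
    if scaling == "linear":
        positions = positions / factor
    if scaling == "dynamic" and seq_len > max_position_embeddings:
        base = base * ((factor * seq_len / max_position_embeddings) - (factor - 1)) ** (dim / (dim - 2))
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, inv_freq)


vanilla = rope_angles(2048, 128)
linear = rope_angles(4096, 128, scaling="linear", factor=2.0)
print(np.allclose(linear[::2], vanilla))  # True: position 2p now looks like position p
```

Linear scaling compresses every frequency equally. Dynamic NTK scaling leaves short sequences untouched and changes the angles only once the input grows past the trained length.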
Agents
Tools now return a type that is specific to agents. This type can return a serialized version of itself (a string), that either points to a file on-disk or to the object's content. This should make interaction with text-based systems much simpler.
- Tool types by @LysandreJik in #24032
Tied weights load
Models with potentially tied weights dropped some keys from the state dict even when the weights were not tied. This has now been fixed, and more generally, loading a model with a state dict that doesn't match exactly should be a smoother experience in this release.
Whisper word-level timestamps
This PR adds a method of predicting timestamps at the word (or even token) level, by analyzing the cross-attentions and applying dynamic time warping.
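The alignment step can be sketched with plain dynamic time warping. Treat the cross-attention weights between output tokens and audio frames as a similarity matrix, find the cheapest monotonic path through its negation, and read off the first frame each token is aligned to. This is a simplified NumPy sketch. The actual implementation also does more before this step, for example selecting specific attention heads and filtering the weights. The 0.02 s frame duration assumes Whisper's 30-second window split into 1500 encoder frames.

```python
import numpy as np


def dtw_path(cost):
    """Cheapest monotonic path from (0, 0) to (n - 1, m - 1) through a (n, m) cost matrix."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the bottom-right corner, preferring diagonal moves on ties.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def token_start_times(attention, seconds_per_frame=0.02):
    """attention: (num_tokens, num_frames) cross-attention weights, higher = better aligned."""
    starts = {}
    for token, frame in dtw_path(-attention):
        starts.setdefault(token, frame * seconds_per_frame)
    return [starts[t] for t in range(attention.shape[0])]


attention = np.zeros((3, 6))
attention[0, 0:2] = 1.0  # token 0 attends to frames 0-1
attention[1, 2:4] = 1.0  # token 1 attends to frames 2-3
attention[2, 4:6] = 1.0  # token 2 attends to frames 4-5
print(token_start_times(attention))  # [0.0, 0.04, 0.08]
```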
Auto model addition
A new auto model, AutoModelForTextEncoding, has been added. It is to be used when you want to extract the text encoder from an encoder-decoder architecture.
- [AutoModel] Add AutoModelForTextEncoding by @sanchit-gandhi in #24305
Model deprecation
Transformers is growing a lot, and to ease the maintenance burden on our side, we have decided to deprecate models that are not used much. Those models will never actually disappear from the library, but we will stop testing them or accepting PRs modifying them.
The criterion used to identify models to deprecate was fewer than 1,000 unique downloads in the last 30 days for models that are at least one year old. The deprecated models are:
- BORT
- M-CTC-T
- MMBT
- RetriBERT
- TAPEX
- Trajectory Transformer
- VAN
Breaking changes
Fixes an issue with stripped spaces for the T5 family tokenizers. If this negatively impacts inference or training with your models, please let us know by opening an issue.
- ⚠️ ⚠️ [T5Tokenize] Fix T5 family tokenizers ⚠️ ⚠️ by @ArthurZucker in #24565
Bugfixes and improvements
- add trust_remote_code option to CLI download cmd by @radames in #24097
- Fix typo in Llama docstrings by @Kh4L in #24020
- Avoid GPT-2 daily CI job OOM (in TF tests) by @ydshieh in #24106
- [Llama] Update tokenization code to ensure parsing of the special tokens [core] by @ArthurZucker in #24042
- PLAM => PaLM by @xingener in #24129
- [bnb] Fix bnb config json serialization by @younesbelkada in #24137
- Correctly build models and import call_context for older TF versions by @Rocketknight1 in #24138
- Generate: PT's top_p enforces min_tokens_to_keep when it is 1 by @gante in #24111
- fix bugs with trainer by @pacman100 in #24134
- Fix TF Rag OOM issue by @ydshieh in #24122
- Fix SAM OOM issue on CI by @ydshieh in #24125
- Fix XGLM OOM on CI by @ydshieh in #24123
- [SAM] Fix sam slow test by @younesbelkada in #24140
- [LlamaTokenizerFast] Update documentation by @ArthurZucker in #24132
- [BlenderBotSmall] Update doc example by @ArthurZucker in #24092
- Fix Pipeline CI OOM issue by @ydshieh in #24124
- [documentation] grammatical fixes in image_classification.mdx by @LiamSwayne in #24141
- Fix typo in streamers.py by @freddiev4 in #24144
- [tests] fix bitsandbytes import issue by @stas00 in #24151
- Avoid OOM in doctest CI by @ydshieh in #24139
- Fix Wav2Vec2 CI OOM by @ydshieh in #24190
- Fix push to hub by @NielsRogge in #24187
- Change ProgressCallback to use dynamic_ncols=True by @gmlwns2000 in #24101
- [i18n]Translated "attention.mdx" to korean by @kihoon71 in #23878
- Generate: force caching on the main model, in assisted generation by @gante in #24177
- Fix device issue in OpenLlamaModelTest::test_model_parallelism by @ydshieh in #24195
- Update GPTNeoXLanguageGenerationTest by @ydshieh in #24193
- typo: fix typos in CONTRIBUTING.md and deepspeed.mdx by @zsj9509 in #24184
- Generate: detect special architectures when loaded from PEFT by @gante in #24198
- 🌐 [i18n-KO] Translated tasks_summary.mdx to Korean by @kihoon71 in #23977
- 🚨🚨🚨 Replace DataLoader logic for Accelerate in Trainer, remove unneeded tests 🚨🚨🚨 by @muellerzr in #24028
- Fix _load_pretrained_model by @SunMarc in #24200
- Fix steps bugs in no trainer examples by @Ethan-yt in #24197
- Skip RWKV test in past CI by @ydshieh in #24204
- Remove unnecessary aten::to overhead in llama by @fxmarty in #24203
- Update WhisperForAudioClassification doc example by @ydshieh in #24188
- Finish dataloader integration by @muellerzr in #24201
- Add the number of model test failures to slack CI report by @ydshieh in #24207
- fix: TextIteratorStreamer cannot work with pipeline by @yuanwu2017 in #23641
- Update (TF)SamModelIntegrationTest by @ydshieh in #24199
- Improving error message when using use_safetensors=True. by @Narsil in #24232
- Safely import pytest in testing_utils.py by @amyeroberts in #24241
- fix overflow when training mDeberta in fp16 by @sjrl in #24116
- deprecate use_mps_device by @pacman100 in #24239
- Tied params cleanup by @sgugger in #24211
...
v4.30.2: Patch release
- Fix push to hub by @NielsRogge in #24187
- Fix how we detect the TF package by @Rocketknight1 in #24255
v4.30.1 Patch release
- Fix bnb config json serialization in #24137 by @younesbelkada
- Correctly build models and import call_context for older TF versions in #24138 by @Rocketknight1
- Fix bugs with trainer in #24134 by @pacman100
v4.30.0: 100k, Agents improvements, Safetensors core dependency, Swiftformer, Autoformer, MobileViTv2, timm-as-a-backbone
100k
Transformers has just reached 100k stars on GitHub. To celebrate, we wanted to highlight 100 projects in the vicinity of transformers, and we decided to create an awesome-transformers page to do just that.
We accept PRs to add projects to the list!
- Top 100 by @LysandreJik in #22912
- Add LlamaIndex to awesome-transformers.md by @ravi03071991 in #23484
- add cleanlab to awesome-transformers tools list by @jwmueller in #23440
4-bit quantization and QLoRA
By leveraging the bitsandbytes library by @TimDettmers, we add 4-bit support to transformers models!
- 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by @TimDettmers in #23479
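The "LoRA" half of QLoRA can be sketched in a few lines of NumPy. The base weight stays frozen, and only two small matrices are trained. In QLoRA the frozen base weight is the 4-bit quantized one; that quantization is omitted here. The class name and the r/alpha defaults are illustrative assumptions, not the PEFT API.

```python
import numpy as np


class LoRALinear:
    """A frozen weight W plus a trainable low-rank update: y = x W^T + (alpha / r) x A^T B^T."""

    def __init__(self, weight, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        out_features, in_features = weight.shape
        self.weight = weight  # frozen; in QLoRA this base weight is kept in 4-bit form
        self.A = rng.normal(scale=0.01, size=(r, in_features))  # trainable
        self.B = np.zeros((out_features, r))  # trainable; zero init keeps the base outputs at step 0
        self.scaling = alpha / r

    def __call__(self, x):
        return x @ self.weight.T + self.scaling * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W = rng.normal(size=(64, 128))
layer = LoRALinear(W, r=8)
x = rng.normal(size=(2, 128))
print(np.allclose(layer(x), x @ W.T))  # True before any training
print(layer.num_trainable(), W.size)  # 1536 8192
```

Only 1,536 of the 8,192 + 1,536 parameters receive gradients here. Combined with a 4-bit base model, this is what makes fine-tuning large models on a single GPU practical.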
Agents
The Agents framework has been improved and continues to be stabilized. Among bug fixes, here are the important new features that were added:
- Local agent capabilities, to load a generative model directly from transformers instead of relying on APIs.
- Prompts are now hosted on the Hub, which means that anyone can fork the prompts and update them with their own, letting other community contributors re-use them.
- We add an AzureOpenAiAgent class to support Azure OpenAI agents.
- Add local agent by @sgugger in #23438
- Enable prompts on the Hub by @sgugger in #23662
- Add AzureOpenAiAgent by @sgugger in #24058
Safetensors
The safetensors library is a safe serialization framework for machine learning tensors. It has been audited and will become the default serialization framework for several organizations (Hugging Face, EleutherAI, Stability AI).
It has now become a core dependency of transformers.
New models
Swiftformer
The SwiftFormer paper introduces a novel efficient additive attention mechanism that effectively replaces the quadratic matrix multiplication operations in the self-attention computation with linear element-wise multiplications. A series of models called ‘SwiftFormer’ is built on this mechanism and achieves state-of-the-art performance in terms of both accuracy and mobile inference speed. Even the small variant achieves 78.5% top-1 ImageNet1K accuracy with only 0.8 ms latency on iPhone 14, making it more accurate and 2× faster than MobileViT-v2.
- Add swiftformer by @shehanmunasinghe in #22686
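Below is a simplified NumPy sketch of the idea, with random matrices standing in for learned weights. Each token gets a single attention score, and the scores pool the queries into one global query vector. That vector then interacts with the keys element-wise, so the cost grows linearly with the number of tokens and no token-by-token matrix is formed. The softmax normalization of the scores and the placement of the projection and residual are assumptions. See the paper or the transformers implementation for the precise layer.

```python
import numpy as np


def efficient_additive_attention(x, w_q, w_k, w_a, w_out):
    """Simplified efficient additive attention; x: (n_tokens, d). No (n x n) matrix is formed."""
    n, d = x.shape
    q = x @ w_q                        # (n, d) queries
    k = x @ w_k                        # (n, d) keys
    scores = (q @ w_a) / np.sqrt(d)    # (n,) one scalar per token
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()        # normalize over tokens (softmax here; an assumption)
    g = alpha @ q                      # (d,) a single global query
    # Element-wise product of the global query with every key, a linear map, and a residual.
    return (k * g) @ w_out + q


rng = np.random.default_rng(0)
d = 32
w_q = rng.normal(size=(d, d)) / np.sqrt(d)
w_k = rng.normal(size=(d, d)) / np.sqrt(d)
w_a = rng.normal(size=d)
w_out = rng.normal(size=(d, d)) / np.sqrt(d)
x = rng.normal(size=(1000, d))
out = efficient_additive_attention(x, w_q, w_k, w_a, w_out)
print(out.shape)  # (1000, 32)
```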
Autoformer
This model augments the Transformer as a deep decomposition architecture, which can progressively decompose the trend and seasonal components during the forecasting process.
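A moving average illustrates the decomposition: the smoothed series is taken as the trend, and the remainder as the seasonal part. The NumPy sketch below applies it to a raw 1-D series. To our understanding, Autoformer applies the same kind of block to hidden representations inside the encoder and decoder. The kernel size of 25 and the edge padding used here are assumptions.

```python
import numpy as np


def series_decomposition(x, kernel_size=25):
    """Split a 1-D series into (seasonal, trend) using a moving average.

    The ends are padded by repeating the first/last value so the trend keeps len(x).
    """
    pad_left = (kernel_size - 1) // 2
    pad_right = kernel_size - 1 - pad_left
    padded = np.concatenate([np.repeat(x[:1], pad_left), x, np.repeat(x[-1:], pad_right)])
    trend = np.convolve(padded, np.ones(kernel_size) / kernel_size, mode="valid")
    seasonal = x - trend
    return seasonal, trend


t = np.arange(200, dtype=float)
x = 0.05 * t + np.sin(2 * np.pi * t / 25)  # linear trend plus a period-25 cycle
seasonal, trend = series_decomposition(x, kernel_size=25)
# Away from the edges, the 25-step average cancels the cycle and recovers the linear trend.
print(np.allclose(trend[12:-12], 0.05 * t[12:-12]))  # True
```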
MobileViTv2
MobileViTV2 is the second version of MobileViT, constructed by replacing the multi-headed self-attention in MobileViT with separable self-attention.
- Add MobileViTv2 by @shehanmunasinghe in #22820
PerSAM
PerSAM proposes a minimal modification to SAM to allow DreamBooth-like personalization, making it possible to segment a concept in new images using just one example.
- Add PerSAM [bis] by @NielsRogge in #23659
Timm backbone
We add support for loading timm weights within the AutoBackbone API in transformers. timm models can be instantiated through the TimmBackbone class, and then used with any vision model that needs a backbone.
- Add TimmBackbone model by @amyeroberts in #22619
Image to text pipeline conditional support
We add conditional text generation to the image-to-text pipeline, allowing the model to continue generating from an initial text prompt conditioned on an image.
- [image-to-text pipeline] Add conditional text support + GIT by @NielsRogge in #23362
TensorFlow implementations
- Add TensorFlow implementation of EfficientFormer by @D-Roberts in #22620
Accelerate Migration
A major rework of the Trainer internals is underway, leveraging accelerate instead of redefining the same functionality in transformers. This should unify both frameworks and lead to increased interoperability and more efficient development.
- Smangrul/accelerate mp integrate by @pacman100 in #23148
- Smangrul/accelerate ddp integrate by @pacman100 in #23151
- fix trainer slow tests related to hyperparam search by @pacman100 in #24011
- remove the extra accelerator.prepare by @pacman100 in #23914
- move fsdp handling to accelerate by @pacman100 in #23158
- shift torch dynamo handling to accelerate by @pacman100 in #23168
- accelerate deepspeed and gradient accumulation integrate by @pacman100 in #23236
- fix executable batch size issue by @pacman100 in #24067
- fix accelerator prepare during eval only mode by @pacman100 in #24014
- reset accelerate env variables after each test by @pacman100 in #24107
- Fix translation no_trainer by @muellerzr in #23407
- Update error message when Accelerate isn't installed by @muellerzr in #23373
- Fix parallel mode check by @muellerzr in #23409
- Muellerzr fix deepspeed by @muellerzr in #23657
- Update all no_trainer with skip_first_batches by @muellerzr in #23664
- Fix sagemaker DP/MP by @muellerzr in #23681
- Log the right train_batch_size if using auto_find_batch_size and also log the adjusted value separately. by @muellerzr in #23800
- Up pinned accelerate version by @muellerzr in #24089
- Move import check to before state reset by @muellerzr in #23906
- Upgrade safetensors version by @muellerzr in #23911
- Act on deprecations in Accelerate no_trainer examples by @muellerzr in #24053
- Oops, missed one by @muellerzr in #24054
Bugfixes and improvements
- chore: allow protobuf 3.20.3 requirement by @jose-turintech in #22759
- Fix link displayed for custom tools by @sgugger in #23274
- Remove misplaced test file by @sgugger in #23275
- Bring back the PR "Refactor doctests + add CI" to main by @ydshieh in #23271
- [gpt] Gpt2 fix half precision causal mask by @younesbelkada in #23256
- Temporary tolerance fix for flaky Whisper PT-TF equiv. test by @amyeroberts in #23257
- Add top_k argument to post-process of conditional/deformable-DETR by @CreatlV in #22787
- transformers-cli -> huggingface-cli by @AlpinDale in #23276
- Temporarily increase tol for PT-FLAX whisper tests by @amyeroberts in #23288
- Added missing " in CHAT_PROMPT_TEMPLATE by @galatolofederico in #23287
- Update custom_tools.mdx: fix link by @mishig25 in #23292
- Update transformers_agents.mdx by @mishig25 in #23289
- Convert numpy arrays to lists before saving the evaluation metrics as json by @harisankar95 in #23268
- Fix doctest files fetch issue by @ydshieh in #23277
- skip test_run_squad_no_trainer for now by @ydshieh in #23302
- Better check for packages availability by @apbard in #23163
- Add gradient_checkpointing parameter to FlaxWhisperEncoder by @raghavanone in #23300
- Agents extras by @LysandreJik in #23301
- Fix broken links in the agent docs by @sgugger in #23297
- Fix typo in gradio-tools docs by @freddyaboulton in #23305
- Fix image segmentation tool test by @sgugger in #23306
- unpin tf prob by @ydshieh in #23293
- Revert "search buffers for dtype" by @sgugger in #23308
- Remove LanguageIdentificationTool in __init__.py as we don't have it yet by @ydshieh in #23326
- Fix docker image (caused by tensorflow_text) by @ydshieh in #23321
- Compute the mask in-place, with less memory reads, and on CUDA on XLNetLMHeadModel by @lezcano in #23332
- Only add files with modification outside doc blocks by @ydshieh in #23327
- [docs] Fix Agents and Tools docstring by @stevhliu in #23313
- OR am I crazy? by @hwuebben in #23295
- Handle padding warning in generation when using inputs_embeds by @zrthxn in #23131
- replaced assert with raise ValueError for t5, switch_transformers, pix2struct, mt5, longt5, gptsan_japanese. by @susnato in #23273
- Use cu118 with cudnn >= 8.6 in docker file by @ydshieh in #23339
- Removing one of the twice defined position_embeddings in LongFormer by @GregorySenay in #23343
- Fix issue introduced in PR #23163 by @ydshieh in #23363
- Typo suggestion by @richardachen in #23360
- Fix some is_xxx_available by @ydshieh in #23365
- Fix BigBirdForMaskedLM doctest by @ydshieh in #23369
- Fix OwlViTForObjectDetection.image_guided_detection doc example by @ydshieh in #23370
- Revert "Only add files with modification outside doc blocks" by @ydshieh in #23371
- [Bugfix] OPTDecoderLayer does not return attentions when gradient_checkpointing and training is enabled. by @gmlwns2000 in #23367
- Skip failing AlignModelTest::test_multi_gpu_data_parallel_forward by @ydshieh in #23374
- Fix test typos - audio feature extractors by @LWprogramming in #23310
- Added type hints for Graphormer pytorch version by @dewasahu2003 in #23073
- Replace NumPy Operations with JAX NumPy Equivalents for JIT Compilation Compatibility by @gojiteji in #23356
- Use mkstemp to replace deprecated mktemp by @ready-research in #23372
- Fix RwkvModel by @ydshieh in #23392
- Update test_batched_inference_image_captioning_conditioned by @ydshieh in #23391
- OPT/BioGPT: Improved attention mask shape exception by @gante in #23270
- Fix chat prompt in HFAgent by @IvanSedykh in #23335
- 🌐 [i18n-KO] Translated asr.mdx to Korean by @sim-so in #23106
- Minor fixes in transformers-tools by @Wauplin in #23364
- [Pix2Struct] Add conditional generation on docstring example by @younesbelkada in #23399
- Generate: faster can_generate check on TF and Flax by @gante in #23398
- [AutoModel] fix torch_dtype=auto in from_pretrained by @stas00 in #23379
- Docs: add link to assisted generation blog post by @gante in #23397
- Build with non Python files by @sgugger in #23405
- Generate: add test to check KV format by @gante in #23403
- Replace appends with list compr...
v4.29.2: Patch release
Fixes the package so non-Python files (like CUDA kernels) are properly included.