
Add GLM-4 and Later GLM Model (Draft) #31977

Closed · wants to merge 86 commits

Commits (86)
9cf74d7
add GLM-4
zRzRzRzRzRzRzR Jul 11, 2024
bef7fd9
GLM-4 FastTokenizer
zRzRzRzRzRzRzR Jul 11, 2024
c986fac
tokenizer fix
zRzRzRzRzRzRzR Jul 11, 2024
2da5d32
rename
zRzRzRzRzRzRzR Jul 11, 2024
675e7a1
pad token
zRzRzRzRzRzRzR Jul 11, 2024
304e4ef
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 11, 2024
0b241f2
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 12, 2024
fa44041
Fix past_key_values
duzx16 Jul 14, 2024
24dec6b
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 14, 2024
5d2bf5e
Merge branch 'glm-4' of github.com:zRzRzRzRzRzRzR/transformers into g…
duzx16 Jul 14, 2024
63d49c9
Fix flash attention
duzx16 Jul 14, 2024
0a5adf3
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 15, 2024
51cbf5d
add update
zRzRzRzRzRzRzR Jul 15, 2024
86b5004
Merge branch 'glm-4' of https://github.com/zRzRzRzRzRzRzR/transformer…
zRzRzRzRzRzRzR Jul 15, 2024
9a553e5
test with glm
zRzRzRzRzRzRzR Jul 15, 2024
4d45b21
fix test
zRzRzRzRzRzRzR Jul 15, 2024
85cfe41
add discription
zRzRzRzRzRzRzR Jul 15, 2024
860c7ee
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 15, 2024
c83ec2d
update glm
zRzRzRzRzRzRzR Jul 16, 2024
2608010
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 16, 2024
1719000
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 18, 2024
3f0452e
rewrite tokenizer
zRzRzRzRzRzRzR Jul 18, 2024
33d2ca3
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 19, 2024
084988e
fix some test
zRzRzRzRzRzRzR Jul 19, 2024
0cb1531
fix testing
zRzRzRzRzRzRzR Jul 19, 2024
e49718f
Fix RMSNorm initialization
duzx16 Jul 20, 2024
a362206
Fix position ids when passing input_embeds
duzx16 Jul 20, 2024
08b43d9
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 20, 2024
3c5322d
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 23, 2024
dd06993
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 24, 2024
8cc0381
Fix dtype error
duzx16 Jul 24, 2024
a35997e
Merge branch 'glm-4' of github.com:zRzRzRzRzRzRzR/transformers into g…
duzx16 Jul 24, 2024
621d32f
Fix output_layer for classification models
duzx16 Jul 24, 2024
48d1704
fix gradient
zRzRzRzRzRzRzR Jul 24, 2024
5881ed5
remove some skip test
zRzRzRzRzRzRzR Jul 24, 2024
c920ad9
fix small test
zRzRzRzRzRzRzR Jul 24, 2024
21781b3
Fix prepare_inputs_for_generation
duzx16 Jul 24, 2024
9599200
Merge branch 'glm-4' of github.com:zRzRzRzRzRzRzR/transformers into g…
duzx16 Jul 24, 2024
a9b1d0d
fix
zRzRzRzRzRzRzR Jul 25, 2024
0631615
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 25, 2024
9f33751
add converter
zRzRzRzRzRzRzR Jul 25, 2024
2663a13
fix PEP 8
zRzRzRzRzRzRzR Jul 25, 2024
aad19db
remove test
zRzRzRzRzRzRzR Jul 25, 2024
1e9183c
index
zRzRzRzRzRzRzR Jul 25, 2024
e8b90a1
fix doctested
zRzRzRzRzRzRzR Jul 25, 2024
65e1996
remove init
zRzRzRzRzRzRzR Jul 25, 2024
266ce77
fix copied error
zRzRzRzRzRzRzR Jul 25, 2024
cd9c304
fix mlp differ
zRzRzRzRzRzRzR Jul 25, 2024
ba30dad
fix copied eerror
zRzRzRzRzRzRzR Jul 25, 2024
afb1423
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 25, 2024
48aaba1
test_hidden_states_output = False
zRzRzRzRzRzRzR Jul 25, 2024
33d976f
Merge branch 'glm-4' of https://github.com/zRzRzRzRzRzRzR/transformer…
zRzRzRzRzRzRzR Jul 25, 2024
0675202
fix
zRzRzRzRzRzRzR Jul 25, 2024
19b0939
Update modeling_glm.py
zRzRzRzRzRzRzR Jul 25, 2024
b2b6c0f
Update __init__.py
zRzRzRzRzRzRzR Jul 25, 2024
6760791
fix glm type error
zRzRzRzRzRzRzR Jul 25, 2024
515d9d9
fix
zRzRzRzRzRzRzR Jul 25, 2024
9951c92
ruff problem
zRzRzRzRzRzRzR Jul 25, 2024
547ac95
Update convert_slow_tokenizer.py
zRzRzRzRzRzRzR Jul 25, 2024
9ba6cf7
Add explanations in English
zRzRzRzRzRzRzR Jul 25, 2024
9fb6405
reformate
zRzRzRzRzRzRzR Jul 25, 2024
e37bb49
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 25, 2024
25aec29
Update configuration_glm.py
zRzRzRzRzRzRzR Jul 25, 2024
58d344a
Merge branch 'glm-4' of https://github.com/zRzRzRzRzRzRzR/transformer…
zRzRzRzRzRzRzR Jul 25, 2024
073b811
fix
zRzRzRzRzRzRzR Jul 25, 2024
c0e6ae9
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 25, 2024
6ac085f
fix glm dummy
zRzRzRzRzRzRzR Jul 25, 2024
f140603
Merge branch 'glm-4' of https://github.com/zRzRzRzRzRzRzR/transformer…
zRzRzRzRzRzRzR Jul 25, 2024
65f471d
add doc
zRzRzRzRzRzRzR Jul 26, 2024
7ad819f
fix init
zRzRzRzRzRzRzR Jul 26, 2024
f86af8e
Update __init__.py
zRzRzRzRzRzRzR Jul 26, 2024
c179377
Update dummy_vision_objects.py
zRzRzRzRzRzRzR Jul 26, 2024
41338d7
add_start_docstrings
zRzRzRzRzRzRzR Jul 26, 2024
dba6d1e
fix GLM_START_DOCSTRING
zRzRzRzRzRzRzR Jul 26, 2024
82b0c7f
1
zRzRzRzRzRzRzR Jul 26, 2024
a6b6f4e
Update perf_infer_gpu_one.md
zRzRzRzRzRzRzR Jul 26, 2024
d1a5ee1
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 26, 2024
c99610e
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 27, 2024
b283adc
flash attn
zRzRzRzRzRzRzR Jul 27, 2024
4cc618e
stiil need fix rotary_emb
zRzRzRzRzRzRzR Jul 27, 2024
b476dd0
fix GLMSelfAttension
zRzRzRzRzRzRzR Jul 27, 2024
aab2386
remove _get_unpad_data
zRzRzRzRzRzRzR Jul 27, 2024
550a692
fix GLMSelfAttention
zRzRzRzRzRzRzR Jul 27, 2024
6492ac3
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Jul 30, 2024
c3d4636
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Aug 9, 2024
70b7ff4
Merge branch 'huggingface:main' into glm-4
zRzRzRzRzRzRzR Aug 21, 2024
244 changes: 123 additions & 121 deletions docs/source/de/index.md

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -394,6 +394,8 @@
title: Gemma
- local: model_doc/gemma2
title: Gemma2
- local: model_doc/glm
title: GLM
- local: model_doc/openai-gpt
title: GPT
- local: model_doc/gpt_neo
1 change: 1 addition & 0 deletions docs/source/en/index.md
@@ -150,6 +150,7 @@ Flax), PyTorch, and/or TensorFlow.
| [Gemma](model_doc/gemma) | ✅ | ❌ | ✅ |
| [Gemma2](model_doc/gemma2) | ✅ | ❌ | ❌ |
| [GIT](model_doc/git) | ✅ | ❌ | ❌ |
| [GLM](model_doc/glm) | ✅ | ❌ | ❌ |
| [GLPN](model_doc/glpn) | ✅ | ❌ | ❌ |
| [GPT Neo](model_doc/gpt_neo) | ✅ | ❌ | ✅ |
| [GPT NeoX](model_doc/gpt_neox) | ✅ | ❌ | ❌ |
108 changes: 108 additions & 0 deletions docs/source/en/model_doc/glm.md
@@ -0,0 +1,108 @@
<!--Copyright 2024 The GLM & ZhipuAI team and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# GLM

## Overview

The GLM Model was proposed
in [ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools](https://arxiv.org/html/2406.12793v1)
by GLM Team, THUDM & ZhipuAI.

The abstract from the paper is the following:

*We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report
primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most
capable models that are trained with all the insights and lessons gained from the preceding three generations of
ChatGLM. To date, the GLM-4 models are pre-trained on ten trillions of tokens mostly in Chinese and English, along with
a small set of corpus from 24 languages, and aligned primarily for Chinese and English usage. The high-quality alignment
is achieved via a multi-stage post-training process, which involves supervised fine-tuning and learning from human
feedback. Evaluations show that GLM-4 1) closely rivals or outperforms GPT-4 in terms of general metrics such as MMLU,
GSM8K, MATH, BBH, GPQA, and HumanEval, 2) gets close to GPT-4-Turbo in instruction following as measured by IFEval, 3)
matches GPT-4 Turbo (128K) and Claude 3 for long context tasks, and 4) outperforms GPT-4 in Chinese alignments as
measured by AlignBench. The GLM-4 All Tools model is further aligned to understand user intent and autonomously decide
when and which tool(s) to use—including web browser, Python interpreter, text-to-image model, and user-defined
functions—to effectively complete complex tasks. In practical applications, it matches and even surpasses GPT-4 All
Tools in tasks like accessing online information via web browsing and solving math problems using Python interpreter.
Over the course, we have open-sourced a series of models, including ChatGLM-6B (three generations), GLM-4-9B (128K, 1M),
GLM-4V-9B, WebGLM, and CodeGeeX, attracting over 10 million downloads on Hugging face in the year 2023 alone.*

Collaborator: let's add the abstract of the paper here!

Contributor Author: finish

Tips:

- This model was contributed by [THUDM](https://huggingface.co/THUDM). The most recent code can be
found [here](https://github.com/thudm/GLM-4).


## Usage tips

`GLM-4` can be found on the [Hugging Face Hub](https://huggingface.co/collections/THUDM/glm-4-665fcf188c414b03c2f7e3b7).

In the following, we demonstrate how to use `glm-4-9b-chat` for inference. The dialogue uses the ChatML format, and this demo shows how to leverage `apply_chat_template` to build the prompt.

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> device = "cuda" # the device to load the model onto

>>> model = AutoModelForCausalLM.from_pretrained("THUDM/glm-4-9b-chat", device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat")

>>> prompt = "Give me a short introduction to large language model."

>>> messages = [{"role": "user", "content": prompt}]

>>> text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

>>> model_inputs = tokenizer([text], return_tensors="pt").to(device)

>>> generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=True)

>>> generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]

>>> response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
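
As a hedged follow-up to the sketch above (the exact prompt formatting comes from the chat template shipped with the checkpoint, so details may differ), a second turn can reuse `apply_chat_template` by appending the assistant reply to the message history:

```python
>>> print(response)

>>> # Append the assistant reply plus a new user turn, then rebuild the prompt
>>> messages.append({"role": "assistant", "content": response})
>>> messages.append({"role": "user", "content": "Summarize that in one sentence."})

>>> text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
>>> model_inputs = tokenizer([text], return_tensors="pt").to(device)

>>> generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=128, do_sample=True)
>>> generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
>>> follow_up = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```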

## GLMConfig

[[autodoc]] GLMConfig
Collaborator (suggested change: `[[autodoc]] GLMConfig` → `[[autodoc]] GlmConfig`): let's use camel casing everywhere we can!

Contributor Author: fix now


## GLMTokenizer

[[autodoc]] GLMTokenizer
- save_vocabulary

## GLMTokenizerFast

[[autodoc]] GLMTokenizerFast

## GLMModel

[[autodoc]] GLMModel
- forward

## GLMForCausalLM

[[autodoc]] GLMForCausalLM
- forward

## GLMForSequenceClassification

[[autodoc]] GLMForSequenceClassification
- forward

## GLMForTokenClassification

[[autodoc]] GLMForTokenClassification
- forward
2 changes: 2 additions & 0 deletions docs/source/en/perf_infer_gpu_one.md
@@ -46,6 +46,7 @@ FlashAttention-2 is currently supported for the following architectures:
* [DistilBert](https://huggingface.co/docs/transformers/model_doc/distilbert#transformers.DistilBertModel)
* [Gemma](https://huggingface.co/docs/transformers/model_doc/gemma#transformers.GemmaModel)
* [Gemma2](https://huggingface.co/docs/transformers/model_doc/gemma2#transformers.Gemma2Model)
* [GLM](https://huggingface.co/docs/transformers/model_doc/glm#transformers.GLMModel)
* [GPT2](https://huggingface.co/docs/transformers/model_doc/gpt2)
* [GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode#transformers.GPTBigCodeModel)
* [GPTNeo](https://huggingface.co/docs/transformers/model_doc/gpt_neo#transformers.GPTNeoModel)
@@ -211,6 +212,7 @@ For now, Transformers supports SDPA inference and training for the following architectures:
* [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon#transformers.FalconModel)
* [Gemma](https://huggingface.co/docs/transformers/model_doc/gemma#transformers.GemmaModel)
* [Gemma2](https://huggingface.co/docs/transformers/model_doc/gemma2#transformers.Gemma2Model)
* [GLM](https://huggingface.co/docs/transformers/model_doc/glm#transformers.GLMModel)
* [GPT2](https://huggingface.co/docs/transformers/model_doc/gpt2)
* [GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode#transformers.GPTBigCodeModel)
* [GPTNeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox#transformers.GPTNeoXModel)
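
Since this PR adds GLM to both the FlashAttention-2 and SDPA lists above, a minimal sketch of opting into either attention backend at load time could look like the following (assumptions: a CUDA GPU, half-precision weights, and, for FlashAttention-2, the `flash-attn` package installed):

```python
import torch
from transformers import AutoModelForCausalLM

# FlashAttention-2 backend (requires the flash-attn package)
model_fa2 = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4-9b-chat",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

# PyTorch SDPA backend, no extra dependency needed
model_sdpa = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4-9b-chat",
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
    device_map="auto",
)
```
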
2 changes: 2 additions & 0 deletions docs/source/es/index.md
@@ -90,6 +90,7 @@ La biblioteca actualmente contiene implementaciones de JAX, PyTorch y TensorFlow
1. **[FNet](model_doc/fnet)** (de Google Research) publicado con el paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) por James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](model_doc/funnel)** (de CMU/Google Brain) publicado con el paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) por Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GLPN](model_doc/glpn)** (de KAIST) publicado con el paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) por Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GLM](model_doc/glm)** (from THU/ZhipuAI) released with the paper [GLM: General Language Model Pretraining with Autoregressive Blank Infilling](https://arxiv.org/abs/2103.10360) by Team GLM, including Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, Peng Zhang, Qinkai Zheng, Rui Lu, Shuaiqi Duan, Shudan Zhang, Shulin Cao, Shuxun Yang, Weng Lam Tam, Wenyi Zhao, Xiao Liu, Xiao Xia, Xiaohan Zhang, Xiaotao Gu, Xin Lv, Xinghan Liu, Xinyi Liu, Xinyue Yang, Xixuan Song, Xunkai Zhang, Yifan An, Yifan Xu, Yilin Niu, Yuantao Yang, Yueyan Li, Yushi Bai, Yuxiao Dong, Zehan Qi, Zhaoyu Wang, Zhen Yang, Zhengxiao Du, Zhenyu Hou, and Zihan Wang.
1. **[GPT](model_doc/openai-gpt)** (de OpenAI) publicado con el paper [Improving Language Understanding by Generative Pre-Training](https://openai.com/research/language-unsupervised/) por Alec Radford, Karthik Narasimhan, Tim Salimans y Ilya Sutskever.
1. **[GPT-2](model_doc/gpt2)** (de OpenAI) publicado con el paper [Language Models are Unsupervised Multitask Learners](https://openai.com/research/better-language-models/) por Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei y Ilya Sutskever.
1. **[GPT-J](model_doc/gptj)** (de EleutherAI) publicado con el repositorio [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) por Ben Wang y Aran Komatsuzaki.
@@ -208,6 +209,7 @@ Flax), PyTorch y/o TensorFlow.
| FNet | ✅ | ✅ | ✅ | ❌ | ❌ |
| Funnel Transformer | ✅ | ✅ | ✅ | ✅ | ❌ |
| GLPN | ❌ | ❌ | ✅ | ❌ | ❌ |
| GLM | ✅ | ✅ | ✅ | ❌ | ❌ |
| GPT Neo | ❌ | ❌ | ✅ | ❌ | ✅ |
| GPT-J | ❌ | ❌ | ✅ | ✅ | ✅ |
| Hubert | ❌ | ❌ | ✅ | ✅ | ❌ |
2 changes: 2 additions & 0 deletions docs/source/fr/index.md
@@ -116,6 +116,7 @@ La documentation est organisée en 5 parties:
1. **[Funnel Transformer](model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GIT](model_doc/git)** (from Microsoft Research) released with the paper [GIT: A Generative Image-to-text Transformer for Vision and Language](https://arxiv.org/abs/2205.14100) by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang.
1. **[GLPN](model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GLM](model_doc/glm)** (from THU/ZhipuAI) released with the paper [GLM: General Language Model Pretraining with Autoregressive Blank Infilling](https://arxiv.org/abs/2103.10360) by Team GLM, including Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, Peng Zhang, Qinkai Zheng, Rui Lu, Shuaiqi Duan, Shudan Zhang, Shulin Cao, Shuxun Yang, Weng Lam Tam, Wenyi Zhao, Xiao Liu, Xiao Xia, Xiaohan Zhang, Xiaotao Gu, Xin Lv, Xinghan Liu, Xinyi Liu, Xinyue Yang, Xixuan Song, Xunkai Zhang, Yifan An, Yifan Xu, Yilin Niu, Yuantao Yang, Yueyan Li, Yushi Bai, Yuxiao Dong, Zehan Qi, Zhaoyu Wang, Zhen Yang, Zhengxiao Du, Zhenyu Hou, and Zihan Wang.
1. **[GPT](model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://openai.com/research/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT Neo](model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GPT NeoX](model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
@@ -298,6 +299,7 @@ Le tableau ci-dessous représente la prise en charge actuelle dans la bibliothèque
| Funnel Transformer | ✅ | ✅ | ✅ | ✅ | ❌ |
| GIT | ❌ | ❌ | ✅ | ❌ | ❌ |
| GLPN | ❌ | ❌ | ✅ | ❌ | ❌ |
| GLM | ✅ | ✅ | ✅ | ❌ | ❌ |
| GPT Neo | ❌ | ❌ | ✅ | ❌ | ✅ |
| GPT NeoX | ❌ | ✅ | ✅ | ❌ | ❌ |
| GPT NeoX Japanese | ✅ | ❌ | ✅ | ❌ | ❌ |
2 changes: 2 additions & 0 deletions docs/source/it/index.md
@@ -97,6 +97,7 @@ La libreria attualmente contiene implementazioni in JAX, PyTorch e TensorFlow, p
1. **[FNet](model_doc/fnet)** (da Google Research) rilasciato con il paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) da James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](model_doc/funnel)** (da CMU/Google Brain) rilasciato con il paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) da Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GLPN](model_doc/glpn)** (da KAIST) rilasciato con il paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) da Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GLM](model_doc/glm)** (from THU/ZhipuAI) released with the paper [GLM: General Language Model Pretraining with Autoregressive Blank Infilling](https://arxiv.org/abs/2103.10360) by Team GLM, including Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, Peng Zhang, Qinkai Zheng, Rui Lu, Shuaiqi Duan, Shudan Zhang, Shulin Cao, Shuxun Yang, Weng Lam Tam, Wenyi Zhao, Xiao Liu, Xiao Xia, Xiaohan Zhang, Xiaotao Gu, Xin Lv, Xinghan Liu, Xinyi Liu, Xinyue Yang, Xixuan Song, Xunkai Zhang, Yifan An, Yifan Xu, Yilin Niu, Yuantao Yang, Yueyan Li, Yushi Bai, Yuxiao Dong, Zehan Qi, Zhaoyu Wang, Zhen Yang, Zhengxiao Du, Zhenyu Hou, and Zihan Wang.
1. **[GPT](model_doc/openai-gpt)** (da OpenAI) rilasciato con il paper [Improving Language Understanding by Generative Pre-Training](https://openai.com/research/language-unsupervised/) da Alec Radford, Karthik Narasimhan, Tim Salimans e Ilya Sutskever.
1. **[GPT-2](model_doc/gpt2)** (da OpenAI) rilasciato con il paper [Language Models are Unsupervised Multitask Learners](https://openai.com/research/better-language-models/) da Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei e Ilya Sutskever.
1. **[GPT-J](model_doc/gptj)** (da EleutherAI) rilasciato nel repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) da Ben Wang e Aran Komatsuzaki.
@@ -222,6 +223,7 @@ tokenizer (chiamato "slow"). Un tokenizer "fast" supportato dalla libreria 🤗
| FNet | ✅ | ✅ | ✅ | ❌ | ❌ |
| Funnel Transformer | ✅ | ✅ | ✅ | ✅ | ❌ |
| GLPN | ❌ | ❌ | ✅ | ❌ | ❌ |
| GLM | ✅ | ✅ | ✅ | ❌ | ❌ |
| GPT Neo | ❌ | ❌ | ✅ | ❌ | ✅ |
| GPT NeoX | ❌ | ✅ | ✅ | ❌ | ❌ |
| GPT-J | ❌ | ❌ | ✅ | ✅ | ✅ |