Commit
* initial
* first try
* working 20B
* 20B tokenizers
* Docs
* Import fixes for missing classes
* Update docs, fixup
* black formatting
* isort
* flake
* dummy objects
* documentation
* Documentation yml
* more docs
* tweaks for tests
* tokenization auto
* fix neox tests
* test
* test
* einsum
* address PR feedback
* Documentation
* Update README.md (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Update src/transformers/models/gpt_neox/__init__.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Update src/transformers/models/gpt_neox/configuration_gpt_neox.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Apply suggestions from code review (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Remove undefined LaTeX syntax
* Update to full url to avoid confusion about whether that's supposed to refer to the Hub
* fix auto
* move tests
* documentation fix
* more doc fixes
* test refactor
* fix import
* fix import
* fix import
* fix import
* fix import
* style fixes
* More modeling fixes

Co-authored-by: Jason Phang <zp489@gr057.hpc.nyu.edu>
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
1 parent: 374a2f6
Commit: 71e6027
Showing 20 changed files with 1,392 additions and 0 deletions.
@@ -0,0 +1,76 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# GPT-NeoX

## Overview

We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will
be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge,
the largest dense autoregressive model that has publicly available weights at the time of submission. In this work,
we describe GPT-NeoX-20B's architecture and training and evaluate its performance on a range of language-understanding,
mathematics, and knowledge-based tasks. We find that GPT-NeoX-20B is a particularly powerful few-shot reasoner and
gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models. We open-source
the training and evaluation code, as well as the model weights, at [https://github.com/EleutherAI/gpt-neox](https://github.com/EleutherAI/gpt-neox).

Development of the model was led by Sid Black, Stella Biderman and Eric Hallahan, and the model was trained with
the generous support of [CoreWeave](https://www.coreweave.com/).

GPT-NeoX-20B was trained with fp16, so it is recommended to initialize the model as follows:

```python
from transformers import GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b").half().cuda()
```

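For a sense of scale, half precision roughly halves the memory footprint of the weights, which is what makes a 20B-parameter model feasible on large GPUs. A back-of-envelope sketch (the 2 and 4 bytes-per-parameter figures are standard fp16/fp32 sizes; activations, buffers, and framework overhead are deliberately ignored here):

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Rough memory needed just to hold the model weights, in GiB."""
    return num_params * bytes_per_param / 1024**3

# GPT-NeoX-20B has roughly 20 billion parameters
fp32_gib = weight_memory_gib(20_000_000_000, 4)  # full precision: ~74.5 GiB
fp16_gib = weight_memory_gib(20_000_000_000, 2)  # half precision: ~37.3 GiB
```

Hence the `.half()` call above before moving the model to the GPU; this estimate is only a lower bound, since inference also needs memory for activations and the KV cache.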
GPT-NeoX-20B also has a different tokenizer from the one used in GPT-J-6B and GPT-Neo. The new tokenizer allocates
additional tokens to whitespace characters, making the model more suitable for certain tasks like code generation.

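To see why dedicated whitespace tokens matter for code, here is a toy illustration (this is *not* the real GPT-NeoX tokenizer, just a sketch of the idea): a vocabulary that emits every space as its own token spends far more tokens on an indented line than one that folds runs of spaces into single tokens.

```python
import re

def tokenize_spaces_individually(text):
    # every single space is its own token; each non-space word is one token
    return re.findall(r" |\S+", text)

def tokenize_space_runs(text):
    # a run of consecutive spaces becomes one token, as a whitespace-aware
    # vocabulary can encode it; each non-space word is one token
    return re.findall(r" +|\S+", text)

line = "        return x + y"  # 8-space indent, typical of Python source
naive = len(tokenize_spaces_individually(line))  # 15 tokens
aware = len(tokenize_space_runs(line))           # 8 tokens
```

Fewer tokens per line of code means more code fits in the model's context window, which is the practical benefit of the whitespace-aware vocabulary.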
### Generation

The `generate()` method can be used to generate text using the GPT-NeoX model.

```python
>>> from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast

>>> model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
>>> tokenizer = GPTNeoXTokenizerFast.from_pretrained("EleutherAI/gpt-neox-20b")

>>> prompt = "GPTNeoX20B is a 20B-parameter autoregressive Transformer model developed by EleutherAI."

>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids

>>> gen_tokens = model.generate(
...     input_ids,
...     do_sample=True,
...     temperature=0.9,
...     max_length=100,
... )
>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]
```
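The `temperature` argument above rescales the logits before sampling: values below 1 sharpen the next-token distribution (the top token dominates), while values above 1 flatten it (more diverse samples). A minimal sketch of the underlying math, independent of the model:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into probabilities, rescaled by temperature."""
    scaled = [logit / temperature for logit in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.5)  # peaky: greedy-like behavior
flat = softmax_with_temperature(logits, 2.0)   # flat: more random sampling
```

With `temperature=0.9` as in the example above, generation stays close to the model's raw distribution while injecting a little extra diversity.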
## GPTNeoXConfig

[[autodoc]] GPTNeoXConfig

## GPTNeoXTokenizerFast

[[autodoc]] GPTNeoXTokenizerFast

## GPTNeoXModel

[[autodoc]] GPTNeoXModel
    - forward

## GPTNeoXForCausalLM

[[autodoc]] GPTNeoXForCausalLM
    - forward
@@ -62,6 +62,7 @@
glpn,
gpt2,
gpt_neo,
gpt_neox,
gptj,
herbert,
hubert,