
Issue with to_fp16() #70

Open
Manas-Embold opened this issue Nov 25, 2020 · 7 comments

Comments

@Manas-Embold

Manas-Embold commented Nov 25, 2020

Hi Max,

I trained the 344M model using gpt2-simple (the dataset was Java code, for auto code completion) and saved the checkpoint.
I then converted the model to PyTorch using:

! cd '/content/checkpoint' && transformers-cli convert --model_type gpt2 --tf_checkpoint '/content/checkpoint/run1/' --pytorch_dump_output '/content/checkpoint/run1/pytorch' --config '/content/checkpoint/run1/hparams.json'

When I load the model normally:

from aitextgen import aitextgen
config = '/content/checkpoint/run1/pytorch/config.json'
ai = aitextgen(model="/content/checkpoint/run1/pytorch/pytorch_model.bin", config=config)

There are no issues and I can generate easily:

ai.generate(n=1, prompt="system.out.", max_length=100)

OUTPUT:
system.out.println( + id);

However, since I want to convert this to fp16 for faster inference, I converted the model to fp16 as follows:

from aitextgen import aitextgen
config = '/content/checkpoint/run1/pytorch/config.json'
ai = aitextgen(model="/content/checkpoint/run1/pytorch/pytorch_model.bin", config=config, to_gpu=True, to_fp16=True)

When I call generate now, it outputs English instead of Java:

ai.generate(n=1, prompt="system.out.", max_length=100)

OUTPUT:
system.out. loc character decidedally healthy ultimately points belie mass nearly regidedot price clicklike make TodayocaInd unlike journal Norretene links Good void et attackalsAnSD 54giving sing high Assassatelyhus Y humansware concerned connectionsSt� was believesligmartacing Geteworkamedann·aultrict dep2013� daughtermentructure couldentiallyrolloth confrontted Archbi suitiffge beaut Ed industward Sony* thereileOMrugateg rented Birminghamvironment underinceeg Windows intense static
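
One way to narrow this down is to check what dtype and device the weights actually ended up on after loading. A minimal diagnostic sketch, assuming aitextgen exposes the underlying Transformers model as ai.model:

import torch

# Inspect the first few parameters after loading with to_gpu=True, to_fp16=True
for name, param in list(ai.model.named_parameters())[:3]:
    print(name, param.dtype, param.device)

# torch.float16 on a cuda device is what the flags should produce;
# torch.float32 or a cpu device would point at a loading problem instead.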

@Manas-Embold
Author

Any thoughts on where I am going wrong in the conversion?
I think that after conversion it is loading the default GPT-2 English language model instead of my GPT-2 model trained on Java code.

@Manas-Embold
Author

When I use to_gpu=True and to_fp16=True for loading, I get English as output.
When I use just to_fp16=True and skip to_gpu=True, I get proper Java output.

This looks strange.
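
For comparison, the converted checkpoint can be loaded and run in fp16 with plain Transformers/PyTorch, bypassing aitextgen's flags entirely. A rough sketch, assuming the checkpoint directory from the earlier comments and the stock GPT-2 tokenizer (which gpt2-simple uses); if this produces Java, the problem is in the to_fp16/to_gpu combination rather than in the checkpoint itself:

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Directory containing pytorch_model.bin and config.json from the conversion step
model = GPT2LMHeadModel.from_pretrained("/content/checkpoint/run1/pytorch")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# Convert to half precision and move to the GPU manually
model = model.half().to("cuda").eval()

inputs = tokenizer("system.out.", return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(inputs["input_ids"], max_length=100, do_sample=True)
print(tokenizer.decode(output[0]))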

@minimaxir
Owner

to_fp16() is sort of a beta feature and not fully tested. Ideally the ONNX support that I intend to add will handle this better.

However, that output is just weird in that it's pseudorandom as opposed to fully random, which may imply a different issue in the pipeline.

@junkgear

Alright, thanks for reviewing!

@minimaxir
Owner

minimaxir commented Dec 1, 2020

Tested: yes, it's random output. I assume something changed upstream in Transformers, so I might have to remove it (there also doesn't seem to be a speed increase anymore). I will add a warning for now.

minimaxir added a commit that referenced this issue Dec 1, 2020
* Update dependencies

* Fix deprecation warning

* Fix param name

* Update minimum versions

* Remove TorchScript refs

* version bump

* Bump PL version

* dev Dockerfile

* Update dependencies

* Transformer 4 fix

* Fix transformer 4 import for TF conversion

* Fix model training for lightning 1.0.0

* Add back GPU memory printing

* TPU fixes

* Assert descriptions

* Generation tweaks (remove pad message)

* Ignore .DS_Store

* Handle generation by prompt more canonically

* Set 20 for refresh default to avoid Colab warning

* Fix gen warning while training

* Fix model loading from config + generation

* FP16 warning (#70)

* Fix tokenizer for latest tokenizers

* Set default learning rate to 1e-3

* Set CPU config to match tokenizer default
@briansemrau

briansemrau commented Jan 27, 2021

I'm able to use fp16 with sensible outputs if I use:

import torch

with torch.cuda.amp.autocast():
    ai.generate(...)

Interestingly, I seem to be getting slower generation using fp16 on an RTX 2060, though the halved memory usage is a plus.
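
Put together with the loading code from earlier in the thread, the workaround would look roughly like this (a sketch rather than a tested recommendation; the paths are the ones from the original report):

import torch
from aitextgen import aitextgen

config = '/content/checkpoint/run1/pytorch/config.json'
ai = aitextgen(model="/content/checkpoint/run1/pytorch/pytorch_model.bin",
               config=config, to_gpu=True, to_fp16=True)

# Run generation under autocast so fp16-sensitive ops can fall back to fp32
with torch.cuda.amp.autocast():
    ai.generate(n=1, prompt="system.out.", max_length=100)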

@jonnyplatt

I was really puzzled by this: I found to_fp16 was generating sensible, normal content on Google Colab despite the warning messages, but was totally bizarre in production. It turned out the PyTorch versions were different: Colab was on torch 1.8.1 and CUDA 11.1, while my server was on torch 1.7 and CUDA 11.0.
Once I upgraded the libraries on my server, I found fp16 generation was working correctly again, so it may be worth updating the warning for people on older PyTorch versions?
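
A quick way to check which versions are in play before chasing fp16 output issues (standard PyTorch attributes):

import torch

print(torch.__version__)              # 1.8.1 worked in the report above, 1.7 did not
print(torch.version.cuda)             # CUDA version the wheel was built against
print(torch.cuda.get_device_name(0))  # which GPU is being used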
