
Failure to compile a model on Inf1 with optimum-cli due to lack of arguments #471

Closed
2 of 4 tasks
tagucci opened this issue Feb 9, 2024 · 2 comments


tagucci commented Feb 9, 2024

System Info

- `optimum` version: 1.16.2
- `transformers` version: 4.36.2
- Platform: Linux-5.15.0-1051-aws-x86_64-with-glibc2.29
- Python version: 3.8.10
- Huggingface_hub version: 0.20.1
- PyTorch version (GPU?): 1.13.1+cu117 (cuda availabe: False)
- Tensorflow version (GPU?): not installed (cuda availabe: NA)

Who can help?

@JingyaHuang

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

When I attempted to compile the bert-base-uncased model on an Inf1 instance following the official documentation, the following error occurred. I used the pre-built PyTorch environment for Inf1 provided by the "Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20240102".

$ source /opt/aws_neuron_venv_pytorch_inf1/bin/activate
$ pip install optimum[neuron]
$ optimum-cli export neuron \
  --model bert-base-uncased \
  --sequence_length 128 \
  --batch_size 1 \
  bert_neuron/

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/aws_neuron_venv_pytorch_inf1/lib/python3.8/site-packages/optimum/exporters/neuron/__main__.py", line 541, in <module>
    main()
  File "/opt/aws_neuron_venv_pytorch_inf1/lib/python3.8/site-packages/optimum/exporters/neuron/__main__.py", line 487, in main
    is_sentence_transformers = args.library_name == "sentence_transformers"
AttributeError: 'Namespace' object has no attribute 'library_name'
Traceback (most recent call last):
  File "/opt/aws_neuron_venv_pytorch_inf1/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "/opt/aws_neuron_venv_pytorch_inf1/lib/python3.8/site-packages/optimum/commands/optimum_cli.py", line 163, in main
    service.run()
  File "/opt/aws_neuron_venv_pytorch_inf1/lib/python3.8/site-packages/optimum/commands/export/neuron.py", line 137, in run
    subprocess.run(full_command, shell=True, check=True)
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m optimum.exporters.neuron --model bert-base-uncased --sequence_length 128 --batch_size 1 bert_neuron/' returned non-zero exit status 1.
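For context on why two tracebacks appear: the `optimum-cli` wrapper shells out to `python3 -m optimum.exporters.neuron` via `subprocess.run(..., check=True)`, so the child process prints its own `AttributeError` traceback, exits non-zero, and the parent then raises `CalledProcessError`. A minimal sketch of that propagation (illustrative only, not the actual optimum code):

```python
# Sketch: an uncaught exception in a child process becomes a non-zero exit
# status, which check=True in the parent turns into CalledProcessError.
import subprocess
import sys

try:
    # Simulate the exporter child failing with an AttributeError.
    subprocess.run(
        [sys.executable, "-c", "raise AttributeError('library_name')"],
        check=True,
    )
except subprocess.CalledProcessError as e:
    print(e.returncode)  # 1
```

This is why the underlying bug is the `AttributeError` in the child, not the `CalledProcessError` shown last.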

Expected behavior

This error occurs because neuron.py does not register arguments such as --library_name, --subfolder, --compiler_workdir, --disable-weights-neff-inline, and the other arguments in the level_group category that neuronx.py defines. When I modified neuron.py to accept the same arguments as neuronx.py, the model compiled successfully. The output is as follows:
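A minimal sketch of the failure mode (names are illustrative, not the actual optimum parser): `args.library_name` raises `AttributeError` because the Inf1 parser never registers that flag, while the code path that reads it is shared with the neuronx parser. Registering the missing optional arguments, or reading them defensively with `getattr`, avoids the crash:

```python
# Sketch: a parser that lacks --library_name makes args.library_name raise
# AttributeError. Registering the flag (as neuronx.py does) or reading it
# with getattr() keeps the shared code path working.
import argparse

parser = argparse.ArgumentParser(prog="optimum-cli export neuron")
parser.add_argument("--model", required=True)
parser.add_argument("--sequence_length", type=int)
parser.add_argument("--batch_size", type=int)
# Flags that were missing on the Inf1 (neuron.py) side:
parser.add_argument("--library_name", type=str, default=None)
parser.add_argument("--subfolder", type=str, default="")

args = parser.parse_args(
    ["--model", "bert-base-uncased", "--sequence_length", "128", "--batch_size", "1"]
)
# Defensive access also works when a parser variant may lack the flag:
library_name = getattr(args, "library_name", None)
is_sentence_transformers = library_name == "sentence_transformers"
print(is_sentence_transformers)  # False
```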

$ optimum-cli export neuron \
  --model bert-base-uncased \
  --sequence_length 128 \
  --batch_size 1 \
  bert_neuron/

config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 570/570 [00:00<00:00, 91.6kB/s]
model.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████| 440M/440M [00:01<00:00, 245MB/s]
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'cls.seq_relationship.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████| 28.0/28.0 [00:00<00:00, 4.94kB/s]
vocab.txt: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 698kB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████| 466k/466k [00:00<00:00, 42.9MB/s]
***** Compiling bert-base-uncased *****
INFO:Neuron:There are 3 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::embedding, (For more information see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/compiler/neuron-cc/neuron-cc-ops/neuron-cc-ops-pytorch.html)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 563, fused = 546, percent fused = 96.98%
INFO:Neuron:Compiler args type is <class 'list'> value is ['--fast-math', 'none']
INFO:Neuron:Compiling function _NeuronGraph$704 with neuron-cc
INFO:Neuron:Compiling with command line: '/opt/aws_neuron_venv_pytorch_inf1/bin/neuron-cc compile /tmp/tmptgmdk1g3/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmptgmdk1g3/graph_def.neff --io-config {"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"], "2:0": [[30522, 768], "float32"]}, "outputs": ["BertForMaskedLM_1/BertOnlyMLMHead_7/BertLMPredictionHead_1/Linear_4/aten_linear/Add:0"]} --fast-math none --verbose 35'
.......
Compiler status PASS
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 563, compiled = 546, percent compiled = 96.98%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 1 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 100.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 96
INFO:Neuron: => aten::add: 36
INFO:Neuron: => aten::contiguous: 12
INFO:Neuron: => aten::div: 12
INFO:Neuron: => aten::dropout: 37
INFO:Neuron: => aten::gelu: 13
INFO:Neuron: => aten::layer_norm: 26
INFO:Neuron: => aten::linear: 74
INFO:Neuron: => aten::matmul: 24
INFO:Neuron: => aten::permute: 48
INFO:Neuron: => aten::size: 96
INFO:Neuron: => aten::softmax: 12
INFO:Neuron: => aten::transpose: 12
INFO:Neuron: => aten::view: 48
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 1 [supported]
INFO:Neuron: => aten::add: 2 [supported]
INFO:Neuron: => aten::add_: 1 [supported]
INFO:Neuron: => aten::embedding: 3 [not supported]
INFO:Neuron: => aten::mul: 1 [supported]
INFO:Neuron: => aten::rsub: 1 [supported]
INFO:Neuron: => aten::size: 1 [supported]
INFO:Neuron: => aten::slice: 4 [supported]
INFO:Neuron: => aten::to: 1 [supported]
INFO:Neuron: => aten::unsqueeze: 2 [supported]
[Compilation Time] 237.75 seconds.
[Total compilation Time] 237.75 seconds.
Validating bert-base-uncased model...
	- Validating Neuron Model output "logits":
		-[✓] (1, 128, 30522) matches (1, 128, 30522)
		-[x] values not close enough, max diff: 0.28158092498779297 (atol: 0.001)
The maximum absolute difference between the output of the reference model and the Neuron exported model is not within the set tolerance 0.001:
- logits: max diff = 0.28158092498779297
The Neuron export succeeded and the exported model was saved at: bert_neuron
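The validation step above compares the reference model's logits against the Neuron model's logits element-wise against an absolute tolerance; the shapes match, but the max difference (~0.28) exceeds atol=0.001, hence the warning. A hedged sketch of that check (function names are illustrative, not the exporter's actual API):

```python
# Sketch of an element-wise absolute-tolerance check like the one the
# exporter logs: shapes can match while values still exceed atol.
def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

reference = [0.0, 1.5, -2.0]       # reference (CPU) logits
neuron_out = [0.2816, 1.5, -2.0]   # simulate the observed max diff

diff = max_abs_diff(reference, neuron_out)
atol = 1e-3
print(diff <= atol)  # False
```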
tagucci added the bug label on Feb 9, 2024
JingyaHuang (Collaborator) commented:

Hi @tagucci, thanks a lot for reporting! I can reproduce the issue; it seems that our CI disabled the CLI export test, so the bug was not detected. I just put up a fix at #474. Thanks again for catching it and reporting it to us!

JingyaHuang self-assigned this on Feb 9, 2024
JingyaHuang (Collaborator) commented:

#474 is merged; we will do a release this week to include it. Feel free to reopen the issue if there are any further questions!
