[Inference] Add t5 support for export and inference #267
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Awesome! I have a few questions/issues regarding the generation code though.
Left a few comments, and opened the discussion on `generate` methods: would it be possible to re-use what we already have?
I noticed that all the code for this new model class lives in
Thanks for taking the time to merge the generation code. Most of my questions are related to this part, which I did not review when it was first merged: I find it a bit cryptic where it diverges from the original transformers code I am more familiar with.
My main concern is how this is actually compatible with the generation config / generation parameters that can be passed to `generate`.
So, if possible, I would like to have some unit tests on that before formally approving.
    (batch_size, num_padding_values), dtype=attention_mask.dtype, device=attention_mask.device
),
attention_mask,
torch.ones((batch_size, 1), dtype=attention_mask.dtype, device=attention_mask.device),
Why do you need this extra one? Is it for a new token that has been generated?
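(A toy illustration, not from the PR, of what the concatenation above builds; the batch size, prompt length, and num_padding_values are made up, and the reason for the trailing column of ones is exactly what the question above is asking about.)

```python
import torch

# Made-up toy sizes: batch of 1, a 3-token prompt, 4 padding slots.
batch_size, num_padding_values = 1, 4
attention_mask = torch.ones((batch_size, 3), dtype=torch.int64)

padded_mask = torch.cat(
    [
        torch.zeros((batch_size, num_padding_values), dtype=attention_mask.dtype),
        attention_mask,
        # the extra column of ones asked about above
        torch.ones((batch_size, 1), dtype=attention_mask.dtype),
    ],
    dim=-1,
)
print(padded_mask)  # tensor([[0, 0, 0, 0, 1, 1, 1, 1]])
```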
decoder_attention_mask = torch.cat(
    [
        torch.zeros((batch_size, num_padding_values), dtype=torch.int32),
        torch.ones((batch_size, 2), dtype=torch.int32),
Where do these two input ids come from?
@staticmethod
def _initialize_past(past_key_values, num_padding_values):
    """Initialize past_key_values with zeros -- the structure depends on `batch_axis`"""
You actually keep the first two tokens of the past_key_values and pad left with num_padding_values zeros.
I am not using this function at all, I only moved it out of `_update_model_kwargs_for_xla_generation` so that it would be easier to override.
new_past = ()
for past_layer in past_key_values:
    new_past_layer = list(past_layer)
    for i, _ in enumerate(new_past_layer[:2]):
That's basically `for i in range(2)`, isn't it?
I think so, but it was not me who contributed it, so I am not sure if there is a specific intention here...
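(For context, a small hedged sketch of what the fragment and the comments above describe. It assumes self-attention key/value tensors of shape `(batch, num_heads, seq_len, head_dim)` and left padding along the sequence axis, as stated in the comment above; the exact tensor layout and padding side are assumptions, not confirmed in this thread.)

```python
import torch

def _initialize_past_sketch(past_key_values, num_padding_values):
    """Left-pad the self-attention key/value tensors of each layer with zeros
    along the sequence axis so their length becomes static.
    Only the first two tensors of each layer tuple are touched, which is what
    the `new_past_layer[:2]` slice in the fragment above selects."""
    new_past = ()
    for past_layer in past_key_values:
        new_past_layer = list(past_layer)
        for i in range(2):  # equivalent to `for i, _ in enumerate(new_past_layer[:2])`
            pad = torch.zeros(
                past_layer[i].shape[0],  # batch
                past_layer[i].shape[1],  # num_heads
                num_padding_values,      # padded sequence positions
                past_layer[i].shape[3],  # head_dim
                dtype=past_layer[i].dtype,
                device=past_layer[i].device,
            )
            new_past_layer[i] = torch.cat([pad, past_layer[i]], dim=2)
        new_past += (tuple(new_past_layer),)
    return new_past
```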
if synced_gpus:
    # Under synced_gpus the `forward` call must continue until all gpus complete their sequence.
    # The following logic allows an early break if all peers finished generating their sequence
    this_peer_finished_flag = torch.tensor(0.0 if this_peer_finished else 1.0).to(input_ids.device)
    # send 0.0 if we finished, 1.0 otherwise
    dist.all_reduce(this_peer_finished_flag, op=dist.ReduceOp.SUM)
    # did all peers finish? the reduced sum will be 0.0 then
    if this_peer_finished_flag.item() == 0.0:
        break
You keep this because you plan to use some kind of distribution?
No clue. I wanted to remove it at some point, but it seems it could possibly be usable on XLA devices...
Yes, I am not sure that I understand the logic here, but it seems that the operations used should be compatible with our system.
Not sure I understand everything, but overall looks good to me.
Left a few nits and comments.
r""" | ||
Generates sequences of token ids for models with a language modeling head using **greedy decoding** and can be | ||
used for text-decoder, text-to-text, speech-to-text, and vision-to-text models. | ||
|
||
<Tip warning={true}> | ||
|
||
In most cases, you do not need to call [`~generation.GenerationMixin.greedy_search`] directly. Use generate() | ||
instead. For an overview of generation strategies and code examples, check the [following | ||
guide](../generation_strategies). | ||
|
||
</Tip> | ||
|
||
|
||
Parameters: | ||
input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`): | ||
The sequence used as a prompt for the generation. | ||
logits_processor (`LogitsProcessorList`, *optional*): | ||
An instance of [`LogitsProcessorList`]. List of instances of class derived from [`LogitsProcessor`] | ||
used to modify the prediction scores of the language modeling head applied at each generation step. | ||
stopping_criteria (`StoppingCriteriaList`, *optional*): | ||
An instance of [`StoppingCriteriaList`]. List of instances of class derived from [`StoppingCriteria`] | ||
used to tell if the generation loop should stop. | ||
|
||
max_length (`int`, *optional*, defaults to 20): | ||
**DEPRECATED**. Use `logits_processor` or `stopping_criteria` directly to cap the number of generated | ||
tokens. The maximum length of the sequence to be generated. | ||
pad_token_id (`int`, *optional*): | ||
The id of the *padding* token. | ||
eos_token_id (`Union[int, List[int]]`, *optional*): | ||
The id of the *end-of-sequence* token. Optionally, use a list to set multiple *end-of-sequence* tokens. | ||
output_attentions (`bool`, *optional*, defaults to `False`): | ||
Whether or not to return the attentions tensors of all attention layers. See `attentions` under | ||
returned tensors for more details. | ||
output_hidden_states (`bool`, *optional*, defaults to `False`): | ||
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors | ||
for more details. | ||
output_scores (`bool`, *optional*, defaults to `False`): | ||
Whether or not to return the prediction scores. See `scores` under returned tensors for more details. | ||
return_dict_in_generate (`bool`, *optional*, defaults to `False`): | ||
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple. | ||
synced_gpus (`bool`, *optional*, defaults to `False`): | ||
Whether to continue running the while loop until max_length (needed for ZeRO stage 3) | ||
seq_length (`Optional[int]`, defaults to `False`): | ||
Length of current input_ids sequence | ||
is_traced_inference (`bool`, defaults to `False`): | ||
Whether the decoder is traced or using XLA lazy tensor. If the decoder is traced, next tokens and the beam scores | ||
are computed inside the decoder. | ||
model_kwargs: | ||
Additional model specific keyword arguments will be forwarded to the `forward` function of the model. | ||
If model is an encoder-decoder model the kwargs should include `encoder_outputs`. | ||
|
||
Return: | ||
[`~generation.GreedySearchDecoderOnlyOutput`], [`~generation.GreedySearchEncoderDecoderOutput`] or | ||
`torch.LongTensor`: A `torch.LongTensor` containing the generated tokens (default behaviour) or a | ||
[`~generation.GreedySearchDecoderOnlyOutput`] if `model.config.is_encoder_decoder=False` and | ||
`return_dict_in_generate=True` or a [`~generation.GreedySearchEncoderDecoderOutput`] if | ||
`model.config.is_encoder_decoder=True`. | ||
|
||
Examples: | ||
|
||
```python | ||
>>> from transformers import AutoTokenizer | ||
>>> from optimum.neuron import NeuronModelForSeq2SeqLM | ||
|
||
>>> tokenizer = AutoTokenizer.from_pretrained("t5-small") | ||
>>> input_shapes = {"batch_size": 1, "sequence_length": 128, "num_beams": 1} | ||
>>> model = NeuronModelForSeq2SeqLM.from_pretrained("t5-small", export=True, dynamic_batch_size=False, **input_shapes) | ||
|
||
>>> input_prompt = "translate English to German: Lets eat good food." | ||
>>> inputs = tokenizer(input_prompt, return_tensors="pt") | ||
|
||
>>> outputs = model.greedy_search(input_ids) | ||
|
||
>>> results = [tokenizer.decode(t, skip_special_tokens=True) for t in outputs] | ||
``` | ||
""" |
I don't know if we should copy/paste the full docstrings of the functions we override.
I am personally not much in favor of doing so because it adds maintenance on our side for not so much added value. But it's just a thought.
I wanted to add the explanation for `is_traced_inference`, and felt it would be better to have the docstring complete, especially since we will tailor quite a lot of those args to be compatible with Neuron in the near future; it would be better to have a complete docstring explaining that.
Thank you very much for this awesome model! And thanks also for adding the tests, I think it was worth the extra time since you caught some issues.
What's in the PR
Quick Tests
Exporter
optimum-cli export neuron --model t5-small --task text2text-generation --batch_size 1 --sequence_length 18 --num_beams 4 t5_small_neuron/
Inference
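A minimal inference sketch, not copied from the PR: it assumes the `t5_small_neuron/` directory produced by the export command above, the standard `optimum.neuron` loading API shown in the docstring earlier, and sticks to greedy decoding given the beam-search caveat below.

```python
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSeq2SeqLM

# Load the artifacts produced by the export command above (assumed local path).
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = NeuronModelForSeq2SeqLM.from_pretrained("t5_small_neuron/")

inputs = tokenizer("translate English to German: Let's eat good food.", return_tensors="pt")
# Note: generation parameters generally need to match the static shapes the model was
# compiled with (batch_size, sequence_length, num_beams from the export command).
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```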
[Caveat] Beam search is not working yet. Got the following error while running beam search with the official example. Will debug and add support in a coming PR.
Error Log Beam Search
Found the issue: in the notebook, the runtime was not initialized for beam search, leading to the error.
Next Steps
neuronx-distributed