Errors trying to generate examples #7
Open
marksher opened this issue Nov 18, 2024 · 12 comments

@marksher

Hello! I'm getting these errors when trying to generate one of the examples. Any thoughts? This is running on a light MacBook Air M2 with 16GB of memory, so could that be the issue?

/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/helpers.py:987: UserWarning: Unexpected argument. Filling with None.
  warnings.warn("Unexpected argument. Filling with None.")
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/utils.py:1375: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.

Full trace:
(venv) ➜ LLaMA-Mesh git:(main) ✗ python3 app.py
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00, 1.88s/it]
Some parameters are on the meta device because they were offloaded to the disk.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/components/chatbot.py:225: UserWarning: You have not specified a value for the `type` parameter. Defaulting to the 'tuples' format for chatbot messages, but this is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style 'role' and 'content' keys.
  warnings.warn(

To create a public link, set `share=True` in `launch()`.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/helpers.py:987: UserWarning: Unexpected argument. Filling with None.
  warnings.warn("Unexpected argument. Filling with None.")
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/utils.py:1375: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
Exception in thread Thread-9 (generate):
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2068, in generate
    self._validate_generated_length(generation_config, input_ids_length, has_default_max_length)
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 1383, in _validate_generated_length
    raise ValueError(
ValueError: Input length of input_ids is 21, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.
Traceback (most recent call last):
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/blocks.py", line 2015, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/blocks.py", line 1574, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 710, in async_iteration
    return await anext(iterator)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 815, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/chat_interface.py", line 678, in _stream_fn
    first_response = await async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 710, in async_iteration
    return await anext(iterator)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 704, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 687, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/app.py", line 158, in chat_llama3_8b
    for text in streamer:
                ^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/streamers.py", line 223, in __next__
    value = self.text_queue.get(timeout=self.timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/queue.py", line 179, in get
    raise Empty
_queue.Empty

@thuwzy (Collaborator) commented Nov 18, 2024

You can set max_new_tokens=4096 before the following lines. This is caused by a version difference; I will fix it later.
https://github.com/nv-tlabs/LLaMA-Mesh/blob/main/app.py#L142-L149
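
In case the pointer isn't obvious, here is a minimal sketch of that workaround, assuming (as in the linked lines of app.py) that a generate_kwargs dict is built and handed to model.generate() on a background thread; the variable names may differ in your checkout:

# Workaround sketch: hard-code the generation budget in case the value
# from the Gradio slider arrives as None and transformers falls back to
# its default max_length of 20.
max_new_tokens = 4096

generate_kwargs = dict(
    input_ids=input_ids,    # tokenized prompt prepared earlier in the function
    streamer=streamer,      # TextIteratorStreamer that the UI reads from
    max_new_tokens=max_new_tokens,
    do_sample=True,
    temperature=temperature,
)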

@marksher (Author)

Getting somewhere! Thanks! After that, I set the environment variable TOKENIZERS_PARALLELISM=false, which cleared up another problem. None of the remaining messages are "errors", though. Could my environment be the issue? I created a clean virtual environment and then just ran pip install -r requirements.

Gradio isn't included in that file. Is there a specific version I should try?
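
For what it's worth, the same setting can be applied inside app.py instead of the shell; a minimal sketch, assuming it runs before any tokenizer work starts:

import os

# Equivalent to `export TOKENIZERS_PARALLELISM=false` in the shell; it
# must be set before the tokenizers library spawns its worker threads.
os.environ["TOKENIZERS_PARALLELISM"] = "false"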

/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/helpers.py:987: UserWarning: Unexpected argument. Filling with None.
  warnings.warn("Unexpected argument. Filling with None.")
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
(venv) ➜  LLaMA-Mesh git:(main) ✗ python app.py
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00,  1.96s/it]
Some parameters are on the meta device because they were offloaded to the disk.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/components/chatbot.py:225: UserWarning: You have not specified a value for the `type` parameter. Defaulting to the 'tuples' format for chatbot messages, but this is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style 'role' and 'content' keys.
  warnings.warn(
* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/helpers.py:987: UserWarning: Unexpected argument. Filling with None.
  warnings.warn("Unexpected argument. Filling with None.")
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Traceback (most recent call last):
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/blocks.py", line 2015, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/blocks.py", line 1574, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 710, in async_iteration
    return await anext(iterator)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 815, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/chat_interface.py", line 678, in _stream_fn
    first_response = await async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 710, in async_iteration
    return await anext(iterator)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 704, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 687, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/app.py", line 159, in chat_llama3_8b
    for text in streamer:
                ^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/streamers.py", line 223, in __next__
    value = self.text_queue.get(timeout=self.timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/queue.py", line 179, in get
    raise Empty
_queue.Empty

@marksher (Author)

BTW, this error pops up about 10 seconds after clicking one of the example buttons. I'm down to just trying the 9000 * 9000 one to see if anything gets going before trying the bigger stuff.

@runshengdu

> BTW, this error pops up about 10 seconds after clicking one of the example buttons. I'm down to just trying the 9000 * 9000 one to see if anything gets going before trying the bigger stuff.

Same issue here. Did you fix it?

@oursland (Contributor)

Try typing the command into the text box directly. My issue was that those buttons don't work correctly, but typed queries work just fine.

@runshengdu

Tried it; it did not work.

@oursland (Contributor)

Here's what I see on my machine.

[screenshot of the Gradio UI showing both sets of example buttons]

The buttons in the upper box ("Gradio ChatInterface") do not seem to work, but the buttons below ("Examples") do.

@thuwzy (Collaborator) commented Nov 21, 2024

Maybe you can try gradio==4.44.1 (pip install gradio==4.44.1)? This is the Gradio version tested in my environment.

@spicfrankly

First, I had to set default values for the parameters temperature and max_new_tokens in the function chat_llama3_8b() (I chose 0.9 and 4096, respectively). These parameters were not set properly by the gr.ChatInterface component; I don't know why, and there is surely a better fix that connects the slider values to the function call, but for now that connection seems broken.

Second, I increased the streamer timeout in that same function from 10 s to 60 s:
streamer = TextIteratorStreamer(tokenizer, timeout=60.0, skip_prompt=True, skip_special_tokens=True).
This is due to limited hardware and the time prediction takes; I have 12GB of VRAM.

Once this is done, it works, but it is very slow! 😆
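
Putting both changes together, a hedged sketch of the modified chat_llama3_8b(); the body is paraphrased, not the repo's exact code, and it assumes module-level model and tokenizer objects as in the repo's app.py:

from threading import Thread
from transformers import TextIteratorStreamer

def chat_llama3_8b(message, history, temperature=0.9, max_new_tokens=4096):
    # The defaults above guard against Gradio passing None for the sliders.
    conversation = [{"role": "user", "content": message}]
    input_ids = tokenizer.apply_chat_template(
        conversation, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # timeout=60.0 instead of 10.0 so slow hardware can produce the first
    # token before the streamer's internal queue.get() gives up.
    streamer = TextIteratorStreamer(
        tokenizer, timeout=60.0, skip_prompt=True, skip_special_tokens=True
    )
    generate_kwargs = dict(
        input_ids=input_ids,
        streamer=streamer,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=temperature,
    )
    # Generation runs on a background thread; this generator streams the
    # partial text back to the Gradio UI as it arrives.
    Thread(target=model.generate, kwargs=generate_kwargs).start()
    outputs = []
    for text in streamer:
        outputs.append(text)
        yield "".join(outputs)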

@Otterwerks

> First, I had to set default values for the parameters temperature and max_new_tokens in the function chat_llama3_8b() (I chose 0.9 and 4096, respectively). These parameters were not set properly by the gr.ChatInterface component; I don't know why, and there is surely a better fix that connects the slider values to the function call, but for now that connection seems broken.
>
> Second, I increased the streamer timeout in that same function from 10 s to 60 s: streamer = TextIteratorStreamer(tokenizer, timeout=60.0, skip_prompt=True, skip_special_tokens=True). This is due to limited hardware and the time prediction takes; I have 12GB of VRAM.
>
> Once this is done, it works, but it is very slow! 😆

Hi, I was getting the same error, and this, along with setting PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0, at least got me up and running. Indeed very slow, but working!

MacBook Pro i9 2.4GHz / 64GB / 5500M 8GB
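
In case it helps other Mac users, that watermark setting can also be applied in code; a minimal sketch, with the caveat that it must run before torch initializes the MPS backend:

import os

# 0.0 disables the MPS upper memory limit so a large model can keep
# allocating instead of raising an out-of-memory error; set it before
# importing torch so the allocator picks it up.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"

import torch  # imported only after the variable is set, on purpose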

@tianqizhao-louis

Bumping this: I encountered the same problem and am unable to run it locally.

@tianqizhao-louis

Update: I solved it by pinning gradio==4.44.1 and then quantizing the model to 8-bit:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the checkpoint with 8-bit weights to roughly halve VRAM use.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="cuda",
    torch_dtype=torch.bfloat16,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
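
(Note: load_in_8bit relies on the bitsandbytes package, which generally requires a CUDA GPU, so this particular route won't help on Apple Silicon.)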

hope that helps
