Errors trying to generate examples #7
Open
marksher opened this issue Nov 18, 2024 · 12 comments

@marksher

Hello! I'm getting these errors when trying to generate one of the examples. Any thoughts? This is running on a light MacBook Air M2 with 16GB of memory, so could that be the issue?

/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/helpers.py:987: UserWarning: Unexpected argument. Filling with None.
  warnings.warn("Unexpected argument. Filling with None.")
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/utils.py:1375: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.

Full trace:
(venv) ➜ LLaMA-Mesh git:(main) ✗ python3 app.py
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00, 1.88s/it]
Some parameters are on the meta device because they were offloaded to the disk.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/components/chatbot.py:225: UserWarning: You have not specified a value for the `type` parameter. Defaulting to the 'tuples' format for chatbot messages, but this is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style 'role' and 'content' keys.
  warnings.warn(

To create a public link, set `share=True` in `launch()`.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/helpers.py:987: UserWarning: Unexpected argument. Filling with None.
  warnings.warn("Unexpected argument. Filling with None.")
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/utils.py:1375: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
Exception in thread Thread-9 (generate):
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2068, in generate
    self._validate_generated_length(generation_config, input_ids_length, has_default_max_length)
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 1383, in _validate_generated_length
    raise ValueError(
ValueError: Input length of input_ids is 21, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.
Traceback (most recent call last):
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/blocks.py", line 2015, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/blocks.py", line 1574, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 710, in async_iteration
    return await anext(iterator)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 815, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/chat_interface.py", line 678, in _stream_fn
    first_response = await async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 710, in async_iteration
    return await anext(iterator)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 704, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 687, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/app.py", line 158, in chat_llama3_8b
    for text in streamer:
                ^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/streamers.py", line 223, in __next__
    value = self.text_queue.get(timeout=self.timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/queue.py", line 179, in get
    raise Empty
_queue.Empty

@thuwzy (Collaborator) commented Nov 18, 2024

You can set max_new_tokens=4096 before the following lines. This is caused by a version difference; I will fix it later.
https://github.com/nv-tlabs/LLaMA-Mesh/blob/main/app.py#L142-L149
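
In case the pointer isn't obvious, here is a minimal sketch of that workaround, assuming (as in the linked lines of app.py) that a generate_kwargs dict is built and handed to model.generate() on a background thread; the variable names may differ in your checkout:

# Workaround sketch: hard-code the generation budget in case the value
# from the Gradio slider arrives as None and transformers falls back to
# its default max_length of 20.
max_new_tokens = 4096

generate_kwargs = dict(
    input_ids=input_ids,    # tokenized prompt prepared earlier in the function
    streamer=streamer,      # TextIteratorStreamer that the UI reads from
    max_new_tokens=max_new_tokens,
    do_sample=True,
    temperature=temperature,
)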

@marksher (Author)

Getting somewhere! Thanks! After that, I set the environment variable TOKENIZERS_PARALLELISM=false, which cleared up another problem. None of the remaining messages are "errors", though. Could my environment be the issue? I created a clean virtual environment and then just ran pip install -r requirements.

Gradio isn't included in that file. Is there a specific version I should try?
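
For what it's worth, the same setting can be applied inside app.py instead of the shell; a minimal sketch, assuming it runs before any tokenizer work starts:

import os

# Equivalent to `export TOKENIZERS_PARALLELISM=false` in the shell; it
# must be set before the tokenizers library spawns its worker threads.
os.environ["TOKENIZERS_PARALLELISM"] = "false"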

/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/helpers.py:987: UserWarning: Unexpected argument. Filling with None.
  warnings.warn("Unexpected argument. Filling with None.")
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
(venv) ➜  LLaMA-Mesh git:(main) ✗ python app.py
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00,  1.96s/it]
Some parameters are on the meta device because they were offloaded to the disk.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/components/chatbot.py:225: UserWarning: You have not specified a value for the `type` parameter. Defaulting to the 'tuples' format for chatbot messages, but this is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style 'role' and 'content' keys.
  warnings.warn(
* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/helpers.py:987: UserWarning: Unexpected argument. Filling with None.
  warnings.warn("Unexpected argument. Filling with None.")
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Traceback (most recent call last):
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/blocks.py", line 2015, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/blocks.py", line 1574, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 710, in async_iteration
    return await anext(iterator)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 815, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/chat_interface.py", line 678, in _stream_fn
    first_response = await async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 710, in async_iteration
    return await anext(iterator)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 704, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 687, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/app.py", line 159, in chat_llama3_8b
    for text in streamer:
                ^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/streamers.py", line 223, in __next__
    value = self.text_queue.get(timeout=self.timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/queue.py", line 179, in get
    raise Empty
_queue.Empty

@marksher (Author)

BTW, this error pops up about 10 seconds after clicking one of the example buttons. I'm down to just trying the 9000 * 9000 one to see if anything gets going before trying the bigger stuff.

@runshengdu

> BTW, this error pops up about 10 seconds after clicking one of the example buttons. I'm down to just trying the 9000 * 9000 one to see if anything gets going before trying the bigger stuff.

Same issue here. Did you fix it?

@oursland (Contributor)

Try typing the command into the text box directly. My issue was that those buttons don't work correctly, but typed queries work just fine.

@runshengdu

Tried it; it did not work.

@oursland (Contributor)

Here's what I see on my machine.

[screenshot of the Gradio UI showing both sets of example buttons]

The buttons in the upper box ("Gradio ChatInterface") do not seem to work, but the buttons below ("Examples") do.

@thuwzy (Collaborator) commented Nov 21, 2024

Maybe you can try gradio==4.44.1 (pip install gradio==4.44.1)? This is the Gradio version tested in my environment.

@spicfrankly

First, I had to set default values for the parameters temperature and max_new_tokens in the function chat_llama3_8b() (I chose 0.9 and 4096, respectively). These parameters were not set properly by the gr.ChatInterface component; I don't know why, and there is surely a better fix that connects the slider values to the function call, but for now that connection seems broken.

Second, I increased the streamer timeout in that same function from 10 s to 60 s:
streamer = TextIteratorStreamer(tokenizer, timeout=60.0, skip_prompt=True, skip_special_tokens=True).
This is due to limited hardware and the time prediction takes; I have 12GB of VRAM.

Once this is done, it works, but it is very slow! 😆
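
Putting both changes together, a hedged sketch of the modified chat_llama3_8b(); the body is paraphrased, not the repo's exact code, and it assumes module-level model and tokenizer objects as in the repo's app.py:

from threading import Thread
from transformers import TextIteratorStreamer

def chat_llama3_8b(message, history, temperature=0.9, max_new_tokens=4096):
    # The defaults above guard against Gradio passing None for the sliders.
    conversation = [{"role": "user", "content": message}]
    input_ids = tokenizer.apply_chat_template(
        conversation, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # timeout=60.0 instead of 10.0 so slow hardware can produce the first
    # token before the streamer's internal queue.get() gives up.
    streamer = TextIteratorStreamer(
        tokenizer, timeout=60.0, skip_prompt=True, skip_special_tokens=True
    )
    generate_kwargs = dict(
        input_ids=input_ids,
        streamer=streamer,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=temperature,
    )
    # Generation runs on a background thread; this generator streams the
    # partial text back to the Gradio UI as it arrives.
    Thread(target=model.generate, kwargs=generate_kwargs).start()
    outputs = []
    for text in streamer:
        outputs.append(text)
        yield "".join(outputs)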

@Otterwerks

> First, I had to set default values for the parameters temperature and max_new_tokens in the function chat_llama3_8b() (I chose 0.9 and 4096, respectively). These parameters were not set properly by the gr.ChatInterface component; I don't know why, and there is surely a better fix that connects the slider values to the function call, but for now that connection seems broken.
>
> Second, I increased the streamer timeout in that same function from 10 s to 60 s: streamer = TextIteratorStreamer(tokenizer, timeout=60.0, skip_prompt=True, skip_special_tokens=True). This is due to limited hardware and the time prediction takes; I have 12GB of VRAM.
>
> Once this is done, it works, but it is very slow! 😆

Hi, I was getting the same error, and this, along with setting PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0, at least got me up and running. Indeed very slow, but working!

MacBook Pro i9 2.4GHz / 64GB / 5500M 8GB
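
In case it helps other Mac users, that watermark setting can also be applied in code; a minimal sketch, with the caveat that it must run before torch initializes the MPS backend:

import os

# 0.0 disables the MPS upper memory limit so a large model can keep
# allocating instead of raising an out-of-memory error; set it before
# importing torch so the allocator picks it up.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"

import torch  # imported only after the variable is set, on purpose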

@tianqizhao-louis

Bumping this: I encountered the same problem and am unable to run it locally.

@tianqizhao-louis

Update: I solved it by pinning gradio==4.44.1 and then quantizing the model to 8-bit:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the checkpoint with 8-bit weights to roughly halve VRAM use.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="cuda",
    torch_dtype=torch.bfloat16,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
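
(Note: load_in_8bit relies on the bitsandbytes package, which generally requires a CUDA GPU, so this particular route won't help on Apple Silicon.)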

hope that helps
