unable to use interpreter --local --vision #1467

Open
drhouse opened this issue Sep 27, 2024 · 1 comment
Comments


drhouse commented Sep 27, 2024

Issue

I'm not sure about the proper workflow for using interpreter vision after reading this. For the record, I separately installed moondream and it ran great, including its Gradio demo. As a side note, I am curious whether Open Interpreter will be able to use Llama 3.2's vision capability.

Platform

I am running Windows 10 x64 on modern PC hardware, using PowerShell in Windows Terminal.
Do I need to be using WSL?

Attempts

When I run 'interpreter --local --vision' with Llama 3.2, it doesn't seem able to use moondream to view anything. I've tried prompts like 'what do you see?' and 'take a screenshot and describe it', but it doesn't understand that it has moondream available.

I have also tried 'interpreter --local --vision --os' with Llama 3.2 and got a bit further. It will:

  • take a screenshot
  • save it in 'C:/Windows/Temp'
  • open it in FSViewer (my associated photo program)
  • try to use computer.view()

Error

After that, I get this error:

Traceback (most recent call last):
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\Lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\Lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\Scripts\interpreter.exe\__main__.py", line 7, in <module>
    sys.exit(main())
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\terminal_interface\start_terminal_interface.py", line 610, in main
    start_terminal_interface(interpreter)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\terminal_interface\start_terminal_interface.py", line 576, in start_terminal_interface
    interpreter.chat()
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\core.py", line 191, in chat
    for _ in self._streaming_chat(message=message, display=display):
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\core.py", line 223, in _streaming_chat
    yield from terminal_interface(self, message)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\terminal_interface\terminal_interface.py", line 157, in terminal_interface
    for chunk in interpreter.chat(message, display=False, stream=True):
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\core.py", line 259, in _streaming_chat
    yield from self._respond_and_store()
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\core.py", line 318, in _respond_and_store
    for chunk in respond(self):
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\respond.py", line 86, in respond
    for chunk in interpreter.llm.run(messages_for_llm):
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\llm\llm.py", line 180, in run
    image_description = self.vision_renderer(lmc=img_msg)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\computer\vision\vision.py", line 171, in query
    answer = self.model.answer_question(
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\moondream.py", line 93, in answer_question
    answer = self.generate(
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\moondream.py", line 77, in generate
    output_ids = self.text_model.generate(
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2024, in generate
    result = self._sample(
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2982, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 1104, in forward
    outputs = self.transformer(
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 959, in forward
    layer_outputs = decoder_layer(
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 763, in forward
    attn_outputs, self_attn_weights, present_key_value = self.mixer(
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 382, in forward
    query_rot, key_rot = apply_rotary_pos_emb(
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 214, in apply_rotary_pos_emb
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
IndexError: index is out of bounds for dimension with size 0
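The last frame shows where it breaks: `cos[position_ids]` indexes a cosine table that has zero rows, so any position ID is out of range. Below is a minimal pure-Python sketch (no torch) of that mechanism. It builds a simplified rotary-embedding cosine table and gathers one row per position ID. The function names are hypothetical, and this is not moondream's actual code. The traceback alone does not show why the table ends up empty in this setup.

```python
import math
from typing import List, Sequence


def build_rotary_cos_cache(seq_len: int, dim: int, base: float = 10000.0) -> List[List[float]]:
    """Hypothetical helper: a simplified seq_len x (dim // 2) table of cos(pos * inv_freq)."""
    # Standard RoPE inverse frequencies: 1 / base^(2i / dim) for i in [0, dim / 2).
    inv_freq = [1.0 / (base ** (2 * i / dim)) for i in range(dim // 2)]
    # One row per position; seq_len == 0 yields an empty table.
    return [[math.cos(pos * f) for f in inv_freq] for pos in range(seq_len)]


def gather_rows(table: List[List[float]], position_ids: Sequence[int]) -> List[List[float]]:
    """Hypothetical helper mirroring `cos[position_ids]`: pick one row per position ID."""
    return [table[p] for p in position_ids]


if __name__ == "__main__":
    cache = build_rotary_cos_cache(seq_len=8, dim=4)
    print(len(gather_rows(cache, [0, 1, 2])))  # 3 rows gathered

    empty = build_rotary_cos_cache(seq_len=0, dim=4)
    try:
        gather_rows(empty, [0])
    except IndexError as err:
        # Same failure class as the traceback: indexing into a size-0 table.
        print("IndexError:", err)
```

With a non-empty table the gather succeeds. With a zero-length table even position 0 fails, which matches "dimension with size 0" in the error above.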

Screenshot

[Screenshot image: 2024-09-26 23_12_03-Greenshot]

Open Interpreter version

0.3.13

Python version

3.10.11

Operating System name and version

Windows 10
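The traceback passes through transformers and torch as well as Open Interpreter, so their versions may also be worth listing next to the ones above. Here is a small standard-library sketch that prints them. The package list is an assumption about what is relevant here; "open-interpreter" is the name the project is installed under with pip.

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Sequence

# Assumed set of packages relevant to this traceback.
PACKAGES = ("open-interpreter", "transformers", "torch")


def collect_versions(packages: Sequence[str] = PACKAGES) -> Dict[str, str]:
    """Return Python/OS info plus the installed version of each named package."""
    info = {"python": platform.python_version(), "platform": platform.platform()}
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            # Report missing packages instead of failing.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```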

@Manamama

A tip: even without --vision as an argument, the "i" model is clever enough to use Tesseract on its own.
