unable to use interpreter --local --vision #1467

drhouse · 2024-09-27T03:12:26Z

Issue

I'm not sure about the proper workflow for interpreter vision after reading this. For the record, I installed moondream separately and it ran great, including its Gradio demo. As a side note on vision, I'm curious whether Open Interpreter will be able to use Llama 3.2's vision capability.

Platform

I am running Windows 10 x64 on modern PC hardware, using Windows Terminal with PowerShell.
Do I need to be using WSL?

Attempts

When I run 'interpreter --local --vision' with llama 3.2, it doesn't seem able to use moondream to view anything. I've tried prompts like 'what do you see?' and 'take a screenshot and describe it', but it doesn't seem to know that moondream is available.

I have also tried 'interpreter --local --vision --os' with llama 3.2 and got a bit further. It will:

take a screenshot
save it in 'C:/Windows/Temp'
open it in FSViewer (my associated photo program)
try to use computer.view()

Error

After that, I get this error:

Traceback (most recent call last):
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\Lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\Lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\Scripts\interpreter.exe\__main__.py", line 7, in <module>
    sys.exit(main())
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\terminal_interface\start_terminal_interface.py", line 610, in main
    start_terminal_interface(interpreter)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\terminal_interface\start_terminal_interface.py", line 576, in start_terminal_interface
    interpreter.chat()
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\core.py", line 191, in chat
    for _ in self._streaming_chat(message=message, display=display):
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\core.py", line 223, in _streaming_chat
    yield from terminal_interface(self, message)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\terminal_interface\terminal_interface.py", line 157, in terminal_interface
    for chunk in interpreter.chat(message, display=False, stream=True):
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\core.py", line 259, in _streaming_chat
    yield from self._respond_and_store()
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\core.py", line 318, in _respond_and_store
    for chunk in respond(self):
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\respond.py", line 86, in respond
    for chunk in interpreter.llm.run(messages_for_llm):
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\llm\llm.py", line 180, in run
    image_description = self.vision_renderer(lmc=img_msg)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\computer\vision\vision.py", line 171, in query
    answer = self.model.answer_question(
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\moondream.py", line 93, in answer_question
    answer = self.generate(
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\moondream.py", line 77, in generate
    output_ids = self.text_model.generate(
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2024, in generate
    result = self._sample(
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2982, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 1104, in forward
    outputs = self.transformer(
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 959, in forward
    layer_outputs = decoder_layer(
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 763, in forward
    attn_outputs, self_attn_weights, present_key_value = self.mixer(
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 382, in forward
    query_rot, key_rot = apply_rotary_pos_emb(
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 214, in apply_rotary_pos_emb
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
IndexError: index is out of bounds for dimension with size 0
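To make the last frame easier to read: the message says that the dimension being indexed has size 0, so the cosine table had no rows when position_ids was looked up in it. The sketch below is a pure-Python illustration of that failure mode only. It is not the moondream or transformers code, the names build_cos_table and lookup are hypothetical, and why the table ended up empty in this run is not established here.

```python
import math

def build_cos_table(seq_len, dim=4, base=10000.0):
    """Return one row of cosines per position (hypothetical helper)."""
    inv_freq = [base ** (-2 * i / dim) for i in range(dim // 2)]
    return [[math.cos(pos * f) for f in inv_freq] for pos in range(seq_len)]

def lookup(cos_table, position_ids):
    """Mimic cos[position_ids]: pick the row for each requested position."""
    return [cos_table[p] for p in position_ids]

# A table built for three positions can serve position ids 0..2.
ok = lookup(build_cos_table(seq_len=3), [0, 1, 2])

# A table built for zero positions has no rows, so even position 0 fails.
try:
    lookup(build_cos_table(seq_len=0), [0])
    failed = False
except IndexError:
    failed = True
```

If this reading is right, the thing to investigate is the sequence length passed into the rotary embedding, not the image itself.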

Screenshot

Open Interpreter version

0.3.13

Python version

3.10.11

Operating System name and version

Windows 10
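
Since the traceback also passes through torch and transformers, their versions may be relevant as well. Below is a minimal sketch for collecting them alongside the details above. It assumes the packages were installed with pip under the distribution names open-interpreter, transformers and torch; the helper names are hypothetical.

```python
import platform
from importlib import metadata

def package_version(name):
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"

def collect_environment(packages=("open-interpreter", "transformers", "torch")):
    """Gather Python, OS and package versions into a single dict."""
    info = {
        "python": platform.python_version(),
        "os": f"{platform.system()} {platform.release()}",
    }
    for name in packages:
        info[name] = package_version(name)
    return info

if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Running it in the same Python environment that provides the interpreter command prints one line per entry, ready to paste into a report.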


Manamama · 2024-10-30T14:09:55Z

A tip: even without --vision as an argument, the "i" model is clever enough to use tesseract on its own.
