
How can I increase the batch size #1084

Closed
Aldoraz opened this issue Oct 20, 2023 · 3 comments · Fixed by #1087
Aldoraz commented Oct 20, 2023

I'm trying to run the model locally; however, every PDF I'd like to ingest gives me this error:

Parsing documents into nodes: 100%|██████████████████████████████████████████████████████████████████████| 254/254 [00:00<00:00, 478.26it/s]
Generating embeddings: 100%|████████████████████████████████████████████████████████████████████████████| 6393/6393 [03:36<00:00, 29.53it/s]
Traceback (most recent call last):
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\gradio\queueing.py", line 406, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\gradio\blocks.py", line 1554, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\gradio\blocks.py", line 1192, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\gradio\utils.py", line 659, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "D:\Libraries\Desktop\privateGPT\private_gpt\ui\ui.py", line 96, in _upload_file
    ingest_service.ingest(file_name=path.name, file_data=path)
  File "D:\Libraries\Desktop\privateGPT\private_gpt\server\ingest\ingest_service.py", line 106, in ingest
    return self._save_docs(documents)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Libraries\Desktop\privateGPT\private_gpt\server\ingest\ingest_service.py", line 116, in _save_docs
    VectorStoreIndex.from_documents(
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\llama_index\indices\base.py", line 102, in from_documents
    return cls(
           ^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\llama_index\indices\vector_store\base.py", line 49, in __init__
    super().__init__(
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\llama_index\indices\base.py", line 71, in __init__
    index_struct = self.build_index_from_nodes(nodes)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\llama_index\indices\vector_store\base.py", line 238, in build_index_from_nodes
    return self._build_index_from_nodes(nodes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\llama_index\indices\vector_store\base.py", line 226, in _build_index_from_nodes
    self._add_nodes_to_index(
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\llama_index\indices\vector_store\base.py", line 187, in _add_nodes_to_index
    new_ids = self._vector_store.add(nodes)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\llama_index\vector_stores\chroma.py", line 146, in add
    self._collection.add(
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\chromadb\api\models\Collection.py", line 100, in add
    self._client._add(ids, self.id, embeddings, metadatas, documents)
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\chromadb\api\segment.py", line 264, in _add
    validate_batch(
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\chromadb\api\types.py", line 377, in validate_batch
    raise ValueError(
ValueError: Batch size 6393 exceeds maximum batch size 5461
INFO:     connection closed
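For reference, the underlying problem is that Chroma's `Collection.add` rejects a single call containing more items than the store's maximum batch size (5461 here), while the ingest produced 6393 embeddings at once. A minimal workaround sketch, independent of the fix merged in #1087: split the ids/embeddings/documents into chunks under the limit and call `collection.add` once per chunk. `MAX_BATCH` below is an assumed constant taken from the error message, and `add_in_batches` is a hypothetical helper, not part of privateGPT or chromadb.

```python
# Workaround sketch: insert into a Chroma collection in chunks that stay
# under the store's maximum batch size. MAX_BATCH is assumed from the
# error message ("maximum batch size 5461").

MAX_BATCH = 5461


def batched(items, size=MAX_BATCH):
    """Yield successive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def add_in_batches(collection, ids, embeddings, documents):
    """Call collection.add once per chunk instead of once for everything."""
    for id_chunk, emb_chunk, doc_chunk in zip(
        batched(ids), batched(embeddings), batched(documents)
    ):
        collection.add(ids=id_chunk, embeddings=emb_chunk, documents=doc_chunk)
```

With 6393 items this produces two calls (5461 + 932 items), each within the limit.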
imartinez self-assigned this Oct 20, 2023

imartinez (Collaborator) commented:

Thanks for sharing the detailed description. This is a regression. I'll work on the fix asap.

imartinez (Collaborator) commented:

@Aldoraz please verify the fix. Thanks!


Aldoraz commented Oct 20, 2023

> @Aldoraz please verify the fix. Thanks!

Works now. Thanks!
