
How can I increase the batch size #1084

Closed
Aldoraz opened this issue Oct 20, 2023 · 3 comments · Fixed by #1087
Aldoraz commented Oct 20, 2023

I'm trying to run the model locally; however, every PDF I'd like to ingest gives me this error:

Parsing documents into nodes: 100%|██████████████████████████████████████████████████████████████████████| 254/254 [00:00<00:00, 478.26it/s]
Generating embeddings: 100%|████████████████████████████████████████████████████████████████████████████| 6393/6393 [03:36<00:00, 29.53it/s]
Traceback (most recent call last):
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\gradio\queueing.py", line 406, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\gradio\blocks.py", line 1554, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\gradio\blocks.py", line 1192, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\gradio\utils.py", line 659, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "D:\Libraries\Desktop\privateGPT\private_gpt\ui\ui.py", line 96, in _upload_file
    ingest_service.ingest(file_name=path.name, file_data=path)
  File "D:\Libraries\Desktop\privateGPT\private_gpt\server\ingest\ingest_service.py", line 106, in ingest
    return self._save_docs(documents)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Libraries\Desktop\privateGPT\private_gpt\server\ingest\ingest_service.py", line 116, in _save_docs
    VectorStoreIndex.from_documents(
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\llama_index\indices\base.py", line 102, in from_documents
    return cls(
           ^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\llama_index\indices\vector_store\base.py", line 49, in __init__
    super().__init__(
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\llama_index\indices\base.py", line 71, in __init__
    index_struct = self.build_index_from_nodes(nodes)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\llama_index\indices\vector_store\base.py", line 238, in build_index_from_nodes
    return self._build_index_from_nodes(nodes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\llama_index\indices\vector_store\base.py", line 226, in _build_index_from_nodes
    self._add_nodes_to_index(
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\llama_index\indices\vector_store\base.py", line 187, in _add_nodes_to_index
    new_ids = self._vector_store.add(nodes)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\llama_index\vector_stores\chroma.py", line 146, in add
    self._collection.add(
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\chromadb\api\models\Collection.py", line 100, in add
    self._client._add(ids, self.id, embeddings, metadatas, documents)
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\chromadb\api\segment.py", line 264, in _add
    validate_batch(
  File "C:\Users\alpha\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-krlI0vxD-py3.11\Lib\site-packages\chromadb\api\types.py", line 377, in validate_batch
    raise ValueError(
ValueError: Batch size 6393 exceeds maximum batch size 5461
INFO:     connection closed
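For reference, the underlying problem is that Chroma's `Collection.add` rejects a single call containing more items than the store's maximum batch size (5461 here), while the ingest produced 6393 embeddings at once. A minimal workaround sketch, independent of the fix merged in #1087: split the ids/embeddings/documents into chunks under the limit and call `collection.add` once per chunk. `MAX_BATCH` below is an assumed constant taken from the error message, and `add_in_batches` is a hypothetical helper, not part of privateGPT or chromadb.

```python
# Workaround sketch: insert into a Chroma collection in chunks that stay
# under the store's maximum batch size. MAX_BATCH is assumed from the
# error message ("maximum batch size 5461").

MAX_BATCH = 5461


def batched(items, size=MAX_BATCH):
    """Yield successive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def add_in_batches(collection, ids, embeddings, documents):
    """Call collection.add once per chunk instead of once for everything."""
    for id_chunk, emb_chunk, doc_chunk in zip(
        batched(ids), batched(embeddings), batched(documents)
    ):
        collection.add(ids=id_chunk, embeddings=emb_chunk, documents=doc_chunk)
```

With 6393 items this produces two calls (5461 + 932 items), each within the limit.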
imartinez self-assigned this Oct 20, 2023

imartinez (Collaborator) commented:

Thanks for sharing the detailed description. This is a regression. I'll work on the fix asap.

imartinez (Collaborator) commented:

@Aldoraz please verify the fix. Thanks!


Aldoraz commented Oct 20, 2023

> @Aldoraz please verify the fix. Thanks!

Works now. Thanks!
