Using a TextSplitter on multiple documents with filetype="recursive_paths" fails #11

rfishermonteith · 2024-11-11T20:46:25Z

Using a TextSplitter on multiple documents with filetype="recursive_paths" fails with the error below.

This seems to be fixed by changing https://github.com/thiswillbeyourgithub/wdoc/blame/main/wdoc/utils/misc.py#L459 to:

return text_splitters[task][modelname]

Command I'm running:

python -m wdoc \
    --path="data_for_wdoc" \
    --filetype="recursive_paths" \
    --task=search \
    --query="How can I make wdoc run faster?" \
    --query_retrievers='default_multiquery' \
    --top_k=auto_200_500 \
    --llms_api_bases="{'model':'http://localhost:11434','query_eval_model':'http://localhost:11434'}" \
    --modelname="ollama/gemma2:2b" \
    --query_eval_modelname="ollama/gemma2:2b" \
    --recursed_filetype="txt" \
    --pattern="*.txt"

Error:

Error when loading doc with filetype txt: ''dict' object has no attribute 'transform_documents''. Arguments: {'llm_name': 'ollama/gemma2:2b', 'task': 'search', 'temp_dir': PosixPath('XXXX'), 'path': 'data_for_wdoc/fe061b430a2c4991a002f039c8ca6cb9.txt', 'filetype': 'txt', 'recur_parent_id': '206b66c9-9d44-4138-a413-fc1561d601a3', 'file_hash': '74a0d0bb291717058af1'}
Line number: 340
Full traceback:
  File "XXXX/venv/lib/python3.11/site-packages/wdoc/utils/loaders.py", line 340, in load_one_doc_wrapped
    out = load_one_doc(**doc_kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "<@beartype(wdoc.utils.loaders.load_one_doc) at 0x12b15aca0>", line 205, in load_one_doc

  File "XXXX/venv/lib/python3.11/site-packages/wdoc/utils/loaders.py", line 507, in load_one_doc
    docs = text_splitter.transform_documents(docs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
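
For anyone hitting the same message, here is a minimal sketch of how a lookup that stops one level short in a nested cache could produce this error. It is an illustration only, not the actual wdoc code: `FakeSplitter`, `get_splitter_buggy`, `get_splitter_fixed`, and the cache layout (task, then model name) are assumptions made up for the example. Only the name `text_splitters` and the `[task][modelname]` indexing come from the suggested fix above.

```python
class FakeSplitter:
    """Hypothetical stand-in for a text splitter; it only has the method named in the traceback."""

    def __init__(self, modelname):
        self.modelname = modelname

    def transform_documents(self, docs):
        # A real splitter would chunk the documents; this one passes them through.
        return list(docs)


# Assumed cache layout: task -> modelname -> splitter instance.
text_splitters = {}


def get_splitter_buggy(task, modelname):
    """On a cache hit, returns the inner dict instead of the splitter."""
    if task in text_splitters and modelname in text_splitters[task]:
        return text_splitters[task]  # bug: missing the [modelname] lookup
    text_splitters.setdefault(task, {})[modelname] = FakeSplitter(modelname)
    return text_splitters[task][modelname]


def get_splitter_fixed(task, modelname):
    """Always indexes both levels, as in the suggested fix."""
    if task not in text_splitters or modelname not in text_splitters[task]:
        text_splitters.setdefault(task, {})[modelname] = FakeSplitter(modelname)
    return text_splitters[task][modelname]


if __name__ == "__main__":
    first = get_splitter_buggy("search", "ollama/gemma2:2b")
    second = get_splitter_buggy("search", "ollama/gemma2:2b")
    # The first call builds a splitter; the second call hits the cache.
    print(type(first).__name__, type(second).__name__)
```

Under these assumptions the first document loads fine and the second one receives a plain dict, which would explain why the failure only shows up when several documents are loaded in one run.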

I'm seeing some issues with using recursed_filetype, which I'll open a separate issue for.

thiswillbeyourgithub · 2024-11-12T07:16:03Z

Taking a closer look during my commute, it appears to be a no-brainer that your suggested fix is right. Thank you very much. I just pushed that to the dev branch.

thiswillbeyourgithub added a commit that referenced this issue Nov 12, 2024

fix: text splitter thanks to @rfishermonteith in #11

a1ff236

Signed-off-by: thiswillbeyourgithub <26625900+thiswillbeyourgithub@users.noreply.github.com>

thiswillbeyourgithub closed this as completed Nov 12, 2024
