Metadata values are being dropped when list of values are provided #3434

Closed
vijayirlapati opened this issue Oct 20, 2022 · 5 comments · Fixed by #3547
Labels: Contributions wanted!, good first issue, topic:document_store, topic:metadata, topic:sql, type:bug

vijayirlapati commented Oct 20, 2022

Describe the bug
If a document's metadata contains a list, the corresponding values are not added to the document store. The affected metadata field is dropped for that document when it is written to the document store.

Error message
The following code runs, but it logs the error shown below. As a result, some metadata is not added to the document store.

document_store.write_documents(docs)
>>>>>
Writing Documents:   0%|          | 0/8 [00:00<?, ?it/s]
ERROR:haystack.document_stores.sql:Document 4b9f55bd6cbde00ac607619025a13c1e - Discarded metadata 'item_id', since it has invalid type: list.
SQLDocumentStore can accept and cast to string only the following types: str, int, float, bool, bytes, bytearray, NoneType
Writing Documents: 10000it [00:00, 244791.35it/s]
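
Editor's note: the error message lists the types that are accepted. A minimal sketch of a pre-write check built from that list might look like the following. The helper name `find_unsupported_meta` and the `name` field are hypothetical, and the check only mirrors the wording of the message; it is not Haystack's internal validation.

```python
# Types named in the SQLDocumentStore error message (assumed to be the full list).
SUPPORTED_META_TYPES = (str, int, float, bool, bytes, bytearray, type(None))


def find_unsupported_meta(meta: dict) -> dict:
    """Return {field: type name} for values whose type is not in the list above."""
    return {
        key: type(value).__name__
        for key, value in meta.items()
        if not isinstance(value, SUPPORTED_META_TYPES)
    }


# Hypothetical metadata resembling docs[0].meta from the report.
meta = {"name": "doc_0.txt", "item_id": ["B0017D5Z40", "B0017D5Z41", "B0017D5Z42"]}
print(find_unsupported_meta(meta))  # {'item_id': 'list'}
```

Running a check like this over every document before `write_documents` would surface the fields that are likely to be discarded, instead of relying on the log output.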

Expected behavior
All metadata should be added to the document store so that it can be used at prediction time through the filters option.

Additional context
I am using FAISSDocumentStore with Haystack v1.9.1. Metadata filtering works with single values, but it fails when a metadata field holds multiple values.

To Reproduce

from haystack.document_stores import FAISSDocumentStore
from haystack.utils import convert_files_to_docs

document_store = FAISSDocumentStore()
data_dir = "./data/4docs"
docs = convert_files_to_docs(data_dir)

# A list value: this is the field that gets discarded on write.
docs[0].meta['item_id'] = ['B0017D5Z40', 'B0017D5Z41', 'B0017D5Z42']
# Single string values: these are stored without errors.
docs[1].meta['item_id'] = 'B0017D5Z40'
docs[2].meta['item_id'] = 'B0017D5Z41'
docs[3].meta['item_id'] = 'B0017D5Z42'

# Logs the "Discarded metadata 'item_id'" error for docs[0].
document_store.write_documents(docs)

FAQ Check
Done

System:

  • OS: Windows WSL
  • GPU/CPU: CPU
  • Haystack version (commit or version number): v1.9.1
  • DocumentStore: FAISS
  • Retriever: DPR
  • Reader: FARM
vijayirlapati (Author) commented

Hi @ZanSara, I created this issue as discussed in the Discord community.

ZanSara added the type:bug, topic:sql, topic:metadata, topic:document_store, good first issue, and Contributions wanted! labels on Oct 20, 2022
anakin87 (Member) commented

Related to #2792 and #2868.

Simply put, the metadata for FAISSDocumentStore is saved in an SQL database.
To stay database-independent, lists are not accepted, because some databases do not support them.

Possible workarounds:

  • choose a document store that can handle lists (such as Elastic...)
  • represent your field as a string (I imagine this can cause issues with filtering)
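
Editor's note: the second workaround can be sketched in plain Python as below. The helper `stringify_list_meta` is hypothetical and not part of Haystack. It serializes list values to JSON strings before writing, and the example also shows why an equality filter on a single ID would then stop matching.

```python
import json


def stringify_list_meta(meta: dict) -> dict:
    """Return a copy of meta in which list values are serialized to JSON strings."""
    return {
        key: json.dumps(value) if isinstance(value, list) else value
        for key, value in meta.items()
    }


meta = {"item_id": ["B0017D5Z40", "B0017D5Z41", "B0017D5Z42"]}
stored = stringify_list_meta(meta)

# The stored value is one string, so comparing it with a single ID fails.
print(stored["item_id"] == "B0017D5Z41")  # False
# The original list can still be recovered after retrieval.
print(json.loads(stored["item_id"]) == meta["item_id"])  # True
```

The value survives the write this way, but any filter that compares the field with one ID sees the whole serialized string, which is the filtering issue mentioned in the list above.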

ZanSara (Contributor) commented Oct 20, 2022

Hey @anakin87, great to see you jumping in already 😁

I had the idea that this issue might be best addressed by serializing the field as JSON (basically a variant of the second option you list above), but I don't think that will work with filtering at all. Do you think we can make filtering work on JSON fields somehow? Does SQLAlchemy support a feature like this across SQL DBs? I have the impression this has been discussed before, but I can't find where right now.

Edit: it was buried in #2792
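
Editor's note: as a rough illustration of the filtering question, the sketch below uses only the standard-library `sqlite3` module, not Haystack or SQLAlchemy. The table layout and the helper `documents_with_item` are assumptions made for the example. It matches plain string values exactly and JSON-serialized lists by a quoted substring with `LIKE`, which works on any SQL database but is crude.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per (document, metadata field), value stored as text.
conn.execute("CREATE TABLE meta (document_id TEXT, name TEXT, value TEXT)")
rows = [
    ("doc0", "item_id", json.dumps(["B0017D5Z40", "B0017D5Z41", "B0017D5Z42"])),
    ("doc1", "item_id", "B0017D5Z40"),
    ("doc2", "item_id", "B0017D5Z41"),
    ("doc3", "item_id", "B0017D5Z42"),
]
conn.executemany("INSERT INTO meta VALUES (?, ?, ?)", rows)


def documents_with_item(conn, item_id):
    """Match plain values exactly and JSON-encoded lists by quoted substring."""
    cursor = conn.execute(
        "SELECT document_id FROM meta "
        "WHERE name = 'item_id' AND (value = ? OR value LIKE ?) "
        "ORDER BY document_id",
        (item_id, f"%{json.dumps(item_id)}%"),
    )
    return [row[0] for row in cursor]


print(documents_with_item(conn, "B0017D5Z41"))  # ['doc0', 'doc2']
```

The substring match has clear limits: it cannot use an index, values containing `%` or `_` would need escaping, and it knows nothing about JSON structure. Native JSON operators avoid these problems but differ between database backends.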

vijayirlapati (Author) commented

Thanks for your reply, @anakin87.
Yes, this might work with the Elasticsearch document store, but the FAISS document store performs better on my documents. So I am looking for a filtering mechanism that works with FAISS, especially when multiple values need to be tagged to a single document.

ZanSara (Contributor) commented Oct 28, 2022

@vijayirlapati as a workaround, have you considered other vector document stores, like Weaviate or Milvus?
