Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: add test to check id_hash_keys is not ignored #3577

Merged
merged 3 commits into from
Nov 17, 2022

Conversation

julian-risch
Copy link
Member

@julian-risch julian-risch commented Nov 15, 2022

Related Issues

Proposed Changes:

Add a test case to check for issue #3236
In particular, the test checks that two documents with the same content and different metadata get assigned:

  • the same id if id_hash_keys is None (default)
  • different ids if id_hash_keys is set to 'meta'

Tiny other change: I removed an unnecessary import from pathlib import Path that I came across while searching for a good place to put this test.

How did you test it?

The new test passes locally, which I checked for the current main branch and also for Haystack v1.9.0 and v.1.10 with python 3.10, 3.9 and 3.8 (see the table below)

Notes for the reviewer

I wasn't able to reproduce the error locally with Haystack version >1.8 and even if you should see a need for further investigation with other setups or older Haystack versions, we can already merge a test case for that issue.

Python/Haystack Version 1.8.0 1.9.0 1.10.0
3.10
3.9
3.8

One Weaviate test failed and I am investigating it. It's about test_write_with_duplicate_doc_ids and one document has name in the meta field while the other doesn't. The problem is on our main branch and has nothing to do with this PR.
Waiting for the fix implemented in #3578. Merged.

Checklist

@julian-risch julian-risch requested a review from a team as a code owner November 15, 2022 09:32
@julian-risch julian-risch requested review from masci and removed request for a team November 15, 2022 09:32
@julian-risch julian-risch marked this pull request as draft November 15, 2022 09:54
@julian-risch julian-risch marked this pull request as ready for review November 15, 2022 12:00
@julian-risch julian-risch mentioned this pull request Nov 15, 2022
1 task
Copy link
Contributor

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the detailed PR explanation 👍

@julian-risch julian-risch merged commit 8052632 into main Nov 17, 2022
@julian-risch julian-risch deleted the id_hash_keys_ignored branch November 17, 2022 08:25
ZanSara added a commit that referenced this pull request Nov 28, 2022
* Fix docstrings for DocumentStores

* Fix docstrings for AnswerGenerator

* Fix docstrings for Connector

* Fix docstrings for DocumentClassifier

* Fix docstrings for LabelGenerator

* Fix docstrings for QueryClassifier

* Fix docstrings for Ranker

* Fix docstrings for Retriever and Summarizer

* Fix docstrings for Translator

* Fix docstrings for Pipelines

* Fix docstrings for Primitives

* Fix Python code block spacing

* Add line break before code block

* Fix code blocks

* fix: discard metadata fields if not set in Weaviate (#3578)

* fix weaviate bug in returning embeddings and setting empty meta fields

* review comment

* Update unstable version and openapi schema (#3584)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* fix: Flatten `DocumentClassifier` output in `SQLDocumentStore`; remove `_sql_session_rollback` hack in tests (#3273)

* first draft

* fix

* fix

* move test to test_sql

* test: add test to check id_hash_keys is not ignored (#3577)

* refactor: Generate JSON schema when missing (#3533)

* removed unused script

* print info logs when generating openapi schema

* create json schema only when needed

* fix tests

* Remove leftover

Co-authored-by: ZanSara <sarazanzo94@gmail.com>

* move milvus tests to their own module (#3596)

* feat: store metadata using JSON in SQLDocumentStore (#3547)

* add warnings

* make the field cachable

* review comment

* Pin faiss-cpu as 1.7.3 seems to have problems (#3603)

* Update Haystack imports (#3599)

* Update Python version (#3602)

* fix: `ParsrConverter` fails on pages without text (#3605)

* try to fix bug

* remove print

* leftover

* refactor: update Squad data  (#3513)

* refractor the to_squad data class

* fix the validation label

* refractor the to_squad data class

* fix the validation label

* add the test for the to_label object function

* fix the tests for to_label_objects

* move all the test related to squad data to one file

* remove unused imports

* revert tiny_augmented.json

Co-authored-by: ZanSara <sarazanzo94@gmail.com>

* Url fixes (#3592)

* add 2 example scripts

* fixing faq script

* fixing some urls

* removing example scripts

* black reformatting

* add labeler to the repo (#3609)

* convert eval metrics to python float (#3612)

* feat: add support for `BM25Retriever` in `InMemoryDocumentStore` (#3561)

* very first draft

* implement query and query_batch

* add more bm25 parameters

* add rank_bm25 dependency

* fix mypy

* remove tokenizer callable parameter

* remove unused import

* only json serializable attributes

* try to fix: pylint too-many-public-methods / R0904

* bm25 attribute always present

* convert errors into warnings to make the tutorial 1 work

* add docstrings; tests

* try to make tests run

* better docstrings; revert not running tests

* some suggestions from review

* rename elasticsearch retriever as bm25 in tests; try to test memory_bm25

* exclude tests with filters

* change elasticsearch to bm25 retriever in test_summarizer

* add tests

* try to improve tests

* better type hint

* adapt test_table_text_retriever_embedding

* handle non-textual docs

* query only textual documents

* Incorporate Reviewer feedback

* refactor: replace `torch.no_grad` with `torch.inference_mode` (where possible) (#3601)

* try to replace torch.no_grad

* revert erroneous change

* revert other module breaking

* revert training/base

* Fix docstrings for DocumentStores

* Fix docstrings for AnswerGenerator

* Fix docstrings for Connector

* Fix docstrings for DocumentClassifier

* Fix docstrings for LabelGenerator

* Fix docstrings for QueryClassifier

* Fix docstrings for Ranker

* Fix docstrings for Retriever and Summarizer

* Fix docstrings for Translator

* Fix docstrings for Pipelines

* Fix docstrings for Primitives

* Fix Python code block spacing

* Add line break before code block

* Fix code blocks

* Incorporate Reviewer feedback

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Espoir Murhabazi <espoir.mur@gmail.com>
Co-authored-by: Tuana Celik <tuana.celik@deepset.ai>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

id_hash_keys is ignored
2 participants