Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI Refactoring] Refactor Document fixtures in tests #2577

Merged
merged 52 commits into from
Jun 10, 2022

Conversation

ZanSara
Copy link
Contributor

@ZanSara ZanSara commented May 19, 2022

Problem:

  • Document fixtures were quite stubbed in conftest.py.
  • conftest.py included the entire definition of two sentence embeddings in the code, for about 1400 lines of "code" added to the file for no reason.
  • Some constants containing documents should have been fixtures in the first place DOCS_WITH_EMBEDDINGS
  • Many functions were duplicating effort to convert documents from dict to actual Document class, while you could do the same in the original fixture.

Solution:

  • Move the embedding into dedicated files under test/samples/embeddings
  • Change DOCS_WITH_EMBEDDINGS into the fixture docs_with_true_emb and refactor relative tests
  • Change test_docs_xs into simply docs and ensure that all items of the returned list are Document objects
  • Add the docs_all_formats fixture for documents of legacy types (dictionaries mostly)
  • Add the docs_with_random_emb for tests that require documents with embeddings, but not exact embeddings
  • Remove many useless scope="function" fixture parameters (that's the default)

@ZanSara ZanSara added type:refactor Not necessarily visible to the users topic:tests ignore-for-release-notes PRs with this flag won't be included in the release notes. labels May 19, 2022
@ZanSara ZanSara mentioned this pull request May 19, 2022
@masci masci mentioned this pull request May 27, 2022
5 tasks
@ZanSara ZanSara marked this pull request as ready for review June 2, 2022 16:54
@ZanSara ZanSara requested a review from masci June 2, 2022 16:54
@ZanSara ZanSara requested a review from bogdankostic June 10, 2022 07:43
Copy link
Contributor

@bogdankostic bogdankostic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! There just seems to be an unresolved merge conflict in crawler.py.

haystack/nodes/connector/crawler.py Outdated Show resolved Hide resolved
@@ -942,7 +955,7 @@ def adaptive_model_qa(num_processes):
logging.error(f"Not all the subprocesses are closed! {len(children)} are still running.")


@pytest.fixture(scope="function")
@pytest.fixture
def bert_base_squad2(request):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unrelated to this PR, but not sure about the naming of this fixture, given that the model used is not bert-base but minilm... 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OMG 😄 Let's remember about this one!

@@ -86,9 +89,9 @@ def test_pseudo_label_generator_using_question_document_pairs(
@pytest.mark.parametrize("document_store", ["memory"], indirect=True)
@pytest.mark.parametrize("retriever", ["embedding_sbert"], indirect=True)
def test_pseudo_label_generator_using_question_document_pairs_batch(
document_store: BaseDocumentStore, retriever: EmbeddingRetriever, tmp_path: Path
document_store: BaseDocumentStore, retriever: EmbeddingRetriever,docs_with_true_emb: List[Document]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
document_store: BaseDocumentStore, retriever: EmbeddingRetriever,docs_with_true_emb: List[Document]
document_store: BaseDocumentStore, retriever: EmbeddingRetriever, docs_with_true_emb: List[Document]

@ZanSara ZanSara merged commit 54518ac into master Jun 10, 2022
@ZanSara ZanSara deleted the refactor_test_documents_fixtures branch June 10, 2022 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ignore-for-release-notes PRs with this flag won't be included in the release notes. topic:tests type:refactor Not necessarily visible to the users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants