[CI Refactoring] Refactor Document fixtures in tests #2577

Merged
merged 52 commits into from
Jun 10, 2022
Changes from 51 commits
Commits
0ec452f
Refactor document fixtures
ZanSara May 19, 2022
b322fdd
Add embedding files
ZanSara May 19, 2022
d692e51
Update Documentation & Code Style
github-actions[bot] May 19, 2022
2666375
Indentation issue
ZanSara May 19, 2022
dc0765f
Merge branch 'refactor_test_documents_fixtures' of github.com:deepset…
ZanSara May 19, 2022
bee1413
Update Documentation & Code Style
github-actions[bot] May 19, 2022
c87d617
Fix type conversion in conftest.py
ZanSara May 19, 2022
4b3b1ee
Merge branch 'refactor_test_documents_fixtures' of github.com:deepset…
ZanSara May 19, 2022
93dfdd4
Update Documentation & Code Style
github-actions[bot] May 19, 2022
f0625bc
mypy on sql.py
ZanSara May 24, 2022
3fca2cb
mypy on crawler.py
ZanSara May 24, 2022
0028999
mypy on pinecone.py
ZanSara May 24, 2022
5a05883
Adapt retriever tests
ZanSara May 24, 2022
3a2f452
Update Documentation & Code Style
github-actions[bot] May 24, 2022
c9a0e19
Merge branch 'master' into refactor_test_documents_fixtures
ZanSara May 24, 2022
6668b8b
mypy on crawler.py
ZanSara May 24, 2022
ba07328
Update Documentation & Code Style
github-actions[bot] May 24, 2022
2468216
mypy on crawler.py again
ZanSara May 24, 2022
77d21be
Merge branch 'refactor_test_documents_fixtures' of github.com:deepset…
ZanSara May 24, 2022
734c77c
Update Documentation & Code Style
github-actions[bot] May 24, 2022
36b9230
mypy fix was too rough
ZanSara May 24, 2022
9a1a86f
Fix some more tests
ZanSara May 24, 2022
43c70ac
Update Documentation & Code Style
github-actions[bot] May 24, 2022
6a02533
Skip meaningless test on FilterRetriever
ZanSara May 25, 2022
0132c6c
Make embedding values less specific
ZanSara May 25, 2022
e29152a
Merge branch 'refactor_test_documents_fixtures' of github.com:deepset…
ZanSara May 25, 2022
2872b32
Update Documentation & Code Style
github-actions[bot] May 25, 2022
181588f
Use stable IDs in retriever tests that depend on it
ZanSara May 25, 2022
cecd090
Remove needless fixtures
ZanSara May 25, 2022
de59cef
docs_with_ids
ZanSara May 25, 2022
e5f0717
Merge branch 'refactor_test_documents_fixtures' of github.com:deepset…
ZanSara May 25, 2022
877ceab
Update Documentation & Code Style
github-actions[bot] May 25, 2022
296e89e
Typo
ZanSara May 25, 2022
9883fb9
Merge branch 'refactor_test_documents_fixtures' of github.com:deepset…
ZanSara Jun 1, 2022
7237435
Merge branch 'master' into refactor_test_documents_fixtures
ZanSara Jun 1, 2022
37e1cd6
Fix retriever tests
ZanSara Jun 1, 2022
625e85e
Fix reader tests
ZanSara Jun 1, 2022
f52c509
Update Documentation & Code Style
github-actions[bot] Jun 1, 2022
0f1fc41
Workaround #2626
ZanSara Jun 2, 2022
58ac23d
Update Documentation & Code Style
github-actions[bot] Jun 2, 2022
c976738
Merge branch 'master' into refactor_test_documents_fixtures
ZanSara Jun 2, 2022
3c24425
Fix label generator tests
ZanSara Jun 2, 2022
2256838
Reorder vectors
ZanSara Jun 2, 2022
0177afe
remove print
ZanSara Jun 2, 2022
87cb056
Update Documentation & Code Style
github-actions[bot] Jun 2, 2022
1765a4a
Merge branch 'master' into refactor_test_documents_fixtures
ZanSara Jun 10, 2022
bd53b2c
Update Documentation & Code Style
github-actions[bot] Jun 10, 2022
2fd769b
Merge branch 'master' into refactor_test_documents_fixtures
ZanSara Jun 10, 2022
4f54ea9
Merge branch 'refactor_test_documents_fixtures' of github.com:deepset…
ZanSara Jun 10, 2022
a5a283b
git tags leftover
ZanSara Jun 10, 2022
c88703f
Update Documentation & Code Style
github-actions[bot] Jun 10, 2022
a91677f
fix last failing test
ZanSara Jun 10, 2022
2 changes: 1 addition & 1 deletion docs/_src/api/api/crawler.md
@@ -88,7 +88,7 @@ List of paths where the crawled webpages got stored
 #### Crawler.run

 ```python
-def run(output_dir: Union[str, Path, None] = None, urls: Optional[List[str]] = None, crawler_depth: Optional[int] = None, filter_urls: Optional[List] = None, overwrite_existing_files: Optional[bool] = None, return_documents: Optional[bool] = False, id_hash_keys: Optional[List[str]] = None, extract_hidden_text: Optional[bool] = True) -> Tuple[Dict, str]
+def run(output_dir: Union[str, Path, None] = None, urls: Optional[List[str]] = None, crawler_depth: Optional[int] = None, filter_urls: Optional[List] = None, overwrite_existing_files: Optional[bool] = None, return_documents: Optional[bool] = False, id_hash_keys: Optional[List[str]] = None, extract_hidden_text: Optional[bool] = True) -> Tuple[Dict[str, Union[List[Document], List[Path]]], str]
 ```

 Method to be executed when the Crawler is used as a Node within a Haystack pipeline.
2 changes: 1 addition & 1 deletion docs/_src/api/api/primitives.md
@@ -79,7 +79,7 @@ dict with content of the Document

 ```python
 @classmethod
-def from_dict(cls, dict, field_map={}, id_hash_keys=None)
+def from_dict(cls, dict: Dict[str, Any], field_map: Dict[str, Any] = {}, id_hash_keys: Optional[List[str]] = None) -> Document
 ```

 Create Document from dict. An optional field_map can be supplied to adjust for custom names of the keys in the
2 changes: 1 addition & 1 deletion haystack/document_stores/pinecone.py
@@ -256,7 +256,7 @@ def write_documents(
 if self.similarity == "cosine":
     self.normalize_embedding(embeddings_to_index)
 # Convert embeddings to list objects
-embeddings = [embed.tolist() for embed in embeddings]
+embeddings = [embed.tolist() if embed is not None else None for embed in embeddings]
 data_to_write_to_pinecone = zip(ids, embeddings, metadata)
 # Metadata fields and embeddings are stored in Pinecone
 self.pinecone_indexes[index].upsert(vectors=data_to_write_to_pinecone)
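The guard added in the changed line can be tried in isolation. The following is a minimal sketch, not the Pinecone document store's code: it uses the standard-library `array` type as a stand-in for a NumPy vector (both expose `tolist()`), and the helper name `embeddings_to_lists` is hypothetical.

```python
from array import array
from typing import List, Optional


def embeddings_to_lists(embeddings: List[Optional[array]]) -> List[Optional[List[float]]]:
    """Convert each embedding to a plain list, passing None through unchanged."""
    return [embed.tolist() if embed is not None else None for embed in embeddings]


# One document has an embedding, the other does not.
vectors = [array("d", [0.5, 0.25]), None]
converted = embeddings_to_lists(vectors)
```

Without the `is not None` check, the second element would raise `AttributeError`, since `None` has no `tolist` method.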
5 changes: 1 addition & 4 deletions haystack/document_stores/sql.py
@@ -376,10 +376,7 @@ def write_documents(
 if len(documents) == 0:
     return
 # Make sure we comply to Document class format
-if isinstance(documents[0], dict):
-    document_objects = [Document.from_dict(d) if isinstance(d, dict) else d for d in documents]
-else:
-    document_objects = documents
+document_objects = [Document.from_dict(d) if isinstance(d, dict) else d for d in documents]

 document_objects = self._handle_duplicate_documents(
     documents=document_objects, index=index, duplicate_documents=duplicate_documents
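One effect of checking every element can be shown with a small sketch. It assumes a simplified stand-in for `Document`: the `Doc` dataclass and the `to_document_objects` helper below are hypothetical, not Haystack code. The removed branch inspected only `documents[0]`, so a list that starts with an object but later contains a dict would have passed through unconverted.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class Doc:
    """Hypothetical stand-in for haystack.schema.Document."""

    content: str

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Doc":
        return cls(content=data["content"])


def to_document_objects(documents: List[Union[Doc, Dict[str, Any]]]) -> List[Doc]:
    # Check every element, not just the first one.
    return [Doc.from_dict(d) if isinstance(d, dict) else d for d in documents]


# A mixed list: the first item is already an object, the second is a dict.
mixed = [Doc(content="first"), {"content": "second"}]
normalized = to_document_objects(mixed)
```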
3 changes: 2 additions & 1 deletion haystack/nodes/connector/crawler.py
@@ -232,7 +232,7 @@ def run( # type: ignore
     return_documents: Optional[bool] = False,
     id_hash_keys: Optional[List[str]] = None,
     extract_hidden_text: Optional[bool] = True,
-) -> Tuple[Dict, str]:
+) -> Tuple[Dict[str, Union[List[Document], List[Path]]], str]:
     """
     Method to be executed when the Crawler is used as a Node within a Haystack pipeline.

@@ -263,6 +263,7 @@ def run( # type: ignore
     overwrite_existing_files=overwrite_existing_files,
     extract_hidden_text=extract_hidden_text,
 )
+results: Dict[str, Union[List[Document], List[Path]]] = {}
 if return_documents:
     crawled_data = []
     for _file in file_paths:
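The annotation added to `results` can be tried out on its own. This is a sketch under assumptions: `Document` here is a hypothetical placeholder class, `build_results` is an illustrative helper rather than the Crawler's actual method body, and the `"documents"` and `"paths"` keys are assumed for the example. It shows a dictionary whose values are either a list of documents or a list of paths, which is what the new return type describes.

```python
from pathlib import Path
from typing import Dict, List, Union


class Document:
    """Hypothetical placeholder for haystack.schema.Document."""

    def __init__(self, content: str) -> None:
        self.content = content


def build_results(
    file_paths: List[Path], return_documents: bool
) -> Dict[str, Union[List[Document], List[Path]]]:
    # Declaring the value type up front covers both branches below.
    results: Dict[str, Union[List[Document], List[Path]]] = {}
    if return_documents:
        # Illustrative only: the path string stands in for real file content.
        results["documents"] = [Document(content=str(p)) for p in file_paths]
    else:
        results["paths"] = file_paths
    return results


paths = [Path("crawled/page_1.json")]
as_paths = build_results(paths, return_documents=False)
as_docs = build_results(paths, return_documents=True)
```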
4 changes: 3 additions & 1 deletion haystack/schema.py
@@ -161,7 +161,9 @@ def to_dict(self, field_map={}) -> Dict:
     return _doc

 @classmethod
-def from_dict(cls, dict, field_map={}, id_hash_keys=None):
+def from_dict(
+    cls, dict: Dict[str, Any], field_map: Dict[str, Any] = {}, id_hash_keys: Optional[List[str]] = None
+) -> Document:
     """
     Create Document from dict. An optional field_map can be supplied to adjust for custom names of the keys in the
     input dict. This way you can work with standardized Document objects in Haystack, but adjust the format that
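To illustrate what a `field_map` is for, here is a minimal sketch with a hypothetical `SimpleDoc` class. It is not Haystack's implementation: the renaming logic is an assumption based only on the docstring's statement that `field_map` adjusts for custom key names in the input dict.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SimpleDoc:
    """Hypothetical, simplified document type used only for this example."""

    content: str

    @classmethod
    def from_dict(cls, data: Dict[str, Any], field_map: Optional[Dict[str, Any]] = None) -> "SimpleDoc":
        # Assumed semantics: field_map maps a custom input key to the standard field name.
        field_map = field_map or {}
        renamed = {field_map.get(key, key): value for key, value in data.items()}
        return cls(content=renamed["content"])


# The input dict uses "text" where the document type expects "content".
doc = SimpleDoc.from_dict({"text": "Hello"}, field_map={"text": "content"})
```

The sketch defaults `field_map` to `None` rather than `{}`, a common way to avoid sharing one mutable default between calls; the signature in the diff keeps `{}`.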