bug: omit session handler from serialization to avoid mp issues (#2366)
### Description
The session handler can be an object of any type, because it is specific
to the SDK used by the connector. Depending on that type, it can break
serialization. To avoid this altogether, the session handler itself is
not serialized. Instead, it needs to be recreated if an object is
serialized and then deserialized.
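
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: `ExampleDoc`, its `session_handle` property, and the use of `threading.Lock` as a stand-in for an SDK client that cannot be serialized are all assumptions made for the example. Only the `_session_handle` key name comes from the commit.

```python
import json
import threading
from dataclasses import dataclass, field, fields
from typing import Any, Dict, Optional


@dataclass
class ExampleDoc:
    # Hypothetical stand-in for an ingest doc; not the library's actual class.
    path: str
    # SDK-specific handle; a lock stands in for a client that cannot be serialized.
    _session_handle: Optional[Any] = field(default=None, repr=False)

    @property
    def session_handle(self) -> Any:
        # Recreate the handle on first use, e.g. after deserialization.
        if self._session_handle is None:
            self._session_handle = threading.Lock()
        return self._session_handle

    def to_dict(self) -> Dict[str, Any]:
        # Copy every field except the session handle.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if f.name != "_session_handle"
        }


doc = ExampleDoc(path="a.txt")
doc.session_handle  # create the handle, as a connector might on first use
payload = json.dumps(doc.to_dict())  # succeeds because the handle is left out
restored = ExampleDoc(**json.loads(payload))  # handle is recreated lazily
```

The restored object starts without a handle and builds a fresh one the first time `session_handle` is accessed, so nothing SDK-specific has to cross a process boundary.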
rbiseck3 authored Jan 8, 2024
1 parent 0ca154a commit 7caf255
Showing 4 changed files with 6 additions and 10 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -1,4 +1,4 @@
-## 0.11.9-dev3
+## 0.11.9-dev4

### Enhancements

2 changes: 1 addition & 1 deletion unstructured/__version__.py
@@ -1 +1 @@
-__version__ = "0.11.9-dev3" # pragma: no cover
+__version__ = "0.11.9-dev4" # pragma: no cover
2 changes: 2 additions & 0 deletions unstructured/ingest/interfaces.py
@@ -299,6 +299,8 @@ def add_props(self, as_dict: dict, props: t.List[str]):

     def to_dict(self, **kwargs) -> t.Dict[str, Json]:
         as_dict = _asdict(self, **kwargs)
+        if "_session_handle" in as_dict:
+            as_dict.pop("_session_handle", None)
         self.add_props(as_dict=as_dict, props=self.properties_to_serialize)
         if getattr(self, "_source_metadata") is not None:
             self.add_props(as_dict=as_dict, props=self.metadata_properties)
10 changes: 2 additions & 8 deletions unstructured/ingest/pipeline/source.py
@@ -28,14 +28,8 @@ def get_single(self, doc: BaseSingleIngestDoc, ingest_doc_dict: dict) -> str:
             # Still need to fetch metadata if file exists locally
             doc.update_source_metadata()
         else:
-            # TODO: update all to use doc.to_json(redact_sensitive=True) once session handler
-            # can be serialized
-            try:
-                serialized_doc = doc.to_json(redact_sensitive=True)
-                logger.debug(f"Fetching {serialized_doc} - PID: {os.getpid()}")
-            except Exception as e:
-                logger.warning("failed to print full doc: ", e)
-                logger.debug(f"Fetching {doc.__class__.__name__} - PID: {os.getpid()}")
+            serialized_doc = doc.to_json(redact_sensitive=True)
+            logger.debug(f"Fetching {serialized_doc} - PID: {os.getpid()}")
             if self.retry_strategy:
                 self.retry_strategy(doc.get_file)
             else: