Spark offline store does not work for Python 3.7 #2608

Closed
felixwang9817 opened this issue Apr 25, 2022 · 1 comment

Comments

@felixwang9817
Collaborator

Expected Behavior

The Spark offline store should work for Python 3.7

Current Behavior

The Spark offline store does not pass integration tests on Python 3.7 (but does pass on 3.8).

Steps to reproduce

Running a specific integration test with e.g.

PYTHONPATH='.' FULL_REPO_CONFIGS_MODULE=sdk.python.feast.infra.offline_stores.contrib.contrib_repo_configuration IS_TEST=True pytest -s --integration sdk/python/tests/integration/offline_store/test_universal_historical_retrieval.py::test_historical_features_with_missing_request_data

yields _pickle.PicklingError: Could not serialize object: ValueError: Cell is empty.

The full stack trace looks like

Traceback (most recent call last):
  File "/Users/felixwang/feast/env/lib/python3.7/site-packages/pyspark/serializers.py", line 437, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
  File "/Users/felixwang/feast/env/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 102, in dumps
    cp.dump(obj)
  File "/Users/felixwang/feast/env/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 789, in save_tuple
    save(element)
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/Users/felixwang/feast/env/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 745, in save_function
    *self._dynamic_function_reduce(obj), obj=obj
  File "/Users/felixwang/feast/env/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 682, in _save_reduce_pickle5
    dictitems=dictitems, obj=obj
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 638, in save_reduce
    save(args)
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 789, in save_tuple
    save(element)
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 774, in save_tuple
    save(element)
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/Users/felixwang/feast/env/lib/python3.7/site-packages/dill/_dill.py", line 1226, in save_cell
    f = obj.cell_contents
ValueError: Cell is empty
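
For context on the final frame: reading cell_contents on a closure cell that has no value bound raises exactly this ValueError, and that read is what dill's save_cell does at the bottom of the trace. A minimal sketch of that behavior using only the standard library follows; make_empty_cell and read_cell are hypothetical helpers for illustration and are not part of Feast, dill, or cloudpickle.

```python
def make_empty_cell():
    """Return a closure cell whose contents have been deleted (hypothetical helper)."""
    value = 1

    def inner():
        return value

    del value  # unbinds the variable, leaving the shared closure cell empty
    return inner.__closure__[0]


def read_cell(cell):
    """Read cell_contents as the last frame of the trace does; return the error text if empty."""
    try:
        return cell.cell_contents
    except ValueError as exc:
        return f"ValueError: {exc}"


print(read_cell(make_empty_cell()))
```

This only shows where the message comes from. It does not by itself explain why an empty cell reaches dill's handler during Spark serialization.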

I believe the specific issue is a conflict between dill and cloudpickle: pyspark uses cloudpickle as its default serializer, whereas Feast uses dill in various places. See this for someone else who had a similar issue and this for more details on the conflicts between dill and cloudpickle.
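
One way to probe this hypothesis in an affected environment is to check which handler, if any, the pure-Python pickler has registered for closure cells before and after dill is imported. The sketch below assumes CPython, where pickle._Pickler is the pure-Python pickler and its dispatch dict maps types to save functions. cell_handler_module is a hypothetical helper, and whether dill registers itself there on import is an assumption this check is meant to test, not a confirmed fact.

```python
import pickle


def cell_handler_module():
    """Return the module name of the handler registered for closure cells, or None.

    Assumes CPython, where pickle._Pickler.dispatch maps types to save functions.
    """
    # Build a throwaway closure to obtain the cell type; this also works on
    # Python 3.7, which lacks types.CellType.
    cell_type = type((lambda x: lambda: x)(0).__closure__[0])
    handler = pickle._Pickler.dispatch.get(cell_type)
    return None if handler is None else getattr(handler, "__module__", None)


print("before importing dill:", cell_handler_module())
try:
    import dill  # noqa: F401  (imported only for its import-time side effects)
except ImportError:
    print("dill is not installed; nothing more to compare")
else:
    print("after importing dill:", cell_handler_module())
```

If the second line names a dill module, that would be consistent with the trace, where pickle.py's pure-Python save dispatches into dill/_dill.py from inside a cloudpickle dump.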

For some reason, switching to Python 3.8 immediately solved this problem for me. I'm not sure if this is because dill is especially brittle with Python 3.7; maybe #1971 is related?

Specifications

Possible Solution

@felixwang9817
Collaborator Author

Closing due to #2810
