module 'pydantic.fields' has no attribute 'ModelField' in awswrangler[modin,ray]==3.2.1 #2379

Closed
kangks opened this issue Jul 5, 2023 · 5 comments · Fixed by #2403

@kangks

kangks commented Jul 5, 2023

Describe the bug

After calling wr.engine.initialize() with "awswrangler[modin,ray]==3.2.1" installed, creating a Modin DataFrame fails with the following error:

Traceback (most recent call last):
  File "//wr_init.py", line 8, in <module>
    df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
  File "/usr/local/lib/python3.10/site-packages/modin/logging/logger_decorator.py", line 128, in run_and_log
    return obj(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/modin/pandas/dataframe.py", line 179, in __init__
    distributed_frame = from_non_pandas(data, index, columns, dtype)
  File "/usr/local/lib/python3.10/site-packages/modin/pandas/utils.py", line 71, in from_non_pandas
    new_qc = FactoryDispatcher.from_non_pandas(df, index, columns, dtype)
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/dispatching/factories/dispatcher.py", line 173, in from_non_pandas
    return cls.get_factory()._from_non_pandas(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/dispatching/factories/dispatcher.py", line 116, in get_factory
    Engine.subscribe(cls._update_factory)
  File "/usr/local/lib/python3.10/site-packages/modin/config/pubsub.py", line 217, in subscribe
    callback(cls)
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/dispatching/factories/dispatcher.py", line 158, in _update_factory
    cls.__factory.prepare()
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/dispatching/factories/factories.py", line 447, in prepare
    from modin.core.execution.ray.implementations.pandas_on_ray.io import (
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/io/__init__.py", line 16, in <module>
    from .io import PandasOnRayIO
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/io/io.py", line 43, in <module>
    from ..dataframe import PandasOnRayDataframe
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/dataframe/__init__.py", line 16, in <module>
    from .dataframe import PandasOnRayDataframe
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/dataframe/dataframe.py", line 16, in <module>
    from ..partitioning.partition_manager import PandasOnRayDataframePartitionManager
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/__init__.py", line 16, in <module>
    from .partition import PandasOnRayDataframePartition
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/partition.py", line 28, in <module>
    class PandasOnRayDataframePartition(PandasDataframePartition):
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/partition.py", line 178, in PandasOnRayDataframePartition
    _iloc = execution_wrapper.put(PandasDataframePartition._iloc)
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/ray/common/engine_wrapper.py", line 111, in put
    return ray.put(data, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 18, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/worker.py", line 2612, in put
    object_ref = worker.put_object(value, owner_address=serialize_owner_address)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/worker.py", line 693, in put_object
    serialized_value = self.get_serialization_context().serialize(value)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/worker.py", line 618, in get_serialization_context
    context_map[job_id] = serialization.SerializationContext(self)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/serialization.py", line 151, in __init__
    serialization_addons.apply(self)
  File "/usr/local/lib/python3.10/site-packages/ray/util/serialization_addons.py", line 58, in apply
    register_pydantic_serializer(serialization_context)
  File "/usr/local/lib/python3.10/site-packages/ray/util/serialization_addons.py", line 21, in register_pydantic_serializer
    pydantic.fields.ModelField,
AttributeError: module 'pydantic.fields' has no attribute 'ModelField'

The resolved requirements.txt for awswrangler[modin,ray]==3.2.1 is as follows:

aiohttp==3.8.4
aiohttp-cors==0.7.0
aiosignal==1.3.1
annotated-types==0.5.0
async-timeout==4.0.2
attrs==23.1.0
awswrangler==3.2.1
blessed==1.20.0
boto3==1.27.0
botocore==1.30.0
cachetools==5.3.1
certifi==2023.5.7
charset-normalizer==3.1.0
click==8.1.3
colorful==0.5.5
distlib==0.3.6
filelock==3.12.2
frozenlist==1.3.3
fsspec==2023.6.0
google-api-core==2.11.1
google-auth==2.21.0
googleapis-common-protos==1.59.1
gpustat==1.1
grpcio==1.51.3
idna==3.4
jmespath==1.0.1
jsonschema==4.17.3
modin==0.20.1
msgpack==1.0.5
multidict==6.0.4
numpy==1.25.0
nvidia-ml-py==11.525.131
opencensus==0.11.2
opencensus-context==0.1.3
packaging==23.1
pandas==1.5.3
platformdirs==3.8.0
prometheus-client==0.17.0
protobuf==4.23.3
psutil==5.9.5
py-spy==0.3.14
pyarrow==12.0.1
pyasn1==0.5.0
pyasn1-modules==0.3.0
pydantic==2.0.1
pydantic_core==2.0.2
pyrsistent==0.19.3
python-dateutil==2.8.2
pytz==2023.3
PyYAML==6.0
ray==2.5.0
requests==2.31.0
rsa==4.9
s3transfer==0.6.1
six==1.16.0
smart-open==6.3.0
typing_extensions==4.7.1
urllib3==1.26.16
virtualenv==20.21.0
wcwidth==0.2.6
yarl==1.9.2
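
In the list above, the pair that matters is pydantic==2.0.1 installed next to ray==2.5.0: the traceback ends in Ray's serialization_addons.py looking up pydantic.fields.ModelField. A minimal sketch for flagging that pair in a freeze-style listing follows. The function name is hypothetical, and the rule it applies (any ray pin together with a pydantic major version of 2 or higher) is an assumption taken from this report, not an official compatibility matrix.

```python
def find_pydantic_ray_conflict(freeze_text):
    """Return (ray_version, pydantic_version) if the pins look conflicting, else None.

    Hypothetical helper: it only understands simple 'name==version' lines.
    """
    pins = {}
    for line in freeze_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()

    ray_version = pins.get("ray")
    pydantic_version = pins.get("pydantic")
    if ray_version is None or pydantic_version is None:
        return None

    # Assumption from this report: ray pinned together with pydantic 2.x breaks.
    pydantic_major = int(pydantic_version.split(".")[0])
    if pydantic_major >= 2:
        return (ray_version, pydantic_version)
    return None


# A few lines copied from the listing above.
sample = """\
modin==0.20.1
pydantic==2.0.1
pydantic_core==2.0.2
ray==2.5.0
"""
print(find_pydantic_ray_conflict(sample))
```

The same function returns None for a listing that pins pydantic to a 1.x release, or that lacks either package.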

How to Reproduce

Good baseline of modin+ray without pydantic errors

  1. Create a simple ray_init.py
import ray
ray.init()

import modin.pandas as pd
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
  2. Create the first Dockerfile with a Python 3.10 base image
FROM python:3.10-slim as build

ARG FUNCTION_DIR

RUN apt-get update
RUN apt-get install -y --no-install-recommends \
	build-essential gcc 

RUN pip3 install modin==0.20.1 ray==2.5.0
COPY ray_init.py .
  3. Build and run the container
% docker build --no-cache -t wr_test -f ./Dockerfile .
% docker run --rm --shm-size=4G wr_test python3.10 ray_init.py
2023-07-05 01:03:40,427 INFO worker.py:1636 -- Started a local Ray instance.
UserWarning: When using a pre-initialized Ray cluster, please ensure that the runtime env sets environment variable __MODIN_AUTOIMPORT_PANDAS__ to 1
UserWarning: Distributing <class 'dict'> object. This may take some time.

Error with awswrangler

  1. Create a new wr_init.py
import awswrangler as wr
import modin.pandas as pd

wr.engine.initialize()
print(f"Execution Engine: {wr.engine.get()}")
print(f"Memory Format: {wr.memory_format.get()}")

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
  2. Update the Dockerfile to install "awswrangler[modin,ray]==3.2.1"
FROM python:3.10-slim as build

ARG FUNCTION_DIR

RUN apt-get update
RUN apt-get install -y --no-install-recommends \
	build-essential gcc 

RUN pip3 install modin==0.20.1 ray==2.5.0
COPY wr_init.py .

FROM build
RUN pip3 install "awswrangler[modin,ray]==3.2.1"
  3. Build and run the container
% docker build --no-cache -t wr_test -f ./Dockerfile .
% docker run --rm --shm-size=4G wr_test python3.10 wr_init.py
2023-07-05 01:16:37,943 ERROR services.py:1207 -- Failed to start the dashboard , return code 1
2023-07-05 01:16:37,943 ERROR services.py:1232 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2023-07-05 01:16:37,944 ERROR services.py:1276 -- 
The last 20 lines of /tmp/ray/session_2023-07-05_01-16-35_556700_771/logs/dashboard.log (it contains the error message from the dashboard): 
  File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/site-packages/ray/dashboard/modules/log/log_manager.py", line 8, in <module>
    from ray.util.state.common import (
  File "/usr/local/lib/python3.10/site-packages/ray/util/state/__init__.py", line 1, in <module>
    from ray.util.state.api import (
  File "/usr/local/lib/python3.10/site-packages/ray/util/state/api.py", line 17, in <module>
    from ray.util.state.common import (
  File "/usr/local/lib/python3.10/site-packages/ray/util/state/common.py", line 120, in <module>
    @dataclass(init=True)
  File "/usr/local/lib/python3.10/site-packages/pydantic/dataclasses.py", line 139, in dataclass
    assert init is False, 'pydantic.dataclasses.dataclass only supports init=False'
AssertionError: pydantic.dataclasses.dataclass only supports init=False
2023-07-05 01:16:39,212 INFO worker.py:1636 -- Started a local Ray instance.
Execution Engine: EngineEnum.RAY
Memory Format: MemoryFormatEnum.MODIN
UserWarning: When using a pre-initialized Ray cluster, please ensure that the runtime env sets environment variable __MODIN_AUTOIMPORT_PANDAS__ to 1
Traceback (most recent call last):
  File "/app/wr_init.py", line 8, in <module>
    df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
  File "/usr/local/lib/python3.10/site-packages/modin/logging/logger_decorator.py", line 128, in run_and_log
    return obj(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/modin/pandas/dataframe.py", line 179, in __init__
    distributed_frame = from_non_pandas(data, index, columns, dtype)
  File "/usr/local/lib/python3.10/site-packages/modin/pandas/utils.py", line 71, in from_non_pandas
    new_qc = FactoryDispatcher.from_non_pandas(df, index, columns, dtype)
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/dispatching/factories/dispatcher.py", line 173, in from_non_pandas
    return cls.get_factory()._from_non_pandas(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/dispatching/factories/dispatcher.py", line 116, in get_factory
    Engine.subscribe(cls._update_factory)
  File "/usr/local/lib/python3.10/site-packages/modin/config/pubsub.py", line 217, in subscribe
    callback(cls)
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/dispatching/factories/dispatcher.py", line 158, in _update_factory
    cls.__factory.prepare()
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/dispatching/factories/factories.py", line 447, in prepare
    from modin.core.execution.ray.implementations.pandas_on_ray.io import (
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/io/__init__.py", line 16, in <module>
    from .io import PandasOnRayIO
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/io/io.py", line 43, in <module>
    from ..dataframe import PandasOnRayDataframe
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/dataframe/__init__.py", line 16, in <module>
    from .dataframe import PandasOnRayDataframe
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/dataframe/dataframe.py", line 16, in <module>
    from ..partitioning.partition_manager import PandasOnRayDataframePartitionManager
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/__init__.py", line 16, in <module>
    from .partition import PandasOnRayDataframePartition
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/partition.py", line 28, in <module>
    class PandasOnRayDataframePartition(PandasDataframePartition):
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/partition.py", line 178, in PandasOnRayDataframePartition
    _iloc = execution_wrapper.put(PandasDataframePartition._iloc)
  File "/usr/local/lib/python3.10/site-packages/modin/core/execution/ray/common/engine_wrapper.py", line 111, in put
    return ray.put(data, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 18, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/worker.py", line 2612, in put
    object_ref = worker.put_object(value, owner_address=serialize_owner_address)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/worker.py", line 693, in put_object
    serialized_value = self.get_serialization_context().serialize(value)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/worker.py", line 618, in get_serialization_context
    context_map[job_id] = serialization.SerializationContext(self)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/serialization.py", line 151, in __init__
    serialization_addons.apply(self)
  File "/usr/local/lib/python3.10/site-packages/ray/util/serialization_addons.py", line 58, in apply
    register_pydantic_serializer(serialization_context)
  File "/usr/local/lib/python3.10/site-packages/ray/util/serialization_addons.py", line 21, in register_pydantic_serializer
    pydantic.fields.ModelField,
AttributeError: module 'pydantic.fields' has no attribute 'ModelField'

Expected behavior

No error; the output should match the baseline run from before awswrangler was installed.

OS

python3.10 base image on aarch64

Python version

3.10

AWS SDK for pandas version

N/A

kangks added the bug (Something isn't working) label on Jul 5, 2023
@jaidisido
Contributor

Thanks for raising this, @kangks. A major issue about this was opened in Ray a couple of days ago:
ray-project/ray#37019

The current workaround is to pin Pydantic below 2.0, since Pydantic 2 currently conflicts with Ray:

pip install "pydantic<2"

After downgrading pydantic, I was able to run your code snippet without issues.
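
To check which side of the break an environment is on, before and after the downgrade, one option is to probe for the attribute named in the traceback. This is a minimal sketch, not part of awswrangler or Ray: the helper name is hypothetical, and it only tests whether pydantic.fields exposes ModelField.

```python
import importlib


def pydantic_has_model_field(module_name="pydantic.fields"):
    """Report whether the given module exposes a ModelField attribute.

    Hypothetical helper. Returns True or False when the module imports,
    and None when it cannot be imported (for example, pydantic is absent).
    """
    try:
        fields = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(fields, "ModelField")


print(pydantic_has_model_field())
```

It returns True when the attribute is present, False when pydantic.fields imports but lacks it (the case in the traceback above), and None when the module cannot be imported at all.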

jaidisido pinned this issue on Jul 6, 2023
@CalinLucian

I can confirm the same: using ray[tune]==2.5.1 with pydantic > 2 yielded the same error.

@PhilippWillms

> Thanks for raising this @kangks, it seems that there is a major issue opened a couple of days ago in Ray about this: ray-project/ray#37019
>
> The current workaround is to force a lower version of Pydantic as it's currently conflicting with Ray:
>
> pip install "pydantic<2"
>
> After downgrading pydantic I was able to run your code snippet without issues

I did the same, constraining pydantic in my Poetry configuration accordingly.

@skabbit

skabbit commented Jul 20, 2023

I can confirm that ray>=2.3 isn't compatible with pydantic>=2.

jaidisido linked a pull request on Jul 31, 2023 that will close this issue
jaidisido added this to the 3.3.0 milestone on Jul 31, 2023
jaidisido unpinned this issue on Jul 31, 2023
@ckgresla

ckgresla commented Nov 28, 2023

> confirm that ray>=2.3 isn't compatible with pydantic>=2

Downgrading to an older pydantic solved my issue, thank you!

Here is the one-line snippet in case anyone else ends up here:
pip install pydantic==1.10.13
At the time of writing, this should be the latest release of the library prior to 2.0.0, as per the PyPI releases page.
