[BUG]xgb train exception in py 3.9.7 #2860

Closed
wuyeguo opened this issue Mar 24, 2022 · 1 comment · Fixed by #2861

wuyeguo commented Mar 24, 2022

Describe the bug
An exception is raised when training a model with the Mars XGBoost integration.
My code looks like this:

(ray) [ray@ml-test ~]$ cat test_mars_xgb.py
import ray
ray.init(address="ray://172.16.210.22:10001")

import mars
import mars.tensor as mt
import mars.dataframe as md
session = mars.new_ray_session(worker_num=2, worker_mem=2 * 1024 ** 3)

from sklearn.datasets import load_boston
boston = load_boston()

data = md.DataFrame(boston.data, columns=boston.feature_names)

print("data.head().execute()")
print(data.head().execute())

print("data.describe().execute()")
print(data.describe().execute())

from mars.learn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(data, boston.target, train_size=0.7, random_state=0)

print("after split X_train: %s" % X_train)

from mars.learn.contrib import xgboost as xgb

train_dmatrix = xgb.MarsDMatrix(data=X_train, label=y_train)
test_dmatrix = xgb.MarsDMatrix(data=X_test, label=y_test)

print("train_dmatrix: %s" % train_dmatrix)

#params = {'objective': 'reg:squarederror','colsample_bytree': 0.3,'learning_rate': 0.1, 'max_depth': 5, 'alpha': 10, 'n_estimators': 10}

#booster = xgb.train(dtrain=train_dmatrix, params=params)

#xg_reg = xgb.XGBRegressor(objective='reg:squarederror', colsample_bytree=0.3, learning_rate=0.1, max_depth=5, alpha=10, n_estimators=10)
xg_reg = xgb.XGBRegressor()

print("xg_reg.fit %s" % xg_reg)

model = xg_reg.fit(X_train, y_train, session=session)

#xgb.predict(booster, X_test)

print("results.predict")

test_r = model.predict(X_test)

print("output:test_r:%s" % type(test_r))
print(test_r)


To Reproduce
To help us reproduce this bug, please provide the information below:

  1. Your Python version: 3.9.7
  2. The version of Mars you use: 0.9.0rc1
  3. Versions of crucial packages, such as numpy, scipy and pandas
    1. Ray: 1.11.0
    2. Numpy: 1.22.3
    3. Pandas: 1.4.1
    4. Scipy: 1.8.0
  4. Full stack of the error.
(ray) [ray@ml-test ~]$ python test_mars_xgb.py
2022-03-24 10:59:42,970 INFO ray.py:432 -- Start cluster with config {'services': ['cluster', 'session', 'storage', 'meta', 'lifecycle', 'scheduling', 'subtask', 'task', 'mutable'], 'cluster': {'backend': 'ray', 'node_timeout': 120, 'node_check_interval': 1, 'ray': {'supervisor': {'standalone': False, 'sub_pool_num': 0}}}, 'session': {'custom_log_dir': None}, 'storage': {'default_config': {'transfer_block_size': '5 * 1024 ** 2'}, 'plasma': {'store_memory': '20%'}, 'backends': ['ray']}, 'meta': {'store': 'dict'}, 'task': {'default_config': {'optimize_tileable_graph': True, 'optimize_chunk_graph': True, 'fuse_enabled': True, 'initial_same_color_num': None, 'as_broadcaster_successor_num': None}}, 'scheduling': {'autoscale': {'enabled': False, 'min_workers': 1, 'max_workers': 100, 'scheduler_backlog_timeout': 20, 'worker_idle_timeout': 40}, 'speculation': {'enabled': False, 'dry': False, 'interval': 5, 'threshold': '75%', 'min_task_runtime': 3, 'multiplier': 1.5, 'max_concurrent_run': 3}, 'subtask_cancel_timeout': 5, 'subtask_max_retries': 3, 'subtask_max_reschedules': 2}, 'metrics': {'backend': 'ray', 'port': 0}}
2022-03-24 10:59:42,970 INFO api.py:53 -- Finished initialize the metrics with backend ray
2022-03-24 10:59:42,970 INFO driver.py:34 -- Setup cluster with {'ray://ray-cluster-1648090782/0': {'CPU': 2}, 'ray://ray-cluster-1648090782/1': {'CPU': 2}}
2022-03-24 10:59:42,970 INFO driver.py:40 -- Creating placement group ray-cluster-1648090782 with bundles [{'CPU': 2}, {'CPU': 2}].
2022-03-24 10:59:43,852 INFO driver.py:55 -- Create placement group success.
2022-03-24 10:59:45,128 INFO backend.py:82 -- Submit create actor pool ClientActorHandle(44dff4e8c2ea47cdd02bb84609000000) took 1.2752630710601807 seconds.
2022-03-24 10:59:46,268 INFO backend.py:82 -- Submit create actor pool ClientActorHandle(9ee3d50e43948f0f784697b809000000) took 1.116509199142456 seconds.
2022-03-24 10:59:48,475 INFO backend.py:82 -- Submit create actor pool ClientActorHandle(01f40453e2be6ed5ff7204d409000000) took 2.1755218505859375 seconds.
2022-03-24 10:59:48,501 INFO backend.py:89 -- Start actor pool ClientActorHandle(44dff4e8c2ea47cdd02bb84609000000) took 3.352660894393921 seconds.
2022-03-24 10:59:48,501 INFO backend.py:89 -- Start actor pool ClientActorHandle(9ee3d50e43948f0f784697b809000000) took 2.2049944400787354 seconds.
2022-03-24 10:59:48,501 INFO ray.py:526 -- Create supervisor on node ray://ray-cluster-1648090782/0/0 succeeds.
2022-03-24 10:59:50,148 INFO ray.py:536 -- Start services on supervisor ray://ray-cluster-1648090782/0/0 succeeds.
2022-03-24 10:59:50,494 INFO backend.py:89 -- Start actor pool ClientActorHandle(01f40453e2be6ed5ff7204d409000000) took 1.9973196983337402 seconds.
2022-03-24 10:59:50,494 INFO ray.py:541 -- Create 2 workers succeeds.
2022-03-24 10:59:50,722 INFO ray.py:545 -- Start services on 2 workers succeeds.
(RaySubPool pid=15700, ip=172.16.210.21) 2022-03-24 10:59:50,720        ERROR serialization.py:311 -- __init__() missing 1 required positional argument: 'pid'
(RaySubPool pid=15700, ip=172.16.210.21) Traceback (most recent call last):
(RaySubPool pid=15700, ip=172.16.210.21)   File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/ray/serialization.py", line 309, in deserialize_objects
(RaySubPool pid=15700, ip=172.16.210.21)     obj = self._deserialize_object(data, metadata, object_ref)
(RaySubPool pid=15700, ip=172.16.210.21)   File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/ray/communication.py", line 90, in _deserialize_object
(RaySubPool pid=15700, ip=172.16.210.21)     value = _ray_deserialize_object(self, data, metadata, object_ref)
(RaySubPool pid=15700, ip=172.16.210.21)   File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/ray/serialization.py", line 215, in _deserialize_object
(RaySubPool pid=15700, ip=172.16.210.21)     return self._deserialize_msgpack_data(data, metadata_fields)
(RaySubPool pid=15700, ip=172.16.210.21)   File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/ray/serialization.py", line 174, in _deserialize_msgpack_data
(RaySubPool pid=15700, ip=172.16.210.21)     python_objects = self._deserialize_pickle5_data(pickle5_data)
(RaySubPool pid=15700, ip=172.16.210.21)   File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/ray/serialization.py", line 164, in _deserialize_pickle5_data
(RaySubPool pid=15700, ip=172.16.210.21)     obj = pickle.loads(in_band)
(RaySubPool pid=15700, ip=172.16.210.21)   File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/lib/tblib/pickling_support.py", line 29, in unpickle_exception
(RaySubPool pid=15700, ip=172.16.210.21)     inst = func(*args)
(RaySubPool pid=15700, ip=172.16.210.21) TypeError: __init__() missing 1 required positional argument: 'pid'
2022-03-24 10:59:50,770 WARNING ray.py:556 -- Web service started at http://0.0.0.0:50749
(RaySubPool pid=3583) 2022-03-24 10:59:50,725   ERROR serialization.py:311 -- __init__() missing 1 required positional argument: 'pid'
(RaySubPool pid=3583) Traceback (most recent call last):
(RaySubPool pid=3583)   File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/ray/serialization.py", line 309, in deserialize_objects
(RaySubPool pid=3583)     obj = self._deserialize_object(data, metadata, object_ref)
(RaySubPool pid=3583)   File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/ray/communication.py", line 90, in _deserialize_object
(RaySubPool pid=3583)     value = _ray_deserialize_object(self, data, metadata, object_ref)
(RaySubPool pid=3583)   File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/ray/serialization.py", line 215, in _deserialize_object
(RaySubPool pid=3583)     return self._deserialize_msgpack_data(data, metadata_fields)
(RaySubPool pid=3583)   File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/ray/serialization.py", line 174, in _deserialize_msgpack_data
(RaySubPool pid=3583)     python_objects = self._deserialize_pickle5_data(pickle5_data)
(RaySubPool pid=3583)   File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/ray/serialization.py", line 164, in _deserialize_pickle5_data
(RaySubPool pid=3583)     obj = pickle.loads(in_band)
(RaySubPool pid=3583)   File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/lib/tblib/pickling_support.py", line 29, in unpickle_exception
(RaySubPool pid=3583)     inst = func(*args)
(RaySubPool pid=3583) TypeError: __init__() missing 1 required positional argument: 'pid'
/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function load_boston is deprecated; `load_boston` is deprecated in 1.0 and will be removed in 1.2.

    The Boston housing prices dataset has an ethical problem. You can refer to
    the documentation of this function for further details.

    The scikit-learn maintainers therefore strongly discourage the use of this
    dataset unless the purpose of the code is to study and educate about
    ethical issues in data science and machine learning.

    In this special case, you can fetch the dataset from the original
    source::

        import pandas as pd
        import numpy as np


        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
        data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
        target = raw_df.values[1::2, 2]

    Alternative datasets include the California housing dataset (i.e.
    :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
    dataset. You can load the datasets as follows::

        from sklearn.datasets import fetch_california_housing
        housing = fetch_california_housing()

    for the California housing dataset and::

        from sklearn.datasets import fetch_openml
        housing = fetch_openml(name="house_prices", as_frame=True)

    for the Ames housing dataset.

  warnings.warn(msg, category=FutureWarning)
data.head().execute()
2022-03-24 10:59:51,023 INFO session.py:979 -- Time consuming to generate a tileable graph is 0.0007078647613525391s with address ray://ray-cluster-1648090782/0/0, session id zLE6ibnXqYxfFNUiCEndgZaF
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  PTRATIO       B  LSTAT
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0     15.3  396.90   4.98
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0     17.8  396.90   9.14
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0     17.8  392.83   4.03
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0     18.7  394.63   2.94
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0     18.7  396.90   5.33
data.describe().execute()
2022-03-24 10:59:51,504 INFO session.py:979 -- Time consuming to generate a tileable graph is 0.0005688667297363281s with address ray://ray-cluster-1648090782/0/0, session id zLE6ibnXqYxfFNUiCEndgZaF
             CRIM          ZN       INDUS        CHAS         NOX          RM         AGE         DIS         RAD         TAX     PTRATIO           B       LSTAT
count  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000
mean     3.613524   11.363636   11.136779    0.069170    0.554695    6.284634   68.574901    3.795043    9.549407  408.237154   18.455534  356.674032   12.653063
std      8.601545   23.322453    6.860353    0.253994    0.115878    0.702617   28.148861    2.105710    8.707259  168.537116    2.164946   91.294864    7.141062
min      0.006320    0.000000    0.460000    0.000000    0.385000    3.561000    2.900000    1.129600    1.000000  187.000000   12.600000    0.320000    1.730000
25%      0.082045    0.000000    5.190000    0.000000    0.449000    5.885500   45.025000    2.100175    4.000000  279.000000   17.400000  375.377500    6.950000
50%      0.256510    0.000000    9.690000    0.000000    0.538000    6.208500   77.500000    3.207450    5.000000  330.000000   19.050000  391.440000   11.360000
75%      3.677083   12.500000   18.100000    0.000000    0.624000    6.623500   94.075000    5.188425   24.000000  666.000000   20.200000  396.225000   16.955000
max     88.976200  100.000000   27.740000    1.000000    0.871000    8.780000  100.000000   12.126500   24.000000  711.000000   22.000000  396.900000   37.970000
2022-03-24 10:59:51,992 INFO session.py:979 -- Time consuming to generate a tileable graph is 0.0019736289978027344s with address ray://ray-cluster-1648090782/0/0, session id zLE6ibnXqYxfFNUiCEndgZaF
after split X_train:          CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS   RAD    TAX  PTRATIO       B  LSTAT
191   0.06911  45.0   3.44   0.0  0.437  6.739  30.8  6.4798   5.0  398.0     15.2  389.71   4.69
380  88.97620   0.0  18.10   0.0  0.671  6.968  91.9  1.4165  24.0  666.0     20.2  396.90  17.21
337   0.03041   0.0   5.19   0.0  0.515  5.895  59.6  5.6150   5.0  224.0     20.2  394.81  10.56
266   0.78570  20.0   3.97   0.0  0.647  7.014  84.6  2.1329   5.0  264.0     13.0  384.07  14.79
221   0.40771   0.0   6.20   1.0  0.507  6.164  91.3  3.0480   8.0  307.0     17.4  395.24  21.46
..        ...   ...    ...   ...    ...    ...   ...     ...   ...    ...      ...     ...    ...
275   0.09604  40.0   6.41   0.0  0.447  6.854  42.8  4.2673   4.0  254.0     17.6  396.90   2.98
217   0.07013   0.0  13.89   0.0  0.550  6.642  85.1  3.4211   5.0  276.0     16.4  392.78   9.69
369   5.66998   0.0  18.10   1.0  0.631  6.683  96.8  1.3567  24.0  666.0     20.2  375.33   3.73
95    0.12204   0.0   2.89   0.0  0.445  6.625  57.8  3.4952   2.0  276.0     18.0  357.98   6.65
277   0.06127  40.0   6.41   1.0  0.447  6.826  27.6  4.8628   4.0  254.0     17.6  393.45   4.16

[354 rows x 13 columns]
train_dmatrix: DataFrame(op=ToDMatrix)
/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
xg_reg.fit XGBRegressor()
2022-03-24 10:59:53,085 INFO session.py:979 -- Time consuming to generate a tileable graph is 0.0010030269622802734s with address ray://ray-cluster-1648090782/0/0, session id zLE6ibnXqYxfFNUiCEndgZaF
(RaySubPool pid=15805, ip=172.16.210.21) Exception in thread Thread-42:
(RaySubPool pid=15805, ip=172.16.210.21) Traceback (most recent call last):
(RaySubPool pid=15805, ip=172.16.210.21)   File "/home/ray/anaconda3/envs/ray/lib/python3.9/threading.py", line 973, in _bootstrap_inner
(RaySubPool pid=15805, ip=172.16.210.21)     self.run()
(RaySubPool pid=15805, ip=172.16.210.21)   File "/home/ray/anaconda3/envs/ray/lib/python3.9/threading.py", line 910, in run
(RaySubPool pid=15805, ip=172.16.210.21)     self._target(*self._args, **self._kwargs)
(RaySubPool pid=15805, ip=172.16.210.21)   File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/learn/contrib/xgboost/tracker.py", line 355, in join
(RaySubPool pid=15805, ip=172.16.210.21)     while self.thread.isAlive():
(RaySubPool pid=15805, ip=172.16.210.21) AttributeError: 'Thread' object has no attribute 'isAlive'
(RaySubPool pid=3583) [10:59:53] task NULL got new rank 0
2022-03-24 10:59:54,331 ERROR session.py:1822 -- Task exception was never retrieved
future: <Task finished name='Task-110' coro=<_wrap_awaitable() done, defined at /home/ray/anaconda3/envs/ray/lib/python3.9/asyncio/tasks.py:684> exception=TypeError("ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''")>
Traceback (most recent call last):
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/asyncio/tasks.py", line 691, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/deploy/oscar/session.py", line 106, in wait
    return await self._aio_task
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/deploy/oscar/session.py", line 950, in _run_in_background
    fetch_tileables = await self._task_api.get_fetch_tileables(task_id)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/services/task/api/oscar.py", line 100, in get_fetch_tileables
    return await self._task_manager_ref.get_task_result_tileables(task_id)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/context.py", line 188, in send
    result = await self._wait(future, actor_ref.address, message)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/context.py", line 83, in _wait
    return await future
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/context.py", line 74, in _wait
    await asyncio.shield(future)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/core.py", line 50, in _listen
    message: _MessageBase = await client.recv()
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/communication/base.py", line 262, in recv
    return await self.channel.recv()
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/ray/communication.py", line 209, in recv
    result = await object_ref
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/ray/util/client/server/server.py", line 375, in send_get_response
    serialized = dumps_from_server(result, client_id, self)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/ray/util/client/server/server_pickler.py", line 114, in dumps_from_server
    sp.dump(obj)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 620, in dump
    return Pickler.dump(self, obj)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/ray/communication.py", line 55, in __reduce__
    return _argwrapper_unpickler, (serialize(self.message),)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/serialization/core.py", line 361, in serialize
    gen_to_serial = gen.send(last_serial)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/core/base.py", line 140, in serialize
    return (yield from super().serialize(obj, context))
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/serialization/serializables/core.py", line 108, in serialize
    tag_to_values = self._get_tag_to_values(obj)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/serialization/serializables/core.py", line 101, in _get_tag_to_values
    value = field.on_serialize(value)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/utils.py", line 157, in on_serialize_nsplits
    new_nsplits.append(tuple(None if np.isnan(v) else v for v in dim_splits))
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/utils.py", line 157, in <genexpr>
    new_nsplits.append(tuple(None if np.isnan(v) else v for v in dim_splits))
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Traceback (most recent call last):
  File "/home/ray/test_mars_xgb.py", line 42, in <module>
    model = xg_reg.fit(X_train, y_train, session=session)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/learn/contrib/xgboost/regressor.py", line 61, in fit
    result = train(
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/learn/contrib/xgboost/train.py", line 249, in train
    ret = t.execute(session=session, **run_kwargs).fetch(session=session)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/core/entity/executable.py", line 98, in execute
    return execute(self, session=session, **kw)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/deploy/oscar/session.py", line 1851, in execute
    return session.execute(
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/deploy/oscar/session.py", line 1647, in execute
    execution_info: ExecutionInfo = fut.result(
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/concurrent/futures/_base.py", line 445, in result
    return self.__get_result()
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/deploy/oscar/session.py", line 1831, in _execute
    await execution_info
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/asyncio/tasks.py", line 691, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/deploy/oscar/session.py", line 106, in wait
    return await self._aio_task
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/deploy/oscar/session.py", line 950, in _run_in_background
    fetch_tileables = await self._task_api.get_fetch_tileables(task_id)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/services/task/api/oscar.py", line 100, in get_fetch_tileables
    return await self._task_manager_ref.get_task_result_tileables(task_id)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/context.py", line 188, in send
    result = await self._wait(future, actor_ref.address, message)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/context.py", line 83, in _wait
    return await future
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/context.py", line 74, in _wait
    await asyncio.shield(future)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/core.py", line 50, in _listen
    message: _MessageBase = await client.recv()
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/communication/base.py", line 262, in recv
    return await self.channel.recv()
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/ray/communication.py", line 209, in recv
    result = await object_ref
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/ray/util/client/server/server.py", line 375, in send_get_response
    serialized = dumps_from_server(result, client_id, self)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/ray/util/client/server/server_pickler.py", line 114, in dumps_from_server
    sp.dump(obj)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 620, in dump
    return Pickler.dump(self, obj)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/ray/communication.py", line 55, in __reduce__
    return _argwrapper_unpickler, (serialize(self.message),)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/serialization/core.py", line 361, in serialize
    gen_to_serial = gen.send(last_serial)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/core/base.py", line 140, in serialize
    return (yield from super().serialize(obj, context))
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/serialization/serializables/core.py", line 108, in serialize
    tag_to_values = self._get_tag_to_values(obj)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/serialization/serializables/core.py", line 101, in _get_tag_to_values
    value = field.on_serialize(value)
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/utils.py", line 157, in on_serialize_nsplits
    new_nsplits.append(tuple(None if np.isnan(v) else v for v in dim_splits))
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/utils.py", line 157, in <genexpr>
    new_nsplits.append(tuple(None if np.isnan(v) else v for v in dim_splits))
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
(RaySubPool pid=3400) Main pool Actor(RayMainPool, 9ee3d50e43948f0f784697b809000000) has exited, exit current sub pool now.
(RaySubPool pid=3400) Traceback (most recent call last):
(RaySubPool pid=3400)   File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/oscar/backends/ray/pool.py", line 365, in check_main_pool_alive
(RaySubPool pid=3400)     main_pool_start_timestamp = await main_pool.alive.remote()
(RaySubPool pid=3400) ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
(RaySubPool pid=3400)   class_name: RayMainPool
(RaySubPool pid=3400)   actor_id: 9ee3d50e43948f0f784697b809000000
(RaySubPool pid=3400)   pid: 3514
(RaySubPool pid=3400)   name: ray://ray-cluster-1648090782/0/1
(RaySubPool pid=3400)   namespace: b7b70429-e17c-486f-9172-0872403ed6ef
(RaySubPool pid=3400)   ip: 172.16.210.22
(RaySubPool pid=3400) The actor is dead because because all references to the actor were removed.
A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff6f1ccaae6135c700f75befbe09000000 Worker ID: 707d6a3f910fa005ec33fe7ae60ddef5cfc1b9eb67510f1bc0f19623 Node ID: 7c54d788f2585a26ce8ef92e01f7e774359a4f0636b4bcfcb84272f7 Worker IP address: 172.16.210.21 Worker port: 10043 Worker PID: 15700
Exception ignored in: <function _TileableSession.__init__.<locals>.cb at 0x7efbd9a75160>
Traceback (most recent call last):
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/core/entity/executable.py", line 52, in cb
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/concurrent/futures/thread.py", line 156, in submit
AttributeError: __enter__
Exception ignored in: <function _TileableSession.__init__.<locals>.cb at 0x7efbd9a75dc0>
Traceback (most recent call last):
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/site-packages/mars/core/entity/executable.py", line 52, in cb
  File "/home/ray/anaconda3/envs/ray/lib/python3.9/concurrent/futures/thread.py", line 156, in submit
AttributeError: __enter__
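
The first error in the log above (`__init__() missing 1 required positional argument: 'pid'`, raised at `inst = func(*args)` in `unpickle_exception`) is the kind of failure Python produces when an exception class requires a constructor argument that is not stored in `args`. The log does not show which exception class was involved, so the sketch below only illustrates the mechanism. `ProcessGone`, `PicklableProcessGone`, and `rebuild` are hypothetical names, not part of Mars, Ray, or tblib.

```python
class ProcessGone(Exception):
    """Hypothetical exception; not taken from Mars, Ray, or the log."""

    def __init__(self, pid):
        # Nothing is forwarded to Exception.__init__, so self.args == ().
        super().__init__()
        self.pid = pid


class PicklableProcessGone(Exception):
    """Same idea, but the constructor argument is kept in self.args."""

    def __init__(self, pid):
        super().__init__(pid)
        self.pid = pid


def rebuild(exc):
    # Mirrors the `inst = func(*args)` step shown in the traceback:
    # __reduce__ yields the class and the stored args used to rebuild it.
    func, args = exc.__reduce__()[:2]
    return func(*args)


try:
    rebuild(ProcessGone(1234))
    failure = None
except TypeError as err:
    # The class is called with no arguments, so 'pid' is reported missing.
    failure = str(err)

# Rebuilding succeeds once the argument is part of args.
restored = rebuild(PicklableProcessGone(1234))
```

If this is what happened here, the usual remedy is to pass every required constructor argument on to `Exception.__init__` so that it survives the round trip.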



qinxuye (Collaborator) commented Mar 24, 2022

The error is raised by mars/learn/contrib/xgboost/tracker.py, which is copied from xgboost. We will update this file ASAP.
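
For context, the log shows `AttributeError: 'Thread' object has no attribute 'isAlive'` at `tracker.py` line 355. `Thread.isAlive()` was removed in Python 3.9, and `Thread.is_alive()` is the supported spelling. The snippet below is a minimal sketch of a polling join loop written with `is_alive()`. It is not the actual change made in #2861, and `wait_for` and `poll_interval` are hypothetical names chosen for illustration.

```python
import threading
import time


def wait_for(thread, poll_interval=0.1):
    """Block until `thread` finishes, waking up every `poll_interval` seconds."""
    # Thread.isAlive() no longer exists on Python 3.9+; use is_alive().
    while thread.is_alive():
        thread.join(poll_interval)


# Stand-in worker: sleeps briefly, then exits.
worker = threading.Thread(target=time.sleep, args=(0.2,), daemon=True)
worker.start()
wait_for(worker)
```

Joining with a timeout inside the loop, rather than a single blocking `join()`, keeps the waiting thread responsive to interrupts while the worker runs.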
