[Ray Data] `PythonObjectArray` missing methods causing serialization failures #48737
Signed-off-by: akavalar <akavalar@ucla.edu>
This is great! Think the failure is unrelated. Could you add some tests as well?
Hey, I'd like to write some tests for this PR!
@xingyu-long I started working on them but then stopped when I saw the Btw pandas has an entire section on how to test extension arrays. The original PR also added a small number of tests here. (forgot: let me know if you need help!)
@akavalar thanks for quick response! Yeah I will work on it, will post some updates tomorrow.
```python
import numpy as np
import pandas as pd

# Import path assumed from the Ray source tree; adjust if the module has moved.
from ray.air.util.object_extensions.pandas import PythonObjectArray


def test_pandas_isna():
    # NaN entries in the object array should be reported as missing.
    arr = np.array([1, np.nan, 3, 4, 5, np.nan, 7, 8, 9], dtype=object)
    ta = PythonObjectArray(arr)
    np.testing.assert_array_equal(ta.isna(), pd.isna(arr))


def test_pandas_take():
    arr = np.array([1, 2, 3, 4, 5], dtype=object)
    ta = PythonObjectArray(arr)
    # Plain positional take.
    indices = [1, 2, 3]
    np.testing.assert_array_equal(ta.take(indices).to_numpy(), arr[indices])
    # With allow_fill=True, -1 marks a missing slot filled with fill_value.
    indices = [1, 2, -1]
    np.testing.assert_array_equal(
        ta.take(indices, allow_fill=True, fill_value=100).to_numpy(),
        np.array([2, 3, 100]),
    )


def test_pandas_concat():
    arr1 = np.array([1, 2, 3, 4, 5], dtype=object)
    arr2 = np.array([6, 7, 8, 9, 10], dtype=object)
    ta1 = PythonObjectArray(arr1)
    ta2 = PythonObjectArray(arr2)
    concat_arr = PythonObjectArray._concat_same_type([ta1, ta2])
    assert len(concat_arr) == arr1.shape[0] + arr2.shape[0]
    np.testing.assert_array_equal(concat_arr.to_numpy(), np.concatenate([arr1, arr2]))
```

^ I came up with these tests. Maybe you can add them, or something similar, to your PR @akavalar. cc @richardliaw
@xingyu-long thanks a lot! will have a look later today
hey @akavalar - did you have a chance to take a look at the tests?
hey guys, I had to pivot to something else at work and haven't yet found the time for this - should be able to do it today though. thanks for your patience!
`PythonObjectArray` is a subclass of `ExtensionArray`, but as such it is missing some of the mandatory methods described here, specifically `isna`, `take`, `copy`, and `_concat_same_type` (the last method, `interpolate`, does not seem to be required). Such storage of Python objects in Arrow blocks was added to Ray Data in #45272 and was released as part of Ray 2.33.

Custom Python objects currently cannot be wrapped/serialized due to `pandas.errors.AbstractMethodError: This method must be defined in the concrete class PythonObjectArray` failures, as shown by the contrived example here: #48748

This PR adds the missing methods.
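For readers unfamiliar with the pandas extension interface, here is a rough sketch of what the four methods named above can look like on an object-backed `ExtensionArray`. It is not the code in this PR: `SketchObjectDtype`, `SketchObjectArray`, and `IncompleteArray` are hypothetical names made up for illustration, and only pandas and NumPy are used (no Ray). The last few lines show the kind of `AbstractMethodError` a subclass hits when one of these methods is left undefined.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray, ExtensionDtype, take


class SketchObjectDtype(ExtensionDtype):
    """Hypothetical dtype for illustration; not Ray's dtype."""

    name = "sketch_object"
    type = object
    na_value = None

    @classmethod
    def construct_array_type(cls):
        return SketchObjectArray


class SketchObjectArray(ExtensionArray):
    """Hypothetical object-backed array; not Ray's PythonObjectArray."""

    def __init__(self, values):
        # Assumes a flat sequence of scalars or arbitrary Python objects.
        self._values = np.asarray(values, dtype=object)

    @classmethod
    def _from_sequence(cls, scalars, *, dtype=None, copy=False):
        return cls(list(scalars))

    @property
    def dtype(self):
        return SketchObjectDtype()

    @property
    def nbytes(self):
        return self._values.nbytes

    def __len__(self):
        return len(self._values)

    def __getitem__(self, item):
        result = self._values[item]
        # An integer index returns the element; anything else a new array.
        if isinstance(item, (int, np.integer)):
            return result
        return type(self)(result)

    # The four methods the PR description lists as missing:

    def isna(self):
        # Boolean mask marking None/NaN entries.
        return pd.isna(self._values)

    def take(self, indices, allow_fill=False, fill_value=None):
        if allow_fill and fill_value is None:
            fill_value = self.dtype.na_value
        result = take(
            self._values, indices, allow_fill=allow_fill, fill_value=fill_value
        )
        return type(self)(result)

    def copy(self):
        # Copies the container, not the Python objects inside it.
        return type(self)(self._values.copy())

    @classmethod
    def _concat_same_type(cls, to_concat):
        return cls(np.concatenate([arr._values for arr in to_concat]))


arr = SketchObjectArray([1, np.nan, {"a": 1}, None])
mask = arr.isna()
taken = arr.take([0, 2, -1], allow_fill=True, fill_value=100)
joined = SketchObjectArray._concat_same_type([arr, arr.copy()])


class IncompleteArray(ExtensionArray):
    """Hypothetical subclass that defines none of the required methods."""


try:
    IncompleteArray().isna()
except pd.errors.AbstractMethodError as exc:
    error_message = str(exc)
```

The sketch delegates `take` to `pandas.api.extensions.take`, which handles the `allow_fill` semantics (where `-1` marks a slot to fill) so the array class does not have to reimplement them.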
Why are these changes needed?
See above.
Related issue number
N/A
Checks
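- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.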