Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX-#4564: Register custom serializer to avoid Ray race condition #4568

Closed

Conversation

devin-petersohn
Copy link
Collaborator

@devin-petersohn devin-petersohn commented Jun 10, 2022

Signed-off-by: Devin Petersohn devin.petersohn@gmail.com

What do these changes do?

  • commit message follows format outlined here
  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves BUG: ray 1.13 breaks CI, including test_binary and test_pickle, with circular import #4564
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date
  • added (Issue Number: PR title (PR Number)) and github username to release notes for next major release

@codecov
Copy link

codecov bot commented Jun 10, 2022

Codecov Report

Merging #4568 (13f964b) into master (d5ff94e) will increase coverage by 19.30%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           master    #4568       +/-   ##
===========================================
+ Coverage   70.08%   89.38%   +19.30%     
===========================================
  Files         228      229        +1     
  Lines       18438    18714      +276     
===========================================
+ Hits        12922    16728     +3806     
+ Misses       5516     1986     -3530     
Impacted Files Coverage Δ
modin/core/execution/ray/common/utils.py 96.87% <100.00%> (+6.25%) ⬆️
modin/experimental/batch/test/test_pipeline.py 100.00% <0.00%> (ø)
modin/pandas/series.py 94.00% <0.00%> (+0.23%) ⬆️
modin/pandas/dataframe.py 91.47% <0.00%> (+0.64%) ⬆️
modin/pandas/groupby.py 93.68% <0.00%> (+0.72%) ⬆️
modin/pandas/base.py 94.81% <0.00%> (+0.87%) ⬆️
...mentations/pandas_on_ray/partitioning/partition.py 93.57% <0.00%> (+1.83%) ⬆️
modin/config/envvars.py 86.63% <0.00%> (+3.46%) ⬆️
modin/core/execution/dask/common/engine_wrapper.py 100.00% <0.00%> (+5.55%) ⬆️
... and 71 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

RehanSD
RehanSD previously approved these changes Jun 10, 2022
Copy link
Collaborator

@RehanSD RehanSD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Pleased to release notes!

@lgtm-com
Copy link

lgtm-com bot commented Jun 10, 2022

This pull request introduces 1 alert when merging f2031ed00a1f5b660a185cd1d972b5253a0cb9ab into d5ff94e - view on LGTM.com

new alerts:

  • 1 for Module is imported more than once

Copy link
Collaborator

@YarShev YarShev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the PR title, link an issue to be resolved and add a release note with your GH nickname.

modin/core/execution/ray/common/utils.py Show resolved Hide resolved
…e condition (modin-project#4568)

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>
Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>
Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>
Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>
@devin-petersohn devin-petersohn changed the title FIX-#0000: Sleep for 1 second to avoid Ray's race condition FIX-#4564: Register customer serializer to avoid Ray race condition Jun 10, 2022
@mvashishtha
Copy link
Collaborator

@devin-petersohn You have "customer" instead of "custom" in the PR title and first commit description.

mvashishtha
mvashishtha previously approved these changes Jun 10, 2022
Copy link
Collaborator

@mvashishtha mvashishtha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks!

@devin-petersohn devin-petersohn changed the title FIX-#4564: Register customer serializer to avoid Ray race condition FIX-#4564: Register custom serializer to avoid Ray race condition Jun 10, 2022
Copy link
Collaborator

@RehanSD RehanSD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! We should conduct a perf check to see how pickle compares to Ray's serializer.

@devin-petersohn
Copy link
Collaborator Author

@devin-petersohn You have "customer" instead of "custom" in the PR title and first commit description.

🤣 fixed

@mvashishtha
Copy link
Collaborator

@devin-petersohn "customer" is still in the first commit description. It's fine by me to leave it that way.

Copy link
Collaborator

@RehanSD RehanSD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add your name to the contributors in the release notes!

Co-authored-by: Yaroslav Igoshev <Poolliver868@mail.ru>
Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>
Copy link
Collaborator

@RehanSD RehanSD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@devin-petersohn
Copy link
Collaborator Author

@modin-project/modin-ray is this the recommended way of fixing this? It seems like a bad idea but it is the only way we could use the latest ray release without a race condition

Copy link
Collaborator

@vnlitvinov vnlitvinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though I would like to find a better way... I wish it existed 😞

@vnlitvinov
Copy link
Collaborator

@devin-petersohn a good point here:

LGTM! We should conduct a perf check to see how pickle compares to Ray's serializer.

Could someone please check some benchmarks?

@NickCrews
Copy link
Contributor

Drive-by-ing here, IDK if you have a good way of running benchmarks, but perhaps the github-actions bot that we wrote for the dedupe library could be useful to you. Take a look at this comment to see how it is used. Makes it really easy for us to test performance changes on a PR.

@devin-petersohn
Copy link
Collaborator Author

Thanks @NickCrews, we do have asv running on every pulled-in commit: https://modin.org/modin-bench/#/

Having it run as a job for pull requests would be nice, with our current setup we can't do that. I will definitely take a look!


@vnlitvinov I have run some performance benchmark numbers and they don't look good at all. On small-moderate datasets we see 2x performance degradation across the board. Extra serialization + storage costs 200ms for every 50MB serialized, and the memory usage is higher across the board.

I think we need to pin Ray<1.13, as much as I hate doing that. I cannot justify the increased performance penalty.

@devin-petersohn devin-petersohn marked this pull request as draft June 13, 2022 14:34
@devin-petersohn
Copy link
Collaborator Author

I have converted this pull request to a draft, hoping we can get some response from @modin-project/modin-ray .

Linking to the ray issue I created here: ray-project/ray#25675

@devin-petersohn
Copy link
Collaborator Author

Closing because we chose to go a different direction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: ray 1.13 breaks CI, including test_binary and test_pickle, with circular import
7 participants