[RLlib] Enable cloud checkpointing. #47682
Commits on Sep 13, 2024
Replaced the local filesystem with a PyArrow filesystem to be able to store to any PyArrow filesystem, especially GCS, S3, ABS, or NFS.
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Commit dd040de
Replaced the local filesystem with a PyArrow filesystem to be able to restore from any PyArrow filesystem, especially GCS, S3, ABS, or NFS.
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Commit 1ce7acf
Commits on Sep 16, 2024
Commit 0c51f7e
Added the filesystem to all subcomponent calls and added conversion to string paths before using PyArrow's filesystem detector. Furthermore, added docstrings.
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Commit 6416f08
Added pyarrow FileSystem to 'from_checkpoint' and 'get_checkpoint_info'.
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Commit 2367b56
Added suggestions from @sven1977's review and fixed a small path error.
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Commit 147f1ea
Fixed a unit test in 'checkpoint_utils'.
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Commit 39b5619
Commits on Sep 17, 2024
Fixed a bug in the doctests of 'rllib-learner.rst'.
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Commit 8013d7b
[spark] Refine comment in Starting ray worker spark task (ray-project#47670)
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Commit 9fa4ed9
[Data] Add SERVICE_UNAVAILABLE to list of retried transient errors (ray-project#47673)
While reading or writing files with Ray Data, S3 might raise a transient SERVICE_UNAVAILABLE error. This PR adds the error to the list of retried transient errors.
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Commit 8a4fe7a
[Data] Fix bug where Ray Data incorrectly emits progress bar warning (ray-project#47680)
Fixes ray-project#47679
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Commit 05fd902
[serve] Additional metadata and context (ray-project#47652)
## Why are these changes needed?
Add some additional items to replica metadata and request context.
Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
Commit 3efd47f
[Core][aDAG] Set buffer size to 1 for regression (ray-project#47639)
There's a regression with buffer size 10. I will revert to buffer size 1 for now and investigate further. With buffer size 1, the regression seems to be gone: https://buildkite.com/ray-project/release/builds/22594#0191ed4b-5477-45ff-be9e-6e098b5fbb3c. The cause is probably some sort of contention.
Commit 2216f2d
[core][aDAG] Fix microbenchmark regression adag 2 (ray-project#47683)
After the multi-ref PR, we cannot simply await the returned value when it is a multi-ref output.
Commit 3929ce6
Commit b52a38f
Add perf metrics for 2.36.0 (ray-project#47574)
```
REGRESSION 12.66%: single_client_get_object_containing_10k_refs (THROUGHPUT) regresses from 13.204885454613315 to 11.533423619760748 in microbenchmark.json
REGRESSION 9.50%: client__1_1_actor_calls_sync (THROUGHPUT) regresses from 523.3469473257671 to 473.62862729568997 in microbenchmark.json
REGRESSION 6.76%: multi_client_put_gigabytes (THROUGHPUT) regresses from 45.440179854469804 to 42.368678421213005 in microbenchmark.json
REGRESSION 4.92%: 1_n_actor_calls_async (THROUGHPUT) regresses from 8803.178389859915 to 8370.014425096557 in microbenchmark.json
REGRESSION 3.89%: n_n_actor_calls_with_arg_async (THROUGHPUT) regresses from 2748.863962184806 to 2641.837605625889 in microbenchmark.json
REGRESSION 3.45%: client__1_1_actor_calls_async (THROUGHPUT) regresses from 1019.3028285821217 to 984.156036006501 in microbenchmark.json
REGRESSION 3.06%: client__1_1_actor_calls_concurrent (THROUGHPUT) regresses from 1007.6444648899972 to 976.8103650114274 in microbenchmark.json
REGRESSION 0.65%: placement_group_create/removal (THROUGHPUT) regresses from 805.1759941825478 to 799.9345402492929 in microbenchmark.json
REGRESSION 0.33%: single_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 5273.203424794718 to 5255.898134426729 in microbenchmark.json
REGRESSION 0.02%: 1_1_actor_calls_async (THROUGHPUT) regresses from 9012.880467992636 to 9011.034048587637 in microbenchmark.json
REGRESSION 0.01%: client__put_gigabytes (THROUGHPUT) regresses from 0.13947664668408546 to 0.13945791828216536 in microbenchmark.json
REGRESSION 0.00%: client__put_calls (THROUGHPUT) regresses from 806.1974515278531 to 806.172478450918 in microbenchmark.json
REGRESSION 70.55%: dashboard_p50_latency_ms (LATENCY) regresses from 104.211 to 177.731 in benchmarks/many_actors.json
REGRESSION 13.13%: time_to_broadcast_1073741824_bytes_to_50_nodes (LATENCY) regresses from 18.961532712000007 to 21.451945214000006 in scalability/object_store.json
REGRESSION 4.50%: 3000_returns_time (LATENCY) regresses from 5.680022101000006 to 5.935367576000004 in scalability/single_node.json
REGRESSION 3.96%: avg_iteration_time (LATENCY) regresses from 0.9740754842758179 to 1.012664566040039 in stress_tests/stress_test_dead_actors.json
REGRESSION 2.75%: stage_2_avg_iteration_time (LATENCY) regresses from 63.694758081436156 to 65.44879236221314 in stress_tests/stress_test_many_tasks.json
REGRESSION 1.66%: 10000_args_time (LATENCY) regresses from 17.328640389999997 to 17.61703060299999 in scalability/single_node.json
REGRESSION 1.40%: stage_4_spread (LATENCY) regresses from 0.45063567085147194 to 0.4569625792772166 in stress_tests/stress_test_many_tasks.json
REGRESSION 0.69%: dashboard_p50_latency_ms (LATENCY) regresses from 3.347 to 3.37 in benchmarks/many_pgs.json
REGRESSION 0.19%: 10000_get_time (LATENCY) regresses from 23.896780481999997 to 23.942006032999984 in scalability/single_node.json
```
Signed-off-by: kevin <kevin@anyscale.com>
Commit d5f1a01
[Core] Added spaces to disallowed char for working dir (ray-project#46767)
Signed-off-by: prithvi-mac <mprithvi08@gmail.com>
Commit 05c866c
Indented code in docs as CI tests were raising an error.
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Commit 75ddea8
Commits on Sep 25, 2024
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Commit 7d35bf7
Removed indentation in 'rllib-learner.rst'.
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Commit 306df5b