[Data] Reimplement of fix memory pandas #48970
```python
import numpy as np

# Draw distinct row indices (sampling without replacement)
sampled_indices = np.random.choice(
    total_size, sample_size, replace=False
)
sampled_data = sampled_column[sampled_indices]
```
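The sampling step in the snippet above can be turned into a size estimate. This is a minimal sketch, not the PR's implementation. It assumes the column is a NumPy object array, uses `sys.getsizeof` (a shallow per-object size) as the measure, and swaps in `np.random.default_rng` for the legacy `np.random.choice`. The names `estimate_object_column_size`, `max_sample_size`, and `seed` are hypothetical. It measures a bounded number of distinct elements and scales their total to the full column length.

```python
import sys

import numpy as np


def estimate_object_column_size(
    column: np.ndarray, max_sample_size: int = 100, seed: int = 0
) -> int:
    """Hypothetical helper: estimate the bytes referenced by an object column."""
    if max_sample_size <= 0:
        raise ValueError("max_sample_size must be positive")
    total_size = len(column)
    if total_size == 0:
        return 0
    # Never ask for more distinct items than the column holds.
    sample_size = min(total_size, max_sample_size)
    rng = np.random.default_rng(seed)
    # replace=False gives distinct indices, so no element is measured twice.
    sampled_indices = rng.choice(total_size, sample_size, replace=False)
    sampled_data = column[sampled_indices]
    # Shallow size of each sampled Python object (e.g. a str).
    sampled_bytes = sum(sys.getsizeof(item) for item in sampled_data)
    # Scale the sample total up to the full column using integer arithmetic.
    return sampled_bytes * total_size // sample_size


column = np.array([f"row-{i}" for i in range(10_000)], dtype=object)
print(estimate_object_column_size(column))
```

Because at most `max_sample_size` elements are touched, the cost stays bounded for long columns. The trade-off is accuracy: a column with a few unusually large values can be under- or over-estimated, and `sys.getsizeof` does not follow nested containers.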
Let's see if the tests pass. I remember that last time I had to implement `take` in `PythonObjectArray`, but maybe you can get around it more easily.
It looks like the relevant tests are failing.
It passes this time, but two tests were retried. These two tests pass locally, so I am not sure the failures are related to the memory problem: when the full suite runs they hit OOM, but the retry succeeds. Let me try to make them stable first. I have fixed many tests by checking whether the data is a numeric tensor or a NumPy array. I think in those cases it's safe to use `nbytes`, while object (string) data needs further checks.
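The distinction above between numeric data and object (string) data follows from how NumPy stores them. In this minimal sketch, `can_use_nbytes` is a hypothetical helper, not a Ray API. For numeric, boolean, and datetime dtypes the values live inside the array buffer, so `nbytes` is their full footprint. For object dtype, `nbytes` counts only one pointer per element and ignores the objects those pointers reference.

```python
import sys

import numpy as np


def can_use_nbytes(arr: np.ndarray) -> bool:
    # Hypothetical helper. dtype kinds: b=bool, i=int, u=uint, f=float,
    # c=complex, m=timedelta, M=datetime. Values of these kinds are stored
    # inline in the array buffer, so arr.nbytes covers them completely.
    # Conservatively limited to these kinds; object dtype ("O") is excluded.
    return arr.dtype.kind in "biufcmM"


numeric = np.arange(1000, dtype=np.float64)
# Distinct string objects, so summing their sizes does not double-count.
strings = np.array([f"row-{i}-" + "x" * 1000 for i in range(1000)], dtype=object)

print(can_use_nbytes(numeric), numeric.nbytes)  # True 8000
# For object dtype, nbytes counts only the per-element pointers...
print(can_use_nbytes(strings), strings.nbytes)
# ...while the strings they point to are far larger.
print(sum(sys.getsizeof(s) for s in strings))
```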
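```python
# TensorDtype for ray.air.util.tensor_extensions.pandas.TensorDtype
from ray.air.util.tensor_extensions.pandas import TensorDtype

# Dtypes whose elements need a per-object size check instead of nbytes
object_need_check = (TensorDtype,)
min_sample_size = _PANDAS_SIZE_BYTES_MIN_COUNT
```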
Looks like it should be `max_sample_size`, as in "max number of items to sample"?
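Under the reading suggested in this review comment, the constant caps how many items are inspected rather than setting a floor. Here is a minimal sketch of that semantics. `choose_sample_size` is a hypothetical helper, and the actual value of `_PANDAS_SIZE_BYTES_MIN_COUNT` is not shown in this thread.

```python
def choose_sample_size(total_size: int, max_sample_size: int) -> int:
    """Hypothetical helper: number of items to inspect when sizing a column."""
    if max_sample_size <= 0:
        raise ValueError("max_sample_size must be positive")
    # Columns shorter than the cap are measured in full; longer ones are
    # sampled down to exactly max_sample_size items.
    return min(total_size, max_sample_size)


print(choose_sample_size(50, 100))      # 50
print(choose_sample_size(10_000, 100))  # 100
```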
Co-authored-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: zhilong <121425509+Bye-legumes@users.noreply.github.com>
@richardliaw @raulchen I think it's OK now.
Signed-off-by: Connor Sanders <connor@elastiflow.com>
Signed-off-by: hjiang <dentinyhao@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
Why are these changes needed?
Reimplements the fix from #46939, using sampling to reduce its overhead.
Key improvement: where the data is a numeric tensor or a NumPy array, `nbytes` can be used directly.
Related issue number
#46785
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.