[data/preprocessors] concatenator should preserve order of concatenated #47997
columns: A list of columns to concatenate. The order will define the order
    of the concatenated columns.
- columns: A list of columns to concatenate. The order will define the order
-     of the concatenated columns.
+ columns: A list of columns to concatenate. The provided order of the columns
+     will be retained during concatenation.
Also, should we just keep it as `include`?
I'm OK with keeping it, but I just want to point out that it would still be a breaking change, and that other preprocessors use `columns` to define the columns that the preprocessor is applied to.
What would be breaking in particular?
It is now a required argument; if none is provided, it will fail.
discussed offline - this makes sense and columns is more consistent with other preprocessors as well.
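To make the behavior discussed in this thread concrete, here is a minimal pure-Python sketch. It is not Ray's implementation: `concat_in_order` is a hypothetical helper, and the choice to raise `ValueError` on an empty `columns` list is an assumption made for illustration. It only shows the two properties under discussion: the output follows the order of `columns`, and `columns` must be supplied.

```python
from typing import Dict, List, Sequence


def concat_in_order(row: Dict[str, float], columns: Sequence[str]) -> List[float]:
    """Hypothetical helper: build one concatenated vector from a row.

    The values appear in exactly the order given by `columns`,
    regardless of the key order of `row`.
    """
    if not columns:
        # Assumption for this sketch: an empty or missing list is an error,
        # mirroring the "required argument" point made in the thread.
        raise ValueError("`columns` is required and must not be empty")
    return [row[name] for name in columns]


row = {"a": 1.0, "b": 2.0, "c": 3.0}

# Same row, two different column orders, two different outputs.
forward = concat_in_order(row, ["a", "b", "c"])
reordered = concat_in_order(row, ["c", "a", "b"])
```

Callers that previously relied on an implicit column selection would have to pass the list explicitly, which is the breaking change mentioned above.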
Looks great overall, just a couple small nits/docs requests
@@ -20,6 +20,8 @@ class Concatenator(Preprocessor):
:class:`~ray.air.util.tensor_extensions.pandas.TensorArrayElement` objects of
shape :math:`(m,)`, where :math:`m` is the number of columns concatenated.
The :math:`m` concatenated columns are dropped after concatenation.
The preprocessor preserves the order of the columns provided in the ``colummns``
argument and will use that order when on transform, and transform_batch.
- argument and will use that order when on transform, and transform_batch.
+ argument and will use that order when calling ``transform()`` and ``transform_batch()``.
@@ -20,6 +20,8 @@ class Concatenator(Preprocessor):
:class:`~ray.air.util.tensor_extensions.pandas.TensorArrayElement` objects of
shape :math:`(m,)`, where :math:`m` is the number of columns concatenated.
The :math:`m` concatenated columns are dropped after concatenation.
The preprocessor preserves the order of the columns provided in the ``colummns``
I cannot add a comment to the docstring above this line because it wasn't modified in this PR, but can we also modify this line to indicate that only columns specified in the `columns` arg will be included?
Combine numeric columns into a column of type
:class:`~ray.air.util.tensor_extensions.pandas.TensorDtype`.
done!
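As a rough illustration of the docstring points raised in this thread (only the listed columns are combined, and they are dropped afterwards), the following sketch operates on a plain dictionary of column lists. It is an assumption-laden stand-in rather than the actual `Concatenator` code: `concat_and_drop`, its `output_column_name` parameter, and its default value are hypothetical names chosen for this example.

```python
from typing import Any, Dict, List


def concat_and_drop(
    batch: Dict[str, List[Any]],
    columns: List[str],
    output_column_name: str = "concat_out",  # hypothetical default for this sketch
) -> Dict[str, List[Any]]:
    """Hypothetical helper: combine `columns` into one column of vectors."""
    missing = [name for name in columns if name not in batch]
    if missing:
        raise KeyError(f"columns not found in batch: {missing}")

    n_rows = len(batch[columns[0]])
    # Each output row is a vector whose entries follow the order of `columns`.
    merged = [[batch[name][i] for name in columns] for i in range(n_rows)]

    # Columns that were not listed pass through untouched;
    # the listed columns are dropped after concatenation.
    out = {name: values for name, values in batch.items() if name not in columns}
    out[output_column_name] = merged
    return out


batch = {"id": [10, 11], "x": [1.0, 2.0], "y": [3.0, 4.0]}
result = concat_and_drop(batch, ["y", "x"])
```

Here `id` is not listed, so it stays in the batch as its own column, while `y` and `x` are replaced by a single column of two-element vectors ordered `y` first.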
@richardliaw @scottjlee addressed all your comments!
LGTM, one small typo
Signed-off-by: Martin Bomio <martinbomio@spotify.com>
@richardliaw @scottjlee do you mind merging? I don't think I can.
…ed (ray-project#47997)

Make concatenator preserve order.

This is a breaking change: the `Concatenator` now expects the list of columns to include to always be specified, and that list defines the order of the columns in the output column.

## Why are these changes needed?

See issue ray-project#47996

## Related issue number

Closes ray-project#47996

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [x] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: Martin Bomio <martinbomio@spotify.com>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>