-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX-#7250: Revert "PERF-#6666: Avoid internal reset_index for left merge" #7251
Conversation
np.random.uniform(0, 100, size=(2**6, 2**6)), | ||
np.random.uniform(0, 100, size=(2**7, 2**6)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the left merge can change the shape of the result if the merge occurs using values that are in both the left and right operands. However, in our testing we used float numbers, which were not considered identical when compared and the problem was not visible.
de95c5c
to
acd7ed2
Compare
…al reset_index for left merge" Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
@@ -1994,8 +1994,6 @@ def sort_rows(self, columns, ascending, ignore_index, na_position): | |||
drop_index_cols_after = [ | |||
col for col in base._index_cols if col in columns | |||
] | |||
if not drop_index_cols_after: | |||
drop_index_cols_after = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable cannot be None, since further on in the code we are trying to iterate over it.
@@ -296,7 +298,9 @@ def test_merge(test_data, test_data2): | |||
right_on="key", | |||
sort=sorts[j], | |||
) | |||
sort_if_range_partitioning(modin_result, pandas_result) | |||
sort_if_range_partitioning( | |||
modin_result, pandas_result, force=StorageFormat.get() == "Hdk" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like HDK
doesn't preserve the same order as pandas. I don’t create an issue for this, since the engine is deprecated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to note, Modin on Ray/Dask/MPI may also have mismatched order - #2246.
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
merge
failed with message:ValueError: Length mismatch: Expected axis has 36 elements, new values have 32 elements
#7250docs/development/architecture.rst
is up-to-date