-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REFACTOR-#1917: move part of reset_index code to backend #1920
Conversation
Signed-off-by: ienkovich <ilya.enkovich@intel.com>
Codecov Report
@@ Coverage Diff @@
## master #1920 +/- ##
==========================================
- Coverage 81.53% 81.50% -0.04%
==========================================
Files 79 79
Lines 9192 9193 +1
==========================================
- Hits 7495 7493 -2
- Misses 1697 1700 +3
Continue to review full report at Codecov.
|
level = kwargs.get("level", None) | ||
# TODO Implement level | ||
if level is not None or isinstance(self.index, pandas.MultiIndex): | ||
return self.default_to_pandas(pandas.DataFrame.reset_index, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we try to call reset_index
from query compiler that represents a Series
object, the exception will be arisen here. So, a condition should be presented here in depends on which DataFrame.reset_index
or Series.reset_index
will be called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean here. Why should we have an exception? Both Series
and DataFrame
are represented as DataFrame
in the engine, so we always call pandas.DataFrame.reset_index
. Both Series
and DataFrame
tests for reset_index
pass with this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need an exception here. Currently, all tests are passing because we call reset_index
from API layer (there are separate functions for Series
and for DataFrame
). However, when we add new functional into the query compiler layer and call reset_index
from there for compiler representing a Series
, the exception will be arisen from pandas. That is, I mean we will be able to call reset_index
from the query compiler representing DataFrame
only, because if we call reset_index
from the query compiler representing Series
, the parameters of reset_index
will be different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think query_compiler.to_pandas
needs to be modified a bit. An example:
import modin.pandas as pd
s = pd.Series([1,2,3])
type(s._query_compiler.to_pandas())
pandas.core.frame.DataFrame # I think, here should be `Series` object
A couple of changes that are required from my point of view.
The first one is here. return cls.concatenate(df_rows)
-> return cls.concatenate(df_rows[0].squeeze(axis=1))
The second one is here.
Even though we assume that a Series
is a DataFrame
with one column, what do you think on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query compiler doesn't know whom it belongs to. Moreover, it can belong to both Series
and DataFrame
objects at the same time.
df = pd.DataFrame({"a": [1, 2,3]})
s = df.squeeze(axis=1)
assert df._query_compiler == s._query_compiler
You are right about reset_index
parameters for Series
though. It has the name
parameter, which should be processed in the query compiler and not passed to default_to_pandas
. Probably Series
misses a test for that case. I'll have a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, earlier there is a discussion on theme of query compilers and its awareness about Series
and DataFrame
. It was said that Series
is treated as DataFrame
with one column in the query compiler layer. Regarding to tests for Series.reset_index
, there is a separate method in the API layer, which doesn't fall down into the query compiler layer. That's why all tests are passing. We should always remember that we process objects as DataFrame
in the query compiler layer. So, if you have no concerns about this, I will approve the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't notice Series
has its own implementation of reset_index
. The current patch should be fine then. We will need to move Series
version of reset_index
to a backend too at some point, but this can wait.
…modin-project#1920) Signed-off-by: ienkovich <ilya.enkovich@intel.com>
What do these changes do?
Move part of set_index implementation to a backend to allow other backends support multi-index reset.
flake8 modin
black --check modin
git commit -s