Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: prepare dataframe info for series info #37868

Merged
merged 7 commits into from
Nov 24, 2020

Conversation

ivanovmg
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Precursor for #37320

Refactor pandas/io/formats/info.py to simplify further introduction of Series.info.
Basically what is done here:

  • Move docstring to BaseInfo from DataFrameInfo - supposedly will be shared with SeriesInfo
  • Move col_count, ids from BaseInfo to DataFrameInfo (will not be relevant to SeriesInfo)
  • Move memory_usage_bytes to DataFrameInfo (will have different implementation for SeriesInfo)
  • Extract InfoPrinterAbstract. Concrete classes will differ by the _create_table_builder
  • Move common methods and properties from DataFrameTableBuilder to TableBuilderAbstract
  • Extract TableBuilderVerboseMixin from DataFrameTableBuilderVerbose as the generator functions there will be used for verbose series info builders

I suggest moving the substitution decorator for the docstring from pandas/core/frame.py to pandas/io/formats/info.py later on in a separate PR.

def ids(self) -> Index:
"""Column names or index names."""
def dtypes(self) -> Iterable[Dtype]:
"""Dtypes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline after triple-quote

if there's nothing useful to write, leave it blank

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed that.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine.


Parameters
----------
data : FrameOrSeries
data : FrameOrSeriesUnion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonjayhawkins is this right here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable type annotation for data is correct if this doesn't need to be a Generic class.

FrameOrSeries is a typevar and using a typevar for a variable type annotation gives error: Type variable "pandas._typing.FrameOrSeries" is unbound [valid-type]

However, for the docstring, I think it is still preferred not to use Type notation, even for internal docstrings and follow the documentation contributing guide.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed that.

@jreback jreback added Clean Output-Formatting __repr__ of pandas objects, to_string labels Nov 17, 2020
@jreback jreback added this to the 1.2 milestone Nov 17, 2020
Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ivanovmg. lgtm. one comment (can be done in the follow-up)

@abstractmethod
def ids(self) -> Index:
"""Column names or index names."""
def dtypes(self) -> Iterable[Dtype]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abstractmethod?

Copy link
Member Author

@ivanovmg ivanovmg Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I guess abstractmethod should be there.
Fixed.

@ivanovmg
Copy link
Member Author

Got some irrelevant error in CI Check.

mypy --version
mypy 0.782
Performing static analysis using mypy
pandas/core/indexes/datetimelike.py:776: error: Argument 1 to "_simple_new" of "DatetimeIndexOpsMixin" has incompatible type "Union[ExtensionArray, Any]"; expected "Union[DatetimeArray, TimedeltaArray, PeriodArray]"  [arg-type]
Found 1 error in 1 file (checked 1119 source files)

@simonjayhawkins
Copy link
Member

Got some irrelevant error in CI Check.

fixed on master

):
self.data = data
self.memory_usage = _initialize_memory_usage(memory_usage)
data: FrameOrSeriesUnion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we using FrameOrSeries? cc @simonjayhawkins

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -2523,16 +2523,25 @@ def to_html(
@Substitution(
klass="DataFrame",
type_sub=" and columns",
max_cols_sub=(
"""max_cols : int, optional
max_cols_sub=dedent(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do a followup where you switch to using @doc before the Series.info change

@jreback jreback merged commit db0cd36 into pandas-dev:master Nov 24, 2020
@ivanovmg ivanovmg deleted the refactor/dataframe-info branch December 3, 2020 07:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants