feat(api): add describe method to compute summary stats of table expressions #8739

Merged
merged 15 commits into from
Apr 15, 2024

Contributor

@jitingxu1 jitingxu1 commented Mar 22, 2024

Description of changes

Add summary statistics for a table. Statistics are calculated only for numeric, string, and boolean columns. For every such column:

  • name
  • type
  • count (counts and nulls)

For numeric columns:

  • min
  • mean
  • std
  • percentile
  • max

For string (categorical) columns:

  • mode
  • unique

For Boolean columns:

  • mean
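As a rough illustration of the per-type statistics listed above, here is a minimal pure-Python sketch. It is not the ibis implementation: the helper name `describe_column`, the use of the sample standard deviation, and the choice of quartiles as the percentiles are all assumptions made for the example.

```python
import statistics
from collections import Counter


def describe_column(name, values):
    """Summarize one column (hypothetical helper); `values` may contain None."""
    non_null = [v for v in values if v is not None]
    row = {
        "name": name,
        "count": len(non_null),
        "nulls": len(values) - len(non_null),
    }
    if not non_null:
        row["type"] = "unknown"
        return row

    first = non_null[0]
    # bool must be checked before int/float because bool is a subclass of int.
    if isinstance(first, bool):
        row["type"] = "boolean"
        row["mean"] = sum(non_null) / len(non_null)
    elif isinstance(first, (int, float)):
        row["type"] = "numeric"
        row["min"] = min(non_null)
        row["max"] = max(non_null)
        row["mean"] = statistics.fmean(non_null)
        if len(non_null) > 1:
            # Assumption: sample standard deviation and quartiles.
            row["std"] = statistics.stdev(non_null)
            row["percentiles"] = statistics.quantiles(
                non_null, n=4, method="inclusive"
            )
    elif isinstance(first, str):
        row["type"] = "string"
        row["unique"] = len(set(non_null))
        row["mode"] = Counter(non_null).most_common(1)[0][0]
    else:
        # Other column types are skipped, mirroring the scope of this PR.
        return None
    return row


print(describe_column("x", [1, 2, 3, 4, None]))
print(describe_column("s", ["a", "b", "a", None]))
print(describe_column("flag", [True, False, True]))
```

In the PR itself these aggregations are built as table expressions and run by the backend, so the actual results depend on each backend's support for mode, quantile, and standard deviation.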

My questions:

  • I am not sure whether it is good practice to use execute() here. I have another way to implement this, but it would have to handle the null columns.
  • Is a unit test necessary for this? I noticed we do not have tests for info().

Issues closed

Resolves #8459

@jitingxu1 jitingxu1 added the duckdb The DuckDB backend label Mar 22, 2024
@jitingxu1 jitingxu1 requested a review from cpcloud March 22, 2024 16:47
@jitingxu1
Contributor Author

@cpcloud soft ping for the second review after rewriting the implementation. Thanks a lot for your feedback and time.

@cpcloud cpcloud added this to the 9.0 milestone Mar 26, 2024
@cpcloud cpcloud added ux User experience related issues and removed duckdb The DuckDB backend labels Mar 26, 2024
@jitingxu1 jitingxu1 requested a review from cpcloud March 27, 2024 02:18
Member

@cpcloud cpcloud left a comment

Can you add a test for this somewhere? Ideally wherever we test the info method (I think this is in test_generic.py).

@cpcloud cpcloud removed this from the 9.0 milestone Apr 1, 2024
@jitingxu1
Contributor Author

> Can you add a test for this somewhere? Ideally wherever we test the info method (I think this is in test_generic.py).

Changes made:

  • Added a unit test.
  • Stats are now calculated only for numeric, string, and boolean columns; all other column types are skipped.

TODOs:
Some backends fail because they lack support for operations such as quantile, mode, or standard deviation.

  • The following backends need to implement mode for describe to be fully functional:
    clickhouse, pyspark, risingwave, impala, oracle
  • Both mode and StandardDev are required for these backends: exasol, druid
  • Both mode and quantile are required for these backends: datafusion, impala, trino, mysql, mssql, flink
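The lists above can be folded into a single per-backend view. The following is only a sketch: the backend names and operations are transcribed from this comment, while the names `MISSING_OPS` and `missing_by_backend` are hypothetical and not part of ibis.

```python
# Missing operations per group of backends, transcribed from the TODO lists.
# The structure and names here are illustrative assumptions, not ibis code.
MISSING_OPS = [
    ({"mode"}, ["clickhouse", "pyspark", "risingwave", "impala", "oracle"]),
    ({"mode", "std"}, ["exasol", "druid"]),
    ({"mode", "quantile"},
     ["datafusion", "impala", "trino", "mysql", "mssql", "flink"]),
]


def missing_by_backend(groups):
    """Merge the grouped lists into {backend: set of missing operations}."""
    result = {}
    for ops, backends in groups:
        for backend in backends:
            # A backend named in several lists accumulates every missing op.
            result.setdefault(backend, set()).update(ops)
    return result


for backend, ops in sorted(missing_by_backend(MISSING_OPS).items()):
    print(f"{backend}: {', '.join(sorted(ops))}")
```

Merging by backend also makes the overlap visible: impala appears in two of the lists, so it ends up needing both mode and quantile.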

@jitingxu1 jitingxu1 requested a review from cpcloud April 1, 2024 22:34
Member

@cpcloud cpcloud left a comment

LGTM. I'll rebase it and merge on green.

@cpcloud cpcloud force-pushed the add-table-summary-stats branch from dadd142 to 49d966f Compare April 15, 2024 16:28
@cpcloud cpcloud added the feature Features or general enhancements label Apr 15, 2024
@cpcloud cpcloud changed the title feat: add summary stats of table feat(api): add summary stats of table Apr 15, 2024
@cpcloud cpcloud added this to the 9.0 milestone Apr 15, 2024
@cpcloud cpcloud changed the title feat(api): add summary stats of table feat(api): add describe method to compute summary stats of table expressions Apr 15, 2024
@cpcloud cpcloud merged commit c8d98a1 into ibis-project:main Apr 15, 2024
89 checks passed
Labels
feature Features or general enhancements ux User experience related issues
Development

Successfully merging this pull request may close these issues.

feat: Generate descriptive statistics for a ibis table