Report physical input data size in EXPLAIN ANALYZE #14907

Merged: 1 commit, Nov 10, 2022

@Dith3r Dith3r commented Nov 4, 2022

Description

Adds the physical data size read by the connector to the EXPLAIN ANALYZE output, in the connector metrics section.

Example output.

     └─ TableScan[table = iceberg:part.ztest$data@1581575691215353074]
            Layout: [id:bigint, small:varchar, medium:varchar, big:varchar]
            Estimates: {rows: 483 (125.00kB), cpu: 125.00k, memory: 0B, network: 0B}
            CPU: 1.32s (62.71%), Scheduled: 3.59s (76.56%), Blocked: 0.00ns (0.00%), Output: 87 rows (1.36GB)
            Input avg.: 0.97 rows, Input std.dev.: 18.57%
            small := 2:small:varchar
            big := 4:big:varchar
            id := 1:id:bigint
            medium := 3:medium:varchar
            Physical Input: 932.06kB
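
For anyone who wants to pick the new value out of a plan programmatically, here is a minimal Python sketch. It is not part of this change. The function name `physical_input_bytes` is hypothetical, and the 1024-based unit table is an assumption about how the sizes are formatted; the sample lines are copied from the output above.

```python
import re

# Assumption: sizes use binary multiples (1kB = 1024 bytes).
UNITS = {"B": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4, "PB": 1024**5}

# Matches e.g. "Physical Input: 932.06kB" and captures the number and the unit.
PHYSICAL_INPUT = re.compile(r"Physical Input: ([0-9.]+)(B|kB|MB|GB|TB|PB)")


def physical_input_bytes(plan_text):
    """Return every 'Physical Input' value found in the plan text, in bytes."""
    return [round(float(value) * UNITS[unit])
            for value, unit in PHYSICAL_INPUT.findall(plan_text)]


sample_plan = """
TableScan[table = iceberg:part.ztest$data@1581575691215353074]
    CPU: 1.32s (62.71%), Scheduled: 3.59s (76.56%), Blocked: 0.00ns (0.00%), Output: 87 rows (1.36GB)
    Input avg.: 0.97 rows, Input std.dev.: 18.57%
    Physical Input: 932.06kB
"""

print(physical_input_bytes(sample_plan))
```

The function returns a list because a plan can contain several scan nodes, each with its own `Physical Input` entry.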

Non-technical explanation

Adds the physical data size read by the connector to the EXPLAIN ANALYZE output.

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# General
* Add the amount of data read from an external source during table scan to EXPLAIN ANALYZE. ({issue}`14907`)

@cla-bot cla-bot bot added the cla-signed label Nov 4, 2022
@Dith3r Dith3r requested review from sopel39 and lukasz-stec November 4, 2022 14:03
@Dith3r Dith3r force-pushed the feature/physical-size branch from 9937be9 to cdd2c4b Compare November 7, 2022 12:25
@Dith3r Dith3r requested a review from lukasz-stec November 7, 2022 12:25
@Dith3r Dith3r force-pushed the feature/physical-size branch from cdd2c4b to 4cbdb82 Compare November 7, 2022 12:36
Member

@lukasz-stec lukasz-stec left a comment

lgtm

@raunaqmorarka
Member

I think the approach here should be similar to #10472
Also, please add an example of how the EXPLAIN output changed to the description.

@lukasz-stec
Member

> I think the approach here should be similar to #10472

I think we want "physical input data size" displayed even if verbose=false, whereas connector metrics are only displayed if verbose=true.

@Dith3r
Member Author

Dith3r commented Nov 8, 2022

@raunaqmorarka As @lukasz-stec wrote, connector metrics are displayed only for verbose output, whereas we want to display the physical input data metric in plain EXPLAIN ANALYZE output as well, without VERBOSE. Updated the description.

@Dith3r Dith3r force-pushed the feature/physical-size branch from 4cbdb82 to 3fc3ca9 Compare November 8, 2022 12:02
Member

@sopel39 sopel39 left a comment

Please add a test (see BaseJdbcConnectorTest#testExplainAnalyzePhysicalReadWallTime)

@Dith3r Dith3r force-pushed the feature/physical-size branch 3 times, most recently from 78d3fc4 to 8802812 Compare November 9, 2022 09:01
@Dith3r Dith3r force-pushed the feature/physical-size branch from 8802812 to 084b0c4 Compare November 9, 2022 09:24
@Dith3r Dith3r force-pushed the feature/physical-size branch from 084b0c4 to 81814fe Compare November 9, 2022 09:45
@Dith3r Dith3r requested a review from raunaqmorarka November 9, 2022 11:10
Example output:
- ScanFilterProject[table = hive:sf1:orders, filterPredicate = ("orderdate" > DATE '1995-01-01')]
     Layout: [clerk:varchar(15), $hashvalue_2:bigint]
     Estimates: {rows: 1500000 (41.48MB), cpu: 35.76M, memory: 0B, network: 0B}/{rows: 816424 (22.58MB), cpu: 35.76M, memory: 0B, network: 0B}/{rows: 816424 (22.58MB), cpu: 22.58M, memory: 0B, network: 0B}
     CPU: 180.00ms (78.95%), Scheduled: 298.00ms (71.46%), Blocked: 0.00ns (0.00%), Output: 818058 rows (12.98MB)
     Input avg.: 1500000.00 rows, Input std.dev.: 0.00%
     $hashvalue_2 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("clerk"), 0))
     clerk := clerk:varchar(15):REGULAR
     orderdate := orderdate:date:REGULAR
     Input: 1500000 rows (18.17MB), Filtered: 45.46%, Physical Input: 4.51MB
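
The last line of this output puts the logical input size next to the physical input size, so the two can be compared directly. The Python sketch below shows one rough way to do that. The helper name `scan_input_summary` and the 1024-based unit table are assumptions for illustration, not something taken from this change.

```python
import re

# Assumption: sizes use binary multiples (1kB = 1024 bytes).
UNITS = {"B": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4, "PB": 1024**5}
SIZE = r"([0-9.]+)(B|kB|MB|GB|TB|PB)"

# Matches "Input: <rows> rows (<size>), Filtered: <pct>%, Physical Input: <size>".
SUMMARY = re.compile(
    r"Input: (\d+) rows \(" + SIZE + r"\), Filtered: ([0-9.]+)%, Physical Input: " + SIZE
)


def scan_input_summary(plan_text):
    """Extract logical and physical input sizes from a scan node summary line."""
    match = SUMMARY.search(plan_text)
    if match is None:
        return None
    rows, in_value, in_unit, filtered, phys_value, phys_unit = match.groups()
    logical = float(in_value) * UNITS[in_unit]
    physical = float(phys_value) * UNITS[phys_unit]
    return {
        "rows": int(rows),
        "filtered_percent": float(filtered),
        "logical_bytes": round(logical),
        "physical_bytes": round(physical),
        # How many logical bytes were produced per byte read from storage.
        "logical_to_physical": logical / physical,
    }


line = "Input: 1500000 rows (18.17MB), Filtered: 45.46%, Physical Input: 4.51MB"
summary = scan_input_summary(line)
print(f"{summary['logical_to_physical']:.2f}x")
```

For the example above this comes to roughly 4x, meaning about 4.5MB read from storage expanded to about 18MB of in-memory input.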
@Dith3r Dith3r force-pushed the feature/physical-size branch from 81814fe to f276cde Compare November 9, 2022 13:38
@raunaqmorarka raunaqmorarka merged commit f9654eb into trinodb:master Nov 10, 2022
@github-actions github-actions bot added this to the 403 milestone Nov 10, 2022
@Dith3r Dith3r deleted the feature/physical-size branch November 14, 2022 09:46