Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Materalize row count query. #635

Open
tolomea opened this issue Mar 12, 2024 · 1 comment
Open

Materalize row count query. #635

tolomea opened this issue Mar 12, 2024 · 1 comment

Comments

@tolomea
Copy link

tolomea commented Mar 12, 2024

This is version 1.9.1

I don't understand explain output that well, I'm still learning.

I have an explain from Postgres that includes

->  Nested Loop Semi Join  (cost=105120.95..1407009.98 rows=1 width=254) (actual time=9567.961..183600.927 rows=11768 loops=1)
    Join Filter: (REDACTED)
    Rows Removed by Join Filter: 1196949001
    ->  Nested Loop  (cost=104809.76..1404697.63 rows=2798 width=286) (actual time=8337.268..65035.557 rows=205377 loops=1)
    ->  Materialize  (cost=311.19..2010.18 rows=27 width=16) (actual time=0.001..0.292 rows=5828 loops=205377)
        ->  Nested Loop  (cost=311.19..2010.15 rows=27 width=16) (actual time=106.768..1213.133 rows=6000 loops=1)

In the UI these four nodes look like:
image

1: The rows and estimated rows on the materialize surprise me they seem to have been multiplied by loops. Is this indicating that the nested loop semi join is reading the whole materialized table 205377 times?

2: I note that that's the number of rows coming up to the other side of the semi join from the nested loop. It's estimated rows is much lower which suggests that if 1 is correct and expected, then maybe for the estimated rows on the materialize it should be multiplying by the estimated rows off the nested loop, not the actual loops from the materialize.

@tolomea
Copy link
Author

tolomea commented Mar 12, 2024

Also I have another version of the same query where instead of materialize and nested loop semi join it's used a hash and hash semi join.
The hash only has a loop count of 1, so it doesn't pump up the rows like this. That makes it look much better, but is it really?
As far as I can see the planner has done that because it's row estimates that time were much less accurate leading it to think there would only be 27 rows on the left side of this join, not 205,377. Suggesting that the planner thinks the materialize is better when there are more rows.
Although the runtimes contradict that idea so 🤷

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant