Made improvements to lineage query performance #2472

Merged
collado-mike merged 2 commits into main from fix/lineage_query_perf
Apr 5, 2023

collado-mike
Collaborator

Problem

After #2448 was merged, the lineage query suffered a significant performance regression. After several attempts, this change addresses the regression, although the query is still not as fast as it was before #2448.

Current explain plan when executed on a database with ~900,000 job records:

Unique  (cost=62725593721.50..63084437013.70 rows=884152 width=742)
  CTE job_io
    ->  GroupAggregate  (cost=689629.07..721327.76 rows=905677 width=112)
          Group Key: (COALESCE(j_1.symlink_target_uuid, j_1.uuid))
          ->  Sort  (cost=689629.07..691893.26 rows=905677 width=54)
                Sort Key: (COALESCE(j_1.symlink_target_uuid, j_1.uuid))
                ->  Hash Left Join  (cost=266135.08..538103.50 rows=905677 width=54)
                      Hash Cond: (v.uuid = io.job_version_uuid)
                      ->  Hash Left Join  (cost=266111.09..529011.68 rows=905677 width=48)
                            Hash Cond: (COALESCE(j_2.current_version_uuid, j_1.current_version_uuid) = v.uuid)
                            ->  Hash Left Join  (cost=230797.67..468842.84 rows=905677 width=64)
                                  Hash Cond: (j_1.symlink_target_uuid = j_2.uuid)
                                  ->  Seq Scan on jobs j_1  (cost=0.00..211974.77 rows=905677 width=48)
                                  ->  Hash  (cost=211974.77..211974.77 rows=884152 width=48)
                                        ->  Seq Scan on jobs j_2  (cost=0.00..211974.77 rows=884152 width=48)
                                              Filter: ((is_hidden IS FALSE) AND (symlink_target_uuid IS NULL))
                            ->  Hash  (cost=24564.30..24564.30 rows=618330 width=16)
                                  ->  Seq Scan on job_versions v  (cost=0.00..24564.30 rows=618330 width=16)
                      ->  Hash  (cost=14.55..14.55 rows=755 width=38)
                            ->  Seq Scan on job_versions_io_mapping io  (cost=0.00..14.55 rows=755 width=38)
  CTE lineage
    ->  Recursive Union  (cost=0.42..57275676397.49 rows=13278198704 width=84)
          ->  Nested Loop  (cost=0.42..247254.76 rows=44204 width=84)
                ->  CTE Scan on job_io io_1  (cost=0.00..20377.73 rows=4528 width=96)
                      Filter: (ids && '{1468c86b-1364-42bf-9137-62c9bb7ce6ab}'::uuid[])
                ->  Index Scan using jobs_pkey on jobs j_3  (cost=0.42..50.01 rows=10 width=48)
                      Index Cond: (uuid = ANY (io_1.ids))
                      Filter: ((is_hidden IS FALSE) AND (symlink_target_uuid IS NULL))
          ->  Nested Loop  (cost=0.00..5700986516.86 rows=1327815450 width=84)
                Join Filter: ((io_2.job_uuid <> l.job_uuid) AND (array_cat(io_2.inputs, io_2.outputs) && array_cat(l.inputs, l.outputs)))
                ->  WorkTable Scan on lineage l  (cost=0.00..9945.90 rows=147347 width=84)
                      Filter: (depth < 20)
                ->  CTE Scan on job_io io_2  (cost=0.00..18113.54 rows=905677 width=80)
  ->  Merge Join  (cost=5449195996.25..5775632740.80 rows=12962619058 width=742)
        Merge Cond: (j.uuid = l2.job_uuid)
        ->  Sort  (cost=1022657.88..1024868.26 rows=884152 width=526)
              Sort Key: j.uuid
              ->  Hash Left Join  (cost=231543.62..518291.48 rows=884152 width=526)
                    Hash Cond: (j.current_job_context_uuid = jc.uuid)
                    ->  Hash Left Join  (cost=231256.73..515681.41 rows=884152 width=291)
                          Hash Cond: (j.parent_job_uuid = p.uuid)
                          ->  Seq Scan on jobs j  (cost=0.00..211974.77 rows=884152 width=263)
                                Filter: ((is_hidden IS FALSE) AND (symlink_target_uuid IS NULL))
                          ->  Hash  (cost=211974.77..211974.77 rows=905677 width=44)
                                ->  Seq Scan on jobs p  (cost=0.00..211974.77 rows=905677 width=44)
                    ->  Hash  (cost=233.06..233.06 rows=4306 width=251)
                          ->  Seq Scan on job_contexts jc  (cost=0.00..233.06 rows=4306 width=251)
        ->  Materialize  (cost=5448173338.36..5514564331.88 rows=13278198704 width=80)
              ->  Sort  (cost=5448173338.36..5481368835.12 rows=13278198704 width=80)
                    Sort Key: l2.job_uuid
                    ->  CTE Scan on lineage l2  (cost=0.00..265563974.08 rows=13278198704 width=80)

New explain plan:

Unique  (cost=524955.66..524955.96 rows=60 width=742)
  CTE job_current_version
    ->  Gather  (cost=214988.72..459085.45 rows=272005 width=32)
          Workers Planned: 2
          ->  Parallel Hash Left Join  (cost=213988.72..430884.95 rows=113335 width=32)
                Hash Cond: (j_1.symlink_target_uuid = s.uuid)
                Filter: (s.current_version_uuid IS NULL)
                ->  Parallel Seq Scan on jobs j_1  (cost=0.00..206691.65 rows=377365 width=48)
                ->  Parallel Hash  (cost=206691.65..206691.65 rows=377365 width=32)
                      ->  Parallel Seq Scan on jobs s  (cost=0.00..206691.65 rows=377365 width=32)
  CTE job_io
    ->  GroupAggregate  (cost=45670.75..50061.87 rows=200 width=80)
          Group Key: j_2.job_uuid
          ->  Sort  (cost=45670.75..46402.11 rows=292541 width=38)
                Sort Key: j_2.job_uuid
                ->  Hash Join  (cost=23.99..11109.55 rows=292541 width=38)
                      Hash Cond: (j_2.job_version_uuid = io.job_version_uuid)
                      ->  CTE Scan on job_current_version j_2  (cost=0.00..5440.10 rows=272005 width=32)
                      ->  Hash  (cost=14.55..14.55 rows=755 width=38)
                            ->  Seq Scan on job_versions_io_mapping io  (cost=0.00..14.55 rows=755 width=38)
  CTE lineage
    ->  Recursive Union  (cost=8.85..15187.26 rows=61 width=84)
          ->  Nested Loop Left Join  (cost=8.85..14983.64 rows=1 width=84)
                Join Filter: (io_1.job_uuid = v.job_uuid)
                ->  Nested Loop  (cost=8.85..14977.14 rows=1 width=16)
                      Join Filter: (((j_3.symlink_target_uuid IS NULL) AND (j_3.uuid = v.job_uuid)) OR (v.job_uuid = j_3.symlink_target_uuid))
                      ->  CTE Scan on job_current_version v  (cost=0.00..5440.10 rows=272005 width=16)
                      ->  Materialize  (cost=8.85..16.87 rows=2 width=32)
                            ->  Bitmap Heap Scan on jobs j_3  (cost=8.85..16.86 rows=2 width=32)
                                  Recheck Cond: ((uuid = '1468c86b-1364-42bf-9137-62c9bb7ce6ab'::uuid) OR (symlink_target_uuid = '1468c86b-1364-42bf-9137-62c9bb7ce6ab'::uuid))
                                  ->  BitmapOr  (cost=8.85..8.85 rows=2 width=0)
                                        ->  Bitmap Index Scan on jobs_pkey  (cost=0.00..4.43 rows=1 width=0)
                                              Index Cond: (uuid = '1468c86b-1364-42bf-9137-62c9bb7ce6ab'::uuid)
                                        ->  Bitmap Index Scan on jobs_symlink_target_uuid_index  (cost=0.00..4.42 rows=1 width=0)
                                              Index Cond: (symlink_target_uuid = '1468c86b-1364-42bf-9137-62c9bb7ce6ab'::uuid)
                ->  CTE Scan on job_io io_1  (cost=0.00..4.00 rows=200 width=80)
          ->  Nested Loop  (cost=0.00..20.24 rows=6 width=84)
                Join Filter: ((io_2.job_uuid <> l.job_uuid) AND (array_cat(io_2.inputs, io_2.outputs) && array_cat(l.inputs, l.outputs)))
                ->  WorkTable Scan on lineage l  (cost=0.00..0.22 rows=3 width=84)
                      Filter: (depth < 20)
                ->  CTE Scan on job_io io_2  (cost=0.00..4.00 rows=200 width=80)
  ->  Sort  (cost=621.07..621.22 rows=60 width=742)
        Sort Key: j.uuid
        ->  Nested Loop Left Join  (cost=1.13..619.30 rows=60 width=742)
              ->  Nested Loop Left Join  (cost=0.85..600.79 rows=60 width=355)
                    ->  Nested Loop  (cost=0.42..516.21 rows=60 width=327)
                          ->  CTE Scan on lineage l2  (cost=0.00..1.22 rows=61 width=80)
                          ->  Index Scan using jobs_pkey on jobs j  (cost=0.42..8.44 rows=1 width=263)
                                Index Cond: (uuid = l2.job_uuid)
                                Filter: ((is_hidden IS FALSE) AND (symlink_target_uuid IS NULL))
                    ->  Index Scan using jobs_pkey on jobs p  (cost=0.42..1.41 rows=1 width=44)
                          Index Cond: (j.parent_job_uuid = uuid)
              ->  Index Scan using job_contexts_pkey on job_contexts jc  (cost=0.28..0.30 rows=1 width=251)
                    Index Cond: (uuid = j.current_job_context_uuid)
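
Both plans keep the same recursive step: a job joins the lineage when its combined inputs and outputs overlap those of a job already in the lineage, and rows stop expanding once `depth` reaches 20. The Python sketch below illustrates that traversal only; it is not the query itself. The `job_io` dictionary, the `lineage` function, and the job and dataset ids are hypothetical stand-ins, and the `seen` bookkeeping is a simplification (the SQL join filter only excludes a job from matching itself).

```python
from collections import deque

MAX_DEPTH = 20  # mirrors the `depth < 20` filter on the WorkTable scan


def lineage(job_io, start_job, max_depth=MAX_DEPTH):
    """Return {job_id: depth} for every job reachable from start_job.

    job_io maps a job id to an (inputs, outputs) pair of dataset-id sets,
    a hypothetical stand-in for the rows of the job_io CTE.
    """

    def datasets(job):
        inputs, outputs = job_io[job]
        # Plays the role of array_cat(inputs, outputs) in the join filter.
        return set(inputs) | set(outputs)

    seen = {start_job: 0}
    queue = deque([start_job])
    while queue:
        current = queue.popleft()
        depth = seen[current]
        if depth >= max_depth:
            # Only rows with depth < max_depth are expanded further.
            continue
        for other in job_io:
            if other in seen:
                # Simplification: skip jobs already reached (including `current`).
                continue
            if datasets(other) & datasets(current):
                # Plays the role of the `&&` array-overlap operator.
                seen[other] = depth + 1
                queue.append(other)
    return seen


# Hypothetical data: a -> b -> c share datasets; x is unrelated.
example_io = {
    "a": ({"d1"}, {"d2"}),
    "b": ({"d2"}, {"d3"}),
    "c": ({"d3"}, set()),
    "x": ({"d9"}, set()),
}
example_lineage = lineage(example_io, "a")
```

Because every expansion compares a lineage row against all of `job_io`, the size of `job_io` dominates the cost of the recursion, which is consistent with the estimate for the `job_io` CTE falling from 905,677 rows to 200 between the two plans.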

Estimated total cost of the root node drops from 63,084,437,013 to 524,955. These are planner cost estimates, not measured run times.
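
The comparison can be reproduced from the root `Unique` line of each plan. The following is a minimal sketch that assumes the plain-text EXPLAIN format shown above; `COST_RE` and `parse_cost` are hypothetical helper names, not part of Marquez or PostgreSQL.

```python
import re

# Matches the "(cost=<startup>..<total> rows=<n> width=<n>)" suffix of a plan node.
COST_RE = re.compile(
    r"\(cost=(\d+(?:\.\d+)?)\.\.(\d+(?:\.\d+)?) rows=(\d+) width=(\d+)\)"
)


def parse_cost(line):
    """Extract the planner estimates from one line of text-format EXPLAIN output."""
    match = COST_RE.search(line)
    if match is None:
        raise ValueError(f"no cost estimate found in: {line!r}")
    startup, total, rows, width = match.groups()
    return {
        "startup": float(startup),
        "total": float(total),
        "rows": int(rows),
        "width": int(width),
    }


# Root nodes copied from the two plans above.
OLD_ROOT = "Unique  (cost=62725593721.50..63084437013.70 rows=884152 width=742)"
NEW_ROOT = "Unique  (cost=524955.66..524955.96 rows=60 width=742)"

old = parse_cost(OLD_ROOT)
new = parse_cost(NEW_ROOT)
ratio = old["total"] / new["total"]
print(f"estimated cost ratio: {ratio:,.0f}x")
```

The ratio is on the order of 120,000. Because neither plan was produced with `EXPLAIN ANALYZE`, it compares planner estimates and says nothing definite about wall-clock time.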

Solution

Please describe your change as it relates to the problem, or bug fix, as well as any dependencies. If your change requires a database schema migration, please describe the schema modification(s) and whether it's a backwards-incompatible or backwards-compatible change.

Note: All database schema changes require discussion. Please link the issue for context.

One-line summary:

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've included a one-line summary of your change for the CHANGELOG.md (Depending on the change, this may not be necessary).
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@collado-mike collado-mike requested a review from fm100 April 5, 2023 22:10
@boring-cyborg boring-cyborg bot added the api API layer changes label Apr 5, 2023
@collado-mike collado-mike force-pushed the fix/lineage_query_perf branch from 4963675 to f8a5720 Compare April 5, 2023 22:11

codecov bot commented Apr 5, 2023

Codecov Report

Merging #2472 (f8a5720) into main (289fa3e) will not change coverage.
The diff coverage is n/a.

❗ Current head f8a5720 differs from pull request most recent head 1159a3c. Consider uploading reports for the commit 1159a3c to get more accurate results

@@            Coverage Diff            @@
##               main    #2472   +/-   ##
=========================================
  Coverage     83.53%   83.53%           
  Complexity     1207     1207           
=========================================
  Files           231      231           
  Lines          5503     5503           
  Branches        267      267           
=========================================
  Hits           4597     4597           
  Misses          762      762           
  Partials        144      144           

@collado-mike collado-mike force-pushed the fix/lineage_query_perf branch from f8a5720 to fb84d9f Compare April 5, 2023 22:27
Signed-off-by: Michael Collado <collado.mike@gmail.com>
@collado-mike collado-mike force-pushed the fix/lineage_query_perf branch from fb84d9f to b5d3b65 Compare April 5, 2023 22:28
@collado-mike collado-mike enabled auto-merge (squash) April 5, 2023 22:48
@collado-mike collado-mike merged commit caca9a0 into main Apr 5, 2023
@collado-mike collado-mike deleted the fix/lineage_query_perf branch April 5, 2023 22:56
Xavier-Cliquennois pushed a commit to Xavier-Cliquennois/marquez that referenced this pull request Jul 26, 2023
Signed-off-by: Michael Collado <collado.mike@gmail.com>
Signed-off-by: Xavier-Cliquennois <xavier.cliquennois@wearegraphite.io>