Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jobs: avoid crdb_internal.system_jobs in gc-jobs #108093

Merged
merged 1 commit into from
Aug 8, 2023

Conversation

stevendanna
Copy link
Collaborator

The crdb_internal.system_jobs is a virtual table that joins information from the jobs table and the jobs_info table.

For the previous query,

SELECT id, payload, status FROM "".crdb_internal.system_jobs
WHERE (created < $1) AND (id > $2)
ORDER BY id
LIMIT $3

this is a little suboptimal because:

  • We don't make use of the progress column so any read of that is useless.

  • While the crdb_internal.virtual table has a virtual index on job id, and EXPLAIN will even claim that it will be used:

    • limit
    │ count: 100
    │
    └── • filter
        │ filter: created < '2023-07-20 07:29:01.17001'
        │
        └── • virtual table
              table: system_jobs@system_jobs_id_idx
              spans: [/101 - ]
    

    This is actually a lie. A virtual index can only handle single-key spans. As a result the unconstrained query is used:

    WITH
        latestpayload AS (SELECT job_id, value FROM system.job_info AS payload WHERE info_key = 'legacy_payload' ORDER BY written DESC),
        latestprogress AS (SELECT job_id, value FROM system.job_info AS progress WHERE info_key = 'legacy_progress' ORDER BY written DESC)
    SELECT
       DISTINCT(id), status, created, payload.value AS payload, progress.value AS progress,
                created_by_type, created_by_id, claim_session_id, claim_instance_id, num_runs, last_run, job_type
    FROM system.jobs AS j
    INNER JOIN latestpayload AS payload ON j.id = payload.job_id
    LEFT JOIN latestprogress AS progress ON j.id = progress.job_id

which has a full scan of the jobs table and 2 full scans of the info table:

  • distinct
  │ distinct on: id, value, value
  │
  └── • merge join
      │ equality: (job_id) = (id)
      │
      ├── • render
      │   │
      │   └── • filter
      │       │ estimated row count: 7,318
      │       │ filter: info_key = 'legacy_payload'
      │       │
      │       └── • scan
      │             estimated row count: 14,648 (100% of the table; stats collected 39 minutes ago; using stats forecast for 2 hours in the future)
      │             table: job_info@primary
      │             spans: FULL SCAN
      │
      └── • merge join (right outer)
          │ equality: (job_id) = (id)
          │ right cols are key
          │
          ├── • render
          │   │
          │   └── • filter
          │       │ estimated row count: 7,317
          │       │ filter: info_key = 'legacy_progress'
          │       │
          │       └── • scan
          │             estimated row count: 14,648 (100% of the table; stats collected 39 minutes ago; using stats forecast for 2 hours in the future)
          │             table: job_info@primary
          │             spans: FULL SCAN
          │
          └── • scan
                missing stats
                table: jobs@primary
                spans: FULL SCAN

Because of the limit, I don't think this ends up being as bad as it looks. But it still isn't great.

In this PR, we replace crdb_internal.jobs with a query that removes the join on the unused progress field and also constrains the query of the job_info table.

  • distinct
  │ distinct on: id, value
  │
  └── • merge join
      │ equality: (job_id) = (id)
      │ right cols are key
      │
      ├── • render
      │   │
      │   └── • filter
      │       │ estimated row count: 7,318
      │       │ filter: info_key = 'legacy_payload'
      │       │
      │       └── • scan
      │             estimated row count: 14,646 (100% of the table; stats collected 45 minutes ago; using stats forecast for 2 hours in the future)
      │             table: job_info@primary
      │             spans: [/101/'legacy_payload' - ]
      │
      └── • render
          │
          └── • limit
              │ count: 100
              │
              └── • filter
                  │ filter: created < '2023-07-20 07:29:01.17001'
                  │
                  └── • scan
                        missing stats
                        table: jobs@primary
                        spans: [/101 - ]

In a local example, this does seem faster:

> SELECT id, payload, status, created
> FROM "".crdb_internal.system_jobs
> WHERE (created < '2023-07-20 07:29:01.17001') AND (id > 100) ORDER BY id LIMIT 100;

id | payload | status | created
-----+---------+--------+----------
(0 rows)

Time: 183ms total (execution 183ms / network 0ms)

> WITH
> latestpayload AS (
>     SELECT job_id, value
>     FROM system.job_info AS payload
>     WHERE job_id > 100 AND info_key = 'legacy_payload'
>     ORDER BY written desc
> ),
> jobpage AS (
>     SELECT id, status, created
>     FROM system.jobs
>     WHERE (created < '2023-07-20 07:29:01.17001') and (id > 100)
>     ORDER BY id
>     LIMIT 100
> )
> SELECT distinct (id), latestpayload.value AS payload, status
> FROM jobpage AS j
> INNER JOIN latestpayload ON j.id = latestpayload.job_id;
  id | payload | status
-----+---------+---------
(0 rows)

Time: 43ms total (execution 42ms / network 0ms)

Release note: None

Epic: none

The crdb_internal.system_jobs is a virtual table that joins
information from the jobs table and the jobs_info table.

For the previous query,

    SELECT id, payload, status FROM "".crdb_internal.system_jobs
    WHERE (created < $1) AND (id > $2)
    ORDER BY id
    LIMIT $3

this is a little suboptimal because:

- We don't make use of the progress column so any read of that is
  useless.

- While the crdb_internal.virtual table has a virtual index on job id,
  and EXPLAIN will even claim that it will be used:

      • limit
      │ count: 100
      │
      └── • filter
          │ filter: created < '2023-07-20 07:29:01.17001'
          │
          └── • virtual table
                table: system_jobs@system_jobs_id_idx
                spans: [/101 - ]

  This is actually a lie. A virtual index can only handle single-key
  spans. As a result the unconstrained query is used:

    WITH
        latestpayload AS (SELECT job_id, value FROM system.job_info AS payload WHERE info_key = 'legacy_payload' ORDER BY written DESC),
        latestprogress AS (SELECT job_id, value FROM system.job_info AS progress WHERE info_key = 'legacy_progress' ORDER BY written DESC)
    SELECT
       DISTINCT(id), status, created, payload.value AS payload, progress.value AS progress,
                created_by_type, created_by_id, claim_session_id, claim_instance_id, num_runs, last_run, job_type
    FROM system.jobs AS j
    INNER JOIN latestpayload AS payload ON j.id = payload.job_id
    LEFT JOIN latestprogress AS progress ON j.id = progress.job_id

  which has a full scan of the jobs table and 2 full scans of the info
  table:

      • distinct
      │ distinct on: id, value, value
      │
      └── • merge join
          │ equality: (job_id) = (id)
          │
          ├── • render
          │   │
          │   └── • filter
          │       │ estimated row count: 7,318
          │       │ filter: info_key = 'legacy_payload'
          │       │
          │       └── • scan
          │             estimated row count: 14,648 (100% of the table; stats collected 39 minutes ago; using stats forecast for 2 hours in the future)
          │             table: job_info@primary
          │             spans: FULL SCAN
          │
          └── • merge join (right outer)
              │ equality: (job_id) = (id)
              │ right cols are key
              │
              ├── • render
              │   │
              │   └── • filter
              │       │ estimated row count: 7,317
              │       │ filter: info_key = 'legacy_progress'
              │       │
              │       └── • scan
              │             estimated row count: 14,648 (100% of the table; stats collected 39 minutes ago; using stats forecast for 2 hours in the future)
              │             table: job_info@primary
              │             spans: FULL SCAN
              │
              └── • scan
                    missing stats
                    table: jobs@primary
                    spans: FULL SCAN

  Because of the limit, I don't think this ends up being as bad as it
  looks. But it still isn't great.

In this PR, we replace crdb_internal.jobs with a query that removes
the join on the unused progress field and also constrains the query of
the job_info table.

      • distinct
      │ distinct on: id, value
      │
      └── • merge join
          │ equality: (job_id) = (id)
          │ right cols are key
          │
          ├── • render
          │   │
          │   └── • filter
          │       │ estimated row count: 7,318
          │       │ filter: info_key = 'legacy_payload'
          │       │
          │       └── • scan
          │             estimated row count: 14,646 (100% of the table; stats collected 45 minutes ago; using stats forecast for 2 hours in the future)
          │             table: job_info@primary
          │             spans: [/101/'legacy_payload' - ]
          │
          └── • render
              │
              └── • limit
                  │ count: 100
                  │
                  └── • filter
                      │ filter: created < '2023-07-20 07:29:01.17001'
                      │
                      └── • scan
                            missing stats
                            table: jobs@primary
                            spans: [/101 - ]

In a local example, this does seem faster:

    > SELECT id, payload, status, created
    > FROM "".crdb_internal.system_jobs
    > WHERE (created < '2023-07-20 07:29:01.17001') AND (id > 100) ORDER BY id LIMIT 100;

    id | payload | status | created
    -----+---------+--------+----------
    (0 rows)

    Time: 183ms total (execution 183ms / network 0ms)

    > WITH
    > latestpayload AS (
    >     SELECT job_id, value
    >     FROM system.job_info AS payload
    >     WHERE job_id > 100 AND info_key = 'legacy_payload'
    >     ORDER BY written desc
    > ),
    > jobpage AS (
    >     SELECT id, status, created
    >     FROM system.jobs
    >     WHERE (created < '2023-07-20 07:29:01.17001') and (id > 100)
    >     ORDER BY id
    >     LIMIT 100
    > )
    > SELECT distinct (id), latestpayload.value AS payload, status
    > FROM jobpage AS j
    > INNER JOIN latestpayload ON j.id = latestpayload.job_id;
      id | payload | status
    -----+---------+---------
    (0 rows)

    Time: 43ms total (execution 42ms / network 0ms)

Release note: None

Epic: none
@stevendanna stevendanna requested review from a team as code owners August 3, 2023 10:29
@stevendanna stevendanna requested review from msbutler and removed request for a team August 3, 2023 10:29
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@stevendanna stevendanna requested review from dt and adityamaru and removed request for a team and msbutler August 3, 2023 10:29
Copy link
Contributor

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!


const expiredJobsQueryWithJobInfoTable = `
WITH
latestpayload AS (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having to unmarshal the payload just to read FinishedMicros is 😢

@miretskiy
Copy link
Contributor

Any plans on backporting this?

@stevendanna
Copy link
Collaborator Author

@miretskiy Yes, I think we should backport this.

@stevendanna stevendanna added the backport-23.1.x Flags PRs that need to be backported to 23.1 label Aug 3, 2023
@stevendanna
Copy link
Collaborator Author

stevendanna commented Aug 8, 2023

bors r=adityamaru

@craig
Copy link
Contributor

craig bot commented Aug 8, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants