release-23.1: jobs: avoid crdb_internal.system_jobs in gc-jobs #108390

blathers-crl · 2023-08-08T22:19:44Z

Backport 1/1 commits from #108093 on behalf of @stevendanna.

/cc @cockroachdb/release

crdb_internal.system_jobs is a virtual table that joins information from the jobs table and the job_info table.

For the previous query,

SELECT id, payload, status FROM "".crdb_internal.system_jobs
WHERE (created < $1) AND (id > $2)
ORDER BY id
LIMIT $3

this is a little suboptimal because:

- We don't use the progress column, so any read of it is wasted work.
- The crdb_internal.system_jobs virtual table has a virtual index on job id, and EXPLAIN will even claim that it is used:
```
• limit
│ count: 100
│
└── • filter
    │ filter: created < '2023-07-20 07:29:01.17001'
    │
    └── • virtual table
          table: system_jobs@system_jobs_id_idx
          spans: [/101 - ]
```
This is actually a lie. A virtual index can only handle single-key spans, and id > $2 is a range. As a result, the unconstrained query is used:

    WITH
        latestpayload AS (SELECT job_id, value FROM system.job_info AS payload WHERE info_key = 'legacy_payload' ORDER BY written DESC),
        latestprogress AS (SELECT job_id, value FROM system.job_info AS progress WHERE info_key = 'legacy_progress' ORDER BY written DESC)
    SELECT
       DISTINCT(id), status, created, payload.value AS payload, progress.value AS progress,
                created_by_type, created_by_id, claim_session_id, claim_instance_id, num_runs, last_run, job_type
    FROM system.jobs AS j
    INNER JOIN latestpayload AS payload ON j.id = payload.job_id
    LEFT JOIN latestprogress AS progress ON j.id = progress.job_id

which has a full scan of the jobs table and 2 full scans of the job_info table:

  • distinct
  │ distinct on: id, value, value
  │
  └── • merge join
      │ equality: (job_id) = (id)
      │
      ├── • render
      │   │
      │   └── • filter
      │       │ estimated row count: 7,318
      │       │ filter: info_key = 'legacy_payload'
      │       │
      │       └── • scan
      │             estimated row count: 14,648 (100% of the table; stats collected 39 minutes ago; using stats forecast for 2 hours in the future)
      │             table: job_info@primary
      │             spans: FULL SCAN
      │
      └── • merge join (right outer)
          │ equality: (job_id) = (id)
          │ right cols are key
          │
          ├── • render
          │   │
          │   └── • filter
          │       │ estimated row count: 7,317
          │       │ filter: info_key = 'legacy_progress'
          │       │
          │       └── • scan
          │             estimated row count: 14,648 (100% of the table; stats collected 39 minutes ago; using stats forecast for 2 hours in the future)
          │             table: job_info@primary
          │             spans: FULL SCAN
          │
          └── • scan
                missing stats
                table: jobs@primary
                spans: FULL SCAN

Because of the limit, I don't think this ends up being as bad as it looks. But it still isn't great.

In this PR, we replace the use of crdb_internal.system_jobs with a query that removes the join on the unused progress field and also constrains the query of the job_info table.

  • distinct
  │ distinct on: id, value
  │
  └── • merge join
      │ equality: (job_id) = (id)
      │ right cols are key
      │
      ├── • render
      │   │
      │   └── • filter
      │       │ estimated row count: 7,318
      │       │ filter: info_key = 'legacy_payload'
      │       │
      │       └── • scan
      │             estimated row count: 14,646 (100% of the table; stats collected 45 minutes ago; using stats forecast for 2 hours in the future)
      │             table: job_info@primary
      │             spans: [/101/'legacy_payload' - ]
      │
      └── • render
          │
          └── • limit
              │ count: 100
              │
              └── • filter
                  │ filter: created < '2023-07-20 07:29:01.17001'
                  │
                  └── • scan
                        missing stats
                        table: jobs@primary
                        spans: [/101 - ]

In a local example, this does seem faster:

> SELECT id, payload, status, created
> FROM "".crdb_internal.system_jobs
> WHERE (created < '2023-07-20 07:29:01.17001') AND (id > 100) ORDER BY id LIMIT 100;

id | payload | status | created
-----+---------+--------+----------
(0 rows)

Time: 183ms total (execution 183ms / network 0ms)

> WITH
> latestpayload AS (
>     SELECT job_id, value
>     FROM system.job_info AS payload
>     WHERE job_id > 100 AND info_key = 'legacy_payload'
>     ORDER BY written desc
> ),
> jobpage AS (
>     SELECT id, status, created
>     FROM system.jobs
>     WHERE (created < '2023-07-20 07:29:01.17001') and (id > 100)
>     ORDER BY id
>     LIMIT 100
> )
> SELECT distinct (id), latestpayload.value AS payload, status
> FROM jobpage AS j
> INNER JOIN latestpayload ON j.id = latestpayload.job_id;
  id | payload | status
-----+---------+---------
(0 rows)

Time: 43ms total (execution 42ms / network 0ms)
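
The rewrite above limits the page of jobs first and only then joins against a job_info scan constrained by job id. As a rough way to see that both query shapes return the same page, here is a minimal sketch using Python's sqlite3 as a stand-in. The table definitions, the sample data, the helper names (build_db, fetch_page), and the outer ORDER BY in the second query are all assumptions for illustration; this is not the real system.jobs schema, and SQLite's planner says nothing about CockroachDB's plans or timings.

```python
import sqlite3

# Stand-in schema: SQLite tables that only mimic the columns used in the
# queries above. This is NOT the real system.jobs / system.job_info schema.
SCHEMA = """
CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, created TEXT);
CREATE TABLE job_info (
    job_id INTEGER, info_key TEXT, written TEXT, value TEXT,
    PRIMARY KEY (job_id, info_key, written)
);
"""

# Shape of the old query: join everything, then filter and limit.
UNCONSTRAINED = """
WITH
latestpayload AS (
    SELECT job_id, value FROM job_info WHERE info_key = 'legacy_payload'
),
latestprogress AS (
    SELECT job_id, value FROM job_info WHERE info_key = 'legacy_progress'
),
all_jobs AS (
    SELECT DISTINCT j.id, j.status, j.created,
           p.value AS payload, g.value AS progress
    FROM jobs AS j
    INNER JOIN latestpayload AS p ON j.id = p.job_id
    LEFT JOIN latestprogress AS g ON j.id = g.job_id
)
SELECT id, payload, status FROM all_jobs
WHERE created < :cutoff AND id > :after
ORDER BY id
LIMIT :page
"""

# Shape of the new query: limit the page of jobs first, drop the progress
# join, and constrain the job_info read by job id.
# The final ORDER BY is added here only to make the comparison deterministic.
CONSTRAINED = """
WITH
latestpayload AS (
    SELECT job_id, value FROM job_info
    WHERE job_id > :after AND info_key = 'legacy_payload'
),
jobpage AS (
    SELECT id, status, created FROM jobs
    WHERE created < :cutoff AND id > :after
    ORDER BY id
    LIMIT :page
)
SELECT DISTINCT j.id, latestpayload.value AS payload, j.status
FROM jobpage AS j
INNER JOIN latestpayload ON j.id = latestpayload.job_id
ORDER BY j.id
"""


def build_db(num_jobs=300):
    """Create an in-memory database with made-up jobs (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for job_id in range(1, num_jobs + 1):
        # Even ids are "old" (before the cutoff), odd ids are "new".
        created = "2023-07-01" if job_id % 2 == 0 else "2023-08-01"
        conn.execute(
            "INSERT INTO jobs VALUES (?, 'succeeded', ?)", (job_id, created)
        )
        for key in ("legacy_payload", "legacy_progress"):
            conn.execute(
                "INSERT INTO job_info VALUES (?, ?, '2023-07-01', ?)",
                (job_id, key, f"{key}-{job_id}"),
            )
    return conn


def fetch_page(conn, query, cutoff, after, page):
    """Run one of the two queries with the same three parameters."""
    params = {"cutoff": cutoff, "after": after, "page": page}
    return conn.execute(query, params).fetchall()


if __name__ == "__main__":
    conn = build_db()
    old = fetch_page(conn, UNCONSTRAINED, "2023-07-20", 100, 100)
    new = fetch_page(conn, CONSTRAINED, "2023-07-20", 100, 100)
    print(len(old), len(new), old == new)
```

In this sketch every job has a legacy_payload row, so the two shapes agree. If a job in the page had no payload row, the second shape would apply the LIMIT before the inner join and could return fewer rows than the page size.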

Release note: None

Epic: none

Release justification: Bug fix for performance issue that could cause job system contention.

github-actions · 2023-09-11T10:03:17Z

Reminder: it has been 3 weeks please merge or close your backport!

yuzefovich · 2024-01-31T03:45:40Z

@stevendanna do we want to merge this?

github-actions · 2024-02-21T10:03:43Z

Reminder: it has been 3 weeks please merge or close your backport!

stevendanna · 2024-05-15T06:22:02Z

@dt We originally held off on merging this one because it was relatively speculative. I don't think this makes much of a dent but it is increasingly looking like anything we can do for 23.1 here we probably should do, so I've rebased this if you want to take a look.

stevendanna · 2024-05-15T06:25:10Z

Although, I'm a bit more bullish about #123848 as it might improve the query plan here as a side-effect.

github-actions · 2024-06-06T10:03:40Z

Reminder: it has been 3 weeks please merge or close your backport!

yuzefovich · 2025-01-27T04:33:35Z

23.1 branch is done

blathers-crl bot requested a review from a team as a code owner August 8, 2023 22:19

blathers-crl bot force-pushed the blathers/backport-release-23.1-108093 branch from daeaec2 to 206eeb1 Compare August 8, 2023 22:19

blathers-crl bot added the blathers-backport and O-robot labels Aug 8, 2023

blathers-crl bot force-pushed the blathers/backport-release-23.1-108093 branch from 8a1be38 to 854fe36 Compare August 8, 2023 22:19

blathers-crl bot assigned stevendanna Aug 8, 2023

blathers-crl bot requested review from adityamaru and dt August 8, 2023 22:19

github-actions bot added the no-backport-pr-activity label Sep 11, 2023

adityamaru approved these changes Sep 12, 2023

github-actions bot removed the no-backport-pr-activity label Jan 31, 2024

github-actions bot added the no-backport-pr-activity label Feb 21, 2024

stevendanna force-pushed the blathers/backport-release-23.1-108093 branch from 854fe36 to cbb28c7 Compare May 15, 2024 06:18

github-actions bot removed the no-backport-pr-activity label May 15, 2024

github-actions bot added the no-backport-pr-activity label Jun 6, 2024

yuzefovich closed this Jan 27, 2025

yuzefovich deleted the blathers/backport-release-23.1-108093 branch January 27, 2025 04:33
