Improve jdbc performance #1012

loicmathieu · 2023-02-23T17:18:39Z

Improve JDBC performance by lowering the min poll loop interval and changing PostgreSQL indexing.

Min poll loop interval is the minimum delay between each task execution. When running a single flow that has a lot of tasks, lowering it reduces the delay between tasks. It is currently configured at 100ms; setting it to 25ms shows a nice performance gain and should still not be so low that the database is solicited too much. See the performance gains below.
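To illustrate why the interval matters for flows with many sequential tasks, here is a minimal toy model in Python. It is not Kestra's implementation: it simply assumes that each finished task is only picked up at the next poll tick, and the function name, the 40-task count and the 5ms of work per task are made up for the example.

```python
def simulated_duration_ms(n_tasks: int, poll_interval_ms: int, work_ms: int) -> int:
    """Toy model: a finished task is only noticed at the next poll tick."""
    clock = 0
    for _ in range(n_tasks):
        clock += work_ms  # the task itself runs
        # round the clock up to the next multiple of the poll interval
        ticks = -(-clock // poll_interval_ms)  # ceiling division
        clock = ticks * poll_interval_ms
    return clock


# Hypothetical workload: 40 sequential tasks, 5ms of real work each.
slow = simulated_duration_ms(40, poll_interval_ms=100, work_ms=5)
fast = simulated_duration_ms(40, poll_interval_ms=25, work_ms=5)
print(slow, fast)
```

In this model the total time is dominated by the polling wait rather than by the work itself, which is why shrinking the interval helps most on flows made of many short tasks.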

On PostgreSQL, the unique index on the queues table misleads the cost optimizer, which picks this index for the poll query, resulting in 10ms poll queries on a table with only a few thousand rows.
Removing the PK and the index that exists on the type and offset columns, and replacing them with a hash index on the offset column, does the trick. There is still an index on the offset column, but it is no longer used by the poll query. This new index is needed for other queries.

Using this flow:

id: slow-flow
namespace: io.kestra.tests

tasks:
  - id: waterconsumption
    type: io.kestra.core.tasks.flows.Template
    namespace: io.kestra.tests
    templateId: for-each-1
  - id: watertemperature
    type: io.kestra.core.tasks.flows.Template
    namespace: io.kestra.tests
    templateId: for-each-2
  - id: internaltemperature
    type: io.kestra.core.tasks.flows.Template
    namespace: io.kestra.tests
    templateId: for-each-3
  - id: internalhumidity
    type: io.kestra.core.tasks.flows.Template
    namespace: io.kestra.tests
    templateId: for-each-4

Which uses this template (all 4 templates are identical):

id: for-each-1
namespace: io.kestra.tests
tasks:
  - id: for-each-1
    type: io.kestra.core.tasks.flows.EachSequential
    tasks:
      - id: each-value-1
        type: io.kestra.core.tasks.debugs.Return
        format: "{{ task.id }} with current value '{{ taskrun.value }}'"
    value: "[\"value 1\", \"value 2\", \"value 3\", \"value 4\", \"value 5\", \"value 6\", \"value 7\", \"value 8\", \"value 9\", \"value 10\"]"

Gives the following results:

Baseline: 22s
Min poll interval from 100ms to 25ms: 15s
Changing PG indexing: 7s

These two optimizations, applied on top of each other, give a roughly 70% reduction in execution time (execution time divided by approximately 3).
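The arithmetic behind these figures can be checked with a short Python sketch. It assumes the 7s measurement was taken with both changes applied, as the summary sentence suggests; the variable and function names are illustrative only.

```python
baseline_s = 22.0        # baseline run
after_poll_s = 15.0      # min poll interval lowered from 100ms to 25ms
after_both_s = 7.0       # assumed: poll interval change plus new PG indexing


def reduction(before: float, after: float) -> float:
    """Fraction of execution time saved going from `before` to `after`."""
    return (before - after) / before


print(f"poll interval only: {reduction(baseline_s, after_poll_s):.0%}")  # 32%
print(f"both changes:       {reduction(baseline_s, after_both_s):.0%}")  # 68%
print(f"speed-up factor:    {baseline_s / after_both_s:.1f}x")           # 3.1x
```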

Although not tested thoroughly on EachParallel, a quick test switching EachSequential to EachParallel also shows a nice improvement.


loicmathieu added 2 commits February 23, 2023 17:39

feat(cli): reduce min poll interval to improve performance of flows w…

cb18c14

…ith a lot of tasks

feat: improve poll queries on PostgreSQL

2274f48

loicmathieu mentioned this pull request Feb 24, 2023

feat(jdbc-mysql): force using the index on poll queries #1016

Merged

tchiotludo merged commit 0d02517 into develop Feb 24, 2023

tchiotludo deleted the improve-jdbc-performance branch February 24, 2023 13:20

yuri1969 added a commit to yuri1969/kestra.io that referenced this pull request Mar 29, 2023

Update min-poll-interval default value

552dea5

The property has been set to 25ms since kestra-io/kestra#1012.

yuri1969 mentioned this pull request Mar 29, 2023

Update the min-poll-interval default value kestra-io/docs#171

Merged

tchiotludo pushed a commit to kestra-io/docs that referenced this pull request Mar 29, 2023

fix(docs): update min-poll-interval default value (#171)

92c4d84

relate to kestra-io/kestra#1012

yuri1969 mentioned this pull request Mar 31, 2023

Improve PostgreSQL JDBC queue poll performance #1121

Closed
