Improve jdbc performance #1012

Merged
merged 2 commits into from
Feb 24, 2023


loicmathieu
Member

Improve JDBC performance by lowering the min poll loop interval and changing PostgreSQL indexing.

The min poll loop interval is the minimum delay between task executions. When running a single flow that has a lot of tasks, lowering it reduces the delay between tasks. It is currently configured at 100ms; setting it to 25ms shows a nice performance gain and should still be high enough that the database is not solicited too much. See the performance gains below.
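To make the effect of the interval concrete, here is a rough back-of-the-envelope sketch. It assumes each sequential hand-off waits at most one poll interval; the function name and the hop count of 40 are hypothetical and only illustrate the scaling, they are not measured from Kestra.

```python
def worst_case_poll_wait_ms(hops: int, min_poll_interval_ms: int) -> int:
    """Upper bound on time spent waiting for the next poll.

    Assumes (hypothetically) that every one of `hops` sequential
    hand-offs waits at most one full poll interval.
    """
    return hops * min_poll_interval_ms


# Hypothetical hop count, for illustration only.
HOPS = 40

before_ms = worst_case_poll_wait_ms(HOPS, 100)  # interval before this PR
after_ms = worst_case_poll_wait_ms(HOPS, 25)    # interval after this PR

print(before_ms, after_ms, before_ms - after_ms)
```

Under this simplified model the waiting time scales linearly with the interval, so going from 100ms to 25ms cuts the polling wait by a factor of 4, whatever the real number of hand-offs is.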

On PostgreSQL, the unique index on the queues table misleads the PostgreSQL cost optimizer, which chooses this index for the poll query, resulting in 10ms poll queries on a table with only a few thousand rows.
Removing the PK and the existing index on the type and offset columns, and replacing them with a hash index on the offset column, does the trick. There is still an index on the offset column, but it is no longer used by the poll query. This new index is needed for other queries.

Using this flow:

id: slow-flow
namespace: io.kestra.tests

tasks:
  - id: waterconsumption
    type: io.kestra.core.tasks.flows.Template
    namespace: io.kestra.tests
    templateId: for-each-1
  - id: watertemperature
    type: io.kestra.core.tasks.flows.Template
    namespace: io.kestra.tests
    templateId: for-each-2
  - id: internaltemperature
    type: io.kestra.core.tasks.flows.Template
    namespace: io.kestra.tests
    templateId: for-each-3
  - id: internalhumidity
    type: io.kestra.core.tasks.flows.Template
    namespace: io.kestra.tests
    templateId: for-each-4

Which uses this template (all 4 templates are the same):

id: for-each-1
namespace: io.kestra.tests
tasks:
  - id: for-each-1
    type: io.kestra.core.tasks.flows.EachSequential
    tasks:
      - id: each-value-1
        type: io.kestra.core.tasks.debugs.Return
        format: "{{ task.id }} with current value '{{ taskrun.value }}'"
    value: "[\"value 1\", \"value 2\", \"value 3\", \"value 4\", \"value 5\", \"value 6\", \"value 7\", \"value 8\", \"value 9\", \"value 10\"]"

Gives the following results:

Baseline: 22s
Min poll interval from 100ms to 25ms: 15s
Changing PG indexing: 7s

Stacked on top of each other, these two optimizations give a roughly 70% execution time reduction (execution time divided by about 3).
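The percentages can be checked from the timings above. This is a small sketch, assuming the 7s figure was measured with both changes applied; the helper name is illustrative.

```python
def reduction_pct(baseline_s: float, new_s: float) -> float:
    """Execution time reduction relative to the baseline, in percent."""
    return (baseline_s - new_s) / baseline_s * 100


baseline_s = 22   # baseline
poll_only_s = 15  # min poll interval lowered from 100ms to 25ms
both_s = 7        # poll interval plus PostgreSQL indexing change (assumed cumulative)

poll_only_pct = round(reduction_pct(baseline_s, poll_only_s), 1)
both_pct = round(reduction_pct(baseline_s, both_s), 1)
speedup = round(baseline_s / both_s, 2)

print(poll_only_pct)  # 31.8
print(both_pct)       # 68.2
print(speedup)        # 3.14
```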

Although this was not tested thoroughly with EachParallel, a quick test switching EachSequential to EachParallel also shows a nice improvement.

@tchiotludo tchiotludo merged commit 0d02517 into develop Feb 24, 2023
@tchiotludo tchiotludo deleted the improve-jdbc-performance branch February 24, 2023 13:20
yuri1969 added a commit to yuri1969/kestra.io that referenced this pull request Mar 29, 2023
The property has been set to 25ms since kestra-io/kestra#1012.
tchiotludo pushed a commit to kestra-io/docs that referenced this pull request Mar 29, 2023