trino query performance #24360

sukhlab · 2024-12-04T00:34:40Z

Hello,
I am testing SQL query performance with Trino on AWS EMR against Apache Iceberg tables in an S3 data lake.

I have a query which joins multiple tables in the data lake and returns approx. 16k rows.
When I run the query the first time, it takes approx. 7 seconds; the second run takes almost 4 seconds.

first run:

2nd run:

I wanted to know if anyone has experienced this kind of behaviour and how to make the query time consistent. I have also noticed that parallelism is not being used to its full extent.

configuration of coordinator and task nodes:
coordinator node (1) : r6g.xlarge 4vcpu, 16gb ram
task nodes (2) : r6g.xlarge 4vcpu, 16gb ram

coordinator configs:
trino config
coordinator=true
node-scheduler.include-coordinator=false
http-server.threads.max=500
sink.max-buffer-size=1GB
query.max-memory=14054MB
query.max-memory-per-node=6549825127B
query.max-history=40
query.min-expire-age=30m
query.client.timeout=30m
query.stage-count-warning-threshold=100
query.max-stage-count=150
http-server.http.port=8889
http-server.log.path=/var/log/trino/http-request.log
http-server.log.max-size=67108864B
http-server.log.max-history=5
log.max-size=268435456B
jmx.rmiregistry.port = 9080
jmx.rmiserver.port = 9081
task.max-worker-threads = 8
task.concurrency = 8

jvm:
-server
-Xmx13099650253
-XX:InitialRAMPercentage=80
-XX:MaxRAMPercentage=80
-XX:G1HeapRegionSize=32M
-XX:+UseG1GC
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow
-XX:ReservedCodeCacheSize=512M
-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000

task node configs:
trino config:
coordinator=false
node-scheduler.include-coordinator=false
http-server.threads.max=500
sink.max-buffer-size=1GB
query.max-memory=14054MB
query.max-memory-per-node=6549825127B
query.max-history=40
query.min-expire-age=30m
query.client.timeout=30m
query.stage-count-warning-threshold=100
query.max-stage-count=150
http-server.http.port=8889
http-server.log.path=/var/log/trino/http-request.log
http-server.log.max-size=67108864B
http-server.log.max-history=5
log.max-size=268435456B
jmx.rmiregistry.port = 9080
jmx.rmiserver.port = 9081
task.max-worker-threads = 8
task.concurrency = 8

jvm config:
-server
-Xmx13099650253
-XX:InitialRAMPercentage=80
-XX:MaxRAMPercentage=80
-XX:G1HeapRegionSize=32M
-XX:+UseG1GC
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow
-XX:ReservedCodeCacheSize=512M
-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000

Am I missing anything here? Also, the same SQL query takes 2 seconds to return data from AWS RDS SQL Server (4 vCPU, 32 GB), but more than double that with Trino on 3 worker nodes (4 vCPU, 16 GB each).

any help much appreciated!

thanks

Answered by hashhar

sukhlab (Author) · 2024-12-04T01:42:46Z

I have tested the same query with more task nodes (added 2 more). I was able to increase the number of splits, from 110 to 220, but the response time follows the same pattern:
the first run takes e.g. 10 seconds, and further runs go down to 4 seconds.
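
To put numbers on the first-run versus later-run gap described above, one option is to time several consecutive runs and report the first one separately. The following is a minimal sketch, not something posted in the thread: `time_runs` and its `run_query` argument are hypothetical names, and the stand-in workload would need to be replaced by a call that actually submits the query to Trino.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(run_query: Callable[[], object], runs: int = 5) -> Dict[str, float]:
    """Time repeated executions and separate the first (cold) run from the rest."""
    if runs < 2:
        raise ValueError("need at least two runs to compare cold and warm timings")
    durations: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()  # hypothetical callable that submits the query and fetches all rows
        durations.append(time.perf_counter() - start)
    warm = durations[1:]  # every run after the first
    return {
        "cold_seconds": durations[0],
        "warm_median_seconds": statistics.median(warm),
        "warm_max_seconds": max(warm),
    }


if __name__ == "__main__":
    # Stand-in workload only; replace with a real query submission.
    print(time_runs(lambda: sum(range(100_000)), runs=3))
```

Comparing `cold_seconds` with `warm_median_seconds` across cluster sizes shows whether adding nodes changes the gap or only the split count.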

hashhar (Collaborator) · 2024-12-06T14:21:23Z

Look at the scheduled time metric and the physical input read time metric - that's basically the amount of time spent doing I/O. In your case it dominates the query time. Check whether you have small files in AWS, or too many partitions (hence slow listing), and so on.

Also, you've configured a lot of settings to non-default values (some even lower than the defaults). Remove those.

Also, an EXPLAIN plan at the minimum (preferably EXPLAIN ANALYZE) for the queries you think are running slow would be helpful. A query JSON from the web UI would be even better.
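
The two checks suggested above can be sketched as follows. This is a hedged illustration rather than anything from the original thread: the helpers only build SQL text and do not connect to Trino, the names `lake`, `sales` and `orders` and the 32 MB small-file cutoff are hypothetical, and the second query assumes the Trino Iceberg connector's `$files` metadata table with its `file_size_in_bytes` column.

```python
# Sketch only: builds diagnostic SQL strings; it does not connect to Trino.
# Catalog/schema/table names and the small-file cutoff are illustrative assumptions.

SMALL_FILE_BYTES = 32 * 1024 * 1024  # hypothetical cutoff for a "small" data file


def explain_analyze(query: str) -> str:
    """Prefix a query with EXPLAIN ANALYZE so Trino runs it and reports per-stage stats."""
    return "EXPLAIN ANALYZE\n" + query.strip().rstrip(";")


def file_size_summary(catalog: str, schema: str, table: str,
                      small_bytes: int = SMALL_FILE_BYTES) -> str:
    """Summarize data-file counts and sizes from the Iceberg "$files" metadata table."""
    files_table = f'{catalog}.{schema}."{table}$files"'
    return (
        "SELECT count(*) AS data_files,\n"
        f"       count_if(file_size_in_bytes < {small_bytes}) AS small_files,\n"
        "       avg(file_size_in_bytes) AS avg_file_bytes\n"
        f"FROM {files_table}"
    )


if __name__ == "__main__":
    print(explain_analyze("SELECT count(*) FROM lake.sales.orders;"))
    print()
    print(file_size_summary("lake", "sales", "orders"))
```

A high `small_files` count relative to `data_files` would point toward the small-files explanation; whether that applies to this cluster is not established in the thread.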

2 replies

sukhlab Dec 9, 2024
Author

Thanks @hashhar for the reply. I will have to investigate the number of files etc. in S3. Is there any other specific setting at the Trino level which you could share? I tried task- and thread-level settings, but nothing helped much in reducing response time. I also tried different node types and added more nodes, but saw similar behaviour.

hashhar Dec 9, 2024
Collaborator

> any other specific setting at trino level which you could share.

There are a LOT of configs which I can share and you can set, but unless you first figure out WHY the query is slow, no amount of setting random configs is going to help improve performance.
