Hello, I have a query that joins multiple tables in a data lake and returns approximately 16k rows. I wanted to know whether anyone has experienced this kind of behaviour and how to make the query time consistent. I have also noticed that parallelism is not being used as well as it could be.

Configuration of the coordinator and task nodes:

Coordinator configs:
JVM:
Task node configs:
JVM config:

Am I missing anything here? Also, the same SQL query returns data in 2 seconds from AWS RDS SQL Server (4 vCPU, 32 GB) but takes more than twice as long on Trino with 3 worker nodes (4 vCPU, 16 GB each). Any help is much appreciated! Thanks
Look at the scheduled time and physical input read time metrics; together they show roughly how much time is spent doing I/O. In your case, I/O dominates the query time. Check whether you have many small files in AWS or too many partitions (which makes listing slow), and so on.
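To put a number on this, here is a minimal sketch that assumes you have saved the query JSON from the web UI and that its `queryStats` object carries `totalScheduledTime` and `physicalInputReadTime` as duration strings such as `1.50s` (these field names are an assumption; check them against the JSON your Trino version produces). It also includes a hypothetical `small_file_fraction` helper for a list of file sizes you have collected yourself; its 32 MiB threshold is an arbitrary cut-off, not a Trino setting.

```python
import json
import re

# Duration units as they appear in Trino's query JSON, e.g. "512.00ms" or "1.50s".
_UNIT_SECONDS = {
    "ns": 1e-9,
    "us": 1e-6,
    "ms": 1e-3,
    "s": 1.0,
    "m": 60.0,
    "h": 3600.0,
    "d": 86400.0,
}
_DURATION_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([a-z]+)\s*$")


def parse_duration(text: str) -> float:
    """Convert a duration string such as '512.00ms' into seconds."""
    match = _DURATION_RE.match(text)
    if match is None or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"unrecognised duration: {text!r}")
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def io_share(query_json: str) -> float:
    """Fraction of total scheduled time spent reading physical input.

    Assumes a top-level "queryStats" object with "totalScheduledTime"
    and "physicalInputReadTime" fields (field names are an assumption).
    """
    stats = json.loads(query_json)["queryStats"]
    scheduled = parse_duration(stats["totalScheduledTime"])
    read = parse_duration(stats["physicalInputReadTime"])
    return read / scheduled if scheduled > 0 else 0.0


def small_file_fraction(file_sizes_bytes, threshold_bytes=32 * 1024 * 1024):
    """Fraction of files smaller than threshold_bytes (hypothetical helper)."""
    sizes = list(file_sizes_bytes)
    if not sizes:
        return 0.0
    return sum(1 for size in sizes if size < threshold_bytes) / len(sizes)
```

For example, `io_share(open("query.json").read())` (the file name is a placeholder) returning a value close to 1.0 would suggest that reading input dominates the scheduled time, and a high `small_file_fraction` would point towards compacting files.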
Also, you've set a lot of properties to non-default values (some even lower than the defaults). Remove those.
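One way to find those overrides is to list every property a `config.properties` file sets explicitly. This is a minimal sketch that only handles simple `key=value` lines and `#`/`!` comments (not the full Java properties syntax), and its keep-list is a hypothetical assumption loosely based on the minimal coordinator example in the Trino deployment docs; it does not cover `jvm.config`.

```python
# Hypothetical keep-list of properties a node usually needs; adjust for your cluster.
ESSENTIAL_KEYS = {
    "coordinator",
    "node-scheduler.include-coordinator",
    "http-server.http.port",
    "discovery.uri",
}


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def overrides_to_review(text: str, keep=ESSENTIAL_KEYS) -> list:
    """Return explicitly set keys outside the keep-list, sorted by name."""
    return sorted(key for key in parse_properties(text) if key not in keep)
```

Every key this returns is a candidate to delete so that Trino falls back to its default.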
Also, an `EXPLAIN` plan at the minimum (preferably `EXPLAIN ANALYZE`) for the queries you think are running slowly would be helpful. A query JSON from the web UI would be even better.
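If you want to capture the `EXPLAIN ANALYZE` output from a script, here is a minimal sketch that works with any DB-API 2.0 connection; the helper names are hypothetical, and it assumes the plan text comes back in the first column of the result rows. Keep in mind that `EXPLAIN ANALYZE` actually executes the query, so it takes at least as long as the query itself.

```python
def explain_analyze_statement(sql: str) -> str:
    """Prefix a query with EXPLAIN ANALYZE, dropping a trailing semicolon."""
    return "EXPLAIN ANALYZE " + sql.strip().rstrip(";").rstrip()


def plan_text(rows) -> str:
    """Join the first column of the result rows into one plan string."""
    return "\n".join(str(row[0]) for row in rows)


def run_explain_analyze(conn, sql: str) -> str:
    """Execute EXPLAIN ANALYZE over a DB-API 2.0 connection and return the plan.

    This runs the query for real, so expect it to take as long as the query.
    """
    cur = conn.cursor()
    cur.execute(explain_analyze_statement(sql))
    return plan_text(cur.fetchall())
```

With the `trino` Python client, the connection could come from something like `trino.dbapi.connect(host="<coordinator-host>", port=8080, user="<user>")`, where the host, port and user are placeholders for your own coordinator.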