Improve query performance for ClickHouse #133

Merged: 1 commit, Sep 4, 2021

ricoberger
Member

Queries that fetch logs from ClickHouse over a large time range should
now be faster. To achieve this, we return at most roughly 10000
documents from ClickHouse. We removed the stats endpoint, which returned
the overall document count and the bucket data, and moved that data into
the documents endpoint. The retrieved buckets are used to adjust the
start time of a query, which is what makes large queries faster. More
information about this approach can be found in the inline comments in
the code.
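
The start-time adjustment can be sketched roughly as follows. This is a minimal illustration in Python, not the code from this pull request: the `Bucket` type, the `adjust_start_time` function, and the rule of walking the buckets from newest to oldest until the limit is reached are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    # Hypothetical shape: start of the bucket's interval (Unix seconds)
    # and the number of documents that fall into it.
    start: int
    count: int


def adjust_start_time(buckets: List[Bucket], start: int, limit: int) -> int:
    """Return a later start time so the query covers roughly `limit` documents.

    Walks the buckets from newest to oldest and stops at the first bucket
    where the running total reaches the limit. Assumes `buckets` is sorted
    by `start` in ascending order.
    """
    total = 0
    for bucket in reversed(buckets):
        total += bucket.count
        if total >= limit:
            # Never move the start time before the one the user asked for.
            return max(start, bucket.start)
    # Fewer documents than the limit: keep the original start time.
    return start


buckets = [Bucket(0, 8000), Bucket(60, 4000), Bucket(120, 5000), Bucket(180, 3000)]
new_start = adjust_start_time(buckets, start=0, limit=10000)
print(new_start)
```

Because whole buckets are included in this sketch, the adjusted range can hold somewhat more documents than the limit, which is one possible reading of "roughly 10000".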

For example, queries to get all logs from the last 7 days, which used to
take up to 3 minutes, now take 10 seconds.

Note: We chose the limit of 10000 because the default limit in Kibana is
500, so 10000 should be large enough. In the future we can also provide
an option in the ClickHouse configuration, or an additional field in the
Options component, to increase this limit.

@ricoberger ricoberger merged commit b6a5ff1 into main Sep 4, 2021
@ricoberger ricoberger deleted the improve-query-performance-for-clickhouse-logs branch September 4, 2021 11:54
ricoberger added a commit that referenced this pull request Sep 14, 2021
Instead of running the count and buckets queries before getting the logs
from ClickHouse, we now run only the buckets query. The count query was
removed because the total count can be determined by summing the counts
of all buckets. This should improve the performance of some queries by
up to 50%.
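
As a minimal sketch of that idea, assuming each bucket is a mapping with a `count` field (the field names and the `total_count` helper are hypothetical and not taken from the commit), the overall count follows directly from the buckets result:

```python
from typing import Dict, List


def total_count(buckets: List[Dict[str, int]]) -> int:
    """Derive the overall document count by summing the per-bucket counts."""
    return sum(bucket["count"] for bucket in buckets)


# Hypothetical result of the buckets query.
example_buckets = [
    {"interval": 0, "count": 120},
    {"interval": 60, "count": 0},
    {"interval": 120, "count": 380},
]
print(total_count(example_buckets))
```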

In #133 we introduced a setting that limits the number of documents
returned by the API, with a limit of 10000 documents. This limit can now
be set by the user via the additional options. If the user doesn't
provide this option, we use a default limit of 1000.
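
The fallback behaviour could look something like the sketch below. The option key `maxDocuments`, the `resolve_max_documents` helper, and the choice to fall back to the default on invalid input are assumptions for illustration; only the default of 1000 comes from the commit message.

```python
from typing import Mapping

# Default taken from the commit message above.
DEFAULT_MAX_DOCUMENTS = 1000


def resolve_max_documents(options: Mapping[str, str]) -> int:
    """Return the user-provided document limit, or the default if absent."""
    raw = options.get("maxDocuments")  # hypothetical option name
    if raw is None or raw == "":
        return DEFAULT_MAX_DOCUMENTS
    try:
        value = int(raw)
    except ValueError:
        # Assumption: non-numeric input falls back to the default.
        return DEFAULT_MAX_DOCUMENTS
    # Assumption: non-positive limits are treated as "not set".
    return value if value > 0 else DEFAULT_MAX_DOCUMENTS


print(resolve_max_documents({}))
print(resolve_max_documents({"maxDocuments": "5000"}))
```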