Massive slowdown when simply including a (large) column in the result #7187

Open
lorenz opened this issue Oct 3, 2019 · 21 comments
Labels
performance, st-hold (We've paused the work on the issue for some reason)

Comments

@lorenz

lorenz commented Oct 3, 2019

Describe the situation
Simply adding a column to the result (even if it is only used for a final lookup) slows the whole query down massively, since the column gets read completely as far as I can tell.

Please note that I'm just evaluating ClickHouse and I might be doing something dumb when specifying the table layout or writing queries.

How to reproduce
Running on a single host with NVMe (read bandwidth ~2 GiB/s) with ClickHouse 19.15.2.2.

Dataset is publicly available: https://blob.dolansoft.org/datasets/boyter-10m-repos/clickhouse-columns.tar (⚠️ ~50 GiB / 3.5B+ rows)

ATTACH TABLE repositories
(
    `id` UInt32 CODEC(DoubleDelta), 
    `source` LowCardinality(String) CODEC(LZ4), 
    `user` String CODEC(LZ4), 
    `name` String CODEC(LZ4)
)
ENGINE = MergeTree
ORDER BY id
SETTINGS index_granularity = 8192

ATTACH TABLE files
(
    `repository_id` UInt32 CODEC(DoubleDelta), 
    `location` String CODEC(ZSTD(1)), 
    `language` LowCardinality(String), 
    `possible_languages` Array(LowCardinality(String)), 
    `bytes` UInt32 CODEC(T64, LZ4), 
    `lines` UInt32 CODEC(T64, LZ4), 
    `code` UInt32 CODEC(T64, LZ4), 
    `comment` UInt32 CODEC(T64, LZ4), 
    `blank` UInt32 CODEC(T64, LZ4), 
    `complexity` UInt32 CODEC(T64, LZ4), 
    `weighted_complexity` UInt32 CODEC(T64, LZ4)
)
ENGINE = MergeTree
ORDER BY repository_id
SETTINGS index_granularity = 8192

Query A (fast):

SELECT 
    repository_id, 
    lines
FROM files
WHERE language = 'Plain Text'
ORDER BY lines DESC
LIMIT 1

┌─repository_id─┬─────lines─┐
│       3919430 │ 347671811 │
└───────────────┴───────────┘

1 rows in set. Elapsed: 3.587 sec. Processed 3.53 billion rows, 4.82 GB (983.87 million rows/s., 1.34 GB/s.) 

Query B (slow):

SELECT 
    repository_id, 
    location, 
    lines
FROM files
WHERE language = 'Plain Text'
ORDER BY lines DESC
LIMIT 1

┌─repository_id─┬─location────────────────┬─────lines─┐
│       3919430 │ data/1366100696temp.txt │ 347671811 │
└───────────────┴─────────────────────────┴───────────┘

1 rows in set. Elapsed: 25.234 sec. Processed 3.53 billion rows, 13.53 GB (139.87 million rows/s., 536.21 MB/s.) 

Expected performance
I would expect both of these queries to take approximately the same time, since ClickHouse could ignore the location column until it has found the single match and then read just that chunk from the (much heavier) location column. Instead it looks like it tries to read the whole location column, which slows the query down around 8×. I've also tried argMax() instead of ORDER BY x LIMIT 1, but it seems to suffer from the same issue.
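
(For illustration, an argMax() formulation of the same lookup would look roughly like the following; the exact query used is not shown above, so treat this as a sketch.)

-- Sketch: argMax()-style variant of query B (illustrative only)
SELECT
    argMax(repository_id, lines) AS repository_id,
    argMax(location, lines) AS location,
    max(lines) AS lines
FROM files
WHERE language = 'Plain Text'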

Originally I also had a join to repositories in there, but that did not change the performance of either query so I've removed it in the interest of a more minimal reproducer.

@den-crane
Contributor

Try replacing WHERE with PREWHERE:

SELECT 
    repository_id, 
    location, 
    lines
FROM files
PREWHERE language = 'Plain Text'
ORDER BY lines DESC
LIMIT 1

@lorenz
Author

lorenz commented Oct 3, 2019

It makes the first query that was already fast around 30% faster, but doesn't do anything for the already slow one.

@den-crane
Contributor

Can you show the output of the slow query with debug info?

set send_logs_level='trace';
...SELECT ...

@lorenz
Author

lorenz commented Oct 3, 2019

Sure:

[green] 2019.10.03 23:37:53.986197 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Debug> executeQuery: (from 127.0.0.1:37212) SELECT repository_id, location, lines FROM files PREWHERE language = 'Plain Text' ORDER BY lines DESC LIMIT 1
[green] 2019.10.03 23:37:53.986966 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Debug> default.files (SelectExecutor): Key condition: unknown
[green] 2019.10.03 23:37:53.986979 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Debug> default.files (SelectExecutor): Selected 1 parts by date, 1 parts by key, 430850 marks to read from 1 ranges
[green] 2019.10.03 23:37:53.987026 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Trace> default.files (SelectExecutor): Reading approx. 3529516251 rows with 24 streams
[green] 2019.10.03 23:37:53.987452 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[green] 2019.10.03 23:37:53.987947 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Debug> executeQuery: Query pipeline:
Limit
 Expression
  MergeSorting
   Union
    PartialSorting × 24
     Expression
      MergeTreeThread

┌─repository_id─┬─location────────────────┬─────lines─┐
│       3919430 │ data/1366100696temp.txt │ 347671811 │
└───────────────┴─────────────────────────┴───────────┘
[green] 2019.10.03 23:38:20.049494 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Trace> UnionBlockInputStream: Waiting for threads to finish
[green] 2019.10.03 23:38:20.049533 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Trace> UnionBlockInputStream: Waited for threads to finish
[green] 2019.10.03 23:38:20.049620 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Information> executeQuery: Read 3529516251 rows, 12.60 GiB in 26.063 sec., 135421628 rows/sec., 495.10 MiB/sec.
[green] 2019.10.03 23:38:20.049634 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Debug> MemoryTracker: Peak memory usage (for query): 375.26 MiB.

1 rows in set. Elapsed: 26.065 sec. Processed 3.53 billion rows, 13.53 GB (135.41 million rows/s., 519.11 MB/s.) 

@den-crane
Contributor

repository_id UInt32 CODEC(DoubleDelta),

This should probably be CODEC(DoubleDelta, LZ4),
because DoubleDelta does not do compression.

It seems the reason is slow decompression of ZSTD(1), and CH decompresses repository_id, location and lines before the ORDER BY ... LIMIT.

Try this for a test (to check my guess):

SELECT 
    repository_id, 
    location, 
    lines
FROM files
PREWHERE repository_id in (
           SELECT repository_id FROM files
           PREWHERE language = 'Plain Text'
           ORDER BY lines DESC
           LIMIT 1
           )
ORDER BY lines DESC
LIMIT 1

@lorenz
Author

lorenz commented Oct 4, 2019

Thanks! Your query is even a bit faster than my first one, but I have to add the language constraint to the outer query, otherwise I sometimes get wrong results.
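
Roughly, the adjusted query looks like this (a sketch; the repeated language filter in the outer query is the addition):

SELECT
    repository_id,
    location,
    lines
FROM files
PREWHERE repository_id IN
(
    SELECT repository_id
    FROM files
    PREWHERE language = 'Plain Text'
    ORDER BY lines DESC
    LIMIT 1
)
WHERE language = 'Plain Text'  -- sketch: outer filter added to avoid the occasional wrong result
ORDER BY lines DESC
LIMIT 1
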
As for the compounding factors:

  • I'm pretty sure that DoubleDelta does in fact compress the data (446 MiB column size for 3.5B rows → ~0.1 bytes per row), but LZ4 does improve further on its compression ratio. I might switch this over to (Delta, LZ4) anyway, but it's still a very small column, so it doesn't matter much.
  • I also don't think that ZSTD is contributing to the slowness, since the CPU is not fully loaded while the disk delivers a bit less than 2 GiB/s at 100% utilization; the column is ~30 GiB compressed and ~175 GiB uncompressed (column sizes checked roughly as in the sketch after this list).
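
(For reference, per-column compressed vs. uncompressed sizes can be checked with a query along these lines against system.columns; a sketch, assuming the files table from above:)

-- Sketch: per-column on-disk vs. uncompressed sizes for the files table
SELECT
    name,
    formatReadableSize(data_compressed_bytes) AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE database = currentDatabase() AND table = 'files'
ORDER BY data_compressed_bytes DESC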

The only question that remains: could ClickHouse do better on this query, or is there something stopping it from doing something similar itself? Obviously it can't do exactly what you've done, since that optimization depends on noticing that the number of files per repository is generally very small, but it could, for example, lazily load the column, or at least not read all chunks from disk when it will not use 99.9% of them anyway. If this doesn't fit the ClickHouse dev team's agenda, I think we can close this; otherwise let's leave it open as a tracking issue for this feature.

@filimonov
Contributor

filimonov commented Oct 7, 2019

I'm pretty sure that DoubleDelta does in fact compress the data

It does.

I might switch this over to a (Delta, LZ4) anyways, but it's still a very small column so it doesn't matter much.

For now I would recommend doing that (you can also consider T64), as DoubleDelta is extremely slow right now. :\
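
(For illustration, the codec switch could look roughly like the sketch below. ALTER ... MODIFY COLUMN only changes the codec for newly written parts; existing parts are recompressed on merges or an explicit OPTIMIZE ... FINAL.)

-- Sketch: switch repository_id to Delta+LZ4 (T64 would be an alternative).
-- New parts use the new codec; old parts are recompressed on merges,
-- or can be forced with OPTIMIZE TABLE files FINAL (expensive on a table this size).
ALTER TABLE files
    MODIFY COLUMN `repository_id` UInt32 CODEC(Delta, LZ4);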

That problem will be addressed in further releases. #7082

Let's use #7082 to continue tracking that issue.

@den-crane
Contributor

@filimonov the issue is not related to CODECs.

create table Xp(id Int64, a UInt8, b String) Engine = MergeTree order by id
settings index_granularity=1;

insert into Xp select number,
        number%103, 
       toString(arrayMap(x->sipHash64(number,x), range(50)))
  from numbers(10000000);

set max_threads=1;

select id, a, b from Xp order by a limit 1;
Elapsed: 18.953 sec. 

select id, a, b from Xp where id = (select id from Xp order by a limit 1);
Elapsed: 3.014 sec.

@filimonov filimonov reopened this Oct 7, 2019
@filimonov
Contributor

OK, reopened. But is it still the same issue as described in the original issue description?

@den-crane
Contributor

Yes, the same issue. It's sort of a feature request for a performance improvement.

@filimonov
Contributor

filimonov commented Oct 7, 2019

It looks quite easy to work around with the option @den-crane proposed, or (for even better speed) with argMin, i.e. something like

SELECT  id, a, b FROM Xp
WHERE id IN ( SELECT argMin(id, a) FROM Xp )
ORDER BY a ASC
LIMIT 1

But would such an optimisation be good "in general"?
For example, imagine that the id extracted by the subquery can match (let's say) 80% of the rows in the original table, or will return NULL:

select id, a, b from Xp where id = (select id from Xp order by a limit 1);

@lorenz
Author

lorenz commented Oct 7, 2019

It's not about that specific optimization (as I said above) but about ClickHouse reading whole columns when it only needs a tiny number of values. The workaround is application-specific, but the solution would actually be generalizable.

@alexey-milovidov
Member

alexey-milovidov commented Jan 25, 2020

@lorenz It is wonderful that we have this dataset publicly available!
Maybe we can use it in our performance tests and examples...

Is it OK if we copy and redistribute this dataset from our servers?

@alexey-milovidov
Member

We have similar optimization in mind: #5329

We will implement this optimization but it will be disabled by default, because we don't use the same table snapshot to process subqueries.

@alexey-milovidov alexey-milovidov added the st-hold We've paused the work on issue for some reason label Jan 26, 2020
@alexey-milovidov
Member

@lorenz Could you please describe this dataset?
And why don't I see ClickHouse inside?

@lorenz
Author

lorenz commented Jan 26, 2020

@alexey-milovidov The dataset is for directly attaching to ClickHouse. You should be able to extract it to /var/lib/clickhouse/data/$some_db/ and then just use the attach commands in the first post.

The data was not collected by me; it comes from here. I just asked the author for the raw data and then imported it into ClickHouse for testing and distribution (the original dataset is huge, around 1 TiB, since it is all JSON). From my side, feel free to do whatever you want with the data. The cluster hosting it has a few Gbps of unmetered bandwidth, so you don't need to be concerned about bandwidth on my side. I also host the raw data at https://blob.dolansoft.org/datasets/boyter-10m-repos/raw.tar.gz

EDIT: If you want to double-check some analyses against the original blog post, be aware that some of them are wrong because the author's code has floating point accuracy issues. I've reported these, but he didn't update the post.

@alexey-milovidov
Member

Yes, I have loaded the data, but it shows strange numbers.
Also, the dataset doesn't contain our repository, probably because it is from 2015.

@hanleijun

> (quotes the original issue description from above)
We encountered the same issue; changing WHERE to PREWHERE worked.

et added a commit to highlight/highlight that referenced this issue Mar 24, 2023
## Summary


This PR lowers the number of logs loaded from 100 -> 50. There are a couple of reasons for this:

* Loading large amounts of data (which we see with `LogAttributes`) can
cause ClickHouse to slow down significantly (see
ClickHouse/ClickHouse#7187)
* If there are very few matching results (e.g. find me logs with a log
body of "STRIPE_INTEGRATION_ERROR"), we want to terminate as early as
possible
* Loading less data results in a smaller payload size.
* Realistically, only about ~20-25 logs fit on the screen (dependent
on screen size) given our UX. This reason is less important than the
previous points, but it's worth mentioning that this change shouldn't break
our UX.

## How did you test this change?


Verified that logs still load and pagination works as expected.

## Are there any deployment considerations?


N/A
@rokukapsch

Hi!
We have now encountered the same issue when updating from 22.8 LTS (22.8.16.32) to 23.3 LTS (23.3.2.37).
Adding PREWHERE solved the issue, but it took us rather long to identify it.
Was there some feature reverted, or why was this working properly in 22.8?
Thanks and BR, Robert

@cpiotr

cpiotr commented Aug 24, 2023

Hi,
I suspect we're hitting a similar (same?) problem in our use case.

We have a table with a set of small columns and one column containing binary blobs:

CREATE TABLE TypedData
(
    `id` Int64,
    `parentId` Int64,
    `subParentId` Int64,
    `version` Int64,
    `type` LowCardinality(String),
    `modified` Int64,
    `created` Int64,
    `binaryData` String,
    `toBeDeleted` Int8
)
ENGINE = ReplacingMergeTree(version)
ORDER BY (parentId, subParentId, id)
SETTINGS index_granularity = 8192

When a query is executed to select the binary blob, it takes a long time even if it returns no results:

SELECT binaryData
FROM TypedData FINAL
WHERE (modified >= 1692873657000000000)
    AND (parentId IN (1, 2, 3, 4, 5, 100, 1000, 456, 567))
    AND (type IS NOT NULL AND type != 'SampleType1')
    AND (toBeDeleted = 0)

Ok.
0 rows in set. Elapsed: 1.904 sec. Processed 7.24 million rows, 7.01 GB (3.80 million rows/s., 3.68 GB/s.)

It takes significantly longer than when selecting one of the smaller fields:

SELECT id
FROM TypedData FINAL
WHERE (modified >= 1692873657000000000)
    AND (parentId IN (1, 2, 3, 4, 5, 100, 1000, 456, 567))
    AND (type IS NOT NULL AND type != 'SampleType1')
    AND (toBeDeleted = 0)

Ok.
0 rows in set. Elapsed: 0.207 sec. Processed 7.24 million rows, 304.22 MB (34.93 million rows/s., 1.47 GB/s.)

Since we're relying on the FINAL state of records, we cannot use PREWHERE.
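
(For illustration, a two-pass variant in the spirit of the workaround suggested earlier in this thread might look like the sketch below; it is untested against this schema, and whether it actually avoids reading most of binaryData under FINAL is an assumption.)

-- Sketch only: find matching ids on the small columns first,
-- then fetch binaryData just for those rows.
SELECT binaryData
FROM TypedData FINAL
WHERE id IN
(
    SELECT id
    FROM TypedData FINAL
    WHERE (modified >= 1692873657000000000)
        AND (parentId IN (1, 2, 3, 4, 5, 100, 1000, 456, 567))
        AND (type IS NOT NULL AND type != 'SampleType1')
        AND (toBeDeleted = 0)
)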

Are there any plans to address this issue? I can see that it is currently on hold.

@den-crane
Contributor

@cpiotr your issue will be addressed by #45868
