Massive slowdown when simply including a (large) column in the result #7187

Open
lorenz opened this issue Oct 3, 2019 · 21 comments
Labels
performance, st-hold (We've paused the work on the issue for some reason)

Comments

@lorenz

lorenz commented Oct 3, 2019

Describe the situation
Simply adding a column to the result (even if it is only used for a final lookup) slows the whole query down massively, since the column gets read completely as far as I can tell.

Please note that I'm just evaluating ClickHouse and I might be doing something dumb when specifying the table layout or writing queries.

How to reproduce
Running on a single host with NVMe (read bandwidth ~2 GiB/s) with ClickHouse 19.15.2.2.

Dataset is publicly available: https://blob.dolansoft.org/datasets/boyter-10m-repos/clickhouse-columns.tar (⚠️ ~50 GiB / 3.5B+ rows)

ATTACH TABLE repositories
(
    `id` UInt32 CODEC(DoubleDelta), 
    `source` LowCardinality(String) CODEC(LZ4), 
    `user` String CODEC(LZ4), 
    `name` String CODEC(LZ4)
)
ENGINE = MergeTree
ORDER BY id
SETTINGS index_granularity = 8192

ATTACH TABLE files
(
    `repository_id` UInt32 CODEC(DoubleDelta), 
    `location` String CODEC(ZSTD(1)), 
    `language` LowCardinality(String), 
    `possible_languages` Array(LowCardinality(String)), 
    `bytes` UInt32 CODEC(T64, LZ4), 
    `lines` UInt32 CODEC(T64, LZ4), 
    `code` UInt32 CODEC(T64, LZ4), 
    `comment` UInt32 CODEC(T64, LZ4), 
    `blank` UInt32 CODEC(T64, LZ4), 
    `complexity` UInt32 CODEC(T64, LZ4), 
    `weighted_complexity` UInt32 CODEC(T64, LZ4)
)
ENGINE = MergeTree
ORDER BY repository_id
SETTINGS index_granularity = 8192

Query A (fast):

SELECT 
    repository_id, 
    lines
FROM files
WHERE language = 'Plain Text'
ORDER BY lines DESC
LIMIT 1

┌─repository_id─┬─────lines─┐
│       3919430 │ 347671811 │
└───────────────┴───────────┘

1 rows in set. Elapsed: 3.587 sec. Processed 3.53 billion rows, 4.82 GB (983.87 million rows/s., 1.34 GB/s.) 

Query B (slow):

SELECT 
    repository_id, 
    location, 
    lines
FROM files
WHERE language = 'Plain Text'
ORDER BY lines DESC
LIMIT 1

┌─repository_id─┬─location────────────────┬─────lines─┐
│       3919430 │ data/1366100696temp.txt │ 347671811 │
└───────────────┴─────────────────────────┴───────────┘

1 rows in set. Elapsed: 25.234 sec. Processed 3.53 billion rows, 13.53 GB (139.87 million rows/s., 536.21 MB/s.) 

Expected performance
I would expect both of these queries to take approximately the same time, since ClickHouse could ignore the location column until it has found the single match and then read just that chunk from the (much heavier) location column. Instead it looks like it tries to read the whole location column, which slows the query down around 8×. I've also tried argMax() instead of ORDER BY x LIMIT 1, but it seems to suffer from the same issue.
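
(For illustration, an argMax() formulation of the same lookup would look roughly like the following; the exact query used is not shown above, so treat this as a sketch.)

-- Sketch: argMax()-style variant of query B (illustrative only)
SELECT
    argMax(repository_id, lines) AS repository_id,
    argMax(location, lines) AS location,
    max(lines) AS lines
FROM files
WHERE language = 'Plain Text'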

Originally I also had a join to repositories in there, but that did not change the performance of either query so I've removed it in the interest of a more minimal reproducer.

@den-crane
Contributor

Try replacing WHERE with PREWHERE:

SELECT 
    repository_id, 
    location, 
    lines
FROM files
PREWHERE language = 'Plain Text'
ORDER BY lines DESC
LIMIT 1

@lorenz
Author

lorenz commented Oct 3, 2019

It makes the first query that was already fast around 30% faster, but doesn't do anything for the already slow one.

@den-crane
Contributor

Can you show the output of the slow query with debug info?

set send_logs_level='trace';
...SELECT ...

@lorenz
Author

lorenz commented Oct 3, 2019

Sure:

[green] 2019.10.03 23:37:53.986197 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Debug> executeQuery: (from 127.0.0.1:37212) SELECT repository_id, location, lines FROM files PREWHERE language = 'Plain Text' ORDER BY lines DESC LIMIT 1
[green] 2019.10.03 23:37:53.986966 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Debug> default.files (SelectExecutor): Key condition: unknown
[green] 2019.10.03 23:37:53.986979 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Debug> default.files (SelectExecutor): Selected 1 parts by date, 1 parts by key, 430850 marks to read from 1 ranges
[green] 2019.10.03 23:37:53.987026 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Trace> default.files (SelectExecutor): Reading approx. 3529516251 rows with 24 streams
[green] 2019.10.03 23:37:53.987452 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[green] 2019.10.03 23:37:53.987947 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Debug> executeQuery: Query pipeline:
Limit
 Expression
  MergeSorting
   Union
    PartialSorting × 24
     Expression
      MergeTreeThread

┌─repository_id─┬─location────────────────┬─────lines─┐
│       3919430 │ data/1366100696temp.txt │ 347671811 │
└───────────────┴─────────────────────────┴───────────┘
[green] 2019.10.03 23:38:20.049494 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Trace> UnionBlockInputStream: Waiting for threads to finish
[green] 2019.10.03 23:38:20.049533 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Trace> UnionBlockInputStream: Waited for threads to finish
[green] 2019.10.03 23:38:20.049620 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Information> executeQuery: Read 3529516251 rows, 12.60 GiB in 26.063 sec., 135421628 rows/sec., 495.10 MiB/sec.
[green] 2019.10.03 23:38:20.049634 {cd510939-ded6-48f7-9f22-8198d3a6193c} [ 70 ] <Debug> MemoryTracker: Peak memory usage (for query): 375.26 MiB.

1 rows in set. Elapsed: 26.065 sec. Processed 3.53 billion rows, 13.53 GB (135.41 million rows/s., 519.11 MB/s.) 

@den-crane
Contributor

repository_id UInt32 CODEC(DoubleDelta),

This should probably be CODEC(DoubleDelta, LZ4),
because DoubleDelta does not do compression.

It seems the reason is slow decompression of ZSTD(1), and CH decompresses repository_id, location and lines before the ORDER BY ... LIMIT.

Try this for a test (to check my guess):

SELECT 
    repository_id, 
    location, 
    lines
FROM files
PREWHERE repository_id in (
           SELECT repository_id FROM files
           PREWHERE language = 'Plain Text'
           ORDER BY lines DESC
           LIMIT 1
           )
ORDER BY lines DESC
LIMIT 1

@lorenz
Author

lorenz commented Oct 4, 2019

Thanks! Your query is even a bit faster than my first one, but I have to add the language constraint to the outer query, otherwise I sometimes get wrong results.
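
Roughly, the adjusted query looks like this (a sketch; the repeated language filter in the outer query is the addition):

SELECT
    repository_id,
    location,
    lines
FROM files
PREWHERE repository_id IN
(
    SELECT repository_id
    FROM files
    PREWHERE language = 'Plain Text'
    ORDER BY lines DESC
    LIMIT 1
)
WHERE language = 'Plain Text'  -- sketch: outer filter added to avoid the occasional wrong result
ORDER BY lines DESC
LIMIT 1
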
As for the compounding factors:

  • I'm pretty sure that DoubleDelta does in fact compress the data (446 MiB column size for 3.5B rows → ~0.1 bytes per row), but LZ4 does improve further on its compression ratio. I might switch this over to (Delta, LZ4) anyway, but it's still a very small column, so it doesn't matter much.
  • I also don't think that ZSTD is contributing to the slowness, since the CPU is not fully loaded while the disk delivers a bit less than 2 GiB/s at 100% utilization; the column is ~30 GiB compressed and ~175 GiB uncompressed (column sizes checked roughly as in the sketch after this list).
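
(For reference, per-column compressed vs. uncompressed sizes can be checked with a query along these lines against system.columns; a sketch, assuming the files table from above:)

-- Sketch: per-column on-disk vs. uncompressed sizes for the files table
SELECT
    name,
    formatReadableSize(data_compressed_bytes) AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE database = currentDatabase() AND table = 'files'
ORDER BY data_compressed_bytes DESC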

The only question that remains: could ClickHouse do better on this query, or is there something stopping it from doing something similar itself? Obviously it can't do exactly what you've done, since that optimization depends on noticing that the number of files per repository is generally very small, but it could, for example, lazily load the column, or at least not read all chunks from disk when it will not use 99.9% of them anyway. If this doesn't fit the ClickHouse dev team's agenda, I think we can close this; otherwise let's leave it open as a tracking issue for this feature.

@filimonov
Contributor

filimonov commented Oct 7, 2019

I'm pretty sure that DoubleDelta does in fact compress the data

It does.

I might switch this over to a (Delta, LZ4) anyways, but it's still a very small column so it doesn't matter much.

For now I would recommend doing that (you can also consider T64), as DoubleDelta is extremely slow right now. :\
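
(For illustration, the codec switch could look roughly like the sketch below. ALTER ... MODIFY COLUMN only changes the codec for newly written parts; existing parts are recompressed on merges or an explicit OPTIMIZE ... FINAL.)

-- Sketch: switch repository_id to Delta+LZ4 (T64 would be an alternative).
-- New parts use the new codec; old parts are recompressed on merges,
-- or can be forced with OPTIMIZE TABLE files FINAL (expensive on a table this size).
ALTER TABLE files
    MODIFY COLUMN `repository_id` UInt32 CODEC(Delta, LZ4);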

That problem will be addressed in further releases. #7082

Let's use #7082 to continue tracking that issue.

@den-crane
Contributor

@filimonov the issue is not related to CODECs.

create table Xp(id Int64, a UInt8, b String) Engine = MergeTree order by id
settings index_granularity=1;

insert into Xp select number,
        number%103, 
       toString(arrayMap(x->sipHash64(number,x), range(50)))
  from numbers(10000000);

set max_threads=1;

select id, a, b from Xp order by a limit 1;
Elapsed: 18.953 sec. 

select id, a, b from Xp where id = (select id from Xp order by a limit 1);
Elapsed: 3.014 sec.

@filimonov filimonov reopened this Oct 7, 2019
@filimonov
Contributor

OK, reopened. But is it still the same issue as described in the original issue description?

@den-crane
Contributor

Yes, the same issue. It's sort of a feature request for a performance improvement.

@filimonov
Contributor

filimonov commented Oct 7, 2019

It looks quite easy to work around with the option @den-crane proposed, or (for even better speed) with argMin, i.e. something like

SELECT  id, a, b FROM Xp
WHERE id IN ( SELECT argMin(id, a) FROM Xp )
ORDER BY a ASC
LIMIT 1

But would such an optimisation be good "in general"?
For example, imagine that the id extracted by the subquery can match (let's say) 80% of the rows in the original table, or will return NULL:

select id, a, b from Xp where id = (select id from Xp order by a limit 1);

@lorenz
Author

lorenz commented Oct 7, 2019

It's not about that specific optimization (as I said above) but about ClickHouse reading whole columns when it only needs a tiny number of values. The workaround is application-specific, but the solution would actually be generalizable.

@alexey-milovidov
Member

alexey-milovidov commented Jan 25, 2020

@lorenz It is wonderful that we have this dataset publicly available!
Maybe we can use it in our performance tests and examples...

Is it OK if we copy and redistribute this dataset from our servers?

@alexey-milovidov
Member

We have similar optimization in mind: #5329

We will implement this optimization but it will be disabled by default, because we don't use the same table snapshot to process subqueries.

@alexey-milovidov alexey-milovidov added the st-hold We've paused the work on issue for some reason label Jan 26, 2020
@alexey-milovidov
Member

@lorenz Could you please describe this dataset?
And why don't I see ClickHouse inside?

@lorenz
Author

lorenz commented Jan 26, 2020

@alexey-milovidov The dataset is for directly attaching to ClickHouse. You should be able to extract it to /var/lib/clickhouse/data/$some_db/ and then just use the attach commands in the first post.

The data was not collected by me; it comes from here. I just asked the author for the raw data and then imported it into ClickHouse for testing and distribution (the original dataset is huge, around 1 TiB, since it is all JSON). From my side, feel free to do whatever you want with the data. The cluster hosting it has a few Gbps of unmetered bandwidth, so you don't need to be concerned about bandwidth on my side. I also host the raw data at https://blob.dolansoft.org/datasets/boyter-10m-repos/raw.tar.gz

EDIT: If you want to double-check some analyses against the original blog post, be aware that some of them are wrong because the author's code has floating point accuracy issues. I've reported these, but he didn't update the post.

@alexey-milovidov
Member

Yes, I have loaded the data, but it shows strange numbers.
Also, the dataset doesn't contain our repository, probably because it is from 2015.

@hanleijun

> (quotes the original issue description from above)
We encountered the same issue; changing WHERE to PREWHERE worked.

et added a commit to highlight/highlight that referenced this issue Mar 24, 2023
## Summary


This PR lowers the number of logs loaded from 100 -> 50. There are a couple of reasons for this:

* Loading large amounts of data (which we see with `LogAttributes`) can
cause ClickHouse to slow down significantly (see
ClickHouse/ClickHouse#7187)
* If there are very few matching results (e.g. find me logs with a log
body of "STRIPE_INTEGRATION_ERROR"), we want to terminate as early as
possible
* Loading less data results in a smaller payload size.
* Realistically, only about ~20-25 logs fit on the screen (dependent
on screen size) given our UX. This reason is less important than the
previous points, but it's worth mentioning that this change shouldn't break
our UX.

## How did you test this change?


Verified that logs still load and pagination works as expected.

## Are there any deployment considerations?


N/A
@rokukapsch

Hi!
We have now encountered the same issue when updating from 22.8 LTS (22.8.16.32) to 23.3 LTS (23.3.2.37).
Adding PREWHERE solved the issue, but it took us rather long to identify it.
Was there some feature reverted, or why was this working properly in 22.8?
Thanks and BR, Robert

@cpiotr

cpiotr commented Aug 24, 2023

Hi,
I suspect we're hitting a similar (same?) problem in our use case.

We have a table with a set of small columns and one column containing binary blobs:

CREATE TABLE TypedData
(
    `id` Int64,
    `parentId` Int64,
    `subParentId` Int64,
    `version` Int64,
    `type` LowCardinality(String),
    `modified` Int64,
    `created` Int64,
    `binaryData` String,
    `toBeDeleted` Int8
)
ENGINE = ReplacingMergeTree(version)
ORDER BY (parentId, subParentId, id)
SETTINGS index_granularity = 8192

When a query is executed to select the binary blob, it takes a long time even if it returns no results:

SELECT binaryData
FROM TypedData FINAL
WHERE (modified >= 1692873657000000000)
    AND (parentId IN (1, 2, 3, 4, 5, 100, 1000, 456, 567))
    AND (type IS NOT NULL AND type != 'SampleType1')
    AND (toBeDeleted = 0)

Ok.
0 rows in set. Elapsed: 1.904 sec. Processed 7.24 million rows, 7.01 GB (3.80 million rows/s., 3.68 GB/s.)

It takes significantly longer than when selecting one of the smaller fields:

SELECT id
FROM TypedData FINAL
WHERE (modified >= 1692873657000000000)
    AND (parentId IN (1, 2, 3, 4, 5, 100, 1000, 456, 567))
    AND (type IS NOT NULL AND type != 'SampleType1')
    AND (toBeDeleted = 0)

Ok.
0 rows in set. Elapsed: 0.207 sec. Processed 7.24 million rows, 304.22 MB (34.93 million rows/s., 1.47 GB/s.)

Since we're relying on the FINAL state of records, we cannot use PREWHERE.
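
(For illustration, a two-pass variant in the spirit of the workaround suggested earlier in this thread might look like the sketch below; it is untested against this schema, and whether it actually avoids reading most of binaryData under FINAL is an assumption.)

-- Sketch only: find matching ids on the small columns first,
-- then fetch binaryData just for those rows.
SELECT binaryData
FROM TypedData FINAL
WHERE id IN
(
    SELECT id
    FROM TypedData FINAL
    WHERE (modified >= 1692873657000000000)
        AND (parentId IN (1, 2, 3, 4, 5, 100, 1000, 456, 567))
        AND (type IS NOT NULL AND type != 'SampleType1')
        AND (toBeDeleted = 0)
)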

Are there any plans to address this issue? I can see that it is currently on hold.

@den-crane
Contributor

@cpiotr your issue will be addressed by #45868
