feat: table data cache for object storage #9772
to avoid naming collision of `Setting` (with `common-settings::Settings`)
@lichuang, need your help :) please take a look at the following methods; I am not sure whether they are implemented correctly (or concisely).
…om_index_caches -> enable_table_bloom_index_cache
rename _caches -> _cache
Great, cache coming.
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
table raw data cache
which caches the raw (compressed) column data of data blocks. currently, only disk-based cache storage is supported.
by default, it is disabled. to enable it, set `table_data_cache_enabled` to `true` in the query config file (or the corresponding env var / command-line arg); `table_disk_cache_max_size` and `table_disk_cache_root` control the cache capacity and location.

metrics: `cache_table_data_access_count`, `cache_table_data_hit_count`, `cache_table_data_miss_count`

note that even if `table_data_cache_enabled` is set to true, the disk cache will NOT take effect if the storage type is set to `fs`, since caching block data of a local fs on the local disk is usually not what we want. also, the cache will NOT be populated during data ingestion.
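as a concrete sketch, the disk cache settings above might look like this in a toml config file. the values and the section placement are illustrative assumptions only; consult your databend-query config layout for where these keys actually live:

```toml
# illustrative values; keys are the ones introduced by this PR,
# but where they sit in the config file is an assumption
table_data_cache_enabled = true
table_disk_cache_root = "/var/lib/databend/table_data_cache"  # hypothetical path
table_disk_cache_max_size = 21474836480                       # 20 GB, in bytes
table_data_cache_population_queue_size = 65535
```

the same settings can presumably also be supplied via the corresponding env vars or command-line args, as noted above.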
table data in-memory cache (experimental feature)
which caches the deserialized column objects of data blocks.
by default, it is disabled. to enable it, set `table_data_cache_in_memory_max_size` to some non-zero value. please use it with caution: the deserialized column objects may take lots of memory. enable it only if the query nodes have plenty of memory, the working set fits into it, and the data access pattern benefits from caching.
non-backward-compatible config change:
several configuration entries are obsoleted.
during databend-query startup, if any obsoleted configuration entry is used (command-line opt, env, or toml config file), the related migration suggestions will be shown before the process quits.
some implementation details
bring back @PsiACE 's `DiskCache` mod.

`TableDataCache` consists of a `LruDiskCache` and a cache population worker.
- for `get` operations, `LruDiskCache` is used directly.
- for `put` operations, `LruDiskCache` is checked first (without accessing the disk); on a cache miss, the item is put into a bounded queue, or dropped if the queue is full. in a dedicated thread, the cache population worker takes items from the bounded queue, persists them to disk, and populates the cache.

the setting `table_data_cache_population_queue_size` controls the max capacity of the bounded queue.

metrics: `cache_table_data_population_pending_count` shows the number of items pending in the bounded queue; `cache_table_data_population_overflow_count` shows the number of items that have been dropped.

`ColumnArrayCache`
an LRU in-memory object cache, which caches `Box<dyn Array>`. ideally, caching `BlockEntry` is preferred, but that needs some further tweaks of the `DataBlock` structure (not only taking owned BlockEntrys, but also shared ownership of BlockEntrys).

`BlockReader::merge_io_read`
integrated with TableDataCache and ColumnArrayCache.
Performance evaluation
ClickBench
1. table data cache enabled vs main branch default setting
Databend (pr, in_memory_data_cache on)
this pr; the in-memory data cache is enabled, the disk-based table data cache is disabled
metrics:
cache_table_data_column_array_hit_count 134241
cache_table_data_column_array_access_count 389649
cache_table_data_column_array_miss_count 255408
hit rate: 134241 / 389649 ≈ 34%
memory:
after `run.sh` ended, `top` showed that the process RES was 18.9g
note that although `table_data_cache_in_memory_max_size = 5368709120` (5G) is used in this scenario, the cached object is currently measured by the uncompressed byte size of the column.
Databend (pr, disk_data_cache on)
this pr; the disk-based table data cache is enabled (`table_disk_cache_max_size` set to 20Gb, population queue size set to 65535), the in-memory data cache is disabled.
metrics:
cache_table_data_hit_count 310269
cache_table_data_access_count 389649
cache_table_data_miss_count 79380
hit rate: 310269 / 389649 = 79.62%
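the counters from the two runs above can be cross-checked with a few lines of arithmetic (misses = accesses - hits, hit rate = hits / accesses):

```rust
fn main() {
    // counters copied from the metrics above
    let accesses: u64 = 389_649;
    let (in_mem_hits, in_mem_misses): (u64, u64) = (134_241, 255_408);
    let (disk_hits, disk_misses): (u64, u64) = (310_269, 79_380);

    // hits + misses add back up to the access count in both runs
    assert_eq!(in_mem_hits + in_mem_misses, accesses);
    assert_eq!(disk_hits + disk_misses, accesses);

    println!("in-memory hit rate: {:.2}%", 100.0 * in_mem_hits as f64 / accesses as f64);
    println!("disk hit rate:      {:.2}%", 100.0 * disk_hits as f64 / accesses as f64);
}
```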
note that this ec2 machine's memory is large enough to buffer all the cache files. although `run.sh` drops os caches before the first run, the subsequent 2 runs are likely to read from the os page cache without hitting the disk (with the exception that some queries' execution involves non-deterministic partition accesses).
Databend (main, 1st round)
main branch (commit 9078cfd), default config
2. table data cache disabled vs main branch
main branch, default config, 3 rounds
both disk and in-memory caches are disabled (default setting of this pr), 3 rounds
it shows that, with the table data cache disabled, the performance is on par with the main branch
the raw data:
pr.html.gz
misc
to be improved
- caching `BlockEntry` is preferred, and cached items should be measured by the "heap size" of the object, not the uncompressed byte size
- cluster mode: eval the cache performance in cluster mode; verify that, with `PartitionsShuffleKind::Mod`, the distribution of the table data cache is nearly balanced among the nodes of the cluster
- crc checksum, sip2-4 128 hash as the cache key, two-level cache file dir layout, etc.
- by default, only enabled for async reading (since sync reading almost always indicates we are working on local fs storage)
- a bounded queue of pending cache items, with a dedicated thread that writes the cache to disk
- caching `dyn Array` objects; ideally, `BlockEntry` should be cached, but it seems `DataBlock` or `BlockEntry` needs further tweaks
- some initial perf evaluation has been done, to be detailed
Closes #issue