feat: table data cache for object storage #9772

Merged (91 commits) on Feb 16, 2023

dantengsky (Member) commented Jan 29, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

  • table raw data cache

    caches the raw (compressed) column data of data blocks. currently, only disk-based cache storage is supported.

    it is disabled by default. to enable it:

    • set table_data_cache_enabled to true in the query config file (or corresponding env var, command line arg)
    • adjust table_disk_cache_max_size , table_disk_cache_root

    metrics: cache_table_data_access_count, cache_table_data_hit_count, cache_table_data_miss_count

    note that even if table_data_cache_enabled is set to true, the disk cache will NOT take effect when the storage type is set to fs, since caching block data of the local fs on the local disk is usually not what we want.

    cache will NOT be populated during data ingestion.

  • table data in-memory cache (experimental feature)

    caches the deserialized column objects of a data block.

    it is disabled by default. to enable it:

    • set table_data_cache_in_memory_max_size to some non-zero value

    please use it with caution: the deserialized column objects may take lots of memory. enable it only if the query nodes have plenty of memory, the working set fits into it, and the data access pattern benefits from caching.
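
the enabling rules above can be sketched as two small predicates. this is only an illustration: the `CacheSettings` struct and both function names are hypothetical, the field names simply mirror the setting names in this description, and "s3" is used as an example of a non-fs storage type. it also assumes the in-memory cache does not depend on the storage type, which is not stated here.

```rust
/// Hypothetical, simplified view of the settings named in this description.
/// The struct is an assumption made for illustration only.
struct CacheSettings {
    table_data_cache_enabled: bool,
    table_data_cache_in_memory_max_size: u64,
    /// Storage type of the query node, e.g. "fs" or "s3".
    storage_type: String,
}

/// The disk cache takes effect only if it is enabled AND storage is not local fs.
fn disk_cache_in_effect(s: &CacheSettings) -> bool {
    s.table_data_cache_enabled && s.storage_type != "fs"
}

/// The in-memory cache is on whenever its max size is non-zero.
fn in_memory_cache_in_effect(s: &CacheSettings) -> bool {
    s.table_data_cache_in_memory_max_size != 0
}
```

with `table_data_cache_enabled = true` and storage type fs, `disk_cache_in_effect` returns false, which matches the note above.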

non-backward compatible config change:

several configuration entries are now obsolete.

during databend-query startup, if any obsolete configuration entry is used (command-line option, env var, or toml config file), the related migration suggestions are printed and the process exits, like this:

--------------------------------------------------------------
 *** table-disk-cache-mb-size *** is obsoleted : 
 --------------------------------------------------------------
   alternative command-line options : cache-disk-max-bytes
   alternative environment variable : CACHE_DISK_MAX_BYTES
            alternative toml config : 
                    [cache]
                    ...
                    data_cache_storage = "disk"
                    ...
                    [cache.disk]
                    max_bytes = [MAX_BYTES]  
                    ...
                  
 --------------------------------------------------------------


 --------------------------------------------------------------
 *** table-meta-cache-enabled *** is obsoleted : 
 --------------------------------------------------------------
   alternative command-line options : cache-enable-table-meta-caches
   alternative environment variable : CACHE_ENABLE_TABLE_META_CACHE
            alternative toml config : 
                    [cache]
                    table-meta-cache-enabled=[true|false]
                  
 --------------------------------------------------------------


 --------------------------------------------------------------
 *** table-cache-block-meta-count *** is obsoleted : 
 --------------------------------------------------------------
   alternative command-line options : N/A
   alternative environment variable : N/A
            alternative toml config : N/A
 --------------------------------------------------------------


 --------------------------------------------------------------
 *** table-memory-cache-mb-size *** is obsoleted : 
 --------------------------------------------------------------
   alternative command-line options : N/A
   alternative environment variable : N/A
            alternative toml config : N/A
 --------------------------------------------------------------


 --------------------------------------------------------------
 *** table-disk-cache-root *** is obsoleted : 
 --------------------------------------------------------------
   alternative command-line options : cache-disk-path
   alternative environment variable : CACHE_DISK_PATH
            alternative toml config : 
                    [cache]
                    ...
                    data_cache_storage = "disk"
                    ...
                    [cache.disk]
                    max_bytes = [MAX_BYTES]  
                    path = [PATH]
                    ...
                    
 --------------------------------------------------------------


 --------------------------------------------------------------
 *** table-cache-snapshot-count *** is obsoleted : 
 --------------------------------------------------------------
   alternative command-line options : cache-table-meta-snapshot-count
   alternative environment variable : CACHE_TABLE_META_SNAPSHOT_COUNT
            alternative toml config : 
                    [cache]
                    table_meta_snapshot_count = [COUNT]
                    
 --------------------------------------------------------------


 --------------------------------------------------------------
 *** table-cache-statistic-count *** is obsoleted : 
 --------------------------------------------------------------
   alternative command-line options : cache-table-meta-statistic-count
   alternative environment variable : CACHE_TABLE_META_STATISTIC_COUNT
            alternative toml config : 
                    [cache]
                    table_meta_statistic_count = [COUNT]
                    
 --------------------------------------------------------------


 --------------------------------------------------------------
 *** table-cache-segment-count *** is obsoleted : 
 --------------------------------------------------------------
   alternative command-line options : cache-table-meta-segment-count
   alternative environment variable : CACHE_TABLE_META_SEGMENT_COUNT
            alternative toml config : 
                    [cache]
                    table_meta_segment_count = [COUNT]
                    
 --------------------------------------------------------------


 --------------------------------------------------------------
 *** table-cache-bloom-index-meta-count *** is obsoleted : 
 --------------------------------------------------------------
   alternative command-line options : cache-table-bloom-index-meta-count
   alternative environment variable : CACHE_TABLE_BLOOM_INDEX_META_COUNT
            alternative toml config : 
                    [cache]
                    table_bloom_index_meta_count = [COUNT]
                    
 --------------------------------------------------------------


 --------------------------------------------------------------
 *** table-cache-bloom-index-filter-count *** is obsoleted : 
 --------------------------------------------------------------
   alternative command-line options : cache-table-bloom-index-filter-count
   alternative environment variable : CACHE_TABLE_BLOOM_INDEX_FILTER_COUNT
            alternative toml config : 
                    [cache]
                    table_bloom_index_filter_count = [COUNT]
                    
 --------------------------------------------------------------
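
a minimal sketch of how such a startup check could work, assuming a static table of obsoleted names. the `Obsoleted` struct, the `OBSOLETED` table and `check_obsoleted` are hypothetical names, not the code of this pr; the three rows are copied from the output above, and the message format is simplified.

```rust
/// One row of a hypothetical migration table, filled from the output above.
struct Obsoleted {
    old: &'static str,
    cli: Option<&'static str>,
    env: Option<&'static str>,
}

const OBSOLETED: &[Obsoleted] = &[
    Obsoleted {
        old: "table-disk-cache-mb-size",
        cli: Some("cache-disk-max-bytes"),
        env: Some("CACHE_DISK_MAX_BYTES"),
    },
    Obsoleted {
        old: "table-disk-cache-root",
        cli: Some("cache-disk-path"),
        env: Some("CACHE_DISK_PATH"),
    },
    // Entries without a replacement are reported as N/A.
    Obsoleted { old: "table-memory-cache-mb-size", cli: None, env: None },
];

/// Returns one suggestion message per obsoleted option found in `used`.
/// An empty result means startup can proceed.
fn check_obsoleted(used: &[&str]) -> Vec<String> {
    let mut messages = Vec::new();
    for name in used {
        if let Some(entry) = OBSOLETED.iter().find(|e| e.old == *name) {
            messages.push(format!(
                "*** {} *** is obsoleted : command-line option: {}, environment variable: {}",
                entry.old,
                entry.cli.unwrap_or("N/A"),
                entry.env.unwrap_or("N/A"),
            ));
        }
    }
    messages
}
```

a caller would print every returned message and quit if the list is non-empty.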

some implementation details

  • bring back @PsiACE 's DiskCache mod

    • cache items are identified by the siphash (2-4, 128 bit) of cache key
    • cache files are prefixed with the path of the first 3 common chars, e.g.
    • crc32 checksum placed at the end of the file
  • TableDataCache
    consists of an LruDiskCache and a cache population worker.

    • get operations are served by LruDiskCache directly.
    • for put operations, LruDiskCache is checked first (without accessing the disk). on a cache miss, the item is put into a bounded queue, or dropped if the queue is full. in a dedicated thread, the cache population worker takes items from the bounded queue, persists them to disk, and populates the cache.

    setting table_data_cache_population_queue_size controls the max capacity of bounded queue.

    metrics:

    • cache_table_data_population_pending_count shows the number of items pending in the bounded queue.
    • cache_table_data_population_overflow_count shows the number of items that have been dropped.
  • ColumnArrayCache
    An LRU in-memory object cache, which caches Box<dyn Array>. ideally, caching BlockEntry is preferred, but that needs some further tweaks of the DataBlock structure (supporting not only owned BlockEntrys but also shared ownership of BlockEntrys).

  • BlockReader::merge_io_read
    integrated with TableDataCache and ColumnArrayCache
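
the put path of TableDataCache described above (check first, enqueue on miss, drop when the queue is full, persist in a dedicated thread) can be sketched with a bounded std channel. this is a simplified illustration, not the code of this pr: `PopulationQueue` and `spawn_worker` are hypothetical names, and an in-memory map stands in for both the cache index and the files on disk.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type Item = (String, Vec<u8>);
/// Stand-in for the disk cache (an assumption: the real cache writes files).
type Store = Arc<Mutex<HashMap<String, Vec<u8>>>>;

struct PopulationQueue {
    sender: SyncSender<Item>,
    store: Store,
    /// Counts dropped items, like cache_table_data_population_overflow_count.
    overflow_count: AtomicU64,
}

impl PopulationQueue {
    /// `queue_size` plays the role of table_data_cache_population_queue_size.
    fn new(queue_size: usize) -> (Self, Receiver<Item>) {
        let (sender, receiver) = sync_channel(queue_size);
        let queue = PopulationQueue {
            sender,
            store: Arc::new(Mutex::new(HashMap::new())),
            overflow_count: AtomicU64::new(0),
        };
        (queue, receiver)
    }

    /// Returns true if the item was queued for population.
    fn put(&self, key: String, value: Vec<u8>) -> bool {
        // Already cached: nothing to do.
        if self.store.lock().unwrap().contains_key(&key) {
            return false;
        }
        match self.sender.try_send((key, value)) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) => {
                // Queue is full: drop the item instead of blocking the reader.
                self.overflow_count.fetch_add(1, Ordering::Relaxed);
                false
            }
            Err(TrySendError::Disconnected(_)) => false,
        }
    }
}

/// Dedicated worker: drains the queue and persists each item.
fn spawn_worker(receiver: Receiver<Item>, store: Store) -> JoinHandle<()> {
    thread::spawn(move || {
        for (key, value) in receiver {
            store.lock().unwrap().insert(key, value);
        }
    })
}
```

using `try_send` rather than a blocking send keeps the read path from stalling when population falls behind, at the cost of dropping some items.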

Performance evaluation

ClickBench

1. table data cache enabled vs main branch default setting

  • Databend (pr, in_memory_data_cache on)
    this pr, in-memory data cache is enabled, disk-based table data cache disabled

    metrics:

    cache_table_data_column_array_hit_count 134241
    cache_table_data_column_array_access_count 389649
    cache_table_data_column_array_miss_count 255408

    hit rate: 134241 / 389649 ≈ 34%

    memory:
    after run.sh ended, top shows that the process RES is 18.9g

    note that although table_data_cache_in_memory_max_size = 5368709120 (5G) is used in this scenario, cached objects are currently measured by the uncompressed byte size of the column, not by their actual memory footprint.

  • Databend (pr, disk_data_cache on)
    this pr, disk-based table data cache enabled (table_disk_cache_max_size set to 20Gb, population queue size set to 65535), in-memory data cache is disabled.

    metrics:

    cache_table_data_hit_count 310269
    cache_table_data_access_count 389649
    cache_table_data_miss_count 79380

    hit rate: 310269 / 389649 ≈ 79.63%

    note that this ec2 machine's memory is large enough to buffer all the cache files. although run.sh drops the os caches for the first run, the subsequent 2 runs are likely to read from the os pagecache without hitting the disk (with the exception that some queries' execution involves non-deterministic partition accesses).

  • Databend (main, 1st round)
    main branch(commit 9078cfd), default config
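
the hit rates quoted above are plain hits / accesses ratios. as a small worked check (the `hit_rate` helper is a hypothetical name, only the counter values come from the metrics above):

```rust
/// Hit rate as a fraction in [0, 1]; None when there were no accesses.
fn hit_rate(hits: u64, accesses: u64) -> Option<f64> {
    if accesses == 0 {
        None
    } else {
        Some(hits as f64 / accesses as f64)
    }
}

/// In-memory cache run: cache_table_data_column_array_{hit,access}_count.
fn in_memory_run() -> Option<f64> {
    hit_rate(134_241, 389_649)
}

/// Disk cache run: cache_table_data_{hit,access}_count.
fn disk_run() -> Option<f64> {
    hit_rate(310_269, 389_649)
}
```

in both runs hits plus misses add up to the access count (134241 + 255408 and 310269 + 79380 both equal 389649).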

[screenshot: 2023-02-13-145206_1822x976_scrot]

2. disable table cache vs main branch

  • Databend (main, nth round)
    main branch, default config, 3 rounds
  • Databend (pr, no data cache, nth round)
    both disk and in-memory caches are disabled (default setting of this pr), 3 rounds

it shows that, if the table data cache is disabled, the performance is on par with the main branch

[screenshot: 2023-02-13-152710_1826x200_scrot]

the raw data:
pr.html.gz

misc

to be improved

  • for raw data cache, the size of pending items should be measured by the bytes of pending data, not the number of them.
  • for object in-memory cache, BlockEntry is preferred, and items should be measured by the "heap size" of the object, not the uncompressed byte size
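
to illustrate why the measuring function matters, here is a minimal sketch of an LRU cache whose capacity is counted in weight units (for example bytes) instead of item count. `WeightedLru` and `byte_len` are hypothetical names and this is not the ColumnArrayCache implementation: swapping the meter (uncompressed byte size vs. heap size) changes how many items fit under the same limit.

```rust
use std::collections::{HashMap, VecDeque};

/// A tiny LRU cache bounded by total weight rather than by item count.
struct WeightedLru<V> {
    max_weight: usize,
    used_weight: usize,
    /// Decides how much an item "costs"; this is the part to be improved.
    meter: fn(&V) -> usize,
    entries: HashMap<String, V>,
    /// Keys ordered from least to most recently used.
    order: VecDeque<String>,
}

/// Example meter: the byte length of a buffer.
fn byte_len(v: &Vec<u8>) -> usize {
    v.len()
}

impl<V> WeightedLru<V> {
    fn new(max_weight: usize, meter: fn(&V) -> usize) -> Self {
        WeightedLru {
            max_weight,
            used_weight: 0,
            meter,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Looks up a key and marks it as most recently used.
    fn get(&mut self, key: &str) -> Option<&V> {
        if let Some(pos) = self.order.iter().position(|k| k.as_str() == key) {
            let k = self.order.remove(pos).unwrap();
            self.order.push_back(k);
        }
        self.entries.get(key)
    }

    /// Inserts an item, evicting least recently used items until it fits.
    fn put(&mut self, key: String, value: V) {
        let weight = (self.meter)(&value);
        if weight > self.max_weight {
            return; // can never fit, so it is not cached
        }
        if let Some(old) = self.entries.remove(&key) {
            self.used_weight -= (self.meter)(&old);
            self.order.retain(|k| k != &key);
        }
        while self.used_weight + weight > self.max_weight {
            match self.order.pop_front() {
                Some(victim) => {
                    if let Some(v) = self.entries.remove(&victim) {
                        self.used_weight -= (self.meter)(&v);
                    }
                }
                None => break,
            }
        }
        self.used_weight += weight;
        self.order.push_back(key.clone());
        self.entries.insert(key, value);
    }
}
```

the linear scan in `get` keeps the sketch short; a real cache would use a linked structure for constant-time reordering.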

cluster mode

evaluate the cache performance in cluster mode: verify that, with PartitionsShuffleKind::Mod, the distribution of table data cache is nearly balanced among the nodes of the cluster.
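
a rough sketch of what "nearly balanced" could be checked against. it assumes that the Mod shuffle assigns a partition to node `hash % node_count`; this pr does not describe how PartitionsShuffleKind::Mod works, so treat the assignment rule and the `distribute_by_mod` helper as assumptions.

```rust
/// Assigns each partition to node `hash % node_count` (an assumed rule)
/// and returns how many partitions landed on each node.
fn distribute_by_mod(partition_hashes: &[u64], node_count: usize) -> Vec<usize> {
    assert!(node_count > 0, "node_count must be non-zero");
    let mut per_node = vec![0usize; node_count];
    for hash in partition_hashes {
        per_node[(hash % node_count as u64) as usize] += 1;
    }
    per_node
}

/// Difference between the most and least loaded node.
fn imbalance(per_node: &[usize]) -> usize {
    let max = per_node.iter().copied().max().unwrap_or(0);
    let min = per_node.iter().copied().min().unwrap_or(0);
    max - min
}
```

under this assumption a partition always maps to the same node, so a node's cache is reused for the partitions it has seen before, and well spread hashes give an imbalance close to zero.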

  • disk cache (based on @PsiACE 's previous work)
    crc checksum, sip2-4 128 hash as the cache key, two-level cache file dir layout, etc
  • integrate disk cache with merge io reader
    by default, only enable for async reading (since sync reading almost indicates we are working on local fs storage)
  • threshold of disk cache population
    a bounded queue of pending cache items, with a dedicated thread that writes cache to disk
  • cache of dyn Array object
    ideally, BlockEntry should be cached, but it seems that DataBlock or BlockEntry needs further tweaks.
  • prefix data cache dir with tenant id
  • refactor cache configuration
  • performance evaluation
    some initial perf has been done, to be detailed
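
the disk cache file format listed above (hash as cache key, prefix directory, trailing crc checksum) can be sketched as follows. several points are assumptions: the key is taken as an already computed hex string (std has no 128-bit siphash), the directory is named after the first 3 chars of that string, the checksum is the common CRC-32 (IEEE) variant, and it is stored as 4 little-endian bytes. all function names are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            crc = if crc & 1 == 1 { (crc >> 1) ^ 0xEDB8_8320 } else { crc >> 1 };
        }
    }
    !crc
}

/// Relative path of a cache file: a directory named after the first
/// 3 chars of the hex key, then the full key as the file name.
fn cache_file_path(hex_key: &str) -> PathBuf {
    let prefix: String = hex_key.chars().take(3).collect();
    Path::new(&prefix).join(hex_key)
}

/// File content: payload followed by its checksum (little-endian, assumed).
fn encode(payload: &[u8]) -> Vec<u8> {
    let mut file = payload.to_vec();
    file.extend_from_slice(&crc32(payload).to_le_bytes());
    file
}

/// Returns the payload if the trailing checksum matches, None otherwise.
fn decode(file: &[u8]) -> Option<&[u8]> {
    if file.len() < 4 {
        return None;
    }
    let (payload, tail) = file.split_at(file.len() - 4);
    let stored = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
    if crc32(payload) == stored {
        Some(payload)
    } else {
        None
    }
}
```

putting the checksum at the end lets the writer stream the payload first and append the checksum once all bytes are known.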

Closes #issue

dantengsky changed the title from "WIP: Feat block cache" to "Feat: block cache" on Feb 7, 2023
dantengsky changed the title from "Feat: block cache" to "feat: table data cache" on Feb 7, 2023
mergify bot added the pr-feature label (this PR introduces a new feature to the codebase) on Feb 7, 2023
to avoid naming collision of `Setting` (with `common-settings::Settings`)

dantengsky marked this pull request as ready for review on February 15, 2023 15:05
dantengsky requested a review from lichuang on February 15, 2023 15:05
BohuTANG added the ci-benchmark label (Benchmark: run all test) on Feb 15, 2023

BohuTANG and others added 2 commits February 16, 2023 08:38
BohuTANG (Member) left a comment:

Great, cache coming.


sundy-li changed the title from "feat: table data cache" to "feat: table data cache for object storage" on Feb 28, 2023