feat: table data cache for object storage #9772

Merged
merged 91 commits into from
Feb 16, 2023
71d3030
bring back @PsiACE's disk cache
dantengsky Jan 28, 2023
a29c2c1
tailor disk cache
dantengsky Jan 29, 2023
c388108
assembly data cache to async reader
dantengsky Jan 30, 2023
f09faa0
add CacheItem type parameter
dantengsky Jan 31, 2023
32e4251
fix cache key to string, use 128 bit hash
dantengsky Feb 1, 2023
291fe74
shrink lock scope of reading cached data
dantengsky Feb 1, 2023
9ac4c3d
fix: cache init
dantengsky Feb 1, 2023
118c8fa
wip
dantengsky Feb 1, 2023
b10afa7
fix: evcition
dantengsky Feb 3, 2023
60517b1
Merge remote-tracking branch 'origin/main' into feat-block-cache
dantengsky Feb 3, 2023
c265c4d
add unit tests
dantengsky Feb 4, 2023
254f5a7
add crc checksum
dantengsky Feb 6, 2023
4b1ca8b
tiered disk cache
dantengsky Feb 6, 2023
c05d3c7
tuning and metrics
dantengsky Feb 6, 2023
47cfc1d
add data cache related config
dantengsky Feb 6, 2023
9f9bce8
fix ut
dantengsky Feb 6, 2023
e28564c
fix ut it
dantengsky Feb 7, 2023
bdca8a4
refactor: generic external cache type
dantengsky Feb 7, 2023
945024c
Merge remote-tracking branch 'origin/main' into feat-block-cache
dantengsky Feb 7, 2023
248b663
remove debug env var
dantengsky Feb 7, 2023
5eadd48
rename table_data_cache setting name & fix ut
dantengsky Feb 7, 2023
776446d
fix cache population thread number to 1
dantengsky Feb 7, 2023
ac83c28
refactor: move metrics into TableDataCache
dantengsky Feb 7, 2023
bfb1eff
cleanup unused metrics
dantengsky Feb 8, 2023
0031834
cleaup traits
dantengsky Feb 8, 2023
1f06dd9
tweak doc
dantengsky Feb 8, 2023
7344fa1
minor refactor
dantengsky Feb 8, 2023
915f663
wip: integrate ColumnArrayCache
dantengsky Feb 9, 2023
2b72e40
wip
dantengsky Feb 9, 2023
e7f5258
refactor: avoid clone Box<dyn Array>
dantengsky Feb 9, 2023
e091e01
array cache for sync parquet read
dantengsky Feb 9, 2023
89d393b
add setting for in memnory table colum object cache
dantengsky Feb 9, 2023
c7af42b
meter in memory column array cache by uncompressed bytes size
dantengsky Feb 9, 2023
b7fdfcd
remove in-memory raw column data cache
dantengsky Feb 10, 2023
a2592a7
fix ut
dantengsky Feb 10, 2023
811aa53
Merge remote-tracking branch 'origin/main' into feat-block-cache
dantengsky Feb 10, 2023
936ed75
iterate ArrayIter in-place
dantengsky Feb 10, 2023
1c2978c
fix ut
dantengsky Feb 10, 2023
4bb8311
extract deserialize_field method
dantengsky Feb 10, 2023
7277243
tweak doc
dantengsky Feb 10, 2023
5f3cbae
fix sql logic test
dantengsky Feb 10, 2023
213017f
split mod block_reader & fix logic test
dantengsky Feb 10, 2023
943c499
refactoring
dantengsky Feb 10, 2023
8d6f32b
fix sqllogic test
dantengsky Feb 10, 2023
3741a3e
remove unwraps
dantengsky Feb 10, 2023
842eec9
clean up
dantengsky Feb 11, 2023
abd411d
fix typo
dantengsky Feb 11, 2023
e1eda53
add doc
dantengsky Feb 11, 2023
debdace
Merge remote-tracking branch 'origin/main' into feat-block-cache
dantengsky Feb 11, 2023
fd7cea6
fix cache metric name
dantengsky Feb 11, 2023
16fbedf
fix: shoot
dantengsky Feb 11, 2023
6d50da5
fix: metrics of pending cache population items
dantengsky Feb 11, 2023
2797815
tweak metrics
dantengsky Feb 11, 2023
f15a386
refactor: declare cache name in cache_managers mod
dantengsky Feb 11, 2023
9e817d6
refactor: resovle clippy::too_many_arguments
dantengsky Feb 11, 2023
c4819fb
refactor: move DiskCache into crate storage-common-cache
dantengsky Feb 12, 2023
ccd1ada
refacot: rename setting and metrics
dantengsky Feb 12, 2023
8afeda9
Merge remote-tracking branch 'origin/main' into feat-block-cache
dantengsky Feb 12, 2023
65e07ab
update doc
dantengsky Feb 12, 2023
be4c69d
refactor: renaming cache setting names and units
dantengsky Feb 12, 2023
b730558
refactor: disk cache open file without holding lock
dantengsky Feb 12, 2023
9e9c6e5
remove obsolete setting from ci configs
dantengsky Feb 12, 2023
01486b2
fix: revert DataBlock
dantengsky Feb 12, 2023
06c5906
fix: metric of cache population overflow should be increased
dantengsky Feb 12, 2023
3954137
adjust doc
dantengsky Feb 12, 2023
31d1f44
remove obsolete setting
dantengsky Feb 12, 2023
53c43ae
Merge remote-tracking branch 'origin/main' into feat-block-cache
dantengsky Feb 13, 2023
d7be32b
add waring log for failure of invalid disk cache item removal
dantengsky Feb 13, 2023
c17edbb
remove unnecessary clone
dantengsky Feb 13, 2023
88b2e3d
fix lint
dantengsky Feb 13, 2023
8afb981
refactor: TableDataColumnCacheKey -> TableDataCacheKey
dantengsky Feb 13, 2023
f88fb1f
Merge remote-tracking branch 'origin/main' into feat-block-cache
dantengsky Feb 13, 2023
b6872c6
resolve conflict
dantengsky Feb 13, 2023
f81f25a
tenant disk cache path isolation
dantengsky Feb 13, 2023
6c1d404
Merge remote-tracking branch 'origin/main' into feat-block-cache
dantengsky Feb 13, 2023
5b3315b
resovle conflicts
dantengsky Feb 13, 2023
5b7a565
minor code gc
dantengsky Feb 13, 2023
4c29d67
refactor cache configuration
dantengsky Feb 14, 2023
95376f9
Merge remote-tracking branch 'origin/main' into feat-block-cache
dantengsky Feb 15, 2023
9e05186
the converters
dantengsky Feb 15, 2023
0eaf747
cache configs
dantengsky Feb 15, 2023
86fbcbe
fix typo
dantengsky Feb 15, 2023
b1de5fc
Merge remote-tracking branch 'origin/main' into feat-block-cache
dantengsky Feb 15, 2023
4f1a006
fix typos
dantengsky Feb 15, 2023
4c72dc5
revert `common_config::inner` namespace
dantengsky Feb 15, 2023
8e14720
fix typo
dantengsky Feb 15, 2023
46def0b
update doc: 70-system-tables/system-configs.md
dantengsky Feb 15, 2023
49b30ff
fix: incorrect serde name
dantengsky Feb 15, 2023
05c5a29
update golden file
dantengsky Feb 15, 2023
355d9f1
enable_table_meta_caches -> enable_table_meta_cache, enable_table_blo…
BohuTANG Feb 16, 2023
4ec7754
Merge pull request #79 from BohuTANG/block-cache-rename
dantengsky Feb 16, 2023
22 changes: 20 additions & 2 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

173 changes: 127 additions & 46 deletions docs/doc/13-sql-reference/70-system-tables/system-configs.md

Large diffs are not rendered by default.

49 changes: 42 additions & 7 deletions scripts/ci/deploy/config/databend-query-node-1.toml
@@ -33,13 +33,6 @@ cluster_id = "test_cluster"
table_engine_memory_enabled = true
database_engine_github_enabled = true

table_meta_cache_enabled = true
table_memory_cache_mb_size = 1024
table_disk_cache_root = "_cache"
table_disk_cache_mb_size = 10240
table_cache_bloom_index_meta_count=3000
table_cache_bloom_index_filter_count=1048576

# [[query.users]]
# name = "admin"
# auth_type = "no_password"
@@ -110,3 +103,45 @@ data_path = "./.databend/stateless_test_data"
# endpoint_url = "<your-endpoint>"
# access_key_id = "<your-key-id>"
# access_key_secret = "<your-account-key>"


[cache]

### table meta caches ###
# Enable table meta cache. Default is true.
# Setting it to false disables all the table meta caches.
enable_table_meta_cache = true
# Max number of cached table snapshots. Set it to 0 to disable this cache.
table_meta_snapshot_count = 256
# Max number of cached table segments. Set it to 0 to disable this cache.
table_meta_segment_count = 10240
# Max number of cached table statistics meta objects. Set it to 0 to disable this cache.
table_meta_statistic_count = 256

### table bloom index caches ###
# Enable bloom index cache. Default is true.
# Setting it to false disables all the bloom index caches.
enable_table_bloom_index_cache = true
# Max number of cached bloom index meta objects. Set it to 0 to disable this cache.
table_bloom_index_meta_count = 3000
# Max number of cached bloom index filters. Set it to 0 to disable this cache.
table_bloom_index_filter_count = 1048576

### table data caches ###

# Type of storage used for the table data cache.
#
# Available options: [none|disk]
# Default is "none", which disables the table data cache.
# Use "disk" to enable the disk cache.
data_cache_storage = "none"

# Max length of the external cache population queue.
table_data_cache_population_queue_size = 65535


[cache.disk]
# Cache path
path = "./databend/_cache"
# Max bytes of cached data (20 GiB)
max_bytes = 21474836480
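
The `max_bytes` setting above caps the total size of the disk cache. As a rough illustration of what such a cap implies, here is a minimal sketch of a byte-bounded index, assuming least-recently-used eviction. The type `ByteBoundedLru` and its methods are hypothetical and are not taken from the PR's `DiskCache`; the sketch tracks entry sizes only and uses a simple O(n) recency list.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical byte-bounded LRU index: tracks entry sizes only, not the data.
struct ByteBoundedLru {
    max_bytes: u64,
    used_bytes: u64,
    sizes: HashMap<String, u64>,
    // Front = least recently used, back = most recently used.
    order: VecDeque<String>,
}

impl ByteBoundedLru {
    fn new(max_bytes: u64) -> Self {
        Self {
            max_bytes,
            used_bytes: 0,
            sizes: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Marks `key` as most recently used; returns false if it is not cached.
    fn touch(&mut self, key: &str) -> bool {
        if !self.sizes.contains_key(key) {
            return false;
        }
        self.order.retain(|k| k != key);
        self.order.push_back(key.to_string());
        true
    }

    /// Inserts an entry and returns the keys evicted to stay within `max_bytes`.
    /// An entry larger than the whole budget is rejected with `None`.
    fn insert(&mut self, key: &str, size: u64) -> Option<Vec<String>> {
        if size > self.max_bytes {
            return None;
        }
        // Replace an existing entry that has the same key.
        if let Some(old) = self.sizes.remove(key) {
            self.used_bytes -= old;
            self.order.retain(|k| k != key);
        }
        let mut evicted = Vec::new();
        while self.used_bytes + size > self.max_bytes {
            // Over budget: drop the least recently used entry.
            let victim = self
                .order
                .pop_front()
                .expect("non-zero used_bytes implies at least one entry");
            self.used_bytes -= self.sizes.remove(&victim).unwrap_or(0);
            evicted.push(victim);
        }
        self.sizes.insert(key.to_string(), size);
        self.order.push_back(key.to_string());
        self.used_bytes += size;
        Some(evicted)
    }
}
```

A caller would delete the files named in the returned list; a production cache would likely use a linked hash map for O(1) recency updates instead of the `VecDeque` scan used here.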
48 changes: 41 additions & 7 deletions scripts/ci/deploy/config/databend-query-node-2.toml
@@ -33,13 +33,6 @@ cluster_id = "test_cluster"
table_engine_memory_enabled = true
database_engine_github_enabled = true

table_meta_cache_enabled = true
table_memory_cache_mb_size = 1024
table_disk_cache_root = "./.databend/cache"
table_disk_cache_mb_size = 10240
table_cache_bloom_index_meta_count=3000
table_cache_bloom_index_filter_count=1048576

[log]

[log.file]
@@ -64,3 +57,44 @@ type = "fs"
# Comment out this block if you're NOT using local file system as storage.
[storage.fs]
data_path = "./.databend/stateless_test_data"

[cache]

### table meta caches ###
# Enable table meta cache. Default is true.
# Setting it to false disables all the table meta caches.
enable_table_meta_cache = true
# Max number of cached table snapshots. Set it to 0 to disable this cache.
table_meta_snapshot_count = 256
# Max number of cached table segments. Set it to 0 to disable this cache.
table_meta_segment_count = 10240
# Max number of cached table statistics meta objects. Set it to 0 to disable this cache.
table_meta_statistic_count = 256

### table bloom index caches ###
# Enable bloom index cache. Default is true.
# Setting it to false disables all the bloom index caches.
enable_table_bloom_index_cache = true
# Max number of cached bloom index meta objects. Set it to 0 to disable this cache.
table_bloom_index_meta_count = 3000
# Max number of cached bloom index filters. Set it to 0 to disable this cache.
table_bloom_index_filter_count = 1048576

### table data caches ###

# Type of storage used for the table data cache.
#
# Available options: [none|disk]
# Default is "none", which disables the table data cache.
# Use "disk" to enable the disk cache.
data_cache_storage = "none"

# Max length of the external cache population queue.
table_data_cache_population_queue_size = 65535


[cache.disk]
# Cache path
path = "./databend/_cache"
# Max bytes of cached data (20 GiB)
max_bytes = 21474836480
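
The `table_data_cache_population_queue_size` setting bounds the queue of items waiting to be written into the external cache, and the commit list mentions a metric for population overflow. A minimal sketch of that idea follows, assuming that producers never block and simply drop items when the queue is full. `PopulationQueue`, `CacheItem`, and the `overflow` counter are hypothetical names, not the PR's types.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical item waiting to be written into the data cache.
struct CacheItem {
    key: String,
    bytes: Vec<u8>,
}

/// Hypothetical producer side of a bounded population queue.
struct PopulationQueue {
    sender: SyncSender<CacheItem>,
    /// Number of items dropped because the queue was full.
    overflow: u64,
}

impl PopulationQueue {
    /// `queue_size` plays the role of `table_data_cache_population_queue_size`.
    fn new(queue_size: usize) -> (Self, Receiver<CacheItem>) {
        let (sender, receiver) = sync_channel(queue_size);
        (Self { sender, overflow: 0 }, receiver)
    }

    /// Tries to enqueue without blocking the caller.
    /// Returns true if the item was queued, false if it was dropped.
    fn try_populate(&mut self, key: &str, bytes: Vec<u8>) -> bool {
        let item = CacheItem {
            key: key.to_string(),
            bytes,
        };
        match self.sender.try_send(item) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) => {
                // Queue is full: skip caching this item and count the overflow.
                self.overflow += 1;
                false
            }
            // The consumer is gone, so nothing can be cached any more.
            Err(TrySendError::Disconnected(_)) => false,
        }
    }
}
```

A single consumer thread would call `recv()` on the returned `Receiver` and write each item to disk. Dropping on overflow keeps reads from stalling on cache writes, at the cost of some missed cache entries.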
48 changes: 41 additions & 7 deletions scripts/ci/deploy/config/databend-query-node-3.toml
@@ -34,13 +34,6 @@ cluster_id = "test_cluster"
table_engine_memory_enabled = true
database_engine_github_enabled = true

table_meta_cache_enabled = true
table_memory_cache_mb_size = 1024
table_disk_cache_root = "./.databend/cache"
table_disk_cache_mb_size = 10240
table_cache_bloom_index_meta_count=3000
table_cache_bloom_index_filter_count=1048576

[log]

[log.file]
@@ -65,3 +58,44 @@ type = "fs"
# Comment out this block if you're NOT using local file system as storage.
[storage.fs]
data_path = "./.databend/stateless_test_data"

[cache]

### table meta caches ###
# Enable table meta cache. Default is true.
# Setting it to false disables all the table meta caches.
enable_table_meta_cache = true
# Max number of cached table snapshots. Set it to 0 to disable this cache.
table_meta_snapshot_count = 256
# Max number of cached table segments. Set it to 0 to disable this cache.
table_meta_segment_count = 10240
# Max number of cached table statistics meta objects. Set it to 0 to disable this cache.
table_meta_statistic_count = 256

### table bloom index caches ###
# Enable bloom index cache. Default is true.
# Setting it to false disables all the bloom index caches.
enable_table_bloom_index_cache = true
# Max number of cached bloom index meta objects. Set it to 0 to disable this cache.
table_bloom_index_meta_count = 3000
# Max number of cached bloom index filters. Set it to 0 to disable this cache.
table_bloom_index_filter_count = 1048576

### table data caches ###

# Type of storage used for the table data cache.
#
# Available options: [none|disk]
# Default is "none", which disables the table data cache.
# Use "disk" to enable the disk cache.
data_cache_storage = "none"

# Max length of the external cache population queue.
table_data_cache_population_queue_size = 65535


[cache.disk]
# Cache path
path = "./databend/_cache"
# Max bytes of cached data (20 GiB)
max_bytes = 21474836480
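
The removed `table_disk_cache_mb_size = 10240` was expressed in megabytes, while the new `max_bytes = 21474836480` is expressed in bytes. The arithmetic behind the two values can be sketched as follows, assuming the old setting meant mebibytes (1024 * 1024 bytes). The helper names are hypothetical.

```rust
/// Bytes per megabyte; assumes the old `*_mb_size` settings meant MiB.
const BYTES_PER_MB: u64 = 1024 * 1024;

/// Converts an old megabyte-based size to bytes, or `None` on overflow.
fn mb_to_bytes(mb: u64) -> Option<u64> {
    mb.checked_mul(BYTES_PER_MB)
}

/// Describes a byte count as whole GiB when it divides evenly.
fn describe_bytes(bytes: u64) -> String {
    const GIB: u64 = 1024 * 1024 * 1024;
    if bytes % GIB == 0 {
        format!("{} GiB", bytes / GIB)
    } else {
        format!("{} bytes", bytes)
    }
}
```

Under that assumption the old value corresponds to 10 GiB and the new one to 20 GiB, so these CI configs change the size as well as the unit.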
48 changes: 41 additions & 7 deletions scripts/ci/deploy/config/databend-query-node-shared.toml
@@ -33,13 +33,6 @@ cluster_id = "test_cluster"
table_engine_memory_enabled = true
database_engine_github_enabled = true

table_meta_cache_enabled = true
table_memory_cache_mb_size = 1024
table_disk_cache_root = "_cache"
table_disk_cache_mb_size = 10240
table_cache_bloom_index_meta_count=3000
table_cache_bloom_index_filter_count=1048576

share_endpoint_address = "127.0.0.1:33003" # receive shared information from open sharing
# [[query.users]]
# name = "admin"
@@ -110,3 +103,44 @@ data_path = "./.databend/stateless_test_data"
# endpoint_url = "<your-endpoint>"
# access_key_id = "<your-key-id>"
# access_key_secret = "<your-account-key>"

[cache]

### table meta caches ###
# Enable table meta cache. Default is true.
# Setting it to false disables all the table meta caches.
enable_table_meta_cache = true
# Max number of cached table snapshots. Set it to 0 to disable this cache.
table_meta_snapshot_count = 256
# Max number of cached table segments. Set it to 0 to disable this cache.
table_meta_segment_count = 10240
# Max number of cached table statistics meta objects. Set it to 0 to disable this cache.
table_meta_statistic_count = 256

### table bloom index caches ###
# Enable bloom index cache. Default is true.
# Setting it to false disables all the bloom index caches.
enable_table_bloom_index_cache = true
# Max number of cached bloom index meta objects. Set it to 0 to disable this cache.
table_bloom_index_meta_count = 3000
# Max number of cached bloom index filters. Set it to 0 to disable this cache.
table_bloom_index_filter_count = 1048576

### table data caches ###

# Type of storage used for the table data cache.
#
# Available options: [none|disk]
# Default is "none", which disables the table data cache.
# Use "disk" to enable the disk cache.
data_cache_storage = "none"

# Max length of the external cache population queue.
table_data_cache_population_queue_size = 65535


[cache.disk]
# Cache path
path = "./databend/_cache"
# Max bytes of cached data (20 GiB)
max_bytes = 21474836480
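
The comments in the `[cache]` section describe two ways to turn a table meta cache off: the group switch `enable_table_meta_cache`, and a per-cache count of 0. A minimal sketch of how those two rules might combine is shown below. `TableMetaCacheSettings` is a hypothetical mirror of the settings and is not the PR's config type.

```rust
/// Hypothetical mirror of the table meta settings in the `[cache]` section.
struct TableMetaCacheSettings {
    enable_table_meta_cache: bool,
    table_meta_snapshot_count: u64,
    table_meta_segment_count: u64,
    table_meta_statistic_count: u64,
}

impl TableMetaCacheSettings {
    /// A cache exists only if the group switch is on and its count is non-zero.
    fn capacity(&self, count: u64) -> Option<u64> {
        if self.enable_table_meta_cache && count > 0 {
            Some(count)
        } else {
            None
        }
    }

    fn snapshot_capacity(&self) -> Option<u64> {
        self.capacity(self.table_meta_snapshot_count)
    }

    fn segment_capacity(&self) -> Option<u64> {
        self.capacity(self.table_meta_segment_count)
    }

    fn statistic_capacity(&self) -> Option<u64> {
        self.capacity(self.table_meta_statistic_count)
    }
}
```

Returning `Option<u64>` lets the caller skip constructing a cache entirely when it is disabled, rather than building one with zero capacity.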
4 changes: 2 additions & 2 deletions src/binaries/query/local.rs
@@ -16,7 +16,7 @@ use std::time::Instant;

use comfy_table::Cell;
use comfy_table::Table;
-use common_config::Config;
+use common_config::InnerConfig;
use common_exception::ErrorCode;
use common_exception::Result;
use common_expression::DataBlock;
@@ -28,7 +28,7 @@ use databend_query::sql::Planner;
use databend_query::GlobalServices;
use tokio_stream::StreamExt;

-pub async fn query_local(conf: &Config) -> Result<()> {
+pub async fn query_local(conf: &InnerConfig) -> Result<()> {
let mut conf = conf.clone();
conf.storage.allow_insecure = true;
let local_conf = conf.local.clone();
6 changes: 3 additions & 3 deletions src/binaries/query/main.rs
@@ -23,7 +23,7 @@ use common_base::mem_allocator::GlobalAllocator;
use common_base::runtime::Runtime;
use common_base::runtime::GLOBAL_MEM_STAT;
use common_base::set_alloc_error_hook;
-use common_config::Config;
+use common_config::InnerConfig;
use common_config::DATABEND_COMMIT_VERSION;
use common_config::QUERY_SEMVER;
use common_exception::Result;
@@ -62,7 +62,7 @@ fn main() {
}

async fn main_entrypoint() -> Result<()> {
-let conf: Config = Config::load()?;
+let conf: InnerConfig = InnerConfig::load()?;

if run_cmd(&conf).await? {
return Ok(());
@@ -310,7 +310,7 @@ async fn main_entrypoint() -> Result<()> {
Ok(())
}

-async fn run_cmd(conf: &Config) -> Result<bool> {
+async fn run_cmd(conf: &InnerConfig) -> Result<bool> {
if conf.cmd.is_empty() {
return Ok(false);
}
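
The rename from `Config` to `InnerConfig` in these binaries, together with the "the converters" commit, suggests a split between a serialization-facing config and a validated inner form. The sketch below shows one way such a conversion could look for the `data_cache_storage` option. It is an assumption about the design, not the PR's code; `OuterCacheConfig`, `InnerCacheConfig`, `DataCacheStorage`, and `load_inner` are hypothetical names.

```rust
use std::convert::TryFrom;

/// Hypothetical serialization-facing config: plain values as written in TOML.
struct OuterCacheConfig {
    data_cache_storage: String,
    disk_max_bytes: u64,
}

/// Hypothetical validated storage choice; `NoCache` corresponds to "none".
#[derive(Debug, PartialEq)]
enum DataCacheStorage {
    NoCache,
    Disk { max_bytes: u64 },
}

/// Hypothetical validated form used by the rest of the program.
#[derive(Debug, PartialEq)]
struct InnerCacheConfig {
    data_cache_storage: DataCacheStorage,
}

impl TryFrom<OuterCacheConfig> for InnerCacheConfig {
    type Error = String;

    fn try_from(outer: OuterCacheConfig) -> Result<Self, Self::Error> {
        // Only the two documented options are accepted; anything else is an error.
        let storage = match outer.data_cache_storage.as_str() {
            "none" => DataCacheStorage::NoCache,
            "disk" => DataCacheStorage::Disk {
                max_bytes: outer.disk_max_bytes,
            },
            other => return Err(format!("unknown data_cache_storage: {}", other)),
        };
        Ok(InnerCacheConfig {
            data_cache_storage: storage,
        })
    }
}

/// Convenience wrapper: validate an outer config into its inner form.
fn load_inner(outer: OuterCacheConfig) -> Result<InnerCacheConfig, String> {
    InnerCacheConfig::try_from(outer)
}
```

With this shape, invalid option strings are rejected once at load time, and code that takes the inner type never has to re-parse them.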
9 changes: 6 additions & 3 deletions src/common/cache/Cargo.toml
@@ -14,14 +14,17 @@ test = false
heapsize = ["heapsize_"]
amortized = ["ritelinked/ahash-amortized", "ritelinked/inline-more-amortized"]

[dependencies] # In alphabetical order
# Github dependencies
[dependencies]

# Crates.io dependencies
crc32fast = "1.3.2"
hex = "0.4.3"
ritelinked = { version = "0.3.2", default-features = false, features = ["ahash", "inline-more"] }
siphasher = "0.3.10"
tracing = "0.1.36"
walkdir = "2.3.2"

[target.'cfg(not(target_os = "macos"))'.dependencies]
heapsize_ = { package = "heapsize", version = "0.4.2", optional = true }

[dev-dependencies]
tempfile = "3.3.0"
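
The `crc32fast` dependency listed here, together with the "add crc checksum" commit, suggests that cached items carry a checksum so that corrupt files can be detected on read. The sketch below illustrates the idea with a hand-written CRC-32 so that it compiles without external crates. The layout (payload followed by a 4-byte little-endian checksum) and the names `seal` and `open` are assumptions, not the PR's on-disk format.

```rust
/// Bitwise CRC-32 (IEEE polynomial, reflected form), written out by hand
/// here to avoid depending on an external crate.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Appends the checksum of `payload` as a 4-byte little-endian trailer.
fn seal(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 4);
    out.extend_from_slice(payload);
    out.extend_from_slice(&crc32(payload).to_le_bytes());
    out
}

/// Returns the payload if the trailer matches, or `None` if the item is corrupt.
fn open(sealed: &[u8]) -> Option<&[u8]> {
    if sealed.len() < 4 {
        return None;
    }
    let (payload, trailer) = sealed.split_at(sealed.len() - 4);
    let expected = u32::from_le_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]);
    if crc32(payload) == expected {
        Some(payload)
    } else {
        None
    }
}
```

A reader that gets `None` from `open` can treat the entry as a cache miss and remove the file, which matches the commit that adds a warning log when removal of an invalid disk cache item fails.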
1 change: 1 addition & 0 deletions src/common/cache/src/lib.rs
@@ -12,6 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.

+#![feature(write_all_vectored)]
#![allow(clippy::uninlined_format_args)]
#[cfg(feature = "heapsize")]
#[cfg(not(target_os = "macos"))]