feat(compact): optimize compact for data load. #8644

Merged
merged 16 commits into from
Nov 5, 2022

Conversation

youngsofun
Member

@youngsofun youngsofun commented Nov 5, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

An improved and simpler version of #8311.

Simply resizing the pipeline to 1 and then back to max_threads is enough: the new BlockCompactorNoSplit makes sure that big blocks pass through it to the downstream (Sink) quickly.

The premise is that the DeserializerProcessor instances, running on multiple threads, already try to accumulate big blocks before outputting them to the compactor.

Other optimizations:

  • call block.memory_size() only once.
  • avoid calling concat_blocks for each new block.

Refactor:

  • get the real BlockCompactThresholds from the destination table.
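
A minimal sketch of the compaction idea described above, not the code in this PR. `Block`, `Thresholds`, `CompactorNoSplit`, and the stand-in `concat_blocks` are hypothetical names that only track sizes. It assumes a block counts as "big" when it reaches either a row or a byte threshold, and that a big block may be emitted ahead of the small blocks still pending.

```rust
/// Hypothetical stand-in for a data block; only its sizes matter here.
#[derive(Debug, Clone, PartialEq)]
struct Block {
    rows: usize,
    bytes: usize,
}

impl Block {
    fn memory_size(&self) -> usize {
        self.bytes
    }
}

/// Hypothetical thresholds (assumption: rows OR bytes makes a block "big").
#[derive(Debug, Clone, Copy)]
struct Thresholds {
    min_rows: usize,
    min_bytes: usize,
}

impl Thresholds {
    fn is_big_enough(&self, rows: usize, bytes: usize) -> bool {
        rows >= self.min_rows || bytes >= self.min_bytes
    }
}

/// Stand-in for concat_blocks: merges sizes only.
fn concat_blocks(blocks: &[Block]) -> Block {
    Block {
        rows: blocks.iter().map(|b| b.rows).sum(),
        bytes: blocks.iter().map(|b| b.bytes).sum(),
    }
}

/// Accumulates small blocks and never splits big ones.
struct CompactorNoSplit {
    thresholds: Thresholds,
    pending: Vec<Block>,
    pending_rows: usize,
    pending_bytes: usize,
}

impl CompactorNoSplit {
    fn new(thresholds: Thresholds) -> Self {
        Self { thresholds, pending: Vec::new(), pending_rows: 0, pending_bytes: 0 }
    }

    /// Returns a block to send downstream, if one is ready.
    fn push(&mut self, block: Block) -> Option<Block> {
        // Compute the size once and reuse it for every check below.
        let bytes = block.memory_size();
        let rows = block.rows;

        // A block that is already big passes straight through.
        if self.thresholds.is_big_enough(rows, bytes) {
            return Some(block);
        }

        // Small block: only record it; concatenation is deferred until
        // the running totals cross the thresholds.
        self.pending_rows += rows;
        self.pending_bytes += bytes;
        self.pending.push(block);
        if self.thresholds.is_big_enough(self.pending_rows, self.pending_bytes) {
            return self.flush();
        }
        None
    }

    /// Emits whatever is still pending (call once at end of input).
    fn flush(&mut self) -> Option<Block> {
        if self.pending.is_empty() {
            return None;
        }
        let merged = concat_blocks(&self.pending);
        self.pending.clear();
        self.pending_rows = 0;
        self.pending_bytes = 0;
        Some(merged)
    }
}
```

Keeping running totals next to the pending list is what lets the sketch call `memory_size()` once per block and run `concat_blocks` once per emitted block instead of once per incoming block.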

Closes #8311


@youngsofun youngsofun requested review from zhyass and dantengsky and removed request for zhyass November 5, 2022 03:15
@mergify mergify bot added the pr-feature this PR introduces a new feature to the codebase label Nov 5, 2022
@youngsofun
Member Author

This also makes it easier if we later want COPY to commit by file,
because we can maintain the blocks for all files being read in one map.

@BohuTANG
Member

BohuTANG commented Nov 5, 2022

Some issues with this PR need to be addressed:

Following failed statements:
---------------------------------------------
Runner: mysql
ErrorType: statement ok execute with exception
Message: Failed to execute. Collected info: 1105 (HY000): Code: 1068, displayText = Cannot join handle from context's runtime, cause: task 14675 panicked.

Parsed Statement
    at_line: 22,
    s_type: Statement: ok, type: None,
    suite_name: base/03_dml/03_0028_copy_into_stage,
    text:
        copy into test_table from @test;
    results: [],
    runs_on: {'mysql', 'clickhouse', 'http'},
---------------------------------------------
Runner: mysql
ErrorType: statement ok execute with exception
Message: Failed to execute. Collected info: 1105 (HY000): Code: 1068, displayText = Cannot join handle from context's runtime, cause: task 15186 panicked.

https://github.com/datafuselabs/databend/actions/runs/3398563740/jobs/5651782219#step:4:2092

Server log:

 2022-11-05T06:45:25.930162Z ERROR common_tracing::panic_hook: panicked at 'not implemented', /root/github/databend/src/query/catalog/src/table.rs:236:9, backtrace: Backtrace [{ fn: "common_tracing::panic_hook::log_panic", file: "/root/github/databend/src/common/tracing/src/panic_hook.rs", line: 33 }, { fn: "<alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/alloc/src/boxed.rs", line: 2001 }, { fn: "std::panicking::rust_panic_with_hook", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/panicking.rs", line: 692 }, { fn: "std::panicking::begin_panic_handler::{{closure}}", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/panicking.rs", line: 577 }, { fn: "std::sys_common::backtrace::__rust_end_short_backtrace", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/sys_common/backtrace.rs", line: 137 }, { fn: "rust_begin_unwind", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/panicking.rs", line: 575 }, { fn: "core::panicking::panic_fmt", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/core/src/panicking.rs", line: 65 }, { fn: "core::panicking::panic", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/core/src/panicking.rs", line: 115 }, { fn: "common_catalog::table::Table::set_block_compact_thresholds", file: "/root/github/databend/src/query/catalog/src/table.rs", line: 236 }, { fn: "databend_query::interpreters::interpreter_copy_v2::CopyInterpreterV2::copy_files_to_table::{{closure}}::{{closure}}", file: "/root/github/databend/src/query/service/src/interpreters/interpreter_copy_v2.rs", line: 298 }, { fn: "<core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/core/src/future/mod.rs", line: 91 }, { fn: 
"databend_query::interpreters::interpreter_copy_v2::CopyInterpreterV2::copy_files_to_table::{{closure}}", file: "/root/github/databend/src/query/service/src/interpreters/interpreter_copy_v2.rs", line: 274 }, { fn: "<core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/core/src/future/mod.rs", line: 91 }, { fn: "<databend_query::interpreters::interpreter_copy_v2::CopyInterpreterV2 as databend_query::interpreters::interpreter::Interpreter>::execute2::{{closure}}::{{closure}}", file: "/root/github/databend/src/query/service/src/interpreters/interpreter_copy_v2.rs", line: 495 }, { fn: "<core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/core/src/future/mod.rs", line: 91 }, { fn: "<databend_query::interpreters::interpreter_copy_v2::CopyInterpreterV2 as databend_query::interpreters::interpreter::Interpreter>::execute2::{{closure}}", file: "/root/github/databend/src/query/service/src/interpreters/interpreter_copy_v2.rs", line: 421 }, { fn: "<core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/core/src/future/mod.rs", line: 91 }, { fn: "<core::pin::Pin<P> as core::future::future::Future>::poll", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/core/src/future/future.rs", line: 124 }, { fn: "databend_query::interpreters::interpreter::Interpreter::execute::{{closure}}", file: "/root/github/databend/src/query/service/src/interpreters/interpreter.rs", line: 54 }, { fn: "<core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/core/src/future/mod.rs", line: 91 }, { fn: "<core::pin::Pin<P> as core::future::future::Future>::poll", file: 
"/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/core/src/future/future.rs", line: 124 }, { fn: "databend_query::servers::mysql::mysql_interactive_worker::InteractiveWorkerBase<W>::exec_query::{{closure}}::{{closure}}::{{closure}}", file: "/root/github/databend/src/query/service/src/servers/mysql/mysql_interactive_worker.rs", line: 383 }, { fn: "<core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/core/src/future/mod.rs", line: 91 }, { fn: "<tracing::instrument::Instrumented<T> as core::future::future::Future>::poll", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tracing-0.1.37/src/instrument.rs", line: 272 }, { fn: "tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/core.rs", line: 184 }, { fn: "tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/loom/std/unsafe_cell.rs", line: 14 }, { fn: "tokio::runtime::task::core::CoreStage<T>::poll", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/core.rs", line: 174 }, { fn: "tokio::runtime::task::harness::poll_future::{{closure}}", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/harness.rs", line: 480 }, { fn: "<core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/core/src/panic/unwind_safe.rs", line: 271 }, { fn: "std::panicking::try::do_call", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/panicking.rs", line: 483 }, { fn: "std::panicking::try", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/panicking.rs", line: 447 }, { fn: "std::panic::catch_unwind", file: 
"/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/panic.rs", line: 137 }, { fn: "tokio::runtime::task::harness::poll_future", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/harness.rs", line: 468 }, { fn: "tokio::runtime::task::harness::Harness<T,S>::poll_inner", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/harness.rs", line: 104 }, { fn: "tokio::runtime::task::harness::Harness<T,S>::poll", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/harness.rs", line: 57 }, { fn: "tokio::runtime::task::raw::RawTask::poll", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/raw.rs", line: 134 }, { fn: "tokio::runtime::task::LocalNotified<S>::run", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/mod.rs", line: 385 }, { fn: "tokio::runtime::scheduler::multi_thread::worker::Context::run_task::{{closure}}", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/scheduler/multi_thread/worker.rs", line: 421 }, { fn: "tokio::coop::with_budget::{{closure}}", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/coop.rs", line: 102 }, { fn: "std::thread::local::LocalKey<T>::try_with", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/thread/local.rs", line: 446 }, { fn: "std::thread::local::LocalKey<T>::with", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/thread/local.rs", line: 422 }, { fn: "tokio::coop::with_budget", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/coop.rs", line: 95 }, { fn: "tokio::coop::budget", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/coop.rs", line: 72 }, { fn: "tokio::runtime::scheduler::multi_thread::worker::Context::run_task", file: 
"/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/scheduler/multi_thread/worker.rs", line: 420 }, { fn: "tokio::runtime::scheduler::multi_thread::worker::Context::run", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/scheduler/multi_thread/worker.rs", line: 387 }, { fn: "tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/scheduler/multi_thread/worker.rs", line: 372 }, { fn: "tokio::macros::scoped_tls::ScopedKey<T>::set", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/macros/scoped_tls.rs", line: 61 }, { fn: "tokio::runtime::scheduler::multi_thread::worker::run", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/scheduler/multi_thread/worker.rs", line: 369 }, { fn: "tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{{closure}}", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/scheduler/multi_thread/worker.rs", line: 348 }, { fn: "<tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/blocking/task.rs", line: 42 }, { fn: "tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/core.rs", line: 184 }, { fn: "tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/loom/std/unsafe_cell.rs", line: 14 }, { fn: "tokio::runtime::task::core::CoreStage<T>::poll", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/core.rs", line: 174 }, { fn: "tokio::runtime::task::harness::poll_future::{{closure}}", file: 
"/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/harness.rs", line: 480 }, { fn: "<core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/core/src/panic/unwind_safe.rs", line: 271 }, { fn: "std::panicking::try::do_call", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/panicking.rs", line: 483 }, { fn: "std::panicking::try", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/panicking.rs", line: 447 }, { fn: "std::panic::catch_unwind", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/panic.rs", line: 137 }, { fn: "tokio::runtime::task::harness::poll_future", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/harness.rs", line: 468 }, { fn: "tokio::runtime::task::harness::Harness<T,S>::poll_inner", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/harness.rs", line: 104 }, { fn: "tokio::runtime::task::harness::Harness<T,S>::poll", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/harness.rs", line: 57 }, { fn: "tokio::runtime::task::raw::poll", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/raw.rs", line: 194 }, { fn: "tokio::runtime::task::raw::RawTask::poll", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/raw.rs", line: 134 }, { fn: "tokio::runtime::task::UnownedTask<S>::run", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/mod.rs", line: 422 }, { fn: "tokio::runtime::blocking::pool::Task::run", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/blocking/pool.rs", line: 111 }, { fn: "tokio::runtime::blocking::pool::Inner::run", file: 
"/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/blocking/pool.rs", line: 346 }, { fn: "tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}", file: "/root/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.21.2/src/runtime/blocking/pool.rs", line: 321 }, { fn: "std::sys_common::backtrace::__rust_begin_short_backtrace", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/sys_common/backtrace.rs", line: 121 }, { fn: "std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/thread/mod.rs", line: 551 }, { fn: "<core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/core/src/panic/unwind_safe.rs", line: 271 }, { fn: "std::panicking::try::do_call", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/panicking.rs", line: 483 }, { fn: "std::panicking::try", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/panicking.rs", line: 447 }, { fn: "std::panic::catch_unwind", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/panic.rs", line: 137 }, { fn: "std::thread::Builder::spawn_unchecked_::{{closure}}", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/thread/mod.rs", line: 550 }, { fn: "core::ops::function::FnOnce::call_once{{vtable.shim}}", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/core/src/ops/function.rs", line: 251 }, { fn: "<alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/alloc/src/boxed.rs", line: 1987 }, { fn: "<alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/alloc/src/boxed.rs", line: 1987 }, { fn: 
"std::sys::unix::thread::Thread::new::thread_start", file: "/rustc/11ebe6512b4c77633c59f8dcdd421df3b79d1a9f/library/std/src/sys/unix/thread.rs", line: 108 }, { fn: "start_thread", file: "./nptl/./nptl/pthread_create.c", line: 442 }, { fn: "clone3", file: "./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S", line: 81 }], panic.file: "/root/github/databend/src/query/catalog/src/table.rs", panic.line: 236, panic.column: 9
    at src/common/tracing/src/panic_hook.rs:36
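
The backtrace above ends in `Table::set_block_compact_thresholds` with the message 'not implemented', which is what `unimplemented!()` prints. A minimal sketch of that failure mode, assuming the trait method has a panicking default body that some table types do not override; all names and fields below are hypothetical, not the Databend definitions.

```rust
/// Hypothetical thresholds type; the field names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct Thresholds {
    max_rows: usize,
    min_rows: usize,
}

/// Hypothetical table trait with default method bodies.
trait Table {
    /// Panicking default: any table type that does not override this
    /// method panics with "not implemented" when it is called.
    fn set_thresholds(&mut self, _t: Thresholds) {
        unimplemented!()
    }

    /// Non-panicking default: a getter that falls back to a default value
    /// works for every table type, overridden or not.
    fn thresholds(&self) -> Thresholds {
        Thresholds::default()
    }
}

/// A table type that overrides both methods.
struct OverridingTable {
    thresholds: Thresholds,
}

impl Table for OverridingTable {
    fn set_thresholds(&mut self, t: Thresholds) {
        self.thresholds = t;
    }

    fn thresholds(&self) -> Thresholds {
        self.thresholds
    }
}

/// A table type that relies on the defaults.
struct PlainTable;

impl Table for PlainTable {}
```

Calling `set_thresholds` on `PlainTable` panics, while `thresholds()` is safe on both types; reading the value from the table rather than pushing it into the table avoids the panicking path in this sketch.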

@youngsofun
Member Author

The effect is demonstrated in the test: e0dee7f

@BohuTANG
Member

BohuTANG commented Nov 5, 2022

From my COPY test, this PR works well 👍:
[image]

@BohuTANG BohuTANG merged commit 7eb407f into databendlabs:main Nov 5, 2022
@youngsofun youngsofun mentioned this pull request Nov 8, 2022
58 tasks
@youngsofun youngsofun deleted the compact branch November 16, 2022 11:48
Labels
pr-feature this PR introduces a new feature to the codebase
Development

Successfully merging this pull request may close these issues.

further compact blocks when insert/load data with many threads
2 participants