# Allow checkpoints to run while other connections are reading, and no longer block new connections while checkpointing (#11918, merged)
Nightly test failures are unrelated and should be picked up separately from this PR.

@Mytherin thanks a lot for your work! This is great news! :)
The github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request on May 6, 2024: "Merge pull request duckdb/duckdb#11918 from Mytherin/morefinegrainedcheckpointing". This was referenced on May 14, 2024.
Partially fixes #9150
This PR reworks the way that locking around checkpointing works. Previously, when checkpointing, we required that the checkpointing thread was the only active thread. As a result, automatic checkpoints were blocked when other connections were querying the database, and `CHECKPOINT` would throw an exception. While checkpointing, new connections could not start transactions and would block until the checkpoint was completed. This introduced a number of issues: while a checkpoint was running, the `transaction_lock` was held, meaning new connections could not connect/start transactions until those operations were finished. As a result, writers could introduce significant latency for readers.

After this PR, checkpointing uses more granular locking. Automatic checkpoints can now run under the following conditions:
- If there are updated rows (through an `UPDATE` statement) or dropped catalog entries (using the `DROP` statement), automatic checkpointing is only possible if there are no active read transactions
- If there are deleted rows (through a `DELETE` statement), vacuuming can only be performed if there are no active read transactions

### Checkpoint Lock
After these changes, we can either have:

- a single active checkpoint, or
- `n` active writing transactions

This is enforced using a shared/exclusive lock in the `DuckTransactionManager` called the `checkpoint_lock`. Writing threads grab a shared checkpoint lock, and checkpoints grab an exclusive checkpoint lock. This locking also means that, while a checkpoint is running, new writers are blocked until the checkpoint is finished.

Transactions are started as read transactions, and are upgraded to write transactions when they attempt to modify the database (through e.g. inserting data, updating, deleting, or performing an `ALTER` statement). There is a new callback added to transactions that is called when this happens; for the `DuckTransaction`, this callback grabs a shared checkpoint lock.

### Table Locks
Read transactions are not blocked by checkpointing - they can be started, committed, and can read data from the system while a checkpoint is running. However, since checkpoints restructure table data, it is not safe to read table data while a checkpoint is running. To prevent this from causing problems, this PR introduces table-level locks. Similar to the checkpoint locks, these locks are shared/exclusive locks. Threads that read from a table grab a shared lock, while the checkpoint grabs exclusive locks over tables while it is checkpointing.
This means that read transactions can be blocked by the checkpoint process when reading table data. In the future we plan to make these locks more granular instead of table-level.
### Partial Blocks
When doing a checkpoint, we gather small writes and co-locate them on the same block. This is particularly useful when writing small tables, as without the partial block manager we would need to dedicate one block per column.
Prior to this PR, we would also co-locate multiple tables on the same block, as we would flush the partial block manager once at the end of a checkpoint.
This is problematic for concurrent checkpointing, however. The way the partial block manager works is that it gathers (small) writes, and writes them out to a block when a block is filled. This means that a write to Column A can trigger a write to other columns. When working with table-level locks, this does not work nicely, since it would mean we must keep a lock on all tables for the entire duration of the checkpoint. This could even cause deadlocks, as the checkpoint grabs exclusive locks to multiple tables, while reading transactions can grab shared locks to multiple tables.
To solve this problem, in this PR we modify the partial block manager to operate on the table level only. As a result, partial blocks will never be shared across tables. During checkpoints, we then only need to hold locks on individual tables, and can release locks on tables after we are done processing them.
### Sequences

This PR includes a minor rework of sequences. Previously, sequences could be used by read-only transactions, and were handled outside of the regular `UndoBuffer` infrastructure. When sequences were used in read-only databases, the sequences would increment in value, but that increment would not be stored on disk.

This behavior is reworked in this PR: calling `nextval` on a sequence is now a write operation on the database the sequence exists in. This change is required because otherwise read-only transactions would need to e.g. append to the write-ahead log, which causes conflicts with checkpointing. In addition, another restriction is added in this PR: `nextval` can now only be called with a constant parameter (e.g. `nextval('seq')`); calling it with a non-constant parameter is no longer supported. That is because allowing arbitrary sequences to be referenced at run-time causes a lot of headaches: e.g. we can now have multiple threads marking transactions as write transactions concurrently, and we no longer know up-front when binding whether a query is going to be read-only or not (does this refer to a temporary sequence or a persistent one?).
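As an illustrative example (not taken from the PR itself; the column and table names below are made up), the constant form remains valid while a non-constant parameter is now rejected:

```sql
CREATE SEQUENCE seq;

-- allowed: constant parameter, known at bind time
SELECT nextval('seq');

-- no longer supported: the sequence to use is only known at run-time
SELECT nextval(seq_name) FROM my_sequences;
```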
### sqllogictest skipif/onlyif

This PR also extends the `skipif`/`onlyif` decorators in the sqllogictest to operate on loop identifiers. This allows us to write tests where e.g. certain threads perform one task, while other threads perform another. The syntax is `onlyif X=0` or `skipif X=0` prior to a `statement` or `query`. For example, we can run `concurrentloop` and then filter on `threadid`, so that we get one thread that is appending data while the other threads are reading.
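A sketch of what such a test could look like (the table name, loop bounds, and statements are illustrative, not the PR's actual test file):

```
concurrentloop threadid 0 10

onlyif threadid=0
statement ok
INSERT INTO integers SELECT * FROM range(1000)

skipif threadid=0
statement ok
SELECT * FROM integers

endloop
```

Thread 0 appends while threads 1 through 9 read concurrently, exercising the new table-level locking.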
### Experiments

Below is an experiment that shows the effects of this PR by appending data to the `lineitem` table, while another connection is continuously running queries (TPC-H Q1 in this instance).

#### Generate Data
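The exact script is not reproduced here; one way to generate TPC-H data in DuckDB is via the `tpch` extension (the scale factor below is illustrative):

```sql
INSTALL tpch;
LOAD tpch;
CALL dbgen(sf = 1);
```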
#### Run Queries
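Again as a sketch of the setup rather than the PR's exact script: one connection repeatedly runs Q1 via the `tpch` extension, while a second connection appends to `lineitem`:

```sql
-- connection 1 (reader): run TPC-H Q1 repeatedly
PRAGMA tpch(1);

-- connection 2 (writer): append data to lineitem (illustrative batch size)
INSERT INTO lineitem SELECT * FROM lineitem LIMIT 100000;
```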
#### v0.10.2

#### New
We can see that in the new version, because the reading thread no longer blocks the optimistic writing/automatic checkpointing, no data is ever written to the WAL. Instead, data keeps being written directly into the database file, and the WAL is never utilized in this scenario.