[#23045] DocDB: Support for table locks using tserver local object lock manager

Summary:
To achieve table locking, the high-level proposal was as follows:
1. DMLs would acquire table/object locks at the local tserver. These need not be persisted, since we don't need them after a restart/crash (all transactions hosted by the query layer at that tserver would be aborted).
2. DDLs would acquire table/object locks at all the tservers, and the master leader would be responsible for re-acquiring these locks after a tserver starts up.
... other details not relevant for the current diff.

The earlier idea was to reuse the whole conflict resolution path (acquire shared in-memory locks + scan intent/regular db for conflicts + wait-queue ...) by creating a tserver-local virtual transaction participant that maintains just one rocksdb instance for tracking the intents (the suggestion was to tweak a few flags so that this would mostly be in-memory). It is important to note that the concept of a transaction id is deeply rooted in all of this: waiter resumption and similar steps depend on txn status requests and on the intent cleanup workflow triggered by the transaction abort codepath.

Today, we don't start a distributed transaction until necessary (the first read with explicit locks, or a write in the case of non-serializable isolation levels, starts the distributed txn). But table/object lock requests could all be made before this point. So the idea was to use some sort of unique session id for acquiring the object locks. Plugging the concept of a session id into the existing conflict resolution codepath doesn't go smoothly, since a lot of the handling is based on the "existence" of a transaction. Also, note that there is significant performance degradation for read-only transactions if we always start a distributed transaction as early as possible, so we definitely need to go with the session id as a proxy.

In this diff, we try to reuse just the shared lock manager piece of conflict resolution to implement the table locking feature.

`SharedLockManager` also does conflict resolution (it checks for conflicts during lock acquisition), but the scope of a lock is restricted to the scope of the write rpc request itself, using `LockBatch`.

This diff introduces the following:
- templates the `SharedLockManager::Impl` code to work with different key types as opposed to `RefCntPrefix` alone, and puts it all under `LockManagerImpl`
- adds support for storing the acquired/waiting locks in memory at the `ObjectLockManager`, so that the scope of a lock is no longer restricted to the lifetime of the lock rpc request. For the core write path, we continue to restrict the scope of the in-memory locks to the scope of the rpc request.
- adds support for instrumenting locks at session id granularity, which allows unlocking by providing a session id and, optionally, specific keys to unlock. Again, for the core write path, we don't instrument the in-memory locks.
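
The session-granularity bookkeeping from the last bullet can be pictured with a small standalone sketch. This is not the `ObjectLockManager` code from this diff: `SessionLockTable`, `SessionId`, and `ObjectId` are hypothetical names, and conflict checking and waiting are deliberately left out. It only illustrates locks that outlive a single request and can be released per session, optionally for one specific key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical types, for illustration only.
using SessionId = uint64_t;
using ObjectId = std::string;

class SessionLockTable {
 public:
  // Records that 'session' holds a lock on 'object'. Conflict checks are omitted.
  void Acquire(SessionId session, const ObjectId& object) {
    held_[session].insert(object);
  }

  // Releases one object if 'object' is given, otherwise everything held by the session.
  // Returns the number of locks released.
  size_t Release(SessionId session, const std::optional<ObjectId>& object = std::nullopt) {
    auto it = held_.find(session);
    if (it == held_.end()) {
      return 0;
    }
    size_t released = 0;
    if (object) {
      released = it->second.erase(*object);
    } else {
      released = it->second.size();
      it->second.clear();
    }
    if (it->second.empty()) {
      held_.erase(it);
    }
    return released;
  }

  // Number of objects currently recorded as locked by 'session'.
  size_t NumHeld(SessionId session) const {
    auto it = held_.find(session);
    return it == held_.end() ? 0 : it->second.size();
  }

 private:
  std::unordered_map<SessionId, std::unordered_set<ObjectId>> held_;
};
```

Calling `Release(session)` without a key drops everything the session holds, while `Release(session, key)` drops only that entry.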

With these enhancements to `SharedLockManager::Impl` (now `LockManagerImpl`), the diff additionally creates a `TSLocalLockManager`, maintained locally on every tserver, that serves object lock/unlock requests. It also imposes the restriction that at most one lock/unlock request for any given session id can be in progress at the `TSLocalLockManager` at a time. This is required to prevent invalid updates to the `LockState` of an object: an unlock should only be processed after its lock, otherwise the unlock would be a no-op and the lock processed after it would leave the lock state corrupt until process restart.
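
A minimal sketch of the "at most one in-progress request per session id" rule, assuming it is enforced with a mutex-guarded set of busy session ids. `SessionRequestGate` and `ScopedSessionRequest` are hypothetical names and are not taken from `TSLocalLockManager`. Whether the real implementation rejects a second concurrent request or makes it wait is not stated here; this sketch simply reports that the session is busy.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_set>

using SessionId = uint64_t;  // Hypothetical type, for illustration only.

class SessionRequestGate {
 public:
  // Marks 'session' as busy. Returns false if another request for it is already in progress.
  bool TryEnter(SessionId session) {
    std::lock_guard<std::mutex> guard(mutex_);
    return in_progress_.insert(session).second;
  }

  // Marks the request for 'session' as finished.
  void Exit(SessionId session) {
    std::lock_guard<std::mutex> guard(mutex_);
    in_progress_.erase(session);
  }

 private:
  std::mutex mutex_;
  std::unordered_set<SessionId> in_progress_;
};

// RAII helper so that the session is always released when request processing ends.
class ScopedSessionRequest {
 public:
  ScopedSessionRequest(SessionRequestGate* gate, SessionId session)
      : gate_(gate), session_(session), entered_(gate->TryEnter(session)) {}

  ~ScopedSessionRequest() {
    if (entered_) {
      gate_->Exit(session_);
    }
  }

  ScopedSessionRequest(const ScopedSessionRequest&) = delete;
  ScopedSessionRequest& operator=(const ScopedSessionRequest&) = delete;

  // True if this request is the one currently allowed to run for the session.
  bool entered() const { return entered_; }

 private:
  SessionRequestGate* gate_;
  SessionId session_;
  bool entered_;
};
```

A request handler would construct a `ScopedSessionRequest` first and only touch the lock state when `entered()` is true, so a lock and an unlock for the same session can never be processed at the same time.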

**Upgrade/Downgrade safety**
Introduced new proto messages and new rpc service methods. There is no production usage of these yet; everything is guarded by the test flag `enable_object_locking_for_table_locks`, which is false by default. The only usage for now is from `yb-ts-cli`.

Note: The `TSLocalLockManager` does not preserve the locks across tserver crashes/restarts, and that is the desired behavior for table locks. Also, there aren't any leadership concerns since every tserver has its own instance of the `TSLocalLockManager`.
Jira: DB-11977

Test Plan:
./yb_build.sh --cxx-test tserver_ts_local_lock_manager-test
./yb_build.sh --cxx-test='TEST_F(DocDBTableLocksConflictMatrixTest, TableConflictMatrix) {'

Also added lock/release APIs as part of `yb-ts-cli`. To test manually, spin up a cluster using the following:
```
./bin/yb-ctl create --rf=1 --data_dir ~/yugabyte-data --tserver_flags 'TEST_enable_object_locking_for_table_locks=true'
```

Lock/unlock requests can then be issued using the following:
```
yb-ts-cli acquire_object_lock <session id> <object id> <lock mode>
yb-ts-cli release_object_lock <session id> <object id>
yb-ts-cli release_all_locks_for_session <session id>
```
where the lock mode is one of the following:
```
ACCESS_SHARE
ROW_SHARE
ROW_EXCLUSIVE
SHARE_UPDATE_EXCLUSIVE
SHARE
SHARE_ROW_EXCLUSIVE
EXCLUSIVE
ACCESS_EXCLUSIVE
```

Reviewers: amitanand, myang, sergei, rthallam

Reviewed By: sergei

Subscribers: patnaik.balivada, esheng, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D35822
basavaraj29 committed Jul 30, 2024
1 parent 86c4e4c commit a03ccda
Showing 43 changed files with 1,687 additions and 258 deletions.
18 changes: 18 additions & 0 deletions src/yb/common/transaction.proto
@@ -79,6 +79,24 @@ enum RowMarkType {
  ROW_MARK_ABSENT = 15;
}

// This enum matches enum table locks defined in src/include/storage/lockdefs.h.
// Table level lock conflict matrix.
// Source: https://www.postgresql.org/docs/15/explicit-locking.html#LOCKING-TABLES
enum TableLockType {
  // Though NONE is listed as a lock type, it seems to be a flag conveying "don't get a
  // lock". Despite that, if a lock request with NONE type makes it to DocDB, we return
  // an error status.
  NONE = 0;
  ACCESS_SHARE = 1;
  ROW_SHARE = 2;
  ROW_EXCLUSIVE = 3;
  SHARE_UPDATE_EXCLUSIVE = 4;
  SHARE = 5;
  SHARE_ROW_EXCLUSIVE = 6;
  EXCLUSIVE = 7;
  ACCESS_EXCLUSIVE = 8;
}

message SubtxnSetPB {
  // This is not a simple set representation, but rather the encoded output of a
  // yb::UnsignedIntSet<uint32_t>.
1 change: 1 addition & 0 deletions src/yb/consensus/CMakeLists.txt
@@ -141,6 +141,7 @@ add_dependencies(consensus gen_src_yb_rpc_any_proto)
add_dependencies(consensus cdc_service_proto)
add_dependencies(consensus gen_src_yb_master_master_client_proto)
add_dependencies(consensus gen_src_yb_master_master_types_proto)
add_dependencies(consensus gen_src_yb_tserver_tserver_service_proto)

target_link_libraries(consensus
  consensus_error
1 change: 0 additions & 1 deletion src/yb/docdb/doc_reader.cc
@@ -30,7 +30,6 @@
 #include "yb/docdb/docdb_rocksdb_util.h"
 #include "yb/docdb/intent_aware_iterator.h"
 #include "yb/docdb/read_operation_data.h"
-#include "yb/docdb/shared_lock_manager_fwd.h"

 #include "yb/dockv/doc_key.h"
 #include "yb/dockv/doc_ttl_util.h"
1 change: 0 additions & 1 deletion src/yb/docdb/docdb-internal.cc
@@ -16,7 +16,6 @@
 #include "yb/common/transaction.h"

 #include "yb/docdb/docdb_fwd.h"
-#include "yb/docdb/shared_lock_manager_fwd.h"
 #include "yb/dockv/value_type.h"

 namespace yb {
74 changes: 74 additions & 0 deletions src/yb/docdb/docdb-test.cc
@@ -13,13 +13,65 @@

#include "yb/docdb/docdb-test.h"

#include "yb/common/transaction.h"

#include "yb/dockv/reader_projection.h"

#include "yb/util/minmax.h"

namespace yb {
namespace docdb {

namespace {

std::set<std::pair<TableLockType, TableLockType>> MakeTableLockConflicts() {
  // Populate the table lock conflict matrix by inserting lock type pairs that conflict.
  // For {a, b} in the below 'min_conflicts_with', 'a' conflicts with all lock types >= 'b', with
  // the exception of 'SHARE' (it doesn't conflict with self), which is explicitly removed later.
  static const std::unordered_map<TableLockType, TableLockType> min_conflicts_with = {
    {ACCESS_SHARE, ACCESS_EXCLUSIVE},
    {ROW_SHARE, EXCLUSIVE},
    {ROW_EXCLUSIVE, SHARE},
    {SHARE_UPDATE_EXCLUSIVE, SHARE_UPDATE_EXCLUSIVE},
    {SHARE, ROW_EXCLUSIVE},
    {SHARE_ROW_EXCLUSIVE, ROW_EXCLUSIVE},
    {EXCLUSIVE, ROW_SHARE},
    {ACCESS_EXCLUSIVE, ACCESS_SHARE}
  };

  static std::set<std::pair<TableLockType, TableLockType>> conflicts;
  for (auto l1 = TableLockType_MIN + 1; l1 <= TableLockType_MAX; l1++) {
    auto it = min_conflicts_with.find(TableLockType(l1));
    CHECK(it != min_conflicts_with.end());
    for (auto l2 = it->second; l2 <= TableLockType_MAX; l2 = static_cast<TableLockType>(l2 + 1)) {
      conflicts.insert({TableLockType(l1), TableLockType(l2)});
    }
  }
  conflicts.erase({SHARE, SHARE});
  return conflicts;
}

} // namespace

Result<bool> DocDBTableLocksConflictMatrixTest::ObjectLocksConflict(
    const std::vector<std::pair<KeyEntryType, dockv::IntentTypeSet>>& lhs,
    const std::vector<std::pair<KeyEntryType, dockv::IntentTypeSet>>& rhs) {
  for (const auto& [lhs_type, lhs_intents] : lhs) {
    bool found_entry_with_type = false;
    for (const auto& [rhs_type, rhs_intents] : rhs) {
      if (lhs_type != rhs_type) {
        continue;
      }
      SCHECK(!found_entry_with_type, IllegalState, "Found $0 more than once in $1", rhs_type, rhs);
      found_entry_with_type = true;
      if (IntentTypeSetsConflict(lhs_intents, rhs_intents)) {
        return true;
      }
    }
  }
  return false;
}

// This test confirms that we return the appropriate value for doc_found in the case that the last
// projection we look at is not present. Previously we had a bug where we would set doc_found to
// true if the last projection was present, and false otherwise, regardless of other projections
@@ -240,6 +292,28 @@ TEST_F(DocDBTestQl, YsqlSystemTableTombstoneCompaction) {
ASSERT_RESULT(Uuid::FromString("66666666-7777-8888-9999-000000000000")));
}

// YB associates a list of <KeyEntryType, IntentTypeSet> to each table lock type such that
// the table lock conflict matrix of postgres is achieved. The below test asserts the same.
//
// Pg conflict matrix - https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-TABLES
TEST_F(DocDBTableLocksConflictMatrixTest, TableConflictMatrix) {
  const std::set<std::pair<TableLockType, TableLockType>> conflicts = MakeTableLockConflicts();

  for (auto l1 = TableLockType_MIN + 1; l1 <= TableLockType_MAX; l1++) {
    auto lock1 = TableLockType(l1);
    const auto& entries1 = GetEntriesForLockType(lock1);
    for (auto l2 = l1; l2 <= TableLockType_MAX; l2++) {
      auto lock2 = TableLockType(l2);
      const auto& entries2 = GetEntriesForLockType(lock2);
      auto has_conflict = ASSERT_RESULT(ObjectLocksConflict(entries1, entries2));
      // The message is only printed on a mismatch, where the expected outcome is the opposite
      // of the observed 'has_conflict'.
      ASSERT_EQ(has_conflict, conflicts.find({lock1, lock2}) != conflicts.end())
          << Format("Expected $0 to $1have conflicted with $2", TableLockType_Name(lock1),
                    has_conflict ? "not " : "", TableLockType_Name(lock2));
      ASSERT_EQ(has_conflict, ASSERT_RESULT(ObjectLocksConflict(entries2, entries1)));
    }
  }
}

class DocDBTestBoundaryValues: public DocDBTestWrapper {
 protected:
  void TestBoundaryValues(size_t flush_rate) {
15 changes: 15 additions & 0 deletions src/yb/docdb/docdb-test.h
@@ -19,6 +19,7 @@
#include "yb/docdb/docdb_test_base.h"
#include "yb/docdb/doc_reader.h"
#include "yb/docdb/doc_reader_redis.h"
#include "yb/docdb/shared_lock_manager.h"

#include "yb/dockv/doc_key.h"

@@ -528,6 +529,20 @@ class DocDBTestQl : public DocDBTest {
  void TestTableTombstoneCompaction(T id);
};

class DocDBTableLocksConflictMatrixTest : public DocDBTest {
 public:
  void GetSubDoc(
      const KeyBytes& subdoc_key, SubDocument* result, bool* found_result,
      const TransactionOperationContext& txn_op_context = TransactionOperationContext(),
      const ReadHybridTime& read_time = ReadHybridTime::Max()) override {
    LOG_WITH_FUNC(FATAL) << "Not implemented";
  }

  static Result<bool> ObjectLocksConflict(
      const std::vector<std::pair<KeyEntryType, dockv::IntentTypeSet>>& lhs,
      const std::vector<std::pair<KeyEntryType, dockv::IntentTypeSet>>& rhs);
};

class DocDBTestRedis : public DocDBTest {
 public:
  void GetSubDoc(
120 changes: 105 additions & 15 deletions src/yb/docdb/docdb.cc
@@ -87,32 +87,34 @@ namespace {

 // key should be valid prefix of doc key, ending with some complete primitive value or group end.
 Status ApplyIntent(
-    RefCntPrefix key, dockv::IntentTypeSet intent_types, LockBatchEntries* keys_locked) {
+    RefCntPrefix key, dockv::IntentTypeSet intent_types,
+    LockBatchEntries<RefCntPrefix>* keys_locked) {
   RSTATUS_DCHECK(!intent_types.None(), InternalError, "Empty intent types is not allowed");
   // Have to strip kGroupEnd from end of key, because when only hash key is specified, we will
   // get two kGroupEnd at end of strong intent.
   RETURN_NOT_OK(dockv::RemoveGroupEndSuffix(&key));
-  keys_locked->push_back({std::move(key), intent_types});
+  keys_locked->push_back(
+      LockBatchEntry<RefCntPrefix> {std::move(key), intent_types});
   return Status::OK();
 }

-struct DetermineKeysToLockResult {
-  LockBatchEntries lock_batch;
-  bool need_read_snapshot;
-
-  std::string ToString() const {
-    return YB_STRUCT_TO_STRING(lock_batch, need_read_snapshot);
-  }
-};
+Status FormSharedLock(
+    ObjectLockPrefix key, dockv::IntentTypeSet intent_types,
+    LockBatchEntries<ObjectLockPrefix>* keys_locked) {
+  SCHECK(!intent_types.None(), InternalError, "Empty intent types is not allowed");
+  keys_locked->push_back(
+      LockBatchEntry<ObjectLockPrefix> {.key = std::move(key), .intent_types = intent_types});
+  return Status::OK();
+}

-Result<DetermineKeysToLockResult> DetermineKeysToLock(
+Result<DetermineKeysToLockResult<RefCntPrefix>> DetermineKeysToLock(
     const std::vector<std::unique_ptr<DocOperation>>& doc_write_ops,
     const ArenaList<LWKeyValuePairPB>& read_pairs,
     IsolationLevel isolation_level,
     RowMarkType row_mark_type,
     bool transactional_table,
     dockv::PartialRangeKeyIntents partial_range_key_intents) {
-  DetermineKeysToLockResult result;
+  DetermineKeysToLockResult<RefCntPrefix> result;
   boost::container::small_vector<RefCntPrefix, 8> doc_paths;
   boost::container::small_vector<size_t, 32> key_prefix_lengths;
   result.need_read_snapshot = false;
@@ -202,7 +204,8 @@ Result<DetermineKeysToLockResult> DetermineKeysToLock(
 // (k3, {kStrongRead, kStrongWrite}),
 // ]
 // Note that only keys which appear in order in keys_locked will be collapsed in this manner.
-void FilterKeysToLock(LockBatchEntries *keys_locked) {
+template <typename T>
+void FilterKeysToLock(LockBatchEntries<T> *keys_locked) {
   if (keys_locked->empty()) {
     return;
   }
@@ -226,7 +229,7 @@ void FilterKeysToLock(LockBatchEntries *keys_locked) {
   keys_locked->erase(w, keys_locked->end());
 }

-} // namespace
+} // namespace

 Result<PrepareDocWriteOperationResult> PrepareDocWriteOperation(
     const std::vector<std::unique_ptr<DocOperation>>& doc_write_ops,
@@ -252,7 +255,7 @@ Result<PrepareDocWriteOperationResult> PrepareDocWriteOperation(
   }
   result.need_read_snapshot = determine_keys_to_lock_result.need_read_snapshot;

-  FilterKeysToLock(&determine_keys_to_lock_result.lock_batch);
+  FilterKeysToLock<RefCntPrefix>(&determine_keys_to_lock_result.lock_batch);
   VLOG_WITH_FUNC(4) << "filtered determine_keys_to_lock_result="
                     << determine_keys_to_lock_result.ToString();
   const MonoTime start_time = (tablet_metrics != nullptr) ? MonoTime::Now() : MonoTime();
@@ -509,5 +512,92 @@ void CombineExternalIntents(
  provider->SetValue(buffer.AsSlice());
}

// We associate a list of <KeyEntryType, IntentTypeSet> to each table lock type such that the
// table lock conflict matrix of postgres is preserved.
//
// For instance, let's consider 'ROW_SHARE' and 'EXCLUSIVE' lock modes.
// 1. 'ROW_SHARE' lock mode on object would lead to the following keys
//    [<object/object hash/other prefix> kWeakObjectLock] [kStrongRead]
// 2. 'EXCLUSIVE' lock mode on the same object would lead to the following keys
//    [<object/object hash/other prefix> kWeakObjectLock] [kWeakWrite]
//    [<object/object hash/other prefix> kStrongObjectLock] [kStrongRead, kStrongWrite]
//
// When checking conflicts for the same key, '[<object/object hash/other prefix> kWeakObjectLock]'
// in this case, we see that the intents requested are [kStrongRead] and [kWeakWrite] for modes
// 'ROW_SHARE' and 'EXCLUSIVE' respectively. And since the above intenttype sets conflict among
// themselves, we successfully detect the conflict.
const std::vector<std::pair<KeyEntryType, dockv::IntentTypeSet>>& GetEntriesForLockType(
    TableLockType lock) {
  static const std::array<
      std::vector<std::pair<KeyEntryType, dockv::IntentTypeSet>>,
      TableLockType_ARRAYSIZE> lock_entries = {{
    // NONE
    {{}},
    // ACCESS_SHARE
    {{
      {KeyEntryType::kWeakObjectLock, dockv::IntentTypeSet {dockv::IntentType::kWeakRead}}
    }},
    // ROW_SHARE
    {{
      {KeyEntryType::kWeakObjectLock, dockv::IntentTypeSet {dockv::IntentType::kStrongRead}}
    }},
    // ROW_EXCLUSIVE
    {{
      {KeyEntryType::kStrongObjectLock, dockv::IntentTypeSet {dockv::IntentType::kWeakRead}}
    }},
    // SHARE_UPDATE_EXCLUSIVE
    {{
      {
        KeyEntryType::kStrongObjectLock,
        dockv::IntentTypeSet {dockv::IntentType::kStrongRead, dockv::IntentType::kWeakWrite}
      }
    }},
    // SHARE
    {{
      {KeyEntryType::kStrongObjectLock, dockv::IntentTypeSet {dockv::IntentType::kStrongWrite}}
    }},
    // SHARE_ROW_EXCLUSIVE
    {{
      {
        KeyEntryType::kStrongObjectLock,
        dockv::IntentTypeSet {dockv::IntentType::kWeakRead, dockv::IntentType::kStrongWrite}
      }
    }},
    // EXCLUSIVE
    {{
      {KeyEntryType::kWeakObjectLock, dockv::IntentTypeSet {dockv::IntentType::kWeakWrite}},
      {
        KeyEntryType::kStrongObjectLock,
        dockv::IntentTypeSet {dockv::IntentType::kStrongRead, dockv::IntentType::kStrongWrite}
      }
    }},
    // ACCESS_EXCLUSIVE
    {{
      {KeyEntryType::kWeakObjectLock, dockv::IntentTypeSet {dockv::IntentType::kStrongWrite}},
      {
        KeyEntryType::kStrongObjectLock,
        dockv::IntentTypeSet {dockv::IntentType::kStrongRead, dockv::IntentType::kStrongWrite}
      }
    }}
  }};
  return lock_entries[lock];
}

// Returns a DetermineKeysToLockResult object with its lock_batch containing a list of entries with
// 'key' as <object id, KeyEntry> and 'intent_types' set.
Result<DetermineKeysToLockResult<ObjectLockPrefix>> DetermineObjectsToLock(
    const google::protobuf::RepeatedPtrField<ObjectLockPB>& objects_to_lock) {
  DetermineKeysToLockResult<ObjectLockPrefix> result;
  for (const auto& object_lock : objects_to_lock) {
    SCHECK(object_lock.has_id(), IllegalState, "Expected non-empty id in ObjectLockPB");
    for (const auto& [lock_key, intent_types] : GetEntriesForLockType(object_lock.lock_type())) {
      ObjectLockPrefix key(object_lock.id(), lock_key);
      RETURN_NOT_OK(FormSharedLock(key, intent_types, &result.lock_batch));
    }
  }
  FilterKeysToLock<ObjectLockPrefix>(&result.lock_batch);
  return result;
}

} // namespace docdb
} // namespace yb
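
The comment on `FilterKeysToLock` in this file notes that only entries sitting next to each other with the same key are collapsed. Most of that function's body is elided in this view, so the following is only a sketch of that behaviour under assumed types: `Entry`, its string key, and its bitmask are hypothetical stand-ins for `LockBatchEntry<T>` and `dockv::IntentTypeSet`, and `CollapseAdjacent` is not a function from the codebase.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a string key and a bitmask of intent types.
struct Entry {
  std::string key;
  uint8_t intent_mask;
};

// Merges each run of adjacent entries that share a key by OR-ing their intent masks.
// Equal keys that are not adjacent are left as separate entries.
void CollapseAdjacent(std::vector<Entry>* entries) {
  if (entries->empty()) {
    return;
  }
  auto w = entries->begin();  // Last entry kept so far.
  for (auto r = w + 1; r != entries->end(); ++r) {
    if (r->key == w->key) {
      w->intent_mask |= r->intent_mask;
    } else {
      ++w;
      if (w != r) {
        *w = std::move(*r);
      }
    }
  }
  entries->erase(w + 1, entries->end());
}
```

With the input `k1, k1, k2, k1, k1`, the two leading `k1` entries merge and the two trailing ones merge, but the leading and trailing runs stay separate because `k2` sits between them.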
32 changes: 31 additions & 1 deletion src/yb/docdb/docdb.h
@@ -26,9 +26,9 @@
 #include "yb/common/transaction.h"

 #include "yb/docdb/docdb_fwd.h"
-#include "yb/docdb/shared_lock_manager_fwd.h"
 #include "yb/dockv/doc_path.h"
 #include "yb/docdb/doc_write_batch.h"
+#include "yb/docdb/docdb.messages.h"
 #include "yb/docdb/docdb.pb.h"
 #include "yb/docdb/docdb_types.h"
 #include "yb/docdb/lock_batch.h"
@@ -157,6 +157,20 @@ Status EnumerateIntents(
    const dockv::EnumerateIntentsCallback& functor,
    dockv::PartialRangeKeyIntents partial_range_key_intents);

// With the exception of table-locks/object-locks, type T below always takes value 'RefCntPrefix'.
// The TSLocalLockManager instantiates an ObjectLockManager that uses LockManagerImpl with
// 'ObjectLockPrefix' type and the relevant locking codepath uses DetermineKeysToLockResult struct
// with type 'ObjectLockPrefix'.
template <typename T>
struct DetermineKeysToLockResult {
  LockBatchEntries<T> lock_batch;
  bool need_read_snapshot;

  std::string ToString() const {
    return YB_STRUCT_TO_STRING(lock_batch, need_read_snapshot);
  }
};

// replicated_batches_state format does not matter at this point, because it is just
// appended to appropriate value.
void PrepareTransactionWriteBatch(
@@ -261,5 +275,21 @@ void CombineExternalIntents(
    SubTransactionId subtransaction_id,
    ExternalIntentsProvider* provider);

// We achieve the same table lock conflict matrix as that of pg documented here,
// https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-TABLES
//
// We only have 4 lock/intent modes kWeak/kStrong Read/Write, but to generate the above conflict
// matrix, we would need more lock types. Instead of introducing additional lock types, we use two
// KeyEntryType values and associate a list of <KeyEntryType, IntentTypeSet> to each table lock.
// Since our conflict detection mechanism checks conflicts against each key, we indirectly achieve
// the exact same conflict matrix. Refer to the comments on the function definition for more
// details.
const std::vector<std::pair<dockv::KeyEntryType, dockv::IntentTypeSet>>&
    GetEntriesForLockType(TableLockType lock);

// Returns DetermineKeysToLockResult<ObjectLockPrefix> which can further be passed to
// ObjectLockManager to acquire locks against the required objects with the given lock type.
Result<DetermineKeysToLockResult<ObjectLockPrefix>> DetermineObjectsToLock(
    const google::protobuf::RepeatedPtrField<ObjectLockPB>& objects_to_lock);

} // namespace docdb
} // namespace yb
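
The header comment on `GetEntriesForLockType` claims that two key types plus four intent modes are enough to reproduce the PostgreSQL table-lock conflict matrix. The standalone sketch below checks that claim for the entries listed in this diff. It rests on one assumption that this diff does not show: that two intents conflict when at least one of them is strong and one is a read while the other is a write. `KeyKind`, `EntriesFor`, `PgConflict`, and the other helpers are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// The four intent modes, as bits of a mask.
enum Intent : unsigned { kWeakRead = 1, kWeakWrite = 2, kStrongRead = 4, kStrongWrite = 8 };
// Stand-ins for KeyEntryType::kWeakObjectLock and KeyEntryType::kStrongObjectLock.
enum class KeyKind { kWeak, kStrong };

using Entries = std::vector<std::pair<KeyKind, unsigned>>;

// ASSUMPTION: intents conflict when at least one is strong and exactly one of them is a write.
bool IntentsConflict(unsigned a, unsigned b) {
  const unsigned strong = kStrongRead | kStrongWrite;
  const unsigned write = kWeakWrite | kStrongWrite;
  const bool any_strong = ((a | b) & strong) != 0;
  return any_strong && ((a & write) != 0) != ((b & write) != 0);
}

// Two intent sets conflict if any pair of their members conflicts.
bool MasksConflict(unsigned lhs, unsigned rhs) {
  for (unsigned a = 1; a <= 8; a <<= 1) {
    for (unsigned b = 1; b <= 8; b <<= 1) {
      if ((lhs & a) && (rhs & b) && IntentsConflict(a, b)) return true;
    }
  }
  return false;
}

// Entries per lock mode, copied from GetEntriesForLockType; 'lock' follows TableLockType.
Entries EntriesFor(int lock) {
  switch (lock) {
    case 1: return {{KeyKind::kWeak, kWeakRead}};                    // ACCESS_SHARE
    case 2: return {{KeyKind::kWeak, kStrongRead}};                  // ROW_SHARE
    case 3: return {{KeyKind::kStrong, kWeakRead}};                  // ROW_EXCLUSIVE
    case 4: return {{KeyKind::kStrong, kStrongRead | kWeakWrite}};   // SHARE_UPDATE_EXCLUSIVE
    case 5: return {{KeyKind::kStrong, kStrongWrite}};               // SHARE
    case 6: return {{KeyKind::kStrong, kWeakRead | kStrongWrite}};   // SHARE_ROW_EXCLUSIVE
    case 7: return {{KeyKind::kWeak, kWeakWrite},                    // EXCLUSIVE
                    {KeyKind::kStrong, kStrongRead | kStrongWrite}};
    case 8: return {{KeyKind::kWeak, kStrongWrite},                  // ACCESS_EXCLUSIVE
                    {KeyKind::kStrong, kStrongRead | kStrongWrite}};
    default: return {};                                              // NONE is not modelled
  }
}

// Two lock modes conflict if their intent sets conflict on any key kind they share.
bool LocksConflict(const Entries& lhs, const Entries& rhs) {
  for (const auto& l : lhs) {
    for (const auto& r : rhs) {
      if (l.first == r.first && MasksConflict(l.second, r.second)) return true;
    }
  }
  return false;
}

// PostgreSQL's matrix in the threshold form used by MakeTableLockConflicts: mode 'a' conflicts
// with every mode >= kMinConflict[a], except that SHARE (5) does not conflict with itself.
bool PgConflict(int a, int b) {
  static const int kMinConflict[9] = {0, 8, 7, 5, 4, 3, 3, 2, 1};
  return !(a == 5 && b == 5) && b >= kMinConflict[a];
}

// Compares the encoding against the expected matrix for all 8 x 8 mode pairs.
bool EncodingMatchesPgMatrix() {
  for (int a = 1; a <= 8; ++a) {
    for (int b = 1; b <= 8; ++b) {
      if (LocksConflict(EntriesFor(a), EntriesFor(b)) != PgConflict(a, b)) return false;
    }
  }
  return true;
}
```

Splitting the weaker modes (`ACCESS_SHARE`, `ROW_SHARE`) onto the weak key and the remaining modes onto the strong key is what keeps, for example, `ROW_SHARE` from conflicting with `SHARE`: the two modes never request intents on the same key. `EXCLUSIVE` and `ACCESS_EXCLUSIVE` take entries on both keys so that they can conflict with modes on either side.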