Introduce Snapshot Isolation OCC to DBTransaction #19

Merged
merged 25 commits into from
Jun 30, 2022

Conversation

CMCDragonkai
Member

@CMCDragonkai CMCDragonkai commented May 24, 2022

Derived from #18

Description

This implements the snapshot isolation DB transaction.

This means DBTransaction will be automatically snapshot isolated, which means most locking will be unnecessary.

Instead, when performing a transaction, there's a chance of an ErrorDBTransactionConflict exception, which means there was a write conflict with another transaction.

Users can then decide at their discretion whether to retry the operation (assuming any non-DB side-effects are idempotent, noops or can be compensated). This should reduce the amount of locking overhead we need to do in Polykey. We may bubble up the conflict exception to the user, so the user can re-run their command, or in some cases, in-code we will automatically perform a retry. The user in this case can be the PK client, another PK agent or the PK GUI.
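A minimal sketch of what an in-code automatic retry could look like. The `ErrorDBTransactionConflict` class below is a local stand-in for the library's exception, and `retryOnConflict` and `maxAttempts` are hypothetical names, not part of js-db. It only makes sense when the retried function has no non-idempotent side-effects.

```typescript
// Local stand-in for the conflict exception described above (assumption).
class ErrorDBTransactionConflict extends Error {}

// Hypothetical helper: re-run `f` when it fails with a write conflict.
// Any other error is rethrown immediately. After `maxAttempts` conflicts
// the last conflict error is rethrown so the caller can bubble it up.
async function retryOnConflict<T>(
  f: () => Promise<T>,
  maxAttempts: number = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await f();
    } catch (e) {
      if (!(e instanceof ErrorDBTransactionConflict)) throw e;
      lastError = e;
    }
  }
  throw lastError;
}
```

The function passed in would create a fresh transaction on every attempt, since a conflicted transaction cannot be reused.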

There is still one situation where user/application locks are needed, and that's where there may be a write-skew. See snapshot isolation https://en.wikipedia.org/wiki/Snapshot_isolation for more details and also https://www.cockroachlabs.com/blog/what-write-skew-looks-like/.

In the future we may upgrade to SSI (serializable snapshot isolation) which will eliminate this write-skew possibility.

Additionally this PR will also enable the keyAsBuffer and valueAsBuffer options on the iterators, enabling easier usage of the iterators without having to use dbUtils.deserialize<T>(value) where it can be configured ahead of time. - already merged

See this https://www.fluentcpp.com/2019/08/30/how-to-disable-a-warning-in-cpp/ as to how to disable warnings in C++ cross platform.

Also see: https://nodejs.github.io/node-addon-examples/special-topics/context-awareness/

Issues Fixed

Tasks

  • [x] 1. Added additional overloads to iterator for keyAsBuffer and valueAsBuffer - already merged in staging
  • 2. Added transaction snapshot iterator and tie iterator lifecycle with the DBTransaction lifecycle
  • 3. Add snapshotLock to ensure mutual exclusion when using the snapshot iterator
  • 4. Implemented getSnapshot as the last-resort getter for DBTransaction.get
  • 5. Change DBTransaction.iterator to use the snapshot iterator
  • [x] 6. Upgraded dependencies to match TypeScript-Demo-Lib (excluding node v16 upgrade) - already merged in staging
  • [x] 7. Add tests for keyAsBuffer and valueAsBuffer usage on the iterator, expect string and V types. Existing tests should still work, since by default the iterator returns buffers. This is different from get, which by default does not return raw buffers. - already merged into staging
  • [ ] 8. Maintain the last updated timestamp for each key being updated in transactions and also in the main DB - relying on rocksdb SI directly now
  • 9. Detect write conflicts and throw ErrorDBTransactionConflict
  • 10. Use the sync option when committing DBTransaction by default, allow it to be set off
  • [ ] 11. See if db_destroy can replace our removal of the db path - no need for this, because we may have more interesting state in the db path
  • [ ] 12. Catch CORRUPTION error code and attempt automatic repair and reopen, or at least provide users the ability to call the command with a special exception for this - Introduce Snapshot Isolation OCC to DBTransaction #19 (comment) - To be addressed later in Catch CORRUPTION error code and attempt automatic repair and reopen, or at least provide users the ability to directly repair #35
  • 13. Added the native code for leveldb and corresponding deps
  • 14. Added TS interfaces for the native module LevelDB
  • 15. Promisified native module into LevelDBP
  • 16. Port over native methods into DB.ts directly
  • [ ] 17. Update the CI/CD to auto-build the prebuilds, and to do auto-release as part of prerelease and release - to be done in staging ci: merge staging to master #38
  • 18. Create DBIterator to maintain async lifecycle of the iterator - Introduce Snapshot Isolation OCC to DBTransaction #19 (comment) (it was a good thing we caught this during benchmarking, a serious performance regression that led us to discover a resource leak)
  • [ ] 19. Use ErrorDBLiveReference to mean that an iterator or transaction object is still alive when db.stop() is called, and therefore it must prevent any stopping; we must therefore maintain a WeakSet of every transaction/iterator object that we create. This can also be used for subdatabases, where you create a prefixed DB in the future.
    • the WeakSet does not support size or length, there's no way to know how many of these objects are still alive
    • instead subobjects will need to maintain reference back to the DB and subtract an allocation counter instead, or just maintain reference to objects and remove them
    • instead we end up using Set then, and destroy removes them from the set
    • no longer uses ErrorDBLiveReference, just auto-closes the same as in C++
  • 20. Create the transaction_* native functions:
    • transactionInit
    • transactionCommit
    • transactionRollback
    • transactionGet
    • transactionGetForUpdate
    • transactionPut
    • transactionDel
    • transactionMultiGet
    • transactionClear
    • transactionIteratorInit
    • transactionSnapshot
    • transactionMultiGetForUpdate
  • 21. Prove that native binding to leveldb works
  • 22. Switched over to rocksdb native bindings entirely
  • 23. Modularized rocksdb C++ code for separate compilation, making it easier to maintain
  • 24. Investigate the releasing of the transaction snapshot if it is used
  • 25. Solve write skew problems with GetForUpdate, this "materializes the conflict" in a write skew so that it becomes a write-write conflict, although rocksdb calls this a read-write conflict. SSI is not provided by rocksdb, it is however available in cockroachdb and badgerdb, but hopefully someone backports that to rocksdb.
  • 26. Provide JS-level locks because rocksdb optimistic transaction does not work with pessimistic locks atm, no lock manager required for now, since if the C++ code eventually provides it, then we would just switch to what C++ provides - https://groups.google.com/g/rocksdb/c/5v64dTxYKEw
  • 27. The DBIterator doesn't seem to work when no level path is specified, it iterates over no records at all, it's possible that our iterator options don't make sense when the data sublevel isn't used.
  • 28. Swap around levelPath and options parameters, because levelPath is far more used than options, this does imply an API break for EFS... etc, but it should be a quick find and replace
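
Task 25 relies on GetForUpdate to turn a write skew into a detectable conflict. The following is a toy in-memory model of that idea, not the rocksdb or js-db API: `ToyDB`, `ToyTransaction` and the on-call scenario are all made up for illustration, and conflict detection is approximated with a per-key commit sequence number.

```typescript
// Toy model of optimistic snapshot isolation. Hypothetical names throughout;
// this is not the rocksdb or js-db API.
class ToyDB {
  data = new Map<string, number>();
  // Commit sequence number at which each key was last written
  lastWrite = new Map<string, number>();
  seq = 0;
}

class ToyTransaction {
  private db: ToyDB;
  private snapshot: Map<string, number>;
  private startSeq: number;
  private writes = new Map<string, number>();
  // Keys that are checked for conflicts at commit time
  private tracked = new Set<string>();

  constructor(db: ToyDB) {
    this.db = db;
    this.snapshot = new Map(db.data); // Reads see the state at start
    this.startSeq = db.seq;
  }

  // Plain read: not tracked, so later writes by others go unnoticed
  get(key: string): number {
    return this.writes.get(key) ?? this.snapshot.get(key) ?? 0;
  }

  // Tracked read: a concurrent write to this key makes commit fail
  getForUpdate(key: string): number {
    this.tracked.add(key);
    return this.get(key);
  }

  put(key: string, value: number): void {
    this.tracked.add(key);
    this.writes.set(key, value);
  }

  commit(): void {
    for (const key of this.tracked) {
      if ((this.db.lastWrite.get(key) ?? 0) > this.startSeq) {
        throw new Error(`conflict on ${key}`);
      }
    }
    this.db.seq++;
    for (const [key, value] of this.writes) {
      this.db.data.set(key, value);
      this.db.lastWrite.set(key, this.db.seq);
    }
  }
}

// Invariant to protect: at least one doctor stays on call.
function goOffCall(
  tran: ToyTransaction,
  me: string,
  other: string,
  forUpdate: boolean,
): void {
  const read = (k: string) => (forUpdate ? tran.getForUpdate(k) : tran.get(k));
  if (read(me) + read(other) >= 2) tran.put(me, 0);
}

function scenario(forUpdate: boolean): { onCall: number; conflicts: number } {
  const db = new ToyDB();
  db.data.set('alice', 1);
  db.data.set('bob', 1);
  const t1 = new ToyTransaction(db);
  const t2 = new ToyTransaction(db);
  goOffCall(t1, 'alice', 'bob', forUpdate);
  goOffCall(t2, 'bob', 'alice', forUpdate);
  let conflicts = 0;
  for (const t of [t1, t2]) {
    try { t.commit(); } catch { conflicts++; }
  }
  const onCall = (db.data.get('alice') ?? 0) + (db.data.get('bob') ?? 0);
  return { onCall, conflicts };
}

const skewed = scenario(false);      // { onCall: 0, conflicts: 0 }
const materialized = scenario(true); // { onCall: 1, conflicts: 1 }
```

With plain reads the two transactions write disjoint keys, so both commit and the invariant is broken. With tracked reads the second commit sees that a key it read was written concurrently and is rejected.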

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@CMCDragonkai
Member Author

Bringing in the new changes from TypeScript-Demo-Lib-Native. But without the application builds because this is a pure library. And also removing deployment jobs.

@CMCDragonkai
Member Author

Ok it's time to finally bring in the leveldb source code and start hacking C++.

@CMCDragonkai
Member Author

CMCDragonkai commented May 25, 2022

I'm also going to solve the problem with key path here and probably prepare it for merging by cherry picking into staging, this can go along with a number of other CI/CD changes too.

@emmacasolin

@CMCDragonkai CMCDragonkai self-assigned this May 25, 2022
@CMCDragonkai CMCDragonkai requested a review from emmacasolin May 25, 2022 08:07
@CMCDragonkai
Member Author

CMCDragonkai commented May 25, 2022

@emmacasolin I'm changing the DBIterator type to be like this and introducing DBIteratorOptions:

/**
 * Iterator options
 * The `keyAsBuffer` property controls
 * whether DBIterator returns KeyPath as buffers or as strings
 * It should be considered to default to true
 * The `valueAsBuffer` property controls value type
 * It should be considered to default to true
 */
type DBIteratorOptions = {
  gt?: KeyPath | Buffer | string;
  gte?: KeyPath | Buffer | string;
  lt?: KeyPath | Buffer | string;
  lte?: KeyPath | Buffer | string;
  limit?: number;
  keys?: boolean;
  values?: boolean;
  keyAsBuffer?: boolean;
  valueAsBuffer?: boolean
};

/**
 * Iterator
 */
type DBIterator<K extends KeyPath | undefined, V> = {
  seek: (k: KeyPath | string | Buffer) => void;
  end: () => Promise<void>;
  next: () => Promise<[K, V] | undefined>;
  [Symbol.asyncIterator]: () => AsyncGenerator<[K, V]>;
};

This means KeyPath now becomes pre-eminent, and anywhere I have KeyPath | Buffer | string, the Buffer or string is interpreted as a singleton KeyPath.

@CMCDragonkai
Member Author

This also allows the key returned by the iterator to be later used by seek or the range options.

This will impact downstream EFS and PK usage though. But find and replace should be sufficient.

@CMCDragonkai
Member Author

I've updated the DB._iterator to use the new DBIteratorOptions and DBIterator. I haven't tested yet, only type checked.

@CMCDragonkai
Member Author

I've also created a utils.toKeyPath function that can be used to easily convert possible keypaths into keypaths. This can be used in our get, put, del functions to convert Buffer and string into KeyPath.

The keyAsBuffer option now controls whether the returned KeyPath is an array of buffers, as it would normally be, or is converted to an array of strings.
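
A rough sketch of the singleton interpretation, assuming `toKeyPath` simply wraps non-array input. The body below is a guess at the behaviour described, not the actual utils.toKeyPath, and `Uint8Array` stands in for Node's `Buffer` (which is a `Uint8Array` subclass) to keep the sketch dependency-free.

```typescript
// Uint8Array stands in for Buffer here (assumption, see above).
type KeyPart = string | Uint8Array;
type KeyPath = Array<KeyPart>;

// Hypothetical reimplementation: a lone string or buffer is treated as a
// key path with exactly one part, while an array is passed through as-is.
function toKeyPath(keyPath: KeyPath | KeyPart): KeyPath {
  return Array.isArray(keyPath) ? keyPath : [keyPath];
}

const single = toKeyPath('foo');          // ['foo']
const nested = toKeyPath(['foo', 'bar']); // ['foo', 'bar']
```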

@tegefaulkes
Contributor

If the problem is double encoding then couldn't this be solved by having clear barriers to where and when the encoding is applied? We need the encoded form internally but the user needs the un-encoded form. Doesn't this mean encoding conversion only needs to happen when we pass to and from the user?

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented May 26, 2022

If the problem is double encoding then couldn't this be solved by having clear barriers to where and when the encoding is applied? We need the encoded form internally but the user needs the un-encoded form. Doesn't this mean encoding conversion only needs to happen when we pass to and from the user?

There are clear barriers. Encoding is only applied when the user passes input into the system. The problem is only with the iterator, because the iterator returns you the "key" that you're supposed to use later for other operations. The solution is then to not encode the key again while inside the iterator, which solves the double-escaping problem.

@CMCDragonkai
Member Author

Something I recently discovered is the effect of the empty key.

That is if you have a KeyPath of [] this becomes [''].

Therefore if you use the empty key, in master I believe you would not be able to see this key when iterating.

To solve this, I had to change the default iterator option from gt to gte.

For example:

    if (options_.gt == null && options_.gte == null) {
      options_.gte = utils.levelPathToKey(levelPath);
    }

This now ensures that the empty key shows up within the level during iteration.
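
To illustrate why gt hides the empty key: if the encoded empty key is byte-for-byte equal to the encoded level prefix, a strictly-greater-than lower bound skips it. The sketch below uses plain strings and a made-up `!data!` prefix; the real encoding in js-db is buffer-based and may differ.

```typescript
// Hypothetical encoding: level prefix followed by the key part.
const levelPrefix = '!data!';
const storedKeys = [
  levelPrefix + '', // the empty key encodes to exactly the prefix
  levelPrefix + 'a',
  levelPrefix + 'b',
].sort();

// Minimal stand-in for an iterator's lower-bound options
function range(
  keys: Array<string>,
  opts: { gt?: string; gte?: string },
): Array<string> {
  return keys.filter(
    (k) =>
      (opts.gt == null || k > opts.gt) && (opts.gte == null || k >= opts.gte),
  );
}

const withGt = range(storedKeys, { gt: levelPrefix });   // misses the empty key
const withGte = range(storedKeys, { gte: levelPrefix }); // includes the empty key
```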

@CMCDragonkai
Member Author

Extra test cases were added for empty keys now, and this actually resolves another bug involving empty keys.

describe('utils', () => {
  const keyPaths: Array<KeyPath> = [
    // Normal keys
    ['foo'],
    ['foo', 'bar'],
    // Empty keys are possible
    [''],
    ['', ''],
    ['foo', ''],
    ['foo', '', ''],
    ['', 'foo', ''],
    ['', '', ''],
    ['', '', 'foo'],
    // Separator can be used in key part
    ['foo', 'bar', Buffer.concat([utils.sep, Buffer.from('key'), utils.sep])],
    [utils.sep],
    [Buffer.concat([utils.sep, Buffer.from('foobar')])],
    [Buffer.concat([Buffer.from('foobar'), utils.sep])],
    [Buffer.concat([utils.sep, Buffer.from('foobar'), utils.sep])],
    [Buffer.concat([utils.sep, Buffer.from('foobar'), utils.sep, Buffer.from('foobar')])],
    [Buffer.concat([Buffer.from('foobar'), utils.sep, Buffer.from('foobar'), utils.sep])],
    // Escape can be used in key part
    [utils.esc],
    [Buffer.concat([utils.esc, Buffer.from('foobar')])],
    [Buffer.concat([Buffer.from('foobar'), utils.esc])],
    [Buffer.concat([utils.esc, Buffer.from('foobar'), utils.esc])],
    [Buffer.concat([utils.esc, Buffer.from('foobar'), utils.esc, Buffer.from('foobar')])],
    [Buffer.concat([Buffer.from('foobar'), utils.esc, Buffer.from('foobar'), utils.esc])],
    // Separator can be used in level parts
    [Buffer.concat([utils.sep, Buffer.from('foobar')]), 'key'],
    [Buffer.concat([Buffer.from('foobar'), utils.sep]), 'key'],
    [Buffer.concat([utils.sep, Buffer.from('foobar'), utils.sep]), 'key'],
    [Buffer.concat([utils.sep, Buffer.from('foobar'), utils.sep, Buffer.from('foobar')]), 'key'],
    [Buffer.concat([Buffer.from('foobar'), utils.sep, Buffer.from('foobar'), utils.sep]), 'key'],
    // Escape can be used in level parts
    [Buffer.concat([utils.sep, utils.esc, utils.sep]), 'key'],
    [Buffer.concat([utils.esc, utils.esc, utils.esc]), 'key'],
  ];
  test.each(keyPaths.map(kP => [kP]))(
    'parse key paths %s',
    (keyPath: KeyPath) => {
      const key = utils.keyPathToKey(keyPath);
      const keyPath_ = utils.parseKey(key);
      expect(keyPath.map((b) => b.toString())).toStrictEqual(
        keyPath_.map((b) => b.toString()),
      );
    }
  );
});
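
For readers unfamiliar with why these cases matter, here is a simplified, string-based sketch of a separator-plus-escape scheme that round-trips empty parts and parts containing the separator or escape character. `SEP`, `ESC`, `encodeKeyPath` and `decodeKey` are hypothetical; the real utils.keyPathToKey and utils.parseKey operate on buffers and use their own layout.

```typescript
// Hypothetical separator and escape characters
const SEP = '!';
const ESC = '\\';

// Escape any SEP or ESC inside a part, then join the parts with SEP
function encodeKeyPath(parts: Array<string>): string {
  return parts
    .map((part) =>
      part
        .split('')
        .map((c) => (c === SEP || c === ESC ? ESC + c : c))
        .join(''),
    )
    .join(SEP);
}

// Scan once: an ESC makes the next character literal, a bare SEP ends a part
function decodeKey(key: string): Array<string> {
  const parts: Array<string> = [];
  let current = '';
  for (let i = 0; i < key.length; i++) {
    const c = key.charAt(i);
    if (c === ESC) {
      current += key.charAt(i + 1);
      i++;
    } else if (c === SEP) {
      parts.push(current);
      current = '';
    } else {
      current += c;
    }
  }
  parts.push(current);
  return parts;
}

const roundTripped = decodeKey(encodeKeyPath(['', 'foo!', '\\bar', '']));
// roundTripped is ['', 'foo!', '\\bar', '']
```

In this toy scheme the empty string decodes to a single empty part, so an empty key path and `['']` are indistinguishable once encoded.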

@CMCDragonkai
Member Author

Needs rebase on top of staging now.

@CMCDragonkai
Member Author

Time to rebase.

@CMCDragonkai CMCDragonkai force-pushed the feature-control branch 3 times, most recently from bd9ca76 to 7b69ba9 Compare May 26, 2022 08:20
@CMCDragonkai
Member Author

CMCDragonkai commented May 30, 2022

I was looking at 2 codebases to understand how to integrate leveldb.

  1. https://github.com/Level/classic-level
  2. https://github.com/Level/rocksdb

It appears that classic-level is still using leveldb 1.20, which is 5 years old. The rocksdb one is a bit more recent.

The leveldb codebase has a bit more of a complicated build process.

The top level binding.gyp includes a lower level gyp file:

    "dependencies": [
      "<(module_root_dir)/deps/leveldb/leveldb.gyp:leveldb"
    ],

The deps/leveldb/leveldb.gyp file contains all the settings to actually compile the leveldb as a shared object.

Note that the binding.cc does import leveldb headers like:

#include <leveldb/db.h>

These headers are not specified by the binding.gyp; it's possible that by specifying the dependencies, the include headers are made available to the top-level target.

The leveldb.gyp also specifies a dependency on snappy:

    "dependencies": [
      "../snappy/snappy.gyp:snappy"
    ],

These are all organised under deps. They are not git submodules, except for snappy.

I think for us, we should just copy the structure of deps, as well as the submodule configuration. We can preserve the leveldb.gyp and snappy.gyp, and then just write our own binding.gyp that uses it. Then things should proceed as normal.

So it seems that leveldb has a lot of legacy aspects; rocksdb compilation is a lot cleaner. In fact lots of tooling has stopped using gyp files. We can explore that later.

@CMCDragonkai
Member Author

The presence of the snappy submodule means cloning now has to be done with git clone --recursive. If you have already cloned, set up the git submodule with git submodule update --init --recursive. It should bring data into deps/snappy/snappy.

@CMCDragonkai
Member Author

While porting over the binding.gyp from classic-level, we have to be aware of: MatrixAI/TypeScript-Demo-Lib#38 (comment)

The cflags and cflags_cc both apply to g++ and g++ is used when the file is cpp.

However both c and cpp files may be used at the same time, so we should be setting relevant flags for both cflags and cflags_cc.

I'm not sure if this is true for non-linux platforms. The upstream classic-level does not bother with cflags_cc. However we will set it just so that we can get the proper standard checks.

@CMCDragonkai
Member Author

I found out what cflags+ means. It is based on:

If the key ends with a plus sign (+), the policy is for the source list contents to be prepended to the destination list. Mnemonic: + for addition or concatenation.

So cflags+ will prepend rather than appending as normal. This sets the visibility=hidden to be true.

This is required due to: https://github.com/nodejs/node-addon-api/blob/main/doc/setup.md. The reason this is required is documented here: nodejs/node-addon-api#460 (comment). This should go into TS-Demo-Lib-Native too.

@CMCDragonkai
Member Author

CMCDragonkai commented Jun 26, 2022

I've added a test, "PCC locking to prevent thrashing for racing counters", that demonstrates how to address racing counters atm. This is particularly relevant to @tegefaulkes.

Note that since the DBTransaction doesn't have any native locking yet as per task 26, locking has to be done outside of transaction construction. Let me know if this will cause problems. If it does, we have to address task 26, otherwise I can push that to be done later.

The main thing for implementing task 26 is to integrate LockBox into DBTransaction. It would need to ensure that locks are only released when the transaction is committed or rolled back. Deadlock detection is an optional feature on top of that. Any implementation of this should also look into whether we can optimise the ability to retry transactions.

Retrying transactions is inefficient. If a transaction conflict occurs, it is currently necessary to recreate the entire transaction object. This is because the C++ code itself destroys the properties early if committing was not allowed. If we loosen this, and then allow resetting the transaction snapshot, it should be possible to "retry" a transaction just by calling commit again. (Actually I might try this now to see if it can be easily implemented after getting all the tests passing).

@CMCDragonkai
Member Author

Task 17 can only be done after merging to staging.

@CMCDragonkai
Member Author

CMCDragonkai commented Jun 27, 2022

Ok to solve task 26, we need to introduce the LockBox.

It has to be shared between all the transactions, which means it's going to be stored on DB.

Each transaction will then expose a Transaction.lock method.

So it may look like:

const t1 = withF([db.transaction()], async ([tran]) => {
  await tran.lock(['counter', Lock], ['someotherkey', Lock]);
});

const t2 = withF([db.transaction()], async ([tran]) => {
  await tran.lock(['counter', Lock]);
});

await Promise.allSettled([t1, t2]);

Note that tran.lock would be the LockBox.lock.

However LockBox.lock takes Array<LockRequest>.

I'm wondering if it's worth simplifying this. By convention we should be locking based on key paths, but really anything could be used. If we simplify things, one has to then be able to pass just a string, and then the default lock constructor can be Lock to ensure mutex.

Furthermore if RWLock is used, then you get a lot of flexibility with how you want to lock things.

Most important is that the locks are only released after you commit or rollback.

No deadlock detection, this can be addressed later.

As for re-entrant locking, that's something that should also be done, but can also be done later.

So future work:

  • Deadlock detection
  • Allow re-entrant locking, by tracking what has been locked within a transaction. Use a Set<ToString> to check for membership.
  • Allow lock upgrading - this is more for js-async-locks, to allow one to upgrade locks from read to write locks or downgrading

But I do want to add a default lock constructor: if it is not specified, it should just be Lock. This will need an update to js-async-locks. This means await tran.lock('abc', 'foo') takes an automatic lock on abc and foo with just Lock.

@CMCDragonkai
Member Author

Yea so LockBox.lock returns a ResourceAcquire. This means it's a bit clunky to use, requiring you to use it with the withF internally.

I reckon this is a bit incorrect, since locks are not a separate resource within the transaction, instead transaction locks are properties of the transaction itself. Therefore Transaction.lock will instead just return Promise<void>.

This would allow one to do things like:

await tran.lock('abc');
tran.unlock('abc');

In addition to this, tran.lock('abc') may return a Promise<[ResourceRelease]>, allowing you to keep a function reference to the release.

To be able to do tran.unlock('abc') would require us to keep a reference to all releasers in a Map. If we need a Map to do re-entrant locking, we might as well make this possible.

@CMCDragonkai
Member Author

CMCDragonkai commented Jun 28, 2022

Ok turns out I have some problems with the design of LockBox, and I'm making changes to MatrixAI/js-async-locks#14.

In our DBTransaction, we want to be able to imperatively unlock keys or a subset of keys that are locked by the transaction.

I originally programmed this by adding the ability to LockBox.unlock.

However it turns out that this doesn't work if the LockBox is used with RWLockWriter or RWLockReader. The reason is that there could be multiple possible releases for multiple reader locks on the same key. LockBox.unlock is only given a key, so it doesn't know which of the reader locks to unlock.

This means the tracking of imperative unlocking has to be done by the user/owner of the LockBox, but not the LockBox itself.

Since locked keys are isolated to each transaction, then we can have each DBTransaction instance keep track of the releasers for each locked key. With lock re-entrancy, it doesn't make sense for the Transaction to be able to acquire 2 read locks on the same key. Therefore a transaction can maintain a map of lockReleasers: Map<string, ResourceRelease> rather than the LockBox.

However the issue is that LockBox.lock does not expose this information. Each call to the method returns a single ResourceRelease that releases all the locks held. We could eliminate the ability to lock multiple keys in one go in DBTransaction.lock, and then store one release for each key locked. But then we lose some of the niceties that LockBox.lock provides, like sorting, uniqueness, and key.toString(). However I also noticed that we have to redo these functions anyway in DBTransaction.lock, because we have to keep track of all locks locked so that we can unlock them in reverse order during DBTransaction.destroy. If this is the case, then our DBTransaction.lock replicates the API of LockBox.lock, but internally it would have to use LockBox.lock one key at a time.
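
A sketch of the per-transaction bookkeeping described above, under several assumptions: `Mutex` and `LockTable` are minimal stand-ins for Lock and LockBox from js-async-locks (whose real APIs differ), and `ToyTransaction` is a hypothetical class that only models locking, not reads or writes.

```typescript
type ResourceRelease = () => void;

// Minimal promise-chaining mutex (stand-in for Lock, an assumption)
class Mutex {
  private tail: Promise<void> = Promise.resolve();
  async acquire(): Promise<ResourceRelease> {
    let release!: ResourceRelease;
    const held = new Promise<void>((resolve) => {
      release = resolve;
    });
    const previous = this.tail;
    this.tail = held; // The next acquirer waits for this holder
    await previous;
    return release;
  }
}

// Shared between all transactions (stand-in for LockBox, an assumption)
class LockTable {
  private locks = new Map<string, Mutex>();
  lock(key: string): Promise<ResourceRelease> {
    let mutex = this.locks.get(key);
    if (mutex == null) {
      mutex = new Mutex();
      this.locks.set(key, mutex);
    }
    return mutex.acquire();
  }
}

class ToyTransaction {
  private lockTable: LockTable;
  // One releaser per locked key, in order of acquisition
  private lockReleasers = new Map<string, ResourceRelease>();

  constructor(lockTable: LockTable) {
    this.lockTable = lockTable;
  }

  async lock(...keys: Array<string>): Promise<void> {
    // Unique and sorted, skipping keys already held (re-entrancy)
    const wanted = [...new Set(keys)]
      .sort()
      .filter((k) => !this.lockReleasers.has(k));
    for (const key of wanted) {
      // Acquire one key at a time so each key gets its own releaser
      this.lockReleasers.set(key, await this.lockTable.lock(key));
    }
  }

  unlock(...keys: Array<string>): void {
    for (const key of keys) {
      const release = this.lockReleasers.get(key);
      if (release == null) continue;
      this.lockReleasers.delete(key);
      release();
    }
  }

  // Called on commit or rollback: release in reverse acquisition order
  destroy(): void {
    this.unlock(...[...this.lockReleasers.keys()].reverse());
  }
}
```

Sorting only gives a consistent acquisition order within a single `lock` call; keys locked across separate calls can still deadlock, which is why deadlock detection is listed as future work.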

Furthermore I found that it's not a good idea to use setupSnapshot() in the DBTransaction.lock method. Definitely not before locking, as that prevents it from solving the counter racing problem, since a snapshot set before locking results in a conflict. I'm not sure about setting snapshots after locking, so I just removed lazy snapshot setting from DBTransaction.lock entirely.

@CMCDragonkai
Member Author

Ok js-async-locks is queued for 3.0.0 update.

It's all done, PCC locking is now available in DBTransaction.

@CMCDragonkai
Member Author

Subsequent work must take place on the staging branch in order to run the builds and test all the release jobs.

@CMCDragonkai CMCDragonkai marked this pull request as ready for review June 30, 2022 10:45
@CMCDragonkai CMCDragonkai changed the title WIP: Introduce Snapshot Isolation OCC to DBTransaction Introduce Snapshot Isolation OCC to DBTransaction Jun 30, 2022
@CMCDragonkai CMCDragonkai merged commit 8fd0c9a into staging Jun 30, 2022
@CMCDragonkai
Member Author

This should trigger v5 of js-db.

2 participants