Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mas i350 selectivesync #1794

Merged
merged 13 commits into from
Oct 6, 2021
Merged

Mas i350 selectivesync #1794

merged 13 commits into from
Oct 6, 2021

Conversation

martinsumner
Copy link
Contributor

Add selective sync to Riak KV. Prior to this change either all vnodes sync'd a PUT (i.e. flushed to disk as part of accepting the PUT) or no vnodes. With no sync the flush decision is controlled by either the database (bitcask, eleveldb) or the operating system (leveled).

This changes allows for three options:

backend - rely on backend configuration as now
all - force a sync on all vnodes regardless of backend configuration
one - force a sync only on the coordinating vnode

The sync option can be set on a per-bucket basis, and over-written on a per-write basis. This provides much more flexibility.

Previously customer who wanted to make sure data was flushed to disk in a least one place for some buckets, had to set backend sync settings to sync, and sync to all vnodes on every write on every bucket. Throughput between sync and non-sync configurations have been discovered to be large even with hardware acceleration (e.g. flash-backed write caches) - see martinsumner/leveled#350

@martinsumner
Copy link
Contributor Author

This PR adds selective sync support only with leveled and eleveldb backends (there is a backend_capability called flush_put and only backends with this capability receive non-standards PUT requests).

Adding support for the bitcask backend is trivial, as the bitcask:sync/1 just needs to be called after a put when flush_put/4 is used. This would have no change to any existing bitcask user, as the default bucket configuration is to revert to backend as now.

@martincox - would you be OK if bitcask support was added to this PR? Normally I try and avoid bitcask changes, to avoid any risk for you guys. The only change required will be in riak_kv_bitcask_backend (not in the actual bitcask repo). Although you may not use the feature, it might be easier to implement across all backends rather than manage the ongoing caveat that it only works in some backends

@martinsumner
Copy link
Contributor Author

basho/riak#1080

Sync straight after a PUT, if strategy is not already to o_sync.
@martinsumner
Copy link
Contributor Author

basho/riak_test#1358

Copy link
Contributor

@ThomasArts ThomasArts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some smaller comments, but did not find serious issues

src/riak_kv_bucket.erl Show resolved Hide resolved
%% Setup write options...
WriteOpts2 = case Sync of
true ->
lists:keyreplace(sync,1,WriteOpts, {sync,Sync});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observe that keyreplace does not add sync in this parameter is missing in the WriteOpts. Is that really what you want?

This is not equal to keydelete(sync, 1, WriteOpts) ++ [{sync, Sync}], which intuitively one probably wants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relies on the presence of sync in the eleveldb cuttlefish:

https://github.com/basho/eleveldb/blob/develop-3.0/priv/eleveldb.schema#L32-L39

which then becomes part of the write_opts at startup:

https://github.com/basho/riak_kv/blob/mas-i350-selectivesync/src/riak_kv_eleveldb_backend.erl#L633-L635

So it will always be there, base don the current logic. However, this does appear to be quite a brittle chain.

src/riak_kv_eleveldb_backend.erl Outdated Show resolved Hide resolved
{ok, state()} |
{error, term(), state()}.
flush_put(Bucket, PrimaryKey, IndexSpecs, Val, State) ->
Sync = true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check comment in leveled_backend, probably just write the boolean as argument

src/riak_kv_leveled_backend.erl Show resolved Hide resolved
expand_sync_on_write(default, BucketProps) ->
normalize_value(get_bucket_option(sync_on_write, BucketProps));
expand_sync_on_write(Value, _BucketProps) ->
Value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should you not normalise here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think normalize_value may be unnecessary in both cases due to validation at the API/riak_kv_bucket

src/riak_kv_vnode.erl Outdated Show resolved Hide resolved
src/riak_kv_vnode.erl Outdated Show resolved Hide resolved
src/riak_kv_vnode.erl Outdated Show resolved Hide resolved
src/riak_kv_vnode.erl Outdated Show resolved Hide resolved
@martinsumner martinsumner merged commit 174a737 into develop-3.0 Oct 6, 2021
@martinsumner martinsumner deleted the mas-i350-selectivesync branch October 6, 2021 10:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants