
Implement 'batch mode' for persisting allocations on the client. #9093

Merged
merged 2 commits into hashicorp:master on Oct 20, 2020

Conversation

@ashtuchkin (Contributor)

Fixes #9047, see problem details there.

As a solution, we use BoltDB's 'Batch' mode that opportunistically combines multiple parallel writes into a small number of transactions. See https://github.com/boltdb/bolt#batch-read-write-transactions for more information.
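
For readers unfamiliar with the API, here is a minimal standalone sketch of Update versus Batch against bolt directly (Nomad's change goes through the boltdd wrapper instead; the file path, bucket, and key below are made up for illustration):

package main

import (
	"log"

	"github.com/boltdb/bolt"
)

func main() {
	// Open (or create) a bolt database file; the path is illustrative only.
	db, err := bolt.Open("example-state.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// db.Update runs one read-write transaction per call: every call pays
	// the full commit cost, including boltdb's fsync calls.
	//
	// db.Batch has the same signature, but when several goroutines call it
	// concurrently, bolt opportunistically coalesces their functions into a
	// single transaction, so many writers can share one commit (and one
	// round of fsyncs).
	err = db.Batch(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("allocations"))
		if err != nil {
			return err
		}
		return b.Put([]byte("alloc-1"), []byte("serialized allocation state"))
	})
	if err != nil {
		log.Fatal(err)
	}
}

One caveat from the bolt documentation: the function passed to Batch may be invoked more than once if another writer in the same batch fails, so it needs to be idempotent.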

@hashicorp-cla commented Oct 14, 2020

CLA assistant check
All committers have signed the CLA.

@tgross (Member) left a comment

Hi @ashtuchkin!

The concept of this change looks good. From an implementation perspective we generally try to avoid having boolean flags, especially in a case like this where the flag ends up having to be passed down through a few layers of method calls before it is actually used. I've marked the specific changes I'd recommend in the review.

(As an aside, I was wondering why we don't see this kind of problem in the server and of course it turns out that raft is batching changes. So the concept here is certainly solid and has precedent in Nomad.)

I'd also like to flag this change for @schmichael and @nickethier because I know they're working on an experiment that ran into client performance issues, so it might be of interest to them.

@@ -877,36 +877,37 @@ func (ar *allocRunner) destroyImpl() {
 	ar.destroyedLock.Unlock()
 }
 
-func (ar *allocRunner) PersistState() error {
+func (ar *allocRunner) PersistState(inBatch bool) error {
Member:

There's only one caller for PersistState, so this argument is always true. Let's remove this flag.

-func (s *BoltStateDB) PutAllocation(alloc *structs.Allocation) error {
-	return s.db.Update(func(tx *boltdd.Tx) error {
+func (s *BoltStateDB) PutAllocation(alloc *structs.Allocation, inBatch bool) error {
+	return s.updateOrBatch(inBatch, func(tx *boltdd.Tx) error {
Member:

Rather than having the boolean behavior flag and this function indirection, let's make the signature really super clear by splitting out a PutAllocation and a PutAllocationBatch, both of which call into a putAllocationImpl with the appropriate boltdb function.

Same for PutNetworkStatus and DeleteAllocationBucket

Contributor Author (@ashtuchkin):

That makes sense. The naming here is tricky; I'd definitely expect PutAllocationBatch to have a slice of allocations as the argument, not a single allocation.

That makes me think - if we add a new method to the interface anyway, maybe we can make it an actual "batch" variant that explicitly saves multiple allocations, rather than relying on the hidden state of boltdb's Batch method? Wdyt?
This would require changing PersistState to PersistStateBatch too, but that shouldn't be hard to do.

Member:

maybe we can make it an actual "batch" variant that explicitly saves multiple allocations

I'm open to the idea of this, but given how we run PersistState by iterating over allocation runners (which requires setting and releasing locks on those runners), I suspect that's not going to be very nice for the callers. Give it a go if you'd like though!

Contributor Author (@ashtuchkin):

Yeah, handling destroyedLock and stateLock might be a problem with real batching.

I've fleshed out the two-method approach you suggested and it doesn't look tidy enough for my taste, so I wanted to circle back for advice. Conceptually, that boolean flag is an adjustment in behavior (basically an opt-in performance optimization), not a new behavior, so it feels weird to have to duplicate more or less the same code in several places. It looks like this:

	// PutAllocation stores an allocation or returns an error if it could
	// not be stored.
	PutAllocation(*structs.Allocation) error

	// PutAllocationInBatch has exactly the same semantics as PutAllocation above,
	// with the addition that the write transactions can be shared with other concurrent
	// writes, increasing performance in batch scenarios.
	PutAllocationInBatch(*structs.Allocation) error

	// Get/Put NetworkStatus get and put the allocation's network
	// status. It may be nil.
	GetNetworkStatus(allocID string) (*structs.AllocNetworkStatus, error)
	PutNetworkStatus(allocID string, ns *structs.AllocNetworkStatus) error

	// PutNetworkStatusInBatch has exactly the same semantics as PutNetworkStatus above,
	// with the addition that the write transactions can be shared with other concurrent
	// writes, increasing performance in batch scenarios.
	PutNetworkStatusInBatch(allocID string, ns *structs.AllocNetworkStatus) error

So while I agree that using booleans to change behavior is a red flag, in this particular case it's arguably not changing semantics, just performance in particular scenarios.

I saw another approach to implementing such behavior adjustments using variadic options, something like this:

PutAllocation(*structs.Allocation, ...WriteOption) error

(see e.g. https://medium.com/soon-london/variadic-configuration-functions-in-go-8cef1c97ce99 for details)

That would avoid polluting cases where no such option is needed, while allowing these adjustments where needed. What do you think?
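
For concreteness, a minimal self-contained sketch of that variadic-option pattern follows; StateDB, WriteOption, and WithBatchMode are illustrative names, not the ones that ended up in Nomad:

package main

import "fmt"

// writeOptions collects opt-in behavior adjustments for a single write.
type writeOptions struct {
	batchMode bool
}

// WriteOption is a functional option that tweaks writeOptions.
type WriteOption func(*writeOptions)

// WithBatchMode asks the store to share its write transaction with other
// concurrent writers (boltdb's Batch mode) instead of committing on its own.
func WithBatchMode() WriteOption {
	return func(o *writeOptions) { o.batchMode = true }
}

// StateDB is a stand-in for the client state store.
type StateDB struct{}

// PutAllocation keeps its usual semantics; options only adjust performance.
func (s *StateDB) PutAllocation(allocID string, opts ...WriteOption) error {
	o := &writeOptions{}
	for _, opt := range opts {
		opt(o)
	}
	if o.batchMode {
		fmt.Println("writing", allocID, "via a shared batch transaction")
	} else {
		fmt.Println("writing", allocID, "via its own transaction")
	}
	return nil
}

func main() {
	db := &StateDB{}
	_ = db.PutAllocation("alloc-1")                  // default: one transaction per write
	_ = db.PutAllocation("alloc-2", WithBatchMode()) // opt in to transaction sharing
}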

Member:

That would avoid polluting cases where no such option is needed, while allowing these adjustments where needed. What do you think?

Oh, that's super nice. Go for it.

Contributor Author (@ashtuchkin):

Ok I think it's ready for review.

@ashtuchkin (Contributor Author)

Thanks for the review Tim! Will make the changes soon.

@schmichael (Member) left a comment

Looks great, thanks for submitting this @ashtuchkin. I'll let @tgross take care of reviewing and merging.

Have you confirmed this fixes your IO issue @ashtuchkin?

I'm pretty concerned that the de-duplication in boltdd isn't helping. It seems like the transactional overhead may cancel out the IO benefit of the de-duplication, in which case we may just be wasting CPU/memory doing it!

Temporary metrics or logging around de-duplication hits/misses may help tell whether de-duping is actually happening. Sadly, at a glance, it seems that if the transaction overhead does destroy the theoretical IO savings, it would take quite a bit of reworking to create the transaction only when something has actually changed.

In hindsight, doing the de-duplication at the StateDB level may have made more sense.

@ashtuchkin (Contributor Author)

Thanks Michael!

Have you confirmed this fixes your IO issue @ashtuchkin?

Yes, I've deployed this version on my cluster and IOPS do return to normal when the cluster becomes idle.

I'm pretty concerned the de-duplication in boltdd isn't helping. Seems like the transactional overhead may remove the IO benefit of the de-duplication in which case we may just be wasting CPU/memory doing it!

I traced the commit path in boltdb and it performs two unconditional fdatasync() calls per commit (unless NoSync=true); see Tx.write and Tx.writeMeta, called directly from Tx.Commit in https://github.com/boltdb/bolt/blob/master/tx.go. No writes happen if the transaction is discarded, though, so one option would be to discard empty transactions (I wonder if we can do it automatically at the boltdd level?).

@tgross (Member) left a comment

This is looking great @ashtuchkin! Thanks so much for your patience in getting this into shape.

I've left one question about the test assertion.

	// and MaxBatchSize parameters).
	if getTxID != nil {
		numTransactions := getTxID() - prevTxID
		require.Less(numTransactions, 10)
Member:

Was this "< 10 transactions" determined empirically? The default MaxBatchSize is 1000, which implies we can assert "at least 2" but I'm less certain about the "at most" we can assert. Can we use a timer to figure out how many 10ms periods have passed to make a stronger assertion?

I just want to make sure we're not introducing a potential test flake (this has been a pain point), especially for full unit test runs with parallelism on the VMs used for CI.

Contributor Author (@ashtuchkin):

Yep, flaky tests are a common pain point - I've wasted so much time on them too.

You're right, most of the time when running this test we get 2 transactions, each with 1000 writes. Time is usually not a limiting factor here – spawning goroutines is pretty fast and writes are all in-memory (the only I/O we have is when these 2 transactions are committed). I used 10 as a "safe" threshold that should hold even under load, but that's of course arbitrary.

I initially opted against adding a more specific "expected number of transactions" calculation here because that would add coupling to the specific algorithm that boltdb uses, in addition to adding complexity. Now that I'm thinking about it, the first concern is probably not relevant since boltdb is frozen, so ok, let's do it.
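
As a rough sketch of what that calculation might look like (not the assertion that was actually committed; numWrites is assumed to be the test's write count, and 1000 is bolt's default MaxBatchSize):

// expectedMinTransactions returns the smallest number of boltdb transactions
// that numWrites concurrent Batch() calls could be coalesced into, given the
// configured MaxBatchSize. Sketch only, using integer ceiling division:
// 2000 writes with MaxBatchSize=1000 need at least 2 transactions.
func expectedMinTransactions(numWrites, maxBatchSize int) int {
	return (numWrites + maxBatchSize - 1) / maxBatchSize
}

The test could then assert that numTransactions is at least expectedMinTransactions(numWrites, bolt.DefaultMaxBatchSize) while keeping a loose upper bound, since the 10ms MaxBatchDelay timer can flush partially filled batches early on a loaded CI machine.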

Contributor Author (@ashtuchkin):

Ok, I think it's ready for review. I've added a separate commit for this change so that it's easier to see what exactly changed.

Member:

LGTM

Fixes hashicorp#9047, see problem details there.

As a solution, we use BoltDB's 'Batch' mode that combines multiple
parallel writes into a small number of transactions. See
https://github.com/boltdb/bolt#batch-read-write-transactions for
more information.
@schmichael (Member)

I traced the commit path in boltdb and it performs two unconditional fdatasync() calls per commit (unless NoSync=true); see Tx.write and Tx.writeMeta, called directly from Tx.Commit in https://github.com/boltdb/bolt/blob/master/tx.go. No writes happen if the transaction is discarded, though, so one option would be to discard empty transactions (I wonder if we can do it automatically at the boltdd level?).

@ashtuchkin That makes complete sense, and yes! Boltdd can definitely do that tracking. It should be as easy as adding a dirty bool to boltdd.Tx and making every mutation method that isn't de-duped set tx.dirty = true; then boltdd.Tx can roll back instead of committing when nothing was actually written. That also requires passing boltdd.Tx to the boltdd.Bucket. Since transactions aren't safe for concurrent use, there's no need to worry about locking around the dirty flag.

Probably worth its own PR since this one is so close to merging. If you don't have time to do that PR, feel free to file an issue, or let me know and I can file one.
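
A minimal sketch of that idea, assuming a boltdd-style wrapper; the type and method shapes here are illustrative, not boltdd's actual code:

package boltddsketch

import "github.com/boltdb/bolt"

// Tx wraps a bolt transaction and remembers whether anything was written.
// No locking is needed: a transaction is never used concurrently.
type Tx struct {
	btx   *bolt.Tx
	dirty bool // set by any mutation that isn't de-duplicated away
}

// Put writes key/val into bucket and marks the transaction dirty. In boltdd
// the de-duplication check (comparing against the cached hash of the stored
// value) would run before the write and return early without setting dirty.
func (tx *Tx) Put(bucket, key, val []byte) error {
	b, err := tx.btx.CreateBucketIfNotExists(bucket)
	if err != nil {
		return err
	}
	tx.dirty = true
	return b.Put(key, val)
}

// Commit rolls back instead of committing when nothing changed, avoiding the
// two fdatasync() calls that even an empty bolt commit pays for.
func (tx *Tx) Commit() error {
	if !tx.dirty {
		return tx.btx.Rollback()
	}
	return tx.btx.Commit()
}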

@tgross tgross merged commit 1be5243 into hashicorp:master Oct 20, 2020
tgross added a commit that referenced this pull request Oct 20, 2020
@tgross tgross added this to the 1.0 milestone Oct 20, 2020
@tgross (Member) commented Oct 20, 2020

@ashtuchkin I've merged this and got it onto the changelog in #9132. This will ship in Nomad 1.0! Thanks again!

@ashtuchkin (Contributor Author)

Awesome, thank you Tim!

tgross added a commit that referenced this pull request Oct 20, 2020
fredrikhgrelland pushed a commit to fredrikhgrelland/nomad that referenced this pull request Oct 22, 2020
fredrikhgrelland pushed a commit to fredrikhgrelland/nomad that referenced this pull request Oct 22, 2020
@github-actions (bot)

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 14, 2022

Successfully merging this pull request may close these issues.

Nomad Client generates lots of IOPS when idle, saturating HDD (#9047)
4 participants