Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: prepare for kv.atomic_replication_changes=true #40370

Merged
merged 15 commits into from
Sep 4, 2019

Commits on Sep 4, 2019

  1. storage: populate crt.Desc unconditionally

    This allows simplifying the code because we get to use crt.Desc anywhere
    we know the trigger was created locally (which is almost everywhere).
    
    Release note: None
    tbg committed Sep 4, 2019
    Configuration menu
    Copy the full SHA
    2beb634 View commit details
    Browse the repository at this point in the history
  2. storage: remove learners one by one

    We don't support removing multiple learners atomically just yet, though
    \cockroachdb#40268 will fix this (likely in raft). That PR though is obstructed by cockroachdb#40207
    because we'll need that first to be able to bump raft again (assuming we
    don't want to fork). Instead of dealing with all that upfront, let's
    just not remove multiple learners at once right now so that we can flip
    the default for atomic replication changes to on.
    
    If anyone is still trying to remove only learners atomically, they will
    fail. However the replicate queues place a high priority on removing
    stray learners whenever it finds them, so this wouldn't be a permanent
    problem. Besides, we never add multiple learners at once so it's
    difficult to get into that state in the first place.
    
    Without this commit, TestLearnerAdminRelocateRange fails once atomic
    replication changes are enabled.
    
    Release note: None
    tbg committed Sep 4, 2019
    Configuration menu
    Copy the full SHA
    09ec9b7 View commit details
    Browse the repository at this point in the history
  3. storage: factor out prepareChangeReplicasTrigger

    This separates concerns nicely, but it's also a required refactor to
    be more nuanced about the descriptor changing out from under us.
    
    Release note: None
    tbg committed Sep 4, 2019
    Configuration menu
    Copy the full SHA
    269d79f View commit details
    Browse the repository at this point in the history
  4. storage: request that AdminMerge addresses to desc.StartKey

    This will allow simplifying the implementation by loading the descriptor
    from KV (instead of using r.Desc() which makes an assumption that the
    caller picked the right `r`), and it's already mostly true today. (There
    are no remote production callers of AdminMerge, so no migration
    concerns).
    
    Release note: None
    tbg committed Sep 4, 2019
    Configuration menu
    Copy the full SHA
    3a1c37f View commit details
    Browse the repository at this point in the history
  5. storage: rearrange AdminMerge

    This is in preparation for not loading the descriptor from `r.Desc()`
    but using conditionalGetDescValueFromDB to obtain it.
    
    Release note: None
    tbg committed Sep 4, 2019
    Configuration menu
    Copy the full SHA
    974e9e3 View commit details
    Browse the repository at this point in the history
  6. storage: separate rangeID allocation from descriptor init

    We'll soon need to allocate the RangeID only once, but potentially
    create the descriptor multiple times (for each split txn restart).
    
    Release note: None
    tbg committed Sep 4, 2019
    Configuration menu
    Copy the full SHA
    a775bce View commit details
    Browse the repository at this point in the history
  7. storage: rearrange adminSplitWithDescriptor

    Prepare for getting an updated descriptor handed back from
    conditionalGetDescValueFromDB in each txn attempt.
    
    Release note: None
    tbg committed Sep 4, 2019
    Configuration menu
    Copy the full SHA
    17c56ca View commit details
    Browse the repository at this point in the history
  8. storage: remove a TODO

    We're not going to allow parallel writes anytime soon.
    
    Release note: None
    tbg committed Sep 4, 2019
    Configuration menu
    Copy the full SHA
    c8c27c4 View commit details
    Browse the repository at this point in the history
  9. storage: remove split closure events

    I don't think those have been useful in a long time.
    
    Release note: None
    tbg committed Sep 4, 2019
    Configuration menu
    Copy the full SHA
    d8dac1b View commit details
    Browse the repository at this point in the history
  10. storage: extract splitTxnAttempt closure

    That way it will be easier to avoid confusion between all of the
    descriptors floating around.
    
    Release note: None
    tbg committed Sep 4, 2019
    Configuration menu
    Copy the full SHA
    eb25288 View commit details
    Browse the repository at this point in the history
  11. storage: extract splitStickyTxnUpdateAttempt

    Release note: None
    tbg committed Sep 4, 2019
    Configuration menu
    Copy the full SHA
    a41b9fa View commit details
    Browse the repository at this point in the history
  12. storage: refactor conditionalGetDescValueFromDB

    It now takes a closure that gets to decide whether the descriptor
    returned from KV is "acceptable". This descriptor is then returned,
    and in turn the method doesn't accept a descriptor, just a key from
    which to look up the descriptor.
    
    The point of all this is to allow different callers to check different
    things. For instance, a split doesn't care whether the set of replicas
    changed, and a replication change shouldn't fail if the range has since
    split.
    
    Release note: None
    tbg committed Sep 4, 2019
    Configuration menu
    Copy the full SHA
    c43becf View commit details
    Browse the repository at this point in the history
  13. storage: clarify a knob

    We only want to return early on the "stop after learners" knob if we
    actually did add learners (and not, for example, on a replica removal).
    
    Release note: None
    tbg committed Sep 4, 2019
    Configuration menu
    Copy the full SHA
    dbb5044 View commit details
    Browse the repository at this point in the history
  14. storage: make leaving joint config idempotent

    ChangeReplicas (and AdminSplit, and AdminMerge) take a RangeDescriptor
    that they use as a basis for a CPut to make sure the operations mutating
    the range serialize. This is great for correctness but generally
    unfortunate for usability since on a mismatch, the caller usually wanted
    to do the thing they were trying to do anyway, using the new descriptor.
    The fact that every split (replication change, merge) basically needs a
    retry loop is constant trickle of test flakes and UX papercuts.
    
    It became more pressing to do something against this as we are routinely
    using joint configs when atomic replication changes are enabled. A joint
    configuration is transitioned out of opportunistically whenever it is
    encountered, but this frequently causes a race in which actor A finds
    a joint config, begins a transaction out of it but is raced by actor B
    getting there first. The end result is that what actor A wanted to
    achieve has been achieved, though by someone else, and the result
    is a spurious error.
    
    This commit fixes that behavior in the targeted case of wanting to leave
    a joint configuration: actor A will get a successful result.
    
    Before this change,
    
    make stress PKG=./pkg/sql TESTS=TestShowRangesWithLocal
    
    would fail immediately when `kv.atomic_replication_changes.enabled` was
    true because the splits this test carries out would run into the joint
    configuration changes of the initial upreplication, and would race the
    replicate queue to transition out of them, which at least one split
    would typically lose. This still happens, but now it's not an error any
    more.
    
    I do think that it makes sense to use a similar strategy in general
    (fail replication changes only if the *replicas* changed, allow all
    splits except when the split key moves out of the current descriptor,
    etc) but in the process of coding this up I got reminded of all of
    the problems relating to range merges and also found what I think is
    a long-standing pretty fatal bug, cockroachdb#40367, so I don't want to do anything
    until the release is out of the door. But I'm basically convinced that
    if we did it, it wouldn't cause a new "bug" because any replication
    change carried out in that way is just one that could be triggered
    just the same by a user under the old checks.
    
    Release note: None
    tbg committed Sep 4, 2019
    Configuration menu
    Copy the full SHA
    c7f8c59 View commit details
    Browse the repository at this point in the history
  15. storage: introduce internalReplicationChanges helper type

    This hopefully makes it easier to reason about the code.
    
    Release note: None
    tbg committed Sep 4, 2019
    Configuration menu
    Copy the full SHA
    2ce5bb7 View commit details
    Browse the repository at this point in the history