
spanconfig: introduce spanconfig.StoreWriter (and its impl) #70287

Merged
merged 1 commit into from
Oct 2, 2021

Conversation

irfansharif
Contributor

@irfansharif irfansharif commented Sep 15, 2021

In #69172 we introduced a spanconfig.StoreReader interface to abstract
away the gossiped system config span. We motivated that PR by teasing
a future implementation of the same interface, an in-memory data
structure to maintain a mapping between spans and configs
(powered through a view over system.span_configurations introduced in
#69047). This PR introduces just that.

Intended (future) usages:

  • #69614 introduces the KVWatcher interface, listening in on
    system.span_configurations. The updates generated by it will be used
    to populate per-store instantiations of this data structure, with an
    eye towards providing a "drop-in" replacement of the gossiped system
    config span (conveniently implementing the sibling
    spanconfig.StoreReader interface).
  • #69661 introduces the SQLWatcher interface, listening in on changes to
    system.{descriptor,zones} and generating denormalized span config
    updates for every descriptor/zone config change. These updates will
    need to be diffed against a spanconfig.StoreWriter populated with the
    existing contents of KVAccessor to generate the "targeted" diffs
    KVAccessor expects.

Release note: None

@cockroach-teamcity
Member

This change is Reviewable

@irfansharif
Contributor Author

(Bump.)

Member

@nvanbenschoten nvanbenschoten left a comment


:lgtm:

Reviewed 15 of 15 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, and @irfansharif)


pkg/spanconfig/spanconfig.go, line 54 at r1 (raw file):

type Store interface {
	StoreReader
	StoreWriter

nit: keep these in the same order as the definitions.


pkg/spanconfig/spanconfig.go, line 65 at r1 (raw file):

	//
	// Span configs are stored in non-overlapping fashion. When an update
	// overlaps with existing configs, they're deleted. If the overlap is only

s/they're/the existing configs are/


pkg/spanconfig/spanconfig.go, line 66 at r1 (raw file):

	// Span configs are stored in non-overlapping fashion. When an update
	// overlaps with existing configs, they're deleted. If the overlap is only
	// partial, the non-overlapping components are re-added. If the update

s/non-overlapping components/non-overlapping components of the existing configs/


pkg/spanconfig/spanconfig.go, line 85 at r1 (raw file):

	// update against a StoreWriter (pre-populated with the entries present in
	// KV) to generate the targeted deletes and upserts we'd need to issue.
	// After successfully installing them in KV, we can keep our StoreWrite

StoreWriter


pkg/spanconfig/spanconfig.go, line 117 at r1 (raw file):

	// [2]: We could instead expose a GetAllOverlapping() API if needed -- would
	//      make things a bit clearer.
	// [3]: We could skip the delete + upsert if it's a no-op, i.e. the

It wasn't clear from reading this comment whether we are doing this or not. Is this part of the contract of this interface? [2] seems to imply that it is, but this could be made more explicit either way.


pkg/spanconfig/spanconfig.go, line 122 at r1 (raw file):

	//      indicating a no-op.
	Apply(ctx context.Context, update Update, dryrun bool) (
		deleted []roachpb.Span, added []roachpb.SpanConfigEntry,

The types here are a little strange. We have three different types, Update, roachpb.Span, and roachpb.SpanConfigEntry that are all closely related and all representable as either an Updates or roachpb.SpanConfigEntry. Is this possible to clean up?

I imagine that if we unified the typing, some clarity about the role of this interface would emerge. For instance, if the signature was Apply(ctx, Update, bool) []Update, it would be clearer that the StoreWriter is used to perform a stateful mapping from an arbitrary Update to the set of Updates needed to meet the KVAccessor's more restricted interface.


pkg/spanconfig/spanconfigstore/shadow.go, line 43 at r1 (raw file):

func (s *ShadowReader) NeedsSplit(ctx context.Context, start, end roachpb.RKey) bool {
	newResult := s.new.NeedsSplit(ctx, start, end)
	oldResult := s.old.NeedsSplit(ctx, start, end)

Should we even call the corresponding method on s.old in these methods if !log.ExpensiveLogEnabled(ctx, 1)?


pkg/spanconfig/spanconfigstore/store.go, line 60 at r1 (raw file):

// ComputeSplitKey is part of the spanconfig.StoreReader interface.
func (s *Store) ComputeSplitKey(ctx context.Context, start, end roachpb.RKey) roachpb.RKey {

Do we need to handle the rest of the staticSplits points in this function? Or is this meant to be handled above the StoreReader? If so, should keys.SystemConfigSpan be handled in here?


pkg/spanconfig/spanconfigstore/store.go, line 78 at r1 (raw file):

	// Iterate over all overlapping ranges, return corresponding span config
	// entries.

This comment doesn't look right.


pkg/spanconfig/spanconfigstore/store.go, line 98 at r1 (raw file):

func (s *Store) GetSpanConfigForKey(
	ctx context.Context, key roachpb.RKey,
) (roachpb.SpanConfig, error) {

Do we need the RLock in this method?


pkg/spanconfig/spanconfigstore/store.go, line 133 at r1 (raw file):

	deleted = make([]roachpb.Span, 0, len(entriesToDelete))
	for _, entry := range entriesToDelete {
		entry := entry // copy out of loop variable

To avoid a heap allocation per entry:

for i := range entriesToDelete {
    entry := &entriesToDelete[i]

and then might as well:

    deleted[i] = entry.Span

Same thing below.


pkg/spanconfig/spanconfigstore/store.go, line 211 at r1 (raw file):

			// ex:     [-----------------)
			//
			// up:     ------------------)

Are these three cases where existing.Span.ContainsKey(update.Span.EndKey)? Are there cases where the first condition here is true but the second is false?


pkg/spanconfig/spanconfigstore/store_test.go, line 80 at r1 (raw file):

}

// parseConfig is helper function that constructs a roachpb.SpanConfig with

s/with/that is/


pkg/spanconfig/spanconfigstore/store_test.go, line 165 at r1 (raw file):

					b.WriteString(fmt.Sprintf("added %s\n", printSpanConfigEntry(ent)))
				}
				return b.String()

tiny nit: new-line after these two b.String() for consistency.


pkg/spanconfig/spanconfigstore/testdata/overlap, line 15 at r1 (raw file):

added [b,d):A
added [f,h):A
added [d,f):B

Did you ever consider implementing accumulateOpsForLocked in a way that led to an ordered list of added configs? It wouldn't be too bad because you can assume that the existing spans are already non-overlapping. But it may also not be necessary so not worth adding to the StoreWriter contract.


pkg/spanconfig/spanconfigstore/testdata/overlap, line 26 at r1 (raw file):

# Check that writing a span that partially overlaps with multiple existing
# entries deletes all of them, and re-adds the right non-overlapping fragments
# with the right configs..

nit: two periods.

Contributor Author

@irfansharif irfansharif left a comment


Thanks for the review!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, and @nvanbenschoten)


pkg/spanconfig/spanconfig.go, line 117 at r1 (raw file):

whether we are doing this or not

By "this" do you mean whether attempting to apply a "no-op" update would return empty lists for spans that were deleted and span configs that were added? I was hoping the next sentence captured this contract:

//  Using Apply (dryrun=true) for e.g. would return empty lists, indicating a no-op.

If by "this" you mean "We could skip the delete + upsert if it's a no-op", that's not something we're doing yet. This comment block was intended to outline what a future integration with a KVAccessor would look like (something @arulajmani will pick up after #69661). In that future PR, when diffing against an image of KV span config state stored in this Store, during full reconciliation we'd be able to detect whether an intended update is a no-op and thus skip over deleting and upserting new entries in KV. I've tried rewording to make it clearer.


pkg/spanconfig/spanconfig.go, line 122 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The types here are a little strange. We have three different types, Update, roachpb.Span, and roachpb.SpanConfigEntry that are all closely related and all representable as either an Updates or roachpb.SpanConfigEntry. Is this possible to clean up?

I imagine that if we unified the typing, some clarity about the role of this interface would emerge. For instance, if the signature was Apply(ctx, Update, bool) []Update, it would be clearer that the StoreWriter is used to perform a stateful mapping from an arbitrary Update to the set of Updates needed to meet the KVAccessor's more restricted interface.

Yeah, I've gone back and forth on it. I did originally have it as Updates all the way through, but I found myself needlessly translating back into the []roachpb.Span, []roachpb.SpanConfigEntry form that the KVAccessor this will get wired into expects:

// UpdateSpanConfigsRequest is used to update the span configurations over the
// given spans.
//
// This is a "targeted" API: the spans being deleted are expected to have been
// present with the same bounds (same start/end key); the same is true for spans
// being upserted with new configs. If bounds are mismatched, an error is
// returned. If spans are being added, they're expected to not overlap with any
// existing spans. When divvying up an existing span into multiple others,
// callers are expected to delete the old and upsert the new ones. This can
// happen as part of the same request; we delete the spans marked for deletion
// before upserting whatever was requested.
//
// Spans are not allowed to overlap with other spans in the same list but can
// across lists. This is necessary to support the delete+upsert semantics
// described above.
message UpdateSpanConfigsRequest {
// ToDelete captures the spans we want to delete configs for.
repeated Span to_delete = 1 [(gogoproto.nullable) = false];
// ToUpsert captures the spans we want to upsert and the configs we want to
// upsert with.
repeated SpanConfigEntry to_upsert = 2 [(gogoproto.nullable) = false];
};

The translation would allocate on the heap each time. We could alternatively re-orient the KVAccessor + UpdateSpanConfig interface to be more Update oriented, but I think that ends up being less clear compared to the "delete + upsert" semantics it now has. I'll keep this as is for now.
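
To make the targeted delete+upsert semantics concrete, here's a tiny hypothetical sketch (the dict shapes are invented for illustration, not the actual proto): splitting an existing entry [a,f):X at key c, with the right half getting a new config Y.

```python
# Hypothetical illustration of the targeted delete+upsert semantics.
# Spans are [start, end) key pairs; the shapes below are invented.

existing = {"span": ("a", "f"), "conf": "X"}

request = {
    # Deletes must name existing spans with exactly matching bounds.
    "to_delete": [("a", "f")],
    # Upserts must not overlap each other, but may overlap to_delete,
    # since deletes are applied before upserts.
    "to_upsert": [
        {"span": ("a", "c"), "conf": "X"},
        {"span": ("c", "f"), "conf": "Y"},
    ],
}

# Divvying up a span = delete the old bounds, upsert the fragments.
assert request["to_delete"][0] == existing["span"]
```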


pkg/spanconfig/spanconfigstore/shadow.go, line 43 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we even call the corresponding method on s.old in these methods if !log.ExpensiveLogEnabled(ctx, 1)?

Good catch, done.


pkg/spanconfig/spanconfigstore/store.go, line 60 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we need to handle the rest of the staticSplits points in this function? Or is this meant to be handled above the StoreReader? If so, should keys.SystemConfigSpan be handled in here?

Yup, everything else in staticSplits will end up getting handled above StoreReader -- all the other special spans happen to have explicit span configs defined on them (RANGE LIVENESS, TIMESERIES, META, etc.). The system config span is an anomaly though, since it's not addressable to set a config on either -- so it made sense to special-case it here. This property will be more rigorously tested once we wire it up for realsies and can print out all the split points.


pkg/spanconfig/spanconfigstore/store.go, line 78 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This comment doesn't look right.

Oops, removed.


pkg/spanconfig/spanconfigstore/store.go, line 98 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we need the RLock in this method?

Done.


pkg/spanconfig/spanconfigstore/store.go, line 133 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

To avoid a heap allocation per entry:

for i := range entriesToDelete {
    entry := &entriesToDelete[i]

and then might as well:

    deleted[i] = entry.Span

Same thing below.

Done.


pkg/spanconfig/spanconfigstore/store.go, line 211 at r1 (raw file):

Are these three cases where existing.Span.ContainsKey(update.Span.EndKey)?

Ah, no, removed these examples.

Are there cases where the first condition here is true but the second is false?

You mean existing.Span.ContainsKey(update.Span.EndKey) && !post.Valid()? Hm, I guess not. Simplified the conditional.


pkg/spanconfig/spanconfigstore/testdata/overlap, line 15 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did you ever consider implementing accumulateOpsForLocked in a way that led to an ordered list of added configs? It wouldn't be too bad because you can assume that the existing spans are already non-overlapping. But it may also not be necessary so not worth adding to the StoreWriter contract.

I hadn't, and I'll skip it for now given that this list is still deterministic for testing purposes. That said it's a pretty small change to make later if it makes the interfaces easier to grok.

@irfansharif
Contributor Author

bors r+

@craig
Contributor

craig bot commented Oct 2, 2021

Build failed (retrying...):

@craig
Contributor

craig bot commented Oct 2, 2021

Build succeeded:

@craig craig bot merged commit 109e07b into cockroachdb:master Oct 2, 2021
@irfansharif irfansharif deleted the 210915.spanconfig-store branch October 2, 2021 02:50
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Nov 25, 2021
We first introduced spanconfig.StoreWriter in cockroachdb#70287. Here we extend the
interface to accept a batch of updates instead of just one.

    type StoreWriter interface {
        Apply(ctx context.Context, dryrun bool, updates ...Update) (
	    deleted []roachpb.Span, added []roachpb.SpanConfigEntry,
        )
    }

The implementation is subtle -- we're not processing one update at a
time. The semantics we're after are applying a batch of updates
atomically on a consistent snapshot of the underlying store. This comes
up in the upcoming spanconfig.Reconciler (cockroachdb#71994) -- there, following a
zone config/descriptor change, we want to update KV state in a single
request/RTT instead of an RTT per descendant table. The intended
usage is better captured in the aforementioned PR; let's now talk
about what this entails for the data structure.

To apply a single update, we want to find all overlapping spans and
clear out just the intersections. If the update is adding a new span
config, we'll also want to add the corresponding store entry after. We
do this by deleting all overlapping spans in their entirety and
re-adding the non-overlapping segments. Pseudo-code:

  for entry in store.overlapping(update.span):
      union, intersection = union(update.span, entry), intersection(update.span, entry)
      pre  = span{union.start_key, intersection.start_key}
      post = span{intersection.end_key, union.end_key}

      delete {span=entry.span, conf=entry.conf}
      if entry.contains(update.span.start_key):
          # First entry overlapping with update.
          add {span=pre, conf=entry.conf} if non-empty
      if entry.contains(update.span.end_key):
          # Last entry overlapping with update.
          add {span=post, conf=entry.conf} if non-empty

  if adding:
      add {span=update.span, conf=update.conf} # add ourselves
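
The single-update pseudo-code can be made concrete with a small runnable
sketch (illustrative only, not the actual Go implementation): Span,
Entry, and apply_update are invented names, keys are plain strings
compared lexicographically, and conf=None stands in for a deletion.

```python
# Runnable sketch of single-update Apply: delete overlapping entries in
# their entirety, re-add their non-overlapping pre/post fragments, then
# add the update itself if it carries a config. Names are illustrative.
from collections import namedtuple

Span = namedtuple("Span", ["start", "end"])    # [start, end)
Entry = namedtuple("Entry", ["span", "conf"])

def overlaps(a, b):
    return a.start < b.end and b.start < a.end

def apply_update(entries, update_span, conf=None):
    deleted, kept = [], []
    for e in entries:
        if not overlaps(e.span, update_span):
            kept.append(e)
            continue
        deleted.append(e)
        pre = Span(e.span.start, update_span.start)
        post = Span(update_span.end, e.span.end)
        if pre.start < pre.end:    # entry sticks out on the left
            kept.append(Entry(pre, e.conf))
        if post.start < post.end:  # entry sticks out on the right
            kept.append(Entry(post, e.conf))
    if conf is not None:           # adding, not just deleting
        kept.append(Entry(update_span, conf))
    return sorted(kept), deleted
```

For example, writing [d,f):Y over an existing [b,h):X reports [b,h):X
as deleted and leaves [b,d):X, [d,f):Y, [f,h):X in the store.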

When extending to a set of updates, things are more involved. Let's
assume that the updates are non-overlapping and sorted by start key. As
before, we want to delete overlapping entries in their entirety and
re-add the non-overlapping segments. With multiple updates, it's
possible that a segment being re-added will overlap another update.  If
processing one update at a time in sorted order, we want to only re-add
the gap between the consecutive updates.

    keyspace         a  b  c  d  e  f  g  h  i  j
    existing state      [--------X--------)
    updates          [--A--)           [--B--)

When processing [a,c):A, after deleting [b,h):X, it would be incorrect
to re-add [c,h):X since we're also looking to apply [g,i):B. Instead of
re-adding the trailing segment right away, we carry it forward and
process it when iterating over the second, possibly overlapping update.
In our example, when iterating over [g,i):B we can subtract the overlap
from [c,h):X and only re-add [c,g):X.

It's also possible for the segment to extend past the second update. In
the example below, when processing [d,f):B and having [b,h):X carried
over, we want to re-add [c,d):X and carry forward [f,h):X to the update
after (i.e. [g,i):C).

    keyspace         a  b  c  d  e  f  g  h  i  j
    existing state      [--------X--------)
    updates          [--A--)  [--B--)  [--C--)

One final note: we're iterating through the updates without actually
applying any mutations. Going back to our first example, when processing
[g,i):B, retrieving the set of overlapping spans would (again) retrieve
[b,h):X -- an entry we've already encountered when processing [a,c):A.
Re-adding non-overlapping segments naively would re-add [b,g):X -- an
entry that overlaps with our last update [a,c):A. When retrieving
overlapping entries, we need to exclude any that overlap with the
segment that was carried over. Pseudo-code:

  carry-over = <empty>
  for update in updates:
      carried-over, carry-over = carry-over, <empty>
      if update.overlap(carried-over):
          # Fill in the gap between consecutive updates.
          add {span=span{carried-over.start_key, update.start_key}, conf=carried-over.conf}
          # Consider the trailing span after update; carry it forward if non-empty.
          carry-over = {span=span{update.end_key, carried-over.end_key}, conf=carried-over.conf}
      else:
          add {span=carried-over.span, conf=carried-over.conf} if non-empty

      for entry in store.overlapping(update.span):
          if entry.overlap(carried-over):
              continue # already deleted; its remainder is in the carry-over

          union, intersection = union(update.span, entry), intersection(update.span, entry)
          pre  = span{union.start_key, intersection.start_key}
          post = span{intersection.end_key, union.end_key}

          delete {span=entry.span, conf=entry.conf}
          if entry.contains(update.span.start_key):
              # First entry overlapping with update.
              add {span=pre, conf=entry.conf} if non-empty
          if entry.contains(update.span.end_key):
              # Last entry overlapping with update.
              carry-over = {span=post, conf=entry.conf}

      if adding:
          add {span=update.span, conf=update.conf} # add ourselves

  add {span=carry-over.span, conf=carry-over.conf} if non-empty
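
The carry-over logic can likewise be sketched as runnable code (again
illustrative, with invented names; updates must be non-overlapping and
sorted by start key, and conf=None models a deletion).

```python
# Runnable sketch of batched Apply with a carry-over segment. Entries
# already deleted by an earlier update are skipped; their trailing
# fragment travels forward in the carry-over instead of being re-added.
from collections import namedtuple

Span = namedtuple("Span", ["start", "end"])    # [start, end)
Entry = namedtuple("Entry", ["span", "conf"])

def overlaps(a, b):
    return a.start < b.end and b.start < a.end

def apply_batch(entries, updates):
    deleted, added = [], []
    carry = None
    for span, conf in updates:
        carried, carry = carry, None
        if carried is not None:
            if overlaps(carried.span, span):
                # Fill in only the gap between consecutive updates.
                gap = Span(carried.span.start, span.start)
                if gap.start < gap.end:
                    added.append(Entry(gap, carried.conf))
                # The carried segment may extend past this update too.
                tail = Span(span.end, carried.span.end)
                if tail.start < tail.end:
                    carry = Entry(tail, carried.conf)
            else:
                added.append(carried)
        for e in entries:
            if e in deleted or not overlaps(e.span, span):
                continue  # untouched, or handled via the carry-over
            deleted.append(e)
            pre = Span(e.span.start, span.start)
            if pre.start < pre.end:
                added.append(Entry(pre, e.conf))
            post = Span(span.end, e.span.end)
            if post.start < post.end:
                carry = Entry(post, e.conf)  # don't re-add yet
        if conf is not None:
            added.append(Entry(span, conf))
    if carry is not None:
        added.append(carry)
    return deleted, added
```

Running the first example above -- existing [b,h):X with updates
[a,c):A and [g,i):B -- deletes [b,h):X once and adds [a,c):A, the gap
[c,g):X, and [g,i):B, as described.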

We've extended the randomized testing suite to generate batches of
updates at a time. We've also added a few illustrated datadriven tests.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Nov 30, 2021
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Nov 30, 2021
craig bot pushed a commit that referenced this pull request Nov 30, 2021
73150: spanconfig/store: support applying a batch of updates r=irfansharif a=irfansharif

We first introduced spanconfig.StoreWriter in #70287. Here we extend the
interface to accept a batch of updates instead of just one.

```go
type StoreWriter interface {
    Apply(ctx context.Context, dryrun bool, updates ...Update) (
        deleted []roachpb.Span, added []roachpb.SpanConfigEntry,
    )
}
```

The implementation is subtle -- we're not simply processing one update
at a time. The semantics we're after are those of applying a batch of
updates atomically on a consistent snapshot of the underlying store.
This comes up in the upcoming spanconfig.Reconciler (#71994) -- there,
following a zone config/descriptor change, we want to update KV state in
a single request/RTT instead of an RTT per descendant table. The
intended usage is better captured in the aforementioned PR; let's now
talk about what this entails for the data structure.

To apply a single update, we want to find all overlapping spans and
clear out just the intersections. If the update is adding a new span
config, we'll also want to add the corresponding store entry after. We
do this by deleting all overlapping spans in their entirety and
re-adding the non-overlapping segments. Pseudo-code:

```python
for entry in store.overlapping(update.span):
  union, intersection = union(update.span, entry), intersection(update.span, entry)
  pre  = span{union.start_key, intersection.start_key}
  post = span{intersection.end_key, union.end_key}

  delete {span=entry.span, conf=entry.conf}
  if entry.contains(update.span.start_key):
    # First entry overlapping with update.
    add {span=pre, conf=entry.conf} if non-empty
  if entry.contains(update.span.end_key):
    # Last entry overlapping with update.
    add {span=post, conf=entry.conf} if non-empty

if adding:
  add {span=update.span, conf=update.conf} # add ourselves
```
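To make the single-update scheme concrete, here is a runnable sketch. It assumes (purely for illustration) that spans are half-open integer intervals and the store is a plain dict keyed by `(start, end)` -- toy stand-ins for `roachpb.Span` and the interval-tree-backed store, not the actual CockroachDB types:

```python
# Toy model: spans are half-open integer intervals (start, end); a
# "store" is a dict {(start, end): conf}. Names are illustrative only.
def apply_one(store, span, conf=None):
    """Apply one update: delete overlapping entries in their entirety,
    re-add their non-overlapping remainders, then add the update's own
    entry (conf=None models a deletion-only update)."""
    start, end = span
    for (s, e), c in list(store.items()):
        if e <= start or s >= end:
            continue  # no overlap with the update
        del store[(s, e)]  # delete the overlapping entry wholesale
        if s < start:  # entry contains the update's start key
            store[(s, start)] = c  # re-add the leading ("pre") segment
        if e > end:  # entry contains the update's end key
            store[(end, e)] = c  # re-add the trailing ("post") segment
    if conf is not None:
        store[span] = conf  # add ourselves
    return store
```

Mapping keys a..j to 0..9, applying [c,e):B over an existing [b,h):X splits it into [b,c):X, [c,e):B, [e,h):X.
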

When extending to a set of updates, things are more involved. Let's
assume that the updates are non-overlapping and sorted by start key. As
before, we want to delete overlapping entries in their entirety and
re-add the non-overlapping segments. With multiple updates, it's
possible that a segment being re-added overlaps a later update. When
processing one update at a time in sorted order, we want to re-add only
the gap between consecutive updates.

    keyspace         a  b  c  d  e  f  g  h  i  j
    existing state      [--------X--------)
    updates          [--A--)           [--B--)

When processing [a,c):A, after deleting [b,h):X, it would be incorrect
to re-add [c,h):X since we're also looking to apply [g,i):B. Instead of
re-adding the trailing segment right away, we carry it forward and
process it when iterating over the second, possibly overlapping update.
In our example, when iterating over [g,i):B we can subtract the overlap
from [c,h):X and only re-add [c,g):X.

It's also possible for the carried segment to extend past the second
update. In the example below, when processing [d,f):B with [c,h):X
carried over (the remainder of [b,h):X), we want to re-add [c,d):X and
carry forward [f,h):X to the update after (i.e. [g,i):C).

    keyspace         a  b  c  d  e  f  g  h  i  j
    existing state      [--------X--------)
    updates          [--A--)  [--B--)  [--C--)

One final note: we're iterating through the updates without actually
applying any mutations. Going back to our first example, when processing
[g,i):B, retrieving the set of overlapping spans would (again) retrieve
[b,h):X -- an entry we've already encountered when processing [a,c):A.
Re-adding non-overlapping segments naively would re-add [b,g):X -- an
entry that overlaps with our last update [a,c):A. When retrieving
overlapping entries, we need to exclude any that overlap with the
segment that was carried over. Pseudo-code:

```python
carry-over = <empty>
for update in updates:
    carried-over, carry-over = carry-over, <empty>
    if update.overlap(carried-over):
        # Fill in the gap between consecutive updates.
        add {span=span{carried-over.start_key, update.start_key}, conf=carried-over.conf}
        # Consider the trailing span after update; carry it forward if non-empty.
        carry-over = {span=span{update.end_key, carried-over.end_key}, conf=carried-over.conf}
    else:
        add {span=carried-over.span, conf=carried-over.conf} if non-empty

    for entry in store.overlapping(update.span):
        if entry.overlap(carried-over):
            continue # already processed under an earlier update

        union, intersection = union(update.span, entry), intersection(update.span, entry)
        pre  = span{union.start_key, intersection.start_key}
        post = span{intersection.end_key, union.end_key}

        delete {span=entry.span, conf=entry.conf}
        if entry.contains(update.span.start_key):
            # First entry overlapping with update.
            add {span=pre, conf=entry.conf} if non-empty
        if entry.contains(update.span.end_key):
            # Last entry overlapping with update.
            carry-over = {span=post, conf=entry.conf}

    if adding:
        add {span=update.span, conf=update.conf} # add ourselves

add {span=carry-over.span, conf=carry-over.conf} if non-empty
```
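Under the same toy assumptions as before -- half-open integer spans and a dict-backed store standing in for the real types -- the carry-over scheme above can be sketched end to end:

```python
# Toy model: spans are half-open integer intervals; the store is a dict
# {(start, end): conf}. Illustrative stand-ins, not the real types.
def apply_batch(store, updates):
    """Apply a batch of non-overlapping updates, sorted by start key,
    as if on a consistent snapshot. Each update is (start, end, conf);
    conf=None models a deletion."""
    snapshot = dict(store)  # overlap queries never see our own mutations
    carry = None            # trailing segment not yet re-added
    for u_start, u_end, u_conf in updates:
        carried, carry = carry, None
        if carried is not None:
            c_start, c_end, c_conf = carried
            if c_end > u_start:  # carried segment overlaps this update
                if c_start < u_start:
                    # Fill in the gap between consecutive updates.
                    store[(c_start, u_start)] = c_conf
                if c_end > u_end:
                    # Still extends past this update; carry it forward.
                    carry = (u_end, c_end, c_conf)
            else:
                store[(c_start, c_end)] = c_conf  # re-add it whole
        for (s, e), c in snapshot.items():
            if e <= u_start or s >= u_end:
                continue  # no overlap with this update
            if carried is not None and not (e <= carried[0] or s >= carried[1]):
                continue  # already processed under an earlier update
            store.pop((s, e), None)  # delete the overlapping entry
            if s < u_start:
                store[(s, u_start)] = c  # re-add the leading segment
            if e > u_end:
                carry = (u_end, e, c)    # defer the trailing segment
        if u_conf is not None:
            store[(u_start, u_end)] = u_conf  # add ourselves
    if carry is not None:
        c_start, c_end, c_conf = carry
        store[(c_start, c_end)] = c_conf
    return store
```

With keys a..j mapped to 0..9, the first example (existing [b,h):X, updates [a,c):A and [g,i):B) yields [a,c):A, [c,g):X, [g,i):B, as described above.
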
We've extended the randomized testing suite to generate batches of
updates at a time. We've also added a few illustrated datadriven tests.

Release note: None


Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>