kv: change meta ranges to honor fine grained data domiciliation zone configs over indexed values #70912

Open
knz opened this issue Sep 30, 2021 · 4 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@knz
Contributor

knz commented Sep 30, 2021

Describe the problem

When zone configs are used to home region-sensitive data in particular regions, the meta ranges do not obey those zone configs, so any region-sensitive data embedded in table keys "escapes" its region.

This makes it impossible to do strict data sovereignty partitioning using multi-region CockroachDB when domiciled data is indexed. (The issue does not exist when domiciled data is not indexed.)

Note: we already document this limitation in https://www.cockroachlabs.com/docs/stable/data-domiciling.html#limitations

Epic: CRDB-10287

To Reproduce

  1. create a geo-partitioned table with sensitive data in some indexed columns
  2. use a zone config to map the region-specific data to separate regions
  3. run cockroach debug keys on all nodes

(A simpler version of steps 1-2 is to create a non-partitioned table, introduce split points manually, and simply "imagine" that we have applied a separate zone config to each table range. The point below remains the same.)

At step 3, we can see that the indexed values from the table show up in Meta2 keys on nodes that are unrelated to the region specified by the zone config.

Expected behavior

The meta ranges that include data from zoned tables (in the range key boundaries) should not be stored outside of the zone-specified regions.

Today, this is impossible because we do not split the meta ranges at the same boundaries as the tables.
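
To make the leak concrete, here is a minimal toy model of the situation described above. It assumes that a meta2 record is addressed by the end key of the range it describes, and it represents keys as Python tuples rather than CockroachDB's real binary encoding. The table ID, index ID, region names, and the `leaked_values` helper are all hypothetical and exist only for illustration.

```python
# Toy model only: real CockroachDB keys are binary-encoded, not tuples.
# Hypothetical table 53 with index 1 on a sensitive column (e.g. a surname).
# Each range ends at `end` and is pinned to a region by a zone config.
ranges = [
    {"end": (53, 1, "garcia"), "region": "eu-west"},
    {"end": (53, 1, "nguyen"), "region": "us-east"},
    {"end": (53, 2), "region": "ap-south"},
]

# Assumption: a meta2 entry is addressed by the end key of the range it
# describes, so the indexed value becomes part of the meta2 key itself.
meta2 = {("meta2",) + r["end"]: r for r in ranges}


def leaked_values(meta2, meta_region):
    """Return (value, home_region, stored_region) for every indexed value
    whose meta2 key is stored outside the region its range is pinned to."""
    leaks = []
    for key, desc in meta2.items():
        # key = ("meta2", table_id, index_id, *indexed_values)
        indexed = [part for part in key[3:] if isinstance(part, str)]
        if desc["region"] != meta_region:
            leaks.extend((v, desc["region"], meta_region) for v in indexed)
    return leaks


# The meta2 records live in a meta range that is placed independently of
# the table's zone configs; here we assume it sits entirely in us-east.
print(leaked_values(meta2, "us-east"))
```

In this sketch the value `garcia` belongs to a range pinned to `eu-west`, yet it appears in a meta2 key stored in `us-east`, which mirrors what step 3 of the reproduction shows.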

Environment:

crdb v21.2

Jira issue: CRDB-10283

@knz knz added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Sep 30, 2021
@blathers-crl blathers-crl bot added the T-kv KV Team label Sep 30, 2021
@knz
Contributor Author

knz commented Sep 30, 2021

@mwang1026 @awoods187 you'll want to follow up on this in the GDPR roadmap.

@knz
Contributor Author

knz commented Sep 30, 2021

I think there are two ways we can achieve this:

  • split the meta ranges at region boundaries. This would require us to learn about desired boundaries in the KV logic that populates the meta ranges. Maybe @irfansharif has an idea about this in the context of rfcs: introduce rfc for multi-tenant zone configs #66348.
  • introduce two separate meta levels: at one level, the start/end keys would only include the key prefix up to and including the table ID, but nothing after that (so that indexed values are not included). Then at the next level we'd have the full keys, and we'd ensure that this second level is always split at table/partition boundaries. The first level would not need to be restricted by zone configs.
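
A rough sketch of how the second option could look, using the same toy tuple keys as a stand-in for real encoded keys. This is only an illustration of the idea, not a proposed implementation: `build_two_level_meta`, the table IDs, and the region names are hypothetical, and the grouping by (table, region) stands in for "split at table/partition boundaries".

```python
from collections import defaultdict

# Toy range descriptors: end key = (table_id, index_id, indexed_value).
ranges = [
    {"end": (53, 1, "garcia"), "region": "eu-west"},
    {"end": (53, 1, "nguyen"), "region": "us-east"},
    {"end": (54, 1, "okafor"), "region": "eu-west"},
]


def build_two_level_meta(ranges):
    """Return (level1, level2) for the hypothetical two-level scheme.

    level2 holds the full keys, grouped into shards keyed by
    (table_id, region) so that each shard can be placed in the region
    its zone config demands.
    level1 holds only the truncated (table_id,) prefix, so it carries no
    indexed values and can be stored anywhere.
    """
    level2 = defaultdict(dict)
    for r in ranges:
        table_id = r["end"][0]
        level2[(table_id, r["region"])][r["end"]] = r

    level1 = {}
    for table_id, region in level2:
        # Level 1 only records which level-2 shards exist for a table.
        level1.setdefault((table_id,), []).append(region)
    return level1, dict(level2)


level1, level2 = build_two_level_meta(ranges)
print(level1)
```

In this model every first-level key is just a table ID prefix, while each full key with an indexed value lives only in a second-level shard tagged with its own region.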

@knz knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) and removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Sep 30, 2021
@knz knz changed the title from "kv: meta ranges violate data domiciliation zone configs" to "kv: meta ranges violate data domiciliation zone configs pertaining to indexed values" Sep 30, 2021
@knz knz changed the title from "kv: meta ranges violate data domiciliation zone configs pertaining to indexed values" to "kv: meta ranges does not honor data domiciliation zone configs pertaining to indexed values" Sep 30, 2021
@knz knz changed the title from "kv: meta ranges does not honor data domiciliation zone configs pertaining to indexed values" to "kv: meta ranges do not honor data domiciliation zone configs pertaining to indexed values" Sep 30, 2021
@exalate-issue-sync exalate-issue-sync bot changed the title from "kv: meta ranges do not honor data domiciliation zone configs pertaining to indexed values" to "kv: meta ranges violate data domiciliation zone configs" Sep 30, 2021
@exalate-issue-sync exalate-issue-sync bot added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. and removed C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Sep 30, 2021
@knz knz changed the title from "kv: meta ranges violate data domiciliation zone configs" to "kv: change meta ranges to honor fine grained data domiciliation zone configs over indexed values" Sep 30, 2021
@knz knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) and removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Sep 30, 2021
@irfansharif
Contributor

This has little overlap with #66348, which is more about improving our existing infrastructure for zone configs (from how they're stored, disseminated, and applied) to be compatible with having secondary tenants. Certainly we'll want to think about how/where we store domiciled keys (using order-preserving hashes for meta2 might be another option).

I see we've filed issues for a few places where we're storing domicile-able keys (A-gdpr-compliance). Absent an accompanying RFC (and/or a thorough audit), it might make more sense to fold everything into a single issue instead. Likely whatever we do for one (say, system.jobs) would apply to everything else (system.zones); the disparate issues are harder to read and contextualize.

@jordanlewis jordanlewis added the A-cdc Change Data Capture label Jan 26, 2023
@blathers-crl blathers-crl bot added the T-cdc label Jan 26, 2023
@blathers-crl

blathers-crl bot commented Jan 26, 2023

cc @cockroachdb/cdc

@jordanlewis jordanlewis removed the A-cdc Change Data Capture label Jan 26, 2023
@jordanlewis jordanlewis removed the T-cdc label Jan 26, 2023