Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cdc: Enhance handling of various options in expressions #83322

Open
2 of 6 tasks
miretskiy opened this issue Jun 24, 2022 · 3 comments
Open
2 of 6 tasks

cdc: Enhance handling of various options in expressions #83322

miretskiy opened this issue Jun 24, 2022 · 3 comments
Labels
A-cdc Change Data Capture A-cdc-expressions Features related to changefeed projections and filters C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-cdc

Comments

@miretskiy
Copy link
Contributor

miretskiy commented Jun 24, 2022

CDC expressions remove the need for some standard CDC options.
We should validate/enhance handling of some of those options:

  • key_in_value -- probably not necessary; just select *.
  • key_only -- in its current "shape" is not necessary; just select the columns you want
    - [ ] There is an argument to be made that perhaps we should support "key only" output with expressions. That is, you want to specify the predicate portion only to restrict the set of data being watched; but you want to emit nothing but the key. To do this, we would need to update sql grammar because as currently written changefeed expressions require at least 1 projection expression.
  • [x ] split_column_families -- remove; see changefeedccl: Smart family selection with expressions #82607
  • topic_in_value -- remove; see cdc: Add cdc_topic function #83321

In addition, some of the options can be detected based on the expression itself.
By walking expression and seeing if some call function is used allows us to determine if an option needs to be set.

  • diff -- detect if diff option needs to be set based on the use of cdc_prev() function.
  • Produce a warning (or maybe an error) if diff option was specified by the user, but cdc_prev() was not used.
  • (Future): once rangefeed natively support filtering deletions, determine if expression wants to see deleted rows. If not, push down predicate to rangefeed.

Jira issue: CRDB-17002

Epic CRDB-21713

@miretskiy miretskiy added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-cdc Change Data Capture T-cdc A-cdc-expressions Features related to changefeed projections and filters labels Jun 24, 2022
@blathers-crl
Copy link

blathers-crl bot commented Jun 24, 2022

cc @cockroachdb/cdc

@amruss
Copy link
Contributor

amruss commented Jun 29, 2022

This is 22.2 work

HonoreDB added a commit to HonoreDB/cockroach that referenced this issue Jul 1, 2022
Informs cockroachdb#83322. Specifically this does the "infer whether
or not to set diff" part of it.

This went through a more ambitious phase
(https://github.com/cockroachdb/cockroach/compare/master...HonoreDB:cockroach:cdc_expressions_infer_diff?expand=1)
where it also handled user defined types, but I ended up
agreeing with wiser heads that this was adding too much complexity
for no known use case.

I also came really close to setting envelope=row as the
default, but envelope=row is not supported in all sinks
so again, it felt too messy to do without a little further
conversation.

Release note (sql change): CREATE CHANGEFEED AS statements no longer need to include WITH DIFF when using cdc_prev().
HonoreDB added a commit to HonoreDB/cockroach that referenced this issue Jul 1, 2022
Informs cockroachdb#83322. Specifically this does the "infer whether
or not to set diff" part of it.

This went through a more ambitious phase
(https://github.com/cockroachdb/cockroach/compare/master...HonoreDB:cockroach:cdc_expressions_infer_diff?expand=1)
where it also handled user defined types, but I ended up
agreeing with wiser heads that this was adding too much complexity
for no known use case.

I also came really close to setting envelope=row as the
default, but envelope=row is not supported in all sinks
so again, it felt too messy to do without a little further
conversation.

Release note (sql change): CREATE CHANGEFEED AS statements no longer need to include WITH DIFF when using cdc_prev().
craig bot pushed a commit that referenced this issue Jul 5, 2022
82473: outliers: latency quantile detector r=matthewtodd a=matthewtodd

Closes #79451.

Note that this detector comes disabled by default. To enable, use the
`sql.stats.outliers.experimental.latency_quantile_detection.enabled`
cluster setting.

This new detection "algorithm" is a simple, perhaps naïve, heuristic,
though it may be good enough to be useful: we consider a statement an
outlier if its execution latency is > p99 latency for its fingerprint,
so long as it's also greater than twice the median latency and greater
than an arbitrary user-defined threshold.[^1]

We expect the operation of this detector to be memory constrained. To
avoid OOM errors, we offer a cluster setting[^2] to cap the overall
memory used, along with metrics reporting the number of fingerprints
being tracked, memory usage, and "evictions," or fingerprint histories
being dropped due to memory pressure. If a high rate of evictions is
observed, it will be necessary to raise either one or both of the memory
limit and the "interesting" threshold settings to regain a steady state.

Note also that this detector is not designed to be used concurrently by
multiple goroutines. Correctness currently relies on the global lock in
`outliers.Registry`, to be addressed next in #81021.

[^1]: This threshold is defined by a cluster setting,
`sql.stats.outliers.experimental.latency_quantile_detection.interesting_threshold`.
Without such a threshold, we would otherwise flag a statement that, say,
normally executes in 3ms taking 7ms; but 7ms is probably still fast
enough that there's nothing to be done.

[^2]: `sql.stats.outliers.experimental.latency_quantile_detection.memory_limit`

Release note: None

82741: obsservice: ui handler serves db console assets  r=sjbarag,abarganier a=dhartunian

This change enables the Observabilty Service to serve the DB Console UI
itself. Endpoints that CRDB can handle are proxied (`/_admin/`,
`/_status/`, `/ts/`) but the base URL will return a blank HTML page with
JS assets that load the DB Console.

In order to build with the ui code, the `--config=with_ui` flag must be
passed to bazel.

This commit also adds a shortcut to the `dev` command to build
`obsservice` which includes the `--config=with_ui` flag just as we do by
default in `cockroach` builds.

Release note: None

83717: changefeedccl: infer WITH diff from changefeed expression r=[miretskiy] a=HonoreDB

Informs #83322. Specifically this does the "infer whether
or not to set diff" part of it.

This went through a more ambitious phase
(https://github.com/cockroachdb/cockroach/compare/master...HonoreDB:cockroach:cdc_expressions_infer_diff?expand=1)
where it also handled user defined types, but I ended up
agreeing with wiser heads that this was adding too much complexity
for no known use case.

I also came really close to setting envelope=row as the
default, but envelope=row is not supported in all sinks
so again, it felt too messy to do without a little further
conversation.

Release note (sql change): CREATE CHANGEFEED AS statements no longer need to include WITH DIFF when using cdc_prev().

83772: storage: remove experimental markers for MVCC range tombstones r=aliher1911 a=erikgrinaker

GC is still under development, but I think it's fine to remove the marker before GC works to let people start building with it.

---

This removes the experimental markers for MVCC range tombstones, by
renaming functions, methods, and parameters to not include
`experimental`, as well as warning comments.

The changes are entirely mechanical.

Release note: None

83784: scexec: refactor schema changer job update r=postamar a=postamar

This PR rebases #83724 on top of a few commits which refactor the schema changer job update machinery.


Co-authored-by: Matthew Todd <todd@cockroachlabs.com>
Co-authored-by: David Hartunian <davidh@cockroachlabs.com>
Co-authored-by: Aaron Zinger <zinger@cockroachlabs.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Co-authored-by: Marius Posta <marius@cockroachlabs.com>
surajr10 pushed a commit to surajr10/cockroach that referenced this issue Jul 12, 2022
Informs cockroachdb#83322. Specifically this does the "infer whether
or not to set diff" part of it.

This went through a more ambitious phase
(https://github.com/cockroachdb/cockroach/compare/master...HonoreDB:cockroach:cdc_expressions_infer_diff?expand=1)
where it also handled user defined types, but I ended up
agreeing with wiser heads that this was adding too much complexity
for no known use case.

I also came really close to setting envelope=row as the
default, but envelope=row is not supported in all sinks
so again, it felt too messy to do without a little further
conversation.

Release note (sql change): CREATE CHANGEFEED AS statements no longer need to include WITH DIFF when using cdc_prev().
@amruss
Copy link
Contributor

amruss commented Nov 22, 2022

Plan is to do this post taking CDC Transformations out of preview, and possibly totally remove these options from changefeeds

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cdc Change Data Capture A-cdc-expressions Features related to changefeed projections and filters C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-cdc
Projects
None yet
Development

No branches or pull requests

2 participants