-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
outliers: configurable statistical detection per fingerprint #79451
Labels
A-sql-observability
Related to observability of the SQL layer
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Comments
matthewtodd
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-observability
A-sql-cli-observability
Issues related to surfacing SQL observability in SHOW, CRDB_INTERNAL, SYSTEM, etc.
labels
Apr 5, 2022
craig bot
pushed a commit
that referenced
this issue
May 4, 2022
80929: authors: add pjtatlow to authors r=rickystewart a=pjtatlow Release note: None 80975: outliers: extract a detector interface r=matthewtodd a=matthewtodd This gives us a place to hang the upcoming work in #79451. Release note: None Co-authored-by: PJ Tatlow <pj@cockroachlabs.com> Co-authored-by: Matthew Todd <todd@cockroachlabs.com>
matthewtodd
changed the title
Configurable (statistical?) detection for execution outliers
outliers: configurable (statistical?) detection
May 5, 2022
matthewtodd
added
A-sql-observability
Related to observability of the SQL layer
and removed
A-sql-cli-observability
Issues related to surfacing SQL observability in SHOW, CRDB_INTERNAL, SYSTEM, etc.
labels
May 5, 2022
matthewtodd
changed the title
outliers: configurable (statistical?) detection
outliers: configurable statistical detection
May 5, 2022
matthewtodd
changed the title
outliers: configurable statistical detection
outliers: configurable statistical detection per fingerprint
May 5, 2022
craig bot
pushed a commit
that referenced
this issue
May 6, 2022
80422: sql: support virtual table indexes with generators r=AlexTalks a=AlexTalks While previously virtual schema tables had support for defining tables with a `populate` function, which eagerly loads all rows, or a `generator` function, which lazy loads each row when called (possibly running in a worker Goroutine), and also had support for virtual indexes which would have their own `populate` functions, there was a subtle lack of support for using virtual indexes with virtual tables that used a `generator`, since the virtual index constraint logic would fall back to a (possibly undefined) `populate` function in several cases. This change fixes the nil pointer exception that could occur if using virtual indexes with a table using a `generator`, and validates that the virtual index is supported prior to use. Release Note: None Release Justification: Bug fix 80989: outliers: collect statement fingerprint id r=matthewtodd a=matthewtodd This change also helps set us up for #79451, where we'll be working with per-fingerprint statement latencies. Release note: None 81076: Authors: add linville to authors r=davidwding a=mdlinville Release note: None 81080: opt: calculate lookup join remaining filters more accurately r=mgartner a=mgartner #### opt: return ordinals of equality filters from memo.ExtractJoinEqualityColumns This is a prerequisite for future refactoring of `lookupjoin.ConstraintBuilder`. Release note: None #### opt: calculate lookup join remaining filters more accurately `lookupjoin.ConstraintBuilder` now determines the remaining filters for a lookup join constraint more accurately by tracking the ordinals of filters that are covered by the constraint, rather than trying to reduce the filters original filters based on the key columns, lookup expression, and constant expression. Release note: None 81089: vendor: bump panicparse r=erikgrinaker,nvanbenschoten a=tbg This picks up maruel/panicparse#74. Without this commit, the `pkg/kv/kvserver/concurrency` datadriven tests are flaky on Mac M1s since those tests rely on panicparse to parse strack traces for goroutine monitoring. Release note: None Co-authored-by: Alex Sarkesian <sarkesian@cockroachlabs.com> Co-authored-by: Matthew Todd <todd@cockroachlabs.com> Co-authored-by: Matt Linville <linville@cockroachlabs.com> Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com> Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
craig bot
pushed a commit
that referenced
this issue
May 16, 2022
80991: backupccl: Cosmetic changes to backup_processor.go to improve clarity. r=benbardin a=benbardin Release note: None Renames a few things, and moves the sstsink into its own file. 81141: ui: add TTL metrics dashboard r=rafiss,matthewtodd a=otan Release note: None example: <img width="981" alt="image" src="https://user-images.githubusercontent.com/3646147/167403258-c3d546ab-a6af-4088-a090-87cd0ed8bbe7.png"> 81194: storage: remove `Engine.SetUpperBound` r=jbowens a=erikgrinaker Partial resubmit of #79291, which was merged into the `mvcc-range-tombstones` branch. --- **storage: make `Engine.ClearIterRange` construct iterator internally** `Engine.ClearIterRange` now creates the iterator itself, instead of having it passed in. This ensures the iterator is created with appropriate bounds and options, which becomes even more important with the introduction of range keys. It will also allow removal of `Engine.SetUpperBound` in a subsequent commit. If further flexibility is needed, we can consider passing an `IterOptions` struct with appropriate bounds instead of a key span. Release note: None **storage: remove `Engine.SetUpperBound`** This patch removes the `Engine.SetUpperBound` method, as it is no longer in use. Release note: None 81195: storage: allow multiple iterators in `pebbleReadOnly` and `pebbleBatch` r=jbowens a=erikgrinaker Partial resubmit of #79291, which was merged into the mvcc-range-tombstones branch. --- Previously, `pebbleReadOnly` and `pebbleBatch` only supported a single concurrent iterator of a given type, as this iterator was reused between `NewMVCCIterator` calls. Attempts to create an additional iterator would panic with "iterator already in use". Time-bound iterators were excluded from this reuse, and always created from scratch, so they sidestepped this limitation. However, we will soon make these iterators reusable too via a new `SetOptions()` method in Pebble. When TBIs also become reusable, this will cause `MVCCIncrementalIterator` to panic with "iterator already in use", because it always creates two concurrent iterators: one regular and one time-bound. This patch therefore ensures all `Reader` implementations support multiple concurrent iterators, by cloning the existing cached iterator if it is already in use. This also preserves the pinned engine state for the new iterator. Release note: None 81295: outliers: introduce a composite "any" detector r=matthewtodd a=matthewtodd This gives us a place to hang the upcoming histogram-based detector coming in #79451. Release note: None Co-authored-by: Ben Bardin <bardin@cockroachlabs.com> Co-authored-by: Oliver Tan <otan@cockroachlabs.com> Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com> Co-authored-by: Matthew Todd <todd@cockroachlabs.com>
manually reviewed and brought up to date |
craig bot
pushed a commit
that referenced
this issue
May 31, 2022
81676: changefeedccl: Predicates and projections in CDC. r=miretskiy a=miretskiy Introduce `cdceval` package -- a library for expression evaluation for CDC. Changefeed users for a long time requested ability to emit only a subset of columns. They have also requested ability to filter out unwanted events (for example, filter out deletions). This library aims to accomplish those goals. However, instead of focusing on a narrow use cases, which would usually be addressed via addition of new `WITH` option (as done in #80499), this library aims to provide support for general expression evaluation. `cdceval` library provides the following functionality: * Ability to evaluate predicates (filters) so that events may be filtered. * Ability to evaluate projection expressions (`select *`, `select a, b,c`, or even `select a + b - c as math_column`) * Ability to evaluate virtual compute columns (currently not implemented in this PR). `cdceval` library reuses existing parsing and evaluation libraries, but adopts them for CDC use case. CDC events are row level events, and as such, CDC expressions only make sense in the context of a single row/single table. In addition, because CDC events are at least once semantics, the emitted events must not depend on volatile state. In summary, any expression is supported except: * Volatile functions -- not supported * Stable functions, such as `now()`, `current_timestamp()`, etc are modified so that they return stable values -- namely events MVCC timestamp. * Multi row functions (aggregates, windowing functions) are disallowed. `cdceval` also defined few custom, CDC specific functions, such as: * `cdc_prev()`: Returns the previous row values as a JSONB object. * `cdc_is_delete()`: Returns true if the row was deleted. * Others -- see `functions.go` The follow PRs will add a "front end" to this library to enable creation and management of predicated changefeeds. Release Notes: None 82146: util/quantile: import quantile library r=matthewtodd a=matthewtodd For upcoming outliers work in #79451, we'll re-use the biased streaming quantiles implementation underlying the prometheus client library's [summary][1] type. But in order to monitor and possibly constrain our memory usage, we'll need a way to measure the size of each `quantile.Stream`. That functionality is not available upstream, and contributions are explicitly [not being accepted][2] (and the [original upstream][3], from bmizerany, lacks further functionality and is similarly inactive), so we vendor the library here, unmodified from its [v1.0.1][4], in advance of adding the methods we need. [1]: https://prometheus.io/docs/practices/histograms/ [2]: beorn7/perks#5 (comment) [3]: https://github.com/bmizerany/perks [4]: https://github.com/beorn7/perks/tree/v1.0.1 Release note: None Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com> Co-authored-by: Matthew Todd <todd@cockroachlabs.com>
craig bot
pushed a commit
that referenced
this issue
Jul 5, 2022
82473: outliers: latency quantile detector r=matthewtodd a=matthewtodd Closes #79451. Note that this detector comes disabled by default. To enable, use the `sql.stats.outliers.experimental.latency_quantile_detection.enabled` cluster setting. This new detection "algorithm" is a simple, perhaps naïve, heuristic, though it may be good enough to be useful: we consider a statement an outlier if its execution latency is > p99 latency for its fingerprint, so long as it's also greater than twice the median latency and greater than an arbitrary user-defined threshold.[^1] We expect the operation of this detector to be memory constrained. To avoid OOM errors, we offer a cluster setting[^2] to cap the overall memory used, along with metrics reporting the number of fingerprints being tracked, memory usage, and "evictions," or fingerprint histories being dropped due to memory pressure. If a high rate of evictions is observed, it will be necessary to raise either one or both of the memory limit and the "interesting" threshold settings to regain a steady state. Note also that this detector is not designed to be used concurrently by multiple goroutines. Correctness currently relies on the global lock in `outliers.Registry`, to be addressed next in #81021. [^1]: This threshold is defined by a cluster setting, `sql.stats.outliers.experimental.latency_quantile_detection.interesting_threshold`. Without such a threshold, we would otherwise flag a statement that, say, normally executes in 3ms taking 7ms; but 7ms is probably still fast enough that there's nothing to be done. [^2]: `sql.stats.outliers.experimental.latency_quantile_detection.memory_limit` Release note: None 82741: obsservice: ui handler serves db console assets r=sjbarag,abarganier a=dhartunian This change enables the Observabilty Service to serve the DB Console UI itself. Endpoints that CRDB can handle are proxied (`/_admin/`, `/_status/`, `/ts/`) but the base URL will return a blank HTML page with JS assets that load the DB Console. In order to build with the ui code, the `--config=with_ui` flag must be passed to bazel. This commit also adds a shortcut to the `dev` command to build `obsservice` which includes the `--config=with_ui` flag just as we do by default in `cockroach` builds. Release note: None 83717: changefeedccl: infer WITH diff from changefeed expression r=[miretskiy] a=HonoreDB Informs #83322. Specifically this does the "infer whether or not to set diff" part of it. This went through a more ambitious phase (https://github.com/cockroachdb/cockroach/compare/master...HonoreDB:cockroach:cdc_expressions_infer_diff?expand=1) where it also handled user defined types, but I ended up agreeing with wiser heads that this was adding too much complexity for no known use case. I also came really close to setting envelope=row as the default, but envelope=row is not supported in all sinks so again, it felt too messy to do without a little further conversation. Release note (sql change): CREATE CHANGEFEED AS statements no longer need to include WITH DIFF when using cdc_prev(). 83772: storage: remove experimental markers for MVCC range tombstones r=aliher1911 a=erikgrinaker GC is still under development, but I think it's fine to remove the marker before GC works to let people start building with it. --- This removes the experimental markers for MVCC range tombstones, by renaming functions, methods, and parameters to not include `experimental`, as well as warning comments. The changes are entirely mechanical. Release note: None 83784: scexec: refactor schema changer job update r=postamar a=postamar This PR rebases #83724 on top of a few commits which refactor the schema changer job update machinery. Co-authored-by: Matthew Todd <todd@cockroachlabs.com> Co-authored-by: David Hartunian <davidh@cockroachlabs.com> Co-authored-by: Aaron Zinger <zinger@cockroachlabs.com> Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com> Co-authored-by: Marius Posta <marius@cockroachlabs.com>
surajr10
pushed a commit
to surajr10/cockroach
that referenced
this issue
Jul 12, 2022
Closes cockroachdb#79451. Note that this detector comes disabled by default. To enable, use the `sql.stats.outliers.experimental.latency_quantile_detection.enabled` cluster setting. This new detection "algorithm" is a simple, perhaps naïve, heuristic, though it may be good enough to be useful: we consider a statement an outlier if its execution latency is > p99 latency for its fingerprint, so long as it's also greater than twice the median latency and greater than an arbitrary user-defined threshold.[^1] We expect the operation of this detector to be memory constrained. To avoid OOM errors, we offer a cluster setting[^2] to cap the overall memory used, along with metrics reporting the number of fingerprints being tracked, memory usage, and "evictions," or fingerprint histories being dropped due to memory pressure. If a high rate of evictions is observed, it will be necessary to raise either one or both of the memory limit and the "interesting" threshold settings to regain a steady state. Note also that this detector is not designed to be used concurrently by multiple goroutines. Correctness currently relies on the global lock in `outliers.Registry`, to be addressed next in cockroachdb#81021. [^1]: This threshold is defined by a cluster setting, `sql.stats.outliers.experimental.latency_quantile_detection.interesting_threshold`. Without such a threshold, we would otherwise flag a statement that, say, normally executes in 3ms taking 7ms; but 7ms is probably still fast enough that there's nothing to be done. [^2]: `sql.stats.outliers.experimental.latency_quantile_detection.memory_limit` Release note: None
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-sql-observability
Related to observability of the SQL layer
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Here we expand on the walking skeleton by developing a smarter detection algorithm for outliers, perhaps leaning on existing sql stats infrastructure.
Epic: CRDB-13544
Jira issue: CRDB-14871
The text was updated successfully, but these errors were encountered: