[dbnode] Make caching after block retrieval a configuration option #2613
src/dbnode/persist/fs/retriever.go
Outdated
@@ -342,7 +342,7 @@ func (r *blockRetriever) fetchBatch(
 	}

 	// We don't need to call onRetrieve.OnRetrieveBlock if the ID was not found.
-	callOnRetrieve := req.onRetrieve != nil && req.foundAndHasNoError()
+	callOnRetrieve := r.opts.CacheOnRetrieve() && req.onRetrieve != nil && req.foundAndHasNoError()
As discussed, it may be better to refactor this option to a per-namespace level, so that e.g. the unaggregated namespace remains cached while aggregated namespaces can avoid caching.
@arnikola and I synced offline. Per-namespace config is much safer imo, so will rework this PR with that change.
Might want to also allow a global though if folks don't want to specify it per-namespace.
A little bit hesitant to give people an easy knob that could accidentally hobble their query/alert perf, especially for the unaggregated ns... What would you think about a "timeboxed" approach? I.e., you'd specify that you only want to cache blocks seen in the last, e.g., 2 hours, and that would also set cache expiry to the same value?
Maybe we start with just the per-namespace config for now? Agree with @arnikola's concern around a global knob + the unaggregated namespace (though maybe the global option could just apply to aggregated namespaces?). But also not sure we need to optimize with something like a timeboxed approach just yet?
Forgot to update this, but, for posterity, synced with @robskillington and @arnikola via Slack. Consensus was to add both a global and namespace-specific configuration option for disabling caching. PR has been updated to reflect this.
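For readers following the thread, here is a minimal sketch of how a global option and a namespace-specific option might combine. The function name `resolveCacheOnRetrieve` and the rule that either level can disable caching are assumptions made for illustration; the thread only states that both options exist, not how the merged code orders them.

```go
package main

import "fmt"

// resolveCacheOnRetrieve is a hypothetical helper, not taken from the PR.
// Assumption: both settings are "disable" switches, so a block is cached
// after retrieval only when neither the global option nor the namespace
// option turns caching off.
func resolveCacheOnRetrieve(globalEnabled, namespaceEnabled bool) bool {
	return globalEnabled && namespaceEnabled
}

func main() {
	// Walk through every combination of the two settings.
	for _, global := range []bool{true, false} {
		for _, ns := range []bool{true, false} {
			fmt.Printf("global=%v namespace=%v cache=%v\n",
				global, ns, resolveCacheOnRetrieve(global, ns))
		}
	}
}
```

Under this assumed rule, an operator could leave the global option enabled and switch caching off only for the namespaces that serve expensive historical reads.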
   SchemaOptions schemaOptions = 9;
   bool coldWritesEnabled = 10;
   NamespaceRuntimeOptions runtimeOptions = 11;
+  google.protobuf.BoolValue cacheBlocksOnRetrieve = 12;
Using google.protobuf.BoolValue to support backwards compatibility with namespace definitions already in etcd. A null value here allows us to correctly default to enabled.
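A small sketch of why the wrapper type matters: a plain proto3 `bool` that was never set reads as `false`, so namespace definitions stored before this field existed would look like "caching disabled". A wrapper message decodes to nil when absent, which lets the reader fall back to a default. The types below (`boolValue`, `namespaceOptions`) are local stand-ins invented for this example, not the generated M3 or protobuf types.

```go
package main

import "fmt"

// boolValue is a local stand-in for the message behind
// google.protobuf.BoolValue; only the Value field matters here.
type boolValue struct {
	Value bool
}

// namespaceOptions is a hypothetical, trimmed-down namespace definition.
// A nil CacheBlocksOnRetrieve models a definition written before the
// field was introduced.
type namespaceOptions struct {
	CacheBlocksOnRetrieve *boolValue
}

// defaultCacheBlocksOnRetrieve mirrors the "default to enabled" behavior
// described in the comment above.
const defaultCacheBlocksOnRetrieve = true

// cacheBlocksOnRetrieve returns the explicit setting when present and the
// default when the field is absent.
func cacheBlocksOnRetrieve(opts namespaceOptions) bool {
	if opts.CacheBlocksOnRetrieve == nil {
		return defaultCacheBlocksOnRetrieve
	}
	return opts.CacheBlocksOnRetrieve.Value
}

func main() {
	legacy := namespaceOptions{} // field absent
	disabled := namespaceOptions{CacheBlocksOnRetrieve: &boolValue{Value: false}}
	enabled := namespaceOptions{CacheBlocksOnRetrieve: &boolValue{Value: true}}

	fmt.Println(cacheBlocksOnRetrieve(legacy))
	fmt.Println(cacheBlocksOnRetrieve(disabled))
	fmt.Println(cacheBlocksOnRetrieve(enabled))
}
```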
LGTM barring the nits / test comments
👍 lgtm
commit 3d2915f Author: nate <nbroyles@gmail.com> Date: Thu Sep 17 13:03:48 2020 -0400 [dbnode] Make caching after block retrieval a configuration option (#2613)
What this PR does / why we need it:
This PR allows us to configure whether we cache blocks in M3 after fetching them from disk. Disabling caching can be helpful for expensive reads as it reduces the amount of memory consumed by the query. Additionally, after a series of expensive reads, this change allows us to return to a normal steady state more quickly as we're not waiting for a bunch of historical data to eventually fall out of the cache.
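To make the memory effect concrete, here is a toy sketch of a retriever with the cache-on-retrieve switch. The `retriever` type, its map-backed "disk" and "cache", and the `fetch` method are all hypothetical simplifications; the real block retriever is considerably more involved.

```go
package main

import "fmt"

// retriever is a toy model: disk holds every block, cache holds whatever
// has been retained in memory after a read.
type retriever struct {
	cacheOnRetrieve bool
	disk            map[string][]byte
	cache           map[string][]byte
}

// fetch serves from the cache when possible, otherwise reads from disk.
// A block read from disk is retained only when cacheOnRetrieve is set and
// the ID was actually found, matching the intent of the gated condition
// in the diff above.
func (r *retriever) fetch(id string) ([]byte, bool) {
	if block, ok := r.cache[id]; ok {
		return block, true
	}
	block, found := r.disk[id]
	if r.cacheOnRetrieve && found {
		r.cache[id] = block
	}
	return block, found
}

func main() {
	disk := map[string][]byte{"series-a": []byte("block")}
	for _, enabled := range []bool{true, false} {
		r := &retriever{
			cacheOnRetrieve: enabled,
			disk:            disk,
			cache:           map[string][]byte{},
		}
		r.fetch("series-a") // found on disk
		r.fetch("missing")  // not found, never cached
		fmt.Printf("cacheOnRetrieve=%v cached=%d\n", enabled, len(r.cache))
	}
}
```

With the switch off, an expensive historical read leaves nothing behind in memory, so there is no backlog of cold blocks waiting to expire.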
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: