Add a Promtail stage for probabilistic sampling #6654
jeschkies added the `keepalive` label (an issue or PR that will be kept alive and never marked as stale) on Jul 11, 2022.
MasslessParticle pushed a commit that referenced this issue on Mar 17, 2023:
…ng (#7127)

**What this PR does / why we need it**: The sampling stage samples log lines directly. The sampling implementation uses the algorithm from the Jaeger Go client.

```
pipeline_stages:
  - sampling:
      rate: 0.1
```

It can also be combined with `match` for precise sampling:

```
pipeline_stages:
  - json:
      expressions:
        app:
  - match:
      pipeline_name: "app2"
      selector: "{app=\"poki\"}"
      stages:
        - sampling:
            rate: 0.1
```

**Which issue(s) this PR fixes**: Fixes #6654

**Special notes for your reviewer**: The Promtail sampling stage is used together with the `match` stage for log filtering. This design keeps the code very clean. Other log agents, such as Vector, build log filtering into the sampling operator itself, which I think is too complicated: https://vector.dev/docs/reference/configuration/transforms/sample/

```
transforms:
  my_transform_id:
    type: sample
    inputs:
      - my-source-or-transform-id
    exclude: null
    rate: 10
```

'rate' stage review suggestions: #5051

![image](https://user-images.githubusercontent.com/9583245/189461481-6ee4d835-2573-4b8e-8dec-2814620d758a.png)

**Checklist**
- [x] Documentation added
- [x] Tests updated
- [ ] Is this an important fix or new feature? Add an entry in the `CHANGELOG.md`.
- [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`

Co-authored-by: J Stickler <julie.stickler@grafana.com>
Is your feature request related to a problem? Please describe.
Sometimes we don't have direct access to what or how often an application logs, especially for third-party dependencies. Furthermore, I (personally) will primarily dig into logs when there’s something going wrong. 😅
For this reason, users may find it useful to limit the ingestion rate of certain types of logs, both to lower costs and to reduce noise. As we do for traces, probabilistic sampling is a good option here, and Loki's label-based approach is a natural fit. For example, users could sample to keep 10% of 'trace' level logs, or 20% of logs coming from a 200 - OK response, but always keep logs containing bad words like 'fatal' or 'error'.
One thing to note here is that ideally we'd try to preserve this information (using a new label?), so that the user can understand that it's not that "200 - OK" responses have dropped by 80%; it's just that only a subset of them is being logged.
Describe the solution you'd like
Add a `sampling` stage to Promtail. Similar to the `drop` stage, it would accept a regex, a source, and a float in [0, 1]. It would then check the log line contents or the given source and probabilistically drop or keep the log line.
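As a sketch, the proposed stage might look like this in a Promtail pipeline config. The `source` and `regex` fields are hypothetical names taken from the proposal above; only a `rate` field is certain, and the stage that eventually shipped accepts just `rate`:

```
pipeline_stages:
  - sampling:
      rate: 0.1          # keep ~10% of matching lines
      source: level      # hypothetical: sample based on an extracted field
      regex: "^trace$"   # hypothetical: only sample lines whose source matches
```

Lines not matched by the regex would pass through unsampled, so 'fatal' and 'error' lines could be kept in full.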
Describe alternatives you've considered
Other than instrumenting the application emitting the logs itself, a similar effect could be obtained by a more complex pipeline that extracts the timestamp and applies a regex to it (e.g. to only ingest lines whose timestamp's 'second' field is between [0, 10]).
A similar effect is also obtained by using the `limit` stage, although that is a hardcoded limit, not a probabilistic one.
Additional context
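The timestamp-regex workaround can be illustrated with a small Go sketch. The timestamp format (`HH:MM:SS` somewhere in the line) and the pattern below are assumptions for illustration; a pattern matching seconds 00–10 would keep roughly 11 of every 60 seconds of logs:

```go
package main

import (
	"fmt"
	"regexp"
)

// secondsWindow matches timestamps like 09:15:03 whose seconds field is
// in [00, 10]. Used as a filter, it approximates sampling by wall-clock
// second rather than by a true random rate.
var secondsWindow = regexp.MustCompile(`\d{2}:\d{2}:(0\d|10)\b`)

func main() {
	fmt.Println(secondsWindow.MatchString("2022-07-11 09:15:03 GET /health"))
	fmt.Println(secondsWindow.MatchString("2022-07-11 09:15:42 GET /health"))
}
```

Unlike probabilistic sampling, this drops logs in bursts (everything outside the window is lost), which is one reason a dedicated stage is preferable.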
A similar proposal was opened a long time ago, but failed to get any traction.