Add Loki size-based purge mechanism #2685

Open
alexandre-allard opened this issue Jul 29, 2020 · 1 comment
Labels
kind:enhancement New feature or request state:blocked Something prevents this from being worked on state:question Further information is requested topic:log Anything related to log centralization system

Comments

@alexandre-allard
Contributor

alexandre-allard commented Jul 29, 2020

Component: salt, loki

Why this is needed:
Even with a maximum retention period configured, logs can grow faster than expected and fill up the volume.
To keep the Loki service available in such cases, we need a mechanism that purges old log chunks once the volume usage ratio exceeds a given threshold.

What should be done:
Add a sidecar container to the Loki pod that purges the oldest log chunks when the volume is almost full.
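To make the idea concrete, here is a minimal sketch of what one pass of such a sidecar could look like. It is only an illustration under stated assumptions: the function names, the 0.9 threshold, and the choice of file mtime as the "oldest first" ordering are all hypothetical and not part of Loki or of this project, and mtime is only a proxy for the age of the logs a file contains.

```python
import shutil
from pathlib import Path


def select_chunks_to_purge(chunks, used_bytes, total_bytes, threshold=0.9):
    """Pick files to delete, oldest mtime first, until usage <= threshold.

    `chunks` is an iterable of (path, mtime, size_bytes) tuples.
    The threshold is a fraction of the volume capacity (assumed value).
    """
    target = threshold * total_bytes
    to_delete = []
    for path, _mtime, size in sorted(chunks, key=lambda c: c[1]):
        if used_bytes <= target:
            break
        to_delete.append(path)
        used_bytes -= size
    return to_delete


def purge_once(chunks_dir, threshold=0.9):
    """One pass of the hypothetical sidecar: measure the volume, delete if needed."""
    # disk_usage reports on the whole filesystem holding chunks_dir.
    usage = shutil.disk_usage(chunks_dir)
    chunks = [
        (p, p.stat().st_mtime, p.stat().st_size)
        for p in Path(chunks_dir).rglob("*")
        if p.is_file()
    ]
    doomed = select_chunks_to_purge(chunks, usage.used, usage.total, threshold)
    for path in doomed:
        path.unlink()
    return doomed
```

A sidecar would call `purge_once` in a loop with a sleep between passes. The selection logic is kept separate from the filesystem calls so that it can be checked without touching a real volume.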

Implementation proposal (strongly recommended):

Test plan:

@alexandre-allard alexandre-allard added kind:enhancement New feature or request topic:log Anything related to log centralization system labels Jul 29, 2020
@gdemonet gdemonet added state:blocked Something prevents this from being worked on state:question Further information is requested labels Jul 29, 2020
@gdemonet
Contributor

We do not want to implement this for now, for the following reasons:

  1. The oldest tables (by mtime) do not necessarily contain the oldest logs, so purging by mtime may, as a side effect, delete relevant logs that are still within our retention period
  2. Automatically purging old tables can be considered dangerous, since it hides a capacity limit issue
  3. Loki will get an equivalent to Prometheus' "delete-series" API (see "Delete log streams via API", grafana/loki#577), which will give users (or software) finer control when deleting logs/tables
  4. Loki aims to add support for volume-based retention in the future (see the comment on "Loki crashes when the storage is full", grafana/loki#2314)

For these reasons, we want to hold off on developing such a purge mechanism and instead focus on:

  1. Defining a precise sizing rule for Loki (see Define the sizing rule for Loki volume #2687)

  2. Providing efficient alerting on our Loki deployment (issue TBD)

    • build composite alerts from metrics and other alerts (e.g. re-use PV alerts and metrics)
    • provide as much actionable info as possible (e.g. suggested retention period to match observed ingestion rate and storage capacity, links to procedures for cleaning some log tables)
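As an illustration of the "suggested retention period" mentioned above, the arithmetic could be as simple as dividing the usable capacity by the observed ingestion rate. The sketch below is a rough estimate only: the function name and the 0.8 fill ratio are assumptions, and it ignores index overhead and any variation in the ingestion rate.

```python
def suggested_retention_hours(capacity_bytes, ingestion_bytes_per_hour, max_fill_ratio=0.8):
    """Estimate how many hours of logs fit on the volume.

    capacity_bytes: total size of the Loki volume.
    ingestion_bytes_per_hour: observed on-disk growth rate (assumed constant).
    max_fill_ratio: fraction of the volume we are willing to fill (assumed value).
    """
    if ingestion_bytes_per_hour <= 0:
        raise ValueError("ingestion rate must be positive")
    usable_bytes = capacity_bytes * max_fill_ratio
    return int(usable_bytes // ingestion_bytes_per_hour)


# Example: a 10 GiB volume growing by 50 MiB per hour.
hours = suggested_retention_hours(10 * 1024**3, 50 * 1024**2)
print(f"suggested retention: {hours}h")
```

An alert annotation could embed this value so that the operator sees a retention period compatible with the current volume size, rather than only a "volume almost full" message.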

@alexandre-allard alexandre-allard changed the title Add Loki volume purge mechanism Add Loki volume monitoring and alert Jul 31, 2020
@alexandre-allard alexandre-allard changed the title Add Loki volume monitoring and alert Add Loki size-based purge mechanism Aug 5, 2020