Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add loki integration #21

Merged
merged 5 commits into from
Oct 21, 2024
Merged

feat: add loki integration #21

merged 5 commits into from
Oct 21, 2024

Conversation

amandahla
Copy link
Collaborator

Applicable spec:

Overview

This PR adds loki integration. Collects metrics from workload and the charm. Also provides alert rules in case of errors.

Rationale

Collect and monitor logs.

Juju Events Changes

Module Changes

Library Changes

lib/charms/observability_libs/

Checklist

@amandahla amandahla changed the title feat: Add loki integration feat: add loki integration Oct 16, 2024
Copy link

Test coverage for e7dc373

Name           Stmts   Miss Branch BrPart  Cover   Missing
----------------------------------------------------------
src/charm.py     154     11     22      6    90%   158-160, 179, 183, 205, 209, 227, 254, 259, 284
----------------------------------------------------------
TOTAL            154     11     22      6    90%

Static code analysis report

Run started:2024-10-17 17:29:32.702930

Test results:
  No issues identified.

Code scanned:
  Total lines of code: 683
  Total lines skipped (#nosec): 0
  Total potential issues skipped due to specifically being disabled (e.g., #nosec BXXX): 0

Run metrics:
  Total issues (by severity):
  	Undefined: 0
  	Low: 0
  	Medium: 0
  	High: 0
  Total issues (by confidence):
  	Undefined: 0
  	Low: 0
  	Medium: 0
  	High: 0
Files skipped (0):

Copy link

@yanksyoon yanksyoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! questions for my own understandings only :D

src/charm.py Show resolved Hide resolved
src/charm.py Show resolved Hide resolved
@yanksyoon
Copy link

Btw, would you have integration tests for loki logs? Perhaps something like https://github.com/canonical/jenkins-k8s-operator/blob/main/tests/integration/test_cos.py#L60 might suffice :D

@amandahla
Copy link
Collaborator Author

amandahla commented Oct 21, 2024

Btw, would you have integration tests for loki logs? Perhaps something like https://github.com/canonical/jenkins-k8s-operator/blob/main/tests/integration/test_cos.py#L60 might suffice :D

Good point, thanks for sharing, I'll add one in a further PR.
Update: ISD-2574 store created.

@amandahla amandahla merged commit 321465d into main Oct 21, 2024
17 checks passed
@amandahla amandahla deleted the ISD-2498-loki branch October 21, 2024 19:19
@@ -0,0 +1,8 @@
alert: MaubotErrorFound
expr: rate({%%juju_topology%%, pebble_service="maubot"} |= "level=error"[1m]) > 0
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amandahla Since alerts should always be actionable, is it expected that there is an action for every logged error? Might it be worth raising the threshold a bit (e.g. 5 errors per minute, which would be > 0.083)? Same for the NGINX alert below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I have no idea how Maubot behaves in a production environment with high usage, I decided to be extra cautious. I don't have a strong opinion, so we could start with a 5m rate and see what happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants