Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revise Metrics #460

Merged
merged 185 commits into from
Nov 17, 2023
Merged

Revise Metrics #460

merged 185 commits into from
Nov 17, 2023

Conversation

dtrai2
Copy link
Contributor

@dtrai2 dtrai2 commented Oct 11, 2023

  • reimplemented metrics so the former metrics configuration won't work anymore
  • metric content changed and existent grafana dashboards will break
  • new rule id could possibly break configurations if the same rule is used in both rule trees
    • can be fixed by adding a unique id to each rule or delete the possibly redundant rule
  • add metrics on rule level
  • add grafana example dashboards under quickstart/exampledata/config/grafana/dashboards
  • add new configuration field id for all rules to identify rules in metrics and logs
    • if no id is given, the id will be generated in a stable way
    • add verification of rule id uniqueness on processor level over both rule trees to ensure metrics are counted correctly on rule level
  • reimplemented prometheus metrics exporter to provide gauges, histograms and counter metrics
  • removed shared counter, because it is redundant to the metrics
  • get exception stack trace by setting environment variable DEBUG
image

@codecov-commenter
Copy link

codecov-commenter commented Oct 11, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (2aa70de) 92.03% compared to head (45ac36e) 92.15%.
Report is 1 commits behind head on main.

Files Patch % Lines
logprep/abc/component.py 94.73% 1 Missing ⚠️
logprep/abc/processor.py 96.55% 1 Missing ⚠️
logprep/metrics/exporter.py 96.29% 1 Missing ⚠️
logprep/processor/grokker/processor.py 50.00% 1 Missing ⚠️
logprep/util/configuration.py 94.44% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #460      +/-   ##
==========================================
+ Coverage   92.03%   92.15%   +0.11%     
==========================================
  Files         132      130       -2     
  Lines        9508     9316     -192     
==========================================
- Hits         8751     8585     -166     
+ Misses        757      731      -26     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dtrai2 dtrai2 added enhancement New feature or request refactoring only refactoring, no change in behavior labels Nov 8, 2023
improves also test-coverage by disabling tests on type checking imports
Copy link
Collaborator

@ppcad ppcad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those changes are amazing! I have only one small remark: The pipeline times are never added to events, even if APPEND_TO_EVENT is set. Still having those times would be great if that's possible without a massive hassle. Otherwise I would approve it now. What do you say?

There is one more thing: The directory PROMETHEUS_MULTIPROC_DIR is now deleted after Logprep is shut down.
This means that that the directory needs to be re-created each time Logprep restarts. Is this the desired behaviour?

@ekneg54
Copy link
Collaborator

ekneg54 commented Nov 10, 2023

hello @ppcad ,

the PROMETHEUS_MULTIPROC_DIR should only be cleared. not deleted. but you have to provide it.

in a kubernetes environment you could add a empty dir volume to the deployment and mount this. so the volume is there when logprep starts.

@ppcad
Copy link
Collaborator

ppcad commented Nov 10, 2023

Okay, in that case it needs to be fixed.
Logprep deletes it instead of clearing it.
At the moment you have to re-create it every time you restart Logprep.

just empty its content, restored a code state from an earlier version
@dtrai2
Copy link
Contributor Author

dtrai2 commented Nov 14, 2023

I checked out the pipeline processing times and how we could add them to the events. The problem is with the current implementation it is not possible as we are using two different decorators for this. If the events should be appended with the processing times we use one decorator for it and if we do not want to append the processing times we use a different decorator. Now the problem is that the decorators are applied on module load time. Meaning if we load a module before the configuration is parsed we might decorate a function with the wrong decorator. The processor python files are luckily loaded after the configuration was parsed, the pipeline though seems to be loaded before the configuration was parsed. So event if I extend the appending functionality the pipeline processing times still won't be appended as the configuration is set too late. I see now the following options:

  1. We refactor some of the run_logprep implementation such that most modules are not imported at the top of the file, but rather after the configuration was parsed.
  2. We do not use two separate decorators, but instead have one decorator with an if-statement inside, which would mean though that it will be evaluated on everything that tracks processing times.
  3. We do make it necessary that this is something that has to be configured via an environment variable. That way it would be independent of the load of pipeline.yaml config file. Or
  4. We drop pipeline processing times and only append the processor/rule processing times in the events.

If we want to add this again I personally would pick option 3. It is not consistent with our current configuration pattern, but has the smallest impact on code changes. Considering that this might be dropped in the future I think it is a good enough solution.

@ekneg54 @ppcad any thoughts on this?

@ppcad
Copy link
Collaborator

ppcad commented Nov 14, 2023

I agree. If it has the least impact on the code, then I would pick option 3, since appending to events will be eventually completely removed.

@ekneg54
Copy link
Collaborator

ekneg54 commented Nov 14, 2023

option 3 please 👍

- needs to be configured via a environment variable
@ekneg54 ekneg54 self-requested a review November 14, 2023 14:54
and log exit code of failed pipelines
@ekneg54 ekneg54 self-requested a review November 16, 2023 08:03
Copy link
Collaborator

@ekneg54 ekneg54 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz add my changes in dashboards, cause it leads to not showing garbage collections

dtrai2 and others added 3 commits November 16, 2023 09:50
…oard.json

Co-authored-by: Jörg Zimmermann <101292599+ekneg54@users.noreply.github.com>
…oard.json

Co-authored-by: Jörg Zimmermann <101292599+ekneg54@users.noreply.github.com>
…oard.json

Co-authored-by: Jörg Zimmermann <101292599+ekneg54@users.noreply.github.com>
@dtrai2 dtrai2 requested review from ppcad and ekneg54 November 16, 2023 08:51
Copy link
Collaborator

@ppcad ppcad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I marked two places where the environment variable was not replaced with the new name.
Other than that it works great and I will approve it as soon as this is changed.

tests/unit/metrics/test_metrics.py Outdated Show resolved Hide resolved
tests/unit/metrics/test_metrics.py Outdated Show resolved Hide resolved
- used wrong env and with that wrong decorator
- added another assertion to ensure that correct decorator was used
@dtrai2
Copy link
Contributor Author

dtrai2 commented Nov 17, 2023

I marked two places where the environment variable was not replaced with the new name. Other than that it works great and I will approve it as soon as this is changed.

ah thanks for that find! It was actually also a broken test. With that wrong environment variable the test should have failed. I fixed the env and added an assertion.

Copy link
Collaborator

@ppcad ppcad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@ekneg54 ekneg54 merged commit c828dfc into main Nov 17, 2023
10 checks passed
@ekneg54 ekneg54 deleted the dev-revise-metrics branch April 3, 2024 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request refactoring only refactoring, no change in behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants