FEATURE: Metrics #3214
Closed · tillprochaska opened this issue on Jul 17, 2023 · 2 comments · Fixed by #3216 or alephdata/servicelayer#111
Labels
backend
Issues related to Aleph’s backend, API, CLI etc.
feature-request
Requests for new features or enhancements of existing features
Comments
tillprochaska added the backend and feature-request labels on Jul 17, 2023
Reopening because the issue was auto-closed by a subtask.
tillprochaska added a commit that referenced this issue on Nov 22, 2023
tillprochaska added a commit that referenced this issue on Jan 15, 2024
tillprochaska added a commit that referenced this issue on Jan 16, 2024
* Add Prometheus instrumentation
  Closes #3214
* Fix missing bind argument
* Run Prometheus exporter as a separate service
* Expose number of streaming requests and number of streamed entities as metrics
* Expose number of auth attempts as Prometheus metrics
* Update Helm chart to expose metrics endpoints, setup ServiceMonitors
* Handle requests without Authz object gracefully
* Rename Prometheus label to "api_endpoint" to prevent naming clashes
  Prometheus Operator also uses the "endpoint" label and automatically renames "endpoint" labels exposed by the metrics endpoint to "exported_endpoints", which is ugly.
* Add xref metrics
* Use common prefix for all metric names
  Even though it is considered an anti-pattern to add a prefix with the name of the software or component to metrics (according to the official Prometheus documentation), I have decided to add a prefix. I’ve found that this makes it much easier to find relevant metrics. The main disadvantage of per-component prefixes is that queries become slightly more complex if you want to query the same metric (e.g. HTTP request duration) across multiple components. This isn’t super important in our case though, so I think the trade-off is acceptable.
* Expose Python platform information as Prometheus metrics
* Remove unused port, network policy from K8s specs
  Although I'm not 100% sure, the exposed port 3000 is probably a left-over from the past, possibly from when convert-document was still part of ingest-file. The network policy prevented Prometheus from scraping ingest-file metrics (and, as the metrics port is now the only port exposed by ingest-file, it should be otherwise unnecessary).
* Use keyword args to set Prometheus metric labels
  As suggested by @stchris
* Bump servicelayer from 1.22.0 to 1.22.1
* Simplify entity streaming metrics code
  There’s no need to do batched metric increments until this becomes a performance bottleneck.
* Limit maximum size of Prometheus multiprocessing directory
* Do not let collector classes inherit from `object`
  I copied the boilerplate for custom collectors from the docs without thinking about it too much, but inheriting from `object` really isn’t necessary anymore in Python 3. The Prometheus client also exports an abstract `Collector` class -- it doesn’t do anything except provide type hints for the `collect` method, which is nice.
* Add `aleph_` prefix to Prometheus API metrics
* Fix metrics name (singular -> plural)
* Add documentation on how to test Prometheus instrumentation in local Kubernetes cluster
simonwoerpel pushed a commit to investigativedata/aleph that referenced this issue on Apr 22, 2024
Is your feature request related to a problem? Please describe.
Aleph currently doesn’t expose any metrics directly. At OCCRP, we track some log-based metrics as well as metrics from Elasticsearch, but we’d like to start instrumenting Aleph directly so that we can operate it better and get insights into how its features are adopted.
Metrics we’re interested in include the following. We already collect some of these metrics indirectly, but we should start collecting them using explicit instrumentation.
Describe the solution you'd like
We should expose metrics in a standard format like Prometheus or OpenTelemetry.
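To make "a standard format" more concrete, here is a minimal sketch of how a labelled counter could be rendered in the Prometheus text exposition format. It uses only the Python standard library so that the format itself is visible; a real implementation would more likely rely on an existing client library. The `TextCounter` class, the metric name `aleph_example_requests_total` and the endpoint path are hypothetical and only serve as illustration. The `aleph_` prefix and the `api_endpoint` label name follow the conventions mentioned in the commit message in this thread.

```python
from collections import defaultdict


def escape_label_value(value):
    """Escape backslashes, double quotes and newlines in a label value."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


class TextCounter:
    """Hypothetical stand-in for a Prometheus counter (illustration only)."""

    def __init__(self, name, documentation, label_names):
        self.name = name
        self.documentation = documentation
        self.label_names = tuple(label_names)
        self._values = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        """Increment the counter for one combination of label values."""
        if set(labels) != set(self.label_names):
            raise ValueError(f"expected labels {self.label_names}")
        key = tuple(str(labels[name]) for name in self.label_names)
        self._values[key] += amount

    def render(self):
        """Return the counter in the Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.documentation}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            pairs = ",".join(
                f'{name}="{escape_label_value(label)}"'
                for name, label in zip(self.label_names, key)
            )
            lines.append(f"{self.name}{{{pairs}}} {value}")
        return "\n".join(lines) + "\n"


# Hypothetical metric name and endpoint path, used for illustration only.
requests_total = TextCounter(
    "aleph_example_requests_total",
    "Example request counter.",
    ["method", "api_endpoint"],
)
requests_total.inc(method="GET", api_endpoint="/api/2/example")
requests_total.inc(method="GET", api_endpoint="/api/2/example")
print(requests_total.render(), end="")
```

Passing label values as keyword arguments keeps each value tied to its label name, so a reordered call cannot silently swap two labels.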
Additional context:
There are a few challenges implementing this: