Add Grafana dashboard for monitoring OPEA application scaling in k8s #541
Currently the dashboard relies on an HTTP metric. Depending on whether the following PR is merged for v1.1, that particular metric may need to be changed before v1.1: opea-project/GenAIComps#864
I can also add a blurb about this to the README, but scaling is currently a bit of a corner case, so IMHO it could also come in the next release. A larger question about the Observability README, and the things it refers to, is what to do with
Regarding the dashboards under that:
I.e. neither properly handles the case where the cluster is running multiple OPEA applications with TGI instances. The new dashboard covers the first one to some extent. The TGI details dashboard could be updated to have selectors similar to those in this new dashboard.
FYI: I'm going to change the dashboard "Failures" heading to "Incomplete requests". I do not think half of the TGI requests are failures; rather, the frontend needs to request the rest of the reply with another query before TGI deems the request "complete" (successful).
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
for more information, see https://pre-commit.ci
Dashboard changes:
Description
Adds Grafana dashboard for monitoring OPEA application scaling:
And a helper script for installing dashboard k8s configMaps for Grafana.
Unlike the earlier ChatQnA dashboard, this one handles multiple OPEA applications that have the same name but are in separate namespaces. The user selects a namespace and then an OPEA application from it. If the cluster has only one running, the dashboard will default to that.
(Therefore it does not make sense to install the dashboard with application-specific Helm charts, as it can cover all apps that use TGI for the LLM, i.e. most of them.)
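The helper script itself is not shown in this description. As a rough illustration of what installing a dashboard as a k8s configMap can involve, here is a minimal Python sketch that wraps a dashboard JSON document in a ConfigMap manifest. The function name, the example names and namespace, and the `grafana_dashboard: "1"` label are all assumptions: that label is what Grafana dashboard sidecars commonly watch for, but the label actually used depends on how Grafana was deployed, and this is not the PR's script.

```python
import json


def dashboard_configmap(name, namespace, dashboard_json, label="grafana_dashboard"):
    """Build a ConfigMap manifest wrapping one Grafana dashboard JSON document."""
    json.loads(dashboard_json)  # fail early if the dashboard is not valid JSON
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Assumed label; must match what the Grafana sidecar is configured to watch
            "labels": {label: "1"},
        },
        # One data key per dashboard file
        "data": {f"{name}.json": dashboard_json},
    }


if __name__ == "__main__":
    # Placeholder dashboard; a real one would be read from the dashboard JSON file
    dashboard = json.dumps({"title": "OPEA scaling", "panels": []})
    manifest = dashboard_configmap("opea-scaling-dashboard", "monitoring", dashboard)
    print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON manifests, the printed output could be piped to `kubectl apply -f -`. Because the dashboard is cluster-wide rather than tied to one application, a single configMap in Grafana's namespace would be enough under these assumptions.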
Issues
n/a.
Type of change
Dependencies
n/a.
Tests
Manually tested that the script and the dashboard work.