marker in panels to make user aware of data issues #6448

Closed
Dieterbe opened this issue Nov 2, 2016 · 12 comments
Labels
area/panel/common, datasource/Graphite, prio/low, stale, type/feature-request

Comments

@Dieterbe
Contributor

Dieterbe commented Nov 2, 2016

in metrictank it's possible to run into partial cassandra errors, which result in incomplete data being returned for a request (the idea being it's better to return partial data than no data at all).

It would be nice if Grafana could show a warning icon on panels that have partial data. Metrictank (or graphite-api) could inform Grafana by setting an HTTP header or something.

Thinking out loud, there could be various classes of problems with a response that a datasource may want to return, and grafana could display:

  • incomplete data
  • lower resolution (e.g. when for some reason raw data is not available, but rolled up data is)
  • permission denied maybe
  • conflict between which rollup is selected and which aggregator is used in summarize()

these could be displayed as a warning icon or something
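
The classes of problems listed above could be modelled as a small enumeration that a datasource attaches to its response. This is only a sketch: `NoticeKind`, `Notice`, and `needs_warning_icon` are hypothetical names used for illustration, not part of Grafana, metrictank, or graphite-api.

```python
from dataclasses import dataclass
from enum import Enum


class NoticeKind(Enum):
    # Hypothetical categories, taken from the list above.
    INCOMPLETE_DATA = "incomplete data"
    LOWER_RESOLUTION = "lower resolution"
    PERMISSION_DENIED = "permission denied"
    ROLLUP_CONFLICT = "rollup/aggregator conflict"


@dataclass(frozen=True)
class Notice:
    kind: NoticeKind
    message: str


def needs_warning_icon(notices):
    """Return True if a panel should show a warning icon."""
    return len(notices) > 0
```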

@torkelo
Member

torkelo commented Nov 2, 2016

would be really useful when having multiple queries with different data sources and some complete and some fail

@Dieterbe
Contributor Author

another case is when certain shards are down and the response may be partial

@Dieterbe
Contributor Author

cc @ryantxu

@Dieterbe
Contributor Author

basically what we should do is give the metrictank response json a section "warnings" or something, that is a list of strings. each string is 1 warning message, in free-form text (but can be expected to typically be short, maybe a sentence or two)

and grafana can then plot the response and display a warning icon if there are warnings
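
On the consuming side, this could look roughly like the sketch below. It assumes a hypothetical top-level `warnings` key holding a list of strings and a `series` key holding the data; `split_response` is an illustrative helper, not an existing Grafana or metrictank function, and the warning text is made up.

```python
import json


def split_response(body_text):
    """Return (series, warnings) from a JSON response body.

    Assumes a hypothetical top-level "warnings" key holding a list of
    free-form strings; a missing key means there are no warnings.
    """
    body = json.loads(body_text)
    series = body.get("series", [])
    warnings = [str(w) for w in body.get("warnings", [])]
    return series, warnings


# Example body; the warning message is invented for illustration.
body_text = json.dumps({
    "series": [{"target": "foo", "datapoints": [[1.0, 1478044800]]}],
    "warnings": ["some shards did not respond; data may be incomplete"],
})
series, warnings = split_response(body_text)
show_warning_icon = bool(warnings)  # `series` is plotted either way
```

The data is always returned for plotting; the warnings only decide whether an indicator is shown next to it.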

@ryantxu
Member

ryantxu commented Dec 19, 2019

You can put errors in the result and still return data -- cloudwatch currently does this.

The end user sees graphs as usual, but the panel has a red exclamation mark in the corner that you can click for more info.

Is there a strong reason to have yellow vs red? In the case you list above error vs warning really depends on what you are trying to use it for. I can see warning for "it was slow" but not for "it may be incomplete!"

@ryantxu
Member

ryantxu commented Dec 19, 2019

Short answer... if metrictank returns a "warning" or "error" message, we can display that today

@Dieterbe
Contributor Author

Dieterbe commented Dec 19, 2019

Yeah, the severity of specific messages is subject to context, interpretation and use case. this could easily become a bikeshed discussion.
But I think it's fair to say that there are broadly 2 categories:

  1. things that mean the request could not be responded to at all (http 4xx or http 5xx errors, datasource cannot be reached, etc)
  2. cases where the request could be responded to, but the response comes with "caveats", with varying degrees of importance.

I can see your line of thinking in that in my examples of warnings above, they could also be considered errors, and perhaps we should err on the cautious side.
But:
a) in metrictank you can configure a toleration for the degree of cluster degradation (shard unavailability), before we start erroring responses. implying that if you set this to a certain number, it means you don't consider a certain degree of incompleteness that problematic. so perhaps this should be a warning
b) when there is a warning/error, whether or not response data is included seems like a useful separation. perhaps that's where we should draw the line between warning and error. But perhaps this needs no further separation because the difference is fairly obvious between these cases, visually ;)

ultimately though I mainly care that responses can be plotted and additionally the messages shown via an indicator (rather than the situation right now: you either communicate an error via a non-2xx response code and return no data, or return 2xx with data, but don't get to communicate an error. as far as i know that's currently the only option for graphite/metrictank )
whether that indicator is an error or warning indicator or is orange or red, i won't lose sleep over it.
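
The two broad categories above, plus the plain success case, can be sketched as a small classifier. This is an illustration under assumptions: `Outcome` and `classify` are hypothetical names, and the `warnings` / `errors` keys are the ones being proposed in this thread, not an existing metrictank or Grafana contract.

```python
from enum import Enum


class Outcome(Enum):
    FAILED = "failed"            # category 1: no usable response
    OK = "ok"                    # data, no messages
    OK_WITH_CAVEATS = "caveats"  # category 2: data plus messages


def classify(status_code, body):
    """Classify a datasource response.

    `body` is the decoded JSON object, or None if it could not be read.
    The "warnings" and "errors" keys are assumptions for illustration.
    """
    if body is None or not 200 <= status_code < 300:
        return Outcome.FAILED
    if body.get("warnings") or body.get("errors"):
        return Outcome.OK_WITH_CAVEATS
    return Outcome.OK
```

Only `FAILED` would prevent rendering; `OK_WITH_CAVEATS` would plot the data and show an indicator.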

Do you have a preference for exactly how/where in the response body this should be included?
The response body from metrictank currently looks like the below. Perhaps we can simply include "warnings" and/or (?) "errors" as a top-level key, along with version, meta and series.

{
    "version": "v0.1",
    "meta": {
        "stats": {
            ...
        }
    },
    "series": [
        {
            "target": "foo",
            "tags": {
                ...
            },
            "datapoints": [
                ...
            ],
            "meta": [
                <series meta>
            ]
        }
    ]
}
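
A producer-side sketch of that idea, assuming "warnings" and "errors" become optional top-level keys next to version, meta and series. `build_response` is a hypothetical helper written in Python for illustration; it is not metrictank code.

```python
import json


def build_response(series, stats, warnings=(), errors=()):
    """Build a response body shaped like the example above.

    "warnings" and "errors" are hypothetical top-level keys; they are
    only emitted when non-empty, so the body is unchanged otherwise.
    """
    body = {"version": "v0.1", "meta": {"stats": stats}, "series": series}
    if warnings:
        body["warnings"] = list(warnings)
    if errors:
        body["errors"] = list(errors)
    return json.dumps(body)
```

Omitting the keys when they are empty keeps the body identical to the current format for requests that have nothing to report.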

@Dieterbe
Contributor Author

The other question is what status code to use.
200 OK with a bunch of warnings/errors in the body is probably not the most sensible.
I can't quite find the appropriate status code.. perhaps "503 Service Unavailable", but then Grafana should still try to read a json body (formatted as above). I believe it currently gives up on any data rendering when the response is 4xx or 5xx.

@Dieterbe
Contributor Author

Dieterbe commented Jan 6, 2020

dieter 8:38 PM
so ryan was suggesting for warnings, we can return a non-2xx response code but still contain data in the response to render
torkelo 8:47 PM
not sure that is ideal, best way would be to have a 200 response but something in the response body
as non 200 response will not be processed as a normal data response
dieter 8:48 PM
ok that works for me too

@ryantxu
Member

ryantxu commented Jan 6, 2020

non-2xx response code but still contain data in the response to render

I would say a 200, but with errors in the body. This is what we do for cloudwatch

aocenas added the prio/low label Sep 23, 2021
Contributor

This issue has been automatically marked as stale because it has not had activity in the last year. It will be closed in 30 days if no further activity occurs. Please feel free to leave a comment if you believe the issue is still relevant. Thank you for your contributions!

github-actions bot added the stale label Jan 19, 2024
Contributor

This issue has been automatically closed because it has not had any further activity in the last 30 days. Thank you for your contributions!

github-actions bot closed this as not planned Feb 20, 2024