
Weekly Emails - generating reports is sometimes broken #2472

Closed
fbarl opened this issue Jan 17, 2019 · 14 comments
Assignees: foot
Labels: bug (broken end user functionality; not working as the developers intended it), component/users, stale (Bulk closing old, stale issues)

Comments


fbarl commented Jan 17, 2019

I just tried to generate a report preview in https://frontend.dev.weave.works/admin/users/weeklyreports for our Weave Cloud (Dev) instance and got {"errors":[{"message":"An internal server error occurred"}]} in the browser.

A closer inspection of the users service logs shows:

2019-01-17T15:00:34.030603718Z time="2019-01-17T15:00:34Z" level=error msg="POST /admin/users/weeklyreports/preview: execution: multiple matches for labels: grouping labels must ensure unique matches"

The error seems to occur with Prometheus queries and points at this line of code: https://github.com/prometheus/prometheus/blob/a1f34bec2e6584a2fee9aec901f3157e3e12cbaa/promql/engine.go#L1498

It's probably related to:

func buildWorkloadsResourceConsumptionQuery(resourceQuery string) string {

The scope of the issue is unclear.
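
For reference, the kind of shape that triggers this error is a many-to-one match whose "one" side isn't unique for the matching labels. A purely illustrative example (not the actual query built by buildWorkloadsResourceConsumptionQuery, which isn't shown here):

# If more than one job scrapes node_cpu for the same instance, the right-hand
# side returns two series per instance and the engine raises
# "multiple matches for labels: grouping labels must ensure unique matches".
sum by (namespace, pod_name, instance) (rate(container_cpu_usage_seconds_total{image!=''}[1m]))
  / on (instance) group_left
  count by (instance, job) (node_cpu{mode='idle'})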

fbarl added the bug and component/users labels Jan 17, 2019

ngehani commented Jan 17, 2019

Oy! Who is working on deciphering this, @guyfedwards @foot, while @fbarl is on vacation next week?


fbarl commented Jan 17, 2019

FYI, running the same queries as the weekly reporter in notebooks on our Weave Cloud (Dev) instance does result in the same errors for the period of last week: https://frontend.dev.weave.works/proud-wind-05/monitor/notebook/931f18f1-5516-4f40-bdd9-a03aa3f24f60?timestamp=2019-01-14T00:00:00Z


The same query passes if we shift the window 3 days later (https://frontend.dev.weave.works/proud-wind-05/monitor/notebook/931f18f1-5516-4f40-bdd9-a03aa3f24f60?timestamp=2019-01-17T00:00:00Z), so I wonder if some sort of outage or corrupted data is to blame.

In any case, we should probably edit the queries to make them more robust (after we pin down the exact issue).


foot commented Jan 18, 2019

Yep, I can have a look on Monday!

foot closed this as completed Jan 21, 2019

foot commented Jan 21, 2019

Didn't mean to close this...

foot reopened this Jan 21, 2019

foot commented Jan 21, 2019

This seems to be the worst point, where you cannot get a table for the first query: https://frontend.dev.weave.works/proud-wind-05/monitor/notebook/39882902-c2f5-4030-af6c-92aeda4f7e1d?timestamp=2019-01-07T18:00:00Z


foot commented Jan 22, 2019

I'm not getting very far w/ this. Comparing the sum by (namespace, pod_name) ... side of the join when it's good and when it's bad looks incredibly similar, so I'm really not sure what the error message means.


@dlespiau any ideas about making this query more robust?

We could still roll this out. Some users might not get an error report one week...
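
For what it's worth, the error is about the "one" side of a many-to-one group_left match (the right-hand side when group_left is used) having more than one series per combination of matching labels, so the side worth diffing is the one opposite the sum by (namespace, pod_name). A hedged duplicate check, guessing that overlapping cadvisor series are the culprit:

# Pods whose name shows up on more than one instance during the bad window;
# duplicates like this on the "one" side of a join trigger the engine error.
count by (namespace, pod_name) (
  count by (namespace, pod_name, instance) (container_cpu_usage_seconds_total{image!=''})
) > 1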

foot self-assigned this Jan 23, 2019

foot commented Jan 30, 2019

Some more poking around here: https://frontend.dev.weave.works/proud-wind-05/monitor/notebook/ddd09f7e-17e4-4ca2-8017-043d3f463353?range=15m&timestamp=2019-01-07T17:42:49Z

I can make it work by excluding a particular container (dbshell.*), but I haven't figured out what it is about that vector that clashes with the other one...
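
Presumably the exclusion is along these lines (a guess at the matcher, assuming the container_name label cadvisor exposed at the time; the notebook's exact query isn't reproduced here):

sum by (namespace, pod_name) (
  rate(container_cpu_usage_seconds_total{image!='', container_name!~'dbshell.*'}[1m])
)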


foot commented Jan 30, 2019

@bboreham any thoughts on where the Error: multiple matches for labels: grouping labels must ensure unique matches message might be coming from in the above notebook?

My next step would be to try and dump out that time block into a local prom instance that I could perhaps add additional debugging code to. I will read up on exporting in a bit...


foot commented Jan 30, 2019

Alrighty, updated the notebook again w/ another variation that works, down at the very bottom:

sum by (namespace, pod_name, job) (rate(container_cpu_usage_seconds_total{image!=''}[1m])) / ignoring(namespace, pod_name, job) group_left count(node_cpu{mode='idle'})

Will job always be cadvisor?
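
One way to sidestep the job question entirely would be to collapse the capacity side to a scalar so that no vector matching happens at all; a sketch, not tested against this data:

sum by (namespace, pod_name) (rate(container_cpu_usage_seconds_total{image!=''}[1m]))
  / scalar(count(node_cpu{mode='idle'}))

With no label matching there is nothing for the uniqueness check to trip on, though an empty node_cpu selection then yields NaN values rather than an explicit error.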


foot commented Feb 25, 2019

Opened an issue in cortexproject/cortex#1245

ozamosi added the stale label Nov 4, 2021
ozamosi closed this as completed Nov 4, 2021