Promxy returns no rows when using long range #537
Comments
I did some poking around on this and was unable to reproduce this issue. If you could provide some more information to reproduce the issue (ideally some example pointing at
@jacksontj I've collected a tcpdump for you. In this case promxy is trying to collect data with query
No issue occurs when the request is made directly to one of the instances.
Thanks for the pcap, that definitely clears things up quite a bit!
Detailed Explanation
In this pcap there are 4 entities:
In the pcap we can see the client send the following query to promxy:
All good so far. When we look at the queries that promxy sends to the VM downstreams we see similar data:
At this point things still look good, the query was effectively just passed down to the downstream VM boxes -- which is what we expect. Next if we check the response from either (looking at "A" here):
Which at first glance seems reasonable, but upon further inspection we note that something is off with the times:
So to put that in more concrete language, this shows that the downstream VictoriaMetrics boxes are returning data for a time close to what was requested, but not what was actually requested. This is an (unfortunately) already known behavior with VictoriaMetrics that was captured in #202. The short version there is that VM has some internal caching -- but given the way the Prometheus API contract/iterators/promql-engine/etc. work, the times it returns aren't "close enough" (in this case "A"'s response was 1297.432s (~22 minutes) off of the requested time). This ends up working for shorter time ranges, as the "incorrect" times are still "close enough". Thankfully since this is a known issue, this can be configured around by adding
TLDR
This appears to be some VictoriaMetrics caching behavior causing issues -- which can be configured around by adding
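To illustrate why only long ranges break, here is a minimal sketch. It assumes (the pcap does not confirm this) that the downstream rounds `start` down to a multiple of `step`, so the shift is bounded by `step` and grows as longer ranges use larger steps. The function names and the timestamps are hypothetical; the 300-second threshold is Prometheus's default lookback delta, and treating it as the "close enough" limit here is also an assumption.

```python
# Sketch only: the alignment rule below is an ASSUMPTION about the downstream,
# not something taken from the VictoriaMetrics source or the pcap.

LOOKBACK_DELTA_S = 300  # Prometheus default lookback delta (5 minutes)


def aligned_start(start_s: int, step_s: int) -> int:
    """Assumed adjustment: round start down to a multiple of step."""
    return start_s - (start_s % step_s)


def shift_from_request(start_s: int, step_s: int) -> int:
    """Seconds between the requested start and the assumed aligned start."""
    return start_s - aligned_start(start_s, step_s)


def close_enough(start_s: int, step_s: int) -> bool:
    """True if the shift stays within the lookback delta."""
    return shift_from_request(start_s, step_s) <= LOOKBACK_DELTA_S


# Hypothetical requested start; a short range uses a 60s step,
# a long range uses a 3600s step.
requested = 1_700_000_123
short_shift = shift_from_request(requested, 60)
long_shift = shift_from_request(requested, 3600)
```

With a 60s step the shift can never exceed 60s, so it always fits inside the lookback window; with a 3600s step it can reach almost an hour, which is consistent with the ~22 minute offset seen for "A".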
Thanks for your detailed response!
@jacksontj thanks, your solution is working. Unfortunately it causes high CPU utilization on the VM instances, even though that's expected behaviour. Is there any way to configure Promxy to make parallel requests to each instance? Can that be done at the moment?
Sorta, this can be achieved with relative or absolute time filters -- basically creating 2 server groups, one for "recent" and one for "old". But this wouldn't be HA -- it would just "shard" the query across those 2 nodes for a performance improvement at the expense of redundancy (think RAID 0 vs RAID 1 -- if that's a helpful analogy).
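As a rough illustration of that "shard by time" idea (not promxy's actual implementation; the group names `old` and `recent`, the function name, and the cutoff are all hypothetical), a query range could be split at a cutoff so that each sub-range is served by exactly one group:

```python
from typing import Dict, Tuple


def split_by_cutoff(start_s: int, end_s: int, cutoff_s: int) -> Dict[str, Tuple[int, int]]:
    """Return the sub-range each hypothetical server group would serve.

    Data before the cutoff goes to "old", data after it to "recent".
    Each point in time is covered by one group only, so there is no redundancy.
    """
    plan: Dict[str, Tuple[int, int]] = {}
    if start_s < cutoff_s:
        plan["old"] = (start_s, min(end_s, cutoff_s))
    if end_s > cutoff_s:
        plan["recent"] = (max(start_s, cutoff_s), end_s)
    return plan


# A range that straddles the cutoff is served half by each group.
straddling = split_by_cutoff(0, 100, 60)
```

This is the RAID 0 side of the analogy: both nodes do work for a long query, but if either one is down, its part of the range is simply missing.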
The only other hacky solution I can think of is to mess with the LookbackDelta (I think they changed the name, but that option) to have promql honor those incorrect timestamps from VM. The only other idea I have is we could hack in the same time adjustment into promxy (basically implement the logic here) but I'm not so sure about that as we'd start breaking the API contract ourselves... have to think about that some.
Out of curiosity, do you have some data on how large the increase was (QPS and CPU utilization before and after)? In general this is unfortunate, as there is seemingly no way to enable caching but still honor the API contract. (Remember the issue here was that the VM response didn't adhere to the start/end defined in the API call.) If the caller is something like Grafana you could consider using trickster to cache at the API layer -- that does add complexity but may be a reasonable approach? I do have an issue to add caching to promxy, but that is a relatively large lift and hasn't been a major priority for most.
After some consideration I don't think it's a good plan to implement the same adjust logic within promxy (as it does break the Prometheus API contract). That being said, making a middleware/proxy to do the VMAdjust for the query_range method should be pretty trivial to do (I hacked it up and it adds ~200k lines of dependencies, so just a bit much to add to this project -- which already has so many dependencies). IMO this sort of timestamp adjusting would make sense as a VM middleware proxy -- since it's their custom logic (which is non-standard and technically a violation of the API contract).
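For anyone who wants to try the middleware route, the core of such a proxy might look like the sketch below. This is not the hack mentioned above and not VictoriaMetrics' real adjustment logic: it assumes the adjustment is "round `start` and `end` down to a multiple of `step`", assumes `step` is given in plain seconds, and assumes the rewrite happens before the request reaches promxy. The function name and the example host are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def align_query_range(url: str) -> str:
    """Rewrite start/end of a query_range URL to multiples of step.

    ASSUMPTION: rounding down to a multiple of step stands in for the
    downstream's real adjustment; other requests pass through unchanged.
    """
    parts = urlsplit(url)
    if not parts.path.endswith("/api/v1/query_range"):
        return url
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    if not all(key in params for key in ("start", "end", "step")):
        return url
    step = int(float(params["step"]))  # assumes step is numeric seconds
    if step <= 0:
        return url
    for key in ("start", "end"):
        ts = int(float(params[key]))
        params[key] = str(ts - ts % step)
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical request as a client might send it.
rewritten = align_query_range(
    "http://promxy.example/api/v1/query_range"
    "?query=up&start=1700000123&end=1700003723&step=3600"
)
```

Because the caller then asks for timestamps that are already aligned, the downstream has nothing left to shift, and the response matches the (rewritten) request. The trade-off is the one described above: the caller no longer gets exactly the range it asked for.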
Hi, I'm trying to use promxy as a frontend for two VictoriaMetrics instances using a very simple config:
CLI arguments are
Everything works like a charm except for some strange behavior when I use a long time range. For example, the query
up == 1
works fine with any time range within my data retention. But when I query
node_filesystem_size_bytes{hostname="foobar"}
I get no results for time ranges >= 43 days. The response is 200 OK:
{"status":"success","data":{"resultType":"matrix","result":[]}}
The debug logs show nothing special. Could anybody point out what's wrong?
Thanks!