bug: Prometheus metrics ingress and egress are inaccurate #6555

Closed
zhendongcmss opened this issue Mar 9, 2022 · 4 comments · Fixed by #6579
Labels: bug (Something isn't working)

zhendongcmss commented Mar 9, 2022

Issue description

There are two routes: one with priority 0 (route id = 1) and another with priority 30 (route id = 3).

If a request hits route 1, the Prometheus ingress and egress metrics are accurate, but if a request hits route 3, the ingress and egress values are double the actual bandwidth.

I think APISIX records request_length and bytes_sent twice.

    metrics.bandwidth:inc(vars.request_length,
        gen_arr("ingress", route_id, service_id, consumer_name, balancer_ip))

    metrics.bandwidth:inc(vars.bytes_sent,
        gen_arr("egress", route_id, service_id, consumer_name, balancer_ip))
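One way the counters could double is sketched below in Python. This is a toy model, not APISIX code: it assumes the log-phase handler runs once for every scope in which the plugin is enabled, and `Counter`, `log_phase`, `handle_request`, and the route id `"demo"` are hypothetical names used only for illustration.

```python
from collections import defaultdict


class Counter:
    """Toy stand-in for a labelled Prometheus counter."""

    def __init__(self):
        self.values = defaultdict(int)

    def inc(self, amount, labels):
        self.values[tuple(labels)] += amount


def log_phase(bandwidth, request_length, bytes_sent, route_id):
    # Mirrors the two inc() calls quoted above: one for ingress, one for egress.
    bandwidth.inc(request_length, ("ingress", route_id))
    bandwidth.inc(bytes_sent, ("egress", route_id))


def handle_request(bandwidth, enabled_scopes, request_length, bytes_sent, route_id):
    # Assumption: the handler fires once per scope in which the plugin is enabled.
    for _scope in enabled_scopes:
        log_phase(bandwidth, request_length, bytes_sent, route_id)


once = Counter()
handle_request(once, ["route"], 1000, 200, "demo")

twice = Counter()
handle_request(twice, ["global_rule", "route"], 1000, 200, "demo")

print(once.values[("ingress", "demo")])   # 1000
print(twice.values[("ingress", "demo")])  # 2000
```

If the handler does run twice for a single request, both the ingress and the egress series grow at twice the real rate, which matches the symptom described.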

Environment

  • apisix version (cmd: apisix version): 2.7
  • OS (cmd: uname -a):
  • OpenResty / Nginx version (cmd: nginx -V or openresty -V):
  • etcd version, if have (cmd: run curl http://127.0.0.1:9090/v1/server_info to get the info from server-info API):
  • apisix-dashboard version, if have:
  • the plugin runner version, if the issue is about a plugin runner (cmd: depended on the kind of runner):
  • luarocks version, if the issue is about installation (cmd: luarocks --version):

Steps to reproduce

route configuration

{
  "action": "get",
  "node": {
    "key": "/apisix/routes",
    "dir": true,
    "nodes": [
      {
        "key": "/apisix/routes/1",
        "modifiedIndex": 6332825,
        "value": {
          "status": 1,
          "methods": [
            "PUT",
            "GET",
            "POST",
            "DELETE",
            "PATCH",
            "HEAD",
            "OPTIONS",
            "CONNECT",
            "TRACE"
          ],
          "upstream": {
            "pass_host": "pass",
            "nodes": {
              "10.235.79.7:8079": 10
            },
            "scheme": "http",
            "hash_on": "vars",
            "type": "roundrobin"
          },
          "uri": "/*",
          "create_time": 1646098895,
          "id": "1",
          "priority": 0,
          "update_time": 1646098895,
          "plugins": {
            "prometheus": {
              "prefer_name": false
            }
          }
        },
        "createdIndex": 6332825
      },
      {
        "key": "/apisix/routes/3",
        "modifiedIndex": 4020996,
        "value": {
          "status": 1,
          "methods": [
            "PUT",
            "GET",
            "POST",
            "DELETE",
            "PATCH",
            "HEAD",
            "OPTIONS",
            "CONNECT",
            "TRACE"
          ],
          "upstream": {
            "hash_on": "vars",
            "type": "roundrobin",
            "nodes": {
              "10.235.79.7:9096": 10
            },
            "pass_host": "pass",
            "scheme": "http"
          },
          "uri": "/*",
          "priority": 20,
          "vars": [
            "OR",
            [
              "host",
              "~*",
              ".*(abc.cn)$"
            ],
            [
              "host",
              "==",
              "10.235.82.1"
            ]
          ],
          "id": "3",
          "update_time": 1643262492,
          "create_time": 1638353491
        },
        "createdIndex": 1682
      }
    ]
  }
}

Client write: 145MB/s + 133MB/s + 144MB/s + 1.15GB/s = 1.55GB/s, but APISIX ingress is 2.85GB/s
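A quick arithmetic check of these figures, assuming decimal units (1 GB = 1000 MB; the issue does not say which convention the client tools use). The variable names are illustrative only.

```python
# Client-side write rates reported in the issue.
client_rates_mb = [145, 133, 144]   # MB/s
client_rate_gb = 1.15               # GB/s
reported_ingress_gb = 2.85          # GB/s, as shown by the APISIX metric

# Assumption: decimal units, 1 GB = 1000 MB.
total_gb = sum(client_rates_mb) / 1000 + client_rate_gb
ratio = reported_ingress_gb / total_gb

print(f"client total: {total_gb:.2f} GB/s")
print(f"reported / client: {ratio:.2f}x")
```

Under that assumption the client total lands close to the quoted 1.55GB/s, and the reported ingress is roughly 1.8 times the client total.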


Actual result

The bandwidth = APISIX ingress or egress

Error log

no

Expected result

No response

shuaijinchao commented

@zhendongcmss When a Global Rule and a route both enable the prometheus plugin, the metrics are reported twice. We should merge the same plugin on the Global Rule and the route.

cc @spacewander


@shuaijinchao shuaijinchao added the bug Something isn't working label Mar 10, 2022

zhendongcmss commented Mar 11, 2022

Another test case: configure prometheus on global rules only, then request http://127.0.0.1:9080. The request is first checked against route id = 3 (priority 30), but the host doesn't match its vars, so it hits route id = 1 (priority 0). At this point, watch the ingress and egress.
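The fall-through described here can be sketched in Python as follows. This is a simplified model of priority-ordered matching, not APISIX's router: `eval_expr`, `vars_match`, and `match_route` are hypothetical helpers, only the `==` and `~*` operators from the posted config are handled, and the host is assumed to be compared without its port.

```python
import re

# Simplified copies of the two routes from the configuration above.
ROUTES = [
    {"id": "1", "priority": 0, "vars": None},
    {
        "id": "3",
        "priority": 20,  # value from the posted config; the prose says 30
        "vars": ["OR", ["host", "~*", ".*(abc.cn)$"], ["host", "==", "10.235.82.1"]],
    },
]


def eval_expr(expr, ctx):
    name, op, expected = expr
    actual = ctx.get(name, "")
    if op == "==":
        return actual == expected
    if op == "~*":  # case-insensitive regex match
        return re.search(expected, actual, re.IGNORECASE) is not None
    raise ValueError(f"unsupported operator: {op}")


def vars_match(vars_, ctx):
    if not vars_:
        return True
    if vars_[0] == "OR":
        return any(eval_expr(e, ctx) for e in vars_[1:])
    return all(eval_expr(e, ctx) for e in vars_)


def match_route(ctx):
    # Higher priority is tried first; both routes share the same uri "/*".
    for route in sorted(ROUTES, key=lambda r: r["priority"], reverse=True):
        if vars_match(route["vars"], ctx):
            return route["id"]
    return None


print(match_route({"host": "127.0.0.1"}))    # 1
print(match_route({"host": "www.abc.cn"}))   # 3
print(match_route({"host": "10.235.82.1"}))  # 3
```

With host `127.0.0.1`, neither condition on route 3 holds, so the request lands on route 1, as in the test case.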

spacewander commented

> @zhendongcmss when Global Rule and route enabled prometheus plugin at the same time, it will cause repeated reporting. we should merge the same plugin on the Global Rule and the route.
>
> cc @spacewander

We can do it with

run_policy = "prefer_route",
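As a rough illustration of what a "prefer route" policy could mean, here is a Python sketch. It is a toy model, not the APISIX implementation: it assumes that a plugin marked `prefer_route` is skipped at the global-rule level whenever the matched route configures the same plugin, and `plugins_to_run` is a hypothetical helper.

```python
def plugins_to_run(global_rule_plugins, route_plugins, run_policy):
    """Return (scope, plugin) pairs that would execute for one request."""
    executed = []
    for name in global_rule_plugins:
        # Assumption: with "prefer_route", the global-rule copy is skipped
        # whenever the matched route configures the same plugin.
        if run_policy.get(name) == "prefer_route" and name in route_plugins:
            continue
        executed.append(("global_rule", name))
    for name in route_plugins:
        executed.append(("route", name))
    return executed


policy = {"prometheus": "prefer_route"}

both = plugins_to_run(["prometheus"], ["prometheus"], policy)
global_only = plugins_to_run(["prometheus"], [], policy)
no_policy = plugins_to_run(["prometheus"], ["prometheus"], {})

print(both)         # [('route', 'prometheus')]
print(global_only)  # [('global_rule', 'prometheus')]
print(no_policy)    # [('global_rule', 'prometheus'), ('route', 'prometheus')]
```

Under this model the plugin runs exactly once per request whether it is enabled on the route, on the global rule, or on both.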

shuaijinchao commented

> Another test case is, config prometheus on global rules only. Request http://127.0.0.1:9080. It will hit route id =3(priority 30) ,but the host doesn't match vars then hit route id = 1(priority 0). At this time , watch the igress and egress.

The solution is the same.
