Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(plugins): Datadog for metrics collection #5372

Merged
merged 17 commits into from
Nov 9, 2021

Conversation

bisakhmondal
Copy link
Member

@bisakhmondal bisakhmondal commented Oct 29, 2021

What this PR does / why we need it:

This PR proves the backbone for integrating Datadog plugin with Apache APISIX.
At present it provides the following metrics:

  • request.count - counter
  • request.latency - histogram
  • upstream.latency - histogram
  • apisix.latency - histogram
  • ingress.size - timer
  • egress.size - timer

To enable datadog plugin via route, just run the following snippet

{
    "name": "my_route",
    "uri": "/hello",
    "upstream": {
        "type": "roundrobin",
        "nodes": {
            "127.0.0.1:80": 1
        }
    },
    "plugins":{
        "datadog":{},
    }
}

Pre-submission checklist:

  • Did you explain what problem does this PR solve? Or what new features have been added?
  • Have you added corresponding test cases?
  • Have you modified the corresponding document?
  • Is this PR backward compatible? If it is not backward compatible, please discuss on the mailing list first

@bisakhmondal bisakhmondal changed the title feat: datadog plugin for metrics collection feat(plugins): datadog plugin for metrics collection Oct 29, 2021
@bisakhmondal bisakhmondal changed the title feat(plugins): datadog plugin for metrics collection feat(plugins): Datadog for metrics collection Oct 29, 2021
@spacewander spacewander marked this pull request as draft October 31, 2021 12:00
@bisakhmondal
Copy link
Member Author

apisix/plugins/datadog.lua Outdated Show resolved Hide resolved
apisix/plugins/datadog.lua Outdated Show resolved Hide resolved
apisix/plugins/datadog.lua Outdated Show resolved Hide resolved
apisix/plugins/datadog.lua Outdated Show resolved Hide resolved
apisix/plugins/datadog.lua Outdated Show resolved Hide resolved
apisix/plugins/datadog.lua Outdated Show resolved Hide resolved
apisix/plugins/prometheus/exporter.lua Outdated Show resolved Hide resolved
apisix/plugins/prometheus/exporter.lua Outdated Show resolved Hide resolved
t/plugin/datadog.t Outdated Show resolved Hide resolved
@bisakhmondal bisakhmondal marked this pull request as ready for review November 1, 2021 06:56
t/lib/mock_dogstatsd.lua Outdated Show resolved Hide resolved
t/lib/mock_dogstatsd.lua Outdated Show resolved Hide resolved
t/plugin/datadog.t Show resolved Hide resolved
t/plugin/datadog.t Outdated Show resolved Hide resolved
t/plugin/datadog.t Outdated Show resolved Hide resolved
t/plugin/datadog.t Outdated Show resolved Hide resolved
message received: apisix.upstream.latency
message received: apisix.apisix.latency
message received: apisix.ingress.size
message received: apisix.egress.size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add 3 more tests:

  1. request twice & check the dump
  2. configure namespace in the metadata and check the different namespace
  3. configure tags in the metadata and check the difference

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

end

local function generate_tag(conf, const_tags)
local tags = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it faster if we can use table.concat to concat the string instead?

Copy link
Member Author

@bisakhmondal bisakhmondal Nov 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right,

Lua strings are immutable

effectively it will be O(N^2) instead of O(N) without a table. updating accordingly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimized

t/plugin/datadog.t Outdated Show resolved Hide resolved
t/plugin/datadog.t Outdated Show resolved Hide resolved
t/plugin/datadog.t Outdated Show resolved Hide resolved
t/plugin/datadog.t Outdated Show resolved Hide resolved
docs/en/latest/plugins/datadog.md Show resolved Hide resolved
docs/en/latest/plugins/datadog.md Show resolved Hide resolved
apisix/plugins/datadog.lua Outdated Show resolved Hide resolved
local entry = fetch_log(ngx, {})
entry.upstream_latency = ctx.var.upstream_response_time * 1000
entry.balancer_ip = ctx.balancer_ip or ""
entry.route_name = ctx.route_name or ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should fall back to route_id if the name is missing. And need a test for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code block for fallback logic is at L86
Okay, adding tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use route_name even there is only route_id? Therefore the name of tag is consistent between routes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfectly makes sense, Updated accordingly. Thanks for pointing it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spacewander ecommend using route name, won't the same routes name be confusing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! @tzssangglass thanks for the honest concern. Yeah, in a badly designed system, it might be an issue to segregate metrics of multiple routes having the same route_name. But users can always pick multiple tags for grouping (for eg. route_name:abc, service_id:12), provided they are designing APISIX infra with service level abstraction (just one example, they can use "balancer_ip, consumer" also for filtering).

@spacewander recommended using name over id because it's more readable to the user and there was this requirement in the Prometheus plugin (#5149). Do you think that we should add a param prefer_name at the plugin schema or we should defer it for the upcoming versions?
Please do let me know. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get. I think it is up to the community to decide if prefer_name is needed like the prometheus plugin.I suggest adding a note about that it could be misleading when route name could be duplicated for multiple.

Comment on lines +83 to +84
- **route_name**: Name specified in the route schema definition. If not present, it will fall back to the route id value.
- Note: If multiple routes have the same name duplicated, we suggest you to visualize graphs on the Datadog dashboard over multiple tags that could compositely pinpoint a particular route/service. If it's still insufficient for your needs, feel free to drop a feature request at [apisix/issues](https://github.com/apache/apisix/issues).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tzssangglass I hope this would be sufficient. WDYT?

@spacewander spacewander merged commit da691b9 into apache:master Nov 9, 2021
Comment on lines +58 to +61
| ----------- | ------ | ----------- | ------- | ----- | ---------------------------------------------------------------------- |
| host | string | optional | "127.0.0.1" | | The DogStatsD server host address |
| port | integer | optional | 8125 | | The DogStatsD server host port |
| namespace | string | optional | "apisix" | | Prefix for all the custom metrics sent by APISIX agent. Useful for finding entities for metric graph. e.g. (apisix.request.counter) |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bisakhmondal please add doc and examples for how to configure host and port, thanks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @moonming do you mean how to update datadog metadata by calling the respective endpoint? Or Is it regarding the setup of a local dogstatsd agent?
Thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I means how to set the datadog endpoint

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, take this as an example: If I'm a newbie to use Apache APISIX and the datadog plugin, I'd like to know all the necessary steps after Apache APISIX runs successfully.

Like when I should update its metadata, and how I should setup its data with a Route.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got your point. Thanks for the elaborate explanation.
The plugin has a default metadata schema that assumes dogstatsd is available at localhost:8125. If someone needs to reconfigure that I have covered that on the blog which is soon going to be exported at apisix blog
Here is the link: https://blog.bisakh.com/blog/Monitoring-with-Datadog-in-Apache-APISIX

Do you think, we should add it here (the Steps to Run Datadog Agent section & Custom Configuration)? Thanks guys

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, I have read your blog[1], I think the Custom Configuration part is what @moonming and I need 😄

image

[1] https://blog.bisakh.com/blog/Monitoring-with-Datadog-in-Apache-APISIX

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool Cool, I'll open a PR with the code snippet and some extra details. Thanks, @juzhiyuan @moonming

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants