Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Stack Monitoring] Additional APM metrics #43998

Closed
wants to merge 27 commits into from

Conversation

cachedout
Copy link
Contributor

@cachedout cachedout commented Aug 26, 2019

Summary

Adds new graphs to APM.

apm-1-1

apm-1-2

Checklist

Use strikethroughs to remove checklist items you don't feel are applicable to this PR.

For maintainers

Testing

To test this, you need to bring up APM server and configure it to send metrics to Stack Monitoring.

Important: I have tested this with Metricbeat as well but would still appreciate a second set of eyes on all of these tests with Metricbeat in addition to native monitoring.

For all metrics described below, check to ensure that they are present and populated on both the APM overview page in Stack Monitoring as well as in any APM instance page(s).

For each metric described in #44001, verify that it is present.

@simitt
Copy link
Contributor

simitt commented Aug 26, 2019

pulling in @elastic/apm-server for visibility

@cachedout cachedout marked this pull request as ready for review August 26, 2019 16:14
@cachedout
Copy link
Contributor Author

@chrisronline OK, this passed tests for me locally so it's probably ready for a first pass from you. Thanks!

@elasticmachine
Copy link
Contributor

💔 Build Failed

Copy link
Contributor

@chrisronline chrisronline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks GREAT so far! Awesome work!

I made a couple of small notes, but I don't see any major issues!

We should probably add/adjust existing tests to account for this, but I imagine you'll do that while getting CI passing. I'll wait to do a final review until then.

@elasticmachine
Copy link
Contributor

💔 Build Failed

@elasticmachine
Copy link
Contributor

💔 Build Failed

@elasticmachine
Copy link
Contributor

💔 Build Failed

Copy link
Contributor

@simitt simitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this in that quickly. I left a couple of comments.

@cachedout
Copy link
Contributor Author

I've discovered a potential issue.

When the changes were made in elastic/apm-server#2142, there wasn't any change made to the templates which describe how the new fields should be indexed.

Here is the current Elasticsearch template which is applied to the Beats monitoring index when it is created:

https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/core/src/main/resources/monitoring-beats.json#L247

Note that the top-level keys in apm-server are server, decoder, and processor.

However, as discussed in #44001, there is now a new top-level key for apm-server, which is acm.

To move forward on this, I think we'd need to update the mapping first to support these new fields. That's going to involve, at a minimum, getting the Elasticsearch team (and @jakelandis) involved.

However, getting this done before a FF tomorrow seems very sketchy.

So, this puts us in a bit of a spot. The ES team might be willing to consider this a bug and let us get it in after the feature-freeze. However, the feature on the Kibana side (i.e., this PR) can't go in until the mapping is updated, because it will fail tests. (We could probably work around this but, again, it gets a little sketchy.)

I am going to keep looking at this but wanted to get these notes down for everybody to see and discuss.

@simitt My gut feeling here is that we are going to have to defer these changes for a later release. What's the impact on your side if we do so? It seems like the only issue may be potential discrepency between versions with the request metrics, since they're no longer present in the root endpoint. Are there other impacts you can see on your end from delaying this change?

@simitt
Copy link
Contributor

simitt commented Aug 27, 2019

Good catch @cachedout . Unfortunately I wasn't aware of that template. There are three new top level keys for apm-server now: acm, root and sourcemap, all three of them should be added to the template.

Since the values for the counters will not change, and are only created from within the APM Server, I don't think that creating the metrics will cause any issues like mapping errors. IMO the change in the APM Server can therefore be released with the next version as planned.

Regarding the monitoring UI, I agree that this sounds like a too short notice. Would it be feasible to not show the ACM counters for now, but still change the titles for the existing visualizations to make it clear that they only represent values for the Intake API endpoint?

@cachedout
Copy link
Contributor Author

Regarding the monitoring UI, I agree that this sounds like a too short notice. Would it be feasible to not show the ACM counters for now, but still change the titles for the existing visualizations to make it clear that they only represent values for the Intake API endpoint?

I think that is a very wise course of action. Let's do that. I'll start on a new PR now which just makes changes to the titles.

@chrisronline
Copy link
Contributor

@cachedout I think adding the new mappings to the template is pretty straight-forward and shouldn't take long to code up and review.

Here is an example of me doing this for APM in the past: elastic/elasticsearch#34392

It usually doesn't even require an Elasticsearch team member to review (so we don't have to worry about their time)

@elasticmachine
Copy link
Contributor

💔 Build Failed

@cachedout
Copy link
Contributor Author

@chrisronline Thanks for the link! You are right that this shouldn't be too bad. I assumed that ES folks would need to be involved so I am glad that is not the case.

After going over this with @simitt it seems like we can get #44091 in and then not be so rushed with these changes, which makes me feel a little better tbh. If we have the option to be more methodical with these changes, I think we should take it. WDYT?

@chrisronline
Copy link
Contributor

Sounds good to me!

@cachedout
Copy link
Contributor Author

Tests will not pass on this until elastic/elasticsearch#46244 is merged.

@elasticmachine
Copy link
Contributor

💔 Build Failed

@cachedout
Copy link
Contributor Author

Kind of a horrific rebase but I think I got all the relevant bits back in the right place. Let's see what the tests think. :) It would be nice to get this merged.

@cachedout
Copy link
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💔 Build Failed

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@LeeDr LeeDr added v7.7.0 and removed v7.6.0 labels Feb 4, 2020
@cachedout
Copy link
Contributor Author

@chrisronline and @simitt This has been hanging around for a while. Is this still needed? If so, I can fix up the merge conflicts and let somebody take a swing at a review but since switching teams I probably don't have much time to put a lot of additional work in. Let me know what you think. Thanks!

@simitt
Copy link
Contributor

simitt commented Apr 21, 2020

@cachedout it would still be nice to have the additional metrics visualized. As a matter of fact we put in even more metrics with new features.

@cachedout
Copy link
Contributor Author

@chrisronline I'll defer to you here on what you would like to do. Would you like me to fix the merge conflicts and hand this over to you? Just let me know how you'd like to transfer this work over. :) Thanks!

@LeeDr
Copy link
Contributor

LeeDr commented May 7, 2020

Can someone update the labels for this PR please? It's definitely not going out in 7.7.0 and we're past FF for 7.8.0 (and this is an enhancement), so probably 7.9.0 now.

@cachedout
Copy link
Contributor Author

I believe this was superceded by #69052

@cachedout cachedout closed this Jun 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants