APM server monitoring #32515

ycombinator · 2018-07-31T18:43:37Z

This PR:

Adds an apm_system built-in role with privileges to call the bulk Monitoring API, analogous to the beats_system built-in role.
Adds an apm_system built-in user with the apm_system role, analogous to the beats_system built-in user.

ycombinator · 2018-07-31T18:44:17Z

elasticmachine · 2018-07-31T18:44:39Z

Pinging @elastic/es-core-infra

pickypg

Some thoughts about creating a new index. I'm definitely more against creating a new one than for, but it does follow a pattern.

pickypg · 2018-07-31T18:48:27Z

x-pack/plugin/core/src/main/resources/monitoring-apm-server.json

+    "index.number_of_replicas": 0,
+    "index.number_of_shards": 1
+  },
+  "version": 7000001,


FYI: When this is backported, it needs to share the same version as the other templates.

pickypg · 2018-07-31T18:51:31Z

x-pack/plugin/core/src/main/resources/monitoring-apm-server.json

+            }
+          }
+        },
+        "beats_stats": {


I wonder if we should roll APM server stuff into the Beats index. We already explicitly ignore apm-server data for all Beats monitoring UI, so it would help to avoid an extra template and set of indices, and the index is going to share a majority of the same fields with Beats.

We've already run into a lot of confusion with using beats within apm-server - I think users who monitor their indices and don't use beats will be very surprised to see a beats index present.

Are there any security concerns that would push towards separating the indexes?

Yeah, @ruflin, @graphaelli and I discussed this briefly off PR. The advantages I see of giving APM server its own set of monitoring indices are:

We could, in the future, create a separate built-in security role for APM server monitoring and restrict its write access to only APM server monitoring indices.

The mapping for APM server monitoring indices can grow to fit APM server needs without "polluting" the general beats mapping.

(Relatively minor) It removes the filtering in the Monitoring UI code for excluding APM server documents when looking at Beats monitoring and, vice versa, for only including APM server documents when looking at APM server monitoring.

The very big negative for adding a new index is that it creates a drain on Elasticsearch resources. Chances are, users won't have a lot of apm-server instances reporting to the Monitoring cluster, so creating a separate index for such a low scale use case means we're adding the shards for what will likely be about 24 * 60 * 60 / 10 = 8640 documents per day. At the same time, we'll require Elasticsearch to search each one of those indices for data whenever we try to summarize the stack views.

That's extremely wasteful, especially for small instances/clusters that can generally host monitoring data, but are now getting pushed closer to their limits because of the added load from threading the searches, joining the data in memory, growing the cluster state, and adding the segments.
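As a rough check of the arithmetic above, here is a minimal Java sketch. It assumes, as the 24 * 60 * 60 / 10 figure implies, that each APM server instance produces one monitoring document every 10 seconds. The class and method names are hypothetical and are not part of Elasticsearch.

```java
final class MonitoringDocEstimate {
    private static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Documents written per day by `instances` reporters, each emitting one
    // document every `intervalSeconds` seconds (assumed collection interval).
    // Integer division rounds the per-instance count down.
    static long docsPerDay(int instances, int intervalSeconds) {
        if (instances < 0 || intervalSeconds <= 0) {
            throw new IllegalArgumentException("instances must be >= 0 and intervalSeconds > 0");
        }
        return instances * (SECONDS_PER_DAY / intervalSeconds);
    }

    public static void main(String[] args) {
        System.out.println(docsPerDay(1, 10)); // one instance: 8640
        System.out.println(docsPerDay(3, 10)); // three instances: 25920
    }
}
```

Even with a handful of instances the daily volume stays in the tens of thousands of documents, which is the scale the comment above considers too small to justify a dedicated index.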

I think users who monitor their indices and don't use beats will be very surprised to see a beats index present.

We could get around this fear by either documenting it or changing the index pattern from .monitoring-beats-6-* to .monitoring-beats-apm-6-*. We could do this for 6.5 and add an alias, .monitoring-beats-6-alias, to the template so that there is no impact on the UI.

We could, in the future, create a separate built-in security role for APM server monitoring and restrict its write access to only APM server monitoring indices.

We can still do this. The product user (Beats, Kibana, Logstash) writes indirectly to the monitoring data, so adding a user specific to APM does not require its own index. When Beats gains the ability to monitor APM then we can restrict access by adding a suffix to the index (e.g., .monitoring-beats-apm-6-apm-*) and limit the user to that, if we so choose. It would be ideal, even if we do split the indices, to think of them as a .monitoring-* index pattern rather than the subsets. This should help us, in the future, to avoid this kind of problem. We never should have separated Kibana into its own index. We did that for curation and this, but it's useless on its own, tiny, and never curated separately.

I had a chat about this with @ycombinator last week and we expect that apm-server metrics potentially diverge from the Beats ones and we would prefer to keep it separate to have the flexibility in the future.

For the resource usage problem, in my opinion we should rather look into using ILM to have fewer indices and shards instead of mixing multiple products into 1 index.

The other extreme we could go to is having the data from all products in 1 index, as we currently do with Metricbeat, but we should not start to mix both options.

Hey @pickypg, I'd like to move this PR forward a bit this week. Could you please take a look at @ruflin's comment above? Thanks!

@ruflin

we expect that apm-server metrics potentially diverge from the Beats ones and we would prefer to keep it separate to have the flexibility in the future.

Isolating data by the prefixed type has always worked for stack monitoring and nothing about that should be impacted by such deviations unless they try to share a type (e.g., beats_stats) while changing the meaning of specific fields. I don't expect that to happen (versus augmenting it, which is fine).

For the resource usage problem, in my opinion we should rather look into using ILM to have fewer indices and shards instead of mixing multiple products into 1 index.

I agree that we should be using ILM/rollover when it is released, regardless of what we do here. However, at least for the short term, to avoid expanding the amount of data that we're storing on existing monitoring clusters, we will want to configure ILM to prune data based on time to match existing behavior (7 days by default). Longer term, particularly once Beats takes over, we can hopefully revisit this solution to be based more on size. Also, adding the shards and extra index mappings is an unnecessary impact that I really think that we should avoid.

The other extreme we could go to is having the data from all products in 1 index, as we currently do with Metricbeat, but we should not start to mix both options.

I strongly think that we should move in that direction. With Beats, if we / the user decides that they want their data isolated by product, then it's as simple as modifying the index like how I am suggesting to add the -apm suffix. It really does boil down to a huge mistake that we aren't already doing this.

Think about the impact on the cluster from sharing the index relative to Beats-based usage:

Searches use the same shards

Bulk activity can use the same shards

Versus separate indices:

Searches use separate shards

Index mappings are separated (growing the cluster state and routing table)

Bulk activity cannot be optimized further than by product

I chatted with @ruflin off-PR. For now, for the reasons that @pickypg has articulated in this thread, I'm going to not create a separate index (and template) for APM server monitoring documents. In the longer run, it's likely that we'll be changing indices altogether, perhaps consolidating all monitoring data into metricbeat indices; we can always revisit the indexing strategy then. So for now I'm going to update this PR to remove the index-related changes.

I will, however, keep the changes for a new built-in role and user for APM server monitoring, since asking users/admins to use beats users/roles could easily be confusing (as opposed to users/admins seeing monitoring indices with beats in the name, which are a bit more behind the scenes).

graphaelli · 2018-07-31T19:44:27Z

x-pack/plugin/core/src/main/resources/monitoring-apm-server.json

+                  "type": "keyword"
+                },
+                "name": {
+                  "type": "keyword"


thoughts on marking this and other fields not relevant to apm-server as enabled: false ?

Or we could even just remove such fields from the mapping entirely for now, and add them as/when we need them. WDYT?

Would that require removing them from the libbeat reporting for apm-server?

No, I don't think so because the mapping uses dynamic: false, so any un-mapped fields will be simply ignored and not indexed (as opposed to if the mapping were to use dynamic: strict, in which case un-mapped fields would cause errors).

The belief is right. Anything unmapped just tags along in the _source but is not used by ES.
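To illustrate the behaviour described above, here is a toy Java model. It is not Elasticsearch code. `dynamic: false` and `dynamic: strict` are the mapping settings named in the thread; the class, enum, and method names, the flattened field keys, and the sample field `metrics.libbeat.config.reloads` are hypothetical and used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class DynamicMappingSketch {
    enum Dynamic { FALSE, STRICT }

    // Returns the fields that would be indexed (searchable) for a document.
    // The complete document is what would still be kept in _source.
    static Map<String, Object> indexedFields(Map<String, Object> doc, Set<String> mapped, Dynamic mode) {
        Map<String, Object> indexed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (mapped.contains(e.getKey())) {
                indexed.put(e.getKey(), e.getValue());
            } else if (mode == Dynamic.STRICT) {
                // strict: an unmapped field makes the whole document fail
                throw new IllegalArgumentException("unmapped field rejected: " + e.getKey());
            }
            // false: the unmapped field is skipped silently
        }
        return indexed;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("beat.name", "apm-server-1");
        doc.put("metrics.libbeat.config.reloads", 0);
        Set<String> mapped = Set.of("beat.name");
        System.out.println(indexedFields(doc, mapped, Dynamic.FALSE).keySet());
    }
}
```

Under this model, dropping fields from the mapping does not require APM server to stop sending them, which matches the conclusion reached in the thread.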

So @graphaelli, which fields specifically is it okay to remove from the mapping? Let me know and I'll take them out of this PR!

all of metrics.libbeat.config.* can go as we don't support modules or reloading

@ycombinator Perhaps we can also make it possible to not show up in the events if not there.

@ruflin Can you clarify? Do you mean, we can make it so that Beats (APM server) doesn't ship these fields to Monitoring?

@ycombinator Yes, but this applies not only to apm-server, as other beats also don't have these fields. But it's not important at the moment, I would say.

ruflin · 2018-08-02T09:32:34Z

I think we will also have to add a user for apm-server to Elasticsearch.

ycombinator · 2018-08-02T12:23:10Z

I think we will also have to add a user for apm-server to Elasticsearch.

++ will add built-in role and user, analogous to beats_system.

elasticmachine · 2018-08-03T00:52:06Z

Pinging @elastic/es-security

ruflin · 2018-08-06T06:30:38Z

docs/reference/commands/setup-passwords.asciidoc

@@ -4,7 +4,7 @@
 == elasticsearch-setup-passwords

 The `elasticsearch-setup-passwords` command sets the passwords for the built-in
-`elastic`, `kibana`, `logstash_system`, and `beats_system` users.
+`elastic`, `kibana`, `logstash_system`, `beats_system`, and `apm_server_system` users.


Is it possible to call the user apm-server_system? I mainly worry that users might mistype it.

Wouldn't mixing the dash and the underscore make it easier for users to mistype the username?

I think it could be argued both ways and probably both options are going to cause trouble. Go with the option you feel more comfortable with.

@graphaelli Why don't you break the tie here, since this username is intended for APM customers after all 😄

apm_server_system looks great, thanks

ruflin · 2018-08-09T11:51:06Z

Looks like the failing test is related:

16:08:09 FAILURE 10.5s J3 | LocalExporterIntegTests.testExport <<< FAILURES!
16:08:09    > Throwable #1: java.lang.AssertionError: expected:<[.monitoring-logstash, .monitoring-es, .monitoring-alerts, .monitoring-beats, .monitoring-kibana]> but was:<[.monitoring-logstash, .monitoring-es, .monitoring-apm-server, .monitoring-beats, .monitoring-alerts, .monitoring-kibana]>

ycombinator · 2018-08-09T12:55:05Z

Yeah, I can fix that test, but I'll have to "fix" it again if we decide not to go with a separate index. So I was hoping to get that resolved first.

ycombinator · 2018-08-13T17:45:07Z

@pickypg @ruflin @graphaelli I've updated this PR per #32515 (comment). Ready for your review.

Additionally, could someone from @elastic/es-security review as well, please?

Thanks!

ruflin · 2018-08-14T06:46:07Z

...ty/src/main/java/org/elasticsearch/xpack/security/authc/esnative/tool/SetupPasswordTool.java

@@ -63,7 +64,7 @@
 public class SetupPasswordTool extends LoggingAwareMultiCommand {

    private static final char[] CHARS = ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789").toCharArray();
-    public static final List<String> USERS = asList(ElasticUser.NAME, KibanaUser.NAME, LogstashSystemUser.NAME, BeatsSystemUser.NAME);
+    public static final List<String> USERS = asList(ElasticUser.NAME, KibanaUser.NAME, LogstashSystemUser.NAME, BeatsSystemUser.NAME, APMServerSystemUser.NAME);


This line probably needs shortening. See CI error:

21:07:59 [ant:checkstyle] [ERROR] /var/lib/jenkins/workspace/elastic+elasticsearch+pull-request/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/esnative/tool/SetupPasswordTool.java:67: Line is longer than 140 characters (found 160). [LineLength]
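One possible way to satisfy the 140-character limit is to wrap the argument list onto its own line. The sketch below is not necessarily the change that was committed. The string constants are hypothetical stand-ins for ElasticUser.NAME, KibanaUser.NAME, and the other classes referenced in the diff, with values taken from the user names listed in the setup-passwords documentation change in this PR.

```java
import java.util.Arrays;
import java.util.List;

final class SetupPasswordUsersSketch {
    // Hypothetical stand-ins for the *.NAME constants referenced in the diff.
    static final String ELASTIC = "elastic";
    static final String KIBANA = "kibana";
    static final String LOGSTASH_SYSTEM = "logstash_system";
    static final String BEATS_SYSTEM = "beats_system";
    static final String APM_SERVER_SYSTEM = "apm_server_system";

    // Breaking after the opening parenthesis keeps each line well under 140 characters.
    static final List<String> USERS = Arrays.asList(
        ELASTIC, KIBANA, LOGSTASH_SYSTEM, BEATS_SYSTEM, APM_SERVER_SYSTEM);

    public static void main(String[] args) {
        System.out.println(USERS);
    }
}
```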

ruflin · 2018-08-15T06:59:38Z

Now CI found a new place to complain about line length:

03:42:59 [ant:checkstyle] [ERROR] /var/lib/jenkins/workspace/elastic+elasticsearch+pull-request/x-pack/plugin/security/src/test/java/org/elasticsearch/xpack/security/authc/esnative/NativeUsersStoreTests.java:104: Line is longer than 140 characters (found 147). [LineLength]

jaymode

LGTM. My only thought is around whether we need server in the username, but I'm fine with or without it.

ycombinator · 2018-08-17T13:33:02Z

My only thought is around whether we need server in the username

@graphaelli what's your opinion on this? I put server in the role/user names to try and be future-proof, in case we ever have the need for a role/user for the APM client in the future. But maybe that's not a valid use case and we could drop server from the role/user names in this PR?

ycombinator · 2018-08-22T21:47:34Z

@jaymode CI is green again, mind taking a (hopefully final) peek? Thanks!

Shaunak

jaymode

Still LGTM

pickypg

LGTM

* Adding new MonitoredSystem for APM server
* Teaching Monitoring template utils about APM server monitoring indices
* Documenting new monitoring index for APM server
* Adding monitoring index template for APM server
* Copy pasta typo
* Removing metrics.libbeat.config section from mapping
* Adding built-in user and role for APM server user
* Actually define the role :)
* Adding missing import
* Removing index template and system ID for apm server
* Shortening line lengths
* Updating expected number of built-in users in integration test
* Removing "system" from role and user names
* Rearranging users to make tests pass

ycombinator · 2018-08-27T12:44:02Z

Backported to:

6.x / 6.5.0: 100c1a0

ycombinator requested review from graphaelli, ruflin and pickypg July 31, 2018 18:44

ycombinator added review v7.0.0 :Data Management/Monitoring v6.5.0 labels Jul 31, 2018

pickypg reviewed Jul 31, 2018

graphaelli reviewed Jul 31, 2018

ycombinator force-pushed the x-pack/monitoring/apm-server branch from d73a203 to 23241b4 Compare August 2, 2018 17:24

tvernum added the :Security/Security Security issues without another label label Aug 3, 2018

ycombinator force-pushed the x-pack/monitoring/apm-server branch from 14e403b to aa65190 Compare August 3, 2018 12:21

ruflin reviewed Aug 6, 2018

ycombinator force-pushed the x-pack/monitoring/apm-server branch from 3a38edc to 9985f11 Compare August 13, 2018 17:21

ruflin approved these changes Aug 14, 2018

ruflin reviewed Aug 14, 2018

ycombinator force-pushed the x-pack/monitoring/apm-server branch 2 times, most recently from 3ebd1f5 to 88875c1 Compare August 15, 2018 00:21

ycombinator force-pushed the x-pack/monitoring/apm-server branch from 88875c1 to 44f0bb7 Compare August 15, 2018 13:30

jaymode approved these changes Aug 17, 2018

ycombinator added 10 commits August 22, 2018 09:59

Copy pasta typo

e7cc101

Removing metrics.libbeat.config section from mapping

9003a90

Adding built-in user and role for APM server user

a0a4cea

Actually define the role :)

f0665dc

Adding missing import

84d2903

Removing index template and system ID for apm server

66f132c

Shortening line lengths

ab9817f

Updating expected number of built-in users in integration test

2d13928

Removing "system" from role and user names

4f23a6c

Rearranging users to make tests pass

3a5cef1

ycombinator force-pushed the x-pack/monitoring/apm-server branch from 0df5f98 to 3a5cef1 Compare August 22, 2018 17:09

jaymode approved these changes Aug 23, 2018

pickypg approved these changes Aug 23, 2018

ycombinator merged commit 1779d33 into elastic:master Aug 27, 2018

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Aug 27, 2018

X-PACK: Fix Compile Error in ReservedRealmTests

c636c6a

* This was broken by elastic#32515 since the 5.x versions were removed between PR creation and merge

original-brownbear mentioned this pull request Aug 27, 2018

SECURITY: Fix Compile Error in ReservedRealmTests #33166

Merged

original-brownbear added a commit that referenced this pull request Aug 27, 2018

SECURITY: Fix Compile Error in ReservedRealmTests (#33166)

f7a9186

* This was broken by #32515 since the 5.x versions were removed between PR creation and merge

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Aug 27, 2018

SECURITY: Fix Compile Error in ReservedRealmTests (elastic#33166)

85670ca

* This was broken by elastic#32515 since the 5.x versions were removed between PR creation and merge

This was referenced Sep 10, 2018

[DOCS] Adds missing built-in user information #33585

Merged

[DOCS] Adds apm_system user and role elastic/stack-docs#120

Merged

graphaelli mentioned this pull request Sep 11, 2018

Add apm-server monitoring user/configuration/docs elastic/apm-server#1378

Closed

4 tasks

colings86 added >feature and removed :Security/Security Security issues without another label labels Oct 25, 2018

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
