Log monitoring bulk failures #14356

ycombinator · 2019-10-31T17:39:20Z

Resolves #14303.

As reported in #14303, when the Elasticsearch monitoring reporter in libbeat sends a bulk API request to Elasticsearch and individual items in that request fail, the errors are currently swallowed. This is because the HTTP status code of the bulk API response is still 200 OK; the per-item errors are embedded in the response body.

This PR teaches the Elasticsearch monitoring reporter to parse the bulk API response and log any errors. For the parsing, the same code as the Elasticsearch output is reused.
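To illustrate the idea (not the actual libbeat code, which reuses the Elasticsearch output's streaming JSON reader), here is a minimal standalone sketch using only `encoding/json`. The names `bulkResponse`, `bulkItemResult`, `itemFailure`, and `collectBulkFailures` are hypothetical, and treating any item status of 300 or above as a failure is an assumption made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// bulkResponse models only the parts of a bulk API response body that this
// sketch needs. Each item is keyed by its action ("index", "create", ...).
type bulkResponse struct {
	Errors bool                        `json:"errors"`
	Items  []map[string]bulkItemResult `json:"items"`
}

// bulkItemResult holds the per-item status and the raw error object, if any.
type bulkItemResult struct {
	Status int             `json:"status"`
	Error  json.RawMessage `json:"error"`
}

// itemFailure describes one failed item (hypothetical helper type).
type itemFailure struct {
	Index  int
	Status int
	Error  string
}

// collectBulkFailures parses a bulk response body and returns the items whose
// status is >= 300 (an assumption for this sketch). The HTTP status of the
// bulk request itself is not consulted, since it can be 200 OK even when
// individual items failed.
func collectBulkFailures(body []byte) ([]itemFailure, error) {
	var resp bulkResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("parsing bulk response: %w", err)
	}

	var failures []itemFailure
	for i, item := range resp.Items {
		for _, res := range item { // each item carries exactly one action key
			if res.Status >= 300 {
				failures = append(failures, itemFailure{
					Index:  i,
					Status: res.Status,
					Error:  string(res.Error),
				})
			}
		}
	}
	return failures, nil
}

func main() {
	// Made-up response body for illustration only.
	body := []byte(`{"took":3,"errors":true,"items":[
		{"index":{"status":403,"error":{"type":"security_exception","reason":"unauthorized"}}},
		{"index":{"status":201}}
	]}`)

	failures, err := collectBulkFailures(body)
	if err != nil {
		log.Fatal(err)
	}
	for _, f := range failures {
		log.Printf("monitoring bulk item insert failed (i=%d, status=%d): %s",
			f.Index, f.Status, f.Error)
	}
}
```

The point of the sketch is that the caller has to inspect every entry in `items`; checking only the HTTP status of the bulk request would miss the 403 above.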

Testing this PR

Start up Elasticsearch with security enabled. Make sure you know the password for the elastic superuser.

Create a role that grants necessary privileges for managing and writing to metricbeat-* indices.

```
curl -s -u elastic -H 'Content-Type: application/json' 'http://localhost:9200/_security/role/mb_writer' -d '{ "cluster": [ "monitor", "manage_ilm", "manage_index_templates" ], "indices": [ { "names": [ "metricbeat-*" ], "privileges": [ "all" ] } ] }'
```

Create a user with the above role.

```
curl -s -u elastic -H 'Content-Type: application/json' 'http://localhost:9200/_security/user/mb_writer' -d '{ "password": "mb_writer", "roles": [ "mb_writer" ] }'
```

Build Metricbeat with this PR.

```
cd $GOPATH/src/github.com/elastic/beats/metricbeat
mage build
```

Start Metricbeat with monitoring enabled, specifying the credentials of the above user for the Elasticsearch output.

```
./metricbeat -e -E output.elasticsearch.username=mb_writer -E output.elasticsearch.password=mb_writer -E monitoring.enabled=true
```

Verify that metricbeat-* indices are being created and populated in Elasticsearch but no .monitoring-beats-* indices are being created.
```
curl -s -u elastic 'http://localhost:9200/_cat/indices'
```

Verify that there are warnings in the log like so:

```
2019-11-01T08:57:43.910-0700    WARN    elasticsearch/client.go:258     monitoring bulk item insert failed (i=0, status=403): {"type":"security_exception","reason":"action [indices:admin/create] is unauthorized for user [mb_writer]"}
```

houndci-bot · 2019-10-31T17:39:24Z

libbeat/outputs/elasticsearch/client.go

@@ -548,9 +522,9 @@ func bulkCollectPublishFails(
 	return failed, stats
 }

-func itemStatus(reader *jsonReader) (int, []byte, error) {
+func ItemStatus(reader *JSONReader) (int, []byte, error) {


exported function ItemStatus should have comment or be unexported

libbeat/outputs/elasticsearch/json_read.go

houndci-bot · 2019-10-31T17:39:25Z

libbeat/monitoring/report/elasticsearch/client.go

+		return
+	}
+
+	for i, _ := range events {


should omit 2nd value from range; this loop is equivalent to for i := range ...
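As a small illustration of the lint message above, the following sketch shows that the two loop forms visit the same indices. The function names and the `[]string` element type are hypothetical stand-ins; the PR's own `events` slice has a different type.

```go
package main

import "fmt"

// indicesWithBlank uses the form flagged by the linter: the second range
// value is explicitly discarded with the blank identifier.
func indicesWithBlank(events []string) []int {
	var out []int
	for i, _ := range events {
		out = append(out, i)
	}
	return out
}

// indicesShort uses the suggested form, which omits the second value.
func indicesShort(events []string) []int {
	var out []int
	for i := range events {
		out = append(out, i)
	}
	return out
}

func main() {
	events := []string{"a", "b", "c"}
	fmt.Println(indicesWithBlank(events), indicesShort(events))
}
```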

libbeat/outputs/elasticsearch/client.go

elasticmachine · 2019-11-01T16:05:26Z

Pinging @elastic/stack-monitoring (Stack monitoring)

ycombinator · 2019-11-12T03:04:39Z

jenkins, test this

ph

Code is OK to me, but I think we should add some tests to cover that behavior, especially in case the remote system changes its behavior. I don't like how the 200 vs. the 403 response code is handled in this scenario.

Looking at the existing code, there are currently no unit tests for the ES reporter, and adding them to the existing Python system tests might be complicated, but it is still worth investigating.

Also for BulkReadToItems we can surely add a test for it?

ph · 2019-11-12T18:09:28Z

libbeat/outputs/elasticsearch/bulkapi.go

-	raw []byte
-}
+// BulkResult contains the result of a bulk API request.
+type BulkResult json.RawMessage


+1 nice change

libbeat/outputs/elasticsearch/client.go

ph

LGTM. We need to find a better way to handle system tests; I think it's a problem and we need a proposal for that. Maybe a way to use a specific docker-compose file for a set of tests.

ycombinator · 2019-11-14T21:04:49Z

Travis CI is green. Jenkins CI failures are unrelated. Merging.

* Log monitoring bulk failures (#14356)
* Log monitoring bulk failures
* Renaming function
* Simplifying type
* Removing extraneous second value
* Adding godoc comments
* Adding CHANGELOG entry
* Clarifying log messages
* WIP: adding unit test stubs
* Fleshing out unit tests
* [DOCS] Deprecate central management (#14104) (#14594)
* State minimum Go version (#14400) (#14598)
* [DOCS] Fix description of rename processor (#14408) (#14600)
* Log monitoring bulk failures (#14356)
* Log monitoring bulk failures
* Renaming function
* Simplifying type
* Removing extraneous second value
* Adding godoc comments
* Adding CHANGELOG entry
* Clarifying log messages
* WIP: adding unit test stubs
* Fleshing out unit tests
* Fixing up CHANGELOG

* Log monitoring bulk failures
* Renaming function
* Simplifying type
* Removing extraneous second value
* Adding godoc comments
* Adding CHANGELOG entry
* Clarifying log messages
* WIP: adding unit test stubs
* Fleshing out unit tests


houndci-bot reviewed Oct 31, 2019


ycombinator mentioned this pull request Oct 31, 2019

Handle bulk request results in monitoring #14354

Closed

ycombinator marked this pull request as ready for review November 1, 2019 16:04

ycombinator requested review from ph and cwurm November 1, 2019 16:05

ycombinator added bug libbeat Feature:Stack Monitoring v7.5.1 v7.6.0 v8.0.0 labels Nov 1, 2019

ycombinator force-pushed the lb-mon-log-bulk-failures branch 3 times, most recently from 73c8a72 to 34227f6 Compare November 12, 2019 01:34

ph suggested changes Nov 12, 2019


ycombinator force-pushed the lb-mon-log-bulk-failures branch from f6dbefd to d95bd5f Compare November 12, 2019 23:33

ph approved these changes Nov 13, 2019


ycombinator added 9 commits November 13, 2019 07:53

Log monitoring bulk failures

1006796

Renaming function

93e7e9f

Simplifying type

102eb9f

Removing extraneous second value

8de27d5

Adding godoc comments

febfb9a

Adding CHANGELOG entry

6e59a0d

Clarifying log messages

a886064

WIP: adding unit test stubs

732c075

Fleshing out unit tests

36d60bb

ycombinator force-pushed the lb-mon-log-bulk-failures branch from d95bd5f to 36d60bb Compare November 13, 2019 16:16

ycombinator merged commit a9aff6f into elastic:master Nov 14, 2019

This was referenced Nov 14, 2019

[7.x] Log monitoring bulk failures (#14356) #14526

Merged

[7.5] Log monitoring bulk failures (#14356) #14527

Merged

ycombinator deleted the lb-mon-log-bulk-failures branch December 25, 2019 11:08
