Log monitoring bulk failures #14356
@@ -548,9 +522,9 @@ func bulkCollectPublishFails(
 	return failed, stats
 }

-func itemStatus(reader *jsonReader) (int, []byte, error) {
+func ItemStatus(reader *JSONReader) (int, []byte, error) {
exported function ItemStatus should have comment or be unexported
	return
}

for i, _ := range events {
should omit 2nd value from range; this loop is equivalent to for i := range ...
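The lint note above refers to a standard Go idiom: when only the index is needed, the second range value can be dropped entirely. A minimal sketch follows; the `indices` helper and its `events` slice of strings are hypothetical stand-ins, not the code from this PR.

```go
package main

import "fmt"

// indices returns the index of every element in events.
// events is a hypothetical stand-in for the slice in the diff.
func indices(events []string) []int {
	out := make([]int, 0, len(events))
	// Preferred form: omit the unused second range value
	// instead of writing "for i, _ := range events".
	for i := range events {
		out = append(out, i)
	}
	return out
}

func main() {
	fmt.Println(indices([]string{"a", "b", "c"}))
}
```

Both spellings compile to the same loop; the shorter one simply avoids naming a value that is never used.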
Pinging @elastic/stack-monitoring (Stack monitoring)
73c8a72 to 34227f6
jenkins, test this
The code looks OK to me, but I think we should add some tests to cover this behavior, especially in case the remote system changes its behavior. I don't like how the 200 vs. the 403 response code is handled in this scenario.
Looking at the existing code, there are currently no unit tests for the ES reporter. Adding them to the existing Python system tests might be complicated, but it is still worth investigating.
Also, for BulkReadToItems, we can surely add a test?
	raw []byte
}
// BulkResult contains the result of a bulk API request.
type BulkResult json.RawMessage
+1 nice change
f6dbefd to d95bd5f
LGTM. We need to find a better way to handle system tests; I think it's a problem and we need a proposal for it. Maybe a way to use a specific docker-compose file for a set of tests.
d95bd5f to 36d60bb
Travis CI is green. Jenkins CI failures are unrelated. Merging.
* Log monitoring bulk failures (#14356)
* Log monitoring bulk failures
* Renaming function
* Simplifying type
* Removing extraneous second value
* Adding godoc comments
* Adding CHANGELOG entry
* Clarifying log messages
* WIP: adding unit test stubs
* Fleshing out unit tests
* [DOCS] Deprecate central management (#14104) (#14594)
* State minimum Go version (#14400) (#14598)
* [DOCS] Fix description of rename processor (#14408) (#14600)
* Log monitoring bulk failures (#14356)
* Log monitoring bulk failures
* Renaming function
* Simplifying type
* Removing extraneous second value
* Adding godoc comments
* Adding CHANGELOG entry
* Clarifying log messages
* WIP: adding unit test stubs
* Fleshing out unit tests
* Fixing up CHANGELOG
Resolves #14303.
As reported in #14303, when the Elasticsearch monitoring reporter in libbeat sends a bulk API request to Elasticsearch and that request fails, the errors are currently swallowed. This is because the HTTP response code for the bulk API request is 200 OK; the actual errors are embedded in the response body.

This PR teaches the Elasticsearch monitoring reporter to parse the bulk API response and log any errors. For the parsing, the same code as the Elasticsearch output is reused.
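To illustrate why a 200 OK can still hide failures, here is a minimal, self-contained sketch of scanning a bulk API response body for per-item errors. It is not the libbeat implementation this PR reuses: the `bulkResponse` and `bulkItemResult` types and the `failedItems` function are hypothetical names invented for this example, and the sample body is made up. It relies only on the documented shape of the bulk response (a top-level `errors` flag and an `items` array whose entries carry a `status` and an optional `error`).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bulkResponse mirrors only the parts of a bulk API response needed here.
// Illustrative type, not libbeat's.
type bulkResponse struct {
	Errors bool                        `json:"errors"`
	Items  []map[string]bulkItemResult `json:"items"`
}

// bulkItemResult holds the per-item outcome, keyed in the response
// by the action name (for example "index" or "create").
type bulkItemResult struct {
	Status int             `json:"status"`
	Error  json.RawMessage `json:"error,omitempty"`
}

// failedItems returns one message per item whose status is 300 or higher.
func failedItems(body []byte) ([]string, error) {
	var resp bulkResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("parsing bulk response: %w", err)
	}
	if !resp.Errors {
		return nil, nil // every item succeeded
	}
	var failures []string
	for i, item := range resp.Items {
		for action, result := range item {
			if result.Status >= 300 {
				failures = append(failures, fmt.Sprintf(
					"item %d (%s): status=%d error=%s",
					i, action, result.Status, result.Error))
			}
		}
	}
	return failures, nil
}

func main() {
	// Made-up body: the HTTP status would be 200, yet one item failed.
	body := []byte(`{"took":3,"errors":true,"items":[
		{"index":{"status":201}},
		{"index":{"status":403,"error":{"type":"security_exception","reason":"unauthorized"}}}
	]}`)
	failures, err := failedItems(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range failures {
		fmt.Println("WARN", f)
	}
}
```

Checking the top-level `errors` flag first keeps the common all-success case cheap; the per-item walk only happens when at least one item failed.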
Testing this PR
1. Start up Elasticsearch with security enabled. Make sure you know the password for the elastic superuser.
2. Create a role that grants the necessary privileges for managing and writing to metricbeat-* indices.
3. Create a user with the above role.
4. Build Metricbeat with this PR.
5. Start Metricbeat with monitoring enabled, specifying the credentials of the above user for the elasticsearch output.
6. Verify that metricbeat-* indices are being created and populated in Elasticsearch but no .monitoring-beats-* indices are being created.
7. Verify that there are warnings in the log like so: