Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose broker metrics with go-metrics #701

Merged
merged 5 commits into from
Aug 30, 2016

Conversation

slaunay
Copy link
Contributor

@slaunay slaunay commented Jul 12, 2016

  • add MetricRegistry configuration parameter that defaults to metrics.DefaultRegistry
  • provide the following metrics:
    • incoming-byte-rate meter (global and per registered broker)
    • request-rate meter (global and per registered broker)
    • request-size histogram (global and per registered broker)
    • outgoing-byte-rate meter (global and per registered broker)
    • response-rate meter (global and per registered broker)
    • response-size histogram (global and per registered broker)
  • add metrics flag to kafka-console-producer to output metrics
  • add bytes read to decodeRequest
  • count socket bytes read and written in MockBroker for unit testing
  • add unit tests and example
  • functional tests in functional_producer_test
  • documentation of metrics in main package

Related issue #683 and original PR #688.
I'll submit a PR for producer related metrics once this one is merged (should be shorter 😉).

slaunay added 4 commits July 12, 2016 16:46
- add MetricRegistry configuration parameter that defaults to
  metrics.DefaultRegistry
- provide the following metrics:
  - incoming-byte-rate meter (global and per registered broker)
  - request-rate meter (global and per registered broker)
  - request-size histogram (global and per registered broker)
  - outgoing-byte-rate meter (global and per registered broker)
  - response-rate meter (global and per registered broker)
  - response-size histogram (global and per registered broker)
- add metrics flag to kafka-console-producer to output metrics
- add bytes read to decodeRequest
- store request and response size in MockBroker history
- add unit tests and example
- functional tests in functional_producer_test
- documentation of metrics in main package
- close MockBroker in the for loop before validating metrics
- do not add expectation for ProduceRequest (NoResponse)
- use Logger instead of Logf in unit test to stay consistent
- add MmockBroker.WaitForExpectations for graceful shutdown
@eapache
Copy link
Contributor

eapache commented Jul 23, 2016

Sorry for the delays in responding to this - between vacations and some internal deadlines there hasn't been a lot of time for Sarama recently. Hopefully I'll be able to review this properly next week.

@slaunay
Copy link
Contributor Author

slaunay commented Aug 9, 2016

We have been running that branch to push logs for a couple of weeks without issues.

The metrics matches what we see coming out of the network interfaces:
grafana-dashboard

The only artifact we saw is that the min and max of the response size metrics tends to fluctuate most likely because of the way sampling is done and the periodic metadata refresh that is significantly bigger than regular produce responses:
grafana-response-size

Let me know if you need more information.

@@ -338,6 +371,8 @@ func (b *Broker) send(rb protocolBody, promiseResponse bool) (*responsePromise,
return nil, err
}

b.updateOutgoingCommunicationMetrics(len(buf))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably want to move this down below the SetWriteDeadline else you could miscount in the (admittedly exceptional) case where that fails

@eapache
Copy link
Contributor

eapache commented Aug 12, 2016

Thanks for all your work on this! Most of the code review points are pretty minor, it's just the mockbroker changes I'm not really convinced of. Apologies again for the delay.

break
}
b.lock.Lock()
requestResponse.ResponseSize = len(resHeader) + len(encodedRes)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By using a reference to a RequestResponse I am able to update the ResponseSize field without keeping the index of the entry that was added to the slice and then updating the entry later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't even have to keep the index though do you? It's always just the last element in the slice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be if the responses are served in serial order but I do not know if this is always the case (third parties using the MockBroker concurrently).
But the cleaner implementation through a handler should discard that detail I believe.

- update*CommunicationMetrics even when a Read/Write fails
- use MockBroker notifier for waiting for both expectations and metrics
- add documentation about disabling metrics gathering
- use METRICS_DISABLE env variable for disabling metrics in benchmarks
- use constants for exponentially decaying reservoir for histograms
- fix typo in main documentation
@slaunay
Copy link
Contributor Author

slaunay commented Aug 16, 2016

Here are the results from the benchmark when running them twice on the original master commit, against the PR with the no-op implementation and then with the effective implementation:

$ git checkout daa549b755c85cf347258a4c22e0baba1e8d2cc2
$ go test -v -bench=. -run=^$
PASS
BenchmarkProducerSmall-8                  200000          9338 ns/op
BenchmarkProducerMedium-8                  50000         38441 ns/op
BenchmarkProducerLarge-8                   10000        259523 ns/op
BenchmarkProducerSmallSinglePartition-8   200000          7422 ns/op
BenchmarkProducerMediumSnappy-8           200000         11344 ns/op
ok      github.com/Shopify/sarama   14.704s
$ go test -v -bench=. -run=^$
PASS
BenchmarkProducerSmall-8                  100000         10699 ns/op
BenchmarkProducerMedium-8                  30000         42789 ns/op
BenchmarkProducerLarge-8                    5000        259906 ns/op
BenchmarkProducerSmallSinglePartition-8   200000          5422 ns/op
BenchmarkProducerMediumSnappy-8           100000         11666 ns/op
ok      github.com/Shopify/sarama   10.226s

$ git checkout enhancement/broker-metrics 
$ export METRICS_DISABLE=1
$ go test -v -bench=. -run=^$
PASS
BenchmarkProducerSmall-8                  200000          9156 ns/op
BenchmarkProducerMedium-8                  30000         38275 ns/op
BenchmarkProducerLarge-8                    5000        240086 ns/op
BenchmarkProducerSmallSinglePartition-8   200000          6304 ns/op
BenchmarkProducerMediumSnappy-8           100000         10657 ns/op
ok      github.com/Shopify/sarama   9.356s
$ go test -v -bench=. -run=^$
PASS
BenchmarkProducerSmall-8                  100000         10308 ns/op
BenchmarkProducerMedium-8                  30000         38828 ns/op
BenchmarkProducerLarge-8                    5000        260416 ns/op
BenchmarkProducerSmallSinglePartition-8   300000          4705 ns/op
BenchmarkProducerMediumSnappy-8           200000          9466 ns/op
ok      github.com/Shopify/sarama   10.890s

$ unset METRICS_DISABLE
$ go test -v -bench=. -run=^$
PASS
BenchmarkProducerSmall-8                  200000          9181 ns/op
BenchmarkProducerMedium-8                  30000         40766 ns/op
BenchmarkProducerLarge-8                    5000        284455 ns/op
BenchmarkProducerSmallSinglePartition-8   200000          6352 ns/op
BenchmarkProducerMediumSnappy-8           100000         11113 ns/op
ok      github.com/Shopify/sarama   11.409s
$ go test -v -bench=. -run=^$
PASS
BenchmarkProducerSmall-8                  200000          8389 ns/op
BenchmarkProducerMedium-8                  50000         37401 ns/op
BenchmarkProducerLarge-8                    5000        265853 ns/op
BenchmarkProducerSmallSinglePartition-8   300000          5418 ns/op
BenchmarkProducerMediumSnappy-8           200000         10612 ns/op
ok      github.com/Shopify/sarama   12.507s

Results are very similar so it probably comes down to I/O and caching on my VM, that's why I believe the overall impact is minimal.

if err != nil {
b.updateIncomingCommunicationMetrics(bytesReadHeader)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could just do this immediately after calling ReadFull rather than in all three error branches?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but then I would have to call it twice for the regular case where we read the header and then read the body.
This would result in the same metric for byte rate but will double the rate for response rate metric and mess up with response size histogram metric too.

That being said I think using defer with a totalBytesRead variable might simplify the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot that this code is running inside a for loop so using defer would require wrapping the for code block inside an function, not sure if it makes it easier to read.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, OK I missed the fact it would double-count packets in the success case. This is fine as-is.

@eapache
Copy link
Contributor

eapache commented Aug 23, 2016

This is ready to go, thanks again for all your patience. I'll probably wait until I'm sure the consumer timer issues (#730, #733, #734) have settled out, release 1.10.1 with that fix, then merge this.

@slaunay
Copy link
Contributor Author

slaunay commented Aug 23, 2016

Glad to hear that, I think this feature is gonna lead to more optimizations and easier tuning in the future as we will be able to quantify those.

@eapache eapache merged commit e03d23b into IBM:master Aug 30, 2016
@slaunay slaunay deleted the enhancement/broker-metrics branch August 30, 2016 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants