Make reports smaller #1201
For instance, I suspect the way we encode times in JSON might be a little verbose, although (a) compression should be very effective on that and (b) we use msgpack.
Related: #985
This is getting urgent. After seeing the pressure put on gzip in #1454 and #1457, I measured the reports on the service and they are all around ~10MB. Note that this is msgpack, not JSON.
Looking at a report (from dev admin scope, so it was running without proc probing), most of it is taken up by timestamps! IIRC @tomwilkie mentioned this before. For example, an entry like the one sketched below would shrink to a fraction of its size without timestamps. Surely we don't need timestamps on nearly every individual JSON object; one timestamp per report should be fine, with the possible exception of some metrics (though these seem to carry separate "first" and "last" timestamps anyway). Related: some timestamps are recorded with second precision, some with nanosecond precision.
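For illustration, here is a hypothetical latest-map entry of the kind described (the exact wire format shown is an assumption, not a captured report):

```
"pid": {"timestamp": "2016-07-19T11:50:23.123456789Z", "value": "4242"}
```

which would shrink to

```
"pid": "4242"
```

with a single timestamp hoisted up to the report level.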
Looks like the source of all these timestamps is a) For …
That would make merging non-commutative when there are any entries whose value can change over time. Maintaining commutativity in such cases is of course the main reason for having any timestamps at all in the first place. Now, in reality the majority of values are constant: they cannot change over time. So perhaps we should have an additional data structure, …
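A minimal Go sketch of why the timestamps buy commutativity (types and names are illustrative, not Scope's actual implementation): merge resolves conflicts by letting the newer entry win, which is symmetric in its arguments, so merge order doesn't matter.

```go
package main

import (
	"fmt"
	"time"
)

// Entry is a value stamped with the time it was last observed.
type Entry struct {
	Timestamp time.Time
	Value     string
}

// LatestMap maps keys to their most recently observed value.
type LatestMap map[string]Entry

// Merge combines two maps, keeping the newer entry on conflict.
// "Newer wins" is symmetric, so Merge(a, b) equals Merge(b, a) no matter
// which order reports arrive in. Without the timestamps we would have to
// pick one side arbitrarily, and the result would depend on merge order
// for any value that changes over time. (A real implementation would also
// need a deterministic tie-break for equal timestamps.)
func Merge(a, b LatestMap) LatestMap {
	out := make(LatestMap, len(a)+len(b))
	for k, v := range a {
		out[k] = v
	}
	for k, v := range b {
		if cur, ok := out[k]; !ok || v.Timestamp.After(cur.Timestamp) {
			out[k] = v
		}
	}
	return out
}

func main() {
	t0 := time.Now()
	a := LatestMap{"state": {t0, "running"}}
	b := LatestMap{"state": {t0.Add(time.Second), "stopped"}}
	fmt.Println(Merge(a, b)["state"].Value) // stopped
	fmt.Println(Merge(b, a)["state"].Value) // stopped, same either way
}
```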
The reality is we can't really change this data structure anymore, so to …
Yep, that's what I suggested :)
Indeed, I was agreeing.
Another thing we could do is to make sure that empty fields are not serialized/deserialized. Here's an example of a serialized endpoint node:
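Something like the following hypothetical sketch (field names and values are illustrative, not an actual captured node):

```
{
  "id": "10.32.0.1;443",
  "topology": "endpoint",
  "counters": {},
  "sets": {},
  "controls": {},
  "adjacency": [">10.32.0.1;443"],
  "latest": {
    "addr": {"timestamp": "2016-07-19T11:50:23Z", "value": "10.32.0.1"}
  }
}
```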
counters, sets, and controls are not used and yet occupy space in the serialized report. For ugorji's msgpack serializer, we would need to check whether …
Also, the … EDIT: Done in #2581
Not immediately possible without using pointers instead of structs for each field :S See http://stackoverflow.com/questions/18088294/how-to-not-marshal-an-empty-struct-into-json-with-go. The codec library we are using has the same problem, see https://godoc.org/github.com/ugorji/go/codec#Encoder.Encode
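The underlying Go behavior can be demonstrated with encoding/json (the struct-versus-pointer distinction is the same one the codec library trips over): omitempty is ignored for struct-typed fields, because a struct has no defined "empty" value, while a nil pointer field is genuinely omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Counters struct {
	Values map[string]int `json:"values,omitempty"`
}

type NodeByValue struct {
	ID       string   `json:"id"`
	Counters Counters `json:"counters,omitempty"` // omitempty has no effect on struct values
}

type NodeByPointer struct {
	ID       string    `json:"id"`
	Counters *Counters `json:"counters,omitempty"` // a nil pointer really is omitted
}

func main() {
	v, _ := json.Marshal(NodeByValue{ID: "a"})
	p, _ := json.Marshal(NodeByPointer{ID: "a"})
	fmt.Println(string(v)) // {"id":"a","counters":{}}
	fmt.Println(string(p)) // {"id":"a"}
}
```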
I've created a feature request upstream: ugorji/go#163
@rade I've just realized that those are probably the Docker stat metrics, which I believe only have second precision (the serializer omits the zeros for the nanoseconds).
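This is consistent with how Go's time.Time marshals (shown here with encoding/json): the fractional-second digits are dropped entirely when the nanoseconds are zero, so second-precision stats come out shorter. A quick demonstration:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

func main() {
	full := time.Date(2016, 7, 19, 11, 50, 23, 123456789, time.UTC)
	secs := full.Truncate(time.Second) // what second-precision Docker stats look like

	f, _ := json.Marshal(full)
	s, _ := json.Marshal(secs)
	fmt.Println(string(f)) // "2016-07-19T11:50:23.123456789Z"
	fmt.Println(string(s)) // "2016-07-19T11:50:23Z"
}
```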
I took a report from the service and found it to be 6MB uncompressed (500KB compressed). This was slightly bigger than I was expecting, although let's assume this is made up of 15*3=45 different reports, which still puts compressed probe -> app reports at about 10KB each (500KB / 45 ≈ 11KB).
I was interested in seeing which topology was using the most space, assuming it to be endpoints. It was not: the process topology turned out to be the largest.
So, perhaps an easy win would be to not report on every process; for instance, we only show processes which are doing IO, so we could easily filter out processes in the probe whose PIDs don't appear in the endpoint topology (a sketch below). How this would affect things like the details panel is a different problem.
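A sketch of that filter, using simplified stand-in types rather than Scope's actual report structures: collect the PIDs referenced by endpoint nodes, then drop process nodes whose PIDs aren't in that set.

```go
package main

import "fmt"

// Node and Topology are hypothetical stand-ins for the report types,
// reduced to the one field this filter cares about.
type Node struct {
	PID string
}

type Topology map[string]Node // node ID -> node

// FilterIdleProcesses drops process nodes whose PID is not referenced by
// any endpoint node, i.e. processes doing no network IO.
func FilterIdleProcesses(processes, endpoints Topology) Topology {
	activePIDs := make(map[string]bool, len(endpoints))
	for _, n := range endpoints {
		if n.PID != "" {
			activePIDs[n.PID] = true
		}
	}
	kept := make(Topology)
	for id, n := range processes {
		if activePIDs[n.PID] {
			kept[id] = n
		}
	}
	return kept
}

func main() {
	processes := Topology{"proc-1": {PID: "42"}, "proc-2": {PID: "99"}}
	endpoints := Topology{"endpoint-1": {PID: "42"}}
	fmt.Println(FilterIdleProcesses(processes, endpoints)) // map[proc-1:{42}]
}
```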
I also suspect the reason the process topology is so big is that we have metrics on it (and we don't have metrics on endpoints). Perhaps their representation could be improved?