output/cloudv2: Compact histogram #3169

codebien · 2023-07-05T15:42:18Z

What?

Implementation of a more compact Protobuf representation for the Histogram type. This new version only stores and pushes the histogram's significant (non-zero) buckets.

Why?

The current implementation only trims the zeros at the edges of the histogram, which wastes a large amount of memory when the histogram is sparse. The new version instead allocates only for non-zero buckets.
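To make the memory argument concrete, here is a minimal sketch comparing the two layouts. It is not the code from this PR: `denseTrimmed` is a hypothetical helper that only mimics the "trim zeros at the edges" behaviour described above, and the sparse form is a plain `map[uint32]uint32` like the `Buckets` field.

```go
package main

import "fmt"

// denseTrimmed is a hypothetical helper that mimics the older layout as
// described in the PR text: one counter for every bucket between the first
// and the last non-zero bucket, zeros in the middle included.
func denseTrimmed(observed map[uint32]uint32) []uint32 {
	if len(observed) == 0 {
		return nil
	}
	first := true
	var lo, hi uint32
	for idx := range observed {
		if first {
			lo, hi = idx, idx
			first = false
			continue
		}
		if idx < lo {
			lo = idx
		}
		if idx > hi {
			hi = idx
		}
	}
	out := make([]uint32, hi-lo+1)
	for idx, count := range observed {
		out[idx-lo] = count
	}
	return out
}

func main() {
	// Two observed buckets that are far apart: a sparse distribution.
	observed := map[uint32]uint32{3: 10, 5000: 2}
	dense := denseTrimmed(observed)
	// prints "dense counters: 4998, sparse entries: 2"
	fmt.Printf("dense counters: %d, sparse entries: %d\n", len(dense), len(observed))
}
```

With only two non-zero buckets, the trimmed dense slice still holds 4998 counters, while the map holds two entries.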

Related PR(s)/Issue(s)

Updates #3117

codecov-commenter · 2023-07-06T13:48:56Z

Codecov Report

Merging #3169 (2a23ec4) into master (0fb962b) will decrease coverage by 0.20%.
The diff coverage is 70.96%.

❗ Current head 2a23ec4 differs from pull request most recent head 1ec8c97. Consider uploading reports for the commit 1ec8c97 to get more accurate results

@@            Coverage Diff             @@
##           master    #3169      +/-   ##
==========================================
- Coverage   72.83%   72.63%   -0.20%     
==========================================
  Files         255      253       -2     
  Lines       19611    19630      +19     
==========================================
- Hits        14283    14259      -24     
- Misses       4432     4469      +37     
- Partials      896      902       +6

Flag	Coverage Δ
ubuntu	`72.63% <70.96%> (-0.14%)`	⬇️
windows	`?`

Impacted Files	Coverage Δ
output/cloud/expv2/pbcloud/metric.pb.go	`27.23% <27.02%> (-0.26%)`	⬇️
output/cloud/expv2/hdr.go	`100.00% <100.00%> (+2.85%)`	⬆️
output/cloud/expv2/output.go	`66.45% <100.00%> (ø)`
output/cloud/expv2/sink.go	`92.30% <100.00%> (ø)`

... and 11 files with indirect coverage changes

The histogram now uses a more compact solution for storing the distribution. It tracks only the significant buckets.

codebien · 2023-07-06T14:23:56Z

output/cloud/expv2/hdr.go

+	// It does not include counters for the untrackable values,
+	// because they contain exception cases and require to be tracked in a dedicated way.
+	Buckets map[uint32]uint32
+
+	// Indexes keeps an ordered slice of unique-seen buckets' indexes.
+	// It allows to iterate the buckets in order. It uses an ascendent order.
+	Indexes []uint32


We may consider using a B-tree as an optimization, but I would like the protocol to be stable first.

I wonder if we can't just sort the bucket indexes at the end 🤷. There is no need to constantly keep track of them if we only want them to be in order 🤷

Yeah, it simplifies the code a lot, good suggestion.
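The suggestion above (later applied in the commit "Sort the buckets during the Proto generation") can be sketched as follows. This is an illustration under assumptions, not the merged code: `sortedIndexes` is a hypothetical name, and the map stands in for the `Buckets` field.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedIndexes (hypothetical name) collects the bucket indexes once and
// sorts them in ascending order, instead of maintaining an ordered slice
// on every insertion.
func sortedIndexes(buckets map[uint32]uint32) []uint32 {
	indexes := make([]uint32, 0, len(buckets))
	for idx := range buckets {
		indexes = append(indexes, idx)
	}
	sort.Slice(indexes, func(i, j int) bool { return indexes[i] < indexes[j] })
	return indexes
}

func main() {
	buckets := map[uint32]uint32{42: 1, 7: 3, 8: 2}
	fmt.Println(sortedIndexes(buckets)) // prints "[7 8 42]"
}
```

The trade-off is one O(n log n) sort per serialization instead of bookkeeping on every added sample.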

olegbespalov · 2023-07-07T06:58:39Z

output/cloud/expv2/hdr.go

+		spans    []*pbcloud.BucketSpan
+	)
+
+	// allocate only if at least one item is available


A dummy question, is it possible to have a histogram without values (and indexes)?

As we only send observed metrics in the aggregation time period - it should not be possible. The first time we see a metric we create the histogram structure and add the first sample.

Not without values, but it is possible to have Indexes and Buckets empty when there are only untrackable values, which are tracked in the special extreme buckets.

mstoykov · 2023-07-07T08:24:09Z

output/cloud/expv2/hdr.go

+	if h.Buckets == nil {
+		h.Buckets = make(map[uint32]uint32)


I think this should be done just when we make the histogram struct.

I moved it to a constructor.
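A minimal sketch of what moving the allocation into a constructor could look like. The type and function names here (`histogram`, `newHistogram`, `addToBucket`) are assumptions for illustration and may not match the names in hdr.go.

```go
package main

import "fmt"

// histogram is a reduced, hypothetical version of the type under review;
// only the Buckets field from the diff is kept.
type histogram struct {
	Buckets map[uint32]uint32
}

// newHistogram allocates the map once, so methods no longer need a
// "Buckets == nil" check on every call.
func newHistogram() *histogram {
	return &histogram{Buckets: make(map[uint32]uint32)}
}

// addToBucket increments the counter for the given bucket index.
func (h *histogram) addToBucket(index uint32) {
	h.Buckets[index]++
}

func main() {
	h := newHistogram()
	h.addToBucket(12)
	h.addToBucket(12)
	fmt.Println(h.Buckets[12]) // prints "2"
}
```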

mstoykov · 2023-07-07T08:37:41Z

output/cloud/expv2/hdr.go

-	}
+		// if the current and the previous indexes are not consecutive
+		// consider as closed the current on-going span and start a new one.
+		if diff := h.Indexes[i] - h.Indexes[i-1]; diff > 1 {


For more optimal space usage, this should likely be 3+, as adding one span costs at least 2 counters (offset and length).

So we can add at least 2 zeros before we break even. I say "at least" because I don't know if there is some more overhead for this in the protobuf protocol.

edit: This can be done after the original merge as an optimization
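The break-even argument can be checked with a small sketch. Everything here is an assumption made for illustration: `span` is a simplified stand-in for `pbcloud.BucketSpan` with an absolute offset, `buildSpans` and `encodedNumbers` are hypothetical helpers, and the cost model counts only numbers written (2 per span plus 1 per counter), ignoring any Protobuf framing overhead. Indexes are assumed unique and ascending.

```go
package main

import "fmt"

// span is a simplified, hypothetical stand-in for pbcloud.BucketSpan:
// Length consecutive buckets starting at the absolute index Offset.
type span struct {
	Offset uint32
	Length uint32
}

// buildSpans groups unique, ascending bucket indexes into spans. Neighbouring
// indexes stay in the same span while their difference is <= maxDiff; the
// skipped buckets are written as explicit zero counters.
func buildSpans(indexes []uint32, buckets map[uint32]uint32, maxDiff uint32) ([]span, []uint32) {
	if len(indexes) == 0 {
		return nil, nil
	}
	spans := []span{{Offset: indexes[0], Length: 1}}
	counters := []uint32{buckets[indexes[0]]}
	for i := 1; i < len(indexes); i++ {
		diff := indexes[i] - indexes[i-1]
		if diff > maxDiff {
			// Gap too large: close the current span and start a new one.
			spans = append(spans, span{Offset: indexes[i], Length: 1})
			counters = append(counters, buckets[indexes[i]])
			continue
		}
		// Small gap: pad it with zeros and extend the current span.
		for z := uint32(1); z < diff; z++ {
			counters = append(counters, 0)
		}
		counters = append(counters, buckets[indexes[i]])
		spans[len(spans)-1].Length += diff
	}
	return spans, counters
}

// encodedNumbers applies the simplified cost model: 2 numbers per span
// (offset and length) plus 1 number per counter.
func encodedNumbers(spans []span, counters []uint32) int {
	return 2*len(spans) + len(counters)
}

func main() {
	buckets := map[uint32]uint32{1: 4, 2: 1, 5: 7, 6: 2}
	indexes := []uint32{1, 2, 5, 6} // two empty buckets (3 and 4) in the gap
	for _, maxDiff := range []uint32{1, 3} {
		spans, counters := buildSpans(indexes, buckets, maxDiff)
		fmt.Println(maxDiff, spans, counters, encodedNumbers(spans, counters))
	}
}
```

With a gap of two empty buckets, both strategies write 8 numbers under this model: two spans with four counters, or one span with six counters including two zeros. That matches the "2 zeros before we break even" estimate; with a gap of one empty bucket, padding is strictly cheaper.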

mstoykov

LGTM! But we can probably address some of the small issues before merging.

The optimizations can likely wait a while, and maybe for benchmarks ;)

output/cloud/expv2/sink_test.go

olegbespalov

Great job! 🚀

codebien added the cloud label Jul 5, 2023

codebien added this to the v0.46.0 milestone Jul 5, 2023

codebien self-assigned this Jul 5, 2023

codebien mentioned this pull request Jul 5, 2023

Cloud output v2 #3117

Closed

codebien added 2 commits July 6, 2023 15:30

cloudv2/pbcloud: Compact histogram

ca33225

hdr_test: Test more edge cases

6f7548d

codebien force-pushed the cloduv2-compact-histogram branch from 3019add to 8657eeb Compare July 6, 2023 13:38

codebien force-pushed the cloduv2-compact-histogram branch from 8657eeb to 5ba1008 Compare July 6, 2023 13:58

codebien added 2 commits July 6, 2023 16:02

cloudv2: Use a compact version for histogram

0f676c5

The histogram now uses a more compact solution for storing the distribution. It tracks only the significant buckets.

cloudv2/integration: Test updated accordingly

b7cb2b3

codebien force-pushed the cloduv2-compact-histogram branch from 5ba1008 to b7cb2b3 Compare July 6, 2023 14:02

codebien marked this pull request as ready for review July 6, 2023 14:22

github-actions bot requested review from mstoykov and oleiade July 6, 2023 14:22

codebien requested review from olegbespalov and removed request for oleiade July 6, 2023 14:22

codebien commented Jul 6, 2023

olegbespalov reviewed Jul 7, 2023

mstoykov reviewed Jul 7, 2023

mstoykov previously approved these changes Jul 7, 2023

Address request changes

e7bee24

codebien dismissed mstoykov’s stale review via e7bee24 July 7, 2023 10:49

Test polishing

976a8ff

codebien force-pushed the cloduv2-compact-histogram branch from 3a2f799 to 976a8ff Compare July 7, 2023 10:55

codebien requested a review from olegbespalov July 7, 2023 11:38

olegbespalov previously approved these changes Jul 7, 2023

output/cloud/expv2/sink_test.go (outdated, resolved)

Use the constructor

1ec8c97

codebien dismissed olegbespalov’s stale review via 1ec8c97 July 7, 2023 12:16

codebien requested a review from olegbespalov July 7, 2023 13:03

olegbespalov previously approved these changes Jul 7, 2023

mstoykov previously approved these changes Jul 7, 2023

Sort the buckets during the Proto generation

4f49f36

codebien dismissed stale reviews from mstoykov and olegbespalov via 4f49f36 July 7, 2023 14:17

codebien requested a review from mstoykov July 7, 2023 14:51

mstoykov approved these changes Jul 7, 2023

codebien requested a review from olegbespalov July 7, 2023 15:03

olegbespalov approved these changes Jul 10, 2023

codebien merged commit 3906dcf into master Jul 10, 2023

codebien deleted the cloduv2-compact-histogram branch July 10, 2023 08:58

codebien mentioned this pull request Jul 10, 2023

cloudv2/hdr: Fix off-by-one error for spans #3182

Merged
