Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rate() query performance regression in 2.0.0 #2827

Closed
scoof opened this issue Oct 27, 2020 · 3 comments · Fixed by #2875
Closed

rate() query performance regression in 2.0.0 #2827

scoof opened this issue Oct 27, 2020 · 3 comments · Fixed by #2875
Assignees
Milestone

Comments

@scoof
Copy link
Contributor

scoof commented Oct 27, 2020

Describe the bug
It looks like logqlv2 has introduced a performance regression when querying rates on a largish amount of streams. This query: rate({namespace_name=""}[5m]) (for a 1 hour period) takes 3-4 times longer with 2.0.0, than with grafana/loki:k34-8ea6c38 (provided by @slim-bean on Slack)

Expected behavior
Same or better performance than 1.6.1

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: helm

Screenshots, Promtail config, or terminal output
Stats from 2.0.0:

   "stats" : {
      "ingester" : {
         "compressedBytes" : 0,
         "decompressedBytes" : 0,
         "decompressedLines" : 0,
         "headChunkBytes" : 0,
         "headChunkLines" : 0,
         "totalBatches" : 0,
         "totalChunksMatched" : 0,
         "totalDuplicates" : 0,
         "totalLinesSent" : 0,
         "totalReached" : 0
      },
      "store" : {
         "chunksDownloadTime" : 1.745696272,
         "compressedBytes" : 17368345,
         "decompressedBytes" : 216576036,
         "decompressedLines" : 1019558,
         "headChunkBytes" : 0,
         "headChunkLines" : 0,
         "totalChunksDownloaded" : 2732,
         "totalChunksRef" : 2732,
         "totalDuplicates" : 229773
      },
      "summary" : {
         "bytesProcessedPerSecond" : 9750244,
         "execTime" : 22.212370672,
         "linesProcessedPerSecond" : 45900,
         "totalBytesProcessed" : 216576036,
         "totalLinesProcessed" : 1019558
      }
   }
}

Stats from k34-8ea6c38:

         "ingester" : {
            "compressedBytes" : 3796297,
            "decompressedBytes" : 34697528,
            "decompressedLines" : 160127,
            "headChunkBytes" : 26062200,
            "headChunkLines" : 110580,
            "totalBatches" : 0,
            "totalChunksMatched" : 413,
            "totalDuplicates" : 0,
            "totalLinesSent" : 0,
            "totalReached" : 5
         },
         "store" : {
            "chunksDownloadTime" : 0.074917783,
            "compressedBytes" : 7128290,
            "decompressedBytes" : 55934946,
            "decompressedLines" : 249630,
            "headChunkBytes" : 0,
            "headChunkLines" : 0,
            "totalChunksDownloaded" : 1878,
            "totalChunksRef" : 1878,
            "totalDuplicates" : 166809
         },
         "summary" : {
            "bytesProcessedPerSecond" : 36028231,
            "execTime" : 3.238978645,
            "linesProcessedPerSecond" : 160648,
            "totalBytesProcessed" : 116694674,
            "totalLinesProcessed" : 520337
         }
      }
@cyriltovena cyriltovena added this to the 2.1 milestone Oct 28, 2020
@cyriltovena cyriltovena self-assigned this Oct 28, 2020
@cyriltovena
Copy link
Contributor

Dropping my notes:

Now in LogQL v2 labels can different from one log line to another, this means we have to stringify labels for each lines whereas before we would only need to do that once per chunk.

We need to implement a cache, with shortcut if no modifications has been done using,

see

func (e *sampleBufferedIterator) Labels() string { return e.currLabels.String() }

@cyriltovena
Copy link
Contributor

cyriltovena commented Oct 28, 2020

Got a flame graph.

image

Going to work from this benchmark:

❯ go test -benchmem -run=^$ -mod=vendor  ./pkg/chunkenc  -bench=BenchmarkBufferedIteratorLabels  -memprofile memprofile.out -cpuprofile cpuprofile.out
goos: darwin
goarch: amd64
pkg: github.com/grafana/loki/pkg/chunkenc
BenchmarkBufferedIteratorLabels/{app="foo"}-16         	24785552	        42.1 ns/op	       6 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"-16         	25299765	        42.3 ns/op	       6 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_-16         	11777482	        85.6 ns/op	      32 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms-16         	24603439	        40.7 ns/op	       3 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"-16         	25011400	        40.3 ns/op	       3 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/rate({app="foo"}[1m])-16                                                        	34682602	        30.9 ns/op	       4 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}[10s]))-16                                    	32869472	        33.4 ns/op	       4 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                  	32186278	        32.2 ns/op	       4 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(caller)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                   	30648024	        33.4 ns/op	       4 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms[10s]))-16         	40132675	        28.3 ns/op	       1 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"[1m]))-16         	40285348	        28.9 ns/op	       1 B/op	       0 allocs/op
PASS
ok  	github.com/grafana/loki/pkg/chunkenc	61.066s

Working in this branch: https://github.com/cyriltovena/loki/tree/labels-logqlv2

@cyriltovena
Copy link
Contributor

cyriltovena commented Nov 3, 2020

current status:

go test -benchmem -run=^$ -mod=vendor  ./pkg/chunkenc  -bench=BenchmarkBufferedIteratorLabels
goos: darwin
goarch: amd64
pkg: github.com/grafana/loki/pkg/chunkenc
BenchmarkBufferedIteratorLabels/{app="foo"}-16         	33238780	        35.9 ns/op	       0 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"-16         	33595916	        35.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_-16         	27097609	        42.7 ns/op	       4 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms-16         	29798445	        39.6 ns/op	       1 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"-16         	30767776	        37.9 ns/op	       1 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/rate({app="foo"}[1m])-16                                                        	47617257	        24.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}[10s]))-16                                    	45781831	        24.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                  	41351338	        28.0 ns/op	       2 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(caller)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                   	40465006	        28.2 ns/op	       2 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms[10s]))-16         	44210521	        26.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"[1m]))-16         	43935969	        26.5 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	github.com/grafana/loki/pkg/chunkenc	35.506s

diff

benchcmp before.txt after.txt
benchmark                                                                                                                                old ns/op     new ns/op     delta
BenchmarkBufferedIteratorLabels/{app="foo"}-16                                                                                           42.1          35.9          -14.73%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"-16                                                                                  42.3          35.3          -16.55%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_-16                                                                        85.6          42.7          -50.12%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms-16                                                       40.7          39.6          -2.70%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"-16                                  40.3          37.9          -5.96%
BenchmarkBufferedIteratorLabels/rate({app="foo"}[1m])-16                                                                                 30.9          24.3          -21.36%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}[10s]))-16                                                             33.4          24.3          -27.25%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                                           32.2          28.0          -13.04%
BenchmarkBufferedIteratorLabels/sum_by_(caller)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                                            33.4          28.2          -15.57%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms[10s]))-16                         28.3          26.6          -6.01%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"[1m]))-16     28.9          26.5          -8.30%

benchmark                                                                                                                                old allocs     new allocs     delta
BenchmarkBufferedIteratorLabels/{app="foo"}-16                                                                                           0              0              +0.00%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"-16                                                                                  0              0              +0.00%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_-16                                                                        0              0              +0.00%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms-16                                                       0              0              +0.00%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"-16                                  0              0              +0.00%
BenchmarkBufferedIteratorLabels/rate({app="foo"}[1m])-16                                                                                 0              0              +0.00%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}[10s]))-16                                                             0              0              +0.00%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                                           0              0              +0.00%
BenchmarkBufferedIteratorLabels/sum_by_(caller)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                                            0              0              +0.00%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms[10s]))-16                         0              0              +0.00%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"[1m]))-16     0              0              +0.00%

benchmark                                                                                                                                old bytes     new bytes     delta
BenchmarkBufferedIteratorLabels/{app="foo"}-16                                                                                           6             0             -100.00%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"-16                                                                                  6             0             -100.00%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_-16                                                                        32            4             -87.50%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms-16                                                       3             1             -66.67%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"-16                                  3             1             -66.67%
BenchmarkBufferedIteratorLabels/rate({app="foo"}[1m])-16                                                                                 4             0             -100.00%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}[10s]))-16                                                             4             0             -100.00%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                                           4             2             -50.00%
BenchmarkBufferedIteratorLabels/sum_by_(caller)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                                            4             2             -50.00%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms[10s]))-16                         1             0             -100.00%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"[1m]))-16     1             0             -100.00%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants