Introduce a more scalable index-gateway API. #5892

cyriltovena · 2022-04-12T15:24:20Z

Because of the storage refactoring (#5833), we can now introduce an improved
index-gateway API that fits our index interface more closely.

Such as:

    rpc GetChunkRef(GetChunkRefRequest) returns (GetChunkRefResponse) {};
    rpc LabelNamesForMetricName(LabelNamesForMetricNameRequest) returns (LabelResponse) {};
    rpc LabelValuesForMetricName(LabelValuesForMetricNameRequest) returns (LabelResponse) {};

Instead of sending thousands of index queries to the index-gateway, queriers now send it a single request.
Index caching, parsing and filtering now all happen on the index-gateway side.
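To make the scalability argument concrete, here is a minimal Go sketch built around a hypothetical in-memory `fakeGateway` that counts RPCs. `QueryIndex`, `GetChunkRef(keep)`, `ChunkRef` and `hasApp` are illustrative names, not Loki's actual types or signatures, and a Go closure stands in for the matchers a real request would carry. The old style issues one call per index lookup and filters on the querier; the new style lets the gateway filter and answer in one call.

```go
package main

import "strings"

// ChunkRef is a hypothetical, simplified chunk reference.
type ChunkRef struct {
	Series string
	ID     string
}

// fakeGateway stands in for a remote index-gateway and counts the RPCs it receives.
type fakeGateway struct {
	index map[string][]ChunkRef // series -> chunk refs (hypothetical layout)
	calls int
}

// QueryIndex models the old, fine-grained API: one RPC per index lookup.
func (g *fakeGateway) QueryIndex(series string) []ChunkRef {
	g.calls++
	return g.index[series]
}

// GetChunkRef models the new, coarse-grained API: the gateway resolves and
// filters everything itself and answers in a single RPC.
func (g *fakeGateway) GetChunkRef(keep func(string) bool) []ChunkRef {
	g.calls++
	var out []ChunkRef
	for series, refs := range g.index {
		if keep(series) {
			out = append(out, refs...)
		}
	}
	return out
}

// hasApp builds a hypothetical filter matching series with the given app label.
func hasApp(app string) func(string) bool {
	return func(s string) bool { return strings.Contains(s, `app="`+app+`"`) }
}

// oldStyle issues one QueryIndex call per candidate series and filters on the querier.
func oldStyle(g *fakeGateway, candidates []string, keep func(string) bool) []ChunkRef {
	var out []ChunkRef
	for _, s := range candidates {
		for _, ref := range g.QueryIndex(s) {
			if keep(ref.Series) {
				out = append(out, ref)
			}
		}
	}
	return out
}

// newStyle pushes the filtering to the gateway: one call in total.
func newStyle(g *fakeGateway, keep func(string) bool) []ChunkRef {
	return g.GetChunkRef(keep)
}
```

With three candidate series, the old style costs three round-trips while the new style costs one, and the gap grows with the number of index lookups a query needs.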

Loki queriers first check whether the new API exists before using it, so the update can be rolled out transparently.
However, the check happens only on startup, so to start using the new API you need to restart the queriers after the index-gateways are fully rolled out.

CC @sandeepsukhani @simonswine @slim-bean

Note: now that I think about it, if only one index gateway has been rolled out, the querier might think they all have. So I might want to test ALL IPs instead of a random one, assuming I can do that.
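The note above suggests probing every index-gateway instead of a random one. A minimal Go sketch of that idea, assuming a hypothetical `probe` callback that calls one of the new RPCs on a single address and a hypothetical sentinel `errUnimplemented` standing in for the gRPC Unimplemented status: the new API is enabled only when every gateway answers successfully. Like any startup check, the result only reflects the gateways at probe time.

```go
package main

import "errors"

// errUnimplemented is a hypothetical stand-in for the gRPC "Unimplemented"
// status an old index-gateway would return for the new RPCs.
var errUnimplemented = errors.New("unimplemented")

// probeFunc is a hypothetical probe that calls the new API on one gateway
// address and returns the resulting error (nil if the RPC is supported).
type probeFunc func(addr string) error

// allSupportNewAPI probes every gateway address, not just a random one, and
// reports true only if all of them support the new API. Errors other than
// errUnimplemented are returned to the caller.
func allSupportNewAPI(addrs []string, probe probeFunc) (bool, error) {
	if len(addrs) == 0 {
		return false, nil
	}
	for _, addr := range addrs {
		err := probe(addr)
		if errors.Is(err, errUnimplemented) {
			return false, nil // at least one gateway is still on the old version
		}
		if err != nil {
			return false, err
		}
	}
	return true, nil
}
```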

Signed-off-by: Cyril Tovena cyril.tovena@gmail.com

periklis

Excellent 🚀

simonswine · 2022-04-12T16:51:36Z

pkg/storage/store.go

+		hasRefsAPI, err := shipper.HasGetRefsAPI(s.cfg.BoltDBShipperConfig.IndexGatewayClientConfig)
+		if err != nil {
+			return nil, nil, nil, err
+		}
+		if hasRefsAPI {
+			gw, err := shipper.NewGatewayClient(s.cfg.BoltDBShipperConfig.IndexGatewayClientConfig, indexClientReg)
+			if err != nil {
+				return nil, nil, nil, err
+			}
+			index = series.NewIndexGatewayClientStore(gw, seriesdIndex)
+		}


I have some doubts that a start-up check can really cover all the bases here. What happens if:

The cluster user decides to roll back the index gateways.

The user runs a mix of old and new index gateways.

I think handling the error on every request would be a reasonable approach. It costs an extra hop, but that is not a huge cost, given that the cluster should only be in that state for a very short period:

a0118e0

Thanks, this is a great suggestion. I'll merge your commit, if you don't mind, and add some tests.

I was playing around with a test like this; maybe it can also be useful: c50f2df

(I really should have committed it in the first place.)

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> Co-authored-by: Christian Simon <christian.simon@grafana.com>

periklis

Do I get this right, if the RPC methods are not implemented yet, we cascade to the index store? Nice!

cyriltovena · 2022-04-13T07:23:42Z

Yes, so we don't need a config option or an ordered rollout. The index store uses the old index-gateway query API.

periklis · 2022-04-13T07:25:58Z

This kind of handling is a huge plus for rolling out via the operator. Anything we can do to make rollouts order-agnostic is a huge win for keeping the operator's complexity as low as possible.

sandeepsukhani · 2022-04-13T08:23:08Z

pkg/storage/store.go

+	)
+	if s.cfg.BoltDBShipperConfig.Mode == shipper.ModeReadOnly && s.cfg.BoltDBShipperConfig.IndexGatewayClientConfig.Address != "" {
+		// inject the index-gateway client into the index store
+		gw, err := shipper.NewGatewayClient(s.cfg.BoltDBShipperConfig.IndexGatewayClientConfig, indexClientReg)


Wouldn't idx already be a gateway client instance created by NewIndexClient here?

Yes, but a different one. This one wraps the index store; the other one calls the old API.

Yeah, but that part is done in IndexGatewayClientStore, right?
You can type-cast idx to *shipper.GatewayClient and pass it to NewIndexGatewayClientStore like below:

    series.NewIndexGatewayClientStore(idx.(*shipper.GatewayClient), seriesdIndex)

It might work, but it is riskier.
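An unchecked type assertion such as `idx.(*shipper.GatewayClient)` panics at runtime if `idx` holds any other dynamic type, which is one plausible reason the cast is riskier. The minimal Go sketch below uses hypothetical stand-ins (`IndexClient`, `GatewayClient`, and a hypothetical `cachingClient` wrapper; none of these are Loki's real definitions) to show how the comma-ok form turns that panic into a value the caller can check.

```go
package main

// IndexClient is a hypothetical stand-in for the index client interface.
type IndexClient interface {
	QueryPages(query string) error
}

// GatewayClient is a hypothetical stand-in for the concrete gateway client.
type GatewayClient struct{}

func (*GatewayClient) QueryPages(string) error { return nil }

// cachingClient is a hypothetical wrapper; wrapping hides the concrete type,
// so a plain idx.(*GatewayClient) on it would panic.
type cachingClient struct{ next IndexClient }

func (c cachingClient) QueryPages(q string) error { return c.next.QueryPages(q) }

// asGatewayClient uses the comma-ok form, so a wrapped or different client
// yields (nil, false) instead of a runtime panic.
func asGatewayClient(idx IndexClient) (*GatewayClient, bool) {
	gw, ok := idx.(*GatewayClient)
	return gw, ok
}
```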

sandeepsukhani · 2022-04-13T08:28:03Z

Note: Now that I think about it if one index gateway is rolled out the querier might think they all are. So I might want to test ALL IPs, instead of a random one, assuming I can do that.

In non-ring mode, I think we don't have to worry about it, since we point to a service, right? In ring mode, it is already taken care of here.

Sorry for the trouble; it seems you need to rebase your PR since I merged #5358.

cyriltovena · 2022-04-13T08:56:13Z

Regarding testing all IPs instead of a random one: we decided to go ahead with trying the new API on every call instead, since this is only a temporary cluster state.

cyriltovena · 2022-04-13T09:03:17Z

It's OK, I was able to merge. Can you verify that what I did is fine?

simonswine

LGTM

DylanGuedes

A few comments, but LGTM!

This fixes a bug in the index gateway when querying values for a label. Since the gRPC handler for LabelValuesForMetricName in the index gateway allows empty matchers in the LabelValuesForMetricNameRequest, we need to check whether the matchers string is an empty matcher (`{}`) before we parse it. This bug was introduced with the index gateway API refactoring in #5892.

Fixes: #5965

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
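The guard described in this fix can be sketched as follows. The tiny equality-only `parseMatchers` here is a hypothetical stand-in for Loki's real selector parser; as the fix implies for the real parser, it rejects `{}`, so the hypothetical `matchersFromRequest` checks for an empty or `{}` string first and treats it as "no matchers".

```go
package main

import (
	"fmt"
	"strings"
)

// Matcher is a hypothetical, equality-only label matcher.
type Matcher struct {
	Name, Value string
}

// parseMatchers is a deliberately tiny parser for selectors of the form
// {a="b", c="d"}. It rejects a selector with no matchers; it is not Loki's parser.
func parseMatchers(s string) ([]Matcher, error) {
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, "{") || !strings.HasSuffix(s, "}") {
		return nil, fmt.Errorf("invalid selector %q", s)
	}
	body := strings.TrimSpace(s[1 : len(s)-1])
	if body == "" {
		return nil, fmt.Errorf("selector %q has no matchers", s)
	}
	var out []Matcher
	for _, part := range strings.Split(body, ",") {
		name, value, ok := strings.Cut(strings.TrimSpace(part), "=")
		if !ok || strings.TrimSpace(name) == "" {
			return nil, fmt.Errorf("invalid matcher %q", part)
		}
		out = append(out, Matcher{
			Name:  strings.TrimSpace(name),
			Value: strings.Trim(strings.TrimSpace(value), `"`),
		})
	}
	return out, nil
}

// matchersFromRequest guards the parse: an empty string or the literal "{}"
// means "no matchers" and is returned as nil without calling the parser.
func matchersFromRequest(raw string) ([]Matcher, error) {
	if trimmed := strings.TrimSpace(raw); trimmed == "" || trimmed == "{}" {
		return nil, nil
	}
	return parseMatchers(raw)
}
```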

cyriltovena requested a review from a team as a code owner April 12, 2022 15:24

pull-request-size bot added the size/XXL label Apr 12, 2022

periklis reviewed Apr 12, 2022

simonswine reviewed Apr 12, 2022

Attempt to use the new API on every call.

fc1d128

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> Co-authored-by: Christian Simon <christian.simon@grafana.com>

periklis reviewed Apr 13, 2022

sandeepsukhani reviewed Apr 13, 2022

Merge remote-tracking branch 'upstream/main' into idxgwapi

c1c7a2b

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

Add test for the GRPC fallback.

e6d39c6

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> Co-authored-by: Christian Simon <christian.simon@grafana.com>

sandeepsukhani approved these changes Apr 13, 2022

lint

eb9f478

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

simonswine approved these changes Apr 13, 2022

pkg/storage/stores/series/series_index_gateway_store.go (outdated)

review feedback

69d5622

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

DylanGuedes reviewed Apr 13, 2022

pkg/storage/stores/shipper/indexgateway/indexgatewaypb/gateway.proto

DylanGuedes reviewed Apr 13, 2022

pkg/storage/stores/shipper/indexgateway/gateway.go (outdated)

DylanGuedes reviewed Apr 13, 2022

pkg/storage/stores/shipper/indexgateway/gateway.go (outdated)

Fixes the parameters

aef4503

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

DylanGuedes approved these changes Apr 13, 2022

pkg/storage/stores/shipper/gateway_client.go (outdated)

pkg/loki/modules.go

sandeepsukhani reviewed Apr 13, 2022

pkg/storage/store.go (outdated)

Review feedback

fe41a25

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

cyriltovena merged commit f691e1b into grafana:main Apr 13, 2022

simonswine mentioned this pull request Apr 20, 2022

Index Gateway: Unable to handle GetChunkRefs/LabelValuesForMetricName with empty matchers #5965

Closed

chaudum mentioned this pull request Apr 21, 2022

Do not parse string of empty matchers #5980

Merged

kavirajk mentioned this pull request Apr 26, 2022

Disable calling new index-gateway client's API. #6025

Merged
