-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce a more scalable index-gateway API. #5892
Conversation
Because of the storage refactoring (grafana#5833) we can now introduce a better index-gateway API that fits more our index interface. Such as : ```proto rpc GetChunkRef(GetChunkRefRequest) returns (GetChunkRefResponse) {}; rpc LabelNamesForMetricName(LabelNamesForMetricNameRequest) returns (LabelResponse) {}; rpc LabelValuesForMetricName(LabelValuesForMetricNameRequest) returns (LabelResponse) {}; ``` This will avoid sending thousands of index queries to the index-gateway but instead them just a single request. The index caching, parsing and filtering is now happening all on the index-gateway side. Loki queriers will first check if the new API exists before using it, this way update can be done transparently. However the check happens only on startup, so if you want to start using the new API you need to restart queriers after fully rolling out index-gateways. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent 🚀
pkg/storage/store.go
Outdated
hasRefsAPI, err := shipper.HasGetRefsAPI(s.cfg.BoltDBShipperConfig.IndexGatewayClientConfig) | ||
if err != nil { | ||
return nil, nil, nil, err | ||
} | ||
if hasRefsAPI { | ||
gw, err := shipper.NewGatewayClient(s.cfg.BoltDBShipperConfig.IndexGatewayClientConfig, indexClientReg) | ||
if err != nil { | ||
return nil, nil, nil, err | ||
} | ||
index = series.NewIndexGatewayClientStore(gw, seriesdIndex) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some doubts that a start-up check can really cover all the bases here. What happens if:
- The cluster users decides to roll-back the index gateways.
- There is a mix of index gateways run by the user.
I would think an approach handling the error on every request should be reasonable. It costs an extra hop, but that is not a huge cost given that the cluster should be in that state only for a very short period:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this is a great suggestion, I'll merge your commit if you don't mind and add some tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was playing around with a test like this maybe that can also be useful: c50f2df
(should have really committed in the first place)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on it.
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> Co-authored-by: Christian Simon <christian.simon@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I get this right, if the RPC methods are not implemented yet, we cascade to the index store? Nice!
Yes so we don't need an option or to rollout in order. The index-store uses the old index-gateway queries API. |
This kind of handling is a huge plus for rolling out via the operator. Anything we can go to make rollouts order-agnostic is a huge win for keeping the operator complexity as low as possible |
pkg/storage/store.go
Outdated
) | ||
if s.cfg.BoltDBShipperConfig.Mode == shipper.ModeReadOnly && s.cfg.BoltDBShipperConfig.IndexGatewayClientConfig.Address != "" { | ||
// inject the index-gateway client into the index store | ||
gw, err := shipper.NewGatewayClient(s.cfg.BoltDBShipperConfig.IndexGatewayClientConfig, indexClientReg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't idx
already be a gateway client instance created by NewIndexClient
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but a different one. This one wrap the index-store, the other one call the old API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, but that part is done in IndexGatewayClientStore
, right?
you can type cast idx
to *shipper.GatewayClient
and pass it to NewIndexGatewayClientStore
like below:
series.NewIndexGatewayClientStore(idx.(*shipper.GatewayClient), seriesdIndex)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might work but this is more risky.
In non ring mode I think we don't have to worry about it since we point to a service, right? With the ring mode, it is already taken care of here. Sorry for the trouble, it seems you need to rebase your PR since I merged #5358 |
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
We decided to go ahead with doing a call each time, since this is a temporary cluster state. |
it's ok, I was able to merge, can you verify what I did is fine ? |
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> Co-authored-by: Christian Simon <christian.simon@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few comments but LGTM!
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
This fixes a bug in the index gateway when querying values for a label. Since the gRPC handler for LabelValuesForMetricName in the index gateway allows empty matchers in the LabelValuesForMetricNameRequest, we need to check if the matchers string is an empty matcher (`{}`) before we parse the string. This bug was introduced with the index gateway api refactoring in #5892 Fixes: #5965 Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
This fixes a bug in the index gateway when querying values for a label. Since the gRPC handler for LabelValuesForMetricName in the index gateway allows empty matchers in the LabelValuesForMetricNameRequest, we need to check if the matchers string is an empty matcher (`{}`) before we parse the string. This bug was introduced with the index gateway api refactoring in #5892 Fixes: #5965 Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
This fixes a bug in the index gateway when querying values for a label. Since the gRPC handler for LabelValuesForMetricName in the index gateway allows empty matchers in the LabelValuesForMetricNameRequest, we need to check if the matchers string is an empty matcher (`{}`) before we parse the string. This bug was introduced with the index gateway api refactoring in #5892 Fixes: #5965 Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
This fixes a bug in the index gateway when querying values for a label. Since the gRPC handler for LabelValuesForMetricName in the index gateway allows empty matchers in the LabelValuesForMetricNameRequest, we need to check if the matchers string is an empty matcher (`{}`) before we parse the string. This bug was introduced with the index gateway api refactoring in #5892 Fixes: #5965 Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Because of the storage refactoring (#5833) we can now introduce a better
index-gateway API that fits more our index interface.
Such as :
This will avoid sending thousands of index queries to the index-gateway but instead them just a single request.
The index caching, parsing and filtering is now happening all on the index-gateway side.
Loki queriers will first check if the new API exists before using it, this way update can be done transparently.
However the check happens only on startup, so if you want to start using the new API you need to restart queriers after fully rolling out index-gateways.
CC @sandeepsukhani @simonswine @slim-bean
Note: Now that I think about it if one index gateway is rolled out the querier might think they all are. So I might want to test ALL IPs, instead of a random one, assuming I can do that.
Signed-off-by: Cyril Tovena cyril.tovena@gmail.com
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Checklist
CHANGELOG.md
about the changes.