Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a more scalable index-gateway API. #5892

Merged
merged 8 commits into from
Apr 13, 2022

Conversation

cyriltovena
Copy link
Contributor

Because of the storage refactoring (#5833) we can now introduce a better
index-gateway API that fits more our index interface.

Such as :

    rpc GetChunkRef(GetChunkRefRequest) returns (GetChunkRefResponse) {};
    rpc LabelNamesForMetricName(LabelNamesForMetricNameRequest) returns (LabelResponse)  {};
    rpc LabelValuesForMetricName(LabelValuesForMetricNameRequest) returns (LabelResponse) {};

This will avoid sending thousands of index queries to the index-gateway but instead them just a single request.
The index caching, parsing and filtering is now happening all on the index-gateway side.

Loki queriers will first check if the new API exists before using it, this way update can be done transparently.
However the check happens only on startup, so if you want to start using the new API you need to restart queriers after fully rolling out index-gateways.

CC @sandeepsukhani @simonswine @slim-bean

Note: Now that I think about it if one index gateway is rolled out the querier might think they all are. So I might want to test ALL IPs, instead of a random one, assuming I can do that.

Signed-off-by: Cyril Tovena cyril.tovena@gmail.com

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

  • Documentation added
  • Tests updated
  • Add an entry in the CHANGELOG.md about the changes.

Because of the storage refactoring (grafana#5833) we can now introduce a better
index-gateway API that fits more our index interface.

Such as :

```proto
    rpc GetChunkRef(GetChunkRefRequest) returns (GetChunkRefResponse) {};
    rpc LabelNamesForMetricName(LabelNamesForMetricNameRequest) returns (LabelResponse)  {};
    rpc LabelValuesForMetricName(LabelValuesForMetricNameRequest) returns (LabelResponse) {};
```

This will avoid sending thousands of index queries to the index-gateway but instead them just a single request.
The index caching, parsing and filtering is now happening all on the index-gateway side.

Loki queriers will first check if the new API exists before using it, this way update can be done transparently.
However the check happens only on startup, so if you want to start using the new API you need to restart queriers after fully rolling out index-gateways.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
@cyriltovena cyriltovena requested a review from a team as a code owner April 12, 2022 15:24
Copy link
Collaborator

@periklis periklis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent 🚀

Comment on lines 208 to 218
hasRefsAPI, err := shipper.HasGetRefsAPI(s.cfg.BoltDBShipperConfig.IndexGatewayClientConfig)
if err != nil {
return nil, nil, nil, err
}
if hasRefsAPI {
gw, err := shipper.NewGatewayClient(s.cfg.BoltDBShipperConfig.IndexGatewayClientConfig, indexClientReg)
if err != nil {
return nil, nil, nil, err
}
index = series.NewIndexGatewayClientStore(gw, seriesdIndex)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some doubts that a start-up check can really cover all the bases here. What happens if:

  • The cluster users decides to roll-back the index gateways.
  • There is a mix of index gateways run by the user.

I would think an approach handling the error on every request should be reasonable. It costs an extra hop, but that is not a huge cost given that the cluster should be in that state only for a very short period:

a0118e0

Copy link
Contributor Author

@cyriltovena cyriltovena Apr 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this is a great suggestion, I'll merge your commit if you don't mind and add some tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was playing around with a test like this maybe that can also be useful: c50f2df

(should have really committed in the first place)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on it.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Co-authored-by: Christian Simon <christian.simon@grafana.com>
Copy link
Collaborator

@periklis periklis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I get this right, if the RPC methods are not implemented yet, we cascade to the index store? Nice!

@cyriltovena
Copy link
Contributor Author

Do I get this right, if the RPC methods are not implemented yet, we cascade to the index store? Nice!

Yes so we don't need an option or to rollout in order. The index-store uses the old index-gateway queries API.

@periklis
Copy link
Collaborator

Do I get this right, if the RPC methods are not implemented yet, we cascade to the index store? Nice!

Yes so we don't need an option or to rollout in order. The index-store uses the old index-gateway queries API.

This kind of handling is a huge plus for rolling out via the operator. Anything we can go to make rollouts order-agnostic is a huge win for keeping the operator complexity as low as possible

)
if s.cfg.BoltDBShipperConfig.Mode == shipper.ModeReadOnly && s.cfg.BoltDBShipperConfig.IndexGatewayClientConfig.Address != "" {
// inject the index-gateway client into the index store
gw, err := shipper.NewGatewayClient(s.cfg.BoltDBShipperConfig.IndexGatewayClientConfig, indexClientReg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't idx already be a gateway client instance created by NewIndexClient here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but a different one. This one wrap the index-store, the other one call the old API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, but that part is done in IndexGatewayClientStore, right?
you can type cast idx to *shipper.GatewayClient and pass it to NewIndexGatewayClientStore like below:
series.NewIndexGatewayClientStore(idx.(*shipper.GatewayClient), seriesdIndex)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might work but this is more risky.

@sandeepsukhani
Copy link
Contributor

Note: Now that I think about it if one index gateway is rolled out the querier might think they all are. So I might want to test ALL IPs, instead of a random one, assuming I can do that.

In non ring mode I think we don't have to worry about it since we point to a service, right? With the ring mode, it is already taken care of here.

Sorry for the trouble, it seems you need to rebase your PR since I merged #5358

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
@cyriltovena
Copy link
Contributor Author

ow that I think about it if one index gateway is rolled out the querier might think they all are. So I might want to test ALL IPs, instead of a random one, assuming I

We decided to go ahead with doing a call each time, since this is a temporary cluster state.

@cyriltovena
Copy link
Contributor Author

it's ok, I was able to merge, can you verify what I did is fine ?

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Co-authored-by: Christian Simon <christian.simon@grafana.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Copy link
Contributor

@simonswine simonswine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

pkg/storage/stores/series/series_index_gateway_store.go Outdated Show resolved Hide resolved
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Copy link
Contributor

@DylanGuedes DylanGuedes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few comments but LGTM!

pkg/storage/stores/shipper/gateway_client.go Outdated Show resolved Hide resolved
pkg/loki/modules.go Show resolved Hide resolved
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
@cyriltovena cyriltovena merged commit f691e1b into grafana:main Apr 13, 2022
chaudum added a commit that referenced this pull request Apr 21, 2022
This fixes a bug in the index gateway when querying values for a label.
Since the gRPC handler for LabelValuesForMetricName in the index gateway
allows empty matchers in the LabelValuesForMetricNameRequest, we need to
check if the matchers string is an empty matcher (`{}`) before we parse
the string.

This bug was introduced with the index gateway api refactoring in #5892

Fixes: #5965

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
kavirajk pushed a commit that referenced this pull request Apr 21, 2022
This fixes a bug in the index gateway when querying values for a label.
Since the gRPC handler for LabelValuesForMetricName in the index gateway
allows empty matchers in the LabelValuesForMetricNameRequest, we need to
check if the matchers string is an empty matcher (`{}`) before we parse
the string.

This bug was introduced with the index gateway api refactoring in #5892

Fixes: #5965

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
chaudum added a commit that referenced this pull request Apr 21, 2022
This fixes a bug in the index gateway when querying values for a label.
Since the gRPC handler for LabelValuesForMetricName in the index gateway
allows empty matchers in the LabelValuesForMetricNameRequest, we need to
check if the matchers string is an empty matcher (`{}`) before we parse
the string.

This bug was introduced with the index gateway api refactoring in #5892

Fixes: #5965

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
chaudum added a commit that referenced this pull request Apr 21, 2022
This fixes a bug in the index gateway when querying values for a label.
Since the gRPC handler for LabelValuesForMetricName in the index gateway
allows empty matchers in the LabelValuesForMetricNameRequest, we need to
check if the matchers string is an empty matcher (`{}`) before we parse
the string.

This bug was introduced with the index gateway api refactoring in #5892

Fixes: #5965

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants