There doesn't appear to be a way to create an API Gateway, or Gateway per cluster in a federated WAN #300
Comments
|
Thanks for getting back to me about this, it definitely helps explain what's going on. I did try MeshService, but it complained about the type (I will check the error message, but I suspect I need to apply the following CRD: https://github.com/hashicorp/consul-api-gateway/blob/main/config/crd/bases/api-gateway.consul.hashicorp.com_meshservices.yaml). I will investigate this in more detail tomorrow and let you know how I get on. I have two options: one is Single Consul Datacenter in Multiple Kubernetes Clusters (https://www.consul.io/docs/k8s/installation/deployment-configurations/single-dc-multi-k8s) and the other is Federation Between Kubernetes Clusters (https://www.consul.io/docs/k8s/installation/multi-cluster/kubernetes). I have managed to get either option working, with varying degrees of success, for cross-cluster and service mesh communication. Anyway, I will do more testing and update the thread tomorrow. |
Missing CRD would definitely explain not being able to use it. Definitely let us know about anything you manage to get working, and we'll consider proper support for federated services as a feature for our roadmap. |
@mikemorris, I was hoping to have a look at this, but realised that, whatever configuration changes I have made, the cross-cluster service mesh connection through the mesh gateway is now broken for Kafka. I was running Kafka inside the service mesh and it was working. I've tried to roll back my changes but can't get it working again, and I'm finding the issue difficult to debug. Is it worth mentioning here, should I open another ticket, or is there a better place to seek support for the mesh gateway? |
By the way, I checked the CRDs: I had them installed, but for a previous version. Perhaps updating them will fix some of the issues: As for the Kafka problem, I've opened a separate issue as it's something very different: |
Looks like hashicorp/consul-k8s#1344 is tracking the issue currently preventing creation of a Gateway in secondary datacenters in a WAN-federated Consul deployment. |
Thanks @mikemorris, as you can see I've added my comment there as well. I've also fixed the issue I had with implementing Kafka, which now frees me up to do some more testing on the API gateway. |
@mikemorris I've now been able to do some more testing, if I add in
|
More importantly though, is there a way of debugging an HTTPRoute? I've currently only got one route that's working. The second route looks like everything is correct, but when I try to curl the endpoint, it returns a 404 error. I can't see anything in any of the logs to tell me where the error is. |
How you've been doing it so far is correct - first checking the route status field, then controller logs - if something isn't implemented correctly it may be helpful to dump the actual applied Envoy config, but this should be enough to debug most cases (and when it's not, we could likely benefit from contributions improving status messages, logs, or docs). A route is only "applied/in effect" when its
In addition to specifying This is documented in the Routes configuration docs, but should probably be mentioned in MeshService too. |
@codex70 @manobi I recorded a demo yesterday pulling together the 3 related PRs that will be included across the upcoming consul-k8s
CAPIGW.in.Secondary.Datacenter.720p.mp4 |
@nathancoleman I'll try this soon, thank you for sharing. |
@nathancoleman I've tried with consul-k8s (0.49.0) and
This is what it looks like in the Consul UI on "DC2" (AccessorIDs and datacenter names have been redacted): PS: my DC1 is still running consul-k8s v0.48.0, with many federated datacenters connected (31), each on a different version. |
Hi @manobi 👋 PS: any chance you could share your |
Hi @nathancoleman
|
@manobi if you apply that policy to the role analogous to the one I screenshotted, does everything work for you standing up a |
@nathancoleman From the UI it's not working, the browser crashes while loading the policy options. Maybe there are too many roles/policies and the same error happens during token bootstrap?
After that the |
Even after the manual attachment the
I've noticed a similar behaviour with It might not be related to api-gateway but rather to some consul-k8s bug. |
@manobi that would make sense as the possible cause. That scale is the main difference between my temporary setups and your own. I'll be traveling most of this week but will see if I can find out anything once I'm back. |
The @nathancoleman could we maybe implement the same workaround as consul-ecs did in hashicorp/consul-ecs#79 until Consul adds "read your writes" support for an improved |
@mikemorris Is there a way to debug whether the routing has actually been registered? Unlike With
HTTPRoute resource status seems to be ok but it's not working:
status:
  parents:
    - conditions:
        - lastTransitionTime: '2022-10-04T23:04:20Z'
          message: Route accepted.
          observedGeneration: 1
          reason: Accepted
          status: 'True'
          type: Accepted
        - lastTransitionTime: '2022-10-04T23:04:20Z'
          message: ResolvedRefs
          observedGeneration: 1
          reason: ResolvedRefs
          status: 'True'
          type: ResolvedRefs
Upstreams in secondary DC (0):
|
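Editorial sketch of the status check discussed above: a small Python helper that walks `status.parents[].conditions[]` of an HTTPRoute and lists every condition whose status is not 'True'. The function name `failing_conditions` and the sample dict are illustrative assumptions, not part of consul-api-gateway; the dict is the status from the comment above, trimmed to the fields the check reads.

```python
from typing import Any


def failing_conditions(route: dict[str, Any]) -> list[str]:
    """Describe every parent condition whose status is not the string 'True'."""
    problems = []
    for parent in route.get("status", {}).get("parents", []):
        for cond in parent.get("conditions", []):
            if cond.get("status") != "True":
                problems.append(
                    f"{cond.get('type')}: {cond.get('reason')} ({cond.get('message')})"
                )
    return problems


# Hypothetical input: the status block quoted in the comment above.
route = {
    "status": {
        "parents": [
            {
                "conditions": [
                    {"type": "Accepted", "reason": "Accepted",
                     "status": "True", "message": "Route accepted."},
                    {"type": "ResolvedRefs", "reason": "ResolvedRefs",
                     "status": "True", "message": "ResolvedRefs"},
                ]
            }
        ]
    }
}

print(failing_conditions(route) or "all conditions True")
```

The input could come from `kubectl get httproute <name> -o json` parsed with `json.loads`. An empty result only means the controller reported the route as accepted. As this thread shows, the route can still fail to take effect if the controller cannot write the corresponding config entries to Consul, so the controller logs remain the next place to look.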
Hi @manobi , were you able to get this working? Just to clarify, your |
Yes, they are all running in the secondary datacenter, but I have not been able to get this working. Still seeing the following in
How can I force this "mesh:write" permission? |
The gateway deployment is running in secondary datacenter, but there is no service-default or ingress-gateway registered. |
@manobi I'd expect it to be using It makes sense that the config entries aren't registered because the controller isn't able to create them in your setup. I'm not yet sure why this is, and I haven't been able to reproduce it myself. Just to be certain, to replicate your setup, I need consul-k8s v0.48.0 in my primary datacenter and consul-k8s v0.49.0 in my secondary datacenter. Is that accurate? Are you using consul-api-gateway v0.5-dev in both datacenters? |
@nathancoleman The only way I've managed to make it work was by attaching the My current setup is the following one: Primary datacenter:
Secondary datacenter:
|
@manobi here's a writeup of the whole process I went through to replicate the issue, but I'm still seeing everything work. I figure at least this will show what the Kubernetes https://gist.github.com/nathancoleman/076343780c3e0b4c03fb91f9d4f84616 |
@nathancoleman thank you, I'll try to reproduce your steps. The service router is not reading the filters with URLRewrite:
Becomes:
|
@manobi I'm asking around to see if anyone has encountered issues like the role bindings failing to apply at a scale of hundreds of roles/policies. My understanding is that the missing role bindings are the only issue you're seeing at this point (given the fix in #414) and that everything works as expected when you manually apply those bindings. Is that accurate? |
@nathancoleman Accurate. The ACL not found error is not restricted to API gateway, I can see it in other components that eventually reconcile. It might be the problem mentioned by @mikemorris. If I have to rolebind manually, it's not a huge problem. I was more worried while I had no idea what was going on. |
@nathancoleman Will the #414 fix be automatically published to Docker Hub or is it a manual action? Seems unfair to hold the v0.5 release if there are no other issues. |
@manobi you'll see it published to Docker Hub in a few minutes after I merge #416. The merge of #414 itself didn't publish because our tooling identified the CVE referenced in #416. Edit: You can now see an updated set of tags out on https://hub.docker.com/r/hashicorppreview/consul-api-gateway/tags |
Just to confirm that I've got URLRewrite working again with: Thank you @nathancoleman |
@codex70 @manobi I believe this particular issue can be closed now but wanted to run it by you first. Thoughts? The upcoming v0.5.0 release of Consul API Gateway will allow you to run the API gateway controller and create Gateways that route to services within the same datacenter whether that datacenter is a primary or secondary datacenter. |
We should close it. |
Just to confirm I have been able to test this and it is now working following on from the fix for: hashicorp/consul-k8s#1344 |
Overview of the Issue
I don't seem to be able to set up API Gateway in such a way that I can either have access to all mesh services from a single API Gateway, or use an API Gateway per cluster.
Reproduction Steps
Logs
Error when trying to add mesh service from second cluster to API Gateway in first cluster
Error when trying to connect to a second API Gateway in the second datacenter cluster.
Expected behavior
There is a documented solution for setting up API Gateways across federated clusters.
Environment details
Additional Context
I suspect this is a simple case of me not seeing the specific documentation required to set this up correctly, but I'm having a lot of problems getting the API Gateway up and running across multiple clusters.