Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support for Consul Connect #7407

Merged
merged 53 commits into from
Jul 15, 2021
Merged

Conversation

Gufran
Copy link

@Gufran Gufran commented Oct 9, 2020

What does this PR do?

The change set introduces support for Consul Connect enabled services.

Motivation

There is no edge proxy available that can route traffic to a connect enabled service. Consul Connect, despite being a powerful and easy to use service mesh, is useless to a lot of people who are mainly looking to route traffic from internet to private services. A service running inside Connect service mesh can only receive traffic via its sidecar, and sidecar will only communicate with a network peer using mutual TLS. The solution is easy, but haven't been implemented in any form.

Traefik already supports Consul Catalog, it is only a matter of utilizing the certificates for upstream connection wherever applicable and it becomes the perfect edge proxy for connect mesh.

More

  • Added/updated tests
  • Added/updated documentation

Additional Notes

This PR is in progress, I need some help to figure out how to set the TLS configuration on a connection without specifying it in service tags.

Related: #3544

Continues consul connect integration from #6373

Co-authored-by: Florian Apolloner apollo13@users.noreply.github.com

@Gufran
Copy link
Author

Gufran commented Oct 9, 2020

@apollo13 Picked up the work from #6373 in this PR.

@Gufran Gufran marked this pull request as draft October 10, 2020 18:44
@jbdoumenjou jbdoumenjou added this to To review in v2 via automation Oct 12, 2020
@apollo13
Copy link
Contributor

apollo13 commented Nov 8, 2020

What is the current status of the PR? Are you still hitting some road blocks?

@Gufran
Copy link
Author

Gufran commented Nov 12, 2020

@apollo13 yes, there is some problem still related to mTLS. I can't find anything obviously wrong but mTLS doesn't work.
I could be related to how Traefik presents the certificates to upstream service but I'm not sure about that.

If any maintainer is reading this, could you please help me find the code where mTLS handshake and verification happens?
Is there any mechanism to override the Dial function so that I can use that to control the way certificate information is presented and peer is verified? Or if I can override *tls.Config that'd work as well.

@apollo13
Copy link
Contributor

apollo13 commented Nov 12, 2020

Ah that's to bad. @ldez do you think that anyone from the traefik team could provide some pointers here? Having traefik talk to the consul service mesh would be a massive win security wise.

@SantoDE
Copy link
Collaborator

SantoDE commented Jan 11, 2021

Hey @Gufran,

we'd like to help you get started on that.

Could you maybe join our office hours tommorow? That would be quicker: https://info.traefik.io/traefik-developer-office-hours

@Scandiravian
Copy link

Scandiravian commented Jan 18, 2021

Hey @Gufran and @SantoDE

I think I've fixed the issue and gotten TLS working for the consul connect services (confirmed with WireShark).

I'll be happy to finish up the remaining items in the checklist (test and docs), but I'm not sure what the proper etiquette for continuing someone else's work is.

Is it acceptable if I fork this PR, make the required changes and create a new PR from my fork?

@ldez
Copy link
Contributor

ldez commented Jan 18, 2021

We recommend creating a PR from your fork to the fork of @Gufran. (please don't open a PR on the main repo)

@apollo13
Copy link
Contributor

Hi @SantoDE , we got the PR into a working state and would need some specific feedback. I'll try to describe the issue here and if that doesn't end well I'd like to join the next community session.

The problem that we currently face is that consul connect certificates do not have a valid DNS/IP SAN. One bad way we can get this working is setting InsecureSkipVerify -- which is bad because it will skip CA validation completely. A better way is to provide our own validation routine which I added to the ServersTransport: https://github.com/traefik/traefik/pull/7407/files#diff-8670b1903f65475222633e1a576171f031eb1252d8591884931de2995a44fd37 The roundtripper then uses this information to create the proper TLS config. Now we are wondering if such an approach is okay for you or what would be an better approach (I realize that I took a sledge-hammer approach just to get it working and have something to explain).

The tl;dr: We would need a supported way in traefik to customize the certificate validation process aside from the options that ServersTransport currently has to offer.

Looking forward to hear back from you.

@jbdoumenjou
Copy link
Member

We are glad to see that this PR is moving forward. :)

First and foremost, could you rebase the PR?

This topic is far away from being trivial, and we are actively discussing the best way to address it.
One can argue that we need a global mechanism to fine-tune the TLS certificate validation from the serversTranport configuration.
However, if we decide to expose it, we don't know what would be the more efficient/elegant way to do so.

We can also let this configuration not be exposed and apply it only for consul connect and wait for another need to expose it.
As a side note, we would prefer to use the VerifyPeerCertificate on TLS configuration instead of VerifyConnection, wdyt?

We are also wondering why the certificate validation does not check the spiffe id? (in the actual state, it only check the certificate chain)

@apollo13
Copy link
Contributor

@Gufran Can you add me as maintainer to your repo?

First and foremost, could you rebase the PR?

Will do ASAP

This topic is far away from being trivial, and we are actively discussing the best way to address it.

Yes, and I appreciate it! I know that the PR is not perfect, but I thought code would illustrate our issues better…

As a side note, we would prefer to use the VerifyPeerCertificate on TLS configuration instead of VerifyConnection, wdyt?

I just know enough Go to be dangerous. I opted for VerifyConnection because I thought it allows for more controls as you have access to more than the certs. In the end I do not know what is better.

We are also wondering why the certificate validation does not check the spiffe id? (in the actual state, it only check the certificate chain)

Because a) I wanted to have something minimal to show, b) I have yet to investigate how spiffe works -- but I plan to add this (though I do not think it should have an impact on the general discussion about how to allow more fine grained cert validation).

@Scandiravian
Copy link

Scandiravian commented Jan 22, 2021

First and foremost, could you rebase the PR?

This of course needs to be done. I'll add a checklist at the end of this post to sum of the tasks, that I think still need to be done. Please let me know if I missed anything :)

As a side note, we would prefer to use the VerifyPeerCertificate on TLS configuration instead of VerifyConnection, wdyt?

I don't have a particular preference and I think @apollo13 is better equipped to make a decision on this, but I'd like to hear what makes you prefer VerifyPeerCertificate over VerifyConnection? :)

TODO:

  • Rebase
  • Update docs
  • Add tests for Connect
  • Validate spiffe id
  • maybe switch validation from VerifyConnection to VerifyPeerCertificate (discussion needed)

I'd like to sign up for docs and tests, since I believe @apollo13 has a better understanding of TLS and would be more likely to make good calls in that area. You okay with that, @appollo13?

@apollo13
Copy link
Contributor

You okay with that, @apollo13?

Sounds perfect, I hate docs & tests anyways :)

@apollo13
Copy link
Contributor

apollo13 commented Jan 22, 2021

Hi @Gufran, I rebased & updated the PR to include SPIFFE etc. I squashed the commits down into one single commit: apollo13@bb45ec3 -- it would be great if you could force push this one into your branch.

EDIT:// Pushed also a cleanup commit; this is the branch https://github.com/apollo13/traefik/tree/consul-connect-backend-rebased

@apollo13
Copy link
Contributor

Okay @jbdoumenjou My branch https://github.com/apollo13/traefik/tree/consul-connect-backend-rebased includes the changes for SPIFFE verification and uses VerifyPeerCertificate instead of VerifyConnection. Personally I prefer VerifyConnection because then you get the already parsed certificates -- but in the end I am fine with whatever you decide (also the reason why it is in an extra commit for now).

I hope @​Gufran wil push my changes into this PR soonish.

@RavensburgOP 1,4,5 of your TODOs are done :)

@Scandiravian
Copy link

Hey @apollo13 :)

I'll hold out on updating the todo until it's in the @Gufran's fork, so the list matches what is in the PR

I'll take a look at docs and tests later today or tomorrow

@apollo13
Copy link
Contributor

apollo13 commented Jan 23, 2021 via email

@Gufran
Copy link
Author

Gufran commented Jan 25, 2021

Hi @apollo13, I've added you as a collaborator on my fork, you should be able to push to my branch now.
Please feel free to take it over, my contribution is now limited to slowing everyone down anyways.

@apollo13
Copy link
Contributor

Thank you very much @Gufran. No worries about not having time, you already put a lot in and layed the ground work for which I am very much grateful. I wouldn't be able to work on this if it weren't for you!

@apollo13
Copy link
Contributor

@RavensburgOP I added some code so that trafik.connect=True/False controls whether normal services are fetched or connect enabled services (Instead of https://www.consul.io/api-docs/catalog#list-nodes-for-service I am using https://www.consul.io/api-docs/catalog#list-nodes-for-connect-capable-service ). This way your default rule will work again. You either need to set TRAEFIK_PROVIDERS_CONSULCATALOG_CONNECTBYDEFAULT=true or explicitly enable traefik.connect=true on a service to fetch the connect capable endpoints.

Copy link
Contributor

@ldez ldez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@traefiker traefiker merged commit 7e43e56 into traefik:v2.5 Jul 15, 2021
v2 automation moved this from To review to Done Jul 15, 2021
@apollo13
Copy link
Contributor

Lovely, many thanks to the whole team for helping pushing this through. This will be "the one" feature of traefik 2.5 for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
v2
Done
Development

Successfully merging this pull request may close these issues.

None yet