Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting setting LocalAddr in peer communication #17068

Open
ahrtr opened this issue Dec 5, 2023 · 13 comments
Open

Supporting setting LocalAddr in peer communication #17068

ahrtr opened this issue Dec 5, 2023 · 13 comments

Comments

@ahrtr
Copy link
Member

ahrtr commented Dec 5, 2023

What would you like to be added?

Currently in peer communication, the local address is automatically chosen by the golang standard lib,

t := &http.Transport{
Proxy: http.ProxyFromEnvironment,
DialContext: (&net.Dialer{
Timeout: dialtimeoutd,
// value taken from http.DefaultTransport
KeepAlive: 30 * time.Second,
}).DialContext,
// value taken from http.DefaultTransport
TLSHandshakeTimeout: 10 * time.Second,
TLSClientConfig: cfg,
}

We should support setting a LocalAddr, which defaults to the advertise Peer address if enabled. It should be disabled by default to keep the existing behavior unless users intentionally enable it.

Why is this needed?

When accepting a connection from a peer, each etcd member checks the peer's remote address against the Subject Alternative Name (SAN) field in the peer's certificate, and reject the connection if they don't match. Please read https://etcd.io/docs/v3.5/op-guide/security/#notes-for-tls-authentication.

See also

func checkCertSAN(ctx context.Context, cert *x509.Certificate, remoteAddr string) error {
if len(cert.IPAddresses) == 0 && len(cert.DNSNames) == 0 {
return nil
}
h, _, herr := net.SplitHostPort(remoteAddr)
if herr != nil {
return herr
}
if len(cert.IPAddresses) > 0 {
cerr := cert.VerifyHostname(h)
if cerr == nil {
return nil
}
if len(cert.DNSNames) == 0 {
return cerr
}
}
if len(cert.DNSNames) > 0 {
ok, err := isHostInDNS(ctx, h, cert.DNSNames)
if ok {
return nil
}
errStr := ""
if err != nil {
errStr = " (" + err.Error() + ")"
}
return fmt.Errorf("tls: %q does not match any of DNSNames %q"+errStr, h, cert.DNSNames)
}
return nil
}

Sometimes the peer may has multiple IP addresses, even including dynamic (i.e. floating) IP addresses. Probably only part of the IP addresses are valid against the SAN field in the certificate. But if an IP address which doesn't match the SAN in certificate is chosen by the golang standard lib, then it will fail to communicate with other etcd members.

@highpon
Copy link
Contributor

highpon commented Dec 5, 2023

@ahrtr
Would like to work on this!

@ahrtr
Copy link
Member Author

ahrtr commented Dec 6, 2023

Thanks @highpon .

ping @serathius @jmhbnz @wenjiaswe @fuweid for feedback.

@ahrtr
Copy link
Member Author

ahrtr commented Dec 18, 2023

cc @aojea for opinion as well

@aojea
Copy link

aojea commented Dec 18, 2023

cc @aojea for opinion as well

I can see that there are multiple flags that declare addresses identifying the client, should you not use one of those since those are known to be reachable?

@ahrtr
Copy link
Member Author

ahrtr commented Dec 18, 2023

cc @aojea for opinion as well

I can see that there are multiple flags that declare addresses identifying the client, should you not use one of those since those are known to be reachable?

Thanks for the feedback. Yes, I was planning to reuse the --initial-advertise-peer-urls if users enable this feature (disabled by default to keep the existing behaviour). I am not sure whether it's a best practice to set LocalAddr, and whether there are any potential issue. Any comment?

$ etcd \
  --name m1 \
  --listen-client-urls http://host1:2379 \
  --advertise-client-urls http://host1:2379 \
  --listen-peer-urls http://host1:2380 \
  --initial-advertise-peer-urls http://host1:2380

Another problem is that it's hard to verify it in workflow CI. For example, how to create multiple IP addresses on the environment.

@aojea
Copy link

aojea commented Dec 18, 2023

I am not sure whether it's a best practice to set LocalAddr, and whether there are any potential issue. Any comment?

In an etcd cluster there are some strong requirements or network connectivity between the peers, my experience is that every advanced networking knob like this can have also an undesired effect of providing a footgun for users that starts to implement complex scenarios just because they can or because they came with that idea and then they hit a problem and they start to ask for new knobs.
I always prefer to have the use cases and document the scenarios with how the new knobs are supposed to be used before adding them.

@ahrtr
Copy link
Member Author

ahrtr commented Dec 19, 2023

I always prefer to have the use cases and document the scenarios with how the new knobs are supposed to be used before adding them.

Yes, agreed on this. It's a real problem discovered in our testing when etcd and kube-vip coexist. Just as I mentioned in #17068 (comment), When an etcd member accepts a connection from a peer, it checks the peer's remote address against the Subject Alternative Name (SAN) field in the peer's certificate, and reject the connection if they don't match. It's exactly the reason why the LocalAddr matters here.

But when kube-vip coexist with etcd on the same host, kube-vip may dynamically add a floating IP to the host. Especially for the case of IPv6, it adds the IP to the beginning of the interface's IP list. Since etcd doesn't specify the LocalAddr when communicating with a peer, the floating IP is automatically chosen as the LocalAddr by the golang standard lib, eventually the connection is rejected by the peer.

I believe this is also a common issue. It applies the cases where the hosts on which the etcd runs have multiple IP addresses, especially dynamical floating IP.

Explicitly, it would be great if we can get feedback from network experts on:

  • In the beginning, we will disable the new feature by default, and etcd will not set a LocalAddr when communicating with a peer unless users intentionally enable it. But If there is no any concern or side effect, eventually we may want to enable it by default.

  • Another problem is how to automatically verify it in workflow CI using an e2e test case to prevent any regression. I don't see an easy way to mimic the case of the host having multiple IP addresses in workflow CI environment. Probably we can try unit test?

@aojea
Copy link

aojea commented Dec 19, 2023

that makes sense, then add just an option to define the localAddress for connecting to the apiserver, if is empty just don't set the localAddr field, this will make the change completely backwards compatible and you don't have to worry about the future.

Another problem is how to automatically verify it in workflow CI using an e2e test case to prevent any regressio

some ideas, if you are in the same node you can play with the localhost address 127.0.0.1 and the one in the node

  1. create a TCP listener that receives the connection and assert on the address you want
  2. test the negative case, verify that fails to connect if you set a wrong address

@ahrtr
Copy link
Member Author

ahrtr commented Dec 19, 2023

you can play with the localhost address 127.0.0.1 and the one in the node

Thanks for the suggestion.

@ahrtr
Copy link
Member Author

ahrtr commented Dec 19, 2023

@highpon Please follow comments below to update #17085. Thanks

  • Let's add an option --set-member-localaddr (bool), which defaults to false. If users enable it, then let's use the host in --initial-advertise-peer-urls as the LocalAddr when etcd communicates with a peer.

  • Follow comment above Supporting setting LocalAddr in peer communication #17068 (comment) to add an unit test. **play with the localhost address 127.0.0.1 and the one in the node**. We need to test both positive and negative cases.

    • Please also add an e2e test. We just need to verify positive case in e2e case to ensure it doesn't break anything. Specifically, we just need to set --set-member-localaddr to true in both 1-member and 3-members clusters, and ensure nothing is broken.

@highpon
Copy link
Contributor

highpon commented Dec 19, 2023

@ahrtr
Sure! Thanks for your help!
I would like to update #17085 for your comment.

@flawedmatrix
Copy link
Contributor

flawedmatrix commented Mar 9, 2024

I've tried to pick up the ball to implement these changes here: https://github.com/flawedmatrix/etcd/tree/fix/17068

One question I had was whether we only wanted to set LocalAddr to an IP address, or should we try to resolve the DNS name to an address from --initial-advertise-peer-urls?

The major problem I ran into was that I've been unable to get the e2e test to fail Mutual TLS when playing around with using the host IP address or 127.0.0.1. It seems the etcd nodes would only start up properly if all the IP addresses provided in --listen-peer-urls, --initial-advertise-peer-urls, and --initial-cluster matched. I've been unable to get the Golang library to infer LocalAddr to be 127.0.0.1 if I used the host IP for the three flags I mentioned.

@ahrtr Any thoughts as to how one could properly test this feature in an e2e test?

After revisiting this issue, I've resolved the issue of how to e2e test this feature.

@ahrtr
Copy link
Member Author

ahrtr commented Mar 29, 2024

One question I had was whether we only wanted to set LocalAddr to an IP address, or should we try to resolve the DNS name to an address from --initial-advertise-peer-urls?

The answer should be the latter based on #17068 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants