Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection with multiple TLS brokers fails to negotiate #1700

Closed
Albibek opened this issue May 12, 2020 · 13 comments · Fixed by #1701
Closed

Connection with multiple TLS brokers fails to negotiate #1700

Albibek opened this issue May 12, 2020 · 13 comments · Fixed by #1701

Comments

@Albibek
Copy link

Albibek commented May 12, 2020

Versions
Sarama Kafka Go
1.26.{1-3} 2.3 1.14
Configuration
Brokers = [broker01, broker02]
TLSConfig = {}

broker01 has certificate with cn=broker01
broker02 has certificate with cn=broker02

Logs
time="2020-05-12T14:05:57Z" level=error msg="consuming messages" error="kafka: error while consuming kubernetes/8: x509: certificate is valid for broker02, not broker01" 

Problem Description

Server name for TLS connection inferred incorrectly when multiple brokers are used.

We beleive, the bug is in the code below (broker.go)

When, in the first iteration of the cycle the cfg is changed, ServerName is never empty again. So all other connections will have incorrect ServerName and therefore fail the TLS negotiation.

           // If no ServerName is set, infer the ServerName
            // from the hostname we're connecting to.
            // Gets the hostname as tls.DialWithDialer does it.
            if cfg.ServerName == "" {
                colonPos := strings.LastIndex(b.addr, ":")
                if colonPos == -1 {
                    colonPos = len(b.addr)
                }
                hostname := b.addr[:colonPos]
                cfg.ServerName = hostname
            }
            b.conn = tls.Client(b.conn, cfg)
@dselyuzhitskiy
Copy link

dselyuzhitskiy commented May 12, 2020

The problem in this ab525ed

conf is a pointer to sarama.Config, so it set ServerName for the first broker and the remaining brokers try to use tls with wrong ServerName

@dnwe
Copy link
Collaborator

dnwe commented May 12, 2020

@d1egoaz ref #1692 😢

@d1egoaz
Copy link
Contributor

d1egoaz commented May 12, 2020

🤔 weird, we don't have that issue, I left running in one test app, and it's been running fine, that app connects to a 80 brokers cluster with no issues.

@d1egoaz
Copy link
Contributor

d1egoaz commented May 12, 2020

The code can be modified to create a copy/clone conf.Net.TLS.Config instead of referencing it directly! I just realized this was a pointer 😞

@d1egoaz
Copy link
Contributor

d1egoaz commented May 12, 2020

but it does work for us 🤷

@d1egoaz
Copy link
Contributor

d1egoaz commented May 12, 2020

@dselyuzhitskiy Can you try #1701 branch diego_clone-tls-config

@dselyuzhitskiy
Copy link

@d1egoaz i checked it and it works well. i left a comment about code improvement in PR

@dselyuzhitskiy
Copy link

dselyuzhitskiy commented May 13, 2020

I looked at #1666 and i could not understand what is wrong with tls.DialWithDialer that solves all problems. For proxy support we just need get correct dialer from config.getDialer and send it to DialWithDialer, i think.

dialer := conf.getDialer()
if conf.Net.TLS.Enable {
    b.conn, b.connErr = tls.DialWithDialer(&dialer, "tcp", b.addr, conf.Net.TLS.Config)
} else {
    b.conn, b.connErr = dialer.Dial("tcp", b.addr)
}

@d1egoaz
Copy link
Contributor

d1egoaz commented May 14, 2020

@dselyuzhitskiy I was trying to do the same actually, but DialWithDialer

 broker.…   159  43 error           compiler: cannot use &dialer (value of type *proxy.Dialer) as *net.Dialer value in argument to tls.DialWithDialer (lsp)

dialer is not anymore a net.Dialer, it's now a proxy.Dialer because of conf.getDialer()

and proxy.Dialer only implements Dial, tls.DialWithDialer needs more than that 😞

@dselyuzhitskiy
Copy link

Thanks for the explanation, i missed it.

@joelschopp
Copy link

We are also having trouble publishing on 1.26.3, downgrading to 1.26.1 fixes for us. We are using 3 tls servers. Looking forward to a 1.26.4 with a fix. Also happy to test any potential fixes.

@d1egoaz
Copy link
Contributor

d1egoaz commented May 15, 2020

@joelschopp can you try #1701?

@joelschopp
Copy link

@joelschopp can you try #1701?

Added this to my go.mod at it works for me... note 5ff9581b104a is the last commit in the branch you pointed to.

+replace github.com/Shopify/sarama => github.com/Shopify/sarama v1.26.4-0.20200514231448-5ff9581b104a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants