[CI] CoreFullClusterRestartIT NotMasterException #30443

Closed
jdconrad opened this issue May 7, 2018 · 6 comments
Labels
:Security/TLS (SSL/TLS, Certificates), >test-failure (Triaged test failures from CI)

Comments

@jdconrad
Contributor

jdconrad commented May 7, 2018

I was unable to reproduce this locally. I pulled the log files from the two nodes.

(https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.3+bwc-tests/39/console)

full-cluster-restart-node0-part1.log
full-cluster-restart-node0-part2.log
full-cluster-restart-node1-part1.log
full-cluster-restart-node1-part2.log

REPRODUCE WITH: ./gradlew :x-pack:qa:full-cluster-restart:with-system-key:v5.0.0#oldClusterTestRunner
-Dtests.seed=535F856688D6390
-Dtests.class=org.elasticsearch.xpack.restart.CoreFullClusterRestartIT
-Dtests.method="testAliasWithBadName"
-Dtests.security.manager=true
-Dtests.locale=is
-Dtests.timezone=Asia/Ust-Nera

REPRODUCE WITH: ./gradlew :x-pack:qa:full-cluster-restart:with-system-key:v5.0.0#oldClusterTestRunner
-Dtests.seed=535F856688D6390
-Dtests.class=org.elasticsearch.xpack.restart.CoreFullClusterRestartIT
-Dtests.method="testEmptyShard"
-Dtests.security.manager=true
-Dtests.locale=is
-Dtests.timezone=Asia/Ust-Nera

REPRODUCE WITH: ./gradlew :x-pack:qa:full-cluster-restart:with-system-key:v5.0.0#oldClusterTestRunner
-Dtests.seed=535F856688D6390
-Dtests.class=org.elasticsearch.xpack.restart.CoreFullClusterRestartIT
-Dtests.method="testSearch"
-Dtests.security.manager=true
-Dtests.locale=is
-Dtests.timezone=Asia/Ust-Nera

REPRODUCE WITH: ./gradlew :x-pack:qa:full-cluster-restart:with-system-key:v5.0.0#oldClusterTestRunner
-Dtests.seed=535F856688D6390
-Dtests.class=org.elasticsearch.xpack.restart.CoreFullClusterRestartIT
-Dtests.security.manager=true
-Dtests.locale=en-US
-Dtests.timezone=Etc/UTC

@jdconrad jdconrad added the >test-failure (Triaged test failures from CI), v6.3.0, and :Distributed Coordination/Cluster Coordination (Cluster formation and cluster state publication, including cluster membership and fault detection) labels on May 7, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@bleskes
Contributor

bleskes commented May 8, 2018

@DaveCTurner, @ywelsch can one of you maybe look at this?

@ywelsch
Contributor

ywelsch commented May 8, 2018

This looks like either a transport security issue or a serialization issue. The NotMasterException occurs because the two nodes repeatedly drop their connection to each other. @jaymode does this sound familiar?

[2018-05-07T21:38:43,900][WARN ][o.e.x.s.t.n.SecurityNetty4Transport] [node-1] exception caught on transport layer [[id: 0x83064464, L:0.0.0.0/0.0.0.0:52158 ! R:/127.0.0.1:46073]], closing connection
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: Received close_notify during handshake
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442) ~[netty-codec-4.1.5.Final.jar:4.1.5.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248) ~[netty-codec-4.1.5.Final.jar:4.1.5.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:372) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:358) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:350) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:372) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:358) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:610) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:513) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:467) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:437) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873) [netty-common-4.1.5.Final.jar:4.1.5.Final]
	at java.base/java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: javax.net.ssl.SSLException: Received close_notify during handshake
	at java.base/sun.security.ssl.Alerts.getSSLException(Alerts.java:214) ~[?:?]
	at java.base/sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1762) ~[?:?]
	at java.base/sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1725) ~[?:?]
	at java.base/sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1878) ~[?:?]
	at java.base/sun.security.ssl.SSLEngineImpl.processInputRecord(SSLEngineImpl.java:1140) ~[?:?]
	at java.base/sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1020) ~[?:?]
	at java.base/sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:902) ~[?:?]
	at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:680) ~[?:?]
	at java.base/javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:626) ~[?:?]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1094) ~[?:?]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:966) ~[?:?]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:900) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411) ~[?:?]
	... 15 more
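
One way to check how often the nodes drop the connection is to count the transport-layer entries in the node logs whose stack trace mentions the failed handshake. The following is only a minimal sketch, assuming every log entry starts with the `[timestamp][LEVEL][logger] [node]` header seen in the entry above; the function name `count_handshake_failures` and the regular expression are illustrative assumptions, not part of Elasticsearch or its test tooling.

```python
import re
from collections import Counter

# Header of a log entry, e.g.
# [2018-05-07T21:38:43,900][WARN ][o.e.x.s.t.n.SecurityNetty4Transport] [node-1] exception caught ...
# (assumed format, derived from the single entry quoted in this issue)
ENTRY_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]\[(?P<logger>[^\]]+)\]"
    r"\s*\[(?P<node>[^\]]+)\]\s*(?P<msg>.*)$"
)
TRANSPORT_MSG = "exception caught on transport layer"
CLOSE_NOTIFY = "Received close_notify during handshake"


def count_handshake_failures(lines):
    """Return a Counter mapping node name to the number of transport-layer
    entries whose stack trace contains the close_notify handshake error."""
    counts = Counter()
    current_node = None  # node of the most recent transport-layer entry, if any
    counted = False      # True once the current entry has been counted
    for line in lines:
        match = ENTRY_RE.match(line)
        if match:
            # A new log entry begins; remember its node only if it is a
            # transport-layer exception.
            is_transport = TRANSPORT_MSG in match.group("msg")
            current_node = match.group("node") if is_transport else None
            counted = False
            continue
        # Stack trace lines belong to the most recent entry. Count each entry
        # once, even though the message repeats in the "Caused by:" line.
        if current_node and not counted and CLOSE_NOTIFY in line:
            counts[current_node] += 1
            counted = True
    return counts
```

Passing the lines of one of the attached files, for example `full-cluster-restart-node1-part1.log`, would give a per-node count. A high count on both nodes would be consistent with the repeated disconnects described above, although it does not by itself distinguish a TLS problem from a serialization problem.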

@ywelsch ywelsch added the :Security/TLS (SSL/TLS, Certificates) label and removed the :Distributed Coordination/Cluster Coordination (Cluster formation and cluster state publication, including cluster membership and fault detection) label on May 8, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-security

@jasontedor
Member

@ywelsch This only needs a backport of #30337. I will do this now.

@jasontedor
Member

Closed by fed949e

@jpountz jpountz removed the v6.3.0 label Jun 13, 2018