Cassandra NoHostAvailableException connection time out #5

Open
alseddnm opened this issue Jun 13, 2018 · 7 comments

Comments

@alseddnm

We are using Mesos/Marathon to manage our Docker containers. Zipkin ran fine for 15 minutes or less, then the health check started failing.
We found a number of errors in our service log: cannot load service names: Request processing failed; nested exception is com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /xxxxx:9042 (com.datastax.driver.core.exceptions.TransportException: [/1xxxx:9042] Connection has been closed),/(com.datastax.driver.core.exceptions.TransportException: [xyz/10.124.8.97:9042] Connection has been closed))

at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84) ~[cassandra-driver-core-3.5.0-shaded.jar!/:?]
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:37) ~[cassandra-driver-core-3.5.0-shaded.jar!/:?]
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37) ~[cassandra-driver-core-3.5.0-shaded.jar!/:?]
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245) ~[cassandra-driver-core-3.5.0-shaded.jar!/:?]
at zipkin2.storage.cassandra.internal.call.ResultSetFutureCall.getUninterruptibly(ResultSetFutureCall.java:74) ~[zipkin-storage-cassandra-2.9.1.jar!/:?]
at zipkin2.storage.cassandra.internal.call.ResultSetFutureCall$1CallbackListener.run(ResultSetFutureCall.java:50) [zipkin-storage-cassandra-2.9.1.jar!/:?]
at zipkin2.storage.cassandra.internal.call.DirectExecutor.execute(DirectExecutor.java:23) [zipkin-storage-cassandra-2.9.1.jar!/:?]

I thought it better to open an issue; we are investigating on our side as well.

I also noticed that Zipkin's Cassandra storage uses SASI indexes. Per the DataStax docs, SASI indexes in DSE are experimental, and DataStax does not support SASI indexes for production: https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/useSASIIndex.html
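
As a first triage step for the `Connection has been closed` errors above, it can help to confirm whether each Cassandra contact point still accepts TCP connections on the native transport port (9042 in the log). The following is a minimal sketch in Python using only the standard library. The function name `can_connect` and the host list are hypothetical, and a successful TCP connect does not prove that the node can serve queries.

```python
import socket


def can_connect(host: str, port: int = 9042, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical contact points; replace with your own Cassandra nodes.
    for host in ["127.0.0.1"]:
        state = "reachable" if can_connect(host) else "unreachable"
        print(f"{host}:9042 is {state}")
```

Running this from inside the Zipkin container, rather than from a workstation, is more likely to reflect what the driver sees.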

@codefromthecrypt
Member

codefromthecrypt commented Jun 14, 2018 via email

@alseddnm
Author

@adriancole Just saw your message. We are not even able to access our C* nodes this morning. I see a number of errors in the C* logs:
Can't open index file at /cassandra/data/zipkin2/span-15bb5b006e7111e8a8d2af46ca93ec1b/mc-2673-big-SI_span_l_service_idx.db, skipping.
org.apache.cassandra.io.FSReadError: java.io.EOFException
at org.apache.cassandra.index.sasi.disk.OnDiskIndex.<init>(OnDiskIndex.java:164) ~[apache-cassandra-3.9.0.jar:3.9.0]
at org.apache.cassandra.index.sasi.SSTableIndex.<init>(SSTableIndex.java:68) ~[apache-cassandra-

ERROR [Reference-Reaper:1] 2018-06-14 14:46:11,044 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@74bce35c) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@2054886331:/cassandra/data/zipkin2/span-15bb5b006e7111e8a8d2af46ca93ec1b/mc-2673-big was not released before the reference was garbage collected
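
To see how many SASI index files are affected, the Cassandra log can be scanned for the `Can't open index file at ..., skipping` message shown above. This is a rough sketch in Python. The function name `failed_index_files` is hypothetical, and the pattern is derived only from the excerpt in this comment, so other Cassandra versions may word the message differently.

```python
import re
from typing import List

# Matches the message shape seen in the log excerpt above.
INDEX_ERROR = re.compile(r"Can't open index file at (\S+), skipping")


def failed_index_files(log_text: str) -> List[str]:
    """Return the distinct index file paths reported as unopenable, in log order."""
    seen: List[str] = []
    for path in INDEX_ERROR.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen
```

Passing the contents of the node's log file to `failed_index_files` gives a list of paths to inspect, which helps tell one damaged index file apart from a wider problem.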

@alseddnm
Author

As of now we don't have many service names: I see only 292 distinct services in the table.
The total record count is 404118:
select count(*) from span_by_service;

 count
--------
 404118

@codefromthecrypt
Member

codefromthecrypt commented Jun 15, 2018 via email

@shakuzen
Member

@alseddnm are you able to try with more recent versions to see if things work better?

@ukreddy-erwin

Same issue, even with the latest official Cassandra Docker image.

codefromthecrypt transferred this issue from openzipkin/zipkin on Apr 16, 2020
@codefromthecrypt
Member

Protip: adding comments to old issues about a troubleshooting scenario doesn't usually lead to an outcome. Try asking on https://gitter.im/openzipkin/zipkin, or include the actual error message and especially what does work, for example whether the /health endpoint works (if it doesn't, that is a more fundamental problem).
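
For anyone gathering that information, a quick way to record what the /health endpoint returns is sketched below in Python with only the standard library. The function name `check_health` is hypothetical, and the URL assumes Zipkin is listening on localhost at its default port 9411; adjust both to your deployment.

```python
import urllib.error
import urllib.request


def check_health(url: str, timeout: float = 5.0):
    """GET the URL and return (HTTP status code, response body as text)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code and often a body.
        return err.code, err.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Assumed address; change the host and port to match your Zipkin server.
    try:
        status, body = check_health("http://localhost:9411/health")
        print(status, body)
    except OSError as exc:
        # Raised when the server cannot be reached at all.
        print("could not reach Zipkin:", exc)
```

Pasting the status code and body into a new report, together with the storage error from the server log, shows which parts of the deployment do work.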
