Early outbound user messages prevents cluster nodes from joining on RPI(possibly mono or lowend HW) #874

Closed
rogeralsing opened this issue Apr 18, 2015 · 5 comments

Comments

@rogeralsing
Contributor

After digging around some more with the Raspberry Pi cluster support, I've found that even if I start two nodes on the same RPI, they still cannot communicate and fail with an association exception.
So that rules out network issues, as far as I can tell.
Localhost to localhost still fails, and there is no firewall on the RPI.
Meanwhile, the remoting chat works fine.

I figure that if things don't work 100% on the RPI, there could be similar problems on other Mono OSes.
On which OSes have we verified cluster support?
Ubuntu and iOS?
Debian?

@rogeralsing
Contributor Author

I found the issue.
I was running the Petabridge WebCrawler example, and it seems that the crawler service sends messages too fast at startup.
Removing all the code except the cluster startup makes everything work just fine.

So there is possibly a race condition bug in the cluster where, on a low-end machine, it tries to talk to the seed node too early, or something like that. cc @Aaronontheweb @smalldave
Removing any user-initiated outbound communication from startup solves the issue.

Once booted, everything works nicely.
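
For readers hitting the same symptom, the workaround above (no user-initiated outbound traffic until the node has booted into the cluster) could be sketched roughly as follows. This is a language-neutral illustration in plain Python, not Akka.NET code: `DeferredSender`, `tell`, `on_joined`, and the `wire` list are all hypothetical names, and the assumption is that the cluster library exposes some kind of "member is up" hook from which `on_joined` can be called.

```python
from collections import deque
from typing import Callable, Deque


class DeferredSender:
    """Hypothetical helper: holds outbound messages until the node has joined."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send
        self._pending: Deque[str] = deque()
        self._joined = False

    def tell(self, message: str) -> None:
        # Before the join completes, queue the message instead of sending it.
        if self._joined:
            self._send(message)
        else:
            self._pending.append(message)

    def on_joined(self) -> None:
        # Assumed to be invoked from the cluster library's "member is up" hook.
        self._joined = True
        # Flush queued messages in the order they were submitted.
        while self._pending:
            self._send(self._pending.popleft())


wire = []  # stands in for the real transport (assumption for the demo)
sender = DeferredSender(wire.append)
sender.tell("crawl-job-1")
sender.tell("crawl-job-2")
queued_before_join = list(wire)  # snapshot: nothing should be sent yet
sender.on_joined()
sender.tell("crawl-job-3")
```

The sketch only delays user traffic. It would not fix a race inside the cluster startup itself, but it keeps early messages from competing with the join handshake on slow hardware.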

Renaming issue.

@rogeralsing changed the title from "RPI(Mono?) Cluster" to "Early outbound user messages prevents cluster nodes from joining on RPI(possibly mono or lowend HW)" on Apr 18, 2015

@smalldave
Contributor

Oops, didn't mean to close that.

@smalldave reopened this on Apr 18, 2015

@Aaronontheweb
Member

This is an interesting issue... I've used Akka.Remote in production on Windows where the inbound receiver was under tremendous load immediately after joining, so I wonder if the issue here is that the cluster joining stuff is too sensitive to latency right now?

@rogeralsing
Contributor Author

Possibly related to #1062 (?)

@Aaronontheweb modified the milestone: Akka.NET v1.1 on Jul 8, 2015

@Aaronontheweb
Member

Closing this. There were lots of possible causes that could have been responsible for this, but my money is on the race condition we fixed inside the Cluster constructor that caused it to deadlock, or the race conditions inside Helios 1.4.1. Happy to reopen if it crops up again.
