I recently ran into the following issue. A GenServer (called CA in the rest of the post) gets restarted twice by the Horde.DynamicSupervisor after nodes have been removed from and re-added to the cluster a few times.
The source code demonstrating the issue is available on Gitlab.
I'm using Elixir 1.10.1 and Horde 0.8.3. When I downgraded Horde to 0.7.1, I could not reproduce the issue.
Demonstration
To demonstrate the issue:
Start the application
Create a GenServer (module POC.CA in my code) through the HTTP interface, i.e. execute the POCWeb.Login.exec() controller
Kill the node on which the CA is running (press Control-c twice), restart it, redo the operation, and observe the logs
The first two steps are as follows:
Start node 1 in one terminal: $ HTTP_PORT=5001 ERL_AFLAGS="-name poc1@127.0.0.1 -setcookie abc" iex -S mix phx.server
Start node 2 in another terminal: $ HTTP_PORT=5002 ERL_AFLAGS="-name poc2@127.0.0.1 -setcookie abc" iex -S mix phx.server
From a third terminal, send the following request: $ http -v http://localhost:5001/api cmd=login
In the process above, let's assume that while the HTTP request hits node N1, the CA is created on N2. This gives us the following traces:
Terminal 1
...
(1) Interactive Elixir (1.10.1) - press Ctrl+C to exit (type h() ENTER for help)
(2) iex(poc1@127.0.0.1)1> [debug] NodeListener.handle_info(:nodeup)
[info] [libcluster:example] connected to :"poc2@127.0.0.1"
[debug] Cluster members: [{POC.DReg, :"poc1@127.0.0.1"}, {POC.DReg, :"poc2@127.0.0.1"}]
[debug] Cluster members: [{POC.DSup, :"poc1@127.0.0.1"}, {POC.DSup, :"poc2@127.0.0.1"}]
(3) [info] POST /api
[debug] Processing with POCWeb.Login.exec/2
Parameters: %{"cmd" => "login"}
Pipelines: [:api]
(4) [debug] Login.exec()
(5) [debug] Login.exec(): start_child->res={:ok, #PID<21019.478.0>}
[info] Sent 200 in 99ms
(1) The BEAM is booting.
(2) Detection of the second node joining the cluster.
(3) Reception of the HTTP request.
(4) Start of the login controller. The controller calls Horde.DynamicSupervisor.start_child(POC.DSup, {POC.CA, {uaid, caid}}), which will call CA.start_link().
(5) End of the login controller.
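The NodeListener.handle_info(:nodeup) and "Cluster members: [...]" lines suggest a process that pushes the current node list into Horde whenever a node joins or leaves. The repository's actual listener is not shown in this post, so the following is only a minimal sketch modeled on the NodeListener pattern from the Horde documentation, assuming the registry and supervisor are named POC.DReg and POC.DSup as in the traces. The public members/1 helper is a hypothetical addition so the membership list can be inspected without a cluster; pushing it into Horde requires the horde dependency.

```elixir
defmodule POC.NodeListener do
  # Sketch only, modeled on the NodeListener example in the Horde docs;
  # not the code from the POC repository.
  use GenServer
  require Logger

  # Horde processes whose membership must follow the connected nodes (assumed names).
  @horde_names [POC.DReg, POC.DSup]

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok) do
    # Subscribe to {:nodeup, node, info} / {:nodedown, node, info} for visible nodes.
    :net_kernel.monitor_nodes(true, node_type: :visible)
    {:ok, nil}
  end

  @impl true
  def handle_info({event, _node, _info}, state) when event in [:nodeup, :nodedown] do
    Logger.debug("NodeListener.handle_info(#{inspect(event)})")
    Enum.each(@horde_names, &set_members/1)
    {:noreply, state}
  end

  # Ignore any other message instead of crashing the listener.
  def handle_info(_msg, state), do: {:noreply, state}

  # Hypothetical helper: one {name, node} entry per node currently in the cluster.
  def members(name) do
    Enum.map([Node.self() | Node.list()], fn node -> {name, node} end)
  end

  defp set_members(name) do
    members = members(name)
    Logger.debug("Cluster members: #{inspect(members)}")
    # Needs the horde dependency at runtime.
    :ok = Horde.Cluster.set_members(name, members)
  end
end
```

With two connected nodes, members(POC.DReg) yields a list shaped like the "Cluster members" trace above, one tuple per node.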
Terminal 2
...
(1) Interactive Elixir (1.10.1) - press Ctrl+C to exit (type h() ENTER for help)
(2) iex(poc2@127.0.0.1)1> [debug] NodeListener.handle_info(:nodeup)
[debug] Cluster members: [{POC.DReg, :"poc2@127.0.0.1"}, {POC.DReg, :"poc1@127.0.0.1"}]
[debug] Cluster members: [{POC.DSup, :"poc2@127.0.0.1"}, {POC.DSup, :"poc1@127.0.0.1"}]
(3) [debug] CA.start_link({uaid="FFLFNTRJWV", caid="OLGFXEZLCO")
(4) [debug] CA.init(uaid="FFLFNTRJWV", caid="OLGFXEZLCO")
(5) [debug] CA.start_link(): GS.start_link->res={:ok, #PID<0.478.0>}
[debug] CA.start_link(): process started
(1) The BEAM is booting.
(2) Detection of the second node joining the cluster.
(3) CA.start_link(), triggered by the login controller, calls GenServer.start_link(__MODULE__, {uaid, caid}, name: via_tuple(caid)).
(4) CA.init() is called by GenServer.start_link() from step 3.
(5) Back in CA.start_link(). The process was created on N2.
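The CA module itself is not included in the post, but the traces show its shape: start_link/1 receives {uaid, caid}, calls GenServer.start_link/3 with name: via_tuple(caid), and init/1 runs in the new process. A minimal sketch under those assumptions follows. To keep it runnable without the horde dependency it registers through Elixir's local Registry; the project presumably uses {:via, Horde.Registry, {POC.DReg, caid}} instead. The state map is hypothetical.

```elixir
defmodule POC.CA do
  # Sketch only. The real project registers through Horde.Registry; the local
  # Registry stands in for it here so the example runs on its own.
  use GenServer
  require Logger

  @registry POC.DReg

  # Invoked by the (Horde.)DynamicSupervisor for the child spec {POC.CA, {uaid, caid}}.
  def start_link({uaid, caid}) do
    Logger.debug("CA.start_link(uaid=#{inspect(uaid)}, caid=#{inspect(caid)})")
    res = GenServer.start_link(__MODULE__, {uaid, caid}, name: via_tuple(caid))
    Logger.debug("CA.start_link(): GS.start_link->res=#{inspect(res)}")
    res
  end

  # Assumed to be {:via, Horde.Registry, {POC.DReg, caid}} in the real code.
  def via_tuple(caid), do: {:via, Registry, {@registry, caid}}

  @impl true
  def init({uaid, caid}) do
    Logger.debug("CA.init(uaid=#{inspect(uaid)}, caid=#{inspect(caid)})")
    {:ok, %{uaid: uaid, caid: caid}}
  end
end
```

Once a Registry named POC.DReg (keys: :unique) and a DynamicSupervisor named POC.DSup are running, DynamicSupervisor.start_child(POC.DSup, {POC.CA, {uaid, caid}}) goes through the same start_link, GenServer.start_link, init sequence as steps 3 to 5 above.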
Killing and restarting the nodes
From now on, we kill the node on which the CA is running, then restart it.
Kill N2 (as the CA is running on N2)
Terminal 1
The CA is restarted on N1.
Restart of node 2
Terminal 2
Kill N1 (as the CA had been restarted on N1)
Terminal 2
The CA is restarted on node N2.
Restart of node 1
The cluster gets reorganized as before.
Kill N2
This is where the strange thing happens. Look at step 3.
Terminal 1
The CA gets restarted. This is normal. The following 3 lines show that the process restarts successfully.
CA.start_link() gets called a second time. As the process had already been started in step 2, the error :already_started is returned.
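The :already_started error in step 3 is what GenServer.start_link/3 returns when the via-registered name is already held by another process. The sketch below reproduces only that return value, not the Horde issue itself. It uses Elixir's local Registry and DynamicSupervisor as stand-ins for Horde.Registry and Horde.DynamicSupervisor, and the names DemoCA, DemoReg and DemoSup are hypothetical.

```elixir
defmodule DemoCA do
  # Hypothetical stand-in for POC.CA: registered under its caid through a via
  # tuple, using the local Registry instead of Horde.Registry.
  use GenServer

  def start_link(caid),
    do: GenServer.start_link(__MODULE__, caid, name: {:via, Registry, {DemoReg, caid}})

  @impl true
  def init(caid), do: {:ok, caid}
end

{:ok, _} = Registry.start_link(keys: :unique, name: DemoReg)
{:ok, _} = DynamicSupervisor.start_link(strategy: :one_for_one, name: DemoSup)

# First start (the successful restart of step 2) registers the caid.
{:ok, first} = DynamicSupervisor.start_child(DemoSup, {DemoCA, "OLGFXEZLCO"})

# A second start for the same caid (step 3) is refused: registration fails and
# GenServer reports the pid that already holds the name.
{:error, {:already_started, ^first}} =
  DynamicSupervisor.start_child(DemoSup, {DemoCA, "OLGFXEZLCO"})
```

The refused start leaves the first process untouched: afterwards `first` is still the pid registered under the caid, and the supervisor still has a single active child.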