Hostname is incorrect after deploying to k8s #111
Does it work if you restart without deleting the Mnesia disk files?
No, I get stuck with a CrashLoop. I did more testing, and here is what I found:
What is the log for the app when the CrashLoop occurs? The same as above with

Also, when you say master node tables, what are you referring to? In the Mnesia cluster all nodes are master nodes.
The initial error there is pretty clear, though I imagine the confusion is in why it happens: the lookup for the

Without additional context it isn't clear when this error occurs relative to the startup of the node, but I'm assuming it happens right away with the first pod after the previous instances were terminated, and before the new pods have been marked as live and thus made available as instances of the service from the k8s perspective.

Mnesia really requires something akin to StatefulSets. It is not friendly to environments where nodes come and go freely: it expects a certain set of nodes to be part of a cluster, it expects membership changes in that cluster to be relatively rare, and those changes require coordination. Furthermore, it expects the cluster to be formed by bringing up all of the nodes, creating a schema with the set of nodes participating in the Mnesia cluster, and then starting Mnesia on all of those nodes (including the one where the schema was first created). Once the schema is replicated, all of the nodes can be terminated safely and brought back up, but only if the schema data is persistent; otherwise you either have to start from scratch again, or bring up new nodes, use

I don't believe this is a
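For reference, the formation order described above (one schema naming every participating node, then Mnesia started on all of them) might look roughly like this. It is a minimal sketch, not Pow's or libcluster's code: `MnesiaBootstrap` and `form_cluster/1` are hypothetical names, it assumes OTP 23+ for `:erpc`, and it assumes every listed node is already connected and has Mnesia stopped.

```elixir
defmodule MnesiaBootstrap do
  @moduledoc """
  Hypothetical helper: forms a fresh Mnesia cluster from a fixed, known set
  of nodes by creating one schema for all of them and then starting Mnesia
  on every node, including the one that created the schema.
  """

  @spec form_cluster([node()]) :: :ok | {:error, term()}
  def form_cluster(nodes) when is_list(nodes) and nodes != [] do
    # Mnesia must be stopped on every listed node while the schema is written;
    # create_schema/1 writes a disc copy of the schema on each of them.
    with :ok <- :mnesia.create_schema(nodes),
         # One result per node, in the same order as `nodes`.
         results = :erpc.multicall(nodes, :application, :start, [:mnesia]),
         [] <- failed_nodes(nodes, results) do
      :ok
    else
      {:error, reason} -> {:error, reason}
      failed when is_list(failed) -> {:error, {:mnesia_start_failed, failed}}
    end
  end

  # Pairs each node with its erpc result and keeps those that did not return :ok
  # (an already-started Mnesia also counts as a failure here).
  defp failed_nodes(nodes, results) do
    nodes
    |> Enum.zip(results)
    |> Enum.reject(fn {_node, result} -> result == {:ok, :ok} end)
  end
end
```

On later restarts only starting Mnesia is needed, as long as the schema directory survives, which is the persistence requirement mentioned above.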
Thanks for the detailed description @bitwalker! @gabrielrinaldi mentioned to me in the Elixir Slack channel that https://github.com/Shopify/kubernetes-deploy is used for deploys.

The Mnesia cache in Pow is a GenServer that automatically handles initialization and replication in Mnesia as long as a list of existing cluster nodes is passed to the init callback. This is usually done by providing the value of

I've been helping with a guide on how to integrate libcluster with Pow/Mnesia. The following setup with

```elixir
def start(_type, _args) do
  topologies = [
    example: [
      strategy: Cluster.Strategy.Kubernetes,
      config: [
        # ...
      ]
    ]
  ]

  # List all child processes to be supervised
  children = [
    {Cluster.Supervisor, [topologies, [name: MyApp.ClusterSupervisor]]},
    MyApp.MnesiaClusterSupervisor,
    # Start the Ecto repository
    MyApp.Repo,
    # Start the endpoint when the application starts
    MyAppWeb.Endpoint
    # Starts a worker by calling: MyApp.Worker.start_link(arg)
    # {MyApp.Worker, arg},
  ]

  # See https://hexdocs.pm/elixir/Supervisor.html
  # for other strategies and supported options
  opts = [strategy: :one_for_one, name: MyApp.Supervisor]
  Supervisor.start_link(children, opts)
end
```

```elixir
defmodule MyApp.MnesiaClusterSupervisor do
  use Supervisor

  def start_link(init_arg) do
    Supervisor.start_link(__MODULE__, init_arg, name: __MODULE__)
  end

  @impl true
  def init(_init_arg) do
    children = [
      # Node.list() is evaluated once, when this supervisor initializes
      {Pow.Store.Backend.MnesiaCache, extra_db_nodes: Node.list()},
      Pow.Store.Backend.MnesiaCache.Unsplit
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end
```

But I'm not sure this is robust. Would it be better to restart the Mnesia cache in the

There is support for netsplit recovery, by the way. What I want to make sure of is that the Mnesia cache GenServer starts with at least one active cluster node (if a cluster exists) to initialize replication.
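One way to tighten the "at least one active cluster node" requirement is to pass only peers that actually report Mnesia as running, rather than every connected node. A minimal sketch, assuming OTP 23+ for `:erpc`; `MnesiaPeers` is a hypothetical module, and it does not remove the race, since the check still happens at a single instant during init.

```elixir
defmodule MnesiaPeers do
  @moduledoc """
  Hypothetical helper: narrows a list of connected nodes down to the ones
  that are actually running Mnesia.
  """

  @spec running_mnesia_nodes([node()]) :: [node()]
  def running_mnesia_nodes(candidates \\ Node.list()) do
    # :erpc.multicall/5 returns one result per node, in input order,
    # e.g. {:ok, :yes} for a node where Mnesia is running.
    results = :erpc.multicall(candidates, :mnesia, :system_info, [:is_running], 5_000)

    candidates
    |> Enum.zip(results)
    |> Enum.filter(fn {_node, result} -> result == {:ok, :yes} end)
    |> Enum.map(fn {node, _result} -> node end)
  end
end
```

In the supervisor above, the child spec would then read `{Pow.Store.Backend.MnesiaCache, extra_db_nodes: MnesiaPeers.running_mnesia_nodes()}`.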
@danschultzer Yeah, that's more or less what I was expecting. The problem(s) I described in my last comment are definitely possible with that setup. There are a few factors at play:
NOTE: It is not clear from the

The best setup here, in my opinion, is to be able to boot a node and have it start fully without starting Mnesia, but only be marked ready, not live. Once started, the node tries to join a pre-existing cluster for some period of time before deciding to form a new one. Once Mnesia is started, the components that depend on Mnesia can be started as well (and, depending on how you approach it, they may be able to start automatically once they see that Mnesia is ready). I know this technique as "stacking theory", and you can read more about the idea here. The key is that you don't necessarily try to accomplish absolutely everything in the initialization of the supervision tree: some things should be deferred and lazily instantiated once they are ready, and as that happens, the system approaches operational readiness. Your liveness checks then just evaluate whether the system has fully started (or has at least gained enough operational capability to be allowed to start servicing requests).

The bottom line, though, is that the setup you've outlined is racy, and I believe that is the main culprit here. I could be more precise given more information about the exact sequence in which events occur (i.e. how deploys are rolled out, step by step, pod by pod), but hopefully what I've outlined here clarifies the issue well enough for you to troubleshoot.

I really want to stress that I don't think Mnesia is compatible with cattle-style nodes; it is designed for pet-style, persistent nodes. That also simplifies a lot of things (such as which node is responsible for initializing the schema/cluster), whereas trying to make Mnesia work with a shifting set of anonymous nodes is very fragile.
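The "try to join for a while, then form a new cluster" step could be sketched as below. Everything here is hypothetical (`ClusterJoin` and its `:timeout`, `:interval` and `:list_nodes` options); it only decides what to do, and the caller would then start Mnesia and the components that depend on it. Two nodes that both time out at the same moment would each still form their own cluster, which is one more reason fixed, pet-style membership is easier to reason about.

```elixir
defmodule ClusterJoin do
  @moduledoc """
  Hypothetical sketch: wait for peers for a bounded time, then decide whether
  to join them or to form a new cluster. Option names are illustrative only.
  """

  @type decision :: {:join, [node()]} | :form_new_cluster

  @spec await_peers(keyword()) :: decision
  def await_peers(opts \\ []) do
    timeout = Keyword.get(opts, :timeout, 30_000)
    interval = Keyword.get(opts, :interval, 1_000)
    # Injectable for testing; defaults to the currently connected nodes.
    list_nodes = Keyword.get(opts, :list_nodes, &Node.list/0)
    deadline = System.monotonic_time(:millisecond) + timeout
    poll(list_nodes, interval, deadline)
  end

  defp poll(list_nodes, interval, deadline) do
    case list_nodes.() do
      [] ->
        if System.monotonic_time(:millisecond) >= deadline do
          # Nobody showed up in time: this node creates the schema itself.
          :form_new_cluster
        else
          Process.sleep(interval)
          poll(list_nodes, interval, deadline)
        end

      peers ->
        # At least one peer is connected: join it instead of forming a cluster.
        {:join, peers}
    end
  end
end
```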
Thanks for the excellent response @bitwalker! That makes a lot of sense. I found this blog post that covers StatefulSets with a Cassandra cluster, which I think is what is needed here: https://medium.com/velotio-perspectives/exploring-upgrade-strategies-for-stateful-sets-in-kubernetes-c02b8286f251

@gabrielrinaldi, what is your deploy strategy? Can you share the YAML? Any other info would also help.
@bitwalker I am almost convinced that @danschultzer here is our
@danschultzer Our update strategy is to pull one node at a time, put the new one in, and wait for it to pass the health check, then repeat with the rest of the pods.
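With that kind of rolling update, the health check that gates the next pod matters: if it passes before the new node has joined and loaded its tables, the next old pod can be taken down too early. A minimal sketch of a stricter check, assuming the tables to wait for are known up front; `MnesiaHealth.ready?/2` is a hypothetical name, and it would still need to be exposed through whatever endpoint the probe calls.

```elixir
defmodule MnesiaHealth do
  @moduledoc """
  Hypothetical readiness check: reports ready only once Mnesia is running on
  this node and the given tables are loaded.
  """

  @spec ready?([atom()], non_neg_integer()) :: boolean()
  def ready?(tables, timeout \\ 5_000) do
    # Check first so we never wait on a node where Mnesia is not running;
    # wait_for_tables/2 returns :ok only when every table is accessible.
    :mnesia.system_info(:is_running) == :yes and
      :mnesia.wait_for_tables(tables, timeout) == :ok
  end
end
```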
@gabrielrinaldi In a cluster with dynamic membership, I would agree with you that Redis is a better choice, though they aren't really 1:1, as Mnesia is a distributed key/value store while Redis is not. If you are able to use Redis to solve the same problem, then Mnesia is definitely not worth the operational overhead in my opinion. That said, in a cluster with static membership, Mnesia is able to provide some nice benefits if your use case is a good fit, notably for read-heavy workloads.

Since you are using a StatefulSet here, you have the foundation needed for a static cluster. I think that, aside from the issues I've already mentioned, part of the problem here may be that libcluster is resolving an IP and using that to connect nodes, and what happens is that at some point the IP for a given node (say

```
'app@backend-elixir-0.backend-elixir.backend-elixir-staging.svc.cluster.local'
'app@backend-elixir-1.backend-elixir.backend-elixir-staging.svc.cluster.local'
```

The above assumes a cluster of two pods, and since you haven't shared the Service definition, I'm guessing at the name (which is the second component of each DNS name). Likewise, the Erlang node name I've just filled in as

Restart your cluster from scratch (i.e. delete the Mnesia disk copies, backing them up first if you need to keep the data), and then try to replicate the issue by performing a few rolling updates. If you are able to reproduce the problem again, then we know it is due to one of the other issues and not to how libcluster is connecting the nodes in the DNSSRV strategy.

NOTE: If you are new to Elixir, diving into Mnesia right out of the gate may not be worth the effort. If it is the right solution to your problem, that is one thing; just be aware that, like many distributed systems, it takes additional preparation and maintenance to keep things running smoothly. It is not a "set it and forget it" component of OTP. That said, this is a great opportunity to learn more about how it works, so if you have some time to spend on it now, it is worth trying to at least get to the bottom of this particular problem.
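Because StatefulSet pods get stable names, the full node list can be computed ahead of time instead of discovered. A sketch, assuming a headless Service and the default `cluster.local` cluster domain; `StatefulSetNodes` is hypothetical, and the Service name is the same guess as above. A fixed host list like this is what a static strategy such as ErlangHosts works from.

```elixir
defmodule StatefulSetNodes do
  @moduledoc """
  Hypothetical helper: builds the stable per-pod DNS names of a StatefulSet
  behind a headless Service, and the Erlang node names derived from them.
  """

  @spec hosts(String.t(), String.t(), String.t(), pos_integer()) :: [String.t()]
  def hosts(statefulset, service, namespace, replicas)
      when is_integer(replicas) and replicas > 0 do
    # Pod ordinals run from 0 to replicas - 1.
    for ordinal <- 0..(replicas - 1) do
      "#{statefulset}-#{ordinal}.#{service}.#{namespace}.svc.cluster.local"
    end
  end

  @spec nodes(String.t(), String.t(), String.t(), String.t(), pos_integer()) :: [node()]
  def nodes(app, statefulset, service, namespace, replicas) do
    # Long node names: "<app>@<fully qualified pod hostname>".
    for host <- hosts(statefulset, service, namespace, replicas), do: :"#{app}@#{host}"
  end
end
```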
It turns out I was mistaken about how libcluster resolves the IPs when using the DNSSRV strategy: it actually uses the DNS name, not the IP. Nevertheless, it may still be worth switching to ErlangHosts for simplicity. However, it may also be that the nodes themselves are using their IP as their node hostname rather than the DNS name, i.e.

I'm still not 100% sure whether the fact that a node starts up without a DNS record is part of the issue here. If it is, that complicates things one way or another, unless k8s has an easy way to assign static IPs to pods that I don't know about.
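A quick way to check which case applies is to look at the host part of `node()` on a running pod. A minimal sketch; `NodeHostCheck` is hypothetical. If it reports an IP, whatever sets the node name (in a Mix release, typically the `RELEASE_NODE` environment variable) is using the pod IP, which changes whenever the pod is recreated.

```elixir
defmodule NodeHostCheck do
  @moduledoc """
  Hypothetical diagnostic: reports whether a node name's host part is an IP
  address literal or a DNS name.
  """

  @spec host_is_ip?(node()) :: boolean()
  def host_is_ip?(node_name \\ node()) do
    case String.split(Atom.to_string(node_name), "@", parts: 2) do
      [_name, host] ->
        # :inet.parse_address/1 accepts IPv4 and IPv6 literals as charlists.
        match?({:ok, _}, :inet.parse_address(String.to_charlist(host)))

      # A name without "@" has no host part to inspect.
      _ ->
        false
    end
  end
end
```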
I am using Cluster.Strategy.Kubernetes.DNSSRV as the strategy. Everything works fine on the initial deploy, but after a few deploys I get the following error:

The only way to recover from it is to delete the Mnesia cache disk files for each of the pods and restart everything.
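If wiping the disk copies is the only way out for now, it may be worth taking a backup first. A minimal sketch, assuming Mnesia can still start on the node; `MnesiaDiskCopy` is hypothetical, while `:mnesia.backup/1` and `:mnesia.system_info(:directory)` are standard Mnesia calls. A backup can later be loaded with `:mnesia.restore/2`.

```elixir
defmodule MnesiaDiskCopy do
  @moduledoc """
  Hypothetical helper: writes a Mnesia backup and reports which directory
  holds this node's disk copies, before anyone deletes them.
  """

  @spec backup_and_locate(Path.t()) :: {:ok, Path.t()} | {:error, term()}
  def backup_and_locate(backup_file) do
    # :mnesia.backup/1 requires Mnesia to be running; it writes the tables to one file.
    case :mnesia.backup(String.to_charlist(backup_file)) do
      :ok -> {:ok, List.to_string(:mnesia.system_info(:directory))}
      {:error, reason} -> {:error, reason}
    end
  end
end
```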
The only thing using Mnesia so far is Pow.
I am new to Elixir, but I am happy to gather more information if someone points me in the right direction.
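For gathering more information, a snapshot like the following, taken on each pod (for example from a remote console with `bin/my_app remote`, where `my_app` stands in for your release name), would show how each node names itself and what Mnesia thinks the cluster looks like. `ClusterDiagnostics` is a hypothetical module, not part of Pow or libcluster.

```elixir
defmodule ClusterDiagnostics do
  @moduledoc """
  Hypothetical snapshot of node and Mnesia state for a bug report.
  """

  @spec snapshot() :: map()
  def snapshot do
    base = %{
      node: node(),
      connected: Node.list(),
      # :yes, :no, :starting or :stopping
      mnesia_running: :mnesia.system_info(:is_running)
    }

    if base.mnesia_running == :yes do
      # Schema members versus nodes actually running Mnesia right now.
      Map.merge(base, %{
        db_nodes: :mnesia.system_info(:db_nodes),
        running_db_nodes: :mnesia.system_info(:running_db_nodes),
        directory: List.to_string(:mnesia.system_info(:directory))
      })
    else
      base
    end
  end
end
```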