# Setting name for Replicated Cache returns no node (#101)
Hi @gaiabeatrice, glad to know you use and like the project :) About the error: when you give a name to a cache, the adapter uses that name internally to create and identify the cluster and/or group, along with other things (the module itself cannot be used for this, because the same cache module may back several cache instances; that's why the name is used internally instead of the cache module). So when you call the cache you can pass the name or the PID, e.g. `MyCache.nodes(:mycache)`. Or you can set the default in the cache definition, like so:

```elixir
defmodule MyApp.Caches.MyCache do
  use Nebulex.Cache,
    default_dynamic_cache: :my_cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Replicated
end
```
Another option: don't pass the name at all, and start the cache as:

```elixir
children = [
  # ...
  {MyApp.Caches.MyCache, []}
  # ...
]
```

Then use the dynamic cache only for the tests, as you are doing. That means in production you go with the default definition and setup, so that plain calls work without passing a name, as sketched below.
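A minimal sketch of that split (the `:test_cache` name is illustrative, not from this thread):

```elixir
# In production, the default instance answers plain calls:
MyApp.Caches.MyCache.nodes()

# In a test, start a dynamic instance under its own name and target it:
{:ok, _pid} = MyApp.Caches.MyCache.start_link(name: :test_cache)
MyApp.Caches.MyCache.nodes(:test_cache)
```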
Please try it out and let me know if that works for you, stay tuned!
---

Thank you so much for the prompt response. I think my tests were failing because two caches were involved in the module. I added the suggested configuration and that helped.
---

Glad to hear that!

Yeah, maybe there is a conflict with the names. I would need more info about how you are doing the tests; if you can provide something similar to what you have and how you are running the tests, perhaps I could help you more. Also, Nebulex's own tests are done that way, using dynamic caches, so maybe you can use them as an example. Anyway, let me know if there is anything I can help you with, I'm glad to help 😄!
---

I am still having production errors, and I think it probably was unrelated to the name after all. Every day at the same time I get an error for my caches. Yesterday I started getting this error as well:

```text
GenServer MyApp.Caches.MyCache.Bootstrap terminating
** (MatchError) no match of right hand side value: {[:ok, :ok, :ok], [:"<serveraddress>"]}
    (nebulex 2.0.0-rc.2) lib/nebulex/adapters/replicated.ex:587: Nebulex.Adapters.Replicated.Bootstrap.maybe_run_on_nodes/3
    (nebulex 2.0.0-rc.2) lib/nebulex/adapters/replicated.ex:516: Nebulex.Adapters.Replicated.Bootstrap.handle_info/2
    (nebulex 2.0.0-rc.2) lib/nebulex/adapters/replicated.ex:527: Nebulex.Adapters.Replicated.Bootstrap.handle_info/2
    (stdlib 3.14) gen_server.erl:689: :gen_server.try_dispatch/4
    (stdlib 3.14) gen_server.erl:765: :gen_server.handle_msg/6
    (stdlib 3.14) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
```

All of the caches have a similar configuration, so for the sake of clarity this is one:

```elixir
config :myapp, MyApp.Caches.MyCache,
  primary: [
    gc_interval: :timer.hours(24)
  ],
  stats: true
```

and this is the cache:

```elixir
defmodule MyApp.Caches.MyCache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Replicated,
    primary_storage_adapter: Nebulex.Adapters.Local
end
```

If I ssh to the cluster I can see that all the caches seem to be running and have entries.
---

First, about this error: when the replicated cache starts, it tries to synchronize with the other nodes, and as far as I can see three nodes reply `:ok` while one fails (that's the `{[:ok, :ok, :ok], [:"<serveraddress>"]}` in the `MatchError`). Regarding this, I assume you have 4 nodes, is that right? For example, what are the names of the other three nodes?

Out of curiosity, what do you get if you list the nodes from one of the servers?
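For example, something along these lines (a sketch; the calls are the ones used elsewhere in this thread):

```elixir
Node.list()                    # the Erlang-level cluster view
MyApp.Caches.MyCache.nodes()   # the nodes Nebulex sees for this cache
```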
---

@gaiabeatrice I found the issue: during the sync-up process the replicated adapter wasn't using the given cache name, it always assumed the default one, therefore dynamic caches would fail. I'll push a fix today, I'll keep you posted!
---

@gaiabeatrice I pushed a fix to the master branch, could you try it out and let me know if it actually solves the error you currently have? In case you still get the error, we should get a more specific error message and see where the problem is. Stay tuned!
---

@gaiabeatrice any update?
---

Hi, I am sorry. My errors stopped appearing a few days ago. I didn't use the latest master; I am still on `2.0.0-rc.2`. I will absolutely try your fix if the error presents itself again. I am really impressed by how responsive you are! This is one of my favorite libraries!
---

Glad to hear that! I will close the issue, but please feel free to re-open it if the error shows up again even with the fix. Thanks!!
---

Hi @cabol, thanks for a great library! I seem to also be running into a similar issue with the Replicated adapter, even when running from the latest master commit. I am using libcluster with the EPMD strategy, so all I specify in the config is the names of the two hosts. The module for my replicated cache is pretty barebones, and the config is minimal as well.

After starting the IEx sessions, I can confirm that both nodes see each other. Putting a value in the replicated cache on one node and then getting the value from the second node works as expected. Additionally, I've confirmed that both nodes show up in the cache's cluster view.

However, terminating one of the nodes while operating on the cache from the other raises the same error @gaiabeatrice posted earlier in the thread. To further corroborate this, the following code, run at the same time, doesn't error and returns the expected value:
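(A reconstruction; the original snippet was lost, and the key and cache names here are illustrative.)

```elixir
MyApp.Cache.Replicated.transaction(fn ->
  MyApp.Cache.Replicated.get("handoff_state")
end)
```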
For context, I'm trying to actively hand off some state from a terminating node, so the cache call runs exactly in that shutdown window. I can continue simply wrapping it in the transaction per the snippet above, but I figured this wasn't the desired behavior from the caching side. Thanks for any thoughts!
---

Hey @dwmcc!! Certainly, it is the same error, but as far as I can see the cause is different; perhaps the real cause is the one you are mentioning, which makes a lot of sense. The initial issue was about using the replicated adapter with dynamic caches, and that was the fix. This is a different thing that has nothing to do with the dynamic caches and names at all. So, let me elaborate on what you have reported, and then maybe we can create a separate ticket for it, but let's discuss it here first.

I was able to reproduce the error you mention, and you're right, it is about timing. When a cache node is terminated, the cluster view takes a moment to get updated, so there is a short window during which other nodes may still dispatch RPC calls to the terminated node. There are a couple of things I can come up with right now; among them, retrying the operation until the cluster view converges (the retry option referred to as solution (2) below). BTW, you don't even need the transaction: adding a short sleep will also prevent the error, though I totally agree it shouldn't be the solution.
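For illustration only, the short-sleep observation amounts to something like this (a sketch of the race, not a recommended fix; names are illustrative):

```elixir
# On the surviving node, right after the peer goes down:
Process.sleep(100)  # give the cluster view a moment to converge
MyApp.Cache.Replicated.get("handoff_state")
```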
In the meantime, I will continue trying different alternatives. Let me know your thoughts!
---

@dwmcc you mentioned you're calling `:init.stop()` when terminating a node; you could try making the cache leave the cluster first:

```elixir
def stop do
  :ok = MyApp.Cache.Replicated.leave_cluster()
  :init.stop()
end
```

I pushed some changes to master adding `leave_cluster/0` and `join_cluster/0` to the adapters, so you can give this a try.
---

Good morning @cabol, I'm thoroughly impressed with the response time here! TL;DR: I just pulled your changes from master and have tested four times now; previously I was able to repro the error nearly every time, but now I'm not able to reproduce it at all! It seems to me that leaving the Nebulex cluster is the cleanest solution on my end.

Thanks for listing those potential solutions; I largely agree with you. The specific scenario where I'm experiencing this issue is handing off the state of a GenServer on the terminating node and restarting it on a running node. Solution (2) is a good idea, albeit I was hopeful I could find a cleaner way than repeatedly retrying (also since the new GenServer may need to do some work right away after init!). I also checked the docs for other options I could pass, without luck. All of that is to say, I think these are all potentially valid solutions. I didn't get around to it last night, but I was going to try to force the node to leave the libcluster-formed cluster before running `:init.stop()`.

If you're amenable to keeping these new functions in the replicated and partitioned adapters, I think that's an excellent solution for my use-case. Thanks for pushing these changes so quickly!
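For the record, a hypothetical sketch of the hand-off pattern described above (the module, key, and cache names are illustrative, not from this thread):

```elixir
defmodule MyApp.HandoffWorker do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    # terminate/2 only runs on shutdown if the process traps exits.
    Process.flag(:trap_exit, true)
    # Resume any state a terminating peer left in the replicated cache.
    state = MyApp.Cache.Replicated.get({:handoff, __MODULE__}) || %{}
    {:ok, state}
  end

  @impl true
  def terminate(_reason, state) do
    # Stash state so the worker can be restarted on a surviving node.
    MyApp.Cache.Replicated.put({:handoff, __MODULE__}, state)
    :ok
  end
end
```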
---

Tangentially, for my understanding: why was Nebulex attempting to run an RPC call against the terminating node in the first place? Is it the after-get deletion action of the operation that has to reach every node? Would a plain read have been served locally instead?
---

Totally agree, leaving the cluster is the cleanest way; the other alternatives may have important side effects.

Indeed, it is the after-get deletion that triggers the RPC: in the replicated adapter, writes and deletions are propagated to all the nodes.

Right, a plain read is resolved locally, so no RPC is involved there.

Yes, count on that, the new functions will stay!
---

@cabol excellent! I'll continue using the `leave_cluster/0` approach. Thanks for your help, and looking forward to what's coming next!
---

**Original issue description:**

Hello, this is one of my favorite Elixir libraries! I am having one issue (running `2.0.0-rc.2`).

I have a cache that looks like this:
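(Presumably the same module shown earlier in the thread:)

```elixir
defmodule MyApp.Caches.MyCache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Replicated,
    primary_storage_adapter: Nebulex.Adapters.Local
end
```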
I tried setting the name in the `children` list of the `Application` module, along these lines:
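(A reconstruction; the snippet itself was lost, and `:my_cache` matches the name discussed earlier in the thread.)

```elixir
children = [
  # ...
  {MyApp.Caches.MyCache, [name: :my_cache]}
  # ...
]
```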
I also have a test helper for async testing that I call in the setup of the tests. If I run this, the tests pass and the app runs. However, if I try to get the nodes, I get an empty list:
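(Reconstructed symptom, per the discussion above:)

```elixir
MyApp.Caches.MyCache.nodes()
#=> []
```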
This is causing errors in production when some cron jobs are running.
I tried making some changes, for example removing the name from the children list in `Application`, and I can now see the nodes running; however, my tests are then failing.
I also tried setting the name in my configuration, but I am still getting errors about the nodes not showing up.
What is the best way to navigate this?