Unhandled :erpc failures #140
Hey!
Yeah, very good catch, I already pushed a fix for it, now the `{:erpc, _error}` cases are handled.
Well, not sure, because according to the documentation, if the node was down, the error should be `{:erpc, :noconnection}`.
Right, actually I've been working on a feature to allow ignoring the errors when using the annotations, something I expect to push soon.
Well, if you need a distributed/partitioned cache and you don't want to deal with an Elixir/Erlang cluster, you can use the Redis adapter. And yes, you can set up a multi-level topology with the local adapter as L1 and the Redis adapter as L2, backed by a Redis cluster, etc. See the Redis example for more information. Stay tuned!
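For illustration, a minimal sketch of that topology under Nebulex v2, assuming the `nebulex_redis_adapter` package; module names and connection options are placeholders:

```elixir
defmodule MyApp.Cache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Multilevel

  defmodule L1 do
    # In-process near cache
    use Nebulex.Cache,
      otp_app: :my_app,
      adapter: Nebulex.Adapters.Local
  end

  defmodule L2 do
    # Shared Redis-backed level (nebulex_redis_adapter)
    use Nebulex.Cache,
      otp_app: :my_app,
      adapter: NebulexRedisAdapter
  end
end

# config/config.exs (placeholder host/port)
config :my_app, MyApp.Cache,
  model: :inclusive,
  levels: [
    {MyApp.Cache.L1, gc_interval: :timer.hours(12)},
    {MyApp.Cache.L2, conn_opts: [host: "redis.example.svc", port: 6379]}
  ]
```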
@cabol Thank you for the quick response. Upon further investigation, I am getting a lot of these `:erpc` errors, so an option to ignore exceptions when using the cache decorators would be immensely helpful. I am looking at the Redis example and it looks like a possible solution (we already have a Redis cluster in our k8s env for other stuff). I was concerned about the data-type issue when using Erlang terms as keys and/or values, but I see that the adapter handles that. I will take a look at that sometime soon. Thanks again for the response, and I'll keep you updated if I find anything. I look forward to the new features!
@mjquinlan2000 I just pushed the new feature for the caching decorators (#141). Please try it out and let me know your thoughts.
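A hypothetical usage sketch (the `:on_error` option name below is a guess; the actual shape is whatever #141 defines):

```elixir
# Hypothetical: :on_error is a guess at the option added by #141
@decorate cacheable(cache: MyPartitionedCache, key: attrs.id, on_error: :nothing)
def get_user(attrs) do
  # On an RPC error the decorator would skip the cache instead of raising
  Repo.get(User, attrs.id)
end
```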
Hey @mjquinlan2000, here are some thoughts on how you can handle these errors.
If you are using/calling the cache directly (without annotations/decorators), you can wrap the logic like:

```elixir
# Requires the :retry dependency and `use Retry` in the module
retry with: constant_backoff(1000) |> Stream.take(10), rescue_only: [Nebulex.RPCError] do
  MyPartitionedCache.get(key)
after
  result -> result
else
  _error -> nil
end
```

Or you can use the `@retry` annotation:

```elixir
# Requires `use Retry.Annotation` in the module
@retry with: constant_backoff(1000) |> Stream.take(10), rescue_only: [Nebulex.RPCError]
def some_function(attrs) do
  values = MyPartitionedCache.get(attrs[:id]) # just an example
  # rest of the logic ...
end
```

And you can also use it together with the caching decorators:

```elixir
@retry with: constant_backoff(1000) |> Stream.take(10), rescue_only: [Nebulex.RPCError]
@decorate cacheable(cache: MyPartitionedCache, key: attrs.id)
def some_function(attrs) do
  # your logic ...
end
```

I think in this way you can control the RPC errors in a better way, even without ignoring them. And the good thing about the `retry` library is that it composes with the decorators, as the last example shows.

Also, as a heads-up, Nebulex v3 is ongoing, and one of the main features is the new API: it will provide an Ok/Error tuple API too (aside from the functions that raise).

Wrapping up, try the alternatives out and let me know your thoughts. I stay tuned! Thanks!
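Purely as a sketch, the Ok/Error tuple API could be used along these lines (the function name and return shapes below are guesses until v3 ships):

```elixir
# Hypothetical v3-style shape; names are guesses until v3 is released
case MyPartitionedCache.fetch(key) do
  {:ok, value} -> value
  {:error, _reason} -> nil  # e.g., an RPC/node error treated as a miss
end
```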
@cabol I have seen the retry library before, but I have never used it. Most of the cache errors that concern me happen on user requests, and I'd rather just ignore the raised error and fall back to the DB so that I don't keep users waiting. That said, I think the retry library will help me in some other areas of the application, so I'm going to look into using it.

After more investigation, it's plain to see that something is causing the pods to crash in k8s, and this abrupt termination is what causes the :erpc errors. In all other cases, the Erlang nodes are shut down gracefully and caching is handled correctly. Since the implementation of the multilevel cache with local and partitioned levels was put in, I am only getting a small number of errors, so I am going to try to fix the pod-crashing issue first.

I have not had a chance to look into the new decorator implementation, and I am slated to be doing other work for the entire month of November, but it is on my list and I'll try to sneak some testing in. Thanks
Sounds good. In the meantime, I will close the issue, but feel free to reopen it, or even create a new one, if you come across something not working properly. Stay tuned, and thanks!
I am getting a lot of errors while using the `Nebulex.Caching` decorators, caused by an unhandled case clause in `Nebulex.RPC.rpc_call/6` on rpc.ex:141. The clause that does not match is `{:erpc, :timeout}`, along with some other `{:erpc, _error}` exceptions.

Here is some background on how the app is set up:

- Erlang 24.1
- Elixir 1.12.3
- Running on a k8s cluster with a horizontal pod autoscaler (pods created or destroyed dynamically depending on resource usage)
- Nebulex set up running `Nebulex.Adapters.Partitioned`
- Node clustering done with libcluster over the Kubernetes metadata API; the polling interval is set at 10s (I might lower this to see if that helps), roughly the shape sketched below
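For context, the libcluster topology is shaped roughly like this sketch (the node basename and selector are placeholders, not my actual values):

```elixir
# config/config.exs; placeholder names, only polling_interval reflects the 10s above
config :libcluster,
  topologies: [
    k8s: [
      strategy: Cluster.Strategy.Kubernetes,
      config: [
        mode: :ip,
        kubernetes_node_basename: "myapp",
        kubernetes_selector: "app=myapp",
        polling_interval: 10_000
      ]
    ]
  ]
```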
I think the main solution is to handle the `{:erpc, _error}` clause in a better way, but I'm having a hard time understanding the `:timeout` issue when reading through the Erlang documentation. I suspect it has something to do with k8s bringing pods down while Nebulex is still referencing a node that no longer exists.

A secondary concern is the number of errors that can arise when using the caching decorators if other nodes in the cluster cannot be contacted, and how those errors are handled. It seems that right now an error is raised, and this can (and has) caused issues with some API requests. Is there a way to configure the cache to treat these errors as a "cache miss", so that I can fetch data directly from the data store on an exception without halting execution? Is there a way to set the `:erpc` timeout explicitly? Also, might my cache configuration be causing issues with this as well?
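To illustrate, something like this sketch is the behavior I'm after (placeholder names; it assumes the partitioned adapter accepts a per-command `:timeout` option and raises `Nebulex.RPCError` on RPC failures):

```elixir
# Sketch: treat RPC failures as a cache miss (placeholder names)
def fetch_user(id) do
  case safe_get(id) do
    nil -> load_from_db(id)   # cache miss or RPC failure: hit the data store
    user -> user
  end
end

# Assumes the partitioned adapter takes a per-command :timeout (milliseconds)
defp safe_get(key) do
  MyPartitionedCache.get(key, timeout: 2_000)
rescue
  _e in Nebulex.RPCError -> nil  # swallow node/RPC errors instead of halting
end

defp load_from_db(id), do: Repo.get(User, id)
```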
I'm not sure if a multi-level cache would help with this problem (and I probably haven't given this part of the app enough love or done nearly enough research on caching strategies). Any help here would be greatly appreciated.
Thank you