Silently catches OutOfMemoryErrors without rethrowing #301
Comments
@rap1ds Hi Mikko, thanks for pinging about this - and for the crystal clear report and references. Really appreciate it 🙏
This was indeed a very unfortunate decision IMO, and continues to cause endless hassles. I'm definitely not keen on rethrowing all errors, since that'll include the type of assertion failures that will be part of normal operation (certainly in Clojure, at least). Will give this some thought next time I'm on Carmine. May be a bit messy, but one solution would be conditionally rethrowing OOMs, etc. Is there any urgency from your side for a fix in Carmine?
Oh, I should add: sorry for the trouble! This was definitely an oversight.
@ptaoussanis Thanks for the super duper fast response 😄
Yeah, I totally understand. Rethrowing all errors would be a big breaking change to the existing behavior. Letting the user configure whether errors are rethrown was one possible solution I had in mind. Conditionally rethrowing some errors, like OutOfMemoryError, is also a good candidate (or the opposite: conditionally not rethrowing some errors, like AssertionError).
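For illustration, conditional rethrowing could look roughly like the sketch below. The names `critical-error?` and `call-handler` are hypothetical and not part of Carmine's API, and treating every `VirtualMachineError` as critical is only an assumed policy.

```clojure
;; Hypothetical helpers for illustration only; not Carmine's implementation.
(defn critical-error?
  "True for JVM-level errors that should not be swallowed (assumed policy).
  VirtualMachineError covers OutOfMemoryError and StackOverflowError."
  [t]
  (instance? java.lang.VirtualMachineError t))

(defn call-handler
  "Calls (handler msg). Critical errors are rethrown; any other Throwable,
  including AssertionError, is logged and swallowed, returning :error."
  [handler msg]
  (try
    (handler msg)
    (catch Throwable t
      (when (critical-error? t)
        (throw t))
      (println "Handler error:" (.getName (class t)))
      :error)))
```

With this policy an assertion failure in the handler is still logged and swallowed as before, while an OutOfMemoryError propagates to the caller.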
Definitely not, but thanks for asking, really appreciate it 🙏 I'm trying to hunt down the root cause of the memory leak anyway, so that will hopefully be the ultimate fix, if I find it 😄 In addition, as a workaround I was thinking of wrapping our handler function in a try-catch, catching Throwables there, and shutting down the process. I haven't fully tested it yet, but the idea is to do something like this:
;; handler function:
(try
  (do-the-work ,,,)
  (catch Exception e
    (do-some-logging e))
  (catch Throwable t
    (do-some-logging t)
    ;; shutdown hook will handle stopping the system
    ;; use future to avoid dead-lock
    (future (.exit (Runtime/getRuntime) 0))
    ;; rethrow
    (throw t)))
That's a good idea 👍 I've just added support to Encore's experimental master for easily catching Something like that should hopefully be reasonable. Will look closer at switching over when I'm next doing batched work on Carmine. In the meantime feedback still very welcome if anyone wants to propose alternatives. I'm currently focused on Telemere, so it'll be a few more weeks before I switch context back to Carmine.
@rap1ds Hi Mikko! Just to update- I've given some more thought to rethrowing critical handler errors, and am a little hesitant for a few reasons:
What I propose instead is just documenting the current behaviour, and recommending that users include appropriate error catching in their handler fn (much like you have in your "workaround" above). That's a little more verbose, but I think would help maximize control and visibility of semantics. That'd also help resolve the question of exactly which errors to consider critical, since it'd be up to the user to decide what's relevant in their case. Does that seem reasonable to you?
@ptaoussanis Thanks for coming back to the issue. Yes, that sounds reasonable. I think it's a good idea to let the user decide what to do. I did some investigation and thinking about what the options are to handle OOM on the user's side, and here's what I found: 1. Use JVM flag
Thanks for the thoughtful and clear feedback Mikko 🙏 I'm not keen on Option 4 (O4) since to my understanding that wouldn't provide much benefit over O2 (which I believe is also simpler and more idiomatic). BTW I'm inclined to prefer O2 over O3 since O3 is coupled to the surrounding catching behaviour, which isn't within your control. O2 seems a more direct capture of your intentions: if this thing OOMs, do a system shutdown. If you're concerned about duplicating code here and in your UncaughtExceptionHandler, you could just pull that code out into a function and call it from both. So it seems your remaining discomfort comes from here:
Is there some specific place you have in mind? Most interactions with the Carmine API will be synchronous. So the only places I can think of off-hand that might be relevant to silent OOMs would be background threads from the message queue worker or Pub/Sub listener.
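As a rough sketch of the suggestion above to pull the shutdown code into one function and call it from both the handler and the UncaughtExceptionHandler: the names `shutdown!`, `wrap-fatal`, and `install-uncaught-handler!` are hypothetical, and the non-zero exit status and the choice of `VirtualMachineError` as the fatal class are assumptions. Only `Thread/setDefaultUncaughtExceptionHandler` and `System/exit` are standard JVM calls.

```clojure
;; Hypothetical names throughout; a sketch, not Carmine's API.
(defn shutdown!
  "Assumed shutdown policy: report the error, then exit with a non-zero status."
  [^Throwable t]
  (binding [*out* *err*]
    (println "Fatal error, shutting down:" (.getName (class t))))
  ;; Exit on another thread so shutdown hooks cannot deadlock with the caller.
  (future (System/exit 1)))

(defn wrap-fatal
  "Wraps handler so that a VirtualMachineError (e.g. OutOfMemoryError)
  triggers (on-fatal t) and is then rethrown."
  [handler on-fatal]
  (fn [msg]
    (try
      (handler msg)
      (catch java.lang.VirtualMachineError t
        (on-fatal t)
        (throw t)))))

(defn install-uncaught-handler!
  "Calls (on-fatal t) for any Throwable that escapes a thread uncaught."
  [on-fatal]
  (Thread/setDefaultUncaughtExceptionHandler
    (reify Thread$UncaughtExceptionHandler
      (uncaughtException [_ _thread t]
        (on-fatal t)))))

(comment
  ;; Usage: the same policy function is shared by both call sites.
  (install-uncaught-handler! shutdown!)
  (def handler (wrap-fatal do-the-work shutdown!)))
```

Passing the policy in as `on-fatal` keeps the shutdown behaviour in one place and makes the wrapper easy to exercise without actually exiting the JVM.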
That's a good point. Indeed, that would probably be more straightforward and easier to reason about.
No, I don't have any specific place in mind. And my discomfort on this one is really minor :) I guess if the thing OOMs it's most likely going to happen in the handler, since that's where the heaviest work is going to happen.
Hi!
First, the root cause: we have a memory leak somewhere in our own application code. That's a separate, non-carmine-related issue that I'm trying to solve separately.
But how this relates to carmine is that the memory leak caused our handler code to throw java.lang.OutOfMemoryError. This error is a non-recoverable error that should not be caught and should propagate and, eventually, kill the process.
We run our app in the cloud, and whenever the worker process dies, a new one is started automatically. However, this didn't happen, because it seems that carmine catches Throwables (the link points to version 3.2.0, which is the one we use), including errors like OutOfMemoryError.
I believe Throwables shouldn't be caught, or if they are caught for logging purposes like here, they should be rethrown. This will let the app crash, which is the only sensible thing to do for non-recoverable errors.
I searched for old issues and PRs and found this #20, which is probably the origin of catching Throwables instead of Exceptions. In the PR, the argument for the change was that this way assertion errors can be caught. However, to my understanding, assertions throw an Error, not an Exception, for a reason, and they shouldn't be caught.
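A small sketch of the distinction described above, assuming the default `*assert*` setting (assertions enabled); the function name is hypothetical. A failed `assert` throws `java.lang.AssertionError`, which is an `Error` and therefore skips a `catch Exception` clause.

```clojure
(defn failed-assertion-type
  "Reports which catch clause a failed assertion lands in."
  []
  (try
    (assert (= 1 2) "numbers differ")
    :no-error
    (catch Exception _ :exception) ; not reached: AssertionError is not an Exception
    (catch Error _ :error)))       ; reached: AssertionError extends Error
```

This is why catching only `Exception` lets assertion failures through, and why catching `Throwable` also picks up errors such as OutOfMemoryError.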