Observe io.netty.allocator.type for custom Netty allocator #10292
Comments
Sorry you had to jump through hoops to get your allocator enabled. A custom allocator was implemented to have lower memory usage than the default allocator could provide. Maybe it's a bit counterintuitive, but for gRPC purposes we consider our custom allocator to be "the default".
It isn't all that well documented because people haven't really needed it. I think I might have heard of one person using What problem did you have such that you wanted unpooled buffers?
Thanks for the information. TL;DR: there is a direct memory leak that appears to be caused by the Netty pooled allocator type. Switching to the Netty unpooled allocator type fixes the leak. Here is a high-level view of the system:
There are conditions where the system's direct memory can leak. Typically, when the Flink job restarts, we observe an increase in direct memory usage. So far, I have isolated the direct memory increase to the Netty pooled allocator: when I switch to the unpooled allocator type (using the steps detailed in the initial question), the direct memory leak no longer occurs. Unfortunately, I do not fully understand what is going on or why the leak occurs. I suspect that during a Flink restart something related to the Netty pools is not properly cleaned up, but I need to investigate further on my side to get a more detailed picture. There is a chance we are running into Netty issue netty/netty#12067, but I cannot prove it.
If the allocator itself is leaking, and it isn't something simple like channels/servers being leaked or not shut down, then I'd suspect the ClassLoader is being leaked and that's the real problem. Assuming it isn't something easy, like a gRPC thread that is still running, I'd suspect ThreadLocals. Any Thread that calls Netty (which includes a thread issuing an RPC) can retain a ThreadLocal which holds onto the ClassLoader. The only sure way to clean those up is to shut down the threads. See netty/netty#6891 (comment). This is mostly a JDK problem but partly a Netty problem.
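To make the ThreadLocal point concrete, here is a minimal JDK-only sketch. It does not use Netty or gRPC: `CACHE` is a hypothetical stand-in for a library-owned per-thread cache, and the class and method names are invented for illustration. It only shows that a ThreadLocal value stays attached to a long-lived thread across tasks and is gone once that thread has been replaced. It does not reproduce the ClassLoader leak itself.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadLocalRetentionSketch {
    // Hypothetical stand-in for a per-thread cache owned by a library.
    private static final ThreadLocal<byte[]> CACHE = new ThreadLocal<>();

    /** Two tasks on the same pooled thread: does task 2 still see task 1's value? */
    public static boolean valueSurvivesOnLongLivedThread() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Task 1 stores a value in the worker thread's ThreadLocal.
            pool.submit(() -> CACHE.set(new byte[1024])).get();
            // Task 2 runs later on the same worker thread and reads it back.
            return pool.submit(() -> CACHE.get() != null).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    /** Shut the first thread down, then read from a fresh thread. */
    public static boolean valueVisibleAfterThreadReplaced() {
        ExecutorService first = Executors.newSingleThreadExecutor();
        ExecutorService second = Executors.newSingleThreadExecutor();
        try {
            first.submit(() -> CACHE.set(new byte[1024])).get();
            // Ending the thread is what releases its ThreadLocal entries.
            first.shutdown();
            first.awaitTermination(5, TimeUnit.SECONDS);
            return second.submit(() -> CACHE.get() != null).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            first.shutdownNow();
            second.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(valueSurvivesOnLongLivedThread());  // prints "true"
        System.out.println(valueVisibleAfterThreadReplaced()); // prints "false"
    }
}
```

In the scenario described above, the retained value would be an object loaded by the application's ClassLoader, so the ClassLoader stays reachable for as long as the thread lives.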
It would be fine to observe But the most important part of this issue is the leak. The other things really don't matter, because they are just workarounds. But the leak is also quite bothersome to debug. I think the best thing to do is to shut down the Threads used by Netty/gRPC so the ClassLoader can be released, and see if the leak is addressed. We could improve the behavior of
Let's observe
Thanks @ejona86 for the information. I had some time to dig into the application's behavior a bit more on my side. With respect to your suggestion I also did some debugging to better understand the cleanup process these threads go through. In all cases the
This case of Given the observations I made locally, I am not sure if the memory leak scenario you described in netty/netty#6891 (comment) would apply to this case (it is very possible I am missing some understanding though). Can the ClassLoader still leak in the case that the Circling back to the allocator type. Under the assumption the direct memory leak is somehow related to
I was not talking about The problem I was describing occurs when you call gRPC from a long-lived thread. gRPC will encode a message using the allocator, and Netty may then use a ThreadLocal to store a cache. Regardless of what information is being stored, this will leak, as the value will be InternalThreadLocalMap, which has a reference to Netty's ClassLoader and keeps it alive for the remainder of the Thread's life. InternalThreadLocalMap isn't really special in the issue, as storing Any usage of Netty from a thread risks that FastThreadLocal is used. There are two options for how unpooled helps:
Note that
I modified code that depends on `grpc-java` to use the `unpooled` allocator type for `netty`. In order to accomplish this I had to set:
- `unpooled` for the `io.netty.allocator.type` property. See https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/ByteBufUtil.java#L78
- `false` for the `io.grpc.netty.useCustomAllocator` property. I am using `v1.32.1` but this also applies to the latest version `v1.56.x`. See https://github.com/grpc/grpc-java/blob/v1.32.x/netty/src/main/java/io/grpc/netty/Utils.java#L126 and https://github.com/grpc/grpc-java/blob/v1.56.x/netty/src/main/java/io/grpc/netty/Utils.java#L136

Why is the default value for `io.grpc.netty.useCustomAllocator` `true`? In order for the code to use the `ByteBufAllocator.DEFAULT`, the `io.grpc.netty.useCustomAllocator` property must be overridden to `false`, which seems to contradict the idea of a default. Would it make more sense for `io.grpc.netty.useCustomAllocator` to default to `false` so that `ByteBufAllocator.DEFAULT` is used by default?

I am asking because I had to do some debugging to figure out why only supplying `io.netty.allocator.type=unpooled` did not change the allocator type as I had expected. Thanks!
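The interaction between the two properties can be sketched as a small decision function. This is a simplified model of the behavior described in this issue, not the actual grpc-java or Netty code. The class name, method name, and returned labels are hypothetical, and the `pooled` fallback for `io.netty.allocator.type` is an assumption about non-Android platforms.

```java
import java.util.Locale;
import java.util.Properties;

public class AllocatorSelectionSketch {
    /** Hypothetical model of which allocator ends up in use for a given set of properties. */
    public static String resolveAllocator(Properties props) {
        // gRPC side: the custom allocator is chosen unless the property is explicitly "false".
        boolean useCustom = Boolean.parseBoolean(
                props.getProperty("io.grpc.netty.useCustomAllocator", "true"));
        if (useCustom) {
            return "grpc-custom-pooled";
        }
        // Netty side: only consulted once gRPC falls back to ByteBufAllocator.DEFAULT.
        // The "pooled" default here is an assumption for non-Android platforms.
        String type = props.getProperty("io.netty.allocator.type", "pooled")
                .trim().toLowerCase(Locale.ROOT);
        return "unpooled".equals(type) ? "netty-unpooled" : "netty-pooled";
    }

    public static void main(String[] args) {
        Properties onlyNetty = new Properties();
        onlyNetty.setProperty("io.netty.allocator.type", "unpooled");
        System.out.println(resolveAllocator(onlyNetty)); // prints "grpc-custom-pooled"

        Properties both = new Properties();
        both.setProperty("io.netty.allocator.type", "unpooled");
        both.setProperty("io.grpc.netty.useCustomAllocator", "false");
        System.out.println(resolveAllocator(both)); // prints "netty-unpooled"
    }
}
```

In a real deployment both values would typically be passed as JVM flags, for example `-Dio.netty.allocator.type=unpooled -Dio.grpc.netty.useCustomAllocator=false`, so that they are in place before the gRPC and Netty classes that read them are initialized.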