Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Observe io.netty.allocator.type for custom Netty allocator #10292

Closed
jaredbwasserman opened this issue Jun 20, 2023 · 8 comments · Fixed by #10543
Closed

Observe io.netty.allocator.type for custom Netty allocator #10292

jaredbwasserman opened this issue Jun 20, 2023 · 8 comments · Fixed by #10543
Assignees
Milestone

Comments

@jaredbwasserman
Copy link

I modified code that depends on grpc-java to use the unpooled allocator type for netty. In order to accomplish this I had to:

  1. Pass in unpooled for the io.netty.allocator.type property. See https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/ByteBufUtil.java#L78
  2. Pass in false for the io.grpc.netty.useCustomAllocator property. I am using v1.32.1 but this also applies to the latest version v1.56.x. See https://github.com/grpc/grpc-java/blob/v1.32.x/netty/src/main/java/io/grpc/netty/Utils.java#L126 and https://github.com/grpc/grpc-java/blob/v1.56.x/netty/src/main/java/io/grpc/netty/Utils.java#L136

Why is the default value for io.grpc.netty.useCustomAllocator true? In order for the code to use the ByteBufAllocator.DEFAULT the io.grpc.netty.useCustomAllocator property must be overridden to false, which seems to contradict the idea of a default. Would it make more sense for io.grpc.netty.useCustomAllocator to default to false so that ByteBufAllocator.DEFAULT is used by default?

I am asking because I had to do some debugging to figure out why only supplying io.netty.allocator.type=unpooled did not change the allocator type as I had expected. Thanks!

@temawi
Copy link
Contributor

temawi commented Jun 20, 2023

Sorry you had to jump through hoops to get your allocator enabled.

A custom allocator was implemented to have lower memory usage than the default allocator could provide. Maybe it's a bit counterintuitive, but for gRPC purposes we consider our custom allocator to be "the default".

@ejona86
Copy link
Member

ejona86 commented Jun 20, 2023

It isn't all that well documented because people haven't really needed it. I think I might have heard of one person using NettyChannelBuilder.withOption(ChannelOption.ALLOCATOR, yourAlloc). We made our own custom allocator to reduce memory usage (especially direct memory), and it seems it has been working.

What problem did you have such that you wanted unpooled buffers?

@jaredbwasserman
Copy link
Author

Thanks for information.

TL; DR - There is a direct memory leak which appears to be caused by the netty pooled allocator type. Switching to the netty unpooled allocator type fixes the leak.

Here is a high-level view of the system:

  • Top level is a Flink application running Flink version 1.13.5
  • Flink is using Google Pub/Sub as an input source. We are depending on com.google.cloud:libraries-bom:8.1.0
  • The BOM pulls in com.google.cloud:google-cloud-pubsub:jar:1.108.0
  • Pubsub in turn depends on io.grpc:grpc-netty-shaded:jar:1.30.2 (looks like I had the version incorrect in the initial question)

There are conditions where the system's direct memory can leak. Typically, when the Flink job restarts, we observe an increase in direct memory usage. So far, I have isolated the direct memory increase to the netty pooled allocator. This is because when I switch to using the unpooled allocator type (using the steps detailed in the initial question), there is no longer a direct memory leak.

Unfortunately, I do not fully understand everything that is going on and why the memory leak is occurring. I suspect that during a Flink restart something related to the netty pools is not properly cleaned up, but I need to do more investigation on my side to get a more detailed picture of what is occurring. There is a chance we are running into this netty issue netty/netty#12067 but I cannot prove it.

@ejona86
Copy link
Member

ejona86 commented Jun 22, 2023

If the allocator itself is leaking, and it isn't something simple like channels/servers leaked/not shut down, then I'd suspect the ClassLoader is being leaked and that's the real problem. Assuming it isn't via something easy like a gRPC thread is still running, I'd suspect ThreadLocals. Any Threads that call Netty (which includes a thread issuing an RPC) can retain a ThreadLocal which holds onto the classloader. The only sure way to clean those up is to shut down the threads. netty/netty#6891 (comment) . This is mostly a JDK problem but partly a Netty problem.

@ejona86
Copy link
Member

ejona86 commented Jun 28, 2023

io.grpc.netty.useCustomAllocator is about whether grpc should use its custom allocator vs Netty's default allocator, and that custom allocator is used by default. So I think the current name make sense. We also aren't going to "optimize" the value too much, since we really would hope it doesn't need wide use.

It would be fine to observe io.netty.allocator.type when creating our allocators. That would be an easy change, and I think for most of the other values we are already observing the system properties (via observing Netty's defaults).

But most the important part of this issue is the leak. The other things really don't matter, because they are just workarounds. But the leak's also quite bothersome to debug. I think the best thing to do is to shut down Threads used by Netty/gRPC so the ClassLoader can be released and see if the leak is addressed.

We could improve the behavior of FastThreadLocal.destroy(), so that the ClassLoader might be eventually GC'ed, but it requires creating new ThreadLocals to trigger and so is probably too slow/non-deterministic. That's why I didn't change it before. ThreadLocalMap (in Java) would need to use a ReferenceQueue instead of the slow cleanSomeSlots(). But that doesn't make much sense for the JDK because almost no ThreadLocals will ever be GCed (because of cycles like this one, which requires an explicit API to break). And if you are doing that, "you might as well shut down the Threads."

@ejona86 ejona86 changed the title Confusing behavior for useCustomAllocator Observe io.netty.allocator.type for custom Netty allocator Jun 30, 2023
@ejona86 ejona86 added this to the Next milestone Jun 30, 2023
@ejona86
Copy link
Member

ejona86 commented Jun 30, 2023

Let's observe io.netty.allocator.type as part of Utils when we create our allocator. This issue will track that change. (Implementation note: if Netty is unpooled, we can just use their allocator since all the interesting settings other than heap/direct (which we don't change) are for pooled.)

@jaredbwasserman
Copy link
Author

Thanks @ejona86 for the information. I had some time to dig into the application's behavior a bit more on my side.

With respect to your suggestion to shut down Threads used by Netty/gRPC so the ClassLoader can be released and see if the leak is addressed it appears grpc worker threads are already being shut down and recreated as part of the restart process our Flink application goes through. I profiled locally and found there are sets of threads beginning with grpc-nio-worker-ELG- which are periodically destroyed and recreated. It is possible I misunderstood which threads you meant, but these seemed the most likely based on your description.

I also did some debugging to better understand the cleanup process these threads go through. In all cases the remove() function inside InternalThreadLocalMap hit the case where the thread is an instance of FastThreadLocalThread and therefore set the threadLocalMap to null (https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/InternalThreadLocalMap.java#L141). The source of remove() seems to come from calling FastThreadLocal.removeAll() in one of two places:

  1. The doStartThread method inside SingleThreadEventExecutor (https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/concurrent/SingleThreadEventExecutor.java#L1050)
  2. The run method inside FastThreadLocalRunnable (https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/concurrent/FastThreadLocalRunnable.java#L32)

This case of remove() differs from calling destroy() because the slowThreadLocalMap is not modified in the cases I observed whereas destroy() explicitly modifies slowThreadLocalMap (https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/InternalThreadLocalMap.java#L148).

Given the observations I made locally, I am not sure if the memory leak scenario you described in netty/netty#6891 (comment) would apply for this case (it is very possible I am missing some understanding though). Can the ClassLoader still leak in the case that the grpc-nio-worker-ELG- threads are periodically recreated?

Circling back to the allocator type. Under the assumption the direct memory leak is somehow related to FastThreadLocal the UnpooledByteBufAllocator avoids the leak by not relying on FastThreadLocal. The PoolThreadLocalCache inside PooledByteBufAllocator is a FastThreadLocal (https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L507) while UnpooledByteBufAllocator does not utilize FastThreadLocal in any way.

@ejona86
Copy link
Member

ejona86 commented Jul 12, 2023

I was not talking about grpc-nio-worker-ELG-. Those threads I assumed were already getting shut down. if not, then it means you'd be leaking a Channel or Server.

The problem I was describing occurs when you call gRPC from a long-lived thread. gRPC will encode a message using the allocator, and Netty then may use a ThreadLocal to store a cache. Independent of what information is trying to be stored, this will leak, as the value will be InternalThreadLocalMap which has a reference to Netty's ClassLoader and keeps it alive for the remainder of the Thread's life. InternalThreadLocalMap isn't really special in the issue, as storing new Object() {} to a ThreadLocal value would cause the same issue. The only way to avoid the issue is to only store JDK-defined types in the ThreadLocal.

Any usage of Netty from a thread risks that FastThreadLocal is used. There's two options of how unpooled helps:

  1. FastThreadLocal is never used from another thread. This would allow the ClassLoader to garbage collect.
  2. The ClassLoader still leaks, but at least it no longer holds onto large pooled resources.

Note that destroy() does not do what it claims. It used to set slowThreadLocalMap = null, which given enough activity on a thread may eventually GC the class loader. The current call to remove() is mostly pointless as it only impacts the current thread.

larry-safran added a commit to larry-safran/grpc-java that referenced this issue Sep 7, 2023
DNVindhya pushed a commit to DNVindhya/grpc-java that referenced this issue Oct 5, 2023
@ejona86 ejona86 modified the milestones: Next, 1.59 Nov 13, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
4 participants