[optimization] Use bytes slice pool for allocations #278

Closed
4 tasks
Wondertan opened this issue Apr 20, 2021 · 10 comments

Comments

@Wondertan
Member

Context

We allocate plain byte slices extensively throughout the project repos and, after some work with them, discard the allocated slices for the GC to clean up. In some hot paths, like the whole block processing flow, this puts additional pressure on both the allocator and the GC. Fortunately, there is a relatively simple way to avoid that and reduce allocation stress in general: reuse fixed-size buffers through sync.Pool as the basic primitive. We can also rely on existing libraries with simple APIs rather than reinventing the wheel.
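
Roughly, the primitive looks like the minimal sketch below, using sync.Pool directly with an assumed fixed buffer size (the libraries mentioned under Options wrap the same idea in a friendlier API):

```go
// bufpool is a minimal illustration of the idea, not a proposal for the
// final API. The buffer size is an assumption for the sketch.
package bufpool

import "sync"

// bufSize is an illustrative fixed buffer size (e.g. a share/sample size).
const bufSize = 256

var pool = sync.Pool{
	// New is called only when the pool has no free buffer to hand out.
	New: func() interface{} {
		b := make([]byte, bufSize)
		return &b
	},
}

// Get hands out a reusable fixed-size buffer.
func Get() *[]byte {
	return pool.Get().(*[]byte)
}

// Put returns a buffer to the pool so the next Get reuses it instead of
// allocating a fresh slice that the GC later has to collect.
func Put(b *[]byte) {
	pool.Put(b)
}
```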

Expected Gains

  • Stop-the-world GC pauses take much less CPU time.
  • Much lower RAM usage in long-running executions; in some cases the reduction shows up even in e2e tests.
  • Less pressure on the allocator, as mentioned above.

I would like to provide real numbers, but this is very application-specific, so for us they could be quite different. In my previous project, htop showed almost a 2x RAM usage reduction after applying this in multiple places.

Options:

Has fewer stars and is less used within the Go community, but has a smarter API. We are actually already using it indirectly through IPFS/libp2p, so using it directly would allow us to share allocated buffers between IPFS and LL code, which is generally good. I would stick with this one.

From the gopher who created fasthttp. BTW, he is from Kyiv and I know him personally. He is obsessed with optimizations.

Places to be altered

  • Rsmt2d
  • NMT
  • Un/marshalling of core data types
  • Reactor's msg un/marshalling

pls add more

@Wondertan
Member Author

The issue is related to all the sublibs we have in LL's org, but AFAIK we don't have an established process for creating such root issues, so I decided to keep it in core.

@Wondertan
Member Author

@liamsi, pls attach it to the proper Project. Don't know what to choose

@liamsi
Member

liamsi commented Apr 20, 2021

All this sounds good, but as it's mostly about optimizations, I've attached this to Testnet. We could even do some of it after mainnet.

AFAIK we don't have an established process for creating such root issues, so I decided to keep it in core.

For similar cases that span multiple repositories, I've used cards in the projects (e.g. see the first entry in https://github.com/orgs/lazyledger/projects/4 or in https://github.com/orgs/lazyledger/projects/6).

In some hot paths, like whole block processing flow, this surely causes additional pressure on both allocator and GC.

IMO, the first step should be to actually measure this via a profiler and benchmarks to see if that actually has any impact, especially compared to overall system latency. I'd expect the main bottleneck will be IO / network comms (it always is), and the gains from low-level optimizations like the ones mentioned are often rather minuscule in the grand scheme of things. We also need to measure network latencies via our ipld experiments (or via the test framework Protocol Labs developed). I think measuring latencies for retrieving block data as well as sampling DA proofs has much higher priority (e.g. see: https://github.com/lazyledger/ipld-plugin-experiments/issues).
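
For concreteness, a measurement could start as simple as a Go benchmark with allocation reporting plus a memory profile; `EncodeBlockData` and `makeTestBlockData` below are hypothetical stand-ins for whichever hot path we decide to profile:

```go
package blockbench

import "testing"

// EncodeBlockData and makeTestBlockData are hypothetical placeholders for
// the real hot-path function and its test fixture.
func BenchmarkEncodeBlockData(b *testing.B) {
	data := makeTestBlockData()
	b.ReportAllocs() // report allocs/op and B/op alongside ns/op
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		EncodeBlockData(data)
	}
}
```

Running it with `go test -bench=BenchmarkEncodeBlockData -benchmem -memprofile mem.out` and inspecting `go tool pprof mem.out` would show whether slice allocations actually dominate.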

@liamsi
Member

liamsi commented Apr 20, 2021

rsmt2d is a bit different in this regard, as we are currently refactoring it to streamline the API and make it production-ready. So if you have good ideas there, you can sync with @adlerjohn, who has recently added benchmarks.

@Wondertan
Member Author

For the new rsmt2d API, we can add Close/Destruct/Free-style semantics so a user can free (return to the pool) resources once processing has ended.
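
Roughly something like this; the type, field, and pool here are assumptions for illustration, not the actual rsmt2d API:

```go
package rsmt2dsketch

import "sync"

// sharePool and ExtendedDataSquare are stand-ins for illustration only.
var sharePool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 256) // assumed fixed share size
		return &b
	},
}

type ExtendedDataSquare struct {
	shares []*[]byte // fixed-size share buffers taken from sharePool
}

// Close returns every share buffer to the pool. The square must not be
// used afterwards, since the buffers may be handed out to other callers.
func (eds *ExtendedDataSquare) Close() {
	for i, s := range eds.shares {
		if s != nil {
			sharePool.Put(s)
			eds.shares[i] = nil // drop the reference so misuse fails loudly
		}
	}
	eds.shares = nil
}
```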

@Wondertan
Member Author

All this sounds good, but as it's mostly about optimizations

Totally agree, but it wouldn't hurt to file an issue for the future.

IMO, the first step should be to actually measure this via a profiler and benchmarks to see if that actually has any impact.

But measuring the difference would require us to implement this first, so it's hard to know the exact gains without doing the actual work here.

I'd expect the main bottleneck will be IO / network comms (it always is)

This is not about bottlenecks but about better resource management within the application, with predictable gains. Imagine how many times we allocate fixed-size slices for samples on every block, when we could allocate them once, park them in a sync.Pool after processing, and reuse them when a new block arrives. And there are other cases like this. Unfortunately, we cannot fully manage memory in Go ourselves (btw, I would love to see a toggleable GC in Go), but we are smart enough to overcome some of the disadvantages of GC.
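
The reuse cycle I mean, as a sketch; sampleSize and processSample are illustrative placeholders:

```go
package sampling

import "sync"

// sampleSize is an assumed fixed sample size, for illustration only.
const sampleSize = 512

var samplePool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, sampleSize)
		return &b
	},
}

// handleBlockSamples processes the samples of one block using pooled
// buffers: each buffer is taken from the pool, filled, processed, and
// returned, so the next block reuses the same memory instead of forcing
// fresh allocations that the GC has to clean up.
func handleBlockSamples(raw [][]byte) {
	for _, sample := range raw {
		buf := samplePool.Get().(*[]byte)
		copy(*buf, sample)
		processSample(*buf) // placeholder for the real sampling logic
		samplePool.Put(buf)
	}
}

func processSample([]byte) {} // stub so the sketch is self-contained
```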

@liamsi
Member

liamsi commented Apr 20, 2021

But measuring the difference would require us to implement this first, so it's hard to know the exact gains without doing the actual work here.

Yeah, you are right about measuring the difference, of course. I'm just saying that we should first profile how much time is spent in GC and how many allocations (of slices) happen.

It's certainly good that you opened an issue about it!

@tzdybal
Member

tzdybal commented Apr 22, 2021

"Premature optimization is the root of all evil." This is definitely a good place to find some CPU cycles and memory, but I totally support @liamsi on this - we can probably do it even after mainnet.

@liamsi
Member

liamsi commented Dec 8, 2021

Is this still relevant? I think @Wondertan went and did this in node anyways?
Moved over to node.

@liamsi liamsi transferred this issue from celestiaorg/celestia-core Dec 8, 2021
@renaynay renaynay changed the title Use bytes slice pool for allocations [optimization] Use bytes slice pool for allocations Apr 26, 2022
@renaynay renaynay added the enhancement New feature or request label Apr 26, 2022
@Bidon15
Member

Bidon15 commented Jul 12, 2022

Grooming 12/07/2022:

This could be a part of a later epic describing the RAM optimization roadmap

@celestiaorg celestiaorg locked and limited conversation to collaborators Jan 22, 2024
@ramin ramin converted this issue into discussion #3127 Jan 22, 2024
