Add full set of RecyclerPool implementations #1064
With this commit I tested different object pool implementations in order to benchmark them and pick the one that best fits the Jackson use case. I also temporarily added a [...]
I made an extensive performance comparison of all the different pool implementations that I introduced with this commit in order to choose the best fit for Jackson's needs. For this comparison I used the following benchmark, which serializes to JSON a very simple POJO made of only 3 fields (a Person with firstName, lastName and age), performing this operation in parallel on 10, 100 or 1000 (native or virtual) threads.
I used this smaller [...] In the remaining part of this analysis I will consider only 5 different possibilities:
plus the 2 best performing implementations not requiring the introduction of any external dependency
The charts below summarize the performance of these implementations for both virtual and native threads. The Y-axis reports the number of operations per millisecond, taking into account that the number of operations in a benchmark loop equals the number of parallel tasks used when running that benchmark. In other words, these results are normalized by the number of parallel tasks.
As expected, when running with virtual threads the [...] On the other hand, when running with traditional native threads, the [...]
Given all these considerations, and keeping in mind that we are introducing this new pool mostly to deal with virtual threads, I decided to keep the lock-free implementation. As suggested by @pjfanning, at least for Jackson 2.x, the actual use of this pool will be an opt-in feature, so that it can be used by projects and frameworks already leveraging virtual threads, while the existing [...]
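For readers who have not met the term, a lock-free pool of the kind discussed here is typically a linked stack updated with compare-and-set. The following is only a minimal sketch of that general idea, not the code from this pull request; the class name `LockFreeStackPool` and its `borrow`/`release` methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch of a lock-free pool backed by a linked (Treiber) stack.
final class LockFreeStackPool<T> {
    // Immutable node; a fresh one is allocated on every release, so a node
    // is never re-linked and the usual ABA hazard does not arise under GC.
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();
    private final Supplier<T> factory;

    LockFreeStackPool(Supplier<T> factory) {
        this.factory = factory;
    }

    // Pops the most recently released object, or allocates one if the pool is empty.
    T borrow() {
        while (true) {
            Node<T> current = head.get();
            if (current == null) {
                return factory.get();
            }
            if (head.compareAndSet(current, current.next)) {
                return current.value;
            }
            // CAS lost against another thread: retry with the new head.
        }
    }

    // Pushes the object back on top of the stack.
    void release(T value) {
        while (true) {
            Node<T> current = head.get();
            if (head.compareAndSet(current, new Node<>(value, current))) {
                return;
            }
        }
    }
}
```

In this sketch every `borrow` and `release` goes through the single `head` reference, so under heavy parallelism all threads retry their CAS on the same memory location.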
@Override
public T borrow() {
    int pos = counter.getAndUpdate(i -> i > 0 ? i-1 : i);
Same performance problem as the lock-free variant pool: counter is highly contended (they are both stacks, actually, linked or array-based), which would limit scalability.
I would instead use a striped variant of this, e.g.:
add a parameter "concurrency" which represents the number of "slots" of your pool.
Each slot contains a fixed-size partition of the total size (i.e. total size / concurrency).
It would be better to make both total size and concurrency powers of 2 (more on why later).
You can then use the thread id to decide which slot to hit first (a power-of-2 size lets you perform the modulus with threadId & (size - 1)), and you have 2 strategies here:
- perform a CAS on the counter owning that slot's index, moving to the next counter on failure, until all concurrency counters have been tried (that means we need to keep track of "where" we borrowed from, or we risk unbalancing the pool on release, searching for some available slot)
- just keep trying (via a CAS loop) on the same slot's index: thanks to the thread id distribution (with virtual threads in particular) you'll probably be lucky enough to spread the contention all the same
To better deal with contention, the per-partition counters could all be stored in an AtomicLongArray at a distance of 8/16 longs from each other, keeping each counter separated by 1 or 2 cache lines to avoid false sharing.
Another interesting way to deal with this could be to make the algorithm a bit unbalanced, i.e. in order to make progress we first need to borrow, right?
Then we could just perform getAndIncrement there:
- if we overflow (more than the slot capacity) we just switch to a mix of get and compareAndSet (because nothing is available) or just allocate
- the release side can instead check the status of the counter and use get + compareAndSet: if we overflowed it means we have exhausted the capacity, and we just need to decrease it by 1 from the max capacity
The last one is really optional anyway, and still has to be proved to work well under contention (given that getAndIncrement is proved to be decent under contention only on x86 and a few modern ARM archs)
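To make the striping idea above more concrete, here is a rough sketch under simplifying assumptions. It keeps the power-of-2 `concurrency` parameter and the `threadId & (size - 1)` slot selection, but swaps the per-slot CAS counters and fixed-size partitions for one unbounded `ConcurrentLinkedDeque` per slot, and it does not attempt the cache-line padding. The class `StripedPool` and its methods are hypothetical names, not part of the PR.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.function.Supplier;

// Hypothetical striped pool: each thread starts from a "home" stripe
// derived from its thread id, so contention is spread over several deques.
final class StripedPool<T> {
    private final ConcurrentLinkedDeque<T>[] stripes;
    private final int mask;
    private final Supplier<T> factory;

    @SuppressWarnings("unchecked")
    StripedPool(int concurrency, Supplier<T> factory) {
        // A power of 2 lets us replace the modulus with a bit mask.
        if (concurrency <= 0 || Integer.bitCount(concurrency) != 1) {
            throw new IllegalArgumentException("concurrency must be a power of 2");
        }
        this.stripes = new ConcurrentLinkedDeque[concurrency];
        for (int i = 0; i < concurrency; i++) {
            stripes[i] = new ConcurrentLinkedDeque<>();
        }
        this.mask = concurrency - 1;
        this.factory = factory;
    }

    private int homeStripe() {
        return (int) Thread.currentThread().getId() & mask;
    }

    // Tries the home stripe first, then the following ones; allocates if all are empty.
    T borrow() {
        int start = homeStripe();
        for (int i = 0; i < stripes.length; i++) {
            T pooled = stripes[(start + i) & mask].pollFirst();
            if (pooled != null) {
                return pooled;
            }
        }
        return factory.get();
    }

    // Always releases to the caller's home stripe.
    void release(T value) {
        stripes[homeStripe()].offerFirst(value);
    }
}
```

Because release always targets the caller's home stripe while borrow may take from a neighbour, this sketch can unbalance the stripes over time, which is the trade-off the first strategy above warns about.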
@Override
public T borrow() {
    T t = queue.poll();
You can use https://github.com/JCTools/JCTools/blob/master/jctools-core/src/main/java/org/jctools/queues/MpscArrayQueue.java#L330 and https://github.com/JCTools/JCTools/blob/master/jctools-core/src/main/java/org/jctools/queues/MpscArrayQueue.java#L460 to deal with contention in a way similar to (and still symmetric with) the lock-free pool.
For the feature, could I suggest adding something alongside [...]
@mariofusco could you load test with the USE_THREAD_LOCAL_FOR_BUFFER_RECYCLING feature disabled when you are testing the BufferRecyclers case? It would be useful to know the performance of BufferRecyclers with this feature enabled (the default) and with it disabled.
@cowtowncoder @pjfanning @franz1981 I now consider this pull request complete and ready to be reviewed and merged. I made the new [...]
BufferRecycler withPool(ObjectPool<BufferRecycler> pool) {
    this._pool = pool;
Do we want to throw an exception in case it is set twice? Or will it replace whatever is already there?
Or should we add an assert to make the implementation invariant evident here?
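As an illustration of the first option (failing fast on a second assignment), a guard could look roughly like this. It is only a sketch: `PooledResource` is a hypothetical stand-in for the real class, and a plain `Object` stands in for the pool type.

```java
import java.util.Objects;

// Hypothetical stand-in for a resource that remembers the pool it came from.
final class PooledResource {
    private Object pool; // the owning pool; null until linked

    // Links this resource to its pool; a second call is treated as a bug.
    PooledResource withPool(Object pool) {
        Objects.requireNonNull(pool, "pool");
        if (this.pool != null) {
            throw new IllegalStateException("pool is already set for this resource");
        }
        this.pool = pool;
        return this;
    }

    Object pool() {
        return pool;
    }
}
```

Compared to an `assert`, the exception is also enforced when the JVM runs without `-ea`, at the cost of one extra null check per call.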
src/main/java/com/fasterxml/jackson/core/util/BufferRecycler.java
1daf69e
to
1823ada
Compare
@cowtowncoder did you have a look at this? Any comments? If you think it's ok I'd appreciate it if it could be merged in a reasonable time: I keep having conflicts with other commits (I will fix the last ones asap) and it's becoming increasingly hard to resolve them.
1823ada
to
3a0f9e5
Compare
@cowtowncoder sorry if I insist and ask again, but I'm literally resolving conflicts on this pull request on a daily basis and it's almost becoming a full time job. I'd appreciate it if you could at least provide some feedback. /cc @pjfanning @franz1981
@mariofusco Sorry, I haven't had any time to look into this. I am going on a vacation, so on the plus side there shouldn't be anything to merge for that time (2 weeks). Reading through this PR is high on my TODO list but it requires quite a bit of focus. I'll add one more (separate) note on a general approach I think makes sense, for 2.16 timeline -- apologies for not trying to reconcile it with your work so far.
So, to me what makes sense is (and once again, apologies for not figuring out how close this PR is from these ideas) as follows:
Once this life-cycle works in 2.16; and with one backing implementation (existing [...]
Approach above is based on my strong preference for allowing per-factory pooling as one of the options -- but also allowing global: the latter case is achieved by just using a single [...]
@cowtowncoder I think the PR as is achieves more or less what you have highlighted. The default behaviour remains the ThreadLocal-based BufferRecycler. Users can opt in to use the per-JsonFactory pooled BufferRecycler instead.
The main intent of this pull request is to clearly define a lifecycle for [...] I also forced the whole test suite to follow this lifecycle. In particular I ran all tests using the [...]
Regarding your points:
At the moment the [...]
This is already what happens, or more precisely this is what the [...]
This is indeed the main point of this pull request: now [...]
This has not been changed in any way and it's also out of scope for this pull request.
The existing implementation based on [...]
{
    _streamReadConstraints = src;
    _streamWriteConstraints = swc;
    _streamReadConstraints = (src == null) ?
Minor nit, probably related to changes in 2.16 in the meantime -- no null checks should be made, caller should pass default values, never null (probably just need to merge/rebase from 2.16)
Would this work?
I started thinking along similar lines, although working from [...] I think I can work on a PR for doing the integration pieces separately from this PR (to be merged here when ready). Aside from the mechanics of getting to [...]
@mariofusco @pjfanning Ok, so, created PR #1083 for suggested integration. I can merge that in 2.16 and master, but wanted to get your feedback first.
54e9e39
to
b6d5afe
Compare
@mariofusco Ok, apologies for making tons of tweaks, but I think the results are now something I could merge. Changes mostly concern the following things:
[...]
On (1) there is one thing I didn't yet figure out, and that is keeping identity of global LockFree / DeQue -based pools.
Remaining questions or (minor) open issues from my end are:
[...]
Neither of these is really a blocker; but if we can resolve them, great. Either way I hope to actually finally get this merged tomorrow if all goes well.
import java.io.IOException;
import java.io.OutputStream;

public class BufferRecyclerPoolTest extends BaseTest
It would be great to actually test recycling aspects: no need to use multi-threading, just create a sequence of operations, starting with a single JsonFactory and single parser/generator, to allocate buffers, and check that the underlying BufferRecycler is properly acquired/released.
If an accessor needs to be added in the parser/generator implementation for testing, that's fine.
This is a nice-to-have tho and just for basic sanity checking. No need to create complicated schemes.
Done 5394156
 * to compensate, need to re-create proper instance using constructor.
 */
protected Object readResolve() {
    return new ConcurrentDequePool();
This is where we'd need to choose between returning SHARED vs creating a new one, if we serialized some sort of marker.
The readResolve() method is typically used to implement the Singleton pattern, where the same object needs to be returned after deserialization. I'd rather return the SHARED instance here, otherwise you're silently duplicating the instances available after deserialization; in other words, you will have JsonFactory instances created before and after deserialization that should have the same SHARED instance but in reality have different pool instances.
More generally, I don't understand the need for a "non-shared" pool instance (at least for these "default" pool implementations provided by us). Can you please clarify in which situation a user may want to have such a "non-shared" instance? If there isn't any evident need, I'd rather keep the pool constructors private and only expose the shared instance.
In general I believe pretty much everything should be scoped by ObjectMapper / JsonFactory, and about nothing shared across. So if and when Mapper/Factory gets dropped, garbage-collected, Recycler Pool would similarly be disposed of. That's the fundamental sentiment. This is why Jackson has very few static stateful global singletons (many stateless of course).
This in turn is because Jackson itself is used as a private dependency by many other libraries and frameworks, where isolation is typically beneficial.
So using globally shared stateful recycler pools is against this general idea.
Two specific concerns I have:
- Does global sharing allow over-retention of RecyclerBuffers (and thereby underlying byte[]/char[] buffers)? Thinking it through, it probably doesn't (since recyclers created for then-dropped Factories will be happily recycled by remaining ones) -- but here I would want bounded max size to avoid retaining peak number of recyclers
- Global sharing increases lock-contention compared to per-Factory recyclers; so for busiest cases isolation would help.
I guess (1) really isn't much of a concern specifically wrt globally shared instances, esp. in absence of maximum size limits.
But (2) is something that could become an issue for some users.
So... not quite sure how to go about that. I still feel there is a need to allow use of convenient globally shared pools -- with convenient access.
But also allow instances to be constructed by users, for isolation.
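The "bounded max size" wish from the first concern can be illustrated with a small sketch. This is an assumption-laden example rather than anything from the PR: `BoundedPool` and its methods are hypothetical, and an `ArrayBlockingQueue` is used simply because its non-blocking `offer` rejects elements once the capacity is reached.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.Supplier;

// Hypothetical bounded pool: retains at most `capacity` objects,
// anything released beyond that is dropped and left to the GC.
final class BoundedPool<T> {
    private final ArrayBlockingQueue<T> queue;
    private final Supplier<T> factory;

    BoundedPool(int capacity, Supplier<T> factory) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.factory = factory;
    }

    // Non-blocking: reuses a pooled object if one is available, else allocates.
    T borrow() {
        T pooled = queue.poll();
        return pooled != null ? pooled : factory.get();
    }

    // Returns false when the pool is full and the object was not retained.
    boolean release(T value) {
        return queue.offer(value);
    }

    int retained() {
        return queue.size();
    }
}
```

With such a cap, a burst that borrows many objects at once does not leave the peak number of them parked in the pool afterwards.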
On readResolve(): it (or writeReplace()) can also be used to prevent serializing part of state, such as actual cached contents. Since JDK serialization does not call any constructors on deserialization (nor, I think, initialization statements? Or does it?), it is necessary to either:
- Clear out cache contents Map before write (writeReplace), to read back empty Map (etc)
- Mark Map field transient; use readResolve() to create properly initialized empty instance (with empty Map)
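The two readResolve() uses discussed in this thread (resolving back to a shared singleton, and rebuilding transient state for non-shared instances) can be combined with a serialized marker. The sketch below shows one way this might look; `SketchPool`, its `shared` flag and the `nonShared()` factory are hypothetical names chosen for illustration, not the PR's API.

```java
import java.io.Serializable;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical pool, only meant to illustrate the serialization handling.
final class SketchPool implements Serializable {
    private static final long serialVersionUID = 1L;

    static final SketchPool SHARED = new SketchPool(true);

    // Serialized marker: tells readResolve() which kind of instance was written.
    private final boolean shared;

    // Cached buffers are transient, so they are never written to the stream.
    // The raw deserialized object would see null here, which is why
    // readResolve() always substitutes a properly constructed instance.
    private final transient Deque<byte[]> cache = new ArrayDeque<>();

    private SketchPool(boolean shared) {
        this.shared = shared;
    }

    static SketchPool nonShared() {
        return new SketchPool(false);
    }

    synchronized void release(byte[] buffer) {
        cache.push(buffer);
    }

    synchronized int size() {
        return cache.size();
    }

    // Shared instances resolve back to the singleton; non-shared ones are
    // re-created through the constructor, with an empty cache.
    private Object readResolve() {
        return shared ? SHARED : new SketchPool(false);
    }
}
```

In this sketch a factory that held `SHARED` before serialization holds the very same object afterwards, while a factory with its own pool gets a fresh, empty one.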
I did both things with my 2 latest commits. My last outstanding concern is about the non-shared versions of the pools. At the moment I see them as useful only to test the pools themselves, but nothing else, and they are also problematic with serialization. Also, even if we want to keep them, I have some doubts about the naming: [...]
I feel strongly in this case that plain "instance" is not adequate -- I was thinking of [...]
I agree that convenient access to the globally shared instances, along with allowing construction of non-shared ones, makes sense. For the last part, then, the only (?) question is that of JDK serialization properly linking back shared/global to that, but creating a new instance for non-shared.
src/main/java/com/fasterxml/jackson/core/json/JsonGeneratorImpl.java
src/main/java/com/fasterxml/jackson/core/util/BufferRecyclerPool.java
Added a couple of notes, but I think my thinking now is that:
[...]
With that I think we would be done here.
EDIT: I am working on this (JDK serialization, tests) -- and then should be able to FINALLY merge this thing.
Ok: to expedite things I will go ahead and merge -- this does not mean that aspects, naming etc could not be changed based on discussions; I absolutely expect some minor tweaking.
Ok: I am looking for comments on #1117 -- changing the default pool used by Jackson 2.17 and later. My strawman argument is that we should use:
[...]
pool as the default. This is mostly to get discussion going; I don't have a strong objection to alternatives.
This is an adaptation for branch 2.16 of this pull request for master.
As requested I didn't remove any public method, but added a few new ones, deprecating the old methods that they are intended to replace. I also improved many tests, trying to make sure that all Closeable resources are properly closed after their usage. This is necessary to guarantee the correct reuse of the BufferRecycler taken from the pool and in general to adhere to the lifecycle of those resources.
The last outstanding part is having an efficient implementation of the object pool itself. If you agree on the general idea of this pull request I will work on it. /cc @cowtowncoder @pjfanning @franz1981
Note that at the moment this pull request is the indirect cause of a couple of test failures that are actually caused by the bug I reported here.