Contention in TypeNameIdResolver.idFromClass() #2556

Closed

CodingFabian opened this issue Nov 29, 2019 · 16 comments

@CodingFabian (Contributor)

We are using Jackson 2.9.10 and have many threads serializing Java objects onto Kafka.

Under high load (cough, Black Friday), we see several threads blocking on the type map contained in TypeNameIdResolver:

downstream-7 id=121 state=BLOCKED
    - waiting to lock <0x1a5caa08> (a java.util.HashMap)
     owned by downstream-4 id=118
    at com.fasterxml.jackson.databind.jsontype.impl.TypeNameIdResolver.idFromClass(TypeNameIdResolver.java:95)
    at com.fasterxml.jackson.databind.jsontype.impl.TypeNameIdResolver.idFromValue(TypeNameIdResolver.java:83)
    at com.fasterxml.jackson.databind.jsontype.impl.TypeSerializerBase.idFromValue(TypeSerializerBase.java:82)
    at com.fasterxml.jackson.databind.jsontype.impl.TypeSerializerBase._generateTypeId(TypeSerializerBase.java:67)
    at com.fasterxml.jackson.databind.jsontype.impl.TypeSerializerBase.writeTypePrefix(TypeSerializerBase.java:45)
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeWithType(BeanSerializerBase.java:600)
    at com.fasterxml.jackson.databind.ser.std.CollectionSerializer.serializeContents(CollectionSerializer.java:147)
    at com.fasterxml.jackson.databind.ser.std.CollectionSerializer.serialize(CollectionSerializer.java:107)
    at com.fasterxml.jackson.databind.ser.std.CollectionSerializer.serialize(CollectionSerializer.java:25)
    at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:727)
    at com.fasterxml.jackson.module.afterburner.ser.ObjectMethodPropertyWriter.serializeAsField(ObjectMethodPropertyWriter.java:40)
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:719)
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:155)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:400)
    at com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1392)
    at com.fasterxml.jackson.databind.ObjectWriter._configAndWriteValue(ObjectWriter.java:1120)
    at com.fasterxml.jackson.databind.ObjectWriter.writeValueAsBytes(ObjectWriter.java:1017)

I see that this is done in a synchronized block here:
https://github.com/FasterXML/jackson-databind/blob/master/src/main/java/com/fasterxml/jackson/databind/jsontype/impl/TypeNameIdResolver.java#L99

Could that be changed? Or should we somehow use several TypeNameIdResolver instances (and how would one do that)?

@CodingFabian (Contributor, Author)

One solution we are thinking of is to copy() the ObjectMapper instance before obtaining a writer (which we then put in a ThreadLocal).
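For illustration, a minimal sketch of that per-thread-writer workaround (the KafkaValue payload type and the class name are made up for the example; whether the extra per-thread mapper copies are worth the memory is workload-dependent):

```java
// Sketch of the workaround: one ObjectMapper copy (and ObjectWriter) per thread,
// so threads never contend on shared TypeNameIdResolver state.
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;

public class PerThreadWriter {
    // Placeholder payload type standing in for the real Kafka record value.
    public static class KafkaValue {
        public int id;
    }

    private static final ObjectMapper SHARED = new ObjectMapper();

    // Each thread gets its own mapper copy, hence its own resolvers and caches.
    private static final ThreadLocal<ObjectWriter> WRITER =
            ThreadLocal.withInitial(() -> SHARED.copy().writerFor(KafkaValue.class));

    public static byte[] serialize(KafkaValue value) throws JsonProcessingException {
        return WRITER.get().writeValueAsBytes(value);
    }
}
```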

@cowtowncoder (Member)

Results should quickly get cached, so even with synchronization it would seem odd for this to become a bottleneck; my first suspicion is usually that something else is not getting reused.
In this case, the scope of TypeNameIdResolver is per-property (or possibly per polymorphic class).

I would not recommend copying ObjectMapper, as that is as expensive an operation as fully creating a new mapper instance, and having a separate copy per thread is rather heavyweight (due to caching of all (de)serializers).

However: I can change the synchronization itself to reduce its scope, so that the (usually successful) lookup and the other processing (introspection if a cached id is not found, addition to the lookup map) are not within the same sync block, with a separate sync for putting the new value.
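Roughly, the idea would be something along these lines (an illustrative sketch only, not the actual jackson-databind patch; field and method names are simplified):

```java
// Sketch of narrowing the synchronized scope: only the brief get and put on the
// shared HashMap are locked, while introspection of a missing id runs unlocked.
import java.util.HashMap;
import java.util.Map;

class NarrowSyncIdResolver {
    private final Map<Class<?>, String> typeToId = new HashMap<>();

    String idFromClass(Class<?> cls) {
        String id;
        synchronized (typeToId) {        // short lock: lookup only
            id = typeToId.get(cls);
        }
        if (id == null) {
            id = computeId(cls);         // potentially expensive, done unlocked
            synchronized (typeToId) {    // separate short lock: store only
                typeToId.put(cls, id);
            }
        }
        return id;
    }

    private String computeId(Class<?> cls) {
        // Stand-in for the real type-id construction.
        return cls.getSimpleName();
    }
}
```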

My main concern is just that I do not have any way to really test this, so if there were a way to verify the improvement (for example by running under a profiler), that would be great.

But I think I can make a change to be released in 2.10.2; I have made similar changes to synchronized scope in the past, with good results.

@CodingFabian (Contributor, Author)

The ThreadLocal copies gave us a 10% throughput improvement. We have plenty of memory, so that was a valuable tradeoff.

Not sure if we can make a good benchmark, but now that we know where the threads block, we should be able to compare versions with our load tests.

@CodingFabian (Contributor, Author)

Turning _idToType and _typeToId into ConcurrentHashMap instances, with computeIfAbsent, should probably fix this.
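As a sketch of that idea (illustrative only, not the actual TypeNameIdResolver code; the id computation is simplified):

```java
// Sketch of the suggestion: lock-free reads via ConcurrentHashMap, with
// computeIfAbsent() handling the rare "first time this class is seen" case.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConcurrentIdResolver {
    private final ConcurrentMap<Class<?>, String> typeToId = new ConcurrentHashMap<>();

    String idFromClass(Class<?> cls) {
        // Reads never block; the mapping function runs at most once per key.
        return typeToId.computeIfAbsent(cls, c -> c.getSimpleName());
    }
}
```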

@cowtowncoder cowtowncoder modified the milestones: 2.10.1, 2.10.2 Dec 1, 2019
@cowtowncoder (Member)

jackson-databind cannot use Java 8 methods until 3.0, so unfortunately computeIfAbsent is not available.
But I think the change I am checking in will reduce or remove the issue.

@cowtowncoder cowtowncoder changed the title Contention in TypeNameIdResolver Contention in TypeNameIdResolver.idFromClass() Dec 1, 2019
@cowtowncoder (Member)

With the fix, it should be possible to build (or use from Sonatype OSS) a 2.10.2-SNAPSHOT build to see if the change resolves the thread blocking.

I am still somewhat surprised as to how this could have resulted in performance problems: after type ids are resolved once, they should stick, unless ObjectMapper instances were discarded, or deserializers not cached. Or something...

@CodingFabian (Contributor, Author)

We will check it. But you are right; that's why I proposed the ConcurrentHashMap. I think in our case the successful reads just block on each other (in the synchronized block that is still in place). We have around 50k ObjectMapper.writeValue calls per second.

@cowtowncoder (Member)

That rate does explain why it could matter (impressive that polymorphic handling can scale like that, since it seems like it could add quite a bit of overhead).
But the part I do not get is not that resolution gets called often enough to matter, but rather how the "miss" case -- id not yet resolved -- would occur often enough to matter. It should only be hit once per type id, fill the Map, and then never be touched again.

Alternatively, I could be mistaken and it actually is about the contended get(): if so, then ConcurrentHashMap would be the logical next step.
Even if computeIfAbsent() can't yet be used (with 2.x, anyway -- and even with 3.0 it might make sense to avoid it, to minimize block length), a ConcurrentHashMap would be more efficient with a plain get/put pair too, I assume.
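A sketch of what that get/put variant could look like (illustrative only, compatible with pre-Java-8 targets; the id computation is again simplified):

```java
// Sketch of the Java-6/7-compatible variant: ConcurrentHashMap with a plain get,
// then putIfAbsent on a miss. Reads never lock; a miss may compute the id more
// than once under a race, but the map stays consistent.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class GetPutIdResolver {
    private final ConcurrentMap<Class<?>, String> typeToId =
            new ConcurrentHashMap<Class<?>, String>();

    String idFromClass(Class<?> cls) {
        String id = typeToId.get(cls);           // lock-free fast path
        if (id == null) {
            id = computeId(cls);                 // may race; result is deterministic
            String prev = typeToId.putIfAbsent(cls, id);
            if (prev != null) {
                id = prev;                       // another thread stored it first
            }
        }
        return id;
    }

    private String computeId(Class<?> cls) {
        // Stand-in for the real type-id construction.
        return cls.getSimpleName();
    }
}
```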

So I guess results from benchmark would be interesting. :)

@CodingFabian (Contributor, Author)

Hi @cowtowncoder,

I am currently travelling, so please take my benchmarks with a grain of salt. I will re-run them when I have a stable power connection.

Our code uses Afterburner, so I included that in my benchmarks.
As you will be able to see in the results, your change does very little. That is because the Map already contains the information and no introspection is necessary; it really is the contention on get().
Also, I need to correct myself: the lookups are probably around 500k/s in our case, as our POJO has about 10 entries in the list.

I exchanged the map for a ConcurrentHashMap and got a reproducible 10% speedup (about the same as we get with our ThreadLocal ObjectWriter).

Find my benchmark in this gist:
https://gist.github.com/CodingFabian/4d3612a256628fd25521f4946389f133
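Not the gist itself, but a sketch of the general shape of such a contention benchmark (the ContentionSketch, Item, and Wrapper names are made up for illustration):

```java
// Illustrative shape of a multi-threaded serialization benchmark: several JMH
// threads share one ObjectMapper and serialize a small polymorphic list, so any
// lock in TypeNameIdResolver.idFromClass() shows up as contention.
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.openjdk.jmh.annotations.*;

import java.util.ArrayList;
import java.util.List;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Threads(4)
public class ContentionSketch {

    @JsonTypeInfo(use = JsonTypeInfo.Id.NAME)
    public static class Item {                 // hypothetical polymorphic payload
        public int value = 42;
    }

    public static class Wrapper {
        public List<Item> items = new ArrayList<>();
    }

    private final ObjectMapper mapper = new ObjectMapper();  // shared on purpose
    private Wrapper payload;

    @Setup
    public void setup() {
        payload = new Wrapper();
        for (int i = 0; i < 10; i++) {         // ~10 entries, as in the real POJO
            payload.items.add(new Item());
        }
    }

    @Benchmark
    public byte[] benchContention() throws Exception {
        return mapper.writeValueAsBytes(payload);
    }
}
```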

I will make a PR to illustrate my CHM change (although it's really straightforward: just wrap the maps in the constructor).

@CodingFabian (Contributor, Author)

For 2.10.1


Result "com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention":
  9417713.874 ±(99.9%) 1518796.826 ops/s [Average]
  (min, avg, max) = (8736898.377, 9417713.874, 9710435.409), stdev = 394426.713
  CI (99.9%): [7898917.047, 10936510.700] (assumes normal distribution)

Full Output:

# JMH version: 1.22
# VM version: JDK 1.8.0_192, OpenJDK 64-Bit Server VM, 25.192-b01
# VM invoker: /Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 1 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention

# Run progress: 0.00% complete, ETA 00:01:00
# Fork: 1 of 1
# Warmup Iteration   1: 8151895.492 ops/s
Iteration   1: 9710435.409 ops/s
Iteration   2: 9569245.825 ops/s
Iteration   3: 8736898.377 ops/s
Iteration   4: 9430753.815 ops/s
Iteration   5: 9641235.944 ops/s


Result "com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention":
  9417713.874 ±(99.9%) 1518796.826 ops/s [Average]
  (min, avg, max) = (8736898.377, 9417713.874, 9710435.409), stdev = 394426.713
  CI (99.9%): [7898917.047, 10936510.700] (assumes normal distribution)


# Run complete. Total time: 00:01:00

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                             Mode  Cnt        Score         Error  Units
ContentionBenchmark.benchContention  thrpt    5  9417713.874 ± 1518796.826  ops/s

@CodingFabian (Contributor, Author)

For your 2.10.2-SNAPSHOT change:

Result "com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention":
  9537120.750 ±(99.9%) 200726.195 ops/s [Average]
  (min, avg, max) = (9468759.138, 9537120.750, 9593482.627), stdev = 52127.956
  CI (99.9%): [9336394.554, 9737846.945] (assumes normal distribution)

Full Output:

# JMH version: 1.22
# VM version: JDK 1.8.0_192, OpenJDK 64-Bit Server VM, 25.192-b01
# VM invoker: /Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 1 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention

# Run progress: 0.00% complete, ETA 00:01:00
# Fork: 1 of 1
# Warmup Iteration   1: 7751979.386 ops/s
Iteration   1: 9468759.138 ops/s
Iteration   2: 9496704.427 ops/s
Iteration   3: 9565790.804 ops/s
Iteration   4: 9593482.627 ops/s
Iteration   5: 9560866.753 ops/s


Result "com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention":
  9537120.750 ±(99.9%) 200726.195 ops/s [Average]
  (min, avg, max) = (9468759.138, 9537120.750, 9593482.627), stdev = 52127.956
  CI (99.9%): [9336394.554, 9737846.945] (assumes normal distribution)


# Run complete. Total time: 00:01:00

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                             Mode  Cnt        Score        Error  Units
ContentionBenchmark.benchContention  thrpt    5  9537120.750 ± 200726.195  ops/s

@CodingFabian (Contributor, Author)

For your 2.10.2-SNAPSHOT change + my CHM change

Result "com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention":
  10298811.073 ±(99.9%) 602139.284 ops/s [Average]
  (min, avg, max) = (10138689.645, 10298811.073, 10554062.979), stdev = 156373.660
  CI (99.9%): [9696671.790, 10900950.357] (assumes normal distribution)
# JMH version: 1.22
# VM version: JDK 1.8.0_192, OpenJDK 64-Bit Server VM, 25.192-b01
# VM invoker: /Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 1 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention

# Run progress: 0.00% complete, ETA 00:01:00
# Fork: 1 of 1
# Warmup Iteration   1: 9178392.125 ops/s
Iteration   1: 10554062.979 ops/s
Iteration   2: 10290009.714 ops/s
Iteration   3: 10215555.017 ops/s
Iteration   4: 10138689.645 ops/s
Iteration   5: 10295738.012 ops/s


Result "com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention":
  10298811.073 ±(99.9%) 602139.284 ops/s [Average]
  (min, avg, max) = (10138689.645, 10298811.073, 10554062.979), stdev = 156373.660
  CI (99.9%): [9696671.790, 10900950.357] (assumes normal distribution)


# Run complete. Total time: 00:01:00

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                             Mode  Cnt         Score        Error  Units
ContentionBenchmark.benchContention  thrpt    5  10298811.073 ± 602139.284  ops/s

@CodingFabian (Contributor, Author)

So even the worst result of the ConcurrentHashMap run is about 5% faster than the best run with the synchronized version.

CodingFabian added a commit to CodingFabian/jackson-databind that referenced this issue Dec 1, 2019
@CodingFabian (Contributor, Author)

PR is here: #2557

Thanks a) for your hard work on this framework, b) for being so super responsive, and c) for considering each and every one of the many issues/PRs that are reported.

@CodingFabian (Contributor, Author)

Still looking for a better public API to bench this on, as the memory allocation for the result makes it a bit noisy.

@cowtowncoder (Member)

Fix merged; I made a slight change to avoid copying (and had to dig to remember why the deserialization case needed a non-null map in one edge case, wrt #1198).

This issue was closed.