-
-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change jsObject.value
to ImmutableLinkedHashMap
to mitigate hash collisions
#675
Conversation
@@ -136,7 +137,7 @@ case class JsObject( | |||
*/ | |||
lazy val value: Map[String, JsValue] = underlying match { | |||
case m: immutable.Map[String, JsValue] => m | |||
case m => m.toMap | |||
case m => (TreeMap.newBuilder[String, JsValue] ++= m).result() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, can we just return m
directly?
case m => (TreeMap.newBuilder[String, JsValue] ++= m).result() | |
case m => m |
The scaladoc says "The value of this JsObject as an immutable map", but the type signature is only scala.collection.Map
. Anyone requiring a scala.collection.immutable.Map
would need to call .toMap
anyway or do an .asInstanceOf[s.c.i.Map]
.
If we just return m
directly and change remove "immutable" from the scaladoc, it would do good things for memory footprint and throughput:
[info] Benchmark (n) Mode Cnt Score Error Units
[info] JsonParsing_01_ParseManyFields.parseObjectValue 10000 thrpt 3 167.452 ± 37.356 ops/s
One minor caveat: this would shift the responsibility for handling hashCode collisions onto users, but this method isn't suggested usage and we didn't handle them previously anyway.
@gmethvin, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created a separate PR for that option: #676.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's worth the breaking change. Since the method isn't suggested usage I don't think users will benefit much from the change anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like .value
is getting used in the wild.
I'll reopen this PR to keep both options on the table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not worth the breaking change
I'm having difficultly thinking of a scenario where it would be breaking. Method signatures wouldn't change. Anyone doing .value.asInstanceOf[s.c.i.Map]
should know that's unsound since JsObject
is non-final, so it's already possible to subclass this and make value
return other representations. Are there other scenarios to be concerned about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess using the ImmutableJMapWrapper
as the underlying
map could be a good idea. It feels like an ugly hack but the whole LinkedHashMap
thing feels that way too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All that said, the "correct" solution here is to fix the bug in Scala, so what we're doing here is hopefully just a "temporary" workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the performance of LinkedHashMap compare to immutable.TreeMap in general?
FWIW, j.u.LinkedHashMap
had 1.3x-1.4x of the memory footprint of s.c.i.TreeMap
in my testing, but has faster lookup over the common case from the hash table.
As for .value
, all of the s.c.immutable.Map
implementations (except Map1-4
) destroy order in one way or another on .toMap
. Our ImmutableJMapWrapper
would be an exception by preserving order--another benefit.
All that said, the "correct" solution here is to fix the bug in Scala
That's a good point, although I haven't heard any developments since scala/bug#11203 was marked won't fix
. I know scala/scala#7633 is available, but that's mutable and doesn't preserve order.
It feels like an ugly hack
Well, the MapAndVector approach is cleaner, but would be breaking to apply here for anyone using the case class unapply
to get at the underlying
map--which I only just realized was possible.
It's worth mentioning that I've been experimenting with applying V8's hidden classes to JsObject
to cut memory footprint. (I love that I can stuff any Map I want into it!) Gains are quite promising for large messages. I'm aware that's probably more crazy than you'd want in the base implementation, but regardless, here are the bones for potential inspiration. It's ordered, as well.
class ShapeMap[V](
values: Array[V],
sharedShape: Map[String, Int], // array indexes (j.u.HashMap)
sharedShapes: ConcurrentMap[Seq[String], Map[String, Int]] // for updated()/removed()
) extends scala.collection.immutable.Map[String, V]
Anyway, I'm just here to upstream any useful low hanging fruit I'm finding while benching and poking around in the weeds. Do you want me to go ahead with a PR for ImmutableJMapWrapper
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImmutableJMapWrapper
seems like an ok solution.
Honestly it would be really nice if we could just break the ordering and use TreeMap
as the underlying data structure, but I don't remember exactly what the concerns were there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy new year!
I've rewritten the PR to use an immutable wrapper around java.util.LinkedHashMap
. I coupled directly to LHM rather than all T <: j.u.Map[_, _]
since it's easier to support +
and ++
. Generality seems unnecessary here.
I also relaxed JsObject.+
, etc. to delegate to the underlying Map
implementation and avoid forcing j.u.LHM
. If users don't care about order and want log(n)
CoW ops (vs n
with LHM), then they have the option of using TreeMap
, s.c.i.HashMap
.
I've explored the Map
space further and ended up with the following implementation in my own project (along with specialized Map1-4):
class SortedArraysMap[V](
sortedKeys: Array[String], // sorted, shareable, binary search lookup
vals: Array[V] // parallel to sortedKeys
) extends immutable.AbstractMap[String, V]
Everything's mostly log(n)
(like TreeMap
), but the footprint is much smaller.
Please review when you get a chance. Criticism welcome.
87aed20
to
fc84924
Compare
jsObject.value
to TreeMap
to mitigate hash collisionsjsObject.value
to ImmutableLinkedHashMap
to mitigate hash collisions
218e7b4
to
04b4689
Compare
4f5d901
to
47221f9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I'm pretty happy with this. Maybe @cchantep should take a look as well?
play-json/shared/src/main/scala-2.13+/play/api/libs/json/ImmutableLinkedHashMap.scala
Show resolved
Hide resolved
Need any changes from me for this to be mergeable? |
c40e73e
to
f4c0d86
Compare
f4c0d86
to
50ab7df
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM |
After #674,
jsObject.value
would never be used internally (which is great for heap usage and to avoid rehashing), but it's still vulnerable to #186. This PR switches toImmutableLinkedHashMap
, an immutable wrapper aroundjava.util.LinkedHashMap
.Before:
After: