Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change jsObject.value to ImmutableLinkedHashMap to mitigate hash collisions #675

Merged
merged 1 commit into from
Jan 19, 2022

Conversation

htmldoug
Copy link
Contributor

@htmldoug htmldoug commented Nov 22, 2021

After #674, jsObject.value would never be used internally (which is great for heap usage and to avoid rehashing), but it's still vulnerable to #186. This PR switches to ImmutableLinkedHashMap, an immutable wrapper around java.util.LinkedHashMap.

Before:

[info] Benchmark                                          (n)   Mode  Cnt    Score    Error  Units
[info] JsonParsing_01_ParseManyFields.parseObjectValue  10000  thrpt    3    1.452 ±  0.144  ops/s

After:

[info] Benchmark                                          (n)   Mode  Cnt    Score   Error  Units
[info] JsonParsing_01_ParseManyFields.parseObjectValue  10000  thrpt    3  199.307 ± 6.272  ops/s```

@@ -136,7 +137,7 @@ case class JsObject(
*/
lazy val value: Map[String, JsValue] = underlying match {
case m: immutable.Map[String, JsValue] => m
case m => m.toMap
case m => (TreeMap.newBuilder[String, JsValue] ++= m).result()
Copy link
Contributor Author

@htmldoug htmldoug Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, can we just return m directly?

Suggested change
case m => (TreeMap.newBuilder[String, JsValue] ++= m).result()
case m => m

The scaladoc says "The value of this JsObject as an immutable map", but the type signature is only scala.collection.Map. Anyone requiring a scala.collection.immutable.Map would need to call .toMap anyway or do an .asInstanceOf[s.c.i.Map].

If we just return m directly and change remove "immutable" from the scaladoc, it would do good things for memory footprint and throughput:

[info] Benchmark                                          (n)   Mode  Cnt    Score    Error  Units
[info] JsonParsing_01_ParseManyFields.parseObjectValue  10000  thrpt    3  167.452 ± 37.356  ops/s

One minor caveat: this would shift the responsibility for handling hashCode collisions onto users, but this method isn't suggested usage and we didn't handle them previously anyway.

@gmethvin, WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created a separate PR for that option: #676.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's worth the breaking change. Since the method isn't suggested usage I don't think users will benefit much from the change anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like .value is getting used in the wild.

I'll reopen this PR to keep both options on the table.

Copy link
Contributor Author

@htmldoug htmldoug Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not worth the breaking change

I'm having difficultly thinking of a scenario where it would be breaking. Method signatures wouldn't change. Anyone doing .value.asInstanceOf[s.c.i.Map] should know that's unsound since JsObject is non-final, so it's already possible to subclass this and make value return other representations. Are there other scenarios to be concerned about?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess using the ImmutableJMapWrapper as the underlying map could be a good idea. It feels like an ugly hack but the whole LinkedHashMap thing feels that way too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All that said, the "correct" solution here is to fix the bug in Scala, so what we're doing here is hopefully just a "temporary" workaround.

Copy link
Contributor Author

@htmldoug htmldoug Nov 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the performance of LinkedHashMap compare to immutable.TreeMap in general?

FWIW, j.u.LinkedHashMap had 1.3x-1.4x of the memory footprint of s.c.i.TreeMap in my testing, but has faster lookup over the common case from the hash table.

As for .value, all of the s.c.immutable.Map implementations (except Map1-4) destroy order in one way or another on .toMap. Our ImmutableJMapWrapper would be an exception by preserving order--another benefit.

All that said, the "correct" solution here is to fix the bug in Scala

That's a good point, although I haven't heard any developments since scala/bug#11203 was marked won't fix. I know scala/scala#7633 is available, but that's mutable and doesn't preserve order.

It feels like an ugly hack

Well, the MapAndVector approach is cleaner, but would be breaking to apply here for anyone using the case class unapply to get at the underlying map--which I only just realized was possible.

It's worth mentioning that I've been experimenting with applying V8's hidden classes to JsObject to cut memory footprint. (I love that I can stuff any Map I want into it!) Gains are quite promising for large messages. I'm aware that's probably more crazy than you'd want in the base implementation, but regardless, here are the bones for potential inspiration. It's ordered, as well.

class ShapeMap[V](
  values: Array[V],
  sharedShape: Map[String, Int], // array indexes (j.u.HashMap)
  sharedShapes: ConcurrentMap[Seq[String], Map[String, Int]] // for updated()/removed()
) extends scala.collection.immutable.Map[String, V]

Anyway, I'm just here to upstream any useful low hanging fruit I'm finding while benching and poking around in the weeds. Do you want me to go ahead with a PR for ImmutableJMapWrapper?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ImmutableJMapWrapper seems like an ok solution.

Honestly it would be really nice if we could just break the ordering and use TreeMap as the underlying data structure, but I don't remember exactly what the concerns were there.

Copy link
Contributor Author

@htmldoug htmldoug Jan 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy new year!

I've rewritten the PR to use an immutable wrapper around java.util.LinkedHashMap. I coupled directly to LHM rather than all T <: j.u.Map[_, _] since it's easier to support + and ++. Generality seems unnecessary here.

I also relaxed JsObject.+, etc. to delegate to the underlying Map implementation and avoid forcing j.u.LHM. If users don't care about order and want log(n) CoW ops (vs n with LHM), then they have the option of using TreeMap, s.c.i.HashMap.

I've explored the Map space further and ended up with the following implementation in my own project (along with specialized Map1-4):

class SortedArraysMap[V](
  sortedKeys: Array[String], // sorted, shareable, binary search lookup
  vals: Array[V]             // parallel to sortedKeys
) extends immutable.AbstractMap[String, V]

Everything's mostly log(n) (like TreeMap), but the footprint is much smaller.

Map memory footprint by cardinality (1)

Please review when you get a chance. Criticism welcome.

@htmldoug htmldoug closed this Nov 22, 2021
@htmldoug htmldoug reopened this Nov 22, 2021
@htmldoug htmldoug marked this pull request as draft November 24, 2021 17:12
@htmldoug htmldoug force-pushed the issue-186-treemap branch 4 times, most recently from 87aed20 to fc84924 Compare January 9, 2022 07:34
@htmldoug htmldoug changed the title Change jsObject.value to TreeMap to mitigate hash collisions Change jsObject.value to ImmutableLinkedHashMap to mitigate hash collisions Jan 9, 2022
@htmldoug htmldoug force-pushed the issue-186-treemap branch 2 times, most recently from 218e7b4 to 04b4689 Compare January 9, 2022 07:44
@htmldoug htmldoug marked this pull request as ready for review January 9, 2022 07:54
@htmldoug htmldoug force-pushed the issue-186-treemap branch 2 times, most recently from 4f5d901 to 47221f9 Compare January 9, 2022 09:13
Copy link
Member

@gmethvin gmethvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I'm pretty happy with this. Maybe @cchantep should take a look as well?

@htmldoug
Copy link
Contributor Author

Need any changes from me for this to be mergeable?

build.sbt Outdated Show resolved Hide resolved
Copy link
Member

@ihostage ihostage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
@gmethvin @cchantep WDYT?

@cchantep cchantep merged commit 939d8bd into playframework:main Jan 19, 2022
@gmethvin
Copy link
Member

LGTM

@mkurz
Copy link
Member

mkurz commented May 4, 2023

Just to let you know, the added memory footprint spec failed when upgrading to Scala 3.2.1+, because with that scala version memory usage decreased. See #848 where I adjusted the expected values and for further explanation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants