Add java priority queue, set, deque, collection coders #5520

Open: wants to merge 7 commits into base: main

@kellen kellen commented Oct 26, 2024

When using, for example, Algebird's PriorityQueueMonoid, Scio needs to serialize java.util.PriorityQueue instances. If Kryo is used, the overhead can be significant, especially for small queues.

Instead, this PR provides a PriorityQueue coder backed by a Scala Ordering (as the monoid itself is).

The coder must be created explicitly by the user. By doing so, they assert that the original PriorityQueues and the reconstituted ones use the same comparator, e.g.

implicit val pqCoder = Coders.jPriorityQueueCoder[T](ord)
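
To illustrate why the comparator has to be asserted by the user, here is a minimal sketch of the idea in plain Scala, without Scio or Beam. The `encode` and `decode` helpers are hypothetical and are not the coder added in this PR; they only show that the payload carries the elements, while the Ordering must be supplied again when the queue is rebuilt.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.util.{PriorityQueue => JPriorityQueue}

object PriorityQueueRoundTrip {
  // Hypothetical helper (not Scio's API): writes the element count,
  // then each element in the queue's iteration order.
  def encode(pq: JPriorityQueue[String]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeInt(pq.size())
    val it = pq.iterator()
    while (it.hasNext) out.writeUTF(it.next())
    out.flush()
    bytes.toByteArray
  }

  // The Ordering is not part of the payload: the caller must pass the same
  // one that the original queue was built with.
  def decode(data: Array[Byte])(implicit ord: Ordering[String]): JPriorityQueue[String] = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    val n = in.readInt()
    // scala.math.Ordering extends java.util.Comparator, so it can be passed directly.
    val pq = new JPriorityQueue[String](math.max(1, n), ord)
    var i = 0
    while (i < n) { pq.add(in.readUTF()); i += 1 }
    pq
  }

  def main(args: Array[String]): Unit = {
    implicit val ord: Ordering[String] = Ordering.String
    val original = new JPriorityQueue[String](11, ord)
    Seq("pear", "apple", "fig").foreach(s => original.add(s))
    val copy = decode(encode(original))
    // Both queues drain in the same order because they share the Ordering.
    while (!original.isEmpty) assert(original.poll() == copy.poll())
    assert(copy.isEmpty)
  }
}
```

If `decode` were called with a different Ordering, the rebuilt queue would still contain the same elements but would poll them in a different order, which is the mismatch the explicit coder construction is meant to rule out.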

@RustedBones Is there some trick I can use to get an error message when Coder[java.util.PriorityQueue[T]] fails to be derived (that's not just the generic implicitNotFound on Coder)?

codecov bot commented Oct 26, 2024

Codecov Report

Attention: Patch coverage is 65.00000% with 7 lines in your changes missing coverage. Please review.

Project coverage is 61.43%. Comparing base (a1fce09) to head (92fad69).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...com/spotify/scio/coders/instances/JavaCoders.scala | 80.00% | 3 Missing ⚠️ |
| ...ala/com/spotify/scio/testing/CoderAssertions.scala | 0.00% | 3 Missing ⚠️ |
| ...om/spotify/scio/coders/instances/ScalaCoders.scala | 50.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5520   +/-   ##
=======================================
  Coverage   61.42%   61.43%           
=======================================
  Files         312      312           
  Lines       11104    11117   +13     
  Branches      757      753    -4     
=======================================
+ Hits         6821     6830    +9     
- Misses       4283     4287    +4     


@RustedBones RustedBones left a comment

I see we're also missing coders for java.util.Collection[T], java.util.Set[T] and java.util.Deque[T], which are available by default in Beam.

Comment on lines 83 to 84
// neither arrays nor PriorityQueues are consistentWithEquals
Coder.xmap(ScalaCoders.arrayCoder[T])(

I think it would be better to create a coder instance that explicitly overrides consistentWithEquals, rather than depending on a specific type for that reason.

@RustedBones

I recall we've been comparing the underlying coder and the value ordering here, but we've not done that for Scala SortedSet or PriorityQueue.

The drawback of such a check is that we must provide an ordering implementation whose equality is stable after serialization (mostly by using an object ordering).
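
The point about stable equality can be sketched with plain Java serialization, used here only as a stand-in for however the ordering actually travels with the coder. `FlippedAsObject` and `FlippedAsClass` are hypothetical names for illustration. A Scala `object` deserializes back to the same singleton, while an instance of a class without an `equals` override comes back as a new, unequal instance.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object OrderingEquality {
  // Singleton ordering: deserialization resolves back to this same instance.
  object FlippedAsObject extends Ordering[String] {
    override def compare(x: String, y: String): Int = y.compareTo(x)
  }

  // Class-based ordering with no equals override: equality is by reference,
  // so every deserialized copy is unequal to the original.
  class FlippedAsClass extends Ordering[String] {
    override def compare(x: String, y: String): Int = y.compareTo(x)
  }

  // Serializes and deserializes a value with Java serialization.
  def roundTrip[A <: AnyRef](value: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A]
    finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val byClass = new FlippedAsClass
    assert(roundTrip(FlippedAsObject) == FlippedAsObject) // same singleton
    assert(roundTrip(byClass) != byClass)                 // fresh instance
  }
}
```

An equality check on the ordering would therefore pass for the object-based ordering and fail for the class-based one, even though both compare strings identically.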

pq coderShould roundtrip() and
beOfType[Transform[_, _]] and
materializeToTransformOf[ArrayCoder[_]] and
beFullyCompliantNotConsistentWithEquals()

Shouldn't this be non-deterministic too?
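
A small sketch of why determinism is a concern for an array-backed encoding, using only the JDK. `queueOf` and `layout` are hypothetical helpers. Two queues holding the same elements can end up with different internal layouts depending on insertion order, and java.util.PriorityQueue does not override `equals`, so the queues also compare unequal. The Javadoc for `toArray` promises no particular order, so the exact layouts are an implementation detail (assumed here to be OpenJDK's heap array).

```scala
import java.util.{Arrays, PriorityQueue => JPriorityQueue}

object HeapOrderDemo {
  // Builds a natural-order queue by adding the elements in the given order.
  def queueOf(xs: Int*): JPriorityQueue[Integer] = {
    val pq = new JPriorityQueue[Integer]()
    xs.foreach(x => pq.add(Int.box(x)))
    pq
  }

  // The array view of the queue, which is what an array-based encoding would write.
  def layout(pq: JPriorityQueue[Integer]): String = Arrays.toString(pq.toArray())

  def main(args: Array[String]): Unit = {
    val a = queueOf(1, 2, 3)
    val b = queueOf(1, 3, 2)
    println(layout(a)) // layout after inserting 1, 2, 3
    println(layout(b)) // layout after inserting 1, 3, 2
    println(a == b)    // reference equality only, so false
  }
}
```

Since equal contents do not guarantee equal bytes, an encoding built this way would not be safe to treat as deterministic.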

Comment on lines 161 to 162
// custom ordering must have stable equal after serialization
implicit val pqOrd: Ordering[String] = FlippedStringOrdering

@RustedBones RustedBones Nov 4, 2024

Not really happy about this

@RustedBones RustedBones changed the title Add java priority queue coder Add java priority queue, set, deque, collection coders Nov 4, 2024
2 participants