-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPARK-1321 Use Guava's top k implementation rather than our BoundedPriorityQueue based implementation #229
Conversation
Merged build triggered. |
Merged build started. |
Merged build finished. |
All automated tests passed. |
override def compare(l: T, r: T) = ord.compare(l, r) | ||
} | ||
collectionAsScalaIterable( | ||
ordering.leastOf(asJavaIterator(input), num)).iterator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this proper indendation? Should this be indented from the previous line?
Looks good to me. One small style comment. |
weird i missed that. fixed. |
val ordering = new GuavaOrdering[T] { | ||
override def compare(l: T, r: T) = ord.compare(l, r) | ||
} | ||
collectionAsScalaIterable(ordering.leastOf(asJavaIterator(input), num)).iterator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't this just be
ordering.leastOf(input, num).iterator
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it could - but trying to be more explicit here. i'm really not a fan of implicits, especially in critical paths ..
… based implementation. Also updated the documentation for top and takeOrdered. Guava's top k implementation (in Ordering) is much faster than the BoundedPriorityQueue implementation for roughly sorted input (10 - 20X faster), and still faster for purely random input (2 - 5X).
Merged build triggered. |
Merged build started. |
Merged build finished. |
All automated tests passed. |
Ok I've merged this. |
…iorityQueue based implementation Also updated the documentation for top and takeOrdered. On my simple test of sorting 100 million (Int, Int) tuples using Spark, Guava's top k implementation (in Ordering) is much faster than the BoundedPriorityQueue implementation for roughly sorted input (10 - 20X faster), and still faster for purely random input (2 - 5X). Author: Reynold Xin <rxin@apache.org> Closes apache#229 from rxin/takeOrdered and squashes the following commits: 0d11844 [Reynold Xin] Use Guava's top k implementation rather than our BoundedPriorityQueue based implementation. Also updated the documentation for top and takeOrdered.
* In hdfs soak test, updated hdfsclient image and specified hdfs service name. * Made the HDFS service name configurable. * Use the "app name" (begins with a slash) instead of service name. * Simplified the "hdfs-kerberos-delete-terasort-files" job * Updated service name and image version in 'hdfs-kerberos-delete-terasort-files' job * Make the executors run as root (for centos) * Templated the principal in the 'hdfs-kerberos-delete-terasort-files' job
Fix the Mitaka devstack installation
AL-1864 upgrade jackson-databind to 2.9.10.8
Also updated the documentation for top and takeOrdered.
On my simple test of sorting 100 million (Int, Int) tuples using Spark, Guava's top k implementation (in Ordering) is much faster than the BoundedPriorityQueue implementation for roughly sorted input (10 - 20X faster), and still faster for purely random input (2 - 5X).