Skip to content

Commit

Permalink
Merge pull request alteryx#43 from mateiz/kryo-fix
Browse files Browse the repository at this point in the history
Don't allocate Kryo buffers unless needed

I noticed that the Kryo serializer could be slower than the Java one by 2-3x on small shuffles because it spend a lot of time initializing Kryo Input and Output objects. This is because our default buffer size for them is very large. Since the serializer is often used on streams, I made the initialization lazy for that, and used a smaller buffer (auto-managed by Kryo) for input.
  • Loading branch information
rxin committed Oct 9, 2013
2 parents ea34c52 + a8725bf commit e67d5b9
Showing 1 changed file with 4 additions and 4 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -38,8 +38,6 @@ class KryoSerializer extends org.apache.spark.serializer.Serializer with Logging

def newKryoOutput() = new KryoOutput(bufferSize)

def newKryoInput() = new KryoInput(bufferSize)

def newKryo(): Kryo = {
val instantiator = new ScalaKryoInstantiator
val kryo = instantiator.newKryo()
Expand Down Expand Up @@ -118,8 +116,10 @@ class KryoDeserializationStream(kryo: Kryo, inStream: InputStream) extends Deser

private[spark] class KryoSerializerInstance(ks: KryoSerializer) extends SerializerInstance {
val kryo = ks.newKryo()
val output = ks.newKryoOutput()
val input = ks.newKryoInput()

// Make these lazy vals to avoid creating a buffer unless we use them
lazy val output = ks.newKryoOutput()
lazy val input = new KryoInput()

def serialize[T](t: T): ByteBuffer = {
output.clear()
Expand Down

0 comments on commit e67d5b9

Please sign in to comment.