Improve memory efficiency of Chunk.filter and replace internal use of mutable.Buffer with mutable.ArrayBuilder #2720

Merged
merged 5 commits into from
Nov 29, 2021

Conversation

mpilquist
Member

This PR:

  • replaces uses of mutable.Buffer with mutable.ArrayBuilder
  • allocates primitive arrays when possible (e.g. filter)
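The difference between the two approaches can be sketched outside fs2. This is a minimal illustration on plain arrays, not the code from this PR: `FilterSketch`, `filterViaBuffer`, and `filterViaArrayBuilder` are hypothetical names. With a `ClassTag` in scope, `ArrayBuilder.make[Int]` accumulates into a primitive `Array[Int]`, whereas a `mutable.Buffer[Int]` stores boxed elements.

```scala
import scala.collection.mutable
import scala.reflect.ClassTag

// Hypothetical standalone sketch; fs2's Chunk.filter is not reproduced here.
object FilterSketch {

  // Accumulates matches in a mutable.Buffer, which boxes primitive elements.
  def filterViaBuffer[A](values: Array[A])(p: A => Boolean): mutable.Buffer[A] = {
    val b = mutable.Buffer.empty[A]
    var i = 0
    while (i < values.length) {
      val a = values(i)
      if (p(a)) b += a
      i += 1
    }
    b
  }

  // Accumulates matches in an ArrayBuilder; the ClassTag lets the builder
  // allocate a primitive array when A is a primitive type such as Int.
  def filterViaArrayBuilder[A: ClassTag](values: Array[A])(p: A => Boolean): Array[A] = {
    val b = mutable.ArrayBuilder.make[A]
    var i = 0
    while (i < values.length) {
      val a = values(i)
      if (p(a)) b += a
      i += 1
    }
    b.result()
  }
}
```

Calling `FilterSketch.filterViaArrayBuilder(Array(1, 2, 3, 4))(_ % 2 == 0)` yields an array whose runtime component type is the primitive `int`.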
# Map

## new Array(size)
[info] ChunkBenchmark.map           16  thrpt    5  20039615.265 ± 492279.237  ops/s
[info] ChunkBenchmark.map          256  thrpt    5   1278011.968 ±  42407.622  ops/s
[info] ChunkBenchmark.map         4096  thrpt    5     46243.164 ±    295.401  ops/s

## ArrayBuilder
[info] ChunkBenchmark.map           16  thrpt    5  43344785.011 ± 665760.099  ops/s
[info] ChunkBenchmark.map          256  thrpt    5   3551895.914 ±  12759.777  ops/s
[info] ChunkBenchmark.map         4096  thrpt    5     69298.960 ±   4435.243  ops/s

# Filter

## ArrayBuilder
[info] ChunkBenchmark.filter           16  thrpt    5  30132296.897 ± 378366.725  ops/s
[info] ChunkBenchmark.filter          256  thrpt    5   2326540.853 ±   7729.015  ops/s
[info] ChunkBenchmark.filter         4096  thrpt    5     66920.922 ±   1146.317  ops/s

## mutable.Buffer
[info] ChunkBenchmark.filter           16  thrpt    5  15843176.346 ± 109023.863  ops/s
[info] ChunkBenchmark.filter          256  thrpt    5   1031608.241 ±   3938.483  ops/s
[info] ChunkBenchmark.filter         4096  thrpt    5     55229.486 ±   4883.909  ops/s

@mpilquist
Member Author

ArrayBuilder being faster than direct array allocation + foreachWithIndex is the most surprising result, so I'm doing some more testing.

Benchmarking those differences directly shows a more intuitive result at chunk sizes 16 and 256: direct looping is fastest, foreachWithIndex is really close, and ArrayBuilder is about half the speed. At 4096 the ordering changes, with foreachWithIndex fastest and direct looping slowest.

[info] ArrayBenchmark.direct                     16  thrpt    5  90792672.001 ± 9532041.008  ops/s
[info] ArrayBenchmark.direct                    256  thrpt    5   8138874.809 ±  194207.489  ops/s
[info] ArrayBenchmark.direct                   4096  thrpt    5     58614.835 ±    3124.328  ops/s
[info] ArrayBenchmark.foreach                    16  thrpt    5  43390695.440 ± 2005110.173  ops/s
[info] ArrayBenchmark.foreach                   256  thrpt    5   3760899.229 ±   28792.831  ops/s
[info] ArrayBenchmark.foreach                  4096  thrpt    5     68599.480 ±    2563.186  ops/s
[info] ArrayBenchmark.foreachWithIndex           16  thrpt    5  86140106.223 ±  894084.743  ops/s
[info] ArrayBenchmark.foreachWithIndex          256  thrpt    5   7777074.807 ±   81923.236  ops/s
[info] ArrayBenchmark.foreachWithIndex         4096  thrpt    5     85051.823 ±    1964.377  ops/s

package fs2
package benchmark

import org.openjdk.jmh.annotations.{Benchmark, Param, Scope, Setup, State}

@State(Scope.Thread)
class ArrayBenchmark {
  @Param(Array("16", "256", "4096"))
  var chunkSize: Int = _

  var ints: Chunk[Int] = _

  // Builds a Chunk[Int] of the parameterized size before the benchmarks run.
  @Setup
  def setup(): Unit =
    ints = Chunk.array((0 until chunkSize).map(_ + 1000).toArray)

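  // Preallocates a primitive array and fills it through Chunk.foreachWithIndex.
  @Benchmark
  def foreachWithIndex(): Unit = {
    val arr = new Array[Int](ints.size)
    ints.foreachWithIndex((a, i) => arr(i) = a)
    ()
  }

  // Appends each element to a size-hinted ArrayBuilder (the "ArrayBuilder" case).
  @Benchmark
  def foreach(): Unit = {
    val b = collection.mutable.ArrayBuilder.make[Int]
    b.sizeHint(ints.size)
    ints.foreach(a => b += a)
    b.result()
    ()
  }

  // Preallocates a primitive array and copies with an indexed while loop.
  @Benchmark
  def direct(): Unit = {
    val arr = new Array[Int](ints.size)
    var i = 0
    while (i < arr.length) {
      arr(i) = ints(i)
      i += 1
    }
    ()
  }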
}
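The benchmark methods above return `Unit` and discard the arrays they build. JMH consumes values returned from `@Benchmark` methods, which is a common way to keep the JIT from optimizing the measured work away. Below is a minimal standalone sketch of variants that return their result. It is not part of this PR: `ReturningVariants` and its members are hypothetical names, a plain `Array[Int]` stands in for the fs2 `Chunk`, and the JMH annotations are omitted so the block runs on its own. Whether this affects the numbers reported in this thread is not established here.

```scala
import scala.collection.mutable

// Hypothetical sketch: in a JMH class, each method would carry @Benchmark
// and read from a @State field rather than from this fixed array.
object ReturningVariants {
  val source: Array[Int] = Array.tabulate(16)(_ + 1000)

  // Indexed copy into a preallocated array; the array is returned
  // so that a harness can consume it.
  def direct(): Array[Int] = {
    val arr = new Array[Int](source.length)
    var i = 0
    while (i < arr.length) {
      arr(i) = source(i)
      i += 1
    }
    arr
  }

  // Size-hinted ArrayBuilder; the built array is returned instead of discarded.
  def viaArrayBuilder(): Array[Int] = {
    val b = mutable.ArrayBuilder.make[Int]
    b.sizeHint(source.length)
    source.foreach(a => b += a)
    b.result()
  }
}
```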

@mpilquist
Member Author

Hm...

[info] ArrayBenchmark.mapViaArrayBuilder               16  thrpt    5  14930780.873 ± 1348427.104  ops/s
[info] ArrayBenchmark.mapViaArrayBuilder              256  thrpt    5    784453.427 ±   21618.127  ops/s
[info] ArrayBenchmark.mapViaArrayBuilder             4096  thrpt    5     43837.114 ±    1083.966  ops/s
[info] ArrayBenchmark.mapViaForeachWithIndex           16  thrpt    5  20277289.880 ±  309790.796  ops/s
[info] ArrayBenchmark.mapViaForeachWithIndex          256  thrpt    5   1259734.591 ±   88802.838  ops/s
[info] ArrayBenchmark.mapViaForeachWithIndex         4096  thrpt    5     38934.865 ±    1802.972  ops/s

  @Benchmark
  def mapViaArrayBuilder(): Unit = {
    ints.map(_ + 1)
    ()
  }

  @Benchmark
  def mapViaForeachWithIndex(): Unit = {
    ints.mapViaForeachWithIndex(_ + 1)
    ()
  }

  // Chunk methods exercised by the two benchmarks above
  def map[O2](f: O => O2): Chunk[O2] = {
    val b = makeArrayBuilder[Any]
    b.sizeHint(size)
    foreach(e => b += f(e))
    Chunk.array(b.result()).asInstanceOf[Chunk[O2]]
  }

  def mapViaForeachWithIndex[O2](f: O => O2): Chunk[O2] = {
    val arr = new Array[Any](size)
    foreachWithIndex((e, i) => arr(i) = f(e))
    Chunk.array(arr).asInstanceOf[Chunk[O2]]
  }

@mpilquist
Member Author

Another test suggests that my runs from last night are not repeatable:

[info] ChunkBenchmark.map                              16  thrpt    5  15267266.969 ± 953829.844  ops/s
[info] ChunkBenchmark.map                             256  thrpt    5    957397.541 ±  17705.037  ops/s
[info] ChunkBenchmark.map                            4096  thrpt    5     42622.356 ±     89.863  ops/s
[info] ChunkBenchmark.mapViaForeachWithIndex           16  thrpt    5  16561890.228 ±  73205.085  ops/s
[info] ChunkBenchmark.mapViaForeachWithIndex          256  thrpt    5   1273457.072 ±  36751.641  ops/s
[info] ChunkBenchmark.mapViaForeachWithIndex         4096  thrpt    5     46025.542 ±   1069.425  ops/s
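The gap between runs can be quantified from the figures quoted in this thread. The sketch below compares the size-16 `map` throughput of the ArrayBuilder and preallocated-array variants in the first run and in the run above. `ThroughputRatios` and its field names are hypothetical, and it assumes that `ChunkBenchmark.map` in the later run is the ArrayBuilder-based implementation.

```scala
// Hypothetical helper; all figures are ops/s at chunk size 16, copied from this thread.
object ThroughputRatios {
  val firstRunNewArray: Double     = 20039615.265 // "new Array(size)" map
  val firstRunArrayBuilder: Double = 43344785.011 // "ArrayBuilder" map
  val laterRunMap: Double          = 15267266.969 // assumed ArrayBuilder-based map
  val laterRunForeachIdx: Double   = 16561890.228 // mapViaForeachWithIndex

  // Throughput of the ArrayBuilder variant relative to the preallocated-array variant.
  def ratio(arrayBuilder: Double, preallocated: Double): Double =
    arrayBuilder / preallocated

  val firstRunRatio: Double = ratio(firstRunArrayBuilder, firstRunNewArray)
  val laterRunRatio: Double = ratio(laterRunMap, laterRunForeachIdx)

  def main(args: Array[String]): Unit = {
    println(s"first run, ArrayBuilder / new Array: $firstRunRatio")
    println(s"later run, map / mapViaForeachWithIndex: $laterRunRatio")
  }
}
```

In the first run the ArrayBuilder variant is more than twice as fast; in the later run it is slightly slower, which is consistent with the earlier result not being repeatable.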

@mpilquist mpilquist changed the title from "Improve performance of Chunk.map and similar and memory efficiency of filter" to "Improve memory efficiency of Chunk.filter and replace internal use of mutable.Buffer with mutable.ArrayBuilder" Nov 24, 2021
@mpilquist mpilquist merged commit d508c4c into typelevel:main Nov 29, 2021
@mpilquist mpilquist deleted the topic/chunk-perf branch February 18, 2024 13:33
