Some benchmark cleanup #3

hassila commented Nov 21, 2023

Added an ARC metric and did some overall cleanup to use the built-in support for inner loops.

Comment on lines 143 to 145

- Benchmark("structs") { benchmark in
-     let structCount = 1_000_000
+ Benchmark("Structs", configuration: kiloConfiguration) { benchmark in
+     let structCount = 1_000

Owner

Why is structCount 1_000 now?

Author

Yeah, I should have commented on that, sorry. With 1M it crashes; it looks like the buffer outgrows what it can hold (the trace shows a trap while converting 4294967296 to UInt32):

* thread #2, queue = 'com.apple.root.default-qos.cooperative', stop reason = Swift runtime failure: Not enough bits to represent the passed value
    frame #0: 0x000000010012a47c FlatbuffersBenchmarks`Int.convertToPowerofTwo.getter [inlined] Swift runtime failure: Not enough bits to represent the passed value at <compiler-generated>:0 [opt]
  * frame #1: 0x000000010012a47c FlatbuffersBenchmarks`Int.convertToPowerofTwo.getter [inlined] generic specialization <Swift.UInt32, Swift.Int> of Swift.UnsignedInteger< where τ_0_0: Swift.FixedWidthInteger>.init<τ_0_0 where τ_1_0: Swift.BinaryInteger>(τ_1_0) -> τ_0_0 at <compiler-generated>:0 [opt]
    frame #2: 0x000000010012a47c FlatbuffersBenchmarks`Int.convertToPowerofTwo.getter(self=4294967296) at Int+extension.swift:31:13 [opt]
    frame #3: 0x00000001001202c8 FlatbuffersBenchmarks`ByteBuffer.Storage.reallocate(size=80, writerSize=2147483576, alignment=8, self=0x00000001005c4b70) at ByteBuffer.swift:88:27 [opt]
    frame #4: 0x0000000100120c98 FlatbuffersBenchmarks`closure #1 (Swift.UnsafeRawBufferPointer) -> () in FlatBuffers.ByteBuffer.push<τ_0_0 where τ_0_0: FlatBuffers.NativeStruct>(elements: Swift.Array<τ_0_0>) -> () at ByteBuffer.swift:371:16
    frame #5: 0x0000000100129720 FlatbuffersBenchmarks`partial apply for closure #1 in ByteBuffer.push<A>(elements:) at <compiler-generated>:0 [opt]
    frame #6: 0x0000000190348b6c libswiftCore.dylib`Swift.Array.withUnsafeBytes<τ_0_0>((Swift.UnsafeRawBufferPointer) throws -> τ_1_0) throws -> τ_1_0 + 352
    frame #7: 0x0000000100126efc FlatbuffersBenchmarks`FlatBufferBuilder.createVector<A>(ofStructs:) [inlined] FlatBuffers.ByteBuffer.push<τ_0_0 where τ_0_0: FlatBuffers.NativeStruct>(elements=<unavailable>, self=FlatBuffers.ByteBuffer @ 0x000000016fe863e0) -> () at ByteBuffer.swift:246:14 [opt]
    frame #8: 0x0000000100126ec0 FlatbuffersBenchmarks`FlatBufferBuilder.createVector<A>(structs=<unavailable>, self=FlatBuffers.FlatBufferBuilder @ 0x000000016fe863d8) at FlatBufferBuilder.swift:626:9 [opt]
    frame #9: 0x0000000100130d80 FlatbuffersBenchmarks`closure #12 in closure #1 in variable initialization expression of benchmarks(benchmark=<unavailable>, array=5 values) at FlatbuffersBenchmarks.swift:153:25 [opt]
    frame #10: 0x00000001000a51d0 FlatbuffersBenchmarks`BenchmarkExecutor.run(_:) at Benchmark.swift:344:13 [opt]
    frame #11: 0x00000001000a51b8 FlatbuffersBenchmarks`BenchmarkExecutor.run(benchmark=0x00000001006b0d80, self=0x0000000100605480) at BenchmarkExecutor.swift:53:23 [opt]
    frame #12: 0x00000001000c6988 FlatbuffersBenchmarks`BenchmarkRunner.run(self=Benchmark.BenchmarkRunner @ 0x0000000100b8c440) at BenchmarkRunner.swift:192:49 [opt]
    frame #13: 0x00000001000c2f84 FlatbuffersBenchmarks`static BenchmarkRunnerHooks.main(self=0x00000001001ad048) at BenchmarkRunner.swift:92 [opt]
    frame #14: 0x000000010015fd58 FlatbuffersBenchmarks`specialized thunk for @escaping @convention(thin) @async () -> () at <compiler-generated>:0 [opt]
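The trap in frame #0 comes from Swift's checked integer conversion: frame #2 shows convertToPowerofTwo called with 4294967296 (2^32), one more than UInt32.max, and frame #1 is the UInt32 initializer that traps. A minimal sketch of the difference between a trapping and a non-trapping conversion, where nextPowerOfTwo and checkedCapacity are hypothetical helpers for illustration and not the FlatBuffers implementation:

```swift
// Hypothetical helper: rounds a requested size up to the next power of two.
// It mirrors the idea behind the `convertToPowerofTwo` frame in the trace,
// not its actual implementation.
func nextPowerOfTwo(_ value: Int) -> Int {
    precondition(value > 0, "size must be positive")
    var result = 1
    while result < value {
        result <<= 1
    }
    return result
}

// `UInt32(someInt)` traps with "Not enough bits to represent the passed value"
// when the value exceeds UInt32.max; `UInt32(exactly:)` returns nil instead.
func checkedCapacity(for requestedBytes: Int) -> UInt32? {
    UInt32(exactly: nextPowerOfTwo(requestedBytes))
}

print(checkedCapacity(for: 1_000) as Any)          // Optional(1024)
print(checkedCapacity(for: 2_147_483_656) as Any)  // nil
```

Any request just above 2^31 bytes rounds up to 2^32, which is exactly the value in frame #2.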

Author

(Previously the inner loop was a single iteration. Now that we moved to kiloConfiguration, we run 1K inner-loop iterations, so it gets to 1M in total. But I guess the real question here is: what do you want to measure? We are putting 1M vectors into a single FlatBuffer here. What is the intended benchmark, really?)
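To put the sizes in perspective, here is a rough lower-bound estimate. It assumes that each push adds the 80 bytes seen as size=80 in frame #3 of the trace, that each structCount iteration pushes one vector, and that the buffer cannot grow past 2 GiB because a 2^32 capacity no longer fits in UInt32. The constant and function names are hypothetical:

```swift
// Rough lower bound on how large a single builder grows when every iteration
// pushes one more vector of structs into the same buffer.
// Assumptions: 80 bytes of struct payload per push (`size=80` in the trace) and a
// 2 GiB ceiling (a power-of-two capacity of 2^32 does not fit in UInt32).
// Vector length prefixes and alignment padding are ignored.
let bytesPerPush = 80
let bufferCeilingBytes = 1 << 31

func payloadBytes(pushes: Int) -> Int {
    pushes * bytesPerPush
}

func fitsInOneBuffer(pushes: Int) -> Bool {
    payloadBytes(pushes: pushes) <= bufferCeilingBytes
}

// 1K inner iterations * 1_000 vectors: about 80 MB, well under the ceiling.
print(fitsInOneBuffer(pushes: 1_000 * 1_000))      // true
// 1K inner iterations * 1_000_000 vectors: about 80 GB, far past it.
print(fitsInOneBuffer(pushes: 1_000 * 1_000_000))  // false
```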

Owner

Can we keep the million but use fewer iterations? Or clear the buffer after each iteration?
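The two options differ in how much the builder retains across iterations. A sketch using a plain [UInt8] as a hypothetical stand-in for the builder's backing storage (this is not the FlatBuffers API) shows that accumulating grows with the iteration count, while clearing with removeAll(keepingCapacity:) caps memory at one iteration's worth:

```swift
// Hypothetical stand-in for the builder's storage: a plain byte array.
// Returns the largest size the buffer reached during the run.
func peakBufferBytes(iterations: Int, bytesPerIteration: Int, clearEachIteration: Bool) -> Int {
    var buffer: [UInt8] = []
    var peak = 0
    for _ in 0..<iterations {
        buffer.append(contentsOf: [UInt8](repeating: 0, count: bytesPerIteration))
        peak = max(peak, buffer.count)
        if clearEachIteration {
            // Keep the allocation so later iterations do not pay to regrow it.
            buffer.removeAll(keepingCapacity: true)
        }
    }
    return peak
}

print(peakBufferBytes(iterations: 1_000, bytesPerIteration: 80, clearEachIteration: false)) // 80000
print(peakBufferBytes(iterations: 1_000, bytesPerIteration: 80, clearEachIteration: true))  // 80
```

Clearing bounds memory, but it does not reduce the total amount of work, so run time still scales with the number of inner iterations.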

Author

Clearing the buffer gave a runtime of 220 seconds (plus another 220 seconds for the single warmup iteration).

I will revert it to how it was, with a single iteration; that still gives 10+ samples and a runtime of ~223 ms.
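The 220-second figure is close to what linear scaling predicts. A back-of-the-envelope check, assuming clearing keeps the per-iteration cost about the same as the ~223 ms single-iteration run and that kiloConfiguration means 1,000 inner iterations:

```swift
// Back-of-the-envelope check on the numbers quoted in this thread.
// Assumption: total run time scales linearly with the number of inner iterations.
let singleIterationMilliseconds = 223.0 // runtime with a single iteration
let innerIterations = 1_000.0           // inner-loop count under kiloConfiguration

let estimatedSeconds = singleIterationMilliseconds * innerIterations / 1_000.0
print("Estimated run time: \(estimatedSeconds) s") // Estimated run time: 223.0 s
```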

Author

Structs
╒════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                     │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Malloc (total)             │      21 │      21 │      21 │      21 │      21 │      21 │      21 │      13 │
├────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (resident peak) (M) │     155 │     155 │     155 │     155 │     156 │     156 │     156 │      13 │
├────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Releases (K)               │    6000 │    6000 │    6000 │    6000 │    6000 │    6000 │    6000 │      13 │
├────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (ms)      │     223 │     223 │     224 │     225 │     226 │     228 │     228 │      13 │
├────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (ms)     │     224 │     225 │     225 │     226 │     226 │     229 │     229 │      13 │
╘════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

Owner

But in the code above, are we only measuring the time it takes to add 5 structs into the FlatBuffer? Or are we measuring the total time it takes to add 5 structs into a buffer a million times?

Author

It is 5 structs a million times into one buffer: the for _ in benchmark.scaledIterations loop runs 1M times.
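Given that, the p50 column of the Structs table can be normalized to a single push. A small sketch, assuming each sample covers exactly one pass of 1M pushes and that the (K) suffix on Releases means thousands:

```swift
// Normalizing the p50 column of the Structs table to one push of 5 structs.
// Assumptions: one sample = one pass over scaledIterations (1M pushes),
// and "Releases (K)" is reported in thousands.
let pushesPerSample = 1_000_000
let releasesPerSample = 6_000 * 1_000  // "Releases (K)", p50
let wallClockMilliseconds = 225.0      // "Time (wall clock) (ms)", p50

let releasesPerPush = releasesPerSample / pushesPerSample
let nanosecondsPerPush = wallClockMilliseconds * 1_000_000 / Double(pushesPerSample)

print(releasesPerPush)     // 6
print(nanosecondsPerPush)  // 225.0
```

Under those assumptions, each push costs about 6 releases and roughly 225 ns of wall-clock time.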

Owner

Okay then, perfect!

mustiikhalil merged commit 1310f14 into mustiikhalil:update-struct-pushing-to-buffer Nov 21, 2023
1 check passed
hassila deleted the benchmark-fixes branch November 21, 2023 12:32