Some benchmark cleanup #3
- Benchmark("structs") { benchmark in
-   let structCount = 1_000_000
+ Benchmark("Structs", configuration: kiloConfiguration) { benchmark in
+   let structCount = 1_000
Why is structCount 1_000 now?
Yeah, I should have commented on that, sorry. With 1M it would crash due to what looks like OOM:
* thread #2, queue = 'com.apple.root.default-qos.cooperative', stop reason = Swift runtime failure: Not enough bits to represent the passed value
frame #0: 0x000000010012a47c FlatbuffersBenchmarks`Int.convertToPowerofTwo.getter [inlined] Swift runtime failure: Not enough bits to represent the passed value at <compiler-generated>:0 [opt]
* frame #1: 0x000000010012a47c FlatbuffersBenchmarks`Int.convertToPowerofTwo.getter [inlined] generic specialization <Swift.UInt32, Swift.Int> of Swift.UnsignedInteger< where τ_0_0: Swift.FixedWidthInteger>.init<τ_0_0 where τ_1_0: Swift.BinaryInteger>(τ_1_0) -> τ_0_0 at <compiler-generated>:0 [opt]
frame #2: 0x000000010012a47c FlatbuffersBenchmarks`Int.convertToPowerofTwo.getter(self=4294967296) at Int+extension.swift:31:13 [opt]
frame #3: 0x00000001001202c8 FlatbuffersBenchmarks`ByteBuffer.Storage.reallocate(size=80, writerSize=2147483576, alignment=8, self=0x00000001005c4b70) at ByteBuffer.swift:88:27 [opt]
frame #4: 0x0000000100120c98 FlatbuffersBenchmarks`closure #1 (Swift.UnsafeRawBufferPointer) -> () in FlatBuffers.ByteBuffer.push<τ_0_0 where τ_0_0: FlatBuffers.NativeStruct>(elements: Swift.Array<τ_0_0>) -> () at ByteBuffer.swift:371:16
frame #5: 0x0000000100129720 FlatbuffersBenchmarks`partial apply for closure #1 in ByteBuffer.push<A>(elements:) at <compiler-generated>:0 [opt]
frame #6: 0x0000000190348b6c libswiftCore.dylib`Swift.Array.withUnsafeBytes<τ_0_0>((Swift.UnsafeRawBufferPointer) throws -> τ_1_0) throws -> τ_1_0 + 352
frame #7: 0x0000000100126efc FlatbuffersBenchmarks`FlatBufferBuilder.createVector<A>(ofStructs:) [inlined] FlatBuffers.ByteBuffer.push<τ_0_0 where τ_0_0: FlatBuffers.NativeStruct>(elements=<unavailable>, self=FlatBuffers.ByteBuffer @ 0x000000016fe863e0) -> () at ByteBuffer.swift:246:14 [opt]
frame #8: 0x0000000100126ec0 FlatbuffersBenchmarks`FlatBufferBuilder.createVector<A>(structs=<unavailable>, self=FlatBuffers.FlatBufferBuilder @ 0x000000016fe863d8) at FlatBufferBuilder.swift:626:9 [opt]
frame #9: 0x0000000100130d80 FlatbuffersBenchmarks`closure #12 in closure #1 in variable initialization expression of benchmarks(benchmark=<unavailable>, array=5 values) at FlatbuffersBenchmarks.swift:153:25 [opt]
frame #10: 0x00000001000a51d0 FlatbuffersBenchmarks`BenchmarkExecutor.run(_:) at Benchmark.swift:344:13 [opt]
frame #11: 0x00000001000a51b8 FlatbuffersBenchmarks`BenchmarkExecutor.run(benchmark=0x00000001006b0d80, self=0x0000000100605480) at BenchmarkExecutor.swift:53:23 [opt]
frame #12: 0x00000001000c6988 FlatbuffersBenchmarks`BenchmarkRunner.run(self=Benchmark.BenchmarkRunner @ 0x0000000100b8c440) at BenchmarkRunner.swift:192:49 [opt]
frame #13: 0x00000001000c2f84 FlatbuffersBenchmarks`static BenchmarkRunnerHooks.main(self=0x00000001001ad048) at BenchmarkRunner.swift:92 [opt]
frame #14: 0x000000010015fd58 FlatbuffersBenchmarks`specialized thunk for @escaping @convention(thin) @async () -> () at <compiler-generated>:0 [opt]
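The numbers in the trace can be checked by hand. Below is a minimal sketch, assuming a 64-bit platform and assuming the buffer grows to the next power of two. The helper `nextPowerOfTwo` is a hypothetical stand-in, not the library's `convertToPowerofTwo` implementation. It only suggests why a writer size just under 2 GiB plus an 80-byte request ends in a failed `Int` to `UInt32` conversion.

```swift
// Assumes a 64-bit platform, where Int can hold values above UInt32.max.

/// Hypothetical helper (not the library's code):
/// smallest power of two greater than or equal to n, for n >= 1.
func nextPowerOfTwo(_ n: Int) -> Int {
    var p = 1
    while p < n { p <<= 1 }
    return p
}

// Values copied from frame #3 of the trace.
let writerSize = 2_147_483_576
let requestedSize = 80

// The required size lands just past 2^31, so the next power of two is 2^32.
let needed = writerSize + requestedSize
let capacity = nextPowerOfTwo(needed)

// UInt32(capacity) would trap with "Not enough bits to represent the passed value".
// The failable initializer reports the same condition by returning nil instead.
let asUInt32 = UInt32(exactly: capacity)

print(needed, capacity, asUInt32 as Any)
```

Under these assumptions the computed capacity equals the `self=4294967296` value shown in frame #2, which is one more than `UInt32.max`. That would make the failure a size-representation limit, not the allocator running out of memory.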
(Previously the inner loop was a single iteration. Now that we have moved up to kiloConfiguration we run 1K inner loops, so it gets to 1M total. I guess the real question here is "what do you want to measure?" We are putting 1M vectors into the single fb here. What is the intended benchmark, really?)
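To make the arithmetic explicit, here is a small sketch of the three workload shapes discussed in this thread. The `Workload` type and its field names are illustrative only and come from neither the benchmark package nor FlatBuffers. It assumes each pass over the struct loop appends one vector, and that a kilo scaling factor means 1,000 inner iterations.

```swift
/// Hypothetical model of the benchmark workload (illustrative names only).
struct Workload {
    let innerIterations: Int  // 1 without scaling, 1_000 with a kilo scaling factor
    let structCount: Int      // value of structCount in the benchmark body

    /// Total vectors pushed into the single buffer during one measured run.
    var totalVectors: Int { innerIterations * structCount }
}

// Original benchmark: one inner iteration, structCount of one million.
let original = Workload(innerIterations: 1, structCount: 1_000_000)

// Kilo scaling with structCount left unchanged: the variant that crashed.
let scaledUnchanged = Workload(innerIterations: 1_000, structCount: 1_000_000)

// Kilo scaling with structCount reduced to 1_000, as in the diff.
let scaledReduced = Workload(innerIterations: 1_000, structCount: 1_000)

for workload in [original, scaledUnchanged, scaledReduced] {
    print(workload.innerIterations, workload.structCount, workload.totalVectors)
}
```

In this model the original and the reduced variant do the same total work, while the unchanged variant does a thousand times more in one buffer.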
Can we keep the million but use fewer iterations? Or clear the buffer after each iteration?
Clearing the buffer gave a runtime of 220 seconds (plus another 220 seconds for the single warmup iteration).
I will revert it to how it was, with a single iteration. It still gives 10+ samples and a runtime of ~223 ms.
Structs
╒════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric │ p0 │ p25 │ p50 │ p75 │ p90 │ p99 │ p100 │ Samples │
╞════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Malloc (total) │ 21 │ 21 │ 21 │ 21 │ 21 │ 21 │ 21 │ 13 │
├────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (resident peak) (M) │ 155 │ 155 │ 155 │ 155 │ 156 │ 156 │ 156 │ 13 │
├────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Releases (K) │ 6000 │ 6000 │ 6000 │ 6000 │ 6000 │ 6000 │ 6000 │ 13 │
├────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (ms) │ 223 │ 223 │ 224 │ 225 │ 226 │ 228 │ 228 │ 13 │
├────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (ms) │ 224 │ 225 │ 225 │ 226 │ 226 │ 229 │ 229 │ 13 │
╘════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛
But in the code above, are we only measuring the time it takes to add 5 structs into the fb? Or are we measuring the total time it takes to add 5 structs a million times into a buffer?
That is 5 structs a million times into a buffer: the for _ in benchmark.scaledIterations loop runs 1M times.
Okay then, perfect!
1310f14 into mustiikhalil:update-struct-pushing-to-buffer
Added ARC metric, some overall cleanup to use the built-in support for inner loops.