Add NNlib Benchmark #15

Merged 3 commits on Jul 7, 2023

@skyleaworlder (Collaborator) commented Jul 6, 2023

PR Checklist

  • Tests are added
  • Documentation, if applicable

List

  • activations: already in benchmarks.jl of NNlib.jl
  • attention: simple and easy to add; I haven't added it yet
  • conv: actually already in perf_report.jl of NNlib.jl
  • dropout: new
  • gemm: new, partially done
  • padding: I don't think this needs to be included
  • pool: actually already in perf_report.jl of NNlib.jl
  • softmax: already in benchmarks.jl of NNlib.jl
  • upsample: new

Result

Adding more benchmarks inevitably increases the run time. Running all benchmarks now takes approximately 20 minutes, not counting environment setup and teardown.


@codecov-commenter

Codecov Report

❗ No coverage uploaded for pull request base (main@aa136ae).
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #15   +/-   ##
=======================================
  Coverage        ?   41.66%           
=======================================
  Files           ?        3           
  Lines           ?      108           
  Branches        ?        0           
=======================================
  Hits            ?       45           
  Misses          ?       63           
  Partials        ?        0           


@skyleaworlder (Collaborator, Author)

I pasted report.md here:

Judge result

Job Properties

  • Time of benchmarks:
    • Target: 6 Jul 2023 - 15:50
    • Baseline: 6 Jul 2023 - 15:26
  • Package commits:
    • Target: dirty
    • Baseline: dirty
  • Julia commits:
    • Target: 0434de
    • Baseline: 0434de
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: JULIA_NUM_THREADS => 1
    • Baseline: JULIA_NUM_THREADS => 1

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| ["activations", "Float16", "relu"] | 1.66 (5%) ❌ | 1.00 (1%) |
| ["activations", "Float16", "softsign"] | 1.09 (5%) ❌ | 1.00 (1%) |
| ["activations", "Float16", "tanhshrink"] | 0.95 (5%) ✅ | 1.00 (1%) |
| ["activations", "Float16", "trelu"] | 0.75 (5%) ✅ | 1.00 (1%) |
| ["activations", "Float32", "celu"] | 1.05 (5%) ❌ | 1.00 (1%) |
| ["activations", "Float32", "elu"] | 1.06 (5%) ❌ | 1.00 (1%) |
| ["activations", "Float32", "gelu"] | 1.11 (5%) ❌ | 1.00 (1%) |
| ["activations", "Float32", "logcosh"] | 1.06 (5%) ❌ | 1.00 (1%) |
| ["activations", "Float32", "rrelu"] | 0.94 (5%) ✅ | 1.00 (1%) |
| ["activations", "Float32", "σ"] | 1.05 (5%) ❌ | 1.00 (1%) |
| ["activations", "Float64", "lisht"] | 1.06 (5%) ❌ | 1.00 (1%) |
| ["activations", "Float64", "logcosh"] | 1.06 (5%) ❌ | 1.00 (1%) |
| ["activations", "Float64", "tanh_fast"] | 1.05 (5%) ❌ | 1.00 (1%) |
| ["activations", "Float64", "tanhshrink"] | 1.06 (5%) ❌ | 1.00 (1%) |
| ["activations", "Float64", "σ"] | 1.08 (5%) ❌ | 1.00 (1%) |
| ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "conv"] | 1.07 (5%) ❌ | 1.00 (1%) |
| ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] | 1.05 (5%) ❌ | 1.00 (1%) |
| ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "filter"] | 0.76 (5%) ✅ | 1.00 (1%) |
| ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "filter"] | 1.17 (5%) ❌ | 1.00 (1%) |
| ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "data"] | 0.74 (5%) ✅ | 1.00 (1%) |
| ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "filter"] | 0.90 (5%) ✅ | 1.00 (1%) |
| ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "data"] | 0.91 (5%) ✅ | 1.00 (1%) |
| ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "filter"] | 0.76 (5%) ✅ | 1.00 (1%) |
| ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "data"] | 1.05 (5%) ❌ | 1.00 (1%) |
| ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "conv"] | 1.06 (5%) ❌ | 1.00 (1%) |
| ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "data"] | 1.16 (5%) ❌ | 1.00 (1%) |
| ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "data"] | 1.05 (5%) ❌ | 1.00 (1%) |
| ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "filter"] | 1.18 (5%) ❌ | 1.00 (1%) |
| ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "conv"] | 0.95 (5%) ✅ | 1.00 (1%) |
| ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "data"] | 0.90 (5%) ✅ | 1.00 (1%) |
| ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "filter"] | 1.13 (5%) ❌ | 1.00 (1%) |
| ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "data"] | 0.90 (5%) ✅ | 1.00 (1%) |
| ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "data"] | 1.13 (5%) ❌ | 1.00 (1%) |
| ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] | 0.70 (5%) ✅ | 1.00 (1%) |
| ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "data"] | 1.05 (5%) ❌ | 1.00 (1%) |
| ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "conv"] | 1.18 (5%) ❌ | 1.00 (1%) |
| ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "data"] | 0.68 (5%) ✅ | 1.00 (1%) |
| ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "conv"] | 1.09 (5%) ❌ | 1.00 (1%) |
| ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "conv"] | 0.90 (5%) ✅ | 1.00 (1%) |
| ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "conv"] | 1.11 (5%) ❌ | 1.00 (1%) |
| ["dropout", "4-N(100)", "dropout", "with-dim"] | 1.10 (5%) ❌ | 1.00 (1%) |
| ["softmax", "softmax", "Float16", "fw", (1024, 2048, 4)] | 1.15 (5%) ❌ | 1.01 (1%) |
| ["softmax", "softmax", "Float16", "fw", (12288, 2048, 1)] | 1.04 (5%) | 1.03 (1%) ❌ |
| ["softmax", "softmax", "Float16", "fw", (128, 384, 8)] | 3.98 (5%) ❌ | 1.02 (1%) ❌ |
| ["softmax", "softmax", "Float16", "fw", (2048, 2048, 2)] | 1.15 (5%) ❌ | 1.02 (1%) ❌ |
| ["softmax", "softmax", "Float16", "fw", (4096, 2048, 2)] | 1.08 (5%) ❌ | 1.02 (1%) ❌ |
| ["softmax", "softmax", "Float16", "fw", (512, 784, 8)] | 1.38 (5%) ❌ | 1.01 (1%) ❌ |
| ["softmax", "softmax", "Float16", "fw", (768, 1024, 4)] | 1.39 (5%) ❌ | 1.02 (1%) ❌ |
| ["softmax", "softmax", "Float32", "fw", (1024, 2048, 4)] | 1.35 (5%) ❌ | 1.00 (1%) |
| ["softmax", "softmax", "Float32", "fw", (12288, 2048, 1)] | 1.14 (5%) ❌ | 1.02 (1%) ❌ |
| ["softmax", "softmax", "Float32", "fw", (128, 384, 8)] | 8.04 (5%) ❌ | 1.01 (1%) ❌ |
| ["softmax", "softmax", "Float32", "fw", (2048, 2048, 2)] | 1.36 (5%) ❌ | 1.01 (1%) |
| ["softmax", "softmax", "Float32", "fw", (4096, 2048, 2)] | 1.20 (5%) ❌ | 1.01 (1%) |
| ["softmax", "softmax", "Float32", "fw", (4096, 4096, 2)] | 1.11 (5%) ❌ | 1.00 (1%) |
| ["softmax", "softmax", "Float32", "fw", (512, 784, 8)] | 1.87 (5%) ❌ | 1.01 (1%) |
| ["softmax", "softmax", "Float32", "fw", (768, 1024, 4)] | 1.91 (5%) ❌ | 1.01 (1%) |
| ["upsample", "nearest", "4-N(512)", "Float64"] | 0.95 (5%) ✅ | 1.00 (1%) |
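The marks in the judge table can be read as a comparison of each ratio against its tolerance (5% for time, 1% for memory). The following is a minimal Python sketch of that reading, not the code that produced the report: the function name `judge_ratio` is hypothetical, and the rule (regression when the ratio exceeds 1 by more than the tolerance, improvement when it falls below 1 by more than the tolerance) is an assumption based on the description above. The sample rows are copied from the table.

```python
def judge_ratio(ratio: float, tolerance: float) -> str:
    """Classify a target/baseline ratio against a noise tolerance.

    Assumed rule (not taken from the benchmarking tool's source):
    a ratio counts only if it leaves the band [1 - tolerance, 1 + tolerance].
    """
    if ratio - tolerance > 1.0:
        return "regression"   # shown with a cross mark in the report
    if ratio + tolerance < 1.0:
        return "improvement"  # shown with a check mark in the report
    return "invariant"        # not marked; fully invariant rows are omitted


# (time ratio, memory ratio) for three rows of the judge table.
rows = {
    ("activations", "Float16", "relu"): (1.66, 1.00),
    ("activations", "Float16", "trelu"): (0.75, 1.00),
    ("softmax", "softmax", "Float16", "fw", (12288, 2048, 1)): (1.04, 1.03),
}

for bench_id, (time_ratio, memory_ratio) in rows.items():
    time_verdict = judge_ratio(time_ratio, 0.05)      # 5% time tolerance
    memory_verdict = judge_ratio(memory_ratio, 0.01)  # 1% memory tolerance
    print(bench_id, time_verdict, memory_verdict)
```

Under this reading, the third row appears in the table only because of its memory ratio: a time ratio of 1.04 stays inside the 5% band, while a memory ratio of 1.03 falls outside the 1% band.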

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["activations", "Float16"]
  • ["activations", "Float32"]
  • ["activations", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64"]
  • ["dropout", "4-N(100)", "dropout!"]
  • ["dropout", "4-N(100)", "dropout"]
  • ["dropout", "4-N(1000)", "dropout!"]
  • ["dropout", "4-N(1000)", "dropout"]
  • ["dropout", "4-N(10000)", "dropout!"]
  • ["dropout", "4-N(10000)", "dropout"]
  • ["pooling", "4-N(20)-K(2)-stride(1)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(1)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(1)", "meanpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(2)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(2)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(2)", "meanpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(4)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(4)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(4)", "meanpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(1)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(1)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(1)", "meanpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(2)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(2)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(2)", "meanpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(4)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(4)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(4)", "meanpool2d-direct"]
  • ["softmax", "logsoftmax", "Float16", "bw"]
  • ["softmax", "logsoftmax", "Float16", "fw"]
  • ["softmax", "logsoftmax", "Float32", "bw"]
  • ["softmax", "logsoftmax", "Float32", "fw"]
  • ["softmax", "softmax", "Float16", "bw"]
  • ["softmax", "softmax", "Float16", "fw"]
  • ["softmax", "softmax", "Float32", "bw"]
  • ["softmax", "softmax", "Float32", "fw"]
  • ["upsample", "linear", "Float16", "bw"]
  • ["upsample", "linear", "Float16", "fw"]
  • ["upsample", "linear", "Float32", "bw"]
  • ["upsample", "linear", "Float32", "fw"]
  • ["upsample", "nearest", "4-N(128)"]
  • ["upsample", "nearest", "4-N(2048)"]
  • ["upsample", "nearest", "4-N(512)"]

Julia versioninfo

Target

Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 22.04.2 LTS
  uname: Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz: 
                 speed         user         nice          sys         idle          irq
       #1-64  2100 MHz    4503994 s       6177 s    2688877 s  2661682743 s          0 s
  Memory: 125.51467895507812 GB (117663.8515625 MB free)
  Uptime: 4.17160902e6 sec
  Load Avg:  1.09  1.04  1.1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, cascadelake)
  Threads: 1 on 64 virtual cores

Baseline

Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 22.04.2 LTS
  uname: Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz: 
                 speed         user         nice          sys         idle          irq
       #1-64  2100 MHz    4488711 s       6177 s    2682682 s  2660753665 s          0 s
  Memory: 125.51467895507812 GB (117928.24609375 MB free)
  Uptime: 4.17012335e6 sec
  Load Avg:  1.06  1.08  1.27
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, cascadelake)
  Threads: 1 on 64 virtual cores

Target result

Job Properties

  • Time of benchmark: 6 Jul 2023 - 15:50
  • Package commit: dirty
  • Julia commit: 0434de
  • Julia command flags: None
  • Environment variables: JULIA_NUM_THREADS => 1

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.
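The ID structure described above, [parent_group, child_group, ..., key], amounts to a path through nested groups. Here is a small Python sketch of that idea, using plain nested dictionaries as a stand-in for the benchmark suite; the `suite` layout and the `lookup` helper are hypothetical and only illustrate the indexing, and the two timings are copied from this report's results table.

```python
from functools import reduce

# Hypothetical stand-in for a benchmark suite: nested dicts keyed by group name.
suite = {
    "activations": {
        "Float16": {"relu": "1.388 ms", "trelu": "748.143 μs"},
    },
}


def lookup(groups, bench_id):
    """Follow an ID from the outermost group down to the final key."""
    return reduce(lambda group, key: group[key], bench_id, groups)


# A full ID resolves to a single benchmark result.
print(lookup(suite, ["activations", "Float16", "relu"]))

# A shorter ID resolves to a whole group of benchmarks.
print(sorted(lookup(suite, ["activations", "Float16"])))
```

A prefix of an ID selects a group rather than a single benchmark, which matches how the Benchmark Group List above names groups without a trailing key.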

ID time GC time memory allocations
["activations", "Float16", "celu"] 9.228 ms (5%)
["activations", "Float16", "elu"] 9.281 ms (5%)
["activations", "Float16", "gelu"] 36.790 ms (5%)
["activations", "Float16", "hardswish"] 5.673 ms (5%)
["activations", "Float16", "hardtanh"] 1.270 ms (5%)
["activations", "Float16", "hardσ"] 4.112 ms (5%)
["activations", "Float16", "leakyrelu"] 1.409 ms (5%)
["activations", "Float16", "lisht"] 10.340 ms (5%)
["activations", "Float16", "logcosh"] 48.086 ms (5%)
["activations", "Float16", "logσ"] 46.714 ms (5%)
["activations", "Float16", "mish"] 60.354 ms (5%)
["activations", "Float16", "relu"] 1.388 ms (5%)
["activations", "Float16", "relu6"] 1.235 ms (5%)
["activations", "Float16", "rrelu"] 5.631 ms (5%)
["activations", "Float16", "selu"] 14.674 ms (5%)
["activations", "Float16", "sigmoid_fast"] 18.037 ms (5%)
["activations", "Float16", "softplus"] 42.462 ms (5%)
["activations", "Float16", "softshrink"] 9.644 ms (5%)
["activations", "Float16", "softsign"] 3.964 ms (5%)
["activations", "Float16", "swish"] 19.616 ms (5%)
["activations", "Float16", "tanh_fast"] 9.236 ms (5%)
["activations", "Float16", "tanhshrink"] 10.438 ms (5%)
["activations", "Float16", "trelu"] 748.143 μs (5%)
["activations", "Float16", "σ"] 17.748 ms (5%)
["activations", "Float32", "celu"] 6.587 ms (5%)
["activations", "Float32", "elu"] 6.260 ms (5%)
["activations", "Float32", "gelu"] 11.612 ms (5%)
["activations", "Float32", "hardswish"] 330.991 μs (5%)
["activations", "Float32", "hardtanh"] 328.666 μs (5%)
["activations", "Float32", "hardσ"] 333.439 μs (5%)
["activations", "Float32", "leakyrelu"] 328.837 μs (5%)
["activations", "Float32", "lisht"] 447.228 μs (5%)
["activations", "Float32", "logcosh"] 29.696 ms (5%)
["activations", "Float32", "logσ"] 26.836 ms (5%)
["activations", "Float32", "mish"] 42.897 ms (5%)
["activations", "Float32", "relu"] 329.639 μs (5%)
["activations", "Float32", "relu6"] 329.816 μs (5%)
["activations", "Float32", "rrelu"] 2.178 ms (5%)
["activations", "Float32", "selu"] 6.261 ms (5%)
["activations", "Float32", "sigmoid_fast"] 7.578 ms (5%)
["activations", "Float32", "softplus"] 26.645 ms (5%)
["activations", "Float32", "softshrink"] 332.287 μs (5%)
["activations", "Float32", "softsign"] 324.634 μs (5%)
["activations", "Float32", "swish"] 7.811 ms (5%)
["activations", "Float32", "tanh_fast"] 425.460 μs (5%)
["activations", "Float32", "tanhshrink"] 447.694 μs (5%)
["activations", "Float32", "trelu"] 323.396 μs (5%)
["activations", "Float32", "σ"] 6.615 ms (5%)
["activations", "Float64", "celu"] 7.256 ms (5%)
["activations", "Float64", "elu"] 5.956 ms (5%)
["activations", "Float64", "gelu"] 10.080 ms (5%)
["activations", "Float64", "hardswish"] 769.829 μs (5%)
["activations", "Float64", "hardtanh"] 686.695 μs (5%)
["activations", "Float64", "hardσ"] 747.067 μs (5%)
["activations", "Float64", "leakyrelu"] 686.759 μs (5%)
["activations", "Float64", "lisht"] 9.925 ms (5%)
["activations", "Float64", "logcosh"] 26.611 ms (5%)
["activations", "Float64", "logσ"] 24.767 ms (5%)
["activations", "Float64", "mish"] 41.852 ms (5%)
["activations", "Float64", "relu"] 685.627 μs (5%)
["activations", "Float64", "relu6"] 686.501 μs (5%)
["activations", "Float64", "rrelu"] 2.109 ms (5%)
["activations", "Float64", "selu"] 5.958 ms (5%)
["activations", "Float64", "sigmoid_fast"] 7.039 ms (5%)
["activations", "Float64", "softplus"] 24.142 ms (5%)
["activations", "Float64", "softshrink"] 690.096 μs (5%)
["activations", "Float64", "softsign"] 687.384 μs (5%)
["activations", "Float64", "swish"] 7.541 ms (5%)
["activations", "Float64", "tanh_fast"] 9.564 ms (5%)
["activations", "Float64", "tanhshrink"] 9.921 ms (5%)
["activations", "Float64", "trelu"] 675.761 μs (5%)
["activations", "Float64", "σ"] 7.573 ms (5%)
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "conv"] 2.070 μs (5%) 1.53 KiB (1%) 21
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "data"] 2.511 μs (5%) 1.81 KiB (1%) 25
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "filter"] 2.670 μs (5%) 2.12 KiB (1%) 27
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "conv"] 2.071 μs (5%) 1.55 KiB (1%) 21
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "data"] 2.485 μs (5%) 1.83 KiB (1%) 25
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "filter"] 2.671 μs (5%) 2.38 KiB (1%) 27
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "conv"] 2.369 μs (5%) 2.20 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] 2.195 μs (5%) 2.20 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "filter"] 1.154 μs (5%) 1.44 KiB (1%) 16
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "conv"] 2.247 μs (5%) 2.44 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "data"] 2.191 μs (5%) 2.42 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "filter"] 972.000 ns (5%) 1.66 KiB (1%) 16
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "conv"] 740.000 ns (5%) 1.12 KiB (1%) 15
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "data"] 2.716 μs (5%) 2.22 KiB (1%) 31
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "filter"] 3.230 μs (5%) 2.44 KiB (1%) 32
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "conv"] 711.000 ns (5%) 1.12 KiB (1%) 15
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "data"] 2.693 μs (5%) 2.23 KiB (1%) 31
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "filter"] 3.217 μs (5%) 2.69 KiB (1%) 32
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "conv"] 2.392 μs (5%) 2.33 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "data"] 2.176 μs (5%) 2.16 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "filter"] 1.060 μs (5%) 1.39 KiB (1%) 16
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "conv"] 2.295 μs (5%) 2.55 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "data"] 2.121 μs (5%) 2.38 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "filter"] 971.000 ns (5%) 1.61 KiB (1%) 16
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "conv"] 2.129 μs (5%) 1.53 KiB (1%) 21
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "data"] 2.471 μs (5%) 1.81 KiB (1%) 25
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "filter"] 2.619 μs (5%) 2.14 KiB (1%) 27
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "conv"] 2.152 μs (5%) 1.55 KiB (1%) 21
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "data"] 2.476 μs (5%) 1.83 KiB (1%) 25
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "filter"] 2.619 μs (5%) 2.41 KiB (1%) 27
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "conv"] 2.520 μs (5%) 2.27 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "data"] 2.244 μs (5%) 2.27 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "filter"] 1.273 μs (5%) 1.50 KiB (1%) 16
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "conv"] 2.382 μs (5%) 2.56 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "data"] 2.238 μs (5%) 2.55 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "filter"] 1.095 μs (5%) 1.78 KiB (1%) 16
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "conv"] 818.000 ns (5%) 1.12 KiB (1%) 15
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "data"] 2.684 μs (5%) 2.22 KiB (1%) 31
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "filter"] 3.255 μs (5%) 2.45 KiB (1%) 32
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "conv"] 802.000 ns (5%) 1.12 KiB (1%) 15
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "data"] 2.646 μs (5%) 2.23 KiB (1%) 31
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "filter"] 3.281 μs (5%) 2.72 KiB (1%) 32
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "conv"] 2.568 μs (5%) 2.39 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "data"] 2.231 μs (5%) 2.22 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "filter"] 1.181 μs (5%) 1.45 KiB (1%) 16
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "conv"] 2.387 μs (5%) 2.67 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "data"] 2.162 μs (5%) 2.50 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "filter"] 1.090 μs (5%) 1.73 KiB (1%) 16
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "conv"] 4.444 μs (5%) 752 bytes (1%) 12
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "data"] 5.628 μs (5%) 1.05 KiB (1%) 16
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "filter"] 8.415 μs (5%) 5.86 KiB (1%) 18
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "conv"] 4.419 μs (5%) 768 bytes (1%) 12
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "data"] 5.703 μs (5%) 1.12 KiB (1%) 16
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "filter"] 9.182 μs (5%) 10.08 KiB (1%) 18
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "conv"] 6.835 μs (5%) 12.70 KiB (1%) 13
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] 15.516 μs (5%) 12.70 KiB (1%) 13
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "filter"] 3.459 μs (5%) 11.94 KiB (1%) 7
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "conv"] 7.575 μs (5%) 24.03 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "data"] 17.016 μs (5%) 24.02 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "filter"] 4.932 μs (5%) 23.25 KiB (1%) 8
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "conv"] 6.989 μs (5%) 384 bytes (1%) 6
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "data"] 7.544 μs (5%) 1.52 KiB (1%) 22
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "filter"] 14.600 μs (5%) 6.25 KiB (1%) 23
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "conv"] 6.148 μs (5%) 384 bytes (1%) 6
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "data"] 7.527 μs (5%) 1.62 KiB (1%) 22
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "filter"] 15.549 μs (5%) 10.53 KiB (1%) 23
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "conv"] 6.797 μs (5%) 12.88 KiB (1%) 13
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "data"] 11.565 μs (5%) 12.70 KiB (1%) 13
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "filter"] 3.729 μs (5%) 11.94 KiB (1%) 7
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "conv"] 7.369 μs (5%) 24.19 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "data"] 13.463 μs (5%) 24.02 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "filter"] 3.843 μs (5%) 23.25 KiB (1%) 8
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "conv"] 5.881 μs (5%) 752 bytes (1%) 12
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "data"] 5.408 μs (5%) 1.05 KiB (1%) 16
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "filter"] 8.018 μs (5%) 6.39 KiB (1%) 18
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "conv"] 6.098 μs (5%) 768 bytes (1%) 12
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "data"] 5.372 μs (5%) 1.12 KiB (1%) 16
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "filter"] 8.487 μs (5%) 11.33 KiB (1%) 18
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "conv"] 15.159 μs (5%) 18.27 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "data"] 17.720 μs (5%) 18.27 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "filter"] 11.237 μs (5%) 17.50 KiB (1%) 8
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "conv"] 17.517 μs (5%) 35.28 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "data"] 19.184 μs (5%) 35.27 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "filter"] 15.242 μs (5%) 34.50 KiB (1%) 8
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "conv"] 10.953 μs (5%) 384 bytes (1%) 6
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "data"] 6.648 μs (5%) 1.52 KiB (1%) 22
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "filter"] 16.893 μs (5%) 6.78 KiB (1%) 23
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "conv"] 9.921 μs (5%) 384 bytes (1%) 6
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "data"] 6.625 μs (5%) 1.62 KiB (1%) 22
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "filter"] 17.742 μs (5%) 11.78 KiB (1%) 23
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "conv"] 14.285 μs (5%) 18.44 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "data"] 16.556 μs (5%) 18.27 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "filter"] 10.952 μs (5%) 17.50 KiB (1%) 8
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "conv"] 17.209 μs (5%) 35.44 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "data"] 18.910 μs (5%) 35.27 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "filter"] 14.290 μs (5%) 34.50 KiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "conv"] 142.639 μs (5%) 368 bytes (1%) 6
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "data"] 175.337 μs (5%) 848 bytes (1%) 10
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "filter"] 294.515 μs (5%) 86.05 KiB (1%) 15
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "conv"] 147.516 μs (5%) 384 bytes (1%) 6
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "data"] 196.353 μs (5%) 1.03 KiB (1%) 10
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "filter"] 294.938 μs (5%) 171.31 KiB (1%) 15
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "conv"] 139.715 μs (5%) 615.95 KiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] 475.939 μs (5%) 615.95 KiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "filter"] 131.322 μs (5%) 615.19 KiB (1%) 2
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "conv"] 235.557 μs (5%) 1.20 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "data"] 746.910 μs (5%) 1.20 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "filter"] 219.980 μs (5%) 1.20 MiB (1%) 2
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "conv"] 262.323 μs (5%)
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "data"] 256.918 μs (5%) 1.38 KiB (1%) 16
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "filter"] 400.001 μs (5%) 86.59 KiB (1%) 20
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "conv"] 267.901 μs (5%)
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "data"] 257.930 μs (5%) 1.67 KiB (1%) 16
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "filter"] 401.315 μs (5%) 172.05 KiB (1%) 20
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "conv"] 136.318 μs (5%) 616.12 KiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "data"] 472.661 μs (5%) 615.95 KiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "filter"] 132.896 μs (5%) 615.19 KiB (1%) 2
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "conv"] 249.564 μs (5%) 1.20 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "data"] 730.562 μs (5%) 1.20 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "filter"] 220.257 μs (5%) 1.20 MiB (1%) 2
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "conv"] 233.398 μs (5%) 368 bytes (1%) 6
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "data"] 193.583 μs (5%) 848 bytes (1%) 10
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "filter"] 300.362 μs (5%) 104.86 KiB (1%) 15
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "conv"] 259.369 μs (5%) 384 bytes (1%) 6
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "data"] 201.199 μs (5%) 1.03 KiB (1%) 10
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "filter"] 302.372 μs (5%) 208.94 KiB (1%) 15
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "conv"] 753.865 μs (5%) 1.10 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "data"] 926.599 μs (5%) 1.10 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "filter"] 743.251 μs (5%) 1.10 MiB (1%) 2
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "conv"] 1.043 ms (5%) 2.19 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "data"] 1.035 ms (5%) 2.19 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "filter"] 1.032 ms (5%) 2.19 MiB (1%) 2
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "conv"] 502.617 μs (5%)
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "data"] 220.482 μs (5%) 1.38 KiB (1%) 16
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "filter"] 502.187 μs (5%) 105.41 KiB (1%) 20
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "conv"] 462.685 μs (5%)
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "data"] 220.194 μs (5%) 1.67 KiB (1%) 16
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "filter"] 491.058 μs (5%) 209.67 KiB (1%) 20
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "conv"] 751.787 μs (5%) 1.10 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "data"] 940.272 μs (5%) 1.10 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "filter"] 733.376 μs (5%) 1.10 MiB (1%) 2
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "conv"] 1.046 ms (5%) 2.19 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "data"] 1.038 ms (5%) 2.19 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "filter"] 1.046 ms (5%) 2.19 MiB (1%) 2
["dropout", "4-N(100)", "dropout!", "with-colon"] 3.483 μs (5%)
["dropout", "4-N(100)", "dropout!", "with-dim"] 2.158 μs (5%) 544 bytes (1%) 2
["dropout", "4-N(100)", "dropout", "with-colon"] 5.879 μs (5%) 39.17 KiB (1%) 3
["dropout", "4-N(100)", "dropout", "with-dim"] 4.299 μs (5%) 39.70 KiB (1%) 5
["dropout", "4-N(1000)", "dropout!", "with-colon"] 472.306 μs (5%)
["dropout", "4-N(1000)", "dropout!", "with-dim"] 313.702 μs (5%) 4.11 KiB (1%) 2
["dropout", "4-N(1000)", "dropout", "with-colon"] 868.508 μs (5%) 3.81 MiB (1%) 3
["dropout", "4-N(1000)", "dropout", "with-dim"] 478.407 μs (5%) 3.82 MiB (1%) 5
["dropout", "4-N(10000)", "dropout!", "with-colon"] 108.722 ms (5%)
["dropout", "4-N(10000)", "dropout!", "with-dim"] 65.728 ms (5%) 39.17 KiB (1%) 3
["dropout", "4-N(10000)", "dropout", "with-colon"] 236.040 ms (5%) 381.47 MiB (1%) 3
["dropout", "4-N(10000)", "dropout", "with-dim"] 186.037 ms (5%) 381.51 MiB (1%) 6
["pooling", "4-N(20)-K(2)-stride(1)", "lpnormpool2d-direct", "data"] 13.310 μs (5%) 864 bytes (1%) 14
["pooling", "4-N(20)-K(2)-stride(1)", "lpnormpool2d-direct", "pool"] 8.565 μs (5%) 752 bytes (1%) 13
["pooling", "4-N(20)-K(2)-stride(1)", "maxpool2d-direct", "data"] 2.989 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(1)", "maxpool2d-direct", "pool"] 2.244 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(1)", "meanpool2d-direct", "data"] 1.526 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(1)", "meanpool2d-direct", "pool"] 2.151 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(2)", "lpnormpool2d-direct", "data"] 4.507 μs (5%) 864 bytes (1%) 14
["pooling", "4-N(20)-K(2)-stride(2)", "lpnormpool2d-direct", "pool"] 3.826 μs (5%) 752 bytes (1%) 13
["pooling", "4-N(20)-K(2)-stride(2)", "maxpool2d-direct", "data"] 1.633 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(2)", "maxpool2d-direct", "pool"] 2.030 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(2)", "meanpool2d-direct", "data"] 1.078 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(2)", "meanpool2d-direct", "pool"] 1.982 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(4)", "lpnormpool2d-direct", "data"] 1.947 μs (5%) 864 bytes (1%) 14
["pooling", "4-N(20)-K(2)-stride(4)", "lpnormpool2d-direct", "pool"] 2.522 μs (5%) 752 bytes (1%) 13
["pooling", "4-N(20)-K(2)-stride(4)", "maxpool2d-direct", "data"] 1.155 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(4)", "maxpool2d-direct", "pool"] 2.083 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(4)", "meanpool2d-direct", "data"] 938.000 ns (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(4)", "meanpool2d-direct", "pool"] 2.074 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(1)", "lpnormpool2d-direct", "data"] 44.750 μs (5%) 864 bytes (1%) 14
["pooling", "4-N(20)-K(4)-stride(1)", "lpnormpool2d-direct", "pool"] 15.832 μs (5%) 752 bytes (1%) 13
["pooling", "4-N(20)-K(4)-stride(1)", "maxpool2d-direct", "data"] 2.795 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(1)", "maxpool2d-direct", "pool"] 2.395 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(1)", "meanpool2d-direct", "data"] 2.946 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(1)", "meanpool2d-direct", "pool"] 2.311 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(2)", "lpnormpool2d-direct", "data"] 13.405 μs (5%) 864 bytes (1%) 14
["pooling", "4-N(20)-K(4)-stride(2)", "lpnormpool2d-direct", "pool"] 5.926 μs (5%) 752 bytes (1%) 13
["pooling", "4-N(20)-K(4)-stride(2)", "maxpool2d-direct", "data"] 1.478 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(2)", "maxpool2d-direct", "pool"] 2.173 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(2)", "meanpool2d-direct", "data"] 1.486 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(2)", "meanpool2d-direct", "pool"] 2.180 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(4)", "lpnormpool2d-direct", "data"] 4.842 μs (5%) 864 bytes (1%) 14
["pooling", "4-N(20)-K(4)-stride(4)", "lpnormpool2d-direct", "pool"] 3.235 μs (5%) 752 bytes (1%) 13
["pooling", "4-N(20)-K(4)-stride(4)", "maxpool2d-direct", "data"] 1.120 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(4)", "maxpool2d-direct", "pool"] 2.040 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(4)", "meanpool2d-direct", "data"] 1.076 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(4)", "meanpool2d-direct", "pool"] 2.042 μs (5%) 704 bytes (1%) 11
["softmax", "logsoftmax", "Float16", "bw", (1024, 2048, 4)] 149.804 ms (5%) 16.02 MiB (1%) 3
["softmax", "logsoftmax", "Float16", "bw", (12288, 2048, 1)] 451.420 ms (5%) 48.00 MiB (1%) 3
["softmax", "logsoftmax", "Float16", "bw", (128, 384, 8)] 6.803 ms (5%) 774.19 KiB (1%) 3
["softmax", "logsoftmax", "Float16", "bw", (2048, 2048, 2)] 149.623 ms (5%) 16.01 MiB (1%) 3
["softmax", "logsoftmax", "Float16", "bw", (4096, 2048, 2)] 301.128 ms (5%) 32.01 MiB (1%) 3
["softmax", "logsoftmax", "Float16", "bw", (4096, 4096, 2)] 612.687 ms (5%) 64.02 MiB (1%) 3
["softmax", "logsoftmax", "Float16", "bw", (512, 784, 8)] 57.041 ms (5%) 6.14 MiB (1%) 3
["softmax", "logsoftmax", "Float16", "bw", (768, 1024, 4)] 55.906 ms (5%) 6.01 MiB (1%) 3
["softmax", "logsoftmax", "Float16", "fw", (1024, 2048, 4)] 115.466 ms (5%) 48.38 KiB (1%) 3
["softmax", "logsoftmax", "Float16", "fw", (12288, 2048, 1)] 350.097 ms (5%) 12.38 KiB (1%) 3
["softmax", "logsoftmax", "Float16", "fw", (128, 384, 8)] 5.452 ms (5%) 18.38 KiB (1%) 3
["softmax", "logsoftmax", "Float16", "fw", (2048, 2048, 2)] 115.638 ms (5%) 24.38 KiB (1%) 3
["softmax", "logsoftmax", "Float16", "fw", (4096, 2048, 2)] 232.341 ms (5%) 24.38 KiB (1%) 3
["softmax", "logsoftmax", "Float16", "fw", (4096, 4096, 2)] 465.100 ms (5%) 48.38 KiB (1%) 3
["softmax", "logsoftmax", "Float16", "fw", (512, 784, 8)] 44.525 ms (5%) 37.12 KiB (1%) 3
["softmax", "logsoftmax", "Float16", "fw", (768, 1024, 4)] 43.190 ms (5%) 24.38 KiB (1%) 3
["softmax", "logsoftmax", "Float32", "bw", (1024, 2048, 4)] 68.254 ms (5%) 32.03 MiB (1%) 4
["softmax", "logsoftmax", "Float32", "bw", (12288, 2048, 1)] 207.218 ms (5%) 96.01 MiB (1%) 3
["softmax", "logsoftmax", "Float32", "bw", (128, 384, 8)] 3.213 ms (5%) 1.51 MiB (1%) 3
["softmax", "logsoftmax", "Float32", "bw", (2048, 2048, 2)] 68.443 ms (5%) 32.02 MiB (1%) 3
["softmax", "logsoftmax", "Float32", "bw", (4096, 2048, 2)] 137.631 ms (5%) 64.02 MiB (1%) 3
["softmax", "logsoftmax", "Float32", "bw", (4096, 4096, 2)] 299.585 ms (5%) 128.03 MiB (1%) 4
["softmax", "logsoftmax", "Float32", "bw", (512, 784, 8)] 25.919 ms (5%) 12.27 MiB (1%) 4
["softmax", "logsoftmax", "Float32", "bw", (768, 1024, 4)] 25.260 ms (5%) 12.02 MiB (1%) 3
["softmax", "logsoftmax", "Float32", "fw", (1024, 2048, 4)] 66.851 ms (5%) 96.19 KiB (1%) 6
["softmax", "logsoftmax", "Float32", "fw", (12288, 2048, 1)] 202.546 ms (5%) 24.38 KiB (1%) 3
["softmax", "logsoftmax", "Float32", "fw", (128, 384, 8)] 3.026 ms (5%) 36.38 KiB (1%) 3
["softmax", "logsoftmax", "Float32", "fw", (2048, 2048, 2)] 66.729 ms (5%) 48.38 KiB (1%) 3
["softmax", "logsoftmax", "Float32", "fw", (4096, 2048, 2)] 134.902 ms (5%) 48.38 KiB (1%) 3
["softmax", "logsoftmax", "Float32", "fw", (4096, 4096, 2)] 271.534 ms (5%) 96.19 KiB (1%) 6
["softmax", "logsoftmax", "Float32", "fw", (512, 784, 8)] 24.784 ms (5%) 73.69 KiB (1%) 6
["softmax", "logsoftmax", "Float32", "fw", (768, 1024, 4)] 24.393 ms (5%) 48.38 KiB (1%) 3
["softmax", "softmax", "Float16", "bw", (1024, 2048, 4)] 93.314 ms (5%) 16.02 MiB (1%) 3
["softmax", "softmax", "Float16", "bw", (12288, 2048, 1)] 280.688 ms (5%) 48.00 MiB (1%) 3
["softmax", "softmax", "Float16", "bw", (128, 384, 8)] 4.278 ms (5%) 774.19 KiB (1%) 3
["softmax", "softmax", "Float16", "bw", (2048, 2048, 2)] 93.259 ms (5%) 16.01 MiB (1%) 3
["softmax", "softmax", "Float16", "bw", (4096, 2048, 2)] 187.035 ms (5%) 32.01 MiB (1%) 3
["softmax", "softmax", "Float16", "bw", (4096, 4096, 2)] 385.069 ms (5%) 64.02 MiB (1%) 3
["softmax", "softmax", "Float16", "bw", (512, 784, 8)] 35.529 ms (5%) 6.14 MiB (1%) 3
["softmax", "softmax", "Float16", "bw", (768, 1024, 4)] 34.862 ms (5%) 6.01 MiB (1%) 3
["softmax", "softmax", "Float16", "fw", (1024, 2048, 4)] 170.860 ms (5%) 16.27 KiB (1%) 6
["softmax", "softmax", "Float16", "fw", (12288, 2048, 1)] 470.978 ms (5%) 4.27 KiB (1%) 6
["softmax", "softmax", "Float16", "fw", (128, 384, 8)] 28.034 ms (5%) 6.27 KiB (1%) 6
["softmax", "softmax", "Float16", "fw", (2048, 2048, 2)] 170.528 ms (5%) 8.27 KiB (1%) 6
["softmax", "softmax", "Float16", "fw", (4096, 2048, 2)] 320.421 ms (5%) 8.27 KiB (1%) 6
["softmax", "softmax", "Float16", "fw", (4096, 4096, 2)] 621.200 ms (5%) 16.27 KiB (1%) 6
["softmax", "softmax", "Float16", "fw", (512, 784, 8)] 78.454 ms (5%) 12.52 KiB (1%) 6
["softmax", "softmax", "Float16", "fw", (768, 1024, 4)] 77.074 ms (5%) 8.27 KiB (1%) 6
["softmax", "softmax", "Float32", "bw", (1024, 2048, 4)] 15.876 ms (5%) 32.03 MiB (1%) 4
["softmax", "softmax", "Float32", "bw", (12288, 2048, 1)] 51.206 ms (5%) 96.01 MiB (1%) 3
["softmax", "softmax", "Float32", "bw", (128, 384, 8)] 717.810 μs (5%) 1.51 MiB (1%) 3
["softmax", "softmax", "Float32", "bw", (2048, 2048, 2)] 15.772 ms (5%) 32.02 MiB (1%) 3
["softmax", "softmax", "Float32", "bw", (4096, 2048, 2)] 32.833 ms (5%) 64.02 MiB (1%) 3
["softmax", "softmax", "Float32", "bw", (4096, 4096, 2)] 66.814 ms (5%) 128.03 MiB (1%) 4
["softmax", "softmax", "Float32", "bw", (512, 784, 8)] 5.750 ms (5%) 12.27 MiB (1%) 4
["softmax", "softmax", "Float32", "bw", (768, 1024, 4)] 5.561 ms (5%) 12.02 MiB (1%) 3
["softmax", "softmax", "Float32", "fw", (1024, 2048, 4)] 90.225 ms (5%) 32.20 KiB (1%) 7
["softmax", "softmax", "Float32", "fw", (12288, 2048, 1)] 230.344 ms (5%) 8.27 KiB (1%) 6
["softmax", "softmax", "Float32", "fw", (128, 384, 8)] 23.889 ms (5%) 12.27 KiB (1%) 6
["softmax", "softmax", "Float32", "fw", (2048, 2048, 2)] 90.576 ms (5%) 16.27 KiB (1%) 6
["softmax", "softmax", "Float32", "fw", (4096, 2048, 2)] 161.320 ms (5%) 16.27 KiB (1%) 6
["softmax", "softmax", "Float32", "fw", (4096, 4096, 2)] 300.730 ms (5%) 32.20 KiB (1%) 7
["softmax", "softmax", "Float32", "fw", (512, 784, 8)] 46.333 ms (5%) 24.70 KiB (1%) 7
["softmax", "softmax", "Float32", "fw", (768, 1024, 4)] 46.671 ms (5%) 16.27 KiB (1%) 6
["upsample", "linear", "Float16", "bw", "4-N(1024)-scale((0.5, 2))"] 50.362 ms (5%) 2.00 MiB (1%) 14
["upsample", "linear", "Float16", "bw", "4-N(128)-scale((1, 2))"] 786.756 μs (5%) 64.44 KiB (1%) 14
["upsample", "linear", "Float16", "bw", "4-N(128)-scale(2)"] 594.334 μs (5%) 128.53 KiB (1%) 17
["upsample", "linear", "Float16", "bw", "4-N(256)-scale(4)"] 2.576 ms (5%) 2.00 MiB (1%) 17
["upsample", "linear", "Float16", "bw", "4-N(256)-scale(8)"] 4.222 ms (5%) 8.00 MiB (1%) 17
["upsample", "linear", "Float16", "fw", "4-N(1024)-scale((0.5, 2))"] 41.729 ms (5%) 2.00 MiB (1%) 14
["upsample", "linear", "Float16", "fw", "4-N(128)-scale((1, 2))"] 1.245 ms (5%) 64.44 KiB (1%) 14
["upsample", "linear", "Float16", "fw", "4-N(128)-scale(2)"] 1.857 ms (5%) 128.44 KiB (1%) 14
["upsample", "linear", "Float16", "fw", "4-N(256)-scale(4)"] 30.224 ms (5%) 2.00 MiB (1%) 14
["upsample", "linear", "Float16", "fw", "4-N(256)-scale(8)"] 167.907 ms (5%) 8.00 MiB (1%) 14
["upsample", "linear", "Float32", "bw", "4-N(1024)-scale((0.5, 2))"] 9.314 ms (5%) 4.00 MiB (1%) 14
["upsample", "linear", "Float32", "bw", "4-N(128)-scale((1, 2))"] 157.282 μs (5%) 128.44 KiB (1%) 14
["upsample", "linear", "Float32", "bw", "4-N(128)-scale(2)"] 140.656 μs (5%) 256.53 KiB (1%) 17
["upsample", "linear", "Float32", "bw", "4-N(256)-scale(4)"] 979.353 μs (5%) 4.00 MiB (1%) 17
["upsample", "linear", "Float32", "bw", "4-N(256)-scale(8)"] 2.808 ms (5%) 16.00 MiB (1%) 17
["upsample", "linear", "Float32", "fw", "4-N(1024)-scale((0.5, 2))"] 8.213 ms (5%) 4.00 MiB (1%) 14
["upsample", "linear", "Float32", "fw", "4-N(128)-scale((1, 2))"] 254.542 μs (5%) 128.44 KiB (1%) 14
["upsample", "linear", "Float32", "fw", "4-N(128)-scale(2)"] 366.289 μs (5%) 256.44 KiB (1%) 14
["upsample", "linear", "Float32", "fw", "4-N(256)-scale(4)"] 5.816 ms (5%) 4.00 MiB (1%) 14
["upsample", "linear", "Float32", "fw", "4-N(256)-scale(8)"] 33.548 ms (5%) 16.00 MiB (1%) 14
["upsample", "nearest", "4-N(128)", "Float16"] 2.197 ms (5%) 6.25 MiB (1%) 14
["upsample", "nearest", "4-N(128)", "Float32"] 2.201 ms (5%) 6.25 MiB (1%) 14
["upsample", "nearest", "4-N(128)", "Float64"] 2.195 ms (5%) 6.25 MiB (1%) 14
["upsample", "nearest", "4-N(2048)", "Float16"] 1.175 s (5%) 1.56 GiB (1%) 14
["upsample", "nearest", "4-N(2048)", "Float32"] 1.171 s (5%) 1.56 GiB (1%) 14
["upsample", "nearest", "4-N(2048)", "Float64"] 1.170 s (5%) 1.56 GiB (1%) 14
["upsample", "nearest", "4-N(512)", "Float16"] 35.059 ms (5%) 100.00 MiB (1%) 14
["upsample", "nearest", "4-N(512)", "Float32"] 35.083 ms (5%) 100.00 MiB (1%) 14
["upsample", "nearest", "4-N(512)", "Float64"] 35.069 ms (5%) 100.00 MiB (1%) 14

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["activations", "Float16"]
  • ["activations", "Float32"]
  • ["activations", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64"]
  • ["dropout", "4-N(100)", "dropout!"]
  • ["dropout", "4-N(100)", "dropout"]
  • ["dropout", "4-N(1000)", "dropout!"]
  • ["dropout", "4-N(1000)", "dropout"]
  • ["dropout", "4-N(10000)", "dropout!"]
  • ["dropout", "4-N(10000)", "dropout"]
  • ["pooling", "4-N(20)-K(2)-stride(1)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(1)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(1)", "meanpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(2)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(2)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(2)", "meanpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(4)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(4)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(4)", "meanpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(1)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(1)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(1)", "meanpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(2)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(2)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(2)", "meanpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(4)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(4)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(4)", "meanpool2d-direct"]
  • ["softmax", "logsoftmax", "Float16", "bw"]
  • ["softmax", "logsoftmax", "Float16", "fw"]
  • ["softmax", "logsoftmax", "Float32", "bw"]
  • ["softmax", "logsoftmax", "Float32", "fw"]
  • ["softmax", "softmax", "Float16", "bw"]
  • ["softmax", "softmax", "Float16", "fw"]
  • ["softmax", "softmax", "Float32", "bw"]
  • ["softmax", "softmax", "Float32", "fw"]
  • ["upsample", "linear", "Float16", "bw"]
  • ["upsample", "linear", "Float16", "fw"]
  • ["upsample", "linear", "Float32", "bw"]
  • ["upsample", "linear", "Float32", "fw"]
  • ["upsample", "nearest", "4-N(128)"]
  • ["upsample", "nearest", "4-N(2048)"]
  • ["upsample", "nearest", "4-N(512)"]
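Each group in this list appears to be a results-table ID with its final key removed. For example, the row ["dropout", "4-N(100)", "dropout!", "with-colon"] belongs to the group ["dropout", "4-N(100)", "dropout!"]. The following is a minimal Python sketch of that relationship, not PkgBenchmark code: the helper names `group_of` and `unique_groups` are hypothetical, and the IDs are copied from the results table above.

```python
def group_of(benchmark_id):
    """Drop the final key of an ID to get the group that contains it."""
    return tuple(benchmark_id[:-1])


def unique_groups(benchmark_ids):
    """Collect distinct groups, keeping first-seen order."""
    seen = {}
    for benchmark_id in benchmark_ids:
        seen.setdefault(group_of(benchmark_id), None)
    return list(seen)


# A few IDs copied from the results table (illustrative subset only).
ids = [
    ["dropout", "4-N(100)", "dropout!", "with-colon"],
    ["dropout", "4-N(100)", "dropout!", "with-dim"],
    ["dropout", "4-N(100)", "dropout", "with-colon"],
    ["upsample", "nearest", "4-N(128)", "Float16"],
]

for group in unique_groups(ids):
    print(list(group))
```

The two `dropout!` rows collapse into a single group, which is why the group list is much shorter than the results table.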

Julia versioninfo

Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 22.04.2 LTS
  uname: Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz: 
                 speed         user         nice          sys         idle          irq
       #1-64  2100 MHz    4503994 s       6177 s    2688877 s  2661682743 s          0 s
  Memory: 125.51467895507812 GB (117663.8515625 MB free)
  Uptime: 4.17160902e6 sec
  Load Avg:  1.09  1.04  1.1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, cascadelake)
  Threads: 1 on 64 virtual cores

Baseline result

Job Properties

  • Time of benchmark: 6 Jul 2023 - 15:26
  • Package commit: dirty
  • Julia commit: 0434de
  • Julia command flags: None
  • Environment variables: JULIA_NUM_THREADS => 1

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the table below are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.
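
To make the two conventions above concrete, here is a minimal Python sketch. It is not part of PkgBenchmark or BenchmarkTools: the nested dictionary and the helpers `lookup` and `tolerance_interval` are hypothetical stand-ins, and it assumes the tolerance is applied symmetrically around the reported value. It follows an ID path through a nested mapping, then turns a reported value and its tolerance into a range, using the ["activations", "Float16", "celu"] row of this table.

```python
def lookup(suite, path):
    """Follow an ID path [parent_group, child_group, ..., key] through nested dicts."""
    node = suite
    for key in path:
        node = node[key]
    return node


def tolerance_interval(value, tolerance_percent):
    """Return the (low, high) range implied by a reported value and its noise tolerance."""
    delta = value * tolerance_percent / 100.0
    return value - delta, value + delta


# Hypothetical stand-in for the suite, holding one row of the table (time in ms).
suite = {
    "activations": {
        "Float16": {
            "celu": {"time_ms": 9.327, "tolerance_percent": 5},
        },
    },
}

entry = lookup(suite, ["activations", "Float16", "celu"])
low, high = tolerance_interval(entry["time_ms"], entry["tolerance_percent"])
print(f"{low:.3f} ms to {high:.3f} ms")
```

Under this reading, two runs whose ranges overlap would not count as a meaningful difference on their own.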

ID time GC time memory allocations
["activations", "Float16", "celu"] 9.327 ms (5%)
["activations", "Float16", "elu"] 9.426 ms (5%)
["activations", "Float16", "gelu"] 36.673 ms (5%)
["activations", "Float16", "hardswish"] 5.921 ms (5%)
["activations", "Float16", "hardtanh"] 1.237 ms (5%)
["activations", "Float16", "hardσ"] 4.154 ms (5%)
["activations", "Float16", "leakyrelu"] 1.386 ms (5%)
["activations", "Float16", "lisht"] 10.871 ms (5%)
["activations", "Float16", "logcosh"] 47.988 ms (5%)
["activations", "Float16", "logσ"] 46.892 ms (5%)
["activations", "Float16", "mish"] 61.520 ms (5%)
["activations", "Float16", "relu"] 836.730 μs (5%)
["activations", "Float16", "relu6"] 1.239 ms (5%)
["activations", "Float16", "rrelu"] 5.671 ms (5%)
["activations", "Float16", "selu"] 14.843 ms (5%)
["activations", "Float16", "sigmoid_fast"] 18.095 ms (5%)
["activations", "Float16", "softplus"] 42.265 ms (5%)
["activations", "Float16", "softshrink"] 9.651 ms (5%)
["activations", "Float16", "softsign"] 3.635 ms (5%)
["activations", "Float16", "swish"] 19.782 ms (5%)
["activations", "Float16", "tanh_fast"] 9.587 ms (5%)
["activations", "Float16", "tanhshrink"] 11.044 ms (5%)
["activations", "Float16", "trelu"] 996.659 μs (5%)
["activations", "Float16", "σ"] 18.040 ms (5%)
["activations", "Float32", "celu"] 6.258 ms (5%)
["activations", "Float32", "elu"] 5.933 ms (5%)
["activations", "Float32", "gelu"] 10.473 ms (5%)
["activations", "Float32", "hardswish"] 329.964 μs (5%)
["activations", "Float32", "hardtanh"] 329.572 μs (5%)
["activations", "Float32", "hardσ"] 331.613 μs (5%)
["activations", "Float32", "leakyrelu"] 328.684 μs (5%)
["activations", "Float32", "lisht"] 430.610 μs (5%)
["activations", "Float32", "logcosh"] 28.006 ms (5%)
["activations", "Float32", "logσ"] 26.983 ms (5%)
["activations", "Float32", "mish"] 42.001 ms (5%)
["activations", "Float32", "relu"] 328.649 μs (5%)
["activations", "Float32", "relu6"] 328.641 μs (5%)
["activations", "Float32", "rrelu"] 2.314 ms (5%)
["activations", "Float32", "selu"] 6.072 ms (5%)
["activations", "Float32", "sigmoid_fast"] 7.579 ms (5%)
["activations", "Float32", "softplus"] 26.246 ms (5%)
["activations", "Float32", "softshrink"] 331.573 μs (5%)
["activations", "Float32", "softsign"] 332.008 μs (5%)
["activations", "Float32", "swish"] 7.723 ms (5%)
["activations", "Float32", "tanh_fast"] 424.283 μs (5%)
["activations", "Float32", "tanhshrink"] 429.953 μs (5%)
["activations", "Float32", "trelu"] 329.393 μs (5%)
["activations", "Float32", "σ"] 6.283 ms (5%)
["activations", "Float64", "celu"] 6.923 ms (5%)
["activations", "Float64", "elu"] 5.953 ms (5%)
["activations", "Float64", "gelu"] 10.074 ms (5%)
["activations", "Float64", "hardswish"] 754.711 μs (5%)
["activations", "Float64", "hardtanh"] 672.240 μs (5%)
["activations", "Float64", "hardσ"] 730.461 μs (5%)
["activations", "Float64", "leakyrelu"] 669.466 μs (5%)
["activations", "Float64", "lisht"] 9.325 ms (5%)
["activations", "Float64", "logcosh"] 25.101 ms (5%)
["activations", "Float64", "logσ"] 24.819 ms (5%)
["activations", "Float64", "mish"] 41.257 ms (5%)
["activations", "Float64", "relu"] 670.030 μs (5%)
["activations", "Float64", "relu6"] 672.849 μs (5%)
["activations", "Float64", "rrelu"] 2.099 ms (5%)
["activations", "Float64", "selu"] 5.947 ms (5%)
["activations", "Float64", "sigmoid_fast"] 7.042 ms (5%)
["activations", "Float64", "softplus"] 24.896 ms (5%)
["activations", "Float64", "softshrink"] 678.229 μs (5%)
["activations", "Float64", "softsign"] 679.528 μs (5%)
["activations", "Float64", "swish"] 7.529 ms (5%)
["activations", "Float64", "tanh_fast"] 9.077 ms (5%)
["activations", "Float64", "tanhshrink"] 9.321 ms (5%)
["activations", "Float64", "trelu"] 672.056 μs (5%)
["activations", "Float64", "σ"] 7.019 ms (5%)
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "conv"] 2.118 μs (5%) 1.53 KiB (1%) 21
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "data"] 2.540 μs (5%) 1.81 KiB (1%) 25
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "filter"] 2.695 μs (5%) 2.12 KiB (1%) 27
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "conv"] 2.101 μs (5%) 1.55 KiB (1%) 21
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "data"] 2.514 μs (5%) 1.83 KiB (1%) 25
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "filter"] 2.666 μs (5%) 2.38 KiB (1%) 27
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "conv"] 2.286 μs (5%) 2.20 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] 2.170 μs (5%) 2.20 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "filter"] 1.108 μs (5%) 1.44 KiB (1%) 16
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "conv"] 2.227 μs (5%) 2.44 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "data"] 2.166 μs (5%) 2.42 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "filter"] 970.000 ns (5%) 1.66 KiB (1%) 16
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "conv"] 735.000 ns (5%) 1.12 KiB (1%) 15
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "data"] 2.771 μs (5%) 2.22 KiB (1%) 31
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "filter"] 3.316 μs (5%) 2.44 KiB (1%) 32
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "conv"] 714.000 ns (5%) 1.12 KiB (1%) 15
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "data"] 2.730 μs (5%) 2.23 KiB (1%) 31
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "filter"] 3.245 μs (5%) 2.69 KiB (1%) 32
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "conv"] 2.369 μs (5%) 2.33 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "data"] 2.193 μs (5%) 2.16 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "filter"] 1.073 μs (5%) 1.39 KiB (1%) 16
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "conv"] 2.336 μs (5%) 2.55 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "data"] 2.113 μs (5%) 2.38 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "filter"] 972.000 ns (5%) 1.61 KiB (1%) 16
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "conv"] 2.145 μs (5%) 1.53 KiB (1%) 21
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "data"] 2.547 μs (5%) 1.81 KiB (1%) 25
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "filter"] 2.630 μs (5%) 2.14 KiB (1%) 27
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "conv"] 2.144 μs (5%) 1.55 KiB (1%) 21
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "data"] 2.485 μs (5%) 1.83 KiB (1%) 25
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "filter"] 2.605 μs (5%) 2.41 KiB (1%) 27
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "conv"] 2.401 μs (5%) 2.27 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "data"] 2.214 μs (5%) 2.27 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "filter"] 1.246 μs (5%) 1.50 KiB (1%) 16
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "conv"] 2.347 μs (5%) 2.56 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "data"] 2.149 μs (5%) 2.55 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "filter"] 1.071 μs (5%) 1.78 KiB (1%) 16
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "conv"] 814.000 ns (5%) 1.12 KiB (1%) 15
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "data"] 2.723 μs (5%) 2.22 KiB (1%) 31
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "filter"] 3.308 μs (5%) 2.45 KiB (1%) 32
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "conv"] 820.000 ns (5%) 1.12 KiB (1%) 15
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "data"] 2.668 μs (5%) 2.23 KiB (1%) 31
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "filter"] 3.262 μs (5%) 2.72 KiB (1%) 32
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "conv"] 2.511 μs (5%) 2.39 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "data"] 2.253 μs (5%) 2.22 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "filter"] 1.174 μs (5%) 1.45 KiB (1%) 16
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "conv"] 2.436 μs (5%) 2.67 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "data"] 2.139 μs (5%) 2.50 KiB (1%) 22
["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "filter"] 1.088 μs (5%) 1.73 KiB (1%) 16
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "conv"] 4.399 μs (5%) 752 bytes (1%) 12
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "data"] 5.613 μs (5%) 1.05 KiB (1%) 16
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "filter"] 8.602 μs (5%) 5.86 KiB (1%) 18
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "conv"] 4.382 μs (5%) 768 bytes (1%) 12
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "data"] 5.612 μs (5%) 1.12 KiB (1%) 16
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "filter"] 8.989 μs (5%) 10.08 KiB (1%) 18
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "conv"] 6.393 μs (5%) 12.70 KiB (1%) 13
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] 14.772 μs (5%) 12.70 KiB (1%) 13
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "filter"] 4.546 μs (5%) 11.94 KiB (1%) 7
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "conv"] 7.627 μs (5%) 24.03 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "data"] 16.555 μs (5%) 24.02 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "filter"] 4.201 μs (5%) 23.25 KiB (1%) 8
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "conv"] 6.890 μs (5%) 384 bytes (1%) 6
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "data"] 7.505 μs (5%) 1.52 KiB (1%) 22
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "filter"] 14.433 μs (5%) 6.25 KiB (1%) 23
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "conv"] 6.088 μs (5%) 384 bytes (1%) 6
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "data"] 7.359 μs (5%) 1.62 KiB (1%) 22
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "filter"] 15.157 μs (5%) 10.53 KiB (1%) 23
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "conv"] 6.886 μs (5%) 12.88 KiB (1%) 13
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "data"] 15.544 μs (5%) 12.70 KiB (1%) 13
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "filter"] 4.155 μs (5%) 11.94 KiB (1%) 7
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "conv"] 7.551 μs (5%) 24.19 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "data"] 14.844 μs (5%) 24.02 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "filter"] 5.035 μs (5%) 23.25 KiB (1%) 8
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "conv"] 5.771 μs (5%) 752 bytes (1%) 12
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "data"] 5.138 μs (5%) 1.05 KiB (1%) 16
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "filter"] 8.114 μs (5%) 6.39 KiB (1%) 18
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "conv"] 5.755 μs (5%) 768 bytes (1%) 12
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "data"] 5.132 μs (5%) 1.12 KiB (1%) 16
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "filter"] 8.515 μs (5%) 11.33 KiB (1%) 18
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "conv"] 14.472 μs (5%) 18.27 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "data"] 15.265 μs (5%) 18.27 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "filter"] 11.636 μs (5%) 17.50 KiB (1%) 8
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "conv"] 17.189 μs (5%) 35.28 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "data"] 18.250 μs (5%) 35.27 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "filter"] 12.925 μs (5%) 34.50 KiB (1%) 8
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "conv"] 10.932 μs (5%) 384 bytes (1%) 6
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "data"] 6.734 μs (5%) 1.52 KiB (1%) 22
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "filter"] 16.581 μs (5%) 6.78 KiB (1%) 23
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "conv"] 9.792 μs (5%) 384 bytes (1%) 6
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "data"] 6.646 μs (5%) 1.62 KiB (1%) 22
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "filter"] 17.342 μs (5%) 11.78 KiB (1%) 23
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "conv"] 15.111 μs (5%) 18.44 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "data"] 16.495 μs (5%) 18.27 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "filter"] 11.451 μs (5%) 17.50 KiB (1%) 8
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "conv"] 17.766 μs (5%) 35.44 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "data"] 21.024 μs (5%) 35.27 KiB (1%) 14
["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "filter"] 12.701 μs (5%) 34.50 KiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "conv"] 141.817 μs (5%) 368 bytes (1%) 6
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "data"] 195.643 μs (5%) 848 bytes (1%) 10
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "filter"] 294.029 μs (5%) 86.05 KiB (1%) 15
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "conv"] 147.346 μs (5%) 384 bytes (1%) 6
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "data"] 174.076 μs (5%) 1.03 KiB (1%) 10
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "filter"] 294.993 μs (5%) 171.31 KiB (1%) 15
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "conv"] 136.618 μs (5%) 615.95 KiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] 680.774 μs (5%) 615.95 KiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "filter"] 134.725 μs (5%) 615.19 KiB (1%) 2
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "conv"] 227.749 μs (5%) 1.20 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "data"] 710.733 μs (5%) 1.20 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "filter"] 226.880 μs (5%) 1.20 MiB (1%) 2
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "conv"] 266.521 μs (5%)
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "data"] 257.510 μs (5%) 1.38 KiB (1%) 16
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "filter"] 394.923 μs (5%) 86.59 KiB (1%) 20
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "conv"] 226.163 μs (5%)
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "data"] 256.993 μs (5%) 1.67 KiB (1%) 16
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "filter"] 397.225 μs (5%) 172.05 KiB (1%) 20
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "conv"] 136.083 μs (5%) 616.12 KiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "data"] 690.466 μs (5%) 615.95 KiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "filter"] 134.071 μs (5%) 615.19 KiB (1%) 2
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "conv"] 229.966 μs (5%) 1.20 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "data"] 742.977 μs (5%) 1.20 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "filter"] 211.674 μs (5%) 1.20 MiB (1%) 2
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "conv"] 259.705 μs (5%) 368 bytes (1%) 6
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "data"] 195.488 μs (5%) 848 bytes (1%) 10
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "filter"] 300.643 μs (5%) 104.86 KiB (1%) 15
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "conv"] 233.938 μs (5%) 384 bytes (1%) 6
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "data"] 201.195 μs (5%) 1.03 KiB (1%) 10
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "filter"] 301.619 μs (5%) 208.94 KiB (1%) 15
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "conv"] 749.629 μs (5%) 1.10 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "data"] 900.470 μs (5%) 1.10 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "filter"] 745.621 μs (5%) 1.10 MiB (1%) 2
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "conv"] 1.035 ms (5%) 2.19 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "data"] 1.008 ms (5%) 2.19 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "filter"] 1.037 ms (5%) 2.19 MiB (1%) 2
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "conv"] 502.635 μs (5%)
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "data"] 220.892 μs (5%) 1.38 KiB (1%) 16
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "filter"] 482.268 μs (5%) 105.41 KiB (1%) 20
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "conv"] 450.889 μs (5%)
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "data"] 220.297 μs (5%) 1.67 KiB (1%) 16
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "filter"] 485.941 μs (5%) 209.67 KiB (1%) 20
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "conv"] 748.431 μs (5%) 1.10 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "data"] 926.264 μs (5%) 1.10 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "filter"] 739.664 μs (5%) 1.10 MiB (1%) 2
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "conv"] 1.042 ms (5%) 2.19 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "data"] 1.037 ms (5%) 2.19 MiB (1%) 8
["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "filter"] 1.024 ms (5%) 2.19 MiB (1%) 2
["dropout", "4-N(100)", "dropout!", "with-colon"] 3.496 μs (5%)
["dropout", "4-N(100)", "dropout!", "with-dim"] 2.178 μs (5%) 544 bytes (1%) 2
["dropout", "4-N(100)", "dropout", "with-colon"] 5.618 μs (5%) 39.17 KiB (1%) 3
["dropout", "4-N(100)", "dropout", "with-dim"] 3.916 μs (5%) 39.70 KiB (1%) 5
["dropout", "4-N(1000)", "dropout!", "with-colon"] 472.457 μs (5%)
["dropout", "4-N(1000)", "dropout!", "with-dim"] 312.454 μs (5%) 4.11 KiB (1%) 2
["dropout", "4-N(1000)", "dropout", "with-colon"] 841.101 μs (5%) 3.81 MiB (1%) 3
["dropout", "4-N(1000)", "dropout", "with-dim"] 479.710 μs (5%) 3.82 MiB (1%) 5
["dropout", "4-N(10000)", "dropout!", "with-colon"] 109.166 ms (5%)
["dropout", "4-N(10000)", "dropout!", "with-dim"] 65.611 ms (5%) 39.17 KiB (1%) 3
["dropout", "4-N(10000)", "dropout", "with-colon"] 236.470 ms (5%) 381.47 MiB (1%) 3
["dropout", "4-N(10000)", "dropout", "with-dim"] 186.145 ms (5%) 381.51 MiB (1%) 6
["pooling", "4-N(20)-K(2)-stride(1)", "lpnormpool2d-direct", "data"] 13.331 μs (5%) 864 bytes (1%) 14
["pooling", "4-N(20)-K(2)-stride(1)", "lpnormpool2d-direct", "pool"] 8.590 μs (5%) 752 bytes (1%) 13
["pooling", "4-N(20)-K(2)-stride(1)", "maxpool2d-direct", "data"] 2.998 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(1)", "maxpool2d-direct", "pool"] 2.259 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(1)", "meanpool2d-direct", "data"] 1.582 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(1)", "meanpool2d-direct", "pool"] 2.170 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(2)", "lpnormpool2d-direct", "data"] 4.510 μs (5%) 864 bytes (1%) 14
["pooling", "4-N(20)-K(2)-stride(2)", "lpnormpool2d-direct", "pool"] 3.819 μs (5%) 752 bytes (1%) 13
["pooling", "4-N(20)-K(2)-stride(2)", "maxpool2d-direct", "data"] 1.624 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(2)", "maxpool2d-direct", "pool"] 2.035 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(2)", "meanpool2d-direct", "data"] 1.088 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(2)", "meanpool2d-direct", "pool"] 1.961 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(4)", "lpnormpool2d-direct", "data"] 1.937 μs (5%) 864 bytes (1%) 14
["pooling", "4-N(20)-K(2)-stride(4)", "lpnormpool2d-direct", "pool"] 2.524 μs (5%) 752 bytes (1%) 13
["pooling", "4-N(20)-K(2)-stride(4)", "maxpool2d-direct", "data"] 1.128 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(4)", "maxpool2d-direct", "pool"] 2.094 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(4)", "meanpool2d-direct", "data"] 956.000 ns (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(2)-stride(4)", "meanpool2d-direct", "pool"] 2.072 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(1)", "lpnormpool2d-direct", "data"] 44.728 μs (5%) 864 bytes (1%) 14
["pooling", "4-N(20)-K(4)-stride(1)", "lpnormpool2d-direct", "pool"] 15.958 μs (5%) 752 bytes (1%) 13
["pooling", "4-N(20)-K(4)-stride(1)", "maxpool2d-direct", "data"] 2.788 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(1)", "maxpool2d-direct", "pool"] 2.401 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(1)", "meanpool2d-direct", "data"] 2.973 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(1)", "meanpool2d-direct", "pool"] 2.346 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(2)", "lpnormpool2d-direct", "data"] 13.379 μs (5%) 864 bytes (1%) 14
["pooling", "4-N(20)-K(4)-stride(2)", "lpnormpool2d-direct", "pool"] 5.888 μs (5%) 752 bytes (1%) 13
["pooling", "4-N(20)-K(4)-stride(2)", "maxpool2d-direct", "data"] 1.486 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(2)", "maxpool2d-direct", "pool"] 2.168 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(2)", "meanpool2d-direct", "data"] 1.527 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(2)", "meanpool2d-direct", "pool"] 2.152 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(4)", "lpnormpool2d-direct", "data"] 4.843 μs (5%) 864 bytes (1%) 14
["pooling", "4-N(20)-K(4)-stride(4)", "lpnormpool2d-direct", "pool"] 3.260 μs (5%) 752 bytes (1%) 13
["pooling", "4-N(20)-K(4)-stride(4)", "maxpool2d-direct", "data"] 1.127 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(4)", "maxpool2d-direct", "pool"] 2.075 μs (5%) 704 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(4)", "meanpool2d-direct", "data"] 1.086 μs (5%) 816 bytes (1%) 11
["pooling", "4-N(20)-K(4)-stride(4)", "meanpool2d-direct", "pool"] 2.083 μs (5%) 704 bytes (1%) 11
["softmax", "logsoftmax", "Float16", "bw", (1024, 2048, 4)] 150.733 ms (5%) 16.02 MiB (1%) 3
["softmax", "logsoftmax", "Float16", "bw", (12288, 2048, 1)] 452.639 ms (5%) 48.00 MiB (1%) 3
["softmax", "logsoftmax", "Float16", "bw", (128, 384, 8)] 6.918 ms (5%) 774.19 KiB (1%) 3
["softmax", "logsoftmax", "Float16", "bw", (2048, 2048, 2)] 150.788 ms (5%) 16.01 MiB (1%) 3
["softmax", "logsoftmax", "Float16", "bw", (4096, 2048, 2)] 301.777 ms (5%) 32.01 MiB (1%) 3
["softmax", "logsoftmax", "Float16", "bw", (4096, 4096, 2)] 615.310 ms (5%) 64.02 MiB (1%) 3
["softmax", "logsoftmax", "Float16", "bw", (512, 784, 8)] 57.429 ms (5%) 6.14 MiB (1%) 3
["softmax", "logsoftmax", "Float16", "bw", (768, 1024, 4)] 56.340 ms (5%) 6.01 MiB (1%) 3
["softmax", "logsoftmax", "Float16", "fw", (1024, 2048, 4)] 119.423 ms (5%) 48.38 KiB (1%) 3
["softmax", "logsoftmax", "Float16", "fw", (12288, 2048, 1)] 355.507 ms (5%) 12.38 KiB (1%) 3
["softmax", "logsoftmax", "Float16", "fw", (128, 384, 8)] 5.643 ms (5%) 18.38 KiB (1%) 3
["softmax", "logsoftmax", "Float16", "fw", (2048, 2048, 2)] 119.151 ms (5%) 24.38 KiB (1%) 3
["softmax", "logsoftmax", "Float16", "fw", (4096, 2048, 2)] 238.737 ms (5%) 24.38 KiB (1%) 3
["softmax", "logsoftmax", "Float16", "fw", (4096, 4096, 2)] 477.148 ms (5%) 48.38 KiB (1%) 3
["softmax", "logsoftmax", "Float16", "fw", (512, 784, 8)] 45.938 ms (5%) 37.12 KiB (1%) 3
["softmax", "logsoftmax", "Float16", "fw", (768, 1024, 4)] 44.848 ms (5%) 24.38 KiB (1%) 3
["softmax", "logsoftmax", "Float32", "bw", (1024, 2048, 4)] 68.423 ms (5%) 32.03 MiB (1%) 4
["softmax", "logsoftmax", "Float32", "bw", (12288, 2048, 1)] 206.334 ms (5%) 96.01 MiB (1%) 3
["softmax", "logsoftmax", "Float32", "bw", (128, 384, 8)] 3.202 ms (5%) 1.51 MiB (1%) 3
["softmax", "logsoftmax", "Float32", "bw", (2048, 2048, 2)] 68.453 ms (5%) 32.02 MiB (1%) 3
["softmax", "logsoftmax", "Float32", "bw", (4096, 2048, 2)] 137.585 ms (5%) 64.02 MiB (1%) 3
["softmax", "logsoftmax", "Float32", "bw", (4096, 4096, 2)] 297.599 ms (5%) 128.03 MiB (1%) 4
["softmax", "logsoftmax", "Float32", "bw", (512, 784, 8)] 25.899 ms (5%) 12.27 MiB (1%) 4
["softmax", "logsoftmax", "Float32", "bw", (768, 1024, 4)] 25.303 ms (5%) 12.02 MiB (1%) 3
["softmax", "logsoftmax", "Float32", "fw", (1024, 2048, 4)] 65.738 ms (5%) 96.19 KiB (1%) 6
["softmax", "logsoftmax", "Float32", "fw", (12288, 2048, 1)] 199.413 ms (5%) 24.38 KiB (1%) 3
["softmax", "logsoftmax", "Float32", "fw", (128, 384, 8)] 2.951 ms (5%) 36.38 KiB (1%) 3
["softmax", "logsoftmax", "Float32", "fw", (2048, 2048, 2)] 65.743 ms (5%) 48.38 KiB (1%) 3
["softmax", "logsoftmax", "Float32", "fw", (4096, 2048, 2)] 132.870 ms (5%) 48.38 KiB (1%) 3
["softmax", "logsoftmax", "Float32", "fw", (4096, 4096, 2)] 266.600 ms (5%) 96.19 KiB (1%) 6
["softmax", "logsoftmax", "Float32", "fw", (512, 784, 8)] 24.274 ms (5%) 73.69 KiB (1%) 6
["softmax", "logsoftmax", "Float32", "fw", (768, 1024, 4)] 23.857 ms (5%) 48.38 KiB (1%) 3
["softmax", "softmax", "Float16", "bw", (1024, 2048, 4)] 92.962 ms (5%) 16.02 MiB (1%) 3
["softmax", "softmax", "Float16", "bw", (12288, 2048, 1)] 280.139 ms (5%) 48.00 MiB (1%) 3
["softmax", "softmax", "Float16", "bw", (128, 384, 8)] 4.277 ms (5%) 774.19 KiB (1%) 3
["softmax", "softmax", "Float16", "bw", (2048, 2048, 2)] 92.975 ms (5%) 16.01 MiB (1%) 3
["softmax", "softmax", "Float16", "bw", (4096, 2048, 2)] 185.083 ms (5%) 32.01 MiB (1%) 3
["softmax", "softmax", "Float16", "bw", (4096, 4096, 2)] 384.664 ms (5%) 64.02 MiB (1%) 3
["softmax", "softmax", "Float16", "bw", (512, 784, 8)] 35.481 ms (5%) 6.14 MiB (1%) 3
["softmax", "softmax", "Float16", "bw", (768, 1024, 4)] 34.505 ms (5%) 6.01 MiB (1%) 3
["softmax", "softmax", "Float16", "fw", (1024, 2048, 4)] 148.689 ms (5%) 16.12 KiB (1%) 1
["softmax", "softmax", "Float16", "fw", (12288, 2048, 1)] 453.007 ms (5%) 4.12 KiB (1%) 1
["softmax", "softmax", "Float16", "fw", (128, 384, 8)] 7.041 ms (5%) 6.12 KiB (1%) 1
["softmax", "softmax", "Float16", "fw", (2048, 2048, 2)] 148.453 ms (5%) 8.12 KiB (1%) 1
["softmax", "softmax", "Float16", "fw", (4096, 2048, 2)] 297.647 ms (5%) 8.12 KiB (1%) 1
["softmax", "softmax", "Float16", "fw", (4096, 4096, 2)] 603.177 ms (5%) 16.12 KiB (1%) 1
["softmax", "softmax", "Float16", "fw", (512, 784, 8)] 56.662 ms (5%) 12.38 KiB (1%) 1
["softmax", "softmax", "Float16", "fw", (768, 1024, 4)] 55.487 ms (5%) 8.12 KiB (1%) 1
["softmax", "softmax", "Float32", "bw", (1024, 2048, 4)] 15.798 ms (5%) 32.03 MiB (1%) 4
["softmax", "softmax", "Float32", "bw", (12288, 2048, 1)] 50.235 ms (5%) 96.01 MiB (1%) 3
["softmax", "softmax", "Float32", "bw", (128, 384, 8)] 717.637 μs (5%) 1.51 MiB (1%) 3
["softmax", "softmax", "Float32", "bw", (2048, 2048, 2)] 15.763 ms (5%) 32.02 MiB (1%) 3
["softmax", "softmax", "Float32", "bw", (4096, 2048, 2)] 32.668 ms (5%) 64.02 MiB (1%) 3
["softmax", "softmax", "Float32", "bw", (4096, 4096, 2)] 66.739 ms (5%) 128.03 MiB (1%) 4
["softmax", "softmax", "Float32", "bw", (512, 784, 8)] 5.737 ms (5%) 12.27 MiB (1%) 4
["softmax", "softmax", "Float32", "bw", (768, 1024, 4)] 5.544 ms (5%) 12.02 MiB (1%) 3
["softmax", "softmax", "Float32", "fw", (1024, 2048, 4)] 66.629 ms (5%) 32.06 KiB (1%) 2
["softmax", "softmax", "Float32", "fw", (12288, 2048, 1)] 201.378 ms (5%) 8.12 KiB (1%) 1
["softmax", "softmax", "Float32", "fw", (128, 384, 8)] 2.973 ms (5%) 12.12 KiB (1%) 1
["softmax", "softmax", "Float32", "fw", (2048, 2048, 2)] 66.720 ms (5%) 16.12 KiB (1%) 1
["softmax", "softmax", "Float32", "fw", (4096, 2048, 2)] 134.612 ms (5%) 16.12 KiB (1%) 1
["softmax", "softmax", "Float32", "fw", (4096, 4096, 2)] 271.005 ms (5%) 32.06 KiB (1%) 2
["softmax", "softmax", "Float32", "fw", (512, 784, 8)] 24.804 ms (5%) 24.56 KiB (1%) 2
["softmax", "softmax", "Float32", "fw", (768, 1024, 4)] 24.475 ms (5%) 16.12 KiB (1%) 1
["upsample", "linear", "Float16", "bw", "4-N(1024)-scale((0.5, 2))"] 50.271 ms (5%) 2.00 MiB (1%) 14
["upsample", "linear", "Float16", "bw", "4-N(128)-scale((1, 2))"] 786.943 μs (5%) 64.44 KiB (1%) 14
["upsample", "linear", "Float16", "bw", "4-N(128)-scale(2)"] 594.380 μs (5%) 128.53 KiB (1%) 17
["upsample", "linear", "Float16", "bw", "4-N(256)-scale(4)"] 2.576 ms (5%) 2.00 MiB (1%) 17
["upsample", "linear", "Float16", "bw", "4-N(256)-scale(8)"] 4.217 ms (5%) 8.00 MiB (1%) 17
["upsample", "linear", "Float16", "fw", "4-N(1024)-scale((0.5, 2))"] 41.578 ms (5%) 2.00 MiB (1%) 14
["upsample", "linear", "Float16", "fw", "4-N(128)-scale((1, 2))"] 1.244 ms (5%) 64.44 KiB (1%) 14
["upsample", "linear", "Float16", "fw", "4-N(128)-scale(2)"] 1.850 ms (5%) 128.44 KiB (1%) 14
["upsample", "linear", "Float16", "fw", "4-N(256)-scale(4)"] 29.942 ms (5%) 2.00 MiB (1%) 14
["upsample", "linear", "Float16", "fw", "4-N(256)-scale(8)"] 167.649 ms (5%) 8.00 MiB (1%) 14
["upsample", "linear", "Float32", "bw", "4-N(1024)-scale((0.5, 2))"] 9.440 ms (5%) 4.00 MiB (1%) 14
["upsample", "linear", "Float32", "bw", "4-N(128)-scale((1, 2))"] 156.981 μs (5%) 128.44 KiB (1%) 14
["upsample", "linear", "Float32", "bw", "4-N(128)-scale(2)"] 140.815 μs (5%) 256.53 KiB (1%) 17
["upsample", "linear", "Float32", "bw", "4-N(256)-scale(4)"] 975.996 μs (5%) 4.00 MiB (1%) 17
["upsample", "linear", "Float32", "bw", "4-N(256)-scale(8)"] 2.791 ms (5%) 16.00 MiB (1%) 17
["upsample", "linear", "Float32", "fw", "4-N(1024)-scale((0.5, 2))"] 8.157 ms (5%) 4.00 MiB (1%) 14
["upsample", "linear", "Float32", "fw", "4-N(128)-scale((1, 2))"] 254.513 μs (5%) 128.44 KiB (1%) 14
["upsample", "linear", "Float32", "fw", "4-N(128)-scale(2)"] 366.208 μs (5%) 256.44 KiB (1%) 14
["upsample", "linear", "Float32", "fw", "4-N(256)-scale(4)"] 5.878 ms (5%) 4.00 MiB (1%) 14
["upsample", "linear", "Float32", "fw", "4-N(256)-scale(8)"] 33.455 ms (5%) 16.00 MiB (1%) 14
["upsample", "nearest", "4-N(128)", "Float16"] 2.283 ms (5%) 6.25 MiB (1%) 14
["upsample", "nearest", "4-N(128)", "Float32"] 2.294 ms (5%) 6.25 MiB (1%) 14
["upsample", "nearest", "4-N(128)", "Float64"] 2.286 ms (5%) 6.25 MiB (1%) 14
["upsample", "nearest", "4-N(2048)", "Float16"] 1.208 s (5%) 1.56 GiB (1%) 14
["upsample", "nearest", "4-N(2048)", "Float32"] 1.212 s (5%) 1.56 GiB (1%) 14
["upsample", "nearest", "4-N(2048)", "Float64"] 1.209 s (5%) 1.56 GiB (1%) 14
["upsample", "nearest", "4-N(512)", "Float16"] 36.887 ms (5%) 100.00 MiB (1%) 14
["upsample", "nearest", "4-N(512)", "Float32"] 36.927 ms (5%) 100.00 MiB (1%) 14
["upsample", "nearest", "4-N(512)", "Float64"] 36.955 ms (5%) 100.00 MiB (1%) 14
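
Each ID in the table above reads as a path through nested benchmark groups, ending in a leaf key such as the input size. As a rough sketch of how such a layout could be built with BenchmarkTools (this is not the PR's actual benchmark script; the helper `subgroup!`, the chosen sizes, and the restriction to forward passes are assumptions made for illustration):

```julia
using BenchmarkTools
using NNlib: softmax, logsoftmax

# Return the child group stored under `key`, creating it if it is missing.
# `subgroup!` is a hypothetical helper, not part of BenchmarkTools.
function subgroup!(group::BenchmarkGroup, key)
    haskey(group, key) || (group[key] = BenchmarkGroup())
    return group[key]
end

const SUITE = BenchmarkGroup()

for (name, f) in (("softmax", softmax), ("logsoftmax", logsoftmax)),
    T in (Float16, Float32),
    sz in ((128, 384, 8), (768, 1024, 4))

    x = randn(T, sz)
    # Path: ["softmax", <function name>, <element type>, "fw"]
    leaf = subgroup!(subgroup!(subgroup!(subgroup!(SUITE, "softmax"), name), string(T)), "fw")
    # The size tuple is the last key of the ID, as in the table above.
    leaf[sz] = @benchmarkable $f($x)
end
```

With a suite like this, `tune!(SUITE)` followed by `run(SUITE)` would produce one timing per leaf, which is the granularity the result rows report.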

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["activations", "Float16"]
  • ["activations", "Float32"]
  • ["activations", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "3-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "4-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32"]
  • ["conv", "5-N(20)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64"]
  • ["dropout", "4-N(100)", "dropout!"]
  • ["dropout", "4-N(100)", "dropout"]
  • ["dropout", "4-N(1000)", "dropout!"]
  • ["dropout", "4-N(1000)", "dropout"]
  • ["dropout", "4-N(10000)", "dropout!"]
  • ["dropout", "4-N(10000)", "dropout"]
  • ["pooling", "4-N(20)-K(2)-stride(1)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(1)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(1)", "meanpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(2)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(2)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(2)", "meanpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(4)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(4)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(2)-stride(4)", "meanpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(1)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(1)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(1)", "meanpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(2)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(2)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(2)", "meanpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(4)", "lpnormpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(4)", "maxpool2d-direct"]
  • ["pooling", "4-N(20)-K(4)-stride(4)", "meanpool2d-direct"]
  • ["softmax", "logsoftmax", "Float16", "bw"]
  • ["softmax", "logsoftmax", "Float16", "fw"]
  • ["softmax", "logsoftmax", "Float32", "bw"]
  • ["softmax", "logsoftmax", "Float32", "fw"]
  • ["softmax", "softmax", "Float16", "bw"]
  • ["softmax", "softmax", "Float16", "fw"]
  • ["softmax", "softmax", "Float32", "bw"]
  • ["softmax", "softmax", "Float32", "fw"]
  • ["upsample", "linear", "Float16", "bw"]
  • ["upsample", "linear", "Float16", "fw"]
  • ["upsample", "linear", "Float32", "bw"]
  • ["upsample", "linear", "Float32", "fw"]
  • ["upsample", "nearest", "4-N(128)"]
  • ["upsample", "nearest", "4-N(2048)"]
  • ["upsample", "nearest", "4-N(512)"]

Julia versioninfo

Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 22.04.2 LTS
  uname: Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz: 
                 speed         user         nice          sys         idle          irq
       #1-64  2100 MHz    4488711 s       6177 s    2682682 s  2660753665 s          0 s
  Memory: 125.51467895507812 GB (117928.24609375 MB free)
  Uptime: 4.17012335e6 sec
  Load Avg:  1.06  1.08  1.27
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, cascadelake)
  Threads: 1 on 64 virtual cores

Runtime information

Runtime Info
BLAS #threads     32
BLAS.vendor()     openblas64
Sys.CPU_THREADS   64
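
To compare results against another machine, the version and runtime details reported above can be collected with the standard library alone. This is a minimal sketch and not necessarily how the CI job gathers them; `BLAS.get_config()` is used here as the newer counterpart of the `BLAS.vendor()` call shown in the report.

```julia
using LinearAlgebra      # BLAS thread and backend queries
using InteractiveUtils   # provides versioninfo outside the REPL

# Prints the Julia version, platform, CPU and memory summary.
versioninfo(verbose = true)

println("BLAS #threads   ", BLAS.get_num_threads())
println("BLAS config     ", BLAS.get_config())
println("Sys.CPU_THREADS ", Sys.CPU_THREADS)
```

Note that the report shows Julia running with 1 thread while BLAS uses 32, so BLAS-backed benchmarks and pure-Julia ones are not operating at the same level of parallelism.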

lscpu output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   46 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          64
On-line CPU(s) list:             0-63
Vendor ID:                       GenuineIntel
Model name:                      Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
CPU family:                      6
Model:                           85
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       2
Stepping:                        7
CPU max MHz:                     3200.0000
CPU min MHz:                     800.0000
BogoMIPS:                        4200.00
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
Virtualization:                  VT-x
L1d cache:                       1 MiB (32 instances)
L1i cache:                       1 MiB (32 instances)
L2 cache:                        32 MiB (32 instances)
L3 cache:                        44 MiB (2 instances)
NUMA node(s):                    2
NUMA node0 CPU(s):               0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62
NUMA node1 CPU(s):               1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63
Vulnerability Itlb multihit:     KVM: Mitigation: VMX disabled
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed:          Mitigation; Enhanced IBRS
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; TSX disabled

Cpu Property        Value
Brand               Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
Vendor              :Intel
Architecture        :Skylake
Model               Family: 0x06, Model: 0x55, Stepping: 0x07, Type: 0x00
Cores               16 physical cores, 32 logical cores (on executing CPU)
                    Hyperthreading hardware capability detected
Clock Frequencies   2100 / 3200 MHz (base/max), 100 MHz bus
Data Cache          Level 1:3 : (32, 1024, 22528) kbytes
                    64 byte cache line size
Address Size        48 bits virtual, 46 bits physical
SIMD                512 bit = 64 byte max. SIMD vector size
Time Stamp Counter  TSC is accessible via rdtsc
                    TSC runs at constant rate (invariant from clock frequency)
Perf. Monitoring    Performance Monitoring Counters (PMC) revision 4
                    Available hardware counters per logical core:
                    3 fixed-function counters of 48 bit width
                    4 general-purpose counters of 48 bit width
Hypervisor          No

@skyleaworlder skyleaworlder merged commit 0fb7ee9 into FluxML:main Jul 7, 2023
2 of 3 checks passed