Skip to content

Fast bitwise operations on byte-slices

License

Notifications You must be signed in to change notification settings

bwesterb/go-and

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

61 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

go-and

Go Reference

Fast bitwise and, or, xor, andn, popcount and memset for []byte slices.

import "github.com/bwesterb/go-and"

func main() {
    var a, b, dst []byte

    // ...

    // Sets dst to the bitwise and of a and b
    and.And(dst, a, b)
}

Makes use of AVX2 on AMD64 and NEON on ARM64.

Benchmarks

Created using ./benchmark.sh.

This shows three benchmarks:

  • naive is a simple for loop doing one byte at a time.
  • purego are our slightly optimized versions that work on uint64s instead of bytes.
  • asm are the AVX2 implementations and the reason to use this library.

Intel Core i9-13900

goos: linux
goarch: amd64
pkg: github.com/bwesterb/go-and
cpu: 13th Gen Intel(R) Core(TM) i9-13900
          │     naive     │                purego                │                 asm                 │
          │    sec/op     │    sec/op     vs base                │   sec/op     vs base                │
And-32       8162.0n ± 5%   2034.5n ± 1%  -75.07% (p=0.000 n=10)   518.4n ± 1%  -93.65% (p=0.000 n=10)
Or-32        9751.5n ± 8%   2104.5n ± 3%  -78.42% (p=0.000 n=10)   515.0n ± 1%  -94.72% (p=0.000 n=10)
Xor-32       8112.5n ± 3%   2029.0n ± 0%  -74.99% (p=0.000 n=10)   518.6n ± 1%  -93.61% (p=0.000 n=10)
AndNot-32   10685.5n ± 4%   2292.0n ± 2%  -78.55% (p=0.000 n=10)   517.8n ± 1%  -95.15% (p=0.000 n=10)
Memset-32    167.96µ ± 0%    57.54µ ± 1%  -65.74% (p=0.000 n=10)   15.65µ ± 1%  -90.68% (p=0.000 n=10)
Popcnt-32    132.15µ ± 1%    71.63µ ± 1%  -45.80% (p=0.000 n=10)   36.51µ ± 2%  -72.37% (p=0.000 n=10)
geomean       23.13µ         6.592µ       -71.50%                  1.857µ       -91.97%

          │    naive     │                 purego                 │                   asm                   │
          │     B/s      │      B/s       vs base                 │      B/s       vs base                  │
And-32      3.651Gi ± 5%   14.649Gi ± 1%  +301.20% (p=0.000 n=10)   57.488Gi ± 1%  +1474.44% (p=0.000 n=10)
Or-32       3.057Gi ± 8%   14.163Gi ± 3%  +363.37% (p=0.000 n=10)   57.872Gi ± 1%  +1793.33% (p=0.000 n=10)
Xor-32      3.674Gi ± 3%   14.690Gi ± 0%  +299.88% (p=0.000 n=10)   57.469Gi ± 1%  +1464.38% (p=0.000 n=10)
AndNot-32   2.789Gi ± 4%   13.003Gi ± 2%  +366.21% (p=0.000 n=10)   57.558Gi ± 1%  +1963.74% (p=0.000 n=10)
Memset-32   5.545Gi ± 0%   16.187Gi ± 1%  +191.91% (p=0.000 n=10)   59.504Gi ± 1%   +973.11% (p=0.000 n=10)
Popcnt-32   7.048Gi ± 1%   13.002Gi ± 1%   +84.48% (p=0.000 n=10)   25.507Gi ± 2%   +261.92% (p=0.000 n=10)
geomean     4.058Gi         14.24Gi       +250.89%                   50.56Gi       +1145.76%

Apple M2 Pro

goos: darwin
goarch: arm64
pkg: github.com/bwesterb/go-and
          │    naive     │                purego                │                 asm                 │
          │    sec/op    │    sec/op     vs base                │   sec/op     vs base                │
And-12      9891.5n ± 0%   2248.0n ± 1%  -77.27% (p=0.000 n=10)   511.2n ± 1%  -94.83% (p=0.000 n=10)
Or-12       9824.5n ± 1%   2267.5n ± 1%  -76.92% (p=0.000 n=10)   511.2n ± 1%  -94.80% (p=0.000 n=10)
Xor-12      9826.5n ± 0%   2232.0n ± 1%  -77.29% (p=0.000 n=10)   509.8n ± 1%  -94.81% (p=0.000 n=10)
AndNot-12   9928.5n ± 1%   2251.0n ± 1%  -77.33% (p=0.000 n=10)   581.7n ± 0%  -94.14% (p=0.000 n=10)
Memset-12   294.96µ ± 1%    79.18µ ± 2%  -73.15% (p=0.000 n=10)   10.83µ ± 7%  -96.33% (p=0.000 n=10)
Popcnt-12   288.90µ ± 1%    45.24µ ± 0%  -84.34% (p=0.000 n=10)   11.30µ ± 0%  -96.09% (p=0.000 n=10)
geomean      30.52µ         6.716µ       -77.99%                  1.455µ       -95.23%

          │    naive     │                 purego                 │                   asm                   │
          │     B/s      │      B/s       vs base                 │      B/s       vs base                  │
And-12      3.013Gi ± 0%   13.256Gi ± 1%  +339.97% (p=0.000 n=10)   58.297Gi ± 1%  +1834.87% (p=0.000 n=10)
Or-12       3.033Gi ± 1%   13.144Gi ± 1%  +333.30% (p=0.000 n=10)   58.303Gi ± 1%  +1821.98% (p=0.000 n=10)
Xor-12      3.033Gi ± 0%   13.353Gi ± 1%  +340.28% (p=0.000 n=10)   58.462Gi ± 1%  +1827.64% (p=0.000 n=10)
AndNot-12   3.002Gi ± 1%   13.240Gi ± 1%  +341.07% (p=0.000 n=10)   51.233Gi ± 0%  +1606.77% (p=0.000 n=10)
Memset-12   3.157Gi ± 1%   11.761Gi ± 2%  +272.50% (p=0.000 n=10)   86.041Gi ± 7%  +2625.05% (p=0.000 n=10)
Popcnt-12   3.224Gi ± 1%   20.585Gi ± 0%  +538.57% (p=0.000 n=10)   82.407Gi ± 0%  +2456.30% (p=0.000 n=10)
geomean     3.076Gi         13.98Gi       +354.43%                   64.53Gi       +1997.80%