This is the low-level Go Assembly part of the go-cv wrapper around Simd.
The Simd Library is a highly optimized image processing library. It provides many useful high performance algorithms for image processing such as:
- pixel format conversion
- image scaling and filtration
- extraction of statistic information from images
- motion detection
- object detection (HAAR and LBP classifier cascades)
- classification
- neural network
The algorithms are optimized using different SIMD CPU extensions. In particular the library supports following CPU extensions:
- x86/x64: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and AVX2
- ARM: NEON
This wrapper depends on c2goasm for embedding the assembly from the individual functions into Go.
See the sse2.sh script for more information about how to invoke.
A comparison against go-opencv shows the following results:
OpenCV SSE2
benchmark old ns/op new ns/op delta
BenchmarkGaussian-8 74338 18481 -75.14%
BenchmarkGaussianRGB-8 186024 57169 -69.27%
BenchmarkBlur-8 110155 16623 -84.91%
BenchmarkBlurRGB-8 293017 53716 -81.67%
BenchmarkMedian3x3-8 129268 23270 -82.00%
BenchmarkMedian3x3RGB-8 169857 65896 -61.21%
BenchmarkMedian5x5-8 883311 131812 -85.08%
BenchmarkMedian5x5RGB-8 1246845 388415 -68.85%
Below are the performance figures for SSE2
BenchmarkFilterRhomb3x3-8 200000 11114 ns/op
BenchmarkFilterRhomb3x3RGB-8 50000 30721 ns/op
BenchmarkFilterRhomb5x5-8 20000 64515 ns/op
BenchmarkFilterRhomb5x5RGB-8 10000 207618 ns/op
BenchmarkFilterSquare3x3-8 50000 24205 ns/op
BenchmarkFilterSquare3x3RGB-8 20000 89507 ns/op
BenchmarkFilterSquare5x5-8 10000 133944 ns/op
BenchmarkFilterSquare5x5RGB-8 5000 373998 ns/op
BenchmarkGaussianBlur-8 100000 18290 ns/op
BenchmarkBgraToGray-8 100000 21582 ns/op
BenchmarkMeanFilter-8 100000 16248 ns/op
BenchmarkMeanFilterRGB-8 20000 65692 ns/op
BenchmarkFillBgr-8 300000 5859 ns/op
BenchmarkFillBgra-8 200000 7691 ns/op
BenchmarkLaplace-8 50000 23146 ns/op
BenchmarkInt16ToGray-8 100000 23108 ns/op
BenchmarkGrayToBgra-8 200000 9651 ns/op
BenchmarkDeinterleaveUv-8 200000 11442 ns/op
BenchmarkBinarization-8 300000 3910 ns/op
BenchmarkAveragingBinarization-8 20000 71418 ns/op
BenchmarkBgraToYuv420p-8 30000 46726 ns/op
BenchmarkBgraToYuv422p-8 30000 55859 ns/op
BenchmarkBgraToYuv444p-8 20000 61443 ns/op
BenchmarkAlphaBlending-8 30000 57316 ns/op
BenchmarkAbsGradientSaturatedSum-8 200000 7354 ns/op
BenchmarkSquaredDifferenceSum-8 200000 6192 ns/op
BenchmarkSquaredDifferenceSumMasked-8 200000 8200 ns/op
BenchmarkAbsDifferenceSum-8 500000 3169 ns/op
BenchmarkAbsDifferenceSumMasked-8 300000 4378 ns/op
BenchmarkAbsDifferenceSums3x3-8 100000 12497 ns/op
BenchmarkAbsDifferenceSums3x3Masked-8 100000 18695 ns/op
BenchmarkAddFeatureDifference-8 200000 9140 ns/op
BenchmarkBackgroundInitMask-8 500000 3883 ns/op
BenchmarkBackgroundGrowRangeSlow-8 200000 7023 ns/op
BenchmarkBackgroundGrowRangeFast-8 300000 5264 ns/op
BenchmarkBackgroundIncrementCount-8 200000 10248 ns/op
BenchmarkBackgroundAdjustRange-8 100000 11832 ns/op
BenchmarkBackgroundAdjustRangeMasked-8 100000 14080 ns/op
BenchmarkBackgroundShiftRange-8 200000 6310 ns/op
BenchmarkBackgroundShiftRangeMasked-8 200000 8252 ns/op
BenchmarkHistogramAbsSecondDerivative-8 30000 46647 ns/op
BenchmarkHistogramMasked-8 30000 48566 ns/op
BenchmarkHistogramConditional-8 30000 48975 ns/op
BenchmarkSegmentationFillSingleHoles-8 50000 26726 ns/op
BenchmarkSegmentationChangeIndex-8 500000 3430 ns/op
BenchmarkSegmentationPropagate2x2-8 50000 38551 ns/op
BenchmarkSobelDx-8 100000 17599 ns/op
BenchmarkSobelDy-8 100000 17063 ns/op
BenchmarkConditionalCount8u-8 500000 3159 ns/op
BenchmarkConditionalCount16i-8 200000 6549 ns/op
BenchmarkConditionalSum-8 500000 3284 ns/op
BenchmarkConditionalSquareSum-8 300000 5789 ns/op
BenchmarkConditionalSquareGradientSum-8 100000 15947 ns/op
BenchmarkConditionalFill-8 300000 4195 ns/op
BenchmarkOperationBinary8u-8 500000 4055 ns/op
BenchmarkOperationBinary16i-8 200000 11250 ns/op
BenchmarkReduceGray2x2-8 100000 15383 ns/op
BenchmarkReduceGray3x3-8 20000 71412 ns/op
BenchmarkReduceGray4x4-8 10000 134250 ns/op
BenchmarkReorder16bit-8 500000 3125 ns/op
BenchmarkReorder32bit-8 300000 5176 ns/op
BenchmarkReorder64bit-8 300000 5134 ns/op
BenchmarkStretchGray2x2-8 500000 3548 ns/op
BenchmarkYuv420pToBgra-8 30000 50043 ns/op
BenchmarkYuv422pToBgra-8 30000 54433 ns/op
BenchmarkYuv444pToBgra-8 30000 54661 ns/op
BenchmarkTextureGetDifferenceSum-8 300000 5642 ns/op
BenchmarkTextureBoostedUv-8 300000 5139 ns/op
BenchmarkTextureBoostedSaturatedGradient-8 100000 22974 ns/op
BenchmarkTexturePerformCompensation-8 500000 2801 ns/op
See the underlying package go-cv-simd for more information.
go-cv-simd is released under the Apache License v2.0. You can find the complete text in the file LICENSE.