Releases: ermig1979/Simd

Simd v5.2.124

03 Apr 15:11

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function BgraToYuv422pV2.
  • NEON optimizations of function BgraToYuv444pV2.
  • NEON optimizations of function BgraToYuv420pV2.
  • NEON optimizations of function Float32ToBFloat16.
  • NEON optimizations of function BFloat16ToFloat32.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function BgraToYuva420pV2.
  • Support of SynetUnaryOperation32fErf in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetUnaryOperation32f.
  • Base implementation, SSE4.1, AVX2 optimizations of function SynetGelu32f.
Improving
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetSoftmaxLayerForward.
Bug fixing
  • Error in method View::ToOcv.
Removing
  • Support of all formats besides NHWC and NCHW for function SynetAddBias.
  • Support of all formats besides NHWC and NCHW for function SynetLrnLayerCrossChannels.
  • Support of all formats besides NHWC and NCHW for function SynetPreluLayerForward.
  • Support of all formats besides NHWC and NCHW for function SynetScaleLayerForward.
  • Support of all formats besides NHWC and NCHW for function SynetFusedLayerForward0.
  • Support of all formats besides NHWC and NCHW for function SynetFusedLayerForward1.
  • Support of all formats besides NHWC and NCHW for function SynetFusedLayerForward2.
  • Support of all formats besides NHWC and NCHW for function SynetFusedLayerForward3.
  • Support of all formats besides NHWC and NCHW for function SynetFusedLayerForward4.
  • Support of all formats besides NHWC and NCHW for function SynetFusedLayerForward8.
  • Support of all formats besides NHWC and NCHW for function SynetFusedLayerForward9.
  • Function SynetReorderFilter.
  • Function SynetReorderImage.
  • Function SynetTensorAlignment.
  • Function SynetSpecifyTensorFormat.
  • Support of all formats besides NHWC and NCHW for enumeration SimdTensorFormatType.
Renaming
  • Function from SynetUnaryOperation32fLayerForward to SynetUnaryOperation32f.
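
The removals above leave NHWC and NCHW as the only tensor layouts. As a reminder of how the two differ, here is a minimal sketch of the element offset in each layout for a single image (batch index omitted). The `Shape` struct and both helper functions are hypothetical names used for illustration; they are not part of the Simd API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Simd): tensor shape without the batch dimension.
struct Shape
{
    size_t channels, height, width;
};

// NCHW: each channel is stored as a separate contiguous plane.
inline size_t OffsetNchw(const Shape& s, size_t c, size_t y, size_t x)
{
    assert(c < s.channels && y < s.height && x < s.width);
    return (c * s.height + y) * s.width + x;
}

// NHWC: all channels of one pixel are stored contiguously.
inline size_t OffsetNhwc(const Shape& s, size_t c, size_t y, size_t x)
{
    assert(c < s.channels && y < s.height && x < s.width);
    return (y * s.width + x) * s.channels + c;
}
```

For a 3-channel, 2x4 tensor, the element at channel 1, row 1, column 2 sits at offset 14 in NCHW and at offset 19 in NHWC.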

Test framework

New features
  • Test command line argument '-ts' to print test execution time statistics.
  • Tests for verifying functionality of function BgraToYuv422pV2.
  • Tests for verifying functionality of function BgraToYuva420pV2.
  • Improving header of performance report.
  • Tests for verifying functionality of function SynetGelu32f.
Bug fixing
  • Error in test SynetUnaryOperation32fLayerForward.
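
For readers unfamiliar with the activation behind SynetGelu32f, the following is a scalar reference sketch of the standard erf-based GELU definition, gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))). It assumes the standard definition and is not the library's implementation: the name `GeluRef` is hypothetical, and the optimized kernels may use a different formulation or an approximation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Scalar reference: gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))).
// Hypothetical name; not a Simd function.
inline float GeluRef(float x)
{
    return 0.5f * x * (1.0f + std::erf(x * 0.70710678f));
}

// Applies the scalar reference to every element of an array.
inline void GeluRef(const float* src, size_t size, float* dst)
{
    assert(src && dst);
    for (size_t i = 0; i < size; ++i)
        dst[i] = GeluRef(src[i]);
}
```

A reference like this is mainly useful as a baseline when comparing the output of a vectorized implementation within a small tolerance.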

Simd v5.2.123

09 Mar 11:31

Algorithms

Bug fixing
  • MSVS-2022 compiler errors in AmxBf16 project.
  • Clang compiler error in method Array::Release.
  • MSVS-2022 compiler warnings in file SimdSse41ResizerNearest.cpp.
  • MSVS-2022 compiler warnings in file SimdAvx512bwResizerNearest.cpp.
  • Error in SSE4.1, AVX, AVX2, AVX-512BW kernels of ConvolutionNhwcDirect_2 (fixed type).
  • MSVS-2022 compiler warnings in file SimdAvx2RecursiveBilateralFilter.cpp.
  • Error in file SimdFrame.hpp (function Simd::Convert).
  • Error in AVX2 optimizations of class RecursiveBilateralFilterFast (x86 only).
  • MSVS-2022 compiler error in file SimdInit.h (ARM64).
  • MSVS-2022 compiler errors in file SimdLog.h (ARM64).
  • MSVS-2022 compiler errors in file SimdConversion.h (ARM64).
  • MSVS-2022 compiler errors in file SimdNeonYuvToHue.cpp (ARM64).
  • MSVS-2022 compiler errors in file SimdNeonAbsDifferenceSum.cpp (ARM64).
  • MSVS-2022 compiler errors in file SimdNeonDetection.cpp (ARM64).
  • MSVS-2022 compiler errors in file SimdNeonHog.cpp (ARM64).
  • MSVS-2022 compiler errors in file SimdNeonLaplace.cpp (ARM64).
  • MSVS-2022 compiler errors in file SimdNeonMeanFilter3x3.cpp (ARM64).
  • MSVS-2022 compiler errors in file SimdNeonNeural.cpp (ARM64).
  • MSVS-2022 compiler warnings in file SimdNeonNeuralConvolution.cpp (ARM64).
  • MSVS-2022 compiler errors in file SimdNeonSobel.cpp (ARM64).
  • MSVS-2022 compiler errors in file SimdNeonSynetConversion.cpp (ARM64).
  • MSVS-2022 compiler errors in file SimdNeonSynetConvolution8i.cpp (ARM64).
  • Wrong assert in AVX-512BW optimization of function BgraToYuv420pV2.
  • MSVS-2017 compiler errors in AVX-512BW optimizations of function AlphaBlendingBgraToYuv420p.
  • MSVS-2017 compiler errors in AVX-512BW optimizations of WarpAffine engine.
  • MSVS-2015 compiler errors in file SimdFmadd.h (Win32).
  • MSVS-2015 compiler errors in SSE4.1 and AVX2 optimizations of WarpAffine engine (Win32).
  • Crash in AVX2 optimizations of function CosineDistance16f.
  • Crash in SSE4.1, AVX2, AVX-512BW, NEON optimizations of function HogLiteResizeFeatures.
  • Clang compiler warnings in file SimdBaseRecursiveBilateralFilter.cpp.
  • Clang compiler warnings in file SimdSse41RecursiveBilateralFilter.cpp.
  • Clang linker error in method Motion::Detector::GenerateSearchRegionScanlines.
  • Crashes in AVX-512BW optimizations of WarpAffine engine (MSVS-2022, Release).
  • Internal compiler error in file SimdAvx512bwSynetConvolution32f.cpp (MSVS-2017, Release).
  • Error in SSE4.1, AVX2 optimizations of function BgraToYuv444pV2 (MSVS-2015, Release, Win32).
  • Error in SSE4.1, AVX2 optimizations of function BgraToYuv420pV2 (MSVS-2015, Release, Win32).
  • Error in AVX2 optimizations of function Yuva444pToBgraV2 (MSVS-2015, Release, Win32).
  • Error in AVX2 optimizations of function Yuv444pToBgraV2 (MSVS-2015, Release, Win32).
  • Error in AVX2 optimizations of function Yuv420pToBgraV2 (MSVS-2015, Release, Win32).
  • Error in AVX2 optimizations of function AlphaBlendingBgraToYuv420p (MSVS-2015, Release, Win32).
  • Error in AVX2 optimizations of class ResizerByteArea2x2 (MSVS-2015, Release, Win32).
  • Error in AVX2 optimizations of function Uyvy422ToBgr (MSVS-2015, Release, Win32).

Test framework

New features
  • Handling of Windows exceptions in AutoTest.
Bug fixing
  • Error in test Nv12SaveAsJpegToMemoryAutoTest.
  • Error in test SynetAdd8iAutoTest.
  • Error in test SynetConvolution8iForwardAutoTest.
  • Error in test SynetScale8iForwardAutoTest.
  • Error in test WarpAffineAutoTest.
  • Error in test ResizeBilinearAutoTest.
  • Error in test SynetMergedConvolution8iForwardAutoTest.
  • Error in function MakeAutoTests (multithreaded environment).
  • Error in test Float32ToBFloat16AutoTest.
  • Error in test SynetConvert32fTo8uAutoTest (MSVS-2015 and MSVS-2017, Release, Win32).
  • Error in test SynetMergedConvolution32fForwardAutoTest (MSVS-2015 and MSVS-2017, Release, Win32).
  • Error in test CosineDistancesMxNp16fAutoTest.
  • Error in test VectorNormNp16fAutoTest.
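
As background for the Float32ToBFloat16 test fixed above: bfloat16 keeps the upper 16 bits of an IEEE 754 32-bit float. The sketch below shows one common scalar conversion with round-to-nearest-even. The `...Ref` names are hypothetical, and the rounding mode is an assumption; these notes do not say which rounding the library uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar reference (not a Simd function): float32 -> bfloat16.
inline uint16_t Float32ToBFloat16Ref(float value)
{
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    if ((bits & 0x7FFFFFFF) > 0x7F800000)    // NaN: set a mantissa bit so it stays NaN
        return uint16_t((bits >> 16) | 0x0040);
    bits += 0x7FFF + ((bits >> 16) & 1);     // round to nearest, ties to even
    return uint16_t(bits >> 16);
}

// Hypothetical scalar reference: bfloat16 -> float32 (exact, no rounding needed).
inline float BFloat16ToFloat32Ref(uint16_t value)
{
    uint32_t bits = uint32_t(value) << 16;
    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}

// Array form of the float32 -> bfloat16 reference conversion.
inline void Float32ToBFloat16Ref(const float* src, size_t size, uint16_t* dst)
{
    assert(src && dst);
    for (size_t i = 0; i < size; ++i)
        dst[i] = Float32ToBFloat16Ref(src[i]);
}
```

Values whose lower 16 mantissa bits are zero, such as 1.0f or -2.0f, round-trip exactly; other values lose precision in the low mantissa bits.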

Infrastructure

New features
  • Script BuildAll.cmd to build MSVS solution for all configurations and platforms.
  • GitHub Actions script for CMake (build and test for x86_x64, Linux).
  • GitHub Actions script for CMake (cross platform build for ARM, Linux).
  • GitHub Actions script for CMake (cross platform build for ARM64, Linux).
  • GitHub Actions script for CMake (cross platform build for PowerPC, Linux).
  • GitHub Actions script for CMake (build and test for clang, Linux).
  • GitHub Actions script for MSBuild (build and test for Visual Studio 2022, Windows).
  • Script GetThreadCount.cmd.
  • GitHub Actions script for MSBuild (build and test for Visual Studio 2019, Windows).
  • GitHub Actions script for MSBuild (build and test for Visual Studio 2017, Windows).
  • GitHub Actions script for MSBuild (build and test for Visual Studio 2015, Windows).
Renaming
  • Script TestVisualStudio.cmd to TestAll.cmd.

Documentation

Bug fixing
  • Wrong description of CMake parameters.

Simd v5.2.122

01 Feb 14:08

Algorithms

New features
  • New API of function Avx512bw::TileZero (AMX emulation).
  • New API of function Avx512bw::TileLoad (AMX emulation).
  • New API of function Avx512bw::TileStore (AMX emulation).
  • New API of function Avx512bw::TileMatMulBf16 (AMX emulation).
  • New API of function Avx512bw::TileMatMul8u8i (AMX emulation).
  • Function Avx512bw::TileMatMulFp16 (AMX emulation).
  • The mark of function SimdInterferenceIncrement as deprecated.
  • The mark of function SimdInterferenceIncrementMasked as deprecated.
  • The mark of function SimdInterferenceDecrement as deprecated.
  • The mark of function SimdInterferenceDecrementMasked as deprecated.
  • The mark of function SimdSynetReorderImage as deprecated.
  • The mark of function SimdSynetReorderFilter as deprecated.
  • SimdTensorData16f (16-bit floating point) tensor type.
  • The mark of function SimdSynetSpecifyTensorFormat as deprecated.
  • The mark of function SimdSynetTensorAlignment as deprecated.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of SynetPermute engine.
  • NEON optimizations of function Yuva444pToBgraV2.
  • NEON optimizations of function AlphaBlending2x.
  • SSE4.1, AVX2, AVX-512BW optimizations of function BgraToYuv444pV2.
  • SSE4.1, AVX2, AVX-512BW optimizations of function BgraToYuv420pV2.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function AlphaBlendingBgraToYuv420p.
  • Parameter 'copy' to View::Capture.
  • Method Array::Release.
Improving
  • NEON optimizations of function TransformImage.
Bug fixing
  • Clang compiler error in function Simd::WarpAffine.
  • MSVS-2022 compiler warnings in file SimdBaseRecursiveBilateralFilter.cpp.
  • MSVS-2022 compiler warnings in file SimdSse41RecursiveBilateralFilter.cpp.
  • MSVS-2015 compiler error in file SimdAvx2RecursiveBilateralFilter.cpp.
  • Error in method MergConvParam32f::Valid.
  • Crash in constructor of Simd::TileConf.
  • Crash in AVX and AVX2 optimizations of function SynetInnerProductLayerForward.
  • MSVS-2022 compiler error in file SimdAvx2RecursiveBilateralFilter.cpp (Win32 target).
  • GCC compiler error in file SimdParallel.hpp (for AVX2 optimizations).

Test framework

New features
  • Tests for verifying functionality of SynetPermute engine.
  • Tests for verifying functionality of function AlphaBlendingBgraToYuv420p.
Bug fixing
  • Crash in test GaussianBlurAutoTest.

Infrastructure

New features
  • Install target in CMake.
  • Uninstall target in CMake.
Renaming
  • Project Amx to AmxBf16.

Simd v5.2.121

03 Jan 05:44

Algorithms

New features
  • SIMD_DEPRECATED macro.
  • The mark of function SimdSvmSumLinear as deprecated.
  • SSE4.1, AVX2, AVX-512BW optimizations of function SynetNormalizeLayerForward.
  • Enumeration SimdWarpAffineFlags.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class WarpAffineNearest.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class WarpAffineBilinear.
  • Multi-threaded optimizations of class WarpAffineNearest.
  • Multi-threaded optimizations of class WarpAffineBilinear.
  • Function Simd::WarpAffine.
  • Function Simd::Mean.
  • Function Simd::OtsuThreshold.
  • Function Simd::RecursiveBilateralFilter.
  • The mark of function SimdEdgeBackgroundGrowRangeSlow as deprecated.
  • The mark of function SimdEdgeBackgroundGrowRangeFast as deprecated.
  • The mark of function SimdEdgeBackgroundIncrementCount as deprecated.
  • The mark of function SimdEdgeBackgroundAdjustRange as deprecated.
  • The mark of function SimdEdgeBackgroundAdjustRangeMasked as deprecated.
  • The mark of function SimdEdgeBackgroundShiftRange as deprecated.
  • The mark of function SimdEdgeBackgroundShiftRangeMasked as deprecated.
  • The mark of function Simd::EdgeBackgroundGrowRangeSlow as deprecated.
  • The mark of function Simd::EdgeBackgroundGrowRangeFast as deprecated.
  • The mark of function Simd::EdgeBackgroundIncrementCount as deprecated.
  • The mark of function Simd::EdgeBackgroundAdjustRange as deprecated.
  • The mark of function Simd::EdgeBackgroundAdjustRangeMasked as deprecated.
  • The mark of function Simd::EdgeBackgroundShiftRange as deprecated.
  • The mark of function Simd::EdgeBackgroundShiftRangeMasked as deprecated.
Bug fixing
  • Wrong assert in AVX-512BW optimizations of function BgrToRgb.
  • MSVS compiler bug (Windows, Arm64).
  • Error in function Simd::DrawLine.

Test framework

New features
  • Tests for verifying functionality of WarpAffine engine.
  • Special tests for verifying functionality of WarpAffine engine.

Infrastructure

New features
  • SIMD_OPENCV CMake option to test Simd with OpenCV support.

Documentation

Improving
  • Usage example in the description of function RecursiveBilateralFilterInit.

Simd v5.2.120

01 Dec 06:10

Algorithms

New features
  • AVX2 optimizations of class RecursiveBilateralFilterFast.
  • Base implementation of function SynetNormalizeLayerForward.
Bug fixing
  • Error in SSE4.1 optimizations of function SynetSetInput.
  • MSVS compiler warning in AMX optimizations of class SynetConvolution8iNhwcDirect.
  • MSVS compiler warning in AMX optimizations of class SynetMergedConvolution8iCdc.
  • MSVS compiler warning in AMX optimizations of class SynetMergedConvolution8iCd.
  • MSVS compiler warning in AMX optimizations of class SynetMergedConvolution8iDc.
  • Error in AVX and AVX2 optimizations of function SynetInnerProductLayerForward.
  • Use of SIMD_CPP_2011_ENABLE macro outside of the library.

Test framework

New features
  • Tests for verifying functionality of function SynetNormalizeLayerForward.
Removing
  • Data test for function Fill.
  • Data test for function FillFrame.
  • Data test for function FillBgra.
  • Data test for function FillBgr.
  • Data test for function FillPixel.
  • Data test for function Float32ToFloat16.
  • Data test for function Float16ToFloat32.
  • Data test for function SquaredDifferenceSum16f.
  • Data test for function CosineDistance16f.
  • Data test for function Float32ToUint8.
  • Data test for function Uint8ToFloat32.
  • Data test for function MeanFilter3x3.
  • Data test for function MedianFilterRhomb3x3.
  • Data test for function MedianFilterRhomb5x5.
  • Data test for function MedianFilterSquare3x3.
  • Data test for function MedianFilterSquare5x5.
  • Data test for function GaussianBlur3x3.
  • Data test for function AbsGradientSaturatedSum.
  • Data test for function LbpEstimate.
  • Data test for function NormalizeHistogram.
  • Data test for function SobelDx.
  • Data test for function SobelDxAbs.
  • Data test for function SobelDy.
  • Data test for function SobelDyAbs.
  • Data test for function ContourMetrics.
  • Data test for function Laplace.
  • Data test for function LaplaceAbs.
  • Data test for function Histogram.
  • Data test for function HistogramMasked.
  • Data test for function HistogramConditional.
  • Data test for function AbsSecondDerivativeHistogram.
  • Data test for function ChangeColors.
  • Data test for function HogDirectionHistograms.
  • Data test for function HogExtractFeatures.
  • Data test for function HogDeinterleave.
  • Data test for function HogFilterSeparable.
  • Data test for function HogLiteExtractFeatures.
  • Data test for function HogLiteFilterFeatures.
  • Data test for function HogLiteResizeFeatures.
  • Data test for function HogLiteCompressFeatures.
  • Data test for function HogLiteFilterSeparable.
  • Data test for function HogLiteFindMax7x7.
  • Data test for function HogLiteCreateMask.
  • Data test for function Integral.
  • Data test for function InterferenceIncrement.
  • Data test for function InterferenceIncrementMasked.
  • Data test for function InterferenceDecrement.
  • Data test for function InterferenceDecrementMasked.
  • Data test for function InterleaveUv.
  • Data test for function InterleaveBgr.
  • Data test for function InterleaveBgra.
  • Data test for function NeuralConvert.
  • Data test for function NeuralProductSum.
  • Data test for function NeuralAddVectorMultipliedByValue.
  • Data test for function NeuralAddVector.
  • Data test for function NeuralAddValue.
  • Data test for function NeuralRoughSigmoid.
  • Data test for function NeuralRoughSigmoid2.
  • Data test for function NeuralDerivativeSigmoid.
  • Data test for function NeuralRoughTanh.
  • Data test for function NeuralDerivativeTanh.
  • Data test for function NeuralDerivativeRelu.
  • Data test for function NeuralPow.
  • Data test for function NeuralUpdateWeights.
  • Data test for function NeuralAdaptiveGradientUpdate.
  • Data test for function NeuralPooling1x1Max3x3.
  • Data test for function NeuralPooling2x2Max2x2.
  • Data test for function NeuralPooling2x2Max3x3.
  • Data test for function NeuralAddConvolution2x2Forward.
  • Data test for function NeuralAddConvolution3x3Forward.
  • Data test for function NeuralAddConvolution4x4Forward.
  • Data test for function NeuralAddConvolution5x5Forward.
  • Data test for function NeuralAddConvolution2x2Backward.
  • Data test for function NeuralAddConvolution3x3Backward.
  • Data test for function NeuralAddConvolution4x4Backward.
  • Data test for function NeuralAddConvolution5x5Backward.
  • Data test for function NeuralAddConvolution2x2Sum.
  • Data test for function NeuralAddConvolution3x3Sum.
  • Data test for function NeuralAddConvolution4x4Sum.
  • Data test for function NeuralAddConvolution5x5Sum.
  • Data test for function NeuralConvolutionForward.
  • Data test for function OperationBinary8u.
  • Data test for function OperationBinary16i.
  • Data test for function VectorProduct.
  • Data test for function ReduceColor2x2.
  • Data test for function ReduceGray2x2.
  • Data test for function ReduceGray3x3.
  • Data test for function ReduceGray4x4.
  • Data test for function ReduceGray5x5.
  • Data test for function Reorder16bit.
  • Data test for function Reorder32bit.
  • Data test for function Reorder64bit.
  • Data test for function ResizeBilinear.
  • Data test for function SegmentationShrinkRegion.
  • Data test for function SegmentationFillSingleHoles.
  • Data test for function SegmentationChangeIndex.
  • Data test for function SegmentationPropagate2x2.
  • Data test for function ShiftBilinear.
  • Data test for function GetStatistic.
  • Data test for function GetMoments.
  • Data test for function GetRowSums.
  • Data test for function GetColSums.
  • Data test for function GetAbsDyRowSums.
  • Data test for function GetAbsDxColSums.
  • Data test for function ValueSum.
  • Data test for function SquareSum.
  • Data test for function SobelDxAbsSum.
  • Data test for function SobelDyAbsSum.
  • Data test for function LaplaceAbsSum.
  • Data test for function ValueSquareSum.
  • Data test for function CorrelationSum.
  • Data test for function StretchGray2x2.
  • Data test for function SvmSumLinear.
  • Data test for function SynetEltwiseLayerForward.
  • Data test for function TextureBoostedSaturatedGradient.
  • Data test for function TextureBoostedUv.
  • Data test for function TextureGetDifferenceSum.
  • Data test for function TexturePerformCompensation.
  • Data test for function Yuv444pToBgr.
  • Data test for function Yuv422pToBgr.
  • Data test for function Yuv420pToBgr.
  • Data test for function Yuv444pToHsl.
  • Data test for function Yuv444pToHsv.
  • Data test for function Yuv444pToHue.
  • Data test for function Yuv420pToHue.
  • Data test for function Yuv444pToBgra.
  • Data test for function Yuv422pToBgra.
  • Data test for function Yuv420pToBgra.
  • Data test infrastructure.

Infrastructure

New features
  • SIMD_RUNTIME CMake build option.

Simd v5.1.119

01 Nov 08:32

Algorithms

New features
  • AMX optimizations of class SynetConvolution8iNhwcDirect.
  • AMX optimizations of class SynetMergedConvolution8iCdc.
  • AMX optimizations of class SynetMergedConvolution8iCd.
  • AMX optimizations of class SynetMergedConvolution8iDc.
Improving
  • Optimization of memory buffer usage in class SynetConvolution8iNhwcDirect.
Bug fixing
  • MSVS compiler bug (Windows, Arm64).
Removing
  • AVX-512VNNI optimizations of function SetDepthwise in class SynetConvolution8iNhwcDirect (it is equal to AVX-512BW version).

Test framework

Removing
  • Data test for function BayerToBgr.
  • Data test for function BayerToBgra.
  • Data test for function Bgr48pToBgra32.
  • Data test for function Binarization.
  • Data test for function AveragingBinarization.
  • Data test for function ConditionalCount8u.
  • Data test for function ConditionalCount16i.
  • Data test for function ConditionalSum.
  • Data test for function ConditionalSquareSum.
  • Data test for function ConditionalSquareGradientSum.
  • Data test for function ConditionalFill.
  • Data test for function Copy.
  • Data test for function CopyFrame.
  • Data test for function Crc32c.
  • Data test for function DeinterleaveUv.
  • Data test for function DeinterleaveBgr.
  • Data test for function DeinterleaveBgra.
  • Data test for function DetectionHaarDetect32fp.
  • Data test for function DetectionHaarDetect32fi.
  • Data test for function DetectionLbpDetect32fp.
  • Data test for function DetectionLbpDetect32fi.
  • Data test for function DetectionLbpDetect16ip.
  • Data test for function DetectionLbpDetect16ii.
  • Data test for function AbsDifferenceSum.
  • Data test for function AbsDifferenceSumMasked.
  • Data test for function AbsDifferenceSums3x3.
  • Data test for function AbsDifferenceSums3x3Masked.
  • Data test for function SquaredDifferenceSum.
  • Data test for function SquaredDifferenceSumMasked.
  • Data test for function SquaredDifferenceSum32f.
  • Data test for function SquaredDifferenceKahanSum32f.
  • Data test for function CosineDistance32f.
  • Data test for function AlphaBlending.
  • Data test for function AlphaFilling.
  • Data test for function EdgeBackgroundGrowRangeSlow.
  • Data test for function EdgeBackgroundGrowRangeFast.
  • Data test for function EdgeBackgroundIncrementCount.
  • Data test for function EdgeBackgroundAdjustRange.
  • Data test for function EdgeBackgroundAdjustRangeMasked.
  • Data test for function EdgeBackgroundShiftRange.
  • Data test for function EdgeBackgroundShiftRangeMasked.

Simd v5.1.118

04 Oct 06:13

Algorithms

New features
  • Base implementation, SSE4.1 optimizations of RecursiveBilateralFilter engine.
  • Support of ARGB32 format in View.
  • Support of ARGB32 format in function AlphaPremultiply.
  • Support of ARGB32 format in function AlphaUnpremultiply.
  • AVX-512BW optimizations of function TileMatMul8u8i (AMX tile emulation).
Improving
  • Base implementation of class ImagePngLoader.
Bug fixing
  • Build errors on MacOS Arm64.
  • Clang compiler warning (-mfpu=neon -mfpu=neon-fp16).
  • Compiler errors (C++98 specific).

Test framework

New features
  • Tests for verifying functionality of RecursiveBilateralFilter engine.
Removing
  • Data test for function AbsDifference.
  • Data test for function AddFeatureDifference.
  • Data test for function BgraToBgr.
  • Data test for function BgraToGray.
  • Data test for function BgrToGray.
  • Data test for function BgrToHsl.
  • Data test for function BgrToHsv.
  • Data test for function GrayToBgr.
  • Data test for function Int16ToGray.
  • Data test for function BgrToBayer.
  • Data test for function BgraToBayer.
  • Data test for function BgrToBgra.
  • Data test for function GrayToBgra.
  • Data test for function BgraToYuv420p.
  • Data test for function BgraToYuv422p.
  • Data test for function BgraToYuv444p.
  • Data test for function BgrToYuv420p.
  • Data test for function BgrToYuv422p.
  • Data test for function BgrToYuv444p.
  • Data test for function BackgroundGrowRangeSlow.
  • Data test for function BackgroundGrowRangeFast.
  • Data test for function BackgroundIncrementCount.
  • Data test for function BackgroundAdjustRange.
  • Data test for function BackgroundAdjustRangeMasked.
  • Data test for function BackgroundShiftRange.
  • Data test for function BackgroundShiftRangeMasked.
  • Data test for function BackgroundInitMask.

Simd v5.1.117

01 Sep 05:41

Algorithms

New features
  • Base implementation, SSE4.1, AVX2 optimizations of function AlphaBlending2x.
Bug fixing
  • Buffer overrun in Base implementation of function SynetFusedLayerForward9.
  • Buffer overrun in SSE4.1 optimization of class SynetScale8i.
  • Buffer overrun in AVX-512BW optimizations of class ResizerNearest.
  • Buffer overrun in AVX-512BW optimizations of class ResizerByteBilinear.
  • Buffer overrun in AVX-512BW optimizations of class ResizerByteBicubic.
  • Buffer overrun in AVX-512BW optimizations of class ResizerByteArea1x1.
  • Buffer overrun in AVX-512BW optimizations of class ResizerByteArea2x2.
  • Buffer overrun in AVX-512BW optimizations of function TransformImage.
  • Error in AVX-512BW optimizations of function Yuv420pToUyvy422.
  • Crash in std::unordered_map after calling some Simd functions (MMX registers were not cleared after use).
  • Error (possible negative output values) in AVX-512BW optimizations of function CosineDistancesMxNp16f.
  • Error (possible negative output values) in AVX-512BW optimizations of function CosineDistancesMxNa16f.
  • Error in AVX-512BW optimizations of function TransformImage.
  • Valgrind warning in OutputMemoryStream.
  • Memory leak in Base implementation of class ImagePngLoader.
Replacing
  • Replace SSE2 optimizations to SSE4.1 for function SegmentationChangeIndex.
  • Replace SSE2 optimizations to SSE4.1 for function SegmentationFillSingleHoles.
  • Replace SSE2 optimizations to SSE4.1 for function SegmentationPropagate2x2.
  • Replace SSE2 optimizations to SSE4.1 for function ShiftBilinear.
  • Replace SSE2 optimizations to SSE4.1 for function SobelDx.
  • Replace SSE2 optimizations to SSE4.1 for function SobelDy.
  • Replace SSE2 optimizations to SSE4.1 for function ContourAnchors.
  • Replace SSE2 optimizations to SSE4.1 for function SquaredDifferenceSum.
  • Replace SSE2 optimizations to SSE4.1 for function SquaredDifferenceSumMasked.
  • Replace SSE2 optimizations to SSE4.1 for function SquaredDifferenceSum32f.
  • Replace SSE2 optimizations to SSE4.1 for function SquaredDifferenceKahanSum32f.
  • Replace SSE2 optimizations to SSE4.1 for function GetStatistic.
  • Replace SSE2 optimizations to SSE4.1 for function GetMoments.
  • Replace SSE2 optimizations to SSE4.1 for function GetObjectMoments.
  • Replace SSE2 optimizations to SSE4.1 for function GetRowSums.
  • Replace SSE2 optimizations to SSE4.1 for function GetColSums.
  • Replace SSE2 optimizations to SSE4.1 for function GetAbsDyRowSums.
  • Replace SSE2 optimizations to SSE4.1 for function GetAbsDxColSums.
  • Replace SSE2 optimizations to SSE4.1 for function ValueSum.
  • Replace SSE2 optimizations to SSE4.1 for function SquareSum.
  • Replace SSE2 optimizations to SSE4.1 for function ValueSquareSum.
  • Replace SSE2 optimizations to SSE4.1 for function CorrelationSum.
  • Replace SSE2 optimizations to SSE4.1 for function StretchGray2x2.
  • Replace SSE2 optimizations to SSE4.1 for function TextureBoostedSaturatedGradient.
  • Replace SSE2 optimizations to SSE4.1 for function TextureBoostedUv.
  • Replace SSE2 optimizations to SSE4.1 for function TextureGetDifferenceSum.
  • Replace SSE2 optimizations to SSE4.1 for function TexturePerformCompensation.
  • Replace SSE2 optimizations to SSE4.1 for function Yuv420pToHue.
  • Replace SSE2 optimizations to SSE4.1 for function Yuv444pToHue.
  • Replace SSE2 optimizations to SSE4.1 for function Yuva420pToBgra.
  • Replace SSE2 optimizations to SSE4.1 for function Yuv420pToBgra.
  • Replace SSE2 optimizations to SSE4.1 for function Yuv420pToBgraV2.
  • Replace SSE2 optimizations to SSE4.1 for function Yuv422pToBgra.
  • Replace SSE2 optimizations to SSE4.1 for function Yuv444pToBgra.
  • Replace SSE2 optimizations to SSE4.1 for function Yuv444pToBgraV2.
  • Replace SSE2 optimizations to SSE4.1 for function SynetPoolingAverage.
  • Replace SSE2 optimizations to SSE4.1 for function SynetScaleLayerForward.
  • Replace SSE2 optimizations to SSE4.1 for function SynetConvert32fTo8u.
  • Replace SSE2 optimizations to SSE4.1 for function SynetReorderImage.
  • Replace SSE2 optimizations to SSE4.1 for function SynetReorderFilter.
  • Replace SSE2 optimizations to SSE4.1 for function SynetFusedLayerForward0.
  • Replace SSE2 optimizations to SSE4.1 for function SynetFusedLayerForward1.
  • Replace SSE2 optimizations to SSE4.1 for function SynetFusedLayerForward2.
  • Replace SSE2 optimizations to SSE4.1 for function SynetFusedLayerForward3.
  • Replace SSE2 optimizations to SSE4.1 for function SynetFusedLayerForward4.
  • Replace SSE2 optimizations to SSE4.1 for function SynetFusedLayerForward8.
  • Replace SSE2 optimizations to SSE4.1 for function SynetFusedLayerForward9.
  • Replace SSE2 optimizations to SSE4.1 for function SynetDeconvolution32fInit.
  • Replace SSE2 optimizations to SSE4.1 for class SynetDeconvolution32fGemmNN.
  • Replace SSE2 optimizations to SSE4.1 for class SynetDeconvolution32fNhwcDirect2x2.
  • Replace SSE2 optimizations to SSE4.1 for function SynetConvolution32fInit.
  • Replace SSE2 optimizations to SSE4.1 for class SynetConvolution32fDepthwiseDotProduct.
  • Replace SSE2 optimizations to SSE4.1 for class SynetConvolution32fWinograd.
  • Replace SSE2 optimizations to SSE4.1 for class SynetConvolution32fDirectNchw.
  • Replace SSE2 optimizations to SSE4.1 for class SynetConvolution32fNhwcDirect.
  • Replace SSE2 optimizations to SSE4.1 for class SynetConvolution32fDirectNhwc.
  • Replace SSE2 optimizations to SSE4.1 for class SynetConvolution32fGemmNN.
  • Replace SSE2 optimizations to SSE4.1 for class SynetMergedConvolution32fCdc.
  • Replace SSE2 optimizations to SSE4.1 for class SynetMergedConvolution32fCd.
  • Replace SSE2 optimizations to SSE4.1 for class SynetMergedConvolution32fDc.
  • Replace SSE2 optimizations to SSE4.1 for function SynetMergedConvolution32fInit.
  • Replace SSE2 optimizations to SSE4.1 for function SynetAddBias.
  • Replace SSE2 optimizations to SSE4.1 for function SynetEltwiseLayerForward.
  • Replace SSE2 optimizations to SSE4.1 for function SynetInnerProductLayerForward.
  • Replace SSE2 optimizations to SSE4.1 for function SynetLrnLayerCrossChannels.
  • Replace SSE2 optimizations to SSE4.1 for function SynetShuffleLayerForward.
  • Replace SSE2 optimizations to SSE4.1 for function SynetSoftmaxLayerForward.
  • Replace SSE2 optimizations to SSE4.1 for function SynetElu32f.
  • Replace SSE2 optimizations to SSE4.1 for function SynetHardSigmoid32f.
  • Replace SSE2 optimizations to SSE4.1 for function SynetHswish32f.
  • Replace SSE2 optimizations to SSE4.1 for function SynetMish32f.
  • Replace SSE2 optimizations to SSE4.1 for function SynetPreluLayerForward.
  • Replace SSE2 optimizations to SSE4.1 for function SynetRelu32f.
  • Replace SSE2 optimizations to SSE4.1 for function SynetRestrictRange32f.
  • Replace SSE2 optimizations to SSE4.1 for function SynetSigmoid32f.
  • Replace SSE2 optimizations to SSE4.1 for function SynetSoftplus32f.
  • Replace SSE2 optimizations to SSE4.1 for function SynetSwish32f.
  • Replace SSE2 optimizations to SSE4.1 for function SynetTanh32f.
  • Replace SSE2 optimizations to SSE4.1 for function SynetGemm32fNN.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel1x3Block1x4SetFilter.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel1x3Block1x4SetInput.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel1x3Block1x4SetOutput.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel1x5Block1x4SetFilter.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel1x5Block1x4SetInput.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel1x5Block1x4SetOutput.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel2x2Block2x2SetFilter.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel2x2Block2x2SetInput.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel2x2Block2x2SetOutput.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel2x2Block4x4SetFilter.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel2x2Block4x4SetInput.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel2x2Block4x4SetOutput.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block2x2SetFilter.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block2x2SetInput.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block2x2SetOutput.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block3x3SetFilter.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block3x3SetInput.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block3x3SetOutput.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block4x4SetFilter.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block4x4SetInput.
  • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block4x4SetOutput.

Test...

Simd v5.0.116

15 Aug 15:47

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Yuva444pToBgraV2.
  • Function SimdEmpty.
  • Checking of no man's land watermarks in function SimdFree.
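The watermark check added to SimdFree can be illustrated with a generic guard-byte allocator. The sketch below shows the general technique only and is not the library's code: the names GuardedAlloc and GuardedFree, the 16-byte guard size, and the 0xFD fill pattern are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

namespace sketch
{
    // Hypothetical parameters: guard width and fill pattern are assumptions.
    const std::size_t GUARD_SIZE = 16;
    const std::uint8_t GUARD_BYTE = 0xFD;

    // Layout: [size][guard][user data][guard].
    // Note: the returned pointer is not SIMD-aligned; a real allocator would handle alignment.
    inline void* GuardedAlloc(std::size_t size)
    {
        std::uint8_t* raw = static_cast<std::uint8_t*>(
            std::malloc(sizeof(std::size_t) + 2 * GUARD_SIZE + size));
        if (raw == nullptr)
            return nullptr;
        std::memcpy(raw, &size, sizeof(std::size_t));           // remember the user size
        std::uint8_t* user = raw + sizeof(std::size_t) + GUARD_SIZE;
        std::memset(user - GUARD_SIZE, GUARD_BYTE, GUARD_SIZE); // leading watermark
        std::memset(user + size, GUARD_BYTE, GUARD_SIZE);       // trailing watermark
        return user;
    }

    // Frees the block and returns true if both watermarks are still intact.
    inline bool GuardedFree(void* ptr)
    {
        if (ptr == nullptr)
            return true;
        std::uint8_t* user = static_cast<std::uint8_t*>(ptr);
        std::uint8_t* raw = user - GUARD_SIZE - sizeof(std::size_t);
        std::size_t size;
        std::memcpy(&size, raw, sizeof(std::size_t));
        bool intact = true;
        for (std::size_t i = 0; i < GUARD_SIZE; ++i)
        {
            intact = intact && raw[sizeof(std::size_t) + i] == GUARD_BYTE;
            intact = intact && user[size + i] == GUARD_BYTE;
        }
        std::free(raw);
        return intact;
    }
}
```

A caller that writes one byte past the end of its buffer overwrites the trailing watermark, so GuardedFree reports the damage when the block is released rather than letting it go unnoticed.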
Improving
  • AVX-512BW optimizations of AMX tile emulation.
  • AMX optimizations of class SynetConvolution32fBf16Nhwc.
  • AMX optimizations of class SynetMergedConvolution32fBf16Cdc.
  • AMX optimizations of class SynetMergedConvolution32fBf16Cd.
Bug fixing
  • GCC linker error when SIMD_AMX_EMULATE macro is switched on.
  • Error in SSE4.1, AVX2, AVX-512BW, AMX optimizations of class SynetConvolution32fBf16Nhwc.
  • Wrong assert in SSE4.1 and AVX-512BW optimizations of class ResizerNearest.
  • Error in AVX optimizations of class SynetMergedConvolution32fCdc.
  • Error in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX optimizations of class SynetMergedConvolution32fBf16Cdc.
  • External buffer reading overflow in class SynetMergedConvolution32fBf16Cdc.
  • External buffer reading overflow in class SynetMergedConvolution32fBf16Cd.
  • External buffer reading overflow in class SynetMergedConvolution32fBf16Dc.
  • FP32 overflow in SSE2, AVX2, AVX-512BW, NEON optimizations of function Tanh.
  • Error in function Base::SynetConvolution32fGemmNN::ImgToCol.
  • Error in SSE4.1, AVX2 optimizations of class ResizerByteArea2x2.
  • Buffer overrun in SSE4.1, AVX2 optimizations of class ResizerNearest.
  • Buffer overrun in SSE4.1, AVX2 optimizations of class ResizerByteBilinear.
  • Buffer overrun in SSE4.1, AVX2 optimizations of class ResizerByteBicubic.
  • Buffer overrun in SSE4.1, AVX2 optimizations of class ResizerByteArea1x1.
  • Buffer overrun in SSE4.1, AVX2 optimizations of class ResizerByteArea2x2.
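The FP32 overflow in Tanh mentioned in this list is a common pitfall of exponent-based formulas. The release notes do not say which formula the library uses or how it was fixed, so the following is only a scalar sketch of one well-known cause and remedy. TanhNaive and TanhStable are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

namespace sketch
{
    // Naive form: exp(2x) overflows to +inf in FP32 once 2x exceeds about 88.7,
    // and (inf - 1) / (inf + 1) evaluates to NaN.
    inline float TanhNaive(float x)
    {
        float e = std::exp(2.0f * x);
        return (e - 1.0f) / (e + 1.0f);
    }

    // Stable form: exp(-2|x|) always lies in [0, 1], so it cannot overflow.
    // The sign is restored afterwards because tanh is an odd function.
    inline float TanhStable(float x)
    {
        float e = std::exp(-2.0f * std::fabs(x));
        float t = (1.0f - e) / (1.0f + e);
        return x < 0.0f ? -t : t;
    }
}
```

A vectorized version could follow the same idea, or simply clamp the argument to a safe range before the exponent is computed. Which approach the library takes is not stated here.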
Replacing
  • Replace SSE2 optimizations to SSE4.1 for function SvmSumLinear.
  • Replace SSE2 optimizations to SSE4.1 for function AbsDifference.
  • Replace SSE2 optimizations to SSE4.1 for function AbsDifferenceSum.
  • Replace SSE2 optimizations to SSE4.1 for function AbsDifferenceSumMasked.
  • Replace SSE2 optimizations to SSE4.1 for function AbsDifferenceSums3x3.
  • Replace SSE2 optimizations to SSE4.1 for function AbsDifferenceSums3x3Masked.
  • Replace SSE2 optimizations to SSE4.1 for function AbsGradientSaturatedSum.
  • Replace SSE2 optimizations to SSE4.1 for function AddFeatureDifference.
  • Replace SSE2 optimizations to SSE4.1 for function AlphaBlending.
  • Replace SSE2 optimizations to SSE4.1 for function AlphaBlendingUniform.
  • Replace SSE2 optimizations to SSE4.1 for function AlphaFilling.
  • Replace SSE2 optimizations to SSE4.1 for function AlphaPremultiply.
  • Replace SSE2 optimizations to SSE4.1 for function BackgroundGrowRangeSlow.
  • Replace SSE2 optimizations to SSE4.1 for function BackgroundGrowRangeFast.
  • Replace SSE2 optimizations to SSE4.1 for function BackgroundIncrementCount.
  • Replace SSE2 optimizations to SSE4.1 for function BackgroundAdjustRange.
  • Replace SSE2 optimizations to SSE4.1 for function BackgroundAdjustRangeMasked.
  • Replace SSE2 optimizations to SSE4.1 for function BackgroundShiftRange.
  • Replace SSE2 optimizations to SSE4.1 for function BackgroundShiftRangeMasked.
  • Replace SSE2 optimizations to SSE4.1 for function BackgroundInitMask.
  • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundGrowRangeSlow.
  • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundGrowRangeFast.
  • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundIncrementCount.
  • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundAdjustRange.
  • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundAdjustRangeMasked.
  • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundShiftRangeMasked.
  • Replace SSE2 optimizations to SSE4.1 for function BayerToBgra.
  • Replace SSE2 optimizations to SSE4.1 for function BgraToGray.
  • Replace SSE2 optimizations to SSE4.1 for function BgraToYuv420p.
  • Replace SSE2 optimizations to SSE4.1 for function BgraToYuv422p.
  • Replace SSE2 optimizations to SSE4.1 for function BgraToYuv444p.
  • Replace SSE2 optimizations to SSE4.1 for function BgraToYuva420p.
  • Replace SSE2 optimizations to SSE4.1 for function BgrToGray.
  • Replace SSE2 optimizations to SSE4.1 for function RgbaToGray.
  • Replace SSE2 optimizations to SSE4.1 for function Bgr48pToBgra32.
  • Replace SSE2 optimizations to SSE4.1 for function Binarization.
  • Replace SSE2 optimizations to SSE4.1 for function AveragingBinarization.
  • Replace SSE2 optimizations to SSE4.1 for function ConditionalCount8u.
  • Replace SSE2 optimizations to SSE4.1 for function ConditionalCount16i.
  • Replace SSE2 optimizations to SSE4.1 for function ConditionalSum.
  • Replace SSE2 optimizations to SSE4.1 for function ConditionalSquareSum.
  • Replace SSE2 optimizations to SSE4.1 for function ConditionalSquareGradientSum.
  • Replace SSE2 optimizations to SSE4.1 for function ConditionalFill.
  • Replace SSE2 optimizations to SSE4.1 for function DeinterleaveUv.
  • Replace SSE2 optimizations to SSE4.1 for function Fill32f.
  • Replace SSE2 optimizations to SSE4.1 for function FillBgr.
  • Replace SSE2 optimizations to SSE4.1 for function FillBgra.
  • Replace SSE2 optimizations to SSE4.1 for function FillPixel.
  • Replace SSE2 optimizations to SSE4.1 for function CosineDistance32f.
  • Replace SSE2 optimizations to SSE4.1 for function Float32ToUint8.
  • Replace SSE2 optimizations to SSE4.1 for function Uint8ToFloat32.
  • Replace SSE2 optimizations to SSE4.1 for function GaussianBlur3x3.
  • Replace SSE2 optimizations to SSE4.1 for function GrayToBgra.
  • Replace SSE2 optimizations to SSE4.1 for function AbsSecondDerivativeHistogram.
  • Replace SSE2 optimizations to SSE4.1 for function HistogramMasked.
  • Replace SSE2 optimizations to SSE4.1 for function HistogramConditional.
  • Replace SSE2 optimizations to SSE4.1 for function HogDirectionHistograms.
  • Replace SSE2 optimizations to SSE4.1 for function HogDeinterleave.
  • Replace SSE2 optimizations to SSE4.1 for function HogFilterSeparable.
  • Replace SSE2 optimizations to SSE4.1 for function Int16ToGray.
  • Replace SSE2 optimizations to SSE4.1 for function InterferenceIncrement.
  • Replace SSE2 optimizations to SSE4.1 for function InterferenceIncrementMasked.
  • Replace SSE2 optimizations to SSE4.1 for function InterferenceDecrement.
  • Replace SSE2 optimizations to SSE4.1 for function InterferenceDecrementMasked.
  • Replace SSE2 optimizations to SSE4.1 for function InterleaveUv.
  • Replace SSE2 optimizations to SSE4.1 for function Laplace.
  • Replace SSE2 optimizations to SSE4.1 for function LbpEstimate.
  • Replace SSE2 optimizations to SSE4.1 for function MeanFilter3x3.
  • Replace SSE2 optimizations to SSE4.1 for function MedianFilterRhomb3x3.
  • Replace SSE2 optimizations to SSE4.1 for function MedianFilterRhomb5x5.
  • Replace SSE2 optimizations to SSE4.1 for function MedianFilterSquare3x3.
  • Replace SSE2 optimizations to SSE4.1 for function MedianFilterSquare5x5.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution2x2Forward.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution3x3Forward.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution4x4Forward.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution5x5Forward.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution2x2Backward.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution3x3Backward.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution4x4Backward.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution5x5Backward.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution2x2Sum.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution3x3Sum.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution4x4Sum.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution5x5Sum.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAdaptiveGradientUpdate.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddVectorMultipliedByValue.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddVector.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddValue.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralConvert.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralDerivativeRelu.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralDerivativeSigmoid.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralDerivativeTanh.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralPooling1x1Max3x3.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralPooling2x2Max2x2.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralPooling2x2Max3x3.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralPow.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralProdu...

Simd v5.0.115

01 Jul 16:22

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX optimizations of class SynetMergedConvolution32fBf16Cdc.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX optimizations of class SynetMergedConvolution32fBf16Cd.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX optimizations of class SynetMergedConvolution32fBf16Dc.
  • AVX-512BF16 extension support.
  • AVX-512BF16 optimizations of function Float32ToBFloat16.
  • AVX-512BF16, AMX optimizations of class SynetConvolution32fBf16Nhwc.
  • AMX extension support.
  • Support of 3D pooling in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetPoolingMax32f.
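For readers unfamiliar with the BF16 format behind Float32ToBFloat16 and the Bf16 convolution classes above: a bfloat16 value is the upper 16 bits of an IEEE 754 float32. The scalar sketch below shows one common conversion using round-to-nearest-even. The rounding mode and the names ToBf16 and FromBf16 are assumptions, not taken from the library.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

namespace sketch
{
    // float32 -> bfloat16 with round-to-nearest-even (assumed rounding mode).
    inline std::uint16_t ToBf16(float value)
    {
        std::uint32_t bits;
        std::memcpy(&bits, &value, sizeof(bits));
        if ((bits & 0x7FFFFFFFu) > 0x7F800000u)                 // NaN: keep it a quiet NaN
            return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
        std::uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u); // ties go to the even result
        return static_cast<std::uint16_t>((bits + rounding) >> 16);
    }

    // bfloat16 -> float32 is exact: the 16 bits become the high half of the float.
    inline float FromBf16(std::uint16_t value)
    {
        std::uint32_t bits = static_cast<std::uint32_t>(value) << 16;
        float result;
        std::memcpy(&result, &bits, sizeof(result));
        return result;
    }
}
```

Because only 8 bits of precision (7 stored mantissa bits) survive, a round trip through ToBf16 and FromBf16 is lossless only for values whose low 16 bits are already zero, such as 1.0f or -2.0f.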
Improving
  • AVX-512BW optimizations of function Fill32f.
Renaming
  • Rename function SynetPoolingForwardAverage to SynetPoolingAverage.
  • Rename function SynetPoolingForwardMax32f to SynetPoolingMax32f.
  • Rename function SynetPoolingForwardMax8u to SynetPoolingMax8u.
Replacing
  • Replace AVX-512F optimizations to AVX-512BW for function SvmSumLinear.
  • Replace AVX-512F optimizations to AVX-512BW for function Fill32f.
  • Replace AVX-512F optimizations to AVX-512BW for class ResizerNearest.
  • Replace AVX-512F optimizations to AVX-512BW for class ResizerFloatBilinear.
  • Replace AVX-512F optimizations to AVX-512BW for function SquaredDifferenceSum32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SquaredDifferenceKahanSum32f.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralConvolutionForward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution2x2Forward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution2x2Backward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution2x2Sum.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution3x3Forward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution3x3Backward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution3x3Sum.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution4x4Forward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution4x4Backward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution4x4Sum.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution5x5Forward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution5x5Backward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution5x5Sum.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralProductSum.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAdaptiveGradientUpdate.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralPooling1x1Max3x3.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralPooling2x2Max2x2.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralPooling2x2Max3x3.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralUpdateWeights.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddValue.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddVector.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddVectorMultipliedByValue.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralRoughSigmoid.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralRoughSigmoid2.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralDerivativeSigmoid.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralRoughTanh.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralDerivativeTanh.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralDerivativeRelu.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralPow.
  • Replace AVX-512F optimizations to AVX-512BW for class SynetConvolution32fGemmNN.
  • Replace AVX-512F optimizations to AVX-512BW for class SynetConvolution32fGemmNT.
  • Replace AVX-512F optimizations to AVX-512BW for class SynetConvolution32fWinograd.
  • Replace AVX-512F optimizations to AVX-512BW for class SynetDeconvolution32fGemmNN.
  • Replace AVX-512F optimizations to AVX-512BW for class SynetDeconvolution32fNhwcDirect2x2.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetDeconvolution32fInit.
  • Replace AVX-512F optimizations to AVX-512BW for class SynetInnerProduct32fGemm.
  • Replace AVX-512F optimizations to AVX-512BW for class SynetInnerProduct32fProd.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetInnerProduct32fInit.
  • Replace AVX-512F optimizations to AVX-512BW for function ConvolutionBiasAndActivation.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetReorderImage.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetReorderFilter.
  • Replace AVX-512F optimizations to AVX-512BW for function Gemm32fNN.
  • Replace AVX-512F optimizations to AVX-512BW for function Gemm32fNT.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward0.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward1.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward2.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward3.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward4.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward8.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward9.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x3Block1x4SetFilter.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x3Block1x4SetInput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x3Block1x4SetOutput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x5Block1x4SetFilter.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x5Block1x4SetInput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x5Block1x4SetOutput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block2x2SetFilter.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block2x2SetInput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block2x2SetOutput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block4x4SetFilter.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block4x4SetInput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block4x4SetOutput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block2x2SetFilter.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block2x2SetInput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block2x2SetOutput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block3x3SetFilter.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block3x3SetInput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block3x3SetOutput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block4x4SetFilter.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block4x4SetInput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block4x4SetOutput.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetElu32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetHardSigmoid32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetHswish32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetMish32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetPreluLayerForward.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetRelu32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetRestrictRange32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetSigmoid32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetSoftplus32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetSwish32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetTanh32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetScaleLayerForward.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetPoolingAverage.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetAddBias.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetEltwiseLayerForward.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetInnerProductLayerForward.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetLrnLayerCrossChannels.