1.0.5

jan-wassenberg released this 19 Jul 16:10 · 971 commits to master since this release
  • Add Insert/ExtractBlock, BroadcastBlock/Lane, NumBlocks
  • Add integer Le/Ge and [Neg]MulAdd, extend DemoteTo/PromoteTo
  • Add Leading/TrailingZeroCount, HighestSetBitIndex, ReverseBits
  • Add MaskedLoadOr, tuple Get/Set/Create, ReduceSum, WidenMulPairwiseAdd
  • Add [ZeroExtend]ResizeBitCast, BitwiseIfThenElse, Find[Known]LastTrue
  • Add AESRoundInv, AESKeyGenAssist
  • Add contrib/math Atan2/SinCos, contrib/unroller
  • Add fp16/bf16 support (Armv8, SVE, RVV), HWY_DYNAMIC_POINTER
  • Add OrderedTruncate2To, Per4LaneBlockShuffle, TwoTablesLookupLanes
  • Add SlideUp/Down[Blocks/Lanes], Slide1Up/Down, ReverseLaneBytes
  • Add SetBeforeFirst, SetAtOrBefore/AfterFirst, SetOnlyFirst
  • Add 8-bit Reverse2/4/8, Shl/Shr, RotateRight, Reverse, Mul
  • Add 8/16-bit DupEven/Odd, TableLookupLanes
  • Add F64 ApproximateReciprocal[Sqrt], 32/64-bit SaturatedAdd/Sub
  • Build: Support Bazel modules
  • Codegen improvements
  • Compiler: support Clang 15/16
  • Doc: add GitHub pages, support policy, evaluation
  • Doc: publish AVX-512 throttling/startup findings
  • Release: add signing
  • Test: add GCC to GitHub Actions
  • VQSort: small-N speedups (fix seeding, func ptr, 8-wide network)
  • VQSort: add BenchAllColdSort, VQSortStatic
  • VQSort: fix subnormal/inf/NaN, support fp16, fix KV types
  • Workarounds: RVV VXRM, x87 excess precision, missing intrinsics
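
The notes above name the new bit-manipulation ops (Leading/TrailingZeroCount, HighestSetBitIndex, ReverseBits) without describing them. As a rough illustration only, the following scalar sketch models what such ops conventionally compute for a single 32-bit lane. The helper names are hypothetical and are not Highway functions. The behavior for a zero input is an assumption of this sketch, so check the library's reference documentation for the actual per-lane semantics.

```cpp
#include <cstdint>

// Scalar models for one 32-bit lane. Hypothetical helpers for illustration;
// these are NOT part of the Highway API.

// Number of zero bits above the highest set bit.
// Assumption of this sketch: returns 32 when x == 0.
constexpr uint32_t LeadingZeroCount32(uint32_t x) {
  uint32_t n = 0;
  for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
    ++n;
  }
  return n;
}

// Number of zero bits below the lowest set bit.
// Assumption of this sketch: returns 32 when x == 0.
constexpr uint32_t TrailingZeroCount32(uint32_t x) {
  uint32_t n = 0;
  for (uint32_t mask = 1u; mask != 0 && (x & mask) == 0; mask <<= 1) {
    ++n;
  }
  return n;
}

// Zero-based index of the highest set bit.
// Precondition in this sketch: x != 0 (the zero case is left unmodeled).
constexpr uint32_t HighestSetBitIndex32(uint32_t x) {
  return 31u - LeadingZeroCount32(x);
}

// Mirrors the bit order: bit 0 becomes bit 31, bit 1 becomes bit 30, etc.
constexpr uint32_t ReverseBits32(uint32_t x) {
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (x & 1u);
    x >>= 1;
  }
  return r;
}

// Compile-time checks of the scalar models.
static_assert(LeadingZeroCount32(1u) == 31u, "one set bit at position 0");
static_assert(TrailingZeroCount32(8u) == 3u, "bit 3 is the lowest set bit");
static_assert(HighestSetBitIndex32(0x10u) == 4u, "bit 4 is the highest set bit");
static_assert(ReverseBits32(1u) == 0x80000000u, "bit 0 moves to bit 31");
```

A vector version would apply the same computation independently to every lane; the loops here favor readability over speed.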