v0.6.0
Highlights:
- Faster quantized matrix-vector multiplies
- mx.fast.scaled_dot_product_attention fused op
Core
- Memory allocation API improvements
- Faster GPU reductions for smaller sizes (between 2 and 7x)
- mx.fast.scaled_dot_product_attention fused op
- Faster quantized matrix-vector multiplications
- Pickle support for mx.array
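
For readers unfamiliar with what a fused scaled dot-product attention op replaces, here is a minimal unfused reference written in NumPy. It is a sketch of the standard attention formula, not the MLX implementation. The function name `sdpa_reference`, the additive-mask convention, and the (batch, heads, length, dim) layout are assumptions made for illustration.

```python
import numpy as np

def sdpa_reference(q, k, v, scale, mask=None):
    """Unfused reference: softmax(scale * q @ k^T + mask) @ v.

    Hypothetical helper for illustration; inputs are assumed to be
    shaped (batch, heads, length, dim).
    """
    scores = scale * (q @ np.swapaxes(k, -1, -2))
    if mask is not None:
        scores = scores + mask  # additive mask (assumed convention)
    # Subtract the row maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
B, H, L, D = 1, 2, 4, 8
q = rng.standard_normal((B, H, L, D))
k = rng.standard_normal((B, H, L, D))
v = rng.standard_normal((B, H, L, D))

out = sdpa_reference(q, k, v, scale=D ** -0.5)
```

A fused op computes the same result in a single kernel, which avoids materializing the intermediate `scores` and `weights` arrays.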
NN
- Dilation on convolution layers
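
As a reminder of what dilation does, the sketch below implements a plain 1D dilated convolution (cross-correlation form, no padding, stride 1) in NumPy. It illustrates the general technique only; `dilated_conv1d` is a hypothetical helper, not part of MLX, and it says nothing about how the MLX layers expose or implement dilation.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """1D cross-correlation with kernel taps spaced `dilation` apart."""
    k = len(w)
    # A dilated kernel covers dilation * (k - 1) + 1 input positions.
    span = dilation * (k - 1) + 1
    out_len = len(x) - span + 1
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(out_len)
    ])

x = np.arange(8.0)
w = np.array([1.0, 0.0, -1.0])

# With dilation=2 the three taps read x[i], x[i + 2], and x[i + 4].
y = dilated_conv1d(x, w, dilation=2)
```

With `dilation=1` this reduces to an ordinary convolution; larger values widen the receptive field without adding weights.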
Bugfixes
- Fix mx.topk
- Fix reshape for zero sizes