V4.2.0 Performance improvements
Features
- Fractional global capability
- Additional ResNet sizes
- Round up for half VGPRs
- Initial code for PersistentKernel (disabled)
- Feature inner unroll2
- Enable BufferStore and buffer_atomic_cmpswap for GSU>1
Features