oneDNN v3.7 release notes #2481

Open: wants to merge 25 commits into base rls-v3.7
RELEASE_NOTES.md (38 additions, 0 deletions)
@@ -0,0 +1,38 @@
# Performance Optimizations
## Intel Architecture Processors
* Improved fp16/bf16 softmax performance with relaxed [accumulation mode](https://oneapi-src.github.io/oneDNN/dev_guide_attributes_accumulation_mode.html#doxid-dev-guide-attributes-accumulation-mode).
* Added support for and improved performance of fp8 matmul with bf16/fp16.
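To illustrate the accuracy trade-off behind relaxed accumulation, the sketch below computes a reference softmax twice: once with an fp32 running sum (strict) and once rounding the partial sum to bf16 after every addition, which is how we understand relaxed mode may treat partial accumulators (rounded to the source/destination data type). The helpers `round_to_bf16` and `softmax` are hypothetical illustration code, not oneDNN API, and the library's kernels may round at different points.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Round an fp32 value to the nearest bf16 value (round-to-nearest-even)
// and widen it back to fp32. NaN handling is omitted for brevity.
float round_to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    std::uint32_t lsb = (bits >> 16) & 1u;  // lowest bit kept by bf16
    bits += 0x7FFFu + lsb;                  // add rounding bias (ties to even)
    bits &= 0xFFFF0000u;                    // drop the 16 low mantissa bits
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y;
}

// Reference softmax over one row. When `relaxed` is true, the running sum of
// exponentials is rounded to bf16 after every addition, emulating a partial
// accumulator stored in a bf16 source/destination type.
std::vector<float> softmax(const std::vector<float>& src, bool relaxed) {
    assert(!src.empty());
    float max_val = src[0];
    for (float v : src) max_val = std::fmax(max_val, v);

    std::vector<float> dst(src.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = std::exp(src[i] - max_val);  // subtract max for stability
        sum += dst[i];
        if (relaxed) sum = round_to_bf16(sum);
    }
    for (float& v : dst) v /= sum;
    return dst;
}
```

Both variants produce close results for short rows; the relaxed variant trades a small amount of accuracy in the normalizer for cheaper lower-precision accumulation.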

## Intel Graphics Products
Review comment (Member): @karturov, please review and update this section.

* Introduced initial optimizations for GPUs based on Xe3 architecture.
* Improved convolution performance on Intel Arc Graphics for Intel Core Ultra processors (Series 2) (formerly Lunar Lake).
## AArch64-based Processors
Review comment (Member): @jondea, @theComputeKid, could you please help summarize the AArch64 improvements?


# Functionality
* Introduced support for the `select` algorithm in the binary primitive. The functionality is optimized for Intel CPUs.
* Enabled support for grouped quantization of weights along the N dimension in the matmul primitive.
* Introduced support for fp16/bf16 compressed weights in fp32 matmul on Intel CPUs.
* Introduced support for grouped scales and zero points in the reorder primitive.
* Enabled support for 4D weight scales in the matmul primitive.
* [experimental] Extended microkernel API:
  * Introduced int4 quantization support.
  * Fpmath mode API.
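A reference sketch of the element-wise semantics we understand the binary `select` algorithm to have: take `src0` where a condition is non-zero and `src1` otherwise. The function name `binary_select_ref`, the `int8_t` condition type, and passing the condition as a third input are assumptions for illustration; check the binary primitive documentation for the actual algorithm name, argument order, and supported data types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference for an element-wise select:
// dst[i] = cond[i] ? src0[i] : src1[i].
// Assumption: the condition is an int8 mask of the same shape (no broadcasting).
template <typename T>
std::vector<T> binary_select_ref(const std::vector<T>& src0,
                                 const std::vector<T>& src1,
                                 const std::vector<std::int8_t>& cond) {
    assert(src0.size() == src1.size() && src0.size() == cond.size());
    std::vector<T> dst(src0.size());
    for (std::size_t i = 0; i < dst.size(); ++i)
        dst[i] = cond[i] ? src0[i] : src1[i];  // any non-zero value picks src0
    return dst;
}
```

Treating the condition as a truthy mask keeps the semantics independent of the value data type, so the same reference works for fp32, bf16-emulated, or integer tensors.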
Review comment on lines +27 to +29 (Contributor): I don't think we did that in the external microkernel API (and I cannot find the related commits). However, we did add a new query for the B matrix packing type.
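The arithmetic behind grouped scales and zero points (as in grouped weight quantization along N in matmul, or grouped scales and zero points in reorder) can be sketched as a reference dequantization: each weight maps to `scale * (w - zp)`, with one scale and one zero point shared by a `group_k x group_n` block. The function `dequantize_grouped`, its row-major K x N layout, and the int8 zero-point type are hypothetical; the library's own scale and zero-point memory layouts are described in its quantization attribute documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dequantize a K x N int8 weight matrix (row-major) with per-group scales
// and zero points. Each group covers group_k rows and group_n columns, so
// scales and zero points have shape (K / group_k) x (N / group_n), row-major.
// Assumption: K and N are divisible by the group sizes.
std::vector<float> dequantize_grouped(const std::vector<std::int8_t>& w,
                                      std::size_t K, std::size_t N,
                                      const std::vector<float>& scales,
                                      const std::vector<std::int8_t>& zps,
                                      std::size_t group_k, std::size_t group_n) {
    assert(K % group_k == 0 && N % group_n == 0);
    const std::size_t groups_n = N / group_n;
    assert(scales.size() == (K / group_k) * groups_n);
    assert(zps.size() == scales.size() && w.size() == K * N);

    std::vector<float> out(K * N);
    for (std::size_t k = 0; k < K; ++k) {
        for (std::size_t n = 0; n < N; ++n) {
            // Index of the group that owns element (k, n).
            const std::size_t g = (k / group_k) * groups_n + n / group_n;
            out[k * N + n] = scales[g] * (static_cast<float>(w[k * N + n]) -
                                          static_cast<float>(zps[g]));
        }
    }
    return out;
}
```

Setting `group_n = 1` gives a separate scale per output column, while a larger `group_n` shares one scale across several columns, which is the "grouped along N" case.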

# Usability
* Relaxed lifetime requirements for memory objects created with the CPU engine and the SYCL runtime. The new behavior is aligned with the GPU engine.
* Improved verbose diagnostics to better identify issues during dispatching and during primitive and kernel creation for CPU primitive implementations and OpenCL-based GPU primitive implementations.
* Improved verbose diagnostics to simplify debugging of nGEN fallbacks.
* Enabled frame pointer support on Intel64 platforms to improve integration with profilers.
# Validation
* Extended benchdnn with support and validation for fp8 matmul patterns and for tensor tags in RNN primitive validation.
# Deprecated Functionality

# Breaking Changes
* Updated the minimum supported CMake version to 3.13 (was 2.8.12).
* Updated the minimum supported GCC version to 8.0 (was 4.8).
* Updated the minimum supported Clang version to 11.0 (was 3.0).
# Thanks to these Contributors

This release contains contributions from the [project core team] as well as Michał Górny @mgorny, Fadi Arafeh @fadara01, John Osorio @kala855, Ravi Pushkar @rpushkarr, Marek Michalowski @michalowski-arm, Renato Barros Arantes @renato-arantes, Ryo Suzuki @Ryo-not-rio, Varad Ahirwadkar @varad-ahirwadkar, Tadej Ciglarič @t4c1, Nikhil Sharma @nikhilfujitsu, @taoye9, @Shreyas-fuj, @raistefintel. We would also like to thank everyone who asked questions and reported issues.

[project core team]: https://github.com/oneapi-src/oneDNN/blob/rls-v3.7/MAINTAINERS.md