oneDNN v3.7 release notes #2481

Open
wants to merge 25 commits into
base: rls-v3.7
Changes from 14 commits
47 changes: 47 additions & 0 deletions RELEASE_NOTES.md
@@ -0,0 +1,47 @@
# Performance Optimizations
## Intel Architecture Processors
* Improved fp16/bf16 softmax performance with relaxed [accumulation mode](https://oneapi-src.github.io/oneDNN/dev_guide_attributes_accumulation_mode.html#doxid-dev-guide-attributes-accumulation-mode).
* Added support and improved performance for fp8 matmul with bf16/fp16.

## Intel Graphics Products

@karturov, please review and update this section.

* Introduced initial optimizations for GPUs based on Xe3 architecture.
* Improved performance for Intel Arc Graphics for Intel Core Ultra processors (Series 2) (formerly Lunar Lake) and Intel Arc B-series discrete graphics (formerly Battlemage).
* Improved performance of the following subgraphs with Graph API:
  * Scaled dot-product Attention (SDPA) [with implicit causal mask](https://oneapi-src.github.io/oneDNN/dev_guide_graph_sdpa.html#doxid-dev-guide-graph-sdpa)
  * Scaled dot-product Attention (SDPA) [with int8/int4 compressed key and value](https://oneapi-src.github.io/oneDNN/dev_guide_graph_sdpa_compressed_kv.html#doxid-dev-guide-graph-sdpa-compressed-kv)

## AArch64-based Processors

@jondea, @theComputeKid, could you please help summarizing AArch64 improvements?


# Functionality
* Introduced support for `select` algorithm in binary primitive. The functionality is optimized for Intel CPUs.
* Enabled support for matmul primitive with grouped quantization on weights along the N dimension.
* Graph API: new [`Select`](https://oneapi-src.github.io/oneDNN/dev_guide_op_select.html), [`GenIndex`](https://oneapi-src.github.io/oneDNN/dev_guide_op_genindex.html) and [`GreaterEqual`](https://oneapi-src.github.io/oneDNN/dev_guide_op_greaterequal.html) operations.
* Introduced support for fp16/bf16 compressed weights in fp32 matmul on Intel CPUs.
* Introduced support for grouped scales and zero points in reorder primitive.
* Enabled support for 4d weight scale in matmul primitive.
* Graph API: added support for quantized and non-quantized Gated MLP patterns.
* Introduced preliminary support for 4-bit floating-point data types `f4_e2m1` and `f4_e3m0` in matmul and reorder, as well as `e8m0` scales data type in matmul and reorder.
* [experimental] Extended microkernel API:
  * Introduced int4 quantization support.
  * Fpmath mode API
Comment on lines +27 to +29
I don't think we did that in external microkernel API (and cannot find the related commits).
However we did add a new query for B matrix packing type.
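
As a rough illustration of the 4-bit and scale data types listed in the Functionality section, the following C++ sketch decodes `f4_e2m1` elements and `e8m0` scales. It assumes both follow the OCP Microscaling (MX) format definitions, which these notes do not state explicitly. The helper names (`decode_f4_e2m1`, `decode_e8m0`, `dequantize_f4`) are hypothetical and are not part of the oneDNN API.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Decode a 4-bit e2m1 value stored in the low nibble of `bits`.
// Assumed layout: [sign:1][exponent:2][mantissa:1], exponent bias 1.
inline float decode_f4_e2m1(std::uint8_t bits) {
    const int sign = (bits >> 3) & 0x1;
    const int exp = (bits >> 1) & 0x3;
    const int man = bits & 0x1;
    // exp == 0 is the subnormal range (0 or 0.5);
    // otherwise the value is (1 + man / 2) * 2^(exp - 1).
    const float mag = (exp == 0)
            ? 0.5f * static_cast<float>(man)
            : std::ldexp(1.0f + 0.5f * static_cast<float>(man), exp - 1);
    return sign ? -mag : mag;
}

// Decode an e8m0 scale: a biased power of two, 2^(bits - 127).
// Assumed: the all-ones encoding 0xFF represents NaN.
inline float decode_e8m0(std::uint8_t bits) {
    if (bits == 0xFF) return std::numeric_limits<float>::quiet_NaN();
    return std::ldexp(1.0f, static_cast<int>(bits) - 127);
}

// Dequantize one element: shared block scale times the 4-bit element.
inline float dequantize_f4(std::uint8_t element, std::uint8_t scale) {
    return decode_e8m0(scale) * decode_f4_e2m1(element);
}
```

Under these assumptions the representable `f4_e2m1` magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4 and 6, and `dequantize_f4(0x5, 129)` evaluates to 3.0 × 4.0 = 12.0.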

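The `select` algorithm mentioned in the Functionality section can be pictured with a plain reference loop. The sketch below assumes the usual element-wise ternary semantics (a condition tensor chooses between two sources) and equally sized inputs with no broadcasting. `select_ref` is a hypothetical helper for illustration, not a oneDNN function, and it does not show how the primitive itself is created or executed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference semantics only: dst[i] = cond[i] ? src0[i] : src1[i].
// All three inputs are assumed to have the same number of elements.
inline std::vector<float> select_ref(const std::vector<std::int8_t> &cond,
        const std::vector<float> &src0, const std::vector<float> &src1) {
    std::vector<float> dst(src0.size());
    for (std::size_t i = 0; i < dst.size(); ++i)
        dst[i] = (cond[i] != 0) ? src0[i] : src1[i];
    return dst;
}
```

For instance, a condition of `{1, 0, 1}` applied to `{1, 2, 3}` and `{10, 20, 30}` yields `{1, 20, 3}`.
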
# Usability
* With the SYCL runtime, memory objects on the CPU engine are now reference-counted and no longer need to be explicitly kept alive by the user for the duration of primitive execution. This aligns memory object lifetime behavior on CPU and GPU engines.
* Improved verbose diagnostics to better identify issues during dispatching, primitive creation, and kernel creation for CPU and GPU (OpenCL-based) primitive implementations.
* Improved verbose diagnostics to simplify debugging of nGEN fallbacks.
* Enabled frame pointer support on Intel64 platforms to improve integration with profilers.
* Added [examples](https://github.com/oneapi-src/oneDNN/tree/main/examples/graph) for Gated MLP and int4 Gated MLP.

# Validation
* Extended benchdnn with support and validation for fp8 matmul patterns for tensor tags in RNN primitive validation.
* Extended benchdnn with support for rewriting data types in the test JSON files in graph driver.
* Extended benchdnn with support and validation for the number of partitions returned from the test JSON files.
# Deprecated Functionality

# Breaking Changes
* Updated minimal supported CMake version to 3.13 (was 2.8.12).
* Updated minimal supported GCC version to 8.0 (was 4.8).
* Updated minimal supported Clang version to 11.0 (was 3.0).
* Removed support for SYCL versions older than SYCL 2020.

# Thanks to these Contributors

This release contains contributions from the [project core team] as well as Aditya Tewari @aditew01, Alexandra Sidorova @a-sidorova, Atharva Dubey @AD2605, Deb Taylor @deb-intel, Dmitriy Ovchinnikov @inteldimitrius, Fadi Arafeh @fadara01, Hengyu Meng @airMeng, @hmaciak, John Osorio @kala855, Marek Michalowski @michalowski-arm, Michael Froelich @MichaelFroelich, Michał Górny @mgorny, Nikhil Sharma @nikhilfujitsu, Permanence AI Coder @Permanence-AI-Coder, @raistefintel, Ravi Pushkar @rpushkarr, Renato Barros Arantes @renato-arantes, Romain Biessy @Rbiessy, Ryo Suzuki @Ryo-not-rio, @Shreyas-fuj, Varad Ahirwadkar @varad-ahirwadkar, @vishwascm, and Ye Tao @taoye9. We would also like to thank everyone who asked questions and reported issues.

[project core team]: https://github.com/oneapi-src/oneDNN/blob/rls-v3.7/MAINTAINERS.md