[SYCL][Graph] Update design doc for copy queue #362

…4143) When passing multiple targets in the form of -fsycl-targets=intel_gpu_dg1,intel_gpu_pvc, the number of device compilations was n*n instead of n. Because of how duplicate entries were handled during toolchain generation, the different target names were being considered unique even though they share the same target triple (spir64_gen), which produced the extra entries. This is the second attempt to land this change; the sycl-offload-new-driver.c test was updated to reflect the ordering issues encountered.
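The deduplication idea described above can be illustrated with a small sketch. This is not the driver's code: `tripleFor`, `DeviceJob`, `OffloadPlan`, and `planOffload` are hypothetical names, and the rule that every `intel_gpu_*` name maps to `spir64_gen` is generalized from the two targets mentioned in the message. The point is that toolchains are keyed by triple while jobs stay at one per requested target.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical name-to-triple lookup. Only the intel_gpu_* -> spir64_gen
// mapping comes from the commit message; treating any other name as its
// own triple is an assumption made for this sketch.
std::string tripleFor(const std::string &target) {
  const std::string prefix = "intel_gpu_";
  if (target.compare(0, prefix.size(), prefix) == 0)
    return "spir64_gen";
  return target;
}

struct DeviceJob {
  std::string target; // name as written in -fsycl-targets
  std::string triple; // triple of the toolchain that compiles it
};

struct OffloadPlan {
  std::vector<std::string> toolchainTriples; // one per distinct triple
  std::vector<DeviceJob> jobs;               // one per requested target
};

// Deduplicate toolchains by triple rather than by target name, so that
// n targets yield n device compilations instead of n*n.
OffloadPlan planOffload(const std::vector<std::string> &targets) {
  OffloadPlan plan;
  std::set<std::string> seen;
  for (const std::string &t : targets) {
    const std::string triple = tripleFor(t);
    if (seen.insert(triple).second)
      plan.toolchainTriples.push_back(triple);
    plan.jobs.push_back({t, triple});
  }
  return plan;
}
```

With the two targets from the message, this plan holds a single `spir64_gen` toolchain and two device jobs.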

…l targets (intel#14102) clang-linker-wrapper is not target-specific, i.e., it is not called for a single target device; it is called only once. Currently, clang-linker-wrapper is called only with device images for spir64 targets, so the existing approach of capturing the first target triple in the list of triples and using it to gather sycl-device-library files is valid. As we plan to add support for more targets (AOT), we need to gather sycl-device-libraries for all targets. This PR addresses this change. Also, the triple should not be passed to the linker wrapper; the linker wrapper should get the triples from the device images. Thanks --------- Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>

Signed-off-by: jinge90 <ge.jin@intel.com>

) For testing oneapi-src/unified-runtime#1282 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

…ntel#14138) Required for specific use-cases in SYCLomatic. --------- Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>

The `Graph/UnsupportedDevice/device_query.cpp` test asserts that L0 devices will never have full graph support. This is not the case: depending on the L0 device and driver version, full graph support is possible. Update the test to stop asserting on this, as diving into these details is out of the scope of the test. This was previously decided when discussing how to check the OpenCL backend for similar possible variances in aspect support.

`cuda_dev_kit` is not set properly in [test-e2e/lit.cfg.py](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/lit.cfg.py) due to invalid CUDA paths. Fixing the paths showed errors in [14115](intel#14115) and [14116](intel#14116) which are XFAILed. The patch fixes the failure of [cuda_queue_priority.cpp](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Plugin/cuda_queue_priority.cpp) on Windows / CUDA.

oneapi-src/unified-runtime#1128 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

This PR adds math `extend_v*4` operators (18 in total) along with unit tests for signed and unsigned int32 cases. Note: some changes overlap with the previous `extend_v*2` PR intel#13953, which should therefore be reviewed/merged first. --------- Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com> Co-authored-by: Joe Todd <joe.todd@codeplay.com> Co-authored-by: Yihan Wang <yihan.wang@intel.com>

…l#13807) It was propagated to `getOrCreateAllocaForReq` when creating a new record, but then no commands are expected to be enqueued there since the first alloca for a record cannot exceed its leaf limit or be linked to another alloca.

Running the test on Windows failed due to missing support of `ls`. Replacing `ls` with `cat` made the test pass on Windows.

…l#14120)
- `InorderQueue/in_order_get_property.cpp` -> Use non-deprecated `sycl::exception`, add a check for errc to ensure we are still catching the correct exception
- `InorderQueue/in_order_kernels.cpp` -> Use the group `get_group_id` function instead of the deprecated `get_id`
- `InorderQueue/in_order_usm_implicit.cpp` -> Use the queue `mem_advise` function that takes an `int` instead of `pi_mem_advice`

Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

oneapi-src/unified-runtime#1711 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

Currently the Level Zero plugin uses the loader and headers fetched by the Level Zero adapter (LevelZeroLoader-Headers, LevelZeroLoader targets). The downloaded loader code is not used; only the headers are used, for xpti. So, get the headers location from the LevelZeroLoader-Headers target instead and remove the unnecessary code.

Add group_key_value_sorter sorters and sort_key_value_over_group APIs based on https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_group_sort.asciidoc extension. This PR was split out from larger PR: intel#13713 Co-authored-by: "Andrei Fedorov [andrey.fedorov@intel.com](mailto:andrey.fedorov@intel.com)" Co-authored-by: "Romanov Vlad [vlad.romanov@intel.com](mailto:vlad.romanov@intel.com)"

…ntel#14129) A single basic file to compile and run to test functionality of --offload-new-driver --------- Co-authored-by: Marcos Maronas <maarquitos14@users.noreply.github.com>

…d ConvertFToBFloat16INTEL (intel#14085) This PR adds vector overloads of `ConvertBFloat16ToFINTEL` and `ConvertFToBFloat16INTEL` to libdevice (SPEC: https://spec.oneapi.io/level-zero/latest/core/SPIRV.html#bfloat16-conversions) and a wrapper around it (`BF16VecToFloatVec` and `FloatVecToBF16Vec`) in `ext::oneapi::detail`. These overloads are intended to optimize BFloat16 `marray`, `vec` operations, for which we currently do element-by-element `bfloat16 -> float -> bfloat16` conversions.
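To make the conversion concrete, here is a host-side sketch of whole-vector bfloat16 conversion in plain C++. It is not the libdevice implementation: the names `bf16_bits`, `bf16ToFloat`, `floatToBf16`, `bf16VecToFloatVec`, and `floatVecToBf16Vec` are hypothetical, bfloat16 is modelled as the upper 16 bits of a binary32 value, and round-to-nearest-even is an assumed rounding mode. The wrappers take the whole array in one call, which is the shape a vector overload exposes even though this sketch still loops internally.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// bfloat16 modelled as the upper 16 bits of an IEEE-754 binary32 value.
using bf16_bits = std::uint16_t;

inline float bf16ToFloat(bf16_bits b) {
  const std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

inline bf16_bits floatToBf16(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  // NaN: keep the sign and exponent, force a quiet mantissa bit.
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
    return static_cast<bf16_bits>((bits >> 16) | 0x0040u);
  // Assumed rounding mode: round to nearest, ties to even.
  bits += 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<bf16_bits>(bits >> 16);
}

// Whole-vector wrappers: one call per vector instead of one per element.
template <std::size_t N>
std::array<float, N> bf16VecToFloatVec(const std::array<bf16_bits, N> &src) {
  std::array<float, N> dst{};
  for (std::size_t i = 0; i < N; ++i)
    dst[i] = bf16ToFloat(src[i]);
  return dst;
}

template <std::size_t N>
std::array<bf16_bits, N> floatVecToBf16Vec(const std::array<float, N> &src) {
  std::array<bf16_bits, N> dst{};
  for (std::size_t i = 0; i < N; ++i)
    dst[i] = floatToBf16(src[i]);
  return dst;
}
```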

…4130) Replaces intel#13270 Changing the storage to std::array instead of Clang's extension fixes strict ansi-aliasing violation and simplifies device code.

[SYCL] Adding support for missing math ops:
- truncf
- sinpif
- rsqrtf
- exp10f

Add section to the contribution guide detailing the current process for integrating Unified Runtime updates into DPC++.

…ntel#14123) Current implementation of profiling info for NOP barriers is inconsistent with other events from the same queue (e.g., if the previous event started after the barrier was submitted). To make them consistent while keeping the optimization, we would need to duplicate the event on our side and make the duplicate check and potentially use profiling info of its previous event. Instead, as the first step, disable the NOP optimization during profiling since profiling is known to incur a performance hit anyway. The proper duplicate event approach can be implemented as a follow up if this causes issues for users. Partially reverts intel#12949

`atomic_ref<T *>` uses 64-bit atomics, so it should be decorated with the corresponding aspect. Fixes: intel#12743

I'm observing cache overflow when running heavy tests on the OpenCL backend with a GPU. Clear the cache on PI_ERROR_OUT_OF_HOST_MEMORY as well as on PI_ERROR_OUT_OF_RESOURCES. Reference: intel#11987
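A minimal sketch of the error handling described here, under stated assumptions: `Status` is a hypothetical stand-in for the PI error codes, and retrying the operation once after clearing the cache is an assumption of this sketch rather than something the message states.

```cpp
#include <functional>

// Hypothetical stand-in for the PI result codes named in the message.
enum class Status { Success, OutOfResources, OutOfHostMemory, Other };

// Both out-of-memory style errors now trigger a cache clear.
inline bool shouldClearCache(Status s) {
  return s == Status::OutOfResources || s == Status::OutOfHostMemory;
}

// Run `op`; if it fails with one of the two errors, clear the cache and
// try once more (the single retry is an assumption of this sketch).
inline Status runWithCacheRetry(const std::function<Status()> &op,
                                const std::function<void()> &clearCache) {
  const Status first = op();
  if (!shouldClearCache(first))
    return first;
  clearCache();
  return op();
}
```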

Fixed by KhronosGroup/SYCL-CTS#895

Scheduled igc dev drivers uplift Co-authored-by: GitHub Actions <actions@github.com>

The function uses `operator=` before it is defined, which can cause build failures:
```
build/include/sycl/ext/oneapi/bfloat16.hpp:98:19: error: no match for ‘operator=’ (operand types are ‘sycl::_V1::ext::oneapi::bfloat16’ and ‘float’)
   98 |     dst[i] = src[i];
      |                   ^
```
Moving it after the bfloat16 class definition fixes it.

Bumps [braces](https://github.com/micromatch/braces) from 3.0.2 to 3.0.3.
Commits:
- [`74b2db2`](https://github.com/micromatch/braces/commit/74b2db2938fad48a2ea54a9c8bf27a37a62c350d) 3.0.3
- [`88f1429`](https://github.com/micromatch/braces/commit/88f1429a0f47e1dd3813de35211fc97ffda27f9e) update eslint. lint, fix unit tests.
- [`415d660`](https://github.com/micromatch/braces/commit/415d660c3002d1ab7e63dbf490c9851da80596ff) Snyk js braces 6838727 ([#40](https://github.com/micromatch/braces/issues/40))
- [`190510f`](https://github.com/micromatch/braces/commit/190510f79db1adf21d92798b0bb6fccc1f72c9d6) fix tests, skip 1 test in test/braces.expand
- [`716eb9f`](https://github.com/micromatch/braces/commit/716eb9f12d820b145a831ad678618731927e8856) readme bump
- [`a5851e5`](https://github.com/micromatch/braces/commit/a5851e57f45c3431a94d83fc565754bc10f5bbc3) Merge pull request [#37](https://github.com/micromatch/braces/issues/37) from coderaiser/fix/vulnerability
- [`2092bd1`](https://github.com/micromatch/braces/commit/2092bd1fb108d2c59bd62e243b70ad98db961538) feature: braces: add maxSymbols (https://github.com/micromatch/braces/issues/...
- [`9f5b4cf`](https://github.com/micromatch/braces/commit/9f5b4cf47329351bcb64287223ffb6ecc9a5e6d3) fix: vulnerability (https://security.snyk.io/vuln/SNYK-JS-BRACES-6838727)
- [`98414f9`](https://github.com/micromatch/braces/commit/98414f9f1fabe021736e26836d8306d5de747e0d) remove funding file
- [`665ab5d`](https://github.com/micromatch/braces/commit/665ab5d561c017a38ba7aafd92cc6655b91d8c14) update keepEscaping doc ([#27](https://github.com/micromatch/braces/issues/27))
- Additional commits viewable in [compare view](https://github.com/micromatch/braces/compare/3.0.2...3.0.3)
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…tly (intel#12872) This PR refactors the builtin fence helper macro for AMDGPU to take in and process the order semantic explicitly, because that is the only semantic argument accepted by the amdgcn builtin. Additionally, it makes the `None` (Monotonic) order semantic, which maps to C++/SYCL's `relaxed`, a no-op instead of falling back to the previous `acq_rel` default order. --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie83@gmail.com>
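The mapping described above can be sketched on the host with standard C++ fences. This is only an illustration: `OrderSemantic`, `FenceAction`, `fenceFor`, and `memFence` are hypothetical names, and `std::atomic_thread_fence` stands in for the amdgcn builtin, which this sketch does not call. The behaviour it models is that `None` produces no fence at all rather than defaulting to `acq_rel`.

```cpp
#include <atomic>

// Hypothetical stand-in for the memory-order semantic passed to the macro.
enum class OrderSemantic {
  None, // Monotonic, i.e. C++/SYCL relaxed
  Acquire,
  Release,
  AcquireRelease,
  SequentiallyConsistent
};

enum class FenceAction { NoOp, Acquire, Release, AcqRel, SeqCst };

// Process the order semantic explicitly; None maps to a no-op.
constexpr FenceAction fenceFor(OrderSemantic s) {
  switch (s) {
  case OrderSemantic::None:
    return FenceAction::NoOp;
  case OrderSemantic::Acquire:
    return FenceAction::Acquire;
  case OrderSemantic::Release:
    return FenceAction::Release;
  case OrderSemantic::AcquireRelease:
    return FenceAction::AcqRel;
  case OrderSemantic::SequentiallyConsistent:
    return FenceAction::SeqCst;
  }
  return FenceAction::NoOp; // unreachable for valid enumerators
}

// Host-side illustration: issue the matching standard fence, or nothing.
inline void memFence(OrderSemantic s) {
  switch (fenceFor(s)) {
  case FenceAction::NoOp:
    return;
  case FenceAction::Acquire:
    std::atomic_thread_fence(std::memory_order_acquire);
    return;
  case FenceAction::Release:
    std::atomic_thread_fence(std::memory_order_release);
    return;
  case FenceAction::AcqRel:
    std::atomic_thread_fence(std::memory_order_acq_rel);
    return;
  case FenceAction::SeqCst:
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return;
  }
}
```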

Closes intel#7330.

The rotate functions are technically C++20 and MSVC hasn't implemented them yet. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
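For toolchains without the C++20 rotate functions, one common workaround is a small shift-based fallback. The sketch below is not necessarily what this commit does; `rotl32` and `rotr32` are hypothetical helper names, and the code needs only C++14.

```cpp
#include <cstdint>

// Rotate left by n bits; n is reduced modulo 32 so that a shift by the
// full width (undefined behaviour) never happens.
constexpr std::uint32_t rotl32(std::uint32_t x, unsigned n) {
  n &= 31u;
  return n == 0 ? x : (x << n) | (x >> (32u - n));
}

// Rotate right by n bits, with the same modulo reduction.
constexpr std::uint32_t rotr32(std::uint32_t x, unsigned n) {
  n &= 31u;
  return n == 0 ? x : (x >> n) | (x << (32u - n));
}
```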

…ntel#14151) One of the models that is used for specifying the device architecture for spir64_gen is to use the -Xsycl-target-backend "-device arg" syntax on the command line. Hook up the ability to scan the target backend values to embed the proper information in the packaged binary when using the new offload model.

This PR adds math `extend_vcompare[2/4]` operators (4 in total) along with unit tests for signed and unsigned int32 cases. Also, the unit tests from the previous `extend_v*4` (intel#14078) and `extend_v*2` (intel#13953) PRs are moved to two different files. --------- Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com> Co-authored-by: Joe Todd <joe.todd@codeplay.com> Co-authored-by: Yihan Wang <yihan.wang@intel.com>

…ntel#14150) pre-commit PR for oneapi-src/unified-runtime#1749 --------- Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com> Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

intel#14162) Instead of using old device selector objects, use SYCL 2020 device selector callables to construct devices in `FilterSelector` e2e tests.

- Update UR tag to include L0 command-buffer copy engine optimization
- Add test which mixes copy and kernel commands
- Update design doc to detail copy engine optimization

Co-authored-by: Kenneth Benzie (Benie) <k.benzie83@gmail.com>


Commits on Jun 11, 2024

Commits on Jun 12, 2024

Commits on Jun 13, 2024

Commits on Jun 14, 2024
