[SYCL][Graph] Update design doc for copy queue #362
base: sycl
Commits on Jun 11, 2024
-
[Driver][SYCL][NewOffload] Fix duplication of device targets (intel#14143)
When passing along multiple targets in the form of -fsycl-targets=intel_gpu_dg1,intel_gpu_pvc, the number of device compilations was n*n as opposed to just n. Because of how duplicate entries were handled during toolchain generation, the different target names were considered unique even though they share the same target triple (spir64_gen), which caused the multiple entries. This is the second attempt to land this change; the sycl-offload-new-driver.c test was updated to reflect the ordering issues encountered.
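The duplication described above can be illustrated with a minimal sketch. This is not the driver's code: `tripleFor` and `countDeviceCompilations` are hypothetical names, and the model (each generated toolchain compiles every target that shares its triple) is a simplifying assumption made only to show how keying by name gives n*n while keying by triple gives n.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical mapping from a user-facing target name to its target triple.
// Assumption: every intel_gpu_* name resolves to spir64_gen.
std::string tripleFor(const std::string &target) {
  if (target.rfind("intel_gpu_", 0) == 0)
    return "spir64_gen";
  return target;
}

// Simplified model: one toolchain is generated per unique key, and each
// toolchain compiles every requested target that shares its triple.
std::size_t countDeviceCompilations(const std::vector<std::string> &targets,
                                    bool dedupByTriple) {
  std::set<std::string> toolchainKeys;
  for (const auto &t : targets)
    toolchainKeys.insert(dedupByTriple ? tripleFor(t) : t);

  std::size_t count = 0;
  for (const auto &key : toolchainKeys)
    for (const auto &t : targets)
      if (tripleFor(key) == tripleFor(t))
        ++count;
  return count;
}
```

With the two targets from the commit message, keying by name produces two toolchains that each compile both targets, while keying by triple produces a single toolchain that compiles each target once.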
Commit 934b46f
-
[New offload driver][Device lib] Add SYCL device library files for all targets (intel#14102)
clang-linker-wrapper is not target-specific, i.e. it is not called for a single target device; it is called only once. Currently, clang-linker-wrapper is called only with device images with spir64 targets, so the existing approach of capturing the first target triple in the list of triples and using it to gather the sycl-device-library files is valid. As we plan to add support for more targets (AOT), we need to gather sycl-device-libraries for all targets. This PR addresses this change. Also, the triple should not be passed to the linker wrapper; the linker wrapper should get the triples from the device images. Thanks
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
Commit 6ecce4f
Commits on Jun 12, 2024
-
[SYCL] Enable CET for wqlibsycl-devicelib-host.a (intel#14135)
Signed-off-by: jinge90 <ge.jin@intel.com>
Commit c1b17e0
-
[UR] Fix size confusion for several device property queries (intel#12488)
For testing oneapi-src/unified-runtime#1282
Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Commit c168f21
-
[SYCL][COMPAT] Added non-const image2d_max and image3d_max getters (intel#14138)
Required for specific use-cases in SYCLomatic.
Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
Commit bdeb0ef
-
[SYCL][Graph] Update L0 aspect test (intel#14093)
The `Graph/UnsupportedDevice/device_query.cpp` test asserts that L0 devices will never have full graph support. This is not the case: depending on the L0 device and driver version, full graph support is possible. Update the test to remove this assertion, as diving into these details is out of the scope of the test. This was previously decided when discussing how to check the OpenCL backend for similar possible variances in aspect support.
Commit d7bc4fc
-
[SYCL][E2E] Fix CUDA include and lib paths. (intel#14118)
`cuda_dev_kit` is not set properly in [test-e2e/lit.cfg.py](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/lit.cfg.py) due to invalid CUDA paths. Fixing the paths showed errors in [14115](intel#14115) and [14116](intel#14116) which are XFAILed. The patch fixes the failure of [cuda_queue_priority.cpp](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Plugin/cuda_queue_priority.cpp) on Windows / CUDA.
Commit a497788
-
[UR] Bump main tag to 78d02039 (intel#12269)
oneapi-src/unified-runtime#1128
Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Commit 7c530e1
-
[SYCL][COMPAT] Add math extend_v*4 to SYCLCompat (intel#14078)
This PR adds math `extend_v*4` operators (18 in total) along with unit tests for signed and unsigned int32 cases. Some changes overlap with the previous `extend_v*2` PR intel#13953, which should therefore be reviewed/merged first.
Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com>
Co-authored-by: Joe Todd <joe.todd@codeplay.com>
Co-authored-by: Yihan Wang <yihan.wang@intel.com>
Commit da735fe
-
[SYCL] Remove unneeded parameter from `getOrInsertMemObjRecord` (intel#13807)
It was propagated to `getOrCreateAllocaForReq` when creating a new record, but then no commands are expected to be enqueued there since the first alloca for a record cannot exceed its leaf limit or be linked to another alloca.
Commit 1460126
-
[E2E] Modify commands to address running on Windows. (intel#13682)
Running the test on Windows failed due to missing support of `ls`. Replacing `ls` with `cat` made the test pass on Windows.
Commit bd33aaf
-
[SYCL][E2E] Fix deprecated warnings in `InorderQueue` e2e tests (intel#14120)
- `InorderQueue/in_order_get_property.cpp` -> Use non-deprecated `sycl::exception`, add check for errc to ensure we are still catching the correct exception
- `InorderQueue/in_order_kernels.cpp` -> Use group `get_group_id` function instead of deprecated `get_id`
- `InorderQueue/in_order_usm_implicit.cpp` -> Use queue `mem_advise` function that uses `int` instead of `pi_mem_advice`
Commit 56f6c24
-
[UR] Update UR tag to include L0 loader related changes (intel#14109)
Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Commit 1a885ec
-
[UR] Bump main tag to b13c5e1f (intel#14042)
oneapi-src/unified-runtime#1711
Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Commit ae79b95
-
[SYCL] Remove redundant code from L0 plugin's cmake file (intel#14108)
Currently, the Level Zero plugin uses the loader and headers fetched by the Level Zero adapter (LevelZeroLoader-Headers, LevelZeroLoader targets). The loader code that is currently downloaded is not used; only the headers are used, for xpti. So, get the headers location from the LevelZeroLoader-Headers target instead and remove the unnecessary code.
Commit 87f47b4
-
[SYCL] Add support for key/value sorting APIs (intel#13942)
Add group_key_value_sorter sorters and sort_key_value_over_group APIs based on the https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_group_sort.asciidoc extension. This PR was split out from the larger PR intel#13713.
Co-authored-by: Andrei Fedorov <andrey.fedorov@intel.com>
Co-authored-by: Romanov Vlad <vlad.romanov@intel.com>
Commit 3910d0c
-
[SYCL][NewOffload][E2E] add a single test for --offload-new-driver (intel#14129)
A single basic file to compile and run to test functionality of --offload-new-driver
Co-authored-by: Marcos Maronas <maarquitos14@users.noreply.github.com>
Commit fe8c284
-
[SYCL] [libdevice] Add vector overloads of ConvertBFloat16ToFINTEL and ConvertFToBFloat16INTEL (intel#14085)
This PR adds vector overloads of `ConvertBFloat16ToFINTEL` and `ConvertFToBFloat16INTEL` to libdevice (SPEC: https://spec.oneapi.io/level-zero/latest/core/SPIRV.html#bfloat16-conversions) and wrappers around them (`BF16VecToFloatVec` and `FloatVecToBF16Vec`) in `ext::oneapi::detail`. These overloads are intended to optimize BFloat16 `marray` and `vec` operations, for which we currently do element-by-element `bfloat16 -> float -> bfloat16` conversions.
Commit 13a7b3a
-
[SYCL] Use `std::array` as storage for `sycl::vec` on device (intel#14130)
Replaces intel#13270. Changing the storage to std::array instead of Clang's extension fixes a strict-aliasing violation and simplifies device code.
Commit e7defab
Commits on Jun 13, 2024
-
[SYCL] Adding support for missing math ops (intel#14132)
[SYCL] Adding support for missing math ops:
- truncf
- sinpif
- rsqrtf
- exp10f
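For orientation, the four operations named above can be sketched as host-side reference functions. This assumes the names carry their usual meanings (round toward zero, sin(πx), 1/sqrt(x), and 10^x); the `ref_` functions are hypothetical and are not the code added by the commit. Production implementations of `sinpi` typically reduce the argument first to limit rounding error, which this sketch does not do.

```cpp
#include <cmath>

// Assumed conventional meanings; ref_* names are illustrative only.
constexpr float kPi = 3.14159265358979323846f;

// Round toward zero.
float ref_truncf(float x) { return std::trunc(x); }

// sin(pi * x), computed naively.
float ref_sinpif(float x) { return std::sin(kPi * x); }

// Reciprocal square root.
float ref_rsqrtf(float x) { return 1.0f / std::sqrt(x); }

// Base-10 exponential.
float ref_exp10f(float x) { return std::pow(10.0f, x); }
```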
Commit 9942378
-
[Doc] Document Unified Runtime update process (intel#14097)
Add section to the contribution guide detailing the current process for integrating Unified Runtime updates into DPC++.
Commit e34b7ff
-
[SYCL] Disable in-order queue barrier optimization while profiling (intel#14123)
Current implementation of profiling info for NOP barriers is inconsistent with other events from the same queue (e.g., if the previous event started after the barrier was submitted). To make them consistent while keeping the optimization, we would need to duplicate the event on our side and make the duplicate check and potentially use profiling info of its previous event. Instead, as the first step, disable the NOP optimization during profiling since profiling is known to incur a performance hit anyway. The proper duplicate event approach can be implemented as a follow-up if this causes issues for users. Partially reverts intel#12949
Commit f2cd2a8
-
[SYCL] Add atomic64 aspect decoration to atomic_ref<T *> (intel#14052)
atomic_ref<T *> uses 64-bit atomics and it should be decorated with the corresponding aspect. fixes: intel#12743
Commit da3b5df
-
[SYCL] Clear cache in case of PI_ERROR_OUT_OF_HOST_MEMORY (intel#14119)
I'm observing cache overflow when running heavy tests on OCL backend with gpu. Clear cache in case of PI_ERROR_OUT_OF_HOST_MEMORY as well as for PI_ERROR_OUT_OF_RESOURCES. Using as reference: intel#11987
Commit c342a78
-
Commit a5a36f8
-
[GHA] Uplift Linux IGC Dev RT version to igc-dev-480f8b6 (intel#14155)
Scheduled igc dev drivers uplift
Co-authored-by: GitHub Actions <actions@github.com>
Commit 957f762
-
[SYCL] Fix FloatVecToBF16Vec build (intel#14161)
The function uses `operator=` before it is defined, which can cause build failures:
```
build/include/sycl/ext/oneapi/bfloat16.hpp:98:19: error: no match for ‘operator=’ (operand types are ‘sycl::_V1::ext::oneapi::bfloat16’ and ‘float’)
   98 |     dst[i] = src[i];
      |            ^
```
Moving it after the bfloat16 class definition fixes it.
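The ordering that the fix relies on can be sketched in plain C++. `Bf16Like` and `floatVecToBf16Vec` are hypothetical stand-ins, not the real `bfloat16` class or `FloatVecToBF16Vec`, and the stand-in stores a plain `float` rather than performing a real 16-bit conversion. The point is only that the helper is defined after the class, so the assignment from `float` is visible where `dst[i] = src[i]` is compiled.

```cpp
#include <cstddef>

// Hypothetical stand-in for the bfloat16 class (no real 16-bit conversion).
class Bf16Like {
public:
  Bf16Like() = default;
  // Assignment from float: the operator the helper below depends on.
  Bf16Like &operator=(float f) {
    value = f;
    return *this;
  }
  explicit operator float() const { return value; }

private:
  float value = 0.0f;
};

// Defined after the class, so operator=(float) is already declared here.
inline void floatVecToBf16Vec(const float *src, Bf16Like *dst, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i];
}
```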
Commit 8eff95c
-
Bump braces from 3.0.2 to 3.0.3 in /mlir/utils/vscode (intel#14144)
Bumps [braces](https://github.com/micromatch/braces) from 3.0.2 to 3.0.3.
Commits:
- [74b2db2](https://github.com/micromatch/braces/commit/74b2db2938fad48a2ea54a9c8bf27a37a62c350d) 3.0.3
- [88f1429](https://github.com/micromatch/braces/commit/88f1429a0f47e1dd3813de35211fc97ffda27f9e) update eslint. lint, fix unit tests.
- [415d660](https://github.com/micromatch/braces/commit/415d660c3002d1ab7e63dbf490c9851da80596ff) Snyk js braces 6838727 ([#40](https://github.com/micromatch/braces/issues/40))
- [190510f](https://github.com/micromatch/braces/commit/190510f79db1adf21d92798b0bb6fccc1f72c9d6) fix tests, skip 1 test in test/braces.expand
- [716eb9f](https://github.com/micromatch/braces/commit/716eb9f12d820b145a831ad678618731927e8856) readme bump
- [a5851e5](https://github.com/micromatch/braces/commit/a5851e57f45c3431a94d83fc565754bc10f5bbc3) Merge pull request [#37](https://github.com/micromatch/braces/issues/37) from coderaiser/fix/vulnerability
- [2092bd1](https://github.com/micromatch/braces/commit/2092bd1fb108d2c59bd62e243b70ad98db961538) feature: braces: add maxSymbols (https://github.com/micromatch/braces/issues/...
- [9f5b4cf](https://github.com/micromatch/braces/commit/9f5b4cf47329351bcb64287223ffb6ecc9a5e6d3) fix: vulnerability (https://security.snyk.io/vuln/SNYK-JS-BRACES-6838727)
- [98414f9](https://github.com/micromatch/braces/commit/98414f9f1fabe021736e26836d8306d5de747e0d) remove funding file
- [665ab5d](https://github.com/micromatch/braces/commit/665ab5d561c017a38ba7aafd92cc6655b91d8c14) update keepEscaping doc ([#27](https://github.com/micromatch/braces/issues/27))
- Additional commits viewable in [compare view](https://github.com/micromatch/braces/compare/3.0.2...3.0.3)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Commit 32911b2
-
[CLC][AMDGPU] Refactor fence helper to process order semantic explicitly (intel#12872)
This PR refactors the builtin fence helper macro for AMDGPU to take in and process the order semantic explicitly, because that is the only semantic argument accepted by the amdgcn builtin. Additionally, it makes the `None` (Monotonic) order semantic, which maps to C++/SYCL's `relaxed`, a no-op instead of falling back to the previous `acq_rel` default order.
Co-authored-by: Kenneth Benzie (Benie) <k.benzie83@gmail.com>
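The mapping described above can be illustrated with a host-side C++ analogy. This is only a sketch: the `OrderSemantic` enum and `issueFence` are hypothetical names, and `std::atomic_thread_fence` stands in for the amdgcn builtin, which this example does not call. It shows the `None` case returning without issuing a fence, while every other order is passed through explicitly.

```cpp
#include <atomic>

// Hypothetical order semantics; None corresponds to C++/SYCL relaxed.
enum class OrderSemantic {
  None,
  Acquire,
  Release,
  AcquireRelease,
  SequentiallyConsistent
};

// Issues a fence for the given order. Returns false when nothing was issued.
bool issueFence(OrderSemantic order) {
  switch (order) {
  case OrderSemantic::None:
    return false; // relaxed: treated as a no-op, no acq_rel fallback
  case OrderSemantic::Acquire:
    std::atomic_thread_fence(std::memory_order_acquire);
    return true;
  case OrderSemantic::Release:
    std::atomic_thread_fence(std::memory_order_release);
    return true;
  case OrderSemantic::AcquireRelease:
    std::atomic_thread_fence(std::memory_order_acq_rel);
    return true;
  case OrderSemantic::SequentiallyConsistent:
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return true;
  }
  return false;
}
```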
Commit 4acca90
-
Commit c2e5529
-
[SYCL][ESIMD][E2E] Fix rotate.cpp on Windows (intel#14152)
The rotate functions are technically C++20 and MSVC hasn't implemented them yet.
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Commit 4e41992
-
[Driver][SYCL][NewOffloadModel] Incorporate -device settings for GPU (intel#14151)
One of the models that is used for specifying the device architecture for spir64_gen is to use the -Xsycl-target-backend "-device arg" syntax on the command line. Hook up the ability to scan the target backend values to embed the proper information in the packaged binary when using the new offload model.
Commit f9fd95e
Commits on Jun 14, 2024
-
[SYCL][COMPAT] Add math extend_vcompare[2/4] to SYCLCompat (intel#14079)
This PR adds math `extend_vcompare[2/4]` operators (4 in total) along with unit tests for signed and unsigned int32 cases. Also, the unit tests from the previous `extend_v*4` intel#14078 and `extend_v*2` intel#13953 are moved to two different files.
Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com>
Co-authored-by: Joe Todd <joe.todd@codeplay.com>
Co-authored-by: Yihan Wang <yihan.wang@intel.com>
Commit 73cf85d
-
[UR][L0] Maintain Lock of Queue while syncing the Last Command Event (intel#14150)
pre-commit PR for oneapi-src/unified-runtime#1749
Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>
Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Commit 579484f
-
[SYCL][E2E] Use callable device selector in `FilterSelector` e2e tests (intel#14162)
Instead of using old device selector objects, use SYCL 2020 device selector callables to construct devices in `FilterSelector` e2e tests.
Commit 19052da
-
[SYCL][Graph] Update design doc for copy optimization and add test
- Update UR tag to include L0 command-buffer copy engine optimization
- Add test which mixes copy and kernel commands
- Update design doc to detail copy engine optimization
Commit 090c9aa
-
Update sycl/plugins/unified_runtime/CMakeLists.txt
Co-authored-by: Kenneth Benzie (Benie) <k.benzie83@gmail.com>
Commit 01b1582