[SYCL][Graph] Update design doc for copy queue #362

Open: wants to merge 37 commits into base branch sycl

Commits on Jun 11, 2024

  1. [Driver][SYCL][NewOffload] Fix duplication of device targets (intel#14143)
    
    When passing along multiple targets in the form of
    -fsycl-targets=intel_gpu_dg1,intel_gpu_pvc, the number of device
    compilations was n*n as opposed to just n. Due to how we were handling
    duplicate entries for toolchain generation, the different names used
    were considered unique even though they shared the same target triple
    (spir64_gen), causing the multiple entries.
    
    This is the second attempt to land this change; the
    sycl-offload-new-driver.c test was updated to reflect the ordering issues encountered.
    mdtoguchi authored Jun 11, 2024
    934b46f
  2. [New offload driver][Device lib] Add SYCL device library files for all targets (intel#14102)
    
    clang-linker-wrapper is not target-specific, i.e. it is not called per
    target device; it is called only once.
    Currently, clang-linker-wrapper is called only with device images with
    spir64 targets, so the existing approach of capturing the first target
    triple in the list of triples and using it to gather
    sycl-device-library files is valid. As we plan to add support for more
    targets (AOT), we need to gather sycl-device-libraries for all targets.
    This PR addresses this change.
    
    Also, the triple should not be passed to the linker wrapper. The linker
    wrapper should get the triples from device images.
    
    Thanks
    
    ---------
    
    Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
    asudarsa authored Jun 11, 2024
    6ecce4f
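
The n*n count described in intel#14143 arises when target entries that share a triple are treated as distinct toolchains. A minimal sketch of the intended behavior, assuming a simplified model in which toolchains are keyed by triple and each distinct target name is compiled once; `tripleFor`, `planCompilations`, and `countCompilations` are hypothetical helpers, not clang driver APIs, and the real fix may be structured differently:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical mapping from a -fsycl-targets entry to its target triple.
// Only the intel_gpu_* -> spir64_gen case is modelled here.
std::string tripleFor(const std::string &target) {
  if (target.rfind("intel_gpu_", 0) == 0)
    return "spir64_gen";
  return target; // assume the entry is already a triple
}

// triple -> list of target names compiled with that triple's toolchain
using CompilationPlan = std::map<std::string, std::vector<std::string>>;

// Group requested targets by triple: one toolchain per triple, and each
// distinct target name is compiled exactly once on that toolchain.
CompilationPlan planCompilations(const std::vector<std::string> &targets) {
  CompilationPlan plan;
  std::set<std::string> seen;
  for (const auto &t : targets) {
    if (!seen.insert(t).second)
      continue; // ignore repeated entries on the command line
    plan[tripleFor(t)].push_back(t);
  }
  return plan;
}

// Number of device compilations implied by a plan (n, not n*n).
std::size_t countCompilations(const CompilationPlan &plan) {
  std::size_t n = 0;
  for (const auto &entry : plan)
    n += entry.second.size();
  return n;
}
```

With `intel_gpu_dg1,intel_gpu_pvc` this yields one `spir64_gen` toolchain carrying two compilations.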
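
intel#14102 moves from "take the first triple on the list" to "gather device libraries for every triple found in the device images". A minimal sketch, assuming a hypothetical `DeviceImage` record that carries its triple and an illustrative `<triple>/<lib>` naming scheme (not the real device-library layout or the linker wrapper's code):

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a packaged device image; only the triple matters.
struct DeviceImage {
  std::string triple;
};

// Collect every distinct triple carried by the device images, rather than
// relying on a triple passed on the command line.
std::set<std::string> triplesFromImages(const std::vector<DeviceImage> &images) {
  std::set<std::string> triples;
  for (const auto &img : images)
    triples.insert(img.triple);
  return triples;
}

// Build the per-triple list of device libraries to link (naming is assumed).
std::vector<std::string> deviceLibsFor(const std::set<std::string> &triples,
                                       const std::vector<std::string> &libNames) {
  std::vector<std::string> out;
  for (const auto &triple : triples)
    for (const auto &lib : libNames)
      out.push_back(triple + "/" + lib);
  return out;
}
```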

Commits on Jun 12, 2024

  1. [SYCL] Enable CET for libsycl-devicelib-host.a (intel#14135)

    Signed-off-by: jinge90 <ge.jin@intel.com>
    jinge90 authored Jun 12, 2024
    c1b17e0
  2. [UR] Fix size confusion for several device property queries (intel#12488)
    
    For testing oneapi-src/unified-runtime#1282
    
    ---------
    
    Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
    al42and and kbenzie authored Jun 12, 2024
    c168f21
  3. [SYCL][COMPAT] Added non-const image2d_max and image3d_max getters (intel#14138)
    
    Required for specific use-cases in SYCLomatic.
    
    ---------
    
    Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
    Alcpz authored Jun 12, 2024
    bdeb0ef
  4. [SYCL][Graph] Update L0 aspect test (intel#14093)

    The `Graph/UnsupportedDevice/device_query.cpp` test asserts that L0
    devices will never have full graph support. This is not the case:
    depending on the L0 device and driver version, full graph support is
    possible.
    
    Update the test to stop asserting on this, as diving into these
    details is out of the scope of the test. This was previously decided
    when discussing how to check the OpenCL backend for similar possible
    variances in aspect support.
    EwanC authored Jun 12, 2024
    d7bc4fc
  5. [SYCL][E2E] Fix CUDA include and lib paths. (intel#14118)

    `cuda_dev_kit` is not set properly in
    [test-e2e/lit.cfg.py](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/lit.cfg.py)
    due to invalid CUDA paths.
    Fixing the paths showed errors in
    [14115](intel#14115) and
    [14116](intel#14116) which are XFAILed.
    The patch fixes the failure of
    [cuda_queue_priority.cpp](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Plugin/cuda_queue_priority.cpp)
    on Windows / CUDA.
    mmoadeli authored Jun 12, 2024
    a497788
  6. [UR] Bump main tag to 78d02039 (intel#12269)

    oneapi-src/unified-runtime#1128
    
    ---------
    
    Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
    aarongreig and kbenzie authored Jun 12, 2024
    7c530e1
  7. [SYCL][COMPAT] Add math extend_v*4 to SYCLCompat (intel#14078)

    This PR adds math `extend_v*4` operators (18 in total) along with
    unit-tests for signed and unsigned int32 cases.
    Some changes overlap with the previous `extend_v*2` PR intel#13953,
    which should therefore be reviewed and merged first.
    
    ---------
    
    Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com>
    Co-authored-by: Joe Todd <joe.todd@codeplay.com>
    Co-authored-by: Yihan Wang <yihan.wang@intel.com>
    4 people authored Jun 12, 2024
    da735fe
  8. [SYCL] Remove unneeded parameter from getOrInsertMemObjRecord (intel#13807)
    
    It was propagated to `getOrCreateAllocaForReq` when creating a new
    record, but then no commands are expected to be enqueued there since the
    first alloca for a record cannot exceed its leaf limit or be linked to
    another alloca.
    sergey-semenov authored Jun 12, 2024
    1460126
  9. [E2E] Modify commands to address running on Windows. (intel#13682)

    Running the test on Windows failed due to missing support for `ls`.
    Replacing `ls` with `cat` made the test pass on Windows.
    mmoadeli authored Jun 12, 2024
    bd33aaf
  10. [SYCL][E2E] Fix deprecated warnings in InorderQueue e2e tests (intel#14120)
    
    - `InorderQueue/in_order_get_property.cpp` -> Use non-deprecated
    `sycl::exception`, add check for errc to ensure we are still catching
    the correct exception
    - `InorderQueue/in_order_kernels.cpp` -> Use group `get_group_id`
    function instead of deprecated `get_id`
    - `InorderQueue/in_order_usm_implicit.cpp` -> Use queue `mem_advice`
    function that uses `int` instead of `pi_mem_advice`
    ayylol authored Jun 12, 2024
    56f6c24
  11. [UR] Update UR tag to include L0 loader related changes (intel#14109)

    Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
    againull and kbenzie authored Jun 12, 2024
    1a885ec
  12. [UR] Bump main tag to b13c5e1f (intel#14042)

    oneapi-src/unified-runtime#1711
    
    ---------
    
    Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
    hdelan and kbenzie authored Jun 12, 2024
    ae79b95
  13. [SYCL] Remove redundant code from L0 plugin's cmake file (intel#14108)

    Currently, the Level Zero plugin uses the loader and headers fetched by
    the Level Zero adapter (LevelZeroLoader-Headers, LevelZeroLoader targets).
    The currently downloaded loader code is not used; only the headers are
    used, for xpti.
    So, get the headers location from the LevelZeroLoader-Headers target
    instead and remove the unnecessary code.
    againull authored Jun 12, 2024
    87f47b4
  14. [SYCL] Add support for key/value sorting APIs (intel#13942)

    Add group_key_value_sorter sorters and sort_key_value_over_group APIs
    based on
    https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_group_sort.asciidoc
    extension.
    
    This PR was split out from larger PR:
    intel#13713
    
    Co-authored-by: Andrei Fedorov <andrey.fedorov@intel.com>
    Co-authored-by: Romanov Vlad <vlad.romanov@intel.com>
    againull authored Jun 12, 2024
    3910d0c
  15. [SYCL][NewOffload][E2E] add a single test for --offload-new-driver (intel#14129)
    
    A single basic file to compile and run to test functionality of
    --offload-new-driver
    
    ---------
    
    Co-authored-by: Marcos Maronas <maarquitos14@users.noreply.github.com>
    jasonlizhengjian and maarquitos14 authored Jun 12, 2024
    fe8c284
  16. [SYCL] [libdevice] Add vector overloads of ConvertBFloat16ToFINTEL and ConvertFToBFloat16INTEL (intel#14085)
    
    This PR adds vector overloads of `ConvertBFloat16ToFINTEL` and
    `ConvertFToBFloat16INTEL` to libdevice (SPEC:
    https://spec.oneapi.io/level-zero/latest/core/SPIRV.html#bfloat16-conversions)
    and a wrapper around it (`BF16VecToFloatVec` and `FloatVecToBF16Vec`) in
    `ext::oneapi::detail`.
    
    These overloads are intended to optimize BFloat16 `marray` and `vec`
    operations, for which we currently do element-by-element
    `bfloat16 -> float -> bfloat16` conversions.
    uditagarwal97 authored Jun 12, 2024
    13a7b3a
  17. [SYCL] Use std::array as storage for sycl::vec on device (intel#14130)
    
    Replaces intel#13270
    
    Changing the storage to std::array instead of Clang's vector extension
    fixes a strict ansi-aliasing violation and simplifies device code.
    uditagarwal97 authored Jun 12, 2024
    e7defab
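
The key/value sorters added in intel#13942 reorder values according to their keys. As a host-side illustration of that pairing semantics only (plain C++, no SYCL groups; `sortKeyValue` is a hypothetical name, not the `sort_key_value_over_group` API, and the stability shown here is an assumption the extension may not guarantee):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sort `keys` ascending and apply the same permutation to `values`, so each
// value stays paired with its key. std::stable_sort keeps equal keys in their
// original relative order.
template <typename K, typename V>
void sortKeyValue(std::vector<K> &keys, std::vector<V> &values) {
  assert(keys.size() == values.size());
  std::vector<std::size_t> order(keys.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

  // Gather keys and values through the computed permutation.
  std::vector<K> sortedKeys;
  std::vector<V> sortedValues;
  sortedKeys.reserve(keys.size());
  sortedValues.reserve(values.size());
  for (std::size_t i : order) {
    sortedKeys.push_back(keys[i]);
    sortedValues.push_back(values[i]);
  }
  keys.swap(sortedKeys);
  values.swap(sortedValues);
}
```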
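
intel#14085 lets whole `marray`/`vec` operands be converted in one call instead of element by element. The sketch below only illustrates the conversion semantics on the host, assuming bfloat16 is stored as the upper 16 bits of an IEEE-754 binary32 and rounded to nearest-even; `bf16VecToFloatVec`/`floatVecToBf16Vec` are hypothetical stand-ins for `BF16VecToFloatVec`/`FloatVecToBF16Vec`, and unlike the libdevice overloads this version still loops per element:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// bfloat16 -> float: place the 16 stored bits in the upper half of a binary32.
inline float bf16ToFloat(std::uint16_t b) {
  std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof f); // memcpy avoids aliasing violations
  return f;
}

// float -> bfloat16 with round-to-nearest-even on the 16 dropped bits.
// NaN inputs are not special-cased in this sketch.
inline std::uint16_t floatToBf16(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  const std::uint32_t lsb = (bits >> 16) & 1u;
  bits += 0x7FFFu + lsb;
  return static_cast<std::uint16_t>(bits >> 16);
}

template <std::size_t N>
std::array<float, N> bf16VecToFloatVec(const std::array<std::uint16_t, N> &src) {
  std::array<float, N> dst{};
  for (std::size_t i = 0; i < N; ++i)
    dst[i] = bf16ToFloat(src[i]);
  return dst;
}

template <std::size_t N>
std::array<std::uint16_t, N> floatVecToBf16Vec(const std::array<float, N> &src) {
  std::array<std::uint16_t, N> dst{};
  for (std::size_t i = 0; i < N; ++i)
    dst[i] = floatToBf16(src[i]);
  return dst;
}
```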

Commits on Jun 13, 2024

  1. [SYCL] Adding support for missing math ops (intel#14132)

    [SYCL] Adding support for missing math ops:
    - truncf
    - sinpif
    - rsqrtf
    - exp10f
    MaryaSharf authored Jun 13, 2024
    9942378
  2. [Doc] Document Unified Runtime update process (intel#14097)

    Add section to the contribution guide detailing the current process for
    integrating Unified Runtime updates into DPC++.
    kbenzie authored Jun 13, 2024
    e34b7ff
  3. [SYCL] Disable in-order queue barrier optimization while profiling (intel#14123)
    
    The current implementation of profiling info for NOP barriers is
    inconsistent with other events from the same queue (e.g., if the
    previous event started after the barrier was submitted). To make them
    consistent while keeping the optimization, we would need to duplicate
    the event on our side, then make the duplicate check and potentially
    use the profiling info of its previous event.
    
    Instead, as a first step, disable the NOP optimization during
    profiling, since profiling is known to incur a performance hit anyway.
    The proper duplicate-event approach can be implemented as a follow-up
    if this causes issues for users.
    
    Partially reverts intel#12949
    sergey-semenov authored Jun 13, 2024
    f2cd2a8
  4. [SYCL] Add atomic64 aspect decoration to atomic_ref<T *> (intel#14052)

    atomic_ref<T *> uses 64-bit atomics, so it should be decorated with the
    corresponding aspect.
    
    fixes: intel#12743
    maksimsab authored Jun 13, 2024
    da3b5df
  5. [SYCL] Clear cache in case of PI_ERROR_OUT_OF_HOST_MEMORY (intel#14119)

    I'm observing cache overflow when running heavy tests on the OpenCL
    backend with a GPU. Clear the cache on PI_ERROR_OUT_OF_HOST_MEMORY as
    well as on PI_ERROR_OUT_OF_RESOURCES.
    Used as reference: intel#11987
    KornevNikita authored Jun 13, 2024
    c342a78
  6. a5a36f8
  7. [GHA] Uplift Linux IGC Dev RT version to igc-dev-480f8b6 (intel#14155)

    Scheduled igc dev drivers uplift
    
    Co-authored-by: GitHub Actions <actions@github.com>
    bb-sycl and actions-user authored Jun 13, 2024
    957f762
  8. [SYCL] Fix FloatVecToBF16Vec build (intel#14161)

    The function uses `operator=` before it is defined, which can
    cause build failures:
    
    ```
    build/include/sycl/ext/oneapi/bfloat16.hpp:98:19: error: no match for ‘operator=’ (operand types are ‘sycl::_V1::ext::oneapi::bfloat16’ and ‘float’)
       98 |     dst[i] = src[i];
          |                   ^
    ```
    
    Moving it after the bfloat16 class definition fixes it.
    npmiller authored Jun 13, 2024
    8eff95c
  9. Bump braces from 3.0.2 to 3.0.3 in /mlir/utils/vscode (intel#14144)

    Bumps [braces](https://github.com/micromatch/braces) from 3.0.2 to
    3.0.3.
    Commits:
    - 74b2db2 3.0.3
    - 88f1429 update eslint. lint, fix unit tests.
    - 415d660 Snyk js braces 6838727 (micromatch/braces#40)
    - 190510f fix tests, skip 1 test in test/braces.expand
    - 716eb9f readme bump
    - a5851e5 Merge pull request micromatch/braces#37 from coderaiser/fix/vulnerability
    - 2092bd1 feature: braces: add maxSymbols (https://github.com/micromatch/braces/issues/...)
    - 9f5b4cf fix: vulnerability (https://security.snyk.io/vuln/SNYK-JS-BRACES-6838727)
    - 98414f9 remove funding file
    - 665ab5d update keepEscaping doc (micromatch/braces#27)
    - Additional commits viewable in compare view:
      https://github.com/micromatch/braces/compare/3.0.2...3.0.3
    
    Dependabot will resolve any conflicts with this PR as long as you don't
    alter it yourself. You can also trigger a rebase manually by commenting
    `@dependabot rebase`.
    
    ---
    
    Dependabot commands and options:
    
    You can trigger Dependabot actions by commenting on this PR:
    - `@dependabot rebase` will rebase this PR
    - `@dependabot recreate` will recreate this PR, overwriting any edits
    that have been made to it
    - `@dependabot merge` will merge this PR after your CI passes on it
    - `@dependabot squash and merge` will squash and merge this PR after
    your CI passes on it
    - `@dependabot cancel merge` will cancel a previously requested merge
    and block automerging
    - `@dependabot reopen` will reopen this PR if it is closed
    - `@dependabot close` will close this PR and stop Dependabot recreating
    it. You can achieve the same result by closing it manually
    - `@dependabot show <dependency name> ignore conditions` will show all
    of the ignore conditions of the specified dependency
    - `@dependabot ignore this major version` will close this PR and stop
    Dependabot creating any more for this major version (unless you reopen
    the PR or upgrade to it yourself)
    - `@dependabot ignore this minor version` will close this PR and stop
    Dependabot creating any more for this minor version (unless you reopen
    the PR or upgrade to it yourself)
    - `@dependabot ignore this dependency` will close this PR and stop
    Dependabot creating any more for this dependency (unless you reopen the
    PR or upgrade to it yourself)
    You can disable automated security fix PRs for this repo from the
    [Security Alerts page](https://github.com/intel/llvm/network/alerts).
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Jun 13, 2024
    32911b2
  10. [CLC][AMDGPU] Refactor fence helper to process order semantic explicitly (intel#12872)
    
    This PR refactors the builtin fence helper macro for AMDGPU to take in
    and process the order semantic explicitly because that is the only
    semantic argument accepted by the amdgcn builtin.
    
    Additionally, it makes the `None` (Monotonic) order semantic, which maps
    to C++/SYCL's `relaxed`, a no-op instead of falling back to the
    previous `acq_rel` default order.
    
    ---------
    
    Co-authored-by: Kenneth Benzie (Benie) <k.benzie83@gmail.com>
    GeorgeWeb and kbenzie authored Jun 13, 2024
    4acca90
  11. c2e5529
  12. [SYCL][ESIMD][E2E] Fix rotate.cpp on Windows (intel#14152)

    The rotate functions are technically C++20, and MSVC hasn't implemented
    them yet.
    
    Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
    sarnex authored Jun 13, 2024
    4e41992
  13. [Driver][SYCL][NewOffloadModel] Incorporate -device settings for GPU (intel#14151)
    
    One of the models that is used for specifying the device architecture
    for spir64_gen is to use the -Xsycl-target-backend "-device arg" syntax
    on the command line.
    
    Hook up the ability to scan the target backend values to embed the
    proper information in the packaged binary when using the new offload
    model.
    mdtoguchi authored Jun 13, 2024
    f9fd95e
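
For reference, the mathematical meaning of the four functions added in intel#14132, written as plain host C++. The `ref*` names are hypothetical; this is not how libdevice implements them, and the accuracy requirements of the real functions are not modelled:

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// truncf: round toward zero.
inline float refTruncf(float x) { return std::trunc(x); }

// sinpif(x) = sin(pi * x). Reducing x modulo 2 first keeps pi * x small,
// since sin(pi * x) has period 2 in x.
inline float refSinpif(float x) {
  double r = std::fmod(static_cast<double>(x), 2.0);
  return static_cast<float>(std::sin(kPi * r));
}

// rsqrtf(x) = 1 / sqrt(x).
inline float refRsqrtf(float x) { return 1.0f / std::sqrt(x); }

// exp10f(x) = 10^x.
inline float refExp10f(float x) { return std::pow(10.0f, x); }
```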
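
The policy change in intel#14123 can be summarized as a single predicate. A minimal sketch, assuming a hypothetical `QueueInfo` holding only the two properties that matter here and a barrier with no extra wait list; the real runtime checks more conditions before using a NOP barrier:

```cpp
#include <cassert>

// Hypothetical summary of the queue properties relevant to the decision.
struct QueueInfo {
  bool inOrder;
  bool profilingEnabled;
};

// An in-order queue already serializes its commands, so a barrier without
// extra dependencies (assumed here) can be a NOP. While profiling is
// enabled, the barrier is submitted for real so its timestamps stay
// consistent with the other events on the queue.
inline bool canUseNopBarrier(const QueueInfo &q) {
  return q.inOrder && !q.profilingEnabled;
}
```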
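
intel#14119 extends the clear-the-cache path so that out-of-host-memory is handled like out-of-resources. A minimal sketch of that pattern, assuming a hypothetical `Status` enum standing in for the PI error codes and a single retry after clearing (the retry count is an assumption):

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for PI_ERROR_OUT_OF_RESOURCES and
// PI_ERROR_OUT_OF_HOST_MEMORY.
enum class Status { Success, OutOfResources, OutOfHostMemory, OtherError };

// Both resource-exhaustion errors are treated the same way.
inline bool isMemoryPressure(Status s) {
  return s == Status::OutOfResources || s == Status::OutOfHostMemory;
}

// Run `op`; on a memory-pressure error, clear the cache once and retry.
// Any other error is returned unchanged without touching the cache.
inline Status runWithCacheRetry(const std::function<Status()> &op,
                                const std::function<void()> &clearCache) {
  Status s = op();
  if (isMemoryPressure(s)) {
    clearCache();
    s = op();
  }
  return s;
}
```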
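
After intel#12872 the AMDGPU fence helper dispatches on the memory order alone, and relaxed becomes a no-op. A portable C++ analogue, using `std::atomic_thread_fence` as a stand-in for the amdgcn fence builtin; the helper name is hypothetical, and memory scope is ignored:

```cpp
#include <atomic>
#include <cassert>

// Emit a fence matching `order`; returns whether a fence was emitted.
// Relaxed (LLVM "monotonic") is a no-op instead of falling back to acq_rel.
inline bool fenceForOrder(std::memory_order order) {
  switch (order) {
  case std::memory_order_relaxed:
    return false;
  case std::memory_order_acquire:
  case std::memory_order_release:
  case std::memory_order_acq_rel:
  case std::memory_order_seq_cst:
    std::atomic_thread_fence(order);
    return true;
  default:
    // memory_order_consume: conservatively treated as acquire (assumption).
    std::atomic_thread_fence(std::memory_order_acquire);
    return true;
  }
}
```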
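
intel#14152 avoids relying on the C++20 rotate functions in the test. One portable way to express a reference rotate without `<bit>` is sketched below; `rotateLeft`/`rotateRight` are hypothetical helpers, not the ESIMD API, and the shift count is reduced modulo the bit width:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Rotate an unsigned value left by s bits (s may be negative or >= width).
template <typename T>
constexpr T rotateLeft(T x, int s) {
  static_assert(std::numeric_limits<T>::is_integer &&
                    !std::numeric_limits<T>::is_signed,
                "unsigned integer types only");
  constexpr int N = std::numeric_limits<T>::digits;
  int r = s % N;
  if (r < 0)
    r += N;
  if (r == 0)
    return x; // avoid shifting by the full width, which is undefined
  // Narrow types are promoted to int here; the cast drops the overflow bits.
  return static_cast<T>((x << r) | (x >> (N - r)));
}

// Rotate right is rotate left by the negated count.
template <typename T>
constexpr T rotateRight(T x, int s) {
  return rotateLeft(x, -s);
}
```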

Commits on Jun 14, 2024

  1. [SYCL][COMPAT] Add math extend_vcompare[2/4] to SYCLCompat (intel#14079)

    This PR adds math `extend_vcompare[2/4]` operators (4 in total) along
    with unit-tests for signed and unsigned int32 cases.
    Also, the unit-tests from the previous `extend_v*4` (intel#14078) and
    `extend_v*2` (intel#13953) PRs are moved to two different files.
    
    ---------
    
    Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com>
    Co-authored-by: Joe Todd <joe.todd@codeplay.com>
    Co-authored-by: Yihan Wang <yihan.wang@intel.com>
    4 people authored Jun 14, 2024
    73cf85d
  2. [UR][L0] Maintain Lock of Queue while syncing the Last Command Event (intel#14150)
    
    pre-commit PR for
    oneapi-src/unified-runtime#1749
    
    ---------
    
    Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>
    Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
    nrspruit and kbenzie authored Jun 14, 2024
    579484f
  3. [SYCL][E2E] Use callable device selector in FilterSelector e2e tests (intel#14162)
    
    Instead of using old device selector objects, use SYCL 2020 device
    selector callables to construct devices in `FilterSelector` e2e tests.
    ayylol authored Jun 14, 2024
    19052da
  4. [SYCL][Graph] Update design doc for copy optimization and add test

    - Update UR tag to include L0 command-buffer copy engine optimization
    - Add test which mixes copy and kernel commands
    - Update design doc to detail copy engine optimization
    mfrancepillois authored and EwanC committed Jun 14, 2024
    090c9aa
  5. Update sycl/plugins/unified_runtime/CMakeLists.txt

    Co-authored-by: Kenneth Benzie (Benie) <k.benzie83@gmail.com>
    EwanC and kbenzie committed Jun 14, 2024
    01b1582
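
The design-doc update in 090c9aa covers an L0 command-buffer copy engine optimization, and its test mixes copy and kernel commands. A minimal sketch of the routing idea, assuming the optimization sends memory copies to a separate copy-engine command list and kernels to the compute list; `CmdKind`, `Cmd`, and `partitionByEngine` are hypothetical, and the cross-list synchronization a real implementation needs is only noted in a comment:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical kinds of commands recorded into a graph / command-buffer.
enum class CmdKind { Kernel, Copy };

struct Cmd {
  CmdKind kind;
  std::string name;
};

struct EngineLists {
  std::vector<Cmd> copy;    // would be submitted to a copy engine
  std::vector<Cmd> compute; // would be submitted to a compute engine
};

// Split recorded commands by engine, preserving recording order within each
// list. Dependencies between the lists (e.g. a kernel reading memory that a
// copy wrote) would still need explicit synchronization; not modelled here.
inline EngineLists partitionByEngine(const std::vector<Cmd> &cmds) {
  EngineLists lists;
  for (const auto &c : cmds)
    (c.kind == CmdKind::Copy ? lists.copy : lists.compute).push_back(c);
  return lists;
}
```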