Failure in Graph E2E tests on unrelated changes on Windows #11852

Open
dm-vodopyanov opened this issue Nov 10, 2023 · 6 comments
Labels
bug Something isn't working sycl-graph

Comments

@dm-vodopyanov
Contributor

dm-vodopyanov commented Nov 10, 2023

https://github.com/intel/llvm/actions/runs/6813588700/job/18545304133?pr=11837

Looks flaky.

FAIL: SYCL :: Graph/RecordReplay/multiple_exec_graphs.cpp (875 of 1756)
******************** TEST 'SYCL :: Graph/RecordReplay/multiple_exec_graphs.cpp' FAILED ********************
Exit Code: 3221226505

Command Output (stdout):
--
# RUN: at line 2
D:/github/actions-runner/_work/llvm/llvm/install/bin/clang++.exe   -fsycl -fsycl-targets=spir64 D:\github\actions-runner\_work\llvm\llvm\llvm\sycl\test-e2e\Graph\RecordReplay\multiple_exec_graphs.cpp -o D:\github\actions-runner\_work\llvm\llvm\build-e2e\Graph\RecordReplay\Output\multiple_exec_graphs.cpp.tmp.out
# executed command: D:/github/actions-runner/_work/llvm/llvm/install/bin/clang++.exe -fsycl -fsycl-targets=spir64 'D:\github\actions-runner\_work\llvm\llvm\llvm\sycl\test-e2e\Graph\RecordReplay\multiple_exec_graphs.cpp' -o 'D:\github\actions-runner\_work\llvm\llvm\build-e2e\Graph\RecordReplay\Output\multiple_exec_graphs.cpp.tmp.out'
# RUN: at line 3
env ONEAPI_DEVICE_SELECTOR=ext_oneapi_level_zero:gpu  D:\github\actions-runner\_work\llvm\llvm\build-e2e\Graph\RecordReplay\Output\multiple_exec_graphs.cpp.tmp.out
# executed command: env ONEAPI_DEVICE_SELECTOR=ext_oneapi_level_zero:gpu 'D:\github\actions-runner\_work\llvm\llvm\build-e2e\Graph\RecordReplay\Output\multiple_exec_graphs.cpp.tmp.out'
# .---command stdout------------
# | Unexpected value at index 1016 for DataA: 1017 (got) vs 1022 (expected)
# `-----------------------------
# .---command stderr------------
# | Assertion failed: check_value(i, ReferenceA[i], DataA[i], "DataA"), file D:\github\actions-runner\_work\llvm\llvm\llvm\sycl\test-e2e\Graph\RecordReplay\../Inputs/multiple_exec_graphs.cpp, line 58
# `-----------------------------
# error: command failed with exit status: 0xc0000409
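
For reference, the decimal `Exit Code: 3221226505` and the hex `exit status: 0xc0000409` in the log above are the same 32-bit value. On Windows, 0xC0000409 is the NTSTATUS code STATUS_STACK_BUFFER_OVERRUN, which is also the status reported when a process terminates through the fast-fail mechanism, as the MSVC runtime's abort() typically does after a failed assert. The sketch below only shows the conversion; the helper names `to_status` and `to_hex` are made up for illustration and are not part of the test suite.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: reinterpret a process exit code as a 32-bit
// NTSTATUS-style value (lit prints the decimal form as "Exit Code").
constexpr std::uint32_t to_status(std::uint64_t exit_code) {
  return static_cast<std::uint32_t>(exit_code);
}

// Hypothetical helper: format the status as lowercase hex, matching the
// "exit status: 0x..." line in the lit output.
inline std::string to_hex(std::uint32_t status) {
  char buf[16];
  std::snprintf(buf, sizeof(buf), "0x%08x", static_cast<unsigned>(status));
  return std::string(buf);
}

// The two numbers reported in the log are the same value.
static_assert(to_status(3221226505ULL) == 0xC0000409u,
              "decimal exit code and hex exit status should match");
```

Read this way, the exit code on its own probably indicates only that the process aborted; it does not by itself show that a stack buffer was actually overrun.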
@dm-vodopyanov
Contributor Author

Tagging @intel/sycl-graphs-reviewers

@EwanC
Contributor

EwanC commented Nov 13, 2023

Noting that a failure in another graph test on Windows, with the same exit code, was spotted in post-commit CI: https://github.com/intel/llvm/actions/runs/6830226522/job/18578894197

FAIL: SYCL :: Graph/Explicit/basic_usm_mixed.cpp (800 of 1733)
******************** TEST 'SYCL :: Graph/Explicit/basic_usm_mixed.cpp' FAILED ********************
Exit Code: 3221226505

@EwanC
Contributor

EwanC commented Jan 31, 2024

Another data point: the CI run for PR #12524 (https://github.com/intel/llvm/actions/runs/7710761184/job/21016979144?pr=12524) failed with the same error.

FAIL: SYCL :: Graph/Explicit/basic_usm_mixed.cpp (865 of 1853)
******************** TEST 'SYCL :: Graph/Explicit/basic_usm_mixed.cpp' FAILED ********************
Exit Code: 3221226505

@EwanC
Contributor

EwanC commented Jan 31, 2024

Also https://github.com/intel/llvm/actions/runs/7723004272/job/21053409769

FAIL: SYCL :: Graph/RecordReplay/queue_constructor_usm.cpp (982 of 1853)
******************** TEST 'SYCL :: Graph/RecordReplay/queue_constructor_usm.cpp' FAILED ********************
Exit Code: 3221226505

@EwanC
Contributor

EwanC commented Jan 31, 2024

Fail in https://github.com/intel/llvm/actions/runs/7724843393/job/21059943006?pr=12297

FAIL: SYCL :: Graph/RecordReplay/after_use.cpp (927 of 1855)
******************** TEST 'SYCL :: Graph/RecordReplay/after_use.cpp' FAILED ********************
Exit Code: 3221226505

Command Output (stdout):
--
# RUN: at line 1
D:/github/_work/llvm/llvm/install/bin/clang++.exe   -fsycl -fsycl-targets=spir64 D:\github\_work\llvm\llvm\llvm\sycl\test-e2e\Graph\RecordReplay\after_use.cpp -o D:\github\_work\llvm\llvm\build-e2e\Graph\RecordReplay\Output\after_use.cpp.tmp.out
# executed command: D:/github/_work/llvm/llvm/install/bin/clang++.exe -fsycl -fsycl-targets=spir64 'D:\github\_work\llvm\llvm\llvm\sycl\test-e2e\Graph\RecordReplay\after_use.cpp' -o 'D:\github\_work\llvm\llvm\build-e2e\Graph\RecordReplay\Output\after_use.cpp.tmp.out'
# RUN: at line 2
env ONEAPI_DEVICE_SELECTOR=level_zero:gpu  D:\github\_work\llvm\llvm\build-e2e\Graph\RecordReplay\Output\after_use.cpp.tmp.out
# executed command: env ONEAPI_DEVICE_SELECTOR=level_zero:gpu 'D:\github\_work\llvm\llvm\build-e2e\Graph\RecordReplay\Output\after_use.cpp.tmp.out'
# .---command stdout------------
# | Unexpected value at index 416 for DataA: 417 (got) vs 422 (expected)
# `-----------------------------
# .---command stderr------------
# | Assertion failed: check_value(i, ReferenceA[i], DataA[i], "DataA"), file D:/github/_work/llvm/llvm/llvm/sycl/test-e2e/Graph/RecordReplay/after_use.cpp, line 71
# `-----------------------------
# error: command failed with exit status: 0xc0000409
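
The stdout line and the failed assertion in this log both come from a call of the form `check_value(i, ReferenceA[i], DataA[i], "DataA")`. The actual helper is not shown in this thread, so the following is only a minimal stand-in, written under the assumption that it compares an expected value with the value read back and prints the message format seen above.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical stand-in for the test suite's check_value helper.
// Argument order follows the assertion text in the log:
// (index, expected/reference value, value read back, buffer name).
template <typename T>
bool check_value(std::size_t index, const T &expected, const T &got,
                 const std::string &name) {
  if (got != expected) {
    // Same wording as the stdout line captured by lit.
    std::cout << "Unexpected value at index " << index << " for " << name
              << ": " << got << " (got) vs " << expected << " (expected)\n";
    return false;
  }
  return true;
}
```

Calling `check_value<int>(416, 422, 417, "DataA")` with the numbers from this log reproduces the reported message and returns false. In both logs quoted in this issue the value read back is exactly 5 below the reference (1017 vs 1022, 417 vs 422); the thread does not establish why.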

--

********************

@EwanC changed the title from "Failure in SYCL :: Graph/RecordReplay/multiple_exec_graphs.cpp on unrelated changes on Windows" to "Failure in Graph E2E tests on unrelated changes on Windows" Jan 31, 2024
EwanC added a commit to reble/llvm that referenced this issue May 30, 2024
Several of the SYCL-Graph E2E tests occasionally fail in
CI for unrelated PRs. Disable all the tests which have
been documented as failing on Windows at any point.

See related GitHub Issues:

* intel#13951
* intel#12941
* intel#11852
sommerlukas pushed a commit that referenced this issue May 30, 2024
Several of the SYCL-Graph E2E tests occasionally fail in CI on Windows
for unrelated PRs. We can replicate this locally with a lot of effort,
but have not yet been able to diagnose the root cause or find a fix.
Disable all the tests which have been documented as sporadically failing
on Windows at any point.

See related GitHub Issues:

* #13951
* #12941
* #11852
@AlexeySachkov
Contributor

AlexeySachkov commented Jun 6, 2024

SYCL :: Graph/Explicit/add_nodes_after_finalize.cpp seems to be another flaky graph test; it failed in https://github.com/intel/llvm/actions/runs/9371141725/job/25803459105?pr=13985 :

FAIL: SYCL :: Graph/Explicit/add_nodes_after_finalize.cpp (981 of 2065)
******************** TEST 'SYCL :: Graph/Explicit/add_nodes_after_finalize.cpp' FAILED ********************
Exit Code: 3221226505

Command Output (stdout):
--
# RUN: at line 1
D:/github/_work/llvm/llvm/install/bin/clang++.exe   -fsycl -fsycl-targets=spir64 D:\github\_work\llvm\llvm\llvm\sycl\test-e2e\Graph\Explicit\add_nodes_after_finalize.cpp -o D:\github\_work\llvm\llvm\build-e2e\Graph\Explicit\Output\add_nodes_after_finalize.cpp.tmp.out
# executed command: D:/github/_work/llvm/llvm/install/bin/clang++.exe -fsycl -fsycl-targets=spir64 'D:\github\_work\llvm\llvm\llvm\sycl\test-e2e\Graph\Explicit\add_nodes_after_finalize.cpp' -o 'D:\github\_work\llvm\llvm\build-e2e\Graph\Explicit\Output\add_nodes_after_finalize.cpp.tmp.out'
# RUN: at line 2
env ONEAPI_DEVICE_SELECTOR=level_zero:gpu  D:\github\_work\llvm\llvm\build-e2e\Graph\Explicit\Output\add_nodes_after_finalize.cpp.tmp.out
# executed command: env ONEAPI_DEVICE_SELECTOR=level_zero:gpu 'D:\github\_work\llvm\llvm\build-e2e\Graph\Explicit\Output\add_nodes_after_finalize.cpp.tmp.out'
# RUN: at line 4
env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0 env UR_L0_LEAKS_DEBUG=1 SYCL_ENABLE_DEFAULT_CONTEXTS=0 env ONEAPI_DEVICE_SELECTOR=level_zero:gpu  D:\github\_work\llvm\llvm\build-e2e\Graph\Explicit\Output\add_nodes_after_finalize.cpp.tmp.out 2>&1 | d:\github\_work\llvm\llvm\install\bin\filecheck.exe D:\github\_work\llvm\llvm\llvm\sycl\test-e2e\Graph\Explicit\add_nodes_after_finalize.cpp --implicit-check-not=LEAK
# executed command: env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0 env UR_L0_LEAKS_DEBUG=1 SYCL_ENABLE_DEFAULT_CONTEXTS=0 env ONEAPI_DEVICE_SELECTOR=level_zero:gpu 'D:\github\_work\llvm\llvm\build-e2e\Graph\Explicit\Output\add_nodes_after_finalize.cpp.tmp.out'
# note: command had no output on stdout or stderr
# error: command failed with exit status: 0xc0000409
# executed command: 'd:\github\_work\llvm\llvm\install\bin\filecheck.exe' 'D:\github\_work\llvm\llvm\llvm\sycl\test-e2e\Graph\Explicit\add_nodes_after_finalize.cpp' --implicit-check-not=LEAK

It passed after a job restart (no changes were made to the PR; the restart was triggered through the UI): https://github.com/intel/llvm/actions/runs/9371141725/job/25840050618?pr=13985

EwanC added a commit to reble/llvm that referenced this issue Jun 6, 2024
`test-e2e/Graph/Explicit/add_nodes_after_finalize.cpp` has
been reported as failing on an unrelated PR on Windows - intel#11852 (comment)

Disable this test in line with how other flaky graphs tests
have been disabled on Windows intel#13966

The `RecordReplay` equivalent of this Explicit test is already disabled
on Windows.
EwanC added a commit to reble/llvm that referenced this issue Jun 10, 2024
steffenlarsen pushed a commit that referenced this issue Jun 10, 2024
`test-e2e/Graph/Explicit/add_nodes_after_finalize.cpp` has been reported
as failing on an unrelated PR on Windows -
#11852 (comment)

Disable this test in line with how other flaky graphs tests have been
disabled on Windows in #13966

The `RecordReplay` equivalent of this Explicit test is already disabled
on Windows.
ianayl pushed a commit to ianayl/sycl that referenced this issue Jun 13, 2024
`test-e2e/Graph/Explicit/add_nodes_after_finalize.cpp` has been reported
as failing on an unrelated PR on Windows -
intel#11852 (comment)

Disable this test in line with how other flaky graphs tests have been
disabled on Windows in intel#13966

The `RecordReplay` equivalent of this Explicit test is already disabled
on Windows.
@EwanC EwanC assigned fabiomestre and unassigned mfrancepillois Jul 8, 2024
EwanC added a commit to reble/llvm that referenced this issue Jul 8, 2024
Improve tracking of disabled E2E tests by providing
a link to the GitHub issue where the bug is tracked, or explicitly
mentioning that a skip is intended, i.e. due to a known limitation
external to the SYCL RT.

See the [OpenCL
section](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/CommandGraph.md#opencl)
of the SYCL-Graph design doc as to why a large number of tests
are intended skips on the OpenCL backend.

Unsupported SYCL-Graph tests currently fall into 3 categories:
* [Flaky Windows tests](intel#11852)
* [Arc fails](intel#14474)
* [host-task leaks when using with L0 immediate
  command-lists](intel#14473)
sommerlukas pushed a commit that referenced this issue Jul 10, 2024
Improve tracking of disabled E2E tests by providing a link to GitHub
issue where the bug is tracked, or explicitly mentioning that a skip is
intended, i.e. due to a known limitation external to the SYCL RT.

See the [OpenCL
section](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/CommandGraph.md#limitations)
of the SYCL-Graph design doc as to why a large number of tests are
intended skips on the OpenCL backend.

Unsupported SYCL-Graph tests currently fall into 3 categories:
* [Flaky Windows tests](#11852)
* [Arc fails](#14474)
* [host-task leaks when using with L0 immediate
command-lists](#14473)
8 participants