[ci] Run Linux OpenCL tests against POCL instead of the AMD App SDK #5282

Merged: 4 commits, Jun 13, 2022

jgiannuzzi
Contributor

As discussed in #5252 (comment), I think that using PoCL for the Linux gpu tests would make a lot of sense, as it is open source and actively maintained, as opposed to the AMD App SDK.

The dask tests are currently disabled for the gpu task, but in my local testing they do succeed with PoCL. Should we maybe (re-)enable them?

@StrikerRUS
Collaborator

The dask tests are currently disabled for the gpu task, but in my local testing they do succeed with PoCL. Should we maybe (re-)enable them?

Here is the reason why they are disabled: #3708 (comment). I think we shouldn't enable them in this PR, so as not to delay #5252. In a follow-up PR we'll definitely try to re-enable them and run them multiple times to ensure their stability.

@StrikerRUS
Collaborator

How about Windows, is PoCL available there?

@StrikerRUS
Collaborator

StrikerRUS commented Jun 12, 2022

Could you please update GPU Targets Table and OpenCL SDK Installation with PoCL in this PR as well?

Collaborator

@StrikerRUS StrikerRUS left a comment

Thank you so much for this PR!

Just some minor comments below.

.ci/setup.sh (resolved)
.ci/setup.sh (outdated, resolved)
.vsts-ci.yml (outdated, resolved)
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
@jgiannuzzi
Contributor Author

How about Windows, is PoCL available there?

It's available, but unfortunately not as an ICD, which means that we wouldn't be able to test the integrated OpenCL builds, and that the other OpenCL builds would not test something representative.
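
For context on the ICD point: on Linux, the Khronos ICD loader typically discovers OpenCL implementations through small `.icd` files, conventionally under /etc/OpenCL/vendors, each naming one driver library. The following is a minimal sketch, assuming that default directory, that lists what is registered on a machine. The helper name `list_opencl_icds` is made up for illustration and is not part of LightGBM or PoCL.

```python
from pathlib import Path


def list_opencl_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd_file_name: library_name_or_path} for each *.icd file found.

    Assumption: the conventional Linux location is used; a given system
    may have its ICD loader configured differently.
    """
    directory = Path(vendors_dir)
    if not directory.is_dir():
        # No registration directory means no ICDs are visible this way.
        return {}
    registrations = {}
    for icd_file in sorted(directory.glob("*.icd")):
        # Each .icd file holds the name or path of one driver library.
        registrations[icd_file.name] = icd_file.read_text().strip()
    return registrations


if __name__ == "__main__":
    for name, library in list_opencl_icds().items():
        print(f"{name}: {library}")
```

When an implementation is installed as an ICD, an entry for it would be expected to show up in this listing; the exact file and library names depend on how it was packaged.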

@StrikerRUS
Collaborator

It's available, but unfortunately not as an ICD,

OK, got it! Thank you for the explanation.

@jgiannuzzi
Contributor Author

Could you please update GPU Targets Table and OpenCL SDK Installation with PoCL in this PR as well?

I have updated the GPU Targets Table. I haven't updated the Windows docs, as I don't think we can recommend using PoCL on that platform.

@StrikerRUS
Collaborator

I haven't updated the Windows docs, as I don't think we can recommend using PoCL on that platform.

Given your previous comment #5282 (comment), this sounds reasonable. Thank you!

Collaborator

@StrikerRUS StrikerRUS left a comment

Awesome! Thanks so much!

@StrikerRUS StrikerRUS changed the title from "Run Linux OpenCL tests against POCL instead of the AMD App SDK" to "[ci] Run Linux OpenCL tests against POCL instead of the AMD App SDK" Jun 13, 2022
@StrikerRUS StrikerRUS merged commit 6b89651 into microsoft:master Jun 13, 2022
@jameslamb
Collaborator

Love this change, thanks very much for all the hard work @jgiannuzzi !

@StrikerRUS
Collaborator

The dask tests are currently disabled for the gpu task, but in my local testing they do succeed with PoCL. Should we maybe (re-)enable them?

Here is the reason why they are disabled: #3708 (comment). I think we shouldn't enable them in this PR, so as not to delay #5252. In a follow-up PR we'll definitely try to re-enable them and run them multiple times to ensure their stability.

Unfortunately, with PoCL the Dask tests fail with the error "OSError: dlopen: cannot load any more object with static TLS" on both Ubuntu 14.04 and 20.04: #5285 (comment)

@jgiannuzzi
Contributor Author

Unfortunately, with PoCL the Dask tests fail with the error "OSError: dlopen: cannot load any more object with static TLS" on both Ubuntu 14.04 and 20.04: #5285 (comment)

My understanding is that the dask tests only fail with this error on Ubuntu 14.04 with Python 3.10. I ran them locally with Python 3.8 and 3.9 and they work fine. Do we maybe want to set PYTHON_VERSION again on the Linux gpu_source job?

On Ubuntu 20.04, the tests seem to fail because of a timeout. Enabling those tests on a fake GPU is expected to take more time. Maybe setting timeoutInMinutes like we do for the aarch64 job would help?

I'm happy to create a PR for this if you'd like.

@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023