Add skip to flaky MacOS RPC test #9753

Merged
merged 5 commits into from
Jan 6, 2022


driazati
Member

@driazati driazati commented Dec 16, 2021

Skip for #9824

cc @areusch

@KJlaccHoeUM9l
Contributor

Hello @driazati!

Thank you for your comment! Our team also noticed this problem.
It looks like this error first occurred when running this action in PR #9483.

The failure occurs in the test that verifies that the auto scheduler can be used with an RPC session. The error is raised when the test checks the tuning log after a tuning session: at that stage the log contains an error code whose description reads:

Errors happen when compiling code on device
(e.g. OpenCL JIT on the device)

The problem does not appear to lie in the test as a whole or in the iOS RPC application.

So far we have not been able to reproduce the problem locally to find out more about this failure. If you have any additional information or can reproduce the problem yourself, we would be very grateful for any further observations.

@driazati
Member Author

I wasn't able to reproduce it locally either, and it doesn't fail in CI very often (maybe 1 out of every 20 recent runs on main). I don't know the autotuning/RPC code well enough to guess where the problem might be, but it still comes up from time to time, so I think it should be disabled until a proper repro/fix is found. It makes more sense to mark this as a flaky failure with xfail rather than skip, though, so it will still run but won't be reported as an unexpected error if it fails.
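
For readers less familiar with the two pytest markers being compared, here is a minimal sketch of the difference. The test names and the reason string are hypothetical and are not taken from the TVM test suite; the exact marker arguments used in this PR are not shown in the conversation.

```python
import pytest

# Hypothetical reason string, for illustration only.
FLAKY_REASON = "Flaky in macOS CI, see the linked issue"


@pytest.mark.skip(reason=FLAKY_REASON)
def test_rpc_skipped():
    # Never executed: a skipped test gives no signal about how often it fails.
    raise AssertionError("unreachable while the skip marker is present")


@pytest.mark.xfail(strict=False, reason=FLAKY_REASON)
def test_rpc_flaky():
    # Still executed on every run. A failure is reported as XFAIL and a
    # pass as XPASS; with strict=False neither outcome fails the suite.
    assert True
```

With `strict=False`, the flaky test keeps running in CI, so its pass/fail history stays visible, while an occasional failure no longer turns the run red.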

@KJlaccHoeUM9l
Contributor

Hello @driazati!
We have only been able to reproduce this issue in the Azure cloud.
To do this, we slightly modified main.yaml as follows:

for ((i=1; i < 100; i++)); do python -m pytest -vrP tests/python/contrib/test_rpc_server_device.py; done

It looks like the failure happens on this line:

func = remote.load_module(os.path.split(build_res.filename)[1])

The problem requires further investigation.
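
The repetition loop above can also be sketched in Python, which makes it easy to tally how many runs fail instead of reading through the output. This is only a sketch: `count_failures` is a hypothetical helper, not part of TVM or its CI scripts.

```python
import subprocess
import sys


def count_failures(cmd, runs):
    """Run `cmd` `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        # check=False: a failing run is counted, not raised as an exception.
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            failures += 1
    return failures
```

Calling `count_failures([sys.executable, "-m", "pytest", "-vrP", "tests/python/contrib/test_rpc_server_device.py"], 99)` would mirror the shell loop, which iterates from 1 to 99, and return the number of failing runs.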

@areusch
Contributor

areusch commented Jan 3, 2022

thanks @driazati can you create a GH issue for this test or mention it in this PR?

@areusch
Contributor

areusch commented Jan 6, 2022

sorry one q: has anyone repro'd this on linux? if it hasn't failed there, i'd lean towards disabling it only on windows/os x for now. the GH actions can be retriggered and don't necessarily block commit.

@KJlaccHoeUM9l could you let us know which os you were using in azure?

@areusch
Contributor

areusch commented Jan 6, 2022

oh sorry--ignore. i misread the test decorators.

@areusch areusch merged commit 33724bb into apache:main Jan 6, 2022
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
* Add skip to flaky MacOS RPC test

* Use flaky marker instead

* link issue

* trigger ci

* trigger ci

Co-authored-by: driazati <driazati@users.noreply.github.com>
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
3 participants