add timeouts for tests #7657
This is plenty of time for a test to run: we shouldn't have tests that take more than 30 seconds, but the Windows / macOS runners tend to be slower than the Linux ones.
Apparently Python 3.11 on Windows is also pretty slow, so the timeouts show up there as well. In any case, here are the affected tests:
These are exactly the same tests that also time out after >300 s, so I'm guessing those are the ones that stall. Does anyone know why they take so long only on macOS (and, partially, Windows)? Does that have to do with the runner hardware?
Some of these files contain only 10 numbers, so the netCDF file should be ~50 kB or so (mostly header overhead).
I guess that means the CPUs of the Windows and macOS runners are just slow, or other tasks get prioritized, or something similar. All of this makes the tests pretty flaky, and I'm not sure what we can do about it. I don't have access to a mac, but maybe someone with one could try to reproduce this? In any case, I'll increase the timeout a bit again, since I think failing after ~1 hour is better than the CI job being cancelled after 6 hours.
since the …
Is it for a particular backend?
Not sure I understand. Can you elaborate, please?
I'm thinking these failures are caused by a bad interaction between an I/O backend and dask threads, but I'm having trouble figuring out which backend.
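One way to find out which backend call a stalled test is stuck in is to dump the stack of every thread once a deadline passes; the dump would show both the dask worker threads and the I/O call they are blocked on. Below is a minimal sketch using the standard library's `faulthandler.dump_traceback_later`. The `run_with_traceback_dump` helper and `stalled_io` function are hypothetical names for illustration; pytest also offers the same idea through its `faulthandler_timeout` ini option.

```python
import faulthandler
import sys
import time


def run_with_traceback_dump(func, seconds, file=sys.stderr):
    """Call ``func()``; if it is still running after ``seconds``, write the
    tracebacks of all threads to ``file`` (which needs a real fileno()).

    Hypothetical helper for illustration: the call is not interrupted,
    we only get a snapshot of where every thread was at the deadline.
    """
    faulthandler.dump_traceback_later(seconds, repeat=False, file=file)
    try:
        return func()
    finally:
        # Disarm the watchdog so a call that finished in time dumps nothing.
        faulthandler.cancel_dump_traceback_later()


def stalled_io():
    # Stand-in for a backend read that blocks longer than expected.
    time.sleep(0.5)
    return "done"
```

Calling `run_with_traceback_dump(stalled_io, 0.1)` still returns `"done"` after about half a second, but at the 0.1 s mark it writes every thread's stack to stderr, with `stalled_io` in the main thread's frames. In a real stall, those frames would point at the backend function the test is waiting on.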
Depends on the test, I guess. Most of them are related to one of the netCDF backends (not sure which; the tests don't specify that). I've also seen a drastic drop in performance with HDF5 1.12.2 (with both netcdf4 and h5netcdf) on one of my colleague's datasets, so maybe that's much more visible on a mac? That doesn't explain the slow … Do we isolate the dask scheduler in any way? I assume that makes use of the builtin (non-…
IIRC all distributed tests are in …
The default scheduler is …
The macOS 3.11 CI seems to stall very often at the moment, which makes it hit the 6-hour mark and get cancelled. Since our tests should never run that long (the Ubuntu runners usually take between 15 and 20 minutes), I'm introducing a pretty generous timeout of 5 minutes. By comparison, we already have a timeout of 60 seconds in the upstream-dev CI, but that runs on an Ubuntu runner, which is usually much faster than any of the macOS / Windows runners.

Tests that time out raise an error, which might help us figure out which test stalls, and whether we can do anything about it.
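The idea in the description, failing an individual test with an error instead of letting the whole job hang until it is cancelled, can be sketched with only the standard library. This is a minimal sketch under our own assumptions: `with_timeout` and `simulated_stalled_test` are hypothetical names, not part of pytest, the pytest-timeout plugin, or xarray's test suite.

```python
import concurrent.futures
import functools
import time


def with_timeout(seconds):
    """Decorator: raise TimeoutError if the wrapped call runs longer than ``seconds``.

    Hypothetical helper for illustration only. Python threads cannot be
    killed, so a stalled call keeps running in its worker thread after
    the error has been raised.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            future = executor.submit(func, *args, **kwargs)
            # Wait at most `seconds` for the call to finish.
            done, _ = concurrent.futures.wait([future], timeout=seconds)
            # Don't block here on a worker that may still be stuck.
            executor.shutdown(wait=False)
            if not done:
                raise TimeoutError(f"{func.__name__} took longer than {seconds} s")
            # Returns the result, or re-raises the function's own exception.
            return future.result()
        return wrapper
    return decorator


@with_timeout(0.1)
def simulated_stalled_test():
    # Stand-in for a test that hangs (e.g. on a stuck I/O call).
    time.sleep(0.5)
```

Calling `simulated_stalled_test()` raises `TimeoutError` after roughly 0.1 s and names the function that stalled, which is the kind of signal the timeout is meant to give; the leftover worker thread still finishes its sleep in the background, which a real plugin has to handle more forcefully.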