-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Package Sync #235
Package Sync #235
Conversation
6e357d3
to
61effde
Compare
It seems that something did break: |
software environments, TEST_UPSTREAM and COILED_SOFTWARE_VERSION environment variables are now gone, COILED_RUNTIME_VERSION is the only controlling environment variable, which can take a version number, "upstream", or "latest"
61effde
to
b1c4c36
Compare
I'm happy with the state of package sync and can give a solid thumbs up to merging this! |
Just pushed an empty commit to rerun CI (we were getting AWS limit errors in the most recent CI run) |
We're getting
in this CI build cc @shughes-uk |
Having to boot my up my x86 macbook for the first time in months to solve this one, not quite sure what's going on and couldn't replicate on my arm macbook (using a slightly stripped down version of the env export from CI) |
The source seems to be a pin on pyarrow 8. With that pin the macos environment generated can't solve properly for linux64, not sure which dep, looking now |
Got it working! 🥂 |
Woo! |
897531f
to
d0ad7f1
Compare
I'm suspicious that's some kind of transient network error. I'll have to ensure they get printed out properly as It looks like it got eaten |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be all green, though I would note that there was a solid amount of mamba/conda/mysterious flakiness that makes me a bit nervous.
.github/workflows/tests.yml
Outdated
@@ -446,7 +315,7 @@ jobs: | |||
static-site: | |||
needs: process-results | |||
# Always generate the site, as this can be skipped even if an indirect dependency fails (like a test run) | |||
if: always() && github.ref == 'refs/heads/main' && github.repository == 'coiled/coiled-runtime' | |||
if: always() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed things so we always generate the static site and upload it as an artifact. So if there is a regression detected, it's easier to download the artifacts to inspect (cf #251)
I'll add some changes so I can see the flakyness serverside and hopefully push fixes fast. Yell at me in the mean time |
Thanks for the empty commit @ian-r-rose -- I think sanity checks like that are a good thing to do at this point. @shughes-uk
is currently popping up |
Interesting. That package was brand new yesterday. I think you might have had the bad luck to have tried things exactly after release of the package, when it was propagated to the CDN edge node serving CI but not the one used by the cluster. If i'm right re-running now should work |
I was not correct, looks like an issue dealing with the way the version gets parsed! |
uses |
This reverts commit 5d13ab9.
Not sure if this matters but I noticed that in the package sync list, we have two dask versions installed. Not sure if that matters/affects but mentioning it for visibility.
|
Good eye -- I think @shughes-uk pushed a fix for that yesterday afternoon, are you still seeing it? |
I'm doing a census of the failures and reported regressions for this branch during the last week. I'm deliberately removing regressions related to cluster startup time, as those are expected. Regressions,
|
My top observation of the above is that the package sync PR has many more detected regressions than the ones on
Many (but not all!) of these are parquet-related. So one thing we should look at pretty closely is whether there is something different going on in the solves for boto, pyarrow, or s3fs |
040dbfc
to
d46ff90
Compare
worker_disk_size=64, | ||
worker_vm_types=["t3.large"], | ||
scheduler_vm_types=["t3.large"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hendrikmakait in #280 you seemed okay with making the instances here a bit beefier. Is that still the case?
@pytest.mark.stability | ||
@pytest.mark.parametrize("keep_around", (True, False)) | ||
def test_spilling(spill_client, keep_around): | ||
arr = da.random.random((64, 2**27)).persist() # 64 GiB, ~2x memory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hendrikmakait same question here.
Remove duplicated code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks all for working on this! I believe there's consensus that we want to move forward with the changes here. Two things to note:
- There are questions around a couple of spill-related tests which @hendrikmakait has been pinged on. @ian-r-rose said he thinks @hendrikmakait is okay with the those test changes. @hendrikmakait we're going to move forward for the time being, but let us know if you have any reservations about those changes and we can easily revert.
- This PR is starting running the
upstream
build as part of our normal test suite on PRs. This is nice because it's more automatically run / no longer an opt-in and also allows us to clean up some code. This isn't nice because it's possible we'll see more unrelated test failures on PRs due to upstream changes indask
+distributed
. We've decided to try outupstream
as it's own build on PRs and see what that feels like. If things are too noisy, we'll move back to opt-in only on PRs.
Pushing #222 a bit further to see if anything breaks. This
TEST_UPSTREAM
has been folded intoCOILED_RUNTIME_VERSION
, which can be a version number, "upstream", or "latest", andCOILED_SOFTWARE_VERSION
is no longer necessary in apackage_sync
universe.