Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use package sync #222

Closed
wants to merge 1 commit into from
Closed

Use package sync #222

wants to merge 1 commit into from

Conversation

shughes-uk
Copy link
Contributor

@shughes-uk shughes-uk commented Aug 2, 2022

Lets see if this works!

Note: I'm mostly seeing if this works for CI. Although it might be nice to use it?

@ian-r-rose
Copy link
Contributor

Thanks @shughes-uk! This, I think, will result in an enormous simplification to the CI setup here. Does package_sync work for things installed via git+https://...? If so, I think that will allow us to delete a couple hundred lines of yaml.

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it would be great if we could only have to worry about what's installed locally on the CI machine and let automatic package syncing take care of the rest 👍

@jrbourbeau
Copy link
Member

In this CI build we're seeing the following environment-related error:

E               coiled.errors.ClusterCreationError: Cluster status is error (reason: Scheduler Stopped -> Errors installing conda packages:
E               package pywin32-303-py37h89c1867_0 requires python_abi 3.7.* *_cp37m, but none of the providers can be installed) (cluster_id: 45747)

Though I'm confused why Python 3.7 is involved at all as this is supposed to be running on Python 3.9

@shughes-uk
Copy link
Contributor Author

Haven't tested this at all on windows, what I think is happening is it's trying to install a windows specific package (pywin32), but only an extremely stale python 3.7 based build exists for linux which mamba is trying to use. The fact it exists at all for linux is somewhat surprising (who is building pywin32 for linux??)

@shughes-uk
Copy link
Contributor Author

Apart from the windows errors it looks like all the rest are related to not being robust for 502's in a couple of places, which I'll push a fix for.

It seems like y'all would be very keen for this to work, are proper git dependencies a blocker for that?

@jrbourbeau
Copy link
Member

I believe so. We need to be able to install the current main branch of dask and distributed from GitHub

@ian-r-rose
Copy link
Contributor

It seems like y'all would be very keen for this to work, are proper git dependencies a blocker for that?

From my perspective, git dependencies will be very nice to have for local A/B testing, but package_sync already useful as it is today. So, as far as prioritization goes, I might say medium?

@jrbourbeau
Copy link
Member

Wait, actually we're using nightly dask and distributed releases on conda. So installing directly from GitHub isn't a blocker. @shughes-uk does package_sync handle detecting different conda channels?

@shughes-uk
Copy link
Contributor Author

shughes-uk commented Aug 3, 2022

Wait, actually we're using nightly dask and distributed releases on conda. So installing directly from GitHub isn't a blocker. @shughes-uk does package_sync handle detecting different conda channels?

Yes, as long as it's not a private conda channel It should work. Private one's may even work but I have not tested them at all.

@jrbourbeau
Copy link
Member

Okay, good to know. We're only installed from public channels

@shughes-uk
Copy link
Contributor Author

Working on those macos/windows issues. I have a fix in a MR that should hopefully make it to prod tomorrow

@shughes-uk
Copy link
Contributor Author

shughes-uk commented Aug 4, 2022

Windows/macos is passing! The cleanup steps fail because I commented out the senv creation step.

@ian-r-rose
Copy link
Contributor

Windows/macos is passing! The cleanup steps fail because I commented out the senv creation step.

Woo! Nothing gives me so much joy as deleting yaml!

@shughes-uk shughes-uk force-pushed the shughes/package-sync-as-default branch from 7166e1a to 344837f Compare August 4, 2022 22:54
@shughes-uk
Copy link
Contributor Author

Woo! Nothing gives me so much joy as deleting yaml!

This has been a great test of package sync, now it's working on linux, macos and windows! Any thoughts? Should this be merged?

@ian-r-rose
Copy link
Contributor

Any thoughts? Should this be merged?

I think @jrbourbeau and I can take it over, if you don't mind -- we'll probably want to update some of the pytest fixtures here that do some reasoning about the software environment version, in addition to deleting most of the setup/cleanup work.

@gjoseph92
Copy link
Contributor

I just used this branch locally to run a test #212 (comment) because I absolutely could not be bothered to deal with software environments ever again and it worked fine. Some rather silly mismatches, but no problems AFAICT.

The mismatches are concerning, because it's the sort of thing that'll work until it doesn't. Not being able to be confident that it's deterministic or reproducible will always make me hesitant. But it's not like the senvs currently built in this repo are either, and this is so so so much better to work with.

@ian-r-rose
Copy link
Contributor

(But yes, this should be merged)

@shughes-uk
Copy link
Contributor Author

shughes-uk commented Aug 4, 2022

I just used this branch locally to run a test #212 (comment) because I absolutely could not be bothered to deal with software environments ever again and it worked fine. Some rather silly mismatches, but no problems AFAICT.

The mismatches are concerning, because it's the sort of thing that'll work until it doesn't. Not being able to be confident that it's deterministic or reproducible will always make me hesitant. But it's not like the senvs currently built in this repo are either, and this is so so so much better to work with.

The mismatches that are "missing" packages are mostly cosmetic and due to a couple of bugs I smoothed out in staging (prod tomorrow hopefully). The actual version mismatches, are there any packages you'd consider 'critical' and should never mismatch? So far the list is dask, distributed , msgpack, pandas

I'm also going to add a strict mode which means it would only ever replicate an environment exactly (this would fail for everything except a linux machine creating a cluster)

@shughes-uk
Copy link
Contributor Author

Any thoughts? Should this be merged?

I think @jrbourbeau and I can take it over, if you don't mind -- we'll probably want to update some of the pytest fixtures here that do some reasoning about the software environment version, in addition to deleting most of the setup/cleanup work.

Happy to let you have at it!

@gjoseph92
Copy link
Contributor

are there any packages you'd consider 'critical' and should never mismatch?

msgpack had a patch version mismatch in https://cloud.coiled.io/dask-engineering/clusters/46264/details. cloudpickle, pyarrow, etc. obviously belong on that list. But I also really don't want us to have to maintain a list of "critical packages".

@shughes-uk
Copy link
Contributor Author

are there any packages you'd consider 'critical' and should never mismatch?

msgpack had a patch version mismatch in https://cloud.coiled.io/dask-engineering/clusters/46264/details. cloudpickle, pyarrow, etc. obviously belong on that list. But I also really don't want us to have to maintain a list of "critical packages".

Ah nice catch, upgraded msgpack to critical on prod. I'm not sure there's any way around it sadly.

@gjoseph92
Copy link
Contributor

@ian-r-rose @jrbourbeau could we merge this? I'm adding some more benchmarks and would love to not have to deal with software environments.

@ian-r-rose
Copy link
Contributor

I almost certainly need to update some of the benchmarking-related pytest fixtures before this can be merged, I haven't had the chance to do that yet

@ian-r-rose
Copy link
Contributor

ian-r-rose commented Aug 9, 2022

Specifically, these functions (not fixtures themselves, but used in fixtures): https://github.com/coiled/coiled-runtime/blob/0a3eab5e476b9432b0b24d4317af2522bbcb0319/conftest.py#L66-L100

@jrbourbeau
Copy link
Member

@gjoseph92, @ian-r-rose @shughes-uk and I spoke offline last Friday about this PR. We're all excited about the prospect of package_sync (as you mentioned, it will makes our lives better) but didn't quite have enough signal that package_sync was 100% the direction coiled is moving in and would be around for a while. We want to avoid ripping out a bunch of software environment code and then having to put it back if package_sync goes away. Though, to be clear, if package_sync is here to stay then I would love to rip out the software environment code.

@shughes-uk we were planning to check back in on this topic this Friday after package_sync had gone through more testing. Are we still in the same place as last Friday, or do we have more information related to package_sync being the way of the future?

@ian-r-rose ian-r-rose mentioned this pull request Aug 9, 2022
@gjoseph92
Copy link
Contributor

didn't quite have enough signal that package_sync was 100% the direction coiled is moving in and would be around for a while. We want to avoid ripping out a bunch of software environment code and then having to put it back

Though I understand that the package_sync interface and implementation may change, I would be very disappointed to hear that having no form of environment synchronization is something we're actually considering. Whether it happens through package_sync, lockfiles, or some other mechanism we haven't come up with yet, I would very much hope that we agree that however we do it, managing software environments as explicit artifacts will be a thing of the past, or at least that there will be an option besides it. So in that case, dropping the software environment code seems like a safe bet, even if we eventually have to change the spelling of package_sync to something else.

@jrbourbeau
Copy link
Member

Certainly understand your concern @gjoseph92

even if we eventually have to change the spelling of package_sync to something else.

I'm totally fine with changing the spelling of package_sync or some other small interface changes. I'm just wanting more clarity on whether or not environment syncing (regardless of the keyword we choose) is in fact something we generally intend to support. I know folks are excited about it (myself included) but I've yet to hear a definitive answer. My understanding is we're still gathering more data / testing package_sync at the moment. If that understanding is outdated, I'd love to be updated to whatever the current state is.

FWIW it looks like @ian-r-rose is pushing forward over in #235 so we can see what impact switching to package_sync would have on the runtime

@shughes-uk
Copy link
Contributor Author

@gjoseph92 don't sweat it, getting there slowly. I know the current system is painful but hold on just a little longer

@jrbourbeau
Copy link
Member

@shughes-uk mind if we close this in favor of #235?

@shughes-uk
Copy link
Contributor Author

@shughes-uk mind if we close this in favor of #235?

all yours!

@jrbourbeau jrbourbeau closed this Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants