Feature Idea: pixi env
"conda workflow"
#1610
Comments
One thought: […]
My initial thought is to not reinvent the wheel. From my perspective the useful part would be `pixi create-external-env -p destination-prefix -e pixi-environment-name` and then let the user use […]
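(Not part of the original comment: a rough sketch of how the command named above might be used. The command and its flags exist only as this commenter's proposal, and the prefix path and activation step are guesses for illustration only.)

```shell
# proposed (non-existent) command: materialize a pixi environment
# named "forge" into an external prefix of the user's choosing
pixi create-external-env -p ~/envs/forge -e forge

# ...and then let existing tooling manage/activate that prefix,
# e.g. conda can activate an environment given its full path
conda activate ~/envs/forge
```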
@maresb from the single command I'm not getting enough information to get the gist of your proposal. We're not looking to reinvent the wheel here; it's more a matter of adding another wheel to the well-oiled machine that is pixi. It sounds like what you are looking for might be […]
EDIT2: I don't like what I wrote, and I think @synapticarbors is saying something very similar and articulates it much better, completely and simply.

Previous comment: Sorry that I didn't communicate clearly, and thanks @ruben-arts for pushing me to articulate this better. Here is my opinionated thought process, for whatever it's worth. As a TL;DR, I think I'm proposing something similar to but distinct from #1427. Feel free to mark my comments as off-topic if appropriate to keep this discussion focused, and if there's something to discuss I could open a new issue.
I know this proposal is preliminary, but the implication here seems to me to be that you're thinking about some sort of partial reimplementation, which seems perilous and raises alarms from my admittedly-often-very-wrong software design senses.
I'm wondering why users might want this. Maybe users don't have the time to learn a new interface and transition from conda. I finally found the time, and now I'm sold. But it still feels like something's missing, and so perhaps it's sufficient to just address that?...
I totally agree, and this vision was a major selling point for me. But environments I create in pixi are bound to a particular project, and often it's nice to have some general-purpose user-scoped environments. I'm suggesting: instead of adding some huge amount of functionality to the […]

Thus I'm pondering a situation where we could have pixi-managed environments that I could […]

(Honestly I arrived at this use case from the motivation of using […])

I just realized that an alternative hacky way of mostly accomplishing my stated goal is to point `MAMBA_ROOT_PREFIX` at a pixi project's `.pixi` directory:

```shell
mkdir ~/repos/user-environments
cd ~/repos/user-environments
pixi init

# describe a general-purpose "forge" environment in the project manifest
cat <<EOF >> pixi.toml
[feature.forge.dependencies]
conda-smithy = "*"
conda-build = "*"
boa = "*" # still behind the times here

[environments]
forge = ["forge"]
EOF

pixi install -e forge

# pixi keeps its environments under .pixi/envs, which micromamba can
# treat as its root prefix, so the env activates like any other
export MAMBA_ROOT_PREFIX=~/repos/user-environments/.pixi
micromamba activate forge
```

In summary, I'm thinking it would be nice to have a non-hacky way to install pixi environments (not just create an environment spec as per #1427) to a particular prefix (perhaps […]).

EDIT: I coincidentally ran across #188, which seems to overlap a lot with what I'm saying here.
As a quick minimal solution, could we expose the configuration option […]
@ruben-arts -- Thank you for introducing this. I'm definitely one of those users that would benefit from an alternative workflow. In thinking about what you've proposed, you said this is not going to be "a moved pixi project". I think this may be a missed opportunity and result in something more limited than "pixi with a conda workflow". What I would want as a user of both pixi and conda is something that is just pixi with a global project/env registry, so the workflow would look something like: […]
If […]. My ideal here would be to have a very lightweight […]
@synapticarbors Thanks for your comment! It sounds like you would just like to add a map from a name to a project location. Example of that map: […]

Then adding a […]

Resulting in the mapping file including: […]

allowing for […] (see the sketch below)
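(A speculative sketch of the pieces described above, not taken from the original comment: the subcommand, flags, and config key below do not exist in pixi today; the `registered-projects` key is borrowed from the proposal later in this thread, and the paths and names are made up.)

```shell
# hypothetical: register a project's manifest under a short name,
# which would append a name -> path entry to the mapping file, e.g.
#   registered-projects = { my_env = "/home/me/my_env" }
pixi project register /home/me/my_env --name my_env

# hypothetical: then use that named environment from any directory
pixi shell -n my_env
pixi run -n my_env python -V
```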
What I'm afraid of is that pixi projects are designed with many more features than a conda environment. Think tasks, multi-environment setups, system-requirements. All these features don't fit into the conda workflow and might spook users or influence the expected behavior. That all said, this could be a first step and possibly enough for most users.
I would also like this.
I don't really think that's an issue -- the point of this is to support workflows that conda supported and pixi hasn't really supported, while bringing pixi's unique features to those kinds of workflows. Anyone who wants extremely conda-like behavior can just use conda, mamba, or micromamba, or a new minimalistic rattler client could be built for them.
@ruben-arts I'm sure you've talked to more people about this, but just based on various discussions here and on Discord, I think what most people want when they say a "conda-like workflow" is just pixi with an optional global registry of environments like conda. What you were suggesting with:

```shell
# create project in global registry
$ pixi init my_env -g

# activate env in global registry (with optional -e flag)
$ pixi shell -n my_env
```

would satisfy this as long as things worked as expected if you were to use further […]
I think another way would be to allow simply naming the manifest files. For example, when I run […]
While I understand the desire to have a familiar workflow, I think it is an inherently broken one that shouldn't be supported. Having environments that can be enabled anywhere encourages re-using these environments between unrelated projects, which will inevitably lead to issues where changes made for one project break another. This in turn means that there will be no useful documented computational environment for either project, breaking one of the main advantages of pixi IMO. To avoid this issue you need one environment per project, at which point you can just use normal pixi projects. What I do see some value in is having a singular global environment that is always available, to install packages into that are always needed and not tied to a single project (e.g. git, git-annex and DataLad in my case). This already exists with […]
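(A side note, not from the original comment: pixi already ships a `pixi global install` command for exactly this kind of always-available tooling; the package names below are just the examples the commenter mentions, and exact behavior may differ between pixi versions.)

```shell
# each tool goes into its own isolated, globally exposed environment
pixi global install git
pixi global install git-annex
pixi global install datalad
```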
It's quite often that you would have multiple environments that are usable for a project, or multiple related projects in different directories. Also, you might be working on a project and one of its dependencies at the same time, in which case it'd be useful to share an environment between them. There are many more examples of use cases where you would want more than one project to share the same environment. Lastly, small "playgrounds" where you just need to write some code and test it against some dependencies, to understand how the dependencies work, are also useful, and it's not worth creating a new environment just to test a single dependency, especially if you're planning to use the information about your dependencies' behavior or API in an existing project.
I'll share my thoughts as someone who has been using Conda for ~6 years in computational science (specifically, bioinformatics). It seems to me that the project/directory-centric workflow works great for software development, where I agree that reusing the same environment across multiple projects is beneficial. However, in bioinformatics, we typically don't work that way (though I'm using "we" loosely here, as this is based on personal experience rather than broad consensus). We often create environments to install the software required to run specific pipelines and to avoid conflicts between different pipelines or even different steps within the same pipeline. Forcing users to always run pipeline X in the same directory might be too restrictive since pipeline X could be executed on very different datasets across distinct scientific projects. I also wonder how something like Snakemake would work within Pixi's current paradigm, considering it creates Conda environments on demand for different steps of a workflow, all within a single directory as the workflow runs (or at least that's my understanding of it). It would be great to hear from others in the BioConda community, as they might have different perspectives (and might even disagree with me).
Pixi supports multiple environments per project already.
Again, I think that this is never a good idea because you will never be able to name the exact environment you have used in your project without making it exclusive to the project. All cases that I can think of in which someone might want this fall into one of two categories:
I don't think so. If one project depends on the other in the sense that it needs the other installed as software, then it should just be declared as a dependency. (Granted, I don't know if it is possible with pixi to install a dependency from another local project. Probably only as PyPI dependencies, currently.) If the other project is otherwise free-standing, then I would want it to have its own environment. If both depend on each other then they might as well be the same project anyway.
That's what I have a […]
If you plan to use this testing in an existing project, then just do it in that project. This is what branches are for, no need to set up a completely separate test project/environment.
How do you tie it all together when your pipeline has produced a result and you want to reference it e.g. in a publication? You would have to reference the exact version of the pipeline, the pipeline's environment(s), and the data. In my opinion this is only really feasible when bringing all three together into one project, e.g. with DataLad to store pipeline and data, and pixi (or nix, or guix, or another similar tool) to describe the environment. Users shouldn't be forced to always run pipeline X in the same directory. They need to create a new project for their intended data processing, install the pipeline and the necessary dependencies into it, and then run the pipeline on their data. I would see some value in templating a pixi project with a common set of dependencies for this purpose, but in the end they must be distinct projects so that updating pipeline X and its environment does not break the provenance of the results processed from some data using a previous version of that pipeline. This is just my opinion though, from the somewhat idealistic viewpoint of making (scientific) results traceable to their origin and giving them enough accompanying metadata to make it even remotely feasible to understand how they were conceived and what issues there might have been (or not) in the process.
@matrss -- While I appreciate that you have a clear idea of what a scientific workflow should look like, it's mildly off-putting to say that other workflows are broken or wrong. I think the goal here is to make the transition to pixi easier for the large community of conda users who have a particular workflow that differs from what pixi currently offers. I'm guessing a lot of people try pixi, find it doesn't work with their workflow, and then return to just using conda/mamba. It's better to give them an option, and then when they stick around, maybe they will adopt the project-based workflow.
I didn't say that other workflows are wrong, I said that I think the workflow induced by conda/mamba environments is fundamentally flawed and shouldn't be encouraged by supporting it, and I also explained why I think that way. Yes, sure, it might make the transition to pixi a bit simpler for some, but is that worth diluting the major benefit of using pixi over conda/mamba, instead of communicating that benefit and encouraging taking advantage of it? Yes, giving an option might get people to use pixi and adopt a project-based workflow later on, but it also makes it simpler to not re-consider a flawed workflow and stick with the familiar, instead of adopting something much better.
There are still many reasons why one would not want to use the workflow that you claim is "much better". Meanwhile, […]
If you use Snakemake, for instance, the environment files for each step in the pipeline will be there. Even if you don't, it's not that hard to generate files describing your exact environment. I don't disagree with you that […]
Hi all, thanks for all your feedback. This is a great discussion, but I would like to make it a little more concrete. Could you share your use case in a user-story format, e.g. "As a [persona], I [want to], [so that]"? This would help us greatly, thanks in advance.
Maybe, but I also didn't say that pixi's project-based workflow is always better than everything else. I said that in every instance in which someone might think they want to use conda-style environments, they would be better served by using pixi projects. So, there might be reasons why one would not want to use what I called "much better", but I don't see any situation in which conda-style environments would then be a better pick than pixi's project-based workflow. If you can name a situation that doesn't fit into one of the two categories I mentioned then please do so; I genuinely can't think of any.
I am not sure how you would want to take advantage of pixi build without associating the build environment with the thing to be built.
True enough, yes.
Yes, but IMO […]
Fair.
Implementing conda-style environments in pixi will necessarily weaken this point, since the CLI to interact with them must be similar to conda's CLI, otherwise you wouldn't get the benefit of familiarity.
OK so I don't have much experience with snakemake, but my understanding of snakemake is that the environment files are part of a snakemake pipeline definition, which resides in a directory. So they are part of the project that defines the pipeline, which in turn will be referenced in a specific version to apply the pipeline to something (either by directly running it in that project, or including it in a different project's snakemake workflow). I don't see why the snakemake pipeline couldn't use a pixi project to define these environments (utilizing pixi's multi environment feature), other than that snakemake does not yet support pixi in that way. But it would need to be extended to support pixi's hypothetical conda-style environments as well, so that is not a downside, just some necessary integration work. What it does with environment files is already just a more brittle version of using a pixi project to define these environments. IMO it is not necessary to weaken pixi's benefits to support this use case, it can be implemented with what pixi has already.
That's the thing, I don't think it can have benefits in any context and instead think that conda-like environments are an anti-pattern that encourage bad practices, especially in scientific research. I would be happy to be proven wrong though.
As someone involved in research data management and wanting to make research more traceable and reproducible (improving mainly the last two aspects of the "FAIR" principles), I want to encourage people to publish artifacts with their research that fully encapsulate the process that they followed from raw data or even data acquisition to the results they wrote about, so that anyone is able to go from e.g. a plot in a paper back through the exact processes used to generate it (this includes the entire computational environment, all pipelines, etc.) and arrive at the data from which one started. I see pixi's currently implemented project-based workflow as one potential piece of this puzzle, to clearly define the computational environment(s) used. Conda-style environments are fundamentally not compatible with this vision, unless they are used in a way that simply emulates a project-based workflow. I have been recommending pixi over conda/mamba to colleagues for this reason, and if pixi implemented support for conda-style environments this would make it harder for me, since I would always have to add the disclaimer "but don't use these conda-style environments, they will inevitably break reproducibility if you don't limit yourself to a single project per environment, at which point you can just use the easier to work with project-based workflow".
I think this exposes a fundamental schism between software engineering and scientific development, which IME is exploratory, REPL-driven development, NOT project-based development. It is simultaneously true that: […]
As a quant analyst who has moved more into development I tend to use more (pixi) project-based development in an IDE such as PyCharm or VSCode, but when I want to test things out I use Jupyter notebooks in my kitchen-sink environment. There are different stages of development, and when you just want to test some Python code out against your latest models you don't want the overhead associated with creating an arbitrary "project" just to run some code. If I want to run arbitrary code against arbitrary dependencies I have a […]. I only create "projects" when I want to actually develop/deploy something and make it usable by others. This is a different phase of development from the exploratory analysis phase. It's not a phase most scientists are familiar with, so it's important to make it easy to transition to a more robust/reproducible project-based workflow, and I think I'm (one of) the biggest […]
I've been using conda for 10 years and like it. Thank you all for making the conda package ecosystem better and better and now creating pixi! I would love to adopt pixi as a drop-in replacement and just use pixi in the next years instead of a pixi / conda / pip combination. Similar to what some others said here, I do have the use case of having personal "dev" and "play" conda envs, and I also sometimes use a conda "base" env just for generic Python scripts or one-offs where I don't create a project and manifest files. So I'm +1 to extend pixi to allow creating / managing / deleting "global envs" just like conda does. Why not keep using conda? Well, I will for now. But pixi would give me (a) better speed, (b) easier bootstrapping, and (c) consistency with the pixi workflow and specs I'm happy to adopt for projects where I can convince collaborators - usually long-time conda users like me. I made one concrete related suggestion here: #1877
As one of the developers of PyMC, we encourage users to set up Conda environments to ensure that beginners can easily set up the required C compilers and BLAS. We end up dealing with a large number of beginners asking for help with broken Conda environments. I would like to add pixi to our installation instructions since I believe it's a major step up from other tools and would be much simpler for everyone in the long term. (I believe that pixi's isolated project-based config would fix most causes of broken environments.) However, many of these beginners are copy-pasting instructions without understanding what a Conda environment actually is. Some of these people will encounter problems and then run any command they can find in an attempt to "just fix it" without understanding what's broken. While […]

In order to add pixi to the PyMC installation instructions I must convince skeptical co-maintainers that the support and maintenance efforts are worthwhile. If pixi adds a […]

Thanks everyone for the great discussion. I appreciate the focus on the user-story approach.
After going over this issue with the core team, we've got a specific proposal based on the comment by @synapticarbors.

Create a new project and register it immediately: […]

Register an existing project, with an optional name that you could give the project, otherwise defaulting to the name in the manifest: […]

This would add the following field to your configuration:

```toml
registered-projects = { project_a = "/path/to/project_a", project_b = "/path/to/project_b", optional_name = "/path/to/project_b" }
```

Then you can run the following commands: […]

Basically adding the […]. I hope this fits most use cases while staying true to the pixi workflow.
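(For illustration only: the concrete commands behind the proposal were not preserved in this copy of the thread. The sketch below models them on the `-g`/`-n` flags floated earlier; the `pixi project register` subcommand and the exact flag names are guesses, not existing pixi options.)

```shell
# create a new project and register it in the global config immediately
pixi init my_env -g

# register an existing project, optionally under a different name
pixi project register /path/to/project_b --name optional_name

# then, from any directory, use a registered project by name
pixi shell -n project_a
pixi run -n optional_name some-task
```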
It sounds great to me & I'm keen to test it out!
I am not sure whether this question belongs in this repo, but I would like to ask how this feature would interact with a pixi Jupyter kernel?
Awesome! Thank you to the pixi team for taking the time to consider this.
I just write my own kernelspecs. For […]:

```json
{
  "argv": [
    "pixi", "run", "-n", "my-env",
    "python", "-m", "ipykernel_launcher", "-f", "{connection_file}"
  ],
  "display_name": "Python (my-env)",
  "language": "python",
  "env": {
    "A_VARIABLE": "a_value"
  },
  "metadata": {
    "debugger": true
  }
}
```
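(Not part of the original comment: Jupyter typically discovers such a spec once it is saved as `kernel.json` inside a kernelspec directory, e.g. `~/.local/share/jupyter/kernels/my-env/` on Linux; `jupyter kernelspec list` shows what was picked up.)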
Couldn't this issue be resolved by adding a […]
@ruben-arts's proposal (#1610 (comment)) seems a lot cleaner -- you just make environments like in a normal pixi project, and then you give them names which you can use from elsewhere, rather than using […]
Here's my take on it:

```shell
activate() {
    # default to the dev310 project when no environment name is given
    env="${1:-dev310}"
    # flush the current shell history before spawning the pixi sub-shell
    history -a
    pixi shell --manifest-path "/opt/python/envs/${env}/pixi.toml"
}
```
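(A usage note, not from the original comment: calling `activate` with no argument drops into the `dev310` manifest, while e.g. `activate forge` would open a shell for `/opt/python/envs/forge/pixi.toml`, assuming such a project exists.)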
But the same is true for […]
EDIT: This initial proposal is outdated because it's already clear this is not what the community wants. Please follow the thread, as there is no clear conclusion yet!
One of the features I'm most often asked about is a "conda drop-in replacement". Note that this would not be a move against conda but a request I would like to fulfill for the users.
This issue describes a feature that gives a more conda-like feel to the pixi environments.
Workflow example (an illustrative sketch of these steps follows below)

- Create an environment
- Activation
- Add packages
- Remove packages
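(The command snippets for these four steps did not survive in this copy of the issue. Purely to illustrate the intent, and assuming nothing about the final CLI, a conda-style session under the proposed `pixi env` namespace might look roughly like this; every subcommand below is hypothetical.)

```shell
# create an environment by name, outside any project directory
pixi env create -n test-env python=3.12

# activate it from anywhere
pixi env activate test-env

# add packages
pixi env add -n test-env numpy

# remove packages
pixi env remove -n test-env numpy
```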
In the background

- Environments live in `.pixi/envs`.
- The `pixi.toml` file is written to the `.pixi/envs/test-env/conda-meta/` folder.
- The `pixi.lock` file is written after the solve to the `.pixi/envs/test-env/conda-meta/` folder.
What it should be

- `conda` is replaced with `pixi env`.
- Use `uv`, like pixi projects do, instead of calling `pip` for the `pypi` dependencies.

What it is not going to be

- `conda`'s CLI.
- It will use pixi's `config.toml` and not conda's `.condarc`. Mixing those two configuration files is going to be a mess. We're much more likely to add a converter between these files than to support both in one tool.
- Project features like `tasks`, `features`, `system-requirements`, etc.

P.S. This is a start of the design, tracking the interest, and a place to discuss the design. After the design is finalized we can start building, and other contributors are free to help.