Use mamba under a feature flag to create conda environments #6815

Merged: humitos merged 4 commits into master from humitos/use-mamba-for-conda-environment on Dec 14, 2020

humitos (Member) commented Mar 24, 2020

mamba is a fast drop-in replacement for the conda command-line utility, written in C++. See https://github.com/QuantStack/mamba

I'm adding a feature flag so we can test it on selected projects that keep failing with out-of-memory (OOM) errors while solving dependencies. This happens even when they have just one dependency, as long as they add conda-forge as a channel in their environment file.

This is another attempt to make conda environments more stable. I'm not sold on this solution, but the tests I did were successful and build time was cut in half (conda env create compared to mamba env create). Peak memory was 230 MB with mamba and 955 MB with conda-env.

The changes add a new step (install mamba) that requires conda-forge, which takes a few extra seconds. We could install mamba directly in the Docker image (after installing conda) if we find that it helps considerably with conda environments.
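
As a rough sketch of what the extra step looks like, the commands could be assembled as below. The `conda install` arguments are the ones that appear in this PR; the function name, the `use_mamba` parameter, and the `env create` arguments are assumptions made for illustration, not the actual Read the Docs code.

```python
def build_conda_commands(use_mamba, env_name, env_file):
    """Return the commands (as argv lists) needed to create the environment.

    Hypothetical helper: names and structure are illustrative only.
    """
    commands = []
    if use_mamba:
        # mamba is not available yet, so it has to be installed with conda
        # itself, from conda-forge, into the base environment.
        commands.append([
            'conda', 'install',
            '--yes',
            '--quiet',
            '--name=base',
            '--channel=conda-forge',
            'mamba',
        ])
    # Assumption: the same "env create" arguments work for both tools.
    tool = 'mamba' if use_mamba else 'conda'
    commands.append([tool, 'env', 'create', '--name', env_name, '--file', env_file])
    return commands
```

With the flag off, only the usual `conda env create` command is returned, so projects that do not opt in see no change.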

Here is a good explanation of the memory- and CPU-intensive problems when using conda: https://www.anaconda.com/understanding-and-improving-condas-performance/

There is a MiniMamba version as well that we can install, https://quantstack.net/mamba.html

@humitos humitos requested a review from a team March 24, 2020 15:54
@humitos humitos force-pushed the humitos/use-mamba-for-conda-environment branch from e1592ec to 1a7d11b Compare March 24, 2020 19:35
humitos (Member Author) commented Mar 24, 2020

> Peak memory was 230 MB with mamba and 955 MB with conda-env

(that was with a small project installing just one package from conda-forge)

I did a local test using geopandas (https://readthedocs.org/projects/geopandas/), which has a bigger environment.yml file:

  • mamba: 470 MB (and 1 GB to run the conda install mamba step)
  • conda: >5000 MB (I wasn't even able to do a build for this project locally)
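
The thread does not say how these numbers were obtained. One way to take a comparable measurement on a Unix-like system is sketched below, using Python's standard `resource` module; the helper name `peak_child_memory_mb` is made up for this example.

```python
import resource
import subprocess
import sys


def peak_child_memory_mb(cmd):
    """Run cmd and return the peak resident set size of child processes, in MB.

    RUSAGE_CHILDREN reports the largest value among all children waited for
    so far, so take one measurement per Python process.
    """
    subprocess.run(cmd, check=False)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is expressed in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == 'darwin' else 1024
    return peak / divisor
```

For example, `peak_child_memory_mb(['conda', 'env', 'create', '--file', 'environment.yml'])` would report the peak for a conda run, and the same call with `mamba` in a fresh interpreter would give the figure to compare against.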

`mamba` is a fast drop-in replacement for the conda command-line
utility, written in C++.

I'm adding a feature flag so we can test it on selected projects
that keep failing with OOM errors while solving dependencies,
even when they have just one dependency, as long as they add
conda-forge as a channel.
@humitos humitos force-pushed the humitos/use-mamba-for-conda-environment branch from 1a7d11b to d34b8d0 Compare March 24, 2020 20:10
@humitos humitos added the "Needed: design decision" (A core team decision is required) label Apr 6, 2020
humitos (Member Author) commented Apr 21, 2020

IMHO, this is a good PR and it could help us reduce resource usage in our builders when building with conda. However, we have migrated our builders to bigger servers and builds are not failing anymore, so it's more of a nice-to-have at the moment.

We can come back to this if we start having performance issues with conda again that make our builds fail.

@humitos humitos closed this Apr 21, 2020
wolfv commented May 7, 2020

we're working towards a micromamba which will not be installed with conda (pure C++). So I think this has the potential to reduce the required memory and CPU dramatically.

humitos (Member Author) commented Dec 3, 2020

I talked to Eric today to raise this topic again, both because of the acceptance that mamba has gained and because users have started experiencing memory issues even on our biggest servers (see #7718).

I'm reopening this PR to revisit it soon, give it another test pass (I remember it working well, but just in case), and see if we can deploy it. The rollout plan would be something like:

  • deploy it under a feature flag
  • install micromamba in the docker image and stop doing conda install mamba
    • think about having -conda and -mamba docker images if needed
  • add a config to use mamba
  • remove feature flag

@humitos humitos reopened this Dec 3, 2020
'--yes',
'--quiet',
'--name=base',
'--channel=conda-forge',
Member Author

@SylvainCorlay! is it possible to install mamba from a different, and smaller, channel than conda-forge here?

I can't use micromamba at this point for "reasons", so I would like to install mamba with conda, ideally from a channel that only contains mamba, so that conda does not consume too many resources.

Would you consider using a miniforge flavor that includes mamba instead of miniconda?

Contributor

Comment from the peanut gallery: regro/conda-metachannel#31 (comment) conda-metachannel service is down, unfortunately.

Perhaps if there was a mamba channel, it would help the bootstrapping problem. conda install mamba -c mamba would be blazing fast, and from there mamba & conda-forge could be used as usual.

About mamba in miniforge... from what I read in conda-forge/miniforge#23, looks like there is no consensus (the path of least resistance seems to be creating a miniforge-mamba or a microforge with micromamba?)

Member Author

I can't use anything other than regular conda (micromamba, miniforge, etc.) at this moment because I can't modify the Docker image we are currently using (I will be able to in the future, but I don't know exactly when yet).

So, given this restriction, I was asking whether something like what @astrojuanlu mentioned (conda install mamba -c mamba) already exists, because conda install mamba -c conda-forge currently has the same problem: it consumes a lot of resources just because it uses the conda-forge channel, which contains a very large number of packages. The problem is not a blocker for starting to test mamba here, but a better workaround would be good.

> Comment from the peanut gallery: regro/conda-metachannel#31 (comment) conda-metachannel service is down, unfortunately.

This was a good idea when @astrojuanlu suggested it to me. However, if it's currently down, it doesn't seem to be something we can rely on by default 😞

Gotcha

Member Author

My goal here is to avoid this random problem with the available tools and restrictions:

$ conda install --yes --quiet --name=base --channel=conda-forge mamba
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... Killed


Command killed due to excessive memory consumption 
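
A build wrapper can recognize this kind of failure from the exit status of the command. The sketch below is only an illustration (the helper name `was_killed` is made up, and this is not how Read the Docs detects it): on POSIX systems, `subprocess` reports a process terminated by signal N with a return code of -N. A SIGKILL is consistent with an out-of-memory kill but does not prove it.

```python
import signal
import subprocess
import sys


def was_killed(cmd):
    """Run cmd and report whether it was terminated by SIGKILL."""
    result = subprocess.run(cmd, check=False)
    # On POSIX, a return code of -N means the child was ended by signal N.
    return result.returncode == -signal.SIGKILL
```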

FYI @humitos miniforge now includes a "mambaforge" installer which has mamba pre-installed.

https://github.com/conda-forge/miniforge

I am looking into what we could do to allow conda install mamba from a raw miniconda to be faster.

@humitos
One way to do this without having to maintain another channel with a copy of mamba and its dependencies is to use a conda lock file.

https://pypi.org/project/conda-lock/

You could generate a conda lock file offline, and use it in your script so that no solving is required.
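
To illustrate why such a file avoids the solver: conda's explicit specification format is a plain list of package URLs after an `@EXPLICIT` marker, so there is nothing left to resolve. The sketch below parses a file in that format. The sample URLs and the `parse_explicit_lock` helper are made up for illustration, and using a lock file this way in Read the Docs is an assumption, not something the PR implements.

```python
def parse_explicit_lock(text):
    """Return the package URLs listed in a conda explicit spec file."""
    urls = []
    explicit = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            # Skip blank lines and comment lines such as "# platform: ...".
            continue
        if line == '@EXPLICIT':
            explicit = True
            continue
        urls.append(line)
    if not explicit:
        raise ValueError('not an explicit spec file: missing @EXPLICIT marker')
    return urls


# Hypothetical file content; package names and hashes are placeholders.
SAMPLE_LOCK = """\
# platform: linux-64
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/example-a-1.0-0.tar.bz2#0123456789abcdef0123456789abcdef
https://conda.anaconda.org/conda-forge/linux-64/example-b-2.0-0.tar.bz2#fedcba9876543210fedcba9876543210
"""

urls = parse_explicit_lock(SAMPLE_LOCK)
```

Because every entry is already a concrete URL, passing a file like this to `conda install --name base --file <lock file>` should let conda download and link the packages without loading the full conda-forge metadata.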

We don't have `mamba` at this point, so we need to force using `conda`.

When we have `micromamba` installed in the Docker image, we will need to update
this to use `mamba` here instead.
@humitos humitos removed the "Needed: design decision" (A core team decision is required) label Dec 10, 2020
@humitos humitos requested a review from stsewd December 10, 2020 10:10
@@ -27,6 +27,18 @@ In case you prefer to use the latest ``conda`` version available, this is the fl
Makes Read the Docs install all the requirements at once in the ``conda create`` step.
This helps users pin dependencies with conda and improves build time.

``CONDA_USES_MAMBA``: :featureflags:`CONDA_USES_MAMBA`

``conda`` solver consumes 1Gb minimum when installing any package using ``conda-forge`` channel.
Member

I think you mean resolver

Suggested change
``conda`` solver consumes 1Gb minimum when installing any package using ``conda-forge`` channel.
Conda's resolver consumes 1Gb minimum when installing any package using ``conda-forge`` channel.

Member Author

Well... I think it's the same, but they call it "Solver" :)

https://docs.conda.io/projects/conda/en/latest/api/solver.html#conda.core.solve.Solver

Co-authored-by: Santos Gallegos <santos_g@outlook.com>
@humitos humitos merged commit b326a21 into master Dec 14, 2020
@humitos humitos deleted the humitos/use-mamba-for-conda-environment branch December 14, 2020 17:41
wolfv commented Dec 14, 2020

Awesome! 🎉

humitos (Member Author) commented Dec 14, 2020

We are deploying this tomorrow. Please contact us via email support if you want to enable this feature on your projects, and give us feedback about how it works for your use cases.
