Use mamba under a feature flag to create conda environments #6815

Merged 4 commits on Dec 14, 2020
Changes from 3 commits
12 changes: 12 additions & 0 deletions docs/guides/feature-flags.rst
@@ -27,6 +27,18 @@ In case you prefer to use the latest ``conda`` version available, this is the fl
Makes Read the Docs install all the requirements at once in the ``conda create`` step.
This helps users pin dependencies with conda and improves build time.

``CONDA_USES_MAMBA``: :featureflags:`CONDA_USES_MAMBA`

``conda`` solver consumes 1Gb minimum when installing any package using ``conda-forge`` channel.
Member

I think you mean resolver

Suggested change
- ``conda`` solver consumes 1Gb minimum when installing any package using ``conda-forge`` channel.
+ Conda's resolver consumes 1Gb minimum when installing any package using ``conda-forge`` channel.

Member Author

Well... I think it's the same, but they call it "Solver" :)

https://docs.conda.io/projects/conda/en/latest/api/solver.html#conda.core.solve.Solver

This seems to be `a known issue`_, caused in part by the very large number of packages on ``conda-forge``.
Using this feature flag allows you to use mamba_ instead of ``conda`` to create the environment
and install the dependencies.
``mamba`` is a drop-in replacement for ``conda`` that is much faster and also
considerably reduces the amount of memory required to solve the dependencies.

.. _mamba: https://quantstack.net/mamba.html
.. _a known issue: https://www.anaconda.com/understanding-and-improving-condas-performance/

``DONT_OVERWRITE_SPHINX_CONTEXT``: :featureflags:`DONT_OVERWRITE_SPHINX_CONTEXT`

``DONT_SHALLOW_CLONE``: :featureflags:`DONT_SHALLOW_CLONE`
43 changes: 41 additions & 2 deletions readthedocs/doc_builder/python_environments.py
@@ -493,6 +493,23 @@ class Conda(PythonEnvironment):
    def venv_path(self):
        return os.path.join(self.project.doc_path, 'conda', self.version.slug)

    def conda_bin_name(self):
        """
        Decide whether to use ``mamba`` or ``conda`` to create the environment.

        Return ``mamba`` if the project has the ``CONDA_USES_MAMBA`` feature and
        ``conda`` otherwise. This will be the executable name to be used when
        creating the conda environment.

        ``mamba`` is really fast to solve dependencies and download channel
        metadata on startup.

        See https://github.com/QuantStack/mamba
        """
        if self.project.has_feature(Feature.CONDA_USES_MAMBA):
            return 'mamba'
        return 'conda'

    def _update_conda_startup(self):
        """
        Update ``conda`` before using it for the first time.
@@ -501,7 +518,10 @@ def _update_conda_startup(self):
        independently of the version of Miniconda that is installed.
        """
        self.build_env.run(
            # TODO: use ``self.conda_bin_name()`` once ``mamba`` is installed in
            # the Docker image
            'conda',
            'update',
            '--yes',
            '--quiet',
@@ -511,6 +531,18 @@ def _update_conda_startup(self):
            cwd=self.checkout_path,
        )

    def _install_mamba(self):
        self.build_env.run(
            'conda',
            'install',
            '--yes',
            '--quiet',
            '--name=base',
            '--channel=conda-forge',
Member Author

@SylvainCorlay! Is it possible to install mamba from a different, smaller channel than conda-forge here?

I can't use micromamba at this point for "reasons", and I would like to install mamba with conda, ideally from a channel that contains only mamba so that conda does not consume too many resources.

Would you consider using a miniforge flavor that includes mamba instead of miniconda?

Contributor

Comment from the peanut gallery: regro/conda-metachannel#31 (comment) conda-metachannel service is down, unfortunately.

Perhaps if there was a mamba channel, it would help the bootstrapping problem. conda install mamba -c mamba would be blazing fast, and from there mamba & conda-forge could be used as usual.

About mamba in miniforge... from what I read in conda-forge/miniforge#23, looks like there is no consensus (the path of least resistance seems to be creating a miniforge-mamba or a microforge with micromamba?)

Member Author

I can't use anything other than regular conda (such as micromamba or miniforge) at the moment because I can't modify the Docker image we are currently using (I will be able to do this in the future, but I don't know exactly when yet).

So, given this restriction, I was asking whether something like what @astrojuanlu mentioned (conda install mamba -c mamba) already exists, because calling conda install mamba -c conda-forge currently has the same problem of consuming a lot of resources, simply because it uses the conda-forge channel, which contains millions of packages. Although the problem is there, it's not a blocker to start testing mamba here, but having a better workaround for this would be good.

> Comment from the peanut gallery: regro/conda-metachannel#31 (comment) conda-metachannel service is down, unfortunately.

This seemed like a good idea when @astrojuanlu mentioned it to me. However, if the service is currently down, it doesn't seem to be something we can rely on by default 😞

Gotcha

Member Author

My goal here is to avoid this random problem with the available tools and restrictions:

$ conda install --yes --quiet --name=base --channel=conda-forge mamba
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... Killed


Command killed due to excessive memory consumption 

FYI @humitos miniforge now includes a "mambaforge" installer which has mamba pre-installed.

https://github.com/conda-forge/miniforge

I am looking into what we could do to allow conda install mamba from a raw miniconda to be faster.

@humitos
One way to do this without having to maintain another channel with a copy of mamba and its dependencies is to use a conda lock file.

https://pypi.org/project/conda-lock/

You could generate a conda lock file offline, and use it in your script so that no solving is required.
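
The lock-file suggestion above could look roughly like the following. This is only a sketch under stated assumptions, not part of this PR: `build_locked_install_command` and the lock file name are hypothetical, and the lock file is assumed to be an explicit spec file (one starting with `@EXPLICIT`) generated offline. It relies on `conda install --file` accepting such a file, in which case conda installs the listed packages without solving. The function only builds the argument list; it does not run anything.

```python
from typing import List


def build_locked_install_command(lock_file: str, env_name: str = 'base') -> List[str]:
    """Build a ``conda install`` command that installs from an explicit spec file.

    ``lock_file`` is a hypothetical path to a lock file generated offline.
    No ``--channel`` argument is passed because an explicit spec file
    already lists full package URLs.
    """
    return [
        'conda',
        'install',
        '--yes',
        '--quiet',
        f'--name={env_name}',
        '--file',
        lock_file,
    ]


# Hypothetical usage: the file name is an assumption for illustration.
command = build_locked_install_command('mamba-linux-64.lock')
print(' '.join(command))
```

In `_install_mamba`, an argument list like this could stand in for the `--channel=conda-forge` variant, at the cost of regenerating the lock file whenever the pinned mamba version should change.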

            'mamba',
            cwd=self.checkout_path,
        )

    def setup_base(self):
        conda_env_path = os.path.join(self.project.doc_path, 'conda')
        version_path = os.path.join(conda_env_path, self.version.slug)
@@ -534,8 +566,12 @@ def setup_base(self):
            self._append_core_requirements()
            self._show_environment_yaml()

        # TODO: remove it when ``mamba`` is installed in the Docker image
        if self.project.has_feature(Feature.CONDA_USES_MAMBA):
            self._install_mamba()

        self.build_env.run(
-           'conda',
+           self.conda_bin_name(),
            'env',
            'create',
            '--quiet',
@@ -621,6 +657,9 @@ def _get_core_requirements(self):
            'pillow',
        ]

        if self.project.has_feature(Feature.CONDA_USES_MAMBA):
            conda_requirements.append('pip')

        # Install pip-only things.
        pip_requirements = [
            'recommonmark',
@@ -648,7 +687,7 @@ def install_core_requirements(self):
        # Install requirements via ``conda install`` command if they were
        # not appended to the ``environment.yml`` file.
        cmd = [
-           'conda',
+           self.conda_bin_name(),
            'install',
            '--yes',
            '--quiet',
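
The executable selection introduced in this file can be sketched outside Read the Docs as follows. This is a simplified illustration, not the project's code: `choose_conda_bin` and `build_env_create_command` are hypothetical helpers, the boolean argument stands in for `self.project.has_feature(Feature.CONDA_USES_MAMBA)`, and the `--name` and `--file` arguments are assumed because the diff elides the rest of the command. Nothing is executed; the functions only build argument lists.

```python
from typing import List


def choose_conda_bin(uses_mamba: bool) -> str:
    """Return the executable name, mirroring ``conda_bin_name()`` in the diff."""
    return 'mamba' if uses_mamba else 'conda'


def build_env_create_command(uses_mamba: bool, env_name: str, env_file: str) -> List[str]:
    """Build (but do not run) an ``env create`` command line.

    ``env_name`` and ``env_file`` are assumptions made for illustration.
    """
    return [
        choose_conda_bin(uses_mamba),
        'env',
        'create',
        '--quiet',
        '--name', env_name,
        '--file', env_file,
    ]


with_flag = build_env_create_command(True, 'latest', 'environment.yml')
without_flag = build_env_create_command(False, 'latest', 'environment.yml')
print(with_flag[0], without_flag[0])
```

Because only the first element of the argument list changes, everything after the executable name stays identical for both tools, which is what makes the drop-in replacement practical behind a feature flag.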
5 changes: 5 additions & 0 deletions readthedocs/projects/models.py
@@ -1579,6 +1579,7 @@ def add_features(sender, **kwargs):
    EXTERNAL_VERSION_BUILD = 'external_version_build'
    UPDATE_CONDA_STARTUP = 'update_conda_startup'
    CONDA_APPEND_CORE_REQUIREMENTS = 'conda_append_core_requirements'
    CONDA_USES_MAMBA = 'conda_uses_mamba'
    ALL_VERSIONS_IN_HTML_CONTEXT = 'all_versions_in_html_context'
    SKIP_SYNC_TAGS = 'skip_sync_tags'
    SKIP_SYNC_BRANCHES = 'skip_sync_branches'
@@ -1661,6 +1662,10 @@ def add_features(sender, **kwargs):
            CONDA_APPEND_CORE_REQUIREMENTS,
            _('Append Read the Docs core requirements to environment.yml file'),
        ),
        (
            CONDA_USES_MAMBA,
            _('Uses mamba binary instead of conda to create the environment'),
        ),
        (
            ALL_VERSIONS_IN_HTML_CONTEXT,
            _(
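
For readers unfamiliar with the pattern, the flag registration above pairs a feature identifier with a human-readable description, and build code then asks the project whether the flag is enabled. The sketch below is a deliberately simplified, in-memory stand-in: the `Project` dataclass and its `enabled_features` field are hypothetical and do not reflect how Read the Docs stores or queries features. Only the identifier and description strings are taken from the diff.

```python
from dataclasses import dataclass, field
from typing import Set

CONDA_USES_MAMBA = 'conda_uses_mamba'

# (feature id, description) pairs, shaped like the entries in the diff.
FEATURES = (
    (CONDA_USES_MAMBA, 'Uses mamba binary instead of conda to create the environment'),
)


@dataclass
class Project:
    """Hypothetical stand-in for a project with a set of enabled feature ids."""

    enabled_features: Set[str] = field(default_factory=set)

    def has_feature(self, feature_id: str) -> bool:
        # A flag is on only if it has been explicitly enabled for this project.
        return feature_id in self.enabled_features


opted_in = Project(enabled_features={CONDA_USES_MAMBA})
default = Project()
print(opted_in.has_feature(CONDA_USES_MAMBA), default.has_feature(CONDA_USES_MAMBA))
```

Keeping the flag off by default means existing projects continue to build with `conda` until they are explicitly opted in.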