add some conda docs
aryarm committed Mar 9, 2024
1 parent 0238993 commit bfc4bc8
Showing 1 changed file with 318 additions and 11 deletions.
329 changes: 318 additions & 11 deletions wiki/docs/reproducibility/conda.ipynb
@@ -32,11 +32,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"There are two sets of input to a package manager: 1) the packages requested by the user and 2) a database of available packages and versions. Consider the following `pip` command.\n",
"A package manager has two inputs: 1) the packages requested by the user and 2) a database of available packages and versions. Consider the following `pip` command, for example.\n",
"```bash\n",
"pip install 'scikit-learn' 'numpy>=1.21.6'\n",
"```\n",
"In this example, the user is asking the `pip` package manager to install `scikit-learn` and some version of `numpy` greater than `1.21.6`. In the process, `pip` will search PyPI (the python packaging index), a database of all versions of every python package and their dependency constraints."
"In this case, the user is asking the `pip` package manager to install `scikit-learn` and a version of `numpy` at or above `1.21.6`. In the process, `pip` will search PyPI (the Python Package Index), a database of the published versions of Python packages and their dependency constraints."
]
},
{
@@ -58,13 +58,73 @@
},
{
"cell_type": "markdown",
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"metadata": {},
"source": [
"It's common to encounter version conflicts with bioinformatics software. And in some cases, they will be impossible to resolve."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What about when conflicts are unavoidable?"
"**TODO: check that this example is valid**\n",
"For example, let's say that you want to install `numpy==1.14.4`, but it requires `python==2.7` and all of your other software uses `python>=3.0`. Your best bet is to install `numpy==1.14.4` in a separate _virtual environment_. This ensures that it lives in a separate place where it won't conflict with your other software."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TODO: include image here**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using `pip` and `venv` for python environment management"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`pip` is a package manager exclusively for python packages. It downloads software from a _package index_ called PyPI, where developers will often upload their tools."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To create and manage virtual environments consisting of Python packages, you can use a tool called `venv`. It is part of the Python 3 standard library, so it does not need to be installed with `pip`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Command cheat sheet**\n",
"Command | Description\n",
"--------|-----------\n",
"python -m venv myenv | create a new environment called `myenv`\n",
"source myenv/bin/activate | activate the `myenv` environment\n",
"pip install 'pysam>=0.19.1' | install pysam in the current environment\n",
"pip list | list packages in the current environment\n",
"pip freeze > requirements.txt | export the current environment to share with others\n",
"pip install -r requirements.txt | install packages from an exported environment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `pip freeze` command will export an environment to a `requirements.txt` file, which can then be shared with your collaborators. For example, an environment with `pysam` v0.19.1 and `numpy` v1.14.4 will appear in the `requirements.txt` file like this:\n",
"\n",
"**TODO: include image here instead of code block**\n",
"```\n",
"pysam==0.19.1\n",
"numpy==1.14.4\n",
"```"
]
},
{
@@ -75,10 +135,257 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"source": [
"`conda` is a general-purpose package manager: it can install *any* type of package (R packages and compiled command-line tools, for example), not just Python packages. It is also an environment manager! We recommend exclusively using `conda` to install all your software."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Command cheat sheet**\n",
"Command | Description\n",
"--------|-----------\n",
"conda create -n myenv | create a new environment called `myenv`\n",
"conda activate myenv | activate the `myenv` environment\n",
"conda install -n myenv 'python=3.8' | install python in the `myenv` environment\n",
"conda install -n myenv -c conda-forge 'python=3.8' | install python from the conda-forge channel in the `myenv` environment\n",
"conda list | list packages in the current environment\n",
"conda env export > env.yml | export the current environment to share with others\\* *\\[not recommended!\\]*\n",
"conda env create -f env.yml | recreate an exported environment\n",
"\n",
"\\**Note:* We do not recommend using `conda env export` to share environments because the environments won't be reproducible. Read on to learn about how to properly share an environment."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### When should I create a new environment?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span style=\"color:red\">The more packages an existing environment contains, the more likely you are to encounter a conflict when installing a new package into it.</span> Refer to the previous chapter of this wiki if you don't yet understand why."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Based on this fact, we've created the following list of best practices.\n",
"\n",
"**Best practices for environment creation**\n",
"1. Try to keep your environments small with few existing packages\n",
"2. Create a new environment whenever you start a new project or run into a conflict\n",
"3. Manually maintain a YAML file that lists each package in your environment in case you need to recreate it. Read on to learn about how to do this\n",
"4. Avoid installing packages in the original `base` environment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Conda uses multiple package indexes called **channels**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each package that you install with `conda` will belong to a *channel*. A channel consists of a set of packages and their available versions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Anybody can create a *channel* so there are many online. Here are the most important to know if you're a bioinformatician.\n",
"\n",
"channel name | description\n",
"-------------|------------\n",
"anaconda | data-science packages that come pre-installed with the Anaconda distribution\n",
"r | R packages re-distributed by Anaconda, Inc\n",
"defaults | an umbrella channel that refers to anaconda, r, and others created by Anaconda Inc\n",
"conda-forge | data-science packages, curated by an open source community\n",
"bioconda | bioinformatics packages! open source. Packages have dependencies in conda-forge\n",
"\n",
"We do not recommend using the `anaconda`, `r`, or `defaults` channels. Read on to learn why."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Best practices for conda channels"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's best to use open-source channels. They're more up-to-date and more comprehensive. So how do you tell `conda` to use `conda-forge` and `bioconda`?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, `conda` uses the channels listed in your `~/.condarc` config file. When you first install `conda`, Anaconda Inc channels will be listed there, but channels created by Anaconda Inc are known to **conflict** with conda-forge! For this reason, <span style=\"color:red\">we recommend adding `conda-forge` and `bioconda` to your `~/.condarc` config file and removing the `defaults` channel</span>. To ensure the defaults channel is never used, you can list `nodefaults` at the end of your channel list."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should never install packages from the `r` channel because it is part of the `defaults` channel, which conflicts with `conda-forge`. You can find all of the R packages in the `r` channel within either `conda-forge` or `bioconda`, anyway."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two channels may contain the same package. The order that you specify in your `~/.condarc` config file or environment YAML file determines which channel is preferred. Since many `bioconda` packages have dependencies in `conda-forge`, <span style=\"color:red\">we recommend listing `conda-forge` before `bioconda`</span>."
]
},
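{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the recommendations above, a `~/.condarc` file that lists `conda-forge` before `bioconda` and leaves out `defaults` might look like the following. The contents are an illustration only; your own file may carry additional settings.\n",
"```yaml\n",
"# ~/.condarc (illustrative)\n",
"# channels are listed from highest to lowest priority\n",
"channels:\n",
"  - conda-forge\n",
"  - bioconda\n",
"```"
]
},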
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here are some bad examples of channel listings, pictured in <span style=\"color:red\">red</span>. There is only one good example, which we've illustrated below in <span style=\"color:blue\">blue</span>. After installing conda, <span style=\"color:red\">you should edit your `~/.condarc` config file to ensure it matches</span> <span style=\"color:blue\">the blue example</span>.\n",
"\n",
"**TODO: add good and bad examples as an image**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The best, reproducible env.yml files are created **manually**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to create `env.yml` files that are reproducible, that get installed quickly, that are easy to read and understand, and that are easy to adapt and change in the future, then you need to create your `env.yml` files manually -- **don't use conda env export**!"
]
},
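{
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration, a hand-written `env.yml` might look like the sketch below. The environment name and the package pins are placeholders that reuse packages mentioned earlier in this chapter, not a recommended set.\n",
"```yaml\n",
"# env.yml (illustrative; name and versions are placeholders)\n",
"name: myenv\n",
"channels:\n",
"  - conda-forge\n",
"  - bioconda\n",
"  - nodefaults  # keeps the defaults channel out of this environment\n",
"dependencies:\n",
"  - python=3.8\n",
"  - pysam>=0.19.1\n",
"```\n",
"An environment can then be created from this file with `conda env create -f env.yml`."
]
},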
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TODO: include image with list of best practices**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Use `conda-lock` for fully reproducible `.lock` files"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Choosing how much of your conda environment to lock"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The best way to install `conda`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to use `conda`, you should install `mamba`. Specifically, you should install it from the Mambaforge distribution."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conda, in summary"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## FAQs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Should I install R packages with `conda` or R's built-in `install.packages()`?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You shouldn't use R's `install.packages()` because it isn't a proper package manager. It doesn't report conflicts. Instead, if a conflict arises between a package that you've installed and a package that you'd like to install, `install.packages()` will silently upgrade your old package.\n",
"\n",
"This is dangerous. If you installed a package once but a new version of it makes a change that is backwards-incompatible or introduces a bug, the new version may break your scripts! The broken behavior may even be subtle enough that you don't catch it until much later.\n",
"\n",
"To mitigate the backwards compatibility issue, many R developers will publish versions of their tools with breaking changes as entirely separate packages. In practice, this system can still lead to issues when bugs arise.\n",
"\n",
"**tl;dr** You should exclusively use `conda` to install R packages. Otherwise, you should be prepared for unexpected changes to the software you've installed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What if a tool isn't yet installable with `conda`?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What's the difference between Anaconda and miniconda?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": []
}
],