`conda_curation` is a tool for filtering conda repositories, especially Conda Forge, to remove packages based on a variety of criteria:
- Remove packages that do not match any of the user-provided matchspecs for that package (for an example, see `matchspecs/secure_python.yaml`).
- Remove packages that have been superseded by newer builds (e.g. `python-3.9.18-h12345678_0` is superseded by `python-3.9.18-h12345678_1`, so the former package is removed).
- Remove `dev` and `rc` packages (e.g. `2.0.0.dev0` or `2.0.0.rc0`).
- Remove packages that track undesired features (e.g. `pypy`).
- Remove packages that are incompatible with all available candidates of another package chosen by the user. For example, if the user only selects `python >=3.12` and specifies `-C python`, then older `openssl` builds such as `openssl 1.1.1n` will be removed, since `mamba create -n ... openssl==1.1.1n python>=3.12` cannot be solved.
- After applying any or all of the above filters, perform a follow-up analysis to find packages that depended on now-removed dependencies, remove those as well, and apply this recursively. For example, filtering out Python 2.7 will also filter out all builds of numpy that were compiled against Python 2.7.
- Supports CEP-15 `base_url`: if the source repository (as specified by the `--channel-alias` flag) does not already have `info.base_url` set, then the output `repodata.json` will have its `info.base_url` set to the `--channel-alias`. If it was set in the original `repodata.json`, it will be preserved. If all clients support CEP-15, this removes the need for a proxy server configured to 30x-redirect all package requests to the `--channel-alias` destination.
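The per-package filters listed above (superseded builds, `dev`/`rc` versions, undesired features) can be sketched in Rust, the language `conda_curation` is written in. This is a minimal illustration under stated assumptions, not the project's actual code: `PackageRecord`, `is_prerelease`, `passes_record_filters`, `build_prefix` and `drop_superseded` are hypothetical names; pre-releases are detected with a simple string heuristic rather than full conda version parsing; and two builds are treated as the same variant when their build strings differ only in the trailing `_<build_number>`.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of one repodata.json entry.
#[derive(Debug, Clone, PartialEq)]
pub struct PackageRecord {
    pub name: String,
    pub version: String,
    pub build: String,     // e.g. "h12345678_1"
    pub build_number: u64, // e.g. 1
    pub track_features: Vec<String>,
}

/// Heuristic pre-release check (an assumption, not conda's version parser):
/// a version counts as dev/rc if any segment, after stripping leading digits,
/// starts with "dev" or "rc" (matches "2.0.0.dev0", "2.0.0.rc0", "2.0.0rc0").
pub fn is_prerelease(version: &str) -> bool {
    version
        .split(|c: char| c == '.' || c == '-' || c == '_')
        .map(|seg| {
            seg.trim_start_matches(|c: char| c.is_ascii_digit())
                .to_ascii_lowercase()
        })
        .any(|seg| seg.starts_with("dev") || seg.starts_with("rc"))
}

/// True if the record is neither a pre-release nor tracks a banned feature.
pub fn passes_record_filters(rec: &PackageRecord, banned_features: &[&str]) -> bool {
    !is_prerelease(&rec.version)
        && !rec
            .track_features
            .iter()
            .any(|f| banned_features.contains(&f.as_str()))
}

/// Build string without its trailing "_<build_number>", e.g. "h12345678".
fn build_prefix(build: &str) -> &str {
    build.rsplit_once('_').map_or(build, |(prefix, _)| prefix)
}

/// Keep only the highest build number for each (name, version, build prefix).
pub fn drop_superseded(records: Vec<PackageRecord>) -> Vec<PackageRecord> {
    let key = |r: &PackageRecord| {
        (r.name.clone(), r.version.clone(), build_prefix(&r.build).to_string())
    };
    // First pass: record the newest build number seen for each variant.
    let mut best: HashMap<(String, String, String), u64> = HashMap::new();
    for r in &records {
        let entry = best.entry(key(r)).or_insert(r.build_number);
        *entry = (*entry).max(r.build_number);
    }
    // Second pass: drop every build older than the newest one.
    records
        .into_iter()
        .filter(|r| best[&key(r)] == r.build_number)
        .collect()
}
```

Given `python-3.9.18-h12345678_0` and `python-3.9.18-h12345678_1`, `drop_superseded` keeps only the `_1` build, mirroring the example in the list.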
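The recursive follow-up analysis described in the list amounts to a fixed-point loop: keep removing packages that have an unsatisfiable dependency until a full pass removes nothing. The sketch below assumes a hypothetical `Record`/`Dep` model in which a dependency is only a package name plus a version prefix (e.g. `python` with prefix `2.7`); real conda matchspecs are far richer, and the quadratic scan is chosen for clarity, not speed.

```rust
/// Hypothetical simplified dependency: a package name plus a version
/// prefix ("" matches any version, "2.7" matches "2.7" and "2.7.x").
#[derive(Debug, Clone)]
pub struct Dep {
    pub name: String,
    pub version_prefix: String,
}

/// Hypothetical simplified package record.
#[derive(Debug, Clone)]
pub struct Record {
    pub name: String,
    pub version: String,
    pub depends: Vec<Dep>,
}

/// Does `candidate` satisfy `dep` under the simplified prefix rule?
fn satisfies(candidate: &Record, dep: &Dep) -> bool {
    candidate.name == dep.name
        && (dep.version_prefix.is_empty()
            || candidate.version == dep.version_prefix
            || candidate
                .version
                .starts_with(&format!("{}.", dep.version_prefix)))
}

/// Remove records with a dependency that no remaining record satisfies,
/// repeating until a full pass removes nothing (a fixed point).
pub fn prune_unsatisfiable(mut records: Vec<Record>) -> Vec<Record> {
    loop {
        let before = records.len();
        // Judge each pass against the records that survived the previous pass.
        let snapshot = records.clone();
        records.retain(|r| {
            r.depends
                .iter()
                .all(|d| snapshot.iter().any(|c| satisfies(c, d)))
        });
        if records.len() == before {
            return records;
        }
    }
}
```

For example, if Python 2.7 has already been filtered out, a hypothetical `numpy 1.16.6` depending on `python 2.7` is dropped on the first pass, and a hypothetical `scipy 1.2.3` depending on `numpy 1.16` is dropped on the second.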
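The CEP-15 `base_url` rule above is small enough to show directly. This sketch assumes a hypothetical `RepodataInfo` struct standing in for the `info` object of `repodata.json` (a real implementation would read and write the JSON itself, for example with `serde_json`), plus a hypothetical `package_url` helper that joins `base_url` with a package filename, roughly how a CEP-15-aware client locates packages.

```rust
/// Hypothetical stand-in for the `info` object of repodata.json.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct RepodataInfo {
    pub base_url: Option<String>,
}

/// Preserve an existing `info.base_url`; otherwise set it to the channel alias.
pub fn apply_base_url(info: &mut RepodataInfo, channel_alias: &str) {
    if info.base_url.is_none() {
        info.base_url = Some(channel_alias.to_string());
    }
}

/// Join `base_url` and a package filename with exactly one '/' between them.
/// Returns None when no base_url is set (the client would then resolve
/// packages relative to the location it fetched repodata.json from).
pub fn package_url(info: &RepodataInfo, filename: &str) -> Option<String> {
    info.base_url
        .as_deref()
        .map(|base| format!("{}/{}", base.trim_end_matches('/'), filename))
}
```

Because an existing `base_url` always wins, re-running the curation over its own output is harmless: the value set on the first run is simply preserved.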
`conda_curation` serves small-to-medium sized enterprises that want to begin using Conda internally and want to leverage the rich Conda Forge package ecosystem rather than create their own packages or curate them by hand.
The main reason `conda_curation` was created was performance: reducing the Conda Forge repodata to a smaller size can yield substantial performance improvements in Conda clients. At Chicago Trading Company, a prototype of this repodata-filtering system applied to Conda Forge reduced `mamba mambabuild` runtimes by about two minutes across a wide variety of pipelines. `mamba create --dry-run` commands were seen to take 10 seconds instead of 20 seconds. Solve failures were also reported much faster (and more cleanly).
A security team may demand that insecure packages, such as older Python interpreters, CA certificate bundles, and OpenSSL versions, be completely unavailable within the enterprise. `conda_curation` can enforce these kinds of policies.
This software has significant feature limitations, as it initially targeted only a Minimum Viable Product (MVP) that fit into a specific point in Chicago Trading Company's artifact delivery. As such, users must bring their own HTTP proxy / caching proxy for serving packages, and that proxy must also divert URLs matching `.*repodata.*\.json.*` to the rendered output of `conda_curation`. We have done this successfully using `nginx`, with 301 redirects for asset downloads to the artifact server during the testing phase, and by putting `nginx` directly in front of the artifact server in the deployment phase.
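The repodata diversion described above reduces to a routing decision per request path. As a hedged sketch of that decision only (the actual deployment used `nginx` configuration, not Rust), `is_repodata_request` reproduces the pattern `.*repodata.*\.json.*` without a regex crate, and a hypothetical `route` function either serves the curated repodata or issues a 301 redirect to the artifact server; `Route`, `curated_root` and `upstream_root` are illustrative names, not part of `conda_curation`.

```rust
/// Equivalent to the pattern `.*repodata.*\.json.*`: the path contains
/// "repodata", followed somewhere later by ".json".
pub fn is_repodata_request(path: &str) -> bool {
    const NEEDLE: &str = "repodata";
    match path.find(NEEDLE) {
        // The first occurrence leaves the longest suffix in which to find ".json".
        Some(i) => path[i + NEEDLE.len()..].contains(".json"),
        None => false,
    }
}

/// Hypothetical routing decision for one request path.
#[derive(Debug, PartialEq)]
pub enum Route {
    /// Serve the repodata rendered by conda_curation.
    ServeCurated(String),
    /// 301-redirect the asset download to the artifact server.
    Redirect301(String),
}

/// Divert repodata requests to the curated output; redirect everything else.
pub fn route(path: &str, curated_root: &str, upstream_root: &str) -> Route {
    if is_repodata_request(path) {
        Route::ServeCurated(format!("{}{}", curated_root, path))
    } else {
        Route::Redirect301(format!("{}{}", upstream_root, path))
    }
}
```

Variants such as `current_repodata.json` or a compressed `repodata.json.zst` also match the pattern, so they are diverted to the curated output as well.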
The original prototype of this tool was developed by me (@AaronOpfer) at Chicago Trading Company, based on observations from my colleague Bozhao Jiang that hand-crafted "curated" channels caused conda builds to finish several minutes faster than before. The original version was written in Python and, because of its performance issues, hit a hard limit on feature development as the development cycle time lengthened. I rewrote the project in Rust in my free time to create this version, and have received permission to release it to the community under the MIT License.
- Bozhao Jiang
- Derek Shoemaker
- Jason Bryan
- Mel Williams