Support Monorepo in Binder #1030

Closed
zhang123cnn opened this issue Dec 12, 2019 · 2 comments

@zhang123cnn

Hey folks, I am hosting an internal Binder instance for my organisation. Some engineering teams here own very large monorepos with many notebooks inside. When they load such a repo in Binder, it installs the dependencies for all notebooks in the runtime container.

Given that most people are only interested in a few notebooks per launch, this bulk-loading behaviour is very undesirable because:

  1. Building takes much longer than necessary.
  2. It forces the repo owner to aggregate all dependencies at the top level.
  3. The master branch is updated very frequently, which leaves little chance of reusing an existing built image.

If Binder could support building a repo and tracking changes at the subdirectory level, it would be very helpful to us. What do you folks think?

@betatim
Member

betatim commented Jan 25, 2020

This is unlikely to be implemented. BinderHub operates with the assumption that "the shareable unit is the repo".

There is support for "build from a sub directory" in the underlying tool we use (http://repo2docker.readthedocs.io/); however, we don't expose this at the BinderHub level. Some reasons for this:

  1. The functionality is only asked for by a small number of people (monorepos aren't mega popular in the languages we support).
  2. Our caching mechanisms are based on commit SHAs (a different SHA implies "needs rebuild"; we don't inspect sub-directories).
  3. It is unclear how to expose this in the GUI without making it more complicated or adding a new field (the GUI is already overwhelming for newcomers).
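
To make the caching point concrete, here is a minimal sketch of what a sub-directory-level cache key could look like, as opposed to a key derived from the commit SHA. This is not how BinderHub works today; the function name `subdir_cache_key` and the directory names `team_a` and `team_b` are hypothetical and only serve the illustration.

```python
import hashlib
import os
import tempfile


def subdir_cache_key(repo_root: str, subdir: str) -> str:
    """Return a hex digest that depends only on the files under `subdir`.

    Hypothetical helper: it hashes relative paths and file contents, so an
    edit elsewhere in the repository leaves the key unchanged.
    """
    base = os.path.join(repo_root, subdir)
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(base):
        dirnames.sort()  # visit directories in a deterministic order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, base)
            digest.update(rel.encode("utf-8"))
            digest.update(b"\0")
            with open(path, "rb") as fh:
                digest.update(fh.read())
            digest.update(b"\0")
    return digest.hexdigest()


def write_file(path: str, text: str) -> None:
    """Create parent directories as needed and write `text` to `path`."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        write_file(os.path.join(repo, "team_a", "requirements.txt"), "numpy\n")
        write_file(os.path.join(repo, "team_b", "requirements.txt"), "pandas\n")
        before = subdir_cache_key(repo, "team_a")

        # A change in another team's directory does not affect team_a's key.
        write_file(os.path.join(repo, "team_b", "analysis.ipynb"), "{}")
        print(subdir_cache_key(repo, "team_a") == before)

        # A change inside team_a does.
        write_file(os.path.join(repo, "team_a", "requirements.txt"), "numpy\nscipy\n")
        print(subdir_cache_key(repo, "team_a") == before)
```

With a key like this, a frequently updated master branch would only trigger a rebuild for the sub-directory that actually changed. The cost is the extra inspection of repository contents that a commit-SHA-based cache avoids.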


To help with slow (re)builds via a different route:

Depending on how you specify the dependencies in your repository, you should get (very) good caching of the builds when only notebooks get edited. For example, a repository with a basic requirements.txt will be rebuilt when a notebook in it changes, but because of how the steps in the generated Dockerfile are organised, all these steps should hit the docker build cache (unless that cache isn't present any more). To understand a bit more about how the caching works, there is currently no alternative to diving into the code at a place like https://github.com/jupyter/repo2docker/blob/master/repo2docker/buildpacks/conda/__init__.py#L212. A relevant config option in BinderHub itself is "sticky builds". Some context for related configuration changes that you might want to make to your BinderHub is in #949.
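
The layer-caching behaviour described above can be illustrated with a small model. This is a simplified sketch of chained cache keys, not repo2docker's generated Dockerfile or Docker's actual implementation; the step names and the helpers `layer_keys`, `build_steps` and `reused_layers` are assumptions made for the example.

```python
import hashlib


def layer_keys(steps):
    """Chain-hash build steps so each key depends on every earlier step."""
    keys = []
    parent = ""
    for instruction, inputs in steps:
        h = hashlib.sha256()
        h.update(parent.encode("utf-8"))
        h.update(instruction.encode("utf-8"))
        h.update(inputs)
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def build_steps(requirements: bytes, notebook: bytes):
    """Illustrative build: dependencies are installed before the repo is copied."""
    return [
        ("FROM base-image", b""),
        ("COPY requirements.txt", requirements),
        ("RUN pip install -r requirements.txt", b""),
        ("COPY rest of the repository", requirements + notebook),
    ]


def reused_layers(old_keys, new_keys):
    """Count the leading layers whose cache keys are unchanged."""
    count = 0
    for old, new in zip(old_keys, new_keys):
        if old != new:
            break
        count += 1
    return count


if __name__ == "__main__":
    original = layer_keys(build_steps(b"numpy\n", b"notebook v1"))

    # Only the notebook changed: everything up to the install step is reused.
    notebook_edit = layer_keys(build_steps(b"numpy\n", b"notebook v2"))
    print(reused_layers(original, notebook_edit))  # 3

    # requirements.txt changed: only the base layer is reused.
    deps_edit = layer_keys(build_steps(b"numpy\npandas\n", b"notebook v1"))
    print(reused_layers(original, deps_edit))  # 1
```

In this model, a notebook edit invalidates only the final copy step, so the expensive install step is served from cache as long as the cache still exists on the build node.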

We invested quite a bit of effort in organising the layers to maximise caching. Empirically, it works for our mybinder.org deployment, but it would be interesting to get feedback from other deployments. For this we need a link to a repo that doesn't end up profiting from the caching even though it looks like it should.

@prateekrastogi

prateekrastogi commented Mar 7, 2020

@betatim I am stuck on the problem of supporting different Binder configs for my monorepo examples without needing dangling branches.

I would be glad to have some clean workarounds. Maybe allowing/processing extra URL query parameters, without changing the mybinder GUI itself, would suffice for the advanced use cases we are facing.

My use case is primarily to create a JupyterLab environment for my monorepo, and to add another custom draw.io Docker Binder that cross-connects with the JupyterLab workflow for the same monorepo.

@betatim betatim closed this as completed Apr 26, 2020