Customize Docker build image #103

Merged: 1 commit into multi-build:master, Dec 27, 2017

Conversation

@native-api (Contributor)

Use case: when there are multiple libs to build from source. Used in https://github.com/skvark/opencv-python (see opencv/opencv-python#58 (comment)).

@anthrotype (Contributor)

Thanks for this!
For reference: https://github.com/matthew-brett/multibuild/issues/94

@ghost commented Dec 20, 2017

How are these docker images built? Again, I maintain that you don't need a separate manylinux image.

@native-api (Contributor, Author) commented Dec 20, 2017

@skvark, could you comment? You can indeed package whatever additional libraries you need as RPMs or something. Why is that not an option?

@ghost commented Dec 20, 2017

The idea is that anyone can set up multibuild for a package in thirty minutes thanks to strong conventions: library_builders will install any library you need in a reasonable time frame, and setup.py will automatically detect the library locations.

If we allow third-party docker images, we're solving one person's problem, but we're not providing a solution that's going to work for everyone.
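
For readers new to that convention, a project's config.sh under this model is only a few lines. This is a minimal sketch: build_openblas and build_jpeg are builder functions of the kind provided by multibuild's library_builders.sh (treat the exact names as illustrative), and the mypackage import is a placeholder.

function pre_build {
    # Build C dependencies into the shared prefix before the wheel build.
    build_openblas
    build_jpeg
}

function run_tests {
    # Runs in the test environment after the built wheel is installed.
    python -c "import mypackage; print(mypackage.__version__)"
}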

@matthew-brett (Collaborator)

My own feeling is that it might be better to avoid a custom docker image, but we shouldn't legislate against it. Maybe just add a comment in the patch that it's nearly always possible to avoid a custom image by using caching.

@ghost commented Dec 21, 2017

The problem is that if we allow a custom docker image, we'll end up with configuration over convention, which, in the case of building manylinux wheels, doesn't work out of the box. If we don't allow it, then people will come here thinking that they want to use their own docker image, and we can inform them that they should actually be using library_builders.

@matthew-brett (Collaborator)

Could you explain what you mean by "configuration over convention"?

As long as we explain, right next to the configuration option, that this is not the configuration option you're looking for, and maybe add "please add your favorite library recipe to library_builders", it seems to me that we'll likely be OK.

@ghost commented Dec 21, 2017

Could you explain what you mean by "configuration over convention"?

What I mean is that instead of adding a function to library_builders when you need a library (or just using the one that's already there), you'll have to find a docker image that already has the software you want. That doesn't work if a docker image has some of the software but not the rest, and it results in far more maintenance collectively than having a single docker image and some functions in library_builders.

The underlying assumption here is that there's a need for a custom docker image, but there's not. It breaks the entire model of pushing all of the functions upstream so that someone only needs to write a function to build a library once.
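
For illustration, "adding a function to library builders" usually means a few lines that reuse an existing helper. A hedged sketch, assuming multibuild's build_simple helper takes roughly (name, version, base URL); the library name, version, and URL below are placeholders, not a real recipe.

function build_foolib {
    # Hypothetical recipe for library_builders.sh: build_simple is assumed
    # to fetch <url>/<name>-<version>.tar.gz, then configure, make, and
    # install it into the shared build prefix.
    build_simple foolib 1.2.3 https://example.org/downloads
}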

@ghost commented Dec 21, 2017

I looked into this further, and it isn't possible to compile Qt from source on Travis-CI because it takes too long. Ideally we would compile and host a .tar.gz, like is done for OpenBLAS.

@skvark commented Dec 21, 2017

Long build times (Qt) are one of the reasons why I extended the official manylinux images for opencv-python. There are also other dependencies that would make the combined build time even longer. I don't want to create separate packages, host them, and maintain them separately. It was just a lot easier and faster to write a couple of Dockerfiles.

@xoviat I understand your concern about the forked images, but I believe it's not a very common use case.

I was also thinking (I don't know if this is possible) about whether multibuild could somehow create its own custom Docker image from recipes (Dockerfiles) here in the repo. Just an idea. Then everything would be centralized and precompiled.

@ghost commented Dec 21, 2017

I was also thinking (I don't know if this is possible) about whether multibuild could somehow create its own custom Docker image from recipes (Dockerfiles) here in the repo. Just an idea. Then everything would be centralized and precompiled.

That could be one option. If you (and @matthew-brett) are okay with this, we could just switch the default image to your Docker image, but there are a few considerations:

  1. The docker image needs to work for everyone (numpy, scipy, scikit-learn, psutil, etc.); people's builds starting to fail because the image is silently updated is unacceptable.
  2. The docker image needs to result in similar performance (download, compile time) as the current docker image.

Another option is to add Qt to the manylinux docker image; it already has quite a few maintainers. However, IMO the best option is probably to set up a repository attached to CircleCI that builds the especially egregious libraries, like Qt, that take forever to build, and uploads them to wheels.scipy.org; people can then submit PRs to that repository if more libraries need to be added.

@native-api (Contributor, Author)

There's no difference in principle. It all boils down to which kind of hosting you're able to get: RPM hosting, WHL hosting, Docker hosting, ccache hosting (a comparatively recent feature of Travis), etc.

The manylinux project itself uses RPMs from CentOS Vault and Tru Huynh's private CentOS hosting. I haven't seen anything to suggest they provide hosting for built libraries that aren't available from those two sources.

@native-api (Contributor, Author) commented Dec 22, 2017

Anyway, I believe it has been demonstrated that a derived image is a viable way to cache additional libraries, with its pros and cons on par with other methods.

I don't think that using a random custom image by default is a wise move unless the multibuild project is going to maintain it.

@skvark commented Dec 22, 2017

Yeah, I don't recommend using my image. The manylinux image which I used as a base is already pretty old.

Instead of uploading single library artifacts to some location, I would suggest that the separate repository (for example multibuild-manylinux) have a single Dockerfile which uses bash scripts similar to library_builders to construct a new multibuild-specific layer on top of the official manylinux image.

When new build scripts are added or the official manylinux image has been updated, the multibuild-manylinux image is rebuilt and the old image at quay.io is replaced. This would work like the official manylinux image build: it always overwrites the old image at quay.io when new changes are added (https://quay.io/repository/pypa/manylinux1_i686?tab=history).

The image doesn't need to contain all possible libraries; for example, every library that takes over 10 minutes to build would be a good start. How does this sound? This would reduce Linux build times significantly across different projects.
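
As a rough sketch of the idea, here is the kind of script such a multibuild-manylinux Dockerfile might RUN on top of the official manylinux image. The paths and sourcing order follow multibuild's layout as I recall it, and build_qt / build_ffmpeg are hypothetical recipes that would still have to be written.

#!/bin/bash
# Hypothetical prebuild script baked into a "multibuild-manylinux" image.
set -e
source /multibuild/common_utils.sh        # shared helpers (assumed path)
source /multibuild/library_builders.sh    # existing library recipes
# The slow builds that motivated this thread; these recipes do not exist
# yet and would need to be contributed first.
build_qt
build_ffmpeg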

@ghost commented Dec 23, 2017

Which libraries do you specifically need that take a long time to build? https://github.com/dockcross provides docker images as well; I (or you) can ask them whether they're willing to add these libraries to their docker image.

The goal here is to avoid duplicating effort so that this problem only needs to be solved once across the entire ecosystem.

@skvark commented Dec 23, 2017

Qt and FFmpeg (it depends on the machine that builds them, but usually around 5-10 minutes, probably longer on Travis). libjpeg-turbo, nasm, and libvpx also have to be built, and maybe other dependencies in the future. Dockcross is new to me, but it seems to work like I described in my previous message. It would maybe be good to ask about this in the manylinux repo itself and get more opinions on the best way to proceed, e.g. whether it is encouraged to use common extended images like Dockcross or better to add build recipes to the pypa/manylinux repo itself.

@ghost commented Dec 23, 2017

@njsmith Thoughts on adding these libraries to the manylinux image?

@njsmith commented Dec 23, 2017 via email

@ghost commented Dec 24, 2017

So after some consideration, I think the best approach is actually to upload the libraries to PyPI. This can be automated so that there is minimal boilerplate for each new library added. The biggest issue is that Qt cannot actually be built on Travis-CI (it takes too long), but I can take care of everything else.

@ghost commented Dec 24, 2017

Example packages are mkl and intel_openmp. The boilerplate for adding a library like Qt would be a GitHub repository with this config.sh:

function pre_build {
    build_simple [qt]
}
function build_wheel_cmd {
    build_universal_binary
}

Essentially, build_universal_binary would collect the files in the prefix and put them into the wheel.
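
To make that concrete, here is a hedged sketch of what such a helper could do. build_universal_binary is not an existing multibuild function, BUILD_PREFIX is assumed to be the prefix that library_builders installs into, and the package name is illustrative.

function build_universal_binary {
    # Hypothetical helper: stage everything pre_build installed under the
    # build prefix into a Python package directory.
    local pkg=staging/qt_libs          # illustrative package name
    mkdir -p "$pkg"
    touch "$pkg"/__init__.py
    cp -r "$BUILD_PREFIX"/lib "$BUILD_PREFIX"/include "$pkg"/
    # A real implementation would then generate a minimal setup.py in
    # staging/ and run "pip wheel staging/" to produce the library-only wheel.
}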

@anthrotype (Contributor)

upload the libraries to pypi

Interesting, I'd like to know more. You're suggesting one could make a distribution package with no Python content, only some shared libraries. But then where should the libraries be installed inside the Python prefix? Using the data_files keyword in setup.py with some relative path? I imagine this would be different for each platform. And how would other packages make use of these installed libraries? E.g. how can a ctypes or cffi wrapper find these libraries? Or a Cython one, which would need to link at compile time?
I understand this is a bit out of scope for the current issue, but it'd be nice to see some examples (the Intel ones you linked don't provide sources, or I couldn't find them).

@native-api (Contributor, Author) commented Dec 24, 2017 via email

@skvark commented Dec 24, 2017

IMHO, Python packaging tools and PyPI should not be used to distribute arbitrary libraries that have nothing to do with Python. I still think that this PR is completely valid and the simplest way to solve the issue.

@ghost commented Dec 24, 2017

This PR will be merged if I cannot provide an alternative solution to this problem. However, the PR was opened only four days ago and we're only now having this discussion, so I think I deserve a bit more time to fix this issue.

I will take care of everything except providing Qt, and that's only because I cannot build Qt on Travis-CI under the time limit.

@native-api (Contributor, Author)

I'll be monkey-patching the subroutine for now, so you can take your time.
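
For context, the monkey-patching referred to is just redefining one of multibuild's shell functions from the project's own config.sh. A minimal sketch of that pattern, assuming config.sh is sourced after multibuild's own scripts and that build_pip_wheel is the stock wheel-building helper; the overridden function and its body are purely illustrative, not the actual patch meant here.

# In config.sh: a function defined here shadows the stock multibuild one.
function build_wheel {
    echo "running project-specific preparation first"
    # ...custom steps would go here...
    build_pip_wheel "$@"    # then delegate to the assumed stock helper
}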

@matthew-brett (Collaborator)

@xoviat - I'd like to merge this - but we should certainly try to build better alternatives that gradually make it unnecessary ...

ghost merged commit c662450 into multi-build:master on Dec 27, 2017
@ghost commented Dec 28, 2017

Oh, sorry. @matthew-brett I need to update devel and then revert this PR.

@native-api Please file against devel in the future.

ghost mentioned this pull request on Dec 28, 2017
This pull request was closed.