Make our CI faster #1655

Closed
gabrieldemarmiesse opened this issue Apr 13, 2020 · 10 comments

@gabrieldemarmiesse
Member

Currently our CI takes ~30 min to run. After some analysis, here are the critical path and the big steps:

  • Building the wheel on linux: 25m
    • pull the docker image: 3m
    • install TF and dependencies: 2m
    • build CPU and GPU ops: 8m (building CPU ops only takes 2m in another build, so we can assume 6m for building the GPU ops)
    • Run all tests: 6m, I should re-enable pytest-xdist here.
    • Rebuild all ops for the wheel: 5m (because the bazel cache was cleaned before)
  • Test the release wheel on Windows: 4m10s
  • Upload to pypi, but not really: 25s.

I'll work toward making this faster, help and ideas are welcome too of course.
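As a quick sanity check on the breakdown above, the durations can be added up. This is only a sketch: the numbers are copied from the list, while the dictionary names and the `fmt` helper are made up for illustration.

```python
# Durations in seconds, copied from the breakdown in this issue.
LINUX_WHEEL_SUBSTEPS = {
    "pull the docker image": 3 * 60,
    "install TF and dependencies": 2 * 60,
    "build CPU and GPU ops": 8 * 60,
    "run all tests": 6 * 60,
    "rebuild all ops for the wheel": 5 * 60,
}
CRITICAL_PATH = {
    "build the wheel on linux": 25 * 60,
    "test the release wheel on windows": 4 * 60 + 10,
    "upload to pypi, but not really": 25,
}


def fmt(seconds):
    """Format a duration in seconds as e.g. '4m10s'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m{secs:02d}s"


substeps_total = sum(LINUX_WHEEL_SUBSTEPS.values())
critical_total = sum(CRITICAL_PATH.values())
# Time inside the Linux job that the listed sub-steps do not explain.
unaccounted = CRITICAL_PATH["build the wheel on linux"] - substeps_total

print("Linux sub-steps:", fmt(substeps_total))
print("Critical path:  ", fmt(critical_total))
print("Unaccounted:    ", fmt(unaccounted))
```

The sub-steps add up to about a minute less than the 25m reported for the Linux job, and the three critical-path steps add up to just under the ~30 min total.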

@bhack
Contributor

bhack commented Apr 13, 2020

Can you point me to the sources of the CI scripts?
Could we get any speed-up from https://github.com/actions/cache?

@bhack
Contributor

bhack commented Apr 13, 2020

@gabrieldemarmiesse
Member Author

Caching may help, but it may also make things slower. It's usually hard to know until we try.
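One way to reason about this trade-off is a rough break-even estimate. The model below is a deliberately simplified assumption (a fixed per-run overhead for restoring and saving the cache, and a saving only on cache hits). The function name and the overhead and hit-rate values are hypothetical; only the 5m rebuild time comes from the breakdown above.

```python
def net_saving_seconds(build_saved, cache_overhead, hit_rate):
    """Expected seconds saved per CI run under a simple cache model.

    build_saved:    build time avoided when the cache hits
    cache_overhead: time spent restoring/saving the cache on every run
    hit_rate:       fraction of runs where the cache is usable (0.0 to 1.0)
    """
    return hit_rate * build_saved - cache_overhead


# 5m is the "rebuild all ops for the wheel" step from the breakdown.
REBUILD_SECONDS = 5 * 60
# Hypothetical overhead for downloading and uploading the cache.
OVERHEAD_SECONDS = 120

for hit_rate in (0.25, 0.5, 1.0):
    saving = net_saving_seconds(REBUILD_SECONDS, OVERHEAD_SECONDS, hit_rate)
    verdict = "faster" if saving > 0 else "slower"
    print(f"hit rate {hit_rate:.2f}: {saving:+.0f}s per run ({verdict})")
```

With these made-up numbers the cache only pays off once it hits on more than 40% of runs, which is why the real effect is hard to predict without measuring.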

@bhack
Contributor

bhack commented Apr 13, 2020

I haven't built the Linux wheel Dockerfile with the progress option, but I suppose the main problem behind the 25m is that /root/.cache/bazel is an empty volume when the Action runs, unlike in a local build, right?

@bhack
Contributor

bhack commented Apr 15, 2020

I've asked at actions/cache#260 whether there is a way to use actions/cache to cache the BuildKit RUN --mount=type=cache mount.

@bhack
Contributor

bhack commented Apr 16, 2020

In the meantime I think that

--cache-from=type=registry,ref=$CACHE_TAG
--cache-to=type=registry,ref=$CACHE_TAG,mode=max

could work if we have a registry to push to. See https://medium.com/titansoft-engineering/docker-build-cache-sharing-on-multi-hosts-with-buildkit-and-buildx-eb8f7005918e
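To make the two flags above concrete, here is a minimal sketch of how a CI script might assemble the full `docker buildx build` invocation. The helper name, the registry reference, and the Dockerfile name are placeholders, not the project's actual setup. Only the `--cache-from` and `--cache-to` values come from the suggestion above.

```python
import shlex


def buildx_command(cache_tag, dockerfile, context="."):
    """Return the argv for a buildx build that reads and writes a registry cache."""
    return [
        "docker", "buildx", "build",
        # Reuse layers exported by a previous run, if the ref exists.
        f"--cache-from=type=registry,ref={cache_tag}",
        # Export all layers (mode=max), including intermediate stages.
        f"--cache-to=type=registry,ref={cache_tag},mode=max",
        "--file", dockerfile,
        context,
    ]


# Placeholder values: a real setup needs a registry the CI can push to.
cmd = buildx_command(
    cache_tag="registry.example.com/some-org/build-cache:latest",
    dockerfile="Dockerfile.example",
)
print(shlex.join(cmd))
```

The command is only printed here. Running it would require Docker with buildx enabled and push credentials for the registry.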

@gabrieldemarmiesse
Member Author

I did that a while ago; it didn't seem very fast to me at the time. If someone wants to try again, a pull request is welcome: #1043

@seanpmorgan
Member

@gabrieldemarmiesse Can this be closed or are there pending actions we need to do?

@gabrieldemarmiesse
Member Author

Let's close it, as there is no obvious path right now.

@bhack
Contributor

bhack commented May 15, 2020

If you want, you can subscribe to tensorflow/build#5, because we are also touching on the Bazel cache for GitHub Actions there.
That ticket was created yesterday by one of the SIG Build leaders after our chat on the SIG Build Gitter channel.
