suggestions for large repository #22

Closed
softprops opened this issue Aug 24, 2019 · 17 comments

@softprops

I'm experimenting with GitHub Actions on a large code repository at $WORK. We use a mix of Concourse CI and Jenkins and are looking at GitHub Actions as a potential CI/CD tool with less hosting maintenance.

In some simple experiments we found the actions/checkout step to be slower than expected. A checkout step takes ~6m 53s using with.fetch-depth: 1, and that's before my job can do anything useful.

In our Jenkins setup we have a persistent clone that we do local clones from. In Concourse we use the git resource. In both cases the process of fetching a given version of the code feels faster than actions/checkout, and I was wondering if there are any tuning parameters we could apply to speed up the process.

I can see a glimpse into what the action is doing in the output.

git remote add origin https://github.com/{org}/{repo}
git config gc.auto 0
git config --get-all http.https://github.com/{org}/{repo}.extraheader
git config --get-all http.proxy
git -c http.extraheader="AUTHORIZATION: basic ***" fetch --tags --prune --progress --no-recurse-submodules --depth=1 origin +refs/heads/*:refs/remotes/origin/*

Is there a way we can opt out of features like fetching all of the tags (we have a lot of tags) or submodules (we don't use submodules)?
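For reference, plain git can already skip tags and limit a fetch to one branch. The following is a minimal sketch of that idea, not an option of actions/checkout itself. The throwaway local repository, the branch names `main` and `feature`, and the tag `v1.0` are all assumptions made for illustration; against a real remote only the `git fetch` line would change.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
src="$work/src"
dst="$work/dst"

# Build a small throwaway "remote" with two branches and one tag on the tip.
git init -q "$src"
git -C "$src" symbolic-ref HEAD refs/heads/main
git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "first"
git -C "$src" branch feature
git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "second"
git -C "$src" tag v1.0

# Fetch only the main branch at depth 1, and skip tags entirely.
# The narrow refspec replaces the +refs/heads/*:refs/remotes/origin/* wildcard.
git init -q "$dst"
git -C "$dst" remote add origin "file://$src"
git -C "$dst" fetch -q --no-tags --depth=1 origin "+refs/heads/main:refs/remotes/origin/main"

# Count what actually arrived in the destination repository.
tag_count=$(git -C "$dst" tag | wc -l | tr -d ' ')
branch_count=$(git -C "$dst" for-each-ref refs/remotes/origin | wc -l | tr -d ' ')
echo "tags fetched: $tag_count, remote branches fetched: $branch_count"

rm -rf "$work"
```

The `--no-recurse-submodules` flag in the log above already disables submodule fetching, so tags and the wildcard branch refspec are the parts that differ here.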

I started looking at what the Concourse git resource does to draw some comparisons. Looking here, I can see it does something like a clone + checkout operation:

git clone --single-branch $depthflag $uri $branchflag $destination $tagflag

cd $destination

git fetch origin refs/notes/*:refs/notes/* $tagflag

if [ "$depth" -gt 0 ]; then
  "$bin_dir"/deepen_shallow_clone_until_ref_is_found_then_check_out "$depth" "$ref" "$tagflag"
else
  git checkout -q "$ref"
fi

I'm not sure if that's any better or worse, but our experience is that it "feels" more performant than what we see with GitHub's actions/checkout.

@zoispag

zoispag commented Sep 6, 2019

I have a similar problem.

git fetch + git checkout takes ~4 minutes (TeamCity needs a few seconds, because it only applies the changes from the previous run).

I solved it with:

steps:
    - name: Clone working branch
      run: git clone --single-branch --branch ${{ github.head_ref }} --depth 1 https://${{ secrets.CLONE_TOKEN }}:x-oauth-basic@github.com/${{ github.repository }}.git .

Shallow clone takes ~ 30 seconds.

Maybe we need an official shallow-clone action?
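The effect of `--single-branch --depth 1` can be checked without any network access. This is a small sketch under stated assumptions: the three-commit local repository and the branch name `main` are hypothetical stand-ins for a real remote, and the `file://` URL is used because git ignores `--depth` when cloning from a plain local path.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
src="$work/src"

# Throwaway source repository with three commits on main.
git init -q "$src"
git -C "$src" symbolic-ref HEAD refs/heads/main
for i in 1 2 3; do
  git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "commit $i"
done

# Same flags as the workflow step above, pointed at the local source.
git clone -q --single-branch --branch main --depth 1 "file://$src" "$work/shallow"

# Compare history length in the source and in the shallow clone.
full_count=$(git -C "$src" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
is_shallow=$(git -C "$work/shallow" rev-parse --is-shallow-repository)
echo "source commits: $full_count, cloned commits: $shallow_count, shallow: $is_shallow"

rm -rf "$work"
```

On a large repository the saving comes from the same mechanism: only the objects reachable from the single tip commit are transferred.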

@softprops
Author

@zoispag just curious, what was the scope applied to secrets.CLONE_TOKEN vs. the default GITHUB_TOKEN that GitHub provides to each action?

@zoispag

zoispag commented Sep 29, 2019

@softprops I gave repo access.
Because I needed to clone via HTTPS from a private repo, I used the OAuth key of an actual account/GitHub user.

@softprops
Author

Thanks. It would be great if GITHUB_TOKEN had access to do that. I feel hesitant about these personal access tokens with the repo scope because they give access to all of the repos your GitHub user has access to. I believe the GITHUB_TOKEN provided by GitHub is scoped to only the permissions for the specific repo.

In any case, I tried your solution and it worked! I'm down from 7 minutes to about 1m 22s. That's a huge saving. Thank you for posting a reply, and sorry I was late to notice it!

@zoispag

zoispag commented Sep 29, 2019

No worries about the late reply!
I feel the same about the CLONE_TOKEN, but GITHUB_TOKEN did not allow me to perform this task. 😞 I am using a "dedicated" GitHub account, though, which generally acts as a bot, so by definition it has access only to this repo.

7m to 1m22s is a huge improvement, so happy to help! 🎉 😁

@softprops
Author

Codified what worked for us here
https://github.com/meetup/express-checkout

@zoispag

zoispag commented Oct 7, 2019

@softprops I wanted to do the same, but didn't have the time yet.
It would be great to include me as a contributor in your README though 😉

@softprops
Author

@zoispag done! meetuparchive/express-checkout#2

@XhmikosR

This is definitely something that needs to be improved. Fetching all tags and branches can be slow.

@ericsciple
Contributor

fixing this in v2, will hopefully merge early next week

@ericsciple
Contributor

@softprops @XhmikosR checkout v2-beta (now in master). waiting for feedback/stabilization, then will push v2 tag. v2 fetches only a single commit by default

ericsciple added this to the v2 milestone Dec 3, 2019

@stayallive

stayallive commented Dec 3, 2019

@ericsciple I've enabled it in our repo and we went from 30-40 second checkouts to 5-7 seconds ❤️ (even in our teeny tiny 12k-commit repo). This is good! I can only imagine what improvements actual big repos will see!

Thanks for all the work from you and everyone involved!

@softprops
Author

I haven't had a chance to take a look yet, but would it be possible to post a comment to this issue when there's an official v2 release published?

@softprops
Author

I just ran a benchmark on our largest repo. This is now comparable in performance to our workaround. As soon as the beta version changes to v2 we're just going to switch to actions/checkout proper.

@ericsciple
Contributor

published v2 tag

@softprops
Author

Works great! Thanks folks

@gregwym

gregwym commented Dec 19, 2019

With checkout@v2 it does save a lot of bandwidth, but the download speed seems much slower. Please see the comparison below. I suspect this is because GitHub has to do the compression on the fly when fetching with --depth=1?

V1
image

V2
image
