Migrate all branches to LFS #690

Closed · ronag opened this issue Jan 27, 2018 · 18 comments

ronag commented Jan 27, 2018

To properly clean up the root git repository. Applies to the following branches:

remotes/origin/2.1.0
remotes/origin/avoid_readback
remotes/origin/flash_monitor
remotes/origin/master
remotes/origin/matrix-based-transforms

@dotarmin maybe something you could do?

See https://github.com/CasparCG/Server/blob/2.2.0/.gitattributes and `git lfs migrate`, which rewrites history.
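
For reference, LFS tracking rules in a `.gitattributes` file look roughly like this (illustrative patterns only; see the linked file for the actual entries):

```
# Illustrative patterns, not the real contents of the linked .gitattributes
*.zip filter=lfs diff=lfs merge=lfs -text
*.dll filter=lfs diff=lfs merge=lfs -text
```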

ronag commented Jan 27, 2018

```bash
git clone https://github.com/CasparCG/Server
cd Server
git lfs migrate import --include="TODO" \
  --include-ref=refs/heads/master \
  --include-ref=refs/heads/2.1.0 \
  --include-ref=refs/heads/avoid_readback \
  --include-ref=refs/heads/flash_monitor \
  --include-ref=refs/heads/matrix-based-transforms
git push -f --all
```

Or something like that.

Oh, and keep another clone as a backup.
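
For the backup, a mirror clone would keep every ref; a minimal sketch (the destination name is arbitrary):

```bash
# Full copy of all refs before rewriting history; push back from here
# if the migration goes wrong.
git clone --mirror https://github.com/CasparCG/Server Server-backup.git
```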

ronag added this to the 2.3.0 milestone Jan 27, 2018
ronag commented Jan 27, 2018

This should make CI builds a lot faster.

Julusian commented:

I suspect something may need to be done for tags too, but can't be certain without trying it.

ronag modified the milestones: 2.3.0, 2.2.0 Jan 27, 2018
Julusian commented Jan 29, 2018

I've just discovered that LFS on GitHub has quite a small quota of 1 GB, with additional data packs purchasable for storage and bandwidth (50 GB for $5).
https://help.github.com/articles/about-storage-and-bandwidth-usage/

As we are working with a large amount of data (`git lfs pull` says 2.3 GB), the quota and data packs aren't going to last long, especially once automated builds are added into the mix.
Do we have some higher quota from GitHub, or is SVT currently paying for the additional quota? And more importantly, is SVT happy to keep doing so going forward?

One key thing as well is that any forks use the quota of the upstream repo, so any automated builds I set up for my branches will be using your quota. Any push I make to my fork will use ~4 GB of quota (assuming I build 2 versions). Yesterday I pushed 4 commits; if my builds had been running, that would have cost SVT about $1.60 (4 pushes × ~4 GB = 16 GB, at $5 per 50 GB data pack), for changes which I don't even have to submit back in a PR.

Depending on the CI it might be possible to cache the LFS data, but that depends on whether the cache is restored before or after the git clone (travis-ci/travis-ci#8787), and relies on whoever sets up the builds to consider doing that.
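
As a rough illustration of the caching idea (the cache directory is hypothetical, and whether a CI service restores it before the clone varies):

```bash
# Clone without smudging LFS files, point the LFS object store at a
# directory the CI caches between builds, then pull so only missing
# objects are downloaded.
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/CasparCG/Server
cd Server
mkdir -p "$HOME/.lfs-cache"       # hypothetical directory the CI caches
rm -rf .git/lfs
ln -s "$HOME/.lfs-cache" .git/lfs
git lfs pull
```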

Some other projects have faced similar issues and have reconsidered their use of LFS because of it.
streamlink/streamlink#811
azul3d/engine#156

ronag commented Jan 29, 2018

@dotarmin ping

ronag commented Jan 29, 2018

We should maybe just delete the files from history using BFG.
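
A minimal sketch of what that could look like with the BFG Repo-Cleaner (the 10M threshold is an assumption, not a decision):

```bash
# Rewrite history to drop all blobs over 10 MB, then expire the old objects.
git clone --mirror https://github.com/CasparCG/Server Server.git
java -jar bfg.jar --strip-blobs-bigger-than 10M Server.git
cd Server.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive
```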

Julusian commented:

> @dotarmin ping

He is away currently, but will be back tomorrow.

> We should maybe just delete the files from history using BFG.

Would that be instead of using LFS?
I don't mind slightly mangling the history with that, as long as we retain the original somewhere. A separate 'archive' repo under the CasparCG org would be fine, just in case someone still wants to build an older version, especially as 2.0.7 is the latest stable release and it will likely end up with its dependencies stripped.

dotarmin commented Jan 30, 2018

> @ronag: maybe something you could do?

Absolutely, I can do that.

> @Julusian: I suspect something may need to be done for tags too, but can't be certain without trying it.

I think we should start with the branches first and then move on to the tags, but I have to try it out before I can say more about tags specifically.
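
If the tags do need rewriting as well, `git lfs migrate import` has an `--everything` flag that processes all refs, tags included; a sketch, reusing the placeholder pattern from above:

```bash
# Rewrites every local ref, including tags, in one pass.
git lfs migrate import --include="TODO" --everything
```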

dotarmin commented Jan 30, 2018

> I've just discovered that LFS on GitHub has quite a small quota of 1 GB, with additional data packs purchasable for storage and bandwidth (50 GB for $5).
> https://help.github.com/articles/about-storage-and-bandwidth-usage/
>
> As we are working with a large amount of data (`git lfs pull` says 2.3 GB), the quota and data packs aren't going to last long, especially once automated builds are added into the mix.

That's right: when automated builds are added into the mix again, the quota won't last long, just as you say. But this issue only applies if we decide to have a build triggered for each push to GitHub.

If we instead go with the approach of building once or twice per day, we could avoid this problem and wouldn't "waste" bandwidth. Bandwidth is used when a fork is created, when changes are pulled from upstream, and when a file tracked with Git LFS is committed and pushed. If needed, it would still be possible to start a manual build on demand.

Automated builds once or twice per day:
👍 👎

> Do we have some higher quota from GitHub, or is SVT currently paying for the additional quota? And more importantly, is SVT happy to keep doing so going forward?

This isn't an issue for the moment, but it might be if we go with triggering a build for each push/PR merge to upstream, and/or if the bandwidth hits the ceiling all the time.

> One key thing as well is that any forks use the quota of the upstream repo, so any automated builds I set up for my branches will be using your quota. Any push I make to my fork will use ~4 GB of quota (assuming I build 2 versions). Yesterday I pushed 4 commits; if my builds had been running, that would have cost SVT about $1.60, for changes which I don't even have to submit back in a PR.

I will check if there is a way to use cached Git LFS files. If there is, the cached files would be used and you would only receive the files that have changed. But all this depends on how each community member sets up their own build servers, of course; if a fresh clone is done for each triggered build then this won't help.

ping @ronag @Julusian @k0d

As a last way out, maybe we go back to not using Git LFS (phew). What do you think?

dotarmin commented:

> @ronag: We should maybe just delete the files from history using BFG.

I don't have experience with BFG.

Julusian commented Jan 30, 2018

> If we instead go with the approach of building once or twice per day

I don't know if that is easily possible using a service like Travis, but it is via Jenkins. I have no problem with branch builds happening up to a couple of times a day, but I would like to see per-PR builds. There have been a few PRs that don't build (mine included) on at least one platform, and it wasn't noticed until after they were merged, which leaves them for someone else to fix or gets them reverted (both of which have happened recently).

> I will check if there is a way to use cached Git LFS files.

I did have a quick google the other day but didn't find anything. They are cached under `.git/lfs`, so you will only need to pull them once for each tree, but each fresh tree will need to pull them again. The impact of this on CI will vary depending on the service; those which don't do a fresh clone each time (as Jenkins typically doesn't) will be less affected.

> As a last way out, maybe we go back to not using Git LFS

I am leaning this way currently, but am undecided, especially given the issues I was having with LFS not detecting files that should have been included.

Julusian commented Feb 2, 2018

After I found that a clone of 2.2.0 took a long time, I was curious about the clone speed of 2.1.0 vs 2.2.0. You can see my results below.

One big thing to note is that cloning the dependencies from the regular git tree is quicker than pulling them from LFS, and results in considerably less data transfer. I checked, and the unzipped dependencies in 2.1.0 total 2.2 GB, so while slightly less than what 2.2.0 was pulling, the difference is not enough to invalidate any conclusions drawn.

After this, I think that going back to non-LFS is the right way to go. Perhaps it would be better to compress each large file separately though, rather than grouping them into archives, to help reduce churn when dependencies are updated.

I have had a play with the deps in the repo, and by compressing a couple of the larger ones, the clone times are much better than 2.1.0. This is without adding anything back into the history, so the clone time will increase over time, but only when deps are updated, which shouldn't happen often.
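
A sketch of the compress-each-file-separately idea (the directory and size threshold are assumptions):

```bash
# Zip each large dependency on its own, so updating one dep only churns
# one small archive instead of one big bundle.
find dependencies64 -type f -size +10M ! -name '*.zip' | while read -r f; do
  zip -9 "$f.zip" "$f"
done
```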

2.2.0:

```
julus@build:~$ time git lfs clone --single-branch https://github.com/casparcg/server.git --branch=2.2.0
Cloning into 'server'...
remote: Counting objects: 68639, done.
remote: Compressing objects: 100% (184/184), done.
remote: Total 68639 (delta 158), reused 200 (delta 119), pack-reused 68336
Receiving objects: 100% (68639/68639), 84.23 MiB | 8.68 MiB/s, done.
Resolving deltas: 100% (42943/42943), done.
Checking connectivity... done.
Git LFS: (779 of 779 files) 2.40 GB / 2.40 GB

real    5m17.755s
user    1m38.956s
sys     0m49.120s
```

2.2.0 with deps moved in tree (with some in archives to reduce size):

```
julus@build:~$ time git clone --single-branch https://github.com/julusian/casparcg-server.git --branch=2.2.0-no-lfs
Cloning into 'casparcg-server'...
remote: Counting objects: 69036, done.
remote: Compressing objects: 100% (223/223), done.
remote: Total 69036 (delta 165), reused 291 (delta 147), pack-reused 68653
Receiving objects: 100% (69036/69036), 306.14 MiB | 8.69 MiB/s, done.
Resolving deltas: 100% (43135/43135), done.
Checking connectivity... done.

real    0m40.835s
user    0m13.460s
sys     0m8.204s
```

2.1.0:

```
julus@build:~$ time git lfs clone --single-branch https://github.com/casparcg/server.git --branch=2.1.0
Cloning into 'server'...
remote: Counting objects: 59200, done.
remote: Total 59200 (delta 0), reused 1 (delta 0), pack-reused 59199
Receiving objects: 100% (59200/59200), 1.20 GiB | 8.68 MiB/s, done.
Resolving deltas: 100% (38343/38343), done.
Checking connectivity... done.

real    2m41.507s
user    1m3.028s
sys     0m31.064s
```

2.1.0 shallow (useful for CI):

```
julus@build:~$ time git lfs clone --single-branch https://github.com/casparcg/server.git --branch=2.1.0 --depth=1
Cloning into 'server'...
remote: Counting objects: 14606, done.
remote: Compressing objects: 100% (10217/10217), done.
remote: Total 14606 (delta 4752), reused 11457 (delta 4198), pack-reused 0
Receiving objects: 100% (14606/14606), 389.95 MiB | 8.70 MiB/s, done.
Resolving deltas: 100% (4752/4752), done.
Checking connectivity... done.

real    1m6.049s
user    0m27.680s
sys     0m11.252s
```

2.2.0 with deps in tree - shallow:

```
julus@build:~$ time git clone --single-branch https://github.com/julusian/casparcg-server.git --branch=2.2.0-no-lfs --depth=1
Cloning into 'casparcg-server'...
remote: Counting objects: 15797, done.
remote: Compressing objects: 100% (10928/10928), done.
Receiving objects:  88% (13958/15797), 75.79 MiB | 8.68 MiB/s
remote: Total 15797 (delta 4732), reused 15793 (delta 4732), pack-reused 0
Receiving objects: 100% (15797/15797), 243.86 MiB | 8.63 MiB/s, done.
Resolving deltas: 100% (4732/4732), done.
Checking connectivity... done.

real    0m31.650s
user    0m8.916s
sys     0m6.756s
```

ronag commented Feb 4, 2018

@Julusian I've nugetified the Windows build, so the repo should be significantly smaller now.

ronag commented Feb 4, 2018

@Julusian You might want to look into nugetifying CEF (there are NuGet packages for it).
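
For illustration, something along these lines (the package id is one of the available CEF redistributables; the version shown is a placeholder):

```bash
# Fetch a prebuilt CEF package at build time instead of keeping the
# binaries in the repo.
nuget install cef.redist.x64 -Version 3.3239.1723 -OutputDirectory packages
```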

Julusian commented Feb 4, 2018

Yep, that clone time is significantly quicker now:

```
julus@oppenheimer:~/Projects$ time git lfs clone --single-branch https://github.com/casparcg/server.git --branch=2.2.0
WARNING: 'git lfs clone' is deprecated and will not be updated
          with new flags from 'git clone'

'git clone' has been updated in upstream Git to have comparable
speeds to 'git lfs clone'.
Cloning into 'server'...
remote: Counting objects: 70022, done.
remote: Compressing objects: 100% (112/112), done.
remote: Total 70022 (delta 73), reused 156 (delta 70), pack-reused 69839
Receiving objects: 100% (70022/70022), 88.15 MiB | 8.07 MiB/s, done.
Resolving deltas: 100% (43956/43956), done.
Git LFS: (94 of 94 files) 245.93 MB / 245.93 MB

real	0m50.462s
user	0m15.671s
sys	0m5.165s
```

shallow:

```
julus@oppenheimer:~/Projects$ time git lfs clone --single-branch https://github.com/casparcg/server.git --branch=2.2.0 --depth=1
WARNING: 'git lfs clone' is deprecated and will not be updated
          with new flags from 'git clone'

'git clone' has been updated in upstream Git to have comparable
speeds to 'git lfs clone'.
Cloning into 'server'...
remote: Counting objects: 1459, done.
remote: Compressing objects: 100% (971/971), done.
remote: Total 1459 (delta 590), reused 941 (delta 464), pack-reused 0
Receiving objects: 100% (1459/1459), 8.62 MiB | 7.46 MiB/s, done.
Resolving deltas: 100% (590/590), done.
Git LFS: (94 of 94 files) 245.93 MB / 245.93 MB

real	0m36.256s
user	0m10.762s
sys	0m3.760s
```

Julusian commented Feb 4, 2018

@ronag I can take a look at doing that at some point.
Does this mean we want to disable LFS for the repo? What about the remaining Linux packages?

k0d commented Feb 4, 2018

I'll deal with the Linux deps... just recovering from a really nasty flu bug!

ronag commented Feb 4, 2018

@Julusian Once we get all the large deps out, we can remove LFS. Until then it will have to stay.
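
Once the deps are out, turning LFS off would be roughly along these lines (patterns illustrative; `git add --renormalize` needs Git 2.16+):

```bash
# Stop tracking the patterns in .gitattributes, then re-stage the
# affected files as regular git objects.
git lfs untrack '*.zip'
git add --renormalize .
git commit -m "Stop tracking archives with LFS"
```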
