Best practice for migrating an existing Git repo to support LFS? #326
I'm HOPING there is a better way to do this... but after DISCLAIMER: I'm just a stubborn user, I can't promise this will work out for you, so BACK UP first! I have run into a similar situation myself, but in my case all my large files were sitting in submodules. So I was able to remove the submodules, track the files, and add them back (without the submodules, HOORAY!). This way I can say I maintained all my history; it's just that some of the history lives in the submodules. I'm guessing this is not similar to your situation, and that everything is in one big repo.
(Optional) Collect garbage to shrink currently checked out repo
(Optional) Collect garbage on a bare repo (on the remotes)
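To make those optional garbage-collection steps concrete, here is a hedged, self-contained sketch using plain git (no LFS required); the temp repo, file names, and sizes are all made up for illustration. The point is that after a rewrite, the old objects stay on disk until the reflog forgets them and gc prunes them:

```shell
#!/usr/bin/env bash
# Simulate "a big blob became unreachable after a history rewrite",
# then show the two cleanup steps that actually reclaim the space.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q .
git config user.email you@example.com
git config user.name "You"
echo readme > README && git add README && git commit -qm init
git checkout -qb with-big
head -c 1048576 /dev/urandom > big.bin
sha=$(git hash-object big.bin)                # blob id of the large file
git add big.bin && git commit -qm "add big file"
git checkout -q - && git branch -qD with-big  # blob now unreachable from refs
git cat-file -e "$sha"                        # still on disk (reflog holds it)
git reflog expire --expire=now --all          # step 1: forget the old refs
git gc --prune=now -q                         # step 2: actually delete objects
git cat-file -e "$sha" 2>/dev/null && echo "still there" || echo "blob pruned"
```

Without the `reflog expire` step, `git gc` alone will keep the old blobs around for the reflog's grace period, which is why rewritten repos often don't shrink right away.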
I hope this helps! Untested ideas
inside the for loop in step two. It should save that wasted initial rebase, and prevent the need for the... Tested using git-lfs 0.5.1 and git 1.9.4.msysgit.1 (yes, on 64-bit Windows). |
bfg is much more efficient than git-filter-branch, especially for long histories. Here's roughly what I did to convert all my .mov files to lfs:
In a fresh directory:
Back to my src directory:
Kind of a chore to figure out, but now my repo is small and zippy. |
@tlbtlbtlb in your solution, when you check out all the previous versions of your history, are all the .mov files correctly there, or are they missing everywhere except the newest commits? |
They're gone from previous versions. Which is what's necessary to cut the size of the repository. |
I guess I was being stubborn and trying to give anyone an option where they DON'T lose the history of all large files, even though it was stated he'd be happy without them. I guess the slower built-in equivalent to
|
Although I'm happy to lose the history of the large files, I would need them to continue to exist in previous commits - there's no point having any history at all if every previous commit is broken with missing files. |
@strich I agree with the second statement; I'm a little confused about the first. Are you saying that you are OK with losing the contents AND history of any large files NOT on the tip of your current branch, but for the versions of the large files that are there, you want them to exist in previous commits up until that version of the file, and before that they will just not exist (be missing files)? Example: If I understand correctly, are you saying that if I have something like this,
You want to make sure that master and master~1 both point to big1.bin(version 2), but you are ok with big2.bin and big1.bin(version 1) just disappearing? |
Thanks for continuing to reply @andyneff - Yes that is exactly what I'd like to achieve if it is possible. |
Oh, it's possible... My current solution actually preserves ALL the files (so big1.bin (version 1), big1.bin (version 2) AND big2.bin). You are actually asking for a smaller, more limited version. It is possible to use just the latest commit if you add a little more
This will remove all files matching the lfs track patterns (in this case, *.npy) except for those listed in the KEEP_FILE file. The files listed in KEEP_FILE and their history will be maintained. Of course, you can replace the first 5 lines with anything you want to get the list of files you want to keep. Side effect: it is possible that two separate commits will be merged into one, if the only difference between them was a file that is now gone. This can also change the topology of your branches. It will still be "as correct as possible"; only some commit messages would disappear. Of course, it is unlikely this will happen, but just an FYI |
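The KEEP_FILE filtering logic can be sketched in isolation; the .npy file names below are hypothetical, and this demonstrates only the list-filtering step, not the filter-branch run itself:

```shell
#!/usr/bin/env bash
# Given a list of candidate files matching the tracked pattern, drop every
# one except those listed (one exact name per line) in KEEP_FILE.
set -e
work=$(mktemp -d); cd "$work"
printf '%s\n' old1.npy old2.npy latest.npy notes.txt > all_files
printf '%s\n' latest.npy > KEEP_FILE
# grep -x: whole-line match, -F: literal strings, -v: invert, -f: read from file
to_remove=$(grep '\.npy$' all_files | grep -vxF -f KEEP_FILE)
echo "$to_remove"
```

Here `notes.txt` never matches the tracked pattern, `latest.npy` is rescued by KEEP_FILE, and only the stale versions end up on the removal list.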
Other things I've tried that did NOT work out
|
I wrote a simple Java tool which can convert a repository for LFS usage: https://github.com/bozaro/git-lfs-migrate |
Version of the first script that works on OS X with file names that contain spaces (but not newline characters):
git filter-branch --prune-empty --tree-filter '
git lfs track "*.zip"
git lfs track "*.exe"
git add .gitattributes
git ls-files -z | xargs -0 git check-attr filter | grep "filter: lfs" | sed -E "s/(.*): filter: lfs/\1/" | tr "\n" "\0" | while read -r -d $'"'\0'"' file; do
echo "Processing ${file}"
git rm -f --cached "${file}"
echo "Adding $file lfs style"
git add "${file}"
done
' --tag-name-filter cat -- --all
The most unusual part is the -d parameter to read, which sets the delimiter to a NUL byte so that file names with spaces survive. |
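As a standalone illustration of that read loop (the file names below are made up), the NUL delimiter is what lets names with spaces survive the pipeline:

```shell
#!/usr/bin/env bash
# -r disables backslash mangling; -d $'\0' makes read split on NUL bytes
# instead of newlines, so "my movie.mov" stays one token.
set -e
printf 'my movie.mov\0installer v2.exe\0' |
while read -r -d $'\0' file; do
  echo "processing: ${file}"
done
```

This prints one "processing:" line per file, spaces intact, which is exactly what the tree-filter loop above relies on.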
As @tlbtlbtlb mentioned, the BFG is much faster than
Incidentally, the |
I think both of these projects should run at roughly the same speed. |
@andyneff did you keep your original migration script around? I'd be very interested :) I'm asking as I'm in the same situation you had: I have a whole lot of binary files in a submodule, and I'd like to merge the submodule to the main repository while converting the files inside to LFS (and keep the whole history of course). The script you posted doesn't seem to deal with this, as this wasn't requested in this issue (I admit I didn't test it yet). |
@ltrzesniewski When I said "I maintained my submodule history", what I meant was that I kept the submodules hosted, but abandoned using them for future commits. This means if I went back in history before the conversion, it would check out the submodule and use it. This is very clunky and not really a great idea... I believe the new preferred way, as @rtyley pointed out, is using bfg to convert a repo to lfs, now that it has lfs support. So I believe what you could do is
Where M are your main repo commits, and S are submodule commits. This means that the versions of the submodule the M... To summarize, I only know of two tricks
I see references to another merge method I'm unfamiliar with, maybe it can help you. http://stackoverflow.com/a/8901691/4166604 |
@andyneff thanks for your help, I appreciate it very much! Now I understand what you meant in your first post; I guess I was just prone to wishful thinking, but you cleared up the confusion - I basically thought you already had a solution for that PERFECT method you describe 😉 In my case I don't really need to keep the commit history of the submodule, but I need to keep track of the relevant file versions referenced by the main repo, so I was thinking about performing a submodule checkout for each tree in the |
@andyneff |
I am in a slightly different situation where I have an orphan branch called design in a repo where we store sketch files and other assets. I am OK with losing the history since the project is less than a month old, and I already tried to do so, but it doesn't seem to work. A couple of questions:
|
This command needs to run only once per user machine. Usually it's run by the git-lfs installer. It adds to $HOME/.gitconfig lines like:
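For reference, on recent git-lfs releases the added lines look roughly like this (older 0.x versions used a slightly different form and omit the process entry):

```ini
[filter "lfs"]
	clean = git-lfs clean -- %f
	smudge = git-lfs smudge -- %f
	process = git-lfs filter-process
	required = true
```

These filter entries are what make git transparently swap pointer files for real content on checkout, and vice versa on staging.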
|
I hope I'm not beating a dead horse, but.. This little script is still probably slower than bfg, but I couldn't figure out how to get bfg to honor my lfs remote location. So, I wanted to build on the work from @andyneff and @vmrob and make the filter-branch commands they provided faster. git filter-branch --prune-empty --tree-filter '
git config -f .gitconfig lfs.url "http://artifactory.local:8081/artifactory/api/lfs/git-lfs"
git lfs track "*.exe" "*.gz" "*.msi" "*.pdf" "*.ppt" "*.pptx" "*.rar" "*.vdx" "*.vsd" "*.war" "*.xls" "*.xlsm" "*.xlsx" "*.zip" > /dev/null
git add .gitattributes .gitconfig
git ls-files | xargs -d "\n" git check-attr filter | grep "filter: lfs" | sed -r "s/(.*): filter: lfs/\1/" | xargs -d "\n" -r -n 50 bash -c "git rm -f --cached \"\$@\"; git add \"\$@\"" bash \
' --tag-name-filter cat -- --all
By combining the "git lfs track" lines into one and by using "xargs -n 50", I was able to cut down on git invocations by more than 50 per revision, in my case. (Way too many binaries in our repository!) That made things FAR faster... It handles spaces in the filenames as well. It seems to be working on Linux, but I can't comment on whether it would work on Mac OS X. |
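The batching effect of xargs -n 50 can be demonstrated on its own (with dummy input; assumes GNU xargs for the -d flag):

```shell
#!/usr/bin/env bash
# 120 inputs in batches of 50: the child command runs only 3 times
# (50 + 50 + 20) instead of once per input.
set -e
seq 1 120 | xargs -d "\n" -n 50 bash -c 'echo "batch of $# files"' bash
```

Inside the tree-filter, each of those batches is a single `git rm`/`git add` pair, which is where the per-revision speedup comes from.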
@kalibyrn |
Super easy to use and worked a treat. Thanks @bozaro ! |
Does |
Because the file object blobs do change (from a large file to a text pointer), by the design of git the blob SHA changes, and therefore the commit SHA. The result is that there isn't a way, in git, to change the content of blobs without changing the SHAs. |
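A quick demonstration of that content-addressing point, using only git hash-object (the payload strings are placeholders):

```shell
#!/usr/bin/env bash
# git names a blob by hashing its content, so replacing a large file with a
# small LFS pointer necessarily yields a new blob SHA, which ripples up into
# new tree SHAs and new commit SHAs.
set -e
big=$(printf 'pretend this is 500MB of binary data' | git hash-object --stdin)
ptr=$(printf 'version https://git-lfs.github.com/spec/v1\n' | git hash-object --stdin)
echo "large-file blob: $big"
echo "lfs-pointer blob: $ptr"
test "$big" != "$ptr" && echo "different content => different SHA"
```

Since every commit hashes over its tree, and every tree over its blobs, there is no layer at which the swap could be hidden.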
@jamesblackburn But git could add a feature (or plugin) to fake the SHA for some special blobs (blobs that have their SHA hard-coded). The problem with changing the commit SHA is that you suddenly lose all of the references to old commits (issue trackers, wikis, URLs... become invalid). |
Faking the SHA would be VERY bad (if it were even possible). The SHAs are different; they NEED to be pulled down. If they looked the same, other people fetching the latest version wouldn't know they need the new SHAs. As for your issue tracking, etc. problem... yes, those would be broken. There are the git replace and git grafts features... I'm not sure if those could help, and I don't think... You are justified to be worried about all the SHAs changing, but it is necessary. The entire graph (at least for the branch you convert over) still retains its original topology; only the SHAs in the graph will change (as of the first lfs file, at least). I don't remember if... So, a few points
|
git-lfs-migrate has an option now for mapping old commit hashes to new ones. I'm linking to the pull request for adding --annotate-id-prefix since I installed from a non-master branch, but it may be in master at some point soon: bozaro/git-lfs-migrate#24 |
@Permafacture Any idea if this option exists in this project's |
Isn't |
@revolter it didn't exist when this thread was created. FWIW I tried to use it to convert a large repo just after it was released, but with no luck. @bozaro's tool worked just fine, although I had to revert the "Dramatically reduce memory usage" commit (so it used lots of memory but it actually finished before the heat death of the universe 😉). Maybe now the issues are ironed out, I don't know. |
LFS v2.3.0 improves |
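For anyone landing here now, an untested sketch of the built-in approach (assumes git-lfs 2.3.0+ is installed; it rewrites history, so work on a backup clone, and the include patterns are just examples):

```shell
# Preview which file types dominate the repository's history:
git lfs migrate info --everything
# Rewrite every local branch and tag, converting matching files to LFS pointers:
git lfs migrate import --include="*.mov,*.zip" --everything
```

Like the filter-branch and BFG approaches above, this changes every affected SHA, so it needs a coordinated force-push and fresh clones for collaborators.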
Haven't looked into it, but I don't see any documentation saying this feature (updating all the old commit messages to include the original hash) exists with that tool.
|
I'd like to give Git LFS a good crack with our existing ~25GB Git repo but I'm at a bit of a loss as to how we should go about migrating it to support LFS.
I'm happy to lose all history of the existing large files I'd like to track with LFS.
Do I need to write a script to find and purge all the filetypes I want to track with LFS from the existing Git history, untrack them, then retrack them with LFS?