-
-
Notifications
You must be signed in to change notification settings - Fork 709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix check-added-large-files --enforce-all to correctly consider all git-lfs files. #674
Conversation
try: | ||
# This also includes all empty files in the repository: | ||
# https://github.com/git-lfs/git-lfs/issues/4660 | ||
lfs_ret = cmd_output('git', 'lfs', 'ls-files', '--name-only') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this output does not work for files with special names -- typically you need structured output (json) or null delimeters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe git-lfs ls-files doesn't have any option to output json. There's an open issue about it: git-lfs/git-lfs#4679
Do you think it's reasonable to add a TODO here to switch to another command when the related issue is fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not add the feature with a known shortcoming. I wonder if we can bypass git-lfs
entirely and query git's attributes directly? (something something git check-attr filter
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. I've been investigating alternatives here, but I couldn't come up with a good one.
The main issue that I'm concerned with is that it is relatively easy to end up with a file that is marked with filter=lfs
in gitattributes but is not actually using git-lfs:
- If you don't have git-lfs installed, then git will just ignore the lfs filter and will commit as a regular file.
- If you have it installed, but you edit gitattributes directly instead of using
git lfs track
and you don'tgit add --renormalize
, then the files are also committed as regular files.
As an aside - it would be great to have a pre-commit check that catches these cases, checking that all files that have filter=lfs
in gitattributes are actually being tracked by git-lfs. If you know a good way of enforcing that, then it would be easier to switch this to something else based on git check-attr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm, yeah that's tricky -- given git-lfs
can't produce a parseable output I don't think we can use that. I think the attributes would be a good compromise though, despite the ways to get around it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good; re-implemented using check-attr instead. PTAL
Do you think it is worth keeping the old lfs_files
function, or should I switch both paths (with and without --enforce-all
) to use the new approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah let's switch both of them to be consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
return set(json.loads(lfs_ret)['files']) | ||
check_attr_ret = cmd_output( | ||
'git', 'check-attr', 'filter', '-z', '--', *filenames, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could exceed commandline lengths -- this should probably use --stdin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Since pre_commit_hooks.util.cmd_output
doesn't allow passing an input, I switched to use subprocess.run()
.
…it-lfs files. `git lfs status` only outputs status for files that are pending some git-lfs related operation. For usage with --enforce-all, we need the list of all files that are tracked, which can be achived by `git lfs ls-files`. Fixes: #560
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
git lfs status
only outputs status for files that are pending some git-lfs related operation.For usage with --enforce-all, we need the list of all files that are tracked. Ideally, this would
be done using
git lfs ls-files
; however, that doesn't provide a parseable output option(git-lfs/git-lfs#4679). Therefore, this uses
git check-attr
and usesthe attribute
filter=lfs
to determine which files to consider as git-lfs files.Fixes: #560