Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix check-added-large-files --enforce-all to correctly consider all git-lfs files. #674

Merged
merged 1 commit into from
Dec 21, 2021

Conversation

amartani
Copy link
Contributor

@amartani amartani commented Oct 21, 2021

git lfs status only outputs status for files that are pending some git-lfs related operation.
For usage with --enforce-all, we need the list of all files that are tracked. Ideally, this would
be done using git lfs ls-files; however, that doesn't provide a parseable output option
(git-lfs/git-lfs#4679). Therefore, this uses git check-attr and uses
the attribute filter=lfs to determine which files to consider as git-lfs files.

Fixes: #560

try:
# This also includes all empty files in the repository:
# https://github.com/git-lfs/git-lfs/issues/4660
lfs_ret = cmd_output('git', 'lfs', 'ls-files', '--name-only')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this output does not work for files with special names -- typically you need structured output (json) or null delimeters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe git-lfs ls-files doesn't have any option to output json. There's an open issue about it: git-lfs/git-lfs#4679

Do you think it's reasonable to add a TODO here to switch to another command when the related issue is fixed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not add the feature with a known shortcoming. I wonder if we can bypass git-lfs entirely and query git's attributes directly? (something something git check-attr filter?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. I've been investigating alternatives here, but I couldn't come up with a good one.

The main issue that I'm concerned with is that it is relatively easy to end up with a file that is marked with filter=lfs in gitattributes but is not actually using git-lfs:

  • If you don't have git-lfs installed, then git will just ignore the lfs filter and will commit as a regular file.
  • If you have it installed, but you edit gitattributes directly instead of using git lfs track and you don't git add --renormalize, then the files are also committed as regular files.

As an aside - it would be great to have a pre-commit check that catches these cases, checking that all files that have filter=lfs in gitattributes are actually being tracked by git-lfs. If you know a good way of enforcing that, then it would be easier to switch this to something else based on git check-attr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, yeah that's tricky -- given git-lfs can't produce a parseable output I don't think we can use that. I think the attributes would be a good compromise though, despite the ways to get around it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good; re-implemented using check-attr instead. PTAL

Do you think it is worth keeping the old lfs_files function, or should I switch both paths (with and without --enforce-all) to use the new approach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah let's switch both of them to be consistent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

pre_commit_hooks/check_added_large_files.py Outdated Show resolved Hide resolved

return set(json.loads(lfs_ret)['files'])
check_attr_ret = cmd_output(
'git', 'check-attr', 'filter', '-z', '--', *filenames,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could exceed commandline lengths -- this should probably use --stdin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Since pre_commit_hooks.util.cmd_output doesn't allow passing an input, I switched to use subprocess.run().

…it-lfs files.

`git lfs status` only outputs status for files that are pending some git-lfs related operation.
For usage with --enforce-all, we need the list of all files that are tracked, which can be
achived by `git lfs ls-files`.

Fixes: #560
Copy link
Member

@asottile asottile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asottile asottile merged commit 829867b into pre-commit:master Dec 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

check-added-large-files doesn't work with --all-files
2 participants