Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

300 remember all previously verified configurations #665

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

steinybot
Copy link
Contributor

Initial attempt at solving #300.

This works by creating git blobs for each part of the signature and then taking each object id (SHA1) and pairing that with the hash (SHA256) of the contents of that blob. Those pairs are then written to another blob, delimited by a space, and one pair per line.

When checking if a signature has been verified then we compare the contents of the signature blob and not just the hash.

We can't just rely on the object id (SHA1) as collision attacks are feasible. A malicious repo owner could use such an attack to create an object with the same SHA1 as the configuration and hook contents.

I have kept the overcommit.configuration.signature in the git config for now but I don't think it is necessary. Removing it would loose the ability to show the No previously recorded signature for configuration file error but that doesn't seem like a huge loss. It may still be useful for change detection which is what the previous signature_changed? method did but nothing actually used it for that purpose, it was for signature verification instead.

I haven't tried using this locally yet so it could do with some manual testing before merging.

There are quite a few tests that fail for me but they also fail on master. Will see what the tests do on this PR.

Let me know what you think.

Copy link
Owner

@sds sds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pull request with tests, @steinybot.

Some questions:

  • What does using Git's blob storage ultimately get you?
  • Will these blobs always remain local? (i.e. can they be pushed to the remote? If they could, that would be bad as an attacker could push fake verified blobs that are automatically accepted by others)
  • When are unused blobs garbage collected?
  • How would one "unverify" a configuration with this scheme?

In short, it would be great to understand why we're using blob storage when a file under the .git directory would suffice and not raise the above questions. This justification should be included in the PR description.

See #505 for the previous attempt at this feature. Excited for us to proceed if we can have satisfactory answers to the above questions.

#
# @param arg [Hash]
# @return [String]
def sign_signatures(arg)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto on comment about arg name.

end

# Returns the object signature for some blob.
#
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing doc for the argument type.

#
# @param arg [String]
# @return [String]
def sign(arg)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could there be a more descriptive name than the generic term arg?

@steinybot
Copy link
Contributor Author

Thanks for the pull request with tests, @steinybot.

Some questions:

  • What does using Git's blob storage ultimately get you?

The reason was to solve the "quick lookup" requirement since the time taken to lookup a git object is same as retrieving a file. That depends on the file system but in the worst case it is no worse than how well git itself is performing.

Having it as git objects also makes it possible to audit signatures, although in its current state this doesn't work very well as there needs to be a tree object to group them all together.

If we do it right (i.e. using a tree) then there would be a trivial way of seeing all the overcommit hooks and configuration without any other files and also do diffs between them without including changes from the rest of the repository.

Git is going to move to SHA-256 at some point so this would simplify things.

I also wasn't really a big fan of putting non-git stuff in the .git directory but this isn't really a big deal so long as it is namespaced in a way that makes it obvious that it is for overcommit and avoids collisions. This is not uncommon. Sourcetree creates a .git/sourcetreeconfig file and GitHub creates .git/pull/ID/head for PRs (which only exist on the server unless you fetch them explicitly).

  • Will these blobs always remain local? (i.e. can they be pushed to the remote? If they could, that would be bad as an attacker could push fake verified blobs that are automatically accepted by others)

From a normal users perspective yes they would be local since there wouldn't be a refspec that points to them so they wouldn't be pushed BUT now I realise that a malicious attacker could intentionally make these signature objects reachable from a refspec (i.e. branch or tag) and then a fetch would retrieve them. This is a serious problem.

  • When are unused blobs garbage collected?

That's something I hadn't considered. Having unreachable objects is not a good idea.

  • How would one "unverify" a configuration with this scheme?

With the current implementation then it would be to run git prune.

In short, it would be great to understand why we're using blob storage when a file under the .git directory would suffice and not raise the above questions. This justification should be included in the PR description.

See #505 for the previous attempt at this feature. Excited for us to proceed if we can have satisfactory answers to the above questions.

Clearly there are some things that I regrettably overlooked. The current problems are:

  1. Having the trust stored only in objects is flawed since it is not possible to prevent an attacker from creating arbitrary objects.
  2. Objects must be reachable otherwise gc will remove them.

To solve 1 I think we should be using proper digital signatures rather than just hashes, something like Ed25519. With this we would generate a private key and store that in ~/.overcommit/keys/ and have the path to that in the local git config. The key could also be password protected and we just prompt for it when signing. The signatures themselves can be public or not, it no longer matters.

Having public signatures means that it wouldn't be a big step to start using something like PGP to create a web of trust. In my opinion having everyone on a team being responsible for reviewing every single config change is a bit of a nightmare and usually just results in everyone signing it without checking.

We could add a new ref like how git-notes work, something like .git/refs/overcommit/signatures pointing to a tree. Each entry in that tree is a blob with the name that is the hash of the tree and the contents is the signature of the whole tree.

This doesn't quite solve 2 but that could also just be a new ref under .git/refs/overcommit/configurations which points to a tree where each other tree is a subtree.

The downside of using the refs in this way is that the tree grows as the number of signatures grow. I'm not sure if git notes has this problem but it would seem like it would. We could always limit the branching factor on the tree if it becomes a problem.

Thoughts?

@sds
Copy link
Owner

sds commented Sep 20, 2019

Thanks for your reply. Apologies for the delay in response, as I am traveling.

If I simplify what you're proposing down to to its core elements, it appears you are asking:

  1. Whether to use public key-based signatures to utilize a trust model similar to PGP
  2. Whether to use a different ref space to store signatures (such as the one used by git-notes)

These still seem more complicated than a simple file of SHA256 signatures, for benefits that don't seem to have strong justification.

With respect to 1: a web of trust model adds complexity and is still considered broken.

With respect to 2: I've never used git-notes, but this seems like we are tying our implementation to another Git feature, rather than simply taking a hash of a configuration and storing that in a file. Furthermore, if it's in a ref space, could an attacker inject notes? Can you trust notes?

My personal experience running this project is that additional complexity needs to have strong reasoning in order to be included. Revoking a signed configuration should be simple (delete a line). Revoking all signed configurations should be simple (delete a file).

If I've mischaracterized these suggestions, please correct me. At this point in time, I am still strongly in favor of someone taking #505 to completion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants