300 remember all previously verified configurations #665

steinybot · 2019-09-01T04:55:50Z

Initial attempt at solving #300.

This works by creating git blobs for each part of the signature and then taking each object id (SHA1) and pairing that with the hash (SHA256) of the contents of that blob. Those pairs are then written to another blob, delimited by a space, and one pair per line.

When checking if a signature has been verified then we compare the contents of the signature blob and not just the hash.

We can't just rely on the object id (SHA1) as collision attacks are feasible. A malicious repo owner could use such an attack to create an object with the same SHA1 as the configuration and hook contents.

I have kept the overcommit.configuration.signature in the git config for now but I don't think it is necessary. Removing it would loose the ability to show the No previously recorded signature for configuration file error but that doesn't seem like a huge loss. It may still be useful for change detection which is what the previous signature_changed? method did but nothing actually used it for that purpose, it was for signature verification instead.

I haven't tried using this locally yet so it could do with some manual testing before merging.

There are quite a few tests that fail for me but they also fail on master. Will see what the tests do on this PR.

Let me know what you think.

sds

Thanks for the pull request with tests, @steinybot.

Some questions:

What does using Git's blob storage ultimately get you?
Will these blobs always remain local? (i.e. can they be pushed to the remote? If they could, that would be bad as an attacker could push fake verified blobs that are automatically accepted by others)
When are unused blobs garbage collected?
How would one "unverify" a configuration with this scheme?

In short, it would be great to understand why we're using blob storage when a file under the .git directory would suffice and not raise the above questions. This justification should be included in the PR description.

See #505 for the previous attempt at this feature. Excited for us to proceed if we can have satisfactory answers to the above questions.

sds · 2019-09-07T18:06:46Z

lib/overcommit/signature.rb

+      #
+      # @param arg [Hash]
+      # @return [String]
+      def sign_signatures(arg)


Ditto on comment about arg name.

sds · 2019-09-07T18:07:16Z

lib/overcommit/signature.rb

+      end
+
+      # Returns the object signature for some blob.
+      #


Missing doc for the argument type.

sds · 2019-09-07T18:32:43Z

lib/overcommit/signature.rb

+      #
+      # @param arg [String]
+      # @return [String]
+      def sign(arg)


Could there be a more descriptive name than the generic term arg?

steinybot · 2019-09-10T22:10:23Z

Thanks for the pull request with tests, @steinybot.

Some questions:

What does using Git's blob storage ultimately get you?

The reason was to solve the "quick lookup" requirement since the time taken to lookup a git object is same as retrieving a file. That depends on the file system but in the worst case it is no worse than how well git itself is performing.

Having it as git objects also makes it possible to audit signatures, although in its current state this doesn't work very well as there needs to be a tree object to group them all together.

If we do it right (i.e. using a tree) then there would be a trivial way of seeing all the overcommit hooks and configuration without any other files and also do diffs between them without including changes from the rest of the repository.

Git is going to move to SHA-256 at some point so this would simplify things.

I also wasn't really a big fan of putting non-git stuff in the .git directory but this isn't really a big deal so long as it is namespaced in a way that makes it obvious that it is for overcommit and avoids collisions. This is not uncommon. Sourcetree creates a .git/sourcetreeconfig file and GitHub creates .git/pull/ID/head for PRs (which only exist on the server unless you fetch them explicitly).

Will these blobs always remain local? (i.e. can they be pushed to the remote? If they could, that would be bad as an attacker could push fake verified blobs that are automatically accepted by others)

From a normal users perspective yes they would be local since there wouldn't be a refspec that points to them so they wouldn't be pushed BUT now I realise that a malicious attacker could intentionally make these signature objects reachable from a refspec (i.e. branch or tag) and then a fetch would retrieve them. This is a serious problem.

When are unused blobs garbage collected?

That's something I hadn't considered. Having unreachable objects is not a good idea.

How would one "unverify" a configuration with this scheme?

With the current implementation then it would be to run git prune.

In short, it would be great to understand why we're using blob storage when a file under the .git directory would suffice and not raise the above questions. This justification should be included in the PR description.

See #505 for the previous attempt at this feature. Excited for us to proceed if we can have satisfactory answers to the above questions.

Clearly there are some things that I regrettably overlooked. The current problems are:

Having the trust stored only in objects is flawed since it is not possible to prevent an attacker from creating arbitrary objects.
Objects must be reachable otherwise gc will remove them.

To solve 1 I think we should be using proper digital signatures rather than just hashes, something like Ed25519. With this we would generate a private key and store that in ~/.overcommit/keys/ and have the path to that in the local git config. The key could also be password protected and we just prompt for it when signing. The signatures themselves can be public or not, it no longer matters.

Having public signatures means that it wouldn't be a big step to start using something like PGP to create a web of trust. In my opinion having everyone on a team being responsible for reviewing every single config change is a bit of a nightmare and usually just results in everyone signing it without checking.

We could add a new ref like how git-notes work, something like .git/refs/overcommit/signatures pointing to a tree. Each entry in that tree is a blob with the name that is the hash of the tree and the contents is the signature of the whole tree.

This doesn't quite solve 2 but that could also just be a new ref under .git/refs/overcommit/configurations which points to a tree where each other tree is a subtree.

The downside of using the refs in this way is that the tree grows as the number of signatures grow. I'm not sure if git notes has this problem but it would seem like it would. We could always limit the branching factor on the tree if it becomes a problem.

Thoughts?

sds · 2019-09-20T12:52:37Z

Thanks for your reply. Apologies for the delay in response, as I am traveling.

If I simplify what you're proposing down to to its core elements, it appears you are asking:

Whether to use public key-based signatures to utilize a trust model similar to PGP
Whether to use a different ref space to store signatures (such as the one used by git-notes)

These still seem more complicated than a simple file of SHA256 signatures, for benefits that don't seem to have strong justification.

With respect to 1: a web of trust model adds complexity and is still considered broken.

With respect to 2: I've never used git-notes, but this seems like we are tying our implementation to another Git feature, rather than simply taking a hash of a configuration and storing that in a file. Furthermore, if it's in a ref space, could an attacker inject notes? Can you trust notes?

My personal experience running this project is that additional complexity needs to have strong reasoning in order to be included. Revoking a signed configuration should be simple (delete a line). Revoking all signed configurations should be simple (delete a file).

If I've mischaracterized these suggestions, please correct me. At this point in time, I am still strongly in favor of someone taking #505 to completion.

steinybot added 7 commits September 1, 2019 16:15

Store signatures as git blobs

3530e4a

Store hook signatures as git blobs

b55d9e9

Store hook signatures as git blobs

64d7863

Fix git config arguments

67bf108

Fix verify signatures result

fa39dea

Configure RSpec example status persistence file

e20d4d6

Fix hook signer tests

15dd91d

sds reviewed Sep 7, 2019

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

300 remember all previously verified configurations #665

300 remember all previously verified configurations #665

steinybot commented Sep 1, 2019

sds left a comment

sds Sep 7, 2019

sds Sep 7, 2019

sds Sep 7, 2019

steinybot commented Sep 10, 2019

sds commented Sep 20, 2019

300 remember all previously verified configurations #665

Are you sure you want to change the base?

300 remember all previously verified configurations #665

Conversation

steinybot commented Sep 1, 2019

sds left a comment

Choose a reason for hiding this comment

sds Sep 7, 2019

Choose a reason for hiding this comment

sds Sep 7, 2019

Choose a reason for hiding this comment

sds Sep 7, 2019

Choose a reason for hiding this comment

steinybot commented Sep 10, 2019

sds commented Sep 20, 2019