Correctness issue: Change to batched entry upload to support witnessing #1067

Closed · haydentherapper opened this issue Sep 21, 2022 · 6 comments · Fixed by #1461
Labels: bug (Something isn't working)

haydentherapper commented Sep 21, 2022

Witnesses monitor the consistency of the log, verifying that the log is append-only and immutable. Roughly, the verification process for a witness is:

  • Persist a checkpoint (signed tree head)
  • Periodically, request the current checkpoint
  • Verify the signature on the checkpoint
  • Request and verify a consistency proof between the persisted checkpoint and current checkpoint
  • Persist the new checkpoint
  • Publish a countersignature over the current checkpoint

Witnesses also help mitigate split-view attacks, where the log presents different views to different clients. As a consumer, I can use witnesses' countersignatures to verify that all witnesses have seen the same checkpoint. For a client that wants to verify an entry in the log and verify the log is consistent and no split-view attack is occurring, the client needs to:

  • Verify the entry's inclusion proof
    • If presented with an SET (offline promise of inclusion), the client should request an inclusion proof
  • Verify consistency by reaching quorum with a set of witnesses. This involves a) checking signatures for each countersignature from a witness, and b) verifying that the root hash that's been signed over is the same as the root hash of the inclusion proof.
    • Note that a client doesn't need to request a consistency proof if they trust the set of witnesses
    • It's up to the client how many witnesses constitutes quorum
    • Note that quorum could be reached via a gossip protocol too, where witnesses share signatures with one another.

This quorum mechanism requires that witnesses have seen the same root hash, which means the root hash cannot update frequently. With the current state of Rekor, the root hash updates with almost every entry, because we've set Trillian's MMD (maximum merge delay, the time it takes for Trillian to process an entry for inclusion) to 0. As a result, witnesses are highly unlikely to ever observe the same checkpoint.

The proposed fix would be to move towards a model like CT, where entries are batched and processed periodically. Whatever this period is, we can set witnesses to poll more frequently than this period, giving them a chance to witness the same checkpoint. For example, if we set MMD to 5 minutes, we can have witnesses poll every minute or two.

The impact of this change is:

  • The SET will no longer include a log index. This will be a breaking change to clients, because the SET signature is over the log index
  • The SET will no longer include the integrated time. This timestamp will have to be the received/requested time, or a signed timestamp.
  • The response on entry upload will no longer include the inclusion proof, also a breaking client change. I do not see this as a significant problem: technically, an inclusion proof is a stronger statement than an SET (which is just a promise), but the log can always lie about the inclusion proof and present a split view. You still need to a) verify consistency with that inclusion proof's root hash, and b) find quorum. Persisting an inclusion proof does save one call to Rekor, however, since for SET verification you should also request an inclusion proof to confirm that the promise has been resolved and the entry is actually in Rekor.
  • Entries can't be verified online immediately. This is not likely an issue for two reasons:
    • Most Sigstore clients have just been using SETs to "verify inclusion" and never querying the Rekor log
    • Artifacts are likely not going to be verified immediately after upload, but later when they're consumed. We just need to keep the MMD low enough (<1hr would be my recommendation).
haydentherapper added the bug (Something isn't working) and ga_candidate (Proposed blocking issue for GA release) labels Sep 21, 2022
@haydentherapper (Contributor Author)

cc @asraa @priyawadhwa @znewman01 for more discussion

@haydentherapper (Contributor Author)

An alternative to making breaking changes, while still increasing the batch size, would be to keep the current design and increase the MMD to O(minutes). Rekor already polls Trillian until the upload is complete, so this would only require a configuration change. However, the downside to this is that connections will be kept open for minutes, and the SLO for upload would have to be significantly increased. I assume this is not a viable solution due to filling up the connection pool, but maybe there's some way around this? If anyone has knowledge of Go + networking, please chime in!

Also worth noting the current state:

  • We can deploy a GHA-based witness using rekor-monitor. This currently just verifies consistency.
  • Trillian's omniwitness verifies consistency, publishes a countersignature, and gathers signatures from other witnesses ("distributors"). If you look at one distributor, you'll see that it has not been able to gather matching checkpoints from different distributors in months (checkpoint.2/checkpoint.3), and I assume it was only able to do so because of some outage in Rekor.
  • Witnesses verifying consistency without a quorum mechanism will not prevent split-view attacks, because each witness could be presented a different view of the log.

dlorenc commented Sep 22, 2022

I'm not sure I agree that witnessing and gossip protocols are necessary to mitigate split-view attacks. To maintain a long-term split view, the server would need to track individual users across IP addresses over long periods of time in order to target them.

A split-view attack is impractical if you assume that clients/witnesses are relatively anonymous. There's plenty more work that can be done here to make these attacks even harder, but this is still pretty effective today.

znewman01 commented Sep 22, 2022

I still think it's a good idea to make the SET just a promise, and not wait for completion.

This has some advantages architecturally: as the Merkle tree grows, each append will be slower, and batching helps, so we may want to batch for reasons other than security.

EDIT: plus it's weirdly inconsistent with CT

priyawadhwa removed the ga_candidate (Proposed blocking issue for GA release) label Sep 22, 2022

asraa commented Sep 22, 2022

> This has some advantages architecturally: as the Merkle tree grows, each append will be slower, and batching helps, so we may want to batch for reasons other than security.

@rbehjati has also brought up points in favor of batched updates, in the face of the CAP theorem as well; ideally we can make the batching period a configurable parameter of the server.


haydentherapper commented Feb 15, 2023

I have written a proposal for supporting witnessing in Rekor - https://docs.google.com/document/d/1NFSrkcIvrwvqV-2Ax2NoE0FdKh6ccnZufdqawAetKmY/edit?resourcekey=0-p8eFLQub4klCf3wrOAFSTg (shared with sigstore-dev@)

tl;dr: We don't need any breaking changes, and we don't need to switch to batching entries. Rekor will publish a checkpoint periodically that witnesses will sign. As an artifact verifier, I will use this witnessed checkpoint (and the log size from that checkpoint) when computing inclusion proofs. The consequence of this is that artifacts aren't immediately verifiable when added to the log, but this is no different than if we went with a batched approach. This approach also comes with privacy improvements (k-anonymity), since checkpoints no longer uniquely identify an entry.

Proposed changes to Rekor are to:

  • Add an API for fetching a to-be-witnessed checkpoint (with a mechanism to compute this periodically)
  • Add support for fetching an inclusion proof by tree size (supported by Trillian, but not Rekor's current API)

There will be changes to Sigstore clients to support this stronger online verification and witnesses to use the new API.

haydentherapper self-assigned this Feb 15, 2023
haydentherapper added a commit to haydentherapper/rekor that referenced this issue May 2, 2023
Currently, a log witness countersigns the latest checkpoint Rekor
publishes. Rekor updates its checkpoint on every entry upload, which is
extremely frequent. This means that two witnesses are very unlikely to
countersign the same checkpoint. While gossiping, it will not be
possible to reach a consensus on the same checkpoint, and therefore we
can't mitigate split-view attacks.

This change publishes a checkpoint, a "stable checkpoint", every 5
minutes (configurable) to Redis. This runs as a goroutine, with a Redis
key derived from the current time rounded to the nearest 5 minutes.
We use set-if-not-exist for Redis, meaning you can run replicated instances of
Rekor, with all instances writing to the same Redis key.

For a client that wants to gossip, this means waiting 5 minutes before
a checkpoint is published that witnesses will countersign (Note that
this is an area of active development and research too).
The stable checkpoint can be accessed with a query parameter.

Fixes sigstore#1067. There is still
value in batching in terms of reliability, but stable checkpoints
solve the gossiping issue without a breaking change.

Signed-off-by: Hayden Blauzvern <hblauzvern@google.com>
haydentherapper added a commit to haydentherapper/rekor that referenced this issue May 9, 2023
bobcallaway added a commit that referenced this issue May 14, 2023
* Publish stable checkpoint periodically to Redis

* Use latest key to access latest checkpoint
* Add test
* Return early if key already exists
* Add comment explaining failure handling
* Apply suggestions from code review
* Fix goroutine leak, check if redis client is configured
* Add test for goroutine leak

Co-authored-by: Bob Callaway <bobcallaway@users.noreply.github.com>