Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TAP introducing snapshot Merkle trees #125

Merged
merged 13 commits into from
Mar 1, 2021

Conversation

mnm678
Copy link
Contributor

@mnm678 mnm678 commented Sep 17, 2020

This TAP introduces snapshot Merkle trees as a scalable solution to snapshot metadata for large repositories. This grew out of a discussion of metadata overheads associated with the Notary v2 effort.

There is also a draft of the reference implementation available at theupdateframework/python-tuf#1113.

@mnm678
Copy link
Contributor Author

mnm678 commented Sep 17, 2020

cc @JustinCappos @joshuagl

internal nodes of a Merkle tree contain the hash of the leaf nodes. The exact
algorithm for generating this Merkle tree (ie the order of leaves in the hash,
how version information is encoded), is left to the implementer, but this
algorithm should be documented in a POUF so that implementations can be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
algorithm should be documented in a POUF so that implementations can be
algorithm should be documented in a [POUF](https://github.com/theupdateframework/taps/blob/master/tap11.md) so that implementations can be

longer valid. At this point, the repository may garbage collect all snapshot
Merkle metadata files.

# Security Analysis
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the descriptions and analysis of these attacks, although I think it could be made clearer to non-experts by illustrating a short and concrete example. WDYT?

forward attack recovery in a similar manner, by instructing clients to delete
stored version information after a timestamp key replacement.

## Mix and match attack
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I think it's important to clarify that although mix-and-match attacks by MitM attackers are not possible, they are possible to do by attackers who control timestamp and snapshot keys.


# Abstract

To optimize the snapshot metadata file size for large registries, registries
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

registry? is this the same as repository?

* +TUF-Version:
* +Post-History:

# Abstract
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This chapter feels quite heavy on technical detail and light on the high level impact of this TAP

Comment on lines 41 to 45
could significantly impact the metadata overhead. For example, if a repository
has 50,000,000 targets, the snapshot metadata will be about 380,000,000 bytes
(https://docs.google.com/spreadsheets/d/18iwWnWvAAZ4In33EWJBgdAWVFE720B_z0eQlB4FpjNc/edit?ts=5ed7d6f4#gid=0).
For this reason, it is necessary to create a more scalable solution for snapshot
metadata that does not significantly impact the security properties of TUF.
Copy link
Member

@jku jku Sep 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reasoning seems sound but the wording is misleading I think: the issue isn't the number of targets alone but the design decision of 1 target per metadata file? So it would be more accurate to say

For example, if a repository
has 10,000,000 target metadata files, the snapshot metadata will be about 380,000,000 bytes

or did I misunderstand? (it's entirely possible: I'm not able to understand most of the spreadsheet with the context I have)

Comment on lines 201 to 202
Merkle tree. As long as the client checks for an auditor’s verification, the
client will not install the rolled-back version of the target.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be clear about how this proposal compares to current situation -- both when clients accept only 3rd party audited trees and when auditors are not used.

This actually applies to all security impact sections: Clear comparison to status quo would be great

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely agree, I would love to see a concluding paragraph for each of these security analysis that compares snapshot Merkle trees, both with and without auditors, to the current status quo.

@mnm678
Copy link
Contributor Author

mnm678 commented Sep 18, 2020

Thanks for the reviews @trishankatdatadog and @jku! I pushed some new commits to address your comments, especially in the security analysis section.

This information will be included in the following metadata format:
```
{ “leaf_contents”: {METAFILES},
“Merkle_path”: {INDEX:HASH}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think you meant merkle_path.

{ “leaf_contents”: {METAFILES},
“Merkle_path”: {INDEX:HASH}
“path_directions”:{INDEX:DIR}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is worthwhile to scope out how the hash algorithm is specified, in case we wanted to switch implementations over time, both for how a hash tree is computed (for example, I just found out about Verkle Trees), and the underlying hash algorithm itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current design leaves the hash algorithm to the implementer (to be specified in a POUF). Although, it might be nice to include the algorithm in the timestamp metadata along with the root hash so that the algorithm could change over time without needing a major version release.

Switching from a Merkle tree to something like a Verkle tree might be trickier without a major version release as it changes the whole algorithm for computation rather than just the hash algorithm used.


By replacing a single snapshot metadata file with individual snapshot Merkle
metadata files, this TAP reduces the metadata overhead for repositories with
large numbers of targets metadata files.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be worth at least hinting at the security trade-offs and the need for an auditor here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I added a sentence below

protect against mix-and-match attacks and rollback attacks. In order to provide
these protections, snapshot metadata is responsible for keeping track of the
version number of each targets metadata file, ensuring that all targets downloaded are
from the same snapshot, and ensuring that no target file decreases its version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from the same snapshot, and ensuring that no target file decreases its version
from the same snapshot, and ensuring that no targets metadata file decreases its version

we develop must provide these same protections.

A snapshot Merkle tree manages version information for each targets metadata file by including
this information in each leaf node. By using a Merkle tree to store these nodes,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this information in each leaf node. By using a Merkle tree to store these nodes,
this information in a leaf node for each targets metadata file. By using a Merkle tree to store these nodes,

candidate-snapshot-merkle-tap.md Outdated Show resolved Hide resolved
and ensure that the version information has not decreased for each leaf.
Alternatively, the repository may provide auditors with information about the
contents and ordering of leaf nodes so that the auditors can more efficiently
verify the entire tree.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd appreciate some discussion here of what happens if an auditor starts watching a repository that has already gone through several Merkle tree generations. Is it a problem that the auditor hasn't validated all trees generated by the repository?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From context in the security analysis section below, I think an auditor is valid so long as it has the full Merkle tree since the last timestamp key rotation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a paragraph about fast forward attack recovery that describes which versions should be verified by new auditors

TUF currently protects against rollback attacks by checking the current time
signed by timestamp and ensuring that no version information provided by
snapshot has decreased since the last update. With both of these protections,
the client is secure against a rollback attack to any version released
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the client is secure against a rollback attack to any version released
a client that has a copy of trusted metadata is secure against a rollback attack to any version released

somewhat implicit in the "before the previous update cycle" below, but possibly worth being explicit

client verification and by third party auditors. If no keys are compromised,
the timestamp keys protect against a rollback attack by ensuring a valid
snapshot Merkle tree. If the timestamp key is compromised, the client
verification of previous Merkle trees provides rollback protection for the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only for a client with previously downloaded Merkle trees? Or are we suggesting that a client could (should?) download previous Merkle trees to ensure the targets metadata file they are downloading isn't susceptible to a rollback attack (I think so based on context below)? In which case, I'd like to see the TAP suggest when the download of previous snapshot Merkle trees might be performed and how.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a description of the client previous tree verification to the Merkle tree verification section.

@mnm678
Copy link
Contributor Author

mnm678 commented Sep 23, 2020

Thanks for the review @joshuagl! I responded to your comments inline.

Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for taking so long to get back to this PR @mnm678, I've reviewed it this week with fresh eyes. I really like this idea, I think we would benefit from a few additional steps before we merge as a draft:

  • minor edits of language for clarity, suggested in-line
  • summarising each subsection of the security analysis with a concluding statement comparing snapshot Merkel trees, with and without an auditor, to the current status quo
  • better quantifying how much less data clients will need to download under this model
  • assign the TAP a number

Before the TAP comes out of draft, I think it would be good to also add:

  • PoC implementation of an auditor
  • PoC and recommendations for integrating an auditor into the client workflow

candidate-snapshot-merkle-tap.md Outdated Show resolved Hide resolved
candidate-snapshot-merkle-tap.md Outdated Show resolved Hide resolved
candidate-snapshot-merkle-tap.md Outdated Show resolved Hide resolved
candidate-snapshot-merkle-tap.md Outdated Show resolved Hide resolved
candidate-snapshot-merkle-tap.md Outdated Show resolved Hide resolved
candidate-snapshot-merkle-tap.md Outdated Show resolved Hide resolved
metadata that does not significantly impact the security properties of TUF.

We designed a new approach to snapshot that improves scalability while
achieving similar security properties to the existing snapshot metadata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we quantify this improved scalability a bit? It would help readers understand the motivation if we indicate how much snapshot metadata an average client will need to download on each update cycle compared to the 380,000,000 bytes above?

Comment on lines 15 to 31
To optimize the snapshot metadata file size for large repositories, repositories
can use a snapshot Merkle tree to conceptually store version information about
all images in a single snapshot without needing to distribute this entire
snapshot to all clients. To do so, it puts version information for each targets
metadata file into a leaf of the Merkle tree, and distributes snapshot Merkle
files containing this leaf information and a path to the root of the Merkle tree.
Clients can then compare this information with a Merkle root provided in the
timestamp metadata. To prove that there has not been a reversion of the
snapshot Merkle tree when downloading an image, the client and third-party
auditors download the prior snapshot Merkle trees and check that the version
numbers did not decrease at any point.

By replacing a single snapshot metadata file with individual snapshot Merkle
metadata files, this TAP reduces the metadata overhead for repositories with
large numbers of targets metadata files. It maintains the security protections
of the single snapshot metadata file in part through the use of third party
auditors that check for rollback attacks anywhere in the Merkle tree.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From TAP 1 the purpose of the Abstract is to provide:

a short (~200 word) description of the technical issue being addressed

therefore, I tried to condense what you've written here into just a description of the issue being addressed:

Suggested change
To optimize the snapshot metadata file size for large repositories, repositories
can use a snapshot Merkle tree to conceptually store version information about
all images in a single snapshot without needing to distribute this entire
snapshot to all clients. To do so, it puts version information for each targets
metadata file into a leaf of the Merkle tree, and distributes snapshot Merkle
files containing this leaf information and a path to the root of the Merkle tree.
Clients can then compare this information with a Merkle root provided in the
timestamp metadata. To prove that there has not been a reversion of the
snapshot Merkle tree when downloading an image, the client and third-party
auditors download the prior snapshot Merkle trees and check that the version
numbers did not decrease at any point.
By replacing a single snapshot metadata file with individual snapshot Merkle
metadata files, this TAP reduces the metadata overhead for repositories with
large numbers of targets metadata files. It maintains the security protections
of the single snapshot metadata file in part through the use of third party
auditors that check for rollback attacks anywhere in the Merkle tree.
Snapshot metadata for large repositories, with a high number of targets
metadata files (through significant use of delegations), can become
prohibitively large. Due to the need to download the snapshot file on every
update cycle, very large snapshot metadata files can become a significant
overhead for TUF clients.
This TAP proposes a method for reducing the size of snapshot metadata a client
must download without significantly weakening the security properties of TUF.

Comment on lines 201 to 202
Merkle tree. As long as the client checks for an auditor’s verification, the
client will not install the rolled-back version of the target.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely agree, I would love to see a concluding paragraph for each of these security analysis that compares snapshot Merkle trees, both with and without auditors, to the current status quo.

@@ -0,0 +1,293 @@
* TAP:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this will become TAP 16, but let's be sure to assign the correct number (and rename the file) before we merge this PR.

Copy link
Contributor Author

@mnm678 mnm678 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @joshuagl!

Comment on lines 113 to 115
sibling nodes needed to reconstruct the tree during verification (see diagram).
In addition the path should contain direction information so that the client
will know whether each node is a left or right sibling when reconstructing the
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a matter of perspective. Each node contains the hashes of it's child nodes, but to re-construct a node's parent, you need that node's sibling. I'll work on making that more clear.

## Auditing Merkle trees

In order to ensure the validity of all targets metadata version information in the
Merkle tree, third-party auditors should validate the entire tree each time it
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the TUF spec is a bit inconsistent about capitalizing these, so I'm not sure whether to do so here.

@mnm678
Copy link
Contributor Author

mnm678 commented Jan 26, 2021

@joshuagl This is ready for another review. I addressed everything except the auditor implementation in the PoC, which I will do before this moves out of the draft stage.

@mnm678
Copy link
Contributor Author

mnm678 commented Feb 1, 2021

I added a PoC auditor implementation. It doesn't yet include client/auditor interaction, but it demonstrates how an auditor could download and verify all snapshot merkle metadata.

Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the continued iteration on this TAP Marina! I think this looks in a good shape to merge as a draft to the TAPs repository.

I appreciate you sharing a diff of the reference POUF. However, I don't think we should be merging that until the reference implementation is updated, could you remove that commit from this PR?
Aside: the updated POUF doesn't quite answer the questions I was expecting, either. From the TAP:

The exact algorithm for generating this Merkle tree (ie the order of nodes in the hash, how version information is encoded, etc.), is left to the implementer, but this
algorithm should be documented in a POUF

I currently need to read the PoC code and the POUF to understand how to generate the Merkle tree.


Following are some thoughts on considerations/work items once the PR is merged:

  • A reader has to understand Merkle trees to implement this TAP, for example to understand how to compute the root hash from a leaf node. Is that a concern? Should we link to a good overview of Merkle trees? Should we specify how this is expected to work?
    • are the metadata extensions for the Merkle tree truly abstract enough to support arbitrary (non-Merkle) tree algorithms? Should we PoC some other algorithms? perhaps in a different implementation?
  • Further PoC development of auditor integrations?
  • Recommendations on specifying the algorithms for a POUF
  • The Specification section suggests the snapshot Merkle tree replaces the single snapshot metadata file – is there any reason we shouldn't generate both? If we generate a Merkle tree only, should integrations still have a snapshot role with associated key? Should we explore this a bit more?
  • In places the TAP seems to suggest an auditor is required, whereas in others it indicates auditors are optional. Let's be sure to clarify.

tap16.md Show resolved Hide resolved
mnm678 and others added 11 commits February 3, 2021 10:36
Signed-off-by: marinamoore <mnm678@gmail.com>
The TUF spec uses the term repository to refer to the server that
stores updates.

Signed-off-by: marinamoore <mnm678@gmail.com>
Signed-off-by: marinamoore <mnm678@gmail.com>
Signed-off-by: marinamoore <mnm678@gmail.com>
Add attack descriptions and comparison to the existing TUF specification

Signed-off-by: marinamoore <mnm678@gmail.com>
Signed-off-by: marinamoore <mnm678@gmail.com>
Add clarifications based on feedback on the pr.

Signed-off-by: marinamoore <mnm678@gmail.com>
Co-authored-by: Joshua Lock <jlock@vmware.com>
This commit adds clarifications to:
* The size of snapshot merkle metadata
* Information included in the merkle metadata
* Fast forward attack recovery

Signed-off-by: Marina Moore <mnm678@gmail.com>
Signed-off-by: Marina Moore <mnm678@gmail.com>
Add a quick comparison to the existing specification
to each section of the security analysis to make it
clear what the differences are with and without auditors.

Signed-off-by: Marina Moore <mnm678@gmail.com>
Credit to @joshuagl for helping with the text here.
This commit makes the abstract shorter and more in line
with the requirements of TAP 1.

Signed-off-by: Marina Moore <mnm678@gmail.com>
Add a section that describes a few options for how clients
can verify that a Merkle tree has been verified by an auditor.

Signed-off-by: Marina Moore <mnm678@gmail.com>
@mnm678
Copy link
Contributor Author

mnm678 commented Feb 3, 2021

Thanks again for the review @joshuagl!

I appreciate you sharing a diff of the reference POUF. However, I don't think we should be merging that until the reference implementation is updated, could you remove that commit from this PR?
Aside: the updated POUF doesn't quite answer the questions I was expecting, either. From the TAP:

The exact algorithm for generating this Merkle tree (ie the order of nodes in the hash, how version information is encoded, etc.), is left to the implementer, but this
algorithm should be documented in a POUF

I currently need to read the PoC code and the POUF to understand how to generate the Merkle tree.

I moved the POUF changes to #133. I'll continue to iterate on this there.

Following are some thoughts on considerations/work items once the PR is merged:

I moved these to #134 for further discussion.

Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the continued iteration on this TAP Marina. I think this is ready for merging as a draft.

@mnm678 mnm678 merged commit 49dcf1d into theupdateframework:master Mar 1, 2021
@mnm678 mnm678 deleted the snapshot-merkle branch March 1, 2023 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants