diff --git a/README.md b/README.md index feca40ff..abf259a6 100644 --- a/README.md +++ b/README.md @@ -4,23 +4,27 @@

Project Oak Logo

The Transparent Release project builds on the idea of binary transparency and extends it in a few -directions, as described below. +directions, as described below. ## Release Transparency The goal of release transparency is to generate and transparently publish non-forgeable metadata about a released binary. We refer to this kind of metadata as the endorsement statement. It can be -used in scenarios where the identity of a binary must be verified before use. One prominent use -case is remote attestation. See [Oak’s remote attestation protocol](https://github.com/project-oak/oak/blob/main/docs/remote-attestation.md) for instance. +used in scenarios where the identity of a binary must be verified before use. One prominent use case +is remote attestation. See +[Oak’s remote attestation protocol](https://github.com/project-oak/oak/blob/main/docs/remote-attestation.md) +for instance. The life cycle of an endorsement statement has three phases as shown in the diagram below: ![The journey of a software binary](docs/images/release-transparency.png) ### The pre-release phase -In this phase, a trusted builder, for instance [the generic slsa-generator GitHub workflow](https://github.com/slsa-framework/slsa-github-generator/blob/de4491844e9be4184f786666af40f5b1b8e7ddc0/internal/builders/generic/README.md), -generates a binary and a corresponding SLSA provenance statement. The trusted builder also signs -the generated SLSA provenance statement. A provenance statement is another type of metadata about a + +In this phase, a trusted builder, for instance +[the generic slsa-generator GitHub workflow](https://github.com/slsa-framework/slsa-github-generator/blob/de4491844e9be4184f786666af40f5b1b8e7ddc0/internal/builders/generic/README.md), +generates a binary and a corresponding SLSA provenance statement. The trusted builder also signs the +generated SLSA provenance statement. 
A provenance statement is another type of metadata about a binary that describes how and from which sources the binary was generated. Both the SLSA provenance statements, and our endorsement statements are customizations of the in-toto statements standard, and refer to one or more binaries as the subject of the statement. A binary can be uniquely @@ -28,60 +32,65 @@ identified by its cryptographic hash. The cryptographic hash function often used binary is SHA256. For details about the format of the provenance statement please refer to [the SLSA provenance documentation](https://slsa.dev/provenance/v0.2). -We need provenance statements to be signed and published in a transparency log. This log, in case -of the generic slsa-generator GitHub workflow, is an instance of Rekor hosted by sigstore. The -binary is often uploaded to a storage server for future use. This can be content-addressable storage -such as [ent](https://github.com/google/ent). - -To support the pre-release phase we provide the [cmd/builder](/cmd/README.md#building-binaries-using-the-cmdbuilder-tool) tool and a [guide on how to get started on your repo](https://github.com/project-oak/hello-transparent-release). +We need provenance statements to be signed and published in a transparency log. This log, in case of +the generic slsa-generator GitHub workflow, is an instance of Rekor hosted by sigstore. The binary +is often uploaded to a storage server for future use. This can be content-addressable storage such +as [ent](https://github.com/google/ent). +To support the pre-release phase we provide the +[cmd/builder](/cmd/README.md#building-binaries-using-the-cmdbuilder-tool) tool and a +[guide on how to get started on your repo](https://github.com/project-oak/hello-transparent-release). ### The release phase + A binary is often released when some release-readiness criteria are met. Evaluating -release-readiness criteria often requires manual effort. 
As a result, triggering the release -process is often manual as well. To transparently document the fact that a binary meets the +release-readiness criteria often requires manual effort. As a result, triggering the release process +is often manual as well. To transparently document the fact that a binary meets the release-readiness criteria and is endorsed for use, in the release phase, an endorsement statement -is generated, signed, and published transparently. +is generated, signed, and published transparently. An endorsement statement specifies that a binary, identified by its cryptographic hash, is endorsed for use for a specific time range. An endorsement statement is a special type of claim in our -design. For more information about the format of an endorsement statement see [the claim format](docs/claim-transparency.md#the-claim-format) -and [this example endorsement](schema/claim/v1/example.json). +design. For more information about the format of an endorsement statement see +[the claim format](docs/claim-transparency.md#the-claim-format) and +[this example endorsement](schema/claim/v1/example.json). Endorsement statements can be generated using a tool that we call _endorser_. Given a binary, a non-empty list of its provenances, and a validity time range, the endorser generates an endorsement statement. The release team has to separately sign and publish the endorsement statement in a -transparency log. +transparency log. Before generating the endorsement statement, the endorser performs a number of verification steps. Ideally the verification logic should be flexible enough to allow verifying arbitrary conditions (including some or all of the release-readiness criteria). However, currently we have only a minimal implementation that verifies the presence and correctness of at least one provenance statement for -the binary. +the binary. The endorser must be fed with any required information to perform the verification. 
In its simplest form, and as shown in the diagram, the binary and its provenance are fed as inputs to the endorser. Strictly speaking, the binary itself is not necessary as an input. What we actually need is the -cryptographic hash of the binary. The input provenances used during the verification are included -as evidence in the generated endorsement statement. The endorsement statement format allows -including different types of evidence, but with the minimal verification, the only type of evidence -is a provenance statement. Note that while provenances are often signed by the trusted builder -generating the binary, an endorsement statement must be signed by the team releasing the binary. +cryptographic hash of the binary. The input provenances used during the verification are included as +evidence in the generated endorsement statement. The endorsement statement format allows including +different types of evidence, but with the minimal verification, the only type of evidence is a +provenance statement. Note that while provenances are often signed by the trusted builder generating +the binary, an endorsement statement must be signed by the team releasing the binary. It is possible to provide multiple provenances, generated and signed by different trusted builders, as input to the endorser. This provides stronger trust on the claim provided by the provenance statements. In such situations, all provenances must be included in the list of evidence in the -endorsement statement. +endorsement statement. #### Passive Revocation + The most important piece of information that distinguishes an endorsement statement from a provenance statement is the validity time range. Since endorsement statements are published in a transparency log, and therefore cannot be removed or modified, a mechanism is required for revoking them once they no longer are valid (e.g., once a security vulnerability is discovered in the endorsed binary). 
Revocation is often implemented by specifying a validity time range. This -mechanism is called passive revocation. +mechanism is called passive revocation. ### The post-release phase + After a binary is released, and an endorsement statement is generated and transparently published, the endorsement statement can be used wherever trust in the binary, its validity, and its origin is required. One such use case is remote attestation. In this scenario, an endorsement statement is @@ -96,21 +105,20 @@ An endorsement statement is in essence a claim about a binary. A wide range of c endorsements, can be imagined about binaries. For instance, any release-readiness criteria can be stated in terms of a claim about the binary. [Claim Transparency](docs/claim-transparency.md) provides a generic and customizable format for specifying claims and exemplifies a few types of -security and privacy claims about binaries. +security and privacy claims about binaries. ## Policy Transparency -Policy transparency builds on top of claim transparency, and allows combining claims, according to -a policy, to either make a decisions (e.g., to trust a binary for a specific purpose or not), or -generate a summary of a large number of claims. Summary claims allow reusing the verification -effort across invocations. + +Policy transparency builds on top of claim transparency, and allows combining claims, according to a +policy, to either make a decisions (e.g., to trust a binary for a specific purpose or not), or +generate a summary of a large number of claims. Summary claims allow reusing the verification effort +across invocations. There is a duality between claims and policies. A claim about a binary may imply that a specific policy was followed when releasing a binary. On the other hand, a policy can specify if a claim about a binary can be believed or trusted. 
More specifically, policies can be used for stating release-readiness criteria, or for specifying conditions for accepting a binary, for instance in a -remote attestation scenario. - -Read more about Policy Transparency in [Policy Transparency: Authorization Logic Meets General Transparency to Prove Software Supply Chain Integrity](https://research.google/pubs/pub51673/). - - +remote attestation scenario. +Read more about Policy Transparency in +[Policy Transparency: Authorization Logic Meets General Transparency to Prove Software Supply Chain Integrity](https://research.google/pubs/pub51673/). diff --git a/docs/claim-transparency.md b/docs/claim-transparency.md index bc4d994f..dd87986f 100644 --- a/docs/claim-transparency.md +++ b/docs/claim-transparency.md @@ -3,9 +3,9 @@ The following diagram shows the journey that software takes from code to a deployable application used by an end user (either as an application deployed locally or as a remote server). During this process several artifacts (e.g., code, software libraries, and binaries) are generated and -transformed into another (e.g., through compiling and linking). The premise of software supply -chain security is that many things could go wrong during this process, either due to human error -or attacks on the software supply chain by malicious actors. +transformed into another (e.g., through compiling and linking). The premise of software supply chain +security is that many things could go wrong during this process, either due to human error or +attacks on the software supply chain by malicious actors. ![The journey of a software binary](images/journey.png) @@ -13,13 +13,15 @@ To provide assurances to the end users about the security and privacy of a softw the transparent release project our goal is to provide transparency into the build and release processes. 
In our solution, in each step of the process software artifacts are being reviewed and audited and the result, a claim about the security and privacy properties of the software artifact, -is signed and published into a [transparency log](https://continusec.com/static/VerifiableDataStructures.pdf). -The goal is to make these claims easily discoverable so that anyone can use the claims in the -assessment of privacy policies. +is signed and published into a +[transparency log](https://continusec.com/static/VerifiableDataStructures.pdf). The goal is to make +these claims easily discoverable so that anyone can use the claims in the assessment of privacy +policies. ## The Claim Format -We define the following structure, based on the [in-toto Statement standard](https://github.com/in-toto/attestation/blob/main/spec/README.md#statement), +We define the following structure, based on the +[in-toto Statement standard](https://github.com/in-toto/attestation/blob/main/spec/README.md#statement), for specifying security and privacy claims. This format is meant to be generic and allow specifying different types of claims. @@ -47,6 +49,7 @@ different types of claims. } } ``` + Section [Examples](##Examples) demonstrates the customization and use of the claim format via a number of examples. @@ -54,26 +57,31 @@ number of examples. This section describes the semantics of each field in the claim format: -- **subject** _(array of objects, required)_: - Set of artifacts (e.g., source code, or some binary) that the claim applies to. +- **subject** _(array of objects, required)_: Set of artifacts (e.g., source code, or some binary) + that the claim applies to. - **subject[*].digest** and **subject[*].name** as defined by Statement in the in-toto standard. -- **claimType** _(string ([TypeURI](https://github.com/in-toto/attestation/blob/main/spec/field_types.md#TypeURI)), required)_: - URI indicating what type of claim was issued. It determines the meaning of claimSpec and evidence below. 
-- **issuedOn** _(string ([Timestamp](https://github.com/in-toto/attestation/blob/main/spec/field_types.md#Timestamp)), required)_: - The timestamp at which this claims was generated. -- **validity** _(object, required)_: - Validity duration of the claim. It enables implementing passive revocation. - - - **validity.notBefore** _(string ([Timestamp](https://github.com/in-toto/attestation/blob/main/spec/field_types.md#Timestamp)), required)_: - The timestamp from which the claim is effective, and the artifact is endorsed for use. Must be - equal or after the issuedOn timestamp. - - **validity.notAfter** _(string ([Timestamp](https://github.com/in-toto/attestation/blob/main/spec/field_types.md#Timestamp)), required)_: - The timestamp of when the artifact is no longer endorsed for use. - -- **claimSpec** _(object, optional)_: - Gives a detailed description of the claim, and the steps that were taken to perform the assessment - of the artifact in the subject. This is an arbitrary JSON object with a schema defined by - claimType. Depending on the claimType, the claimSpec could be anything, including: +- **claimType** _(string + ([TypeURI](https://github.com/in-toto/attestation/blob/main/spec/field_types.md#TypeURI)), + required)_: URI indicating what type of claim was issued. It determines the meaning of claimSpec + and evidence below. +- **issuedOn** _(string + ([Timestamp](https://github.com/in-toto/attestation/blob/main/spec/field_types.md#Timestamp)), + required)_: The timestamp at which this claims was generated. +- **validity** _(object, required)_: Validity duration of the claim. It enables implementing passive + revocation. + + - **validity.notBefore** _(string + ([Timestamp](https://github.com/in-toto/attestation/blob/main/spec/field_types.md#Timestamp)), + required)_: The timestamp from which the claim is effective, and the artifact is endorsed for + use. Must be equal or after the issuedOn timestamp. 
+ - **validity.notAfter** _(string + ([Timestamp](https://github.com/in-toto/attestation/blob/main/spec/field_types.md#Timestamp)), + required)_: The timestamp of when the artifact is no longer endorsed for use. + +- **claimSpec** _(object, optional)_: Gives a detailed description of the claim, and the steps that + were taken to perform the assessment of the artifact in the subject. This is an arbitrary JSON + object with a schema defined by claimType. Depending on the claimType, the claimSpec could be + anything, including: - A free-text description of the claim and the review/audit process. A certain type of claim with a more detailed schema for claimSpec may explicitly capture such details as the scope, @@ -87,10 +95,9 @@ This section describes the semantics of each field in the claim format: - An auto-generated report, for instance a fuzz testing report from ClusterFuzz. - A [datasheet about a dataset](https://arxiv.org/abs/1803.09010). -- **evidence** _(array of objects, optional)_: - The collection of artifacts that were generated during the assessment to support the claim, or - existing claims that were assumed to be true, and were used as input to the assessment process. - Some examples of evidence include: +- **evidence** _(array of objects, optional)_: The collection of artifacts that were generated + during the assessment to support the claim, or existing claims that were assumed to be true, and + were used as input to the assessment process. Some examples of evidence include: - Provenance - Reports from executed test suites @@ -98,52 +105,42 @@ This section describes the semantics of each field in the claim format: - Audits of earlier versions of the same artifact (e.g., source code). For instance if an earlier version had a rigorous external audit, for a new revision, the audit/review could focus on the diff (e.g., cargo has a feature for it: `review --diff`). 
A suite of regression tests or - security analysis tools dedicated to checking specific security properties, could be very - useful in such cases. + security analysis tools dedicated to checking specific security properties, could be very useful + in such cases. The reliance on the evidence is not quantified. So there is not a field for stating the level of - trustworthiness or relevance for a piece of evidence. Instead, all included pieces of evidence - are treated the same. Note that claimSpec may still distinguish between them based on their roles. - - - **evidence[*].role** _(string, required)_: - This field is used to specify the type and role of the evidence within the claim. The meaning - of it is specified by claimType and within the context of claimSpec. - - **evidence[*].uri** _(string ([ResourceURI](https://github.com/in-toto/attestation/blob/main/spec/field_types.md#ResourceURI)), required)_: - An evidence could be another claim (possibly of another claimType) or a report publicly - available from a URI. Either way, the URI should be provided in this field. - - **evidence[*].digest** _(object ([DigestSet](https://github.com/in-toto/attestation/blob/main/spec/field_types.md#DigestSet)), required)_: - Collection of cryptographic digests for the contents of this artifact. + trustworthiness or relevance for a piece of evidence. Instead, all included pieces of evidence are + treated the same. Note that claimSpec may still distinguish between them based on their roles. + + - **evidence[*].role** _(string, required)_: This field is used to specify the type and role of + the evidence within the claim. The meaning of it is specified by claimType and within the + context of claimSpec. + - **evidence[*].uri** _(string + ([ResourceURI](https://github.com/in-toto/attestation/blob/main/spec/field_types.md#ResourceURI)), + required)_: An evidence could be another claim (possibly of another claimType) or a report + publicly available from a URI. 
Either way, the URI should be provided in this field. + - **evidence[*].digest** _(object + ([DigestSet](https://github.com/in-toto/attestation/blob/main/spec/field_types.md#DigestSet)), + required)_: Collection of cryptographic digests for the contents of this artifact. ## Comparison to the SLSA provenance format The following table shows the correspondence between the fields in a claim statement as described -above, and a [SLSA provenance statement](https://slsa.dev/provenance/v0.2). Note that the table -does not provide a correspondence between all fields. Rather, the goal is to show that the two -formats follow the same design principles. In particular, to support flexibility, via -buildType/buildConfig, and claimType/claimSpec; and to allow linking of related materials/evidence. -The table does not intend to suggest that one format could replace the other, as the two formats -are conceptually different. For instance, the SLSA provenance format has an invocations field, -which is meaningless if the format were to be used for specifying a security or privacy claim. -Builder and buildConfig are other fields that are irrelevant to security or privacy claims. -Similarly the field names in the schema suggested for claims are meaningless in the context of a -provenance statement. - -| Field in a Claim statement | Field in a SLSA provenance | Comments | -|:----------------|:---------------|:-----------------------------------------------------------| -| claimType | buildType | Both define the meanings of the other fields in the predicate.| -| claimSpec | buildConfig | Both provide a flexible way of supporting different types of content (claims, and build processes).| -| evidence | materials | Optional list of (a subset of ) additional artifacts that influenced the statement. 
| - -## Comparison to RATS - -The Remote ATtestation procedureS (RATS) working group has provided an [architecture](https://datatracker.ietf.org/doc/html/draft-ietf-rats-architecture) -and glossary of concepts related to remote attestation. [This cheatsheet](https://github.com/thomas-fossati/rats-cheatsheet) -and [this slides deck](https://confidentialcomputing.io/wp-content/uploads/sites/85/2021/09/IETF-Remote-Attestation-Architecture-Overview.pdf) -give an overview of the architecture and the main concepts. RATS has many concepts similar -to the ones in our design, but seems to be focused on claims and evidence that are generated and -consumed automatically. Claims and evidence in RATS are designed to be used for remote attestation. -The claims in our binary transparency ecosystem, however, are not limited to the ones used for -remote attestation. We target a wider range of use cases. +above, and a [SLSA provenance statement](https://slsa.dev/provenance/v0.2). Note that the table does +not provide a correspondence between all fields. Rather, the goal is to show that the two formats +follow the same design principles. In particular, to support flexibility, via buildType/buildConfig, +and claimType/claimSpec; and to allow linking of related materials/evidence. The table does not +intend to suggest that one format could replace the other, as the two formats are conceptually +different. For instance, the SLSA provenance format has an invocations field, which is meaningless +if the format were to be used for specifying a security or privacy claim. Builder and buildConfig +are other fields that are irrelevant to security or privacy claims. Similarly the field names in the +schema suggested for claims are meaningless in the context of a provenance statement. 
+ +| Field in a Claim statement | Field in a SLSA provenance | Comments | +| :------------------------- | :------------------------- | :-------------------------------------------------------------------------------------------------- | +| claimType | buildType | Both define the meanings of the other fields in the predicate. | +| claimSpec | buildConfig | Both provide a flexible way of supporting different types of content (claims, and build processes). | +| evidence | materials | Optional list of (a subset of ) additional artifacts that influenced the statement. | ## Examples @@ -190,9 +187,9 @@ include a reference to a Rekor log entry corresponding to the provenance. ``` A more sophisticated claimType for endorsements would have a non-empty claimSpec, containing a -specification of the policy that was checked before issuing the endorsement statement. -Authorization logic is a good candidate for providing a specification of such a policy. In this -case the tool that verified the policy and generated the claim will as well sign the claim. +specification of the policy that was checked before issuing the endorsement statement. Authorization +logic is a good candidate for providing a specification of such a policy. In this case the tool that +verified the policy and generated the claim will as well sign the claim. ```json { @@ -320,3 +317,33 @@ shown in the following example. } } ``` + +## Related work and standards + +The concepts used and introduced in this repository rely heavily on the in-toto standard, and the +SLSA framework. However, there are other similar efforts in the open-source and security community +that we try to be consistent with. The most notable of these are +[SCITT](https://github.com/ietf-scitt) and RATS. + +### Comparison to RATS + +The Remote ATtestation procedureS (RATS) working group has provided an +[architecture](https://datatracker.ietf.org/doc/html/draft-ietf-rats-architecture) and glossary of +concepts related to remote attestation. 
+[This cheatsheet](https://github.com/thomas-fossati/rats-cheatsheet) and +[this slides deck](https://confidentialcomputing.io/wp-content/uploads/sites/85/2021/09/IETF-Remote-Attestation-Architecture-Overview.pdf) +give an overview of the architecture and the main concepts. RATS has many concepts similar to the +ones in our design, but seems to be focused on claims and evidence that are generated and consumed +automatically. Claims and evidence in RATS are designed to be used for remote attestation. The +claims in our binary transparency ecosystem, however, are not limited to the ones used for remote +attestation. We target a wider range of use cases, e.g., auto-generated Fuzz claims. + +In addition, the concept of +[Concise Reference Integrity Manifest (CoRIM)](https://datatracker.ietf.org/doc/draft-ietf-rats-corim/) +provided by the RATS working group is very similar to a `Claim` as described in this document. + +TODO(#222): Summarize the comparison. + +### Comparison to SCITT + +TODO(#222): TBA. diff --git a/docs/images/release-transparency.png b/docs/images/release-transparency.png index 8e793334..1ed02592 100644 Binary files a/docs/images/release-transparency.png and b/docs/images/release-transparency.png differ
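
The `validity` fields described in `docs/claim-transparency.md` above (`notBefore` must be equal to or after `issuedOn`, and the artifact is no longer endorsed from `notAfter` onward) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names `parse_timestamp` and `is_endorsed_at` are hypothetical, the function is assumed to receive the JSON object that directly holds `issuedOn` and `validity` (where that object sits inside the full statement is not shown in this diff), timestamps are assumed to be UTC strings ending in `Z` without fractional seconds, and treating `notAfter` as an exclusive bound is one reading of "no longer endorsed for use".

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp such as 2022-06-08T10:20:50Z (assumed format)."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so it is rewritten as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def is_endorsed_at(claim: dict, when: datetime) -> bool:
    """Return True if `when` falls inside the claim's validity range.

    `claim` is assumed to be the object holding `issuedOn` and `validity`.
    Raises ValueError if notBefore is earlier than issuedOn.
    """
    issued_on = parse_timestamp(claim["issuedOn"])
    not_before = parse_timestamp(claim["validity"]["notBefore"])
    not_after = parse_timestamp(claim["validity"]["notAfter"])

    # The field description requires notBefore to be equal to or after issuedOn.
    if not_before < issued_on:
        raise ValueError("validity.notBefore must be equal to or after issuedOn")

    # Passive revocation: outside the range the claim is simply not valid.
    # notAfter is treated as exclusive here (an assumption).
    return not_before <= when < not_after


# Hypothetical example values, for illustration only.
example_claim = {
    "issuedOn": "2022-06-08T10:20:50Z",
    "validity": {
        "notBefore": "2022-06-09T00:00:00Z",
        "notAfter": "2022-09-09T00:00:00Z",
    },
}
```

A verifier, for instance in a remote attestation scenario, would call something like `is_endorsed_at(example_claim, now)` with the current time, and reject the binary when the result is false, in addition to checking the signature and the subject digest, which this sketch does not cover.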