IPIP-402: Partial CAR Support on Trustless Gateways #402

Merged: 21 commits, Jul 27, 2023
Changes from 10 commits
14 changes: 7 additions & 7 deletions ipip-template.md
@@ -36,13 +36,6 @@ interoperable implementations.
When modifying an existing specification file, this section should provide a
summary of changes. When adding new specification files, list all of them.

## Test fixtures

List relevant CIDs. Describe how implementations can use them to determine
specification compliance. This section can be skipped if IPIP does not deal
with the way IPFS handles content-addressed data, or the modified specification
file already includes this information.

## Design rationale

The rationale fleshes out the specification by describing what motivated
@@ -67,6 +60,13 @@ Explain the security implications/considerations relevant to the proposed change

Describe alternate designs that were considered and related work.

## Test fixtures

List relevant CIDs. Describe how implementations can use them to determine
specification compliance. This section can be skipped if IPIP does not deal
with the way IPFS handles content-addressed data, or the modified specification
file already includes this information.

### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
19 changes: 13 additions & 6 deletions src/http-gateways/path-gateway.md
@@ -21,6 +21,7 @@ editors:
url: https://hacdias.com/
xref:
- url
- trustless-gateway
tags: ['httpGateways', 'lowLevelHttpGateways']
order: 0
---
@@ -214,11 +215,13 @@ These are the equivalents:
- `format=cbor` → `Accept: application/cbor`
- `format=ipns-record` → `Accept: application/vnd.ipfs.ipns-record`

<!-- TODO Planned: https://github.com/ipfs/go-ipfs/issues/8769
- `selector=<cid>` can be used for passing a CID with [IPLD selector](https://ipld.io/specs/selectors)
- Selector should be in dag-json or dag-cbor format
- This is a powerful primitive that allows for fetching subsets of data in specific order, either as raw bytes, or a CAR stream. Think “HTTP range requests”, but for IPLD, and more powerful.
-->
### `dag-scope` (request query parameter)

Only used on CAR requests, same as :ref[dag-scope] from :cite[trustless-gateway].

### `entity-bytes` (request query parameter)

Only used on CAR requests, same as :ref[entity-bytes] from :cite[trustless-gateway].

# HTTP Response

@@ -576,7 +579,11 @@ The following response types require an explicit opt-in, can only be requested w
- Raw Block (`?format=raw`)
- Opaque bytes, see [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw).
- CAR (`?format=car`)
- Arbitrary DAG as a verifiable CAR file or a stream, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car).
- A CAR file or a stream that contains all blocks required to trustlessly verify the requested content path query, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) and :cite[trustless-gateway].
- **Note:** by default, block order in CAR response is not deterministic,
blocks can be returned in different order, depending on implementation
choices (traversal, speed at which blocks arrive from the network, etc).
Opt-in ordered CAR responses MAY be introduced in a future IPIP.
- TAR (`?format=tar`)
- Deserialized UnixFS files and directories as a TAR file or a stream, see :cite[ipip-0288].
- IPNS Record
160 changes: 151 additions & 9 deletions src/http-gateways/trustless-gateway.md
@@ -4,7 +4,7 @@ description: >
Trustless Gateways are a minimal subset of Path Gateways that allow light IPFS
clients to retrieve data behind a CID and verify its integrity without delegating any
trust to the gateway itself.
date: 2023-03-30
date: 2023-04-17
maturity: reliable
editors:
- name: Marcin Rataj
@@ -17,25 +17,33 @@ tags: ['httpGateways', 'lowLevelHttpGateways']
order: 1
---

Trustless Gateway is a minimal _subset_ of :cite[path-gateway]
Trustless Gateway is a _subset_ of :cite[path-gateway]
that allows light IPFS clients to retrieve data behind a CID and verify its
integrity without delegating any trust to the gateway itself.

The minimal implementation means:

- data is requested by CID, only supported path is `/ipfs/{cid}`
- no path traversal or recursive resolution, no UnixFS/IPLD decoding server-side
- response type is always fully verifiable: client can decide between a raw block or a CAR stream
- no UnixFS/IPLD deserialization
- for CAR files:
- the behavior is identical to :cite[path-gateway]
- for raw blocks:
- data is requested by CID, only supported path is `/ipfs/{cid}`
- no path traversal or recursive resolution

# HTTP API

A subset of "HTTP API" of :cite[path-gateway].

## `GET /ipfs/{cid}[?{params}]`
## `GET /ipfs/{cid}[/{path}][?{params}]`

Downloads data at specified CID.
Downloads verifiable data for the specified **immutable** content path.

## `HEAD /ipfs/{cid}[?{params}]`
Optional `path` is permitted for requests that specify CAR format (`application/vnd.ipld.car`).

For RAW requests, only `GET /ipfs/{cid}[?{params}]` is supported.
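
As a rough illustration of the rule above, a gateway could check the request shape before doing any work. This is a minimal sketch: the function name `path_allowed` and the choice to decide from the `Accept` value alone are assumptions made for this example, not something the specification prescribes.

```python
CAR = "application/vnd.ipld.car"
RAW = "application/vnd.ipld.raw"


def path_allowed(content_path: str, accept: str) -> bool:
    """Return True if an /ipfs/ content path is acceptable for the response type.

    Hypothetical helper: CAR requests may carry a sub-path after the CID,
    raw block requests may only address /ipfs/{cid}.
    """
    parts = [p for p in content_path.split("/") if p]
    if len(parts) < 2 or parts[0] != "ipfs":
        return False
    has_subpath = len(parts) > 2
    # Ignore media type parameters such as "; version=1".
    media_type = accept.split(";")[0].strip().lower()
    if media_type == CAR:
        return True
    if media_type == RAW:
        return not has_subpath
    return False
```

With this sketch, `/ipfs/{cid}/dir/file` is accepted for a CAR request and rejected for a raw block request.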

## `HEAD /ipfs/{cid}[/{path}][?{params}]`

Same as GET, but does not return any payload.

@@ -45,13 +53,13 @@ Downloads data at specified IPNS Key. Verifiable :cite[ipns-record] can be reque

## `HEAD /ipns/{key}[?{params}]`

same as GET, but does not return any payload.
Same as GET, but does not return any payload.

# HTTP Request

Same as in :cite[path-gateway], but with limited number of supported response types.

## HTTP Request Headers
## Request Headers

### `Accept` (request header)

@@ -66,12 +74,146 @@ Below response types MUST to be supported:
- [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) – disables IPLD/IPFS deserialization, requests a verifiable CAR stream to be returned
- [application/vnd.ipfs.ipns-record](https://www.iana.org/assignments/media-types/application/vnd.ipfs.ipns-record) – requests a verifiable :cite[ipns-record] (multicodec `0x0300`).

## Request Query Parameters

### :dfn[dag-scope] (request query parameter)

Optional, `dag-scope=(block|entity|all)` with default value `all`, only available for CAR requests.

Describes the shape of the DAG fetched at the terminus of the specified path, whose blocks
are included in the returned CAR file after the blocks required to traverse
path segments.

- `block` - Only the root block at the end of the path is returned after blocks
required to verify the specified path segments.

- `entity` - For queries that traverse UnixFS data, `entity` roughly means return
blocks needed to verify the terminating element of the requested content path.
For UnixFS, all the blocks needed to read an entire UnixFS file, or enumerate a UnixFS directory.
For all queries that reference non-UnixFS data, `entity` is equivalent to `block`

- `all` - Transmit the entire contiguous DAG that begins at the end of the path
query, after blocks required to verify path segments

When present, returned `Etag` must include unique prefix based on the passed scope type.
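
To make the parameter concrete, a client might assemble a scoped CAR request as in the sketch below. The helper name `car_request`, the gateway host, and the CID used in the usage note are placeholders assumed for illustration; only the `dag-scope` values and the `Accept` media type come from the text above.

```python
from urllib.parse import quote, urlencode

DAG_SCOPES = ("block", "entity", "all")


def car_request(gateway: str, cid: str, path: str = "", dag_scope: str = "all"):
    """Return (url, headers) for a CAR request with the given dag-scope.

    Hypothetical helper: it only builds the request, it performs no I/O.
    """
    if dag_scope not in DAG_SCOPES:
        raise ValueError(f"dag-scope must be one of {DAG_SCOPES}")
    # Percent-encode each path segment separately so "/" stays a separator.
    segments = [quote(s, safe="") for s in path.split("/") if s]
    url = "/".join([gateway.rstrip("/"), "ipfs", cid, *segments])
    url += "?" + urlencode({"dag-scope": dag_scope})
    headers = {"Accept": "application/vnd.ipld.car"}
    return url, headers
```

For example, `car_request("https://gateway.example", "bafyexamplecid", "docs/readme.md", "entity")` builds a request for the blocks needed to verify the path plus the file at its end.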

### :dfn[entity-bytes] (request query parameter)

Optional, `entity-bytes=from:to` with the default value `0:*`, only available for CAR requests.
Serves as a trustless form of an HTTP Range Request.

When the terminating entity at the end of the specified content path can be
interpreted as a continuous array of bytes (such as a UnixFS file), returns
only the minimal set of blocks required to verify the specified byte range of
said entity.

Allowed values for `from` and `to` are positive integers where `to` >= `from`, which
limit the returned blocks to those needed to satisfy the range `[from,to]`:

- `from` value gives the byte-offset of the first byte in a range.
- `to` value gives the byte-offset of the last byte in the range; that is,
the byte positions specified are inclusive. Byte offsets start at zero.

If the entity at the end of the path cannot be interpreted as a continuous
array of bytes (such as a DAG-CBOR/JSON map, or UnixFS directory), this
parameter has no effect.

The following additional values are supported:

- `*` can be substituted for end-of-file
- `entity-bytes=0:*` is the entire file (a verifiable version of HTTP request for `Range: 0-`)
- Negative numbers can be used for referring to bytes from the end of a file
- `entity-bytes=-1024:*` is the last 1024 bytes of a file
(verifiable version of HTTP request for `Range: -1024`)
- It is also permissible (unlike with HTTP Range Requests) to ask for the
range of 500 bytes from the beginning of the file to 1000 bytes from the
end: `entity-bytes=499:-1000`
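
The value forms listed above can be resolved against a known entity size roughly as follows. This is a sketch under stated assumptions: the function name `resolve_entity_bytes` is hypothetical, a negative offset `n` is assumed to map to `size + n`, and a `to` value past the end of the entity is assumed to be clamped to the last byte. The text above does not settle those edge cases.

```python
def resolve_entity_bytes(value: str, size: int):
    """Map an entity-bytes value to an inclusive (first, last) byte range.

    Assumptions (not from the spec text): negative offsets count back from
    `size`, and `to` is clamped to the last byte of the entity.
    """
    raw_from, sep, raw_to = value.partition(":")
    if not sep:
        raise ValueError("expected the form from:to")
    first = int(raw_from)
    if first < 0:
        first = max(size + first, 0)
    if raw_to == "*":
        last = size - 1  # "*" stands for end-of-file
    else:
        last = int(raw_to)
        if last < 0:
            last = size + last
    last = min(last, size - 1)
    if last < first:
        raise ValueError("range is empty for an entity of this size")
    return first, last
```

For a 5000-byte file, `0:*` resolves to `(0, 4999)` and `-1024:*` resolves to `(3976, 4999)` under these assumptions.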

rvagg (Member), May 19, 2023:

Suggested change
When the entity is a sharded UnixFS file, and the `from` value is _not_ `0`, the
claimed encoded size (`Tsize`) values of the individual shards that make up the
file will be implicitly trusted to determine the block that contains that
offset. Because this cannot be fully verified for correctness without having all
blocks from the start of the file content, consumers of this data must trust
that the producer of the data has properly encoded the blocks of the UnixFS
sharded file. If stronger guarantees are required, byte ranges should be
avoided, or should be anchored with a `from` of `0`.

Member:

Oh look! Real-world use proving this problem: ipfs/js-ipfs-unixfs#335

aschmahmann (Contributor), May 23, 2023:

@rvagg dag-pb TSize has nothing to do with entity-bytes at all. The UnixFS filesize and blocksizes (protobuf elements 3 and 4 of the Data field of the dag-pb nodes for UnixFS files) does.

AFAIK the only thing TSize is potentially used by for anyone in the context of the HTTP Gateway API is in the context of the trusted component of the API and giving rough estimates of file/directory sizes in directory HTMLs when the gateway doesn't want to use resources to grab the first block of each directory element to figure out if it's a file/directory and for files what the filesize is. However, what goes into a directory HTML file is non-normative (see the "Generated HTML with directory index" section).

Member:

ok, different value, same problem though; the value is encoded by the producer of the blocks and has to be implicitly trusted to get the right byte range

aschmahmann (Contributor), May 23, 2023:

Sure, the issue with malformed graphs still exists along with the trust problems you mentioned. It shows up here, but it also likely belongs as an implementers note on the trusted gateway spec as well.

I was just calling out that:

  1. TSize is the wrong field
  2. Unlike TSize filesize and blocksizes have actual definitions which means when they're not obeyed the UnixFS graph is malformed unlike the argument about TSize where people seem more hesitant to give it an actual definition (although it seems to have been defined as cumulative graph size since day 1 ipfs/go-merkledag@63e4477#diff-10837cc3557cec0045183193a03f17af589591f2be0753262027534bb8f64ad9R38-R39)

Edit: Also, IIUC because filesize and blocksizes have actual definitions implementations don't actually have to fall into this trap (even if common ones today do). As long as ReadAll(<UnixFS file cid>)[X:Y] matches Read(<UnixFS file cid>, X,Y) this problem goes away. So this isn't really a spec issue here, or necessarily in UnixFS (although notes to implementers seem reasonable), since these issues only happen with malformed UnixFS data.

lidel (Member), Jul 12, 2023:

Added filesize / blocksizes note in 0953cb6, but agree, the implicit trust in DAG being correctly created is a concern beyond this IPIP or even this API.

It is the same class of problem as "I trust that the data matches the CID I requested, but where do I get trusted CIDs from?" – IMO beyond the scope of this IPIP.

When present, returned `Etag` must include unique prefix based on the passed range.

# HTTP Response

Below MUST be implemented **in addition** to "HTTP Response" of :cite[path-gateway].

## HTTP Response Headers

### `Content-Type` (response header)

MUST be returned and include additional format-specific parameters when possible.

If a CAR stream was requested, the response MUST include the parameter specifying CAR version.
For example: `Content-Type: application/vnd.ipld.car; version=1`
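
A client can read the advertised CAR version from the header before parsing the body. The sketch below uses Python's standard `email.message.Message` to split the media type from its parameters; the helper name `car_version` is an assumption made for this example.

```python
from email.message import Message

CAR_MEDIA_TYPE = "application/vnd.ipld.car"


def car_version(content_type: str) -> int:
    """Extract the `version` parameter from a CAR Content-Type header value.

    Hypothetical helper: raises ValueError if the media type is not CAR or
    if the version parameter is missing.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != CAR_MEDIA_TYPE:
        raise ValueError("response is not a CAR stream")
    version = msg.get_param("version")
    if version is None:
        raise ValueError("CAR response is missing the version parameter")
    return int(version)
```

The returned number can then be compared with the version found in the CAR header itself.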

### `Content-Disposition` (response header)

MUST be returned and set to `attachment` to ensure requested bytes are not rendered by a web browser.

## HTTP Response Payload

### Block Response

Opaque bytes matching the requested block CID
([application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw)).

The Body hash MUST match the Multihash from the requested CID.
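
On the client side, this check can be sketched with the standard library alone. The example below is deliberately narrow: it assumes a base32 CIDv1 string (multibase prefix `b`) whose multihash is sha2-256 (code `0x12`), and the helper names `block_matches_cid` and `raw_cid_for` are hypothetical. A real client would normally rely on a CID and multihash library that covers other encodings and hash functions.

```python
import base64
import hashlib

SHA2_256 = 0x12  # multihash code for sha2-256


def read_varint(buf: bytes, pos: int):
    """Decode an unsigned varint at buf[pos]; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def block_matches_cid(cid: str, body: bytes) -> bool:
    """Check that sha2-256(body) equals the digest inside a base32 CIDv1."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 (prefix 'b') CIDs")
    encoded = cid[1:].upper()
    encoded += "=" * (-len(encoded) % 8)  # restore the padding that CIDs omit
    raw = base64.b32decode(encoded)
    version, pos = read_varint(raw, 0)
    if version != 1:
        raise ValueError("this sketch only handles CIDv1")
    _codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    digest_len, pos = read_varint(raw, pos)
    digest = raw[pos:pos + digest_len]
    if hash_code != SHA2_256 or len(digest) != digest_len:
        raise ValueError("unsupported or truncated multihash")
    return hashlib.sha256(body).digest() == digest


def raw_cid_for(body: bytes) -> str:
    """Build a CIDv1 (raw codec 0x55, sha2-256) string; used only for the demo."""
    data = bytes([0x01, 0x55, SHA2_256, 0x20]) + hashlib.sha256(body).digest()
    return "b" + base64.b32encode(data).decode("ascii").lower().rstrip("=")
```

A response body that fails this comparison should be discarded, since the gateway is not trusted to return the right bytes.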

### CAR Response

A CAR stream
([application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car))
for the requested content type, path and optional `dag-scope` and `entity-bytes` URL parameters.

:::note

By default, block order in CAR response is not deterministic: blocks can
be returned in different order, depending on implementation choices (traversal,
speed at which blocks arrive from the network, etc). Opt-in ordered CAR
responses MAY be introduced in the future, see [IPIP-412](https://github.com/ipfs/specs/pull/412).

:::

#### CAR version

The value returned in the `CarV1Header.version` struct MUST match the `version`
parameter returned in the `Content-Type` header.

#### CAR roots

:::issue

TODO: we need to specify expectations about what should be returned in
[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header).

##### Option A: always empty

If the response uses version 1 or 2 of the CAR spec, the
[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) struct
MUST be empty.

##### Option B: only CID of the terminating element

If the response uses version 1 or 2 of the CAR spec, the
[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) struct
MUST contain CID of the terminating entity.

##### Option C: only CIDs of fully returned DAGs

If the response uses version 1 or 2 of the CAR spec, the
[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) struct
MUST be either empty, or only contain CIDs of complete DAGs present in the response.

CIDs from partial DAGs, such as parent nodes on the path, or terminating
element returned with `dag-scope=block`, or UnixFS directory returned with
`dag-scope=entity` MUST never be returned in the `CarV1Header.roots` list, as
they may cause overfetching on systems that perform recursive pinning of DAGs
listed in `CarV1Header.roots`.

##### Option D: CIDs for all logical path segments (same as X-Ipfs-Roots)

If the response uses version 1 or 2 of the CAR spec, the
[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) struct
MUST contain all the logical roots related to the requested content path.

The CIDs here MUST be the same as ones in `X-Ipfs-Roots` header.

:::