Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reframe: cacheable GET endpoint for HTTP transport #287

Merged
merged 10 commits into from
Jul 7, 2022
Merged

Conversation

lidel
Copy link
Member

@lidel lidel commented Jun 7, 2022

Closes #282

Surgical spec tweak that aims to resolve caching problems described in #282, by providing both GET and POST endpoints (similar to json-rpc)

  • adding GET /reframe/{mbase64url-dag-cbor} endpoint for use cases where being able to leverage HTTP caching provides significant value

  • adding HTTP Caching Considerations section with

    • POST vs GET explainer
    • Etag for avoiding sending same response twice
    • Last-Modified as a mechanism for clients to force response freshness

    lmk if this would do

@aschmahmann @guseggert @willscott @thattommyhall @masih – this is by no means "final" – opened this PR to get the conversation started, looking for an acceptable solution.

TODO

  • TBD POST should return Content-Location with GET URL to ensure POST response is still cacheable, under correct GET URL
    • probably better to skip - in most cases this introduces more cost (dual serialization to dag-json and dag-cbort in GET URL) than we get in return from opportunistic cache hits
  • split transports to separate REFRAME_TRANSPORTS.md (and methods like "IPFS PUT" to REFRAME_METHODS.md ?)
  • make it clear that we require support for both GET and POST to be implemented – leaving it to implementation to decide when to use which based on Reframe method inside of message sent

Aims to resolve caching problem described in #282
Copy link
Contributor

@willscott willscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to me as described

REFRAME.md Outdated Show resolved Hide resolved
REFRAME.md Outdated
Note: This version header is what allows the transport to more easily evolve over time (e.g. if it was desired to change the transport to support other encodings than DAG-JSON, utilize headers differently, move the request data from the body, etc.). Not including the version number is may lead to incompatibility with future versions of the transport.

#### HTTP Caching Considerations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we have a discussion of expectations around Cache-Control header, like if max-age should be set?

Copy link
Member Author

@lidel lidel Jun 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spec could propose some "safe default" for Cache-Control but don't know what it would be.

My understanding is that this Reframe spec defines a generic RPC, so it is difficult to reason how long a stale (cached) response is valid/useful – no idea what max-age makes sense when RPC messages are opaque.

I am open to suggestions. My hunch here is that we need Reframe methods to include caching hint (can response be cached, and if so, for how long) – see https://github.com/ipfs/specs/pull/287/files#r892840930

Proposed Last-Modified and If-Modified-Since only partially solves the problem: since the client knows what was in the request message, it can make a caching decision based on message type (do I need a response that is not older than 5 minutes, or is it ok to skip this header and have a response generated a week ago).

Copy link
Member Author

@lidel lidel Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willscott I assume you are ok with recent updates: Last-Modified being optional and Etag being mandatory.

We can add some guidance around Cache-Control with max-age TTL in a separate PR, but I think it would require having specific TTL be per cachable message type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems reasonable yeah. Figuring out some etag, like the hash/cid of the contents, seems like something that should basically always be possible.

REFRAME.md Outdated Show resolved Hide resolved
REFRAME.md Outdated Show resolved Hide resolved
REFRAME.md Outdated Show resolved Hide resolved
REFRAME.md Outdated
expects to handle the same message query multiple times, and want to leverage
existing HTTP tooling to maximize HTTP cache hits.

**Avoiding sending the same response messages twice**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is defined per method and not collectively right? Since some methods (e.g. the proposed #285 or the IPNS Put method) need to actually make their way to the endpoint for processing and response and will likely just respond with something like OK.

If so do we need to start marking the various reframe methods as get-like vs modify-like to help implementers know what to do here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caching in the context of RPC is really tricky.
This behavior can be customized per reframe method on the server side, but we need a caching policy for each method.

There are two types of methods that you need to mark.. somehow:

  • "safe to be cached for X amount of time"
    • example: message with providers for a popular CID could be cached for 5 minutes
    • HTTP response can include
      • Cache-Control: max-age={5m} providing a hint for CDN and clients to cache at least for 5 minutes
  • "should never be cached"
    • example: response to IPNS put
    • these should always be sent as POST or as GET with Cache-Control: no-cache (or If-Modified-Since set to the current timestamp, forcing cache-busting), and the response should also have Cache-Control: no-cache

REFRAME.md Outdated
Note: This version header is what allows the transport to more easily evolve over time (e.g. if it was desired to change the transport to support other encodings than DAG-JSON, utilize headers differently, move the request data from the body, etc.). Not including the version number is may lead to incompatibility with future versions of the transport.

#### HTTP Caching Considerations

**POST vs GET**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lidel I think you had some thoughts here we didn't get to cover synchronously. Could you help me understand what the user model should reasonably be if someone has a browser app that's able to utilize some Reframe process running on localhost rather than remotely?

If the methods come in via POST all seems fine, but if they can come in via GET then there's potential for abuse (e.g. a bad actor hurting your reputation by using your ID to provide many bogus provider records) due to browser CORS limitations.

The alternative here seems to be having the user put their auth tokens in the browser page and having that authenticate requests. Is the idea that this isn't realistically more burdensome than dealing with CORS and so this isn't a concern?

Copy link
Member Author

@lidel lidel Jun 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aschmahmann
I think the high level insight is that we should not use CORS and POST as access control mechanism for general-purpose RPC. This applies to both localhost and public endpoints. We don't want to repeat mistakes made in go-ipfs RPC.

If you run reframe endpoint that does sensitive things, then we need either:

  • HTTPS endpoint to have auth (require Authorization header in requests, reject if missing)
  • DAG-JSON messages to have auth (require signature/token in RPC message, reject if missing)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we trying to keep HTTP-specific from non-HTTP-specific features of the protocol separate? If we plan to use reframe over other transports (like CBOR-over-libp2p) we should. If this is the case, then non-HTTP-specific authentication would be preferred.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an open question, with huge caching implications:

  • if we include auth inside a reframe message, it will change the GET /reframe/{inlined-message} path (because message is different for each user, or even per request if we include a once), making it impossible to cache responses for "reads"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, I agree that this becomes tricky. I don't think I necessarily mind "transports" having their own built-in auth mechanisms such as HTTP bearer tokens. It does mean a libp2p transport would either need to replicate that capability (more likely) or rely on other mechanisms like PeerIDs for auth.

This definitely effects caching basically because if we start adding user/per-request things to otherwise cacheable requests then server implementations will likely need fancier logic to do the caching than just relying on the content of the request itself.

…e GET/POST as used for cachable/non-cachable methods.
REFRAME.md Outdated Show resolved Hide resolved
REFRAME.md Outdated Show resolved Hide resolved
Copy link
Member Author

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should split Reframe specs to make clear separation between protocol, currently fleshed out methods, and currently fleshed out transports.

I'm thinking:

  • reframe/REFRAME_PROTOCOL.md - description of the generic RPC protocol
  • reframe/REFRAME_METHODS.md - currently 'standardized' methods
  • reframe/REFRAME_TRANSPORTS.md - list of available transports
  • IPIP/000N-reframe-rpc-protocol.md - (optional, but nice to have it a historical artifact) a short IPIP with references to our Motivation for Reframe (mostly links to existing discussions)

This way we have clear separation of concerns and specs don't leak transport or method abstractions upwards.

@aschmahmann lmk if it is ok for me to apply these (and below) changes

REFRAME.md Outdated Show resolved Hide resolved
REFRAME.md Outdated Show resolved Hide resolved
REFRAME.md Outdated Show resolved Hide resolved
lidel added a commit that referenced this pull request Jun 22, 2022
Implements split proposed in
#287 (review)

This decouples Transports and Known Methods from the generic RPC
protocol, and allows us to add more methods and transports without
touching the protocol itself.
@lidel
Copy link
Member Author

lidel commented Jun 22, 2022

Proposed split in #292 (PR against this one for easier review, same content)

@lidel lidel requested a review from aschmahmann June 24, 2022 13:21
@lidel
Copy link
Member Author

lidel commented Jun 24, 2022

@aschmahmann ok for me to merge #292 into this PR, so they don't get out of sync? 🙏

Implements split proposed in
#287 (review)

This decouples Transports and Known Methods from the generic RPC
protocol, and allows us to add more methods and transports without
touching the protocol itself.
Copy link
Contributor

@aschmahmann aschmahmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@petar could you take a look at the changes here to get an understanding of the implementation complexity in Edelweiss + go-delegated-routing to support the changes here (i.e. supporting POST for all requests and GET for some, as well as being able to mark some of the requests as cacheable so we know they can use GET)?

@aschmahmann aschmahmann requested a review from petar June 24, 2022 14:09
@lidel
Copy link
Member Author

lidel commented Jun 27, 2022

@petar the implementation gist is for reframe "server" to support responding to both GET and POST, and for "client" to send GET for cachable Reframe methods to leverage HTTP Caching where possible.

@petar
Copy link
Contributor

petar commented Jun 27, 2022

@aschmahmann @lidel I will follow up here. I'm traveling the next two days and have food poisoning on top. So I'll chime in when i'm back.

# Reframe Specifications

- [`REFRAME_PROTOCOL.md`](./REFRAME_PROTOCOL.md) ← start here
- [`REFRAME_KNOWN_METHODS.md`](./REFRAME_KNOWN_METHODS.md)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't "API" more standard terminology than "known methods"?

Copy link
Member Author

@lidel lidel Jul 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Document had "Known methods" section already, so I assumed it is similar terminology as things like https://www.w3.org/TR/did-spec-registries/#did-methods (anyone can come up with own reframe method, just like anyone can come up with DID method).

API is super vague, as it implies the HTTP endpoint specifics, and we want to keep transport specs separate.

@petar perhaps REFRAME_MESSAGE_TYPES.md is better? We already use "messages" all over the document.

Copy link
Member Author

@lidel lidel Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To unblock this PR from merging, I've kept "methods" but clarified it means "request-response message types" (#285)
We can refine it later.

reframe/REFRAME_HTTP_TRANSPORT.md Outdated Show resolved Hide resolved
reframe/REFRAME_HTTP_TRANSPORT.md Show resolved Hide resolved
@willscott
Copy link
Contributor

is this spec PR okay to merge? i'd like to rebase #285 into the reframe directory this makes, but it seems like it might be easier to wait for this to complete first if it isn't expected to take too much longer

@lidel
Copy link
Member Author

lidel commented Jul 4, 2022

I believe it is ok to merge: if anything changes, that will be editorial or adding more clarification and features, and not changing the underlying GET/POST approach.

We have sync tomorrow when US folks are back, let's wait until then. (cc @aschmahmann @petar @BigLep)

lidel added 2 commits July 5, 2022 21:32
Should remove confusion in contest of HTTP where "method" means "GET" or
"POST".
@lidel
Copy link
Member Author

lidel commented Jul 7, 2022

I am merging as per verbal with @aschmahmann :)

@lidel lidel merged commit d4f4ef1 into main Jul 7, 2022
@lidel lidel deleted the reframe-caching branch July 7, 2022 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

reframe: HTTP Caching support
4 participants