-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reframe: cacheable GET endpoint for HTTP transport #287
Conversation
Aims to resolve caching problem described in #282
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable to me as described
REFRAME.md
Outdated
Note: This version header is what allows the transport to more easily evolve over time (e.g. if it was desired to change the transport to support other encodings than DAG-JSON, utilize headers differently, move the request data from the body, etc.). Not including the version number is may lead to incompatibility with future versions of the transport. | ||
|
||
#### HTTP Caching Considerations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we have a discussion of expectations around Cache-Control
header, like if max-age
should be set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This spec could propose some "safe default" for Cache-Control
but don't know what it would be.
My understanding is that this Reframe spec defines a generic RPC, so it is difficult to reason how long a stale (cached) response is valid/useful – no idea what max-age
makes sense when RPC messages are opaque.
I am open to suggestions. My hunch here is that we need Reframe methods to include caching hint (can response be cached, and if so, for how long) – see https://github.com/ipfs/specs/pull/287/files#r892840930
Proposed Last-Modified
and If-Modified-Since
only partially solves the problem: since the client knows what was in the request message, it can make a caching decision based on message type (do I need a response that is not older than 5 minutes, or is it ok to skip this header and have a response generated a week ago).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willscott I assume you are ok with recent updates: Last-Modified
being optional and Etag
being mandatory.
We can add some guidance around Cache-Control
with max-age
TTL in a separate PR, but I think it would require having specific TTL be per cachable message type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems reasonable yeah. Figuring out some etag, like the hash/cid of the contents, seems like something that should basically always be possible.
REFRAME.md
Outdated
expects to handle the same message query multiple times, and want to leverage | ||
existing HTTP tooling to maximize HTTP cache hits. | ||
|
||
**Avoiding sending the same response messages twice** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is defined per method and not collectively right? Since some methods (e.g. the proposed #285 or the IPNS Put method) need to actually make their way to the endpoint for processing and response and will likely just respond with something like OK
.
If so do we need to start marking the various reframe methods as get-like vs modify-like to help implementers know what to do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caching in the context of RPC is really tricky.
This behavior can be customized per reframe method on the server side, but we need a caching policy for each method.
There are two types of methods that you need to mark.. somehow:
- "safe to be cached for X amount of time"
- example: message with providers for a popular CID could be cached for 5 minutes
- HTTP response can include
Cache-Control: max-age={5m}
providing a hint for CDN and clients to cache at least for 5 minutes
- "should never be cached"
- example: response to IPNS put
- these should always be sent as
POST
or asGET
withCache-Control: no-cache
(orIf-Modified-Since
set to the current timestamp, forcing cache-busting), and the response should also haveCache-Control: no-cache
REFRAME.md
Outdated
Note: This version header is what allows the transport to more easily evolve over time (e.g. if it was desired to change the transport to support other encodings than DAG-JSON, utilize headers differently, move the request data from the body, etc.). Not including the version number is may lead to incompatibility with future versions of the transport. | ||
|
||
#### HTTP Caching Considerations | ||
|
||
**POST vs GET** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lidel I think you had some thoughts here we didn't get to cover synchronously. Could you help me understand what the user model should reasonably be if someone has a browser app that's able to utilize some Reframe process running on localhost rather than remotely?
If the methods come in via POST
all seems fine, but if they can come in via GET
then there's potential for abuse (e.g. a bad actor hurting your reputation by using your ID to provide many bogus provider records) due to browser CORS limitations.
The alternative here seems to be having the user put their auth tokens in the browser page and having that authenticate requests. Is the idea that this isn't realistically more burdensome than dealing with CORS and so this isn't a concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aschmahmann
I think the high level insight is that we should not use CORS and POST as access control mechanism for general-purpose RPC. This applies to both localhost and public endpoints. We don't want to repeat mistakes made in go-ipfs RPC.
If you run reframe endpoint that does sensitive things, then we need either:
- HTTPS endpoint to have auth (require
Authorization
header in requests, reject if missing) - DAG-JSON messages to have auth (require signature/token in RPC message, reject if missing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we trying to keep HTTP-specific from non-HTTP-specific features of the protocol separate? If we plan to use reframe over other transports (like CBOR-over-libp2p) we should. If this is the case, then non-HTTP-specific authentication would be preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an open question, with huge caching implications:
- if we include auth inside a reframe message, it will change the
GET /reframe/{inlined-message}
path (because message is different for each user, or even per request if we include a once), making it impossible to cache responses for "reads"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya, I agree that this becomes tricky. I don't think I necessarily mind "transports" having their own built-in auth mechanisms such as HTTP bearer tokens. It does mean a libp2p transport would either need to replicate that capability (more likely) or rely on other mechanisms like PeerIDs for auth.
This definitely effects caching basically because if we start adding user/per-request things to otherwise cacheable requests then server implementations will likely need fancier logic to do the caching than just relying on the content of the request itself.
…e GET/POST as used for cachable/non-cachable methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should split Reframe specs to make clear separation between protocol, currently fleshed out methods, and currently fleshed out transports.
I'm thinking:
reframe/REFRAME_PROTOCOL.md
- description of the generic RPC protocolreframe/REFRAME_METHODS.md
- currently 'standardized' methodsreframe/REFRAME_TRANSPORTS.md
- list of available transportsIPIP/000N-reframe-rpc-protocol.md
- (optional, but nice to have it a historical artifact) a short IPIP with references to our Motivation for Reframe (mostly links to existing discussions)
This way we have clear separation of concerns and specs don't leak transport or method abstractions upwards.
@aschmahmann lmk if it is ok for me to apply these (and below) changes
Implements split proposed in #287 (review) This decouples Transports and Known Methods from the generic RPC protocol, and allows us to add more methods and transports without touching the protocol itself.
Proposed split in #292 (PR against this one for easier review, same content) |
@aschmahmann ok for me to merge #292 into this PR, so they don't get out of sync? 🙏 |
Implements split proposed in #287 (review) This decouples Transports and Known Methods from the generic RPC protocol, and allows us to add more methods and transports without touching the protocol itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@petar could you take a look at the changes here to get an understanding of the implementation complexity in Edelweiss + go-delegated-routing to support the changes here (i.e. supporting POST for all requests and GET for some, as well as being able to mark some of the requests as cacheable so we know they can use GET)?
@petar the implementation gist is for reframe "server" to support responding to both GET and POST, and for "client" to send GET for cachable Reframe methods to leverage HTTP Caching where possible. |
@aschmahmann @lidel I will follow up here. I'm traveling the next two days and have food poisoning on top. So I'll chime in when i'm back. |
# Reframe Specifications | ||
|
||
- [`REFRAME_PROTOCOL.md`](./REFRAME_PROTOCOL.md) ← start here | ||
- [`REFRAME_KNOWN_METHODS.md`](./REFRAME_KNOWN_METHODS.md) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't "API" more standard terminology than "known methods"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. Document had "Known methods" section already, so I assumed it is similar terminology as things like https://www.w3.org/TR/did-spec-registries/#did-methods (anyone can come up with own reframe method, just like anyone can come up with DID method).
API is super vague, as it implies the HTTP endpoint specifics, and we want to keep transport specs separate.
@petar perhaps REFRAME_MESSAGE_TYPES.md
is better? We already use "messages" all over the document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To unblock this PR from merging, I've kept "methods" but clarified it means "request-response message types" (#285)
We can refine it later.
is this spec PR okay to merge? i'd like to rebase #285 into the reframe directory this makes, but it seems like it might be easier to wait for this to complete first if it isn't expected to take too much longer |
I believe it is ok to merge: if anything changes, that will be editorial or adding more clarification and features, and not changing the underlying GET/POST approach. We have sync tomorrow when US folks are back, let's wait until then. (cc @aschmahmann @petar @BigLep) |
Should remove confusion in contest of HTTP where "method" means "GET" or "POST".
I am merging as per verbal with @aschmahmann :) |
Surgical spec tweak that aims to resolve caching problems described in #282, by providing both GET and POST endpoints (similar to json-rpc)
adding
GET /reframe/{mbase64url-dag-cbor}
endpoint for use cases where being able to leverage HTTP caching provides significant valueadding HTTP Caching Considerations section with
Etag
for avoiding sending same response twiceLast-Modified
as a mechanism for clients to force response freshnesslmk if this would do
@aschmahmann @guseggert @willscott @thattommyhall @masih – this is by no means "final" – opened this PR to get the conversation started, looking for an acceptable solution.
TODO
TBD POST should returnContent-Location
withGET
URL to ensure POST response is still cacheable, under correct GET URL