ETag support #40

Open
alganet opened this issue Apr 17, 2012 · 10 comments

@alganet
Member

alganet commented Apr 17, 2012

Automatic generation by MD5ing the content. Custom logic can be implemented separately.

@nickl-
Member

nickl- commented Jun 21, 2012

This is brilliant in its simplicity and shouldn't be too difficult to make happen either...

@ramsey
Contributor

ramsey commented Jun 22, 2012

The difficulty may come in supporting conditional requests. For example, if a conditional PUT request includes an If-Match header, then we need to generate the content first, MD5 it, and then compare the If-Match value against the MD5 we've just generated. If it doesn't match, then we send back 412 Precondition Failed.

We might also want to consider supporting RFC 6585, in particular 428 Precondition Required, since it's related to conditional requests.
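A minimal sketch of that conditional PUT flow, in Python for illustration only (not Respect's PHP API). It assumes the server already knows the current strong ETag of the target resource (`None` if the resource does not exist yet) and receives If-Match as a raw header string. Per RFC 2616 a failed If-Match yields 412 Precondition Failed, and RFC 6585 adds 428 for servers that insist on conditional updates. The function name `check_put_preconditions` and the `require_if_match` flag are hypothetical.

```python
from typing import Optional


def check_put_preconditions(if_match: Optional[str],
                            current_etag: Optional[str],
                            require_if_match: bool = True) -> Optional[int]:
    """Return a status code to send instead of applying the PUT, or None to proceed.

    current_etag is the strong ETag of the resource as it exists now,
    or None if the resource does not exist yet (hypothetical convention).
    """
    if if_match is None:
        # RFC 6585: a server that requires conditional updates answers 428
        return 428 if require_if_match else None
    if if_match.strip() == "*":
        # "*" matches any existing representation, but never a missing one
        return None if current_etag is not None else 412
    if current_etag is None or current_etag.startswith("W/"):
        # If-Match uses strong comparison, so a weak current tag can never match
        return 412
    # Simplification: assumes no entity-tag contains a comma
    candidates = [tag.strip() for tag in if_match.split(",")]
    # current_etag is strong here, so a weak candidate (W/"...") can never be equal
    if current_etag in candidates:
        return None
    return 412
```

Whether a missing If-Match should be refused with 428 is exactly the kind of policy the implementer may want to decide, hence the flag.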

I'm just thinking off the top of my head with this stuff, by the way. I think the right thing to do is implement ETags in responses first, and then we can think about conditional requests and how/whether we want to support them. Conditional requests may end up being cases where the implementer really needs to be the one making the decisions (and not the framework).

@nickl-
Member

nickl- commented Jun 22, 2012

You are correct @ramsey: ETags are used in the conditional request headers If-Match, If-None-Match and If-Range (If-Modified-Since and If-Unmodified-Since rely on the Last-Modified date instead), but their main purpose is to act as a weak or strong cache validator, which is precisely why they are used in conditional requests. You dig?

Which is why I am inclined to revoke my support for simply MD5-ing the response and calling that the ETag, and I will explain why, but first: a very strong +1 for the brilliant idea to MD5 the response. Your intuitions were right @alganet, but to benefit from its capabilities with regard to data integrity checking (much like a CRC), we should put the MD5 in the Content-MD5 header instead, computed before any transfer-encoding is applied.

Content-MD5 looks like this:

              Content-MD5   = "Content-MD5" ":" md5-digest
              md5-digest   = <base64 of 128 bit MD5 digest as per RFC 1864>

Awesome stuff!!! =)
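To make the grammar above concrete: the header value is the base64 encoding of the raw 16-byte MD5 digest, not the 32-character hex string that PHP's `md5()` or Python's `hexdigest()` returns. A minimal Python sketch, assuming the body is already content-coded (e.g. gzipped) but not yet transfer-encoded (e.g. chunked); `content_md5` is a hypothetical helper name.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Content-MD5 value per RFC 1864: base64 of the raw 128-bit MD5 digest."""
    digest = hashlib.md5(body).digest()  # 16 raw bytes, not hex
    return base64.b64encode(digest).decode("ascii")
```

For example, `content_md5(b"")` gives `1B2M2Y8AsgTpgAmY7PhCfg==`, and every value is 24 characters long.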

Right, let's get back to our friend the ETag.

So the purpose of the ETag is to act as a cache validator. MD5 is not ideal for this, since there are occasions where slight variations in the content should not necessarily invalidate the cache completely. Those cases are rare, though, and we could close our eyes, cross our fingers, and the MD5 would more than likely identify any semantic change to the entity well enough, if it were not for one thing: the ETag needs to indicate the identity of the entity. Ouch. Therefore an ETag is not only a hash of its content but also a unique identifier (a UUID of sorts), and a bare MD5 cannot be used, as we cannot vouch that it will be unique.

As Mr. Fielding puts it so nicely in RFC 2616:

Note: in order to provide semantically transparent caching, an origin server must avoid reusing a specific strong entity tag value for two different entities, or reusing a specific weak entity tag value for two semantically different entities. Cache entries might persist for arbitrarily long periods, regardless of expiration times, so it might be inappropriate to expect that a cache will never again attempt to validate an entry using a validator that it obtained at some point in the past.

This is not the end of the world, and we can still have a quick fix: not an ideal fix, but one that will suffice, as per my statement regarding the MD5 solution. Even though it is not practical to use time in the construction of our UUID, as this would obviously nullify any type of caching, we do still have a unique identifier in the form of the URL and the Content-Type of the response (unless I am missing something else we should add to the mix?). We should be able to create a reproducible ETag based on the content, within our current limitation of having no server-side cache (see #60), by claiming that:

ETag is the MD5 of the URL, the Content-Type and the response body

Or do I have the cat by its tail here?
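The proposed rule can be sketched in a few lines of Python. This is an illustration, not Respect's implementation; `naive_etag` is a hypothetical name, and the NUL separators between fields are an assumption of the sketch, added so that URL "a" with Content-Type "bc" cannot hash the same as URL "ab" with Content-Type "c".

```python
import hashlib


def naive_etag(url: str, content_type: str, body: bytes) -> str:
    """Strong ETag = MD5 of URL, Content-Type and response body (as proposed above)."""
    h = hashlib.md5()
    h.update(url.encode("utf-8"))
    h.update(b"\x00")  # separator (assumption) keeps field boundaries unambiguous
    h.update(content_type.encode("utf-8"))
    h.update(b"\x00")
    h.update(body)
    # entity-tags are quoted strings on the wire
    return '"' + h.hexdigest() + '"'
```

The result is reproducible without any server-side cache: the same URL, Content-Type and body always yield the same tag, and changing any one of them yields a different one.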

Right, if we can agree to live with that in the meantime and get it committed so that we can start working on conditional requests (#61), here are some thoughts on what should actually be done, for your consideration.

RFC 2616 defines the difference between strong and weak cache validators, both of which we should be capable of producing:

Since both origin servers and caches will compare two validators to decide if they represent the same or different entities, one normally would expect that if the entity (the entity-body or any entity-headers) changes in any way, then the associated validator would change as well. If this is true, then we call this validator a "strong validator."

[...] A validator that does not always change when the resource changes is a "weak validator."

This is what the ETag header should look like:

ETag = "ETag" ":" entity-tag

Examples:

ETag: "xyzzy"
ETag: W/"xyzzy"
ETag: ""

The W/ prefix in the second example indicates that the ETag is a weak validator. There are several rules about when to use weak entity tags and when not to, which we can discuss further if you see a need. Let me conclude this post with the requirements for when to respond with an ETag, which are also pertinent to this discussion.
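The weak/strong distinction matters when tags are compared. RFC 2616 (section 13.3.3) defines a strong comparison function (both tags strong and identical) and a weak one (opaque tags identical, W/ ignored). A minimal Python sketch of both, assuming well-formed single tags as input; the helper names are hypothetical.

```python
from typing import Tuple


def parse_etag(value: str) -> Tuple[bool, str]:
    """Split an entity-tag into (is_weak, opaque_tag); the W/ prefix is case-sensitive."""
    value = value.strip()
    weak = value.startswith("W/")
    opaque = value[2:] if weak else value
    if len(opaque) < 2 or opaque[0] != '"' or opaque[-1] != '"':
        raise ValueError(f"not a valid entity-tag: {value!r}")
    return weak, opaque


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: both tags must be strong and identical."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: only the opaque tags must be identical; W/ is ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

So `W/"xyzzy"` and `"xyzzy"` match weakly but not strongly, which is why weak tags are only usable where the spec permits weak comparison.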

HTTP/1.1 origin servers:

  • SHOULD send an entity tag validator unless it is not feasible to generate one.
  • MAY send a weak entity tag instead of a strong entity tag, if performance considerations support the use of weak entity tags, or if it is unfeasible to send a strong entity tag.
  • SHOULD send a Last-Modified value if it is feasible to send one, unless the risk of a breakdown in semantic transparency that could result from using this date in an If-Modified-Since header would lead to serious problems.
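Sending both validators, as recommended above, could look like the following Python sketch. It assumes the opaque tag and a Unix modification timestamp are already available; `validator_headers` is a hypothetical name, and `email.utils.formatdate(..., usegmt=True)` produces the RFC 1123 date format that HTTP uses.

```python
from email.utils import formatdate


def validator_headers(opaque: str, weak: bool, last_modified: float) -> dict:
    """Build ETag and Last-Modified headers; last_modified is a Unix timestamp."""
    if '"' in opaque:
        raise ValueError("opaque tag must not contain a double quote")
    etag = f'"{opaque}"'
    if weak:
        etag = "W/" + etag  # weak validator
    return {
        "ETag": etag,
        # e.g. "Thu, 01 Jan 1970 00:00:00 GMT"
        "Last-Modified": formatdate(last_modified, usegmt=True),
    }
```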

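In addition, RFC 2616 says the following about ETags in these status code responses:

  • 201 Created - MAY contain an ETag
  • 206 Partial Content - MUST contain an ETag if one would have been sent in a 200 response to the same request
  • 304 Not Modified - MUST contain an ETag if one would have been sent in a 200 response to the same request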
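The 304 rule is easiest to see in a conditional GET. A minimal Python sketch, assuming the response ETag is already computed; it uses weak comparison, which RFC 2616 permits for If-None-Match on GET and HEAD, and naively splits the header on commas. `conditional_get` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple


def _opaque(tag: str) -> str:
    # weak comparison ignores the W/ prefix
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(if_none_match: Optional[str], etag: str,
                    body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET, returning (status, headers, body)."""
    headers = {"ETag": etag}  # present on both the 200 and the 304
    if if_none_match is not None:
        for candidate in if_none_match.split(","):  # assumes no commas inside tags
            candidate = candidate.strip()
            if candidate == "*" or _opaque(candidate) == _opaque(etag):
                # 304 carries no body, but repeats the ETag
                return 304, headers, b""
    return 200, headers, body
```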

Some final points on the matter:

In other words, the preferred behavior for an HTTP/1.1 origin server is to send both a strong entity tag and a Last-Modified value.

In order to be legal, a strong entity tag MUST change whenever the associated entity value changes in any way. A weak entity tag SHOULD change whenever the associated entity changes in a semantically significant way.
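To illustrate the difference between those two rules, a Python sketch that derives a strong tag from the exact bytes and a weak tag from a normalized form. Treating whitespace-only differences as "semantically insignificant" is purely an assumption for illustration; what counts as significant depends on the resource. Both function names are hypothetical.

```python
import hashlib


def strong_etag(body: bytes) -> str:
    # Changes whenever any byte of the entity changes.
    return '"' + hashlib.md5(body).hexdigest() + '"'


def weak_etag(body: bytes) -> str:
    # Assumption: runs of ASCII whitespace are not semantically significant,
    # so hash a whitespace-collapsed form and mark the tag as weak.
    normalized = b" ".join(body.split())
    return 'W/"' + hashlib.md5(normalized).hexdigest() + '"'
```

With this rule, `b"<p>Hello   world</p>\n"` and `b"<p>Hello world</p>"` get different strong tags but the same weak tag.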

Right, who wants to go first?

@nickl-
Member

nickl- commented Jun 22, 2012

@ramsey There you go again with "implementer really needs to be the one making the decisions", which I don't understand. If you think you can come up with a better way to implement HATEOAS than using RFC 2616 conditional requests, and get everyone in the world to agree with your way and comply with it, then boy am I glad that we have you to help us here. =)

I think that developers being left to implement things on their own is precisely the reason why it has taken the best part of 15 years (since 2616) and we have yet to see a complete implementation. I say we are the best developers for the task, right here, right now, to implement this once and for all, in public and open to scrutiny, and to finally produce a complete solution. Who doesn't want that?

You argued earlier in #39 that "... is one of those cases where a framework needs to leave much of the decision up to the developer as to how it should work." Aside from the notion that any tool should only enable you and not get in your way, which is indeed true for all of the Respect modules, why would you insist that it is better to leave out something for which we have a clear specification, which falls exactly within the scope of this project, for which there is an expressed need, and which eager, willing and capable hands like yours are available to accomplish? Why?

I don't want this to stop you from saying what you want; just take note that I, personally, don't care if you want to leave things out and do them on your own (you were going to have to do that anyway, agreed?). Please keep making your comments here, or else we are no better off than some lonely dev doing it on their own, and I would much appreciate it if you would bring those implementations here for review and addition to the repository. I see great things to come! Glad to have you on board!

Thank you for RFC 6585; it gave me the conclusion I was looking for.

@alganet
Member Author

alganet commented Jun 23, 2012

I was thinking of:

->eTag(); // weak ETag, auto-generated

->eTag(true); // strong ETag, auto-generated

->eTag(false, function(... // weak ETag, custom-generated

->eTag(true, function(... // strong ETag, custom-generated
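The semantics of that proposal can be sketched as a single function. This is a Python analogue for illustration, not Respect's PHP API: `e_tag` is hypothetical, MD5 of the body stands in for the auto-generator, and the callback receives the body and returns the opaque tag (an assumption, since the callback signature is elided above).

```python
import hashlib
from typing import Callable, Optional


def e_tag(body: bytes, strong: bool = False,
          generator: Optional[Callable[[bytes], str]] = None) -> str:
    """Hypothetical analogue of ->eTag(strong, callback); weak and auto by default."""
    if generator is not None:
        opaque = generator(body)  # custom-generated
    else:
        opaque = hashlib.md5(body).hexdigest()  # auto-generated
    if '"' in opaque:
        raise ValueError("opaque tag must not contain a double quote")
    tag = f'"{opaque}"'
    return tag if strong else "W/" + tag
```

For example, `e_tag(b"x", True, lambda b: "v42")` returns `"v42"`, while `e_tag(b"x")` returns a weak, MD5-based tag.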

@ramsey
Contributor

ramsey commented Jun 23, 2012

@nickl- wrote:

@ramsey There you go again with "implementer really needs to be the one making the decisions", which I don't understand. If you think you can come up with a better way to implement HATEOAS than using RFC 2616 conditional requests, and get everyone in the world to agree with your way and comply with it, then boy am I glad that we have you to help us here.

This is precisely why I don't think we can get it right and why it should be left up to the implementer. I cannot presume to know what's best for someone else's API. To do so would be to do exactly what you're criticizing me for doing here.

Furthermore, I wasn't suggesting that I could come up with "a better way to implement HATEOAS than using 2616 conditional requests" at all. I suggested that the implementer might be better than us at deciding how to handle conditional requests because the implementer is the one who knows his or her data best. This isn't to say that we can't make assumptions, but we should allow the implementer to override the default behavior if our assumptions don't match what they are trying to do.

@nickl-
Member

nickl- commented Jun 23, 2012

@ramsey see 6522893 in response

Damn, I was counting on you delivering that HATEOAS solution. =)

@nickl-
Member

nickl- commented Jun 23, 2012

@alganet love the sugar can you elaborate on the spice please?

@nickl-
Member

nickl- commented Jun 24, 2012

The Content-MD5 header field (RFC 1864) in all its glory.

@alganet
Member Author

alganet commented Jul 3, 2012

@nickl- implementations rarely come to mind; I really love interfaces =D

I would go with md5() or any other low-collision algorithm for caching. For strong auto-generated ETags, we should increase entropy and generate a larger hash (at a considerable performance cost).

We need RouteInspector for this, so we can cache according to the spec (considering all headers).
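A Python sketch of the two auto-generators described here, under stated assumptions: weak auto tags hash only the body with MD5, while strong auto tags use SHA-256 as the "larger hash" and also cover the response headers. Which headers to include (the job RouteInspector would do) is left to the caller; lower-casing header names before hashing is an assumption based on header names being case-insensitive. `auto_etag` is a hypothetical name.

```python
import hashlib
from typing import Mapping


def auto_etag(headers: Mapping[str, str], body: bytes, strong: bool) -> str:
    """Auto-generate an ETag: cheap weak MD5 of the body, or SHA-256 over headers + body."""
    if not strong:
        # weak auto ETag: 128-bit MD5 of the body only
        return 'W/"' + hashlib.md5(body).hexdigest() + '"'
    h = hashlib.sha256()  # 256-bit digest, more expensive than MD5
    for name in sorted(headers, key=str.lower):
        line = name.lower() + ":" + headers[name].strip() + "\n"
        h.update(line.encode("utf-8"))
    h.update(b"\n")  # blank line separates headers from body
    h.update(body)
    return '"' + h.hexdigest() + '"'
```

Because the strong variant covers the headers, two responses with the same body but a different Content-Type get different strong tags, while their weak tags stay equal.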

This approach gives the developer four choices: strong custom etags (generated by themselves using a callback), strong auto etags (generated from the content and headers with higher entropy), weak auto and weak custom (same as strong ones on use).

By default, I believe we should use weak auto ETags. We can't choose an expiry time or a last-modification time on our users' behalf, so weak ETags are the only mechanism we can ship by default without taking the choice away from our users.
