Specify how document.cookie diverges from [COOKIES] RFC #804

domenic · 2016-03-04T20:16:06Z

Currently the spec says

the user agent must act as it would when receiving a set-cookie-string for the document's address via a "non-HTTP" API, consisting of the new value encoded as UTF-8.

However, in the real world things like document.cookie = "foo" work and have an effect. There are probably many other possibilities; in general the RFC just has a grammar that things might not match, whereas I imagine browsers just accept anything and try to make sense of it, even if it fails to match the grammar.
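
To make the gap concrete, here is a minimal sketch of the name/value split, not any browser's actual code. The strict branch follows the RFC 6265 section 5.2 rule that a name-value-pair without "=" (or with an empty name) causes the whole string to be ignored. The `lenient` branch is only one assumed way an implementation might "make sense of" such input. `parseNameValue`, `stripWsp` and the `lenient` option are hypothetical names introduced for illustration.

```javascript
// Sketch only: strict RFC 6265 section 5.2 splitting vs. an assumed lenient fallback.
// All names here are hypothetical; no browser is claimed to work exactly this way.

// Strip leading/trailing WSP (space and tab only), as the RFC describes.
function stripWsp(s) {
  return s.replace(/^[ \t]+|[ \t]+$/g, "");
}

function parseNameValue(setCookieString, { lenient = false } = {}) {
  // Everything before the first ";" is the name-value-pair.
  const semi = setCookieString.indexOf(";");
  const pair = semi === -1 ? setCookieString : setCookieString.slice(0, semi);

  const eq = pair.indexOf("=");
  if (eq === -1) {
    // Strict: no "=" means the set-cookie-string is ignored entirely.
    if (!lenient) return null;
    // Assumed lenient fallback: empty name, the whole pair becomes the value.
    return { name: "", value: stripWsp(pair) };
  }

  const name = stripWsp(pair.slice(0, eq));
  const value = stripWsp(pair.slice(eq + 1));
  // Strict: an empty name also causes the string to be ignored.
  if (name === "" && !lenient) return null;
  return { name, value };
}
```

Under these assumptions, `parseNameValue("foo")` returns `null`, while `parseNameValue("foo", { lenient: true })` returns `{ name: "", value: "foo" }`. Which fallback real browsers pick is exactly what this issue is asking to have specified.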

@bsittler noticed this while working on some service worker cookie stuff, and previously it has come up in the jsdom project and its related tough-cookie helper:

@Sebmaster and @inikulin led the charge for this in jsdom, so maybe they could help us spec the correct behavior for how document.cookie parses cookies? Alternately, looking at open-source browser code would get us pretty far.

This might be a compat issue if everyone hasn't managed to magically converge on a single behavior despite the lack of precise spec. Tentatively tagging as such for now.

annevk · 2016-03-05T06:41:24Z

Paging @mikewest.

inikulin · 2016-03-05T18:09:25Z

I'd love to help!

Here is what we can do:

- Create a test runner for the IETF test suite that produces output in a machine-readable format. Currently the suite can run only individual tests, or requires dev builds of the browsers in some cases, which makes it unusable for testing IE and Edge. It also can't produce machine-readable reports at the moment. We will need them to aggregate and analyze results across browsers later. I'm already working on it.
- Run the tests in all major browsers:
  - Chrome
  - Safari
  - Firefox
  - IE
  - Edge
- Using the aggregated test failure info, we can build a table in this format:

  | Test case / browser | Expected | Chrome | Firefox | Safari |
  | --- | --- | --- | --- | --- |
  | "foo=" | "" | "foo" | "foo" | "" |

- Triage the failures into groups, e.g. if a test fails in the majority of the browsers, consider that a de facto behavior and add the difference to the spec. For the minor cases, consult with the developers or search the issue trackers to find the motivation behind them.
- Modify the IETF test suite along the way to align it with the proposed behavior. Make it the default test suite for the spec.

I will try to provide you test results somewhere around next week.
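
The triage step in the plan above could be sketched roughly as follows. This is an illustration under assumptions, not the cookie-compat runner: the input shape (`testCase`, `expected`, `actual`), the function name `triage`, and the group labels are all hypothetical, and "majority" is taken to mean strictly more than half of the tested browsers.

```javascript
// Sketch only: group test cases by how many browsers diverge from the expected result.
// Input shape and group names are hypothetical, chosen for illustration.
function triage(results) {
  return results.map(({ testCase, expected, actual }) => {
    const browsers = Object.keys(actual);
    // A browser "fails" a test when its output differs from the expected output.
    const failing = browsers.filter((b) => actual[b] !== expected);

    let group;
    if (failing.length === 0) {
      group = "conforming";
    } else if (failing.length * 2 > browsers.length) {
      // Strict majority diverges: candidate de facto behavior for the spec.
      group = "de-facto";
    } else {
      // Minority diverges: follow up with implementors / issue trackers.
      group = "investigate";
    }
    return { testCase, failing, group };
  });
}
```

Fed the illustrative row from the table format above (`"foo="`, expected `""`, Chrome and Firefox returning `"foo"`, Safari returning `""`), this sketch would put the case in the `"de-facto"` group with Chrome and Firefox listed as diverging.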

annevk · 2016-03-06T06:42:35Z

Cool! Another thing here worth checking is <meta http-equiv=set-cookie>. If these invalid values still result in HTTP headers, it's likely the RFC will need to be updated somehow.

inikulin · 2016-03-06T15:00:40Z

@annevk AFAIK browsers use the same code for all cookie parsing scenarios. Spec violations in the document.cookie setter also show up when you set a cookie via an HTTP header. I'm pretty sure we will have the same results with <meta>.

annevk · 2016-03-06T16:36:20Z

I see, in that case it seems like something @mikewest and @mnot should be solving in the RFC. Your testing will still be useful, obviously, but given the scope of the problem it does not seem like something that needs to be addressed in the HTML Standard. Although I can understand if we need to make adjustments for a revised RFC that does handle this properly.

inikulin · 2016-03-06T22:42:40Z

> Although I can understand if we need to make adjustments for a revised RFC that does handle this properly.

So, we will continue the discussion here for now, and once we have some data and analysis we will ping the IETF guys, I guess?

mnot · 2016-03-07T06:36:38Z

Very good timing. We're about to start opening up the cookie RFC, so yes do ping us when you have some results. Any idea how long that will be?

annevk · 2016-03-07T07:42:09Z

@inikulin, yeah, we'll keep this open until the issue is resolved. @mnot, @inikulin mentioned earlier he was hoping to have something this week.

inikulin · 2016-03-10T14:21:02Z

Voilà
http://inikulin.github.io/cookie-compat/

domenic · 2016-03-10T14:26:04Z

OMG, this is amazing!!

bsittler · 2016-03-10T16:49:55Z

@inikulin this is really sobering. Thank you! What was the effective document charset for the test page?

inikulin · 2016-03-10T16:56:56Z

@bsittler UTF-8

inikulin · 2016-03-10T16:58:39Z

FYI test runner sources are here: https://github.com/inikulin/cookie-compat

inikulin · 2016-03-10T17:20:20Z

Thank you guys for all the kind words, I hope you will find it useful.

Further steps:

- ~~Add expires= date parsing tests. They are in a separate test suite and require conversion.~~ (just realized that there is no way to access the parsed expiration date)
- Currently we don't have a reference implementation, and it bothers me. I will try to create one based on tough-cookie. Actually, tough-cookie is implemented nearly per spec, with just some minor relaxations (e.g. the symbol restrictions for the token are ignored).
- Report issues for the obvious bugs to implementors and reference them in the table.

mnot · 2016-03-11T00:45:06Z

Wow indeed, really great stuff!

It seems to me that the first 17 tests could be brought into (at least rough) interop with a fairly simple spec change to Section 5.2. The remaining tests demonstrate enough interop that they look more like browser bugs to me.

That's assuming that all of the browsers don't want to fix the underlying bugs in the first 17 tests, of course. It'd be very useful to know how much content on the Web currently relies upon this behaviour, but gathering that data is likely to be problematic...

If we do want to change the spec, someone will need to write up an Internet-Draft describing the proposed changes. I can help with that.

@inikulin would you mind pinging the HTTP-WG about this on its mailing list https://lists.w3.org/Archives/Public/ietf-http-wg/? If you don't want to subscribe, I can forward a message for you, or you could even just open up a bug at https://github.com/httpwg/http-extensions/issues. I just want to make sure that you get credit for this awesome work.

inikulin · 2016-03-15T14:27:02Z

@mnot Done: httpwg/http-extensions#159

bsittler · 2016-06-21T18:38:28Z

@inikulin what was the system codepage for Edge and IE? Have you tried changing it? If https://stackoverflow.com/questions/1969232/allowed-characters-in-cookies is to be believed, non-ASCII characters may "work" in IE when they are present in the system codepage, where "work" means they will be wire-encoded in that codepage (never UTF-8, since Windows system codepage can't be set to 65001) but exposed to JavaScript using the corresponding Unicode characters. I'd be especially interested to see the results for systems with larger-coverage (CJK?) or non-1252 system codepages.

Likewise, have you tried server-generated cookies with encodings other than UTF-8, e.g. latin-1?

inikulin · 2016-06-21T18:47:22Z

Nope, haven't adjusted windows code page for tests. I'll try to run with codepages with bigger character set tomorrow at work, because I don't have access to win machine currently.

inikulin · 2016-06-21T19:01:25Z

> Likewise, have you tried server-generated cookies with encodings other than UTF-8, e.g. latin-1?

Nope

bsittler · 2016-06-21T20:18:54Z

One more thought: it may be worth checking both reading and writing behavior of the backslash \u005c \ and yen sign \u00a5 ¥ in cookies on the server side, from HTML (meta http-equiv=set-cookie) and from document.cookie across Latin 1, UTF-8 and Shift JIS/CP 932 document encodings and with both US English and Japanese system codepages in effect. It's a large matrix, but it may uncover some useful information about how browsers currently interoperate (or don't) in the presence of incompatible character encodings. In particular it would be good to know whether backslash is reliably round-tripped under all these circumstances and whether or not it is ever remapped to a non-ASCII character.

Same question for tilde \u007e ~ and wave dash \u301c 〜 actually.

(I'm asking these oddly specific questions because I'm wondering whether all of printable ASCII other than semicolon is actually safe in cookie values across browsers) Edit: names too (barring equal sign of course)

Edit: Also, in the meta http-equiv case, are the results the same for raw document-charset characters vs. HTML-entified versions?

more edit: Yet another IE-specific question: does document.cookie in IE (and Edge?) round-trip Unicode when the characters are first converted to bytes? e.g. document.cookie = unescape(encodeURIComponent('test=三猿🙈🙉🙊')) and decodeURIComponent(escape(document.cookie)) [or the (better) TextDecoder/TextEncoder equivalents except there's no TextDecoder/TextEncoder in IE]
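
For reference, the byte round-trip asked about above can be written with TextEncoder/TextDecoder where those are available (so not in IE itself). This is a small sketch with hypothetical helper names, `toByteString` and `fromByteString`; it only shows the conversion, and says nothing about whether IE or Edge actually preserve the resulting string.

```javascript
// Sketch only: convert text to a "byte string" (one char per UTF-8 byte) and back.
// Helper names are hypothetical. Equivalent in effect to
// unescape(encodeURIComponent(text)) and decodeURIComponent(escape(byteString)).
function toByteString(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let out = "";
  for (const b of bytes) out += String.fromCharCode(b); // each byte becomes U+0000..U+00FF
  return out;
}

function fromByteString(byteString) {
  // Every char is assumed to be in U+0000..U+00FF, i.e. to stand for one byte.
  const bytes = Uint8Array.from(byteString, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}
```

A test page could then write `document.cookie = toByteString('test=三猿🙈🙉🙊')` and check whether `fromByteString(document.cookie)` gives the original text back.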

inikulin · 2016-06-22T12:21:35Z

@bsittler

> I'd be especially interested to see the results for systems with larger-coverage (CJK?) or non-1252 system codepages.

I've added results for IE and Edge with system codepage 950 (big5) and 932 (shift_jis): http://inikulin.github.io/cookie-compat/ (spoiler: it didn't work out)

Regarding #804 (comment) if you wouldn't mind, I will work on it later, because I'm really running out of spare time currently. I've created issue in cookie-compat for this task to not forget about it: inikulin/cookie-compat#3

bsittler · 2016-06-22T21:04:29Z

Thank you very much

bsittler · 2016-06-27T21:01:26Z

On Windows 7 with a US English system locale running IE 9, JavaScript-written cookies subsequently read from JavaScript seem to reliably round-trip characters whose ISO 8859-1 encodings fall in the ISO 2022 GR range (0xA0 ... 0xFF) in addition to most of printable ASCII. This seems to be the case regardless of the document character encoding. Additionally, I tried a few characters whose Windows-1252 encodings fall in the ISO 2022 C1 range (0x80 ... 0x9F) and they appear to round-trip successfully, too. Characters not representable in Windows-1252 are apparently converted to question mark (other printable characters) or dropped (ASCII control characters.)

I have not yet tested with a different system locale.

I suspect that cookies are simply serialized in the IE cookie jar using the default codepage of the system locale.

bsittler · 2016-06-27T23:19:51Z

Indeed, after switching the system locale to Japanese (with "ANSI" and "OEM" codepages both switched to 932) and rebooting, cookies behave exactly as if they are being stored in CP932 (approximately Shift JIS), with characters like Euro sign \u20ac converted to question mark and Japanese text preserved. This is independent of document charset, so the same Japanese text written by script running in a Shift JIS document is readable by script running in a UTF-8 document without mangling, and vice versa.

annevk · 2016-06-28T07:02:22Z

Wow, that is not something we want to standardize upon. How would that even work with code points that cannot be represented by the encoding?

bsittler · 2016-06-28T15:04:09Z

It doesn't. They are converted to question marks (in other words, data is lost.) Because it's based on the system "ANSI" code page it is however somewhat likely that text entered by the user in the system locale's primary language will round-trip successfully from script to script across page loads. Compatibility with other modern browsers however seems to be zero for non-ASCII text.

bsittler · 2016-06-28T20:30:27Z

Just did a little further testing, and verified that even with explicit UTF-8 or UTF-16 (little-endian) byte-order marks in the cookie name and/or cookie value, IE and Edge still always interpret the cookie according to the system "ANSI" codepage. Non-ASCII cookie names and values set by the server are sent back to the server without mangling, so there's nothing to prevent a server from storing UTF-8 in a cookie (e.g. UTF-8 cookie names/values containing Ő [\xc5\x90] round-trip server-to-server via US English-locale Edge even though \x90 is nominally unmapped in Windows code page 1252), however scripts running in IE always misinterpret such cookies according to the system ANSI codepage (in this case the nominally unmapped byte is in fact exposed as-is to script, as '\x90'.)

Also, attempts to set cookies from scripts with "ANSI" code page-unrepresentable characters in their names and/or values do not always convert those to question marks - sometimes a different fallback is used. For instance, with a US English system locale document.cookie = 'Ő=Ő' results in O=O instead. I suspect it's using the default substitutions from WideCharToMultiByte.
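
The server-set Ő [\xc5\x90] case above can be reproduced on paper with a short sketch. It treats each UTF-8 byte as one character with the same code point, which is a simplification (a Latin-1 style mapping, not a real code page 1252 table); it happens to match the behavior reported for these two bytes, where 0xC5 reads as Å and 0x90 is exposed as-is. `misreadUtf8AsSingleByte` is a hypothetical name.

```javascript
// Sketch only: what a script sees if UTF-8 cookie bytes are read one byte per character.
// Simplification: byte N is mapped to code point N (Latin-1 style), which is NOT a full
// Windows-1252 table; it only matches the reported behavior for the bytes used here.
function misreadUtf8AsSingleByte(text) {
  const bytes = new TextEncoder().encode(text); // bytes as a server would send them
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// "Ő" (U+0150) is the two UTF-8 bytes 0xC5 0x90.
const seenByScript = misreadUtf8AsSingleByte("Ő=Ő");
console.log(JSON.stringify(seenByScript));
```

With this simplification a cookie sent as `Ő=Ő` is seen by script as `Å\x90=Å\x90`: five characters instead of three, none of them the original letter.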

domenic · 2016-06-28T21:02:03Z

I'm doubtful that further testing of IE/Edge's quirks is going to be helpful. We know they do weird stuff they would never put into a web spec.

bsittler · 2016-06-29T20:32:38Z

Right, I was merely attempting to assess the compatibility risk of having the new API only support UTF-8 (and possibly also "raw byte array") interpretation for cookie data, which would be incompatible (in Edge) with the system "ANSI" codepage interpretation in document.cookie and <meta http-equiv="set-cookie" ...> but consistent with other browsers.

Ms2ger · 2017-08-15T15:32:11Z

One "fun" thing I noticed today: document.cookie = 'foo' will add a trailing = in macOS WebKit, but not GTK+ WebKit.

domenic added the normative change and compat labels Mar 4, 2016

domenic mentioned this issue Mar 4, 2016

Support no-value cookies? WICG/cookie-store#1

Closed

inikulin mentioned this issue Mar 15, 2016

Allow cookies without key or value httpwg/http-extensions#159

Closed

inikulin mentioned this issue Jun 22, 2016

Perform additional encoding testing inikulin/cookie-compat#3

Open

bsittler mentioned this issue Jul 19, 2016

Bytes vs. characters and "cookie charset" WICG/cookie-store#15

Closed

annevk mentioned this issue Aug 15, 2017

"receiving a set-cookie-string" does not seem defined in the linked RFC #2921

Closed

annevk added the topic: cookie label Sep 6, 2017

This was referenced Nov 13, 2017

Cookie Change Events mozilla/standards-positions#50

Closed

Update cookie-related OWNERS and READMEs web-platform-tests/wpt#7531

Merged

ibeizhu mentioned this issue May 10, 2018

jsdom 中文文档 one-gourd/blog#1

Open

annevk mentioned this issue Jun 6, 2018

Cookie Store API w3ctag/design-reviews#290

Closed

3 tasks

annevk mentioned this issue Feb 25, 2020

[rfc6265bis] Cookie parser - UTF-8 chars httpwg/http-extensions#1073

Open

annevk mentioned this issue Jul 20, 2020

Clarify SameSite behavior for non-HTTP API httpwg/http-extensions#769

Closed

github-actions bot mentioned this issue Mar 30, 2024

chore: removing unused dependencies trumant/github-action-new-dependencies-advisor#1

Open
