In essence: have some heuristic to determine the input encoding (BOM, @charset, try a few common charsets and pick the first one that doesn’t produce errors), then convert to UTF-8 and, from that point on, all the tokens of interest to us will be ASCII-only and can be parsed using regular string functions.
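As a rough illustration of that heuristic, here is a minimal sketch. It is written in Python for brevity rather than in the library's own language, and the names `to_utf8`, `BOMS`, `AT_CHARSET` and `FALLBACKS` are hypothetical, as is the choice of fallback charsets. It is one possible reading of the proposal, not an existing implementation.

```python
import codecs
import re

# Byte-order marks recognised in this sketch, mapped to a Python codec name.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# An @charset rule only counts if it is the very first bytes of the file.
# The character class allows printable ASCII except the double quote.
AT_CHARSET = re.compile(rb'^@charset "([\x20\x21\x23-\x7e]{1,100})";')

# Hypothetical fallback list, tried in order. Single-byte charsets such as
# windows-1252 accept almost any input, so they belong at the end.
FALLBACKS = ["utf-8", "windows-1252"]


def to_utf8(raw: bytes) -> bytes:
    """Guess the encoding of `raw` and return the same text as UTF-8 bytes."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            # A BOM is treated as authoritative; strip it before decoding.
            # Invalid bytes after the BOM raise UnicodeDecodeError.
            return raw[len(bom):].decode(name).encode("utf-8")

    candidates = []
    match = AT_CHARSET.match(raw)
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates.extend(FALLBACKS)

    for name in candidates:
        try:
            return raw.decode(name).encode("utf-8")
        except (LookupError, UnicodeDecodeError):
            continue  # unknown charset label or invalid bytes: try the next
    raise ValueError("no candidate charset decoded the input without errors")
```

After conversion, any `@charset` rule in the output still names the original charset, so a parser built on this would need to treat that rule as informational only.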
Yes, good idea. Though browsers have a Content-Type header that may include a charset= specifier that we don’t have (as well as the resolved charset of the referring document). But we can definitely follow what browsers do absent charset=.
> Though browsers have a Content-Type header that may include a charset= specifier that we don’t have (as well as the resolved charset of the referring document).
We can use the value provided to Settings::withDefaultCharset in its place.
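To make the precedence concrete, here is a small sketch of how the charset sources could be ranked. It is in Python purely for illustration, and `pick_charset` and its parameters are hypothetical names. It assumes the configured default stands in for the referring document's charset and is therefore consulted last; if it were meant to stand in for Content-Type instead, it would outrank `@charset`.

```python
from typing import Optional


def pick_charset(
    bom: Optional[str],
    at_charset: Optional[str],
    default_charset: str,
) -> str:
    """Return the first charset source that is present, in priority order."""
    # Browsers would consult the Content-Type charset= between the BOM and
    # @charset; a standalone parser has no such header, so that step is
    # skipped here.
    for candidate in (bom, at_charset):
        if candidate:
            return candidate
    # Assumption: the configured default plays the role of the environment
    # encoding (the referring document's charset) and is the last resort.
    return default_charset
```

With this ordering, a stylesheet without a BOM or `@charset` rule is decoded using whatever was passed to `Settings::withDefaultCharset`.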
From #688 (comment):
> In essence: have some heuristic to determine the input encoding (BOM, @charset, try a few common charsets and pick the first one that doesn’t produce errors), then convert to UTF-8 and, from that point on, all the tokens of interest to us will be ASCII-only and can be parsed using regular string functions.