diff --git a/url.bs b/url.bs
index 88845f18..59d9b17d 100644
--- a/url.bs
+++ b/url.bs
@@ -231,9 +231,9 @@ point URLs from A can come from untrusted sources.
https://mothereff.in/punycode -->
A host is a domain, an
-IPv4 address, or an IPv6 address. Typically a
-host serves as a network address, but it is sometimes (ab)used as opaque
-identifier in URLs where a network address is not necessary.
+IPv4 address, an IPv6 address, or an opaque host. Typically a host
+serves as a network address, but it is sometimes used as an opaque identifier in URLs
+where a network address is not necessary.
The RFCs referenced in the paragraphs below are for informative purposes only. They have no influence on host syntax, parsing, and serialization. Unless stated
@@ -257,6 +257,31 @@ eight 16-bit pieces.
Support for <zone_id> is intentionally omitted.
+An opaque host is an ASCII string holding data that can be used for
+further processing.
+
+An opaque host is only used by non-special
+URLs.
+
+A forbidden host code point is
+U+0000,
+U+0009,
+U+000A,
+U+000D,
+U+0020,
+"#",
+"%",
+"/",
+":",
+"?",
+"@",
+"[",
+"\",
+or
+"]".
+
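The forbidden host code point list above reduces to a single set-membership test. The Python below is an illustrative sketch only; `FORBIDDEN_HOST_CODE_POINTS` and `contains_forbidden_host_code_point` are hypothetical names, not definitions from the specification.

```python
# Hypothetical names; the set transcribes the forbidden host code point list:
# U+0000, U+0009, U+000A, U+000D, U+0020, "#", "%", "/", ":", "?", "@", "[", "\", "]".
FORBIDDEN_HOST_CODE_POINTS = frozenset(
    "\u0000\u0009\u000a\u000d\u0020#%/:?@[\\]"
)


def contains_forbidden_host_code_point(s: str) -> bool:
    """Return True if any code point in s is a forbidden host code point."""
    return any(c in FORBIDDEN_HOST_CODE_POINTS for c in s)


print(contains_forbidden_host_code_point("example.com"))   # False
print(contains_forbidden_host_code_point("exa mple.com"))  # True (U+0020)
print(contains_forbidden_host_code_point("a%41b"))         # True ("%")
```

Both the host parser's check on asciiDomain and the URL-host parser's check on input are this same test applied to different strings.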
A host string must be a domain string, an
-IPv4 address string, or "[", followed by an IPv6 address string, followed
-by "]".
+IPv4 address string, or: "[", followed by an IPv6 address string,
+followed by "]".
A domain is a valid domain if these steps return success:
@@ -335,6 +360,11 @@ separated from each other by ".".
XXX should we define the format inline instead just like STD 66? -->
+An opaque-host string must be zero or more URL units.
+
+This is not part of the definition of host string, as telling the two apart
+requires context: whether the URL is special.
+
If asciiDomain is failure, return failure.
-If asciiDomain contains
- U+0000,
- U+0009,
- U+000A,
- U+000D,
- U+0020,
- "#",
- "%",
- "/",
- ":",
- "?",
- "@",
- "[",
- "\",
- or
- "]",
- syntax violation, return failure.
+If asciiDomain contains a forbidden host code point,
+ syntax violation, return failure.
Let ipv4Host be the result of IPv4 parsing
asciiDomain.
@@ -700,7 +714,7 @@ They serve no purpose other than being a location the algorithm can jump to.
IPv6 serializer on host,
followed by "]".
-Otherwise, host is a domain, return host.
+Otherwise, host is a domain or opaque host, return host.
The IPv4 serializer takes an
@@ -813,15 +827,15 @@ an ASCII string identifying a user. It is initially the empty string.
either null or an ASCII string identifying a user's credentials. It is initially null.
-A URL's host is either
-null or a host. It is initially null.
+A URL's host is null or a
+host. It is initially null.
A URL's port is either null or a 16-bit unsigned integer that identifies a networking port. It is initially null.
-A URL's path is a list of
-zero or more ASCII string holding data, usually identifying a location in
-hierarchical form. It is initially the empty list.
+A URL's path is a list of zero or more
+ASCII strings holding data, usually identifying a location in hierarchical form. It is
+initially the empty list.
A URL's query is either null or an ASCII string holding data. It is initially null.
@@ -939,7 +953,7 @@ input might be a relative-URL string.
a URL-scheme string that is an ASCII case-insensitive match for a
special scheme and not an ASCII case-insensitive match for "file",
- followed by ":" and a scheme-relative-URL string
+ followed by ":" and a scheme-relative-special-URL string
a URL-scheme string that is not an ASCII case-insensitive match for a
special scheme, followed by ":" and a relative-URL string
a URL-scheme string that is an ASCII case-insensitive match for
@@ -963,8 +977,8 @@ must be a relative-URL string, optionally followed by "#" an
switching on base URL's scheme:
file
"
- file
"
+ a scheme-relative-special-URL string
file
"
@@ -973,6 +987,10 @@ switching on base URL's scheme:
a path-absolute-non-Windows-file-URL string if base URL's host is non-null
a scheme-relative-URL string +
any optionally followed by "?" and a URL-query string.
@@ -980,12 +998,20 @@ switching on base URL's scheme:
A non-null base URL is necessary when parsing a relative-URL string.
-A scheme-relative-URL string must be
-"//", followed by a host string, optionally followed by ":"
-and a URL-port string, optionally followed by a path-absolute-URL string.
+A scheme-relative-special-URL string must be "//", followed by a
+host string, optionally followed by ":" and a URL-port string, optionally
+followed by a path-absolute-URL string.
A URL-port string must be zero or more ASCII digits.
+A scheme-relative-URL string must be
+"//", followed by an opaque-host-and-port string, optionally followed by a
+path-absolute-URL string.
+
+An opaque-host-and-port string must be either an empty
+opaque-host string or: a non-empty opaque-host string, optionally followed by
+":" and a URL-port string.
+
A scheme-relative-file-URL string must be
"//", followed by one of the following
@@ -1195,6 +1221,26 @@ different document encoding. Using the UTF-8 encoding everywhere solves t
The URL-host parser takes a string input
+and a boolean isSpecial, and then runs these steps:
+
+If isSpecial is true, then return the result of
+ host parsing input.
+
+If input contains a forbidden host code point, syntax violation,
+ return failure.
+
+Let output be the empty string.
+
+For each code point in input, UTF-8 percent encode it using the
+ simple encode set, and append the result to output.
+
+Return output.
+
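A minimal Python sketch of the URL-host parser steps above. It rests on several assumptions: the existing host parser is passed in as a hypothetical `host_parse` callable rather than implemented here; failure is modeled as `None` and syntax violations are not recorded; the simple encode set is taken to be the C0 controls plus every code point above U+007E; and input is assumed to contain no lone surrogates. All function names are hypothetical.

```python
from typing import Callable, Optional

# Same list as the forbidden host code point definition.
FORBIDDEN_HOST_CODE_POINTS = frozenset("\u0000\u0009\u000a\u000d\u0020#%/:?@[\\]")


def in_simple_encode_set(c: str) -> bool:
    # Assumption: the simple encode set is the C0 controls (U+0000 to U+001F)
    # and all code points greater than U+007E.
    cp = ord(c)
    return cp <= 0x1F or cp > 0x7E


def utf8_percent_encode(c: str) -> str:
    # Code points outside the encode set are appended as-is; the others become
    # one "%XX" triplet (uppercase hex) per UTF-8 byte.
    if not in_simple_encode_set(c):
        return c
    return "".join(f"%{b:02X}" for b in c.encode("utf-8"))


def url_host_parse(
    input_: str,
    is_special: bool,
    host_parse: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Sketch of the URL-host parser; None stands in for failure."""
    if is_special:
        # Special URLs keep using the existing host parser (passed in here).
        return host_parse(input_)
    if any(c in FORBIDDEN_HOST_CODE_POINTS for c in input_):
        return None  # syntax violation, return failure
    # Non-special URLs get an opaque host: percent-encode code point by code point.
    return "".join(utf8_percent_encode(c) for c in input_)


print(url_host_parse("ex\u00e4mple", False, lambda s: None))  # ex%C3%A4mple
print(url_host_parse("a b", False, lambda s: None))           # None
print(url_host_parse("EXAMPLE.org", True, str.lower))         # example.org (stub host parser)
print(repr(url_host_parse("", False, lambda s: None)))        # '' (empty opaque host)
```

Because every forbidden host code point is rejected before encoding, the only code points that end up percent-encoded are the remaining C0 controls, U+007F, and non-ASCII code points, so the output is always an ASCII string, matching the definition of opaque host.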
The basic URL parser takes a string input, optionally with a base URL base, optionally with an encoding encoding override, optionally with a URL
@@ -1541,8 +1587,19 @@ string input, optionally with a base URL base, opti
url is special and c is "\"
then decrease pointer by the number of code points in buffer plus
- one, set buffer to the empty string, and set state to host state.
+then run these substeps:
+
+If @ flag is set and buffer is the empty string,
+ syntax violation, return failure.
+
+Decrease pointer by the number of code points in buffer plus
+ one, set buffer to the empty string, and set state to
+ host state.
+
Otherwise, append c to buffer.
@@ -1556,17 +1613,13 @@ string input, optionally with a base URL base, opti
[] flag is unset, run these substeps:
If url is special and buffer is the empty
- string, return failure.
-
- https://@/example.org/ -> https:///example.org/ -> https://example.org/ -->
+Let host be the result of URL-host parsing
+ buffer with url is special.
-Let host be the result of
- host parsing
- buffer.
-
-If host is failure, return failure.
+If host is failure, then return failure.
Set url's host to host, buffer to the empty string,
@@ -1588,14 +1641,15 @@ string input, optionally with a base URL base, opti
then decrease pointer by one, and run these substeps:
If url is special and buffer is the empty
- string, return failure.
+If url is special and buffer is the empty string,
+ syntax violation, return failure.
+
-Let host be the result of
- host parsing
- buffer.
+Let host be the result of URL-host parsing
+ buffer with url is special.
-If host is failure, return failure.
+If host is failure, then return failure.
Set url's host to host, buffer to the empty string,
@@ -2097,7 +2151,7 @@ then runs these steps:
in url's path to output.
Otherwise, append "/", followed by the strings in url's
- path (including empty strings), separated from each other by
+ path (including empty strings), if any, separated from each other by
"/", to output.
If url's query is non-null, append
@@ -2680,11 +2734,11 @@ the setter to always "reset" both.
If context object's url's cannot-be-a-base-URL flag is
- set, return the first string in context object's url's path.
+ set, then return context object's url's path[0].
Return "/", followed by the strings in context object's
- url's path (including empty strings), separated from each other by
- "/".
+ url's path (including empty strings), if any, separated from each
+ other by "/".
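The "if any" wording covers an empty path: the getter then returns just "/". A minimal Python sketch, modeling the path as a list of strings and the cannot-be-a-base-URL flag as a boolean; `pathname` here is a hypothetical function, not the IDL attribute itself. The URL serializer's path step in the earlier hunk follows the same shape.

```python
from typing import List


def pathname(path: List[str], cannot_be_a_base_url: bool) -> str:
    """Sketch of the pathname getter steps (hypothetical helper)."""
    if cannot_be_a_base_url:
        # Returns path[0] directly, as the getter's first step does.
        return path[0]
    # "/" followed by the segments (including empty ones), if any, joined by "/".
    # An empty path therefore yields "/".
    return "/" + "/".join(path)


print(pathname([], False))                   # /
print(pathname(["a", "b"], False))           # /a/b
print(pathname(["a", ""], False))            # /a/
print(pathname(["user@example.com"], True))  # user@example.com
```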
The pathname attribute's setter must