diff --git a/url.bs b/url.bs
index 88845f18..59d9b17d 100644
--- a/url.bs
+++ b/url.bs
@@ -231,9 +231,9 @@ point URLs from A can come from untrusted sources.
https://mothereff.in/punycode -->

A host is a domain, an -IPv4 address, or an IPv6 address. Typically a -host serves as a network address, but it is sometimes (ab)used as opaque -identifier in URLs where a network address is not necessary. +IPv4 address, an IPv6 address, or an opaque host. Typically a host +serves as a network address, but it is sometimes used as an opaque identifier in URLs +where a network address is not necessary.

The RFCs referenced in the paragraphs below are for informative purposes only. They have no influence on host syntax, parsing, and serialization. Unless stated @@ -257,6 +257,31 @@ eight 16-bit pieces.

Support for <zone_id> is intentionally omitted. +

An opaque host is an ASCII string holding data that can be used for +further processing. + +

An opaque host is only used by non-special +URLs. + +


+ +

A forbidden host code point is +U+0000, +U+0009, +U+000A, +U+000D, +U+0020, +"#", +"%", +"/", +":", +"?", +"@", +"[", +"\", +or +"]". +
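The forbidden host code points listed above are what the host parser checks asciiDomain against. A minimal Python sketch of that membership test; `FORBIDDEN_HOST_CODE_POINTS` and `contains_forbidden_host_code_point` are hypothetical names used only for illustration:

```python
# The forbidden host code points: U+0000, U+0009, U+000A, U+000D, U+0020,
# "#", "%", "/", ":", "?", "@", "[", "\", and "]".
FORBIDDEN_HOST_CODE_POINTS = frozenset(
    "\u0000\u0009\u000A\u000D\u0020#%/:?@[\\]"
)


def contains_forbidden_host_code_point(s: str) -> bool:
    """Return True if any code point in s is a forbidden host code point."""
    return any(c in FORBIDDEN_HOST_CODE_POINTS for c in s)
```

In the host parser, a True result for asciiDomain corresponds to "syntax violation, return failure".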

IDNA

@@ -292,8 +317,8 @@ eight 16-bit pieces.

Host syntax

A host string must be a domain string, an -IPv4 address string, or "[", followed by an IPv6 address string, followed -by "]". +IPv4 address string, or: "[", followed by an IPv6 address string, +followed by "]".

A domain is a valid domain if these steps return success: @@ -335,6 +360,11 @@ separated from each other by ".". XXX should we define the format inline instead just like STD 66? --> +

An opaque-host string must be zero or more URL units. + +

This is not part of the definition of host string, as it requires +context to be distinguished. +
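A rough Python sketch of checking that a string is zero or more URL units. The definitions of URL code points and percent-encoded bytes are not part of this diff; they are assumed here from elsewhere in the URL Standard (ASCII alphanumerics, a fixed set of punctuation, and U+00A0 to U+10FFFD excluding surrogates and noncharacters; or "%" followed by two ASCII hex digits). All function names are hypothetical:

```python
import string

# Assumed from the URL Standard's "URL code points" and "percent-encoded
# byte" definitions, which this diff does not show.
_ASCII_ALNUM = frozenset(string.ascii_letters + string.digits)
_URL_PUNCTUATION = frozenset("!$&'()*+,-./:;=?@_~")
_HEX_DIGITS = frozenset(string.hexdigits)


def _is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus U+xFFFE and U+xFFFF in every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_url_code_point(c: str) -> bool:
    if c in _ASCII_ALNUM or c in _URL_PUNCTUATION:
        return True
    cp = ord(c)
    return (
        0xA0 <= cp <= 0x10FFFD
        and not 0xD800 <= cp <= 0xDFFF  # surrogates
        and not _is_noncharacter(cp)
    )


def is_valid_opaque_host_string(s: str) -> bool:
    """Return True if s consists of zero or more URL units."""
    i = 0
    while i < len(s):
        if s[i] == "%":
            # A percent-encoded byte: "%" followed by two ASCII hex digits.
            if i + 2 < len(s) and s[i + 1] in _HEX_DIGITS and s[i + 2] in _HEX_DIGITS:
                i += 3
                continue
            return False
        if not is_url_code_point(s[i]):
            return False
        i += 1
    return True
```

As the note says, this check alone cannot tell an opaque-host string apart from other URL parts: "/", "?", and "@" are URL code points, so the surrounding parser context has to delimit the host first.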

Host parsing

@@ -368,24 +398,8 @@ steps:
  • If asciiDomain is failure, return failure. -

  • -

    If asciiDomain contains - U+0000, - U+0009, - U+000A, - U+000D, - U+0020, - "#", - "%", - "/", - ":", - "?", - "@", - "[", - "\", - or - "]", - syntax violation, return failure. +

  • If asciiDomain contains a forbidden host code point, + syntax violation, return failure.

  • Let ipv4Host be the result of IPv4 parsing asciiDomain. @@ -700,7 +714,7 @@ They serve no purpose other than being a location the algorithm can jump to. IPv6 serializer on host, followed by "]". -

  • Otherwise, host is a domain, return host. +

  • Otherwise, host is a domain or opaque host, return host. The IPv4 serializer takes an @@ -813,15 +827,15 @@ an ASCII string identifying a user. It is initially the empty string. either null or an ASCII string identifying a user's credentials. It is initially null. -
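The host serializer dispatches on the kind of host: IPv4 addresses and IPv6 addresses are serialized, and domains and opaque hosts are returned unchanged. A minimal Python sketch, assuming a hypothetical in-memory representation (an int for an IPv4 address, a tuple of eight 16-bit ints for an IPv6 address, a str for a domain or opaque host). The IPv4 and IPv6 serializer bodies are not shown in this diff; they are written here from the URL Standard's definitions (the IPv6 form compresses the first longest run of two or more 0 pieces) and are meant as an illustration only:

```python
from typing import Optional, Sequence, Tuple, Union

# Hypothetical representation, not mandated by the standard.
Host = Union[int, Tuple[int, ...], str]


def serialize_ipv4(address: int) -> str:
    # Four decimal numbers, most significant byte first, joined by ".".
    return ".".join(str((address >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def _first_longest_zero_run(pieces: Sequence[int]) -> Optional[int]:
    # Index of the first piece of the first longest run of two or more 0 pieces.
    best_start: Optional[int] = None
    best_len = 1
    i = 0
    while i < len(pieces):
        if pieces[i] != 0:
            i += 1
            continue
        start = i
        while i < len(pieces) and pieces[i] == 0:
            i += 1
        if i - start > best_len:
            best_start, best_len = start, i - start
    return best_start


def serialize_ipv6(pieces: Sequence[int]) -> str:
    compress = _first_longest_zero_run(pieces)
    output = ""
    ignore0 = False
    for index, piece in enumerate(pieces):
        if ignore0 and piece == 0:
            continue  # still inside the compressed run
        ignore0 = False
        if index == compress:
            output += "::" if index == 0 else ":"
            ignore0 = True
            continue
        output += format(piece, "x")  # lowercase hex, no leading zeros
        if index != len(pieces) - 1:
            output += ":"
    return output


def serialize_host(host: Host) -> str:
    if isinstance(host, int):
        return serialize_ipv4(host)
    if isinstance(host, tuple):
        return "[" + serialize_ipv6(host) + "]"
    # Otherwise, host is a domain or an opaque host: return it unchanged.
    return host
```

For example, `serialize_host((0x2001, 0xDB8, 0, 0, 0, 0, 0, 1))` gives `"[2001:db8::1]"`.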

    A URL's host is either -null or a host. It is initially null. +

    A URL's host is null or a +host. It is initially null.

    A URL's port is either null or a 16-bit unsigned integer that identifies a networking port. It is initially null. -

    A URL's path is a list of -zero or more ASCII string holding data, usually identifying a location in -hierarchical form. It is initially the empty list. +

    A URL's path is a list of zero or more +ASCII strings holding data, usually identifying a location in hierarchical form. It is +initially the empty list.

    A URL's query is either null or an ASCII string holding data. It is initially null. @@ -939,7 +953,7 @@ input might be a relative-URL string.

    -

    then decrease pointer by the number of code points in buffer plus - one, set buffer to the empty string, and set state to host state. +

    then run these substeps: + +

      +
    1. If @ flag is set and buffer is the empty string, + syntax violation, return failure. + + +

    2. Decrease pointer by the number of code points in buffer plus + one, set buffer to the empty string, and set state to + host state. +

  • Otherwise, append c to buffer. @@ -1556,17 +1613,13 @@ string input, optionally with a base URL base, opti [] flag is unset, run these substeps:

      -
    1. If url is special and buffer is the empty - string, return failure. - - https://@/example.org/ -> https:///example.org/ -> https://example.org/ --> +

    2. Let host be the result of URL-host parsing + buffer with url is special. -

    3. Let host be the result of - host parsing - buffer. - -

    4. If host is failure, return failure. +

    5. If host is failure, then return failure.

    6. Set url's host to host, buffer to the empty string, @@ -1588,14 +1641,15 @@ string input, optionally with a base URL base, opti

      then decrease pointer by one, and run these substeps:

        -
      1. If url is special and buffer is the empty - string, return failure. +

      2. If url is special and buffer is the empty string, + syntax violation, return failure. + -

      3. Let host be the result of - host parsing - buffer. +

      4. Let host be the result of URL-host parsing + buffer with url is special. -

      5. If host is failure, return failure. +

      6. If host is failure, then return failure.

      7. Set url's host to host, buffer to the empty string, @@ -2097,7 +2151,7 @@ then runs these steps: in url's path to output.

      8. Otherwise, append "/", followed by the strings in url's - path (including empty strings), separated from each other by + path (including empty strings), if any, separated from each other by "/", to output.

      9. If url's query is non-null, append @@ -2680,11 +2734,11 @@ the setter to always "reset" both.

        1. If context object's url's cannot-be-a-base-URL flag is - set, return the first string in context object's url's path. + set, then return context object's url's path[0].

        2. Return "/", followed by the strings in context object's - url's path (including empty strings), separated from each other by - "/". + url's path (including empty strings), if any, separated from each + other by "/".
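The pathname getter steps above can be sketched in Python as follows. `pathname_getter` and its parameters are hypothetical stand-ins for the context object's URL's path and cannot-be-a-base-URL flag:

```python
from typing import List


def pathname_getter(path: List[str], cannot_be_a_base_url: bool) -> str:
    if cannot_be_a_base_url:
        # A cannot-be-a-base URL's path holds a single string.
        return path[0]
    # "/" followed by the path strings, if any (empty strings included),
    # separated from each other by "/".
    return "/" + "/".join(path)
```

Because empty strings are kept, `["a", ""]` yields `"/a/"`, and an empty path yields `"/"`. The same joining rule appears in the URL serializer step that appends the path to output.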

        The pathname attribute's setter must