From f7eee0d47e9d897133f293da17f244cf88aa363b Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Wed, 28 Dec 2016 18:20:22 +0100
Subject: [PATCH] Add opaque hosts

For URLs without a special scheme we cannot use the host parser directly
due to compatibility issues. Instead we percent-encode the input.

Tests: https://github.com/w3c/web-platform-tests/pull/4406.

Fixes #148.
---
 url.bs | 140 +++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 102 insertions(+), 38 deletions(-)

diff --git a/url.bs b/url.bs
index 88845f18..ce2a3bba 100644
--- a/url.bs
+++ b/url.bs
@@ -231,9 +231,9 @@ point URLs from A can come from untrusted sources.
 https://mothereff.in/punycode -->

 A host is a domain, an
-IPv4 address, or an IPv6 address. Typically a
-host serves as a network address, but it is sometimes (ab)used as opaque
-identifier in URLs where a network address is not necessary.
+IPv4 address, an IPv6 address, or an opaque host. Typically a host
+serves as a network address, but it is sometimes used as opaque identifier in URLs
+where a network address is not necessary.

 The RFCs referenced in the paragraphs below are for informative purposes only. They have no influence on host syntax, parsing, and serialization. Unless stated
@@ -257,6 +257,13 @@ eight 16-bit pieces.

 Support for <zone_id> is intentionally omitted.

+An opaque host is an ASCII string holding data that can be used for
+further processing.
+
+An opaque host is only used by non-special
+URLs.
+
+

IDNA

@@ -292,8 +299,8 @@ eight 16-bit pieces.

Host syntax

 A host string must be a domain string, an
-IPv4 address string, or "[", followed by an IPv6 address string, followed
-by "]".
+IPv4 address string, or: "[", followed by an IPv6 address string,
+followed by "]".

 A domain is a valid domain if these steps return success:
@@ -335,6 +342,11 @@ separated from each other by ".".
 XXX should we define the format inline instead just like STD 66? -->

+An opaque-host string must be zero or more URL units.
+
+This is not part of the definition of host string as it requires
+context to be distinguished.
+

Host parsing
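
The commit message says that, for URLs without a special scheme, the input is percent-encoded instead of being run through the host parser. A minimal Python sketch of that idea follows. It is not the standard's algorithm text: the name `percent_encode_opaque_host`, the contents of `FORBIDDEN`, and the choice to encode C0 controls plus everything above U+007E are assumptions made for illustration only.

```python
# Illustrative sketch only; not taken from url.bs.
# FORBIDDEN is an assumed stand-in for the code points a host may not contain.
# "%" is deliberately left out so existing percent-escapes pass through.
FORBIDDEN = set("\x00\t\n\r #/:?@[\\]")


def percent_encode_opaque_host(text, forbidden=FORBIDDEN):
    """Return an ASCII opaque host, or None to signal failure."""
    if any(ch in forbidden for ch in text):
        return None  # stands in for "syntax violation, return failure"
    out = []
    for ch in text:
        if "\x1f" < ch <= "\x7e":
            # Printable ASCII is kept verbatim.
            out.append(ch)
        else:
            # C0 controls and anything above U+007E become %XX per UTF-8 byte.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)
```

Under these assumptions the result is always an ASCII string, which matches the definition of an opaque host given earlier in the patch. For example, `percent_encode_opaque_host("ex%41mple")` leaves the existing escape untouched, while input containing a space is rejected.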

@@ -700,7 +712,7 @@ They serve no purpose other than being a location the algorithm can jump to.
 IPv6 serializer on host, followed by "]".

-• Otherwise, host is a domain, return host.
+• Otherwise, host is a domain or opaque host, return host.

 The IPv4 serializer takes an
@@ -813,15 +825,15 @@ an ASCII string identifying a user. It is initially the empty string.
 either null or an ASCII string identifying a user's credentials. It is initially null.

-A URL's host is either
-null or a host. It is initially null.
+A URL's host is null or a
+host. It is initially null.

 A URL's port is either null or a 16-bit unsigned integer that identifies a networking port. It is initially null.

-A URL's path is a list of
-zero or more ASCII string holding data, usually identifying a location in
-hierarchical form. It is initially the empty list.
+A URL's path is a list of zero or more
+ASCII strings holding data, usually identifying a location in hierarchical form. It is
+initially the empty list.

 A URL's query is either null or an ASCII string holding data. It is initially null.
@@ -939,7 +951,7 @@ input might be a relative-URL string.

-then decrease pointer by the number of code points in buffer plus
-one, set buffer to the empty string, and set state to host state.
+then run these substeps:
+
+
+1. If @ flag is set and buffer is the empty string,
+syntax violation, return failure.
+
+2. Decrease pointer by the number of code points in buffer plus
+one, set buffer to the empty string, and set state to
+host state.
+

 • Otherwise, append c to buffer.
@@ -1556,17 +1623,13 @@ string input, optionally with a base URL base, opti
 [] flag is unset, run these substeps:

-1. If url is special and buffer is the empty
-string, return failure.
-
+1. If buffer is the empty string, syntax violation, return failure.
+
-2. Let host be the result of
-host parsing
-buffer.
+2. Let host be the result of URL-host parsing
+buffer with url is special.

-3. If host is failure, return failure.
+3. If host is failure, then return failure.

 4. Set url's host to host, buffer to the empty string,
@@ -1588,14 +1651,15 @@ string input, optionally with a base URL base, opti

      then decrease pointer by one, and run these substeps:

-1. If url is special and buffer is the empty
-string, return failure.
+1. If url is special and buffer is the empty string,
+syntax violation, return failure.
+
-2. Let host be the result of
-host parsing
-buffer.
+2. Let host be the result of URL-host parsing
+buffer with url is special.

-3. If host is failure, return failure.
+3. If host is failure, then return failure.

 4. Set url's host to host, buffer to the empty string,
@@ -2097,7 +2161,7 @@ then runs these steps:
 in url's path to output.

 8. Otherwise, append "/", followed by the strings in url's
-path (including empty strings), separated from each other by
+path (including empty strings), if any, separated from each other by
 "/", to output.

 9. If url's query is non-null, append
@@ -2680,11 +2744,11 @@ the setter to always "reset" both.

 1. If context object's url's cannot-be-a-base-URL flag is
-set, return the first string in context object's url's path.
+set, then return context object's url's path[0].

 2. Return "/", followed by the strings in context object's
-url's path (including empty strings), separated from each other by
-"/".
+url's path (including empty strings), if any, separated from each
+other by "/".
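
The two getter steps above, including the added "if any" wording for an empty path, can be sketched in Python as follows. The function name `pathname_getter` and its parameters are hypothetical stand-ins for the context object's url, and the path is modelled as a plain list of strings.

```python
# Illustrative sketch only; names are hypothetical, not from url.bs.
def pathname_getter(path, cannot_be_a_base_url):
    """Mirror the two pathname getter steps on a list-of-strings path."""
    if cannot_be_a_base_url:
        # Step 1: such URLs carry a single opaque path string.
        return path[0]
    # Step 2: "/" followed by the strings, if any, separated by "/".
    return "/" + "/".join(path)
```

With an empty path the join contributes nothing and the result is just "/", which is the case the "if any" wording spells out. Empty strings inside the path are kept, so `["a", "", "b"]` yields consecutive slashes.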

        The pathname attribute's setter must