diff --git a/docs/url/url.md b/docs/url/url.md
index c9f0cbbbaa..68867ddd31 100644
--- a/docs/url/url.md
+++ b/docs/url/url.md
@@ -30,6 +30,15 @@ This document defines semantic conventions that describe URL and its components.
 | `url.path` | string | The [URI path](https://www.rfc-editor.org/rfc/rfc3986#section-3.3) component [2] | `/search` | Recommended |
 | `url.query` | string | The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [3] | `q=OpenTelemetry` | Recommended |
 | `url.fragment` | string | The [URI fragment](https://www.rfc-editor.org/rfc/rfc3986#section-3.5) component | `SemConv` | Recommended |
+| `url.registered_domain` | string | The highest registered URL domain, stripped of the subdomain. For example, the registered domain for `foo.example.com` is `example.com`. This value can be determined precisely with a list like the [public suffix list](https://publicsuffix.org/). Trying to approximate this by simply taking the last two labels will not work well for effective TLDs such as `co.uk`. | `example.com` | Opt-In |
+| `url.subdomain` | string | The subdomain portion of a fully qualified domain name includes all of the labels under the registered domain except the host name. In a partially qualified domain, or if the qualification level of the full name cannot be determined, the subdomain contains all of the labels below the registered domain. For example, the subdomain portion of `www.east.mydomain.co.uk` is `east`. If the domain has multiple levels of subdomain, such as `sub2.sub1.example.com`, the value should be `sub2.sub1`, with no trailing period. | `east` | Opt-In |
+| `url.top_level_domain` | string | The effective top-level domain (eTLD), also known as the domain suffix, is the last part of the domain name. For example, the top-level domain for `example.com` is `com`. This value can be determined precisely with a list like the [public suffix list](https://publicsuffix.org/). Trying to approximate this by simply taking the last label will not work well for effective TLDs such as `co.uk`. | `co.uk` | Opt-In |
+| `url.username` | string | The username from the [URI userinfo](https://www.rfc-editor.org/rfc/rfc3986#section-3.2.1) subcomponent | `user42` | Opt-In |
+| `url.password` | string | The password from the [URI userinfo](https://www.rfc-editor.org/rfc/rfc3986#section-3.2.1) subcomponent | `changeme` | Opt-In |
+| `url.extension` | string | The file extension from the last segment of the URL path, excluding the leading period: the value must be `png`, not `.png`. It is only set if the URL has a file extension, since not every URL does. When the file name has multiple extensions (`example.tar.gz`), only the last one is captured (`gz`, not `tar.gz`). | `png` | Opt-In |
+| `url.domain` | string | The domain of the URL, such as `www.opentelemetry.io`. When a URL refers to an IP address (and possibly a port) directly, without a domain name, the IP address goes in this field. If the URL contains a literal IPv6 address enclosed in `[` and `]` (IETF RFC 2732), the `[` and `]` characters should also be captured in this field. | `www.opentelemetry.io` | Opt-In |
+| `url.port` | int | Port extracted from the URL | `9090` | Opt-In |
+| `url.original` | string | The unmodified original URL as seen in the event source. In network monitoring, the observed URL may be a full URL, whereas in access logs the URL is often represented only as a path. This field represents the URL as it was observed, complete or not. | `https://www.opentelemetry.io/search/?q=container` | Opt-In |
 
 **[1]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it should be included nevertheless.
 `url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case username and password should be redacted and attribute's value should be `https://REDACTED:REDACTED@www.example.com/`.
diff --git a/model/url.yaml b/model/url.yaml
index 6e839fc394..18fc3ced9f 100644
--- a/model/url.yaml
+++ b/model/url.yaml
@@ -37,3 +37,82 @@ groups:
         type: string
         brief: 'The [URI fragment](https://www.rfc-editor.org/rfc/rfc3986#section-3.5) component'
         examples: ["SemConv"]
+      - id: registered_domain
+        requirement_level: opt_in
+        type: string
+        brief: >
+          The highest registered URL domain, stripped of the subdomain.
+          For example, the registered domain for `foo.example.com` is `example.com`.
+          This value can be determined precisely with a list like the
+          [public suffix list](https://publicsuffix.org/). Trying to approximate
+          this by simply taking the last two labels will not work well for
+          effective TLDs such as `co.uk`.
+        examples: ["example.com"]
+      - id: subdomain
+        requirement_level: opt_in
+        type: string
+        brief: >
+          The subdomain portion of a fully qualified domain name includes all of
+          the labels under the registered domain except the host name. In a
+          partially qualified domain, or if the qualification level of the full
+          name cannot be determined, the subdomain contains all of the labels
+          below the registered domain. For example, the subdomain portion of
+          `www.east.mydomain.co.uk` is `east`. If the domain has multiple levels
+          of subdomain, such as `sub2.sub1.example.com`, the value should be
+          `sub2.sub1`, with no trailing period.
+        examples: ["east"]
+      - id: top_level_domain
+        requirement_level: opt_in
+        type: string
+        brief: >
+          The effective top-level domain (eTLD), also known as the domain suffix,
+          is the last part of the domain name. For example, the top-level domain
+          for `example.com` is `com`. This value can be determined precisely with
+          a list like the [public suffix list](https://publicsuffix.org/). Trying
+          to approximate this by simply taking the last label will not work well
+          for effective TLDs such as `co.uk`.
+        examples: ["co.uk"]
+      - id: username
+        requirement_level: opt_in
+        type: string
+        brief: 'The username from the [URI userinfo](https://www.rfc-editor.org/rfc/rfc3986#section-3.2.1) subcomponent'
+        examples: ["user42"]
+      - id: password
+        requirement_level: opt_in
+        type: string
+        brief: 'The password from the [URI userinfo](https://www.rfc-editor.org/rfc/rfc3986#section-3.2.1) subcomponent'
+        examples: ["changeme"]
+      - id: extension
+        requirement_level: opt_in
+        type: string
+        brief: >
+          The file extension from the last segment of the URL path, excluding the
+          leading period: the value must be `png`, not `.png`. It is only set if
+          the URL has a file extension, since not every URL does. When the file
+          name has multiple extensions (`example.tar.gz`), only the last one is
+          captured (`gz`, not `tar.gz`).
+        examples: ["png"]
+      - id: domain
+        requirement_level: opt_in
+        type: string
+        brief: >
+          The domain of the URL, such as `www.opentelemetry.io`. When a URL refers
+          to an IP address (and possibly a port) directly, without a domain name,
+          the IP address goes in this field. If the URL contains a literal IPv6
+          address enclosed in `[` and `]` (IETF RFC 2732), the `[` and `]`
+          characters should also be captured in this field.
+        examples: ["www.opentelemetry.io"]
+      - id: port
+        requirement_level: opt_in
+        type: int
+        brief: 'Port extracted from the URL'
+        examples: [9090]
+      - id: original
+        requirement_level: opt_in
+        type: string
+        brief: >
+          The unmodified original URL as seen in the event source. In network
+          monitoring, the observed URL may be a full URL, whereas in access logs
+          the URL is often represented only as a path. This field represents the
+          URL as it was observed, complete or not.
+        examples: ["https://www.opentelemetry.io/search/?q=container"]
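To make the domain-splitting rules above concrete, here is a minimal Python sketch of how an instrumentation might derive the new opt-in attributes from a URL. It uses only the standard library, including `urllib.parse.urlsplit`. It rests on several assumptions. `SAMPLE_SUFFIXES` is a hypothetical four-entry stand-in for the public suffix list, and the sketch ignores that list's wildcard and exception rules. `decompose`, `split_domain`, `UrlParts`, and the `fqdn_has_host_label` flag are illustrative names only and are not part of any OpenTelemetry API. The flag selects between the two `url.subdomain` rules: for a fully qualified name it drops the leftmost host-name label, and when the qualification level is unknown it keeps every label below the registered domain.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from posixpath import basename
from typing import Optional, Tuple
from urllib.parse import urlsplit

# HYPOTHETICAL stand-in for the public suffix list (https://publicsuffix.org/).
# A real implementation should load the full list, including its wildcard
# and exception rules, which this sketch ignores.
SAMPLE_SUFFIXES = {"com", "io", "uk", "co.uk"}


@dataclass
class UrlParts:
    domain: Optional[str] = None
    registered_domain: Optional[str] = None
    subdomain: Optional[str] = None
    top_level_domain: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None
    port: Optional[int] = None
    extension: Optional[str] = None


def is_ip(host: str) -> bool:
    try:
        ip_address(host)
        return True
    except ValueError:
        return False


def split_domain(
    host: str, fqdn_has_host_label: bool = False
) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (registered_domain, subdomain, top_level_domain)."""
    labels = host.rstrip(".").split(".")
    # Scan from the full name downwards so the longest matching suffix wins
    # ("co.uk" is found before "uk").
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in SAMPLE_SUFFIXES:
            break
    else:
        return None, None, None  # unknown suffix
    if i == 0:
        return None, None, suffix  # the host is itself a public suffix
    registered = ".".join(labels[i - 1:])
    below = labels[: i - 1]
    if fqdn_has_host_label and below:
        below = below[1:]  # drop the leftmost host name label, e.g. "www"
    return registered, ".".join(below) or None, suffix


def file_extension(path: str) -> Optional[str]:
    # Only the last path segment counts, and only its last extension.
    stem, dot, ext = basename(path).rpartition(".")
    # Treat dotfiles such as ".bashrc" as having no extension (a choice).
    return ext if dot and stem and ext else None


def decompose(url: str, fqdn_has_host_label: bool = False) -> UrlParts:
    parts = urlsplit(url)
    result = UrlParts(
        username=parts.username,
        password=parts.password,
        port=parts.port,  # int, or None when the URL has no explicit port
        extension=file_extension(parts.path),
    )
    host = parts.hostname  # lowercased; IPv6 brackets are stripped
    if not host:
        return result
    if is_ip(host):
        # No domain name: the IP goes in `domain`, re-bracketing IPv6 literals.
        result.domain = f"[{host}]" if ":" in host else host
        return result
    result.domain = host
    (result.registered_domain, result.subdomain,
     result.top_level_domain) = split_domain(host, fqdn_has_host_label)
    return result
```

Consider `https://user42:changeme@www.east.mydomain.co.uk:9090/downloads/example.tar.gz` with `fqdn_has_host_label=True`. The sketch yields registered domain `mydomain.co.uk`, subdomain `east`, top-level domain `co.uk`, port `9090`, and extension `gz`. `urlsplit` lowercases the host name and strips the brackets from IPv6 literals, so the sketch re-adds them to follow the `url.domain` rule.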