Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement new stable URL semantic conventions #8491

Merged
merged 6 commits into from
Jun 5, 2023

Conversation

mateuszrzeszutek
Copy link
Member

... and use them as part of the HTTP extractors - the HTTP getters now extend the UrlAttributesGetter, and override the methods that HTTP semconv is interested in.

This PR also sets up the OTEL_SEMCONV_STABILITY_OPT_IN env var handling, and adds some tests that cover all 3 possible variants of emitted attribute sets.

@mateuszrzeszutek mateuszrzeszutek requested a review from a team May 15, 2023 12:46
Comment on lines 25 to 28
@Nullable
default String getFullUrl(REQUEST request) {
return null;
}

/**
* Returns the <a href="https://www.rfc-editor.org/rfc/rfc3986#section-3.1">URI scheme</a>
* component identifying the used protocol.
*
* <p>Examples: {@code https}, {@code ftp}, {@code telnet}
*/
@Nullable
default String getUrlScheme(REQUEST request) {
return null;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I slightly prefer consistency with other *Getter classes of not repeating Url in the method names, e.g.

  • getFull()
  • getScheme()
  • etc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my initial idea to.

But, the problem with the HTTP semconv is, it uses other semconv (URL, client, server, network) and slightly overrides how some of the "inherited" attributes are extracted. For example, client.address will be extracted from the Forwarded header if not provided; or server.port will be taken from the Host header, but only if it's not 80/443.

To apply these customizations we either have to pass multiple getters to the extractor (which is what we're doing right now; in the new semconv it would be 5 getters instead of 2 though), or have the HTTP getter extend the "inherited" semconv getters - in which case, something likegetFull() appearing on an HTTP getter would be mildly confusing to the user. Which is why I've chosen to be a bit more verbose here.

* <p>Examples: {@code /search}
*/
@Nullable
default String getUrlPath(REQUEST request) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also have a target? a low cardinality attribute for things like:
`/search/{something}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For HTTP semconv, http.route contains that kind of information.

In general, these classes are meant to reflect the semantic conventions -- if you want to introduce a new attribute, you might want to start there.

@mateuszrzeszutek
Copy link
Member Author

Some notes from the meeting:

  • The server.*, client.* and network.* namespaces will get their respective getters (and extractors); the HTTP getters will extend them;
  • Other instrumentations (e.g. RPC, or the upcoming Elasticsearch instrumentation) could simply use the UrlAttributesExtractor/NetworkAttributesExtractor/etc;
  • We could remove the getFullUrl() method from UrlAttributesGetter and move it entirely to the HttpClientAttributesGetter; this would remove some confusion about attributes that are not really supposed to be emitted by the HTTP client instrumentations but still available in the getter;
  • The UrlAttributesGetter would be left with just the URL components, without the full URL;
  • Every getter method would include the full name of the attribute, e.g. getUrlFull, getHttpRequestMethod etc.

if (path == null && query == null) {
return null;
}
return (path == null ? "" : path) + (query == null || query.isEmpty() ? "" : "?" + query);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would better handle http://xyz/path? (also related open-telemetry/opentelemetry-java#5501)

Suggested change
return (path == null ? "" : path) + (query == null || query.isEmpty() ? "" : "?" + query);
return (path == null ? "" : path) + (query == null || "?" + query);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that some HTTP servers always return a non-null value for query, even if the called URL does not contain the query component.
I'll revert to the old behavior and merge this PR; let's discuss if we want to change this behavior in the old semconv separately (I made the new url.* extractors always add a non-null value for query)

@mateuszrzeszutek mateuszrzeszutek enabled auto-merge (squash) June 5, 2023 08:50
@mateuszrzeszutek mateuszrzeszutek merged commit 8ee63d4 into open-telemetry:main Jun 5, 2023
@mateuszrzeszutek mateuszrzeszutek deleted the url-getter branch June 5, 2023 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants