BREAKING: Limit HTTP method values to closed set (for cardinality reason) #3478
One thing to note is that this will move us away from both the old conventions and ECS: https://www.elastic.co/guide/en/ecs/current/ecs-http.html#field-http-request-method What are your thoughts on instead simply adding a warning that the HTTP method cannot be relied upon to have low cardinality? If we go with this new solution, for stabilization, should wording be added like "treatment of non-well-known HTTP methods is experimental"? |
great catch - I did not notice ECS allows any case to be used. I think doing this makes metrics and manual query experience worse. I.e. as a consumer, if I want to group by method name or show metrics for specific method on server side, I depend on my clients calling me with the right method. If I don't reject the wrong methods right away, I'd have to write my queries with this in mind. So, if we document that @AlexanderWert wdyt? |
@lmolkova heads up - most likely this PR will be closed, and we'll ask you to resubmit the PR in a new repo, please refer to #3474 (comment). |
I tried to find the history around allowing all cases for One other difference, here |
How would you normalize this? The HTTP method is case sensitive.
$ curl -X Get https://example.org
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>501 - Not Implemented</title>
</head>
<body>
<h1>501 - Not Implemented</h1>
</body>
</html> |
ECS did used to require that Since we already went through the cycle of limiting |
This reminds me that we have the same issue with the HTTP status code: open-telemetry/semantic-conventions#1056 |
Apparently, some web frameworks don't think HTTP methods are case-sensitive. E.g. Flask
or ASP.NET Core
while others (I tried Spring and express) consider them case-sensitive and return errors. I can easily see how ASP.NET Core or Flask instrumentations, knowing that they consider methods to be case-insensitive, normalize I added a note on case sensitivity which should not block such normalization, but that implies that in common case @epixa, @ruflin do you think this would be a reasonable solution? Why I believe common
Essentially, we should not make the default experience worse to address an edge case. |
@reyang thanks for the heads up and no problem! We can continue the discussion here even if the PR is closed, and then I'll resubmit the result of it to the new repo. |
@lmolkova I agree mostly with your analysis above and especially "we should not make the default experience worse to address an edge case". But I'm not sure if we should introduce an additional field for it as part of the schema. Instead I think we should "strongly recommend" using everything upper case but allow the edge cases. As this is going to be used by different storage systems, we don't know if there is / can be automation in place to make it all upper case and having If we need |
But isn't the edge case here an HTTP client making a request in non-uppercase? This whole issue is about edge cases basically. |
the only place where uppercasing makes sense is the instrumentation which knows it's valid to uppercase. E.g. So we need to capture two pieces of information:
If we capture the method with its original casing, a malicious client can explode the number of HTTP metrics and break grouping by span names, which are also expected to be of low cardinality. So, it seems we can't get away with just one attribute. |
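To make the cardinality concern concrete, here is a minimal sketch in Python. It is an illustration only: the closed set below (the RFC 9110 methods plus PATCH) and the helper name `metric_method` are assumptions, not text taken from the PR.

```python
from collections import Counter

# Assumed closed set: the RFC 9110 methods plus PATCH (RFC 5789).
KNOWN_METHODS = frozenset(
    {"CONNECT", "DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT", "TRACE"}
)


def metric_method(raw: str) -> str:
    """Hypothetical helper: map any method outside the closed set to OTHER."""
    return raw if raw in KNOWN_METHODS else "OTHER"


# A made-up burst of traffic: two legitimate requests, then 1000 requests
# that each use a different invented method name.
requests = ["GET", "POST"] + [f"X{i}" for i in range(1000)]

# Each distinct attribute value becomes its own metric time series.
raw_series = Counter(requests)
bounded_series = Counter(metric_method(m) for m in requests)

print(len(raw_series))      # grows with every new method the client invents
print(len(bounded_series))  # never exceeds len(KNOWN_METHODS) + 1
```

With the raw value, the number of series is controlled by the client; with the bounded value, it is capped no matter what the client sends.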
I believe |
I think we approach this from 2 different angles here. My focus is on the logs use case where |
Instead of adding original_method with the unaltered method, I'd suggest adding method_kind with the normalized method and keeping method aligned with ECS and old OTel. Then we only need an adjustment to the span name wording to be generated like method_kind, and probably requirement levels for metrics. This would also align with the proposed http.status_code_class from open-telemetry/semantic-conventions#1056 |
@Oberon00 @ruflin this spec applies to the side that collects telemetry. It explicitly allows custom values, so backends should not reject anything based on the value. Assuming someone calculates metrics from logs, they can do sanitization there and still map Having said this, we don't have semantic conventions for HTTP logs and if we had, |
I get your argument, but I'm leaning towards having the "default" name be the actual method, and the normalized method in a new argument. If method_kind is not a good name, it could be normalized_method, restricted_method, method_bucket, method_category, indexable_method, etc. Main argument for this way being that IMHO it's the least surprising way. |
@Oberon00 I hear your point. My concern here is:
I.e. we reserve a 'good' attribute to represent an invalid method and use a more complicated attribute for most-common case. |
Pardon if I'm misrepresenting anyone here, but I think the conflict is perhaps originating in the different use cases we're advocating for, which might just be an emerging challenge we'll continue to face as we progress with the effort of aligning/merging/adopting ECS. Coming from the SIEM/security side rather than metrics, I don't see the original http method as an edge case... The primary consideration in my mind is accurately reflecting how data is logged and enriching it with additional information where possible that can make it more valuable. Performing lossy operations like normalizing request methods is certainly OK, but it is additional information (i.e. the context of which "canonical" method the request represents), so it should be treated as an additional field. We don't have to look very far for real world evidence of this similar issue causing problems for security use cases on logging data - it caused problems for enough people using ECS that we performed a breaking change to revert the normalization recommendation, and breaking changes are something we've tried hard to avoid. |
I'd make |
I want to double down on what @epixa mentioned above. The goal of ECS was, and now I assume of the OTel spec is, to have a common schema for ALL signals. Are we all aligned on this goal? If not, we should take it to a separate thread to dig deeper into it, as otherwise this will keep popping up. Getting all 3 signals together in a single schema will mean tradeoffs, as there are conflicting priorities, but I'm strongly convinced that finding the right tradeoffs will benefit the users who want to bring all the signals together and only have to remember one schema for querying data. On my end, I'm getting back to the proposal of having a strong recommendation for the values of the fields but being lenient in accepting other values. A tool implementing this could always be stricter if needed, for example to prevent a malicious client. @lmolkova I would like to learn more about how this schema directly affects the receiving of events / preventing malicious values. Can you point me to where this is implemented? |
Do you see offering two separate attributes for actual vs normalized method as messing with that goal? It would still be one set of attributes, and any of them could be used for any signal. Yes, it adds some "different ways of doing mostly the same thing", which is also not ideal, but as I see it we have these options:
What I don't understand is why the actual method would only be recommended to be retained on logs. IMHO, metrics are the outlier here, where high cardinality is known to cause problems. For traces, I would especially recommend retaining the original method, since that's what you would look at, in addition to logs, to debug outlier requests. |
Excellent summary @Oberon00
No, and to be fair, in ECS there are other places like user_agent.original where we did exactly the same. My general concern is, adding fields is easy, removing / changing fields will be hard and the more fields there are, the harder it gets for devs implementing the fields and consumers of the fields to remember these. I see a potential we will land with the On the options proposed by @Oberon00 above, I would rule out 2 which leaves out 1 and 3. My preference is on 1 but I'm biased here from the perspective of having Elasticsearch as the storage engine as it can handle this scenario well. |
Given the discussion we already have, I believe this PR is currently inadequate as it lacks even the possibility of reporting the actual method used.
Neither does it require configurability (IMHO should be a MUST requirement, but that part would not be blocking if there was an alternative attribute)
Sorry, I overlooked the original_method in the diff. In that case, I won't block the PR
@Oberon00 a bit, it would mean creating an additional attribute set in your code that includes the attribute. Not a big deal mostly since the other attributes in the set would likely be shared internally, but it would technically mean you need to add an additional attribute when creating the set for the logs. You are right it is mainly metrics that would be best not to include it, I just mainly see its usefulness to analysis to be in logs. |
Please move this PR to https://github.com/open-telemetry/semantic-conventions
@epixa @ruflin @Oberon00 @AlexanderWert And thanks to @epixa for joining HTTP SIG meeting and clarifying concerns from the ECS side! |
That's a great discussion to have! Yes, I believe the goal is to have a consistent story across all signals. However, it might mean different things. E.g. metrics can only have low-cardinality attributes, and traces need to have certain attributes to sufficiently describe the underlying stack (DB, HTTP, etc). We definitely need your expertise to understand what log semantic conventions might look like for HTTP (and/or security) and other technologies, and, in general, to understand what exactly this common schema across signals would mean. We have a couple of OTel spec meetings where we can discuss things like that; please check out https://github.com/open-telemetry/community#specification-sigs:
Would be great to see you, @epixa, @AlexanderWert and anyone else from Elastic there. |
OpenTelemetry does not advise or regulate how to consume the data. Currently, we only document the collection side and here's what we enforce on the semantic conventions. This applies to semantic conventions and indirectly to the instrumentation libraries. Outside of OTel-owned repos there is no mechanism to enforce the compliance. Users are free to collect data in any way or alter auto-collected telemetry with in-process span or log processors, metric views or with collector processors. |
Fixes #3470
Changes
http.method values to well-known method names
OTHER for unknown methods to prevent cardinality explosion if a malicious (or buggy) client sends custom and dynamic methods.
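The change described above could be sketched roughly as follows in Python. This is not the PR's actual wording or any instrumentation's real code: the attribute key `http.original_method`, the function name, the `case_insensitive` flag, and the exact closed set (RFC 9110 methods plus PATCH) are all assumptions made for illustration. The thread only establishes that the diff contains an `original_method` attribute and that frameworks which treat methods case-insensitively may normalize casing.

```python
# Assumed closed set: the RFC 9110 methods plus PATCH (RFC 5789).
KNOWN_METHODS = frozenset(
    {"CONNECT", "DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT", "TRACE"}
)


def http_method_attributes(raw_method: str, case_insensitive: bool = False) -> dict:
    """Hypothetical helper returning the method attributes for one request.

    case_insensitive should only be True when the instrumented framework is
    known to treat method names case-insensitively.
    """
    candidate = raw_method.upper() if case_insensitive else raw_method
    method = candidate if candidate in KNOWN_METHODS else "OTHER"

    attrs = {"http.method": method}
    # Assumption: keep the unaltered value whenever the reported value differs,
    # so logs and traces can still show what the client actually sent.
    if method != raw_method:
        attrs["http.original_method"] = raw_method
    return attrs


print(http_method_attributes("GET"))
print(http_method_attributes("Get"))
print(http_method_attributes("Get", case_insensitive=True))
print(http_method_attributes("PURGE"))
```

In this sketch the low-cardinality value is safe for metrics and span names, while the original value stays available as a separate attribute that metrics can leave out.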