X-Opaque-ID contains UUID causing ES deduplication to fail #120124
Comments
Pinging @elastic/kibana-core (Team:Core)
We also discussed this on a core-infra sync today. For the origin of the slow request / deprecated feature request, we would prefer if …
Call-site would be useful, but in some cases it would be nice to also know the object that made the call. Recently we have had a number of support cases where something in Kibana has been submitting a search to Elasticsearch that uses enormous resources, for example SIEM detection rules. In this case it would be nice to not just find out that the call site is "detection rules", but to find out the name of the particular rule that was involved. I think this change would be in keeping with the original intent of this issue. For both deprecation log de-duplication and identification of the source of heavy searches, it's useless to have a UUID for each search done by a particular SIEM detection rule. And in both cases it's useful to know not just that the source of the search was a SIEM detection rule, but which one. Maybe simply call-site could be used most of the time, with call-site + object name used in a select few cases where we anticipate needing to identify the object.
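For illustration only, a value combining a call-site with the specific object might look something like the sketch below; the format and helper name are hypothetical, not an agreed convention:

```ts
// Hypothetical format for an X-Opaque-ID value that identifies both the
// call-site and the specific object (e.g. a SIEM detection rule).
// The separator and naming are illustrative only, not an agreed convention.
function buildOpaqueId(callSite: string, objectName?: string): string {
  // e.g. "siem-detection-rules" or
  //      "siem-detection-rules;object:Suspicious login spike"
  return objectName ? `${callSite};object:${objectName}` : callSite;
}
```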
I completely understand the need to deduplicate the deprecation logs and the challenges that unique opaque IDs cause, but TBH I'm a little confused on this -- don't we call the ID "opaque" exactly because Elasticsearch shouldn't be making any assumptions about its contents? What makes this challenging for Kibana is that we are doing more with … This means that …
All this is to say, if we wanted to change the value Kibana passes via …
I think it's worth discussing what we want to do long term, but we need to find a solution for 7.16/7.17. I think we have the following options:
Since this format's been documented for a while, I agree that it's a bug. In #71019 Kibana didn't require …
In the long term, we are going to migrate from the (mis-)used …
We can't just remove … What I'd propose is:
As a side note: we need to document …
This sounds great to me, because this is exactly what ES would expect on that header at the moment.
I guess this relates to #101587, right? (Sorry, I don't know much about prior work in this area.) In that case I think what I was asking for is covered, but an important point is that all the sub-teams whose Kibana apps might submit heavy searches to Elasticsearch actually take advantage of the … Anyway, I've subscribed to that other issue and I don't think it needs covering any more on this issue.
Some technical pointers for the implementation:

The requestId is set on the executionContext in the HTTP server, here: kibana/src/core/server/http/http_server.ts, lines 338 to 343 in 755950a.

We'll probably have to modify … This ID is also accessible from the … kibana/src/core/server/http/router/request.ts, lines 130 to 138 in 338fe1a.

This is then used to set the header of the request against ES in two places:

kibana/src/core/server/elasticsearch/client/cluster_client.ts, lines 97 to 99 in 3c8fa52

This is used to set the …

kibana/src/core/server/elasticsearch/client/configure_client.ts, lines 41 to 45 in 1a6a9ae

which is retrieving the value from the …

kibana/src/core/server/execution_context/execution_context_service.ts, lines 155 to 163 in 4681a80

As the usage from the … From a quick grep, I couldn't really find any usage of … We will probably need to either make this … It also makes me wonder why exactly we decided to set a default value to … I would personally make … Then, we'll also need to modify …

@lukeelmers @mshustov you probably have a better understanding of the whole problem than I do. WDYT?
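To tie these pointers together, here is a minimal sketch of the flow being described; the names, the fallback value, and the wiring are assumptions for illustration, not the actual Kibana code:

```ts
// Illustrative sketch: how a per-request id can end up as the x-opaque-id
// header on outgoing Elasticsearch calls. Not the real Kibana implementation.

interface ExecutionContext {
  requestId: string; // set by the HTTP server when the request comes in
}

// Hypothetical stand-in for the execution context service.
class ExecutionContextService {
  private current?: ExecutionContext;

  set(context: ExecutionContext): void {
    this.current = context;
  }

  // Headers to attach to outgoing Elasticsearch requests. When the value is
  // unique per request, this is the part that defeats de-duplication.
  getAsHeaders(): Record<string, string> {
    return { 'x-opaque-id': this.current?.requestId ?? 'unknownId' };
  }
}

// In the scoped ES client, these headers are merged with the auth headers
// before the request is sent to Elasticsearch.
function buildScopedHeaders(
  executionContext: ExecutionContextService,
  authHeaders: Record<string, string>
): Record<string, string> {
  return { ...authHeaders, ...executionContext.getAsHeaders() };
}
```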
Why is it necessary if we aren't going to generate …
I suppose it's because an end-user can set any …
We can do it since …
Not sure we can drop it completely, but we can use something more compact, like …
if … (kibana/src/core/server/execution_context/execution_context_service.ts, lines 155 to 156 in 4681a80)
which makes the ES client use the header value specified by … If we modify the execution context service to return a static value when it's disabled, then we wouldn't need to update the logic in …
I tend to agree, it feels like they aren't all that useful, but there could be some context I am missing. @jakelandis & team, what do y'all think?
Apologies... I was misunderstanding how …
...back on topic...
If …
I don't think we considered skipping logging the deprecations originating from elastic entirely last time (when the issue was originally created). I personally would be worried about this idea. How else would we track what deprecated features are still used within the stack (Kibana, for instance)? We could obviously push that responsibility onto the clients to store the warning headers with the deprecations. It sounds to me like this would mean clients (like Kibana) would have to effectively implement and maintain their own deprecation log.

Back on the issue and the discussion summary. However, I don't think we considered pushing back on releasing 7.17 and 8.0. Would that even be possible, to get more time and implement option 1? If I understand it right, the only downside of this option is the short deadline for 7.17. I don't think traceparent could affect the cloud proxy. We already support traceparent in Elasticsearch.
@igore-kupczynski Just to confirm: do you think there could be any problem with the cloud proxy if Kibana starts sending traceparent to Elasticsearch?
Unfortunately it's not. There are some significant technical implications for option 1 that we discovered yesterday while discussing with @mshustov and @jportner.

During the investigations of #123197, we discovered a few things that would be problematic if we'd switch to using the … At the moment, Kibana is already sending a … However, this header is sent by APM, meaning that this is done only if APM is enabled on Kibana. When APM is disabled, there's no … Initially the … We can't really just change our …

Note that we also have a divergence between 8.1 and older versions, as #118466, which is doing some work in this area, was not backported to either 8.0 or 7.17 (as it was not covering a bug, but a feature), which would also increase the complexity of doing this work in the 7.17 branch (even if that's not blocking, just pointing it out).
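For context, the tracing header discussed here is presumably the W3C trace context traceparent header; a minimal sketch of what producing one looks like (illustrative only, in Kibana this value normally comes from the Elastic APM Node.js agent):

```ts
import { randomBytes } from 'crypto';

// Minimal sketch of a W3C `traceparent` value: version-traceid-parentid-flags.
// Illustrative only; the APM agent normally generates and propagates this.
function buildTraceparent(): string {
  const version = '00';
  const traceId = randomBytes(16).toString('hex'); // 32 hex chars
  const parentId = randomBytes(8).toString('hex'); // 16 hex chars
  const flags = '01'; // sampled
  return `${version}-${traceId}-${parentId}-${flags}`;
}

// Example shape: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```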
This PR is on hold until we agree on a technical solution, as it currently breaks audit logging.
Before we dismiss dropping Kibana deprecation logs completely as an option: Kibana is already relying only on deprecation headers for development/testing. When instrumenting CI to detect deprecations, we needed a way to trace deprecations back to the calling site, and this was impossible using the logs. Looking for the returned header meant we could find out the exact query and stack trace where the query comes from and log all the context. The only shortcoming is that the deprecation headers don't include a level such as WARNING/CRITICAL.
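A rough sketch of that header-based approach (assuming Node 18+'s global fetch and an unauthenticated local ES; this is not Kibana's actual CI instrumentation):

```ts
// Illustrative: detect Elasticsearch deprecation warnings from the `warning`
// response header instead of the deprecation log, so the calling site's stack
// trace is available at the moment the deprecated API is hit.
async function searchAndCollectDeprecations(esUrl: string, body: unknown) {
  const response = await fetch(`${esUrl}/_search`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(body),
  });

  const warnings = response.headers.get('warning');
  if (warnings) {
    // Log the deprecation together with the query and the current stack trace.
    console.warn(`Deprecated usage detected: ${warnings}`, {
      query: body,
      stack: new Error().stack,
    });
  }

  return response.json();
}
```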
@jportner given that the default configuration for the APM agent on Kibana is … If this is acceptable, it would allow us to always delegate the …
Note that if we go that way, we would kinda be forced to keep this …
What's the risk of not making the APM agent configurable at all? Do users have a good reason to disable it?
Performance. APM has some impact on performance, even if the … So no, I don't see any good reason to disable APM totally and not just enable it in …
Introducing a breaking change in a minor, and an X.last, version. I feel like we may want to avoid that.
Worth noting that …
AFAIK we've never documented …
Users may want to disable the APM agent to reduce runtime overhead and the overhead of sending additional tracing headers. @trentm do you see any other potential problems?
Agreed, I think this makes sense; we just need to make sure the audit logging docs are updated accordingly.
Yes.
So, to summarize, if we want to go ahead with option 1:
ES side: …
Kibana side: …
@pgomulka @jakelandis if we go that way, we're going to need the changes on ES to be available for testing, to make sure that audit log correlation is still working after our changes, before merging/backporting #123197.

@mshustov @jportner @rudolf before we confirm this plan as being an option, does it seem correct to you? Maybe I missed something?
@pgayvallet Kibana's side looks correct. However, this …
Given the minimal impact of the agent enabled in …

As a sidenote, I see that @pgomulka started working on elastic/elasticsearch#82855, so if we're going with option 2 for 7.17, we would have more time to properly address option 1 in 8.1, which is why I would really like to have @pgomulka and @jakelandis's opinion on the direction we want to take.
Seems correct to me, though I'm unclear if audit logs would be missing the …
I think that it's a necessity. Otherwise someone could have audit logging enabled (with correlation) in 7.16, then upgrade to 7.17 and lose correlation without any warning. We can't realistically expect our users to read through all the docs / release notes to catch something like this, and we don't have the upgrade assistant to surface this info in 7.16. Another question: are we certain that the APM agent supports …
I get that. To be honest, my opinion is that, given that APM instrumentation isn't even documented anywhere, such risks are negligible, but I can't deny that it's technically possible. Now, forcing the agent to be enabled could also be considered a breaking change, because in the same scenario, a user with APM disabled in 7.16 would upgrade to 7.17 to discover that the agent is now enabled. So in that case, I'm unsure we can safely do that in 7.17.
Not atm. This is why we need to backport #112973.
Now that I understand that users will never see these deprecation warnings, my initial reservations are dampened. The issue really comes down to writing more bytes to an index than are expected and/or necessary. Those bytes will likely never be seen by users and are of minimal size (small messages with much repetition). So the problem we are trying to solve is how to avoid trivial levels of overhead writing these messages and minimal bloat in clusters that remain on 7.last for long periods of time. I think we should solve the problem, but I'm not sure if it is really a blocker to the release. My initial reservations with the proposed fix:
"it allows Kibana to call into deprecated APIs with fewer consequences" - this ship has already sailed and this de-duplication issue should focus on just the de-duplication aspect (sorry if forked the conversation and didn't pick up what was being said) "x-opaque-id is used in an unexpected way" - There is an argument to be made that ES should not have ntroduced caching semantics to what is intended to be an opaque-id. In this case (and others) we tend to use the x-opaque-id as more of a client-id but it really isn't. "it requires ES to behave specially for Kibana" - reservation holds
For me this option (or the like) is back on the table. If we already hide all stack component deprecations from the users, then making it harder to trace the deprecations from stack components is a moot point. I would vote for this option: if x-elastic-product-origin is present, use that for caching, else use x-opaque-id. However, @rjernst and @pgomulka are the code owners and have the final say for this part of ES.
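A rough sketch of the proposed key selection (TypeScript pseudocode for illustration only; the real change would live in Elasticsearch's deprecation logging and this does not claim to mirror it):

```ts
// Illustrative only: pick the de-duplication key for a deprecation log entry.
// Prefer the stable x-elastic-product-origin (e.g. "kibana") when present,
// so stack components whose x-opaque-id is a per-request UUID still
// de-duplicate; otherwise fall back to the caller-provided x-opaque-id.
interface DeprecationHeaders {
  'x-elastic-product-origin'?: string;
  'x-opaque-id'?: string;
}

function deduplicationKey(featureKey: string, headers: DeprecationHeaders): string {
  const callerId =
    headers['x-elastic-product-origin'] ?? headers['x-opaque-id'] ?? '';
  return `${featureKey}:${callerId}`;
}
```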
I like that. This is a variation of option 1, where x-opaque-id would not be used for Kibana's requests. Given that all requests with elastic-product-origin are hidden in the upgrade assistant, I don't mind using …

At the same time I will merge #82849 (emit trace.id into audit logs) and update the misleading information in the audit log doc about no semantic meaning of x-opaque-id (#82868).
CC @elastic/platform-deployment-management
I have merged "[7.17] Do not use x-opaque-id for deduplicating elastic originating requests backport (#82855)" (#83023) into 7.17. Should we consider this issue done? Or maybe we should keep it open until Kibana stops sending a UUID in x-opaque-id? I also merged "Emit trace.id into audit logs" (#82849).
We do still need to address this properly on our side in 8.1 or 8.2. But there is a lot of noise on this issue; I will open a new one summarizing all this, then close this one. In the meantime, @pgomulka, do you know if an issue was opened to track adding …
There was no issue created. It was added a while back, in ES 7.15. We only missed adding this to the audit logs.
Yea, sorry, specifically for the audit logs was what I meant.
Also, I did not create an issue. I considered this more like a follow-up: elastic/elasticsearch#82849.
Oh, that's already done and merged, perfect then.
Closing in favor of #123737
ES is originally using X-Opaque-ID to trace the origin of a request, to help identify slow search requests or the use of deprecated functionality. There was an expectation that this value would contain something in the form of a user-id. No specification was given on what this id should look like. We documented this in both the slow log and deprecation use cases.

The value of the X-Opaque-ID is used when de-duplicating deprecated log messages (together with a key of the deprecated feature). That means if the X-Opaque-ID is always unique, then there will be no de-duplication of those messages. This will cause the deprecation log to grow, and since we enabled deprecation log indexing by default, it might use significant resources of the cluster.

The problem:
Kibana is using a UUID in the form of 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' for the X-Opaque-ID, to help diagnose the conceptual origin of the HTTP request. From what I understand, it can also pass the value provided by a user on a request to Kibana. But because the value is mostly unique, it generates a lot of duplicated deprecation logs.

I wonder if we could use trace-id for the UUID and use something like a UserID for the X-Opaque-Id.
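To make the failure mode concrete, a small hypothetical example (all values invented) of how per-request UUIDs defeat the de-duplication described above:

```ts
// Hypothetical illustration: de-duplication keys on (x-opaque-id, feature key).
// Two Kibana requests hitting the same deprecated API:
const requestA = {
  opaqueId: '7f9c2ba4-e88f-4e36-9d0f-1c0b2f5a1a01',
  featureKey: 'deprecated_sort_param',
};
const requestB = {
  opaqueId: '1d4a9c3e-2b6f-4f0a-8c7d-9e5b3a2c1d02',
  featureKey: 'deprecated_sort_param',
};

// With a stable id (e.g. "kibana") both requests would collapse into a single
// deprecation log entry; with per-request UUIDs the keys differ, so every
// request writes a new entry and the deprecation log keeps growing.
```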