Optimize expressions over HTTP #29118
Pinging @elastic/kibana-app
@chrisdavies I'd love to see the PoC you're working on. As for feedback, under
@w33ble let's sync. I'd love to show you and kick some ideas around. I agree with the bit about "letting the server run as much as it can and reporting back where it left off with the resulting output". Otherwise, something like
Pinging @elastic/kibana-canvas
related: #29653
The game plan, after chatting with @epixa and @ppisljar, @stacey-gammon and others is:
We may return to a contextual server / client exchange as a future optimization step, if we see the need, but for simplicity's sake, we want to avoid that right now. cc @elastic/kibana-canvas
That seems fine for the Elasticsearch use case for sure, since we control that. The way you worded that sounds like what we were trying to get away from in Kibana (talking directly to ES), but perhaps that's not what you mean.
Given this item, I'm not really concerned about the other point. It sounds like server functions will still exist, we'll just try to use them less ourselves, which is fine; we don't have a lot of server-only functions right now anyway.
This part I'm not sure I understand correctly. Are you talking about the architecture specifically? It sounds like you're still going to send things off to the server to allow execution there already. Can you clarify what you meant here?
I agree that the "use elasticsearch API from the browser" statement needs some clarification... are we abandoning batching up requests to ES for efficiency? Is that going to create a lot of traffic to ES that we only have one thread or so to manage?
Sorry. The plan was to move the server functions to the browser, and have them use elasticsearch-browser to call Elasticsearch through the Kibana API. That plan has been canned. We're instead going to create an API endpoint for each of the current (I think 5) server functions, and we're going to create corresponding browser functions which call those endpoints. That way, the interpreter can run entirely in the browser, and dropping socket.io is a bit simpler once that's done.
That makes sense. Just to be clear, you're not dropping server functions entirely then, just opting to coordinate with an API for our specific use cases, correct? Or are you saying that if a user wants to, for example, write a function that pulls data from some external API, they're going to need to write a Kibana endpoint and route through that?
Sounds like there's a plan for tackling the use of
@chrisdavies why was the plan to move esdocs/escount to browser canned ? (the idea about moving them to browser functions and using elasticsearch-browser to call elastic through the kibana proxy) |
@ppisljar, @tylersmalley and I chatted, and we're trying to deprecate use of elasticsearch-browser. Some of the server functions can be moved to the client, though, such as pointseries. I think we should do that kind of thing as part of a separate PR, though, as it's not really related to removing WebSockets at this point.
We've merged HTTP batching into master. That should get us close enough for a GA, I think. We still have optimizations that we could apply:
These are both a bit more complex than the current solution, so I'd recommend not optimizing further until we know we need to. I'm happy to sync with anyone about how to stress / perf test this and determine if we need further optimizations.
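For readers unfamiliar with the technique, HTTP batching in this style collects function calls issued close together and sends them as a single request. The sketch below is illustrative only; `createBatchedFetch` and the transport shape are hypothetical names, not the actual Kibana implementation.

```javascript
// Sketch of HTTP request batching: calls queued within the same window are
// flushed as one request, and each caller's promise resolves with its own
// result. All names here are hypothetical.
function createBatchedFetch(transport, delayMs = 0) {
  let queue = [];
  let timer = null;

  function flush() {
    const batch = queue;
    queue = [];
    timer = null;
    // One round trip for the whole batch; resolve each caller individually.
    transport(batch.map((item) => item.request)).then((results) => {
      results.forEach((result, i) => batch[i].resolve(result));
    });
  }

  return function batchedFetch(request) {
    return new Promise((resolve) => {
      queue.push({ request, resolve });
      if (timer === null) {
        timer = setTimeout(flush, delayMs);
      }
    });
  };
}

// Usage with a fake transport that "executes" each request locally.
const fetchFn = createBatchedFetch(async (requests) =>
  requests.map((r) => `ran ${r.functionName}`)
);

Promise.all([
  fetchFn({ functionName: 'esdocs' }),
  fetchFn({ functionName: 'pointseries' }),
]).then((results) => console.log(results));
```

Both calls above ride in a single transport invocation, which is the property that cuts down round trips over HTTP/1.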
Hey all, I am following this topic about WebSockets (including PRs and issues like #29792 and #29653 (comment)) because our Code plugin for Kibana also relies on socket.io. The plugin now lives in a separate branch of the Kibana repo but is going to merge into master soon.

As I read through the discussions here, I realize that the intentions behind the adoption of socket.io in the Canvas and Code plugins are slightly different. Correct me if I'm wrong: the main reason Canvas uses socket.io is to maintain a stateful connection for interpreting a series of function calls/expressions of its DSL. The Code plugin, instead, uses the WebSocket as a channel to notify the client/browser of the current progress of repository operations (e.g. clone, index, delete), to deliver a real-time, responsive user experience. Previously, we mostly followed the footsteps of the Canvas app in our usage of socket.io, without fully understanding the context behind it. Now that Canvas has removed its dependency on socket.io, we are thinking about removing it as well, to keep behavior aligned across the entire Kibana platform.

Before doing the actual work in the Code plugin, I just want to confirm the proposal with you all, in case anything is missing, since you have more knowledge on this. To remove the dependency on WebSocket in the Code plugin, we are going to use client-side polling to get the progress data from REST APIs (say every 5 seconds). This is the most straightforward way to achieve this, and we don't need any sort of sticky session in a multi-node Kibana setup, because each request is independent of the others.

Another option is Server-Sent Events. However, SSE isn't available in IE or some older versions of Chrome/Firefox, because it's part of HTML5. Does Kibana have any existing usages of SSE so far? If so, we can consider this path as well. Let me know if you have any questions.
Hi Mengwei,
Is your progress / context stored in Elasticsearch? I'm assuming so, but if not, polling won't be reliable in HA Kibana clusters, nor would the socket.io fallback (which is essentially polling). I don't know what our browser support story is regarding Server-Sent Events, but someone from @elastic/kibana-operations could probably speak better to that.

Anyway, the impetus for this issue was that we were seeing degraded performance in some of our production environments. Most of that may have been due to sub-optimal Canvas plugin builds, but some of it seems to have been due to the socket.io fallback. WebSockets plus the fallback made performance issues more difficult to reason about, and in the case of the interpreter, weren't strictly necessary. In addition, the interpreter was going to be used across Kibana, meaning that all of Kibana would suddenly have a socket.io dependency, and we started to see regular, unnecessary polling requests happening on all screens.

If your changes are only going to live in Code, I think the impact to Kibana as a whole is not a concern. So, the question becomes: do our production Kibana deployments have sufficient support to warrant the feature? Does it behave predictably in HA Kibana clusters? Is the support risk worth the reward (e.g. WebSockets behave very differently in different network configurations, and diagnosing what's going wrong can be tricky)?
Yes, it's persisted in ES as well.
Yes, these are all valid concerns. And given these concerns, I think simple polling would be a safe path for now.
@mw-ding For what it's worth, this is how the Reporting interface works as well: it just keeps asking Elasticsearch for a list of jobs and their status. The experience feels "real-timey", even though it's not, and that ~5 second delay isn't really noticeable in practice. As long as the UI updates automatically, users don't really care about a small delay in that information, and it's much easier to deal with than the socket connection, for all the reasons that have been pointed out. If you just need to update a UI to show the state of things, polling is probably going to lead to the least amount of headaches.

SSE, as far as I know, isn't something anyone has really looked into. I think it presents a lot of the same challenges that WebSockets do, mostly with regard to proxies and networking in general, but also session and security things, as you've pointed out.

All of that said, it might be nice to have an abstraction in Kibana that makes polling easier to set up, but to my knowledge nothing like that has been proposed. Now that more apps do this, something like that could be worth raising with the Operations team.
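To make the "just keep asking" approach concrete, here is a minimal sketch of a polling helper. It is illustrative only — not the actual X-Pack implementation — and the key design choice is scheduling the next poll only after the current one finishes, so a slow response never produces overlapping requests.

```javascript
// Minimal polling helper sketch. `fn` is whatever fetches state and
// updates the UI (e.g. GET a job list from a REST endpoint).
class SimplePoller {
  constructor(fn, intervalMs) {
    this.fn = fn;
    this.intervalMs = intervalMs;
    this.timer = null;
    this.stopped = false;
  }

  start() {
    const tick = async () => {
      try {
        await this.fn();
      } finally {
        // Schedule the next poll only after this one completes, so slow
        // responses don't pile up concurrent requests.
        if (!this.stopped) {
          this.timer = setTimeout(tick, this.intervalMs);
        }
      }
    };
    tick();
  }

  stop() {
    this.stopped = true;
    clearTimeout(this.timer);
  }
}
```

Usage would look like `new SimplePoller(fetchJobStatus, 5000).start()`, with `fetchJobStatus` being whatever REST call the app needs; each request is independent, so no sticky sessions are required in a multi-node setup.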
It's good to learn that there are some precedents over there. Thanks for pointing this out.
Completely agreed. So sounds like 5-second polling is the best bet for now. Later if there is a Kibana level abstraction, we can move to that. Can you maybe share the link of the code in Reporting if possible? Just want to align with its implementation as much as possible for the sake of potential future abstraction. |
@mw-ding here's the code... interestingly enough, it looks like someone already made a polling abstraction, which you can just use if you're working in the X-Pack code (and I believe you are). See the Poller class.
I think @joshdover looked into using esqueue for polling too, for the migration assistant work, and ended up doing something else, so that might be another example to use.
Correct, we ended up creating our own mechanisms. We have a server-side poll for polling the state of Elasticsearch Tasks (probably not relevant to codesearch) and a client-side poll for updating the UI based on state stored in a saved object. For the client-side polling, we used an RxJS BehaviorSubject as the public abstraction for the UI that is consuming the data. This makes switching out the polling mechanism under the hood trivial.
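The pattern described above — a poll feeding a BehaviorSubject so UI code subscribes to state without knowing about the polling — could be sketched like this. To stay dependency-free the subject is hand-rolled here (the real code uses RxJS), and `fetchState` is a hypothetical stand-in for the REST call that reads the saved object:

```javascript
// BehaviorSubject-like container: remembers the latest value and emits it
// immediately to new subscribers, like RxJS's BehaviorSubject does.
class BehaviorSubjectLite {
  constructor(initial) {
    this.value = initial;
    this.observers = new Set();
  }
  subscribe(observer) {
    this.observers.add(observer);
    observer(this.value); // emit current value right away
    return () => this.observers.delete(observer); // unsubscribe fn
  }
  next(value) {
    this.value = value;
    this.observers.forEach((o) => o(value));
  }
}

// Wire a poll into the subject. UI code only sees `subject`, so the
// polling mechanism underneath can be swapped out without touching it.
function pollIntoSubject(fetchState, intervalMs) {
  const subject = new BehaviorSubjectLite(undefined);
  const timer = setInterval(async () => {
    subject.next(await fetchState());
  }, intervalMs);
  return { subject, stop: () => clearInterval(timer) };
}
```

Consumers just call `subject.subscribe(renderStatus)`; whether the data arrives via polling, SSE, or anything else later becomes an implementation detail behind the subject.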
+1 to using client-side polling for code. Some things to keep in mind that have bitten us on previous polling implementations:
|
@stacey-gammon should we close this issue, since the main reason it was created is now resolved? If we need further optimization, we should probably open a specific issue for that. @mw-ding maybe you should open an issue to track replacing WebSockets with polling in Code?
Thanks for the rules of thumb, @epixa. Very useful. @chrisdavies Thanks, I think we've got what we want.
I do hope you consider ECE and its proxy server when designing the new browser-to-ES interactions.
Kibana is expanding its use of the Canvas expression DSL. The DSL's interpreter currently requires WebSockets in order to work properly. WebSockets are not well supported in several critical production Kibana installations, and possibly won't be for a long time.
We need to update the interpreter to work efficiently over HTTP/1. I've put a PoC together for how this might work. It involves batching requests and processing the response as a stream so that a long-running request doesn't block others in the batch.
Some possible directions are:
Making the interpreter 1-way instead of 2-way
The interpreter can be updated to behave as follows:
Evaluating the AST
As an example of the 1-way AST evaluation flow, imagine the browser has an AST that looks like this:
(servera (clientb (serverc (clientd))))
It would do the following:
1. Run the clientd function locally (in the browser)
2. POST /api/canvasfn/serverc {result of clientd}
3. Run the clientb function locally (in the browser) with the result of the serverc call
4. POST /api/canvasfn/servera {result of clientb}
That's all pseudo-implementation details, but the flow is workable.
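As a sketch of that flow, the evaluator below walks the AST from the innermost expression outward, running client functions locally and handing server functions to an HTTP endpoint. Everything here is illustrative: the function names come from the example above, and the POST is stubbed so the sketch is self-contained.

```javascript
// Illustrative client-side function registry for the example AST.
const clientFunctions = {
  clientd: () => 'd-result',
  clientb: (input) => `b(${input})`,
};

// Stand-in for POST /api/canvasfn/<name> with the child's result as the
// body. A real implementation would issue an HTTP request here.
async function postServerFunction(name, input) {
  return `${name}(${input})`;
}

// 1-way evaluation: resolve the innermost node first, then feed its
// result outward. Client functions run in the browser; anything else is
// shipped to the server endpoint.
async function evaluate(node) {
  const input = node.child ? await evaluate(node.child) : undefined;
  if (clientFunctions[node.name]) {
    return clientFunctions[node.name](input);
  }
  return postServerFunction(node.name, input);
}

// (servera (clientb (serverc (clientd))))
const ast = {
  name: 'servera',
  child: {
    name: 'clientb',
    child: { name: 'serverc', child: { name: 'clientd' } },
  },
};

evaluate(ast).then((result) => console.log(result));
```

Note that this issues one request per server function; the batching discussed below would sit between `postServerFunction` and the wire.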
Chattiness
The chattiness of this implementation can be somewhat mitigated with batching / buffering of requests, and by handling the response as a stream client-side. That plus keepalives will hopefully enable reasonable performance without WebSockets.
It's still possible that we fill the browser's connection pool and end up with long-running requests causing unwanted delays in the UI. If we see this, I have some thoughts about how we can mitigate it, but we'll cross that bridge only if necessary.
This is a follow up on an offline conversation with @alexbrasetvik, @jimgoodwin, @stacey-gammon, @timroes