
[Fleet] Speed up fleet setup #96026

Closed
ruflin opened this issue Apr 1, 2021 · 11 comments · Fixed by #102219

Comments

@ruflin
Member

ruflin commented Apr 1, 2021

When the user visits Fleet for the first time, Fleet runs setup to install some required packages and perform other setup steps. This currently takes (too) long. When setup was initially built, it contained very few steps, but packages have grown over time, and so has the number of required packages. Other things have changed along the way as well. We should evaluate whether the way setup is built is still the right way to go today and whether things can be optimised or changed. Some ideas:

  • Reduce the number of required packages.
    • Example: Is endpoint really required, or could it be installed with the first policy?
  • Don't install all assets directly, only the ones required to ingest data
  • Have it done by Kibana in the background: Allow Kibana to setup the stack #89827
  • Install assets in parallel (if we don't already do it)

cc @jen-huang

@ruflin ruflin added the Team:Fleet Team label for Observability Data Collection Fleet team label Apr 1, 2021
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@ruflin
Member Author

ruflin commented Apr 1, 2021

@kevinlog FYI

@kevinlog kevinlog added the Team:Defend Workflows “EDR Workflows” sub-team of Security Solution label Apr 1, 2021
@elasticmachine
Contributor

Pinging @elastic/security-onboarding-and-lifecycle-mgt (Team:Onboarding and Lifecycle Mgt)

@jen-huang jen-huang added the technical debt Improvement of the software architecture and operational architecture label Apr 28, 2021
@afgomez
Contributor

afgomez commented Jun 8, 2021

I did some benchmarking and this function is where most of the time goes.

I removed endpoint from the list of default packages, and that significantly cut the time spent in the function:

# With endpoint
  setupIngestManager: 41.561s
  ↳ ensurePreconfiguredPackagesAndPolicies: 36.003s

# Without endpoint
  setupIngestManager: 24.814s
  ↳ ensurePreconfiguredPackagesAndPolicies: 20.342s
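For anyone who wants to reproduce numbers like these, here is a minimal sketch of a timing wrapper. It is not the instrumentation used for the benchmark above: `timed`, `Timing`, `sleep` and `runBenchmark` are hypothetical names, and the function bodies are placeholders that only borrow the names from the output.

```typescript
// Hypothetical helper, not part of Fleet: times an async step and records the result.
interface Timing {
  label: string;
  ms: number;
}

async function timed<T>(sink: Timing[], label: string, step: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await step();
  } finally {
    // Record the duration even if the step throws.
    sink.push({ label, ms: Date.now() - start });
  }
}

// Placeholder for real work.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Stand-in for the real setup; only the labels mirror the benchmark output.
async function runBenchmark(): Promise<Timing[]> {
  const timings: Timing[] = [];
  await timed(timings, "setupIngestManager", async () => {
    await timed(timings, "ensurePreconfiguredPackagesAndPolicies", () => sleep(20));
    await sleep(5); // whatever else setup does
  });
  return timings;
}

runBenchmark().then((timings) => {
  for (const { label, ms } of timings) {
    console.log(`${label}: ${(ms / 1000).toFixed(3)}s`);
  }
});
```

Inner steps finish first, so they appear before their parent in the recorded list.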

I'm not sure why endpoint is installed at all, since it doesn't seem to be used. Maybe we can indeed remove it.

Another thing we could do is to call setupIngestManager when the plugin initializes. That means the setup will happen when kibana first starts. That will make the first yarn start slower, but maybe that's fine.

@nchaulet
Member

nchaulet commented Jun 8, 2021

Another thing we could do is to call setupIngestManager when the plugin initializes. That means the setup will happen when kibana first starts. That will make the first yarn start slower, but maybe that's fine.

That would be great, but currently it's not really doable, as we need a lot of permissions that the kibana_system user does not have. (Installing a package installs index templates, pipelines, ...)

@jen-huang
Contributor

I'm not sure why endpoint is installed at all, since it doesn't seem to be used. Maybe we can indeed remove it.

Following offline discussion with @kevinlog, it seems that we can indeed remove Endpoint from the list of default integrations.

@ruflin
Member Author

ruflin commented Jun 9, 2021

@jen-huang Agree, I think we don't need the endpoint package anymore by default. It would be good to document the reason here as we might come back to this.

@afgomez
Contributor

afgomez commented Jun 9, 2021

Another thing we can do is improve the UX of the first load itself, similar to the elevator waiting time problem. Instead of only showing a spinner on first load, we can explain to the user what we are doing (the initial setup) and replace the spinner with a progress bar, to remove that feeling of the UI being frozen.
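As a rough illustration of the progress-bar idea, the setup could report which step it is on through a callback that the UI subscribes to. This is only a sketch: `SetupStep`, `SetupProgress` and `runSetupWithProgress` are hypothetical names, and the step labels are made up rather than taken from Fleet's code.

```typescript
// Hypothetical shape of a setup step; labels are illustrative only.
interface SetupStep {
  label: string;
  run: () => Promise<void>;
}

interface SetupProgress {
  completed: number;
  total: number;
  current: string;
}

async function runSetupWithProgress(
  steps: SetupStep[],
  onProgress: (progress: SetupProgress) => void
): Promise<void> {
  for (let i = 0; i < steps.length; i++) {
    // Report before running, so the UI can say what is happening right now.
    onProgress({ completed: i, total: steps.length, current: steps[i].label });
    await steps[i].run();
  }
  onProgress({ completed: steps.length, total: steps.length, current: "Done" });
}

// Example usage with placeholder steps.
const noop = () => Promise.resolve();
runSetupWithProgress(
  [
    { label: "Installing system package", run: noop },
    { label: "Installing elastic_agent package", run: noop },
  ],
  (p) => console.log(`${Math.round((p.completed / p.total) * 100)}% ${p.current}`)
);
```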

@afgomez
Contributor

afgomez commented Jun 15, 2021

Alright, a small update regarding this. These are the benchmarks for all the different steps for the default packages. I tried my best to represent what happens in parallel and what happens sequentially:

  • function calls that start with = happen in parallel.
  • function calls that start with - happen sequentially after the previous.
  • Indentation means grouping

Most of the time is spent in the last two packages, installing Kibana assets, pipelines and templates. Take into account that installKibanaAssets is fired in parallel with the other group of functions, so the total time per package is the longest of the two groups.

= installKibanaAssets: fleet_server: 0.858ms
= - installIlmForDataStream fleet_server: 9.424ms     | Group time: < 20ms
  - installPipelines fleet_server: 0.144ms            |
  - installTemplates fleet_server: 4.273ms            |
  - updateCurrentWriteIndices fleet_server: 0.077ms   |
  - installTransform fleet_server: 3.984ms            |
- saveArchiveEntries fleet_server: 1.018s


= installKibanaAssets: elastic_agent: 2.682s
= - installIlmForDataStream elastic_agent: 126.244ms  | Group time: ~4s
  - installPipelines elastic_agent: 2.560s            |
  - installTemplates elastic_agent: 1.348s            |
  - updateCurrentWriteIndices elastic_agent: 4.102ms  |
  - installTransform elastic_agent: 5.234ms           |
- saveArchiveEntries elastic_agent: 724.119ms


= installKibanaAssets: system: 2.685s
= - installIlmForDataStream system: 127.71ms          | Group time: ~8s
  - installPipelines system: 3.234s                   |  
  - installTemplates system: 4.036s                   |
  - updateCurrentWriteIndices system: 5.627ms         |
  - installTransform system: 5.673ms                  |
- saveArchiveEntries system: 629.17ms 
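To make the "longest of the two groups" arithmetic concrete, here is a small sketch that recomputes a per-package estimate from the figures above. It assumes my reading of the notation is right: installKibanaAssets runs in parallel with the sequential install* group, and saveArchiveEntries runs afterwards. `PackageTimings` and `estimateInstallMs` are hypothetical names.

```typescript
// Timings in milliseconds, copied from the benchmark above.
interface PackageTimings {
  installKibanaAssets: number;
  // ILM, pipelines, templates, write indices, transform (run one after another)
  sequentialGroup: number[];
  saveArchiveEntries: number;
}

const benchmark: Record<string, PackageTimings> = {
  fleet_server: {
    installKibanaAssets: 0.858,
    sequentialGroup: [9.424, 0.144, 4.273, 0.077, 3.984],
    saveArchiveEntries: 1018,
  },
  elastic_agent: {
    installKibanaAssets: 2682,
    sequentialGroup: [126.244, 2560, 1348, 4.102, 5.234],
    saveArchiveEntries: 724.119,
  },
  system: {
    installKibanaAssets: 2685,
    sequentialGroup: [127.71, 3234, 4036, 5.627, 5.673],
    saveArchiveEntries: 629.17,
  },
};

// Assumed model: max(parallel branches) + the step that follows them.
function estimateInstallMs(t: PackageTimings): number {
  const group = t.sequentialGroup.reduce((sum, ms) => sum + ms, 0);
  return Math.max(t.installKibanaAssets, group) + t.saveArchiveEntries;
}

for (const name of Object.keys(benchmark)) {
  console.log(`${name}: ~${(estimateInstallMs(benchmark[name]) / 1000).toFixed(1)}s`);
}
```

Under that model the three packages come out at roughly 1.0s, 4.8s and 8.0s, and in each case the sequential group, not installKibanaAssets, is the longer branch.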

One thing that was proposed was to call installKibanaAssets using the task manager. This will be a positive gain only for the first package, and has the downside of complicating the code.

The second group happens sequentially because there are dependencies between data streams, pipelines, templates and transforms. We could potentially create a dependency graph of which transform depends on which template, which template depends on which pipeline, and which pipeline depends on which ILM policy, and install those things by walking the graph, but I'm not sure how easy it is, or if the dependencies are clearly stated in the packages' manifests.

This might be beneficial for packages with lots of pipelines/templates/transforms. It could potentially collapse the timing of the install* group to just the timing of its longest function (~4s down to 2.560s for elastic_agent, ~8s down to 4.036s for system), but realistically I expect the gains to be lower than that.
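A minimal sketch of what the graph-walking install could look like, assuming the dependencies can be extracted at all. It groups assets into levels and installs each level concurrently, which is a simple approximation of walking the graph. `Asset`, `toInstallLevels`, `installByLevels` and the example graph are all hypothetical; nothing here reflects the real manifest format.

```typescript
// Hypothetical asset description: `dependsOn` lists asset ids that must be installed first.
interface Asset {
  id: string;
  dependsOn: string[];
}

// Groups assets into levels: every asset in a level depends only on assets in earlier levels.
function toInstallLevels(assets: Asset[]): string[][] {
  const remaining = new Map<string, Asset>();
  for (const asset of assets) remaining.set(asset.id, asset);
  const installed = new Set<string>();
  const levels: string[][] = [];

  while (remaining.size > 0) {
    const ready = Array.from(remaining.values())
      .filter((asset) => asset.dependsOn.every((dep) => installed.has(dep)))
      .map((asset) => asset.id);
    if (ready.length === 0) {
      const stuck = Array.from(remaining.keys()).join(", ");
      throw new Error(`Cyclic or unknown dependency among: ${stuck}`);
    }
    for (const id of ready) {
      installed.add(id);
      remaining.delete(id);
    }
    levels.push(ready);
  }
  return levels;
}

// Installs level by level; assets within one level are installed concurrently.
async function installByLevels(
  assets: Asset[],
  install: (id: string) => Promise<void>
): Promise<void> {
  for (const level of toInstallLevels(assets)) {
    await Promise.all(level.map((id) => install(id)));
  }
}

// Made-up example graph; real dependencies would have to come from the package manifests.
const exampleAssets: Asset[] = [
  { id: "ilm-policy", dependsOn: [] },
  { id: "pipeline-a", dependsOn: [] },
  { id: "pipeline-b", dependsOn: [] },
  { id: "template", dependsOn: ["ilm-policy", "pipeline-a"] },
  { id: "transform", dependsOn: ["template"] },
];
console.log(toInstallLevels(exampleAssets));
```

The best case is bounded by the longest chain in the graph, so a package whose assets mostly depend on each other would see little gain.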

So, I think for now we can remove endpoint and update the UI to give feedback to the user.

@ruflin
Member Author

ruflin commented Jun 15, 2021

Thanks for putting together all this data, this is great!

One thing that caught my attention is the time it takes to install all the pipelines and templates. I'm aware there are a few pipelines and templates to be loaded, but the number seems higher than what I would expect. It would be interesting to know how long it takes to load a single pipeline into ES with a direct ES call, and how long it takes when going through our code and Kibana's. Basically my question is: is it ES that is slow, or some code in between?

@afgomez
Contributor

afgomez commented Jun 15, 2021

@ruflin good questions. I haven't dug into each function, and to be fair all these timings come from running against a local ES instance, so that might affect the speed as well.

I will take a look at what the functions are doing nonetheless
